Thread: Failover, omapi and Manual Adjustment

user name
2006-03-02 16:50:13
"David W. Hankins" writes:
>Generally I've found it marginally safer to restart the secondary
>prior to the primary.
>
>In practice we hope the software isn't prone to failure out of that,
>but we have some anecdotal evidence that it does behave better overall
>if you avoid reconfiguring the secondary after the primary.

	This is interesting because I more or less accidentally do
exactly what you describe.  Before posting another message, I looked
at the script I devised some years ago to remind myself what I did.
It sends the no-lease configuration to the secondary server and runs
the test-then-restart-if-good script there.  I then wait 5 seconds,
which is totally arbitrary but gives the secondary time to come back
up and settle down, then I send the lease-granting configuration to
the primary server and run its test-and-start-if-good script.  A final
touch on the primary box is to save the last few dhcpd.conf files so
that if, by some fluke, the newest dhcpd.conf kills things, we've got
the last one that worked.  Yes!  When it comes to this sort of thing,
I am thoroughly paranoid and the voices tell me that's good.
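
	The sequence above can be sketched roughly as follows.  This is
not the original script, only a minimal Python illustration of the
order of operations.  The host names, file paths, restart command and
archive directory are all made-up placeholders, and it assumes dhcpd is
on the remote PATH.  The only dhcpd options used are -t (test the
configuration) and -cf (name the configuration file).

```python
import shutil
import subprocess
import time
from pathlib import Path

# Every name below is a hypothetical placeholder, not from the post.
SECONDARY = "dhcp2.example.org"
PRIMARY = "dhcp1.example.org"
REMOTE_CONF = "/etc/dhcpd.conf"
RESTART_CMD = "/usr/local/sbin/restart-dhcpd"  # site-specific, assumed
SETTLE_SECONDS = 5  # arbitrary pause, as described in the post


def run(argv):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(argv).returncode == 0


def push_and_restart(host, local_conf, run=run):
    """Copy a config to host, test it, and restart only if it passes."""
    staged = REMOTE_CONF + ".new"
    if not run(["scp", str(local_conf), f"{host}:{staged}"]):
        return False
    # dhcpd -t parses the file given by -cf and exits non-zero on errors.
    if not run(["ssh", host, "dhcpd", "-t", "-cf", staged]):
        return False
    return run(["ssh", host, f"mv {staged} {REMOTE_CONF} && {RESTART_CMD}"])


def rollout(secondary_conf, primary_conf, run=run, sleep=time.sleep):
    """Secondary first, pause, then primary; stop at the first failure."""
    if not push_and_restart(SECONDARY, secondary_conf, run):
        return False
    sleep(SETTLE_SECONDS)
    return push_and_restart(PRIMARY, primary_conf, run)


def archive_config(conf, archive_dir, stamp=None, keep=3):
    """Save a copy of conf and keep only the newest `keep` (>= 1) copies."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    shutil.copy2(conf, archive_dir / f"dhcpd.conf.{stamp}")
    # Timestamped names sort oldest first, so drop everything but the tail.
    copies = sorted(archive_dir.glob("dhcpd.conf.*"))
    for old in copies[:-keep]:
        old.unlink()
```

Calling rollout("secondary.conf", "primary.conf") pushes to the
secondary, waits, then pushes to the primary, and returns False as soon
as any step fails, so a config that fails its test never reaches the
restart step.  archive_config() is meant to be run on the box that
holds the working dhcpd.conf after a successful rollout.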

	Seriously, we don't have true panic moments very often, and
that is the payoff from being vigilant.  The dhcp server, along with
bind, is some of the most brilliant software I have ever seen as far
as robustness and good design go, but for anything you can think of
that can happen, there are ten more things you didn't think of that
could happen.

	Our bind platform had the SCSI electronics fail on the hard
drive that contained everything but the /var file system.  Like the
story of the headless cockroach that can reportedly live for two weeks
after decapitation, the box continued to properly execute bind, which
was in RAM and could still write to /var.  Any automated system to
detect a failure of this box would probably not have seen anything
wrong.  We knew something was wrong when the syslog began
carpet-bombing us with SCSI bus error messages.  Guess what!  You
could log in and su to root, but could not execute any commands that
weren't part of something already resident in RAM.

halt, shutdown now, reboot, etc. all produced I/O error messages.  You
could almost hear the box laughing out loud.  We finally yanked the
network cable and power to free the IP address for a backup server.  I
am not sure RAID would have saved things, because the SCSI controller
was so hosed that it could bring down any SCSI bus one connected it
to, probably because of a chip select line perpetually in the active
state.

	Incidents like that make one humble.

Martin McCormick WB5AGZ  Stillwater, OK
Systems Engineer
OSU Information Technology Department Network Operations Group
