List Info

Thread: Understanding failover log entries




Understanding failover log entries
user name
2006-02-21 19:23:48
   *** From dhcp-server -- To unsubscribe, see the end of
this message. ***

Is there any documentation describing the various
failover-related log entries?

Specifically, I'm curious about the meaning of the
"lease imbalance"
entries and the "backup" and "lts"
values from the "pool" entries.

Examples:

swozzle:~ ben$ egrep '(imbalance|pool)' /var/log/dhcpd.log
Feb 21 13:09:21 swozzle dhcpd: pool 8157fc8 150.253.90/24
total 181 
free 31  backup 21  lts -5
Feb 21 13:09:42 swozzle dhcpd: pool 8157fc8 150.253.90/24
total 181 
free 33  backup 20  lts -6
Feb 21 13:09:42 swozzle dhcpd: pool 8157fc8 150.253.90/24 
total 181 
free 33  backup 20  lts 6
Feb 21 13:09:42 swozzle dhcpd: lease imbalance - lts = 4
Feb 21 13:10:10 swozzle dhcpd: pool 8157fc8 150.253.90/24
total 181 
free 26  backup 26  lts 0
Feb 21 13:10:18 swozzle dhcpd: pool 8157fc8 150.253.90/24
total 181 
free 25  backup 26  lts 0
Feb 21 13:11:30 swozzle dhcpd: pool 8157fc8 150.253.90/24
total 181 
free 26  backup 26  lts 0
<snip>

Thanks!

-Ben

------------------------------------------------------------
-----------
List Archives : http://www.isc.org/ops/
lists/
Unsubscribe   : http://www.
isc.org/sw/dhcp/dhcp-lists.php    
-or-          : mailto:dhcp-server-requestisc.org?Subject=unsubscribe  
------------------------------------------------------------
-----------

Understanding failover log entries
user name
2006-02-21 19:47:49
   *** From dhcp-server -- To unsubscribe, see the end of
this message. ***

On Tue, Feb 21, 2006 at 01:23:48PM -0600, Ben Beuchler
wrote:
> Is there any documentation describing the various
failover-related log entries?

Sure, RTC (Read The Code). :(

> Specifically, I'm curious about the meaning of the
"lease imbalance"
> entries and the "backup" and
"lts" values from the "pool"
entries.

There are two 'pool' loglines.  They look the same. 
They're supposed to
be.  They're executed in opposing conditions.

Can you tell the difference between these two:

        log_info ("pool %lx %s total %d  free %d 
backup %d  lts %d",
        log_info ("pool %lx %s  total %d  free %d 
backup %d  lts %d",

The first is logged by dhcp_failover_pool_check(), which is
called
whenever the system allocates an address...it wants to know
if it is
running out of leases and should ask for more (send a
POOLREQ to the
peer, or steal leases back, depending on secondary/primary).

The second is dhcp_failover_pool_rebalance(), which is
called when a
system wants to give leases to its peer (eg, always the
primary).  This
is either due to the system discovering a misbalance on its
own, or in
response to receiving a POOLREQ message.

In this second case, LTS means 'leases to send'.  If the
server were to
send this many leases to the peer, then the pool would be
50/50 balanced.

But in the first case, LTS actually is calculated as
'leases to request'.
Leases that would need to be transferred from the peer to
yourself in
order to balance the pool 50/50.

And the only way to tell the difference is that extra space
in the second
logline.

Right nasty of us, isn't it.

This is why:

> Feb 21 13:09:42 swozzle dhcpd: pool 8157fc8
150.253.90/24 total 181 
> free 33  backup 20  lts -6
> Feb 21 13:09:42 swozzle dhcpd: pool 8157fc8
150.253.90/24  total 181 
> free 33  backup 20  lts 6

-6 leases to request, means the same thing as 6 leases to
send.

> Feb 21 13:09:42 swozzle dhcpd: lease imbalance - lts =
4

But somehow your server only sent 2!  This is an error
condition that
really shouldn't happen, I don't think.

I'm not sure how it's possible.

Might have a look at your dhcpd.leases file to see if the
leases in these
states match these log lines:

awk 'BEGIN { curlease = ""; } /^lease / {
curlease = $2; } /^  binding state / { leases[curlease] =
$3; } END { for(lease in leases) { states[leases[lease]]++;
} for(state in states) { print state, states[state]; } }'
/var/db/dhcpd.leases

Also, what version are you running?

-- 
David W. Hankins		"If you don't do it right the first
time,
Software Engineer			you'll just have to do it again."
Internet Systems Consortium, Inc.		-- Jack T. Hankins

------------------------------------------------------------
-----------
List Archives : http://www.isc.org/ops/
lists/
Unsubscribe   : http://www.
isc.org/sw/dhcp/dhcp-lists.php    
-or-          : mailto:dhcp-server-requestisc.org?Subject=unsubscribe  
------------------------------------------------------------
-----------

Understanding failover log entries
user name
2006-02-21 22:51:17
   *** From dhcp-server -- To unsubscribe, see the end of
this message. ***

> Sure, RTC (Read The Code). :(

Of course...  My favorite kind of documentation.

> And the only way to tell the difference is that extra
space in the second
> logline.
>
> Right nasty of us, isn't it.

It's an... "interesting design choice."

> This is why:
>
> > Feb 21 13:09:42 swozzle dhcpd: pool 8157fc8
150.253.90/24 total 181
> > free 33  backup 20  lts -6
> > Feb 21 13:09:42 swozzle dhcpd: pool 8157fc8
150.253.90/24  total 181
> > free 33  backup 20  lts 6
>
> -6 leases to request, means the same thing as 6 leases
to send.
>
> > Feb 21 13:09:42 swozzle dhcpd: lease imbalance -
lts = 4
>
> But somehow your server only sent 2!  This is an error
condition that
> really shouldn't happen, I don't think.

So the "lease imbalance" message is indicating
the mismatch between
what was requested and what was sent?

I am seeing the same pattern several times in the logs,
interestingly
always in reference to the same subnet/pool:

Feb 21 15:35:49 swozzle dhcpd: pool 8157fc8 150.253.90/24
total 181 
free 50  backup 29  lts -10
Feb 21 15:35:49 swozzle dhcpd: pool 8157fc8 150.253.90/24 
total 181 
free 50  backup 29  lts 10
Feb 21 15:35:49 swozzle dhcpd: lease imbalance - lts = 4

Feb 21 16:08:47 swozzle dhcpd: pool 8157fc8 150.253.90/24
total 181 
free 50  backup 32  lts -9
Feb 21 16:08:47 swozzle dhcpd: pool 8157fc8 150.253.90/24 
total 181 
free 50  backup 32  lts 9
Feb 21 16:08:47 swozzle dhcpd: lease imbalance - lts = 5

Feb 21 16:40:25 swozzle dhcpd: pool 8157fc8 150.253.90/24
total 181 
free 56  backup 33  lts -11
Feb 21 16:40:25 swozzle dhcpd: pool 8157fc8 150.253.90/24 
total 181 
free 56  backup 33  lts 11
Feb 21 16:40:25 swozzle dhcpd: lease imbalance - lts = 6

Are "lease imbalance" messages always indicative
of a problem?

> Might have a look at your dhcpd.leases file to see if
the leases in these
> states match these log lines:
>
> awk 'BEGIN { curlease = ""; } /^lease / {
curlease = $2; } /^  binding state / { leases[curlease] =
$3; } END { for(lease in leases) { states[leases[lease]]++;
} for(state in states) { print state, states[state]; } }'
/var/db/dhcpd.leases

Do you mean for me to check the state of specific leases or
just to
run that awk script on the entire lease database?  If the
latter,
here's the output:

active; 343
free; 270
backup; 263
expired; 1

> Also, what version are you running?

V3.0.4b2

-Ben

------------------------------------------------------------
-----------
List Archives : http://www.isc.org/ops/
lists/
Unsubscribe   : http://www.
isc.org/sw/dhcp/dhcp-lists.php    
-or-          : mailto:dhcp-server-requestisc.org?Subject=unsubscribe  
------------------------------------------------------------
-----------

Understanding failover log entries
user name
2006-02-21 23:42:14
   *** From dhcp-server -- To unsubscribe, see the end of
this message. ***

On Tue, Feb 21, 2006 at 04:51:17PM -0600, Ben Beuchler
wrote:
> So the "lease imbalance" message is
indicating the mismatch between
> what was requested and what was sent?

Correct.  After attempting to give or steal
"lts" leases, "imbalance"
remained.

So in practice this log line really shouldn't be observed
unless
it matches the previously reported lts (there's a kind of
bug where
if the server decides to take no action because the
misbalance is not
large enough (not greater than 10%), it will not attempt to
move any
leases...so the lease imbalance log line is emitted in this
case,
but it would have the same value then).

It's only possible to exit the loop that decrements lts if
either lts
reaches zero, or the loop reaches the end of the linear list
of
free/backup leases.

The presence of the log line clearly indicates you're
reaching the end
of the list, which means the in-memory database is getting
corrupted
somehow.

> > awk 'BEGIN { curlease = ""; } /^lease
/ { curlease = $2; } /^  binding state / { leases[curlease]
= $3; } END { for(lease in leases) {
states[leases[lease]]++; } for(state in states) { print
state, states[state]; } }' /var/db/dhcpd.leases
> 
> Do you mean for me to check the state of specific
leases or just to
> run that awk script on the entire lease database?  If
the latter,
> here's the output:
> 
> active; 343
> free; 270
> backup; 263
> expired; 1

Hm, that's not nearly as helpful as I thought.

You should look at that one expired lease however.  That's
a
transitionary state, leases shouldn't hang there (unless
you just
happened to catch it).

> > Also, what version are you running?
> 
> V3.0.4b2

I think at this point your best choice is to wait for
3.0.4b3 (this
week) and see if the problem persists.

We're backing out some changes in b3 that was making the
server unstable
in somewhat random ways.

-- 
David W. Hankins		"If you don't do it right the first
time,
Software Engineer			you'll just have to do it again."
Internet Systems Consortium, Inc.		-- Jack T. Hankins

------------------------------------------------------------
-----------
List Archives : http://www.isc.org/ops/
lists/
Unsubscribe   : http://www.
isc.org/sw/dhcp/dhcp-lists.php    
-or-          : mailto:dhcp-server-requestisc.org?Subject=unsubscribe  
------------------------------------------------------------
-----------

[1-4]

about | contact  Other archives ( Real Estate discussion Medical topics )