List Info

Thread: channel_walk_locked possible bug / possible fix?




channel_walk_locked possible bug / possible fix?
user name
2006-11-16 13:25:30
Hi,

I have been looking in an issue where we occasionally get
deadlocked
channels on our asterisk systems, which are visible through
the
logging of messages such as:

...
Zap/56-1(isdn-external s 1) Up Bridged Call SIP/306-5a66
45 active channel(s)
WARNING[13001]: channel.c:537 ast_channel_walk_locked:
Avoided
deadlock for 'SIP/306-5a66', 10 retries!

In the above example, the channel SIP/306-5a66 cannot be
seen in any
show commands, and the "Avoided deadlock" message
appears periodically
in the logs during normal usage.

I have not found anything significant in the bug tracker,
but I have a
theory, and would like to air it here for comments:

'ast_channel_walk_locked' calls 'channel_find_locked' with
*prev set
to the current list position. 'channel_find_locked' then
steps forward
through the list by one position, and tries to lock the next
channel
list entry. If the lock fails 10 times, then the above
warning is
printed, and NULL is returned.

Assuming that some code is looping through repeated calls to
'ast_channel_walk_locked' until NULL is returned (I found at
least 6
cases of this in the 1.2.13 codebase), returning NULL at
this point
implies that the end of the list has been reached, which may
not be
true! It is also conceivable that the deadlock will not
clear until
some processing on a channel later in the list has been
completed.
This processing can now never happen.

I am attaching a patch which causes the deadlocked channel
to be
skipped rather than returning NULL in this case. If people
do not find
this suggestion laughably silly, I will add it to the bug
tracker.

(As an aside, am I mistaken, or does
'ast_walk_channel_by_name_prefix_locked' never use the
provided 'name'
value, and is therefore identical to
'ast_channel_walk_locked'? Not
that its important as the method is never used  )

Kind regards,
Steve
_______________________________________________
--Bandwidth and Colocation provided by Easynews.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
Array
user name
2006-11-16 21:48:50
Steve Davies wrote:
> I am attaching a patch which causes the deadlocked
channel to be
> skipped rather than returning NULL in this case. If
people do not find
> this suggestion laughably silly, I will add it to the
bug tracker.

I think, in fact, that this is probably a good idea.
Skipping channels
we cannot get the lock for makes much more sense than
stopping the
channel walk completely, especially when we are doing the
walk to find a
specific channel in the first place (and haven't found it
yet).

Your point about name_prefix searching seems wrong; all the
different
variations end up calling channel_find_locked, which most
certainly does
use the name/namelen pair of arguments if they are supplied.

I would make two observations regarding your proposed patch:

1) There are some cases where the channel_find_locked call
can only
return one channel (full-name matching, for one, there may
be others).
In those cases, returning NULL if we can't get the lock is
the proper
thing to do, because 'skipping' won't ever find that channel
again.

2) This will be a significant behavior change for the 1.2
branch, and as
such it will require some significant testing. Once we have
a proposed
patch that everyone is happy with I can schedule some
testing time in
our BE testing lab (since this patch would end up going into
BE anyway),
but I'd like to see some others in the community that have
stress-testing setups already built willing to help test
this patch as
well. Jeremy McNamara and Greg Boehnlein are two people that
I know for
sure have existing test networks built and test scripts that
can hammer
on Asterisk until it breaks... testing with this patch would
be useful
there.
_______________________________________________
--Bandwidth and Colocation provided by Easynews.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
Array
user name
2006-11-16 22:13:33
On 11/16/06, Kevin P. Fleming <kpflemingdigium.com> wrote:
> I think, in fact, that this is probably a good idea.
Skipping channels
> we cannot get the lock for makes much more sense than
stopping the
> channel walk completely, especially when we are doing
the walk to find a
> specific channel in the first place (and haven't found
it yet).

I for one would be willing to test out this patch -- should
we put it
in the bug tracker, so that we can keep better track of it
as we work
on it.

-Jared
_______________________________________________
--Bandwidth and Colocation provided by Easynews.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
channel_walk_locked possible bug / possible fix?
user name
2006-11-16 22:49:10
On 11/16/06, Jarenessto <jaredsmithjaredsmith.net> wrote:
> On 11/16/06, Kevin P. Fleming <kpflemingdigium.com> wrote:
> > I think, in fact, that this is probably a good
idea. Skipping channels
> > we cannot get the lock for makes much more sense
than stopping the
> > channel walk completely, especially when we are
doing the walk to find a
> > specific channel in the first place (and haven't
found it yet).
>
> I for one would be willing to test out this patch --
should we put it
> in the bug tracker, so that we can keep better track of
it as we work
> on it.
>

Raised as http://bugs.d
igium.com/view.php?id=8376

Disclaimer to follow.

Steve
_______________________________________________
--Bandwidth and Colocation provided by Easynews.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
channel_walk_locked possible bug / possible fix?
user name
2006-11-16 22:54:43
On 11/16/06, Kevin P. Fleming <kpflemingdigium.com> wrote:
> Steve Davies wrote:
> > I am attaching a patch which causes the deadlocked
channel to be
> > skipped rather than returning NULL in this case.
If people do not find
> > this suggestion laughably silly, I will add it to
the bug tracker.
>
> I think, in fact, that this is probably a good idea.
Skipping channels
> we cannot get the lock for makes much more sense than
stopping the
> channel walk completely, especially when we are doing
the walk to find a
> specific channel in the first place (and haven't found
it yet).

Raised in Mantis with the patch attached. id=8376.

> Your point about name_prefix searching seems wrong; all
the different
> variations end up calling channel_find_locked, which
most certainly does
> use the name/namelen pair of arguments if they are
supplied.

The reason I made the observation about name_prefix
searching is
because all of the name/prefix matches are inside an
  if(!prev) {...}
block, which will never evaluate to true because the
name_prefix
searching method provides a value for prev.

> I would make two observations regarding your proposed
patch:
>
> 1) There are some cases where the channel_find_locked
call can only
> return one channel (full-name matching, for one, there
may be others).
> In those cases, returning NULL if we can't get the lock
is the proper
> thing to do, because 'skipping' won't ever find that
channel again.

Agreed - As far as I could tell, checking for whether prev
is set
ensures that the name search behaviour is unaffected. I will
check
more carefully for additional uses of channel_find_locked.

> 2) This will be a significant behavior change for the
1.2 branch, and as
> such it will require some significant testing. Once we
have a proposed
> patch that everyone is happy with I can schedule some
testing time in
> our BE testing lab (since this patch would end up going
into BE anyway),
> but I'd like to see some others in the community that
have
> stress-testing setups already built willing to help
test this patch as
> well. Jeremy McNamara and Greg Boehnlein are two people
that I know for
> sure have existing test networks built and test scripts
that can hammer
> on Asterisk until it breaks... testing with this patch
would be useful
> there.

I will of course also be testing it myself 

Regards,
Steve
_______________________________________________
--Bandwidth and Colocation provided by Easynews.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
channel_walk_locked possible bug / possible fix?
user name
2006-11-17 18:41:57
Steve Davies wrote:
> The reason I made the observation about name_prefix
searching is
> because all of the name/prefix matches are inside an
>  if(!prev) {...}
> block, which will never evaluate to true because the
name_prefix
> searching method provides a value for prev.

You are absolutely correct... the only time the name/namelen
or
context/exten matching is done is for the first channel that
is found,
after that we just return all channels. This is clearly
wrong.
Apparently there is no code that is trying to use the API
calls in that
way, but it's obviously wrong.

I'll fix that part separately from your proposed change.
_______________________________________________
--Bandwidth and Colocation provided by Easynews.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
channel_walk_locked possible bug / possible fix?
user name
2006-11-17 22:22:15
On 11/17/06, Kevin P. Fleming <kpflemingdigium.com> wrote:
> Steve Davies wrote:
> > The reason I made the observation about
name_prefix searching is
> > because all of the name/prefix matches are inside
an
> >  if(!prev) {...}
> > block, which will never evaluate to true because
the name_prefix
> > searching method provides a value for prev.
>
> You are absolutely correct... the only time the
name/namelen or
> context/exten matching is done is for the first channel
that is found,
> after that we just return all channels. This is clearly
wrong.
> Apparently there is no code that is trying to use the
API calls in that
> way, but it's obviously wrong.

Just a comment on the new 1.2 BRANCH version of this code.
If 'prev'
and 'name' are both provided, the function will only return
a locked
channel if the named channel is the one immediately after
prev, is
that the intended behaviour, or should it search from prev+1
to the
end of the list for 'name'

I assume the latter, which is why in my other patch I broke
the
channel loop into 2 chunks. The 1st to get to 'prev' if
specified, and
the second to scan all of the remaining channels in the list
for the
supplied conditions.

As you say, nothing actually uses this combination so it is
low
priority. I will include this change in my revised patch.

(Disclaimer has been sent, so bug can be updated to reflect
this)

Cheers,
Steve
_______________________________________________
--Bandwidth and Colocation provided by Easynews.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
channel_walk_locked possible bug / possible fix?
user name
2006-11-28 23:39:31
Steve Davies wrote:
> I assume the latter, which is why in my other patch I
broke the
> channel loop into 2 chunks. The 1st to get to 'prev' if
specified, and
> the second to scan all of the remaining channels in the
list for the
> supplied conditions.

This is correct; I missed that condition in my quick rewrite
:-(
_______________________________________________
--Bandwidth and Colocation provided by Easynews.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
[1-8]

about | contact  Other archives ( Real Estate discussion Medical topics )