|
List Info
Thread: timeout/untimeout race conditions/crash
|
|
| timeout/untimeout race conditions/crash |
  United States |
2008-03-14 21:41:14 |
We think we tracked down a defect in timeout/untimeout in
FreeBSD.
We have reduced the problem to the following scenario:
2+ cpu system, one cpu is running softclock at the same
time
another thread is running on another cpu which makes use of
timeout/untimeout.
CPU 0 is running "softclock"
CPU 1 is running "driver" with Giant held.
softclock: mtx_lock_spin(&callout_lock)
softclock: CACHES the callout structure's fields.
softclock: sees that it's a CALLOUT_LOCAL_ALLOC
softclock: executes this code:
if (c->c_flags & CALLOUT_LOCAL_ALLOC) {
c->c_func = NULL;
c->c_flags = CALLOUT_LOCAL_ALLOC;
SLIST_INSERT_HEAD(&callfree, c,
c_links.sle);
curr_callout = NULL;
} else {
NOTE: that c->c_func has been set to NULL and
curr_callout
is also NULL.
softclock: mtx_unlock_spin(&callout_lock)
driver: calls untimeout(), the following sequence happens:
mtx_lock_spin(&callout_lock);
if (handle.callout->c_func == ftn &&
handle.callout->c_arg == arg)
callout_stop(handle.callout);
mtx_unlock_spin(&callout_lock);
NOTE: untimeout() sees that handle.callout->c_func is
not set
to the function so it does NOT call
callout_stop(9)!
driver: free's backing structure for c->c_arg.
softclock: executes callout.
softclock: likely crashes at this point due to access after
free.
I have a patch I'm trying out here, but I need feedback on
it.
The way the patch works is to treat CALLOUT_LOCAL_ALLOC
(timeout/untimeout)
callouts the same as ~CALLOUT_LOCAL_ALLOC allocs, and moves
the
freelist manipulation to the end of the callout dispatch.
Some light testing seems to have the system work.
We are doing some testing in-house to also make sure this
works.
Please provide feedback.
See attached delta.
--
- Alfred Perlstein
_______________________________________________
freebsd-smp freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-smp
To unsubscribe, send any mail to
"freebsd-smp-unsubscribe freebsd.org"
|
|
|
| Re: timeout/untimeout race
conditions/crash |
  United States |
2008-03-17 10:27:20 |
On Friday 14 March 2008 10:41:14 pm Alfred Perlstein wrote:
> We think we tracked down a defect in timeout/untimeout
in
> FreeBSD.
>
> We have reduced the problem to the following scenario:
>
> 2+ cpu system, one cpu is running softclock at the same
time
> another thread is running on another cpu which makes
use of
> timeout/untimeout.
>
> CPU 0 is running "softclock"
> CPU 1 is running "driver" with Giant held.
>
> softclock: mtx_lock_spin(&callout_lock)
> softclock: CACHES the callout structure's fields.
> softclock: sees that it's a CALLOUT_LOCAL_ALLOC
> softclock: executes this code:
> if (c->c_flags & CALLOUT_LOCAL_ALLOC) {
> c->c_func = NULL;
> c->c_flags = CALLOUT_LOCAL_ALLOC;
> SLIST_INSERT_HEAD(&callfree, c,
> c_links.sle);
> curr_callout = NULL;
> } else {
>
> NOTE: that c->c_func has been set to NULL and
curr_callout
> is also NULL.
> softclock: mtx_unlock_spin(&callout_lock)
> driver: calls untimeout(), the following sequence
happens:
> mtx_lock_spin(&callout_lock);
> if (handle.callout->c_func == ftn &&
handle.callout->c_arg == arg)
> callout_stop(handle.callout);
> mtx_unlock_spin(&callout_lock);
>
> NOTE: untimeout() sees that handle.callout->c_func
is not set
> to the function so it does NOT call
callout_stop(9)!
> driver: free's backing structure for c->c_arg.
> softclock: executes callout.
> softclock: likely crashes at this point due to access
after free.
>
> I have a patch I'm trying out here, but I need feedback
on it.
>
> The way the patch works is to treat CALLOUT_LOCAL_ALLOC
(timeout/untimeout)
> callouts the same as ~CALLOUT_LOCAL_ALLOC allocs, and
moves the
> freelist manipulation to the end of the callout
dispatch.
>
> Some light testing seems to have the system work.
>
> We are doing some testing in-house to also make sure
this works.
>
> Please provide feedback.
>
> See attached delta.
This is not a bug. Don't use untimeout(9) as it is not
guaranteed to be
reliable. Instead, use callout_*(). Your patch doesn't
solve any races as
the driver detach routine needs to use callout_drain() and
not just
callout_stop/untimeout anyways. Fix your broken drivers.
--
John Baldwin
_______________________________________________
freebsd-smp freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-smp
To unsubscribe, send any mail to
"freebsd-smp-unsubscribe freebsd.org"
|
|
| Re: timeout/untimeout race
conditions/crash |
  United States |
2008-03-17 15:10:14 |
* John Baldwin <jhb freebsd.org> [080317 09:43] wrote:
>
> This is not a bug. Don't use untimeout(9) as it is not
guaranteed to be
> reliable. Instead, use callout_*(). Your patch
doesn't solve any races as
> the driver detach routine needs to use callout_drain()
and not just
> callout_stop/untimeout anyways. Fix your broken
drivers.
I understand that some old Giant locked code can issue
timeout/untimeout
without Giant held, which would certainly cause this issue
to happen
and is uncorrectable, however, this is with completely Giant
locked
code.
We are not trying to use timeout(9) for mpsafe code, this is
old
code and relies upon Giant.
Giant locked code should be timeout/untimeout safe. As
explained
in my email, there exists a condition where the Giant locked
code
can have a timer fire even though proper Giant locking is
observed.
For a Giant locked subsystem, one should be able to have the
following
code work:
mtx_lock(&Giant); /* formerly spl higher than softclock
*/
untimeout(&func, arg, &sc->handle);
free(sc);
mtx_unlock(&Giant); /* formerly splx() */
Normally splsoftclock would completely block the timeout
from firing
and this sort of code would be safe. It is no longer safe
due to
a BUG in the way that Giant is used.
Please reread the original mail to better understand the
synopsis
of the problem.
thank you,
-Alfred
_______________________________________________
freebsd-smp freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-smp
To unsubscribe, send any mail to
"freebsd-smp-unsubscribe freebsd.org"
|
|
| Re: timeout/untimeout race
conditions/crash |
  United States |
2008-03-17 15:59:33 |
On Monday 17 March 2008 04:10:14 pm Alfred Perlstein wrote:
> * John Baldwin <jhb freebsd.org> [080317
09:43] wrote:
> >
> > This is not a bug. Don't use untimeout(9) as it
is not guaranteed to be
> > reliable. Instead, use callout_*(). Your patch
doesn't solve any races
as
> > the driver detach routine needs to use
callout_drain() and not just
> > callout_stop/untimeout anyways. Fix your broken
drivers.
>
> I understand that some old Giant locked code can issue
timeout/untimeout
> without Giant held, which would certainly cause this
issue to happen
> and is uncorrectable, however, this is with completely
Giant locked
> code.
>
> We are not trying to use timeout(9) for mpsafe code,
this is old
> code and relies upon Giant.
>
> Giant locked code should be timeout/untimeout safe. As
explained
> in my email, there exists a condition where the Giant
locked code
> can have a timer fire even though proper Giant locking
is observed.
>
> For a Giant locked subsystem, one should be able to
have the following
> code work:
>
> mtx_lock(&Giant); /* formerly spl higher than
softclock */
> untimeout(&func, arg, &sc->handle);
> free(sc);
> mtx_unlock(&Giant); /* formerly splx() */
>
> Normally splsoftclock would completely block the
timeout from firing
> and this sort of code would be safe. It is no longer
safe due to
> a BUG in the way that Giant is used.
>
> Please reread the original mail to better understand
the synopsis
> of the problem.
Hmm. My worry is about leaving the callout structure around
while invoking
the timeout routine itself, but it is already off the
callwheel so it
shouldn't be visible via untimeout() to any other code. I
guess the patch is
ok, but I'll be happy when we can axe timeout/untimeout
altogether.
--
John Baldwin
_______________________________________________
freebsd-smp freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-smp
To unsubscribe, send any mail to
"freebsd-smp-unsubscribe freebsd.org"
|
|
[1-4]
|
|
|
about | contact Other archives ( Real Estate discussion Medical topics )
|