List Info

Thread: Semaphore>>#critical: race condition




Semaphore>>#critical: race condition
country flaguser name
Switzerland
2007-10-01 04:41:01
 From a message on the Squeak mailing list, by Andreas
Raab:

> Some of you (mostly those who run heavy servers) may
have noticed that 
> at times Squeak locks up in mysterious and unforeseen
ways. One of those 
> lockups involves Delay's AccessProtect in an unsignaled
state and 
> consequently the entire image locking up since Delay
access is required 
> in many, many places.
> 
> Today, David presented me an image that was locked up
in such a state 
> but by sheer luck he managed to save it right before it
happened which 
> allowed me to investigate the situation. The result can
best be 
> explained by the little test case shown here:
> 
>    "Create mutex unsignaled so we can manually
signal it"
>    mutex := Semaphore new.
>    "Create a process which will wait inside the
mutex"
>    p := [mutex critical:[]] forkAt: Processor
userBackgroundPriority.
>    "Wait until process has entered mutex"
>    [p suspendingList == mutex]
>        whileFalse:[(Delay forMilliseconds: 10) wait].
>    "Signal mutex"
>    mutex signal.
>    "Kill process"
>    p terminate.
>    "and check to see if the mutex is
signaled"
>    mutex isSignaled ifFalse:[self error: 'Mutex not
signaled'].
> 
> Note that despite the somewhat complex setup the basic
idea is that a 
> low priority process waiting in a critical section
receives a signal on 
> the semaphore it is waiting on but gets terminated by a
higher priority 
> process inbetween receiving the signal and execution of
the process itself.
> 
> This situation (manually executed in the above to make
it more easily 
> repeatable) can happen in many situations where
processes get terminated 
> "from the outside" and it would cause
particular grief in the timing 
> semaphore because it gets served by the highest
priority process which 
> makes the unfortunate cause of events much more
likely.
> 
> All Squeak versions that I have access to expose this
behavior. Looking at
> Semaphore>>critical: which says [this is the code
in GNU Smalltalk -pb]
> 
> Semaphore>>critical: aBlock
>    self wait.
>    ^aBlock ensure: [self signal].
> 
> makes it seem as if moving the wait into the ensured
block is the 
> correct answer, but that ain't necessarily so. When we
move the wait 
> into the block we risk that the entering process is
terminated after 
> entering the block but before entering the wait which
would leave the 
> semaphore signaled twice, which is just as bad as not
signaled at all.

This patch solves, or rather tries to solve :-( this problem
for GNU 
Smalltalk by wrapping the "self wait" in an
exception handler.  There 
may be a small problematic window between the end of the
#wait and the 
beginning of ensure

There are other problems with Delay, basically involving the
Queue left 
in an half-updated state when the process manipulating it is
terminated. 
  The only way to fix this is to move all the processing of
the Queue 
inside the high-priority process that handles timer events. 
I'll apply 
a patch to this end after converting the source code to the
new syntax 
(I have the code, but it uses the new syntax and I don't
want to have 
half-converted directories).

Paolo

_______________________________________________
help-smalltalk mailing list
help-smalltalkgnu.org

http://lists.gnu.org/mailman/listinfo/help-smalltalk

  
[1]

about | contact  Other archives ( Real Estate discussion Medical topics )