From a message on the Squeak mailing list, by Andreas
Raab:
> Some of you (mostly those who run heavy servers) may
have noticed that
> at times Squeak locks up in mysterious and unforeseen
ways. One of those
> lockups involves Delay's AccessProtect in an unsignaled
state and
> consequently the entire image locking up since Delay
access is required
> in many, many places.
>
> Today, David presented me an image that was locked up
in such a state
> but by sheer luck he managed to save it right before it
happened which
> allowed me to investigate the situation. The result can
best be
> explained by the little test case shown here:
>
> "Create mutex unsignaled so we can manually
signal it"
> mutex := Semaphore new.
> "Create a process which will wait inside the
mutex"
> p := [mutex critical:[]] forkAt: Processor
userBackgroundPriority.
> "Wait until process has entered mutex"
> [p suspendingList == mutex]
> whileFalse:[(Delay forMilliseconds: 10) wait].
> "Signal mutex"
> mutex signal.
> "Kill process"
> p terminate.
> "and check to see if the mutex is
signaled"
> mutex isSignaled ifFalse:[self error: 'Mutex not
signaled'].
>
> Note that despite the somewhat complex setup the basic
idea is that a
> low priority process waiting in a critical section
receives a signal on
> the semaphore it is waiting on but gets terminated by a
higher priority
> process inbetween receiving the signal and execution of
the process itself.
>
> This situation (manually executed in the above to make
it more easily
> repeatable) can happen in many situations where
processes get terminated
> "from the outside" and it would cause
particular grief in the timing
> semaphore because it gets served by the highest
priority process which
> makes the unfortunate cause of events much more
likely.
>
> All Squeak versions that I have access to expose this
behavior. Looking at
> Semaphore>>critical: which says [this is the code
in GNU Smalltalk -pb]
>
> Semaphore>>critical: aBlock
> self wait.
> ^aBlock ensure: [self signal].
>
> makes it seem as if moving the wait into the ensured
block is the
> correct answer, but that ain't necessarily so. When we
move the wait
> into the block we risk that the entering process is
terminated after
> entering the block but before entering the wait which
would leave the
> semaphore signaled twice, which is just as bad as not
signaled at all.
This patch solves, or rather tries to solve :-( this problem
for GNU
Smalltalk by wrapping the "self wait" in an
exception handler. There
may be a small problematic window between the end of the
#wait and the
beginning of ensure
There are other problems with Delay, basically involving the
Queue left
in an half-updated state when the process manipulating it is
terminated.
The only way to fix this is to move all the processing of
the Queue
inside the high-priority process that handles timer events.
I'll apply
a patch to this end after converting the source code to the
new syntax
(I have the code, but it uses the new syntax and I don't
want to have
half-converted directories).
Paolo
_______________________________________________
help-smalltalk mailing list
help-smalltalk gnu.org
http://lists.gnu.org/mailman/listinfo/help-smalltalk
|