List Info

Thread: fix for scriptlet unlink-before-exec




fix for scriptlet unlink-before-exec
country flaguser name
United States
2007-02-24 23:34:45
As seen on Solaris with multiple CPUs, scriptlets get
unlinked before
they can be exec'd.  This is because of illegal use of a
mutex <whistle>,
10 yard penalty!

You cannot attempt to recursively lock a non-recursive
mutex; the behavior
is undefined or will return EDEADLK depending on the
implementation and
the mutex type.  Code in rpmsq.c:rpmsqWaitUnregister() was
trying to
lock sq->mutex while it was already locked ... a good
idea but the
pthreads spec doesn't support that.  I guess because mutexes
are designed
to protect data, not work as signalling devices.  The code
should be
changed to use a semaphore, or (for more semantic
correctness) a
pthread condvar.

What was happening is that pthread_mutex_lock() in
rpmsqWaitUnregister()
would return right away (with EDEADLK) instead of actually
waiting for
the child to be reaped, this would return prematurely from
psmWait() in
lib/psm.c:runScripts() and then the parent would unlink the
script
file before the child got down to the execv().  This bug
doesn't occur
with rpm -vv because that leaves the script file behind.

My patch simply disables the use of the mutex altogether.  I
don't see
much advantage to it anyway ...

Also a couple other small patches for Solaris-isms.

-frank
_______________________________________________
Rpm-devel mailing list
Rpm-devellists.dulug.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

  
  
  
Re: fix for scriptlet unlink-before-exec
country flaguser name
Belgium
2007-02-25 13:47:32
On Feb 25, 2007, at 2:19 PM, Frank Cusack wrote:

> On February 24, 2007 9:34:45 PM -0800 Frank Cusack  
> <fcusackfcusack.com> wrote:
>> You cannot attempt to recursively lock a
non-recursive mutex; the  
>> behavior
>> is undefined or will return EDEADLK depending on
the  
>> implementation and
>> the mutex type.
>
> That might be wrong; it's just a quirk of the Solaris 

> implementation that
> a normal mutex doesn't deadlock (and instead returns
EDEADLK).
>

Hmmm, is there an attempt to lock recursively within rpm?

I can't remember any atm. If there is, avoiding the
recursive locking
is one path to a clean fix, handling EDEADLK as a quirk of
sun is
the other reasonable fix.

73 de Jeff

_______________________________________________
Rpm-devel mailing list
Rpm-devellists.dulug.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

Re: fix for scriptlet unlink-before-exec
country flaguser name
United States
2007-02-25 14:31:48
On February 25, 2007 2:47:32 PM -0500 Jeff Johnson
<n3npq.jbjgmail.com> 
wrote:
>
> On Feb 25, 2007, at 2:19 PM, Frank Cusack wrote:
>
>> On February 24, 2007 9:34:45 PM -0800 Frank Cusack
>> <fcusackfcusack.com> wrote:
>>> You cannot attempt to recursively lock a
non-recursive mutex; the
>>> behavior
>>> is undefined or will return EDEADLK depending
on the
>>> implementation and
>>> the mutex type.
>>
>> That might be wrong; it's just a quirk of the
Solaris
>> implementation that
>> a normal mutex doesn't deadlock (and instead
returns EDEADLK).
>>
>
> Hmmm, is there an attempt to lock recursively within
rpm?

Yes, if you read the mail that started this thread I
described where
in the code the recursive locking is attempted.

> I can't remember any atm. If there is, avoiding the
recursive locking
> is one path to a clean fix, handling EDEADLK as a quirk
of sun is
> the other reasonable fix.

Yes, the best fix is to replace the mutex with a condvar.

I actually did try handling EDEADLK which works well except
that it's
stupid because the parent (waiting for scriptlet to finish)
is in a
busy wait loop.

Actually I think the best fix is just not to use the mutex
at all.
That was the patch I sent.

-frank
_______________________________________________
Rpm-devel mailing list
Rpm-devellists.dulug.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

Re: fix for scriptlet unlink-before-exec
user name
2007-02-26 07:00:06
On 2/25/07, Frank Cusack <fcusackfcusack.com> wrote:
> On February 25, 2007 2:47:32 PM -0500 Jeff Johnson
<n3npq.jbjgmail.com>
> wrote:
> >
> > On Feb 25, 2007, at 2:19 PM, Frank Cusack wrote:
> >
> >> On February 24, 2007 9:34:45 PM -0800 Frank
Cusack
> >> <fcusackfcusack.com> wrote:
> >>> You cannot attempt to recursively lock a
non-recursive mutex; the
> >>> behavior
> >>> is undefined or will return EDEADLK
depending on the
> >>> implementation and
> >>> the mutex type.
> >>
> >> That might be wrong; it's just a quirk of the
Solaris
> >> implementation that
> >> a normal mutex doesn't deadlock (and instead
returns EDEADLK).
> >>
> >
> > Hmmm, is there an attempt to lock recursively
within rpm?
>
> Yes, if you read the mail that started this thread I
described where
> in the code the recursive locking is attempted.
>
> > I can't remember any atm. If there is, avoiding
the recursive locking
> > is one path to a clean fix, handling EDEADLK as a
quirk of sun is
> > the other reasonable fix.
>
> Yes, the best fix is to replace the mutex with a
condvar.
No Frank, I disagree.  I'm the one who changed it to a
mutex, and I'll
tell you why.  What was happening when it is used a condvar
is that it
was running into a dead lock from time to time.  Now its
been so long
(there is a bugzilla report for this), but the ultimate
issue is a
condvar is meant for a threaded environment.  The reaper in
rpm is
not, I repeat, not running in a threaded context, and there
were cases
where it would dead lock because it was waiting for the
condvar to be
tripped, but since it was waiting and there was only one
thread of
execution it would have achieved dead lock.

Now the way I used the mutex maybe in error (my first use of
one), and
perhaps that could be thought through, but please, please do
not put a
condvar back in a context that has only a single thread of
execution.
It took me a week of staring that code to figure out how to
fix the
dead lock, and I really don't want to do that again.

>
> I actually did try handling EDEADLK which works well
except that it's
> stupid because the parent (waiting for scriptlet to
finish) is in a
> busy wait loop.
>
> Actually I think the best fix is just not to use the
mutex at all.
> That was the patch I sent.
This is were I only have here say.  Jeff had explained to me
that
there were customers using librpm in a threaded environment,
and that
the rpmsq reaper solution somehow helped them.  I'm not sure
why
because you could always put your rpm stuff in its own
thread, such
that when waitpid was called it did not block the entire
app, but that
was what I was told.

Meanwhile, everytime Jeff and I talked about it in the
context of the
deadlocks I was seeing, he always suggeste just going back
to
waitpid().  In my case I could have done that but if there
were a set
of customers that did not work for, and the current solution
for them
was buggy, why not just fix the bug, so I tried to do that.

Cheers...james
Cheers...
>
> -frank
> _______________________________________________
> Rpm-devel mailing list
> Rpm-devellists.dulug.duke.edu
> https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel
>
_______________________________________________
Rpm-devel mailing list
Rpm-devellists.dulug.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

Re: fix for scriptlet unlink-before-exec
country flaguser name
United States
2007-02-26 16:23:11
In regard to: [Rpm-devel] fix for scriptlet
unlink-before-exec, Frank...:

> As seen on Solaris with multiple CPUs, scriptlets get
unlinked before
> they can be exec'd.  This is because of illegal use of
a mutex <whistle>,
> 10 yard penalty!

I'm glad you've had time to look at this issue.  I haven't
had time to
even think about building 4.4.8.

I also haven't been able to reproduce the problem with
4.4.7.  I saw it
all the time with 4.4.6, but haven't seen it once with
4.4.7.

Tim
-- 
Tim Mooney                                          
Tim.Mooneyndsu.edu
Information Technology Services                      (701)
231-1076 (Voice)
Room 242-J6, IACC Building                           (701)
231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164
_______________________________________________
Rpm-devel mailing list
Rpm-devellists.dulug.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

Re: fix for scriptlet unlink-before-exec
country flaguser name
United States
2007-02-26 16:37:27
On February 26, 2007 4:23:11 PM -0600 Tim Mooney
<Tim.Mooneyndsu.edu> 
wrote:
> In regard to: [Rpm-devel] fix for scriptlet
unlink-before-exec, Frank...:
>
>> As seen on Solaris with multiple CPUs, scriptlets
get unlinked before
>> they can be exec'd.  This is because of illegal use
of a mutex <whistle>,
>> 10 yard penalty!
>
> I'm glad you've had time to look at this issue.  I
haven't had time to
> even think about building 4.4.8.
>
> I also haven't been able to reproduce the problem with
4.4.7.  I saw it
> all the time with 4.4.6, but haven't seen it once with
4.4.7.

I went straight from 4.2.1 to 4.4.8.  I did stop at 4.4.7
after all of
your activity but I only built it, never really tested it.

I see it 100% of the time in 4.4.8 on a 4 cpu machine.

The compelling reason to upgrade for me was your patches
that made it into
the distro (always good to have less patching) and erasure
ordering.  Since
using 4.4.8 I've found some other things that are really
nice!  Like linktos
and parent directory requirements.  Other stuff also, of
course, just those
are the most user visible items.

I had to patch back in the (64bit) marker stuff for
upgradeability
and correct manual (sysinfo) provides support.

My next upgrade will probably be 5.0. 

-frank
_______________________________________________
Rpm-devel mailing list
Rpm-devellists.dulug.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

Re: fix for scriptlet unlink-before-exec
user name
2007-03-01 01:38:50
Hello,

Frank Cusack wrote:
> As seen on Solaris with multiple CPUs, scriptlets get
unlinked before
> they can be exec'd.  This is because of illegal use of
a mutex <whistle>,
> 10 yard penalty!
>
> [...]
>
> My patch simply disables the use of the mutex
altogether.  I don't see
> much advantage to it anyway ...

just for the record: This also solves the problem of *never*
finishing
*any* scriptlet on AIX [1] (except "--noscripts"
and being truss'ed,
that is  ). So
i'd too be interested in a proper fix, or at least
being able to disable at configure time.

Thanks & Regards,

	Frank


[1] https://lists.dulug.duke.edu/pipermail/rpm
-devel/2006-July/001274.html
_______________________________________________
Rpm-devel mailing list
Rpm-devellists.dulug.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

Re: fix for scriptlet unlink-before-exec
user name
2007-03-09 03:42:14
On 3/1/07, Frank Fegert <fra.nospam.nkgmx.de> wrote:
> Hello,
>
> Frank Cusack wrote:
> > As seen on Solaris with multiple CPUs, scriptlets
get unlinked before
> > they can be exec'd.  This is because of illegal
use of a mutex <whistle>,
> > 10 yard penalty!
> >
> > [...]
> >
> > My patch simply disables the use of the mutex
altogether.  I don't see
> > much advantage to it anyway ...
>
> just for the record: This also solves the problem of
*never* finishing
> *any* scriptlet on AIX [1] (except
"--noscripts" and being truss'ed,
> that is  ). So
i'd too be interested in a proper fix, or at least
> being able to disable at configure time.
>

This is the first rport of a problem.

Details will get you a fix. What is needed is strace/truss
(or
whatever AIX) uses so that SIGCHLD delivery can be pin
pointed.

And -- no matter what -- there is a pathway through the
existing
rpmio/rpmsq.c that simply uses waitpid(3), which presumably
works as well on AIX as on any other uglix platform.

See rpm-devel archives for the waitpid patch, I've posted at
least twice.

73 de Jeff
_______________________________________________
Rpm-devel mailing list
Rpm-devellists.dulug.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

Re: fix for scriptlet unlink-before-exec
country flaguser name
United States
2007-03-21 19:59:03
On March 21, 2007 7:44:20 PM +0100 Frank Fegert
<fra.nospam.nkgmx.de> 
wrote:
> and is still sitting in $VAR. Building rpm with Frank
Cusaks
> patch (rpm-4.4.8-scriptlet-mutex.patch) seems to
resolve the
> issue. But i'm not able to tell what else might break
by only
> disabling threading with 'int nothreads = 1;'.

Nothing.  That is not actually disabling threading.

-frank
_______________________________________________
Rpm-devel mailing list
Rpm-devellists.dulug.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

Re: fix for scriptlet unlink-before-exec
user name
2007-03-22 01:18:36
On 3/21/07, Frank Fegert <fra.nospam.nkgmx.de> wrote:
> >
> > This is the first rport of a problem.
>
> no offense, but that's not entirely true, see:
>   https://lists.dulug.duke.edu/pipermail/rpm
-devel/2006-July/001274.html
> and following. Although the problem is now reproducable
even
> when rpm is truss'ed.
>

My confusion comes when a known problem (on Solaris, the
pipe used
to insure that the parent runs before child, fixed in
September) is claimed
to also be the cause of an identical problem on AIX.

FWIW, I did read the 7/23/truss and verified that SIGCHLD
was delivered
abt where I expected. Everything else about AIX is a mystery
to me.


> > Details will get you a fix. What is needed is
strace/truss (or
> > whatever AIX) uses so that SIGCHLD delivery can be
pin pointed.
>
> Please see the attached truss output. The scriptlet
contains only:
>
> # cat /opt/afw/var/tmp/rpm-tmp.13059
> /opt/afw/bin/install-info /opt/afw/share/info/grep.info
--dir=/opt/afw/share/info/dir;
>
> and is still sitting in $VAR. Building rpm with Frank
Cusaks
> patch (rpm-4.4.8-scriptlet-mutex.patch) seems to
resolve the
> issue. But i'm not able to tell what else might break
by only
> disabling threading with 'int nothreads = 1;'.
>

The rpm-4.4.8-scriptlet-mutex.patch patch as written cannot
be added to rpm.

Instead of
    #if 0
    ...
    #endif
ripping out code that is necessary (and working) on linux, a
run-time test
on whether a signal handler or waitpid is being used needs
to be attempted.

The variable that determines sighandler vs. waitpid is
    sq->reaper

The nothreads variable and run-time logic needs to be ripped
out as well.

Finally, DO_LOCK et al wrappers rather than pthread_mutex_*
calls need
to be used consistently, the mixture of 2 paradigms is
harder to debug
than either paradigm is.

> > And -- no matter what -- there is a pathway
through the existing
> > rpmio/rpmsq.c that simply uses waitpid(3), which
presumably
> > works as well on AIX as on any other uglix
platform.
> >
> > See rpm-devel archives for the waitpid patch, I've
posted at least twice.
>
> I tried this (see thread mentioned above), but problem
remained
> the same.
>

I can do all of the above, but I need someone on Solaris and
AIX to actually
test the results.

Otherwise, it takes months and months for the next problem
report to appear.

Any solaris/AIX volunteers?

73 de Jeff
_______________________________________________
Rpm-devel mailing list
Rpm-devellists.dulug.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

[1-10] [11-20] [21-22]

about | contact  Other archives ( Real Estate discussion Medical topics )