Thread: ASSERTION(granted_lock != NULL)




ASSERTION(granted_lock != NULL)
user name
2007-01-04 13:53:47
Please don't reply to lustre-devel. Instead, comment in Bugzilla by
using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11277

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |shadowclusterfs.com


Here is what happens:
1. The client performs an operation that takes a long time on the
   server (waiting to take a lock).
2. The client times out and resends the request.
3. The resent request arrives at the server. We look through the list
   of locks and find the lock originally requested in (1). We take this
   lock and its cookie, decide that this is the lock we sent to the
   client, and start preparing for the resend (it passes the
   lustre_handle_is_used() check in mds_getattr_lock).
4. Meanwhile, the thread serving the request from (1) finally gets the
   lock it was waiting for and decides to return it to the client. It
   rewrites the remote handle and returns ELDLM_LOCK_REPLACED from
   mds_intent_policy(). This makes ldlm_lock_enqueue() drop the
   existing lock from (1) (and its cookie).
5. The thread serving the resent request tries to get the lock from the
   cookie (now invalid), fails, and trips the assertion.
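
To make the ordering concrete, here is a small single-threaded replay of
the five steps above. It is only a toy model: every toy_* name, the slot
table, and the cookie scheme are made up for illustration and are not
the real ldlm structures or functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real Lustre structures. */
enum toy_mode { TOY_MODE_NONE = 0, TOY_MODE_PR = 1 };

struct toy_lock {
    uint64_t cookie;        /* server-side handle cookie; 0 = free slot */
    uint64_t remote_cookie; /* handle the client knows this lock by */
    enum toy_mode req_mode;
    enum toy_mode granted_mode;
};

#define TOY_TABLE_SIZE 4

struct toy_table {
    struct toy_lock slots[TOY_TABLE_SIZE];
    uint64_t next_cookie;
};

/* Step 1: the original request enqueues a lock that is not yet granted. */
static struct toy_lock *toy_enqueue(struct toy_table *t, uint64_t remote,
                                    enum toy_mode mode)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        struct toy_lock *l = &t->slots[i];
        if (l->cookie == 0) {
            l->cookie = ++t->next_cookie;
            l->remote_cookie = remote;
            l->req_mode = mode;
            l->granted_mode = TOY_MODE_NONE; /* still waiting */
            return l;
        }
    }
    return NULL;
}

/* Step 3: the resend path finds a lock by remote handle and keeps only
 * its cookie. Note that it does not check whether the lock is granted. */
static uint64_t toy_find_resent(const struct toy_table *t, uint64_t remote)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        const struct toy_lock *l = &t->slots[i];
        if (l->cookie != 0 && l->remote_cookie == remote)
            return l->cookie;
    }
    return 0;
}

/* Step 4: the original thread finishes and the old lock is dropped,
 * which invalidates its cookie. */
static void toy_drop(struct toy_table *t, uint64_t cookie)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        if (cookie != 0 && t->slots[i].cookie == cookie)
            t->slots[i] = (struct toy_lock){0};
    }
}

/* Step 5: turn a cookie back into a lock; NULL if the cookie is stale. */
static struct toy_lock *toy_handle2lock(struct toy_table *t, uint64_t cookie)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        if (cookie != 0 && t->slots[i].cookie == cookie)
            return &t->slots[i];
    }
    return NULL;
}

/* Replays steps 1, 3, 4, 5 in order (step 2 is just the client timeout).
 * Returns 1 if the final lookup fails, i.e. the point where the
 * assertion on granted_lock would trip. */
static int toy_replay_race(void)
{
    struct toy_table t = {0};
    uint64_t cookie;

    toy_enqueue(&t, 0xC11E, TOY_MODE_PR);       /* step 1 */
    cookie = toy_find_resent(&t, 0xC11E);       /* step 3 */
    toy_drop(&t, cookie);                       /* step 4 */
    return toy_handle2lock(&t, cookie) == NULL; /* step 5 */
}
```

In this model toy_replay_race() reports that the step 5 lookup comes
back empty, because the cookie saved in step 3 was invalidated in step 4.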

An easy fix, I think, is to add an extra check in
fixup_handle_for_resent_req() that the lock we are about to choose is
granted (req_mode == granted_mode). The drawback is that we will redo
the entire lock granting for the second request, but this should be
enough for a quick fix.
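
A minimal sketch of the kind of check suggested above, written against a
made-up lock array rather than the real code. The struct, the
toy_pick_resent_lock() helper and its arguments are assumptions for
illustration only; the actual change would go into
fixup_handle_for_resent_req() and use the real lock fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real Lustre structures. */
enum toy_mode { TOY_MODE_NONE = 0, TOY_MODE_PR = 1 };

struct toy_lock {
    uint64_t cookie;        /* server-side handle cookie; 0 = free slot */
    uint64_t remote_cookie; /* handle the client knows this lock by */
    enum toy_mode req_mode;
    enum toy_mode granted_mode;
};

/* Pick the lock matching a resent request, but only if it has already
 * been granted. Returning NULL means "treat this as a new request",
 * so the caller goes through the full lock granting again. */
static const struct toy_lock *
toy_pick_resent_lock(const struct toy_lock *locks, size_t n, uint64_t remote)
{
    for (size_t i = 0; i < n; i++) {
        const struct toy_lock *l = &locks[i];

        if (l->cookie == 0 || l->remote_cookie != remote)
            continue;
        /* The extra check: skip locks that are still waiting, since
         * the original thread may replace them at any moment. */
        if (l->req_mode != l->granted_mode)
            continue;
        return l;
    }
    return NULL;
}
```

With this guard, a lock that is still waiting to be granted is never
selected for the resent request, which is the cost mentioned above: the
second request has to take the lock from scratch.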

Shadow: please start with a recovery-small.sh test to replicate the
issue (you will need to strategically place OBD_FAIL/OBD_RACE in
mds/handler.c), then try the solution I described above.

_______________________________________________
Lustre-devel mailing list
Lustre-develclusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-devel

