Please don't reply to lustre-devel. Instead, comment in
Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=12326
Sanity test 78 now fails silently for me and fills up an OST, causing
later failures. The likely root cause is that I recently increased the
amount of memory available in my test environment, so the test now tries
to write a much larger file than before.
== test 78: handle large O_DIRECT writes correctly ============= 13:37:25 (1177349845)
directio on /mnt/lustre/f78 for 125x1048576 bytes
Write error Success (rc = 89128960, len = 131072000)
lustre.fail_loc = 0
sanity.sh: FAIL: test_78 exit with rc=1
Debug log: 24266 lines, 24266 kept, 0 dropped.
lustre.fail_loc = 0
PASS (5s)
Here we can see the undetected failure: the write actually failed, but
the test still shows as "PASS". Later, other tests fail, and:
# lfs df /mnt/lustre
UUID                 1K-blocks      Used Available  Use% Mounted on
lustre-MDT0000_UUID      34984      8344     26640   23% /mnt/lustre[MDT:0]
lustre-OST0000_UUID      46856     45936       920   98% /mnt/lustre[OST:0]
lustre-OST0001_UUID      46856     46856         0  100% /mnt/lustre[OST:1]
lustre-OST0002_UUID      46856     44480      2376   94% /mnt/lustre[OST:2]
filesystem summary:     140568    137272      3296   97% /mnt/lustre
I have a full OST, which is why the write in test 78 failed.
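
To make the check for a full OST mechanical, here is a minimal Python sketch that picks out the OST with the least free space from `lfs df` output. The function name `min_ost_available_kb` is hypothetical and not part of sanity.sh, and the parser assumes the six-column layout of the output quoted in this report (UUID, 1K-blocks, Used, Available, Use%, Mounted on).

```python
def min_ost_available_kb(lfs_df_output):
    """Return (uuid, available_kb) for the OST with the least free space.

    Assumes each OST row has the form:
    <uuid> <1K-blocks> <used> <available> <use%> <mount>[OST:n]
    """
    worst = None
    for line in lfs_df_output.splitlines():
        fields = line.split()
        # Only OST rows matter here; skip the header, MDT and summary lines.
        if len(fields) < 6 or "[OST:" not in fields[5]:
            continue
        uuid, available_kb = fields[0], int(fields[3])
        if worst is None or available_kb < worst[1]:
            worst = (uuid, available_kb)
    return worst


# The output quoted in this report, one row per line.
SAMPLE = """\
UUID 1K-blocks Used Available Use% Mounted on
lustre-MDT0000_UUID 34984 8344 26640 23% /mnt/lustre[MDT:0]
lustre-OST0000_UUID 46856 45936 920 98% /mnt/lustre[OST:0]
lustre-OST0001_UUID 46856 46856 0 100% /mnt/lustre[OST:1]
lustre-OST0002_UUID 46856 44480 2376 94% /mnt/lustre[OST:2]
filesystem summary: 140568 137272 3296 97% /mnt/lustre
"""

print(min_ost_available_kb(SAMPLE))
```

On the sample above this reports lustre-OST0001_UUID with 0 KB available, which matches the 100% row.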
So to fix, I suggest:
1. Detecting write failures and failing the test.
2. Writing a file small enough not to fill up any OST.
3. Unlinking the file at the end of the test.
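
The three suggestions could be sketched roughly as follows. This is an illustration in Python, not a patch to sanity.sh or to the directio helper. All function names are hypothetical, the sketch uses plain buffered writes rather than O_DIRECT (which needs aligned buffers), and the "half of the smallest OST's free space" limit is an arbitrary safety margin chosen for the example. For reference, the log shows rc = 89128960 out of len = 131072000, i.e. 85 of the 125 requested 1048576-byte blocks; the "Write error Success" message is presumably what a short write looks like when errno is still 0.

```python
import os
import tempfile


def check_write(rc, length):
    """Suggestion 1: treat a short write as a failure, even if errno is 0."""
    if rc != length:
        raise IOError("short write: rc = %d, len = %d" % (rc, length))


def safe_size(requested, smallest_ost_free):
    """Suggestion 2: cap the size at half the smallest OST's free bytes.

    The factor of one half is an assumption made for this sketch.
    """
    return min(requested, smallest_ost_free // 2)


def run_write_test(path, requested, smallest_ost_free):
    """Write a capped-size file, verify the write, and always clean up."""
    size = safe_size(requested, smallest_ost_free)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        rc = os.write(fd, b"\0" * size)
        check_write(rc, size)
    finally:
        os.close(fd)
        os.unlink(path)  # Suggestion 3: remove the file whatever happened.
    return size


# Small local demo in a temporary directory (not on a Lustre mount).
demo_path = os.path.join(tempfile.mkdtemp(), "f78")
print(run_write_test(demo_path, 8192, 1 << 20))
```

The point of `check_write` is that the values from the log (rc = 89128960, len = 131072000) raise an error instead of letting the test report PASS.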
I may find time to work on this issue this week, but I'm not promising
anything, so I'll leave it unassigned.
_______________________________________________
Lustre-devel mailing list
Lustre-devel clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-devel