Thread: Commented: (LUCENE-140) docs out of order




2007-01-08 13:10:27
    [ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463029 ]

Michael McCandless commented on LUCENE-140:
-------------------------------------------

OK: I finally found one way that this corruption can occur! I will
create a unit test & commit a fix.

If you delete by document number, and that document number is larger
than maxDoc, but only by a little, then the call to
deletedDocs.set(num) may in fact succeed (i.e., no exception), but will
have set bits that are "out of bounds" in the BitVector's bits array.

This is because the bits array is an array of bytes, so you can
have up to 7 of these unused bits at the end.  Once this has happened,
any attempt to merge this segment will hit the "docs out of order"
exception because the BitVector's count() method will count these
"illegally set" bits and thus make the SegmentMerger think too many
docs are deleted.
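
To make the mechanism above concrete, here is a minimal sketch of a
byte-backed bit vector. It is a simplified stand-in written for this
illustration, not Lucene's actual BitVector source: the class names
TrailingBitsDemo and ByteBitVector, and the rounding used to size the
array, are assumptions. It only shows how a set() without a range check
can succeed on a padding bit and how a count() over the raw bytes then
over-counts.

```java
// Simplified stand-in for a byte-backed bit vector (hypothetical, not Lucene's code).
final class TrailingBitsDemo {

    static final class ByteBitVector {
        private final byte[] bits;
        private final int size; // number of valid bits (plays the role of maxDoc)

        ByteBitVector(int size) {
            this.size = size;
            this.bits = new byte[(size + 7) / 8]; // round up to whole bytes
        }

        // No check against size: only the array bounds can reject a bit.
        void set(int bit) {
            bits[bit >> 3] |= (byte) (1 << (bit & 7));
        }

        // Counts every set bit in the backing array, padding bits included.
        int count() {
            int total = 0;
            for (byte b : bits) {
                total += Integer.bitCount(b & 0xFF);
            }
            return total;
        }

        int size() {
            return size;
        }
    }

    public static void main(String[] args) {
        ByteBitVector deleted = new ByteBitVector(10); // valid bits 0..9, stored in 2 bytes
        deleted.set(3);  // legitimate delete
        deleted.set(12); // past size, but still inside the second byte: no exception
        System.out.println("size=" + deleted.size() + " count=" + deleted.count());

        try {
            deleted.set(16); // first bit that falls outside the byte array
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("bit 16 rejected by array bounds");
        }
    }
}
```

In this sketch, bit 12 is counted as a deletion even though only bits
0..9 are valid, which is the kind of inflated count described above.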

Unfortunately, this case only occurs if you use deleteDocument(int),
so I can't yet explain how this happens when using only
deleteDocument(Term).
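
For the delete-by-number path, one defensive measure would be an
explicit range check before the bit is set. The sketch below is only an
illustration of that idea, not the fix referred to in this comment:
CheckedDeleteDemo and markDeleted are hypothetical names,
java.util.BitSet stands in for the deleted-docs vector, and it assumes
valid document numbers run from 0 to maxDoc - 1.

```java
import java.util.BitSet;

// Hypothetical sketch of a range-checked delete; not Lucene's implementation.
final class CheckedDeleteDemo {

    // Rejects any docNum outside 0..maxDoc-1 before marking it deleted.
    static void markDeleted(BitSet deletedDocs, int docNum, int maxDoc) {
        if (docNum < 0 || docNum >= maxDoc) {
            throw new IllegalArgumentException(
                "docNum " + docNum + " is out of range [0, " + maxDoc + ")");
        }
        deletedDocs.set(docNum);
    }

    public static void main(String[] args) {
        int maxDoc = 10;
        BitSet deletedDocs = new BitSet(maxDoc);

        markDeleted(deletedDocs, 3, maxDoc); // accepted
        try {
            markDeleted(deletedDocs, 12, maxDoc); // slightly past maxDoc: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("deleted count=" + deletedDocs.cardinality());
    }
}
```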


> docs out of order
> -----------------
>
>                 Key: LUCENE-140
>                 URL: https://issues.apache.org/jira/browse/LUCENE-140
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: unspecified
>         Environment: Operating System: Linux
> Platform: PC
>            Reporter: legez
>         Assigned To: Lucene Developers
>         Attachments: bug23650.txt, corrupted.part1.rar, corrupted.part2.rar
>
>
> Hello,
>   I cannot find out why (or what) this is happening; it happens all the time.
> I get an exception:
> java.lang.IllegalStateException: docs out of order
>         at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:219)
>         at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:191)
>         at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:172)
>         at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:135)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
>         at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:341)
>         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:250)
>         at Optimize.main(Optimize.java:29)
> It happens in both 1.2 and 1.3rc1 (by the way, what happened to it? I cannot
> find it in the downloads or in the version list in this form). Everything
> seems OK. I can search through the index, but I cannot optimize it. Even
> worse, after this exception, every time I add new documents and close the
> IndexWriter a new segment is created! Judging by its size, I think it contains
> all the documents added before.
> My index is quite big: 500,000 docs, about 5 GB of index directory.
> It is _repeatable_. I drop the index and reindex everything. Afterwards I add
> a few docs, try to optimize, and receive the above exception.
> My documents' structure is:
>   static Document indexIt(String id_strony, Reader reader, String data_wydania,
>       String id_wydania, String id_gazety, String data_wstawienia)
> {
>     Document doc = new Document();
>     doc.add(Field.Keyword("id", id_strony));
>     doc.add(Field.Keyword("data_wydania", data_wydania));
>     doc.add(Field.Keyword("id_wydania", id_wydania));
>     doc.add(Field.Text("id_gazety", id_gazety));
>     doc.add(Field.Keyword("data_wstawienia", data_wstawienia));
>     doc.add(Field.Text("tresc", reader));
>     return doc;
> }
> Sincerely,
> legez

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the
administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

