Thread: Broken optimization?




Broken optimization?
2006-02-01 18:39:35
On Wed, 2006-02-01 at 10:25 -0800, Andi Vajda wrote:
> What version of PyLucene are you running ?
PyLucene.py states 1.0

> On what operating system ?
Ubuntu 5.04

> Did you build it yourself ? if so, with what version of gcj ?
I don't actually know but I'll try to find out. In any case, gcj is
version 4.0.2

> 
> Andi..
> 
> On Wed, 1 Feb 2006, Jared Kuolt wrote:
> 
> > Hello all,
> >
> > I've recently inherited a PyLucene project with very little knowledge of
> > PyLucene and Lucene itself.
> >
> > To cut to the chase, I think I've screwed something up. Basically there
> > is a script we have that runs, pulling records from a text file, and
> > then puts them into a "queue" to update the indexes later.
> >
> > In any case, the basic gist of the script is this:
> >
> > #### START ####
> >
> > def main():
> >    reader = Reader("../text.txt")
> >    '''to add id for reindexing profile'''
> >    queue = StaleQueue()
> >    '''lucene index to store events'''
> >    indexDirectory = "/home/data/qdb/events"
> >    analyzer = PyLucene.StandardAnalyzer()
> >    writer = PyLucene.IndexWriter(indexDirectory, analyzer, False)
> >    counter = 0
> >    line = reader.get_line()
> >    while(line):
> >        counter +=1
> >        if counter % 10000 == 0:
> >            print "passing: %s" %counter
> >            writer.optimize()
> >            print "optimized - resuming"
> >        handler = EfDemo(line)
> >        indexer = EventIndexer(writer)
> >        indexer.index(handler.get_id(), handler)
> >        queue.add(handler.get_id())
> >        line = reader.get_line()
> >    writer.close()
> >
> > #### END ####
> >
> > It dies after 1000 records (probably has to do with the optimization...
> > but I have no clue how to check):
> >
> > #### START ####
> >
> > Traceback (most recent call last):
> >  File "./qdbefdemohandler.py", line 48, in ?
> >    main()
> >  File "./qdbefdemohandler.py", line 29, in main
> >    indexer.index(handler.get_andii_id(), handler)
> >  File "/usr/local/lib/python2.4/site-packages/qdb/qdbindexer.py", line 17, in index
> >    self.writer.addDocument(doc)
> >  File "/usr/lib/python2.4/site-packages/PyLucene.py", line 1902, in addDocument
> >    def addDocument(*args): return _PyLucene.IndexWriter_addDocument(*args)
> > PyLucene.JavaError:
> > java.io.FileNotFoundException: /home/data/qdb/events/_30fvb.fnm (No such file or directory)
> >
> > #### END ####
> >
> > Thoughts? Help a PyLu newbie out! 
> >
> > -- 
> > Jared Kuolt <jaredkmorefocus.com>
> >
> >
> > _______________________________________________
> > pylucene-dev mailing list
> > pylucene-devosafoundation.org
> > http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
> >
-- 
Jared Kuolt <jaredkmorefocus.com>
morefocus, inc.
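
The traceback quoted above names a missing file, /home/data/qdb/events/_30fvb.fnm. As a minimal sketch (Python 3, standard library only, not part of PyLucene), one way to see what is actually on disk is to list the files that share that segment prefix. The helper names `segment_files` and `missing_file_report` are hypothetical, and the code assumes that all files of one segment share the name before the first dot, as the file name in the traceback suggests.

```python
import os


def segment_files(index_dir, segment):
    """Return the sorted file names in index_dir that start with '<segment>.'."""
    prefix = segment + "."
    return sorted(name for name in os.listdir(index_dir)
                  if name.startswith(prefix))


def missing_file_report(path):
    """Summarize what exists on disk around a path reported as missing."""
    directory, name = os.path.split(path)
    # Assumption: the segment name is the file name without its extension.
    segment = os.path.splitext(name)[0]
    directory_exists = os.path.isdir(directory)
    return {
        "directory_exists": directory_exists,
        "file_exists": os.path.isfile(path),
        "segment_files": segment_files(directory, segment) if directory_exists else [],
    }
```

Calling `missing_file_report("/home/data/qdb/events/_30fvb.fnm")` right after the failure would show whether the directory is reachable and whether any other files of that segment are present. It only reports file-system state and does not explain why a file is absent.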

Broken optimization?
2006-02-01 19:29:03
On Wed, 1 Feb 2006, Jared Kuolt wrote:

> On Wed, 2006-02-01 at 10:25 -0800, Andi Vajda wrote:
>> What version of PyLucene are you running ?
> PyLucene.py states 1.0
>
>> On what operating system ?
> Ubuntu 5.04
>
>> Did you build it yourself ? if so, with what version of gcj ?
> I don't actually know but I'll try to find out. In any case, gcj is
> version 4.0.2

PyLucene 1.0 is built from Java Lucene 1.4.3, so you may have hit a bug in Java
Lucene itself. You could ask the java-user@lucene.apache.org mailing list
about it or upgrade to the latest PyLucene 1.9, which is very close to Java
Lucene 1.9's svn HEAD revision. Even though there is no official Java Lucene
1.9 release yet, it appears to be very stable and has had many bugs fixed
since release 1.4.3. Indexes created with 1.4.3 are supposed to be readable
from Lucene 1.9 (the opposite is not true).

For a recent source tarball of PyLucene 1.9 see
http://pylucene.osafoundation.org.

You should also use gcj 3.4.x, x >= 3; gcj building instructions are included
near the bottom of PyLucene's INSTALL file. I've had little luck using gcj 4.x
so far.

Andi..
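
The gcj advice above (3.4.x with x >= 3, and little luck with 4.x) can be written down as a small check. This is only a sketch in Python 3 under stated assumptions: the function names `parse_version` and `is_recommended_gcj` are hypothetical, the version string is supplied by the caller (for example, copied by hand from the compiler's version output), and the check encodes just the recommendation in this message, not an official PyLucene requirement.

```python
import re


def parse_version(text):
    """Extract the first 'major.minor.patch' triple found in text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % (text,))
    return tuple(int(part) for part in match.groups())


def is_recommended_gcj(version):
    """True only for gcj 3.4.x with x >= 3, per the advice in this thread."""
    major, minor, patch = version
    return (major, minor) == (3, 4) and patch >= 3
```

With the version reported earlier in the thread, `is_recommended_gcj(parse_version("4.0.2"))` evaluates to `False`, which matches the suggestion to build with a 3.4.x compiler instead.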

