On Tue, May 22, 2007 at 02:33:34PM +0200, Thomas Koch wrote:
> Hi,
>
> We're using PyLucene to create an index of our python-based database
> (for searching).
...
> Next we observed "GC Warning" messages during index creation:
> GC Warning: Repeated allocation of very large block (appr. size 1466368):
> May lead to memory leak and poor performance.
We had to rebuild GCJ 3.4.6 with LARGE_CONFIG defined to avoid this
message. I checked GCJ 4.2.0, and LARGE_CONFIG still doesn't seem to
be defined by default. The comment from 4.2.0's Makefile.direct still
reads:
"# -DLARGE_CONFIG tunes the collector for unusually large heaps.
# Necessary for heaps larger than about 500 MB on most machines.
# Recommended for heaps larger than about 64 MB.
"
It's possible I'm missing something about the 4.2 build process
which sets LARGE_CONFIG, of course.
Also, the "Large stack limit" message comes from
boehm-gc/solaris_threads.c in gcj, so that warning seems
Solaris-specific. You might be able to avoid that by setting your
maximum stack size lower than 8M with ulimit (the number reported is
2G?).
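If changing the shell's ulimit is awkward, the same limit can be lowered
from Python just before launching the indexer. This is only a rough,
untested sketch: the 4 MB target is an arbitrary value below the 8M
threshold, the function names are made up for illustration, and I have
not checked whether the collector reads the limit early enough for this
to silence the warning.

```python
import resource
import subprocess

# Assumption: any value below 8 MB will do; 4 MB is an arbitrary pick.
TARGET_STACK_BYTES = 4 * 1024 * 1024


def lowered_soft_limit(soft, hard, target=TARGET_STACK_BYTES):
    """Return a soft stack limit that is at most `target` and at most `hard`."""
    unlimited = resource.RLIM_INFINITY
    new_soft = target
    if soft != unlimited and soft <= target:
        new_soft = soft  # already low enough, leave it alone
    if hard != unlimited and new_soft > hard:
        new_soft = hard  # the soft limit may never exceed the hard limit
    return new_soft


def run_with_lower_stack(cmd, target=TARGET_STACK_BYTES):
    """Run `cmd` in a child process whose stack limit has been lowered."""
    def limit_stack():
        # Runs in the child after fork() and before exec().
        soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
        new_soft = lowered_soft_limit(soft, hard, target)
        resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))

    return subprocess.call(cmd, preexec_fn=limit_stack)
```

For example, run_with_lower_stack([sys.executable, "build_index.py"])
would start a (hypothetical) build_index.py under the lower limit. Only
the soft limit is touched, so the parent shell is unaffected.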
> If there's any other way to get rid of the GC Warning (and memory
> leak) that would be of interest of course...
You could probably divide up your documents, and index, say, 50K in
one process, exit, do the next 50K in a new process, etc., tuning
the batch sizes as needed. Inelegant, but it'd probably work.
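A rough sketch of that batching idea, in case it helps. The driver below
only splits the work and starts one fresh process per batch. The worker
script name (index_batch.py) and its start/stop arguments are
hypothetical; the worker would have to open the existing index for
appending and add just its own slice of documents.

```python
import subprocess
import sys  # sys.executable is used in the usage example below


def batch_ranges(total, batch_size):
    """Yield half-open (start, stop) ranges that cover 0..total."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


def index_in_batches(total, batch_size, worker_cmd):
    """Run one new worker process per batch; stop at the first failure.

    Each worker exits when its batch is done, so whatever memory it
    leaked is returned to the OS before the next batch starts.
    """
    for start, stop in batch_ranges(total, batch_size):
        status = subprocess.call(worker_cmd + [str(start), str(stop)])
        if status != 0:
            return status
    return 0


# Usage (hypothetical worker script):
#   index_in_batches(120000, 50000, [sys.executable, "index_batch.py"])
```

With 120000 documents and batches of 50000, that runs three workers,
covering 0-50000, 50000-100000 and 100000-120000.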
Aaron Lav (asl2 pobox.com)