Thread: Re: Flock 0.9.1 pre-release -- fixes to lucene freeze issue




Re: Flock 0.9.1 pre-release -- fixes to lucene freeze issue
United States
2007-09-13 16:21:06
On Thu, Sep 13, 2007 at 03:27:20PM -0400, Otis Gospodnetic wrote:
> Hola,
>
> I'm one of the Lucene developers (n.b. Flock is using CLucene, the C++
> port of Lucene).  Out of curiosity - have you considered using Lucene
> (the original Java version) in Flock?  I'm asking because CLucene and
> other ports are always pretty far behind the Java version, and over at
> Lucene Java we've made some major performance improvements recently
> (plus a good number of new features).  I imagine Flock would benefit
> from using the most advanced version, but perhaps there are technical
> reasons why the Java version cannot be used in Flock?

Yeah, I see 3 problems:

1) No guaranteed pre-installed JRE on Windows or Linux. This either
   means the user must install a JRE beforehand, which is an additional
   barrier for new users, or we bundle a JRE, which means the download
   is much bigger, which is also a barrier for new users.

2) The Java XPCOM binding has never seen wide testing, since it's never
   been part of any default builds, and thus is still immature.

3) Java has poor memory profile characteristics. Beyond Java's general
   reputation of sucking up whatever memory it can grab, there's the
   specific problem that because it's GC'd, the GC might not get run
   fast enough to keep up with indexing operations (consider pages which
   refresh themselves often), which, to the user, looks like memory
   spiraling out of control, even though at some later point it'll
   actually get cleaned up. This isn't specific to Java; we pushed some
   code down from JavaScript to C++ for the same reason.

Lucene was also slower performance-wise than CLucene, but I haven't
actually revisited this in over a year, so I'm sure you guys have made
some improvements since then. Do you have any recent benchmarks between
the two?

-Manish
_______________________________________________
Flockstars mailing list
Flockstarsflock.com
https://lists.flock.com/mailman/listinfo/flockstars

Re: Flock 0.9.1 pre-release -- fixes to lucene freeze issue
United States
2007-09-13 21:26:39
Hi Manish,

Thanks for the info.

Manish Singh wrote:
> On Thu, Sep 13, 2007 at 03:27:20PM -0400, Otis Gospodnetic wrote:
>> Hola,
>>
>> I'm one of the Lucene developers (n.b. Flock is using CLucene, the C++
>> port of Lucene).  Out of curiosity - have you considered using Lucene
>> (the original Java version) in Flock?  I'm asking because CLucene and
>> other ports are always pretty far behind the Java version, and over at
>> Lucene Java we've made some major performance improvements recently
>> (plus a good number of new features).  I imagine Flock would benefit
>> from using the most advanced version, but perhaps there are technical
>> reasons why the Java version cannot be used in Flock?
> 
> Yeah, I see 3 problems:
> 
> 1) No guaranteed pre-installed JRE on Windows or Linux. This either
>    means the user must install a JRE beforehand, which is an additional
>    barrier for new users, or we bundle a JRE, which means the download
>    is much bigger, which is also a barrier for new users.

I have always wondered why this is still an argument.  A 5MB or 20MB
download... does that still represent a problem?  Especially for the
type of people Flock is aimed at?

> 2) The Java XPCOM binding has never seen wide testing, since it's never
>    been part of any default builds, and thus is still immature.

I see.  That is what I thought... though I remember something from
Stefano Mazzocchi on that topic... google... aha:
http://www.betaversion.org/~stefano/linotype/news/89/

> 3) Java has poor memory profile characteristics. Beyond Java's general
>    reputation of sucking up whatever memory it can grab, there's the
>    specific problem that because it's GC'd, the GC might not get run
>    fast enough to keep up with indexing operations (consider pages which
>    refresh themselves often), which, to the user, looks like memory
>    spiraling out of control, even though at some later point it'll
>    actually get cleaned up. This isn't specific to Java; we pushed some
>    code down from JavaScript to C++ for the same reason.

I'm not sure I'd agree 100%, but I'm old enough not to get into that
discussion.  When you talk about indexing and frequently refreshing
pages... are you saying Flock periodically refetches and reindexes them?
I thought it indexed pages only as people visit them, no?

> Lucene was also slower performance-wise than CLucene, but I haven't
> actually revisited this in over a year, so I'm sure you guys have made
> some improvements since then. Do you have any recent benchmarks between
> the two?

I don't think there are any recent benchmarks.  I didn't realize the
main concern is indexing performance (as opposed to search or features),
but I know some recent improvements made indexing 25% faster (background
threading and such).

Otis
--
Simpy -- http://www.simpy.com/ -- Tag.  Search.  Share.
