Thread: RE: IndexWriter shutdown




RE: IndexWriter shutdown
2007-05-23 13:56:05
"Michael McCandless" wrote:

> That's correct.
>
> On seeing the "shutdown in progress" exception, the current "finally"
> clause in mergeSegments would revert the internal state of the
> IndexWriter to be consistent, ie, put back the segments that were in
> the process of being merged into its segmentInfos.  It will also
> remove any partially created but now unusable newly merged segments
> files.
>
> If the application catches this exception and calls
> IndexWriter.close(), then the state until just before the aborted
> merge would be committed to the index.  If instead the application
> catches the exception and does nothing, then the state of the index
> reverts back to where it was when this IndexWriter instance was first
> opened.
>
> So the semantics of autoCommit=false will be correctly enforced if any
> exception (not just this new one) comes up through mergeSegments.

Great.

So my comment on Antony's "mini-optimize" scenario was partially wrong,
because under autoCommit=true (which is the default), those sub-merges
that completed before shutdown are not lost; only the last one, the one
that was interrupted, is.

Mmmm... I can see how autoCommit=true works fine, because anything
(auto)committed is already saved, and there is no need to write
anything more.  But for autoCommit=false it is not clear to me how such
a further call to IndexWriter.close() by the application can work,
because a shutdown state is in effect, and any attempt to write/flush
anything would just throw the same exception again...  or am I missing
something?
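
To make the concern concrete, here is a minimal toy model of the scenario, not Lucene code. Every name in it (ShutdownModel, ToyWriter, the write() helper) is hypothetical and only mirrors the behaviour described in this thread: a global shutdown flag that rejects writes, a "finally" clause that puts the source segments back after a failed merge, and a close() that itself has to write the new segments_N.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the thread's scenario; NOT Lucene code, all names are hypothetical. */
class ShutdownModel {
    /** Stand-in for the proposed global "shutdown in progress" state. */
    static volatile boolean shutdownInProgress = false;

    /** Every simulated disk write passes through this check. */
    static void write(String what) {
        if (shutdownInProgress) {
            throw new IllegalStateException("shutdown in progress: " + what);
        }
    }

    /** Minimal writer with autoCommit=false style semantics. */
    static class ToyWriter {
        final List<String> committed = new ArrayList<>();    // what a reader would see
        final List<String> segmentInfos = new ArrayList<>(); // in-memory state

        ToyWriter(List<String> onDisk) {
            committed.addAll(onDisk);
            segmentInfos.addAll(onDisk);
        }

        void addSegment(String name) {
            write(name);
            segmentInfos.add(name);
        }

        /** Merges all segments into one; restores the old list if the merge fails. */
        void mergeSegments(String mergedName) {
            List<String> before = new ArrayList<>(segmentInfos);
            boolean success = false;
            try {
                segmentInfos.clear();
                write(mergedName); // throws while the shutdown state is in effect
                segmentInfos.add(mergedName);
                success = true;
            } finally {
                if (!success) {
                    // Put back the segments that were being merged.
                    segmentInfos.clear();
                    segmentInfos.addAll(before);
                }
            }
        }

        /** Commits the in-memory state; committing is itself a write. */
        void close() {
            write("segments_N");
            committed.clear();
            committed.addAll(segmentInfos);
        }
    }

    public static void main(String[] args) {
        ToyWriter w = new ToyWriter(Arrays.asList("_0"));
        w.addSegment("_1");
        shutdownInProgress = true;
        try {
            w.mergeSegments("_2");
        } catch (IllegalStateException e) {
            System.out.println("merge aborted: " + e.getMessage());
        }
        try {
            w.close();
        } catch (IllegalStateException e) {
            System.out.println("close failed: " + e.getMessage());
        }
        System.out.println("in memory: " + w.segmentInfos + ", committed: " + w.committed);
    }
}
```

In this model the rollback in mergeSegments works as described, but close() fails on the same check, so the committed state stays where it was when the writer was opened.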


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


RE: IndexWriter shutdown
2007-05-23 16:19:19
"Doron Cohen" <DORONCil.ibm.com> wrote:
>
> "Michael McCandless" wrote:
>
> > That's correct.
> >
> > On seeing the "shutdown in progress" exception, the current "finally"
> > clause in mergeSegments would revert the internal state of the
> > IndexWriter to be consistent, ie, put back the segments that were in
> > the process of being merged into its segmentInfos.  It will also
> > remove any partially created but now unusable newly merged segments
> > files.
> >
> > If the application catches this exception and calls
> > IndexWriter.close(), then the state until just before the aborted
> > merge would be committed to the index.  If instead the application
> > catches the exception and does nothing, then the state of the index
> > reverts back to where it was when this IndexWriter instance was first
> > opened.
> >
> > So the semantics of autoCommit=false will be correctly enforced if any
> > exception (not just this new one) comes up through mergeSegments.
>
> Great.
>
> So my comment on Antony's "mini-optimize" scenario was
> partially wrong, because under autoCommit=true (which is
> the default), those sub-merges that completed before shutdown
> are not lost, only the last one, the one that was interrupted.

Right.
 
> Mmmm... I can see how autocommit=true works fine, because
> anything (auto)committed is already saved, and there
> is no need to write anything more.  But for autoCommit=false
> it is not clear to me how such further call to indexWriter.close()
> by the application can work - because a shutdown state is in
> effect, and any attempt to write/flush anything would just throw
> the same exception again...  or am I missing something?

Ahh, you are correct: the global/static shutdown state would prevent
any further writes, so if IndexWriter.close() tried to write the
new segments_N, it would hit the same exception.

Maybe this isn't really a big deal?  I.e., people who open an IndexWriter
with autoCommit=false should be prepared on shutdown to lose all that
had been done during the lifetime of that writer?  Presumably, faced
with this, you would just open a new writer exclusively to do the
optimize.  Though for the merging case, which you can't control (it just
happens on certain addDocument(...) calls), that's harder because you
could then lose added documents.

Or, maybe, you have a way to "un-shutdown" and you call this before
calling close?  Or instead of "shutdown" it's more of an "interrupt the
merge if it's in progress" which then doesn't prevent further IO?
This is getting somewhat complex...

Maybe we should leave this out of the core, and instead implement it as
an [external] subclass of FSDirectory, until we can get a better handle
on it?
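
A rough sketch of what such an external directory subclass might look like, with an "un-shutdown" hook. It is written against a hypothetical ToyDirectory stand-in, not Lucene's actual FSDirectory API; ShutdownAwareDirectory, requestShutdown() and cancelShutdown() are invented names used only for illustration.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a file-system directory; NOT Lucene's FSDirectory. */
class ToyDirectory {
    protected final Map<String, byte[]> files = new HashMap<>();

    void writeFile(String name, byte[] data) throws IOException {
        files.put(name, data.clone());
    }

    boolean fileExists(String name) {
        return files.containsKey(name);
    }
}

/** Refuses all writes once a process-wide shutdown has been requested. */
class ShutdownAwareDirectory extends ToyDirectory {
    // Global/static state, as discussed in the thread (an assumption of this sketch).
    private static volatile boolean shutdown = false;

    static void requestShutdown() {
        shutdown = true;
    }

    /** The "un-shutdown" escape hatch: call before close() to allow the commit. */
    static void cancelShutdown() {
        shutdown = false;
    }

    @Override
    void writeFile(String name, byte[] data) throws IOException {
        if (shutdown) {
            throw new IOException("shutdown in progress, refusing to write " + name);
        }
        super.writeFile(name, data);
    }
}

class ShutdownAwareDirectoryDemo {
    public static void main(String[] args) throws IOException {
        ShutdownAwareDirectory dir = new ShutdownAwareDirectory();
        dir.writeFile("_0.cfs", new byte[] {1});

        ShutdownAwareDirectory.requestShutdown();
        try {
            dir.writeFile("segments_1", new byte[] {2});
        } catch (IOException e) {
            System.out.println("write rejected: " + e.getMessage());
        }

        ShutdownAwareDirectory.cancelShutdown();
        dir.writeFile("segments_1", new byte[] {2}); // accepted again after un-shutdown
        System.out.println("segments_1 exists: " + dir.fileExists("segments_1"));
    }
}
```

Keeping the check in the directory layer leaves the writer untouched, at the cost that every write, including the final commit, is blocked until the flag is cleared.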

Mike



RE: IndexWriter shutdown
2007-05-23 20:35:43
> Or instead of "shutdown" it's more of a "interrupt the
> merge if it's in progress" which then doesn't prevent further IO?

At a high level, this would seem like the most valuable approach. But I
think we would want to distinguish between writing new documents and
merges of existing segments. The way things stand, that means that the
merges from the RAM segments onto disk should not be interruptible,
though disk-to-disk merges should be.

It does sound a little more complicated, but it might still not be too
messy, especially given that, the way things are looking, the merge
from RAM segments to disk is going to go away or be handled differently
than disk-to-disk merges anyway.
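
As a small illustration of that distinction, here is a toy sketch (hypothetical names throughout, not Lucene code) in which an interrupt request is honoured only by disk-to-disk merges. A RAM-to-disk merge ignores it, since aborting that one could lose added documents, and the interrupt does not block later IO.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of per-merge interruption; hypothetical names, NOT Lucene code. */
class InterruptibleMergeModel {
    enum MergeKind { RAM_TO_DISK, DISK_TO_DISK }

    /** Thrown when an interruptible merge notices the interrupt request. */
    static class MergeInterruptedException extends RuntimeException {
        MergeInterruptedException(String message) {
            super(message);
        }
    }

    private final AtomicBoolean interruptRequested = new AtomicBoolean(false);

    void requestInterrupt() {
        interruptRequested.set(true);
    }

    /**
     * Simulates a merge done in the given number of chunks and returns how many
     * chunks were written. Only disk-to-disk merges poll the interrupt flag;
     * RAM-to-disk merges always run to completion.
     */
    int merge(MergeKind kind, int chunks) {
        int done = 0;
        for (int i = 0; i < chunks; i++) {
            if (kind == MergeKind.DISK_TO_DISK && interruptRequested.get()) {
                throw new MergeInterruptedException(
                        "merge interrupted after " + done + " chunks");
            }
            done++; // stand-in for copying one chunk of segment data
        }
        return done;
    }

    public static void main(String[] args) {
        InterruptibleMergeModel model = new InterruptibleMergeModel();
        model.requestInterrupt();

        // Flushing buffered documents still completes.
        System.out.println("RAM-to-disk chunks written: "
                + model.merge(MergeKind.RAM_TO_DISK, 4));

        // A disk-to-disk merge stops; its source segments are still on disk.
        try {
            model.merge(MergeKind.DISK_TO_DISK, 4);
        } catch (MergeInterruptedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the flag is only consulted inside disk-to-disk merges, a later close() in a real writer would not be blocked by it, which is the property asked for earlier in the thread.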
