
Thread: Commented: (SOLR-139) Support updateable/modifiable documents




Commented: (SOLR-139) Support updateable/modifiable documents
United States
2007-07-13 15:53:04
    [ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512617 ]

Yonik Seeley commented on SOLR-139:
-----------------------------------

>> ... ParallelReader, where some fields are in one sub-index ...
> the processor would ask the updateHandler for the existing document - the updateHandler deals with
> getting it to/from the right place.

The big reason you would use ParallelReader is to avoid
touching the less-modified/bigger fields in one index when
changing some of the other fields in the other index.

> What are you thinking? Adding the processor as a parameter to AddUpdateCommand?

I didn't have a clear alternative... I was just pointing out
the future pitfalls of assuming too much implementation
knowledge.
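A minimal Java sketch of the lookup split discussed above, using hypothetical names (`ExistingDocLookup`, `SplitStoreHandler`, `ModifyProcessor`) that are not Solr APIs. The processor only asks for "the existing document"; the handler decides where the stored fields live. Here two in-memory maps stand in for ParallelReader-style sub-indexes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup interface (not a Solr API): the processor only sees this,
// so it does not need to know how the stored fields are split up.
interface ExistingDocLookup {
    Optional<Map<String, Object>> getExisting(String id);
}

// Hypothetical handler: stored fields are split across two in-memory maps that
// stand in for ParallelReader-style sub-indexes, and are merged on lookup.
class SplitStoreHandler implements ExistingDocLookup {
    final Map<String, Map<String, Object>> smallFields = new HashMap<>();
    final Map<String, Map<String, Object>> bigFields = new HashMap<>();

    @Override
    public Optional<Map<String, Object>> getExisting(String id) {
        Map<String, Object> small = smallFields.get(id);
        Map<String, Object> big = bigFields.get(id);
        if (small == null && big == null) {
            return Optional.empty();
        }
        // return a fresh copy so callers cannot mutate the sub-stores
        Map<String, Object> doc = new HashMap<>();
        if (big != null) doc.putAll(big);
        if (small != null) doc.putAll(small);
        return Optional.of(doc);
    }
}

// Hypothetical processor: asks the handler for the current document and
// applies the changes, without touching either sub-store directly.
class ModifyProcessor {
    private final ExistingDocLookup lookup;

    ModifyProcessor(ExistingDocLookup lookup) {
        this.lookup = lookup;
    }

    Map<String, Object> apply(String id, Map<String, Object> changes) {
        Map<String, Object> doc = lookup.getExisting(id).orElseGet(HashMap::new);
        doc.putAll(changes);
        return doc;
    }
}
```

Because the processor depends only on the interface, changing how fields are split between sub-indexes touches the handler, not the processor. That is one way to avoid building implementation assumptions into the processor.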



> Support updateable/modifiable documents
> ---------------------------------------
>
>                 Key: SOLR-139
>                 URL: https://issues.apache.org/jira/browse/SOLR-139
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Ryan McKinley
>            Assignee: Ryan McKinley
>         Attachments: SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch,
>                      SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch,
>                      SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch,
>                      SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch,
>                      SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch,
>                      SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch,
>                      SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch,
>                      SOLR-269+139-ModifiableDocumentUpdateProcessor.patch
>
>
> It would be nice to be able to update some fields on a document without having to insert the entire document.
> Given the way Lucene is structured, (for now) one can only modify stored fields.
> While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers.
> For background, see:
> http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293
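One way to picture the requested behavior, as a hedged sketch with hypothetical types (`Op`, `FieldChange`, `DocMerger`, none of them Solr classes): rebuild the document from its stored fields, then apply "set" and "increment" changes, rejecting an increment unless both the existing value and the delta are numbers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical change operations; the names are illustrative, not Solr's.
enum Op { SET, INC }

final class FieldChange {
    final Op op;
    final Object value;

    FieldChange(Op op, Object value) {
        this.op = op;
        this.value = value;
    }
}

final class DocMerger {
    // Builds the replacement document from the existing *stored* fields plus
    // the requested changes; the input map is left unmodified.
    static Map<String, Object> merge(Map<String, Object> stored,
                                     Map<String, FieldChange> changes) {
        Map<String, Object> result = new HashMap<>(stored);
        for (Map.Entry<String, FieldChange> e : changes.entrySet()) {
            String field = e.getKey();
            FieldChange c = e.getValue();
            if (c.op == Op.SET) {
                result.put(field, c.value);
                continue;
            }
            // INC: only meaningful for an existing numeric value
            Object old = result.get(field);
            if (!(old instanceof Number) || !(c.value instanceof Number)) {
                throw new IllegalArgumentException("cannot increment non-numeric field: " + field);
            }
            Number a = (Number) old;
            Number b = (Number) c.value;
            if (isIntegral(a) && isIntegral(b)) {
                result.put(field, a.longValue() + b.longValue());     // stored as Long
            } else {
                result.put(field, a.doubleValue() + b.doubleValue()); // stored as Double
            }
        }
        return result;
    }

    private static boolean isIntegral(Number n) {
        return n instanceof Integer || n instanceof Long
            || n instanceof Short || n instanceof Byte;
    }
}
```

Fields that were indexed but not stored never appear in the `stored` map, so they would be lost when the merged document is re-inserted; that is the stored-fields-only restriction described in the issue.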

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: Commented: (SOLR-139) Support updateable/modifiable documents
Canada
2007-07-13 19:48:06
On 13-Jul-07, at 1:53 PM, Yonik Seeley (JIRA) wrote:

>
>>> ... ParallelReader, where some fields are in one sub-index ...
>> the processor would ask the updateHandler for the existing
>> document - the updateHandler deals with
>> getting it to/from the right place.
>
> The big reason you would use ParallelReader is to avoid touching
> the less-modified/bigger fields in one index when changing some of
> the other fields in the other index.

I've pondered this a few times: it could be a huge win for
highlighting apps, which can be stored-field-heavy.

However, I wonder if there is something that I am missing: PR
requires perfect synchronization of Lucene doc ids, no? If you
update fields for a doc in one index, don't you need to (re-)store
the fields in all other indices too, to keep the doc ids in sync?
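The doc-id concern can be illustrated with a toy model (not Lucene code; `ToyIndex` and `ParallelView` are hypothetical). A document's id is its position, and an update marks the old slot deleted and appends the replacement, loosely mirroring how a delete-plus-add gives a document a new docid. Updating one sub-index alone breaks positional alignment until the other sub-index is rewritten the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy index (not Lucene code): a document's id is its position in `slots`.
// An update marks the old slot deleted (null) and appends the replacement,
// so the updated document ends up with a new, higher id.
final class ToyIndex {
    final List<Map<String, String>> slots = new ArrayList<>();

    int add(Map<String, String> doc) {
        slots.add(doc);
        return slots.size() - 1;
    }

    int update(int docId, Map<String, String> doc) {
        slots.set(docId, null);
        return add(doc);
    }
}

final class ParallelView {
    // A ParallelReader-style view pairs document i of one index with document i
    // of the other. That is only correct if both have the same slots, the same
    // deletions, and the same logical document (here: the "id" field) at each slot.
    static boolean aligned(ToyIndex a, ToyIndex b) {
        if (a.slots.size() != b.slots.size()) {
            return false;
        }
        for (int i = 0; i < a.slots.size(); i++) {
            Map<String, String> da = a.slots.get(i);
            Map<String, String> db = b.slots.get(i);
            if ((da == null) != (db == null)) {
                return false;
            }
            if (da != null && !da.get("id").equals(db.get("id"))) {
                return false;
            }
        }
        return true;
    }
}
```

Restoring alignment requires re-adding the unchanged big-field document to the main index, which is exactly the extra work the question above points at.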

-mike

Re: Commented: (SOLR-139) Support updateable/modifiable documents
2007-07-13 20:25:33
On 7/13/07, Mike Klaas <mike.klaas@gmail.com> wrote:
> >>> ... ParallelReader, where some fields are in one sub-index ...
> >> the processor would ask the updateHandler for the existing
> >> document - the updateHandler deals with
> >> getting it to/from the right place.
> >
> > The big reason you would use ParallelReader is to avoid touching
> > the less-modified/bigger fields in one index when changing some of
> > the other fields in the other index.
>
> I've pondered this a few times: it could be a huge win for
> highlighting apps, which can be stored-field-heavy.
>
> However, I wonder if there is something that I am missing: PR
> requires perfect synchronization of Lucene doc ids, no? If you
> update fields for a doc in one index, don't you need to (re-)store
> the fields in all other indices too, to keep the doc ids in sync?

Well, it would be tricky... one PR use case would be to entirely
re-index one field (in its own separate index), thus maintaining
synchronization with the main index. As Doug said:
"ParallelReader was not really designed to support incremental updates
of fields, but rather to accelerate batch updates. For incremental
updates you're probably better served by updating a single index."
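The batch use case can be sketched as a full rebuild of the side index (hypothetical `SideIndexRebuilder`, with in-memory lists standing in for indexes): walk the main index in docid order and emit one entry per slot, including deleted ones, so position i in the side index always describes main-index document i.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical batch rebuild: recompute one field for every main-index slot,
// in docid order, producing a side "index" aligned position-by-position.
final class SideIndexRebuilder {
    static List<String> rebuild(List<Map<String, String>> mainSlots,
                                Function<Map<String, String>, String> computeField) {
        List<String> side = new ArrayList<>(mainSlots.size());
        for (Map<String, String> doc : mainSlots) {
            // a deleted main-index slot (null) still gets a placeholder so that
            // later positions stay aligned
            side.add(doc == null ? null : computeField.apply(doc));
        }
        return side;
    }
}
```

The cost is one pass over every document, which suits periodic re-indexing of a single field but not single-document updates.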

That's probably not too useful for a general-purpose
platform like Solr.

Another way to support a more incremental model might be to split
the smaller, volatile index into many segments, so that updating a
single doc involves rewriting just the segment that contains it.
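A toy version of the segmented idea, assuming a hypothetical `SegmentedSideIndex` in which each segment covers a fixed docid range. Segments are treated as immutable, so an update copies one whole segment, but only the segment holding that document, and no docid moves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical segmented side index: docids [k*size, (k+1)*size) live in
// segment k. Segments are never modified in place; an update replaces the
// one segment holding the doc with a rewritten copy.
final class SegmentedSideIndex {
    private final int segmentSize;
    private final List<String[]> segments = new ArrayList<>();
    int docsRewritten = 0; // total values copied by updates so far

    SegmentedSideIndex(String[] values, int segmentSize) {
        this.segmentSize = segmentSize;
        for (int start = 0; start < values.length; start += segmentSize) {
            int end = Math.min(values.length, start + segmentSize);
            segments.add(Arrays.copyOfRange(values, start, end));
        }
    }

    String get(int docId) {
        return segments.get(docId / segmentSize)[docId % segmentSize];
    }

    void update(int docId, String newValue) {
        int seg = docId / segmentSize;
        String[] copy = segments.get(seg).clone(); // rewrite just this segment
        copy[docId % segmentSize] = newValue;
        segments.set(seg, copy);
        docsRewritten += copy.length;
    }
}
```

With 10 docs and a segment size of 4, updating doc 5 copies 4 values instead of 10; smaller segments make single-doc updates cheaper at the price of managing more segments.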

There might also be possibilities in different types of IndexReader
implementations: one could map docids to maintain synchronization.
This brings up a slightly different problem: Lucene scorers expect
to go in docid order.
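The remapping idea and its ordering problem, sketched with a hypothetical `DocIdRemapper` (a plain `int[]` table, not an IndexReader subclass): matches arrive in side-index order, map to main-index docids out of order, and have to be collected and sorted before anything that, like a Lucene scorer, advances through docids in increasing order.

```java
import java.util.Arrays;

// Hypothetical remapping layer: the side index stores docs in its own order,
// and sideToMain[s] gives the main-index docid of side document s.
final class DocIdRemapper {
    private final int[] sideToMain;

    DocIdRemapper(int[] sideToMain) {
        this.sideToMain = sideToMain.clone();
    }

    // Matches come out of the side index in ascending *side* docid order.
    // After mapping they are generally not ascending in main-index order, so
    // they are collected and sorted before being handed on.
    int[] toMainOrder(int[] sideMatches) {
        int[] main = new int[sideMatches.length];
        for (int i = 0; i < sideMatches.length; i++) {
            main[i] = sideToMain[sideMatches[i]];
        }
        Arrays.sort(main);
        return main;
    }

    static boolean ascending(int[] ids) {
        for (int i = 1; i < ids.length; i++) {
            if (ids[i] <= ids[i - 1]) return false;
        }
        return true;
    }
}
```

Sorting means buffering all matches before any can be consumed, which gives up the streaming, in-order iteration that scorers are built around.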

-Yonik
