Thread: Search results problem




Search results problem
Germany
2007-10-16 05:03:19
Hi,

the content of one document is completely contained in another, but when I
search for a particular word I only get one document as a result.
I am absolutely sure the word is contained in the other document as well,
but I only get the "parent" doc if I add a word.

Maybe I am doing something wrong in the way the schema is set up. For
easy searching I added a fulltext field, which I use as the default search
field and into which the content of all fields is duplicated. Might that be
a problem? Could I do that with copyFields? The schema is generated, though,
so I don't really know which fields are there.

- Max
-- 
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel            :  (+49) 0711 - 45 10 17 578
Fax            :  (+49) 0711 - 45 10 17 573
e-mail         :  max.huetterblue-elephant-systems.com
Sitz           :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich

Re: Search results problem
Germany
2007-10-16 11:36:41
On Tuesday 16 October 2007 12:03, Maximilian Hütter wrote:

> the content of one document is completely contained in another,
> but search for a special word I only get one document as result.
> I am absolutely sure it is contained in the other document, but I will
> only get the "parent" doc if I add a word.

You should try debugging the problem with Luke, e.g. use "reconstruct &
edit" to see if the term is really indexed in both documents.

Regards
 Daniel

-- 
http://www.danielnaber.de

Re: Search results problem
Germany
2007-10-17 05:26:25
Daniel Naber wrote:
> You should try debugging the problem with Luke, e.g. use "reconstruct &
> edit" to see if the term is really indexed in both documents.
>
> Regards
>  Daniel
>

Thank you for the tip. After using Luke I can see that the term really is
missing in the other document.
Is there a size restriction for field content in Solr/Lucene? I ask because
a lot of strings I expected to find seem to be missing from the "fulltext"
field I use as the default field (after Luke reconstruction).

Best regards,

Max


Re: Search results problem
2007-10-17 05:44:44
There is a configuration option called "<maxFieldLength>" in
solrconfig.xml with the default value of 10,000.  You may need to
increase this value if you are indexing fields that are longer.



Re: Search results problem
Spain
2007-10-17 05:46:59
On Wed, 2007-10-17 at 20:44 +1000, Pieter Berkel wrote:
> There is a configuration option called "<maxFieldLength>" in
> solrconfig.xml with the default value of 10,000.  You may need to
> increase this value if you are indexing fields that are longer.
>

Is there a way to define an unlimited value? Like -1?

TIA

salu2

-- 
Thorsten Scherler                                thorsten.at.apache.org
Open Source Java                      consulting, training and solutions


Re: Search results problem
2007-10-17 07:16:01
Just to clarify, <maxFieldLength> refers to the maximum number of *terms*
that will be indexed per field, not the character length of the field (I
wasn't clear about that in my previous post).

Unfortunately there is no way to specify an unlimited value, although if you
set it to a suitably large value, you shouldn't really have any problems
(other than running out of memory).

Piete
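
The distinction between terms and characters can be illustrated with a small toy model. This is only a sketch of the behaviour described in this thread, not Lucene's actual code: the function name index_terms, the whitespace tokenization, and the sample documents are all assumptions made for illustration. It shows how a word that sits past the term limit in a long "parent" document is never indexed, while the same word in a short document is.

```python
def index_terms(text, max_field_length=10_000):
    """Return the set of terms a toy indexer would keep for one field.

    Tokenization is plain whitespace splitting (an assumption, not what
    a real analyzer does). Only the first max_field_length terms are
    kept; everything after that is silently dropped.
    """
    terms = text.lower().split()
    return set(terms[:max_field_length])


# A long "parent" document: 10,000 filler terms followed by one rare word.
parent = " ".join(["filler"] * 10_000) + " needle"
# A short document whose content is fully contained in the parent.
child = "filler needle"

parent_index = index_terms(parent)
child_index = index_terms(child)

print("needle" in child_index)   # True: the short document is fully indexed
print("needle" in parent_index)  # False: the word is term number 10,001

# Raising the limit makes the word searchable in the parent as well.
print("needle" in index_terms(parent, max_field_length=20_000))  # True
```

In this model the limit counts terms, so a field with few but very long terms would not be truncated at all.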



Re: Search results problem
Germany
2007-10-17 07:29:59
Thorsten Scherler wrote:
> Is there a way to define a unlimited value? Like -1?
> 
> TIA
>
I hadn't seen the maxFieldLength option, but that is surely the problem,
as the document is truncated at 10000 terms.
The question is what to do about it; I certainly need a much higher
number. I doubt that it is possible to set it to unlimited.

I also found this:

"Controls the maximum number of terms that can be added to a Field for a
given Document, thereby truncating the document. Increase this number if
large documents are expected. However, setting this value too high may
result in out-of-memory errors."

Coming from: http://www.ibm.com/developerworks/library/j-solr2/index.html

That might be a problem for me.

I was thinking about using copyFields instead of one large fulltext
field. Would that solve my problem, or would the maxFieldLength still
apply when using copyFields?

Best regards,

Max



Re: Search results problem
2007-10-17 08:51:37
On 10/17/07, Maximilian Hütter <mhblue-elephant-systems.com> wrote:
> I also found this:
>
> "Controls the maximum number of terms that can be added to a Field for a
> given Document, thereby truncating the document. Increase this number if
> large documents are expected. However, setting this value too high may
> result in out-of-memory errors."
>
> Coming from: http://www.ibm.com/developerworks/library/j-solr2/index.html
>
> That might be a problem for me.
>
> I was thinking about using copyFields, instead of one large fulltext
> field. Would that solve my problem, or would the maxFieldLength still
> apply when using copyFields?

maxFieldLength is a setting on the IndexWriter and applies to all fields.
If you want more tokens indexed, simply increase the value of
maxFieldLength to something like 2000000000 and you should be fine.

There's no penalty for setting it higher than the largest field you
are indexing (no diff between 1M and 2B if all your docs have field
lengths less than 1M tokens anyway).

-Yonik
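
One way to picture why a very large limit costs nothing for short fields is to treat the limit purely as an upper bound on how many tokens are consumed. The sketch below is a toy model under that assumption, not the IndexWriter implementation; the function name tokens_to_index and the sample field are made up for illustration.

```python
from itertools import islice


def tokens_to_index(tokens, max_field_length):
    """Return the tokens a toy indexer would keep: at most max_field_length.

    islice stops as soon as the token stream ends, so nothing is
    allocated in proportion to the cap itself.
    """
    return list(islice(tokens, max_field_length))


field = "the quick brown fox".split()

kept_1m = tokens_to_index(iter(field), 1_000_000)
kept_2b = tokens_to_index(iter(field), 2_000_000_000)
kept_3 = tokens_to_index(iter(field), 3)

print(kept_1m == kept_2b)  # True: both caps exceed the field length
print(kept_3)              # ['the', 'quick', 'brown']
```

Only a cap smaller than the field changes the result, which matches the "no diff between 1M and 2B" remark above.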

Re: Search results problem
Germany
2007-10-19 06:52:56
Yonik Seeley wrote:
> maxFieldLength is a setting on the IndexWriter and applies to all fields.
> If you want more tokens indexed, simply increase the value of
> maxFieldLength to something like 2000000000 and you should be fine.
>
> There's no penalty for setting it higher than the largest field you
> are indexing (no diff between 1M and 2B if all your docs have field
> lengths less than 1M tokens anyway).
>
> -Yonik
>
Yes, that would be an easy solution, as there is no performance penalty,
as you say.
I am still unsure whether maxFieldLength applies to copyFields.
When using copyFields I get an array back for the field I copied to,
so it seems to be different.
Is there a performance penalty for using copyFields when indexing? And
what about mixed field types in the source fields? What happens when I
copy an sint-based field and a string-based field to a string-based field?

Best regards,

Max


Re: Search results problem
2007-10-22 22:18:41
On 10/19/07, Maximilian Hütter <mhblue-elephant-systems.com> wrote:
> Yes, that would be an easy solution, as there is no performance penalty,
> as you say.
> I am still unsure whether maxFieldLength applies to copyFields.

maxFieldLength applies to all fields (it's a Lucene concept, not a Solr one).

copyField and maxFieldLength are not related.

> When using copyFields I get an array back for that field (I copied to).
> So it seems to be different.

???  maxFieldLength only applies to the number of tokens indexed.  You
will always get the complete field back if it's stored, regardless of
what maxFieldLength is.

> Is there a performance penalty for using copyFields when indexing?

copyFields are done as a discrete step before indexing... almost no
cost to do that.
Indexing itself will have a performance impact if there are more
fields to index + store as a result of the copyField commands.

> How about the mixed fieldtypes in the source fields? What happens when I
> copy an sint based field and a string based field to a string based field?

copyField is done based on the string values, before any analysis.
Mixed content should be fine.

-Yonik
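
The "copy the string values, before any analysis" step can be pictured with a toy model. This is a sketch only, not Solr's code: the function apply_copy_fields, the (source, dest) rule format, and the field names price, title and fulltext are hypothetical. It shows values from an integer field and a string field landing in one destination field as separate string values, which would be consistent with the array Max reports getting back.

```python
def apply_copy_fields(doc, copy_rules):
    """Copy raw values from source fields to destination fields as strings.

    doc maps a field name to a list of values; copy_rules is a list of
    (source, dest) pairs. The copy happens before any analysis, so the
    destination simply collects one string per copied value.
    """
    result = {name: list(values) for name, values in doc.items()}
    for source, dest in copy_rules:
        for value in doc.get(source, []):
            result.setdefault(dest, []).append(str(value))
    return result


# Hypothetical document with an integer field and a string field.
doc = {"price": [42], "title": ["blue elephant"]}
rules = [("price", "fulltext"), ("title", "fulltext")]

expanded = apply_copy_fields(doc, rules)
print(expanded["fulltext"])  # ['42', 'blue elephant']
print(expanded["price"])     # [42]
```

In this model the source fields are left untouched, and any term limit would apply later, when the destination field is tokenized for indexing.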
