List Info

Thread: controlled vocabulary




controlled vocabulary
user name
2006-08-25 15:27:20
Hi Xin

   then perhaps you can change it to Field.Index.TOKENIZED,
but i was 
not aware that pubmed boosts mesh terms, they broadly
classify terms as 
major and minor, if you plan to use this simple system of
classification 
consider adding the major terms twice to the document ?

Zhao, Xin wrote:
> Hi, Rupinder,
> My understanding is Field.Index.NO_NORMS disables 
index-time boosting 
> and field length normalization at the same time. But I
do need 
> index-time boosting to store the scoring of each mesh
term. Have I 
> missed anything?
> Thank you very much for your help,
> Xin
>
> ----- Original Message ----- From: "Rupinder
Singh Mazara" 
> <rmazaramasterfile.com>
> To: <java-userlucene.apache.org>
> Sent: Friday, August 25, 2006 10:49 AM
> Subject: Re: controlled vocabulary
>
>
>> hi Xin
>>
>>  this is take a look at this you can add multiple
fields with the 
>> name mesh
>> for ( i=0; i< meshList.size() ; i++ ){
>>    meshTerm = meshList.get(i)
>>  document.addField( new Field( "mesh",
meshTerm.semanticWebConceptId, 
>> Field.Store.YES , Field.Index.NO_NORMS  );
>> }
>>
>>  when querying this index, create a analyzer that
infers the text 
>> string and generates id's that correspond to the
mesh term in the 
>> semantic web
>>
>>
>>
>> Zhao, Xin wrote:
>>> Hi,
>>> Thank you for your reply. I had thought about
the first two 
>>> solutions before. If we apply one doc for each
MeSH term, it would 
>>> be 26 docs for each item digested(we actually
need the top 25 MeSH 
>>> terms generated), would it be any problem if
there are too many 
>>> documents? If we apply field name like
"mesh_1", "mesh_2"..., when 
>>> it comes to search, we will have to generate a
loop for each single 
>>> one of the query terms( there will be more than
20-30 terms on 
>>> average, since we are using sematic web to
implement concept 
>>> search), do you think it would affect the
performance in a very bad 
>>> way?
>>> Regards,
>>> Xin
>>>
>>>
>>> ----- Original Message ----- From:
"Dedian Guo" <gdediangmail.com>
>>> To: <java-userlucene.apache.org>;
"Zhao, Xin" <xzhaojhu.edu>
>>> Sent: Thursday, August 24, 2006 4:22 PM
>>> Subject: Re: controlled library
>>>
>>>
>>>> in my solution, you can apply one doc for
each mesh term, or apply 
>>>> different
>>>> keyword such as
"mesh_1"...."mesh_10" for your top
10 terms...or u 
>>>> can group
>>>> your mesh terms as one string then add into
a field, which requires 
>>>> a simple
>>>> string parser for the group string when you
wanna read the terms...
>>>>
>>>> not sure if that works or answers your
question...
>>>>
>>>> On 8/24/06, Zhao, Xin <xzhao9jhmi.edu> wrote:
>>>>>
>>>>> Hi,
>>>>> I have a design question. Here is what
we try to do for indexing:
>>>>> We designed an indexing tool to
generate standard MeSH terms from 
>>>>> medical
>>>>> citations, and then use Lucene to save
the terms and citations for 
>>>>> future
>>>>> search. The information we need to save
are:
>>>>> a) the exact mesh terms (top 10)
>>>>> b) the score for each term
>>>>> so the codings are like
>>>>> -----------------------------------
>>>>> for the top 10 MeSH Terms
>>>>>
myField=Field.Keyword("mesh",
mesh.toLowerCase());
>>>>> myField.setBoost(score);
>>>>> doc.add(myFiled);
>>>>> end for
>>>>> ------------------------------------
>>>>> as you could see we generate all the
terms under named field 
>>>>> "mesh". If I
>>>>> understand correctly, all the fields
under the same name would
>>>>> eventually  save into one field, with
all the scores be normalized 
>>>>> into
>>>>> filed boost. In this case, we wouldn't
be able to save separate 
>>>>> score, so
>>>>> the information is lost. Am I correct?
Is there anyway we could 
>>>>> change it? I
>>>>> understand Lucene is for keyword
search, and what we try to do is 
>>>>> Controlled
>>>>> Vocabulary search, Any other tool we
could use?
>>>>>
>>>>> Thank you,
>>>>> Xin
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
------------------------------------------------------------
---------
>>> To unsubscribe, e-mail:
java-user-unsubscribelucene.apache.org
>>> For additional commands, e-mail:
java-user-helplucene.apache.org
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
------------------------------------------------------------
---------
>> To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
>> For additional commands, e-mail: java-user-helplucene.apache.org
>>
>
>
>
------------------------------------------------------------
---------
> To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
> For additional commands, e-mail: java-user-helplucene.apache.org
>
>
>
>




------------------------------------------------------------
---------
To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
For additional commands, e-mail: java-user-helplucene.apache.org

[1]

about | contact  Other archives ( Real Estate discussion Medical topics )