
Thread: highlight search keywords on html page




highlight search keywords on html page
2007-02-16 10:45:45
With Solr, I can generate a list of links containing highlighted fragments.
After a user clicks a link, I will fetch the stored, non-indexed HTML
from Solr and return it to the user.
But I want the search keywords within the HTML to be highlighted, just like
Google does.
I'm wondering what people are using to accomplish this very common task.
-- 
View this message in context: http://www.nabble.com/highlight-search-keywords-on-html-page-tf3240492.html#a9007823

Sent from the Solr - User mailing list archive at Nabble.com.


Re: highlight search keywords on html page
2007-02-16 13:25:57
I'm not sure I'm understanding your question ... is it how to highlight a
stored field that has HTML in it, or how to index a chunk of HTML text?

The first should be no different than highlighting any other bit of text
-- the second can be accomplished using the HTMLStripStandardTokenizerFactory
(or HTMLStripWhitespaceTokenizerFactory) in your schema.

: With Solr, I can generate a list of links containing highlighted fragments.
: After a user clicks a link, I will fetch the stored, non-indexed HTML
: from Solr and return it to the user.
: But I want the search keywords within the HTML to be highlighted, just like
: Google does.
: I'm wondering what people are using to accomplish this very common task.



-Hoss


Re: highlight search keywords on html page
2007-02-16 18:01:30

Chris Hostetter wrote:
> 
> I'm not sure I'm understanding your question ... is it how to highlight a
> stored field that has HTML in it, or how to index a chunk of HTML text?
> 
> The first should be no different than highlighting any other bit of text
> -- the second can be accomplished using the
> HTMLStripStandardTokenizerFactory (or
> HTMLStripWhitespaceTokenizerFactory) in your schema.
> 
> -Hoss
> 


It seems neither of the cases you described is what I want.
Please allow me to explain again:

I have two fields in my doc:
 <field name="html" type="string" indexed="false" stored="true" compressed="true"/>
 <field name="pageContent" type="text" indexed="true" stored="true" compressed="true"/>
 
In "html" I store the raw HTML grabbed from the internet. It's not indexed,
just stored as a string.
After removing the tags in "html", I get plain text and store it as
"pageContent". This field will be indexed and stored.
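
For readers following along, the tag-removal step described above might look
roughly like the sketch below. It is only an illustration using Python's
standard html.parser; the names TextExtractor and html_to_text are made up
here, and the thread does not say how the tags were actually removed.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(raw_html):
    """Return the visible text of raw_html, suitable for a field like pageContent."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Join with spaces so adjacent block elements do not fuse into one word,
    # then collapse the runs of whitespace left behind by removed markup.
    return " ".join(" ".join(parser.parts).split())
```

Joining with spaces is a deliberate trade-off for indexing: it keeps words from
neighbouring elements apart, at the cost of splitting a word that has inline
markup in the middle of it.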

When a user performs a search, I will return a list of links containing
highlighted fragments from "pageContent". If a link is clicked, I want
to return the associated raw HTML back to the user AND have the search
keywords in it highlighted, just like Google's cached page.
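
One way to get this effect outside of Solr is to post-process the stored HTML
in the application before returning it. The sketch below is only an
illustration of that idea, not something proposed in the thread: it walks the
page with Python's html.parser and wraps keyword matches in text nodes only,
so tag names and attribute values are left alone. The names
KeywordHighlighter and highlight_html, and the em class "hl", are invented for
the example, and whole-word, case-insensitive matching is an assumption.

```python
import html
import re
from html.parser import HTMLParser


class KeywordHighlighter(HTMLParser):
    """Re-emits an HTML document, wrapping keyword matches found in text nodes."""

    RAW = {"script", "style"}        # contents copied through untouched
    NO_MARK = {"title", "textarea"}  # contents re-escaped but never wrapped

    def __init__(self, keywords):
        super().__init__(convert_charrefs=True)
        longest_first = sorted(keywords, key=len, reverse=True)
        alternatives = "|".join(re.escape(k) for k in longest_first)
        self.pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        self.out = []
        self.raw_depth = 0
        self.no_mark_depth = 0

    def _track(self, tag, delta):
        if tag in self.RAW:
            self.raw_depth = max(0, self.raw_depth + delta)
        elif tag in self.NO_MARK:
            self.no_mark_depth = max(0, self.no_mark_depth + delta)

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text, unchanged
        self._track(tag, +1)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>, no end tag added

    def handle_endtag(self, tag):
        self.out.append("</" + tag + ">")
        self._track(tag, -1)

    def handle_comment(self, data):
        self.out.append("<!--" + data + "-->")

    def handle_decl(self, decl):
        self.out.append("<!" + decl + ">")

    def handle_pi(self, data):
        self.out.append("<?" + data + ">")

    def handle_data(self, data):
        if self.raw_depth:
            self.out.append(data)
            return
        if self.no_mark_depth:
            self.out.append(html.escape(data, quote=False))
            return
        pos = 0
        for m in self.pattern.finditer(data):
            # Text arrives unescaped, so re-escape everything written back out.
            self.out.append(html.escape(data[pos:m.start()], quote=False))
            self.out.append('<em class="hl">' + html.escape(m.group(0), quote=False) + "</em>")
            pos = m.end()
        self.out.append(html.escape(data[pos:], quote=False))


def highlight_html(raw_html, keywords):
    """Return raw_html with each keyword occurrence in visible text wrapped in <em>."""
    keywords = [k for k in keywords if k]
    if not keywords:
        return raw_html
    parser = KeywordHighlighter(keywords)
    parser.feed(raw_html)
    parser.close()
    return "".join(parser.out)
```

Note that this matches the literal query words, not the terms produced by the
Solr analyzer, so stemmed or otherwise normalized matches would be missed, and
keywords that begin or end with punctuation will not match because of the
word-boundary check.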



Re: highlight search keywords on html page
2007-02-18 19:59:50
: When a user performs a search, I will return a list of links containing
: highlighted fragments from "pageContent". If a link is clicked, I want
: to return the associated raw HTML back to the user AND have the search
: keywords in it highlighted, just like Google's cached page.

I'm not really sure that Solr can help you in this case ... it only knows
about the data you give it -- if you want it to highlight the raw HTML of
the entire page, then you're going to need to store the raw HTML of the
entire page in the index.

You can still highlight pageContent with heavy fragmentation on your main
search page where you list multiple results, and then when a user picks
one, redo the search with an fq restricting to the doc they picked and
hl.fl=rawHtml and hl.fragsize=0, so you'll get the whole field highlighted
without fragmentation.
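
As a rough illustration of the two requests described above, the sketch below
builds both query strings in Python. The parameters q, fq, hl, hl.fl and
hl.fragsize are standard Solr request parameters; the endpoint URL, the unique
key field "id", the fragment size of 100 and the function names are
assumptions made for the example. The field name rawHtml follows the wording
of this reply (the schema posted earlier in the thread calls its field
"html").

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust to wherever the Solr select handler actually lives.
SOLR_SELECT = "http://localhost:8983/solr/select"


def results_page_url(user_query):
    """Query for the result list: short highlighted fragments from pageContent."""
    params = {
        "q": user_query,
        "hl": "true",
        "hl.fl": "pageContent",
        "hl.fragsize": "100",  # assumed size for short snippets
    }
    return SOLR_SELECT + "?" + urlencode(params)


def single_doc_url(user_query, doc_id):
    """Same query, restricted to one document, with the whole field highlighted."""
    # Escape backslashes and quotes so the id can sit inside a quoted phrase.
    safe_id = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": user_query,
        "fq": 'id:"' + safe_id + '"',  # assumes the unique key field is named "id"
        "hl": "true",
        "hl.fl": "rawHtml",
        "hl.fragsize": "0",            # 0 means no fragmentation
    }
    return SOLR_SELECT + "?" + urlencode(params)
```

Keeping the user's query in q and putting the document restriction in fq means
the highlighter still sees the same query terms on the second request.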

-Hoss


Re: highlight search keywords on html page
2007-02-20 03:51:27

Chris Hostetter wrote:
> 
> I'm not really sure that Solr can help you in this case ... it only knows
> about the data you give it -- if you want it to highlight the raw HTML of
> the entire page, then you're going to need to store the raw HTML of the
> entire page in the index.
> 
> You can still highlight pageContent with heavy fragmentation on your main
> search page where you list multiple results, and then when a user picks
> one, redo the search with an fq restricting to the doc they picked and
> hl.fl=rawHtml and hl.fragsize=0, so you'll get the whole field highlighted
> without fragmentation.
> 
> -Hoss
> 
> 

Thank you very much for clearing things up for me. I had the misconception
that I could only index pure text with Solr or Lucene. I don't know where I
got this notion. But as you pointed out in your first reply, with
HTMLStripStandardTokenizerFactory I can actually index HTML with Solr.
This is a brand-new idea to me.



