Thread: Solr Newbie question: doubts about how to handle html content

user name
2006-10-05 11:17:59

My current problem is finding the best approach to handling content that contains HTML code.

I have some documents that may or may not contain HTML tags.

In my first attempt, I defined a field "text" in my schema.xml:

  <field name="text" type="text" indexed="true" stored="true"/>

and sent documents with the raw HTML inside that field:

  <field name="text"><br><p>   A Brasil Telecom …
  <br/><br/><br/></field>


But some documents containing HTML code threw an error when I tried to send them to Solr.
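
The error is most likely an XML well-formedness problem: `<br>` and `<p>` are opened but never closed, so the `<add>` message cannot be parsed. The minimal sketch below reproduces this with Python's standard `xml.etree.ElementTree` rather than with Solr itself, and shows that a CDATA section or plain entity escaping both carry the same string through the parser. The shortened sample content and the `<add><doc>` wrapper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

html = "<br><p>   A Brasil Telecom"

# 1. Raw HTML pasted into the field: <br> and <p> are never closed,
#    so the message is not well-formed XML and the parser rejects it.
raw_msg = '<add><doc><field name="text">' + html + '</field></doc></add>'
try:
    ET.fromstring(raw_msg)
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# 2. The same content wrapped in a CDATA section parses fine.
cdata_msg = ('<add><doc><field name="text"><![CDATA[' + html +
             ']]></field></doc></add>')
cdata_value = ET.fromstring(cdata_msg).find("doc/field").text

# 3. Escaping <, > and & as entities also works and yields the same value.
escaped_msg = ('<add><doc><field name="text">' + escape(html) +
               '</field></doc></add>')
escaped_value = ET.fromstring(escaped_msg).find("doc/field").text

print(raw_ok)                 # False
print(cdata_value == html)    # True
print(escaped_value == html)  # True
```

In other words, CDATA and entity escaping are two spellings of the same field value; neither alters the HTML that the receiving parser sees.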



In my second attempt, I wrapped the content in a CDATA section, "<![CDATA[<br><p>   A Brasil Telecom … <br/><br/><br/>]]>", and I was able to send the documents to Solr. I could also search for "<br>" and retrieve the document.



But when I look at the source of the results page, the HTML code has been "changed":

  <str name="text">
  &lt;br&gt;&lt;p&gt;  A Brasil Telecom ...
  </str>
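
The `&lt;br&gt;` in the page source is most likely just the XML serialization of the stored string rather than a change to its value: once the response is read with an XML parser, the entities decode back to `<br><p>`. A sketch with Python's `xml.etree.ElementTree`, assuming a simplified response body (only the `<str name="text">` element mirrors what was observed; the surrounding elements are an assumption):

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical response body; only the <str name="text">
# element mirrors what was seen in the page source.
response_xml = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="text">&lt;br&gt;&lt;p&gt;  A Brasil Telecom</str>
    </doc>
  </result>
</response>"""

root = ET.fromstring(response_xml)
# Locate the stored "text" field; the parser decodes &lt; and &gt;.
value = root.find(".//str[@name='text']").text
print(value)  # <br><p>  A Brasil Telecom
```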





My third approach is to create two fields in my schema:

  - one with the original content;
  - one with the HTML code stripped out, which will be indexed.
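
One way this third approach might look, sketched in Python under several assumptions: the field names `text_html` (which would be declared in schema.xml with indexed="false" stored="true") and `text` (indexed) are hypothetical, and the tag stripping is done client-side with the standard library's `html.parser` before the document is sent. Searches would then target `text`, while the stored `text_html` value is simply returned with the results.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the text nodes of an HTML fragment, separated by spaces,
    with runs of whitespace collapsed to a single space."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def build_add_message(html):
    """Build a Solr XML <add> message with two hypothetical fields:
    'text_html' keeps the original HTML for display, and 'text' holds
    the tag-free version that is meant to be indexed and searched."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("text_html", html), ("text", strip_html(html))):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes <, > and & on output
    return ET.tostring(add, encoding="unicode")


html = "<br><p>   A Brasil Telecom <br/><br/><br/>"
print(strip_html(html))  # A Brasil Telecom
print(build_add_message(html))
```

Because ElementTree escapes the markup itself, no CDATA section is needed here, and parsing the response XML yields the original HTML for `text_html`. Joining text nodes with spaces suits block tags like `<br>` and `<p>`, but would split a word broken by an inline tag such as `<b>`.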



But I don't know how to preserve the HTML content in my new field. My question is:

How can I put these documents into Solr, search them, and retrieve the original HTML content?



Thanks for your attention.



BR,



Marcio