Solr Newbie question: doubts about html content
My current problem is finding the best approach
to handle content that contains HTML code.
I have some docs that may or may not contain HTML tags.
For my first attempt, I defined a field "text" in my
schema.xml:
<field name="text" type="text" indexed="true" stored="true"/>
Then I sent a doc with a field like this:
<field name="texto"> <br><p> A Brasil Telecom … <br/><br/><br/></field>
But some docs that contain HTML code threw an error when I
tried to send them to Solr.
For my second attempt, I wrapped the content in CDATA:
"<![CDATA[<br><p> A Brasil Telecom … <br/><br/><br/>]]>"
With that I could send the docs to Solr, and I could
search for "<br>" and retrieve the doc.
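To illustrate the difference between these two attempts, here is a
minimal Python sketch. It assumes that Python's standard xml.etree
parser rejects the same input that Solr's XML parser does (I have not
checked this against Solr itself), and the html string is a made-up
stand-in because the real content is elided above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real content, which is elided above.
html = "<br><p> A Brasil Telecom <br/><br/><br/>"

# First attempt: raw HTML placed directly inside the field element.
raw = '<field name="text">' + html + '</field>'
# Second attempt: the same HTML wrapped in a CDATA section.
wrapped = '<field name="text"><![CDATA[' + html + ']]></field>'


def parses(xml_text):
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw_ok = parses(raw)          # the unclosed <br> and <p> break well-formedness
wrapped_ok = parses(wrapped)  # CDATA hides the markup from the XML parser
recovered = ET.fromstring(wrapped).text  # field text as the parser sees it
```

In this sketch the raw version fails to parse because <br> and <p> are
never closed, while the CDATA version parses and its text is the
original HTML string.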
But looking at the page source of the result, as you can see,
<str name="text">
&lt;br&gt;&lt;p&gt; A Brasil Telecom ...
</str>
the HTML code was "changed".
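My guess is that the "change" is XML entity escaping in the response
(&lt; for <, &gt; for >). If that assumption is right, the stored
value is still the original HTML, and any XML parser on the client side
should decode it again. A minimal Python sketch, where the response
fragment is hypothetical and only mirrors the snippet above:

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment. I am assuming the "changed" markup is
# plain XML entity escaping.
response = '<str name="text">&lt;br&gt;&lt;p&gt; A Brasil Telecom ...</str>'

element = ET.fromstring(response)
# The parser decodes the entities, so .text holds the HTML as sent.
original_html = element.text
```

Here original_html is the string that starts with "<br><p>", so under
this assumption nothing is lost; the escaping only shows up when reading
the raw page source.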
My third approach is to create two fields in my schema:
. One with the original content
. One with no HTML code, which will be indexed.
But I don't know how to preserve the HTML content in my new
field. My question is:
How can I put these docs into Solr, search them, and retrieve the
original HTML content?
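To make the third approach concrete, here is roughly what I have in
mind on the client side, as a minimal Python sketch and not a tested
Solr setup. The field names text_original and text_plain are
hypothetical, and I am assuming the first would be stored only while
the second would be indexed. The HTML is escaped instead of wrapped in
CDATA, which should be equivalent for an XML parser.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class TextOnly(HTMLParser):
    """Collects only the text between tags and drops the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    """Return the text of html without tags, whitespace normalized."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Join with spaces so words separated only by a tag do not merge.
    return " ".join(" ".join(parser.parts).split())


def build_doc(html):
    """Build an <add> message with both fields (names are hypothetical)."""
    plain = strip_html(html)
    return (
        "<add><doc>"
        '<field name="text_original">' + escape(html) + "</field>"
        '<field name="text_plain">' + escape(plain) + "</field>"
        "</doc></add>"
    )
```

For example, build_doc("<br><p> A Brasil Telecom <br/>") would carry the
escaped original in text_original and "A Brasil Telecom" in text_plain.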
Thanks for your attention.
BR,
Marcio