
Thread: Snippet contents.




user name
2007-08-10 02:25:22
I've noticed that the snippets returned by Nutch's search seem to have
the formatting added to them first, and are then escaped into XML strings.
How would I go about changing the process so that the content is
escaped, then the formatting is added, then the snippet is escaped?

The reason I want this is so that I can return valid XML with the
formatting as XML entities, but with the actual snippet text escaped.

Example of how Nutch does it:
original text:
"red fox & lazy dog"
formatting applied:
"red <span class="highlight">fox</span> & lazy dog"
escaped:
"red &lt;span class="highlight"&gt;fox&lt;/span&gt; &amp; lazy dog"

Example of what I'm after:
original text:
"red fox & lazy dog"
escaped text:
"red fox &amp; lazy dog"
formatting applied:
"red <span class="highlight">fox</span> &amp; lazy dog"
escaped:
"red &lt;span class="highlight"&gt;fox&lt;/span&gt; &amp;amp; lazy dog"
