I've noticed that the snippets returned in Nutch's search results seem to have the highlight formatting added to them first, and are then escaped into XML strings.
How would I go about changing the process so that the content is escaped, then the formatting is added, then the whole snippet is escaped?
The reason I want this is so that I can return valid XML with the formatting as XML entities, but with the actual snippet text escaped.
Example of how Nutch does it:
original text:
"red fox & lazy dog"
formatting applied:
"red <span class="highlight">fox</span> & lazy dog"
escaped:
"red &lt;span class="highlight"&gt;fox&lt;/span&gt; &amp; lazy dog"
Example of what I'm after:
original text:
"red fox & lazy dog"
escaped text:
"red fox &amp; lazy dog"
formatting applied:
"red <span class="highlight">fox</span> &amp; lazy dog"
escaped:
"red &lt;span class="highlight"&gt;fox&lt;/span&gt; &amp;amp; lazy dog"
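To make the ordering I'm after concrete, here is a minimal sketch in plain Java. It does not use Nutch's summarizer classes at all: the class SnippetEscaper and the methods escapeXml, highlightEscaped and buildSnippet are hypothetical names made up for illustration. It assumes a single, case-sensitive search term, and it only escapes &, < and > (quotes are left alone, as in the example above).

```java
// Hypothetical helper, not part of Nutch: shows escape -> format -> escape.
final class SnippetEscaper {
    private static final String OPEN = "<span class=\"highlight\">";
    private static final String CLOSE = "</span>";

    private SnippetEscaper() {}

    /** Escapes only &, < and >, which is enough for XML element content. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Steps 1 and 2: escape the raw text and wrap each occurrence of the
     * term in a highlight span. The term is located in the raw text, so a
     * term such as "amp" can never match inside an "&amp;" entity.
     */
    static String highlightEscaped(String raw, String term) {
        if (term.isEmpty()) {
            return escapeXml(raw);
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        int hit;
        while ((hit = raw.indexOf(term, pos)) >= 0) {
            out.append(escapeXml(raw.substring(pos, hit)));
            out.append(OPEN).append(escapeXml(term)).append(CLOSE);
            pos = hit + term.length();
        }
        out.append(escapeXml(raw.substring(pos)));
        return out.toString();
    }

    /** Step 3: escape the formatted snippet so it can sit inside an XML string. */
    static String buildSnippet(String raw, String term) {
        return escapeXml(highlightEscaped(raw, term));
    }

    public static void main(String[] args) {
        String raw = "red fox & lazy dog";
        System.out.println(highlightEscaped(raw, "fox"));
        System.out.println(buildSnippet(raw, "fox"));
    }
}
```

Running main on the example text prints the "formatting applied" string followed by the final "escaped" string from the example above. Once the consumer unescapes the XML string, it gets back markup in which the span is real formatting and the ampersand is still safely escaped.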