XML escaping is probably the best approach. Either surround the
whole thing with "<![CDATA[" and "]]>", or use one of the many
libraries out there that will escape the string for you.
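
As a rough sketch of the library route, assuming Python and its
standard xml.sax.saxutils module (my choice for illustration, nothing
in this thread says what language the indexing client is written in),
escaping a URL before it goes into a Solr <field> element could look
something like this. The helper name url_field and the example URL are
made up.

```python
from xml.sax.saxutils import escape


def url_field(name, url):
    """Build a <field> element with the URL escaped for XML text content.

    Hypothetical helper: 'name' is assumed to be a fixed, trusted field
    name, so only the URL itself is escaped.
    """
    # escape() replaces &, < and > with their entity references.
    return '<field name="%s">%s</field>' % (name, escape(url))


# Made-up URL containing characters that are special in XML.
print(url_field("url", "http://example.com/search?q=a&b=<c>"))
```

Note that escaping only covers the markup characters. Characters that
are not legal in XML 1.0 at all (most control characters) would still
have to be stripped or percent-encoded before the document is built.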
While MD5 was designed to be a cryptographically secure one-way
function, it is NOT guaranteed to be a one-to-one (injective)
function. The digest is only 128 bits long, so you could
theoretically have two distinct URLs that have the same MD5.
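
If you do end up keying on the MD5 anyway, here is a minimal sketch of
computing it on the client side, again assuming Python and its standard
hashlib module. The helper name url_md5 and the UTF-8 encoding are my
assumptions, not anything from the original question.

```python
import hashlib


def url_md5(url):
    """Return the MD5 digest of a URL as a 32-character hex string.

    Hypothetical helper: the URL is encoded as UTF-8 before hashing,
    which is an assumption about how the client represents its strings.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


digest = url_md5("http://wiki.apache.org")
# The digest is always 128 bits (32 hex characters), however long the URL.
print(len(digest), digest)
```

Storing the original URL in a second field next to the digest, as Yonik
suggests below, at least lets you notice a collision if one ever turns
up.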
> -----Original Message-----
> From: Nuno Leitao [mailto:nuno scaletrix.com]
> Sent: Monday, July 23, 2007 5:22 PM
> To: solr-user lucene.apache.org
> Subject: Re: Computing an md5 of a text field.
>
> Thanks Yonik,
>
> Basically, I am indexing a number of items where the unique
> ID is a URL. Because URL's can contain invalid XML
> characters, and I will be doing some XSLT postprocessing, I
> was thinking that a good way to solve the problem would be to
> store these unique ID's as md5's instead.
>
> I think I found another alternative - it follows the
> pre-processing avenue you suggested.
>
> Best Regards.
>
> --Nuno
>
> On 23 Jul 2007, at 18:25, Yonik Seeley wrote:
>
> > On 7/23/07, Nuno Leitao <nuno scaletrix.com> wrote:
> >> I would like to be able to compute and store the MD5 sum for a given
> >> text in a field (in my case, I am talking about a URL string). For
> >> example, if I have a field called 'url' the following would happen:
> >>
> >> 'http://wiki.apache.org' -> 'cb4f7e6ca1a0c00b146894b75d9f98dc'
> >
> > First, what are you trying to achieve by this? If you give people the
> > higher level problem, they might be able to suggest a better way.
> >
> > Since you construct the XML document to send to Solr, simply compute
> > the MD5 and add that also:
> >
> > <field name="url">http://wiki.apache.org</field>
> > <field name="urlMD5">cb4f7e6ca1a0c00b146894b75d9f98dc</field>
> >
> > Or did you want to store the MD5 instead of the URL? Did you want it
> > searchable somehow?
> >
> > -Yonik
>