Thread: Computing an md5 of a text field.

user name
Portugal
2007-07-23 06:55:34
Hi,

I would like to be able to compute and store the MD5 sum for a given
text in a field (in my case, I am talking about a URL string). For
example, if I have a field called 'url' the following would happen:

'http://wiki.apache.org' -> 'cb4f7e6ca1a0c00b146894b75d9f98dc'
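For reference, the digest itself is easy to compute outside Solr. A minimal Python sketch, assuming the URL is encoded as UTF-8 before hashing (the thread does not say which encoding is used); `url_md5` is a hypothetical helper name:

```python
import hashlib


def url_md5(url: str) -> str:
    """Return the MD5 of a URL string as 32 lowercase hex characters.

    Assumption: the string is UTF-8 encoded before hashing; another
    encoding would give a different digest for non-ASCII URLs.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


print(url_md5("http://wiki.apache.org"))
```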

I've been scratching my head trying to figure out how to go about
this, but so far I can only think of one way: write a new analyzer
that computes the MD5 and emits it as a token.

Are there any other, perhaps simpler, ways (ones that don't involve
writing a whole new analyzer class)?

Thanks and Best Regards.

--Nuno

Re: Computing an md5 of a text field.
user name
2007-07-23 12:25:30
On 7/23/07, Nuno Leitao <nunoscaletrix.com> wrote:
> I would like to be able to compute and store the MD5 sum for a given
> text in a field (in my case, I am talking about a URL string). For
> example, if I have a field called 'url' the following would happen:
>
> 'http://wiki.apache.org' -> 'cb4f7e6ca1a0c00b146894b75d9f98dc'

First, what are you trying to achieve by this?  If you give people the
higher-level problem, they might be able to suggest a better way.

Since you construct the XML document to send to Solr,
simply compute the MD5 and add that also:

<field name="url">http://wiki.apache.org</field>
<field name="urlMD5">cb4f7e6ca1a0c00b146894b75d9f98dc</field>
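A minimal Python sketch of that pre-processing step, assuming the client builds Solr's XML `<add>` message itself and the schema defines the `url` and `urlMD5` fields from the example; `build_add_doc` is a hypothetical helper, and posting the message to Solr is not shown:

```python
import hashlib
from xml.sax.saxutils import escape


def build_add_doc(url: str) -> str:
    """Build a Solr <add> XML message carrying a URL and its MD5.

    Assumption: the schema has fields named "url" and "urlMD5";
    adjust the names to your own schema.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    # escape() replaces &, < and > so the URL is safe as element text.
    # The hex digest contains only [0-9a-f] and needs no escaping.
    return (
        "<add><doc>"
        f'<field name="url">{escape(url)}</field>'
        f'<field name="urlMD5">{digest}</field>'
        "</doc></add>"
    )


print(build_add_doc("http://example.com/search?q=solr&start=10"))
```

The URL still has to be XML-escaped either way; the digest is simply an extra field alongside it.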

Or did you want to store the MD5 instead of the URL?  Did you want it
searchable somehow?

-Yonik

Re: Computing an md5 of a text field.
user name
Portugal
2007-07-23 19:22:26
Thanks Yonik,

Basically, I am indexing a number of items where the unique ID is a
URL. Because URLs can contain characters that are not valid in XML
as-is (such as '&'), and I will be doing some XSLT postprocessing, I
was thinking that a good way to solve the problem would be to store
these unique IDs as MD5s instead.

I think I found another alternative; it follows the pre-processing
avenue you suggested.

Best Regards.

--Nuno

On 23 Jul 2007, at 18:25, Yonik Seeley wrote:

> On 7/23/07, Nuno Leitao <nunoscaletrix.com> wrote:
>> I would like to be able to compute and store the MD5 sum for a given
>> text in a field (in my case, I am talking about a URL string). For
>> example, if I have a field called 'url' the following would happen:
>>
>> 'http://wiki.apache.org' -> 'cb4f7e6ca1a0c00b146894b75d9f98dc'
>
> First, what are you trying to achieve by this?  If you give people the
> higher-level problem, they might be able to suggest a better way.
>
> Since you construct the XML document to send to Solr,
> simply compute the MD5 and add that also:
>
> <field name="url">http://wiki.apache.org</field>
> <field name="urlMD5">cb4f7e6ca1a0c00b146894b75d9f98dc</field>
>
> Or did you want to store the MD5 instead of the URL?  Did you want it
> searchable somehow?
>
> -Yonik


RE: Computing an md5 of a text field.
user name
2007-07-23 19:39:01
XML escaping is probably the best approach. Either surround the whole
thing with "<![CDATA[" and "]]>", or use one of the many libraries out
there that will escape the string for you.
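Both options can be sketched with Python's standard library: `xml.sax.saxutils.escape` is one such escaping routine, while `as_cdata` is a hypothetical helper written for illustration. One caveat with CDATA: a section ends at the first "]]>", so text containing that sequence has to be split across two sections.

```python
from xml.sax.saxutils import escape


def as_cdata(text: str) -> str:
    """Wrap text in a CDATA section (hypothetical helper).

    A CDATA section ends at the first "]]>", so any "]]>" inside the
    text is split across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


url = "http://example.com/?a=1&b=<2>"
# Option 1: entity-escape &, < and > (the library approach).
print(f'<field name="url">{escape(url)}</field>')
# Option 2: wrap the raw text in a CDATA section.
print(f'<field name="url">{as_cdata(url)}</field>')
```

Either way the parser hands XSLT the original URL text, so no hashing is needed just to get past the XML layer.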

While MD5 is designed to be a cryptographically secure one-way
function, it is NOT a one-to-one (injective) function: it maps input
of any length to a fixed 128-bit digest, so some distinct inputs
necessarily share a digest. You could theoretically have two distinct
URLs that have the same MD5.
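To make the collision point concrete, here is a minimal Python sketch, assuming documents are checked client-side before indexing. `register` and the in-memory dict are hypothetical, for illustration only. Keeping the URL itself as the unique key avoids the issue entirely; if the digest must serve as the ID, a check like this at least detects collisions among the URLs seen so far.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def register(ids: dict, url: str) -> str:
    """Record url under its MD5 and return the digest (hypothetical).

    Raises ValueError if a *different* URL already has the same digest,
    i.e. a collision that would silently merge two documents if the
    digest alone were used as the unique key.
    """
    digest = md5_hex(url)
    existing = ids.setdefault(digest, url)
    if existing != url:
        raise ValueError(f"MD5 collision: {existing!r} vs {url!r}")
    return digest


# Inputs of any length map to a fixed 128-bit (32 hex character) digest,
# so by the pigeonhole principle collisions must exist.
for text in ("", "a", "http://wiki.apache.org" * 100):
    print(len(md5_hex(text)))  # prints 32 each time
```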

> -----Original Message-----
> From: Nuno Leitao [mailto:nunoscaletrix.com] 
> Sent: Monday, July 23, 2007 5:22 PM
> To: solr-userlucene.apache.org
> Subject: Re: Computing an md5 of a text field.
> 
> Thanks Yonik,
> 
> Basically, I am indexing a number of items where the unique ID is a
> URL. Because URLs can contain characters that are not valid in XML
> as-is (such as '&'), and I will be doing some XSLT postprocessing, I
> was thinking that a good way to solve the problem would be to store
> these unique IDs as MD5s instead.
> 
> I think I found another alternative; it follows the pre-processing
> avenue you suggested.
> 
> Best Regards.
> 
> --Nuno
> 
> On 23 Jul 2007, at 18:25, Yonik Seeley wrote:
> 
> > On 7/23/07, Nuno Leitao <nunoscaletrix.com> wrote:
> >> I would like to be able to compute and store the MD5 sum for a
> >> given text in a field (in my case, I am talking about a URL
> >> string). For example, if I have a field called 'url' the
> >> following would happen:
> >>
> >> 'http://wiki.apache.org' -> 'cb4f7e6ca1a0c00b146894b75d9f98dc'
> >
> > First, what are you trying to achieve by this?  If you give people
> > the higher-level problem, they might be able to suggest a better way.
> >
> > Since you construct the XML document to send to Solr, simply compute
> > the MD5 and add that also:
> >
> > <field name="url">http://wiki.apache.org</field>
> > <field name="urlMD5">cb4f7e6ca1a0c00b146894b75d9f98dc</field>
> >
> > Or did you want to store the MD5 instead of the URL?  Did you want
> > it searchable somehow?
> >
> > -Yonik
> 
