List Info

Thread: "ocaml_beginners"::[] string compare




"ocaml_beginners"::[] string compare
user name
2006-08-18 09:36:28

Zsolt Minier wrote:
> 
> --- Peter Halacsy <peterhalacsy.com> wrote:
> 
> > Hi guys,
> >
> > I've implemented a word frequency counter program
> > that uses a hash
> > table. This need a lot of string comparison (as
many
> > token are to be
> > counted). After profiling it turned out that the
> > string comparision
> > eats the CPU time.
> >
> 
> I believe you need a dictionary algorithm.
> 
> h
ttp://en.wikipedia.org/wiki/Aho-Corasick_algorithm
> 
> Bye,
> Zsolt
> 
> Minier Zsolt
> http://cs.ubbcluj.ro/~mi
nier - my h0m3p4g3

Another, less (programmer-time-)costly solution is to
precompute hashes, and use comparison of precomputed hashes
as a first filter:

type hashed_string = { hash:int; s:string }

let hashed_string s = { hash = Hashtbl.hash s; s = s }

let compare_hashed_string s t = 
	let c0 = compare s.hash t.hash in
	if c0 <> 0 then c0
	else compare s.s t.s

Frédéric



Archives up to August 22, 2005 are also downloadable at http://www.connettivo.net/cntprojects/ocaml_beginners/
The archives of the very official ocaml list (the seniors'
one) can be found at http://caml.inria.fr
Attachments are banned and you're asked to be polite, avoid
flames etc. 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http:/
/groups.yahoo.com/group/ocaml_beginners/

<*> To unsubscribe from this group, send an email to:
    ocaml_beginners-unsubscribe@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.c
om/info/terms/
 



"ocaml_beginners"::[] string compare
user name
2006-08-18 19:22:00
On Aug 18, 2006, at 11:36 AM, Frederic van der Plancke
wrote:

> Another, less (programmer-time-)costly solution is to
precompute  
> hashes, and use comparison of precomputed hashes as a
first filter:
>
> type hashed_string = { hash:int; s:string }
>
> let hashed_string s = { hash = Hashtbl.hash s; s = s }
>
> let compare_hashed_string s t =
> 	let c0 = compare s.hash t.hash in
> 	if c0 <> 0 then c0
> 	else compare s.s t.s
>
> Frédéric

Hello,

this doesn't help a lot for a good hashtable. Why? Because
if the  
bucket lists are short, the most comparasions are to compare
equal  
strings, because first we compute the hash value of the
string we are  
updating and then we look for a candidate position in a long
array.

with your hashed_string I measure:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
time   seconds   seconds    calls   s/call   s/call  name
31.20    179.63   179.63 76767221     0.00     0.00   
camlHashlex__update_201
13.65    258.25    78.62  9774334     0.00     0.00 
caml_fl_allocate
10.50    318.74    60.48 75915982     0.00     0.00   
camlHashlexhas__compare_hashed_string_155


the old code uses compare

   %   cumulative   self              self     total
time   seconds   seconds    calls   s/call   s/call  name
30.60    169.38   169.38                              
camlHashlex__update_204
12.77    240.05    70.67  7665339     0.00     0.00 
caml_fl_allocate
11.02    301.04    60.99                              
camlHashlex__strcmp2_140

as you can see the running time is worse with hashed
strings. The  
plain comparasion and compare_hashed_string uses the same
amount of  
CPU time.

peter




[Non-text portions of this message have been removed]



Archives up to August 22, 2005 are also downloadable at http://www.connettivo.net/cntprojects/ocaml_beginners/
The archives of the very official ocaml list (the seniors'
one) can be found at http://caml.inria.fr
Attachments are banned and you're asked to be polite, avoid
flames etc. 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http:/
/groups.yahoo.com/group/ocaml_beginners/

<*> To unsubscribe from this group, send an email to:
    ocaml_beginners-unsubscribe@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.c
om/info/terms/
 


[1-2]

about | contact  Other archives ( Real Estate discussion Medical topics )