List Info

Thread: "ocaml_beginners"::[] About Hashtbl




"ocaml_beginners"::[] About Hashtbl
country flaguser name
France
2007-04-05 05:38:23

Hi !

I've 2 questions about this module :

* I have no idea about any possible use of Hashtbl.fold...

* Iterating through the table by key ?

Hashtabl.iter doc for "Hashtbl.iter f tbl" says "...if the table contains several bindings for the same key, they are passed to f in reverse order of introduction...";
OK, for a given key, iter exhibits the bindings in this order. But the output of different keys are mixed.

In fact the issue I encounter is :
I'm only interested in list of rare multiple data that share the same hash key. ( Numerous lists of keys only once bound do not care. )

Please how to get the list of these keys that share multiple bindings, in order then to get these data with Hashtbl.find_all ?

Maybe must I keep track of the fact some keys are multiple, (checking this with Hashtabl.find) in parallel with hash-table filling ?

Or is there any straightforward way to perform this ?

I trust in your clever advices.

Fabrice

__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] About Hashtbl
country flaguser name
United Kingdom
2007-04-05 10:58:31

On Thu, Apr 05, 2007 at 12:38:23PM +0200, Fabrice Marchant wrote:
> Hi !
>
> I've 2 questions about this module :
>
> * I have no idea about any possible use of Hashtbl.fold...

Hashtbl itself is a little bit strange because of the single key /
multiple value thing. It was originally designed to store the symbol
tables inside the OCaml compiler (or so legend goes anyway ...), and
in that context this is useful. In other contexts it's not very
useful.

A common use of Hashtbl is to limit it so that you can only add a
single value for each key. You might also consider using Map instead,
which has its own set of problems.

Hashtbl.fold is used to iterate over all (key, value) pairs in the
hash table. It's very useful.

> Maybe must I keep track of the fact some keys are multiple,
> (checking this with Hashtabl.find) in parallel with hash-table filling

Basically yes.

Have a look at the attached implementation of a Counter, using a hash
table in this mode. In particular at the implementation of the
function 'get_ref'.

As a bonus this implementation also uses Hashtbl.fold.

Rich.

----------------------------------------------------------
(** Basic counting module.
* $Id: counter.mli,v 1.4 2006/09/05 15:37:59 rich Exp $
*)

type 'a t
(** Count items of type ['a]. *)

val create : unit -> 'a t
(** Create a new counter. *)

val incr : 'a t -> 'a -> unit
(** [incr counter thing] adds one to the count of [thing]s in [counter]. *)

val decr : 'a t -> 'a -> unit
(** [decr counter thing] subtracts one to the count of [thing]s in [counter]. *)

val add : 'a t -> 'a -> int -> unit
(** [add counter thing n] adds [n] to the count of [thing]s in [counter]. *)

val sub : 'a t -> 'a -> int -> unit
(** [sub counter thing n] subtracts [n] to the count of [thing]s in [counter]. *)

val set : 'a t -> 'a -> int -> unit
(** [set counter thing n] sets the count of [thing]s to [n]. *)

val get : 'a t -> 'a -> int
(** [get counter thing] returns the count of [thing]s. (Returns 0 for
* [thing]s which have not been added.
*)

val incr_get : 'a t -> 'a -> int
(** Faster form of {!Counter.incr} followed by {!Counter.get}. *)

val zero : 'a t -> 'a -> unit
(** [zero counter thing] sets the count of [thing]s to 0.
* See also {!Counter.clear}.
*)

val read : 'a t -> (int * 'a) list
(** [read counter] reads the frequency of each thing. They are sorted
* with the thing appearing most frequently first. Only things occurring
* non-zero times are returned.
*)

val length : 'a t -> int
(** Return the number of distinct things. See also {!Counter.total} *)

val total : 'a t -> int
(** Return the number of things counted (the total number of counts).
* See also {!Counter.length}
*)

val clear : 'a t -> unit
(** [clear counter] zeroes all counts. *)
----------------------------------------------------------
(* Basic counting module.
* $Id: counter.ml,v 1.4 2006/09/05 15:37:59 rich Exp $
*)

type 'a t = ('a, int ref) Hashtbl.t

let create () =
Hashtbl.create 13

let get_ref counter thing =
try
Hashtbl.find counter thing
with
Not_found ->
let r = ref 0 in
Hashtbl.add counter thing r;
r

let incr counter thing =
let r = get_ref counter thing in
incr r

let decr counter thing =
let r = get_ref counter thing in
decr r

let add counter thing n =
let r = get_ref counter thing in
r := !r + n

let sub counter thing n =
let r = get_ref counter thing in
r := !r - n

let set counter thing n =
let r = get_ref counter thing in
r := n

(* Don't use get_ref, to avoid unnecessarily creating 'ref 0's. *)
let get counter thing =
try
!(Hashtbl.find counter thing)
with
Not_found -> 0

(* This is a common pair of operations, worth optimising. *)
let incr_get counter thing =
let r = get_ref counter thing in
Pervasives.incr r;
!r

let zero = Hashtbl.remove

let read counter =
let counts =
Hashtbl.fold (
fun thing r xs ->
let r = !r in
if r <> 0 then (r, thing) :: xs
else xs
) counter [] in
List.sort (fun (a, _) (b, _) -> compare (b : int) (a : int)) counts

let length = Hashtbl.length

let total counter =
let total = ref 0 in
Hashtbl.iter (fun _ r -> total := !total + !r) counter;
!total

let clear = Hashtbl.clear
----------------------------------------------------------

--
Richard Jones
Red Hat

__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] About Hashtbl
country flaguser name
France
2007-04-06 17:40:13

Hi Rich !

Thank you for your precious answer.

> Hashtbl.fold is used to iterate over all (key, value) pairs in the
> hash table. It's very useful.
Ah, this solves my two questions !
It is useful to build a list from the table. I will hack your Counter.read and keep only multiple values before the sort of the list.

> You might also consider using Map instead,...
I've begun to experiment with it.

&gt; ...which has its own set of problems.
Please, what are they ?

Best regards.

Fabrice
--------
Never put off until tomorrow that which can be done the day after tomorrow.
( Who said that ?)

__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] About Hashtbl
country flaguser name
United Kingdom
2007-04-06 20:47:54

On Friday 06 April 2007 23:40, Fabrice Marchant wrote:
&gt; > You might also consider using Map instead,...
> > ...which has its own set of problems.
>
&gt; Please, what are they ?

Hash tables are faster and smaller than maps in OCaml. Hash tables provide a
default structural hash function which, when it works (when structural
equality is semantic equality) makes hash tables easier to use than maps.

Maps have better worse-case performance. Maps are threadsafe. Maps are kept
sorted.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] About Hashtbl
country flaguser name
United Kingdom
2007-04-07 03:41:14

On Sat, Apr 07, 2007 at 12:40:13AM +0200, Fabrice Marchant wrote:
&gt; > You might also consider using Map instead,...
> I've begun to experiment with it.
>
> > ...which has its own set of problems.
> Please, what are they ?

In addition to Jon's answer: Map requires use of the functor syntax.
This can be quite obscure -- I know that I usually have to look up the
syntax when I use it.

In more general terms, Map is persistent. This Wikipedia page has
some diagrams I copied from the Okasaki book which explains
persistence:

http://en.wikipedia.org/wiki/Purely_functional

Rich.

--
Richard Jones
Red Hat

__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] About Hashtbl
country flaguser name
United States
2007-04-07 15:18:46

On Sat, 7 Apr 2007, Richard Jones wrote:

> On Sat, Apr 07, 2007 at 12:40:13AM +0200, Fabrice Marchant wrote:
&gt;>> You might also consider using Map instead,...
>>; I've begun to experiment with it.
>>
>>> ...which has its own set of problems.
>> Please, what are they ?
>
> In addition to Jon's answer: Map requires use of the functor syntax.
> This can be quite obscure -- I know that I usually have to look up the
> syntax when I use it.

Also, since Map requires the type of keys to be given at module
construction time, it's not possible to do certain things with Map like
write a function that is polymorphic across all Maps.

For instance, we can write a function to return the keys from an IntMap:

module IntMap = Map.Make(struct
type t = int
let compare = compare
end)

let keys m =
List.rev (IntMap.fold (fun k v a -> k :: a) m [])

But we can't use this function with a StringMap or any other kind of Map
since there's no way to abstract over the module itself.

The PMap module in Extlib implements a persistent Map without using
functors, so writing functions like "keys" is again possible.

The only downside to PMap is that you can't safely compare two PMaps like
you can with Map.compare, since you can't guarantee that two PMaps have
equivalent comparison functions. Though from my experience, the need to do
this is rare.

Back to the original question: Most of the time, if you just use
Hashtbl.replace and pretend like Hashtbl.add does not exist, you'll get
the behavior you want. I like Map because it's purely functional, but for
most tasks Hashtbl is faster and easier to use (ignoring the
multiple-value thing).

Dave

__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] About Hashtbl
country flaguser name
United Kingdom
2007-04-07 17:03:26

On Saturday 07 April 2007 21:18, Dave Benjamin wrote:
&gt; let keys m =
> List.rev (IntMap.fold (fun k v a -> k :: a) m [])
>
> But we can't use this function with a StringMap or any other kind of Map
> since there's no way to abstract over the module itself.

On the contrary, there are many ways to abstract that function. The simplest
is to factor out the fold that is used:

let keys fold m =
List.rev (IntMap.fold (fun k _ t -> k :: t) m [])

An alternative is to put keys in a functor that accepts a module implementing
fold as an argument.

F# doesn't have functors and implements Sets and Maps the way you suggest. I
find it easier to use than OCaml in this context but it has some other
important differences:

1. Thanks to .NET being OO, structural comparison defers to customized
comparison functions, e.g. comparing two Maps with = works.

2. Because .NET is OO, it can be significantly slower than OCaml.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] About Hashtbl
country flaguser name
France
2007-04-07 17:29:11

Thanks Jon for the explanations about differences between Map and Hashtbl.
I wonder how it is possible to discover these properties because I do not saw such a comparison in the doc.
Maybe we can learn this in studying the sources of these modules ?

Thanks Rich for your Wikipedia page.
I discover there is about the same things in the Book of Philippe Narbel "Programmation fonctionnelle générique et objet";
http://www.vuibert.com/livre12724.html
(and Okasaki is in its Bibliography).
Chapter 6.4. "L'effet de persistance" Persistance benefits, and drawbacks too, are studied.

Dave, thank you for your explanations. I'm glad to discover this alternative with PMap !
And it make me remember about Pomap from Markus Mottl ( who has read Okasaki too ) : I must consider this 4th lib.
http://www.ocaml.info/home/ocaml_sources.html

Best regards.

Fabrice.

__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] About Hashtbl
country flaguser name
United Kingdom
2007-04-07 18:37:10

On Saturday 07 April 2007 23:29, Fabrice Marchant wrote:
&gt; Thanks Jon for the explanations about differences between Map and
> Hashtbl. I wonder how it is possible to discover these properties because I
> do not saw such a comparison in the doc. Maybe we can learn this in
> studying the sources of these modules ?

Read my book.

While you're there, please add it to the list of books on Wikipedia's OCaml
page!

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

__._,_.___
.

__,_._,___
[1-9]

about | contact  Other archives ( Real Estate discussion Medical topics )