List Info

Thread: "ocaml_beginners"::[] A question




"ocaml_beginners"::[] A question
country flaguser name
United States
2007-02-21 16:01:35

Hello,
would you help me with a problem I want to solve,
This problem is about how to program an anagram using ocaml ..
?

__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] A question
country flaguser name
United States
2007-02-21 16:07:38

carla.gonzalo wrote:
> would you help me with a problem I want to solve,
> This problem is about how to program an anagram using ocaml ..

Maybe you should give it a shot, then if you have trouble, post your
code/algorithm here for suggestions.

__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] A question
country flaguser name
United Kingdom
2007-02-21 16:25:02

On Wednesday 21 February 2007 22:01, carla.gonzalo wrote:
> Hello,
> would you help me with a problem I want to solve,
> This problem is about how to program an anagram using ocaml ..
> ?

Off the top of my head:

Take a dictionary of words:

val dictionary : string list

Make a hash table that maps the sorted chars in the word onto the word itself:

let map = Hashtbl.create (List.length dictionary)

Convert a string into an array of characters:

let array_of_string string =
Array.init (String.length string) (String.get string)

Sort the characters in a word:

let sort word = Array.sort compare (array_of_string word)

Build the hash table:

let () = List.iter (fun word -> Hashtbl.add map (sort word) word) dictionary

Find anagrams by searching the hash table for words containing the same
letters:

let anagrams_of_word word = Hashtbl.find_all map (sort word)

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

__._,_.___
.

__,_._,___
RE: "ocaml_beginners"::[] A question
country flaguser name
United States
2007-02-22 13:29:43

Array.sort is in-place, returning unit.

# let sort word = let a = array_of_string word in Array.sort compare a; a;;
val sort : string -> char array = <fun>;
# sort "asdf";;
- : char array = [|'a'; 'd'; 'f'; 's'|]

Now if you wanted a scrabble-cheat type program (all "partial-anagrams" of
given letters), you might use a different approach. I ended up scanning the
entire word list and counting occurrences of each character, returning only
words with the proper counts based on optional (lower-case) and required
(upper-case) letters and blanks.

(Ocaml programmers might do well to ignore the rest of this message, about
my "anagram.c" program.) My algorithm is probably overly-complex or poorly
coded and a case of premature optimization. Unfortunately, it's also a
linear search. If only I could form a tree of words containing each letter,
then linked to words containing any two letters...... It would grow huge,
I'm afraid. There are too many combinations of letters. Building the tree
would take immense resources, perhaps more time and RAM than in the
world...looks to me like only a linear search could work.

Anyway, my algorithm;

Searching for "caddy" finds words with any of C or A or D or Y in them... I
initialize my optional (lowercase) letter-counts array with
[|1;0;2;1;0;0;0;...;1;0|], required (uppercase)-letter counts with all 0s,
and blank count set to 0, then decrement my optional letter-counts as I
iterate over the letters in the dictionary word, or decrement the blank
count if I hit a letter in the dictionary word which is not found, and
consider a word to be matching if each letter is found and all required
letters are found, as can be seen in the last few lines of the function
below.

It was a bit inefficient in that I reinitialize the counts for the search
word in each call to this function, so I did a non-threadsafe caching of the
last word searched, since my calling function in the program passed the same
word repeatedly.

[ Ick yuck gag .. I know, horrible C code below... ]

But surely an Ocaml program for this could be written much more succinctly.
It should look more like a simple grep, with a sort at the finish. Maybe
I'll take it on as an exercise someday...

(Oh, I know, *no one* uses the "register" keyword anymore; the compiler is
probably smarter than me about optimization...)

This algorithm takes about 30ms to scan 170,000 words (Athlon 1.6ghz)

/* anagram.c
* Copyright (C) 2004 by aaron w west
* tallpeak%40hotmail.com">tallpeakhotmail.com / awilliamwest%40yahoo.com">awilliamwestyahoo.com
* 11/27/2004 to 11/30
...
int matchword(unsigned char *dictword, unsigned char *search, unsigned char
*pattern, int flags, int length)
{
int blanks = 0;
register unsigned c;
unsigned char *s;
register unsigned char *d;
static unsigned char prevsearch[257] = "&quot;;
static int capsfound = 0;
static int initblanks = 0;
static unsigned int lowercounts[26]; // lowercase letter counts in
search
static unsigned int uppercounts[26];
unsigned int searchletters[26];
unsigned int capletters[26];
if (countalpha((char*)dictword) != length)
{
return 0;
}
if (pattern[0])
{
s = (unsigned char *)strstr((char*)dictword, (char*)pattern);
if (!s || (flags & BEGIN_WORD_FLAG) && s != dictword
|| (flags & END_WORD_FLAG)
&& s != dictword + strlen((char*)dictword) -
strlen((char*)pattern))
return 0;
}
if (strcmp((char*)search, (char*)prevsearch) != 0)
{
memset(lowercounts, 0, sizeof(lowercounts));
memset(uppercounts, 0, sizeof(uppercounts));
initblanks = 0;
capsfound = 0;
strcpy((char*)prevsearch, (char*)search);
/* first count blanks, # or ?, and each letter */
for (s = search; (c = *s) != ''; s++)
{
c -= 'A';
if (c < 26) {
capsfound++;
uppercounts[c]&#43;+;
lowercounts[c]++;
} else {
c -= 32;
if (c < 26)
lowercounts[c]++;
else if ((c = *s) == '?' || c == '#')
initblanks++;
}
}
}
memcpy(searchletters, lowercounts, sizeof(lowercounts));
memcpy(capletters, uppercounts, sizeof(uppercounts));
blanks = initblanks;

for (d = dictword; (c = *d) != ''; d++)
{
c -= 'a';
if (c < 26)
{
if (searchletters[c] > 0)
{
searchletters[c]--;
} else {
if (blanks > 0)
{
*d -= 32; // *d = toupper(*d)
blanks--;
} else {
return 0;
}
}
if (capletters[c] > 0)
capletters[c]--;
}
}
//check for caps=required letters in search
if (capsfound)
{
for (c = 0; c < 26; c++)
if (capletters[c] > 0)
return 0; // a cap letter specified wasnt used
}
return 1;
}

________________________________

From: ocaml_beginners%40yahoogroups.com">ocaml_beginnersyahoogroups.com
[mailto: ocaml_beginners%40yahoogroups.com">ocaml_beginnersyahoogroups.com] On Behalf Of Jon Harrop
Sent: Wednesday, February 21, 2007 2:25 PM
To: ocaml_beginners%40yahoogroups.com">ocaml_beginnersyahoogroups.com
Subject: Re: "ocaml_beginners"::[] A question

On Wednesday 21 February 2007 22:01, carla.gonzalo wrote:
&gt; Hello,
&gt; would you help me with a problem I want to solve,
&gt; This problem is about how to program an anagram using ocaml ..
> ?

Off the top of my head:

Take a dictionary of words:

val dictionary : string list

Make a hash table that maps the sorted chars in the word onto the word
itself:

let map = Hashtbl.create (List.length dictionary)

Convert a string into an array of characters:

let array_of_string string =
Array.init (String.length string) (String.get string)

Sort the characters in a word:

let sort word = Array.sort compare (array_of_string word)

Build the hash table:

let () = List.iter (fun word -> Hashtbl.add map (sort word) word) dictionary

Find anagrams by searching the hash table for words containing the same
letters:

let anagrams_of_word word = Hashtbl.find_all map (sort word)

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
<http://www.ffconsultancy.com/products/ocaml_for_scientists&gt;

--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.5.441 / Virus Database: 268.18.3/696 - Release Date: 2/21/2007
3:19 PM


__._,_.___
.

__,_._,___
Re: "ocaml_beginners"::[] A question
country flaguser name
Italy
2007-02-23 03:53:03

carla.gonzalo wrote:
&gt; Hello,
> would you help me with a problem I want to solve,
&gt; This problem is about how to program an anagram using ocaml ..
> ?
>
>

Personally I'd create a target array, as a char option array, as long as
the initial string, i.e. the original anagram.

Then I'd write a fun "inserter" for inserting a char option in the first
free place (i.e. containing None instead of (Some char) ) of a char
option array.
This function should be able to walk the array either in one direction,
from a starting index until the end and if necessary from the beginning
to the starting index, or in two directions (I'd prefer this) and fail
if there are no more free places in one/both direction(s). The
one-direction version would make it just similar to a circular array.

Then I'd walk the initial string with a for loop and ask to the inserter
to insert the current char, as Some <char&gt;, in the target array, at
starting index found by using the first int readable at /dev/random (on
Unixes) and modulus of it to the anagram length.

At the end of the loop, I'd Array.fold the target array so to get the
anagram string.

I do not see how to do such a order-breaking activity in a simple and
efficient purely functional way. I don't think that Set.S.choose would
be random enough.

Hope it helps: if you have another solution, I'd be glad to know about it.

Ernesto

__._,_.___
.

__,_._,___
[1-5]

about | contact  Other archives ( Real Estate discussion Medical topics )