List Info

Thread: Re: Alphabetize a word




Re: Alphabetize a word
country flaguser name
United States
2007-09-05 11:59:46

Ah, excellent. However, allow my noobness to kick in. Alright so you
want me to create a hash that contains the entire list of words (values)
associated with their alphabetized version (keys)? Should I do this as
soon as I read in the input file? And should I continue with the line of
code "my WORDS = <INFILE>;" and create the bucket directly after?

Now for the "dumping each bucket&quot; portion I don't follow. Assuming I am
following correctly on the first part, you're sorting the hash we made in
the first step correct? I don't understand the print line. And is this
used solely for sorting our newly created hash and then using my
conditions against this new word list?

Once the "bucket" hash is created would it be beneficial to my programs
memory/execution time to close the INFILE and is there a way to delete the
original WORDS array now that the improved hash is in place?

Thanks,
Ryan

merlyn%40stonehenge.com">merlynstonehenge.com
Sent by: perl-beginner%40yahoogroups.com">perl-beginneryahoogroups.com
09/05/2007 12:36 PM
Please respond to
perl-beginner%40yahoogroups.com">perl-beginneryahoogroups.com

To
Ryan J Nauman < RJNauman%40uss.com">RJNaumanuss.com&gt;
cc
perl-beginner%40yahoogroups.com">perl-beginneryahoogroups.com
Subject
Re: [PBML] Alphabetize a word

>>>;>> "Ryan" == Ryan J Nauman < RJNauman%40uss.com">RJNaumanuss.com&gt; writes:

Ryan> Ah, thanks. Neat little trick. Well I finished my first complete
working
Ryan>; version of the code and is a bit sluggish. Though this is expected
since
Ryan>; the input file contains about 170,000 words stuffed into my WORDS
array.
Ryan>; Wondering if you guys know any tricks to speed things up anywhere?
Here
Ryan>; is a link to my source code: http://pastie.textmate.org/94246

Ooof, yeah. You're doing an exponential matching O(n squared)
instead of just stashing everything according to its anagram.

You need to use a "bucket" approach. As you compute each anagram,
dump the original word into a bucket keyed by anagram, using a hash
of arrayrefs...

my %buckets;
for my $word (words) {
my $alphaword = alphabetize($word);
push {$bucket{$alphaword}}, $word;
}

Now it's simply a matter of dumping each bucket:

for my $alphaword (sort keys %bucket) {
my $aref = $bucket{$alphaword};
print "$arefn&quot;;
}

That will be infinitely faster when you get above a few hundred words.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777
0095
< merlyn%40stonehenge.com">merlynstonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl
training!


[Non-text portions of this message have been removed]

__._,_.___
.

__,_._,___
Re: Alphabetize a word
country flaguser name
Czech Republic
2007-09-08 15:16:02

- OK then.
- Because then it's all backwards.
- Why is that?
- Please don't top-post!

Ryan J Nauman < RJNauman%40uss.com">RJNaumanuss.com&gt; wrote:
&gt; Ah, excellent. However, allow my noobness to kick in. Alright so you
> want me to create a hash that contains the entire list of words (values)
> associated with their alphabetized version (keys)? Should I do this as
> soon as I read in the input file? And should I continue with the line of
> code "my WORDS = <INFILE>;" and create the bucket directly after?
&gt;
> Now for the "dumping each bucket&quot; portion I don't follow. Assuming I am
> following correctly on the first part, you're sorting the hash we made in
> the first step correct? I don't understand the print line. And is this
> used solely for sorting our newly created hash and then using my
> conditions against this new word list?
&gt;
> Once the "bucket" hash is created would it be beneficial to my programs
> memory/execution time to close the INFILE and is there a way to delete the
> original WORDS array now that the improved hash is in place?
&gt;
> Thanks,
> Ryan

OK. Let me try to dissect the merlyn's code for you:

> merlyn%40stonehenge.com">merlynstonehenge.com wrote:
&gt; Ooof, yeah. You're doing an exponential matching O(n squared)
> instead of just stashing everything according to its anagram.
>
> You need to use a "bucket" approach. As you compute each anagram,
> dump the original word into a bucket keyed by anagram, using a hash
>; of arrayrefs...
>
> my %bucket;

Declare the %bucket hash. It's not initialized to anything, but will
be used as a HashOfArrays.

&gt; for my $word (words) {
> my $alphaword = alphabetize($word);

Let's sort the letters in the word

> push {$bucket{$alphaword}}, $word;

And now let's the word at the the end of the array referenced by the
value of the hash %bucket using the key $alphaword.

When storing the first word containing a set of letters the
$bucket{$alphaword} doesn't exist in the hash, but that doesn't
matter. Perl will create it for us and as we are aparently expecting
the value to be a reference to an array ( $ref is the array pointed
to by the reference stored in $ref) Perl even creates a brand new
array for us and stores its reference in $bucket{$alphaword}.

> }

Once this loop completes all words in the words array have been
processed and are stored in the HashOfArrays named %bucket. They keys
are the sorted sets of letters and the values of the hash are
references to arrays containing the words.

You may clear the words array now.

If your words was created by
my WORDS = <INFILE>;

You might actually get rid of the array completely and save memory.
Just remove that statement and change the loop to

while (my $word = <INFILE>) {
chomp($word);
my $alphaword = alphabetize($word);
push {$bucket{$alphaword}}, $word;
}

> Now it's simply a matter of dumping each bucket:
>
> for my $alphaword (sort keys %bucket) {
> my $aref = $bucket{$alphaword};
> print "$arefn&quot;;
> }

As I said, %bucket is a hash of arrayrefs so in the loop above we get
the list of keys (the sorted sets of letters)
keys %bucket
sort it alphabetically
sort keys %bucket
and then loop through the sorted list, get the reference contained in
the hash and print the values in the referenced array.

HTH, Jenda

===== Jenda%40Krynicky.cz">JendaKrynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery

__._,_.___
.

__,_._,___
[1-2]

about | contact  Other archives ( Real Estate discussion Medical topics )