List Info

Thread: speeding up 3N+1




speeding up 3N+1
user name
2007-04-07 08:18:59
Hello,

First of all, please keep in mind that I am a newbie in
Erlang. (sorry
if the codes annoy you)

Please have a look at this famous problem. http://acm.uva.es/p/v
1/100.html

The input number can be upto 1,000,000. (let's just suppose
1 million
is included, even though the problem states the maximal
input is less
than 1 million) Let's see the worst case(when the start/end
pair is
{1, 1000000}) time performance of the program.

Following is a trial at solving this problem:

-module(tnp1_process_dict).
-compile(export_all).

cycle(1) ->
    1;
cycle(N) ->
    case get(N) of
    undefined ->
        SN = if
             N rem 2 =:= 0 ->
             1 + cycle(N div 2);
             true ->
             2 + cycle((3 * N + 1) div 2)
         end,
        put(N, SN),
        SN;
    Cached ->
        Cached
    end.

max(A, B) ->
    lists:max(lists:map(fun cycle/1, lists:seq(A, B))).

Actually, it's from a friend of mine, who is also in the
same Erlang
community(we are all just novices though) with me. It's
quite
effiecient.

1> timer:tc(tnp1_process_dict,max,[1,1000000]).
{2547000,525}

Less than 3 seconds on my sluggish machine. Good.

However, I didn't quite like the solution. It uses the
process
dictionary. From reading Erlang books and manuals, I thought
the
process dictionary should be the last resort for this kind
of usage. I
thought I could deal the problem without using the process
dictionary,
without trading off the speed.

Following is my trial:

-module(tnp1_functional).
-compile(export_all).

newCache()->
    dict:from_list([{1,1}]).
store(K,V,Cache)->
    dict:store(K,V,Cache).
find(K,Cache)->
    dict:find(K,Cache).

nn(N) when N rem 2 =:= 0 ->
    N div 2;
nn(N) ->
    3*N+1.

update([],_,Cache)->
    Cache;
update([H|ToStore],V,Cache)->
    Cache2=store(H,V+1,Cache),
    update(ToStore,V+1,Cache2).

cl(N,Cnt,Cache,ToStore)->
    case find(N,Cache) of
	{ok,V} ->
	    Cache2=update(ToStore,V,Cache),
	    {Cnt+V,Cache2};
 	error  ->
	    cl(nn(N),Cnt+1,Cache,[N|ToStore])
    end.

maxcl(From,To,Cache,Max) when From>To ->
     {Max,Cache};
maxcl(From,To,Cache,Max)->
    {Cl,Cache2}=cl(From,0,Cache,[]),
    maxcl(From+1,To,Cache2,lists:max([Cl,Max])).

maxcl(From,To)->
    Cache=newCache(),
    maxcl(From,To,Cache,1).

max(From,To)->
    {V,_}=maxcl(From,To),
    V.

It's much longer. And much slower; I'd say excruciatingly.

1> timer:tc(tnp1_functional,max,[1,100000]).
{7953000,351}

See? It's 100,000, not one million, even. When I ran it with
one
million, it didn't end at least in a couple of minutes.

Maybe my choice of using dict is wrong? Could ets be
substantially
better? Or my approach totally wrong? I don't know yet.
(I'll
experiment with other data structures and approaches,
though)

Okay, that's it. I would appreciate any suggestions for
improvement.

June
_______________________________________________
erlang-questions mailing list
erlang-questionserlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: speeding up 3N+1
user name
2007-04-07 08:57:14
Well, I tried it with ets:

Replace the three cache functions with the following:

newCache()->
    Cache=ets:new(test,[set]),
    ets:insert(Cache,{1,1}),
    Cache.
store(K,V,Cache)->
    ets:insert(Cache,{K,V}),
    Cache.
find(K,Cache)->
    case ets:lookup(Cache,K) of
        [] -> error;
        [{K,V}] -> {ok,V}
    end.

Now max(1,100000) takes about 0.9 secs, and max(1,1000000)
takes about 10 secs.

Exploiting the trick in the process dictionary version(when
N is odd,
3N+1 is even), for one million, it reduces to about 8 secs.

It's a huge improvement, but still inferior to the process
dictionary
version. Moreover, ets isn't functional data structure(with
side-effects).

So the process dictionary is the king?

2007/4/7, June Kim <juneaftngmail.com>:
> Hello,
>
> First of all, please keep in mind that I am a newbie in
Erlang. (sorry
> if the codes annoy you)
>
> Please have a look at this famous problem. http://acm.uva.es/p/v
1/100.html
>
> The input number can be upto 1,000,000. (let's just
suppose 1 million
> is included, even though the problem states the maximal
input is less
> than 1 million) Let's see the worst case(when the
start/end pair is
> {1, 1000000}) time performance of the program.
>
> Following is a trial at solving this problem:
>
> -module(tnp1_process_dict).
> -compile(export_all).
>
> cycle(1) ->
>    1;
> cycle(N) ->
>    case get(N) of
>    undefined ->
>        SN = if
>             N rem 2 =:= 0 ->
>             1 + cycle(N div 2);
>             true ->
>             2 + cycle((3 * N + 1) div 2)
>         end,
>        put(N, SN),
>        SN;
>    Cached ->
>        Cached
>    end.
>
> max(A, B) ->
>    lists:max(lists:map(fun cycle/1, lists:seq(A, B))).
>
> Actually, it's from a friend of mine, who is also in
the same Erlang
> community(we are all just novices though) with me. It's
quite
> effiecient.
>
> 1> timer:tc(tnp1_process_dict,max,[1,1000000]).
> {2547000,525}
>
> Less than 3 seconds on my sluggish machine. Good.
>
> However, I didn't quite like the solution. It uses the
process
> dictionary. From reading Erlang books and manuals, I
thought the
> process dictionary should be the last resort for this
kind of usage. I
> thought I could deal the problem without using the
process dictionary,
> without trading off the speed.
>
> Following is my trial:
>
> -module(tnp1_functional).
> -compile(export_all).
>
> newCache()->
>    dict:from_list([{1,1}]).
> store(K,V,Cache)->
>    dict:store(K,V,Cache).
> find(K,Cache)->
>    dict:find(K,Cache).
>
> nn(N) when N rem 2 =:= 0 ->
>    N div 2;
> nn(N) ->
>    3*N+1.
>
> update([],_,Cache)->
>    Cache;
> update([H|ToStore],V,Cache)->
>    Cache2=store(H,V+1,Cache),
>    update(ToStore,V+1,Cache2).
>
> cl(N,Cnt,Cache,ToStore)->
>    case find(N,Cache) of
>        {ok,V} ->
>            Cache2=update(ToStore,V,Cache),
>            {Cnt+V,Cache2};
>        error  ->
>            cl(nn(N),Cnt+1,Cache,[N|ToStore])
>    end.
>
> maxcl(From,To,Cache,Max) when From>To ->
>     {Max,Cache};
> maxcl(From,To,Cache,Max)->
>    {Cl,Cache2}=cl(From,0,Cache,[]),
>    maxcl(From+1,To,Cache2,lists:max([Cl,Max])).
>
> maxcl(From,To)->
>    Cache=newCache(),
>    maxcl(From,To,Cache,1).
>
> max(From,To)->
>    {V,_}=maxcl(From,To),
>    V.
>
> It's much longer. And much slower; I'd say
excruciatingly.
>
> 1> timer:tc(tnp1_functional,max,[1,100000]).
> {7953000,351}
>
> See? It's 100,000, not one million, even. When I ran it
with one
> million, it didn't end at least in a couple of
minutes.
>
> Maybe my choice of using dict is wrong? Could ets be
substantially
> better? Or my approach totally wrong? I don't know yet.
(I'll
> experiment with other data structures and approaches,
though)
>
> Okay, that's it. I would appreciate any suggestions for
improvement.
>
> June
>
_______________________________________________
erlang-questions mailing list
erlang-questionserlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: speeding up 3N+1
country flaguser name
Canada
2007-04-07 09:33:32
On Sat, Apr 07, 2007 at 10:18:59PM +0900, June Kim wrote:
}  Please have a look at this famous problem. http://acm.uva.es/p/v
1/100.html

June,

In functional programming you pass arguments instead of
using
global variables.

-module(threeNplus1).
-export([do/1]).

do(1) ->
        io:fwrite(" 1~n");
do(N) when N rem 2 /= 0 ->
        io:fwrite(" ~b", [N]),
        do(3 * N + 1);
do(N) ->
        io:fwrite(" ~b", [N]),
        do(N div 2).

1> threeNplus1:do(22).
 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1
ok

}  cycle(1) ->
}      1;
}  cycle(N) ->
}      case get(N) of
}      undefined ->
}          SN = if
}               N rem 2 =:= 0 ->
}               1 + cycle(N div 2);
}               true ->
}               2 + cycle((3 * N + 1) div 2)
}           end,
}          put(N, SN),
}          SN;
}      Cached ->
}          Cached
}      end.

The really big problem above is in tail recursion.  You
don't
want to have to wait for the answer to cycle(N div 2)
before
knowing the answer to cycle(N).

	-Vance

_______________________________________________
erlang-questions mailing list
erlang-questionserlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: speeding up 3N+1
user name
2007-04-07 09:55:38
2007/4/7, Vance Shipley <vancesmotivity.ca>:
> On Sat, Apr 07, 2007 at 10:18:59PM +0900, June Kim
wrote:
> }  Please have a look at this famous problem. http://acm.uva.es/p/v
1/100.html
>
> June,
>
> In functional programming you pass arguments instead of
using
> global variables.
>
> -module(threeNplus1).
> -export([do/1]).
>
> do(1) ->
>        io:fwrite(" 1~n");
> do(N) when N rem 2 /= 0 ->
>        io:fwrite(" ~b", [N]),
>        do(3 * N + 1);
> do(N) ->
>        io:fwrite(" ~b", [N]),
>        do(N div 2).
>
> 1> threeNplus1:do(22).
>  22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1
> ok
>
> }  cycle(1) ->
> }      1;
> }  cycle(N) ->
> }      case get(N) of
> }      undefined ->
> }          SN = if
> }               N rem 2 =:= 0 ->
> }               1 + cycle(N div 2);
> }               true ->
> }               2 + cycle((3 * N + 1) div 2)
> }           end,
> }          put(N, SN),
> }          SN;
> }      Cached ->
> }          Cached
> }      end.
>
> The really big problem above is in tail recursion.  You
don't
> want to have to wait for the answer to cycle(N div 2)
before
> knowing the answer to cycle(N).
>
>        -Vance
>
>

Yes. I am well aware of those issues. (the code isn't mine,
but from a
friend of mine who's into functional programming for the
first time
with Erlang -- I have programmed in other FP languages
before)

His code uses global variable(process dictionary) and not
tail-recursive. However, it's the fastest. (even in a
"functional
programming" language -- ironic, isn't it?)

My version using dict, which is also included in my original
posting(I
don't think you have seen it yet), is tail-recursive and
functional,
but slow.

I did some experimentations.

%%<data structure> <time(sec) for upto
10^5>/<time for upto 10^6>

%%dict 7s/?s

newCache()->
   dict:from_list([{1,1}]).
store(K,V,Cache)->
   dict:store(K,V,Cache).
find(K,Cache)->
   dict:find(K,Cache).




%%ets/set 0.1s/10s

newCache()->
   Cache=ets:new(test,[set]),
   ets:insert(Cache,{1,1}),
   Cache.
store(K,V,Cache)->

   ets:insert(Cache,{K,V}),
   Cache.
find(K,Cache)->

   case ets:lookup(Cache,K) of
       [] -> error;
       [{K,V}] -> {ok,V}
   end.


%%gb_trees 2.6s/32s
newCache()->
   T1=gb_trees:empty(),
   gb_trees:enter(1,1,T1).
store(K,V,Cache)->
   gb_trees:enter(K,V,Cache).
find(K,Cache)->
   case gb_trees:lookup(K,Cache) of
       none -> error;
       {value,V} -> {ok,V}
   end.


%%process dict 0.4s/5s
newCache()->
   erase(),
   put(1,1),
   cache.
store(K,V,_Cache)->
   put(K,V),
   cache.
find(K,_Cache)->
   case get(K) of
       undefined -> error;
       V -> {ok,V}
   end.

Could you help me improving the speed of these codes with FP
approach?
_______________________________________________
erlang-questions mailing list
erlang-questionserlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: speeding up 3N+1
country flaguser name
Canada
2007-04-07 18:07:42
June,

OK, I took more than thirty seconds this time. 

On Sat, Apr 07, 2007 at 11:54:24PM +0900, June Kim wrote:
}  My version using dict, which is also included in my
original posting(I
}  don't think you have seen it yet), is tail-recursive and
functional,
}  but slow.

It's more functional than your friend's but not purely
functional.
Here's a completely functional solution to the problem:

-module(threeNplus1).
-export([max/2]).

do(1, Acc) ->
    Acc + 1;
do(N, Acc) when N rem 2 /= 0 ->
    do(3 * N + 1, Acc + 1);
do(N, Acc) ->
    do(N div 2, Acc + 1).

max(I, J) ->
    io:fwrite("~b ~b ", [I, J]),
    max(I, J, 0).
max(I, J, Max) when I > J ->
    io:fwrite("~b~n", [Max]);
max(I, J, Max) ->
    Count = do(I, 0),
    if
        Count > Max ->
            max(I + 1, J, Count);
        true ->
            max(I + 1, J, Max)
    end.

1> threeNplus1:max(1, 10).
1 10 20
ok
2> threeNplus1:max(100, 200).
100 200 125
ok
3> threeNplus1:max(201, 210).
201 210 89
ok
4> threeNplus1:max(900, 1000).
900 1000 174
ok

}  Could you help me improving the speed of these codes with
FP approach?

5> timer:tc(threeNplus1, max, [1, 1000000]).
1 1000000 525
{9860100,ok}

The reference you gave didn't mention any performance
objectives.
For my money under ten seconds is good enough for this
exercise.
One of the tenants in the Erlang community is "optimize
last".
Quoting from the original Erlang book "The primary
concern must
always be of correctness, and to this aim we encourage the
development
of small and beautiful algorithms which are `obviously'
correct."

    -Vance
_______________________________________________
erlang-questions mailing list
erlang-questionserlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: speeding up 3N+1
user name
2007-04-07 21:07:43
For benchmarking, you don't want the export_all directive in
there,
because it prevents certain kinds of optimizations.  For
best results,
export only the functions you need.
_______________________________________________
erlang-questions mailing list
erlang-questionserlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: speeding up 3N+1
country flaguser name
New Zealand
2007-04-12 01:08:04
On 8 Apr 2007, at 1:18 am, June Kim wrote:
> Please have a look at this famous problem. http://acm.uva.es/p/ 
> v1/100.html

cycle_length(N) when is_integer(N), N >= 1 ->
     cycle_length(N, 1).

cycle_length(1, L) -> L;
cycle_length(N, L) ->
     cycle_length(case N band 1 of 1 -> 3*N+1; 0 -> N
div 2 end, L+1).

The conditions of the problem ensure that this will not in
fact involve
any cycle at all; "cycle length" is an
astoundingly bad name for the
quantity to be computed.

max_cycle_length(I, J) when is_integer(I), is_integer(J), I
=< J ->
     max_cycle_length(I+1, J, cycle_length(I)).

max_cycle_length(I, J, M) when I > J -> M;
max_cycle_length(I, J, M) ->
     max_cycle_length(I+1, J, max(L, cycle_length(J)).

max(X, Y) when X > Y -> X;
max(_, Y) -> Y.

Simple.  Quite fast enough.  I just asked for
max_cycle_length 
(1,1000000)
and got an answer in 43.8 CPU seconds on a 500 MHz machine. 
I know you
speak of your "sluggish machine", but was it as
slow as mine?  How fast
does the calculation NEED to be?

> cycle(N) ->
>     case get(N) of

Yes, memoisation can give you very worthwhile speedups; IF
you have a  
big
problem with repeated calculations to start with.  However,
maintaining
the memory of function arguments and results itself takes
time, and  
above
all, space.  And on today's machines, more space definitely
means less
space.  (I've been measuring load times as
	from L1 cache:	2-3 cycles
	from L2 cache:  10 cycles
	from main memory
	(sequential): ~100 cycles
	(random):    ~1000 cycles
on three different kinds of machines lately.  The C version
of the code
above runs entirely from registers; making it poke around in
hash tables
would probably slow it down by a factor of a thousand just
because of  
using
more memory.

If you do want to memoise, it's probably worth keeping the
memo small.
Try keeping the cache to just 2*(0..1023)+1.

The comparatively small ratio between 3 seconds on your
"sluggish  
machine"
and 43.8 seconds on my 500 MHz machine (14.6) suggests to me
that  
there IS
a certain amount of repeated calculation but not a huge
amount, so a  
small
cache might work very well.


_______________________________________________
erlang-questions mailing list
erlang-questionserlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: speeding up 3N+1
country flaguser name
Greece
2007-04-12 08:30:55
ok wrote:
> On 8 Apr 2007, at 1:18 am, June Kim wrote:
>> Please have a look at this famous problem. http://acm.uva.es/p/ 
>> v1/100.html
> 
> cycle_length(N) when is_integer(N), N >= 1 ->
>      cycle_length(N, 1).
> 
> cycle_length(1, L) -> L;
> cycle_length(N, L) ->
>      cycle_length(case N band 1 of 1 -> 3*N+1; 0
-> N div 2 end, L+1).
> 
> The conditions of the problem ensure that this will not
in fact involve
> any cycle at all; "cycle length" is an
astoundingly bad name for the
> quantity to be computed.
> 
> max_cycle_length(I, J) when is_integer(I),
is_integer(J), I =< J ->
>      max_cycle_length(I+1, J, cycle_length(I)).
> 
> max_cycle_length(I, J, M) when I > J -> M;
> max_cycle_length(I, J, M) ->
>      max_cycle_length(I+1, J, max(L, cycle_length(J)).

This last function cannot possibly be correct.
There is a parenthesis missing and L and M are singleton
variables.
Moreover, even if the L gets substituted with M, this
implements a
different algorithm.

The correct function should read:

max_cycle_length(I, J, M) when I > J -> M;
max_cycle_length(I, J, M) ->
     max_cycle_length(I+1, J, max(M, cycle_length(I))).

and then the algorithm is that which is required by the 3N+1
problem.


I also wonder why nobody has suggested compilation to native
code for 
speeding up execution in this case.  On my system (which has
some extra 
optimizations, not yet present in R11B-4) I get:

1> c(threeNplus1_ok).
{ok,threeNplus1_ok}
2> timer:tc(threeNplus1_ok, max_cycle_length,
[1,1000000]).
{29245755,525}
3> hipe:c(threeNplus1_ok).
{ok,threeNplus1_ok}
4> timer:tc(threeNplus1_ok, max_cycle_length,
[1,1000000]).
{2517352,525}

I.e. more than 10 times faster.  The native code compiler in
R11B-4 
should give something like 3-4 times speedup compared with
BEAM.

Kostis
_______________________________________________
erlang-questions mailing list
erlang-questionserlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

[1-8]

about | contact  Other archives ( Real Estate discussion Medical topics )