
Thread: "ocaml_beginners"::[] reading big file -> how to make it faster?




"ocaml_beginners"::[] reading big file -> how to make it faster?
user name
2006-06-04 19:47:04
Hello,

some days ago I came back to an old quick hack that reads
PPM files, finds the minimal bounding box of the picture,
converts it to ASCII and PostScript, and so on.

I tried it with a file of 6524897 bytes.

The conversion to ASCII took between 15 and 16 seconds
(bytecode).

To look for ways to improve the performance, I commented out
everything but the reading code.
It then takes about 8 seconds (about 5 seconds when compiled
to native code).

I use arrays, and thought this must be fast.
The code might be improved in several ways
 (maybe different usage of indexing, or reading with the
  Unix module instead of the high-level functions...)
but which would be the most effective way of improving
performance?

(Or is most of the time spent in the GC, allocating the
memory?)
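
One way to check the GC question is to measure it. The following is only a minimal sketch, assuming a hypothetical helper `time_and_gc` (not part of the original program) that wraps any function call and reports CPU time together with the GC counters from `Gc.quick_stat`. The workload in the last lines is a stand-in, not the PPM loader.

```ocaml
(* Hypothetical helper: run [f x], then report CPU time and GC activity
   on stderr. Returns the result of [f x] unchanged. *)
let time_and_gc label f x =
  let t0 = Sys.time () in
  let s0 = Gc.quick_stat () in
  let result = f x in
  let s1 = Gc.quick_stat () in
  let t1 = Sys.time () in
  Printf.eprintf
    "%s: %.3f s CPU, %d minor GCs, %d major GCs, %.0f words allocated\n%!"
    label (t1 -. t0)
    (s1.Gc.minor_collections - s0.Gc.minor_collections)
    (s1.Gc.major_collections - s0.Gc.major_collections)
    (s1.Gc.minor_words -. s0.Gc.minor_words);
  result

(* Stand-in workload: allocates one small block per element. *)
let () =
  let n = 100_000 in
  let a =
    time_and_gc "alloc" (fun n -> Array.init n (fun i -> (i, i + 1, i + 2))) n
  in
  assert (Array.length a = n)
```

Wrapping the call to `load_picture` the same way would show whether the number of collections grows with the number of pixels.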



From the ocamlcp-output:

=============================================================================
 module Input_ppm =
 (
 struct

   let exit_on_magic_error magic = (* 1 *) if magic <> "P6" then (* 0 *) (prerr_endline "magic-number error!"; exit 1)


   let load_picture chan =
     (* 1 *) let colornum = 3 in
     let magic_buf = "  "
     in
       really_input chan  magic_buf 0 2;
       exit_on_magic_error magic_buf;
       really_read_one_blank chan;

       let width  = int_of_string (read_word chan) in
       let height = int_of_string (read_word chan) in
       let color  = int_of_string (read_word chan) in

       Printf.fprintf stderr "width: %d, height: %d, colors: %d\n" width height color;
       flush stderr;

       color_check color; (* CHECKING if color is in valid range! (max. 255 supported now) *)

     let picture = Array.make_matrix width height {red=0;green=0;blue=0} in
     let linewidth = colornum * width in
     let buffer = String.make  linewidth ' '
     in
       for y = 0 to height - 1
       do
         (* 1754 *) really_input chan buffer 0 linewidth;
         for xi = 0 to width - 1
         do
           (* 2174960 *) let start = 3 *xi in
           picture.(xi).(y)   <- { red   = int_of_char (buffer.[start]);
                                   green = int_of_char (buffer.[start+1]);
                                   blue  = int_of_char (buffer.[start+2]) }
         done
       done;
       { xdim= width; ydim = height; colors = color; data = picture }
 end
  :
 sig
  val load_picture : in_channel -> picture_t
 end
 )
=============================================================================


TIA,
  Oliver


Archives up to August 22, 2005 are also downloadable at http://www.connettivo.net/cntprojects/ocaml_beginners/
The archives of the very official ocaml list (the seniors' one) can be found at http://caml.inria.fr
Attachments are banned and you're asked to be polite, avoid flames etc.
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/ocaml_beginners/

<*> To unsubscribe from this group, send an email to:
    ocaml_beginners-unsubscribe@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 


"ocaml_beginners"::[] reading big file -> how to make it faster?
user name
2006-06-04 20:07:13
[...]

This may not be useful, but have you thought of using the string
itself as the internal representation of the image?
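
A minimal sketch of that idea, under some assumptions: the names `raw_image`, `get_pixel` and `bounding_box` are illustrative and do not come from the original program, and the pixel data is held in `Bytes.t` because current OCaml versions no longer allow in-place writes to `string`. The raw RGB bytes stay as they were read, and an (x, y) accessor computes the offset on demand.

```ocaml
(* Illustrative type: raw row-major RGB data, three bytes per pixel. *)
type raw_image = { width : int; height : int; pixels : Bytes.t }

(* Byte offset of the red component of pixel (x, y). *)
let offset img x y = 3 * (y * img.width + x)

let get_pixel img x y =
  let o = offset img x y in
  (Char.code (Bytes.get img.pixels o),
   Char.code (Bytes.get img.pixels (o + 1)),
   Char.code (Bytes.get img.pixels (o + 2)))

(* Minimal bounding box (x0, y0, x1, y1) of all pixels that differ from
   [background]; None if the whole picture is background. *)
let bounding_box img background =
  let box = ref None in
  for y = 0 to img.height - 1 do
    for x = 0 to img.width - 1 do
      if get_pixel img x y <> background then
        box := (match !box with
          | None -> Some (x, y, x, y)
          | Some (x0, y0, x1, y1) ->
              Some (min x0 x, min y0 y, max x1 x, max y1 y))
    done
  done;
  !box
```

With this layout, loading the picture needs no per-pixel record at all. The price is the index arithmetic in every access.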

Another alternative is to make the red, blue, and green components
mutable, and update their values in the array directly, rather than
creating new records to replace the initial ones. Or, if that isn't
feasible, use Array.init instead of Array.make_matrix.

let init_matrix n m f = Array.init n (fun n -> Array.init m (f n))

In this case, you'd want a local function to refill a buffer once exhausted.
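
A possible reading of this suggestion, as a sketch only: `read_pixels` and `next_byte` are hypothetical names, the record type is redeclared so the block is self-contained, and the matrix is indexed as `m.(y).(x)` (unlike the original `picture.(xi).(y)`) so that it can be filled in file order. It relies on `Array.init` calling its function for index 0, 1, 2, ... in that order.

```ocaml
type rgb = { red : int; green : int; blue : int }

(* Helper from the message, with the inner index renamed for clarity. *)
let init_matrix n m f = Array.init n (fun i -> Array.init m (f i))

(* Read [height] rows of [width] RGB pixels from [chan] in one pass. *)
let read_pixels chan width height =
  let linewidth = 3 * width in
  let buffer = Bytes.create linewidth in
  (* Start in the "exhausted" state to force the first refill. *)
  let pos = ref linewidth in
  let next_byte () =
    if !pos >= linewidth then begin
      really_input chan buffer 0 linewidth;
      pos := 0
    end;
    let b = Char.code (Bytes.get buffer !pos) in
    incr pos;
    b
  in
  init_matrix height width (fun _y _x ->
    (* Explicit lets fix the read order; the evaluation order of
       record fields is not specified. *)
    let red = next_byte () in
    let green = next_byte () in
    let blue = next_byte () in
    { red; green; blue })
```

Each record is allocated exactly once, with its final values, so no placeholder records are created and thrown away.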

That should limit the runtime cost anyways. As for the reading taking
8 seconds... I don't know what to do about that. Mind you, a 6MB file
isn't exactly small... you may consider reading the whole file into
memory in one go (Sys.max_string_length is 16M chars on 32-bit arch).
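
Reading the file in one go could look like the sketch below. The function name `read_whole_file` is an assumption for illustration; `really_input_string`, `in_channel_length` and `Fun.protect` are standard library functions in recent OCaml versions (they were not all available in 2006).

```ocaml
(* Sketch: read an entire file into a single string. *)
let read_whole_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      let len = in_channel_length ic in
      (* Guard against the limit mentioned in the message. *)
      if len > Sys.max_string_length then
        failwith "file too large for a single string";
      really_input_string ic len)
```

The header and the pixel data can then be parsed from the string by offset, without further channel operations.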

Jonathan





"ocaml_beginners"::[] reading big file -> how to make it faster?
user name
2006-06-04 20:59:18
Hi,

On Mon, Jun 05, 2006 at 08:07:13AM +1200, Jonathan Roewen wrote:
[...]
> This may not be useful, but have you thought of using the string
> itself as the internal representation of the image?

Not really.
I need to grab parts out of the original (the original
picture might have wide areas which are only blank), and
I want to look for the minimal "bounding box" of
the picture.
So, having an (x,y) representation makes things easier.

I am looking for better performance, but not at the cost of
an extraordinarily long development time.


> 
> Another alternative is to make the red, blue, and green components
> mutable, and updating their values in the array directly, rather than
> create new records to replace the initial ones.

Well, I ran the program again (bytecode).
The original version takes about 9.2 ... 9.4 seconds
(why not 8.x as before? well... my system is not under high
load...
 ...hmhhh).

OK, from these about 9 seconds, by using mutable fields and
updating them directly,
I got down to about 5.6 seconds with the bytecode.

That's cool.

Faster would be better, but improving the tool's performance
by nearly 50%
by changing four lines of code is really amazing.


> Or, if that isn't
> feasible, use Array.init instead of Array.make_matrix.
>
> let init_matrix n m f = Array.init n (fun n -> Array.init m (f n))
>
> In this case, you'd want a local function to refill a buffer once exhausted.

Well... I haven't tried it yet, but I think you mean the advantage here is
to have the array correctly initialized at the time it is created, so
that the memory isn't first initialized with "something" and then
set to the needed values later.

Maybe that also improves performance.

I can try.


> 
> That should limit the runtime cost anyways. As for the reading taking
> 8 seconds... I don't know what to do about that. Mind you, a 6MB file
> isn't exactly small...

Yes, 6 MB isn't really small. But when comparing to other code I've
written in OCaml, or when looking at what other (ok, simpler) tools need
for working on that file, it seems to me that it should be faster.


> you may consider reading the whole file into
> memory in one go (Sys.max_string_length is 16M chars on 32-bit arch).

Well... hmhhh.

Ciao,
   Oliver




"ocaml_beginners"::[] reading big file -> how to make it faster?
user name
2006-06-05 09:31:33
On Mon, Jun 05, 2006 at 08:07:13AM +1200, Jonathan Roewen wrote:
[...]
> Another alternative is to make the red, blue, and green components
> mutable, and updating their values in the array directly, rather than
> create new records to replace the initial ones. Or, if that isn't
> feasible, use Array.init instead of Array.make_matrix.
>
> let init_matrix n m f = Array.init n (fun n -> Array.init m (f n))

I have changed my code so that it now uses both: Array.init
and Array.make.

=========
let picture = Array.init width (fun x -> Array.make height {red=0;green=0;blue=0}) in
(...)
  do
    let start = 3 *xi in
    picture.(xi).(y).red <- int_of_char buffer.[start];
    picture.(xi).(y).green <- int_of_char buffer.[start+1];
    picture.(xi).(y).blue <- int_of_char buffer.[start+2]
  done
=========

I have done this because a second init is not necessary
when I refill the array with the data I read.

In native code it's between 5 and 6 seconds now.

Maybe more optimizations are possible,
but it's now much better than before. 
And for this tool I will use native code now.

Thanks to all. 

Regards,
   Oliver




"ocaml_beginners"::[] reading big file -> how to make it faster?
user name
2006-06-05 09:48:06
On Mon, Jun 05, 2006 at 11:31:33AM +0200, Oliver Bandel wrote:
> On Mon, Jun 05, 2006 at 08:07:13AM +1200, Jonathan Roewen wrote:
> [...]
> > Another alternative is to make the red, blue, and green components
> > mutable, and updating their values in the array directly, rather than
> > create new records to replace the initial ones. Or, if that isn't
> > feasible, use Array.init instead of Array.make_matrix.
> >
> > let init_matrix n m f = Array.init n (fun n -> Array.init m (f n))
>
> I have changed my code so that it now uses both: Array.init and Array.make.
>
> =========
> let picture = Array.init width (fun x -> Array.make height {red=0;green=0;blue=0}) in
[...]

This didn't work?!
Do I need both inits?
Or must make and init be swapped? I thought the inner array
can be used like above...

Today I will study the O'Reilly book on that topic...

...come back later. 

Ciao,
   Oliver





"ocaml_beginners"::[] reading big file -> how to make it faster?
user name
2006-06-05 10:03:36
> > I have changed my code so that it now uses both: Array.init and Array.make.

Array.make works fine with primitives, e.g. ints, bools, chars,
enums, as they're not allocated blocks. All other structures that are
allocated blocks need Array.init.
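
The reason, shown as a small sketch: `Array.make n v` stores the same value `v` in every slot, so for an allocated block all slots point to one shared record, while `Array.init` calls its function once per slot. The type is redeclared here with mutable fields, and `shared_column` and `fresh_column` are hypothetical names used only for this illustration.

```ocaml
type rgb = { mutable red : int; mutable green : int; mutable blue : int }

(* One record, referenced from every slot. *)
let shared_column n = Array.make n { red = 0; green = 0; blue = 0 }

(* The function runs once per slot, so every slot gets its own record. *)
let fresh_column n = Array.init n (fun _ -> { red = 0; green = 0; blue = 0 })

let () =
  let shared = shared_column 3 in
  shared.(0).red <- 255;
  assert (shared.(1).red = 255);      (* the write is visible in every slot *)
  assert (shared.(0) == shared.(2));  (* physically the same block *)

  let fresh = fresh_column 3 in
  fresh.(0).red <- 255;
  assert (fresh.(1).red = 0);         (* other slots are untouched *)
  assert (fresh.(0) != fresh.(2))     (* distinct blocks *)
```

For ints, bools and chars the sharing is harmless, because they are immediate values and cannot be mutated in place.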





"ocaml_beginners"::[] reading big file -> how to make it faster?
user name
2006-06-05 16:12:51
On Mon, Jun 05, 2006 at 10:03:36PM +1200, Jonathan Roewen wrote:
> > > I have changed my code so that it now uses both: Array.init and Array.make.
>
> Array.make works fine with primitives .. e.g. ints, bools, chars,
> enums, as they're not allocated blocks. All other structures that are
> allocated blocks need Array.init.

OK, thanks for the hint; this is what I now also found in the O'Reilly book.
Floats and all data structures that do not fit into one machine word will
be referred to by pointers.

(How will that be on 64-bit machines? Is it the same?)

So, if I have to use Array.init inside Array.init, I doubt that will be
faster than using make_matrix and inserting new records.

Maybe my first way was ok (with immutable records)?
I can test this again.


Ciao,
   Oliver




"ocaml_beginners"::[] reading big file -> how to make it faster?
user name
2006-06-05 16:57:51
Floats are primitive types, too, AFAIK. (i.e., not referred to by pointers)

Fred

Oliver Bandel wrote:
> On Mon, Jun 05, 2006 at 10:03:36PM +1200, Jonathan Roewen wrote:
>  > > > I have changed my code so that it now uses both: Array.init and Array.make.
>  >
>  > Array.make works fine with primitives .. e.g. ints, bools, chars,
>  > enums, as they're not allocated blocks. All other structures that are
>  > allocated blocks need Array.init.
>
> OK, thanks for the hint; this is what I now also found in the OReilly-book.
> Float and all datastructures that do not fit into one machine word will
> be referred to by pointers.
>
> (How will that be on 64 Bit machines? Is it the same?)
>
> So, if I have to use Array.init inside Array.init I doubt that will be
> faster than using make_matrix and inserting new records.
>
> Maybe my first way was ok (with immutable records)?
> I can test this again.
>
>
> Ciao,
>    Oliver






"ocaml_beginners"::[] reading big file -> how to make it faster?
user name
2006-06-05 16:41:43
On Mon, Jun 05, 2006 at 10:03:36PM +1200, Jonathan Roewen wrote:
> > > I have changed my code so that it now uses both: Array.init and Array.make.
>
> Array.make works fine with primitives .. e.g. ints, bools, chars,
> enums, as they're not allocated blocks. All other structures that are
> allocated blocks need Array.init.
[...]

But when Array.make is called by Array.init again each time...
...isn't it then creating a new record every time?

============================================================
# let mm4 = Array.init 3 (fun x -> Array.make 3 {x=9;y=99;z=999});;
val mm4 : xyz array array =
  [|[|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
      {x = 9; y = 99; z = 999}|];
    [|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
      {x = 9; y = 99; z = 999}|];
    [|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
      {x = 9; y = 99; z = 999}|]|]
# mm4.(01).(1) <- { x=12;y=55;z=99};;
- : unit = ()
# mm4;;
- : xyz array array =
[|[|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
    {x = 9; y = 99; z = 999}|];
  [|{x = 9; y = 99; z = 999}; {x = 12; y = 55; z = 99};
    {x = 9; y = 99; z = 999}|];
  [|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
    {x = 9; y = 99; z = 999}|]|]
# 
============================================================


Looks like it should work...?!

Why not in my code?
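
A likely explanation, sketched under the assumption that the fields of `xyz` are declared mutable (the type definition is not shown in the session, and `mm4` is turned into a function here so each test gets a fresh matrix): the session above replaces a whole slot, which only touches that slot, whereas the picture loader mutates a field of the record stored in the slot. Within one inner array, `Array.make` put the same record into every slot, so a field update shows up in all of them.

```ocaml
(* Assumed definition; the fields must be mutable for field updates. *)
type xyz = { mutable x : int; mutable y : int; mutable z : int }

let mm4 () = Array.init 3 (fun _ -> Array.make 3 { x = 9; y = 99; z = 999 })

let () =
  (* Replacing a slot, as in the toplevel session: neighbours keep x = 9. *)
  let a = mm4 () in
  a.(1).(1) <- { x = 12; y = 55; z = 99 };
  assert (a.(1).(0).x = 9 && a.(1).(2).x = 9);

  (* Mutating a field, as in the loader: the whole inner array changes. *)
  let b = mm4 () in
  b.(1).(1).x <- 12;
  assert (b.(1).(0).x = 12 && b.(1).(2).x = 12);

  (* Array.init ran Array.make once per outer index, so the other inner
     arrays have their own record and are unaffected. *)
  assert (b.(0).(1).x = 9)
```

Applied to the loader, this would mean that all pixels sharing one outer index `xi` end up with the values of the last row read.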

Ciao,
   Oliver





"ocaml_beginners"::[] reading big file -> how to make it faster?
user name
2006-06-05 16:58:41
> Floats are primitive types, too, AFAIK. (i.e., not referred to by pointers)

No, floats are double-precision, and therefore boxed. But OCaml also
has an optimisation of unboxing floats in arrays (and records when all
fields are floats).
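
This can be inspected from OCaml itself. The sketch below assumes a default compiler build (flat float arrays enabled); `describe`, `point` and `tagged` are hypothetical names. The `Obj` module is used for inspection only and is not meant for ordinary code.

```ocaml
type point = { px : float; py : float }    (* all fields are floats *)
type tagged = { id : int; value : float }  (* mixed fields *)

(* Classify the runtime representation of any value. *)
let describe v =
  let r = Obj.repr v in
  if Obj.is_int r then "immediate"
  else if Obj.tag r = Obj.double_tag then "boxed float"
  else if Obj.tag r = Obj.double_array_tag then "flat float block"
  else "ordinary block"

let () =
  assert (describe 42 = "immediate");
  assert (describe 'c' = "immediate");
  assert (describe true = "immediate");
  assert (describe 3.14 = "boxed float");
  assert (describe [| 1.0; 2.0 |] = "flat float block");
  assert (describe { px = 1.0; py = 2.0 } = "flat float block");
  assert (describe { id = 1; value = 2.0 } = "ordinary block");
  (* On a 64-bit machine the scheme is the same, only the word is wider. *)
  Printf.printf "word size: %d bits, int size: %d bits\n"
    Sys.word_size Sys.int_size
```

A record of three ints, like the pixel record in this thread, is an ordinary block, which is why `Array.make` shares it between slots.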



