Thread: "ocaml_beginners"::[] threads




"ocaml_beginners"::[] threads
United States
2007-08-15 03:47:59

Hello! I maintain a quite complex ocaml program doing morphological analysis. (I didn't originally write it: I really am an ocaml-beginner.)

We have bindings, so that we can call the code from C. (And we have Java bindings as a second layer.) Now we would like to use it in a multi-threaded fashion. The structure of the whole thing looks like this: First we build a single instance of an analyzer function. This is a complex beast, but it has a very simple interface: it maps strings to string lists (words to analyses). Then, from C code, we bombard this function with inputs. If we do this in a single thread, everything works fine. If we do it in parallel, it crashes.

Now my newbie question: Do we even have a chance? I mean, should this work? If not, are there simple modifications to our code to make it work? Where can I learn about the issues of multi-threaded ocaml code, and specifically, about calling ocaml code from multi-threaded native code?

I note that queuing the requests would not be an optimal solution. The running time of the algorithm varies wildly between words, and "rare, complex" words could deny service for "typical, easy" words, which is not nice in a search engine.

By the way, the code is open source, check it out if you are interested in morphological analysis:
http://mokk.bme.hu/resources/hunmorph/

Thank you for any insights,
Dániel Varga

Re: "ocaml_beginners"::[] threads
United Kingdom
2007-08-15 04:58:14

On Wednesday 15 August 2007 09:47:59 Daniel Varga wrote:
> Hello! I maintain a quite complex ocaml program doing morphological
> analysis. (I didn't originally write it: I really am an ocaml-beginner.)
>
> We have bindings, so that we can call the code from C. (And we have Java
> bindings as a second layer.) Now we would like to use it in a
> multi-threaded fashion. The structure of the whole thing looks like this:
> First we build a single instance of an analyzer function. This is a complex
> beast, but it has a very simple interface: it maps strings to stringlists
> (words to analyses). Then, from C code, we bombard this function with
> inputs. If we do this in a single thread, everything works fine. If we do
> it in parallel, it crashes.
>
> Now my newbie question: Do we even have a chance? I mean, should this work?
> If not, are there simple modifications to our code to make it work? Where
> can I learn about the issues of multi-threaded ocaml code, and
> specifically, about calling ocaml code from multi-threaded native code?
>
> I note that queuing the requests would not be an optimal solution. The
> running time of the algorithm varies wildly between words, and "rare,
> complex" words could deny service for "typical, easy" words, which is not
> nice in a search engine.

Sounds like an ideal task for forked processes rather than threads. Can you
build your internal data structure and then fork a pool of processes before
farming incoming requests out to free processes?

This adds the latency of message passing (I can't quantify the overhead) but
has the advantage of true concurrency and the potential for distribution
across a cluster.
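
A minimal sketch of the pre-fork idea, assuming a Unix-like system and the
standard Unix library. Everything named here is hypothetical: build_analyzer
is a toy stand-in for the real analyzer (string -> string list), and the
line-based pipe protocol is just one possible choice. For brevity it hands
words out round-robin instead of picking the first free worker; a real
dispatcher would probably watch the reply pipes with Unix.select.

```ocaml
(* Link against the unix library when building. *)

(* Hypothetical stand-in for the real analyzer: string -> string list. *)
let build_analyzer () : string -> string list =
  fun word -> [word ^ "/NOUN"; word ^ "/VERB"]

type worker = { pid : int; to_child : out_channel; from_child : in_channel }

(* Child loop: read one word per line, reply with the analyses on one
   tab-separated line. Assumes words and analyses contain no tab or
   newline, and that every word has at least one analysis. *)
let serve analyze ic oc =
  try
    while true do
      let word = input_line ic in
      output_string oc (String.concat "\t" (analyze word));
      output_char oc '\n';
      flush oc
    done
  with End_of_file -> ()

(* Fork one worker connected to the parent by a request and a reply pipe. *)
let spawn analyze =
  let req_r, req_w = Unix.pipe () in
  let rep_r, rep_w = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
    (* Child: keep only its own ends of the two pipes. *)
    Unix.close req_w;
    Unix.close rep_r;
    serve analyze
      (Unix.in_channel_of_descr req_r)
      (Unix.out_channel_of_descr rep_w);
    exit 0
  | pid ->
    (* Parent: keep the opposite ends. *)
    Unix.close req_r;
    Unix.close rep_w;
    { pid;
      to_child = Unix.out_channel_of_descr req_w;
      from_child = Unix.in_channel_of_descr rep_r }

let analyze_all ~workers words =
  (* Build the analyzer once; every forked child inherits a copy. *)
  let analyze = build_analyzer () in
  let pool = Array.init workers (fun _ -> spawn analyze) in
  (* Send word i to worker (i mod workers); the workers run concurrently. *)
  List.iteri
    (fun i w ->
       let wk = pool.(i mod workers) in
       output_string wk.to_child (w ^ "\n");
       flush wk.to_child)
    words;
  (* Each worker answers in the order it was asked, so reading with the
     same index pattern pairs every reply with its word. *)
  let results =
    List.mapi
      (fun i _ ->
         String.split_on_char '\t'
           (input_line pool.(i mod workers).from_child))
      words
  in
  (* Close every request pipe before waiting: later children inherited
     copies of earlier pipes, so the workers only see end-of-file one
     after another, starting with the last one forked. *)
  Array.iter (fun wk -> close_out wk.to_child) pool;
  Array.iter (fun wk -> ignore (Unix.waitpid [] wk.pid)) pool;
  results
```

Calling analyze_all ~workers:3 ["alma"; "fa"] returns one analysis list per
word, in input order. Sending every request before reading any reply is only
safe for small batches; with large batches the pipes can fill up and both
sides block.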

You might also like to look at camlp3l:

http://camlp3l.inria.fr/eng.htm

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e
