On Mar 19, 2008, at 7:16 AM, gauri dhopavkar wrote:
> I am doing post-graduate studies in Computer Science. My project
> topic is Grid computing. I have built a LAM/MPI cluster in our
> college lab which allows parallel execution of jobs. Please answer
> these queries:
>
> 1. Are there any standard ready-made applications which can be run
> on this cluster for demonstration purposes?
>
There are many. Have a look in the examples/ directory of the LAM/MPI
tarball for simple ones, or a quick Google search should turn up a
good set of MPI applications.
> 2. Is there any mechanism that allows the status (details) of a
> process executing on one node of the cluster to be stored on another
> node? This is needed in case of node failure.
>
This is not part of the MPI standard. Generally, applications use
custom checkpointing mechanisms or system-level checkpointing to
handle node failures. LAM/MPI supports integration with the BLCR
system-level checkpointer on Linux systems. Have a look at our paper
on the subject for more details:
http://www.lam-mpi.org/papers/lacsi2003/lacsi-2003.pdf
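
To make the "custom checkpointing" idea concrete, here is a minimal
sketch of application-level checkpointing for a single process. It is
not LAM/MPI or BLCR code, and it is written in Python only for
brevity. The function names, the JSON file format, and the
sum-of-squares workload are all made up for illustration. It also
assumes the checkpoint file lives on storage that another node can
read (for example a shared filesystem); otherwise a node failure
takes the checkpoint with it.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically (hypothetical helper).

    The data goes to a temporary file first and is then renamed over
    the real file, so a crash never leaves a half-written checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"next_i": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run(path, n, every=10, fail_at=None):
    """Sum the squares of 0..n-1, checkpointing every `every` iterations.

    `fail_at` simulates a node failure at that iteration (illustration only).
    """
    state = load_checkpoint(path)
    for i in range(state["next_i"], n):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += i * i
        state["next_i"] = i + 1
        if state["next_i"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling run() again with the same path after a failure resumes from
the last saved iteration instead of starting over. In a real MPI job
each rank would have to save its own state at a point where all ranks
agree, which is where most of the difficulty comes from.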
> 3. How do I take a snapshot of a process's state?
>
This is a difficult task. I'd recommend the above paper for more
details on system-level checkpointing. If you search the ACM or IEEE
databases, I'm sure you'll find a number of papers on system-level
and application-level checkpointing for MPI. It's a complex topic
with lots of tradeoffs.
Hope this helps,
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/