
Thread: Re: LAM: Problem with LSDyna and LAM




Re: LAM: Problem with LSDyna and LAM
Australia
2008-03-30 02:34:50
Hello Michael,

Thanks for your reply.

My master node has two network interfaces, one public and one private, but the compute node has just one, the private one. The LAM configuration is fine, as I am able to run other parallel jobs (e.g. a factorial calculation) successfully on both nodes and get the correct output. The users' home directory is shared via NFS. But when I run LS-Dyna on both nodes, I get the problem.
FYI - LS-Dyna runs fine if I invoke LAM on just one node and run only on a single node.

Thanks,
Jigar

Michael Arndt <M.Arndtscience-computing.de> wrote:
Hello Jigar

- Does each cluster node have one or two network interfaces?

Keep in mind that for the network connection, LAM/MPI on both
nodes must agree on which interface to connect over.

Translated: the hostnames in the lamhost file must resolve
to the same network !
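
One way to check this point is to resolve every name from the lamhosts file on each node and compare the networks the addresses fall into. The following is only a minimal sketch: the function names are made up for illustration, and the /24 prefix length is an assumption that has to be replaced with the real netmask of the private network.

```python
import ipaddress
import socket


def resolve_hosts(hostnames):
    """Resolve each hostname to an IPv4 address as seen from this node."""
    return {name: socket.gethostbyname(name) for name in hostnames}


def networks_used(addresses, prefix_len=24):
    """Return the set of networks covered by a host -> IP mapping.

    prefix_len=24 is an assumption; use the netmask of your own network.
    """
    return {
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in addresses.values()
    }


def same_network(addresses, prefix_len=24):
    """True when every resolved address lies in one and the same network."""
    return len(networks_used(addresses, prefix_len)) == 1
```

Run `same_network(resolve_hosts([...]))` on each node with the names from the lamhosts file. If the master resolves its own name to the public interface while the compute node only knows the private one, the check returns False on at least one of the nodes.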

Are the names in your hostfile generated via the exec host list of
a batch system like PBS / SGE / LSF?


- 2nd trap: in case you do not have an NFS-shared working directory
as the common working directory for the calculation,
the following recipe will make it easier to find the real
problem:

mkdir -p /scratch/mydynajob on both nodes!
copy all input files, including the lamhosts file, to both nodes
then start the job
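
The recipe above could be scripted roughly as follows. This is a sketch under assumptions: it supposes that ssh and scp work between the nodes without a password, and the node names and the input file name in the usage note are hypothetical. The functions only build the commands, so they can be reviewed before anything is run.

```python
import shlex


def staging_commands(nodes, scratch_dir, input_files):
    """Build (but do not run) the commands that stage the job on every node."""
    commands = []
    for node in nodes:
        # Create the scratch directory on the node.
        commands.append(["ssh", node, "mkdir", "-p", scratch_dir])
        # Copy all input files, including the lamhosts file, into it.
        commands.append(["scp", *input_files, f"{node}:{scratch_dir}/"])
    return commands


def as_shell(commands):
    """Render the command lists as shell lines for review or copy and paste."""
    return [shlex.join(cmd) for cmd in commands]
```

For example, `as_shell(staging_commands(["n0", "n1"], "/scratch/mydynajob", ["input.k", "lamhosts"]))` gives four lines, one mkdir and one scp per node.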

dyna + lam7.0.3 works perfectly well and easy ...
so probably your problem is a network / routing problem
of two processes not talking over the same interface

in case of problems, also verify that the 2-CPU / single-node job
runs on each of the two nodes, so that per se both nodes are configured ok

hth
Micha







On Sat, Mar 29, 2008 at 10:17:32AM -0700, Jigar Halani wrote:
> Hello
>
> I have a problem running LSDyna with LAM-MPI 7.0.3. I am using precompiled LSDyna binaries with LAM 7.0.3. When I run the job using just one node, it runs fine. But if I run the job over the network on 2 machines, it fails with the error
>
> "It seems that [at least] one of the processes that was started with mpirun did not invoke MPI_INIT before quitting
> (it is possible that more than one process did not invoke MPI_INIT -- mpirun was only notified of the first one, which was on node n0)"
>
> Can you please let me know what the problem is.
>
> Thanks in advance,
> Regards,
> Jigar
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Dr. Florian Geyer,
Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Prof. Dr. Hanns Ruder
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196


Re: LAM: Problem with LSDyna and LAM
United States
2008-03-30 07:53:18
Jigar Halani wrote:
> Hello Michael,
> 
> Thanks for your reply.
> 
> My master node has two network interfaces, one public and one
> private, but the compute node has just one, the private one. The LAM
> configuration is fine, as I am able to run other parallel jobs (e.g. a
> factorial calculation) successfully on both nodes and get the correct
> output. The users' home directory is shared via NFS. But when I run
> LS-Dyna on both nodes, I get the problem.
> FYI - LS-Dyna runs fine if I invoke LAM on just one node and run only
> on a single node.
> 
As you pointedly avoid any mention of the logs, and aren't specific
about whether you tried to run LS-DYNA on each node separately, with
just the environment variables you set in the dot files, we still must
suspect that you didn't set up the 2nd node correctly. Your LS-DYNA mpp
executable and the license information must be working on each node,
and each node must have access to the global and local file systems you
have chosen.
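
A small pre-flight check along these lines could be run on each node separately before the two-node job is started. It is only a sketch: the function name is made up, and it covers just the executable and the directories. The license check is left out, since how the license is set up is not described in this thread.

```python
import os


def preflight(executable, directories):
    """Return a list of problems found on the local node (empty if none)."""
    problems = []
    # The mpp executable must exist and be runnable on this node.
    if not os.path.isfile(executable):
        problems.append(f"executable not found: {executable}")
    elif not os.access(executable, os.X_OK):
        problems.append(f"executable not runnable: {executable}")
    # Every global or local working directory must exist and be writable.
    for directory in directories:
        if not os.path.isdir(directory):
            problems.append(f"directory missing: {directory}")
        elif not os.access(directory, os.W_OK):
            problems.append(f"directory not writable: {directory}")
    return problems
```

An empty list on one node and a non-empty list on the other points at the node that is not set up correctly.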