Thread: LAM: Mac OS X heterogeneous cluster mpirun problem




2006-06-12 17:53:32
Hello LAM/MPIers:

I have compiled the Gromacs MD program for both PPC and Intel, and I can run each binary on its corresponding architecture. However, when I try to do an mpirun across the two different machines, the job fails after a few seconds. Each machine can see and execute its architecture-specific binary, and LAM/MPI is the universal build of LAM/MPI 7.1.2 from the website. Any ideas on what I might be doing wrong? Below is the only message that I'm getting:

calculon$ mpirun -np 4 mdrun
NNODES=4, MYRANK=1, HOSTNAME=Warner-Computer.local
NNODES=4, MYRANK=3, HOSTNAME=Warner-Computer.local
NNODES=4, MYRANK=0, HOSTNAME=portal.private
NNODES=4, MYRANK=2, HOSTNAME=portal.private
NODEID=2 argc=1
NODEID=0 argc=1
NODEID=1 argc=16777216
NODEID=3 argc=16777216
MPI_Recv: message truncated (rank 2, MPI_COMM_WORLD)
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
Rank (2, MPI_COMM_WORLD):  - MPI_Recv()
Rank (2, MPI_COMM_WORLD):  - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 341 failed on node n0 (10.0.1.1) with exit status 1.
-----------------------------------------------------------------------------
calculon$ 
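
One detail in the output above may be worth noting: the argc value 16777216 reported by ranks 1 and 3 is 0x01000000, which is what the 32-bit integer 1 looks like when its bytes are read in the opposite order. That would be consistent with an integer moving between the big-endian PPC and little-endian Intel machines as raw bytes without conversion, though the output alone does not prove it. A minimal sketch of the arithmetic, where swap32 is a hypothetical helper written for illustration and not part of LAM/MPI or Gromacs:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only (not a LAM/MPI or Gromacs
 * function): reverse the byte order of a 32-bit unsigned value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |  /* lowest byte becomes highest */
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);   /* highest byte becomes lowest */
}
```

Under that assumption, swap32(1) evaluates to 16777216, matching the argc printed by ranks 1 and 3, while ranks 0 and 2 print the unswapped value 1.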


Warner Yuen

Research Computing Consultant

Apple Computer

email: wyuen@apple.com

Tel: 408.718.2859

Fax: 408.715.0133


