
Re: LAM: MPI_BARRIER problem





2008-05-07 19:42:04
On May 7, 2008, at 8:36 PM, richard pan wrote:

> MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
> --------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code.  This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.


I believe this error message says it all -- one of your processes has
died.

Specifically: MPI_BARRIER isn't what caused your app to die;
MPI_BARRIER is the function that noticed that the other process was
dead, reported the problem, and then aborted all remaining MPI
processes.
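
As a rough illustration of the difference between the process that
dies and the code that notices it, here is a minimal sketch using plain
POSIX fork/waitpid. It is only an analogy: it is not how LAM or mpirun
is implemented, and the helper name observe_child_exit is hypothetical.
The child stands in for a rank that exits; the parent stands in for
whatever reports the failure afterwards.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of LAM/MPI): start one child that exits
 * with child_code and return what the parent observes.
 * Returns -1 if fork() or waitpid() fails. */
int observe_child_exit(int child_code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
        _exit(child_code);      /* child: stand-in for a rank that exits */

    /* Parent: it did nothing wrong, it merely notices the child's fate. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (WIFEXITED(status))
        return WEXITSTATUS(status);     /* normal exit: the exit code */
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);  /* killed by a signal; 128+N is
                                           just the usual shell convention */
    return -1;
}
```

A nonzero value here says only that the child ended badly, not why; the
cause still has to be found in the child itself.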

Your program is a bit too long for me to debug; Brian's advice of
running through debuggers is probably your best bet.  Also check for
corefiles that may indicate where your program died.
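
If no corefiles show up, one possible reason is that the core file size
limit is set to zero. The sketch below, assuming a POSIX system, raises
the soft limit to the hard limit from inside the program; the helper
name enable_core_dumps is hypothetical, and whether a core file is
actually written still depends on the hard limit and on how the system
is configured.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit so that a crash can leave a core file behind.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }

    /* An unprivileged process may raise its soft limit up to the hard
     * limit. If the hard limit is 0, core files stay disabled. */
    lim.rlim_cur = lim.rlim_max;

    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling this once near the start of main() in each process would be one
way to use it; setting the limit in the shell that launches the job is
an alternative.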

-- 
Jeff Squyres
Cisco Systems

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
