Thread: LAM: MPI_Init failure




LAM: MPI_Init failure
user name
2006-05-30 16:08:54
Hi,
 
I'm using LAM/MPI 7.1.2 and have the LAM daemons running.
My main (master) process seems to be failing at MPI_Init.  Here is my command and output.
I've added some debug fprintf calls to the source code.  It's failing at kinit(), but the debug
output from inside that function is not appearing, and I don't know why.  Any idea why kinit is failing?
 
$ mpirun C -v -vv -ssi rpi tcp /sbox/yamend/r33/amd64_linux24/64/bin/TWTgen
IN kinit - regiester_pid getpid = 25833
IN kinit - _kid.ki_pid = 0
IN kinit - calling _ksig_init
25834 /sbox/yamend/r33/amd64_linux24/64/bin/TWTgen running on n0 (o)
mpirun: waiting for MPI_INIT from 1 processes...
Calling MPI_Init
IN MPI_Init - calling lam_setfunc
IN MPI_Init - returned from lam_setfunc
IN MPI_Init - calling lam_mpi_init
IN lam_mpi_init - calling Initialized
IN lam_mpi_init - Initialized = 0
IN lam_mpi_init - Finalized = 0
IN lam_mpi_init - calling lam_tv_load_type_defs
IN lam_mpi_init - calling lam_linit
IN lam_linit - calling kenter /sbox/yamend/r33/amd64_linux24/64/bin/TWTgen
IN lam_linit - before kenter errno = 0
IN kenter - calling kinit
IN lam_linit - errno = 471
IN lam_linit - returning errno=1239
IN lam_linit - returning LAMERROR=-1
mpirun: someone died before MPI_INIT -- rank 0
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).
 
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE).  You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
mpirun: receiving 0 useless MPI_INIT/MPI_FINALIZE messages...
LAM: MPI_Init failure
user name
2006-06-08 03:58:59
On May 30, 2006, at 10:08 AM, YoungHui Amend wrote:

> I'm using the LAM/MPI 7.1.2.  I have the lam daemons running.
> My main (master) process seems to be failing at MPI_Init.  Here is
> my command and output.
> I've added some debug fprintf in the source code.  It's failing at
> kinit() but my debug printf are
> not coming out of that function.  I don't know why.  Any idea why
> kinit is failing??????


> IN lam_linit - calling kenter /sbox/yamend/r33/amd64_linux24/64/bin/TWTgen
> IN lam_linit - before kenter errno = 0
> IN kenter - calling kinit
> IN lam_linit - errno = 471
> IN lam_linit - returning errno=1239

It looks like kinit is failing because it couldn't contact the kernel
process.  This could happen if you are trying to start a whole lot of
processes (more than 64) on one node.  Other than that, I'm not
really sure why it could be failing at that point.  Could you try
stepping through the code with a debugger to figure out where the
error first occurs?  An "easy" way to do this would be to start your
application in gdb running in an xterm:

   mpirun -np 1 xterm -e gdb ./myapp

Can you include some more information about the platform you are
using and which compilers you used to build LAM?
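
While narrowing this down with fprintf-style tracing, one general C pitfall
is worth ruling out: library calls such as fprintf may modify errno even when
they succeed, so a trace line placed between a failing call and the point
where errno is read can change the value that gets reported.  The sketch below
is only an illustration of that point, not a diagnosis of the kinit failure
and not LAM code.  The names trace(), failing_call() and traced_call() are
hypothetical, and failing_call() is a stand-in for whatever call is being
instrumented.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not part of LAM/MPI): print a trace line to stderr,
 * flush it right away, and restore errno so that the tracing itself cannot
 * change the value the caller inspects afterwards. */
static void trace(const char *fmt, ...)
{
    int saved_errno = errno;
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fflush(stderr);

    errno = saved_errno;
}

/* Hypothetical stand-in for the instrumented call: fails and sets errno. */
static int failing_call(void)
{
    errno = EINVAL;
    return -1;
}

/* Calls failing_call() and returns the errno it left behind, captured
 * immediately after the call and before any further library calls. */
static int traced_call(void)
{
    errno = 0;
    trace("before call errno = %d\n", errno);

    int rc = failing_call();
    int err = errno;            /* capture first, trace second */

    trace("after call rc = %d errno = %d\n", rc, err);
    return err;
}
```

Capturing errno into a local variable directly after the call, and only then
printing, keeps the reported value tied to the call under suspicion.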


Thanks,

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/


_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/