
Thread: LAM: Error: "Fatal error in MPI_Recv: Other MPI error, error stack"... What's wrong?!




2007-07-25 13:44:56
I am having problems with mpiJava (which uses the MPICH2 implementation).

When packing the pixels of 11 MB images, the program works very well, but with the pixels of 21 MB images it does not work correctly.

With 3 PCs and 3 processes, it's OK! But with 3 PCs and 4 processes, the following errors occur:

 

mpirun -np 4 java -Xmx300M Med21MB3x3

 

[cli_2]: aborting job:

Fatal error in MPI_Recv: Other MPI error, error stack:

MPI_Recv(186).............................: MPI_Recv(buf=0xb0c35008, count=14617628, MPI_BYTE, src=0, tag=902, MPI_COMM_WORLD, status=0x876fa50) failed

MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()

MPIDI_CH3I_Progress_handle_sock_event(413):

MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer)

[cli_1]: aborting job:

Fatal error in MPI_Recv: Other MPI error, error stack:

MPI_Recv(186).............................: MPI_Recv(buf=0xb0c1d008, count=14606208, MPI_BYTE, src=0, tag=901, MPI_COMM_WORLD, status=0x9a81818) failed

MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()

MPIDI_CH3I_Progress_handle_sock_event(413):

MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)

rank 2 in job 10  lab07_15_33967 caused collective abort of all ranks

  exit status of rank 2: return code 1

rank 1 in job 10  lab07_15_33967 caused collective abort of all ranks

  exit status of rank 1: return code 1

rank 0 in job 10  lab07_15_33967 caused collective abort of all ranks

  exit status of rank 0: killed by signal 9

 

What's wrong?

 

Thanks,

 

Priscila.


 



--
(>';''''<)  
(  ' ; ' )  
()()  Prí
Re: LAM: Error: "Fatal error in MPI_Recv: Other MPI error, error stack"... What's wrong?!
United States
2007-07-25 15:14:53
On Jul 25, 2007, at 12:44 PM, Priscila Saito wrote:

> I have problems with mpiJava (that use MPICH2 implementation).
>
> Packing pixels of images with 11MB, the same program works very
> well, but using pixels of images with 21MB, it doesn't work correctly.
>
> With 3 pcs and 3 processes, it's Ok!  But with 3 pcs and 4
> processes, occur the followings errors:

This mailing list is for issues relating to the use of LAM/MPI.  You
should contact the MPICH2 support mailing list (it's on their web
page) for help with MPICH2.

     http://www-unix.mcs.anl.gov/mpi/mpich2/


Brian

-- 
   Brian Barrett
   LAM/MPI Developer
   Make today a LAM/MPI day!


_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
