Thread: LAM: Error: "Fatal error in MPI_Recv: Other MPI error, error stack"... What's wrong?!
2007-07-25 13:44:56
I have a problem with mpiJava (which uses the MPICH2 implementation).
When packing the pixels of 11 MB images, the program works very well, but with the pixels of 21 MB images it does not work correctly.
With 3 PCs and 3 processes, it's OK! But with 3 PCs and 4 processes, the following errors occur:
mpirun -np 4 java -Xmx300M Med21MB3x3
[cli_2]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0xb0c35008, count=14617628, MPI_BYTE, src=0, tag=902, MPI_COMM_WORLD, status=0x876fa50) failed
MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(413):
MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer)
[cli_1]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0xb0c1d008, count=14606208, MPI_BYTE, src=0, tag=901, MPI_COMM_WORLD, status=0x9a81818) failed
MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(413):
MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
rank 2 in job 10 lab07_15_33967 caused collective abort of all ranks
exit status of rank 2: return code 1
rank 1 in job 10 lab07_15_33967 caused collective abort of all ranks
exit status of rank 1: return code 1
rank 0 in job 10 lab07_15_33967 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
What's wrong?
Thanks,
Priscila.
-- (>';''''<) ( ' ; ' ) ( )( ) Prí
Re: LAM: Error: "Fatal error in MPI_Recv: Other MPI error, error stack"... What's wrong?!
2007-07-25 15:14:53
On Jul 25, 2007, at 12:44 PM, Priscila Saito wrote:
> I have problems with mpiJava (that use MPICH2 implementation).
>
> Packing pixels of images with 11MB, the same program works very
> well, but using pixels of images with 21MB, it doesn't work correctly.
>
> With 3 pcs and 3 processes, it's Ok! But with 3 pcs and 4
> processes, occur the followings errors:
This mailing list is for issues relating to the use of LAM/MPI. You
should contact the MPICH2 support mailing list (it's on their web
page) for help with MPICH2.
http://www-unix.mcs.anl.gov/mpi/mpich2/
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/