Thread: LAM: MPI error mesage
|
|
| LAM: MPI error mesage |
  United States |
2007-05-09 11:12:01 |
I am getting this error when I run my code with LAM. I was using this
code on another system that was running a slightly older MPICH and
didn't get any errors like this. It would seem there is something wrong
with the way I am sending and receiving slices. Can you see anything
obviously wrong with the way I am doing this?
* Starting updates
* cycle 1
MPI_Recv: invalid tag argument: Invalid argument (rank 0, MPI_COMM_WORLD)
MPI_Send: invalid tag argument: Invalid argument: out of range (rank 1, MPI_COMM_WORLD)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Send()
Rank (1, MPI_COMM_WORLD): - main()
Rank (0, MPI_COMM_WORLD): - MPI_Recv()
Rank (0, MPI_COMM_WORLD): - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 22373 failed on node n0 (127.0.0.1) with exit status 22.
-----------------------------------------------------------------------------
mpirun failed with exit status 22

===========================code==========================================
void hSndRcv(){
    if(my_rank != comm_size-1){
        MPI_Send(h_x+Z_OFFSET(my_dim_z),
                 (dim_x + 2*pml)*(dim_y + 2*pml),
                 MPI_FLOAT,
                 my_rank+1,
                 3,
                 MPI_COMM_WORLD);
        MPI_Send(h_y+Z_OFFSET(my_dim_z),
                 (dim_x + 2*pml)*(dim_y + 2*pml),
                 MPI_FLOAT,
                 my_rank+1,
                 4,
                 MPI_COMM_WORLD);
        MPI_Send(h_z+Z_OFFSET(my_dim_z),
                 (dim_x + 2*pml)*(dim_y + 2*pml),
                 MPI_FLOAT,
                 my_rank+1,
                 5,
                 MPI_COMM_WORLD);
    }
    if(my_rank){
        MPI_Recv(h_x,
                 (dim_x + 2*pml)*(dim_y + 2*pml),
                 MPI_FLOAT,
                 my_rank-1,
                 3,
                 MPI_COMM_WORLD,
                 status);
        MPI_Recv(h_y,
                 (dim_x + 2*pml)*(dim_y + 2*pml),
                 MPI_FLOAT,
                 my_rank-1,
                 4,
                 MPI_COMM_WORLD,
                 status);
        MPI_Recv(h_z,
                 (dim_x + 2*pml)*(dim_y + 2*pml),
                 MPI_FLOAT,
                 my_rank-1,
                 5,
                 MPI_COMM_WORLD,
                 status);
    }
}

Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
|
|
| Re: LAM: MPI error mesage |
  United States |
2007-05-10 20:52:56 |
Well, that's pretty kooky. :-(
Here's the code from MPI_SEND that's generating the error:

    if (tag < 0 || tag > lam_mpi_max_tag) {
        return(lam_err_comm(comm, MPI_ERR_TAG, EINVAL,
                            "out of range"));
    }

But according to your code, that can't be happening because your tags
are fixed positive integers (lam_mpi_max_tag is at least 32k).
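
To make the range check above concrete, here is a minimal sketch of the same test as a standalone C helper. The function names and the `GUARANTEED_TAG_UB` macro are hypothetical and are not part of LAM; the only fact taken from MPI itself is that the tag upper bound (the `MPI_TAG_UB` attribute) is guaranteed to be at least 32767. The "any tag" wildcard value is passed in as a parameter because its numeric value is implementation-defined.

```c
#include <stdbool.h>

/* Assumption for illustration: MPI guarantees MPI_TAG_UB >= 32767, so
 * tags in [0, 32767] are portable across implementations. */
#define GUARANTEED_TAG_UB 32767

/* Hypothetical helper: true if `tag` is legal for a send, given the
 * implementation's upper bound `tag_ub`. */
static bool send_tag_is_valid(int tag, int tag_ub)
{
    return tag >= 0 && tag <= tag_ub;
}

/* Hypothetical helper: a receive additionally accepts the wildcard tag
 * (MPI_ANY_TAG in a real program), passed here as `any_tag`. */
static bool recv_tag_is_valid(int tag, int tag_ub, int any_tag)
{
    return tag == any_tag || send_tag_is_valid(tag, tag_ub);
}
```

In a real program the upper bound would be read from the `MPI_TAG_UB` attribute of the communicator rather than hard-coded.
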

Are you absolutely certain that this is where the problem is occurring?

You might want to either run this through a debugger to verify a) that
this is where the problem is occurring, and b) what LAM thinks it is
getting as a tag value. Or you could write some quick MPI_Send /
MPI_Recv intercept functions that use the PMPI layer, perhaps
something like this:

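#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

/* MPI-3 and later declare buf as "const void *". */
int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest,
             int tag, MPI_Comm comm)
{
    if (tag < 0 || tag > 32767) {
        char host[4096];
        volatile int i = 0;  /* set to non-zero from the debugger to continue */

        gethostname(host, sizeof(host));
        printf("%s:%d: got invalid tag in MPI_Send! %d\n",
               host, (int) getpid(), tag);
        fflush(stdout);
        while (i == 0) sleep(5);
    }
    return PMPI_Send(buf, count, dtype, dest, tag, comm);
}
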
(disclaimer: typed in e-mail; not verified!)

This will print out the host/pid of the offending process(es) and
pause, allowing you to attach a debugger. Modify the inner part of
the block to suit your particular debugging tastes.
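
The "pause until a debugger attaches" part of the intercept can be pulled out on its own. The following is only a sketch: `wait_for_debugger` and its parameters are hypothetical names, and the bounded poll count is an addition so that the loop cannot hang forever in unattended runs. The flag is accessed through a `volatile` pointer so the compiler does not optimize the loop away while the debugger changes the value.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: poll `*released` until it becomes non-zero
 * (e.g. set from an attached debugger), sleeping `poll_seconds`
 * between polls. If `max_polls` is non-zero, give up after that many
 * polls. Returns 1 if released, 0 if it gave up. */
static int wait_for_debugger(volatile int *released, unsigned max_polls,
                             unsigned poll_seconds)
{
    unsigned polls = 0;

    fprintf(stderr, "pid %ld: waiting for a debugger to attach\n",
            (long) getpid());
    while (*released == 0) {
        if (max_polls != 0 && polls >= max_polls)
            return 0;           /* nobody attached in time */
        sleep(poll_seconds);
        polls++;
    }
    return 1;
}
```

With gdb, for example, one would attach to the printed pid, select the frame that owns the flag, set the flag to a non-zero value, and continue.
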
--
Jeff Squyres
Cisco Systems
|
|
| Re: LAM: MPI error mesage |
  United States |
2007-05-11 13:26:31 |
Sorry, you were right. I thought that I had commented out all of the
MPI communication except what I had posted below, but it turned out
that I had another little function hiding out that was sending a couple
of floats, and it had a negative tag. I forgot about that one. For some
reason, I guess I was thinking that the tag only had to be an int and
not necessarily an unsigned int.
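
One way to keep a stray negative tag from hiding in a forgotten function is to define every tag in one place and check the range at compile time. This is only a sketch: the enum and its member names are hypothetical, the values 3, 4 and 5 are the ones used in hSndRcv(), and 32767 is the minimum tag upper bound that MPI guarantees. It assumes a C11 compiler for `_Static_assert`.

```c
/* Hypothetical names; the values match the tags used in hSndRcv(). */
enum field_tag {
    TAG_H_X = 3,
    TAG_H_Y = 4,
    TAG_H_Z = 5,
    TAG_FIRST = TAG_H_X,   /* smallest tag in use */
    TAG_LAST  = TAG_H_Z    /* largest tag in use */
};

/* Compile-time range check: a negative or oversized tag added to the
 * enum (with TAG_FIRST/TAG_LAST kept up to date) fails the build
 * instead of failing at run time. */
_Static_assert(TAG_FIRST >= 0, "MPI tags must be non-negative");
_Static_assert(TAG_LAST <= 32767, "tag exceeds the portable upper bound");
```

Calls would then pass `TAG_H_X` and friends instead of bare integer literals.
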
Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945
|
|
| Re: LAM: MPI error mesage |
  United States |
2007-05-11 13:40:56 |
That's the dichotomy of the MPI spec -- there are many places where the
parameters are "int", but they really should be other types (e.g.,
signed or unsigned, specifically sized such as int32_t, etc.).

MPI is fun!
--
Jeff Squyres
Cisco Systems