
Thread: LAM: MPI error message

LAM: MPI error message
United States
2007-05-09 11:12:01
I am getting this error when I run my code with LAM.  I was using this
code on another system that was running a slightly older MPICH and
didn't get any errors like this.  It would seem there is something
wrong with the way I am sending and receiving slices.  Can you see
anything obviously wrong with the way I am doing this?

* Starting updates
* cycle 1
MPI_Recv: invalid tag argument: Invalid argument (rank 0, MPI_COMM_WORLD)
MPI_Send: invalid tag argument: Invalid argument: out of range (rank 1, MPI_COMM_WORLD)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD):  - MPI_Send()
Rank (1, MPI_COMM_WORLD):  - main()
Rank (0, MPI_COMM_WORLD):  - MPI_Recv()
Rank (0, MPI_COMM_WORLD):  - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 22373 failed on node n0 (127.0.0.1) with exit status 22.
-----------------------------------------------------------------------------
mpirun failed with exit status 22

===========================code==========================================
void hSndRcv(){
        if(my_rank != comm_size-1){
                MPI_Send(h_x+Z_OFFSET(my_dim_z),
                        (dim_x + 2*pml)*(dim_y + 2*pml), 
                        MPI_FLOAT, 
                        my_rank+1, 
                        3, 
                        MPI_COMM_WORLD);
                MPI_Send(h_y+Z_OFFSET(my_dim_z),
                        (dim_x + 2*pml)*(dim_y + 2*pml), 
                        MPI_FLOAT, 
                        my_rank+1, 
                        4, 
                        MPI_COMM_WORLD);
                MPI_Send(h_z+Z_OFFSET(my_dim_z),
                        (dim_x + 2*pml)*(dim_y + 2*pml), 
                        MPI_FLOAT, 
                        my_rank+1, 
                        5, 
                        MPI_COMM_WORLD);
        }
        if(my_rank){
                MPI_Recv(h_x,
                        (dim_x + 2*pml)*(dim_y + 2*pml), 
                        MPI_FLOAT,  
                        my_rank-1, 
                        3, 
                        MPI_COMM_WORLD, 
                        status);
                MPI_Recv(h_y,
                        (dim_x + 2*pml)*(dim_y + 2*pml), 
                        MPI_FLOAT,  
                        my_rank-1, 
                        4, 
                        MPI_COMM_WORLD, 
                        status);
                MPI_Recv(h_z,
                        (dim_x + 2*pml)*(dim_y + 2*pml), 
                        MPI_FLOAT,  
                        my_rank-1, 
                        5, 
                        MPI_COMM_WORLD, 
                        status);
        }
}

Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945


_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/

Re: LAM: MPI error message
United States
2007-05-10 20:52:56
Well, that's pretty kooky.  :-(

Here's the code from MPI_SEND that's generating the error:

	if (tag < 0 || tag > lam_mpi_max_tag) {
		return(lam_err_comm(comm, MPI_ERR_TAG, EINVAL,
				    "out of range"));
	}

But according to your code, that can't be happening because your tags
are fixed positive integers (lam_mpi_max_tag is at least 32k).
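
To make the range check concrete, here is a minimal stand-alone sketch
of the same test. The helper name send_tag_in_range and the constant
MIN_GUARANTEED_TAG_UB are made up for illustration and are not part of
LAM or MPI; the tag_ub parameter stands in for whatever upper bound the
implementation reports (lam_mpi_max_tag above), which the MPI standard
requires to be at least 32767.

```c
#include <stdbool.h>

/* Smallest tag upper bound an MPI implementation may report
 * (the MPI standard requires MPI_TAG_UB >= 32767). */
#define MIN_GUARANTEED_TAG_UB 32767

/* Hypothetical helper, not a LAM or MPI function: applies the same
 * test as the LAM snippet above.  A send tag is acceptable only if it
 * lies in the closed range [0, tag_ub]. */
bool send_tag_in_range(int tag, int tag_ub)
{
    return tag >= 0 && tag <= tag_ub;
}
```

With this check, the tags 3, 4 and 5 from the posted code pass, while
any negative value fails, whatever upper bound the implementation uses.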

Are you absolutely certain that this is where the problem is occurring?

You might want to run this through a debugger to verify a) that this
is where the problem is occurring, and b) what LAM thinks it's getting
as a tag value.  Or you could write some quick MPI_Send / MPI_Recv
intercept functions that utilize the PMPI layer, perhaps something
like this:

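#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

/* Intercepts MPI_Send, traps on a suspicious tag, then forwards the
   call to the real implementation through the PMPI layer. */
int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest,
             int tag, MPI_Comm comm)
{
    if (tag < 0 || tag > 32767) {
        char host[4096];
        volatile int i = 0;   /* set to nonzero from the debugger to continue */

        gethostname(host, sizeof(host));
        printf("%s:%d: got invalid tag in MPI_Send! %d\n",
               host, (int) getpid(), tag);
        fflush(stdout);
        /* Spin here so a debugger can be attached to this process. */
        while (i == 0) sleep(5);
    }
    return PMPI_Send(buf, count, dtype, dest, tag, comm);
}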

(disclaimer: typed in e-mail; not verified!)

This will print out the host/pid of the offending process(es) and
pause, allowing you to attach a debugger.  Modify the inner part of
the block to suit your particular debugging tastes.




-- 
Jeff Squyres
Cisco Systems


Re: LAM: MPI error message
United States
2007-05-11 13:26:31
Sorry, you were right.  I thought that I had commented out all of the
MPI communication except what I had posted below, but it turned out
that I had another little function hiding out that was sending a
couple of floats, and it had a negative tag.  I forgot about that one.
For some reason, I guess I was thinking that the tag only had to be an
int and not necessarily an unsigned int.
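
One way to avoid a stray negative tag like this is to keep every tag
in a single enum rather than scattering literals across functions.
The sketch below is only an illustration: the names field_tag, TAG_H_X,
TAG_H_Y and TAG_H_Z are invented here, and only the values 3, 4 and 5
come from the hSndRcv() code posted earlier. It assumes a C11 compiler
for _Static_assert.

```c
/* Hypothetical tag names; the values 3, 4 and 5 are the ones used in
 * the hSndRcv() function posted earlier in this thread. */
enum field_tag {
    TAG_H_X = 3,
    TAG_H_Y = 4,
    TAG_H_Z = 5
};

/* Compile-time guard (C11): every tag must be non-negative and no
 * larger than 32767, the smallest upper bound MPI guarantees. */
_Static_assert(TAG_H_X >= 0 && TAG_H_X <= 32767, "TAG_H_X out of range");
_Static_assert(TAG_H_Y >= 0 && TAG_H_Y <= 32767, "TAG_H_Y out of range");
_Static_assert(TAG_H_Z >= 0 && TAG_H_Z <= 32767, "TAG_H_Z out of range");
```

A new tag added with a negative value would then fail to compile, as
long as it gets its own _Static_assert line.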

Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945


Re: LAM: MPI error message
United States
2007-05-11 13:40:56
That's the dichotomy of the MPI spec -- there are many places where
the parameters are "int", but they really should be other types (e.g.,
signed or unsigned, specifically-sized such as int32_t, etc.).
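
As a small illustration of the same point, count arguments are also
plain "int", so a length held in a wider unsigned type has to be
range-checked by the caller before it is passed on.  The helper below
is a hypothetical sketch, not part of MPI or LAM; the name
to_mpi_count is made up.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, not an MPI function: converts an element count
 * held in a size_t to the int that MPI count parameters take.
 * Returns 0 and stores the value in *count_out on success; returns -1
 * and leaves *count_out untouched if n does not fit in an int. */
int to_mpi_count(size_t n, int *count_out)
{
    if (n > (size_t) INT_MAX) {
        return -1;
    }
    *count_out = (int) n;
    return 0;
}
```

The caller decides what to do on failure, for example splitting the
transfer into several smaller sends.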

MPI is fun!  




-- 
Jeff Squyres
Cisco Systems
