Thread: LAM: MPI_Broadcast misunderstanding

LAM: MPI_Broadcast misunderstanding
2006-11-23 16:44:35
Hello everybody.

I'm still on my MPI learning challenge. Now I'm trying to learn to use the
MPI_Bcast call.

I guess I must have a deep conceptual problem here. I'm trying to write a very
simple program in which node 0 computes random numbers in a variable called
"aux" and broadcasts its value to all the nodes in the MPI world until this
"aux" number reaches 5. I've done this:
------------------------------- broad.c ------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

int main (int argc, char *argv[]){
    int rank, size, node, aux=0, Counter=0, request;
    MPI_Status status;

    MPI_Init (&argc, &argv);                /* starts MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);  /* get current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size);  /* get number of processes */

    srand (time(NULL));

    while (aux != 5){
        if (rank == 0){
            aux = (rand()/200000000);
            printf ("New Aux = %d\n", aux);
            MPI_Bcast (&aux, 1, MPI_INT, 0, MPI_COMM_WORLD);
        }
        printf ("I'm node %d, and I've got an 'aux' value of: %d\n", rank, aux);
    }
    printf ("Aux has reached %d in node %d\n", aux, rank);
    return MPI_Finalize ();
}
---------------------------------------------------------------
So I thought that when node 0 calculates a new "aux" value, it would send it to
all the processes in the world (ehm... actually, only nodes 0 and 1), but it
doesn't work. If I execute this, node 1 doesn't update its "aux" value and the
system goes into an infinite loop, outputting:
New Aux = 8
I'm node 0, and I've got an 'aux' value of: 8
New Aux = 6
I'm node 0, and I've got an 'aux' value of: 6
New Aux = 5
I'm node 0, and I've got an 'aux' value of: 5
Aux has reached 5 in node 0
I'm node 1, and I've got an 'aux' value of: 0
I'm node 1, and I've got an 'aux' value of: 0
I'm node 1, and I've got an 'aux' value of: 0
I'm node 1, and I've got an 'aux' value of: 0
I'm node 1, and I've got an 'aux' value of: 0
I'm node 1, and I've got an 'aux' value of: 0
So it seems that node 0 calculates properly, but it's as if the broadcast
message isn't received by node 1.

I also tried to stop it somehow with an MPI_Barrier:
---------------------- barrier.c --------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

void stop(){
    //sleep (1);
    MPI_Barrier (MPI_COMM_WORLD);
}

int main (int argc, char *argv[]){
    int rank, size, node, aux=0, Counter=0, request;
    MPI_Status status;

    MPI_Init (&argc, &argv);                /* starts MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);  /* get current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size);  /* get number of processes */

    srand (time(NULL));

    while (aux != 5){
        if (rank == 0){
            aux = (rand()/200000000);
            printf ("New Aux = %d\n", aux);
            MPI_Bcast (&aux, 1, MPI_INT, 0, MPI_COMM_WORLD);
        }
        printf ("I'm node %d, and I've got an 'aux' value of: %d\n", rank, aux);
        stop();
    }
    printf ("Aux has reached %d in node %d\n", aux, rank);
    return MPI_Finalize ();
}
---------------------------------------------------
But then I have the same problem: aux in node 1 is not updated, and when aux in
node 0 reaches 5, node 0 stops and I get the following error:
Aux has reached 5 in node 0
I'm node 1, and I've got an 'aux' value of: 0
MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Recv()
Rank (1, MPI_COMM_WORLD): - MPI_Barrier()
Rank (1, MPI_COMM_WORLD): - main()
The only way to solve it is to put the MPI_Bcast call outside the
if (rank==0) block:
------------------------ bcast2.c -----------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

int main (int argc, char *argv[]){
    int rank, size, node, aux=0, Counter=0, request;
    MPI_Status status;

    MPI_Init (&argc, &argv);                /* starts MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);  /* get current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size);  /* get number of processes */

    srand (time(NULL));

    while (aux != 5){
        if (rank == 0){
            aux = (rand()/200000000);
            printf ("New Aux = %d\n", aux);
        }
        MPI_Bcast (&aux, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf ("I'm node %d, and I've got an 'aux' value of: %d\n", rank, aux);
    }
    printf ("Aux has reached %d in node %d\n", aux, rank);
    return MPI_Finalize ();
}
------------------------------------------------------
Then yes, it works properly:
New Aux = 9
I'm node 0, and I've got an 'aux' value of: 9
New Aux = 9
I'm node 1, and I've got an 'aux' value of: 9
I'm node 0, and I've got an 'aux' value of: 9
New Aux = 0
I'm node 1, and I've got an 'aux' value of: 9
I'm node 0, and I've got an 'aux' value of: 0
I'm node 1, and I've got an 'aux' value of: 0
New Aux = 5
I'm node 0, and I've got an 'aux' value of: 5
I'm node 1, and I've got an 'aux' value of: 5
Aux has reached 5 in node 0
Aux has reached 5 in node 1
What I would like to do with MPI_Bcast is something like what I can do with a
Send/Receive pair, which looks like this:
---------------------- bcastWithoutBcast.c ------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

int main (int argc, char *argv[]){
    int rank, size, node, aux=0, Counter=0, request;
    MPI_Status status;

    MPI_Init (&argc, &argv);                /* starts MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);  /* get current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size);  /* get number of processes */

    srand (time(NULL));

    while (aux != 5){
        if (rank == 0){
            aux = (rand()/200000000);
            printf ("New Aux = %d\n", aux);
            MPI_Send (&aux, 1, MPI_INT, 1, 100, MPI_COMM_WORLD);
        }else {
            MPI_Recv (&aux, 1, MPI_INT, 0, 100, MPI_COMM_WORLD, &status);
        }
        printf ("I'm node %d, and I've got an 'aux' value of: %d\n", rank, aux);
    }
    printf ("Aux has reached %d in node %d\n", aux, rank);
    return MPI_Finalize ();
}
-----------------------------------------------------
But if I try with MPI_Bcast inside the if (rank==0) block, I can't, and I don't
understand why. It's as if node 1 did not listen to the broadcast
instruction...

Thank you very much for any help you can give me.
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/

LAM: MPI_Broadcast misunderstanding
2006-11-26 18:05:20
MPI_Bcast is a collective operation - that is, all ranks in a communicator
(e.g. MPI_COMM_WORLD) must participate in the operation. This is why your first
example doesn't work: only rank 0 in MPI_COMM_WORLD is calling MPI_Bcast, when
all ranks should be calling it (as in your bcast2.c example).

Regarding your thought process in the first example: how do you expect the
non-zero ranks to receive anything when they never indicate to MPI that data
should be received? Their aux variable is never going to change from its
initial value, thus the infinite loop.
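
To make that concrete, here is a rough single-process model in plain C. It is
not MPI code and does not reproduce what an MPI library really does when a
collective is called by only some ranks; it only mimics what happens to rank
1's copy of aux. The function name rank1_terminates and its parameters are
invented for this sketch.

```c
#include <assert.h>

/*
 * Single-process model of the while (aux != 5) loop; NOT MPI code.
 * draws[] holds the values rank 0 would compute on successive iterations.
 * If others_call_bcast is nonzero, rank 1 receives each value (as in
 * bcast2.c); otherwise its copy of aux keeps its initial value 0 (as in
 * broad.c). Returns 1 if rank 1 leaves the loop within ndraws iterations.
 */
int rank1_terminates(const int *draws, int ndraws, int others_call_bcast)
{
    int aux1 = 0;                   /* rank 1's private copy of aux */

    assert(ndraws >= 0);
    for (int i = 0; i < ndraws && aux1 != 5; i++) {
        if (others_call_bcast)
            aux1 = draws[i];        /* value delivered by the broadcast */
        /* else: nothing was received, so aux1 is unchanged */
    }
    return aux1 == 5;
}
```

With the draws 8, 6, 5 from the first output above, the model leaves the loop
only when rank 1 takes part in the broadcast; otherwise aux1 stays at 0
however many draws rank 0 makes.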
If it helps, MPI_Bcast can be thought of in terms of send/recv: the root rank
(0 in your examples) does NP-1 sends, one to each of the other ranks in the
specified communicator. Each of these NP-1 other ranks performs a single
MPI_Recv operation. The idea behind collectives is to make it easy for the user
to do more complex (but common) communication patterns, and to allow the MPI
implementation to optimize them internally.
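
The send/recv picture can be sketched as a toy model in plain C. Again, this is
not MPI code and real implementations are free to use smarter algorithms. The
array bufs stands for each rank's private copy of the variable, and listening
marks which ranks have posted the matching receive. The name model_bcast and
both arrays are assumptions made for this illustration.

```c
#include <assert.h>

/*
 * Single-process model of a linear broadcast; NOT MPI code.
 * bufs[r] is the private copy of the variable held by rank r.
 * The root performs one "send" to every other rank; rank r only gets
 * the value if it posted the matching "receive" (listening[r] != 0).
 * Returns the number of ranks that received the value.
 */
int model_bcast(int *bufs, const int *listening, int np, int root)
{
    int delivered = 0;

    assert(np > 0 && root >= 0 && root < np);
    for (int r = 0; r < np; r++) {
        if (r == root)
            continue;               /* the root already holds the value */
        if (listening[r]) {         /* rank r takes part in the operation */
            bufs[r] = bufs[root];
            delivered++;
        }
    }
    return delivered;
}
```

When every rank listens, the root's value reaches all NP-1 other ranks. When
rank 1 never listens, as in broad.c, its buffer keeps the old value.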
And if you really want to do the branch thing like you do for send/recv, you
could do this (though it's duplicated code):

if (rank == 0){
    aux = (rand()/200000000);
    printf ("New Aux = %d\n", aux);
    MPI_Bcast (&aux, 1, MPI_INT, 0, MPI_COMM_WORLD);
}else {
    MPI_Bcast (&aux, 1, MPI_INT, 0, MPI_COMM_WORLD);
}

Andrew

460853 unizar.es wrote:
> [full quote of the original message snipped]