
Thread: LAM: lam Digest, Vol 813, Issue 1




LAM: lam Digest, Vol 813, Issue 1
user name
2006-11-22 17:12:06
How can we debug this kind of error?

The message is not very descriptive:
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD):  - MPI_Recv()
> Rank (1, MPI_COMM_WORLD):  - main()

Would gdb or Valgrind help?

-----Original Message-----
From: lam-bounceslam-mpi.org [mailto:lam-bounceslam-mpi.org] On Behalf Of Nam Hoang
Sent: Tuesday, November 21, 2006 11:39 p.m.
To: lamlam-mpi.org
Subject: Re: LAM: lam Digest, Vol 813, Issue 1

Hi Hector,
I think your program is not written correctly.
First, you should allocate memory for message (for
example, char message[12], or use malloc).
Second, the value of message should be initialized in
the code that only node 0 (the sending node) runs:
if (rank == 0)
    {
      /* "Hello world" plus its terminating NUL is exactly 12 bytes */
      strcpy (message, "Hello world");
      for (i = 1; i < size; i++)
        {
          MPI_Send (message, 12, MPI_CHAR, i, tag, MPI_COMM_WORLD);
        }
    }

Hope this helps.
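
An editorial sketch of the sizing point above, kept in plain C so that it runs without an MPI installation: a 12-byte buffer holds exactly "Hello world" plus its terminating NUL, and any longer string overflows it. Deriving the size from the literal avoids counting by hand. The names GREETING, GREETING_LEN and copy_greeting are hypothetical and do not appear in the original program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
#define GREETING "Hello world"

/* sizeof on a string literal counts the terminating NUL: 11 + 1 = 12. */
#define GREETING_LEN sizeof(GREETING)

/* Copy the greeting into buf only if it fits.
   Returns the number of bytes written (including the NUL),
   or 0 when the buffer is too small. */
size_t copy_greeting(char *buf, size_t cap)
{
    if (cap < GREETING_LEN)
        return 0;
    memcpy(buf, GREETING, GREETING_LEN);
    return GREETING_LEN;
}
```

With this, a buffer declared as char message[GREETING_LEN] and a count of GREETING_LEN in both MPI_Send and MPI_Recv would stay in step if the text ever changes.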
--- lam-requestlam-mpi.org wrote:

> Today's Topics:
>
>    1. Re: Unable to boot Lam in a remote machine (460853unizar.es)
>
> From: 460853unizar.es
> To: lamlam-mpi.org
> Date: Mon, 20 Nov 2006 18:24:24 +0100
> Subject: Re: LAM: Unable to boot Lam in a remote machine
> 
> Hello everyone,
>
> First of all, thank you for answering. I would also like to
> apologize for not writing earlier; some family duties kept me
> away from all this for a while.
>
> The problem I asked about in my previous mail was solved by
> disabling the firewall, so that was certainly the cause. Now,
> however, I have run into a different problem.
>
> After disabling the firewall and setting up the environment, I
> searched the Internet for a very simple MPI program (actually, a
> "Hello World"):
> 
> 
> ---------------------prueba.c ------------------
> /* C Example */
> #include <stdio.h>
> #include <mpi.h>
> #include <math.h>
>
>
> void
> main (argc, argv)
>      int argc;
>      char *argv[];
> {
>   char *message = "Hello world";
>   int rank, size, i, tag, node;
>   MPI_Status status;
>
>   MPI_Init (&argc, &argv);                  /* starts MPI */
>   MPI_Comm_rank (MPI_COMM_WORLD, &rank);    /* get current process id */
>   MPI_Comm_size (MPI_COMM_WORLD, &size);    /* get number of processes */
>   tag = 100;
>
>   if (rank == 0)
>     {
>       for (i = 1; i < size; i++)
>         {
>           MPI_Send (message, 12, MPI_CHAR, i, tag, MPI_COMM_WORLD);
>         }
>     }
>   else
>     {
>       MPI_Recv (message, 12, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
>     }
>
>   printf ("node:%d  %s\n", rank, message);
>   MPI_Finalize ();
> }
> --------------------------------------------
> 
> I compile it with: mpicc -o prueba.exe prueba.c
> (It is a Linux system, so I know the .exe suffix is unnecessary,
> but I added it to make clear which file is the executable.)
> Then I place a copy of that executable in a folder that is in the
> PATH on both computers (specifically, $HOME/bin/).
>
> Next, I start the environment properly (well... properly, I guess):
> ---------------------------------------------
> hectorrdp13:~/Pa aprendé/Pruebas MPI> lamboot -v lamhosts
>
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> n-1<26498> ssi:boot:base:linear: booting n0 (155.210.155.67)
> n-1<26498> ssi:boot:base:linear: booting n1 (155.210.155.70)
> n-1<26498> ssi:boot:base:linear: finished
> ----------------------------------------------
> 
> But when I try to execute with mpirun, I get the following output:
> ---------------------------------------------
> hectorrdp13:~/bin> mpirun -v -np 2 prueba.exe
> 26535 prueba.exe running on n0 (o)
> 4861 prueba.exe running on n1
> node:0  Hello world
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD):  - MPI_Recv()
> Rank (1, MPI_COMM_WORLD):  - main()
> ---------------------------------------------
> 
> It seems that node 1 (the remote node) is not working; it says it
> is "dead". I searched for this error message on Google, and I
> understood that it means the process is not running on the remote
> machine. It was also said that this can happen when MPI_Finalize ()
> is executed too soon. I don't think that can be the case here,
> because this is an absolutely simple program downloaded from an
> example web page, so I guess it should work.
>
> I would also like to mention that on the remote machine, after
> setting up the environment with the lamboot command, "ps aux"
> shows (among many other things) a lamd daemon running:
> 
> -----------------------------------
> hectorvenus2:~/bin> ps aux
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root         1  0.0  0.0    776   304 ?        S    17:24   0:00 init [5]
> root         2  0.0  0.0      0     0 ?        SN   17:24   0:00 [ksoftirqd/0]
> [. . .]
> hector    3743  0.0  0.0   6484  1148 ?        S    17:26   0:00 /usr/bin/lamd -
> -----------------------------------
> 
> So the environment seems to have come up properly; the problem is
> that the program does not run properly.
>
> I imagine the solution is quite simple, but I can't see it :(
>
> Thank you very much in advance!!
> //Hector
> 
> >> 460853unizar.es wrote:
> >>> I know there's a firewall on each machine that only opens the SSH
> >>> (22) port, so I guess the problem comes from that. So, which
> >>> ports do I have to open in order to boot LAM?
> >>>
> >>> Executing lamboot with the -d option, I've read (among many
> >>> other things) this:
> >>>
> >>>    lamd -H 155.210.155.67 -P 6459 -n 1 -o 0 -d
> >>>
> >>> So, I guess this means that the .155.70 machine should be able
> >>> to reach port 6459 on the .155.67 machine. Am I right? So is the
> >>> solution to open port 6459 on the .155.67 machine? Should I
> >>> open this port on the .155.70 machine as well? Otherwise, which
> >>> ports should I open? I don't know whether opening only these
> >>> ports will be enough.
> >>
> >> All non-system (> 1024) TCP ports are needed to boot and run LAM.  In
> >> more detail - LAM does not use any specific port numbers, but instead
> >> requests any random open port from the OS.  Check out FAQs 17 and 18
> >> here for some more info:
> >>
> >> http://www.lam-mpi.org/faq/category4.php3
> >>
> >> Hope this helps!
> >>
> >> Andrew
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/



LAM: lam Digest, Vol 813, Issue 1
user name
2006-11-22 17:17:57
Thank you all! It's already fixed. As Tim Prins told me, I was
trying to receive the message into something that points to a
constant. Making it a writable variable fixed everything.

Instead of

char *message="Hello world";

I changed it to

char message[12];
strncpy (message, "Hello world", 12);

and it worked.

Thank you very much again for your answers!
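
Why the change works can be shown without MPI at all. MPI_Recv writes the incoming bytes into the buffer it is given, and writing through a pointer to a string literal is undefined behavior in C. On Linux it typically ends in a segmentation fault, which would explain rank 1 dying inside MPI_Recv. The sketch below is an editorial illustration under that assumption; fake_recv and receive_into_array are hypothetical names, and fake_recv is only a stand-in for the receive call, not an MPI function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for MPI_Recv: like the real call, it writes
   `count` bytes of incoming data into the caller's buffer. */
void fake_recv(char *buf, const char *incoming, size_t count)
{
    memcpy(buf, incoming, count);
}

/* Receive into writable storage, as in the fix.
   Returns 1 if the buffer holds the expected text afterwards. */
int receive_into_array(void)
{
    char message[12];   /* writable: 11 characters + NUL */

    fake_recv(message, "Hello world", sizeof message);
    return strcmp(message, "Hello world") == 0;
}

/* The original declaration, shown for contrast and never written to:
     char *message = "Hello world";
   Passing that pointer to fake_recv would write into the literal's
   storage, which is undefined behavior. */
```

Rank 0 never hit the problem in the original program because it only reads the literal when sending; only the receiving ranks write to it.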

Quoting "Alastuey, Lucas" <Lucas.Alastueysonda.com>:

> How can we debug this kind of error?
>
> The message is not very descriptive:
>> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
>> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (1, MPI_COMM_WORLD):  - MPI_Recv()
>> Rank (1, MPI_COMM_WORLD):  - main()
>
> Would gdb or Valgrind help?