Thread: Re: LAM: Lamboot Error




Re: LAM: Lamboot Error
United States
2008-03-13 22:50:55
Have you tried the suggestions in the error message (you say you've
tried solutions, but don't say what they are).  This is almost always
caused by a software firewall running on one of the nodes you are using.

Brian


On Mar 6, 2008, at 8:13 AM, zayar wrote:

> Dear members,
>          I have problem in lamboot. I also found this topic on the
> FAQs page. I have tried possible solutions but still the error. When
> booting lam-mpi on openSUSE 10.3, I got the following error messages:
>
> zayarHPC-3:~>lamboot -v bhost
> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
>
> n-1<25538> ssi:boot:base:linear: booting n0 (HPC-3)
> n-1<25538> ssi:boot:base:linear: booting n1 (HPC-2)
> -----------------------------------------------------------------------------
> The lamboot agent timed out while waiting for the newly-booted process
> to call back and indicated that it had successfully booted.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> As far as LAM could tell, the remote process started properly, but
> then never called back.  Possible reasons that this may happen:
>
>         - There are network filters between the lamboot agent host and
>           the remote host such that communication on random TCP ports
>           is blocked
>         - Network routing from the remote host to the local host isn't
>           properly configured (this is uncommon)
>
> You can check these things by watching the output from "lamboot -d".
>
> 1. On the command line for hboot, there are two important parameters:
>    one is the IP address of where the lamboot agent was invoked, the
>    other is the port number that the lamboot agent is expecting the
>    newly-booted process to call back on (this will be a random
>    integer).
>
> 2. Manually login to the remote machine and try to telnet to the port
>    indicated on the hboot command line.  For example,
>        telnet <ipnumber> <portnumber>
>    If all goes well, you should get a "Connection refused" error.  If
>    you get any other kind of error, it could indicate either of the
>    two conditions above.  Consult with your system/network
>    administrator.
> -----------------------------------------------------------------------------
> n-1<25538> ssi:boot:base:linear: aborted!
> n-1<25544> ssi:boot:base:linear: booting n0 (HPC-3)
> n-1<25544> ssi:boot:base:linear: booting n1 (HPC-2)
> n-1<25544> ssi:boot:base:linear: finished
> lamboot did NOT complete successfully
> zayarHPC-3:~> telnet (my-remote-ip) 23451
> Trying (my-remote-ip)...
> telnet: connect to address (my-remote-ip): Connection refused
> zayarHPC-3:~> telnet 127.0.0.1 32154
> Trying 127.0.0.1...
> telnet: connect to address 127.0.0.1: Connection refused
> zayarHPC-3:~> ssh -x hpc-2 hostname
> HPC-2
> zayarHPC-3:~>
> Please advise me.
> Thanks.
>
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
   Brian Barrett
   LAM/MPI Developer
   Make today a LAM/MPI day!


_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
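
The telnet check in step 2 of the lamboot error message can also be scripted. What follows is only a minimal sketch, assuming Python 3 is available on the node. The function name probe_tcp and its result labels are hypothetical, chosen for illustration, and are not part of LAM/MPI. As the error message says, the check is meant to be run on the remote node, against the IP address and port shown on the hboot command line in the "lamboot -d" output. A result of "refused" corresponds to telnet's "Connection refused", meaning the network path is open. A result of "timeout" or "unreachable" often points to packet filtering or a routing problem, such as a software firewall on one of the nodes.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt one TCP connection and classify the outcome.

    Returns one of "open", "refused", "timeout", "unreachable", "error".
    These labels are illustrative only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # A process is listening on that port and accepted the connection.
            return "open"
    except ConnectionRefusedError:
        # The host actively rejected the connection: the path works,
        # but nothing is listening on the port.
        return "refused"
    except socket.timeout:
        # No reply within the timeout, which is typical when a firewall
        # silently drops the packets.
        return "timeout"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # No route to the host or network.
            return "unreachable"
        return "error"


# Example call (placeholder address, port taken from the thread):
# print(probe_tcp("192.0.2.10", 23451))
```

Here 192.0.2.10 is a placeholder address; substitute the values from your own hboot command line.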

Re: LAM: Lamboot Error
Russian Federation
2008-03-14 07:09:00
Great!!! It was a firewall problem. Now my LAM/MPI is 100% happy. Thank you very much!!!

