Thread: Re: LAM: Able to boot LAM only on single node




Re: LAM: Able to boot LAM only on single node
2007-09-11 07:22:30
On Sep 6, 2007, at 1:23 AM, jeeviteshibab.ac.in wrote:

> Hi MPI/LAM Group,
> In my LAN, I have installed LAM/MPI on three systems. I am
> getting the following error:
>
> lamboot -v -ssi boot_rsh_ignore_stderr hostfile

Note that "-ssi" takes 2 parameters; I think you are missing the "1"
value for the boot_rsh_ignore_stderr token:

     lamboot -v -ssi boot_rsh_ignore_stderr 1 hostfile

As a result, the "hostfile" argument is being ignored (i.e., taken
as the nonsensical value for the boot_rsh_ignore_stderr SSI parameter).
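
To illustrate the pairing, here is a toy shell parser. It is only a sketch, not LAM's actual option handling, and the function name parse_args is made up for this example. It treats -ssi as always consuming the next two tokens as a key and a value:

```shell
#!/bin/sh
# Toy parser (NOT LAM's real code): "-ssi" always takes the next two
# tokens as <key> <value>; any other non-option token is the hostfile.
parse_args() {
    ssi_key="" ssi_value="" hostfile=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -v) shift ;;
            -ssi)
                ssi_key="${2-}"
                ssi_value="${3-}"
                # Drop "-ssi" and up to two following tokens.
                shift
                [ "$#" -gt 0 ] && shift
                [ "$#" -gt 0 ] && shift
                ;;
            *) hostfile="$1"; shift ;;
        esac
    done
    echo "key=$ssi_key value=$ssi_value hostfile=${hostfile:-<none>}"
}

# Value missing: "hostfile" is swallowed as the parameter value.
parse_args -v -ssi boot_rsh_ignore_stderr hostfile
# Value present: "hostfile" is seen as the boot schema again.
parse_args -v -ssi boot_rsh_ignore_stderr 1 hostfile
```

With the value missing, "hostfile" lands in the value slot and no boot schema is left over, which would be consistent with lamboot booting only localhost in the output quoted below.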

> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
> n-1<22789> ssi:boot:base:linear: booting n0 (localhost)
> n-1<22789> ssi:boot:base:linear: finished
>
> Here I was able to boot LAM on only one system. I have taken
> the following steps:
>
> 1. The hostfile contains the IP addresses of the other two systems.
> 2. .rhosts in the home directory lists the IP and username (I have
>    my user account on all three systems).
> 3. Installed LAM on all three systems.
> 4. I am able to rsh to each individual system,
>    but I get the following warning:
>
> rsh 192.168.1.141
> connect to address 192.168.1.141 port 543: Connection refused
> Trying krb4 rlogin...
> connect to address 192.168.1.141 port 543: Connection refused
> trying normal rlogin (/usr/bin/rlogin)
> Last login: Wed Sep  5 10:36:38 on :0

This means that rsh is falling back to a different protocol to
log in.  Perhaps you might want to try a different service, such as ssh?
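
Fallback messages like these are what LAM later reports as unexpected stderr output. A quick way to see whether a remote agent is "quiet" is to run the same probe LAM uses (echo $SHELL, per the error text quoted below) and capture only stderr. The following is a rough sketch; the helper names stderr_of and check_quiet are invented for illustration, and local sh commands stand in for the remote agent so it runs anywhere:

```shell
#!/bin/sh
# Print only what a command writes to stderr; its stdout is discarded.
# Order matters: stderr is first pointed at the current stdout (the
# capture), then stdout is redirected to /dev/null.
stderr_of() {
    "$@" 2>&1 >/dev/null
}

# Report NOISY if the command produced any stderr output, else QUIET.
check_quiet() {
    err=$(stderr_of "$@")
    if [ -n "$err" ]; then
        echo "NOISY"
    else
        echo "QUIET"
    fi
}

# Local stand-ins for a remote agent:
check_quiet sh -c 'echo ok'                  # nothing on stderr
check_quiet sh -c 'echo warn >&2; echo ok'   # something on stderr

# Against a real node (address taken from this thread), one would run:
#   check_quiet rsh 192.168.1.141 'echo $SHELL'
```

If the real probe reports NOISY, the agent is printing something on stderr, and LAM's default heuristic will treat that as a failure.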

You can set which agent LAM uses (rsh vs. ssh) at run time -- see
http://www.lam-mpi.org/faq/category4.php3#question14.
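
As a hedged sketch of what that can look like: the parameter name boot_rsh_agent below is an assumption recalled from the LAM 7.x documentation rather than something stated in this thread, so please confirm it against the FAQ entry above before relying on it. "hostfile" is the boot schema file already used in this thread.

```shell
#!/bin/sh
# Assumption: boot_rsh_agent is the rsh boot module's agent parameter
# in LAM 7.x; verify the exact name in the FAQ linked above.
# "ssh -x" turns off X11 forwarding, which avoids one common source
# of extra output from the agent.
AGENT="ssh -x"

if command -v lamboot >/dev/null 2>&1; then
    lamboot -v -ssi boot_rsh_agent "$AGENT" hostfile
else
    echo "lamboot not found in PATH" >&2
fi
```

This only helps if ssh logins to every node already work without a password prompt, just as the rsh logins do now.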

> But I am able to log in without being prompted for a password.
>
> I tried the conventional way of booting LAM, but it did not
> complete successfully, so I followed the above way of booting:
>
> lamboot -v -ssi boot rsh hostfile
>
> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
>
> n-1<23096> ssi:boot:base:linear: booting n0 (192.168.1.125)
> n-1<23096> ssi:boot:base:linear: booting n1 (192.168.1.141)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> connect to address 192.168.1.141 port 544: Connection refused
> connect to address 192.168.1.141 port 544: Connection refused
> trying normal rsh (/usr/bin/rsh)
>
> -----------------------------------------------------------------------------
> LAM attempted to execute a process on the remote node "192.168.1.141",
> but received some output on the standard error.  This heuristic
> assumes that any output on the standard error indicates a fatal error,
> and therefore aborts.  You can disable this behavior (i.e., have LAM
> ignore output on standard error) in the rsh boot module by setting the
> SSI parameter boot_rsh_ignore_stderr to 1.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> Thanks & regards
> jeevitesh
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/


-- 
Jeff Squyres
Cisco Systems


Re: LAM: Able to boot LAM only on single node II
2007-09-11 07:31:10
Besides, you can try putting the boot_rsh_ignore_stderr 1 setting
in your .bashrc file instead of on the lamboot command line.
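
One way to read this suggestion, assuming LAM 7.x picks up SSI parameters from environment variables named LAM_MPI_SSI_<parameter> (a convention recalled from the LAM documentation, not confirmed anywhere in this thread), is a line like the following:

```shell
# Add to ~/.bashrc on the machine where lamboot is run.
# Assumption: LAM 7.x reads SSI parameters from LAM_MPI_SSI_<name>
# environment variables; check the LAM User's Guide to confirm.
export LAM_MPI_SSI_boot_rsh_ignore_stderr=1
```

After starting a new shell, a plain "lamboot -v hostfile" should then behave as if "-ssi boot_rsh_ignore_stderr 1" had been given, provided the assumption above holds.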

Also, are you booting from the master or from the nodes?


Roberto Scipioni
ICYS, NIMS Japan
Administrator ICYS Computing cluster


