Thread: LAM: (no subject)




LAM: (no subject)
user name
2006-12-08 08:20:58
Brian, in fact this is how I built LAM: from your download, not from
Fink directly. I already had the Fink version installed, though, so I
wonder whether there is some sort of conflict somewhere. Anyway, I still
cannot get lamboot to work. Any suggestions would be helpful.
thanks,


Roger Smith
Loughborough UK
----------------------------

On 30 Nov 2006, at 03:24, Brian Barrett wrote:
> I'm not familiar with how LAM is being built by Fink or whatever
> system you used to build Open MPI.  This is an error I've seen from
> time to time if the LAM daemon is dynamically linked to liblam
> instead of statically linked.  I'd recommend using the build of LAM
> for OS X found on our web page:
>
>    http://www.lam-mpi.org/7.1/download.php
>
> Brian
>
>
> On Nov 28, 2006, at 1:22 AM, Roger Smith wrote:
>
>> I am running Mac OS 10.4.8 on a dual processor PowerPC G5 with 2.5 GB
>> ram. I have installed LAM 7.1.2 through the desk manager system. I
>> wish to use MPI on this single dual processor machine.
>> However, I cannot now get lamboot to work with the newer version of
>> lam  even after running recon. It comes up with the error
>>
>> router (nrecv): not attached to daemon
>>
>> when I run with lamboot -d I get the output
>>
>>
>>
>> -----------------------------------------------------
>> LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University
>>
>> n-1<29718> ssi:boot:base: looking for boot schema in following directories:
>> n-1<29718> ssi:boot:base:   <current directory>
>> n-1<29718> ssi:boot:base:   $TROLLIUSHOME/etc
>> n-1<29718> ssi:boot:base:   $LAMHOME/etc
>> n-1<29718> ssi:boot:base:   /sw/etc/lammpi
>> n-1<29718> ssi:boot:base: looking for boot schema file:
>> n-1<29718> ssi:boot:base:   lam-bhost.def
>> n-1<29718> ssi:boot:base: found boot schema: /sw/etc/lammpi/lam-bhost.def
>> n-1<29718> ssi:boot:rsh: found the following hosts:
>> n-1<29718> ssi:boot:rsh:   n0 localhost (cpu=1)
>> n-1<29718> ssi:boot:rsh: resolved hosts:
>> n-1<29718> ssi:boot:rsh:   n0 localhost --> 127.0.0.1 (origin)
>> n-1<29718> ssi:boot:rsh: starting RTE procs
>> n-1<29718> ssi:boot:base:linear: starting
>> n-1<29718> ssi:boot:base:server: opening server TCP socket
>> n-1<29718> ssi:boot:base:server: opened port 53756
>> n-1<29718> ssi:boot:base:linear: booting n0 (localhost)
>> n-1<29718> ssi:boot:rsh: starting lamd on (localhost)
>> n-1<29718> ssi:boot:rsh: starting on n0 (localhost): hboot -t -c lam-conf.lamd -d -I -H 127.0.0.1 -P 53756 -n 0 -o 0
>> n-1<29718> ssi:boot:rsh: launching locally
>> hboot: performing tkill
>> hboot: tkill -d
>> tkill: setting prefix to (null)
>> tkill: setting suffix to (null)
>> tkill: got killname back: /tmp/lam-marsmars-mac.lut.ac.uk/lam-killfile
>> tkill: f_kill = "/tmp/lam-marsmars-mac.lut.ac.uk/lam-killfile"
>> tkill: killing LAM...
>> tkill: killing PID (SIGHUP) 29715 ...
>> tkill:  already dead
>> tkill: removing socket file ...
>> tkill: socket file: /tmp/lam-marsmars-mac.lut.ac.uk/lam-kernel-socketd
>> tkill: removing IO daemon socket file ...
>> tkill: IO daemon socket file: /tmp/lam-marsmars-mac.lut.ac.uk/lam-io-socket
>> tkill: all finished
>> hboot: booting...
>> hboot: fork /sw/bin/lamd
>> [1]  29721 lamd -H 127.0.0.1 -P 53756 -n 0 -o 0 -d
>> n-1<29718> ssi:boot:rsh: successfully launched on n0 (localhost)
>> hboot: attempting to execute
>> n-1<29718> ssi:boot:base:server: expecting connection from finite list
>> router (nrecv): not attached to daemon
>>
>> -----------------------------------------------------
>>
>> I would appreciate any help anyone can give me on this problem
>>
>>
>>
>> regards to all
>>
>>
>> Roger Smith
>> Loughborough University UK
>>
>>
>
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
LAM: lamboot problem
user name
2006-12-09 06:54:56
Hi,

 	I was trying to run lamboot across 2 CPUs. I made a hostfile on
10.101.11.45 like this:

 	10.101.11.45 user=manojv
 	10.101.11.58 user=manoj

When I run $ lamboot hostfile, it takes a very long time and then gives
an error (pasted below). I am using a secured connection with ssh keys.
I can connect to 10.101.11.58 without a password, and from 10.101.11.58
I can connect to 10.101.11.45.

When I use the same command with the same hostfile on 10.101.11.58, it
works without any problem.

I have made sure that both machines have the same version of LAM (7.1.1).

Does anybody have an idea why I am not able to lamboot from 10.101.11.45?


The error is pasted below for reference.
Thanks.

error::
-------------------------------------------------------------
manojv10.101.11.45 $ lamboot ~/host

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

ERROR: LAM/MPI unexpectedly received the following on stderr:
eros: Connection refused
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "manoj10.101.11.58".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote
host.

LAM tried to use the remote agent command "rsh"
to invoke "echo $SHELL" on the remote node.

*** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
*** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
*** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
*** MAILING LIST.

This usually indicates an authentication problem with the remote
agent, some other configuration type of error in your .cshrc or
.profile file, or you were unable to executable a command on the
remote node for some other reason.  The following is a list of items
that you should check on the remote node:

         - You have an account and can login to the remote machine
         - Incorrect permissions on your home directory (should
           probably be 0755)
         - Incorrect permissions on your $HOME/.rhosts file (if you are
           using rsh -- they should probably be 0644)
         - You have an entry in the remote $HOME/.rhosts file (if you
           are using rsh) for the machine and username that you are
           running from
         - Your .cshrc/.profile must not print anything out to the
           standard error
         - Your .cshrc/.profile should set a correct TERM type
         - Your .cshrc/.profile should set the SHELL environment
           variable to your default shell

Try invoking the following command at the unix command line:

         rsh 10.101.11.58 -n -l manoj 'echo $SHELL'

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------


-- 
manoj vaghela
zeus numerix pvt ltd
aerospace engineering department
indian institute of technology bombay

LAM: lamboot problem
user name
2006-12-09 14:32:43
Hi!

   Check your /etc/hosts (GNU/Linux?) and add these entries:

   10.101.11.45         cpu1
   10.101.11.58         cpu2

   Sometimes the system makes a DNS query, which can take a long time
on some systems.




LAM: lamboot problem
user name
2006-12-09 15:10:16
On Dec 9, 2006, at 1:54 AM, Vaghela Manoj B wrote:

> When I use $ lamboot hostfile, it takes too much of time and gives
> error(pasted below). I am using secured connection using ssh keys.
> I am able to connect 10.101.11.58 without any password or from
> 10.101.11.58, I am able to connect 10.101.11.45.

The "Connection refused" message is likely from rsh -- see if your
LAM is trying to use "rsh" instead of "ssh".  You can tell LAM to use
ssh at run time by setting the LAMRSH environment variable to "ssh".

See if that solves your problems.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
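
[Editor's sketch] The LAMRSH suggestion above could be tried roughly as
follows. This is only an illustration: the user and host are copied from the
hostfile quoted earlier in the thread, the variable names remote_user,
remote_host and probe are made up for this example, and the script only
prints the commands rather than running them.

```shell
#!/bin/sh
# Use ssh instead of rsh as LAM's remote agent for this shell session.
LAMRSH=ssh
export LAMRSH

# User and host copied from the hostfile quoted earlier in the thread.
remote_user=manoj
remote_host=10.101.11.58

# Build (but do not run) the same probe LAM performs, using the chosen agent.
probe="$LAMRSH -n -l $remote_user $remote_host 'echo \$SHELL'"
echo "Try by hand: $probe"
echo "Then boot:   lamboot ~/host"
```

If the printed probe command returns the remote shell without prompting for
a password, lamboot started from the same session should pick up the same
agent through LAMRSH.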

LAM: lamboot problem
user name
2006-12-11 11:59:30
Hi... I'm a total newbie on this (so maybe I won't be able to help at
all), but what does recon ~/host say? Probably the same, doesn't it? I
have tried that, and it's what it says when the remote machine
(10.101.11.58) is off. I guess you have checked all the advice that
appears in the error message... Since you're able to run lamboot on
10.101.11.58, I would check that ls -la shows the same permissions on
all the files, and that the file structure looks reasonably the same
(I mean, that you've got a .profile with the same permissions on the
.58 machine as on the .45 machine, and so on).

I'm sorry if I'm confusing you. This is what I (a LAM novice) would do.

Regards


LAM: (no subject)
user name
2006-12-13 04:34:32
In the output included in the first e-mail, the lamd found is
/sw/bin/lamd.  Since /sw/bin is usually where Fink puts stuff, I assume
that means that /sw/bin/lamd is the Fink version of lamd and not the one
you had built.  This could mean that Fink is first in your path when
the non-interactive shell used by ssh is started.  You can check this
by running:

   ssh localhost which lamd

If it points to /sw/bin/lamd instead of the path to your custom LAM
build, then that's the problem.  If you fix your path so that your
build is first in the path for non-interactive logins, I would be
willing to bet your problem goes away.

Brian
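
[Editor's sketch] The path check described above can be wrapped in a small
helper. The function name lamd_path_ok and the prefix /usr/local are
assumptions made for this example; the thread does not say where the
hand-built LAM was installed, so substitute the real prefix.

```shell
#!/bin/sh
# Return 0 if the resolved lamd path lives under the given prefix's bin/.
# $1 = path reported by the non-interactive shell, $2 = expected prefix.
lamd_path_ok() {
    [ "$1" = "$2/bin/lamd" ]
}

# Hypothetical prefix for the hand-built LAM; /sw is Fink's prefix.
my_prefix=/usr/local

# Classify the path that appeared in the lamboot -d output.
if lamd_path_ok /sw/bin/lamd "$my_prefix"; then
    echo "lamd comes from $my_prefix"
else
    echo "lamd does NOT come from $my_prefix; check PATH for non-interactive shells"
fi
```

In practice the first argument would be the output of the ssh localhost
which lamd command shown above, rather than a hard-coded path.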

