Thread: Re: LAM: lam Digest, Vol 923, Issue 1




Re: LAM: lam Digest, Vol 923, Issue 1
user name
2007-05-23 14:50:43

---------- Forwarded message ----------
From: "Jeff Squyres (jsquyres)" <jsquyrescisco.com>
To: "General LAM/MPI mailing list" <lamlam-mpi.org>, <lamlam-mpi.org>
Date: Tue, 22 May 2007 23:56:36 -0400
Subject: Re: LAM: lamboot is ok, mpirun is not 
Hi

What happens when you try to mpirun an MPI application that was compiled with LAM's mpicc?


It compiled successfully with LAM's mpicc, but I still have the problem.
Here is what I did:
----------------------------------
[teruftoscar test]$ echo $PATH
/opt/lam-7.1.3/bin/:/opt/mpich-ch_p4-gcc-1.2.7/bin/:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/kernel_picker/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/root/bin
[teruftoscar test]$ mpicc ring.c -o ring.out
[teruftoscar test]$ lamboot -v host

LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University

n-1<20590> ssi:boot:base:linear: booting n0 (uftoscar)
n-1<20590> ssi:boot:base:linear: booting n1 (oscarnode1)
n-1<20590> ssi:boot:base:linear: booting n2 (oscarnode2)
n-1<20590> ssi:boot:base:linear: booting n3 (oscarnode3)
n-1<20590> ssi:boot:base:linear: booting n4 (oscarnode4)
n-1<20590> ssi:boot:base:linear: booting n5 (oscarnode5)
n-1<20590> ssi:boot:base:linear: booting n6 (oscarnode6)
n-1<20590> ssi:boot:base:linear: booting n7 (oscarnode7)
n-1<20590> ssi:boot:base:linear: booting n8 (oscarnode8)
n-1<20590> ssi:boot:base:linear: booting n9 (oscarnode9)
n-1<20590> ssi:boot:base:linear: booting n10 (oscarnode10)
n-1<20590> ssi:boot:base:linear: booting n11 (oscarnode11)
n-1<20590> ssi:boot:base:linear: booting n12 (oscarnode12)
n-1<20590> ssi:boot:base:linear: booting n13 (oscarnode13)
n-1<20590> ssi:boot:base:linear: booting n14 (oscarnode14)
n-1<20590> ssi:boot:base:linear: finished
[teruftoscar test]$ lamnodes
n0      uftoscar.latech:1:origin,this_node
n1      oscarnode1.latech:1:
n2      oscarnode2.latech:1:
n3      oscarnode3.latech:1:
n4      oscarnode4.latech:1:
n5      oscarnode5.latech:1:
n6      oscarnode6.latech:1:
n7      oscarnode7.latech:1:
n8      oscarnode8.latech:1:
n9      oscarnode9.latech:1:
n10     oscarnode10.latech:1:
n11     oscarnode11.latech:1:
n12     oscarnode12.latech:1:
n13     oscarnode13.latech:1:
n14     oscarnode14.latech:1:
[teruftoscar test]$ mpirun -np 15 -v ring.out
20626 ring.out running on n0 (o)
<freeze>

No firewall is running on any node in this cluster, and $PATH on every node starts with "/opt/lam-7.1.3/bin/".

Thanks
Ter

 -----Original Message-----
From:   K. Charoenpornwattana Ter [mailto:kcharoengmail.com]
Sent:   Tuesday, May 22, 2007 09:11 PM Eastern Standard Time
To:     lamlam-mpi.org
Subject:        LAM: lamboot is ok, mpirun is not

Hi all,

I have a problem with LAM/MPI. I have been searching around the net, but
no one seems to have the same problem.

My cluster has 1 head node and 14 compute nodes running CentOS 4.5 (i386).
I used OSCAR 4.2.1 to help build the cluster. I completely uninstalled the
LAM/MPI that came with OSCAR 4.2 and installed LAM/MPI 7.1.3 with BLCR 0.5.1.


The problem is that I can lamboot the hosts successfully, but I can't run an
MPI application (even a simple hello world) on multiple nodes. (I can lamboot
a single node and execute "mpirun -np 1 hello.out".)

I can ping, tping, and traceroute from the head node to every compute node and
vice versa. I can run any MPI application on this cluster using MPICH.

[teruftoscar ~]$ which mpirun
/opt/lam-7.1.3/bin/mpirun
[teruftoscar ~]$ ssh oscarnode1 which mpirun
/opt/lam-7.1.3/bin/mpirun

[teruftoscar ~]$ echo $PATH
/opt/lam-7.1.3/bin/:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/kernel_picker/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/root/bin:/opt/lam-7.1.3/bin/
[teruftoscar ~]$ ssh oscarnode1 echo $PATH
/opt/lam-7.1.3/bin/:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/kernel_picker/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/root/bin:/opt/lam-7.1.3/bin/
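
One caveat about the "ssh oscarnode1 echo $PATH" check: because $PATH is unquoted, the local shell expands it before ssh runs, so the command prints the head node's PATH rather than oscarnode1's. That is consistent with the two outputs being identical. A minimal sketch of the difference follows; run_remote is a hypothetical stand-in for ssh (a child shell started with a different PATH) so that the example runs without a cluster.

```shell
# run_remote is a hypothetical stand-in for "ssh host CMD": it runs CMD in a
# child shell whose PATH differs from the caller's.
run_remote() {
    env PATH="/remote/bin:/usr/bin:/bin" sh -c "$*"
}

# Unquoted: the calling shell substitutes its own $PATH before the
# "remote" side ever sees the command.
local_view=$(run_remote echo $PATH)

# Single-quoted: the literal text $PATH reaches the "remote" shell and is
# expanded there.
remote_view=$(run_remote 'echo $PATH')

echo "unquoted: $local_view"
echo "quoted:   $remote_view"
```

On the cluster itself the equivalent check would be ssh oscarnode1 'echo $PATH'. This reports the PATH of a non-interactive shell, which may differ from an interactive login depending on the shell startup files.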


I am sure that the older version of LAM/MPI was completely removed, and I
set env-switcher to none.

Any help would be greatly appreciated.

Thanks
Ter



Re: LAM: lam Digest, Vol 923, Issue 1
user name
2007-05-23 16:59:08
Try some simple tests:

- Does "tping -c 3" run successfully?  (It should ping all the lamd's)
- Does "lamexec N hostname" run successfully?  (It should run "hostname"
  on all the booted nodes)
- When you "mpirun -np 15 ring.out", do you see ring.out executing on all
  the nodes?  (i.e., if you ssh into each of the nodes and run ps, do you
  see it running?)
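
The third test can be scripted rather than done by hand. The following is a rough sketch, assuming passwordless ssh to every node and that pgrep is installed there; lam_hosts and check_nodes are hypothetical helper names, not LAM commands.

```shell
# Extract hostnames from lamnodes-style lines such as
# "n1      oscarnode1.latech:1:" (hypothetical helper, reads stdin).
lam_hosts() {
    awk '{ split($2, f, ":"); print f[1] }'
}

# Report, for every booted node, whether a process with the given name is
# running. Assumes passwordless ssh and pgrep on each node.
check_nodes() {
    prog=$1
    lamnodes | lam_hosts | while read -r host; do
        # -n keeps ssh from swallowing the rest of the host list on stdin.
        if ssh -n "$host" pgrep -x "$prog" >/dev/null 2>&1; then
            echo "$host: $prog running"
        else
            echo "$host: $prog NOT running"
        fi
    done
}
```

With mpirun hanging in one terminal, running "check_nodes ring.out" in another would show which nodes actually started the program.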


-- 
Jeff Squyres
Cisco Systems

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
