Try some simple tests:
- Does "tping -c 3" run successfully? (It should ping all the lamds.)
- Does "lamexec N hostname" run successfully? (It should run "hostname"
  on all the booted nodes.)
- When you "mpirun -np 15 ring.out", do you see ring.out executing on
  all the nodes? (i.e., if you ssh into each of the nodes and run ps,
  do you see it running?)
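For the third test, here is a minimal shell sketch that reports per host whether ring.out is running. It assumes passwordless ssh and pgrep on every node (neither comes from LAM), uses pgrep in place of reading ps output by eye, and the names check_node and RSH are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM): report whether ring.out is
# running on each host given on the command line.
# Assumes passwordless ssh to every node and that pgrep exists there.
# Set RSH to use a different remote shell, e.g. RSH=rsh.
RSH=${RSH:-ssh}

check_node() {
  # pgrep -x exits 0 only if a process named exactly "ring.out" exists.
  # A failed connection also returns nonzero, so "NOT running" can also
  # mean the node was unreachable.
  if $RSH "$1" pgrep -x ring.out >/dev/null 2>&1; then
    echo "$1: ring.out running"
  else
    echo "$1: ring.out NOT running (or unreachable)"
  fi
}

for host in "$@"; do
  check_node "$host"
done
```

Run it from a second terminal while "mpirun -np 15 ring.out" is hung, e.g. "sh check_ring.sh uftoscar oscarnode1 oscarnode2", listing every host that lamnodes reports.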
On May 23, 2007, at 3:50 PM, K. Charoenpornwattana Ter wrote:
>
> ---------- Forwarded message ----------
> From: "Jeff Squyres (jsquyres)" <jsquyres cisco.com>
> To: "General LAM/MPI mailing list" <lam lam-mpi.org>, <lam lam-mpi.org>
> Date: Tue, 22 May 2007 23:56:36 -0400
> Subject: Re: LAM: lamboot is ok, mpirun is not
>
> Hi,
> What happens when you try to mpirun an MPI application that was
> compiled with LAM's mpicc?
>
>
> It compiled successfully with LAM's mpicc, but I still have the
> problem.
> Here is what I did:
> ----------------------------------
> [ter uftoscar test]$ echo $PATH
> /opt/lam-7.1.3/bin/:/opt/mpich-ch_p4-gcc-1.2.7/bin/:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/kernel_picker/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/root/bin
> [ter uftoscar test]$ mpicc ring.c -o ring.out
> [ter uftoscar test]$ lamboot -v host
>
> LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
>
> n-1<20590> ssi:boot:base:linear: booting n0 (uftoscar)
> n-1<20590> ssi:boot:base:linear: booting n1 (oscarnode1)
> n-1<20590> ssi:boot:base:linear: booting n2 (oscarnode2)
> n-1<20590> ssi:boot:base:linear: booting n3 (oscarnode3)
> n-1<20590> ssi:boot:base:linear: booting n4 (oscarnode4)
> n-1<20590> ssi:boot:base:linear: booting n5 (oscarnode5)
> n-1<20590> ssi:boot:base:linear: booting n6 (oscarnode6)
> n-1<20590> ssi:boot:base:linear: booting n7 (oscarnode7)
> n-1<20590> ssi:boot:base:linear: booting n8 (oscarnode8)
> n-1<20590> ssi:boot:base:linear: booting n9 (oscarnode9)
> n-1<20590> ssi:boot:base:linear: booting n10 (oscarnode10)
> n-1<20590> ssi:boot:base:linear: booting n11 (oscarnode11)
> n-1<20590> ssi:boot:base:linear: booting n12 (oscarnode12)
> n-1<20590> ssi:boot:base:linear: booting n13 (oscarnode13)
> n-1<20590> ssi:boot:base:linear: booting n14 (oscarnode14)
> n-1<20590> ssi:boot:base:linear: finished
> [ter uftoscar test]$ lamnodes
> n0 uftoscar.latech:1:origin,this_node
> n1 oscarnode1.latech:1:
> n2 oscarnode2.latech:1:
> n3 oscarnode3.latech:1:
> n4 oscarnode4.latech:1:
> n5 oscarnode5.latech:1:
> n6 oscarnode6.latech:1:
> n7 oscarnode7.latech:1:
> n8 oscarnode8.latech:1:
> n9 oscarnode9.latech:1:
> n10 oscarnode10.latech:1:
> n11 oscarnode11.latech:1:
> n12 oscarnode12.latech:1:
> n13 oscarnode13.latech:1:
> n14 oscarnode14.latech:1:
> [ter uftoscar test]$ mpirun -np 15 -v ring.out
> 20626 ring.out running on n0 (o)
> <freeze>
>
> No firewall is running on any node in this cluster, and $PATH on
> every node starts with "/opt/lam-7.1.3/bin/".
>
> Thanks
> Ter
>
> -----Original Message-----
> From: K. Charoenpornwattana Ter [mailto:kcharoen gmail.com]
> Sent: Tuesday, May 22, 2007 09:11 PM Eastern Standard Time
> To: lam lam-mpi.org
> Subject: LAM: lamboot is ok, mpirun is not
>
> Hi all,
>
> I have a problem with LAM/MPI. I have searched around the net, but
> no one seems to have the same problem as me.
>
> My cluster has 1 head node and 14 compute nodes. I installed CentOS
> 4.5 (i386). I used OSCAR 4.2.1 to help build this cluster. I completely
> uninstalled the LAM/MPI that came with OSCAR 4.2 and installed
> LAM/MPI 7.1.3 with BLCR 0.5.1.
>
>
> The problem is that I can successfully lamboot the hosts, but I can't
> execute any MPI application (even a simple hello world) on multiple
> nodes. (I can lamboot on a single node and execute
> "mpirun -np 1 hello.out".)
>
> I can ping, tping, and traceroute from the head node to every node in
> the cluster and vice versa. I can execute any MPI application on this
> cluster using MPICH.
>
> [ter uftoscar ~]$ which mpirun
> /opt/lam-7.1.3/bin/mpirun
> [ter uftoscar ~]$ ssh oscarnode1 which mpirun
> /opt/lam-7.1.3/bin/mpirun
>
> [ter uftoscar ~]$ echo $PATH
> /opt/lam-7.1.3/bin/:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/kernel_picker/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/root/bin:/opt/lam-7.1.3/bin/
> [ter uftoscar ~]$ ssh oscarnode1 echo $PATH
> /opt/lam-7.1.3/bin/:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/kernel_picker/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/root/bin:/opt/lam-7.1.3/bin/
>
>
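One caveat on the "ssh oscarnode1 echo $PATH" check above: the two outputs would look identical even if oscarnode1's PATH differed, because the local shell expands the unquoted $PATH before ssh ever runs. Single-quoting it, as in "ssh oscarnode1 'echo $PATH'", lets the remote shell do the expansion. The sketch below shows the same expansion timing locally, with "sh -c" standing in for ssh and a hypothetical variable DEMO_VAR standing in for PATH:

```shell
#!/bin/sh
# Local stand-in for "ssh host CMD": run CMD in a child shell whose
# environment has a different value of DEMO_VAR (hypothetical name).
DEMO_VAR=head-node

# Unquoted or double-quoted: THIS shell expands $DEMO_VAR before the
# child starts, so the child just echoes the parent's value.
unquoted=$(DEMO_VAR=compute-node sh -c "echo $DEMO_VAR")

# Single-quoted: the text is passed literally and the child expands it.
quoted=$(DEMO_VAR=compute-node sh -c 'echo $DEMO_VAR')

echo "double-quoted: $unquoted"   # prints: double-quoted: head-node
echo "single-quoted: $quoted"     # prints: single-quoted: compute-node
```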
> I am sure that the older version of LAM/MPI was completely removed,
> and I set env-switcher to none.
>
> Any help would be greatly appreciated.
>
> Thanks
> Ter
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Jeff Squyres
Cisco Systems