Thread: Re: LAM: lamboot is ok, mpirun is not
Re: LAM: lamboot is ok, mpirun is not
2007-05-24 20:36:11
Sorry, I see you did that earlier. Have you tried
mpirun with the -v parameter as well?
[ter uftoscar ~]$ which mpirun
/opt/lam-7.1.3/bin/mpirun
[ter uftoscar ~]$ cexec which mpirun
************************* oscar_cluster *************************
--------- oscarnode1---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode2---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode3---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode4---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode5---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode6---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode7---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode8---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode9---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode10---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode11---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode12---------
/opt/lam-7.1.3/bin/mpirun
--------- oscarnode13---------
/opt/lam-7.1.3/bin/mpirun
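With thirteen nodes it is easy to miss one stray path by eye, so the cexec listing above could also be checked mechanically. The following is only a sketch: the function names are hypothetical, and it assumes each result is a `--------- node---------` banner followed by an absolute path, with node names made of letters, digits, "_" and "." only.

```python
import re

# One cexec result: a "--------- node---------" banner followed by a path.
# Assumption: node names contain only letters, digits, "_" and ".".
ENTRY = re.compile(r"-{3,}\s*([A-Za-z0-9_.]+)-{3,}\s+(/\S+)")

def parse_cexec_which(output):
    """Return {node: path} parsed from `cexec which <program>` output."""
    return dict(ENTRY.findall(output))

def mismatched_nodes(paths, expected):
    """Return the sorted list of nodes whose path differs from `expected`."""
    return sorted(node for node, path in paths.items() if path != expected)
```

Feeding it the captured output together with the head node's own `which mpirun` result gives the list of nodes that resolve a different mpirun; an empty list means every node agrees.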
Thanks
On 5/24/07, McCalla, Mac <macmccalla hess.com> wrote:
Hi,
just for grins, what does "which mpirun" show?
......
mac mccalla
On 5/24/07, Jeff Squyres <jsquyres cisco.com> wrote:
That is just weird -- I don't think I've seen a case where tping worked
(implying that inter-lamd communication is working), but running
applications did not.
The only thing that I can think of is that there is some firewalling in
place that only allows arbitrary UDP traffic through...? (inter-lamd
traffic is UDP, not TCP) That doesn't seem to make sense, though, if
MPICH works (cexec uses ssh, which is most certainly allowed). But can
you triple check that there are no firewall rules in place that
restrict UDP/TCP traffic? (e.g., iptables)
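One way to test the "UDP gets through, arbitrary TCP does not" theory independently of LAM is to open a throwaway TCP listener on a compute node and try to connect to it from the head node. This is a minimal sketch, not part of LAM/MPI; the function names are hypothetical and the port is whatever the OS hands out.

```python
import socket

def open_listener(port=0, host="0.0.0.0"):
    """Bind a throwaway TCP listener; port 0 lets the OS choose a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv, srv.getsockname()[1]

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run `open_listener()` on a compute node, note the port it returns, then call `tcp_reachable("oscarnode1", port)` from the head node (and the reverse). A False result while the listener is still up would point at packet filtering between the two machines rather than at LAM itself.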
Also try running tping / mpirun / lamexec from a node other than the
origin (i.e., the node you lambooted from).
I did. Same problem.
On May 23, 2007, at 11:32 PM, K. Charoenpornwattana Ter wrote:
> Try some simple tests:
>
> - Does "tping -c 3" run successfully? (It should ping all the lamd's)
>
> [ter uftoscar test]$ tping -c 3 n0-13
> 1 byte from 13 remote nodes and 1 local node: 0.006 secs
> 1 byte from 13 remote nodes and 1 local node: 0.005 secs
> 1 byte from 13 remote nodes and 1 local node: 0.005 secs
>
> 3 messages, 3 bytes (0.003K), 0.016 secs (0.368K/sec)
> roundtrip min/avg/max: 0.005/0.005/0.006
>
> - Does "lamexec N hostname" run successfully? (It should run
> "hostname" on all the booted nodes)
>
> No, it doesn't work. It only shows the headnode's hostname. See below:
>
> [ter uftoscar ~]$ lamexec N hostname
> uftoscar.latech
> <freeze>
>
> I, however, can execute "cexec hostname" with no problem.
>
> - When you "mpirun -np 15 ring.out", do you see ring.out executing on
> all the nodes? (i.e., if you ssh into each of the nodes and run ps,
> do you see it running?)
>
> I only see one ring.out running on the headnode, no ring.out running
> on other nodes.
>
> Thanks
> Kulathep
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Jeff Squyres
Cisco Systems
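The "ssh into each of the nodes and run ps" check from the quoted tests gets tedious on 13 nodes, so it could be scripted. The sketch below assumes passwordless ssh to every node (cexec already relies on ssh here) and a procps-style `pgrep` on the nodes; the function names are hypothetical. It uses `pgrep -x` so that only processes named exactly ring.out are counted, not the remote shell that carries the command line.

```python
import subprocess

def ssh_pgrep(node, process_name):
    """Run `pgrep -x <process_name>` on `node` over ssh; return the exit status."""
    result = subprocess.run(
        ["ssh", node, "pgrep", "-x", process_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode

def nodes_running(nodes, process_name, probe=ssh_pgrep):
    """Map each node to True (found), False (not found) or None (check failed)."""
    status = {}
    for node in nodes:
        code = probe(node, process_name)
        # pgrep exits 0 on a match and 1 on no match; any other status
        # (ssh itself uses 255 on errors) is reported as "could not check".
        status[node] = {0: True, 1: False}.get(code)
    return status
```

For this cluster the call would look like `nodes_running(["oscarnode%d" % i for i in range(1, 14)], "ring.out")`, run from the head node while mpirun is hanging. The `probe` argument exists only so the logic can be exercised without a cluster.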

Re: LAM: lamboot is ok, mpirun is not
2007-05-24 20:43:55
Yes,
[ter uftoscar test]$ mpirun -np 14 -v ring.out
17119 ring.out running on n0 (o)
<freeze>
Ummm, I guess I will just remove everything and install it again.
Thanks anyway,
Kulathep
On 5/24/07, McCalla, Mac <macmccalla hess.com> wrote:
Sorry, I see you did that earlier. Have you tried
mpirun with the -v parameter as well?