Thread: LAM: lamtests error on cluster
2006-12-21 01:46:27
Dear all:
I installed lam-7.1.2 on our cluster. Each node of our cluster has a
dual-core Pentium D 945 3.4 GHz CPU (EM64T).
I installed LAM into an NFS-shared directory.
=============some command results=================
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> lamboot -v hf
LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University
n-1<10074> ssi:boot:base:linear: booting n0 (ClusterServer)
n-1<10074> ssi:boot:base:linear: booting n1 (n23)
n-1<10074> ssi:boot:base:linear: booting n2 (n24)
n-1<10074> ssi:boot:base:linear: finished
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> cat hf
ClusterServer cpu=2
n23
n23
n24
n24
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> lamnodes
n0 ClusterServer.cluster.t02:2:origin,this_node
n1 n23.cluster.t02:2:
n2 n24.cluster.t02:2:
====================================================
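The host file and the lamnodes output above can be cross-checked with a small script. This is only a sketch: the function names are made up for illustration, it assumes that a host listed several times in the boot schema counts as one CPU per listing (with "cpu=N" giving the count directly), that "#" starts a comment, and that lamnodes prints "nodeid hostname:cpus:flags" as shown above.

```python
from collections import Counter

# Copied from the "cat hf" and "lamnodes" output above.
HOSTFILE = """ClusterServer cpu=2
n23
n23
n24
n24
"""

LAMNODES = """n0 ClusterServer.cluster.t02:2:origin,this_node
n1 n23.cluster.t02:2:
n2 n24.cluster.t02:2:
"""


def parse_boot_schema(text):
    """Count CPUs per host in a boot schema (hypothetical helper).

    Assumption: each listing of a host adds one CPU unless the line
    carries an explicit cpu=N field.
    """
    cpus = Counter()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumed comment syntax
        if not line:
            continue
        fields = line.split()
        count = 1
        for field in fields[1:]:
            if field.startswith("cpu="):
                count = int(field[len("cpu="):])
        cpus[fields[0]] += count
    return dict(cpus)


def parse_lamnodes(text):
    """Map short host name to CPU count from lamnodes-style lines."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        fqdn, count = parts[1].split(":")[:2]
        cpus[fqdn.split(".")[0]] = int(count)
    return cpus


if __name__ == "__main__":
    expected = parse_boot_schema(HOSTFILE)
    booted = parse_lamnodes(LAMNODES)
    for host, count in expected.items():
        status = "ok" if booted.get(host) == count else "MISMATCH"
        print(f"{host}: schema={count} lamnodes={booted.get(host)} {status}")
```

For the data above, both parsers report two CPUs on each of the three hosts, which matches what lamboot was asked to start.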
It seems that lamboot completed correctly.
But when I run lamtests-7.1.2, problems occur.
In the top-level directory of lamtests-7.1.2, configure and make both
complete successfully.
Then I run "make -k check", and it hangs at the first test and goes no
further.
The output is as follows:
-----------------------output----------------------
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> make -k check
Making check in reporting
make[1]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/reporting'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/cluster/soft/MPI/lamtests-7.1.2/reporting'
Making check in ccl
make[1]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/ccl'
Making check in intercomm
make[2]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm'
make check-TESTS
make[3]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm'
mpirun -x TEST -ssi cr none -s h C -ssi rpi crtcp /cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm/./allgather_inter
MPI_Comm_accept: unclassified: Bad address (rank 0, comm 4)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Comm_accept()
Rank (0, MPI_COMM_WORLD): - main()
---------------------------------------------------
Even after a long time the output is unchanged, and CPU usage is 0.
I pressed Ctrl-C to cancel it and then ran "lamnodes", but this time
lamnodes also hung and printed nothing. Only after I ran lamboot again
did lamnodes work normally.
I don't know what the problem is. Can someone help me?
Yours sincerely
Guibin Liu
====================================================
laminfo
LAM/MPI: 7.1.2
Prefix: /cluster/lammpi-7.1.2
Architecture: x86_64-unknown-linux-gnu
Configured by: root
Configured on: Wed Dec 20 00:54:19 CST 2006
Configure host: ClusterServer
Memory manager: ptmalloc2
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C compiler: gcc
C++ compiler: g++
Fortran compiler: ifort
Fortran symbols: underscore
C profiling: