Thread: LAM: lamtests error on cluster




LAM: lamtests error on cluster
user name
2006-12-21 01:46:27
Dear all:

I installed lam-7.1.2 on our cluster. Each node of our cluster has a Pentium D 945 3.4 GHz dual-core CPU (em64t).
I installed lam to an NFS-shared directory.

=============some command results=================
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> lamboot -v hf

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

n-1<10074> ssi:boot:base:linear: booting n0 (ClusterServer)
n-1<10074> ssi:boot:base:linear: booting n1 (n23)
n-1<10074> ssi:boot:base:linear: booting n2 (n24)
n-1<10074> ssi:boot:base:linear: finished
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> cat hf
ClusterServer cpu=2
n23
n23
n24
n24
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> lamnodes
n0      ClusterServer.cluster.t02:2:origin,this_node
n1      n23.cluster.t02:2:
n2      n24.cluster.t02:2:
====================================================

It seems that lamboot completed correctly.
But when I use lamtests-7.1.2, problems occur.
In the top directory of lamtests-7.1.2, configure and make succeed.
But when I then run "make -k check", it hangs at the first test and stops there.
The output is as follows:
-----------------------output----------------------
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> make -k check
Making check in reporting
make[1]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/reporting'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/cluster/soft/MPI/lamtests-7.1.2/reporting'
Making check in ccl
make[1]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/ccl'
Making check in intercomm
make[2]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm'
make  check-TESTS
make[3]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm'
mpirun -x TEST -ssi cr none -s h C -ssi rpi crtcp /cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm/./allgather_inter
MPI_Comm_accept: unclassified: Bad address (rank 0, comm 4)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD):  - MPI_Comm_accept()
Rank (0, MPI_COMM_WORLD):  - main()

---------------------------------------------------
After a long time, the output is still like this and CPU usage is 0.
I press Ctrl-C to cancel it and then run "lamnodes", but this time
lamnodes also hangs with no output. Only after I run lamboot again
does lamnodes work properly.
I don't know what the problem is. Can someone help me?

                                  Yours sincerely
                                  Guibin Liu


====================================================
laminfo
            LAM/MPI: 7.1.2
             Prefix: /cluster/lammpi-7.1.2
       Architecture: x86_64-unknown-linux-gnu
      Configured by: root
      Configured on: Wed Dec 20 00:54:19 CST 2006
     Configure host: ClusterServer
     Memory manager: ptmalloc2
         C bindings: yes
       C++ bindings: yes
   Fortran bindings: yes
         C compiler: gcc
       C++ compiler: g++
   Fortran compiler: ifort
    Fortran symbols: underscore
        C profiling: yes
      C++ profiling: yes
  Fortran profiling: yes
     C++ exceptions: no
     Thread support: yes
      ROMIO support: yes
       IMPI support: no
      Debug support: no
       Purify clean: no
           SSI boot: globus (API v1.1, Module v0.6)
           SSI boot: rsh (API v1.1, Module v1.1)
           SSI boot: slurm (API v1.1, Module v1.0)
           SSI coll: lam_basic (API v1.1, Module v7.1)
           SSI coll: shmem (API v1.1, Module v1.0)
           SSI coll: smp (API v1.1, Module v1.2)
            SSI rpi: crtcp (API v1.1, Module v1.1)
            SSI rpi: lamd (API v1.0, Module v7.1)
            SSI rpi: sysv (API v1.0, Module v7.1)
            SSI rpi: tcp (API v1.0, Module v7.1)
            SSI rpi: usysv (API v1.0, Module v7.1)
             SSI cr: self (API v1.0, Module v1.0)
====================================================


LAM: lamtests error on cluster
user name
2006-12-21 21:44:22
刘贵斌 wrote:
> ====================================================
> laminfo
>             LAM/MPI: 7.1.2
>              Prefix: /cluster/lammpi-7.1.2
>        Architecture: x86_64-unknown-linux-gnu

As you have built your lam for x86-64 (64-bit architecture), you must
make sure you don't mix it with an incompatible lam version, or with
objects or libraries built for 32-bit architecture.  Such mixtures would
produce the sort of hang you mention.
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/

LAM: lamtests error on cluster
user name
2006-12-22 03:42:24
Tim Prince wrote:
> 刘贵斌 wrote:
>> ====================================================
>> laminfo
>>             LAM/MPI: 7.1.2
>>              Prefix: /cluster/lammpi-7.1.2
>>        Architecture: x86_64-unknown-linux-gnu
>
> As you have built your lam for x86-64 (64-bit architecture), you must
> make sure you don't mix it with an incompatible lam version, or with
> objects or libraries built for 32-bit architecture.  Such mixtures would
> produce the sort of hang you mention.
  
I am sure that no LAM was installed on my system before I installed lam-7.1.2.
But how do I check that I am not mixing it with objects or libraries built for 32-bit architecture?
I just ran configure, make, and make install. What else should I configure?


LAM: lamtests error on cluster
user name
2006-12-22 18:29:35
goodluck_1982163.com wrote:
> Tim Prince wrote:
>>> ====================================================
>>> laminfo
>>>             LAM/MPI: 7.1.2
>>>              Prefix: /cluster/lammpi-7.1.2
>>>        Architecture: x86_64-unknown-linux-gnu
>>
>> As you have built your lam for x86-64 (64-bit architecture), you must
>> make sure you don't mix it with an incompatible lam version, or with
>> objects or libraries built for 32-bit architecture.  Such mixtures would
>> produce the sort of hang you mention.
>
> I am sure that no LAM was installed on my system before I installed
> lam-7.1.2.
> But how do I check that I am not mixing it with objects or libraries
> built for 32-bit architecture?
> I just ran configure, make, and make install. What else should I
> configure?
>
In principle, if you use only the compiler wrappers for your lam
installation, don't select any separately built 32-bit .o files, and
don't issue the -m32 option (for gnu compilers; each compiler has a
different scheme), you should be OK. I may have missed it, but I didn't
see an indication of whether you are using only a consistent set of gcc
64-bit compilers.
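
One way to check for a 32-bit/64-bit mixture is to look at the ELF header of each object file, shared library, or test binary. The following is only a minimal sketch: the helper name elf_class is hypothetical (it is not part of LAM/MPI), and it relies on the standard ELF layout in which the fifth byte of the file (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. Static .a archives are not ELF files themselves, so they are reported as unknown here; their members would have to be extracted first.

```shell
#!/bin/sh
# elf_class is a hypothetical helper name, not part of LAM/MPI.
# An ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'; the next byte
# (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit object.
elf_class() {
    # Dump the first five bytes as hex and strip the whitespace od adds.
    hdr=$(od -An -tx1 -N5 "$1" 2>/dev/null | tr -d ' \t\n')
    case "$hdr" in
        7f454c4601) echo 32 ;;
        7f454c4602) echo 64 ;;
        *)          echo unknown ;;   # not ELF (script, .a archive) or unreadable
    esac
}
```

With the function loaded into the shell, running elf_class on /cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm/allgather_inter and on any separately built .o files should print 64 for everything if the build is consistent with the x86_64 laminfo output above. The standard file utility reports the same information in a more readable form.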

LAM: lamtests error on cluster
user name
2006-12-28 17:27:15
On Dec 20, 2006, at 8:46 PM, 刘贵斌 wrote:

> make[3]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm'
> mpirun -x TEST -ssi cr none -s h C -ssi rpi crtcp /cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm/./allgather_inter
> MPI_Comm_accept: unclassified: Bad address (rank 0, comm 4)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD):  - MPI_Comm_accept()
> Rank (0, MPI_COMM_WORLD):  - main()
>
> ---------------------------------------------------
> After a long time, the output is still like this and CPU usage is 0.
> I press Ctrl-C to cancel it and then run "lamnodes", but this time
> lamnodes also hangs with no output. Only after I run lamboot again
> does lamnodes work properly.

This is a known issue that we have fixed in the 7.1.3b1 beta release,
available here:

   http://www.lam-mpi.org/beta/


Hope this helps,

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/


