Thread: Re: LAM: mismatched in their RPI selections




user name
2007-07-16 16:26:28
Yes, this is definitely a problem.  LAM never made any claims about
binary compatibility between versions, which is one of the reasons
that we put this version check in place.

Sorry it was so confusing; glad you finally got it figured out!
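
For anyone else who lands on this error: the per-rank RPI versions can
be read out of the mpirun message mechanically. Below is a minimal
sketch in POSIX shell, assuming the message lines look like the ones
quoted further down in this thread ("MPI_COMM_WORLD rank 0: lamd
(v7.1.0)", on one line). The function names rpi_versions and
rpi_mismatch are made up for illustration; they are not part of LAM.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LAM) for reading the
# "mismatched in their RPI selections" message.

# rpi_versions: read mpirun output on stdin and print one
# "rank rpi version" triple per matching line.
# Assumed line shape: "MPI_COMM_WORLD rank 0: lamd (v7.1.0)",
# optionally preceded by other text such as quote markers.
rpi_versions() {
    sed -n 's/^.*MPI_COMM_WORLD rank \([0-9][0-9]*\): *\([A-Za-z_]*\) *(\(v[0-9.]*\)).*$/\1 \2 \3/p'
}

# rpi_mismatch: exit 0 if the input reports more than one distinct
# rpi/version pair, exit 1 otherwise.
rpi_mismatch() {
    rpi_versions | awk '!seen[$2 " " $3]++ { n++ } END { exit !(n > 1) }'
}
```

One possible use, assuming the message is written to stderr (an
assumption, not something confirmed in this thread):

    mpirun -ssi rpi lamd program 2>&1 | rpi_versions

If the versions differ, the thing to chase is which LAM each executable
was built against, not only which LAM is installed on each node.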


On Jul 16, 2007, at 5:22 PM, trymelz trymelz wrote:

> Hi, Jeff,
>
> Thanks. I guess we have the problem on 3b. I have 3 machines:
>
> Machine A: lam 7.0.6, build executable A, copy it to Machine B
> Machine B: lam 7.1.3, get executable A from Machine A
> Machine C: lam 7.1.3, build executable C
>
> Run executable A on Machine B with executable C on Machine C.  I
> will update the lam version on Machine A if possible and let you
> know the results.
>
> Thanks.
>
>
>
> Jeff Squyres <jsquyrescisco.com> wrote:
> If this problem is still occurring, then you must still somehow have
> remnants of different versions of LAM somewhere. Here's what I would
> do...
>
> 1. Uninstall every copy of LAM from your machines. Make them be 100%
>    LAM-free.
> 2. Re-install *only* 7.1.3 on both machines.
>    (I think you've done 1-2 already, but I wanted to mention this to
>    be complete)
> 3. Recompile your application with the new LAM installation.
>    a. If your app is available via a network filesystem to all
>       nodes, you're done
>    b. If your app must be distributed to all nodes, either build it
>       on every node or manually distribute it to all nodes
> 4. Run it with the new LAM installation
>
> You should be good. If you're still getting the mismatch message,
> let us know.
>
>
> On Jul 12, 2007, at 11:03 AM, trymelz trymelz wrote:
>
> > Laminfo outputs the lamd version 7.1.0 on both machines (both
> > under interactive and non-interactive).
> >
> > I installed lam on both machines from
> >
> > http://www.lam-mpi.org/download/files/lam-7.1.3.tar.gz
> >
> > Jeff Squyres wrote:
> > What is the output from laminfo on both machines? It should show
> > the version of the lamd RPI.
> >
> > How are you installing on both machines, from a source 7.1.3
> > tarball, or from some other kind of package?
> >
> >
> > On Jul 11, 2007, at 10:30 AM, trymelz trymelz wrote:
> >
> > > Hi, Jeff,
> > >
> > > Do you know how to check the version of the lamd RPI? laminfo
> > > shows version 7.1.0 (both interactive and non-interactive). I
> > > had an old version of lam installed, but I removed all of it.
> > > Then I tried to uninstall/configure/make/install the newest
> > > version again. But the same problem is still there.
> > >
> > > I believe that the RPI is using some libraries coming with lam, so
> > > I am wondering if it is possible to check these libraries to see
> > > their version. Thanks
> > >
> > > Linfa
> > >
> > > Jeff Squyres wrote:
> > > The error message is telling you that you have different versions
> > > of the lamd RPI (not the lamd executable) on your different
> > > machines. So I think you want to check what versions of LAM you
> > > have installed on each machine. If all else fails, you might want
> > > to just uninstall / reinstall LAM on both machines to guarantee
> > > that you have the same versions.
> > >
> > >
> > > On Jul 9, 2007, at 12:06 PM, trymelz trymelz wrote:
> > >
> > > > Jeff,
> > > >
> > > > Thanks for your information. but...
> > > >
> > > > [Machine_A] rsh Machine_B 'which lamd'
> > > > /usr/bin/lamd
> > > >
> > > > [Machine_B] which lamd
> > > > /usr/bin/lamd
> > > >
> > > > where Machine_A is rank 0 and rank 1, and Machine_B is rank 2
> > > >
> > > > Linfa
> > > >
> > > > Jeff Squyres wrote:
> > > > It looks like you have a version mismatch of LAM/MPI between
> > > > your two nodes. The error message is telling you that it found
> > > > two different versions of the lamd RPI on two nodes:
> > > >
> > > > MPI_COMM_WORLD rank 0: lamd (v7.1.0)
> > > > MPI_COMM_WORLD rank 2: lamd (v7.0.0)
> > > >
> > > > Your laminfo is showing that you have 7.1.3 installed on both
> > > > nodes, but you might want to check for PATH differences on
> > > > non-interactive logins.
> > > >
> > > >
> > > > On Jul 6, 2007, at 4:40 PM, trymelz trymelz wrote:
> > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > > Does anyone have an idea about the "mismatched in their RPI
> > > > > > selections" problem? Thanks.
> > > > >
> > > > > > 1. lamboot -v hostfile3
> > > > > >
> > > > > > LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
> > > > > >
> > > > > > n-1<14838> ssi:boot:base:linear: booting n0 (64-bit Linux machine_A)
> > > > > > n-1<14838> ssi:boot:base:linear: booting n1 (32-bit Linux machine_B)
> > > > > > n-1<14838> ssi:boot:base:linear: finished
> > > > >
> > > > > 2. mpirun -ssi rpi lamd program
> > > > >
> > > > >
> > > >
> > >
> >  
>
------------------------------------------------------------
----------
> > > > > -------
> > > > > It seems that [at least] one of the
processes that was started
> > > with
> > > > > mpirun chose a different RPI than
its peers. For example, at
> > least
> > > > > the following two processes
mismatched in their RPI  
> selections:
> > > > >
> > > > > MPI_COMM_WORLD rank 0: lamd
(v7.1.0)
> > > > > MPI_COMM_WORLD rank 2: lamd
(v7.0.0)
> > > > >
> > > > > All MPI processes must choose the
same RPI module and version
> > when
> > > > > they start. Check your SSI settings
and/or the local  
> environment
> > > > > variables on each node.
> > > > >
> > > >
> > >
> >  
>
------------------------------------------------------------
----------
> > > > > -------
> > > > >
> > > > > > 3. [Machine A]$ rsh Machine_B laminfo
> > > > > LAM/MPI: 7.1.3
> > > > > Prefix: /usr
> > > > > Architecture: i686-pc-linux-gnu
> > > > > Configured by: linfa
> > > > > > Configured on: Fri Jul 6 13:12:05 CDT 2007
> > > > > Configure host: Machine_B
> > > > > Memory manager: ptmalloc2
> > > > > C bindings: yes
> > > > > C++ bindings: yes
> > > > > Fortran bindings: yes
> > > > > C compiler: gcc
> > > > > C++ compiler: g++
> > > > > Fortran compiler: g77
> > > > > Fortran symbols: double_underscore
> > > > > C profiling: yes
> > > > > C++ profiling: yes
> > > > > Fortran profiling: yes
> > > > > C++ exceptions: no
> > > > > Thread support: yes
> > > > > ROMIO support: yes
> > > > > IMPI support: no
> > > > > Debug support: no
> > > > > Purify clean: no
> > > > > > SSI boot: globus (API v1.1, Module v0.6)
> > > > > > SSI boot: rsh (API v1.1, Module v1.1)
> > > > > > SSI boot: slurm (API v1.1, Module v1.0)
> > > > > > SSI coll: lam_basic (API v1.1, Module v7.1)
> > > > > > SSI coll: shmem (API v1.1, Module v1.0)
> > > > > > SSI coll: smp (API v1.1, Module v1.2)
> > > > > > SSI rpi: crtcp (API v1.1, Module v1.1)
> > > > > > SSI rpi: lamd (API v1.0, Module v7.1)
> > > > > > SSI rpi: sysv (API v1.0, Module v7.1)
> > > > > > SSI rpi: tcp (API v1.0, Module v7.1)
> > > > > > SSI rpi: usysv (API v1.0, Module v7.1)
> > > > > > SSI cr: self (API v1.0, Module v1.0)
> > > > >
> > > > > > 4. [Machine A]$ laminfo
> > > > > LAM/MPI: 7.1.3
> > > > > Prefix: /usr/local
> > > > > > Architecture: x86_64-unknown-linux-gnu
> > > > > Configured by: linfa
> > > > > > Configured on: Tue Jun 26 16:07:16 CDT 2007
> > > > > Configure host: Machine_A
> > > > > Memory manager: ptmalloc2
> > > > > C bindings: yes
> > > > > C++ bindings: yes
> > > > > Fortran bindings: yes
> > > > > > C compiler: /opt/intel/cce/9.0/bin/icc
> > > > > > C++ compiler: /opt/intel/cce/9.0/bin/icpc
> > > > > > Fortran compiler: /opt/intel/fce/9.0/bin/ifort
> > > > > Fortran symbols: underscore
> > > > > C profiling: yes
> > > > > C++ profiling: yes
> > > > > Fortran profiling: yes
> > > > > C++ exceptions: no
> > > > > Thread support: yes
> > > > > ROMIO support: yes
> > > > > IMPI support: no
> > > > > Debug support: no
> > > > > Purify clean: no
> > > > > > SSI boot: globus (API v1.1, Module v0.6)
> > > > > > SSI boot: rsh (API v1.1, Module v1.1)
> > > > > > SSI boot: slurm (API v1.1, Module v1.0)
> > > > > > SSI coll: lam_basic (API v1.1, Module v7.1)
> > > > > > SSI coll: shmem (API v1.1, Module v1.0)
> > > > > > SSI coll: smp (API v1.1, Module v1.2)
> > > > > > SSI rpi: crtcp (API v1.1, Module v1.1)
> > > > > > SSI rpi: lamd (API v1.0, Module v7.1)
> > > > > > SSI rpi: sysv (API v1.0, Module v7.1)
> > > > > > SSI rpi: tcp (API v1.0, Module v7.1)
> > > > > > SSI rpi: usysv (API v1.0, Module v7.1)
> > > > > > SSI cr: self (API v1.0, Module v1.0)
> > > > >
> > > > >
> > > > >
> > > > > > _______________________________________________
> > > > > > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> > > >
> > > >
> > > > --
> > > > Jeff Squyres
> > > > Cisco Systems
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Jeff Squyres
> > > Cisco Systems
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
> >
> >
>
>
> -- 
> Jeff Squyres
> Cisco Systems
>
>
>


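The laminfo comparison suggested in the quoted messages above can also
be scripted. This is only a sketch, assuming laminfo prints its RPI
modules as lines of the form "SSI rpi: lamd (API v1.0, Module v7.1)" as
in the output quoted above; rpi_modules and same_rpi_modules are
hypothetical names, not LAM commands. Note that it only compares what
is installed. As this thread shows, matching laminfo output does not
rule out an executable that was built against an older LAM.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LAM) for comparing the RPI modules
# listed by laminfo on two nodes.

# rpi_modules: read laminfo output on stdin and print a sorted list of
# "name module-version" pairs, one per "SSI rpi:" line.
rpi_modules() {
    sed -n 's/^.*SSI rpi: *\([A-Za-z_]*\) *(API v[0-9.]*, Module \(v[0-9.]*\)).*$/\1 \2/p' | sort
}

# same_rpi_modules FILE_A FILE_B: exit 0 only if both saved laminfo
# outputs list at least one RPI module and the lists are identical.
same_rpi_modules() {
    a=$(rpi_modules < "$1")
    b=$(rpi_modules < "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, save "laminfo" and "rsh Machine_B laminfo" to two files
and pass both to same_rpi_modules.
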
-- 
Jeff Squyres
Cisco Systems

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
