
Thread: LAM: Running tasks from a x86_64 head node with i686 nodes (LAM-MPI 7.1.2)




LAM: Running tasks from a x86_64 head node with i686 nodes (LAM-MPI 7.1.2)
user name
2006-08-09 15:44:48

Hello everyone,

I am currently trying to get my MPI app to run in a somewhat heterogeneous environment. Here is how I compile my apps:

1- Run the following command on all arches (head node and slave node)

mpicc -lm -lX11 -o mandelbrot-mpi.$(laminfo -arch | cut -d' ' -f10) mandelbrot-mpi.c

This generates the following binaries:

mandelbrot-mpi.i686-pc-linux-gnu

mandelbrot-mpi.x86_64-pc-linux-gnu
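
The `cut -d' ' -f10` step depends on exactly how many spaces precede the architecture string in the `laminfo -arch` output. A minimal sketch of a less fragile way to derive the suffix, assuming the architecture triplet is the last whitespace-separated field on the line (the helper name `arch_suffix` and the sample input line are illustrative assumptions, not taken from the LAM documentation):

```shell
#!/bin/sh
# arch_suffix: read `laminfo -arch` style output on stdin and print the
# last whitespace-separated field of the first non-empty line.
# Assumption: the architecture triplet is the final field on that line.
arch_suffix() {
    awk 'NF { print $NF; exit }'
}

# Illustrative input only; the exact laminfo layout is assumed here.
printf '           Architecture: x86_64-pc-linux-gnu\n' | arch_suffix

# Intended use on each node (commented out because it needs LAM installed):
# mpicc -lm -lX11 -o "mandelbrot-mpi.$(laminfo -arch | arch_suffix)" mandelbrot-mpi.c
```

Taking the last field rather than the tenth keeps the suffix stable if the amount of leading padding differs between builds.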

2- Start lam-mpi on the desired nodes with lamboot:

lamboot small_hst

Where small_hst contains:

headless

thinkbig1

thinkbig21

The "headless" host is the head node (dual Opteron, x86_64) and the other "thinkbig" nodes are AthlonXP nodes. lamboot starts with no complaints.

3- (Try to) use mpiexec to launch the parallel application:

mpiexec -n 4 -arch i686-pc-linux-gnu $PWD/mandelbrot-mpi.i686-pc-linux-gnu 100 200 200 1 : -arch x86_64-pc-linux-gnu $PWD/mandelbrot-mpi.x86_64-pc-linux-gnu 100 200 200 1
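
Before handing a long multi-architecture command to mpiexec, it can help to print it and check the quoting and the `:` separators. A minimal sketch that only assembles and prints a command in the form used above (`-arch <triplet> <binary> <args>` blocks joined by `:`); the function name `build_mpiexec_cmd` is hypothetical, and nothing here has been checked against the LAM mpiexec implementation:

```shell
#!/bin/sh
# build_mpiexec_cmd NPROCS BASE ARGS ARCH...
# Print (do not run) an mpiexec command with one "-arch" block per ARCH,
# where each block runs BASE.ARCH with the same ARGS string.
build_mpiexec_cmd() {
    nprocs=$1; base=$2; args=$3
    shift 3
    cmd="mpiexec -n $nprocs"
    sep=""
    for arch in "$@"; do
        cmd="$cmd$sep -arch $arch $base.$arch $args"
        sep=" :"
    done
    printf '%s\n' "$cmd"
}

# Reproduces the shape of the command line from the post.
build_mpiexec_cmd 4 "$PWD/mandelbrot-mpi" "100 200 200 1" \
    i686-pc-linux-gnu x86_64-pc-linux-gnu
```

Printing the command first makes it easier to tell whether a failure comes from the way the line was assembled or from mpiexec itself.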

The output I get is:

Use of uninitialized value in concatenation (.) or string at /usr/bin/mpiexec line 641.

Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.

Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.

Use of uninitialized value in concatenation (.) or string at /usr/bin/mpiexec line 641.

Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.

Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.

Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.

Use of uninitialized value in concatenation (.) or string at /usr/bin/mpiexec line 641.

Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.

/export/home/eric/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi.i686-pc-linux-gnu: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory

/export/home/eric/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi.i686-pc-linux-gnu: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory

-----------------------------------------------------------------------------

It seems that [at least] one of the processes that was started with

mpirun did not invoke MPI_INIT before quitting (it is possible that

more than one process did not invoke MPI_INIT -- mpirun was only

notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that

invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program

to run non-MPI programs over the lambooted nodes.

-----------------------------------------------------------------------------

mpirun failed with exit status 252

I noticed the library loading error, and I get it even though I set the following in my ~/.bashrc (which is sourced by ~/.profile; sourcing it is the only thing ~/.profile does):

if [ $(uname -m) == "x86_64" ]

then

export LD_LIBRARY_PATH=";/usr/lib64"

else

export LD_LIBRARY_PATH=";/usr/lib"

fi

This seems to have no effect unless I log in interactively (I made sure that ~/.bashrc was not being bypassed within the script in that specific case).
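
Since the setting only seems to take effect in interactive logins, it may matter where in the startup files the variable is set. A minimal sketch of the same per-architecture selection, written so that it never leaves an empty entry in the path (the glibc loader treats an empty entry as the current directory). The function names `lib_dir_for` and `prepend_lib_dir` are hypothetical, and the `/usr/lib64` and `/usr/lib` locations are simply the ones from the snippet above:

```shell
#!/bin/sh
# lib_dir_for: map a machine name (as printed by `uname -m`) to the
# library directory used in the post. Hypothetical helper.
lib_dir_for() {
    case "$1" in
        x86_64) echo /usr/lib64 ;;
        *)      echo /usr/lib ;;
    esac
}

# prepend_lib_dir: print a search path with directory $1 placed in front of
# the existing value $2, without leaving an empty entry when $2 is empty.
prepend_lib_dir() {
    if [ -n "$2" ]; then
        echo "$1:$2"
    else
        echo "$1"
    fi
}

LD_LIBRARY_PATH=$(prepend_lib_dir "$(lib_dir_for "$(uname -m)")" "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
```

If ~/.bashrc returns early for non-interactive shells, as some distribution defaults do, a block like this would need to sit above that check to have any effect on remotely started processes.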

Now, the questions:

1- I am not sure I am using mpiexec correctly (I based my command line on the FAQ and the manpage).

2- How do I get lam-mpi to look in the correct path for the libraries? The manpage for lamboot claims that ~/.profile is sourced by default on the local nodes, but I have no way of confirming this.

3- Is setting LD_LIBRARY_PATH the real solution to my problem, or am I missing something else?

4- This application has process 0 perform some display. Process 0 _has_ to be one running on the host named "headless", where all commands are launched. Am I right in assuming that process 0 will always be on the first node?

Thanks for the info in advance,

Eric Thibodeau

PS: I am also trying to do this with OpenMPI. If it is easier to accomplish this under OpenMPI, please don't hesitate to tell me, since I found no evidence that it is (I also decided not to cross-post this to the OpenMPI list for the moment).

LAM: Running tasks from a x86_64 head node with i686 nodes (LAM-MPI 7.1.2)
user name
2006-08-09 16:14:41
Eric Thibodeau wrote:
> Hello everyone,
>
> I am currently trying to get my MPI app to run in a somewhat
> heterogeneous environment. Here is how I compile my apps:
>
> 1- Run the following command on all arches (head node and slave node)
>
> mpicc -lm -lX11 -o mandelbrot-mpi.$(laminfo -arch | cut -d' ' -f10) mandelbrot-mpi.c
>
> This generates the following binaries:
>
> mandelbrot-mpi.i686-pc-linux-gnu
>
> mandelbrot-mpi.x86_64-pc-linux-gnu
>

What's the value in mixing the 2 architectures, when the 32-bit MPI should run under both the 64-bit and 32-bit systems? In any case, you will likely have to go to each node and check that ldd gives the expected results with your run-time environment settings.
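
One way to make that per-node check mechanical is to filter the `ldd` output for unresolved entries. A minimal sketch, assuming `ldd` marks a library it cannot locate with `=> not found` (the helper name `missing_libs` is hypothetical):

```shell
#!/bin/sh
# missing_libs: read ldd output on stdin and print the name of every
# shared library reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Illustrative input; on a real node this would be:
#   ldd ./mandelbrot-mpi.i686-pc-linux-gnu | missing_libs
printf '%s\n' \
    'libm.so.6 => /lib/tls/libm.so.6 (0xb7f55000)' \
    'liblamf77mpi.so.0 => not found' | missing_libs
```

The result is only meaningful if the check runs with the same environment the MPI processes receive; an interactive login may set a different LD_LIBRARY_PATH than a remotely started, non-interactive shell.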
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
LAM: Running tasks from a x86_64 head node with i686 nodes (LAM-MPI 7.1.2)
user name
2006-08-09 16:30:02

On Wednesday 9 August 2006 12:14, Tim Prince wrote:

> Eric Thibodeau wrote:

> > Hello everyone,

> >

> > I am currently trying to get my MPI app to run in a somewhat

> > heterogeneous environment. Here is how I compile my apps:

> >

> > 1- Run the following command on all arches (head node and slave node)

> >

> > mpicc -lm -lX11 -o mandelbrot-mpi.$(laminfo -arch | cut -d' ' -f10)

> > mandelbrot-mpi.c

> >

> > This generates the following binaries:

> >

> > mandelbrot-mpi.i686-pc-linux-gnu

> >

> > mandelbrot-mpi.x86_64-pc-linux-gnu

> >

> What's the value in mixing the 2 architectures, when the 32-bit mpi

> should run under both the 64-bit and 32-bit systems? In any case, you

> will likely have to go to each node and check that ldd gives the

> expected results with your run-time environment settings.

The alleged value is that the head node is supposed to be more powerful than the slave nodes for xyz reason. As for the ldd info, here it is (and it seems right):

eric@thinkbig1 ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ldd mandelbrot-mpi.i686-pc-linux-gnu

linux-gate.so.1 => (0xffffe000)

libm.so.6 => /lib/tls/libm.so.6 (0xb7f55000)

libX11.so.6 => /usr/lib/libX11.so.6 (0xb7e68000)

liblamf77mpi.so.0 => /usr/lib/liblamf77mpi.so.0 (0xb7e57000)

libmpi.so.0 => /usr/lib/libmpi.so.0 (0xb7dbf000)

liblam.so.0 => /usr/lib/liblam.so.0 (0xb7d6e000)

libutil.so.1 => /lib/libutil.so.1 (0xb7d6a000)

libdl.so.2 => /lib/libdl.so.2 (0xb7d66000)

libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb7d53000)

libc.so.6 => /lib/tls/libc.so.6 (0xb7c3c000)

/lib/ld-linux.so.2 (0xb7f7c000)

libXau.so.6 => /usr/lib/libXau.so.6 (0xb7c39000)

libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0xb7c34000)

eric@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ldd mandelbrot-mpi.x86_64-pc-linux-gnu

libm.so.6 => /lib/tls/libm.so.6 (0x00002b11667c5000)

libX11.so.6 => /usr/lib64/libX11.so.6 (0x00002b1166919000)

liblamf77mpi.so.0 => /usr/lib64/liblamf77mpi.so.0 (0x00002b1166b27000)

libmpi.so.0 => /usr/lib64/libmpi.so.0 (0x00002b1166c3a000)

liblam.so.0 => /usr/lib64/liblam.so.0 (0x00002b1166de5000)

libutil.so.1 => /lib/libutil.so.1 (0x00002b1166f4b000)

libdl.so.2 => /lib/libdl.so.2 (0x00002b116704e000)

libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00002b1167151000)

libc.so.6 => /lib/tls/libc.so.6 (0x00002b1167267000)

libXau.so.6 => /usr/lib64/libXau.so.6 (0x00002b116748f000)

libXdmcp.so.6 => /usr/lib64/libXdmcp.so.6 (0x00002b1167592000)

libnsl.so.1 => /lib/libnsl.so.1 (0x00002b1167698000)

/lib64/ld-linux-x86-64.so.2 (0x00002b11666ad000)

I am even more confused as to why the nodes don't seem to find liblamf77mpi.so...

Eric Thibodeau
