
Thread: Re: LAM: Theorical question about parallel computing




Re: LAM: Theorical question about parallel computing
Brazil
2007-07-11 22:14:49
Thanks for the answer... It actually raised another question for me: can
different implementations of MPI have a high difference in execution times?


>From: Tim Prince <n8tmaol.com>
>Reply-To: tprincecomputer.org, General LAM/MPI mailing list <lamlam-mpi.org>
>To: General LAM/MPI mailing list <lamlam-mpi.org>
>Subject: Re: LAM: Theorical question about parallel computing
>Date: Wed, 11 Jul 2007 20:09:51 -0700
>
>pedropetrovitchhotmail.com wrote:
> > Although my question isn't directly about LAM/MPI, it answers a lot of
> > things I need to know to decide whether I need to use MPI or not. Here it
> > goes: is it possible to make a parallel algorithm (using MPI) running on a
> > single machine/node (with many processes running on it, i.e., mpirun -np 10
> > main, and only one hardware processor) run faster than a serial one under
> > the same conditions? Thanks a lot for your attention. Any help would be
> > appreciated.
>
>Theoretically, possibly, with some very strange application designed so
>that a single process doesn't use all the resources, or where resources
>such as cache/memory can't be made available to a single process.
>Practically, no: the single-node MPI speedup of a normal application is
>limited to the number of separate processors (e.g. cores) on the node,
>and would normally be optimized by running one process per core.
>If there is an advantage for MPI over threading (e.g. OpenMP), it is
>usually related to better cache and memory locality of the MPI processes.
>_______________________________________________
>This list is archived at http://www.lam-mpi.org/MailArchives/lam/
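Tim's practical answer can be written as a rough upper bound. This is a sketch assuming a compute-bound application and setting aside the cache/memory effects he describes as the rare exception. The symbols are introduced here for illustration only: T_1 is the serial run time, T_p the run time with p MPI processes, and c the number of cores on the node.

```latex
S(p) = \frac{T_1}{T_p} \le \min(p, c)
\qquad \text{so with one processor } (c = 1): \quad S(p) \le 1 \ \text{for any } p
```

With mpirun -np 10 main on a single processor, the ten processes time-share one core, so the best case is roughly serial speed. Message passing and context switching usually make it somewhat slower.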


Re: LAM: Theorical question about parallel computing
United States
2007-07-11 23:42:36
pedropetrovitchhotmail.com wrote:
> Thanks for the answer... It actually brought me another question: can
> different implementations of MPI have high difference on execution times?
> 

Why did people write so many MPI implementations? Why did I just spend a
few weeks trying to make a single application perform with one or more
of 4 different MPIs (not including LAM)? If you are talking only about
single-node performance, you still have many questions: did you set
the -O parameter of LAM? Do you use an MPI which does not normally
differentiate between processes on the same or different nodes (e.g.
MPICH)? How are collectives implemented? Does it recognize which
processes have equal access to the same buffer, so that no data movement
is needed for message passing? Does it leave messages in a suitable
length range resident in the cache of the receiving process, so that
the receiver doesn't start out with cache misses? Does the MPI optimize
its use of system calls?

What is a high difference? To one of my recent bosses, 5% was high. In
a reasonable application with evenly distributed work, the MPI
shouldn't account for that much of the time on a single node, but it
could easily exceed that as one approaches a useful cluster size for
the application.

When comparing as many as 4 different MPI implementations, there is a
good chance that not all of them are able to complete the job on the
full range of cluster sizes, so that may be one definition of a high
difference.
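Tim's point that the choice of MPI matters more as the cluster grows can be illustrated with a simple Amdahl-style estimate. Suppose a fraction f of the run time is spent inside MPI, and a second implementation (B) makes that part k times faster than the first (A). The symbols f and k and the numbers below are hypothetical, chosen only to show the shape of the effect, not measured from any MPI.

```latex
\begin{aligned}
\frac{T_B}{T_A} &= (1 - f) + \frac{f}{k} \\
f = 0.02,\ k \to \infty: \quad \frac{T_B}{T_A} &\to 0.98 \quad (\text{at most } 2\%\ \text{saved}) \\
f = 0.30,\ k = 2: \quad \frac{T_B}{T_A} &= 0.70 + 0.15 = 0.85 \quad (15\%\ \text{saved})
\end{aligned}
```

On a single node, where communication is a small fraction of the time, even a much better MPI may stay under a 5% threshold. At cluster sizes where communication dominates, the same improvement can become a high difference.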


Re: LAM: Theorical question about parallel computing
United States
2007-07-12 01:04:21
On Jul 11, 2007, at 10:14 PM, Pedro Petrovitch wrote:

> Thanks for the answer... It actually brought me another question: can
> different implementations of MPI have high difference on execution
> times?

Compare MPI to any other library (say, BLAS). Will different
implementations of BLAS have large differences in execution times? It
depends on who's doing the implementing. It also depends on what you
consider a "high difference".

In any case, the key to MPI is scalability and architecture/hardware
support. For example, an MPI implementation can do a good job with the
general case but be poor at collective operations or scatter/gather.
Or one MPI implementation may take advantage of specialized hardware
details: for example, on an SGI Altix, the SGI MPI implementation will
*probably* kick the tail of other MPI implementations that are not
specifically tuned to the Altix. Does that mean that a given test
program will take an hour with LAM and half a second with SGI's MPI?
Probably not. Does that mean that SGI's can shave a couple of minutes
off a complex program that runs for an hour under LAM? Quite possibly.
Can you take SGI's implementation and expect the same improvement on a
large Sun node? Absolutely not.

~Kyle
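To put Kyle's example on the same scale as the 5% threshold Tim mentions, the relative saving can be computed directly. Taking "a couple of minutes" as 2 minutes is an assumption made only for this illustration:

```latex
\Delta = \frac{T_{\text{LAM}} - T_{\text{SGI}}}{T_{\text{LAM}}}
       = \frac{2\ \text{min}}{60\ \text{min}} \approx 3.3\%
```

A saving of that size is real but modest, which fits Kyle's view that an hour versus half a second is unlikely.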
