
Thread: LAM: LAM processors using 100% CPU on MPI_Recv




LAM: LAM processors using 100% CPU on MPI_Recv
2008-03-25 15:32:48

When I build on Red Hat 4.3 using LAM 7.0.6, my processes use very little CPU when sitting idle at MPI_Recv.

When I build on my target hardware (Red Hat 4.6, LAM 7.1.2), my processes use 100% of the CPU while just sitting and waiting for a message at MPI_Recv.

To make it stranger, if I take the processes built on Red Hat 4.3 with LAM 7.0.6 and run them on Red Hat 4.6 with LAM 7.1.2, they do not use 100% of the CPU; they behave as I expect. It is only when I build them on Red Hat 4.6 with LAM 7.1.2 that they use 100% of the CPU.

Any ideas?

Re: LAM: LAM processors using 100% CPU on MPI_Recv
2008-03-28 22:28:23
On Mar 25, 2008, at 2:32 PM, Luttinger, Matthew wrote:
> When I build on Red Hat 4.3 using LAM 7.0.6, my processes use very
> little CPU when sitting idle at MPI_Recv.
>
> When I build on my target hardware Red Hat 4.6 LAM 7.1.2 my
> processes use 100% of the CPU just sitting and waiting for a message
> at MPI_Recv.
>
> To make it stranger, if I take my processes built on Red Hat 4.3 LAM
> 7.0.6 and run them on red hat 4.6 LAM 7.1.2 they do not use 100% of
> the CPU,  they behave as I expect, it is only when I build it on the
> Red Hat 4.6 LAM 7.1.2 that they use 100% of the CPU.
>
> Any ideas ?
>

LAM/MPI has a number of different transport engines it can use under
the covers -- tcp, sysv (blocking shared memory + tcp), usysv (polling
shared memory + tcp), gm (Myrinet/GM).  If the usysv rpi is in use,
blocking sends / receives result in hard polling and 100% CPU
utilization.  If the sysv rpi is in use, the process can block instead
of polling if there is only communication entirely on node or entirely
off node.  If there is a mix (including an ANY_SOURCE receive), LAM
must poll between the TCP and shared memory channels.  This includes
the case where the application calls Irecv off node then Recv on node
(or vice-versa, and same with Sends).  The tcp transport should never
poll and will only use CPU when communication is actively taking
place.  I believe the GM transport will end up polling when there is
active communication.
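
To make the polling-versus-blocking difference concrete, here is a small
sketch in plain POSIX C.  It is not LAM/MPI code and does not use any LAM
API; the names wait_blocking, wait_polling and cpu_cost_ms are made up
for illustration, and a pipe stands in for a message channel.  It only
shows why a wait that sleeps in the kernel costs almost no CPU while a
wait that keeps re-checking the channel keeps a core busy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds read from the given clock. */
static long clock_ms(clockid_t id) {
    struct timespec ts;
    clock_gettime(id, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Blocking wait: the kernel puts the process to sleep until fd is
 * readable or the timeout expires.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_blocking(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int rc = poll(&p, 1, timeout_ms);
    return rc > 0 ? 1 : rc;
}

/* Polling wait: ask "anything yet?" with a zero timeout in a tight
 * loop, never sleeping.  Same return values as wait_blocking. */
int wait_polling(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLIN };
    long deadline = clock_ms(CLOCK_MONOTONIC) + timeout_ms;
    do {
        int rc = poll(&p, 1, 0);
        if (rc != 0)
            return rc > 0 ? 1 : rc;
    } while (clock_ms(CLOCK_MONOTONIC) < deadline);
    return 0;
}

/* CPU time (ms) this process spends inside one call to wait_fn. */
long cpu_cost_ms(int (*wait_fn)(int, int), int fd, int timeout_ms) {
    assert(wait_fn != NULL);
    long before = clock_ms(CLOCK_PROCESS_CPUTIME_ID);
    wait_fn(fd, timeout_ms);
    return clock_ms(CLOCK_PROCESS_CPUTIME_ID) - before;
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN to run the comparison on an idle pipe. */
int main(void) {
    int fds[2];
    if (pipe(fds) != 0) {
        perror("pipe");
        return 1;
    }
    printf("blocking wait: %ld ms of CPU\n",
           cpu_cost_ms(wait_blocking, fds[0], 500));
    printf("polling wait:  %ld ms of CPU\n",
           cpu_cost_ms(wait_polling, fds[0], 500));
    return 0;
}
#endif
```

On an idle channel, the polling variant should report CPU time close to
the full wait, and the blocking variant close to none, which is the same
pattern as the 100% versus near-zero CPU use seen at MPI_Recv.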

If your nodes have different configurations (on one platform you were
running one process per node and on the other you were running
multiple processes per node) or support different devices, this could
account for the different behaviors you are seeing.

Hope this helps,

Brian

-- 
   Brian Barrett
   LAM/MPI Developer
   Make today a LAM/MPI day!


_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
