On Mar 25, 2008, at 2:32 PM, Luttinger, Matthew wrote:
> When I build on Red Hat 4.3 using LAM 7.0.6, my processes use very
> little CPU when sitting idle at MPI_Recv.
>
> When I build on my target hardware (Red Hat 4.6, LAM 7.1.2), my
> processes use 100% of the CPU just sitting and waiting for a message
> at MPI_Recv.
>
> To make it stranger, if I take my processes built on Red Hat 4.3 with
> LAM 7.0.6 and run them on Red Hat 4.6 with LAM 7.1.2, they do not use
> 100% of the CPU; they behave as I expect. It is only when I build on
> Red Hat 4.6 with LAM 7.1.2 that they use 100% of the CPU.
>
> Any ideas ?
>
LAM/MPI has a number of different transport engines it can use under
the covers -- tcp, sysv (blocking shared memory + tcp), usysv (polling
shared memory + tcp), and gm (Myrinet/GM). If the usysv rpi is in use,
blocking sends and receives result in hard polling and 100% CPU
utilization. If the sysv rpi is in use, the process can block instead
of polling, but only if the pending communication is entirely on node
or entirely off node. If there is a mix (including an ANY_SOURCE
receive), LAM must poll between the TCP and shared memory channels.
This includes the case where the application calls Irecv off node and
then Recv on node (or vice versa, and the same applies to sends). The
tcp transport should never poll and will only use CPU when
communication is actively taking place. I believe the GM transport
will end up polling when there is active communication.
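
One way to check which transport is in play is to list the modules and then force one explicitly. The following is only a sketch, assuming a LAM 7.x installation that has already been started with lamboot. ./my_app is a placeholder for your own program, and which rpi modules are actually available depends on how each LAM installation was configured.

```shell
# List the SSI modules this LAM installation was built with
# (look for the "rpi" entries: tcp, sysv, usysv, gm, ...).
laminfo

# Run one process per CPU ("C") and force the tcp rpi, which should
# block rather than poll. ./my_app is a placeholder program name.
mpirun -ssi rpi tcp C ./my_app

# Alternative (assumed equivalent): select the rpi through the
# environment instead of the mpirun command line.
export LAM_MPI_SSI_rpi=tcp
mpirun C ./my_app
```

If the CPU usage at MPI_Recv drops when tcp is forced, the 100% figure was most likely coming from one of the shared memory modules polling. The trade-off is that on-node messages then go over TCP instead of shared memory, which is typically slower.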

If your nodes have different configurations (for example, on one
platform you were running one process per node and on the other you
were running multiple processes per node) or support different
devices, this could account for the different behaviors you are
seeing.

Hope this helps,
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/