On Jun 7, 2007, at 10:44 PM, chenyong wrote:
> Is it normal for a machine in a cluster to be assigned a different rank
> number in different runs?
> In my case, there are three machines (hpc01, hpc02, hpc03) in the
> cluster. The content of the file mpd.hosts is as follows:
>
> hpc01
> hpc02
> hpc03
>
> I found that in some runs, hpc01 has rank number '0', hpc02 has rank
> number '1', and hpc03 has rank number '2';
> the order of rank numbers just follows the order of machine names
> listed in the file.
> However, in some other runs, hpc01 has rank number '0', hpc02 has
> rank number '2', and hpc03 has rank number '1';
> the order does not follow the order of names in the file.
> Is this normal or not?
Are you using LAM/MPI or MPICH2? The mpd.hosts file suggests MPICH2, in
which case you would be best off asking the MPICH lists. This
behavior would be highly unusual for LAM/MPI. If you are using
LAM/MPI, are you always running lamboot from the same node? Are you
running multiple jobs at the same time?
Thanks,
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/