Thread: LAM: mpirun on dual cores Opterons 8214HE
| LAM: mpirun on dual cores Opterons 8214HE | Poland | 2007-09-26 11:47:09 |
Dear All,
We have an 8 dual-core Opteron 8214HE cluster (total: 32 cores).
If I run a job using mpirun (mpirun -np 4 ./jobname), how can I know
whether my job is running on 2 physical processors (4 cores) or on
4 separate processors (also 4 cores, but from different processors)?
Is there any way to choose one of the above options with some mpirun
options?
What is more efficient: 2 cores on each of 2 processors, or 1 core on
each of 4 separate processors?
Thank you in advance.
Artur T.
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
| Re: LAM: mpirun on dual cores Opterons 8214HE | United States | 2007-09-26 16:23:02 |
Artur Tyliszczak wrote:
> Dear All,
>
> We have 8 dual cores Opterons 8214HE cluster (total: 32 cores).
> If I run a job using mpirun (mpirun -np 4 ./jobname) how can I know
> whether my job is running on 2 physical processors (4 cores) or
> separate 4 processors (also 4 cores but from different processors)?
>
> Is is there any way to chose one of the above option by some mpirun options?
> What is more efficient: 2 cores on 2 processors OR 4 cores on 4 separate
> processors?
>
You don't give enough information to answer, and perhaps I'm not
guessing correctly what you are asking. You could, of course, go to each
node while your job is running to see how many processes are running
there. LAM/MPI provides a facility to control how many processes you
assign to each node. Within each node, you would need an affinity
specification, for example via taskset, to ensure an efficient placement
of processes.
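As a rough illustration of what an affinity specification does, here is a minimal Python sketch. It assumes a Linux system, where `os.sched_getaffinity` and `os.sched_setaffinity` are available; the helper name `pin_to_cpus` is made up for this example and is not part of LAM/MPI or taskset. It restricts a process to a chosen set of logical CPUs, which is roughly the effect of launching a program under `taskset -c`.

```python
import os


def pin_to_cpus(cpus, pid=0):
    """Restrict process `pid` (0 = the calling process) to the given logical CPUs.

    Returns the affinity set reported by the kernel after the change.
    """
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)


if hasattr(os, "sched_setaffinity"):  # these calls exist on Linux only
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed before:", allowed)
    # Pin the current process to the first CPU it is allowed to use.
    print("allowed after: ", sorted(pin_to_cpus(allowed[:1])))
```

Which logical CPU numbers belong to which socket is not something this sketch knows; that has to come from the machine's topology.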
If your application doesn't spend much time in message passing, it would
likely run faster with 1 process per socket, but in most cases that is
not an efficient way to use a cluster. OTOH, LAM, with observance of
the -O option or proper build options for shared memory messaging, will
use shared memory effectively for message passing within each node.
| Re: LAM: mpirun on dual cores Opterons 8214HE | Italy | 2007-09-27 03:21:27 |
Artur Tyliszczak wrote:
> Dear All,
>
> We have 8 dual cores Opterons 8214HE cluster (total: 32 cores).
> If I run a job using mpirun (mpirun -np 4 ./jobname) how can I know
> whether my job is running on 2 physical processors (4 cores) or
> separate 4 processors (also 4 cores but from different processors)?
>
> Is is there any way to chose one of the above option by some mpirun options?
> What is more efficient: 2 cores on 2 processors OR 4 cores on 4 separate
> processors?
>
> Thank you in advance.
> Artur T.
Hi Artur,

As has already been pointed out on this list, if you have a plain SMP
system you should not worry much about which socket/core your parallel
processes run on within a single node. Provided you have a recent enough
kernel (multicore/hyperthread aware), it will probably do a good
scheduling job.

As for how to spread multiple processes over different nodes, it surely
depends on the application. I would add to Tim's answer that I saw some
benchmarks where the same MPI application (whose performance bottlenecks
are memory access and communication) behaved as follows (assuming
dual-processor, dual-core nodes):

on Opteron systems, maximum performance was obtained with 4 MPI processes
per node, thus using all the cores => communication bottleneck

on Xeon systems, maximum performance was reached using double the number
of nodes but starting only 2 processes per node, even if some other
processes were running on the same nodes => memory access bottleneck

best regards, Davide
--
__________________________________________________________
Davide Cesari ARPA-Servizio Idro Meteorologico __
tel (39) 051/525926 ||
fax (39) 051/6497501 |||
e-mail dcesari arpa.emr.it |||/
www http://www.arpa.emr.it/sim
---
Address: ARPA-SIM, Viale Silvani 6, 40122 Bologna, Italy
__________________________________________________________
| Re: LAM: mpirun on dual cores Opterons 8214HE - more doubts | Poland | 2007-09-27 05:26:02 |
Hi Tim,
Thank you Tim for your answer.
> You don't give enough information to answer, and perhaps I'm not
> guessing entirely what you are asking. You could, of course, go to each
> node while your job is running, to see how many processes are running
> there.
What do you mean by 'go to each node'? Assume that I am using one PC with
8 dual-core processors.
So, if I understand MPI vocabulary correctly, we have 1 node with 8
dual-core processors.
I am not able to run a job in this way on this computer:
mpirun n0,1,2,3 ./jobname. On the other hand, if I want to run a job with
mpirun c0,1,2,3 ./jobname, I first have to boot LAM with lamboot hostfile,
where the hostfile contains: localhost cpu=16. Is this the correct
procedure?
Assume that I started LAM with lamboot hostfile, where the hostfile
contains: localhost cpu=16.
I run the job mpirun c0,1,2,3 ./jobname. Does it mean that I am using 2
cores from processor 0 and 2 cores from processor 1?
I run the job mpirun c0,2,4,6 ./jobname. Does it mean that I am using 1
core from each of processors 0,1,2,3?
And finally, what about the case mpirun c0,0,0,0 ./jobname?
In all the above cases, when I check the CPU usage with 'top', I can see
four CPUs working at 100%. The computation time is more or less the
same in all cases.
However, in the case mpirun c0,0,0,0 ./jobname I would expect to see 4
times ~50%, because I explicitly run the job on 1 CPU with 2 cores. Am I
correct? That is what I see when I run mpirun c0,0,0,0 ./jobname on a PC
with 1 dual-core processor. Could you explain this to me?
> OTOH, lam, with observance of
> the -O option or proper build options for shared memory messaging, will
> use shared memory effectively for message passing within each node.
I am using SUSE 10.2 with Fortran compilers from Intel and PathScale, and
also the gfortran compiler. Do you know any "proper build options for
shared memory messaging" for one of those compilers?
Thank you once again, and thank you in advance for answering the above
doubts.
Kind regards,
Artur.
| Re: LAM: mpirun on dual cores Opterons 8214HE | Poland | 2007-09-27 06:10:39 |
Hi Davide,
Thanks a lot.
> Hi Artur
>
> As it has already been pointed out in this list, if you have a plain SMP
> system you should not worry much about on which socket/core your
> parallel processes run within a single node, provided you have a recent
> enough kernel (multicore/hyperthread aware) it will probably do a good
> scheduling job.
Is kernel 2.6.18.8-0.5 with SUSE 10.2 recent enough?
> Concerning the way to spread multiple processes on different nodes, it
> may surely depend on the application, I would add to Tim's answer that I
> saw some benchmarks where the same MPI application (whose performance
> bottlencks are memory access and communication) had the following
> behavior (supposing to have dual processor*dual core nodes):
>
> on Opteron systems maximum performance was obtained with 4 MPI processes
> per node, thus using all the cores =>communication bottleneck
>
> on Xeon systems the maxiumum performance was reached using double number
> of nodes but starting only 2 processes per node, even if some other
> process were running on the same nodes =>memory access bottleneck
>
> best regards, Davide
>
I also did some tests, and this is what I observed on a workstation with
8 dual-core processors (16 cores).
When I run 2 jobs (mpirun -np 8 ./job_1 and mpirun -np 8 ./job_2)
simultaneously, the execution time of each of them is about 20% longer
than when I run these jobs consecutively, i.e. first mpirun -np 8 ./job_1
and then, when it has finished, mpirun -np 8 ./job_2. Is this common
behavior?
| Re: LAM: mpirun on dual cores Opterons 8214HE | Italy | 2007-09-27 06:52:57 |
Artur Tyliszczak wrote:
> Hi Davide,
>
> Thanks a lot.
>
>>Hi Artur
>>
>>As it has already been pointed out in this list, if you have a plain SMP
>>system you should not worry much about on which socket/core your
>>parallel processes run within a single node, provided you have a recent
>>enough kernel (multicore/hyperthread aware) it will probably do a good
>>scheduling job.
>
> kernel 2.6.18.8-0.5 with suse 10.2 is it recent enough ?
Yes, I think so.
> I also did some tests and this is what I observed on 8 dual cores
> workstation (16 cores)
> When I run 2 jobs (mpirun -np 8 ./job_1 and mpirun -np 8 ./job_2)
> simultaneously then the execution time of each of them is about 20%
> longer than when I run these jobs consecutively, i.e. first mpirun -np 8
> ./job_1 and then when it finished mpirun -np 8 ./job_2. Is it common
> behavior?
I would say it is a good result! Dual-core processors share part of the
surrounding hardware even more than dual processors do on PC boards
(other people on this list may explain better what exactly is shared by
cores; I mean cache, memory channels etc.), so I think it is really
unlikely to obtain 100% scaling with dual cores even on independent
applications, and losing 20% is really acceptable in my experience.
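To put the 20% figure in perspective, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that both jobs take the same time when run alone (the value 100.0 is made up) and that each slows down by 20% when the two run side by side, as reported above. The function names are hypothetical.

```python
def time_consecutive(t_alone, n_jobs):
    """Wall-clock time to finish n_jobs one after another."""
    return t_alone * n_jobs


def time_concurrent(t_alone, slowdown):
    """Wall-clock time when the jobs run side by side, each slowed down
    by the fraction `slowdown` (0.20 means 20% longer)."""
    return t_alone * (1.0 + slowdown)


t_alone = 100.0  # made-up run time of one job on an otherwise idle machine
consecutive = time_consecutive(t_alone, 2)
concurrent = time_concurrent(t_alone, 0.20)
print("consecutive:", consecutive)
print("concurrent: ", concurrent)
print("time saved by overlapping: %.0f%%" % (100 * (1 - concurrent / consecutive)))
```

Under these assumptions both jobs are done well before the consecutive runs would be, even though each individual job takes longer.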
cheers, Davide
| Re: LAM: mpirun on dual cores Opterons 8214HE - more doubts | Germany | 2007-09-27 07:05:18 |
On Thu, 27 Sep 2007, Artur Tyliszczak wrote:
> I am not able to run a job in this way: mpirun n0,1,2,3 ./jobname on
> this computer.
What was the content of the hostfile that you passed to lamboot?
> On the other hand if I want to run a job: mpirun c0,1,2,3 ./jobname
> I have to first boot the lam with lamboot hostfile, where in the
> hostfile I have: localhost cpu=16. Is it a correct procedure?
The procedure is correct. Using 'localhost' in the hostfile makes sense
only if you want to use this one node for the job; if you want to use
processors from different nodes, you need to use the real node names or
their IPs.
> I run the job mpirun c0,1,2,3 ./jobname. Does it mean that I am
> using 2 cores from processor 0 and 2 cores from processor 1 ?
LAM/MPI doesn't have any idea about the hardware details of your
computers; if you want more control you should look at Open MPI.
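A small Python model may make this concrete. It is only a sketch of the bookkeeping, not LAM/MPI code: it assumes that the CPU indices used in the c0,1,2,3 notation are handed out consecutively across the hostfile entries according to their cpu= counts, that an entry without cpu= counts as one CPU, and that # starts a comment. The function names are hypothetical.

```python
def parse_boot_schema(text):
    """Return a list of (hostname, cpu_count) from 'host cpu=N' lines."""
    nodes = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        cpus = 1  # assumed default when no cpu= key is given
        for field in fields[1:]:
            if field.startswith("cpu="):
                cpus = int(field[len("cpu="):])
        nodes.append((fields[0], cpus))
    return nodes


def cpu_to_node(nodes):
    """Map each CPU index to a node index, numbering CPUs consecutively."""
    mapping = []
    for node_index, (_host, cpus) in enumerate(nodes):
        mapping.extend([node_index] * cpus)
    return mapping


schema = "localhost cpu=16\n"  # the hostfile from this thread
mapping = cpu_to_node(parse_boot_schema(schema))
for request in ([0, 1, 2, 3], [0, 2, 4, 6], [0, 0, 0, 0]):
    print(request, "->", ["n%d" % mapping[c] for c in request])
```

In this model every requested index resolves to the single node n0, and nothing in the mapping refers to a socket or a core.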
When LAM/MPI launches several processes on a node as part of the same
job (as you seem to be trying to do), it's the OS kernel which decides
where these processes are run. Recent Linux distributions have tools to
inform the kernel about your preferences w.r.t. the placement of
processes on processors. However, you should first understand all the
aspects (e.g. if you use more than one node, do you also need to bind
the network card to a certain processor?) before starting to play
with them.
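One of the aspects to understand first is which logical CPUs share a physical package. The following Python sketch groups them by socket, assuming the `processor` and `physical id` fields that Linux prints in /proc/cpuinfo on x86 machines. The function name and the four-CPU sample text are made up for illustration and do not come from the machines discussed in this thread.

```python
from collections import defaultdict


def cpus_by_socket(cpuinfo_text):
    """Group logical CPU numbers by the 'physical id' (socket) they report."""
    sockets = defaultdict(list)
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU entries
        key = key.strip()
        value = value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "physical id" and cpu is not None:
            sockets[int(value)].append(cpu)
    return dict(sockets)


# Made-up excerpt in the /proc/cpuinfo layout: two sockets, two cores each.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 1

processor   : 2
physical id : 1
core id     : 0

processor   : 3
physical id : 1
core id     : 1
"""

print(cpus_by_socket(SAMPLE))
```

On a real Linux node the same function could be fed the contents of /proc/cpuinfo, for example `cpus_by_socket(open("/proc/cpuinfo").read())`.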
So, all your examples are going to end up running in the same way - no
real ties between the process and the processor...
> In all above cases when I check the cpu usage by 'top' then I can
> see four cpu working at 100%. The time of computations is more or
> less the same in all cases.
... and that's the reason why all the results are the same.
> This is the case when I run the mpirun c0,0,0,0 ./jobname on PC with
> 1 dual core processor. Could you explain me this?
That's a total of 2 cores in that node. You are starting 4 processes
which compete for CPU time; each of them will get approximately 2/4 =
0.5 = 50%.
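The same estimate as a tiny Python sketch. It assumes CPU-bound processes that the kernel schedules fairly and nothing else running on the machine; the function name is made up for this example.

```python
def expected_cpu_share(cores, processes):
    """Approximate fraction of one core that each CPU-bound process gets."""
    if cores <= 0 or processes <= 0:
        raise ValueError("cores and processes must be positive")
    # A single process cannot use more than one full core.
    return min(1.0, cores / processes)


for cores in (2, 16):
    share = expected_cpu_share(cores, 4)
    print("%2d cores, 4 processes: about %.0f%% each" % (cores, 100 * share))
```

With 2 cores the four processes get about half a core each, while with 16 cores each process can have a core to itself, which matches the two `top` observations described in the thread.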
> Do you know any "proper build options for shared
memory messaging"
> for one of those compilers?
It's more a matter of configuring LAM/MPI than of using the compilers
in a special way. This, along with the other questions above, leads me
to believe that you haven't read much on the subject. There is enough
documentation and a FAQ for LAM/MPI which should explain most of these
issues - do read them.
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches
Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg,
GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu IWR.Uni-Heidelberg.De
| Re: LAM: mpirun on dual cores Opterons 8214HE | United States | 2007-09-27 08:31:03 |
Artur Tyliszczak wrote:
> Hi Davide,
>
> Thanks a lot.
>> Hi Artur
>>
>> As it has already been pointed out in this list, if you have a plain SMP
>> system you should not worry much about on which socket/core your
>> parallel processes run within a single node, provided you have a recent
>> enough kernel (multicore/hyperthread aware) it will probably do a good
>> scheduling job.
> kernel 2.6.18.8-0.5 with suse 10.2 is it recent enough ?
Yes.
> I also did some tests and this is what I observed on 8 dual cores
> workstation (16 cores)
> When I run 2 jobs (mpirun -np 8 ./job_1 and mpirun -np 8 ./job_2)
> simultaneously then the execution time of each of them is about 20%
> longer than when I run these jobs consecutively, i.e. first mpirun -np 8
> ./job_1 and then when it finished mpirun -np 8 ./job_2. Is it common
> behavior?
Yes, as Davide pointed out, the jobs will compete with each other for use
of the memory bus. This is probably done more efficiently if you
assign the 2 jobs to separate groups of nodes. All job schedulers used
with MPI have such options. Even then, they are likely to compete for
access to shared file systems.
| Re: LAM: mpirun on dual cores Opterons 8214HE - more doubts | United States | 2007-09-27 08:45:00 |
Artur Tyliszczak wrote:
> Hi Tim,
>
> Thank you Tim for your answer.
>
>> You don't give enough information to answer, and perhaps I'm not
>> guessing entirely what you are asking. You could, of course, go to each
>> node while your job is running, to see how many processes are running
>> there.
> What do you mean 'go to each node' . Assume that I am using one PC with
> 8 dual core processors.
For example, log into the node by ssh and run top there.
Elsewhere, you
indicate that top showed all the processes running on a
single node.
| LAM: Thanks - mpirun on dual cores Opterons 8214HE | Poland | 2007-09-27 10:15:19 |
Hi Bogdan, Tim and Davide,
Thank you very, very much for your help and explanations.
Now MPI on dual cores is much clearer to me.
Kind regards,
Artur.