Thread: LAM: mpirun on dual cores Opterons 8214HE




LAM: mpirun on dual cores Opterons 8214HE
Poland
2007-09-26 11:47:09
Dear All,

We have a cluster with 8 dual-core Opteron 8214HE processors (total: 32
cores).
If I run a job using mpirun (mpirun -np 4 ./jobname), how can I know
whether my job is running on 2 physical processors (4 cores) or on
4 separate processors (also 4 cores, but from different processors)?

Is there any way to choose one of the above options with mpirun
options?
Which is more efficient: 2 cores on each of 2 processors, or 1 core on
each of 4 separate processors?

Thank you in advance.
Artur T.
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/

Re: LAM: mpirun on dual cores Opterons 8214HE
United States
2007-09-26 16:23:02
Artur Tyliszczak wrote:
> Dear All,
>
> We have a cluster with 8 dual-core Opteron 8214HE processors (total:
> 32 cores).
> If I run a job using mpirun (mpirun -np 4 ./jobname), how can I know
> whether my job is running on 2 physical processors (4 cores) or on
> 4 separate processors (also 4 cores, but from different processors)?
>
> Is there any way to choose one of the above options with mpirun
> options?
> Which is more efficient: 2 cores on each of 2 processors, or 1 core
> on each of 4 separate processors?
>
You don't give enough information for a full answer, and I may not
have guessed exactly what you are asking.  You could, of course, go to
each node while your job is running to see how many processes are
running there.  LAM/MPI provides a facility to control how many
processes you assign to each node.  Within each node, you would need
an affinity specification, for example with taskset, to ensure an
efficient placement of processes.
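
A minimal sketch of what such an affinity specification does, assuming
Linux and Python 3. taskset applies the same kind of CPU mask from the
command line; the helper name pin_to_cpu is hypothetical and is not
part of LAM/MPI. Note that the logical CPU numbers alone do not say
which cores share a socket.

```python
import os


def pin_to_cpu(cpu_index):
    """Restrict the calling process to one logical CPU; return the new mask."""
    # pid 0 means "the calling process" for both calls (Linux only).
    os.sched_setaffinity(0, {cpu_index})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("CPUs allowed before pinning:", allowed)
    print("CPUs allowed after pinning: ", sorted(pin_to_cpu(allowed[0])))
```

Once a process is pinned like this, the kernel scheduler keeps it on
the chosen logical CPU instead of migrating it between cores.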
If your application doesn't spend much time in message passing, it
would likely run faster with 1 process per socket, but in most cases
that is not an efficient way to use a cluster.  OTOH, LAM, given the
-O option or proper build options for shared memory messaging, will
use shared memory effectively for message passing within each node.

Re: LAM: mpirun on dual cores Opterons 8214HE
Italy
2007-09-27 03:21:27
Artur Tyliszczak wrote:
> Dear All,
>
> We have a cluster with 8 dual-core Opteron 8214HE processors (total:
> 32 cores).
> If I run a job using mpirun (mpirun -np 4 ./jobname), how can I know
> whether my job is running on 2 physical processors (4 cores) or on
> 4 separate processors (also 4 cores, but from different processors)?
>
> Is there any way to choose one of the above options with mpirun
> options?
> Which is more efficient: 2 cores on each of 2 processors, or 1 core
> on each of 4 separate processors?
>
> Thank you in advance.
> Artur T.

Hi Artur,

As has already been pointed out on this list, if you have a plain SMP
system you should not worry much about which socket or core your
parallel processes run on within a single node. Provided you have a
recent enough kernel (multicore/hyperthread aware), it will probably
do a good scheduling job.

As for how to spread multiple processes over different nodes, that
certainly depends on the application. I would add to Tim's answer that
I have seen benchmarks where the same MPI application (whose
performance bottlenecks are memory access and communication) behaved
as follows, assuming nodes with two dual-core processors:

On Opteron systems, maximum performance was obtained with 4 MPI
processes per node, thus using all the cores => communication
bottleneck.

On Xeon systems, maximum performance was reached using twice the
number of nodes but starting only 2 processes per node, even if some
other processes were running on the same nodes => memory access
bottleneck.

Best regards, Davide

-- 
__________________________________________________________
Davide Cesari	ARPA-Servizio Idro Meteorologico      __
  tel       (39) 051/525926                            ||
  fax       (39) 051/6497501                           |||
  e-mail    dcesariarpa.emr.it                        |||/
  www       http://www.arpa.emr.it/sim
                 ---
  Address:  ARPA-SIM, Viale Silvani 6, 40122 Bologna, Italy
__________________________________________________________

Re: LAM: mpirun on dual cores Opterons 8214HE - more doubts
Poland
2007-09-27 05:26:02
Hi Tim,

Thank you for your answer.

> You don't give enough information for a full answer, and I may not
> have guessed exactly what you are asking.  You could, of course, go
> to each node while your job is running to see how many processes are
> running there.

What do you mean by 'go to each node'? Assume that I am using one PC
with 8 dual-core processors. So, if I understand the MPI vocabulary
correctly, we have 1 node with 8 dual-core processors.
I am not able to run a job this way on this computer:
mpirun n0,1,2,3 ./jobname. On the other hand, if I want to run a job
with mpirun c0,1,2,3 ./jobname, I first have to boot LAM with
lamboot hostfile, where the hostfile contains: localhost cpu=16.
Is this the correct procedure?

Assume that I started LAM with lamboot hostfile, where the hostfile
contains: localhost cpu=16.
I run the job mpirun c0,1,2,3 ./jobname. Does it mean that I am using
2 cores from processor 0 and 2 cores from processor 1?
I run the job mpirun c0,2,4,6 ./jobname. Does it mean that I am using
1 core from each of processors 0, 1, 2 and 3?
And finally, what about the case mpirun c0,0,0,0 ./jobname?

In all of the above cases, when I check the CPU usage with 'top', I
can see four CPUs working at 100%. The computation time is more or
less the same in all cases.
However, in the case of mpirun c0,0,0,0 ./jobname I would expect to
see 4 times ~50%, because I explicitly run the job on 1 CPU with
2 cores. Am I correct? This is what I see when I run
mpirun c0,0,0,0 ./jobname on a PC with 1 dual-core processor. Could
you explain this to me?

> OTOH, LAM, given the -O option or proper build options for shared
> memory messaging, will use shared memory effectively for message
> passing within each node.

I am using SUSE 10.2 with Fortran compilers from Intel and Pathscale,
and also the Gfortran compiler. Do you know any "proper build options
for shared memory messaging" for one of those compilers?

Thank you once again, and thank you in advance for answering the above
questions.

Kind regards,
Artur.

Re: LAM: mpirun on dual cores Opterons 8214HE
Poland
2007-09-27 06:10:39
Hi Davide,

Thanks a lot.

> Hi Artur,
>
> As has already been pointed out on this list, if you have a plain
> SMP system you should not worry much about which socket or core your
> parallel processes run on within a single node. Provided you have a
> recent enough kernel (multicore/hyperthread aware), it will probably
> do a good scheduling job.

Is kernel 2.6.18.8-0.5 with SUSE 10.2 recent enough?

> As for how to spread multiple processes over different nodes, that
> certainly depends on the application. I would add to Tim's answer
> that I have seen benchmarks where the same MPI application (whose
> performance bottlenecks are memory access and communication) behaved
> as follows, assuming nodes with two dual-core processors:
>
> On Opteron systems, maximum performance was obtained with 4 MPI
> processes per node, thus using all the cores => communication
> bottleneck.
>
> On Xeon systems, maximum performance was reached using twice the
> number of nodes but starting only 2 processes per node, even if some
> other processes were running on the same nodes => memory access
> bottleneck.
>
> Best regards, Davide

I also ran some tests, and this is what I observed on a workstation
with 8 dual-core processors (16 cores).
When I run 2 jobs (mpirun -np 8 ./job_1 and mpirun -np 8 ./job_2)
simultaneously, the execution time of each of them is about 20% longer
than when I run these jobs consecutively, i.e. first
mpirun -np 8 ./job_1 and then, when it has finished,
mpirun -np 8 ./job_2. Is this common behavior?

Re: LAM: mpirun on dual cores Opterons 8214HE
Italy
2007-09-27 06:52:57
Artur Tyliszczak wrote:
> Hi Davide,
>
> Thanks a lot.
>
>> Hi Artur,
>>
>> As has already been pointed out on this list, if you have a plain
>> SMP system you should not worry much about which socket or core
>> your parallel processes run on within a single node. Provided you
>> have a recent enough kernel (multicore/hyperthread aware), it will
>> probably do a good scheduling job.
>
> Is kernel 2.6.18.8-0.5 with SUSE 10.2 recent enough?

Yes, I think so.

> I also ran some tests, and this is what I observed on a workstation
> with 8 dual-core processors (16 cores).
> When I run 2 jobs (mpirun -np 8 ./job_1 and mpirun -np 8 ./job_2)
> simultaneously, the execution time of each of them is about 20%
> longer than when I run these jobs consecutively, i.e. first
> mpirun -np 8 ./job_1 and then, when it has finished,
> mpirun -np 8 ./job_2. Is this common behavior?

I would say it is a good result! Dual-core processors share even more
of the surrounding hardware than dual processors do on PC boards
(other people on this list may be able to explain better exactly what
is shared by the cores: cache, memory channels, etc.), so I think it
is really unlikely that you will obtain 100% scaling with dual cores
even for independent applications, and losing 20% is really
acceptable in my experience.
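
The arithmetic behind "losing 20% is acceptable" can be sketched as
follows. The 20% slowdown is Artur's measurement; the solo run time of
100 time units is a made-up figure for illustration, and the sketch
assumes both jobs take the same time when run alone.

```python
def wall_clock_times(solo_time, slowdown):
    """Return (consecutive, simultaneous) wall-clock time for two equal jobs."""
    # Back to back: the second job starts when the first one ends.
    consecutive = 2 * solo_time
    # Side by side: both jobs finish together, each slowed by `slowdown`.
    simultaneous = solo_time * (1 + slowdown)
    return consecutive, simultaneous


consecutive, simultaneous = wall_clock_times(solo_time=100.0, slowdown=0.20)
print("consecutive: ", consecutive)
print("simultaneous:", simultaneous)
```

Under these assumptions both jobs are done after about 120 time units
when run together, against 200 when run one after the other, so the
machine still gets more work done despite the per-job slowdown.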
	cheers, Davide

Re: LAM: mpirun on dual cores Opterons 8214HE - more doubts
Germany
2007-09-27 07:05:18
On Thu, 27 Sep 2007, Artur Tyliszczak wrote:

> I am not able to run a job this way on this computer:
> mpirun n0,1,2,3 ./jobname.

What was the content of the hostfile that you passed to lamboot?

> On the other hand, if I want to run a job with
> mpirun c0,1,2,3 ./jobname, I first have to boot LAM with
> lamboot hostfile, where the hostfile contains: localhost cpu=16.
> Is this the correct procedure?

The procedure is correct. Using 'localhost' in the hostfile makes
sense only if you want to use this one node for the job; if you want
to use processors from different nodes, you need to use the real node
names or their IPs.
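
A rough sketch of how the hostfile relates LAM's CPU indices (c0, c1,
...) to node indices (n0, n1, ...), assuming that each hostfile line
defines one node, that cpu=N gives it N consecutive CPU indices, and
that a missing cpu= means one CPU. This illustrates that assumed
numbering only; it is not LAM's own code, and the host names node1 and
node2 are hypothetical.

```python
def cpu_to_node(boot_schema):
    """Map CPU indices (list positions) to node indices for a hostfile."""
    mapping = []
    node = 0
    for line in boot_schema.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cpus = 1  # assumed default when no cpu= key is given
        for token in line.split()[1:]:
            if token.startswith("cpu="):
                cpus = int(token[len("cpu="):])
        mapping.extend([node] * cpus)  # this node owns the next `cpus` indices
        node += 1
    return mapping


single = cpu_to_node("localhost cpu=16")
print(len(single), set(single))
print(cpu_to_node("node1 cpu=2\nnode2 cpu=2\n"))
```

With the one-line hostfile "localhost cpu=16", all 16 CPU indices
belong to node n0 and no n1, n2 or n3 exists, which may be why
mpirun n0,1,2,3 fails on that machine while mpirun c0,1,2,3 works.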

> I run the job mpirun c0,1,2,3 ./jobname. Does it mean that I am using
> 2 cores from processor 0 and 2 cores from processor 1?

LAM/MPI doesn't have any idea about the hardware details of your
computers; if you want more control you should look at Open MPI.

When LAM/MPI launches several processes on a node as part of the same
job (as you seem to be trying to do), it is the OS kernel that decides
where these processes run. Recent Linux distributions have tools to
inform the kernel about your preferences w.r.t. the placement of
processes on processors. However, you should first understand all the
aspects (e.g. if you use more than one node, do you also need to bind
the network card to a certain processor?) before starting to play with
them.

So all your examples are going to end up running in the same way, with
no real ties between the process and the processor...
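
To see where the kernel has actually placed a process, one option on
Linux is field 39 ("processor", the CPU the process last ran on) of
/proc/<pid>/stat. The sketch below assumes Linux and Python 3; the
function names are hypothetical, and the logical CPU number it reports
still has to be related to sockets and cores by other means.

```python
def last_cpu_from_stat(stat_line):
    """Return field 39 ("processor") from one /proc/<pid>/stat line."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split after the final ')' before counting fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 sits at index 36.
    return int(rest[36])


def last_cpu_of(pid="self"):
    """Read the stat file of `pid` and return the CPU it last ran on."""
    with open(f"/proc/{pid}/stat") as handle:
        return last_cpu_from_stat(handle.read())


if __name__ == "__main__":
    print("this process last ran on logical CPU", last_cpu_of())
```

Calling last_cpu_of with the PID of each MPI process at intervals
shows whether the processes stay on one CPU or migrate between them.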

> In all of the above cases, when I check the CPU usage with 'top', I
> can see four CPUs working at 100%. The computation time is more or
> less the same in all cases.

... and that's the reason why all the results are the same.

> This is what I see when I run mpirun c0,0,0,0 ./jobname on a PC with
> 1 dual-core processor. Could you explain this to me?

That's a total of 2 cores in that node. You are starting 4 processes
which compete for CPU time; each of them will get approximately
2/4 = 0.5 = 50% of a core.
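
The same estimate as a small sketch, assuming every process is
CPU-bound and the scheduler shares the cores evenly. The function name
cpu_share is hypothetical.

```python
def cpu_share(cores, processes):
    """Approximate fraction of one core that each busy process receives."""
    if cores < 1 or processes < 1:
        raise ValueError("cores and processes must be positive")
    # A process can never use more than one full core.
    return min(1.0, cores / processes)


print(cpu_share(2, 4))   # 4 processes on the dual-core PC
print(cpu_share(16, 4))  # 4 processes on the 16-core machine
```

This gives 0.5 for the dual-core PC and 1.0 for the 16-core machine,
which matches the ~50% and 100% figures that top reported.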

> Do you know any "proper build options for shared memory messaging"
> for one of those compilers?

It's more a matter of configuring LAM/MPI than of using the compilers
in a special way. This, along with the other questions above, leads me
to believe that you haven't read much on the subject. There is enough
documentation and a FAQ for LAM/MPI which should explain most of these
issues; do read them.

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches
Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg,
GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.CostescuIWR.Uni-Heidelberg.De

Re: LAM: mpirun on dual cores Opterons 8214HE
United States
2007-09-27 08:31:03
Artur Tyliszczak wrote:
> Hi Davide,
>
> Thanks a lot.
>> Hi Artur,
>>
>> As has already been pointed out on this list, if you have a plain
>> SMP system you should not worry much about which socket or core
>> your parallel processes run on within a single node. Provided you
>> have a recent enough kernel (multicore/hyperthread aware), it will
>> probably do a good scheduling job.
> Is kernel 2.6.18.8-0.5 with SUSE 10.2 recent enough?

Yes.

> I also ran some tests, and this is what I observed on a workstation
> with 8 dual-core processors (16 cores).
> When I run 2 jobs (mpirun -np 8 ./job_1 and mpirun -np 8 ./job_2)
> simultaneously, the execution time of each of them is about 20%
> longer than when I run these jobs consecutively, i.e. first
> mpirun -np 8 ./job_1 and then, when it has finished,
> mpirun -np 8 ./job_2. Is this common behavior?

Yes, as Davide pointed out, the jobs will compete with each other for
use of the memory bus.  Running them is probably more efficient if you
assign the 2 jobs to separate groups of nodes.  All job schedulers
used with MPI have such options.  Even then, the jobs are likely to
compete for access to shared file systems.


Re: LAM: mpirun on dual cores Opterons 8214HE - more doubts
United States
2007-09-27 08:45:00
Artur Tyliszczak wrote:
> Hi Tim,
>
> Thank you for your answer.
>
>> You don't give enough information for a full answer, and I may not
>> have guessed exactly what you are asking.  You could, of course, go
>> to each node while your job is running to see how many processes
>> are running there.
> What do you mean by 'go to each node'? Assume that I am using one PC
> with 8 dual-core processors.

For example, log in to the node via ssh and run top there.  Elsewhere,
you indicate that top showed all the processes running on a single
node.

LAM: Thanks - mpirun on dual cores Opterons 8214HE
Poland
2007-09-27 10:15:19
Hi Bogdan, Tim and Davide,

Thank you very, very much for your help and explanations.
MPI on dual cores is now much clearer to me.

Kind regards,
Artur.