Thread: LAM: Help regarding MPI_Allreduce




LAM: Help regarding MPI_Allreduce
India
2007-12-15 03:30:31
We are implementing sparse matrix (n x n) times vector (n x 1)
multiplication in parallel. The matrix is divided column-wise (an
equal number of columns per processor) and the vector is divided
accordingly. Each processor performs its local matrix-vector
multiplication, which generates a vector of size (n x 1) on each
processor. To get the final result vector, all the local vectors are
summed.

We use MPI_Allreduce to collect the final result. The matrix we are
processing is 39601 x 39601, so the vector summed with MPI_Allreduce
has 39601 elements. We ran this code on different numbers of
processors and got very strange results: the time generally increases
from 1 to 19 processors, then drops suddenly at 20 processors. After
that the time remains roughly constant.

# Processors   Time for MV
1              0.145209
2              0.123415
3              0.142032
4              0.153569
5              0.154709
6              0.167946
7              0.177953
8              0.195782
9              0.190688
10             0.203196
12             0.224951
14             0.252618
16             0.262348
17             0.271355
18             0.286621
19             0.298761
20             0.105971
22             0.102517
24             0.105836
26             0.102807
28             0.104299
30             0.105835
32             0.10602


We are calculating time using cpu_time (Fortran).
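
The column-wise scheme described in this post can be sketched in a few lines. This is only an illustration under stated assumptions, not the poster's code: it simulates the ranks inside one Python process, stores the sparse matrix as (row, col, value) triples, and uses a plain element-wise sum as a stand-in for MPI_Allreduce with MPI_SUM. All function names (split_columns, local_matvec, allreduce_sum, parallel_matvec) are hypothetical.

```python
def split_columns(n, nprocs):
    """Return one (start, stop) column range per rank, as equal as possible."""
    base, extra = divmod(n, nprocs)
    ranges, start = [], 0
    for rank in range(nprocs):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def local_matvec(entries, x, n, col_range):
    """Multiply only the columns in col_range by the matching slice of x.

    entries is a sparse matrix given as (row, col, value) triples.
    The result is a full-length partial vector of n elements.
    """
    lo, hi = col_range
    y = [0.0] * n
    for row, col, value in entries:
        if lo <= col < hi:
            y[row] += value * x[col]
    return y


def allreduce_sum(partials):
    """Element-wise sum of the partial vectors (stand-in for MPI_Allreduce)."""
    return [sum(column) for column in zip(*partials)]


def parallel_matvec(entries, x, n, nprocs):
    """Simulate every rank's local product, then combine the partial vectors."""
    ranges = split_columns(n, nprocs)
    partials = [local_matvec(entries, x, n, r) for r in ranges]
    return allreduce_sum(partials)


# Small hypothetical 4 x 4 example, split across 2 simulated ranks.
entries = [(0, 0, 2.0), (0, 3, 1.0), (1, 1, 3.0),
           (2, 0, 4.0), (2, 2, 5.0), (3, 3, 6.0)]
x = [1.0, 2.0, 3.0, 4.0]
print(parallel_matvec(entries, x, 4, 2))  # [6.0, 6.0, 19.0, 24.0]
```

Note that in this layout every rank contributes a partial vector of full length n, so the amount of data entering the reduction per rank does not shrink as more ranks are added, while the local work per rank does.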

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/

Re: LAM: Help regarding MPI_Allreduce
2007-12-15 07:12:15
FWIW: Measuring CPU time in a parallel program is fairly meaningless;
the only reasonable measurement is wall-clock time. Too much of what
happens in parallel communication is not accounted for by user CPU
time (e.g., the communication itself and time spent in the kernel).


On Dec 15, 2007, at 4:30 AM, Scientific Computing, ISSC wrote:

> [snipped: full quote of the original message above]


-- 
Jeff Squyres
Cisco Systems
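
The difference between CPU time and wall-clock time raised in this reply can be seen with a small experiment. The sketch below is an illustration only, written in Python rather than the poster's Fortran: time.sleep stands in for a process that is blocked waiting on communication, and the names timed and wait_for_message are hypothetical. In an MPI program the usual wall-clock call is MPI_Wtime; in plain Fortran, system_clock reports wall-clock time while cpu_time reports processor time.

```python
import time


def timed(work):
    """Run work() and return (wall_seconds, cpu_seconds) for the call."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time of this process only
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def wait_for_message():
    """Stand-in for blocking on communication: the process is idle."""
    time.sleep(0.2)


wall, cpu = timed(wait_for_message)
print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

The wall-clock figure includes the whole wait, while the CPU figure is close to zero because the process did no computation while it waited. A timing table built from CPU time alone can therefore miss most of the cost of a communication-heavy step.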