Thread: LAM: MPI_Recv getting garbage




2006-05-26 12:41:14
I think that your mpiStatus is not large enough -- it is supposed to be
an integer array of size MPI_STATUS_SIZE.

Try changing that and see if that resolves your problem.
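
Here is a minimal sketch of the declaration I mean, assuming mpif.h is
available to the routine (in your code it presumably comes in through one
of your modules). The program name and the variable "value" are made up
for illustration; only the mpiStatus declaration matters. It needs at
least 2 processes to do anything.

```fortran
program recv_status_sketch
   implicit none
   include 'mpif.h'

   ! The status argument is an integer array of MPI_STATUS_SIZE elements,
   ! not a scalar integer.
   integer :: mpiStatus(MPI_STATUS_SIZE)
   integer :: ierr, my_rank, num_procs
   real :: value   ! hypothetical payload, stands in for real1/real2/real3

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, num_procs, ierr)

   if (num_procs < 2) then
      if (my_rank == 0) write(*,*) "run with at least 2 processes"
   else if (my_rank == 0) then
      ! Rank 0 sends one MPI_REAL to rank 1 with tag 0.
      value = 1.0
      call MPI_Send(value, 1, MPI_REAL, 1, 0, MPI_COMM_WORLD, ierr)
   else if (my_rank == 1) then
      ! MPI_Recv fills in all MPI_STATUS_SIZE entries of mpiStatus.
      call MPI_Recv(value, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
      write(*,*) "rank ", my_rank, " got ", value, &
                 " from rank ", mpiStatus(MPI_SOURCE)
   end if

   call MPI_Finalize(ierr)
end program recv_status_sketch
```

With a scalar mpiStatus, MPI_Recv still writes MPI_STATUS_SIZE integers
starting at that address, so it can overwrite whatever happens to sit next
to it in memory. That may be why your first couple of reals arrive
corrupted while the rest look fine, though I have not checked it against
your build.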



> -----Original Message-----
> From: lam-bounceslam-mpi.org [mailto:lam-bounceslam-mpi.org]
> On Behalf Of Adams Samuel D Contr AFRL/HEDR
> Sent: Tuesday, May 23, 2006 3:16 PM
> To: 'General LAM/MPI mailing list'
> Subject: LAM: MPI_Recv getting garbage
> 
> For some reason I am getting some weird values when I am using MPI_Recv.  To
> simplify the debugging I just send 10 MPI_REALs from 1.0 to 10.0.  The first
> one comes in as 0.0, the next one comes as garbage, and the rest are
> correct.  Let me first mention that I am not really a regular Fortran
> programmer, so there could easily be something I am doing wrong with my
> Fortran, but I don't understand why I am getting crap on my MPI_Recv calls.
> I am not sure if this is an MPI problem, or the more likely case of a problem
> with my code.  It seems like this should be really simple.
> 
> ---------------------------code section that is giving problems-------------
> subroutine getArbPulseArray()
>    use ps_parameters
>    use commona
>    use aitoc
>    implicit none
>
>    integer :: pulseStatus, mpiStatus, i
>    real, dimension(10) :: r_arr
>    real :: real1, real2, real3
>
>    write(*,*)"checking for a pulse... array that is!"
>    call MPI_Recv(pulseStatus, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
>    write(*,*)"   -0 got ", pulseStatus, " my rank ", my_rank
>    if(pulseStatus.eq.-2) then
>       write(*,*)"   -no pulse found! Abort!"
>       call MPI_Abort(MPI_COMM_WORLD, ierr)
>    else if(pulseStatus.eq.-1) then
>       write(*,*)"   -not an arbitrary pulse problem"
>    else if(pulseStatus.eq.0) then
>       write(*,*)"   -getting pulse array"
>       call MPI_Recv(real1, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
>       write(*,*)"   -1 got ", real1, " my rank ", my_rank, " ierror = ", ierr
>       call MPI_Recv(real2, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
>       write(*,*)"   -2 got ",real2, " my rank ", my_rank, " ierror = ", ierr
>       call MPI_Recv(real3, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
> ------------------the sender guy-------------------------------------------
> subroutine readArbPulseFile(fileName)
>    use ps_parameters
>    use commona
>    use aitoc
>    implicit none
>
>    character*40 :: fileName
>    integer :: ioUnit = 101
>    integer :: returnStatus, i
>    real, dimension(10) :: r_arr = (/ (i, i=1,10) /)
>    open(unit=ioUnit, file=fileName, status="old", iostat=returnStatus, form="formatted", action="read")
>    if(returnStatus.ne.0) then
>       write(*,*)"error: could not open file ", fileName, " (error ", returnStatus, ")"
>       write(*,*)"   -killing processors"
>       call sendInt(-2)
>       write(*,*)"   -aborting"
>       call MPI_Abort(MPI_COMM_WORLD, ierr)
>    end if
>    call sendInt(0)
>    do i = 1, 10
>       call sendReal(r_arr(i))
>    end do
>    write(*,*)"everything was good with the root."
> end subroutine
> ------------------output--------------------------------------------------
> Script started on Tue 23 May 2006 03:57:26 PM CDT
>  mpirun -np 3 ../fdtd <modelsphere.dat -ifile ps.dat -tfile tissue.txt -air 20 -ps
> 
>  variable declaration complete, calling init_permit_calc
>  init_permit_calc complete, initializing MPI layer
>  variable declaration complete, calling init_permit_calc
>  init_permit_calc complete, initializing MPI layer
>  variable declaration complete, calling init_permit_calc
>  init_permit_calc complete, initializing MPI layer
>  MPI Layer initialized, processing command line parameters
>  
>  Running as point source
>  Program continues WITHOUT master-node calculating. 
>  
>  command line parameters processed, calling readparams
>  checking for a pulse... array that is!
>  checking for a pulse... array that is!
>  Log files prefix will be Sphere101ABC10air
>  --sending  0  to  1
>  --sending  0  to  2
>     -0 got  0  my rank  1
>     -getting pulse array
>     -0 got  0  my rank  2
>     -getting pulse array
>  --sending  1.00000000  to  1
>  --sending  1.00000000  to  2
>     -1 got  0.00000000E+00  my rank  1  ierror =  0
>  --sending  2.00000000  to  1
>  --sending  2.00000000  to  2
>     -1 got  0.00000000E+00  my rank  2  ierror =  0
>  --sending  3.00000000  to  1
>     -2 got  5.60519386E-45  my rank  1  ierror =  0
>     -2 got  5.60519386E-45  my rank  2  ierror =  0
>  --sending  3.00000000  to  2
>  --sending  4.00000000  to  1
>     -3 got  3.00000000  my rank  1  ierror =  0
>     -3 got  3.00000000  my rank  2  ierror =  0
>  --sending  4.00000000  to  2
>     -4 got  4.00000000  my rank  1  ierror =  0
>  --sending  5.00000000  to  1
>  --sending  5.00000000  to  2
>     -5 got  5.00000000  my rank  1  ierror =  0
>  --sending  6.00000000  to  1
>     -4 got  4.00000000  my rank  2  ierror =  0
>  --sending  6.00000000  to  2
>     -5 got  5.00000000  my rank  2  ierror =  0
>     -6 got  6.00000000  my rank  1  ierror =  0
>     -6 got  6.00000000  my rank  2  ierror =  0
>  --sending  7.00000000  to  1
>  --sending  7.00000000  to  2
>     -7 got  7.00000000  my rank  1  ierror =  0
>  --sending  8.00000000  to  1
>     -7 got  7.00000000  my rank  2  ierror =  0
>  --sending  8.00000000  to  2
>     -8 got  8.00000000  my rank  1  ierror =  0
>  --sending  9.00000000  to  1
>  --sending  9.00000000  to  2
>     -8 got  8.00000000  my rank  2  ierror =  0
>  --sending  10.0000000  to  1
>     -9 got  9.00000000  my rank  2  ierror =  0
>  --sending  10.0000000  to  2
>  everything was good with the root.
>     -9 got  9.00000000  my rank  1  ierror =  0
>     -10 got  10.0000000  my rank  2  ierror =  0
>     -10 got  10.0000000  my rank  1  ierror =  0
>  everything was good with the node 1
>  everything was good with the node 2
> jwe0019i-u The program was terminated abnormally with signal number SIGSEGV.
> 
> error summary (Fortran)
> error number  error level  error count
>   jwe0019i         u           1      
> total error count = 1
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
> MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD):  - MPI_Recv()
> Rank (2, MPI_COMM_WORLD):  - MPI_Recv()
> Rank (1, MPI_COMM_WORLD):  - main()
> Rank (2, MPI_COMM_WORLD):  - main()
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code.  This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 8581 failed on node n0 (127.0.0.1) with exit status 240.
> -----------------------------------------------------------------------------
> [jnorredcooper test_files]$ exit
> exit
> 
> Script done on Tue 23 May 2006 03:57:39 PM CDT
> 
> Sam Adams
> General Dynamics - Network Systems
> Phone: 210.536.5945
> 
> 
> 

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
