
Thread: LAM: MPI_Recv getting garbage




user name
2006-05-23 19:15:36
For some reason I am getting some weird values when I am using MPI_Recv.  To
simplify the debugging I just send 10 MPI_REALs from 1.0 to 10.0.  The first
one comes in as 0.0, the next one comes in as garbage, and the rest are
correct.  Let me first mention that I am not really a regular Fortran
programmer, so there could easily be something I am doing wrong with my
Fortran, but I don't understand why I am getting crap on my MPI_Recv calls.
I am not sure if this is an MPI problem, or the more likely case of a problem
with my code.  It seems like this should be really simple.

---------------------------code section that is giving problems-------------
subroutine getArbPulseArray()
   use ps_parameters
   use commona
   use aitoc
   implicit none
   
   integer :: pulseStatus, mpiStatus, i
   real, dimension(10) :: r_arr
   real :: real1, real2, real3

   write(*,*)"checking for a pulse... array that is!"
   call MPI_Recv(pulseStatus, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
   write(*,*)"   -0 got ", pulseStatus, " my rank ", my_rank
   if(pulseStatus.eq.-2) then
      write(*,*)"   -no pulse found!  Abort!"
      call MPI_Abort(MPI_COMM_WORLD, ierr)
   else if(pulseStatus.eq.-1) then
      write(*,*)"   -not an arbitrary pulse problem"
   else if(pulseStatus.eq.0) then
      write(*,*)"   -getting pulse array"
      call MPI_Recv(real1, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
      write(*,*)"   -1 got ", real1, " my rank ", my_rank, " ierror = ", ierr
      call MPI_Recv(real2, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
      write(*,*)"   -2 got ",real2, " my rank ", my_rank, " ierror = ", ierr
      call MPI_Recv(real3, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
------------------the sender guy-------------------------------------------
subroutine readArbPulseFile(fileName)
   use ps_parameters
   use commona
   use aitoc
   implicit none   

   character*40 :: fileName
   integer :: ioUnit = 101
   integer :: returnStatus, i
   real, dimension(10) :: r_arr = (/ (i, i=1,10) /)
   open(unit=ioUnit, file=fileName, status="old", iostat=returnStatus, form="formatted", action="read")
   if(returnStatus.ne.0) then
      write(*,*)"error: could not open file ", fileName, " (error ", returnStatus, ")"
      write(*,*)"   -killing processors"
      call sendInt(-2)
      write(*,*)"   -aborting"
      call MPI_Abort(MPI_COMM_WORLD, ierr)
   end if
   call sendInt(0)
   do i = 1, 10
      call sendReal(r_arr(i))
   end do
   write(*,*)"everything was good with the root."
end subroutine
------------------output----------------------------------------------------
Script started on Tue 23 May 2006 03:57:26 PM CDT
 mpirun -np 3 ../fdtd <modelsphere.dat -ifile ps.dat -tfile tissue.txt -air 20 -ps

 variable declaration complete, calling init_permit_calc
 init_permit_calc complete, initializing MPI layer
 variable declaration complete, calling init_permit_calc
 init_permit_calc complete, initializing MPI layer
 variable declaration complete, calling init_permit_calc
 init_permit_calc complete, initializing MPI layer
 MPI Layer initialized, processing command line parameters
 
 Running as point source
 Program continues WITHOUT master-node calculating. 
 
 command line parameters processed, calling readparams
 checking for a pulse... array that is!
 checking for a pulse... array that is!
 Log files prefix will be Sphere101ABC10air
 --sending  0  to  1
 --sending  0  to  2
    -0 got  0  my rank  1
    -getting pulse array
    -0 got  0  my rank  2
    -getting pulse array
 --sending  1.00000000  to  1
 --sending  1.00000000  to  2
    -1 got  0.00000000E+00  my rank  1  ierror =  0
 --sending  2.00000000  to  1
 --sending  2.00000000  to  2
    -1 got  0.00000000E+00  my rank  2  ierror =  0
 --sending  3.00000000  to  1
    -2 got  5.60519386E-45  my rank  1  ierror =  0
    -2 got  5.60519386E-45  my rank  2  ierror =  0
 --sending  3.00000000  to  2
 --sending  4.00000000  to  1
    -3 got  3.00000000  my rank  1  ierror =  0
    -3 got  3.00000000  my rank  2  ierror =  0
 --sending  4.00000000  to  2
    -4 got  4.00000000  my rank  1  ierror =  0
 --sending  5.00000000  to  1
 --sending  5.00000000  to  2
    -5 got  5.00000000  my rank  1  ierror =  0
 --sending  6.00000000  to  1
    -4 got  4.00000000  my rank  2  ierror =  0
 --sending  6.00000000  to  2
    -5 got  5.00000000  my rank  2  ierror =  0
    -6 got  6.00000000  my rank  1  ierror =  0
    -6 got  6.00000000  my rank  2  ierror =  0
 --sending  7.00000000  to  1
 --sending  7.00000000  to  2
    -7 got  7.00000000  my rank  1  ierror =  0
 --sending  8.00000000  to  1
    -7 got  7.00000000  my rank  2  ierror =  0
 --sending  8.00000000  to  2
    -8 got  8.00000000  my rank  1  ierror =  0
 --sending  9.00000000  to  1
 --sending  9.00000000  to  2
    -8 got  8.00000000  my rank  2  ierror =  0
 --sending  10.0000000  to  1
    -9 got  9.00000000  my rank  2  ierror =  0
 --sending  10.0000000  to  2
 everything was good with the root.
    -9 got  9.00000000  my rank  1  ierror =  0
    -10 got  10.0000000  my rank  2  ierror =  0
    -10 got  10.0000000  my rank  1  ierror =  0
 everything was good with the node 1
 everything was good with the node 2
jwe0019i-u The program was terminated abnormally with signal number SIGSEGV.

error summary (Fortran)
error number  error level  error count
  jwe0019i         u           1      
total error count = 1
MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD):  - MPI_Recv()
Rank (2, MPI_COMM_WORLD):  - MPI_Recv()
Rank (1, MPI_COMM_WORLD):  - main()
Rank (2, MPI_COMM_WORLD):  - main()
----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 8581 failed on node n0 (127.0.0.1) with exit status 240.
----------------------------------------------------------------------------
[jnorredcooper test_files]$ exit
exit

Script done on Tue 23 May 2006 03:57:39 PM CDT

Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945


_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/