Thread: LAM: Error in checkpointing file in LoadLeveler




India
2007-03-13 04:03:59
Hi everybody,

I am using AIX LoadLeveler 3.1 to checkpoint a simple serial application. The problem occurs when the checkpoint file is generated: it is created with a .err extension (ckptname.err). When restart_from_ckpt is set to yes in the job command file and I run the job, the node simply removes the job from the queue and I do not get an output file.

I am posting my job command file and application here. Please let me know if anybody knows why the ckpt file is not generated in the correct format, and how to debug the problem. Thanks in advance.
 
 
My job command file
 
# For First.c
# @ job_type = serial
# @ executable = first
# @ output = stp.out
# @ error = stp.err
# @ class = general
# @ checkpoint = yes
# @ restart_from_ckpt = yes
# @ ckpt_dir = /home/rtsg/crypt/ramakrishna/trial/ex/
# @ ckpt_file = stp.ckpt
# @ restart_on_same_nodes = yes
# @ requirements = Machine == "tf04"
# @ wall_clock_limit = 5:00:00,4:30:00
# @ queue
 
My application
 
#include <stdio.h>
#include "llapi.h"
int main()
{
 int i;
 LL_ckpt_info ckpt_info;
 cr_error_t cp_error1;

 ckpt_info.version = LL_API_VERSION;
 ckpt_info.step_id = NULL;
 ckpt_info.ckptType=NULL;
 ckpt_info.waitType=NULL;
 ckpt_info.abort_sig=NULL;
 ckpt_info.cp_error_data=&cp_error1;
 ckpt_info.ckpt_rc=0;
 ckpt_info.soft_limit=0;
 ckpt_info.hard_limit=0;
 for(i=1;i<4000;i++)
 {
  printf("%d\n",i);
  if(i==2000)
   ll_init_ckpt(&ckpt_info );
 }
 return 0;
}
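
A possible first debugging step for a program like the one above is to flush stdout before the checkpoint and log the call's return code to stderr, so that something shows up in the error file (stp.err in the job command file above) even if the job disappears from the queue. The sketch below is only an illustration: checked_ckpt, ckpt_fn, and fake_ckpt are hypothetical names, and fake_ckpt stands in for the real ll_init_ckpt call so that the code compiles without llapi.h. It assumes the checkpoint routine reports its status as an int return value.

```c
#include <stdio.h>

/* Hypothetical callback type: any routine that takes an opaque
   pointer and reports its status as an int. */
typedef int (*ckpt_fn)(void *info);

/* Flush pending output, call the checkpoint routine, and log the
   return code to stderr so it lands in the job's error file. */
int checked_ckpt(ckpt_fn fn, void *info, int iteration)
{
    int rc;

    fflush(stdout);   /* make sure output printed so far is not lost */
    rc = fn(info);
    fprintf(stderr, "checkpoint at i=%d returned rc=%d\n", iteration, rc);
    fflush(stderr);
    return rc;
}

/* Stand-in for the real checkpoint call (assumption: this is NOT
   the LoadLeveler API); it always reports 0. */
int fake_ckpt(void *info)
{
    (void)info;
    return 0;
}
```

In the real program, a small adapter function (hypothetical) that calls ll_init_ckpt on the LL_ckpt_info structure and returns its result would be passed in place of fake_ckpt, inside the if(i==2000) branch.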

Re: LAM: Error in checkpointing file in LoadLeveler
United States
2007-03-22 12:20:21
This seems more like a question for the AIX checkpoint/restart team.
You should contact them directly, as this list is for LAM/MPI-specific
discussion.

Good luck,
Josh

On Mar 13, 2007, at 5:03 AM, rama krishna wrote:

> Hi everybody,
>
> I am using AIX LoadLeveler 3.1 to checkpoint a simple serial
> application. The problem occurs when the checkpoint file is
> generated: it is created with a .err extension (ckptname.err).
> When restart_from_ckpt is set to yes in the job command file and
> I run the job, the node simply removes the job from the queue and
> I do not get an output file.
>
> I am posting my job command file and application here. Please let
> me know if anybody knows why the ckpt file is not generated in the
> correct format, and how to debug the problem. Thanks in advance.
>
>
> My job command file
>
> # For First.c
> # @ job_type = serial
> # @ executable = first
> # @ output = stp.out
> # @ error = stp.err
> # @ class = general
> # @ checkpoint = yes
> # @ restart_from_ckpt = yes
> # @ ckpt_dir = /home/rtsg/crypt/ramakrishna/trial/ex/
> # @ ckpt_file = stp.ckpt
> # @ restart_on_same_nodes = yes
> # @ requirements = Machine == "tf04"
> # @ wall_clock_limit = 5:00:00,4:30:00
> # @ queue
>
> My application
>
> #include<stdio.h>
> #include "llapi.h"
> int main()
> {
>  int i;
>  LL_ckpt_info ckpt_info;
>  cr_error_t cp_error1;
>
>  ckpt_info.version = LL_API_VERSION;
>  ckpt_info.step_id = NULL;
>  ckpt_info.ckptType=NULL;
>  ckpt_info.waitType=NULL;
>  ckpt_info.abort_sig=NULL;
>  ckpt_info.cp_error_data=&cp_error1;
>  ckpt_info.ckpt_rc=0;
>  ckpt_info.soft_limit=0;
>  ckpt_info.hard_limit=0;
>  for(i=1;i<4000;i++)
>  {
>   printf("%d\n",i);
>   if(i==2000)
>    ll_init_ckpt(&ckpt_info );
>  }
>  return 0;
> }
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

----
Josh Hursey
jjhurseyopen-mpi.org
http://www.open-mpi.org/


