The examples that follow apply to both interactive and batch
submissions. For brevity, only batch submissions are described. The use
of the LSF checkpoint and restart capabilities, which requires echkpnt and erestart, is described
as follows:
Serial 3D Fluent batch job under LSF with checkpoint/restart
    fluent 3d -g -ijournal_file -scheduler=lsf -scheduler_opt='-k " /home/username 60"' -scheduler_opt='-a fluent'
In this example, the LSF -a fluent specification identifies which echkpnt/erestart combination to use, /home/username is the checkpoint directory, and the interval between automatic checkpoints is 60 minutes.
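As a minimal sketch, the submission command above can be assembled from its two variable parts, the checkpoint directory and the checkpoint period. The function name build_fluent_cmd is hypothetical and not part of Fluent or LSF; it only prints the command from the example so that the quoting can be inspected before anything is submitted.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fluent or LSF): prints, but does not run,
# the serial submission command from the example above.
build_fluent_cmd() {
    ckpt_dir=$1      # checkpoint directory, e.g. /home/username
    ckpt_period=$2   # minutes between automatic checkpoints, e.g. 60
    # The inner double quotes belong to the LSF -k option; the outer single
    # quotes keep the whole -k argument together for -scheduler_opt.
    printf '%s\n' "fluent 3d -g -ijournal_file -scheduler=lsf -scheduler_opt='-k \" ${ckpt_dir} ${ckpt_period}\"' -scheduler_opt='-a fluent'"
}

build_fluent_cmd /home/username 60
```

Printing the command first is a deliberate choice here: the nested quoting around the -k argument is easy to get wrong, and a dry run makes it visible.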
The following commands can then be used:
bjobs -l <job_ID>
    This command returns the job information about <job_ID> in the LSF system.
bchkpnt <job_ID>
    This command forces Fluent to write a case file, a data file, and a restart journal file at the end of its current iteration.
    The files are saved in a directory named <checkpoint_directory>/<job_ID>. The <checkpoint_directory> is defined through the
    -scheduler_opt= option in the original fluent command. Fluent then continues to iterate.
bchkpnt -k <job_ID>
    This command forces Fluent to write a case file, a data file, and a restart journal file at the end of its current iteration.
    The files are saved in a directory named <checkpoint_directory>/<job_ID> and then Fluent exits. The <checkpoint_directory> is defined through the
    -scheduler_opt= option in the original fluent command.
brestart <checkpoint_directory> <job_ID>
    This command starts a Fluent job using the latest case and data files in the <checkpoint_directory>/<job_ID> directory.
    The restart journal file <checkpoint_directory>/<job_ID>/#restart.inp is used to instruct Fluent to read the latest case and data files in that directory and continue iterating.
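Because brestart depends on the restart journal file in <checkpoint_directory>/<job_ID>, it can be useful to confirm that the file exists before restarting. The following is a sketch only: the function names and the CKPT_DIR variable are hypothetical, and the brestart command is printed rather than run.

```shell
#!/bin/sh
# Sketch under stated assumptions: CKPT_DIR and both function names are
# hypothetical; only the brestart syntax and the #restart.inp location
# come from the text above.
CKPT_DIR=${CKPT_DIR:-/home/username}   # checkpoint directory from the example

restart_journal() {
    # Path of the restart journal written for the given job ID.
    printf '%s/%s/#restart.inp\n' "$CKPT_DIR" "$1"
}

restart_if_checkpointed() {
    job_id=$1
    if [ -f "$(restart_journal "$job_id")" ]; then
        # Print the restart command instead of running it (dry run).
        echo "brestart $CKPT_DIR $job_id"
    else
        echo "no checkpoint found for job $job_id" >&2
        return 1
    fi
}
```

For example, restart_if_checkpointed 1234 (1234 being a placeholder job ID) prints the brestart command only if a checkpoint for that job has been written.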
Parallel 3D Fluent batch job under LSF with checkpoint/restart, which specifies
/home/username as the checkpoint directory, uses 4 processes, and reads a journal file called journal_file
    fluent 3d -t4 -g -ijournal_file -scheduler=lsf -scheduler_opt='-k " /home/username"' -scheduler_opt='-a fluent'
The following commands can then be used:
bjobs -l <job_ID>
    This command returns the job information about <job_ID> in the LSF system.
bchkpnt <job_ID>
    This command forces parallel Fluent to write a case file, a data file, and a restart journal file at the end of its current iteration.
    The files are saved in a directory named <checkpoint_directory>/<job_ID>. The <checkpoint_directory> is defined through the
    -scheduler_opt= option in the original fluent command. Parallel Fluent then continues to iterate.
bchkpnt -k <job_ID>
    This command forces parallel Fluent to write a case file, a data file, and a restart journal file at the end of its current iteration.
    The files are saved in a directory named <checkpoint_directory>/<job_ID>. The <checkpoint_directory> is defined through the
    -scheduler_opt= option in the original fluent command. Parallel Fluent then exits.
brestart <checkpoint_directory> <job_ID>
    This command starts a Fluent network parallel job using the latest case and data files in the <checkpoint_directory>/<job_ID> directory.
    The restart journal file <checkpoint_directory>/<job_ID>/#restart.inp is used to instruct Fluent to read the latest case and data files in that directory and continue iterating.
    The parallel job is restarted using the same number of processes as that specified through the
    -t<x> option in the original fluent command (4 in the previous example).
bmig -m <host> 0
    This command checkpoints all jobs (indicated by the job ID 0) for the current user and moves them to host <host>.
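Since a restarted parallel job reuses the process count given through -t<x> in the original command, it can help to make that count an explicit parameter when composing the submission. The sketch below is illustrative only: build_parallel_fluent_cmd is a hypothetical name, and the function prints the parallel command from the example rather than submitting it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fluent or LSF): prints, but does not run,
# the parallel submission command from the example above.
build_parallel_fluent_cmd() {
    nprocs=$1     # value for -t<x>, e.g. 4; reused by LSF on restart
    ckpt_dir=$2   # checkpoint directory, e.g. /home/username
    # Reject empty or non-numeric process counts before building the command.
    case $nprocs in
        ''|*[!0-9]*) echo "process count must be a number" >&2; return 1 ;;
    esac
    printf '%s\n' "fluent 3d -t${nprocs} -g -ijournal_file -scheduler=lsf -scheduler_opt='-k \" ${ckpt_dir}\"' -scheduler_opt='-a fluent'"
}

build_parallel_fluent_cmd 4 /home/username
```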