The examples that follow apply to both interactive and batch
submissions. For brevity, only batch submissions are described. Usage
of the LSF checkpoint and restart capabilities, which require echkpnt
and erestart, is described as follows:
Serial 3D Fluent batch job under LSF with checkpoint/restart
fluent 3d -g -i journal_file -scheduler=lsf -scheduler_opt='-k " /home/username 60"' -scheduler_opt='-a fluent'
In this example, the LSF -a fluent specification identifies which echkpnt/erestart combination to use, /home/username is the checkpoint directory, and the duration between automatic checkpoints is 60 minutes.
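The launch line above can be assembled from its parts. The following is a minimal sketch, not part of Fluent or LSF: fluent_lsf_cmd is a hypothetical helper that only prints the command line (it submits nothing), and its argument order is an assumption made for illustration. The quoting, including the space after the opening double quote in the -k value, is copied from the example.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a Fluent launch line for LSF
# with checkpointing enabled. Argument names and order are assumptions.
#   $1 journal file
#   $2 checkpoint directory
#   $3 checkpoint period in minutes (optional; left out of -k if empty)
fluent_lsf_cmd() {
  journal=$1
  chkpnt_dir=$2
  period=$3
  k_value=$chkpnt_dir
  if [ -n "$period" ]; then
    k_value="$chkpnt_dir $period"
  fi
  # Quoting mirrors the example: the -k value sits in double quotes inside
  # a single-quoted -scheduler_opt argument, with a leading space.
  printf "fluent 3d -g -i %s -scheduler=lsf -scheduler_opt='-k \" %s\"' -scheduler_opt='-a fluent'\n" \
    "$journal" "$k_value"
}

# Reproduces the serial example: checkpoint to /home/username every 60 minutes.
fluent_lsf_cmd journal_file /home/username 60
```

Calling the helper without a third argument leaves the period out of the -k value, so only the checkpoint directory is passed.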
The following commands can then be used:
bjobs -l <job_ID>
This command returns the job information about <job_ID> in the LSF system.
bchkpnt <job_ID>
This command forces Fluent to write a case file, a data file, and a restart journal file at the end of its current iteration.
The files are saved in a directory named <checkpoint_directory>/<job_ID>. The <checkpoint_directory> is defined through the -scheduler_opt= option in the original fluent command. Fluent then continues to iterate.
bchkpnt -k <job_ID>
This command forces Fluent to write a case file, a data file, and a restart journal file at the end of its current iteration.
The files are saved in a directory named <checkpoint_directory>/<job_ID>, and then Fluent exits. The <checkpoint_directory> is defined through the -scheduler_opt= option in the original fluent command.
brestart <checkpoint_directory> <job_ID>
This command starts a Fluent job using the latest case and data files in the <checkpoint_directory>/<job_ID> directory.
The restart journal file <checkpoint_directory>/<job_ID>/#restart.inp is used to instruct Fluent to read the latest case and data files in that directory and continue iterating.
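The commands above can be chained into a checkpoint-and-restart cycle. The following is a minimal dry-run sketch under stated assumptions: the job ID 12345 is hypothetical, checkpoint_and_restart and the RUN variable are illustrative names that are not part of LSF or Fluent, and with RUN=echo each command is printed rather than executed.

```shell
#!/bin/sh
# Dry-run sketch: RUN=echo prints each LSF command instead of executing it.
# Set RUN to an empty string on a host where LSF is installed to run them.
RUN=${RUN-echo}
JOB_ID=${JOB_ID:-12345}                   # hypothetical job ID
CHKPNT_DIR=${CHKPNT_DIR:-/home/username}  # checkpoint directory from the example

checkpoint_and_restart() {
  $RUN bjobs -l "$JOB_ID"                # inspect the job
  $RUN bchkpnt -k "$JOB_ID"              # write checkpoint files, then Fluent exits
  $RUN brestart "$CHKPNT_DIR" "$JOB_ID"  # restart from the latest case and data files
}

checkpoint_and_restart
```

In a real run you would likely confirm with bjobs that the job has exited before calling brestart; the sketch omits that wait.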
Parallel 3D Fluent batch job under LSF with checkpoint/restart, which specifies /home/username as the checkpoint directory, uses 4 processes, and reads a journal file called journal_file
fluent 3d -t4 -g -i journal_file -scheduler=lsf -scheduler_opt='-k " /home/username"' -scheduler_opt='-a fluent'
The following commands can then be used:
bjobs -l <job_ID>
This command returns the job information about <job_ID> in the LSF system.
bchkpnt <job_ID>
This command forces parallel Fluent to write a case file, a data file, and a restart journal file at the end of its current iteration.
The files are saved in a directory named <checkpoint_directory>/<job_ID>. The <checkpoint_directory> is defined through the -scheduler_opt= option in the original fluent command. Parallel Fluent then continues to iterate.
bchkpnt -k <job_ID>
This command forces parallel Fluent to write a case file, a data file, and a restart journal file at the end of its current iteration.
The files are saved in a directory named <checkpoint_directory>/<job_ID>. The <checkpoint_directory> is defined through the -scheduler_opt= option in the original fluent command. Parallel Fluent then exits.
brestart <checkpoint_directory> <job_ID>
This command starts a Fluent network parallel job using the latest case and data files in the <checkpoint_directory>/<job_ID> directory.
The restart journal file <checkpoint_directory>/<job_ID>/#restart.inp is used to instruct Fluent to read the latest case and data files in that directory and continue iterating.
The parallel job is restarted with the same number of processes as was specified through the -t<x> option in the original fluent command (4 in the previous example).
bmig -m <host> 0
This command checkpoints all jobs for the current user (indicated by a job ID of 0) and moves them to host <host>.