SGE Commands for Information About Jobs and Cluster Configuration

The following SGE commands are especially useful for getting information about the cluster configuration or for getting information about running or completed jobs. This list only contains a few of the most common commands. Consult the SGE man pages for a complete list and more details.

qconf -help: The first line of the output displays the SGE version

qacct -j job-id: Displays a log of the completed job with id job-id (if accounting is enabled)

qstat -j job-id: Displays status information for the pending or running job with id job-id

qconf -sc: Show all complex attributes

qconf -spl: Show a list of all parallel environments

qconf -sp pe-name: Show details of parallel environment named pe-name

qconf -sql: Show a list of all queues

qconf -sq queue-name: Show details of queue named queue-name

qconf -sconf: Show the global cluster configuration
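A common need is to look up a job without knowing whether it is still running. The following is a minimal sketch that tries qstat -j first and falls back to qacct -j. The helper name job_info is hypothetical, and the sketch assumes that qstat -j exits with a nonzero status when the scheduler no longer knows the job and that accounting is enabled.

```shell
#!/bin/sh
# job_info JOB_ID   (hypothetical helper name)
# Ask the scheduler about a pending or running job first. If qstat does
# not know the job (assumed to produce a nonzero exit status), fall back
# to the accounting record that is written after the job completes.
job_info() {
    qstat -j "$1" 2>/dev/null || qacct -j "$1"
}

# Usage: job_info 12345
```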

Submitting Ansys Electromagnetics SGE Batch Jobs

The SGE qsub command may be used to submit Ansys Electromagnetics jobs. Typical command formats are:

qsub qsub_args ansysEM_exe ansys_args

qsub qsub_args job_script

qsub qsub_args [ - ]

In the first format, the Ansys Electromagnetics desktop command and its arguments are specified on the qsub command line. In the second format, the pathname of a shell script containing the Ansys Electromagnetics desktop command and its arguments is specified on the qsub command line. In the third format, the command is omitted or replaced with a hyphen; this indicates that the command or script will be taken from stdin.
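The three formats can be illustrated with a dry run that only prints the command lines. This is a sketch under stated assumptions: the install path, job name, project file, and script name are hypothetical, and the desktop options shown (-ng -batchsolve) are assumed typical batch options rather than required ones.

```shell
#!/bin/sh
# Dry run: build the three qsub forms as strings and print them.
# All values below are illustrative assumptions, not required settings.
QSUB_ARGS="-cwd -N em_job"                  # run in current directory, name the job
ANSYSEM_EXE="/opt/AnsysEM/ansysedt"         # hypothetical install path
ANSYS_ARGS="-ng -batchsolve project.aedt"   # assumed batch options and project file

# Format 1: command and arguments on the qsub command line.
# -b y tells qsub to treat the command as a binary, not a job script.
form1="qsub -b y $QSUB_ARGS $ANSYSEM_EXE $ANSYS_ARGS"

# Format 2: pathname of a shell script that contains the command.
form2="qsub $QSUB_ARGS ./job_script.sh"

# Format 3: a hyphen in place of the command; the script is read from stdin.
form3="echo '$ANSYSEM_EXE $ANSYS_ARGS' | qsub $QSUB_ARGS -"

printf '%s\n' "$form1" "$form2" "$form3"
```

Printing the commands instead of running them makes it easy to check the quoting and argument order before submitting a real job.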

Quoting Ansys Electromagnetics Command or Arguments for SGE

If the Ansys Electromagnetics tool executable pathname (ansysEM_exe) or any of the arguments of the Ansys Electromagnetics tool command (ansys_args) contain characters that are interpreted by the command shell, then these special characters must be properly quoted to ensure that the correct command is launched by SGE. This is especially important when using the first form of the qsub command, because in this case the Ansys Electromagnetics desktop command is processed by the shell twice: once when the qsub command is processed, and again when the job is started.
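The effect of double processing can be reproduced without a cluster. The sketch below is a simulation only: it assumes that the second pass behaves like handing the already-parsed arguments, joined by spaces, to a new shell. The helper name second_parse and the project name are hypothetical.

```shell
#!/bin/sh
# Stand-in for the second shell pass at job start (an assumption about
# how the command is re-read): join the arguments that survived the
# first pass and give them to a new shell.
second_parse() { sh -c "$*"; }

# Quoted once: the first pass consumes the quotes, so the second pass
# splits the file name at the space and printf sees two arguments.
single=$(second_parse printf '%s,' "my project.aedt")

# Quoted twice: the inner single quotes survive the first pass and keep
# the name together during the second pass.
double=$(second_parse printf '%s,' "'my project.aedt'")

echo "quoted once:  $single"    # prints: quoted once:  my,project.aedt,
echo "quoted twice: $double"    # prints: quoted twice: my project.aedt,
```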

Serial SGE Batch Jobs

In general, Ansys Electromagnetics batch jobs may be submitted as SGE serial jobs without any special considerations.

See Monitoring Ansys Electromagnetics SGE Batch Jobs for options for monitoring Ansys Electromagnetics batch jobs.

Parallel SGE Batch Jobs

When an Ansys Electromagnetics batch job is run as an SGE parallel job, the SGE scheduler will select the hosts for the distributed analysis job, and start the desktop process on one of these hosts. The desktop process will obtain the list of hosts from the SGE scheduler, and start analysis processes, as needed, using the SGE scheduler facilities. To run an SGE parallel job, the job must be submitted to an SGE parallel environment (PE).

If the qmaster TCP port is configured through the environment variable SGE_QMASTER_PORT rather than as a service, this variable must also be set in the Ansys Electromagnetics batch job environment. This is needed because the Ansys Electromagnetics desktop uses the "qrsh -inherit" command to launch engine processes.
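One way to make the variable available to the job is the qsub -v option, which copies a named variable from the submitting environment into the job environment. The following dry-run sketch assumes a PE named ans_test1, four slots, and a script named job_script.sh; the default port value is also an assumption and must match the actual cluster setup.

```shell
#!/bin/sh
# Make sure SGE_QMASTER_PORT is set, then forward it to the job with -v.
# 6444 is the port commonly registered for sge_qmaster; whether it is
# right for a given cluster is an assumption.
: "${SGE_QMASTER_PORT:=6444}"
export SGE_QMASTER_PORT

# -v VAR passes VAR with its current value into the job environment.
# The PE name, slot count, and script name are placeholders.
submit_cmd="qsub -v SGE_QMASTER_PORT -pe ans_test1 4 ./job_script.sh"
echo "$submit_cmd"    # dry run: printed, not executed
```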

See Monitoring Ansys Electromagnetics SGE Batch Jobs for options for monitoring Ansys Electromagnetics batch jobs.

Setting Up an SGE Parallel Environment (PE)

To allow Ansys Electromagnetics batch jobs to distribute analysis engines to multiple hosts, the job must be run in a parallel environment (PE) in which the control_slaves parameter is set to TRUE. This setting is required to allow the Ansys Electromagnetics desktop to start analysis engines on hosts other than the local host, i.e., the host where the Ansys Electromagnetics desktop is running.

Here is a sample parallel environment configuration:

pe_name            ans_test1
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

The user_lists and xuser_lists parameters are ACLs (access control lists) used to control which users have permission to use the parallel environment. The user_lists setting gives permission to use the PE. The xuser_lists setting denies permission to use the parallel environment. The xuser_lists settings override the user_lists settings.

The start_proc_args and stop_proc_args parameters contain the pathname and arguments for the parallel environment startup and shutdown scripts. No startup or shutdown scripts are needed for parallel Ansys Electromagnetics batch jobs. The setting /bin/true may be used as the value for these scripts; this utility does nothing and returns an exit code indicating success (0).

The parallel environment allocation_rule parameter will affect how the analysis engine tasks are distributed across the hosts allocated to the job. The $round_robin setting distributes the tasks across the hosts in a round robin fashion, resulting in the load being relatively evenly distributed over all of the hosts. The $fill_up setting allocates all slots on a host before distributing the tasks to another host; the result is that most hosts are either fully utilized or completely unused. See the sge_pe man page for other settings for this parameter.
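The difference between the two rules can be shown with a simplified model. The function below is not the scheduler's algorithm; it is a sketch that assumes every host offers the same number of free slots and that the job fits on the given hosts. The function name allocate and the example numbers are hypothetical.

```shell
#!/bin/sh
# allocate RULE TASKS SLOTS_PER_HOST HOSTS
# Prints the number of tasks placed on each host, in host order.
# Simplified model: identical hosts, and TASKS <= SLOTS_PER_HOST * HOSTS.
allocate() {
    rule=$1
    total=$2
    slots=$3
    hosts=$4
    left=$total
    i=1
    out=""
    while [ "$i" -le "$hosts" ]; do
        if [ "$rule" = fill_up ]; then
            # Use every slot on this host before moving to the next one.
            n=$left
            if [ "$n" -gt "$slots" ]; then n=$slots; fi
            left=$((left - n))
        else
            # Round robin: an even share, with the earlier hosts taking
            # one extra task each when the division leaves a remainder.
            n=$((total / hosts))
            if [ "$i" -le $((total % hosts)) ]; then n=$((n + 1)); fi
        fi
        out="$out${out:+ }$n"
        i=$((i + 1))
    done
    echo "$out"
}

allocate round_robin 6 4 3    # prints: 2 2 2
allocate fill_up 6 4 3        # prints: 4 2 0
```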

The control_slaves parameter must be set to TRUE, as described above.

The job_is_first_task parameter also affects how tasks are allocated. When submitting a job to run in a parallel environment, the number of parallel tasks, n, is specified on the command line. If this setting is TRUE, then the job process is considered one of the tasks, and only (n-1) additional tasks are allocated to the job. If the setting is FALSE, then the job process is not considered to be one of the tasks, and n additional tasks are allocated for the job.
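The rule described above reduces to a small calculation. The helper name additional_tasks is hypothetical; the sketch only restates the paragraph in executable form.

```shell
#!/bin/sh
# additional_tasks N JOB_IS_FIRST_TASK
# Prints how many tasks are allocated in addition to the job process
# when N parallel tasks are requested.
additional_tasks() {
    if [ "$2" = TRUE ]; then
        echo $(( $1 - 1 ))    # the job process counts as one of the N tasks
    else
        echo "$1"             # the job process is not counted
    fi
}

additional_tasks 8 TRUE     # prints: 7
additional_tasks 8 FALSE    # prints: 8
```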

See the sge_pe man page for more information about these and other PE parameters.

A parallel environment does not run tasks directly. Instead, the tasks are distributed to queues associated with the parallel environment. In order to complete the setup of a parallel environment, one or more queues need to be associated with the parallel environment. The queue pe_list parameter is used to specify the parallel environments (PEs) supported by the queue. This is an important step; if no queues support a given PE, then jobs submitted to that PE will not run.
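The setup steps can be sketched as a script that writes the sample configuration to a file, registers it, and adds the PE to a queue. The queue name all.q is an assumption, and the run helper is hypothetical: it only prints each qconf command unless DRY_RUN=0 is set, so the sketch can be reviewed before anything on the cluster is changed.

```shell
#!/bin/sh
# Print commands by default; execute them only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Write the sample PE definition to a temporary file.
pe_file=$(mktemp)
cat > "$pe_file" <<'EOF'
pe_name            ans_test1
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
EOF

run qconf -Ap "$pe_file"                          # add the PE from the file
run qconf -aattr queue pe_list ans_test1 all.q    # add the PE to the queue's pe_list
run qconf -sp ans_test1                           # verify the PE definition
run qconf -sq all.q                               # verify the queue's pe_list
```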

Parallel Batch Job Command Line Considerations

The number of engines run on a host depends on the total number of distributed engines and on the number of hosts allocated to the job. The memory required on a host depends on the number of engines running on that host and on the memory needed by each engine. Use the qsub -l resource=value,... or -q queue_list command line options to request that the parallel batch job run on machines with sufficient memory and other resources.
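As a worked example, the per-host memory estimate can be computed and turned into a resource request. Everything here is an assumption for illustration: the engine count, host count, memory per engine, an even spread of engines across hosts, the PE and queue names, and the use of the mem_free complex. How a request is applied depends on how the complex is configured, which qconf -sc shows.

```shell
#!/bin/sh
engines=8           # total distributed engines (assumed)
hosts=4             # hosts allocated to the job (assumed)
mem_per_engine=6    # GB needed by each engine (assumed)

# With an even spread, the busiest host runs ceil(engines / hosts) engines.
engines_per_host=$(( (engines + hosts - 1) / hosts ))
mem_per_host=$(( engines_per_host * mem_per_engine ))

# Dry run: build and print the command line instead of submitting it.
submit_cmd="qsub -pe ans_test1 $engines -l mem_free=${mem_per_host}G -q all.q ./job_script.sh"
echo "$submit_cmd"
```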

Related Topics 

Monitoring Ansys Electromagnetics SGE Batch Jobs

Ansys Electromagnetics Desktop -monitor Command Line Option for SGE

Example SGE qsub Command Lines

Issue with MainWin Core Services for SGE

What a Scheduler Does

Recommended Practices for SGE Clusters

Scheduler Proxy Interfaces