LSF Troubleshooting
The following are general troubleshooting steps:
- Ensure the LSF "lsrun" command is enabled.
- Look for user errors. For example:
  - Are the executable path and project path correct and complete?
  - Are sufficient resources (CPU, memory, disk) allocated to the job?
  - Is the project available on the execution host?
  - Does the job submitter have read/write permissions on the project directory and read/execute permissions on the installation directory?
  - Is the project locked?
- Determine whether this is a standalone product issue.
  - Run Electronics Desktop on the machine outside of the scheduler and check whether it opens the project and completes an analysis.
- Examine outputs and logs:
  - Output of the LSF batch job. Obtain this with the LSF command "bacct -l <jobid>".
  - Batch log (typically <projectname>.log, located in the project directory).
- Enable additional debug logs. In the job submission window, set the following environment variables:
  - ANSOFT_DEBUG_MODE = 1
  - ANSOFT_DEBUG_LOG = <path to a directory accessible by all machines in the cluster>
  - ANSOFT_DEBUG_LOG_SEPARATE = 1
  - ANSOFT_LSF_LOG = <path to a specific .log file in the directory set under ANSOFT_DEBUG_LOG>
- For each pair of machines between which remote analysis fails, run "ping remote-machine" and note the output.
- For each machine in the network, dump the network interfaces (for example, run "ifconfig -a") and note the output.
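The permission question from the user-error checklist (read/write on the project directory, read/execute on the installation directory) can be scripted. The following is a minimal Python sketch, assuming it is run as the job submitter on the execution host; the function name is hypothetical and not part of any Ansys or LSF tool. Note that os.access tests against the real user ID, so a root user passes most checks regardless of the mode bits.

```python
import os


def check_permissions(project_dir: str, install_dir: str) -> list[str]:
    """Return a list of permission problems; an empty list means every check passed."""
    problems = []
    # The job submitter needs read/write access to the project directory.
    if not os.path.isdir(project_dir):
        problems.append(f"project directory not found: {project_dir}")
    elif not os.access(project_dir, os.R_OK | os.W_OK):
        problems.append(f"no read/write permission on project directory: {project_dir}")
    # The job submitter needs read/execute access to the installation directory.
    if not os.path.isdir(install_dir):
        problems.append(f"installation directory not found: {install_dir}")
    elif not os.access(install_dir, os.R_OK | os.X_OK):
        problems.append(f"no read/execute permission on installation directory: {install_dir}")
    return problems
```

For example, print(check_permissions("/shared/projects/antenna", "/opt/AnsysEM")) reports any problems with those two directories (both paths are hypothetical placeholders).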
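Gathering the job outputs and logs in one place makes them easier to compare across runs. This sketch runs "bacct -l <jobid>" and reads <projectname>.log from the project directory. The function name, the assumption that project_name is given without a file extension, and the injectable run parameter (which lets the logic be exercised without an LSF installation) are illustrative choices, not part of any Ansys or LSF tool.

```python
import os
import subprocess
from typing import Callable


def collect_job_logs(jobid: str, project_dir: str, project_name: str,
                     run: Callable = subprocess.run) -> dict[str, str]:
    """Gather the "bacct -l" record and the batch log for one LSF job."""
    logs = {}
    # Long-format accounting information for the job, as printed by "bacct -l <jobid>".
    try:
        proc = run(["bacct", "-l", str(jobid)], capture_output=True, text=True)
        logs["bacct"] = proc.stdout + proc.stderr
    except FileNotFoundError:
        logs["bacct"] = "bacct not found on PATH\n"
    # Assumption: project_name has no extension, so the batch log is <project_name>.log.
    log_path = os.path.join(project_dir, project_name + ".log")
    if os.path.isfile(log_path):
        with open(log_path, encoding="utf-8", errors="replace") as fh:
            logs["batch_log"] = fh.read()
    else:
        logs["batch_log"] = f"batch log not found: {log_path}\n"
    return logs
```

A call such as collect_job_logs("1234", "/shared/projects", "antenna") (hypothetical job ID and paths) returns both texts in a dictionary, ready to be saved or attached to a support request.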
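When jobs are submitted from a script rather than the job submission window, the four debug variables from the steps above can be generated programmatically. A minimal sketch follows; the directory /shared/ansys_debug and the file name lsf_debug.log are hypothetical placeholders, and the directory must be on a file system that every machine in the cluster can reach.

```python
import os


def build_debug_env(debug_dir: str, lsf_log_name: str = "lsf_debug.log") -> dict[str, str]:
    """Return the ANSOFT_* debug variables; ANSOFT_LSF_LOG points inside debug_dir."""
    # ANSOFT_LSF_LOG must name a .log file in the ANSOFT_DEBUG_LOG directory.
    if not lsf_log_name.endswith(".log"):
        raise ValueError("the LSF log file name should end in .log")
    return {
        "ANSOFT_DEBUG_MODE": "1",
        "ANSOFT_DEBUG_LOG": debug_dir,
        "ANSOFT_DEBUG_LOG_SEPARATE": "1",
        "ANSOFT_LSF_LOG": os.path.join(debug_dir, lsf_log_name),
    }


if __name__ == "__main__":
    # Print NAME=value pairs for the hypothetical shared debug directory.
    for name, value in build_debug_env("/shared/ansys_debug").items():
        print(f"{name}={value}")
```

How these variables reach the job depends on how it is submitted. LSF's bsub normally copies the submitting shell's environment to the job, so exporting them before submission is one option, but confirm this against your site's LSF configuration.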
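The two network checks can be run together on each machine and saved as one report. This Python sketch pings a list of remote hosts and then dumps the local network interfaces. Assumptions: ping takes -c for the echo count on Linux and macOS and -n on Windows, and "ipconfig /all" is used as the Windows stand-in for "ifconfig -a". The function names are hypothetical, and the run parameter is injectable so the logic can be tested without a network.

```python
import platform
import socket
import subprocess
from typing import Callable, Iterable, Optional, Tuple


def ping_command(host: str, count: int = 3) -> list[str]:
    # Windows ping uses -n for the number of echo requests; Linux and macOS use -c.
    flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def interfaces_command() -> list[str]:
    # "ifconfig -a" on Linux/Unix; "ipconfig /all" is the closest Windows equivalent.
    if platform.system() == "Windows":
        return ["ipconfig", "/all"]
    return ["ifconfig", "-a"]


def run_quietly(cmd: list[str], run: Callable) -> Tuple[Optional[int], str]:
    """Run cmd and return (exit code, combined output); exit code is None if cmd is missing."""
    try:
        proc = run(cmd, capture_output=True, text=True)
        return proc.returncode, proc.stdout + proc.stderr
    except FileNotFoundError:
        return None, f"command not found: {cmd[0]}\n"


def collect_network_report(remote_hosts: Iterable[str], run: Callable = subprocess.run) -> str:
    """Ping each remote host, then dump the local interfaces, as one text report."""
    sections = [f"=== report from {socket.gethostname()} ==="]
    for host in remote_hosts:
        code, output = run_quietly(ping_command(host), run)
        # Exit code 0 is treated as success, but Windows ping can return 0 for some
        # "unreachable" replies, so always read the captured output as well.
        if code == 0:
            status = "reachable"
        elif code is None:
            status = "not run"
        else:
            status = f"FAILED (exit {code})"
        sections.append(f"--- ping {host}: {status} ---\n{output}")
    _, output = run_quietly(interfaces_command(), run)
    sections.append(f"--- interfaces ---\n{output}")
    return "\n".join(sections)
```

For example, print(collect_network_report(["node01", "node02"])) on each machine (host names hypothetical) produces a report that can be saved next to the LSF and batch logs.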