LSF Troubleshooting

The following are general troubleshooting steps:

  1. Ensure the LSF lsrun command is enabled.
  2. Look for user errors.

    For example:

    • Are the executable path and project path correct and complete?
    • Are there sufficient resources (CPU/Memory/Disk) allocated to the job?
    • Is the project available on the execution host?
    • Does the job submitter have read/write permissions on the project directory and read/execute permissions on the installation directory?
    • Is the project locked?
  3. Determine whether this is a standalone product issue.
    • Run Electronics Desktop on the machine outside of the scheduler and confirm that it opens and completes an analysis.
  4. Examine outputs and logs.
    • Output of the LSF batch job. Obtain this using the LSF command "bacct -l <jobid>".
    • Batch log (typically <projectname>.log), located in the project directory.
  5. Enable additional debug logs using the steps below.

    In the job submission window, set the following environment variables:

    • ANSOFT_DEBUG_MODE = 1
    • ANSOFT_DEBUG_LOG = <path to directory accessible by all machines in the cluster>
    • ANSOFT_DEBUG_LOG_SEPARATE = 1
    • ANSOFT_LSF_LOG = <path to a specific .log file in the directory set under ANSOFT_DEBUG_LOG>
  6. For each pair of machines between which remote analysis fails, run "ping <remote-machine>" from one machine to the other and note the output.
  7. For each machine in the network, dump the network interfaces (for example, run "ifconfig -a") and note the output.
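
Some of the checks above can be scripted. The following Python code is a minimal sketch, not part of the product: it covers the permission and lock checks from step 2 and collects the ping and ifconfig output from steps 6 and 7. The function names are hypothetical, the "<project file>.lock" naming for a locked project is an assumption, and "ping -c 4" assumes Linux ping syntax.

```python
import os
import subprocess
from pathlib import Path


def check_access(project_file, install_dir):
    """Return a list of problems found for the step 2 checks (empty if none)."""
    problems = []
    project = Path(project_file)

    if not project.is_file():
        problems.append(f"project not found: {project}")
    # The job submitter needs read/write access to the project directory.
    if not os.access(project.parent, os.R_OK | os.W_OK):
        problems.append(f"no read/write access: {project.parent}")
    # The job submitter needs read/execute access to the installation directory.
    if not os.access(install_dir, os.R_OK | os.X_OK):
        problems.append(f"no read/execute access: {install_dir}")
    # ASSUMPTION: a locked project is marked by a "<project file>.lock" file.
    lock_file = Path(str(project) + ".lock")
    if lock_file.exists():
        problems.append(f"project appears locked: {lock_file}")
    return problems


def run_and_capture(command):
    """Run a command and return its combined stdout and stderr as text."""
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError:
        return f"command not found: {command[0]}"
    return result.stdout + result.stderr


def collect_network_info(remote_hosts):
    """Gather the ping (step 6) and ifconfig (step 7) output for this machine."""
    report = {}
    for host in remote_hosts:
        # "-c 4" sends four packets and then exits (Linux syntax, an assumption).
        report[f"ping {host}"] = run_and_capture(["ping", "-c", "4", host])
    report["ifconfig -a"] = run_and_capture(["ifconfig", "-a"])
    return report
```

Run the script as the job submitter on the execution host, because the permission checks apply to whichever account runs them. For example, call check_access() with the project file and the installation directory, and call collect_network_info() with the names of the machines that fail remote analysis, then save the returned text alongside the other logs.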