Large Scale DSO Job Monitoring

Large Scale DSO avoids detailed intra-variation monitoring because it increases network traffic for large-scale jobs. Large Scale DSO jobs are instead monitored through the log files described below.

The following logs are available:

There is one 'desktopjob.log' file per node assigned to the job. This log contains information about the node, such as its name, its local storage folder, and the number of engines started on it. It is located in <workdir>/<jobid>/r<nodeIndex>, where the node index starts at 0. For example, <workdir>/<jobid>/r0 has the desktopjob.log corresponding to the engines running on the first node of the job, while <workdir>/<jobid>/r2 has the log corresponding to the engines running on the third node.

There is also one 'desktopjob.log' file per distributed engine. It is located in <workdir>/<jobid>/r<nodeIndex>/r<taskIndex>, where the task index also starts at 0. For example, <workdir>/<jobid>/r0/r0 has the log corresponding to the first engine running on the first node, while <workdir>/<jobid>/r1/r2 has the log corresponding to the third engine running on the second node. Information unique to the engine (such as its local storage) is logged here.

This log file is located in the <workdir>/<jobid>/r<nodeIndex>/r<taskIndex> folder and corresponds to Desktop's local-machine parametric 'batchsolve'. It is available only at the end of the analysis and lists the variations solved by this engine, along with any info, warning, or error messages.

This is the top-level log. It records job distribution information, such as hierarchical activation and the list of nodes assigned to this job.
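
The folder layout described above for the per-node and per-engine 'desktopjob.log' files can be sketched in a short script. The following is a minimal illustration in Python, not part of the product: the function names (node_log_path, engine_log_path, find_desktopjob_logs) are hypothetical, and the script assumes only the r<nodeIndex> and r<taskIndex> naming with zero-based indices stated above.

```python
from pathlib import Path
import re

LOG_NAME = "desktopjob.log"
# Matches folder names of the form r0, r1, r2, ... (layout taken from the text above).
_R_DIR = re.compile(r"^r(\d+)$")


def node_log_path(workdir, jobid, node_index):
    """Hypothetical helper: path of the per-node log, <workdir>/<jobid>/r<nodeIndex>."""
    return Path(workdir) / jobid / f"r{node_index}" / LOG_NAME


def engine_log_path(workdir, jobid, node_index, task_index):
    """Hypothetical helper: path of the per-engine log,
    <workdir>/<jobid>/r<nodeIndex>/r<taskIndex>."""
    return Path(workdir) / jobid / f"r{node_index}" / f"r{task_index}" / LOG_NAME


def find_desktopjob_logs(workdir, jobid):
    """Scan a job folder and return (node_logs, engine_logs).

    node_logs maps nodeIndex -> Path; engine_logs maps
    (nodeIndex, taskIndex) -> Path. Only logs that exist on disk are listed.
    """
    job_dir = Path(workdir) / jobid
    node_logs, engine_logs = {}, {}
    if not job_dir.is_dir():
        return node_logs, engine_logs
    for node_dir in sorted(job_dir.iterdir()):
        node_match = _R_DIR.match(node_dir.name)
        if not (node_match and node_dir.is_dir()):
            continue
        node_index = int(node_match.group(1))
        if (node_dir / LOG_NAME).is_file():
            node_logs[node_index] = node_dir / LOG_NAME
        for task_dir in sorted(node_dir.iterdir()):
            task_match = _R_DIR.match(task_dir.name)
            if not (task_match and task_dir.is_dir()):
                continue
            if (task_dir / LOG_NAME).is_file():
                key = (node_index, int(task_match.group(1)))
                engine_logs[key] = task_dir / LOG_NAME
    return node_logs, engine_logs
```

With this sketch, engine_log_path(workdir, jobid, 1, 2) points at the log of the third engine on the second node, matching the <workdir>/<jobid>/r1/r2 example above. Because some logs are written only at the end of the analysis, a scan during a running job may return fewer entries than the job will eventually produce.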

For a complete discussion of methods for aborting jobs or specific tasks, see Aborting a Large Scale DSO Simulation under Large Scale DSO for Parametric Analysis.

Related Topics 

Large Scale DSO for Parametric Analysis