Autoscaling with Traditional Job Schedulers

The Autoscaling Service can be configured to work with Slurm, LSF, PBS, and UGE.

As described in the deployment topics, the autoscaler can be configured in two different ways. It can:

  • Run all of the job requests as the service user (that is, whichever user started the service) on the machine where the autoscaler is deployed.

  • Run the job requests through the Process Launcher, impersonating the user who created each project.

In either case, the scaling strategy is the same. Options in the Ansys HPC Job Manager web app let you limit the compute resources that the autoscaler will use, such as the total number of jobs (instances), cores, and so on.
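
As a rough illustration, the deployment mode and resource limits can be pictured as a small configuration structure. This is a sketch only; the field names below are assumptions, not the actual Ansys HPC Platform Services settings, which are made in the Ansys HPC Job Manager web app.

    # Illustrative sketch only -- field names are hypothetical, not the real
    # Ansys HPC Platform Services configuration schema.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class AutoscalerConfig:
        # How job requests are submitted to the scheduler:
        #   "service_user"     -> all requests run as the user that started the service
        #   "process_launcher" -> requests are run impersonating each project's user
        submission_mode: str = "service_user"
        # Compute resource limits; None means "no limit is applied".
        max_jobs: Optional[int] = None    # total job requests (instances)
        max_cores: Optional[int] = None   # total cores across all requests

    config = AutoscalerConfig(submission_mode="process_launcher",
                              max_jobs=20, max_cores=640)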

Autoscaling Strategy for Third-Party Clusters

The autoscaler keeps track of all requests made to the job scheduler through the Ansys HPC Platform Services system and can cap those requests against one or more of the compute resource limits.

If no limits are specified, every currently pending task in every project simply creates its own job request, which is submitted to the scheduler.
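
To make the strategy concrete, the following minimal Python sketch (hypothetical names, not the real implementation) shows the decision the autoscaler effectively makes on each pass: requests that are running or queued count toward the limit, and with no limit every pending task gets its own request.

    # Minimal sketch of the scaling decision; not the actual implementation.
    from typing import Optional

    def new_requests_to_submit(pending_tasks: int,
                               active_requests: int,
                               max_jobs: Optional[int]) -> int:
        """Return how many new scheduler requests to submit on this pass.

        pending_tasks   -- pending tasks that do not yet have a job request
        active_requests -- requests currently running or queued in the scheduler
                           (failed and finished requests are not counted)
        max_jobs        -- total-jobs limit, or None for "no limit"
        """
        if max_jobs is None:
            # No limits configured: every pending task gets its own request.
            return pending_tasks
        headroom = max(max_jobs - active_requests, 0)
        return min(pending_tasks, headroom)

    # With no limit, 100 pending tasks -> 100 new requests.
    print(new_requests_to_submit(100, 0, None))   # 100
    # With a limit of 20 jobs and 12 already running/queued -> 8 new requests.
    print(new_requests_to_submit(100, 12, 20))    # 8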

The autoscaler configuration contains default values for the number of cores, exclusive jobs, queue, and distribution properties. These defaults are used whenever a value is not supplied for the Job Definition through an application's interface, reducing the number of selections that must be made for every job. However, these and other properties are exposed in the Job Definition, so the defaults can be overridden if desired.
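
The effect of the defaults can be pictured as a simple merge, sketched below with hypothetical property names: a value supplied in the Job Definition wins; otherwise the autoscaler's default is used.

    # Hypothetical property names; the real ones are exposed in the Job Definition.
    AUTOSCALER_DEFAULTS = {
        "num_cores": 4,
        "exclusive": False,     # whether jobs request exclusive nodes
        "queue": "standard",
        "distributed": True,    # distribution across nodes
    }

    def effective_job_properties(job_definition: dict) -> dict:
        """Defaults apply only where the Job Definition supplies no value."""
        merged = dict(AUTOSCALER_DEFAULTS)
        merged.update({k: v for k, v in job_definition.items() if v is not None})
        return merged

    # A Job Definition that only overrides the core count and queue:
    print(effective_job_properties({"num_cores": 16, "queue": "gpu"}))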

Details of the Autoscaling Behavior

The following are useful details about the architecture and its implications:

  • Due to the "pull" nature of the architecture, Workers by default will run as many tasks in sequence that they have permission to run before exiting. This means that many tasks can be run during one job scheduler request.

  • By default, all autoscaled evaluators are scope-limited to the Job Definition, so they will NOT run tasks from different Projects or Job Definitions that may have different users, permissions, or requirements.


    Note:  With the default configuration, it is not guaranteed that each job request maps one-to-one to a single task.


  • By default, the autoscaler will request jobs up to the maximum allowed by its configuration or the total number of tasks, whichever is lower.

  • Autoscaling occurs based on the state of the requests in the scheduler. Jobs marked as running or queued are counted as part of the total number of requests; jobs that have failed or finished are not.

  • Restarting the autoscaler causes it to lose track of the current requests, and it may end up requesting additional workers.

  • Due to the "pull" nature of the architecture, it is possible that some job requests will be removed as the project completes tasks.

    To understand this better, consider a simple scenario where 100 design point variations are run from the same Job Definition on a cluster with five nodes and no compute resource limits applied.

    One hundred scheduler requests will be made, and perhaps only five will be scheduled to run concurrently due to the model's resource requirements.

    After the first five models are done, the workers do not exit; they immediately move on to run the next five models. This continues until all the tasks are complete.

    At regular time intervals, the autoscaler checks the number of queued requests and automatically kills the excess if there are more queued requests than remaining tasks. Because this adds unnecessary load to the scheduling system, it is advisable to limit the compute resources to however many you want Ansys HPC Platform Services to use at one time. The same consideration applies when the scheduler is shared with other workloads: parts of the cluster may simply be busy, which also causes requests to queue.
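
    A toy simulation of this scenario (an illustration only, not the actual autoscaler or scheduler code) shows how the pull behavior plays out: only five of the one hundred requests ever start workers, those workers run all of the tasks, and the remaining queued requests are gradually removed.

        # Toy simulation of the scenario above (not the real autoscaler logic):
        # 100 tasks, 100 job requests, only 5 can run at once.
        tasks_remaining = 100
        queued_requests = 100          # one request per task, no limits applied
        running_workers = 0
        culled = 0

        while tasks_remaining > 0:
            # The scheduler starts queued requests as nodes become free (5 nodes).
            while running_workers < 5 and queued_requests > 0:
                queued_requests -= 1
                running_workers += 1
            # Each running worker pulls and completes a task, then immediately
            # pulls the next one instead of exiting.
            tasks_remaining -= min(running_workers, tasks_remaining)
            # Periodic check: queued requests beyond the remaining work are killed.
            if queued_requests > tasks_remaining:
                culled += queued_requests - tasks_remaining
                queued_requests = tasks_remaining

        print(f"requests culled without ever running: {culled}")   # 95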