Scalability should always be measured using wall clock (elapsed) time, not CPU time. A reported CPU time may be the total accumulated across all of the involved processors (or cores), or it may be the CPU time of just one of them. CPU time may also exclude time spent waiting for data to pass through the interconnect or for I/O requests to complete. Elapsed time therefore provides the best measure of what the user actually experiences while waiting for the program to complete the analysis.
Several elapsed time values are reported near the end of every Mechanical APDL output file. An example of this output is shown below.
Elapsed time spent pre-processing model (/PREP7)  :     3.7 seconds
Elapsed time spent solution - preprocessing       :     4.7 seconds
Elapsed time spent computing solution             :   284.1 seconds
Elapsed time spent solution - postprocessing      :     8.8 seconds
Elapsed time spent postprocessing model (/POST1)  :     0.0 seconds
At the very end of the file, these time values are reported:
CP Time      (sec) =    300.890       Time  =  11:10:43
Elapsed Time (sec) =    302.000       Date  =  02/10/2009
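When comparing many runs, it can help to pull these values out of the output file automatically. The following is a minimal sketch in Python, assuming the timing lines look like the example shown here; the regular expressions and the `parse_timings` helper are illustrative assumptions, not part of Mechanical APDL, and the exact spacing or wording may differ between releases.

```python
import re

# Assumed line formats, taken from the sample output in this section.
# Per-phase lines: "Elapsed time spent <label> : <value> seconds"
PHASE_RE = re.compile(r"Elapsed time spent\s+([^:\n]+?)\s*:\s*([0-9.]+)\s*seconds")
# Final totals: "CP Time (sec) = <value>" and "Elapsed Time (sec) = <value>"
TOTAL_RE = re.compile(r"(CP|Elapsed) Time\s+\(sec\)\s*=\s*([0-9.]+)")


def parse_timings(text):
    """Return (phases, totals) dictionaries of times in seconds (hypothetical helper)."""
    phases = {label: float(value) for label, value in PHASE_RE.findall(text)}
    totals = {kind: float(value) for kind, value in TOTAL_RE.findall(text)}
    return phases, totals


# The sample output quoted in this section.
sample = """\
Elapsed time spent pre-processing model (/PREP7) : 3.7 seconds
Elapsed time spent solution - preprocessing : 4.7 seconds
Elapsed time spent computing solution : 284.1 seconds
Elapsed time spent solution - postprocessing : 8.8 seconds
Elapsed time spent postprocessing model (/POST1) : 0.0 seconds
CP Time (sec) = 300.890 Time = 11:10:43
Elapsed Time (sec) = 302.000 Date = 02/10/2009
"""

phases, totals = parse_timings(sample)
print(phases["computing solution"])  # time spent inside the solve
print(totals["Elapsed"])             # overall elapsed time
```

In practice the text would be read from the output file of each run rather than from an inline string.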
When using shared-memory parallel (SMP), all of these elapsed times may be reduced since this form of parallelism is present in the preprocessing, solution, and postprocessing phases (/PREP7, /SOLU, and /POST1). Comparing each individual result when using a single core and multiple cores can give an idea of the parallel efficiency of each part of the simulation, as well as the overall speedup achieved.
When using distributed-memory parallel (DMP), the main values to review are:
The "Elapsed time spent computing solution", which measures the time spent doing parallel work inside the SOLVE command (see Program Architecture for more details).
The "Elapsed Time" reported at the end of the output file.
The "Elapsed time spent computing solution" helps measure the parallel efficiency of the computations that are actually parallelized, while the "Elapsed Time" helps measure the parallel efficiency of the entire analysis of the model. Studying how these values change as the number of cores is increased or decreased gives a good indication of the parallel efficiency of your DMP analysis.
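One way to study these changes is to tabulate speedup and parallel efficiency for each core count. The sketch below assumes the conventional definitions, which this section does not spell out: speedup is the baseline elapsed time divided by the elapsed time on N cores, and efficiency is that speedup divided by the increase in core count. The `scaling_table` helper is hypothetical, and the timings are made-up illustrative values, not measurements.

```python
def scaling_table(elapsed_by_cores):
    """Return (cores, elapsed, speedup, efficiency) rows.

    Speedup and efficiency are computed relative to the run with the
    fewest cores, which need not be a single-core run.
    """
    base_cores = min(elapsed_by_cores)
    base_time = elapsed_by_cores[base_cores]
    rows = []
    for cores in sorted(elapsed_by_cores):
        elapsed = elapsed_by_cores[cores]
        speedup = base_time / elapsed
        efficiency = speedup * base_cores / cores
        rows.append((cores, elapsed, speedup, efficiency))
    return rows


# Hypothetical "Elapsed time spent computing solution" values in seconds,
# keyed by core count. Replace with values taken from your own output files.
solve_times = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0}

rows = scaling_table(solve_times)
for cores, elapsed, speedup, efficiency in rows:
    print(f"{cores:>3} cores  {elapsed:8.1f} s  "
          f"speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

Running the same calculation on both the "Elapsed time spent computing solution" and the final "Elapsed Time" shows how much of the overall scaling is limited by the portions of the analysis outside the solve.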
When using a GPU to accelerate the solution, the same main values listed above should be reviewed. Since the GPU is currently only used to accelerate computations performed during solution, the "Elapsed time spent computing solution" is the only individual time expected to decrease. The overall "Elapsed Time" for the simulation should decrease as a result.