Aqwa employs OpenMP for multi-threaded parallelization on symmetric-multiprocessing (SMP) machines (see Hermanns [16]) in hydrodynamic diffraction analysis (Aqwa-Line) and in time domain dynamic cable and tether analysis.
The actual number of cores used by Aqwa during parallel processing is defined as the smallest of the user-defined number of cores, the total number of cores in the node on which Aqwa is executed, and the total number of available parallel licenses + 4, that is, Min(Num_Cores, Num_Proc, HPC+4) in the notation of Table 14.2. This number is used when executing a hydrodynamic diffraction analysis.
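The rule above can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are hypothetical and are not Aqwa inputs or options. Only the rule itself (the minimum of the three quantities, with 4 added to the license count) comes from the text.

```python
def actual_cores(user_cores: int, node_cores: int, hpc_licenses: int) -> int:
    """Smallest of the user request, the node's core count, and licenses + 4.

    All three names are hypothetical; they stand for the quantities
    described in the paragraph above.
    """
    if min(user_cores, node_cores) < 1 or hpc_licenses < 0:
        raise ValueError("core counts must be >= 1 and license count >= 0")
    return min(user_cores, node_cores, hpc_licenses + 4)


# Hypothetical values: 16 cores requested, 12-core node, 4 parallel licenses.
print(actual_cores(16, 12, 4))  # 8, limited by licenses + 4
```

With no parallel licenses at all, the same rule still allows up to 4 cores, because the license term becomes 0 + 4.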
In hydrodynamic diffraction analysis (Aqwa-Line), the parallel processing is carried out in the following calculations:
First-order hydrodynamic properties
Difference and sum frequency full QTF matrices
Directional coupling QTF matrices
Wave elevation database
The parallel scaling performance of hydrodynamic diffraction (Aqwa-Line) is generally good for medium- to large-scale problems (as measured by the number of diffracting panel elements), and the parallelization will always deliver some degree of speed-up. An example of the parallel calculation efficiency for a single ship model on a workstation with 12 physical cores is listed in Table 14.1: Speed-Up of Hydrodynamic Diffraction Analysis (SN). The model is defined with 22,276 diffraction panels, 20 wave frequencies, and 13 wave directions.
Table 14.1: Speed-Up of Hydrodynamic Diffraction Analysis (SN)
Calculation item | SN (N = 2) | SN (N = 4) | SN (N = 6) | SN (N = 8) | SN (N = 12) | Percentage of elapsed time on a single core (%) |
---|---|---|---|---|---|---|
1st order hydrodynamic | 1.52 | 2.43 | 3.13 | 3.49 | 4.00 | 85.0 |
Difference/sum frequency QTF | 1.62 | 2.10 | 2.33 | 2.48 | 2.54 | 0.7 |
Directional coupling QTF | 1.39 | 1.63 | 1.94 | 2.01 | 2.01 | 0.6 |
Wave elevation | 1.49 | 2.37 | 3.08 | 3.50 | 3.81 | 13.7 |
Total elapsed time of HD analysis | 1.51 | 2.40 | 3.09 | 3.44 | 3.90 | 100.0 |
Note: SN is defined as the ratio of the elapsed time (in seconds) to execute the calculation item on a single core to the elapsed time on N cores.
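As a rough consistency check on Table 14.1, the overall speed-up can be estimated from the per-item speed-ups and the share of single-core time each item takes. The sketch below does this for the 12-core column. It assumes the four items account for all of the elapsed time, which the rounded percentages only approximately satisfy; the function name is hypothetical.

```python
# Values transcribed from Table 14.1: (speed-up on 12 cores, % of single-core time).
items = {
    "1st order hydrodynamic": (4.00, 85.0),
    "Difference/sum frequency QTF": (2.54, 0.7),
    "Directional coupling QTF": (2.01, 0.6),
    "Wave elevation": (3.81, 13.7),
}


def combined_speedup(table):
    """Overall speed-up implied by per-item speed-ups and single-core time shares."""
    # Each item takes (share / speed-up) of the single-core time when run on N cores.
    relative_time = sum(pct / 100.0 / s for s, pct in table.values())
    return 1.0 / relative_time


total_12 = combined_speedup(items)
print(f"estimated total speed-up on 12 cores: {total_12:.2f}")
```

The estimate comes out at about 3.93, close to the tabulated 3.90. The total is dominated by the first-order hydrodynamic calculation, which takes 85% of the single-core time.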
In the hydrodynamic response analysis (Aqwa-Librium, Aqwa-Drift, Aqwa-Fer and Aqwa-Naut), the parallel processing is carried out in the following calculations:
Element pressure estimation in time domain (Aqwa-Naut)
Irregular wave database at each time step (Aqwa-Naut)
Database of static composite moorings on sloped seabed
Dynamic cable in time domain (Aqwa-Drift and Aqwa-Naut)
Tether in time domain (Aqwa-Drift and Aqwa-Naut)
To ensure efficiency while minimizing total memory use in time domain dynamic cable and/or tether analysis, the fewest number of cores required for the dynamic cable and tether calculation is determined from the total number of dynamic cables and tethers in each mooring configuration and the actual number of cores available to Aqwa.
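In each time step, the number of OpenMP parallel loops required for the dynamic cable calculation is:
$$N_{loop} = \left\lceil \frac{N_{cable}}{N_c} \right\rceil \qquad \text{(14–22)}$$
where $N_{cable}$ is the number of dynamic cables in a mooring configuration and $N_c$ is the number of cores used for parallel dynamic cable calculation.
Aqwa assigns memory blocks to copy the thread-private variables and common blocks for each thread.
If the actual number of cores for Aqwa, $N_A$, were used for parallel dynamic cable calculation, the number of OpenMP parallel loops for a set of moorings would be:
$$N_{loop}^{A} = \left\lceil \frac{N_{cable}}{N_A} \right\rceil \qquad \text{(14–23)}$$
The fewest number of required cores, $N_c$, is determined by:
$$N_c = \min \{\, n : 1 \le n \le N_A \,\} \qquad \text{(14–24)}$$
under the condition
$$\left\lceil \frac{N_{cable}}{n} \right\rceil = N_{loop}^{A} \qquad \text{(14–25)}$$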
From Equation 14–25 it can be observed that employing the fewest number of cores requires the smallest number of memory blocks for parallel dynamic cable calculation while keeping the number of OpenMP parallel loops the same. In other words, using the fewest number of cores ensures the same simulation time as using the maximum available cores, while minimizing memory usage.
The same approach is employed to determine the fewest number of cores required for the parallel processing calculation of tethers in a time domain analysis.
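The selection described above can be sketched as follows. This is an illustration under one stated assumption: that one parallel loop processes at most one cable (or tether) per core, so the loop count is the ceiling of the number of lines divided by the number of cores. The function names are hypothetical and do not correspond to Aqwa routines.

```python
def loops_needed(num_lines: int, cores: int) -> int:
    """Parallel loops per time step, assuming each loop handles up to `cores` lines."""
    return -(-num_lines // cores)  # ceiling division on integers


def fewest_cores(num_lines: int, available_cores: int) -> int:
    """Smallest core count that needs the same number of loops as all available cores."""
    target = loops_needed(num_lines, available_cores)
    for n in range(1, available_cores + 1):
        if loops_needed(num_lines, n) == target:
            return n
    return available_cores  # not reached: n == available_cores always matches


print(fewest_cores(16, 12))  # 8: two loops are needed with 8 cores or with 12
print(fewest_cores(16, 16))  # 16: one loop needs a core for every line
print(fewest_cores(5, 12))   # 5: one loop, one core per line
```

Under this assumption, 16 tethers on 12 available cores need two loops whether 8 or 12 cores are used, so 8 cores give the same loop count with fewer per-thread memory blocks.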
The actual number of cores used in the Aqwa parallel calculation is listed in Table 14.2: Summary of Cores Used in Parallel Calculation.
Table 14.2: Summary of Cores Used in Parallel Calculation
Module | Number of Cores Used |
---|---|
Aqwa-Line (Hydrodynamic Diffraction) | Min(Num_Cores, Num_Proc, HPC+4) |
Aqwa-Librium (Hydrodynamic Response: Stability) with composite mooring on sloped seabed | Min(Num_Cores, Num_Proc, HPC+4) |
Aqwa-Fer (Hydrodynamic Response: Frequency Statistical Analysis) with composite mooring on sloped seabed | Min(Num_Cores, Num_Proc, HPC+4) |
Aqwa-Naut (Hydrodynamic Response: Regular Wave) (1) with time domain element pressure output | Min(Num_Cores, Num_Proc, HPC+4) |
Aqwa-Naut (Hydrodynamic Response: Regular Wave) (2) with composite mooring on sloped seabed | Min(Num_Cores, Num_Proc, HPC+4) |
Aqwa-Naut (Hydrodynamic Response: Regular Wave) (3) with dynamic cable and/or tether | Nc |
Aqwa-Naut (Hydrodynamic Response: Irregular Wave) | Min(Num_Cores, Num_Proc, HPC+4) |
Aqwa-Drift (Hydrodynamic Response: Slow Drift Only, Irregular Wave with Slow Drift) (1) with composite mooring on sloped seabed | Min(Num_Cores, Num_Proc, HPC+4) |
Aqwa-Drift (Hydrodynamic Response: Slow Drift Only, Irregular Wave with Slow Drift) (2) with dynamic cable and/or tether | Nc |
Note:
Num_Cores: Number of cores requested by the user
Num_Proc: Total number of cores in the node
HPC: Number of available HPC licenses
Nc: The fewest number of cores defined by Equation 14–24 for dynamic cables or tethers
The speed-up values of Aqwa-Drift time domain analyses of a tension-leg platform model with 16 tethers are listed in Table 14.3: Speed-Up of Hydrodynamic Time Domain Analysis with Tethers (SN). In this model, each tether is modeled by 200 elements. The 3-hour simulation, with a time step of 0.2 seconds, is carried out on a workstation with 12 physical cores (24 logical processors). The speed-up on 16 cores (3.85, below the 4.07 obtained on 8 cores) shows that using more cores than the number of physical cores may not achieve higher efficiency.
Table 14.3: Speed-Up of Hydrodynamic Time Domain Analysis with Tethers (SN)
Number of Cores (N) | 2 | 4 | 6 | 8 | 16 |
Speed-up | 1.77 | 2.86 | 3.41 | 4.07 | 3.85 |
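To make the trend in Table 14.3 easier to read, the sketch below converts each speed-up into a parallel efficiency, taken here as the speed-up divided by the number of cores. This metric is a common convention and is not defined in this section; the variable names are illustrative only.

```python
# Speed-up values transcribed from Table 14.3, keyed by number of cores N.
speedup = {2: 1.77, 4: 2.86, 6: 3.41, 8: 4.07, 16: 3.85}

# Parallel efficiency: fraction of the ideal N-fold speed-up actually achieved.
efficiency = {n: s / n for n, s in speedup.items()}

for n in sorted(speedup):
    print(f"N = {n:2d}: speed-up {speedup[n]:.2f}, efficiency {efficiency[n]:.2f}")

# Core count with the highest speed-up in the table.
best = max(speedup, key=speedup.get)
print(f"highest speed-up at N = {best}")
```

Efficiency falls steadily as cores are added, and the 16-core run, which exceeds the 12 physical cores, is slower than the 8-core run.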