14.5. Aqwa Parallel Processing Calculation

Aqwa employs OpenMP for multi-threaded parallelization on symmetric-multiprocessing (SMP) machines (see Hermanns [16]) in hydrodynamic diffraction analysis (Aqwa-Line) and in time domain dynamic cable and tether analysis.

The actual number of cores used by Aqwa during parallel processing is defined as the smallest of three values: the user-defined number of cores, the total number of cores in the node on which Aqwa is executed, and the total number of available parallel licenses plus 4. This number is used when executing a hydrodynamic diffraction analysis.
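The rule above can be sketched in a few lines of Python. This is only an illustration of the stated minimum rule: the function name and its parameter names are hypothetical and do not correspond to any Aqwa input or API.

```python
def actual_cores(requested: int, node_cores: int, parallel_licenses: int) -> int:
    """Return the core count implied by the minimum rule described in the text.

    All names here are hypothetical; they are not Aqwa inputs.
    """
    if requested < 1 or node_cores < 1 or parallel_licenses < 0:
        raise ValueError("core counts must be >= 1 and licenses must be >= 0")
    # Smallest of: user request, cores in the node, available licenses + 4.
    return min(requested, node_cores, parallel_licenses + 4)


if __name__ == "__main__":
    # 16 cores requested on a 12-core node with 2 licenses: the licenses limit the run.
    print(actual_cores(16, 12, 2))  # 6
```

With 8 or more licenses on the same 12-core node, the node size becomes the limit instead.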

In hydrodynamic diffraction analysis (Aqwa-Line), the parallel processing is carried out in the following calculations:

First-order hydrodynamic properties
Difference and sum frequency full QTF matrices
Directional coupling QTF matrices
Wave elevation database

The parallel scaling performance of hydrodynamic diffraction (Aqwa-Line) is generally good for medium- to large-scale problems, as measured by the number of diffracting panel elements, and the parallelization will always deliver some degree of speed-up. An example of the parallel calculation efficiency on a 12-physical-core workstation for a single ship model is listed in Table 14.1: Speed-Up of Hydrodynamic Diffraction Analysis (S_N). The model has 22,276 diffraction panels, 20 wave frequencies, and 13 wave directions.

Table 14.1: Speed-Up of Hydrodynamic Diffraction Analysis (S_N)

Calculation item	N = 2	N = 4	N = 6	N = 8	N = 12	Percentage of elapsed time on a single core (%)
First-order hydrodynamic properties	1.52	2.43	3.13	3.49	4.00	85.0
Difference/sum frequency QTF	1.62	2.10	2.33	2.48	2.54	0.7
Directional coupling QTF	1.39	1.63	1.94	2.01	2.01	0.6
Wave elevation	1.49	2.37	3.08	3.50	3.81	13.7
Total elapsed time of HD analysis	1.51	2.40	3.09	3.44	3.90	100.0

Note: S_N is defined as the ratio of the elapsed time to execute the calculation item on a single core to the elapsed time on N cores (both in seconds).
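As a rough consistency check on Table 14.1, the total speed-up can be estimated from the per-item speed-ups weighted by each item's share of the single-core elapsed time. The sketch below assumes the four listed items account for the whole run, which the percentages (summing to 100.0) suggest but the text does not state explicitly. The variable and function names are illustrative only.

```python
# Core counts and values copied from Table 14.1.
CORES = (2, 4, 6, 8, 12)

# item -> (speed-ups S_N for each entry in CORES, % of single-core elapsed time)
TABLE = {
    "First-order hydrodynamic": ((1.52, 2.43, 3.13, 3.49, 4.00), 85.0),
    "Difference/sum frequency QTF": ((1.62, 2.10, 2.33, 2.48, 2.54), 0.7),
    "Directional coupling QTF": ((1.39, 1.63, 1.94, 2.01, 2.01), 0.6),
    "Wave elevation": ((1.49, 2.37, 3.08, 3.50, 3.81), 13.7),
}

# "Total elapsed time of HD analysis" row, as reported in the table.
REPORTED_TOTAL = (1.51, 2.40, 3.09, 3.44, 3.90)


def combined_speedup(core_index: int) -> float:
    """Estimate total S_N from per-item speed-ups and single-core time shares."""
    time_single = sum(share for _, share in TABLE.values())
    # Each item's time on N cores is its single-core share divided by its S_N.
    time_parallel = sum(share / speedups[core_index] for speedups, share in TABLE.values())
    return time_single / time_parallel


if __name__ == "__main__":
    for i, n in enumerate(CORES):
        print(f"N={n:2d}  estimated={combined_speedup(i):.2f}  reported={REPORTED_TOTAL[i]:.2f}")
```

The estimates differ from the reported totals by less than 0.05 at every core count, which is consistent with rounding in the tabulated values. The first-order calculation dominates the total because it takes 85% of the single-core time.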

In the hydrodynamic response analysis (Aqwa-Librium, Aqwa-Drift, Aqwa-Fer and Aqwa-Naut), the parallel processing is carried out in the following calculations:

Element pressure estimation in time domain (Aqwa-Naut)
Irregular wave database at each time step (Aqwa-Naut)
Database of static composite moorings on sloped seabed
Dynamic cable in time domain (Aqwa-Drift and Aqwa-Naut)
Tether in time domain (Aqwa-Drift and Aqwa-Naut)

To ensure efficiency while minimizing total memory use in time domain dynamic cable and/or tether analysis, the fewest number of cores required for the dynamic cable and tether calculations is determined from the total number of dynamic cables and tethers in each mooring configuration and the actual number of cores available to Aqwa.

In each time step, the number of OpenMP parallel loops required for dynamic cable calculation is:

(14–22)

where the two quantities involved are the number of dynamic cables in a mooring configuration and the number of cores used for parallel dynamic cable calculation.

Aqwa assigns memory blocks to copy the thread-private variables and common blocks for each thread.

If the actual number of cores available to Aqwa were used for parallel dynamic cable calculation, the number of OpenMP parallel loops for a set of moorings would be:

(14–23)

The fewest number of required cores, N_c, is determined by:

(14–24)

under the condition

(14–25)

From Equation 14–25 it can be observed that employing the fewest number of cores requires the smallest number of memory blocks for parallel dynamic cable calculation while keeping the number of OpenMP parallel loops the same. In other words, using the fewest number of cores ensures the same simulation time as using the maximum available cores, while minimizing memory usage.

The same approach is employed to determine the fewest number of cores required for the parallel processing calculation of tethers in a time domain analysis.
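The selection of the fewest number of cores can be illustrated with a short sketch. Equations 14–22 to 14–25 are not reproduced here, so the code rests on an assumption: each core processes one cable (or tether) per parallel loop, so the loop count is the number of cables divided by the number of cores, rounded up. The function names are hypothetical and this is not Aqwa's implementation.

```python
def parallel_loops(n_items: int, n_cores: int) -> int:
    """Assumed loop count: ceil(n_items / n_cores), using integer arithmetic."""
    if n_items < 1 or n_cores < 1:
        raise ValueError("n_items and n_cores must be >= 1")
    return (n_items + n_cores - 1) // n_cores


def fewest_cores(n_items: int, n_available: int) -> int:
    """Smallest core count that needs no more loops than using all available cores."""
    target = parallel_loops(n_items, n_available)
    for n_cores in range(1, n_available + 1):
        if parallel_loops(n_items, n_cores) == target:
            return n_cores
    return n_available  # not reached: n_available itself always matches target


if __name__ == "__main__":
    # 16 tethers with 12 cores available: 2 loops either way, so 8 cores suffice.
    print(fewest_cores(16, 12))  # 8
    # 4 dynamic cables with 12 cores available: 1 loop, 4 cores suffice.
    print(fewest_cores(4, 12))   # 4
```

Under this assumption, using 8 cores instead of 12 for 16 tethers keeps the loop count at 2 while allocating four fewer sets of thread-private memory blocks.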

The actual number of cores used in the Aqwa parallel calculation is listed in Table 14.2: Summary of Cores Used in Parallel Calculation.

Table 14.2: Summary of Cores Used in Parallel Calculation

Module	Number of Cores Used
Aqwa-Line (Hydrodynamic Diffraction)	Min(Num_Cores, Num_Proc, HPC+4)
Aqwa-Librium (Hydrodynamic Response: Stability) with composite mooring on sloped seabed	Min(Num_Cores, Num_Proc, HPC+4)
Aqwa-Fer (Hydrodynamic Response: Frequency Statistical Analysis) with composite mooring on sloped seabed	Min(Num_Cores, Num_Proc, HPC+4)
Aqwa-Naut (Hydrodynamic Response: Regular Wave) (1) with time domain element pressure output	Min(Num_Cores, Num_Proc, HPC+4)
Aqwa-Naut (Hydrodynamic Response: Regular Wave) (2) with composite mooring on sloped seabed	Min(Num_Cores, Num_Proc, HPC+4)
Aqwa-Naut (Hydrodynamic Response: Regular Wave) (3) with dynamic cable and/or tether	N_c
Aqwa-Naut (Hydrodynamic Response: Irregular Wave)	Min(Num_Cores, Num_Proc, HPC+4)
Aqwa-Drift (Hydrodynamic Response: Slow Drift Only, Irregular Wave with Slow Drift) (1) with composite mooring on sloped seabed	Min(Num_Cores, Num_Proc, HPC+4)
Aqwa-Drift (Hydrodynamic Response: Slow Drift Only, Irregular Wave with Slow Drift) (2) with dynamic cable and/or tether	N_c

Note:

Num_Cores: Number of cores requested by the user

Num_Proc: Total number of cores in the node

HPC: Number of available HPC licenses

N_c: The fewest number of cores defined by Equation 14–24 for dynamic cables or tethers

The speed-up values of the Aqwa-Drift time domain analyses of a tension-leg platform model with 16 tethers are listed in Table 14.3: Speed-Up of Hydrodynamic Time Domain Analysis with Tethers (S_N). In this model, each tether is modeled by 200 elements. The 3-hour simulation, with a time step of 0.2 seconds, is carried out on a workstation with 12 physical cores (24 logical processors). The speed-up at 16 cores (3.85, compared with 4.07 at 8 cores) shows that using more cores than the number of physical cores may not achieve higher efficiency.

Table 14.3: Speed-Up of Hydrodynamic Time Domain Analysis with Tethers (S_N)

Number of Cores (N)	2	4	6	8	16
Speed-up	1.77	2.86	3.41	4.07	3.85
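To make the trend in Table 14.3 easier to read, the sketch below converts each speed-up into a parallel efficiency, taken here as S_N / N. That is the usual definition, but it is not one given in this section. The variable names are illustrative only.

```python
# Speed-up values copied from Table 14.3 (16 tethers, 12 physical cores).
SPEEDUP = {2: 1.77, 4: 2.86, 6: 3.41, 8: 4.07, 16: 3.85}

# Parallel efficiency, assumed here to mean speed-up per core (S_N / N).
EFFICIENCY = {n: s / n for n, s in SPEEDUP.items()}

# Core count with the highest measured speed-up.
BEST_N = max(SPEEDUP, key=SPEEDUP.get)

if __name__ == "__main__":
    for n in sorted(SPEEDUP):
        print(f"N={n:2d}  S_N={SPEEDUP[n]:.2f}  efficiency={EFFICIENCY[n]:.3f}")
    print("highest speed-up at N =", BEST_N)
```

The highest speed-up occurs at 8 cores, and the per-core efficiency falls steadily as cores are added, dropping to roughly a quarter at 16 cores, where the run exceeds the 12 physical cores of the workstation.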