1.2. Parallel Processing Terminology

It is important to fully understand the terms we use, both those relating to our software and those relating to the physical hardware. The terms shared-memory parallel (SMP) processing and distributed-memory parallel (DMP) processing refer to our software offerings, which run on shared-memory or distributed-memory hardware configurations. The term GPU accelerator capability refers to our software offering that allows the program to take advantage of certain GPU (graphics processing unit) hardware to accelerate solver computations.

1.2.1. Hardware Terminology

The following terms describe the hardware configurations used for parallel processing:

Shared-memory hardware

This term refers to a physical hardware configuration in which a single shared-memory address space is accessible by multiple CPU cores; each CPU core "shares" the memory with the other cores. A common example of a shared-memory system is any single machine, such as a Windows desktop machine, a laptop, or a workstation with one or two multicore processors.
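As a rough illustration of a single shared address space, the Python sketch below starts four threads in one process; all of them write into the same list object without copying it or sending it anywhere. The names `shared_results` and `work` are hypothetical. Note that CPython's global interpreter lock keeps pure-Python threads from executing bytecode truly in parallel, so this shows the shared-memory model, not a speedup.

```python
import threading

# One list living in one process's address space; every thread can see it.
shared_results = [0] * 4


def work(core_id: int) -> None:
    # Each thread writes only its own slot, so no lock is required here.
    shared_results[core_id] = core_id * core_id


threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished writing

print(shared_results)  # [0, 1, 4, 9]
```

If several threads had to update the same slot, access would need to be serialized, for example with a `threading.Lock`.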

Distributed-memory hardware

This term refers to a physical hardware configuration in which multiple machines are connected together on a network (that is, a cluster). Each machine on the network (that is, each compute node on the cluster) has its own memory address space. Communication between machines is handled by interconnects (Gigabit Ethernet, InfiniBand, etc.).

Virtually all clusters involve both shared-memory and distributed-memory hardware. Each compute node on the cluster typically contains two or more CPU cores, which means there is a shared-memory environment within each compute node. The distributed-memory environment requires communication between the compute nodes in the cluster.
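The split described above (shared memory inside a node, an interconnect between nodes) can be modeled with a small data structure. This is a minimal sketch, assuming a cluster is described as a mapping from hypothetical node names to core counts; `Core`, `build_cluster`, and `communication_path` are illustrative names, not part of any Ansys tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    node: str   # name of the compute node (machine) hosting this core
    index: int  # core number within that node


def build_cluster(cores_per_node: dict[str, int]) -> list[Core]:
    """Enumerate every core in a cluster described as {node name: core count}."""
    return [Core(node, i) for node, count in cores_per_node.items() for i in range(count)]


def communication_path(a: Core, b: Core) -> str:
    # Cores on the same node share that node's memory address space;
    # cores on different nodes have separate memories and must use the interconnect.
    return "shared memory" if a.node == b.node else "interconnect"


cluster = build_cluster({"node1": 2, "node2": 2})  # hypothetical two-node cluster
print(len(cluster))                                # 4
print(communication_path(cluster[0], cluster[1]))  # shared memory
print(communication_path(cluster[0], cluster[2]))  # interconnect
```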

GPU hardware

A graphics processing unit (GPU) is a specialized microprocessor that off-loads and accelerates graphics rendering from the CPU. The highly parallel structure of GPUs makes them more effective than general-purpose CPUs for a range of complex algorithms. In a personal computer, a GPU on a dedicated video card is typically more powerful than a GPU that is integrated on the motherboard.

Processor

Also called the central processing unit (CPU) or microprocessor chip, the processor is the part of a computer that interprets and executes instructions. In the past, processors generally had just one CPU core, so the term unambiguously referred to a chip that is inserted into a socket. With the advent of multicore processors, which are common today, the term "processor" has become ambiguous because it is used both as a synonym for CPU core and to refer to the chip itself, which typically has multiple CPU cores. To avoid confusion, this guide uses the term "core" rather than "processor".

CPU Core

A CPU core (or simply "core") is a distinct physical processing unit within the CPU that executes instructions.

Hyperthreading

An Intel hardware technology by which a CPU presents each of its physical cores to the operating system as multiple "virtual" or "logical" cores, which the operating system treats as if they were physical cores. When hyperthreading is active, the CPU exposes two execution contexts per physical core, so one physical core works like two logical cores that can handle different software threads. To avoid oversubscribing the hardware, virtual (logical) cores are not used.
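Because the operating system reports logical cores, a core count taken from it must be corrected if only physical cores are to be used. Python's `os.cpu_count()` returns the number of logical CPUs (or `None` if it cannot be determined), and the standard library has no portable call for physical cores; the third-party `psutil.cpu_count(logical=False)` is one alternative. The sketch below is a minimal illustration, assuming you already know how many hardware threads each physical core exposes (on Linux, `lscpu` reports this as "Thread(s) per core"). The helper `physical_core_count` is hypothetical.

```python
import os


def physical_core_count(logical_cores: int, threads_per_core: int) -> int:
    """Estimate physical cores from the logical core count reported by the OS.

    With hyperthreading active, each physical core appears as `threads_per_core`
    logical cores, so the reported count overstates physical cores by that factor.
    """
    if logical_cores < 1 or threads_per_core < 1:
        raise ValueError("core and thread counts must be positive")
    return max(1, logical_cores // threads_per_core)


logical = os.cpu_count() or 1  # logical CPUs; cpu_count() may return None
# Assumption: hyperthreading is enabled with 2 hardware threads per physical core.
print(physical_core_count(logical, threads_per_core=2))
```

Sizing a run from the physical count rather than the logical count is what avoids placing two compute-heavy threads on the same physical core.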

Head compute node

In a distributed-memory parallel (DMP) run, the machine or node on which the master process runs (that is, the machine on which the job is launched). The head compute node should not be confused with the host node in a Windows cluster environment. The host node typically schedules multiple applications and jobs on a cluster but usually does not run the application itself.

1.2.2. Software Terminology

The following terms describe our software offerings for parallel processing:

Shared-memory parallel (SMP) processing: This term refers to running across multiple cores on a single machine (for example, a desktop workstation or a single compute node of a cluster). Shared-memory parallelism is invoked, which allows each core involved to share data (or memory) as needed to perform the necessary parallel computations. When run within a shared-memory architecture, most computations in the solution phase and many pre- and postprocessing operations are performed in parallel. For more information, see Using Shared-Memory Parallel (SMP) Processing.
Distributed-memory parallel (DMP) processing: This term refers to running across multiple cores on a single machine (for example, a desktop workstation or a single compute node of a cluster) or across multiple machines (for example, a cluster). Distributed-memory parallelism is invoked, and each process communicates the data needed to perform the necessary parallel computations through the use of MPI (Message Passing Interface) software. With distributed-memory parallel processing, all computations in the solution phase are performed in parallel (including stiffness matrix generation, linear equation solving, and results calculations). Pre- and postprocessing do not use distributed-memory parallel processing, but these steps can exploit shared-memory parallelism. See Using Distributed-Memory Parallel (DMP) Processing for more details.
Hybrid parallel processing: This term refers to running across multiple cores on a single machine (for example, a desktop workstation or a single compute node of a cluster) or across multiple machines (for example, a cluster) using a combination of distributed-memory parallelism and shared-memory parallelism. With hybrid parallel processing, all computations in the solution phase are performed in parallel (including stiffness matrix generation, linear equation solving, and results calculations). Pre- and postprocessing do not use distributed-memory parallel processing, but these steps can use shared-memory parallelism. See Using Hybrid Parallelism for more details.
GPU accelerator capability: This capability takes advantage of the highly parallel architecture of the GPU hardware to accelerate solver computations and, therefore, reduce the time required to complete a simulation. Some computations of certain equation solvers can be off-loaded from the CPU(s) to the GPU, where they are often executed much faster. The CPU core(s) will continue to be used for all other computations in and around the equation solvers. For more information, see GPU Accelerator Capability.
Parallel processing: Parallel processing is a computing method in which two or more CPU cores handle separate parts of an overall task. Breaking up a task among multiple cores helps reduce the time required to run a program.
Processes: Also referred to as MAPDL processes, Ansys processes, MPI processes, or MPI ranks, processes are running instances of the program, each of which executes the instructions for one of the smaller tasks that make up the overall task.
Master process: The first process launched on the head compute node in a distributed-memory parallel processing run.
Worker process: Any process in a distributed-memory parallel processing run other than the master process.
Thread: A software thread is a unit of execution in concurrent programming. Threads are the virtual components that the operating system schedules onto the cores to carry out their tasks; threads within the same process share that process's memory.
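The DMP, master process, and worker process definitions can be illustrated with a toy message-passing sum. This is a simulation, not MPI: Python threads stand in for MPI processes, and a `queue.Queue` stands in for messages sent to the master (rank 0). In a real DMP run, each rank is a separate operating-system process with its own memory, and data moves through MPI calls such as `MPI_Send`/`MPI_Recv` or a reduction. The functions `split`, `worker`, and `dmp_sum` are hypothetical and do not reflect how the solver actually decomposes a model.

```python
import queue
import threading


def split(data: list[float], nprocs: int) -> list[list[float]]:
    """Block partition: rank r receives its own contiguous slice ("private memory")."""
    size, extra = divmod(len(data), nprocs)
    chunks, start = [], 0
    for rank in range(nprocs):
        end = start + size + (1 if rank < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def worker(rank: int, local: list[float], to_master: queue.Queue) -> None:
    # A worker touches only its local chunk and reports back with a message.
    to_master.put((rank, sum(local)))


def dmp_sum(data: list[float], nprocs: int) -> float:
    chunks = split(data, nprocs)
    inbox: queue.Queue = queue.Queue()  # stands in for MPI messages to rank 0
    workers = [
        threading.Thread(target=worker, args=(rank, chunks[rank], inbox))
        for rank in range(1, nprocs)  # ranks 1..nprocs-1 are worker processes
    ]
    for w in workers:
        w.start()
    total = sum(chunks[0])  # the master (rank 0) also computes its own share
    for _ in range(nprocs - 1):
        _rank, partial = inbox.get()  # receive one partial result per worker
        total += partial
    for w in workers:
        w.join()
    return total


print(dmp_sum(list(range(10)), 3))  # 45
```

In this sketch the master computes a share of the work before collecting the workers' partial sums, as ranks commonly do in MPI programs. A hybrid run would additionally let each rank split its own chunk among shared-memory threads.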

Shared-memory parallel (SMP) processing analyses can only be run on shared-memory hardware (a single machine). However, distributed-memory parallel (DMP) analyses can be run on either shared-memory hardware (a single machine) or distributed-memory hardware (a cluster). Although DMP processing can achieve a significant speedup on either form of hardware, only running on distributed-memory hardware allows you to take advantage of increased resources (for example, available memory and disk space, as well as memory and I/O bandwidths) by using multiple machines. The GPU accelerator capability can be used with either SMP or DMP processing.
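The hardware rules above, together with the hybrid parallel definition, can be expressed as a small validation function. This is a minimal sketch under stated assumptions: `RunLayout` and `cores_used` are hypothetical (not an Ansys launcher option or input format), a pure DMP run is assumed to occupy one core per MPI process, each machine in a DMP or hybrid run is assumed to host at least one process, and the GPU accelerator capability is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunLayout:
    """Hypothetical description of a parallel run (not an Ansys input format)."""

    mode: str                     # "SMP", "DMP", or "hybrid"
    machines: int = 1             # 1 = a single machine; more than 1 = a cluster
    processes: int = 1            # MPI processes (DMP and hybrid)
    threads_per_process: int = 1  # shared-memory threads per process (SMP and hybrid)


def cores_used(layout: RunLayout) -> int:
    """Check a layout against the hardware rules and return its total core count."""
    if min(layout.machines, layout.processes, layout.threads_per_process) < 1:
        raise ValueError("machine, process, and thread counts must be positive")
    if layout.mode == "SMP":
        # SMP is limited to shared-memory hardware: one process on one machine.
        if layout.machines != 1 or layout.processes != 1:
            raise ValueError("SMP runs as a single process on a single machine")
        return layout.threads_per_process
    if layout.mode in ("DMP", "hybrid"):
        # DMP and hybrid may use one machine or a cluster.
        # Assumption: every machine in the run hosts at least one MPI process.
        if layout.processes < layout.machines:
            raise ValueError("each machine needs at least one MPI process")
        if layout.mode == "DMP":
            # Assumption: each MPI process occupies one core during the solution.
            return layout.processes
        # Hybrid: every MPI process also runs shared-memory threads.
        return layout.processes * layout.threads_per_process
    raise ValueError(f"unknown mode: {layout.mode!r}")


print(cores_used(RunLayout("SMP", threads_per_process=8)))                               # 8
print(cores_used(RunLayout("DMP", machines=2, processes=16)))                            # 16
print(cores_used(RunLayout("hybrid", machines=2, processes=4, threads_per_process=4)))  # 16
```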