Multiprocessor system architecture

A multiprocessor system is defined as "a system with more than one processor" and, more precisely, "a number of central processing units linked together to enable parallel processing to take place" ^[1]^[2]^[3].

The key objective of using a multiprocessor is to boost the system’s execution speed, with other objectives being fault tolerance and application matching ^[4].

Note:

The term multiprocessor is sometimes confused with the term multiprocessing. Multiprocessing is a type of processing in which two or more processors work together to execute more than one program simultaneously, whereas multiprocessor refers to the hardware architecture that allows multiprocessing ^[5]. The two terms should therefore be kept distinct.

There are many types of multiprocessor systems, including:

Loosely coupled multiprocessor system
Tightly coupled multiprocessor system
Homogeneous multiprocessor system
Heterogeneous multiprocessor system
Shared memory multiprocessor system
Distributed memory multiprocessor system
UMA system
cc-NUMA system
Hybrid system - shared system memory for global data and local memory for local data

Loosely coupled multiprocessor system

(or distributed memory)

This is a type of multiprocessor system in which each processor has its own local memory, I/O (input-output) channels, and operating system. Processors exchange data over a high-speed communication network by sending messages, a technique known as "message passing". Loosely coupled multiprocessor systems are also known as distributed memory systems, because the processors do not share physical memory and have their own I/O channels.
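The message-passing idea can be illustrated with a minimal single-process simulation. This is only a sketch: the `Node` and `Network` classes and their method names are hypothetical and do not model any real interconnect or operating system. The point is that a node can change another node's memory only by sending it a copy of the data.

```python
from collections import deque


class Node:
    """One loosely coupled processor: private memory plus an inbox (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.local_memory = {}   # private: no other node reads or writes it directly
        self.inbox = deque()     # messages delivered by the network

    def send(self, network, dest, key, value):
        # The only way to share data is to ship a copy over the network.
        network.deliver(dest, (self.name, key, value))

    def receive_all(self):
        # Drain the inbox and store each received value in local memory.
        while self.inbox:
            _sender, key, value = self.inbox.popleft()
            self.local_memory[key] = value


class Network:
    """Hypothetical interconnect that routes messages by node name."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def deliver(self, dest, message):
        self.nodes[dest].inbox.append(message)


a, b = Node("A"), Node("B")
net = Network([a, b])
a.local_memory["x"] = 42
a.send(net, "B", "x", a.local_memory["x"])
before = dict(b.local_memory)   # B has not processed the message yet
b.receive_all()                 # now B holds its own copy of x
```

Until node B processes its inbox, its local memory is unaffected by anything node A does, which is the property that distinguishes this model from shared memory.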

System characteristics

These systems can execute multiple-instruction, multiple-data (MIMD) programs.
This type of architecture allows parallel processing.
The distributed memory allows high scalability.

Tightly coupled multiprocessor system

(or shared memory system)

Multiprocessor system with a shared memory closely connected to the processors.

A symmetric multiprocessor system is a system with centralized shared memory called main memory (MM) operating under a single operating system with two or more homogeneous processors.
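As a loose analogy for shared memory, the following sketch uses threads within one Python process: every worker reads and writes the same memory location, so access must be synchronized. The names `shared_memory` and `worker` are illustrative assumptions, and threads are used here only to stand in for processors attached to a common main memory.

```python
import threading

shared_memory = {"counter": 0}   # one memory visible to every worker
lock = threading.Lock()          # serializes access to the shared location


def worker(increments):
    # Each worker updates the same location; the lock prevents lost updates.
    for _ in range(increments):
        with lock:
            shared_memory["counter"] += 1


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = shared_memory["counter"]   # all 4 * 1000 increments are visible here
```

No data is copied between workers. The cost is contention on the shared location, which plays the role of the shared bus and memory in a tightly coupled system.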

There are two types of systems:

UMA system
NUMA system

UMA system

(Uniform Memory Access)

Heterogeneous Multiprocessor System
Symmetric Multiprocessor System - (SMP)

Heterogeneous multiprocessor system

A heterogeneous multiprocessing system contains multiple non-homogeneous processing units, such as central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), or any type of application-specific integrated circuit (ASIC). The system architecture allows any accelerator, for instance a graphics processor, to operate at the same processing level as the system's CPU.

Symmetric multiprocessor system

Systems operating under a single OS (Operating System) with two or more homogeneous processors and with a centralized shared Main Memory.

A symmetric multiprocessor system (SMP) is a system with a pool of homogeneous processors running independently of each other. Each processor, executing different programs and working on different sets of data, can share common resources (memory, I/O devices, interrupt system, and so on) that are connected using a system bus, a crossbar, or a mix of the two approaches, with a bus for addresses and a crossbar for data (data crossbar) [25][26][27].

Each processor has its own cache that acts as a bridge between the processor and main memory (MM). The cache speeds up access to MM data (increasing performance) and, most importantly in multiprocessor systems with shared memory, reduces the system bus and MM traffic, which is one of the major bottlenecks of these systems. In this system, the cache is an essential element.
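The effect of a cache on bus traffic can be illustrated with a toy model. The sketch below assumes a direct-mapped, read-only cache holding one word per line. The function name `bus_accesses` and the access trace are hypothetical, and cache coherence traffic is ignored.

```python
def bus_accesses(addresses, cache_lines):
    """Count reads that must go over the bus to main memory.

    Models a direct-mapped cache with one word per line.
    cache_lines == 0 models a processor with no cache at all.
    """
    if cache_lines == 0:
        return len(addresses)
    cache = [None] * cache_lines
    misses = 0
    for addr in addresses:
        slot = addr % cache_lines      # direct mapping: address selects the line
        if cache[slot] != addr:        # miss: fetch the word over the bus
            misses += 1
            cache[slot] = addr
    return misses


trace = [0, 1, 2, 3] * 25              # a loop re-reading four words, 100 reads
no_cache = bus_accesses(trace, 0)      # every read uses the bus
with_cache = bus_accesses(trace, 4)    # only the first pass misses
```

In this trace the uncached processor issues 100 bus reads, while the cached one issues 4, because the four words stay resident after the first pass through the loop.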

The shared memory provides a uniform memory access (UMA) time.

cc-NUMA system

(cache-coherent Non-Uniform Memory Access)

The SMP system is known to be limited in scalability. To overcome this limitation, the architecture called "cc-NUMA" is normally used.

A cc-NUMA system is a cluster of SMP systems, called nodes, connected via a high-speed interconnection network. The network can be a single or double-reverse ring, a multi-ring, point-to-point connections^[6]^[7], or a mix of these (e.g. IBM Power Systems^[6]^[8]); a bus interconnection (e.g. NUMAq^[9]); a crossbar; a segmented bus (NUMA Bull HN ISI ex Honeywell^[10]); a mesh router; and so on.

The main characteristic of a cc-NUMA system is a single shared global memory that is distributed across the nodes and directly accessible by all the processors of all the nodes.

In a cc-NUMA system, the access from a processor to a remote memory of a remote node is slower compared to the access to its local memory. For this reason, this system is called NUMA (Non-Uniform Memory Access).

cc-NUMA is also called Distributed Shared Memory (DSM) architecture ^[11].

Each node usually is an SMP system, where a processor can be a single processor, a multi-core processor, or a mix of these two and/or any other kind of architecture. The fig. above is just an example.

The difference between local and remote access times can be as much as an order of magnitude, depending on the kind of connection network used (faster with segmented bus, crossbar, and point-to-point interconnections; slower with serial ring connections).

Examples of interconnection

To overcome this limit, a large remote cache (see Remote cache) is normally used. With this solution, the cc-NUMA system becomes very close to a large SMP system.
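The effect of the remote cache on access time can be estimated with a simple weighted average. The sketch below is an illustrative model only: the function name, the latencies (100 ns local, 1000 ns remote), the 20% remote fraction, the 90% remote cache hit ratio, and the assumption that a remote cache hit costs the same as a local access are all hypothetical values chosen for the example.

```python
def average_access_time(t_local, t_remote, remote_fraction,
                        remote_cache_hit_ratio=0.0, t_remote_cache=None):
    """Average memory access time for one processor in a NUMA node.

    remote_fraction: share of accesses that target another node's memory.
    remote_cache_hit_ratio: share of those accesses served by the remote cache.
    t_remote_cache: latency of a remote cache hit (assumed equal to t_local
    when not given).
    """
    if t_remote_cache is None:
        t_remote_cache = t_local
    remote_cost = (remote_cache_hit_ratio * t_remote_cache
                   + (1.0 - remote_cache_hit_ratio) * t_remote)
    return (1.0 - remote_fraction) * t_local + remote_fraction * remote_cost


# Hypothetical figures: remote access is ten times slower than local access.
without_rc = average_access_time(100.0, 1000.0, 0.2)
with_rc = average_access_time(100.0, 1000.0, 0.2, remote_cache_hit_ratio=0.9)
```

Under these assumed numbers the average drops from about 280 ns to about 118 ns, close to the 100 ns local latency, which is the sense in which the system approaches the behavior of a large SMP.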

Tightly coupled vs Loosely coupled architecture

Both architectures have advantages and trade-offs which may be summarized as follows:

Loosely coupled architectures offer high performance for each individual processor but do not allow easy real-time load balancing among processors.

Tightly coupled architectures, conversely, allow easy load balancing and distribution among processors but suffer from the bottleneck of sharing common resources through one or more buses (which are themselves a common resource).

Multiprocessor system featuring global data multiplication

An intermediate approach between the two previous architectures is to have both common resources and local resources, such as a local memory (LM) in each processor.

The common resources are accessible from all processors via the system bus, while local resources are accessible only to their own processor. Cache memories can be viewed in this perspective as local memories.

This system (patented by F. Zulian ^[12]), used on the DPX/2 300 Unix-based system (Bull HN Information Systems Italia, ex Honeywell) ^[13]^[14], is a mix of tightly coupled and loosely coupled systems and combines the advantages of both architectures.

The local memory is divided into two sectors, global data (GD) and local data (LD).

The basic concept of this architecture is to have global data, which is modifiable information, accessible by all processors. This information is duplicated and stored in each local memory of each processor.

Each time the global data is modified in a local memory, a hardware write broadcast is sent over the system bus to all other local memories to maintain global data coherency. Thus, each processor can read global data from its own local memory without involving the system bus. System bus access is required only when global data is modified in a local memory, in order to update the copies of this data stored in the other local memories.

Local data can be exchanged via message passing, as in a loosely coupled system.
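The replication scheme described in this section can be sketched as a small simulation. The classes `SystemBus` and `LocalMemory` and their method names are hypothetical and written only for illustration; they are not taken from the patent or from the DPX/2 design. The sketch shows global-data reads served locally and one bus transaction per global-data write.

```python
class SystemBus:
    """Hypothetical bus: counts transactions and broadcasts global-data writes."""

    def __init__(self):
        self.memories = []
        self.transactions = 0

    def attach(self, memory):
        self.memories.append(memory)

    def broadcast_write(self, source, key, value):
        # One bus transaction updates every other copy of the global data.
        self.transactions += 1
        for mem in self.memories:
            if mem is not source:
                mem.global_data[key] = value


class LocalMemory:
    """Local memory split into a global data (GD) and a local data (LD) sector."""

    def __init__(self, bus):
        self.global_data = {}   # replicated in every local memory
        self.local_data = {}    # private to the owning processor
        self.bus = bus
        bus.attach(self)

    def read_global(self, key):
        return self.global_data[key]                 # served locally, no bus access

    def write_global(self, key, value):
        self.global_data[key] = value
        self.bus.broadcast_write(self, key, value)   # keep the copies coherent

    def write_local(self, key, value):
        self.local_data[key] = value                 # never touches the bus


bus = SystemBus()
m0, m1, m2 = (LocalMemory(bus) for _ in range(3))
m0.write_global("counter", 7)
reads = [m.read_global("counter") for m in (m0, m1, m2)]
m1.write_local("scratch", 1)
```

After the single global write, all three memories return the same value and the bus has carried exactly one transaction. The three reads and the local write add none.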

References

[1] "Multiprocessor definition and meaning - Collins English Dictionary". www.collinsdictionary.com.
[2] "Data" (PDF). www.cs.vu.nl.
[3] "multiprocessor - Definition of multiprocessor in English by Oxford Dictionaries". Oxford Dictionaries - English.
[4] "What is a Multiprocessor? - Definition from Techopedia". Techopedia.com.
[5] "Multiprocessor dictionary definition - multiprocessor defined". www.yourdictionary.com.
[6] AMD Opteron Shared Memory MP Systems – http://www.cse.wustl.edu/~roger/569M.s09/28_AMD_Hammer_MP_HC_v8.pdf
[7] An Introduction to the Intel® QuickPath Interconnect – http://www.intel.ie/content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf
[8] "IBM POWER Systems Overview". computing.llnl.gov.
[9] SourceForge – http://lse.sourceforge.net/numa/faq/system_descriptions.html
[10] Bull HN F. Zulian – A. Zulian patent – Computer system with a bus having a segmented structure – http://www.freepatentsonline.com/6314484.html
[11] NUMA Architecture – http://www.dba-oracle.com/real_application_clusters_rac_grid/numa.html
[12] "Multiprocessor system featuring global data multiplation".
[13] "UNIX and Bull". www.feb-patrimoine.com.
[14] "Bull DPX". www.feb-patrimoine.com.
