from the city of Vienna: Notes on SPMD architecture I

Introduction.

It’s not the best fourteen rowers that row the best boat,

it’s the fourteen rowers that row best together¹

Russell Stanley (Rusty) Callow

This is the first post of the Notes on SPMD architecture series. It presents the aim of the series, describes the SPMD architecture and offers some considerations on multiprocessor (MP) machines.

The object of this series is to describe and characterize the Single Program Multiple Data (SPMD) architecture model of a multiprocessor (MP) computer dedicated to signal and / or image processing applications.

To this end, high-level (system-level) requirements are stated first. The description and characterization are then carried out in terms of the number of memory buffers per node and the transfers among nodes, addressing 1-D and 2-D processing on local and distributed memory as well as steady and non-steady node workloads over time. Finally, in light of all the above, the initial requirements are reviewed and rewritten.

Within this text, the terms computer and machine are used interchangeably.

--------------------

Multiprocessor (MP) computers are needed when single-processor computers cannot process the required data in the available time. They can be regarded as application-specific co-processors. Today, in many cases, their hardware is no longer directly tied to the application; it is the software that customizes them for it.

SPMD machines are parallel processing computers that operate using an identical copy of the program in each processor, and in which each processor acts on different chunks of data. The following figure shows a simple topology.

Figure 1-1

This work focuses on SPMD machines dedicated to signal and image processing applications. So hereinafter, we will use the term “algorithm” instead of “program”.

What the previous figure depicts is:

A set of N processing nodes (n_pi, i=1,..,N). Each node runs a “copy” of the algorithm over different blocks of input data.
One input node (n_i), which manages the input link and distributes the input data to the processing nodes.
One output node (n_o), which collects the processing results and manages the output link.
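The three roles listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for the processing nodes, the function names (`input_node`, `algorithm`, `output_node`, `run_spmd`) are hypothetical, and the sum of squares is a placeholder for a real signal processing algorithm.

```python
from concurrent.futures import ThreadPoolExecutor


def algorithm(block):
    """The same code runs on every processing node (placeholder: sum of squares)."""
    return sum(x * x for x in block)


def input_node(data, n_nodes):
    """Split the input data into n_nodes contiguous blocks, one per processing node."""
    size = -(-len(data) // n_nodes)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n_nodes)]


def output_node(results):
    """Collect the per-node results in node order."""
    return list(results)


def run_spmd(data, n_nodes):
    """Distribute the data, run the same algorithm on each block, collect the results."""
    blocks = input_node(data, n_nodes)
    # Assumption: one worker thread models one processing node.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return output_node(pool.map(algorithm, blocks))


if __name__ == "__main__":
    print(run_spmd(list(range(8)), n_nodes=4))  # [1, 13, 41, 85]
```

Every node executes `algorithm` unchanged; only the block it receives differs, which is the defining property of the SPMD model.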

Sometimes the input node and/or the output node may also need to run a lightweight section of the algorithm on the data, as shown in the next figure.

Figure 1-2

Figure 1-2 shows, in a simple diagram, an algorithm split into three sections. The loops, if any, are inside the sections. Let us say that the MP machine executes this algorithm and that the “Head” section runs on the input node, the “Body” section on the processing nodes and the “Tail” section on the output node.

In any case, we will continue using the term “algorithm” for the code that runs on the processing nodes, since it will be the heaviest section.
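A minimal sketch of the Head / Body / Tail split follows. The concrete operations are assumptions chosen only for illustration (mean removal as the “Head”, per-block energy as the “Body”, a final sum as the “Tail”); the post does not prescribe any of them, and the function names are hypothetical.

```python
def head(samples):
    """Lightweight section on the input node (assumed here: remove the mean)."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def body(block):
    """Heavy section, run by each processing node on its own block (assumed: energy)."""
    return sum(s * s for s in block)


def tail(partials):
    """Lightweight section on the output node (assumed: combine the partial results)."""
    return sum(partials)


def run(samples, n_nodes):
    """Head on the input node, Body on every processing node, Tail on the output node."""
    prepared = head(samples)
    size = -(-len(prepared) // n_nodes)  # ceiling division
    blocks = [prepared[i * size:(i + 1) * size] for i in range(n_nodes)]
    # On a real machine the Body calls would run in parallel, one per node.
    partials = [body(block) for block in blocks]
    return tail(partials)


if __name__ == "__main__":
    print(run([1.0, 2.0, 3.0, 4.0], n_nodes=2))  # 5.0
```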

From a physical point of view, an MP computer consists of processor boards connected by a high-speed bus, plus input and output interfaces. For the machine to fulfill its purpose, the time spent on communications has to be far less than the time spent on processing.

The main computing power requirements for real-time applications are Latency and Throughput, two parameters that characterize the performance of a processing machine. Latency is the time delay between an input datum and the corresponding output datum, that is, the time needed to process the input and produce the output (processing time). Throughput is the volume of data that the machine is able to process per unit of time (input and output Throughputs can be different).

Let’s assume that the time spent on communications is far less than the time spent on processing. In that case, the Latency of an SPMD computer is the sum of the Latencies of the input node, the slowest processing node and the output node (Li+Lpi+Lo), whereas the Throughput is the sum of the Throughputs of the N processing nodes (∑Tpi). Therefore, only the Throughput of the machine (and not the Latency) depends on the number of processing nodes.
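The two expressions above can be written down directly. The sketch below assumes negligible communication time, as the paragraph does; the function names and the numeric values are hypothetical and serve only as an example.

```python
def machine_latency(l_input, l_processing, l_output):
    """Li + Lpi + Lo, where Lpi is the latency of the slowest processing node."""
    return l_input + max(l_processing) + l_output


def machine_throughput(t_processing):
    """Sum of the throughputs of the N processing nodes."""
    return sum(t_processing)


if __name__ == "__main__":
    # Hypothetical figures: latencies in ms, throughputs in blocks per second.
    print(machine_latency(0.5, [2.0, 2.5, 2.25], 0.25))  # 3.25
    print(machine_throughput([10, 10, 10]))              # 30
    # A fourth identical node raises the throughput but leaves the latency unchanged.
    print(machine_latency(0.5, [2.0, 2.5, 2.25, 2.5], 0.25))  # 3.25
    print(machine_throughput([10, 10, 10, 10]))               # 40
```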

Consequently, an SPMD machine may be an appropriate choice when it meets the required Latency and the number of nodes can be increased until the required Throughput is reached. Other requirements to be taken into account by the application are discussed in subsequent posts of this series.
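As a rough sizing aid, this criterion can be turned into a small check. The sketch assumes that all processing nodes are identical and that communication time is negligible; `meets_latency` and `nodes_needed` are hypothetical helper names, not part of any library.

```python
import math


def meets_latency(required_latency, l_input, l_node, l_output):
    """True if Li + Lpi + Lo does not exceed the required Latency."""
    return l_input + l_node + l_output <= required_latency


def nodes_needed(required_throughput, node_throughput):
    """Smallest N such that N identical nodes reach the required Throughput."""
    if node_throughput <= 0:
        raise ValueError("node_throughput must be positive")
    return math.ceil(required_throughput / node_throughput)


if __name__ == "__main__":
    # Hypothetical requirement: 4 ms Latency, 100 blocks per second Throughput.
    if meets_latency(4.0, l_input=0.5, l_node=2.5, l_output=0.25):
        print(nodes_needed(100, node_throughput=30))  # 4
```

If the Latency check fails, adding nodes does not help, because the Latency does not depend on N.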

In the writing of this article, Dulce Pontes and Ennio Morricone (Cinema Paradiso, Theme of Love, Andrea Morricone) have collaborated in an involuntary but decisive way.

--------------------

1. For this series, I have chosen photographs related to a special type of boat called "trainera", as well as quotes related to rowing, given the similarity between the SPMD architecture and the crew of a “trainera”.

As stated above, the SPMD architecture consists of three types of nodes: the input node, the output node and the processing nodes. Likewise, the roles in a “trainera” crew are the coxswain, the “proel” and the rowers.

The resemblance between processing nodes and rowers, as well as the existence in both cases of three different “roles”, are the reasons for choosing these photographs and quotes.

(The post “Rowing traineras in Santander bay” by Pamela Cahill includes an excellent description of the “proel” role)

2. The original phrase of Mr. Callow refers to a crew of eight rowers. I have adapted the quote to a "trainera" crew. In my understanding, the number of people in the crew does not alter the spirit of the phrase or its value.

3. Picture: Based on http://static-cache-origin.elcorreo.com/methode/2016/traineras/img/trainera-tipo2.svg

4. I want to thank Theresa Curtis for her revision of this text.

Leopoldo Gomez
