Introduction.
It’s not the best fourteen rowers that row the best boat,
it’s the fourteen rowers that row best together.[1]
Russell Stanley (Rusty) Callow
This is the first post in the “Notes on SPMD architecture” series. It presents the aim of the series, describes the SPMD architecture and offers some considerations on multiprocessor (MP) machines.
The aim of this series is to describe and characterize the Single Program Multiple Data (SPMD) architecture model of a multiprocessor (MP) computer dedicated to signal and/or image processing applications.
To this end, high-level (system-level) requirements are stated first. Then the architecture is described and characterized in terms of the number of memory buffers per node and the transfers among nodes, covering 1-D and 2-D processing on local and distributed memory, as well as steady and non-steady node workloads over time. Finally, in light of all the above, the initial requirements are reviewed and rewritten.
Within this text, the terms computer and machine are used interchangeably.
Multiprocessor (MP) computers are needed when single-processor computers cannot process the required data in the available time. They can be regarded as application-specific co-processors. Today, in many cases, their hardware is no longer directly tied to the application: it is the software that customizes them for it.
SPMD machines are parallel-processing computers in which every processor runs an identical copy of the program and each processor acts on a different chunk of data. The following figure shows a simple topology.
This work focuses on SPMD machines dedicated to signal and image processing applications, so hereinafter we will use the term “algorithm” instead of “program”.
The previous figure depicts:
- A set of N processing nodes (npi, i = 1, …, N). Each node runs a “copy” of the algorithm on different blocks of input data.
- One input node (ni), which manages the input link and distributes the input data to the processing nodes.
- One output node (no), which collects the processing results and manages the output link.
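To make the three roles concrete, here is a minimal Python sketch in which threads stand in for the processing nodes. It is an illustration under stated assumptions, not the author's design: the function names (`input_node`, `output_node`, `spmd_run`), the fixed block size and the round-robin distribution of blocks are hypothetical choices made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Block = List[float]


def input_node(data: Sequence[float], block_size: int) -> List[Block]:
    """Input node (ni): cut the incoming data into fixed-size blocks."""
    return [list(data[k:k + block_size]) for k in range(0, len(data), block_size)]


def output_node(results: List[Block]) -> List[float]:
    """Output node (no): concatenate the per-block results in input order."""
    return [x for block in results for x in block]


def spmd_run(data: Sequence[float], block_size: int, n_nodes: int,
             algorithm: Callable[[Block], Block]) -> List[float]:
    """Run the same algorithm on every block; n_nodes must be at least 1."""
    blocks = input_node(data, block_size)

    # Hypothetical round-robin distribution: node k gets blocks k, k+N, k+2N, ...
    assignment = [blocks[k::n_nodes] for k in range(n_nodes)]

    def processing_node(my_blocks: List[Block]) -> List[Block]:
        # Single Program: every node applies the same algorithm to its own blocks.
        return [algorithm(b) for b in my_blocks]

    # Each worker thread stands in for one processing node npi.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        per_node = list(pool.map(processing_node, assignment))

    # Undo the round-robin interleaving so the output keeps the input order.
    ordered: List[Block] = [[] for _ in blocks]
    for k, node_results in enumerate(per_node):
        for j, result in enumerate(node_results):
            ordered[k + j * n_nodes] = result
    return output_node(ordered)


if __name__ == "__main__":
    double = lambda block: [2 * x for x in block]
    print(spmd_run(list(range(10)), block_size=3, n_nodes=2, algorithm=double))
    # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Round-robin keeps the load even when every block costs the same; with non-steady workloads, a dynamic scheme in which an idle node asks for the next block may balance better.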
Sometimes the input node and/or the output node may also need to process the data with a lightweight section of the algorithm, as shown in the next figure.
Figure 1-2
Figure 2-1 shows, in a simple diagram, an algorithm split into three sections. Any loops are contained within the sections. Suppose that the MP machine executes this algorithm, with the “Head” section running on the input node, the “Body” section on the processing nodes and the “Tail” section on the output node.
In any case, we will keep using the term “algorithm” for the code that runs on the processing nodes, since it is the heaviest section.
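One way to picture the Head/Body/Tail split is the sketch below, where each section is a plain Python function placed on the node that runs it. The concrete operations (mean removal in `head`, block energy in `body`, rounding in `tail`) are placeholders chosen only for illustration, and the thread pool merely stands in for the processing nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Block = List[float]


def head(block: Block) -> Block:
    """Head (input node): lightweight pre-processing, here removing the block mean.
    Blocks are assumed to be non-empty."""
    mean = sum(block) / len(block)
    return [x - mean for x in block]


def body(block: Block) -> float:
    """Body (processing nodes): stands in for the heavy part, here the block energy."""
    return sum(x * x for x in block)


def tail(energy: float) -> float:
    """Tail (output node): lightweight post-processing, here rounding for output."""
    return round(energy, 3)


def run_algorithm(blocks: List[Block], n_nodes: int) -> List[float]:
    prepared = [head(b) for b in blocks]            # runs on the input node
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        energies = list(pool.map(body, prepared))   # spread over the processing nodes
    return [tail(e) for e in energies]              # runs on the output node


if __name__ == "__main__":
    print(run_algorithm([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]], n_nodes=2))  # [2.0, 0.0]
```

Only `body` is replicated across nodes; `head` and `tail` each run once per block on a single node, which is why they need to stay lightweight.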
From a physical point of view, an MP computer consists of processor boards connected by a high-speed bus, plus input and output interfaces.
For the
machine to fulfill its purpose, the time spent on
communications has to be far less than the time spent on processing.
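As a rough, hypothetical check of this condition, the per-block transfer time over the bus can be compared with the per-block processing time. The sketch assumes a single bus of known bandwidth, ignores contention and protocol overhead, and reads “far less” as “at most 10 %”, a threshold chosen only for illustration; the function names are hypothetical.

```python
def comm_to_proc_ratio(bytes_in: int, bytes_out: int,
                       bus_bytes_per_s: float, t_proc_s: float) -> float:
    """Ratio between the time one block spends on the bus and its processing time."""
    t_comm_s = (bytes_in + bytes_out) / bus_bytes_per_s
    return t_comm_s / t_proc_s


def communication_is_negligible(ratio: float, margin: float = 0.1) -> bool:
    """'Far less' is taken here as at most 10 % of the processing time (an assumption)."""
    return ratio <= margin


if __name__ == "__main__":
    # 4 MB in, 1 MB out, a 1 GB/s bus and 100 ms of processing per block.
    ratio = comm_to_proc_ratio(4_000_000, 1_000_000, 1e9, 0.1)
    print(f"{ratio:.3f}", communication_is_negligible(ratio))  # 0.050 True
```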
The main computing-power requirements for real-time applications are Latency and Throughput, the two parameters that characterize the performance of a processing machine. Latency is the time delay between the input data and the corresponding output data; it is the time needed to process the input data and produce the output data (processing time). Throughput is the volume of data that the machine can process per unit of time (input and output Throughputs can differ).
Let’s assume that the time spent on communications is far less than the time spent on processing. In such a case, the Latency of an SPMD computer is the sum of the Latencies of the input node, the slowest processing node and the output node (Li + Lpi + Lo, with Lpi taken for that slowest node), whereas its Throughput is the sum of the Throughputs of the N processing nodes (∑Tpi). Therefore, only the Throughput of the machine (and not its Latency) depends on the number of processing nodes.
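Written with an explicit index k over the processing nodes (our notation, chosen to avoid clashing with the subscript i of the input node), the two relations are shown below; the last line adds the simplifying assumption of N identical processing nodes, each with Latency L_p and Throughput T_p.

```latex
L = L_i + \max_{1 \le k \le N} L_{p_k} + L_o,
\qquad
T = \sum_{k=1}^{N} T_{p_k}

% Identical nodes (L_{p_k} = L_p, T_{p_k} = T_p):
L = L_i + L_p + L_o \quad (\text{independent of } N),
\qquad
T = N \, T_p
```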
Consequently, an SPMD machine may be an appropriate choice when it meets the required Latency and its number of processing nodes can be increased until the required Throughput is reached.
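A minimal sizing sketch, assuming N identical processing nodes with Latency `l_proc` and Throughput `t_node` each, and consistent units (for example milliseconds and MB/s): check the Latency first, since adding nodes cannot improve it, then take the smallest N whose aggregate Throughput reaches the requirement. The function name, its parameters and the optional `max_nodes` cap are hypothetical.

```python
import math
from typing import Optional


def nodes_needed(l_in: float, l_proc: float, l_out: float,
                 t_node: float, t_required: float, l_required: float,
                 max_nodes: Optional[int] = None) -> Optional[int]:
    """Smallest number of identical processing nodes meeting both requirements,
    or None if an SPMD machine built this way cannot meet them."""
    latency = l_in + l_proc + l_out                # does not depend on N
    if latency > l_required:
        return None                                # adding nodes cannot reduce Latency
    n = max(1, math.ceil(t_required / t_node))     # Throughput grows as N * t_node
    if max_nodes is not None and n > max_nodes:
        return None                                # required Throughput out of reach
    return n


if __name__ == "__main__":
    # Li = 2 ms, Lp = 40 ms, Lo = 3 ms; 60 MB/s per node; need 250 MB/s within 50 ms.
    print(nodes_needed(2.0, 40.0, 3.0, t_node=60.0, t_required=250.0, l_required=50.0))  # 5
```

With these example figures the Latency (45 ms) fits the 50 ms budget, and five nodes give 300 MB/s, the first count at or above 250 MB/s.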
Other requirements to be taken into account by the application are
discussed in subsequent posts of this series.
While writing this article, Dulce Pontes and Ennio Morricone (Cinema Paradiso, Theme of Love, Andrea Morricone) collaborated in an involuntary but decisive way.
1. For this series, I have chosen photographs of a special type of boat called a “trainera”, as well as quotes about rowing, given the similarity between the SPMD architecture and the crew of a “trainera”.
As stated above, the SPMD architecture consists of three types of nodes: the input node, the output node and the processing nodes. Likewise, a “trainera” crew has three roles: the coxswain, the “proel” and the rowers.
The resemblance between processing nodes and rowers, and the existence in both cases of three distinct “roles”, are the reasons I chose these photographs and quotes.
(The post “Rowing
traineras in Santander bay” by Pamela Cahill includes an
excellent description of the “proel” role)
2. The original phrase of Mr. Callow refers to a crew of eight rowers. I have adapted the quote to a "trainera" crew. In my understanding, the number of people in the crew does not alter the spirit of the phrase or its value.
3. Picture: Based on http://static-cache-origin.elcorreo.com/methode/2016/traineras/img/trainera-tipo2.svg
4. I want to thank Theresa Curtis for her revision of this text.