The task of the software engineer is to design and implement programs that satisfy user requirements for correctness and performance. However, the ``performance'' of a parallel program is a complex and multifaceted issue. We must consider, in addition to the execution time and scalability of the computational kernels, the mechanisms by which data are generated, stored, transmitted over networks, moved to and from disk, and passed between different stages of a computation. We must consider costs incurred at different phases of the software life cycle, including design, implementation, execution, and maintenance. Hence, the metrics by which we measure performance can be as diverse as execution time, parallel efficiency, memory requirements, throughput, latency, input/output rates, network throughput, design costs, implementation costs, verification costs, potential for reuse, hardware requirements, hardware costs, maintenance costs, portability, and scalability.
The relative importance of these diverse metrics will vary according to the nature of the problem at hand. A specification may provide hard numbers for some metrics, require that others be optimized, and ignore yet others. For example, the design specification for an operational weather forecasting system may specify maximum execution time (``the forecast must complete within four hours''), hardware costs, and implementation costs, and require that the fidelity of the model be maximized within these constraints. Reliability is also likely to be particularly important, as may be scalability to future generations of computers.
In contrast, a group of engineers developing a parallel database search program for their own occasional use may be happy with anything that runs faster than an existing sequential program but may be tightly constrained as to how long they can spend on implementation. Here, scalability is less critical, but the code should adapt easily to changes in both the database system and computer technologies.
As a third example, consider an image-processing pipeline consisting of several concurrent stages, each performing a different transformation on a stream of images. Here, one may be concerned not with the total time required to process a certain number of images but rather with the number of images that can be processed per second (throughput) or the time that it takes a single image to pass through the pipeline (latency). Throughput would be important in a video compression application, while latency would be important if the program formed part of a sensor system that must react in real time to events detected in an image stream.
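The distinction between throughput and latency can be sketched with a simple model of such a pipeline. This is an illustrative simplification, not part of the original example: it assumes every stage takes the same time per image, so that latency is the sum of the stage times while steady-state throughput is limited only by the rate at which a stage completes.

```python
# Simple model of a concurrent image-processing pipeline.
# Assumption (for illustration only): all stages take the same
# time per image and the pipeline is kept full.

def pipeline_latency(num_stages, seconds_per_stage):
    """Time for a single image to pass through every stage."""
    return num_stages * seconds_per_stage

def pipeline_throughput(seconds_per_stage):
    """Steady-state images per second: once the pipeline is full,
    one image emerges each time a stage completes."""
    return 1.0 / seconds_per_stage

# A five-stage pipeline taking 2 seconds per stage:
print(pipeline_latency(5, 2))      # latency: 10 seconds per image
print(pipeline_throughput(2))      # throughput: 0.5 images per second
```

Note that adding stages to such a pipeline increases latency without reducing throughput, which is one reason the two metrics must be considered separately.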
In other situations, the ratio of execution time to system cost may be important. For example, consider a bank that spends two hours every night on its overloaded mainframe computer running an analysis program that searches for fraudulent transactions. A version that runs in six hours on a parallel computer at one-twentieth the cost is significantly more cost effective even though the total execution time is greater.
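The bank example can be checked with simple arithmetic on the cost-time product, taking execution time multiplied by machine cost as the measure of cost effectiveness. The mainframe's cost here is a hypothetical unit value chosen only to make the ratio concrete:

```python
# Cost effectiveness measured as (execution time) * (machine cost).
# The mainframe cost of 1.0 is a hypothetical unit for illustration.
mainframe_cost = 1.0
parallel_cost = mainframe_cost / 20   # parallel machine: one-twentieth the cost

mainframe_time = 2.0                  # hours
parallel_time = 6.0                   # hours

mainframe_product = mainframe_time * mainframe_cost   # 2.0 cost-hours
parallel_product = parallel_time * parallel_cost      # 0.3 cost-hours

# The parallel version is more cost effective by a factor of about 6.7,
# despite its longer execution time.
print(mainframe_product / parallel_product)
```

By this measure, the parallel version wins even though it is three times slower, because its cost advantage outweighs its time disadvantage.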
Much of the material in the rest of this chapter is concerned with the modeling and measurement of just two aspects of algorithm performance: execution time and parallel scalability. We focus on these issues because they are frequently among the more problematic aspects of parallel program design and because they are the most easily formalized in mathematical models. However, these topics must be examined in the context of a broader design process that also addresses the other issues listed in this section.
© Copyright 1995 by Ian Foster