Stream Processor


Streams and kernels

The central idea behind stream processing is to organize an application into streams and kernels to expose the inherent locality and concurrency in media-processing applications. In most cases, not only do streams and kernels expose desirable properties of media applications, but they are also a natural way of expressing the application.


Introduction

The complex modern signal and image processing applications requires hundreds of GOPS (giga, or billions, of operations per second) with a power budget of a few watts, an efficiency of about 100 GOPS/W (GOPS per watt), or 10 pJ/op (Pico Joules per operation). To meet this requirement current media processing applications use ASICs that are tailor made for a particular application. Such processors require significant design efforts and are difficult to change when a new media processing application or algorithm evolve.

Overview

Many signal processing applications require both efficiency and programmability. The complexity of modern media processing, including 3D graphics, image compression, and signal processing, requires tens to hundreds of billions of computations per second. To achieve these computation rates, current media processors use special-purpose architectures tailored to one specific application. Such processors require significant design effort and are thus difficult to change as media-processing applications and algorithms evolve. Digital television, surveillance video processing, automated optical inspection, and mobile cameras, camcorders, and 3G cellular handsets have similar needs.

Abstract

For many signal processing applications programmability and efficiency is desired. With current technology either programmability or efficiency is achievable, not both. Conventionally ASIC's are being used where highly efficient systems are desired. The problem with ASIC is that once programmed it cannot be enhanced or changed, we have to get a new ASIC for each modification. Other option is microprocessor based or dsp based applications. These can provide either programmability or efficiency. Now with stream processors we can achieve both simultaneously. A comparison of efficiency and programmability of Stream processors and other techniques are done. We will look into how efficiency and programmability is achieved in a stream processor. Also we will examine the challenges faced by stream processor architecture.

Challenges

Stream processors depend on parallelism and locality for their efficiency. For an application to stream well, there must be sufficient parallel work to keep all of the arithmetic units in all of the clusters busy. The parallelism need not be regular, and the work performed on each stream element need not be of the same type or even the same amount. If there is not enough work to go around, however, many of the stream processor's resources will idle and efficiency will suffer.

 Conclusions

The main competition for stream processors are fixed-function (ASIC or ASSP) processors. Though ASICs have efficiency as good as or better than stream processors, they are costly to design and lack flexibility. It takes about $15 million and 18 months to design a high-performance signal-processing ASIC for each application, and this cost is increasing as semiconductor technology advances. In contrast, a single stream processor can be reused across many applications with no incremental design cost, and software for a typical application can be developed in about six months for about $4 million. 


No comments:

Post a Comment