Streams and kernels
The central idea behind
stream processing is to organize an application into streams and kernels to
expose the inherent locality and concurrency in media-processing applications.
In most cases, not only do streams and kernels expose desirable properties of
media applications, but they are also a natural way of expressing the
application.
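As a rough illustration of the idea, the sketch below models streams as Python iterables and kernels as generator functions that consume one stream and produce another. The kernel names, the pixel format, and the threshold value are hypothetical, chosen only for this example; they do not come from any particular stream-processing language or hardware.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical stream element: one RGB pixel.
Pixel = Tuple[int, int, int]


def luminance_kernel(pixels: Iterable[Pixel]) -> Iterator[int]:
    """Kernel: consume a stream of RGB pixels, produce a stream of luma values."""
    for r, g, b in pixels:
        # Each element is processed independently of the others,
        # which is where the concurrency comes from.
        yield (299 * r + 587 * g + 114 * b) // 1000


def threshold_kernel(luma: Iterable[int], cutoff: int = 128) -> Iterator[int]:
    """Kernel: consume a stream of luma values, produce a stream of 0/1 flags."""
    for y in luma:
        yield 1 if y >= cutoff else 0


# The application is a chain of kernels connected by streams. The
# intermediate luma stream is passed straight from one kernel to the
# next, which stands in for the locality between producer and consumer.
input_stream = [(255, 255, 255), (0, 0, 0), (200, 30, 30)]
mask = list(threshold_kernel(luminance_kernel(input_stream)))
print(mask)  # [1, 0, 0]
```

This is only a software analogy: on a real stream processor the per-element work would be spread across many arithmetic units rather than executed one element at a time.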
Introduction
Modern signal and image processing applications are complex: they require
hundreds of GOPS (giga, or billions of, operations per second) within a power
budget of a few watts, which amounts to an efficiency of about 100 GOPS/W (GOPS
per watt), or 10 pJ/op (picojoules per operation). To meet this requirement,
current media-processing applications use ASICs that are tailor-made for a
particular application. Such processors require significant design effort and
are difficult to change when a new media-processing application or algorithm
emerges.
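The two efficiency figures are the same quantity expressed two ways, since one watt is one joule per second. A small sketch of the conversion follows; the function names and the 300 GOPS workload are assumptions made up for illustration, not figures from the text.

```python
def energy_per_op_picojoules(gops_per_watt: float) -> float:
    """Convert an efficiency in GOPS/W into energy per operation in pJ."""
    ops_per_joule = gops_per_watt * 1e9  # 1 W = 1 J/s, so GOPS/W = 1e9 ops per joule
    return 1e12 / ops_per_joule          # 1 J = 1e12 pJ


def power_watts(gops: float, gops_per_watt: float) -> float:
    """Power drawn by a workload of `gops` at the given efficiency."""
    return gops / gops_per_watt


# 100 GOPS/W corresponds to 10 pJ per operation.
print(energy_per_op_picojoules(100))

# A hypothetical 300 GOPS workload at that efficiency fits in about 3 W.
print(power_watts(300, 100))
```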
Overview
Many signal processing
applications require both efficiency and programmability. The complexity of
modern media processing, including 3D graphics, image compression, and signal
processing, requires tens to hundreds of billions of computations per second.
To achieve these computation rates, current media processors use
special-purpose architectures tailored to one specific application. Such
processors require significant design effort and are thus difficult to change
as media-processing applications and algorithms evolve. Digital television,
surveillance video processing, automated optical inspection, mobile cameras,
camcorders, and 3G cellular handsets have similar needs.
Abstract
Many signal processing applications need both programmability and efficiency.
With current technology, either programmability or efficiency is achievable,
but not both. Conventionally, ASICs are used where highly efficient systems are
required. The problem with an ASIC is that once it is built it cannot be
enhanced or changed; each modification requires a new ASIC. The other option is
a microprocessor-based or DSP-based design, but these too provide either
programmability or efficiency, not both. Stream processors can achieve both
simultaneously. We compare the efficiency and programmability of stream
processors with those of other techniques, look at how a stream processor
achieves both, and examine the challenges faced by stream processor
architectures.
Challenges
Stream processors
depend on parallelism and locality for their efficiency. For an application to
stream well, there must be sufficient parallel work to keep all of the
arithmetic units in all of the clusters busy. The parallelism need not be
regular, and the work performed on each stream element need not be of the same
type or even the same amount. If there is not enough work to go around,
however, many of the stream processor's resources will idle and efficiency will
suffer.
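A deliberately simplified model makes the point concrete. It assumes every piece of parallel work occupies one arithmetic unit for one step, which real workloads do not guarantee; the function name and the figure of 40 arithmetic units are hypothetical.

```python
import math


def utilization(parallel_items: int, arithmetic_units: int) -> float:
    """Fraction of arithmetic-unit slots doing useful work.

    Simplified model: each item occupies one unit for one step, and
    items are issued in rounds of at most `arithmetic_units` at a time.
    """
    if parallel_items <= 0 or arithmetic_units <= 0:
        raise ValueError("both arguments must be positive")
    rounds = math.ceil(parallel_items / arithmetic_units)
    return parallel_items / (rounds * arithmetic_units)


# Hypothetical machine with 40 arithmetic units in total.
print(utilization(10, 40))    # too little work: most units idle
print(utilization(50, 40))    # the second round is mostly empty
print(utilization(4000, 40))  # plenty of work: every unit stays busy
```

Under this model the three calls give 0.25, 0.625, and 1.0: efficiency drops whenever the available parallel work does not fill the machine.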
Conclusions
The main competition for stream processors is fixed-function (ASIC or ASSP)
processors. Though
ASICs have efficiency as good as or better than stream processors, they are
costly to design and lack flexibility. It takes about $15 million and 18 months
to design a high-performance signal-processing ASIC for each application, and
this cost is increasing as semiconductor technology advances. In contrast, a
single stream processor can be reused across many applications with no
incremental design cost, and software for a typical application can be
developed in about six months for about $4 million.