Comparing FPGAs and DSPs for Embedded Signal Processing

Comparing FPGAs and DSPs for Embedded Signal Processing
© 2002 Berkeley Design Technology,Inc.
Stanford University
October 2002
Page 1
Berkeley Design Technology, Inc.
2107 Dwight Way, Second Floor
Berkeley, California 94704
USA
+1 (510) 665-1600
info@BDTI.com
http://www.BDTI.com
Optimized DSP Software • Independent DSP Analysis

About BDTI
- Implementation of optimized DSP application software
- Implementation of optimized DSP software libraries
- Algorithm development
- Evaluation of processors’ DSP performance and capabilities
- Advisory and consulting services
- Technical publications
- Technical training
- Custom benchmarking
ANALYSIS DEVELOPMENT

Presentation Outline
- What are the driving applications?
- How are DSPs meeting application needs?
- Why consider FPGAs?
- How do DSPs and FPGAs stack up in terms of performance?
- What other factors influence designers’ decisions?

Communications: The “Killer App”
Programmable DSP Revenues by Market, Jan-Aug 2002
2002 Revenues: $4.5 Billion (Projected)

Market       Share
Wireless     62.4%
Other        11.1%
Computer      9.2%
Consumer      7.3%
Wireline      6.9%
Automotive    3.1%

Source: Forward Concepts

Comms Apps: Two Types
- Infrastructure
  - Wired
    - E.g., xDSL, “cable,” VoIP gateway
  - Wireless
    - E.g., cellular, PCS, fixed wireless, satellite
- Terminals
  - Portable
    - Battery-powered, size-constrained
  - Non-portable (e.g., “CPE”)

Terminal Requirements
- Key criteria
  - Sufficient performance
  - Cost
  - Energy efficiency
  - Memory use
  - Small-system integration support
  - Packaging
  - Tools
  - Application-development infrastructure
  - Chip-product roadmap

Infrastructure Requirements
- Key criteria
  - Board area per channel
  - Power per channel
  - Cost per channel
  - Large-system integration support
  - Tools
  - Application-development infrastructure
  - Architecture roadmap

Generalized Comm System
[Block diagram of a transmitter and receiver chain from Signal In to Signal Out. Blocks shown: Source Coding, Encryption, Channel Coding, Mult. Access, Modulation, Detection, Demodulation, Parameter Estimation, Mult. Access, Inverse Channel Coding, Decryption, Source Decode.]

Key Processing Technologies
- DSPs
- GPPs/DSP-enhanced GPPs
- Reconfigurable architectures
  - FPGAs
  - Reconfigurable processors
- Massively parallel processors
- ASSPs
- ASICs
  - Licensable cores
  - Customizable cores
  - Platform-based design

DSPs: The Incumbents
- Modern conventional DSPs introduced ~1986
  - One instruction, one MAC per cycle
  - Developed primarily for telecom applications
- High-performance VLIW DSPs introduced ~1997
  - Developed primarily for wireless infrastructure
  - Speed focused:
    - Independent execution units support many instructions, MACs per cycle
    - Deeper pipelines and simpler instruction sets support higher clock rates
  - Emphasis on compilability
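
To make the “one MAC per cycle” figure concrete, here is a minimal Python sketch (not part of the original presentation) of an FIR filter written as an explicit multiply-accumulate loop. The function name `fir_mac` and the sample data are illustrative assumptions. In the idealized case, a conventional DSP retires about one inner-loop iteration per cycle, while a VLIW DSP with several MAC units can retire several.

```python
def fir_mac(samples, coeffs):
    """Direct-form FIR filter written as explicit multiply-accumulates (MACs).

    Returns the filter outputs and the number of MAC operations performed.
    """
    outputs = []
    mac_count = 0
    for n in range(len(samples)):
        acc = 0  # accumulator, playing the role of a DSP's MAC register
        for k, c in enumerate(coeffs):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += c * samples[n - k]  # one multiply-accumulate
                mac_count += 1
        outputs.append(acc)
    return outputs, mac_count


# Illustrative data: a 3-tap moving sum over five samples.
outputs, macs = fir_mac([1, 2, 3, 4, 5], [1, 1, 1])
print(outputs, macs)
```

The MAC count grows as (number of samples) x (number of taps), which is why MACs per cycle is the headline figure for these processors.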

Example: StarCore SC140
Motorola, Agere,… and now Infineon
- 6-issue 16-bit fixed-point architecture
- Up to four 16-bit MACs per cycle
- Motorola MSC8101 (one SC140 core) shipping at 300 MHz, $134 (10 ku)
- Agere SP2000B (three SC140 cores) sampling at 250 MHz, $200 (10 ku)
[Block diagram; labels: Data Buses (2 x 64 bits), Address Buses (3 x 32 bits), Instruction Bus (1 x 128 bits), AGUs (2), Prog. Seq., BMU, and four MAC/ALU/Shift units.]

Motorola MSC8101
[Block diagram; labels: SC140 Core, Filter Coprocessor, 512 KB SRAM, DMA Controller, Memory Controller, Addr. (32-bit), Data (64-bit), PowerPC Bus (100 MHz), and CPM with ATM, Ethernet, UTOPIA, UART, I2C, SPI, E1/T1, E3/T3, HDLC.]

Other Infrastructure DSPs
- Texas Instruments TMS320C64xx
  - 8-issue 16-bit fixed-point architecture
  - Up to four 16-bit MACs per cycle
  - Special instructions and co-processors for communications applications
  - Compatible with ‘C62xx, ‘C67xx
  - Sampling at 600 MHz, $111 (10 ku)
- Analog Devices TigerSHARC
  - 4-issue fixed- and floating-point
  - Up to eight 16-bit fixed-point MACs per cycle
  - Special instructions for 3G base stations
  - High memory bandwidth (8 GB/s)
  - Shipping at 250 MHz, $175 (10 ku)
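
The per-cycle MAC counts and clock rates quoted for the MSC8101, the TMS320C64xx, and the TigerSHARC imply a peak MAC rate for each part. The short Python sketch below (an illustration, not from the presentation) works out that arithmetic. The helper name `peak_mmacs` is an assumption, and the results are upper bounds that assume every MAC unit is busy on every cycle.

```python
# (16-bit MACs per cycle, clock in MHz), as quoted on the slides above.
PROCESSORS = {
    "Motorola MSC8101 (SC140)": (4, 300),
    "TI TMS320C64xx": (4, 600),
    "ADI TigerSHARC": (8, 250),
}


def peak_mmacs(macs_per_cycle, clock_mhz):
    """Peak millions of MACs per second: MACs per cycle times MHz."""
    return macs_per_cycle * clock_mhz


peak = {name: peak_mmacs(*spec) for name, spec in PROCESSORS.items()}
for name, value in peak.items():
    print(f"{name}: {value} MMACS peak")
```

Sustained throughput on real code is lower, since loads, stores, and control flow also compete for issue slots.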

DSP Processors
Strengths and Weaknesses
- DSP performance, efficiency strong compared to other off-the-shelf processors
  - But may not be adequate for demanding tasks
- Relatively easy to program
  - But compilers are often inefficient
  - And ‘C6xxx processors are an assembly programmer’s worst nightmare
- Good DSP-oriented dev. tools, infrastructure
  - TI’s dev. infrastructure is particularly good
  - But mediocre dev. infrastructure for non-DSP tasks

DSP Processors
Strengths and Weaknesses
- Relatively low development cost, risk
- Mature technology
- Large, experienced developer base
- Fast time-to-market
- Some architectures available from multiple vendors
  - But some vendors’ roadmaps are unclear
- Relatively limited product offerings
  - But products offer strong, relevant integration

Wireless Bandwidth Growth
Generation   Data rate         Processing
2G           8-13 Kbps         ~100 MIPS
2.5G         64-384 Kbps       ~10,000 MIPS
3G           384-2000+ Kbps    ~100,000 MIPS
Trend from 2G to 3G: narrowband circuit voice to wideband packet data
Standards shown, from 2G through 3G: GSM, DCS1800, PCS1900, IS-95B, IS-54B, IS-136, PDC, GPRS, HSCSD, IS-95C, IS-136+, IS-136 HS, Compact EDGE, 3GPP-DS-FDD, 3GPP-DS-TDD, 3GPP-MC, ARIB W-CDMA, IS-2000 CDMA, IS-95-HDR
Source: MorphICs Technology, Inc.

Why Consider FPGAs?
“As the industry shifts from second-generation, 2G, to 3G wireless we see the percentage of the physical layer MIPS that reside in the DSP dropping from essentially 100 percent in today’s technology for GSM to about 10 percent for wideband code-division multiple access (WCDMA).”
Texas Instruments, IEEE Communications Magazine, January 2000

FPGAs
Field-Programmable Gate Arrays
- An amorphous “sea” of reconfigurable logic with reconfigurable interconnect
  - Possibly interspersed with fixed-logic resources, e.g., processors, multipliers
- Potential for very high parallelism
- Historically used for prototyping and “glue logic,” but becoming more sophisticated
  - DSP-oriented architecture features
  - DSP-oriented tools and design libraries
    - Viterbi, Turbo, and Reed-Solomon coders and decoders, FIR filters, FFTs,…
- Key DSP players: Altera and Xilinx

Example: Altera Stratix
- Up to 28 hard-wired “DSP blocks”
  - 8x9-bit, 4x18-bit, 1x36-bit multiply operations
  - Optional pipelining, accumulation, etc.
- 3 sizes of hard-wired memory blocks
[Chip floorplan figure; labels: Logic Array Blocks, DSP Blocks, M512 RAM Blocks, M4K RAM Blocks, MegaRAM Blocks, Phase-Locked Loops, I/O Elements.]

Altera Stratix
High-end, DSP-enhanced FPGAs
- IP blocks
  - Filters, FFTs, Viterbi decoders,…
  - Nios processor
  - Third-party IP, e.g., DMA controllers
- DSP tools
  - Parameterized IP block generators
  - Simulink to FPGA link
  - C+Simulink to FPGA design flow
- Sampling now; production end of 2002
- Prices begin at $170 (1 ku)

Altera FIR Filter Compiler
[Screenshot. Source: Altera]

Others: Xilinx
“Virtex” line of FPGAs
- Virtex-II
  - Includes array of hard-wired 18 × 18 multipliers plus distributed memory
  - Up to 168 multipliers in biggest chip
  - Most versions available now
- Virtex-II Pro: joint effort with IBM
  - Adds up to four hard-wired PowerPC 405 cores
  - Up to 216 multipliers in biggest chip
  - Sampling now
- Prices begin at $169 (1 ku)
Source: Xilinx

FPGAs
Strengths and Weaknesses
- Massive performance gains on some algorithms
- Architectural flexibility can yield efficiency
  - Adjust data widths throughout algorithm
  - Parallelism where you need it
  - Massive on-chip memory bandwidth
- Efficiency compromised by generality
  - Embedded MAC units and memory blocks improve efficiency but reduce generality
- Re-use hardware for multiple tasks
  - Field reconfigurability (for some products)
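
As an illustration of “adjust data widths throughout algorithm” (a sketch added here, not taken from the presentation), the Python snippet below estimates how wide an accumulator must be to sum a given number of products without overflow. The function name `accumulator_bits` and the example sizes are assumptions. It uses the common conservative rule that a product of an a-bit and a b-bit operand fits in a + b bits, and that summing N such products needs ceil(log2(N)) further guard bits.

```python
def accumulator_bits(data_bits, coeff_bits, num_terms):
    """Accumulator width sufficient to sum num_terms products without overflow."""
    product_bits = data_bits + coeff_bits
    # ceil(log2(num_terms)) computed with integers to avoid rounding issues.
    guard_bits = (num_terms - 1).bit_length()
    return product_bits + guard_bits


narrow = accumulator_bits(4, 8, 16)    # 4-bit data, 8-bit coefficients, 16 taps
wide = accumulator_bits(16, 16, 64)    # 16-bit data and coefficients, 64 taps
print(narrow, wide)
```

An FPGA datapath can be built at exactly the width each stage needs, whereas a processor's fixed-width datapath is sized for its widest case.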

FPGAs
Strengths and Weaknesses
- Potentially good cost and power efficiency
  - But prices and power consumption are much higher than DSPs’
- Development is long and complicated
  - Design flow is unfamiliar to most DSP engineers
  - But cost and complexity are much lower than ASICs’
  - And processor cores reduce development burden
- Development infrastructure badly lags DSPs’
  - DSP-oriented tools are immature
  - Xilinx has mature products, but others are playing catch-up

Performance Analysis
- Comparing performance of off-the-shelf DSPs to that of FPGAs is tricky
- Common MMACS metric is oversimplified to the point of absurdity
  - FPGA vendors use distributed-arithmetic benchmark implementations that require fixed coefficients
  - MMACS metric overlooks need to dedicate resources to non-MAC tasks
  - Many important DSP algorithms don’t use MACs at all!
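
The fixed-coefficient restriction follows from how distributed arithmetic works: it replaces multipliers with a look-up table of precomputed coefficient sums, so the table is tied to one coefficient set. The Python sketch below is a simplified illustration under stated assumptions (unsigned inputs, bit-serial processing); the function names and data are hypothetical and do not describe any vendor's implementation.

```python
def build_da_table(coeffs):
    """Precompute the sum of coefficients selected by each bit pattern.

    Entry `addr` holds the sum of coeffs[k] for every k whose bit is set
    in addr. The table is only valid for these exact coefficients.
    """
    table = []
    for addr in range(1 << len(coeffs)):
        table.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return table


def da_dot(table, inputs, input_bits):
    """Dot product of unsigned inputs with the table's coefficients,
    using only table look-ups, shifts, and adds (no multiplies)."""
    acc = 0
    for bit in range(input_bits):
        # Gather bit number `bit` of every input into a table address.
        addr = 0
        for k, x in enumerate(inputs):
            addr |= ((x >> bit) & 1) << k
        acc += table[addr] << bit  # weight the partial sum by 2**bit
    return acc


coeffs = [3, -1, 4, 2]   # illustrative fixed coefficients
inputs = [5, 9, 2, 7]    # illustrative unsigned 4-bit samples
table = build_da_table(coeffs)
result = da_dot(table, inputs, input_bits=4)
print(result)
```

Changing any coefficient means rebuilding the whole table, which is cheap in this sketch but amounts to regenerating logic on an FPGA. A MAC count credited to such a design is therefore not directly comparable to a DSP's, whose multipliers accept arbitrary coefficients at run time.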

Alternative Approach: Application Benchmarks
- Use a full application, e.g., N channels of an OFDM receiver
- Hazards:
  - Applications tend to be ill-defined
  - Hand-optimization usually required in real-world applications
  - Costly, time-consuming to implement
  - Evaluates programmer as much as processor
  - What is a “reasonable” benchmark implementation?

Solution: Simplified Application Benchmark
- BDTI’s benchmark is based on a simplified OFDM receiver
  - Closely resembles a real-world application
  - Simplified to enable optimized implementations
  - Constrained to ensure consistent, reasonable implementation practices
- Benchmark goals:
  - Maximize the number of channels
  - Minimize the cost per channel

Benchmark Overview
- Flexibility is an asset:
  - Algorithms range from table look-ups to MAC-intensive transforms
  - Data sizes range from 4 to 16 bits
  - Data rates range from 40 to 320 MB/s
  - Data includes real and complex values
[Block diagram; blocks: IQ Demodulator, FIR, FFT, Slicer, Viterbi Decoder.]

Benchmark Requirements
- “Pins to pins”
- Real-time throughput
- Bit-exact output data
- Resource sharing is permitted
[Figure: Channels 1 through 8 served by shared blocks: one FIR (8 ch.), two FFTs (4 ch. each), two Slicers (4 ch. each), and four Viterbi decoders (2 ch. each).]
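
The resource-sharing figure can be read as a simple sizing rule: divide the channel count by the number of channels each shared block serves, and round up. The Python sketch below is an illustration only. The per-block capacities are taken from the figure's labels (FIR 8 ch., FFT 4 ch., Slicer 4 ch., Viterbi 2 ch.), which describe one example implementation and not a requirement of the benchmark; the function name `instances_needed` is an assumption.

```python
# Channels each shared block serves in the illustrated implementation.
CHANNELS_PER_INSTANCE = {"FIR": 8, "FFT": 4, "Slicer": 4, "Viterbi": 2}


def instances_needed(num_channels):
    """Number of copies of each block needed to serve num_channels."""
    return {
        # Integer ceiling division: -(-a // b) rounds a / b up.
        block: -(-num_channels // capacity)
        for block, capacity in CHANNELS_PER_INSTANCE.items()
    }


eight_channel_plan = instances_needed(8)
print(eight_channel_plan)
```

For eight channels this gives one FIR, two FFTs, two slicers, and four Viterbi decoders, matching the block counts in the figure.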

Benchmark Results
                   Motorola MSC8101   Altera Stratix 1S80-6   Altera Stratix 1S20-6
                   (300 MHz)          (Preliminary)           (Projected)
Channels           <<1                ~50                     ~10
Cost (1 ku)        $140               $3,480                  $325
Cost per channel   ~$500              ~$50                    ~$10
These results are approximate. For full results, see BDTI's report, FPGAs for DSP.

Density Comparison
[Plot: ALU bit ops/λ²s versus technology (λ) for SRAM-based FPGAs and RISC processors.]
Source: Andre DeHon

Dealing with Non-Ideal Channels
- Multi-antenna approach exploits multi-path fading by sending data along good channels
- Results in large theoretical improvements in bandwidth efficiency for fading channels
- But…computationally hungry
[Diagram: x(t) passes through array processing, two paths (1st path, α1 = 1; 2nd path, α2 = 0.6), and array processing to give y(t).]
[Plot: capacity (bits/s/Hz) versus SNR (dB) for (4,4) With Feedback, (4,4) No Feedback, (4,1) Orthogonal Design, and (1,1) Baseline.]
Source: Jan Rabaey, Berkeley Wireless Research Center

Why Use a DSP?
- Many applications are not amenable to FPGA implementations
- Parallelism is sometimes inherently limited
- Ultimate speed is not always the first priority
- FPGAs are still too expensive for terminal applications
- FPGA energy efficiency is still an unknown
- Implementing a complex algorithm is much more difficult on an FPGA than on a DSP

Conclusions
- High-end FPGAs can wallop DSPs on computation-intensive, highly parallelizable tasks
- FPGAs are expensive, but they can beat DSPs in terms of performance per dollar
- DSPs have the advantage in development infrastructure, time-to-market,…
- The “best” architecture depends on the application
- Heterogeneous architectures, e.g., combining DSP and FPGA components, are a key trend

For More Information...
www.BDTI.com
Free Information
- BDTImark2000™ scores
- DSP Insider newsletter
- Pocket Guide to Processors for DSP
- White papers on processor architectures and benchmarking
- Article reprints on DSP-oriented processors and applications
  - EE Times
  - IEEE Spectrum
  - IEEE Computer and others
- comp.dsp FAQ
2001 Edition