Low Power Synchronization for Wireless Communication
by
Marcy Josephine Ammer
B.S. (Massachusetts Institute of Technology) 1997
M. Eng. (Massachusetts Institute of Technology) 1999
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Engineering – Electrical Engineering and Computer Sciences
in the
GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY
Committee in charge:
Professor Jan Rabaey, Chair
Professor Heinrich Meyr
Professor Borivoje Nikolic
Professor Paul Wright
Fall 2004
The dissertation of Marcy Josephine Ammer is approved:
Chair Date
Date
Date
Date
University of California, Berkeley
Fall 2004
Low Power Synchronization for Wireless Communication
Copyright 2004
by
Marcy Josephine Ammer
Abstract
Low Power Synchronization for Wireless Communication
by
Marcy Josephine Ammer
Doctor of Philosophy in Engineering – Electrical Engineering and Computer Sciences
University of California, Berkeley
Professor Jan Rabaey, Chair
Synchronization is increasingly important in wireless communication devices.
Synchronization performance is critical to system performance, and it is where a large amount of
design time, receiver area, and power is spent.
Not only is synchronization important, but its relevance is increasing due to four factors:
1. Decreased transmit distances use lower transmit power and, therefore, receiver power
begins to dominate.
2. The wireless channel is more frequency selective at higher transmission speeds, which
requires increased synchronization functionality.
3. Trends toward higher bandwidth efficiency move modulation to higher order
constellations, where synchronization specifications are tighter.
4. The push for integration moves RF functionality to digital CMOS processes with low
supply voltages, forcing the synchronization system to contend with more front-end
nonidealities.
There are few places where the whole topic of synchronization is covered and fewer still
where the power consumption is considered. This research shows that significant system power
savings can be realized through systematic exploration of synchronization power consumption.
This dissertation sets up a framework for the systematic exploration of power consumption in
synchronization systems, applies this framework to a few representative problems, and uses some
system examples to show the impact of this type of exploration.
At the component level, frequency estimation and interpolation are investigated. It is shown
that frequency estimation power reductions of up to 4x are possible while simultaneously
decreasing convergence time by up to 4x. For interpolation, it is shown that proper parameter
selection can result in a 10x reduction in power consumption.
At the system level, two non-standards-based communication systems are considered. PNII is
a 1.6 Mbps personal area network system for wireless intercom type applications over short
distances (10-30 m). The original system’s frequency and phase estimation blocks are redesigned
using the framework developed here. Simultaneous reductions of 66% in synchronization energy
consumption and 72% in convergence time are achieved. PN3 is a 50 kbps system designed for
use in wireless sensor network applications. A 300 µW synchronization system was designed for
PN3. This is low enough that further reduction has very little impact on system energy
consumption.
Acknowledgements
I would like to thank my advisor, Jan Rabaey, for his grand vision and subtle guidance (except
when otherwise required). If I am half as successful in my career as he has been in his, I will be
fulfilled. I would also like to thank him, in conjunction with Bob Brodersen, for creating the
Berkeley Wireless Research Center. It has been a true gift to be able to earn my Ph.D. in such a
rich environment.
Thanks to Bora Nikolic for setting such high standards in his EE225C class. My final project
in that class was the genesis of this work. Thanks also for being the much needed harsh critic and
for the constant encouragement to do good work.
To Heinrich Meyr for his encouragement and for teaching his seminar on Digital
Communication Receivers where I first got hooked on synchronization. Much of my work grows
out of the fundamentals in his two volumes on communication receivers. Vielen Dank.
Thanks to Paul Wright for adding a different perspective with flair.
Thanks to Tom Knight, my original advisor at MIT, for supporting my move to Berkeley and
for his constant wisdom, both technical and nontechnical.
To my labmates: Mike Sheets, Ian O’Donnell, Dave Sobel, and Johan Vanderhaegen.
Thanks, Mike, for being such a good friend as well as constantly saving me from the tools.
Thanks, Ian, for your endless supply of interesting conversations, your sense of humor, and your
cynical opinions. Thanks, Dave, for your clarity of thought, for always reminding me to be
methodical, and for helping me locate those pesky factors-of-two I always seem to be missing.
Thanks, Johan, for being so incredibly smart and precise. When we no longer work together, I
will sorely miss the safety net your knowledge provides.
To the old Bob-and-Jan group core: Varghese George, Marlene Wan, Vandana Prabhu, and
Jeff Gilbert. George showed me the ropes of Berkeley and all the good coffee shops and Indian
restaurants in town. Marlene is a never-ending source of fun times and the inspiration that it is
possible to “do it all”. Vandana never fails to make you smile and remind you to be carefree.
And, Jeff is the sage. Thanks also to all of you for showing me what life looks like on the other
side of graduation.
A special thanks to Rhett Davis for all his untiring work on the first chip.
To my housemates: Sunny, Carol, Marina, and Sandon. Thanks for letting me be a part of
your lives, and teaching me to squeeze every ounce out of every experience.
To Tony Gray and Olin Shivers. Two great men, whose advice I always recall when things
look bad.
To my family for all the years of love and support. Especially my sisters. I will always
remember finishing my thesis as the time when Erin stopped being my little sister and became my
friend. Marissa, for having the courage to be yourself. I am so proud of your accomplishments.
And Candy, for being the consummate enduring friend.
Finally, I would like to thank Misha. I do not have the words to describe to what extent this is
all not possible without you. ‘Here’s to being speechless and those who make us so.’
Bali, Indonesia
August 28, 2004
Table of Contents
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Background
2.1 Introduction
2.2 Synchronization
2.3 Metrics for Comparing Algorithms
2.4 Wireless Channel Models
3 Evaluation and Exploration Environment
3.1 Introduction
3.2 Simulation and HDL Description of Algorithms
3.3 Gate-level Power Estimation
3.4 Analog to Digital Converter Power Estimation
3.5 System Power Estimation Tool
3.6 Conclusion
4 PNII System
4.1 Introduction
4.2 System Details
4.3 Synchronization System
4.3.1 Timing Recovery
4.3.2 Coarse Timing Estimation
4.3.3 Fine Timing and Frequency Estimation
4.3.4 Frequency Correction and Timing Tracking
4.3.5 Phase Estimation and Correction
4.3.6 Synchronization System Performance
4.4 Results and Conclusion
5 Frequency Estimation
5.1 Introduction
5.2 Frequency Estimation Algorithms
5.3 Power Estimation Methodology
5.4 Algorithm Comparison and Results
5.5 Conclusion
5.6 Postscript: Application to DSSS Systems
6 PNII System Refinement
6.1 Introduction
6.2 Frequency Estimation Refinement
6.3 Frequency and Phase Estimation Redesign
6.3.1 Differential Modulation Penalty
6.3.2 Phase Error vs. SNR Degradation
6.3.3 Feed-Forward Phase Estimation
6.3.4 Frequency and Phase Estimation Redesign
6.3.5 System Results
7 Interpolation
7.1 Introduction
7.2 Interpolation Background
7.3 Farrow Interpolator Exploration
7.4 Achieving the Timing Resolution Specification
7.5 Achieving the Output SNR Specification
7.6 Conclusion
7.7 Postscript: Interpolator Hardware Implementation Specifics
8 PN3 System Design
8.1 Simplification of Synchronization
8.2 Analog vs. Digital Implementation
8.3 Matched Filtering
8.4 Amplitude Estimation
8.5 Timing Estimation
8.6 Digital Synchronization Scheme Summary
8.7 Analog Synchronization Scheme Summary
8.8 Comparison of Synchronization Schemes
8.9 Conclusion and Future Work
8.10 Postscript: Simulation Environment
9 Conclusion and Future Work
A Power Estimation Scripts
A.1 Makefile
A.2 Netlist Script
A.3 Reporting Script
A.4 Testbench
A.5 Simulate Script
A.6 Synthesis Script
List of Figures
Figure 1-1: Area and power of digital synchronization functions as a portion of PHY layer. a,b) Bluetooth c) PNII d) 802.11a
Figure 2-1: Illustration of a typical communication system
Figure 2-2: Realistic communication system includes synchronization system
Figure 2-3: Illustration of timing error
Figure 2-4: Illustration of frequency error
Figure 2-5: Feedforward vs. feedback estimation
Figure 2-6: Synchronization algorithm classification. Highlighted blocks are those addressed in this thesis
Figure 3-1: Flow diagram of tools used in this thesis
Figure 3-2: Example synchronization system in Simulink
Figure 3-3: Estimation accuracy requirements
Figure 3-4: Accurate gate-level power estimation flow
Figure 3-5: Proposed power estimation method comes within 15% of the EP method for a wide range of block sizes
Figure 4-1: PNII system block diagram
Figure 4-2: Flow diagram of the PNII synchronization system
Figure 4-3: Coarse timing block diagram
Figure 4-4: Joint frequency and fine timing estimation
Figure 4-5: Power loss in correlator with frequency offset
Figure 5-1: Meyr and Kay weighted and unweighted performance
Figure 5-2: Meyr weighted D = {1, 2} performance
Figure 5-3: Block diagram of the weighted Kay estimator
Figure 5-4: Block diagram of the weighted Meyr estimator
Figure 5-5: Meyr weighted vs. unweighted comparison
Figure 5-6: Kay weighted vs. unweighted comparison
Figure 5-7: Meyr vs. Kay weighted comparison
Figure 5-8: Meyr weighted D = 1 vs. D = 2 comparison
Figure 5-9: Variance of frequency estimation applied to chips versus symbols for 802.11b-like symbols
Figure 6-1: Convergence time of different frequency estimators
Figure 6-2: BER of coherent and differential QPSK, BPSK
Figure 6-3: QPSK BER with Gaussian and fixed phase errors
Figure 6-4: QPSK BER with Gaussian phase error
Figure 6-5: BER vs. SNR with uniform phase error in the range of [0..lim]
Figure 6-6: Phase estimation variance vs. L for different SNRs
Figure 6-7: System power consumption for different schemes
Figure 7-1: Block diagram of the Farrow interpolator
Figure 7-2: Tap error (dB) vs. (N, M) and W_T
Figure 7-3: Interpolation performance for timing resolution of 1/16
Figure 7-4: Interpolation performance for timing resolution of 1/64
Figure 7-5: Interpolation performance for timing resolution of 1/1024
Figure 7-6: Interpolator performance for Wµ = 2
Figure 7-7: Interpolator performance for Wµ = 4
Figure 7-8: Interpolator performance for Wµ = 8
Figure 8-1: Digital (a) and analog (b) synchronization header structure
Figure 8-2: Performance breakdown of the digital synchronization scheme
Figure 8-3: Energy-per-useful-bit vs. packet length of analog and digital schemes
Figure 8-4: Energy savings of 0-bit and 9-bit headers vs. 18-bit headers
Figure 8-5: Digital algorithm high level simulation and digital synchronization block
Figure 8-6: Simulation results of the digital synchronization system timing correlator
List of Tables
Table 2-1: SNR degradation due to carrier phase and timing errors for PSK and QAM modulation
Table 2-2: Average path loss parameters for an indoor office environment at 2 GHz [ITU]
Table 2-3: R.m.s. delay spread for 2 GHz indoor office environment [ITU]
Table 2-4: JTC indoor office environment channel models [JTC]
Table 4-1: Implementation losses in PNII synchronization and detection
Table 4-2: BBP statistics
Table 4-3: Physical layer receiver power consumption
Table 6-1: New and old frequency estimation methods
Table 6-2: Frequency/phase estimation methods to be considered
Table 6-3: Comparison of different Frequency/Phase Estimation Schemes
Table 6-4: Parameters used in system exploration
Table 7-1: MMSE coefficients for λ = 4, (N, M) = (2, 2)
Table 7-2: Vo coefficients for λ = 4, (N, M) = (2, 2)
Table 8-1: Summary of synchronization requirements for the PN3 system
Table 8-2: Target synchronization implementation losses
1
Introduction
“Assuming perfect synchronization…”
- Arbitrary Communication Text
Synchronization is an increasingly important component in a wireless communication device.
Synchronization performance is critical to system performance, and it is where a large amount of
design time, receiver area, and power is spent. There are few places where the whole topic of
synchronization is covered. In fact, in most texts on digital communication, the topic of
synchronization is examined very briefly if at all. Further, very few sources examine the
implementation costs of synchronization, especially the power consumption. This research shows
that significant system power savings can be realized through systematic exploration of
synchronization power consumption.
Synchronization is a significant component of wireless communication devices. Figure 1-1
highlights the area and power attributed to digital synchronization functions in three commercial
radio chips ((a) [KOK], (b) [CHA], and (d) [THO]) and one academic radio from this work (c).
Synchronization can consume up to 45% of the physical layer area.
Figure 1-1: Area and power of digital synchronization functions as a portion of PHY layer.
a,b) Bluetooth c) PNII d) 802.11a
The synchronization system typically has the highest clock rates and duty cycles of all digital
blocks. This, coupled with the large area, indicates that power consumption of synchronization
blocks is a significant component of physical layer power. Indeed, Chapter 4 shows that,
despite efforts to reduce power, the synchronization system still consumed 18% of the physical
layer power.
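The reasoning above follows the usual first-order model of CMOS dynamic power, P ≈ α·C·Vdd²·f, scaled by the fraction of time a block is active. The sketch below is only an illustration of that trend: the function name and every numeric value are hypothetical and are not measurements from this work.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, clock_hz, duty_cycle):
    """First-order CMOS dynamic power estimate in watts.

    P = activity * C * Vdd^2 * f, scaled by the fraction of time the block runs.
    Switched capacitance C grows roughly with block area.
    """
    return activity * capacitance_f * vdd_v ** 2 * clock_hz * duty_cycle


# Hypothetical numbers, chosen only to illustrate the trend.
sync_block = dynamic_power_w(0.1, 200e-12, 1.0, 40e6, 1.0)     # fast clock, always on
other_block = dynamic_power_w(0.1, 200e-12, 1.0, 10e6, 0.25)   # same size, slower, mostly idle

# With equal area, the power ratio is the product of the clock-rate
# ratio (4) and the duty-cycle ratio (4).
ratio = sync_block / other_block
print(f"sync: {sync_block * 1e6:.0f} uW, other: {other_block * 1e6:.0f} uW, ratio: {ratio:.0f}x")
```

Under these assumed values, a block that is both clocked faster and active more often dominates a block of equal area, which is the intuition behind treating synchronization as a primary target for power reduction.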
Not only is synchronization important, but its relevance is increasing due to four factors:
1) Decreased transmit distances use lower transmit power and, therefore, receiver power
begins to dominate.
2) The wireless channel is more frequency selective at higher transmission speeds, which
requires increased synchronization functionality.
3) Trends toward higher bandwidth efficiency move modulation to higher order
constellations, where synchronization specifications are tighter.
4) The push for integration moves RF functionality to digital CMOS processes with low
supply voltages, forcing the synchronization system to contend with more front-end
nonidealities.
There are few authoritative sources where the whole topic of synchronization is addressed as a
cohesive unit. The seminal volumes by Meyr [MEY] are a noted exception. Further, very little
work on synchronization systematically considers implementation issues (again, the Meyr
volumes are an exception). Often existing research stops at complexity bound approximations.
This is to be expected. Development of synchronization algorithms requires a deep
understanding of communication and estimation theory. It is rare that someone with these skills
also has a deep understanding of circuit implementation technologies.
Most important of all, nowhere to the author’s knowledge is the power consumption of
different algorithms systematically compared. However, power consumption is one of the most
critical factors in the design of untethered wireless devices. Most notably, in the emerging field
of wireless sensor networks, power consumption is the most important factor [RAB2].
The topic of synchronization power consumption is too large to be solved in one dissertation.
Rather, this dissertation sets up a framework for the systematic exploration of power consumption
in synchronization systems, applies this framework to a few representative problems, and uses
some system examples to show the impact of this type of exploration. The two wireless
communication systems considered here are non-standards-based systems called PNII and PN3.
PNII is a 1.6 Mbps personal area network system designed to carry voice over short distances
(10-30 m) for wireless intercom type applications [AMM]. PN3 is a 50 kbps system designed for
use in wireless sensor network applications [SHE].
While the main focus of this work is on power consumption, it is not the only significant
metric. Circuit area, convergence time, and component cost are also important. Indeed, these
metrics are not orthogonal; often they are intricately linked. Therefore, it would be simplistic to
consider power consumption in isolation from the other criteria. Certainly, the framework
developed in this thesis is applicable to these other metrics. Wherever power consumption is
considered in this work, the effect on other metrics is noted. Although gains in one
metric must sometimes be traded for losses in another, sometimes both can be improved simultaneously.
Specific contributions of this thesis are:
• Definition of a framework for the systematic exploration of power consumption
in synchronization systems.
• Development of a fast and accurate method for power estimation that is within
15% of the best available method and over 50 times faster. This is an
enabling step in being able to systematically characterize synchronization power
consumption over a meaningfully sized parameter space.
• A systematic exploration of feedforward data-aided frequency estimation
algorithms that resulted in the development of straightforward rules for which
algorithm to choose for a given system specification. Simultaneous reductions in
energy consumption and convergence time of more than a factor of 4 are possible
in some scenarios.
• A systematic exploration of the Farrow-style interpolating filter, which is critical
to the future systematic exploration of most timing recovery algorithms. Joint
optimization of interpolation and ADC power consumption illustrates the ability
of this framework to lower system power consumption, not just block power
consumption.
• Application of the frequency estimation exploration results to reduce the energy
consumption of the frequency estimation unit of the PNII system by 84% and the
convergence time by 50%.
• Systematic comparison of four different phase and frequency synchronization
methods for the PNII system, including differential versus coherent
modulation schemes. Synchronization energy consumption was reduced by 66%,
resulting in a system energy consumption reduction of 7% for coherent schemes,
and it was determined at what packet lengths it makes sense to move to
differential modulation. This is the first instance, to the author’s knowledge, in
which differential versus coherent modulation was systematically evaluated from
a system power consumption standpoint.
• Using the framework developed here, a synchronization system was designed for
a wireless sensor network application that consumes 300 µW (including ADC
power). This is low enough that further reduction has very little impact on
total system energy consumption.
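The relation between the block-level and system-level savings quoted in the list above can be checked with simple arithmetic. If synchronization accounts for a fraction s of system energy and is reduced by a fraction r, the system saving is s·r. The sketch below assumes this simple additive energy model; the helper names are hypothetical, and the implied share is inferred from the 66% and 7% figures rather than stated in the text.

```python
def system_saving(block_share, block_reduction):
    """Fraction of system energy saved when a block that holds `block_share`
    of system energy is reduced by `block_reduction` (both in the range 0..1)."""
    return block_share * block_reduction


def implied_share(system_saving_frac, block_reduction):
    """Invert the model: the share of system energy the block must have held."""
    return system_saving_frac / block_reduction


# Figures quoted for the PNII coherent schemes: 66% block saving, 7% system saving.
share_before = implied_share(0.07, 0.66)     # about 0.106 of system energy
check = system_saving(share_before, 0.66)    # recovers the 7% system saving
share_left = share_before * (1.0 - 0.66)     # energy still spent on synchronization,
                                             # as a fraction of the original system energy

print(f"implied share before: {share_before:.3f}, remaining: {share_left:.3f}")
```

The same model shows why a sufficiently small synchronization budget stops mattering: once the remaining share is a few percent, even eliminating it entirely can save no more than that share of system energy.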
While full characterization of the synchronization space is not completed in this work,
following it through to completion is a worthwhile goal. Results of this work show that this type
of exploration has meaningful impact on system performance. Completion of this research will
have a few fundamental ramifications. First, it will instruct proper algorithm selection for given
synchronization parameters. Second, it will illustrate which synchronization parameters are the
most difficult to estimate in terms of power consumption or convergence time. Third, it will
highlight areas where existing algorithms are inefficient. These answers most likely change in
different channel environments and over different modulation schemes and data rates. This
information can highlight promising areas for new algorithm development. It can assist in
producing the most efficient implementation for existing wireless communication standards. And
finally, it can assist in the creation of new wireless communication standards to meet the quality
of service goals with the lowest power or lowest synchronization overhead.
The remainder of this dissertation is organized as follows: Chapter 2 details the background information
necessary to understand this work including defining a classification of synchronization
algorithms, and the metrics on which synchronization algorithms are evaluated. Chapter 3
describes the tools used for system simulation and analysis, implementation, and power
consumption estimation. Chapter 4 describes the PNII system, shown in Figure 1-1c, to motivate
the necessity of this research and to provide a system example for illustrating the improvements
possible with this research. Chapters 5 and 7 delve into systematic exploration of the power
consumption of specific classes of synchronization algorithms (frequency estimation and timing
recovery, respectively). These serve as examples for how algorithm exploration should be
conducted, and what information is necessary for these results to be used in a system design.
Chapters 6 and 8 move back up to the system level to apply the techniques developed here to show
the significance of the results at the system level. First, in Chapter 6, the results of Chapter 5 are
used to improve the PNII system described in Chapter 4. Second, in Chapter 8, the framework is
applied to a wireless sensor network system (where power is the primary concern) to reduce the
power consumption of the synchronization system and show the impact on the system power
consumption.
2
Background
2.1 Introduction
This chapter details all the background information required to understand this thesis. For
some topics the reader is referred to canonical references. Topics are more fully described here
when canonical sources don’t exist, the information is used in a unique way for this work, or the
information is deemed too fundamental to this work to be omitted.
It is assumed that the reader is familiar with basic digital communication theory to the extent
described in the text by Proakis [PRO]. In particular, familiarity with the standard modulation
schemes such as OOK, M-PSK, M-QAM, and DSSS is required. The concept of theoretical
bounds on BER versus SNR for different modulation schemes is assumed known; however,
specific bounds are reiterated when used. The reader is expected to be familiar with the use of
transmit filters such as the root-raised-cosine (RRC). Basic channel concepts such as multipath,
frequency-selective vs. frequency-flat fading, and the basic techniques used to combat these
effects, such as AGC and equalizers, are assumed. Familiarity with the basic network protocol
stack (especially physical, data-link, network, and application layers), including basics of media
access control (MAC), is also helpful [ISO].
It is assumed that the reader is familiar with basic low power digital design principles to the
extent described in [RAB]. While no esoteric low power circuit techniques are used in this thesis,
such techniques can be applied orthogonally to these algorithms for further power reduction. It is
assumed that designers make use of the standard low power techniques available as built-in
functionality of industry-standard tools, specifically, parallelization, optimizing out fixed
parameters from logic, gated clocks, low-leakage standard cell libraries, and using the lowest
supply voltage required for correct circuit operation.
The remainder of this chapter sets out to describe three other pieces of background
information. First, synchronization is described within the context used in this thesis. Second,
the metrics for comparing different synchronization algorithms are discussed. Last, the indoor
wireless channel model used for many examples throughout the thesis is defined.
2.2 Synchronization
A canonical communication system (Figure 2-1) typically considers the source and channel
coders (classified as outer receiver functionality), and some channel that perturbs symbols.
However, a realistic communication system (Figure 2-2) also considers what is called the inner
receiver, consisting of the modulator, a waveform channel (one that perturbs transmitted
waveforms, not the more simplistic one that just perturbs symbols), and the synchronization
system in the receiver.
Figure 2-1: Illustration of a typical communication system
Figure 2-2: Realistic communication system includes synchronization system
The four salient synchronization parameters are timing ($\theta_\varepsilon$), phase ($\theta_\phi$), frequency ($\theta_\Omega$), and
amplitude ($\theta_A$) (some of which may include multipath effects). Timing errors occur because of
the small mismatches in the transmitter and receiver oscillators and from the unknown time of
flight between transmitter and receiver (Figure 2-3).
Figure 2-3: Illustration of timing error
Phase errors occur because of mismatches in the transmitter and receiver carrier references
and from the unknown time of flight between the transmitter and receiver. In multipath channels,
each multipath arrival has a different time of flight, and therefore a different phase error to be
estimated.
Amplitude errors arise mostly from attenuation in the channel, but mismatches in the
transmitter and receiver front-end gain stages also contribute. As with phase errors, in
multipath channels, each multipath arrival takes a different path through the channel and therefore
has a different attenuation.
Frequency errors, more correctly termed carrier frequency errors, are caused by a frequency
mismatch in the transmitter and receiver carrier references (Figure 2-4). Frequency errors show
up as a rotating phase error in the received signal. While it is possible to lump frequency errors
in with phase errors, most systems correct for frequency separately from phase, and therefore
frequency is classified as a separate synchronization parameter.
Figure 2-4: Illustration of frequency error
With all four parameters, there is a notion of the rate of change being either slowly varying or
static. What is important is whether the parameter varies enough to matter over the observation
interval. If not, it can be treated as static for the purposes of synchronization. While static
parameters can be estimated once and that estimate used for the interval over which the parameter
is deemed to be static, varying parameters need to be either continually re-estimated or
continuously tracked. Of course, re-estimating or tracking parameters costs more power and area
(and potentially more synchronization preamble bits) than estimating static parameters once.
Sometimes system design can be used to reduce the number of varying parameters, and therefore
the power consumption of the synchronization system. One instance of this is using clock
references with tighter specifications so that the variation over, say, one packet is negligible. This is
where the channel model (including the variation of clock references and front-end components)
is critical for specifying the required functionality of the synchronization system.
Estimation algorithms can be classified along two axes: the
configuration of the estimator and parameter adjustment blocks, and what additional information
is used to achieve the estimation. There are two configurations for the estimation and parameter
adjustment blocks: feedforward (FF) and feedback (FB) (Figure 2-5). In FF systems, the
estimator receives the input signal and computes the parameter estimate, which is fed to the
parameter adjustment block. In FB systems, the estimator receives the output of the parameter
adjustment block and computes an error, which is fed back to the parameter adjustment block.
Figure 2-5: Feedforward vs. feedback estimation
There are three categories of what additional information is used to achieve the estimation:
non-data-aided (NDA), data-aided (DA), and decision-directed (DD). When no additional
information other than the input signal is used, the estimation is termed non-data-aided. When
known data symbols are sent (such as within a synchronization header, or pilot symbols
interspersed with the data), and these known data symbols are used to help the estimation, it is
called data-aided estimation. When no known data is sent, but detected symbols are used in
place of known data symbols, the estimation is called decision-directed. Non-data-aided and
data-aided estimation can be performed in a feedback or feedforward configuration; however,
since detected symbols can only be known after parameter adjustment has been made,
decision-directed estimation can only be performed in a feedback configuration.
Altogether, there are 20 different algorithm classifications (4 parameters x 5 estimation
types). Each classification can contain tens of algorithms that have been proposed in the
literature, in addition to any new algorithms that are developed in the future. This thesis addresses
8 of these classifications in varying degrees (Figure 2-6). First, Chapter 5 performs a complete
exploration of 4 different feedforward data-aided frequency estimation algorithms. The results
of this exploration are twofold. First, it is determined which among these four algorithms
achieves the lowest power for a given input SNR and variance requirement. Second, absolute
numbers for power consumption and convergence time are determined, which allow these
algorithms to be evaluated in a system-level framework. Chapter 5 serves as a model for how
these comparisons should be conducted and the results that are needed to allow a system-level
designer to make use of the information.
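The count of 20 classifications can be checked by enumerating them. The following Python sketch is illustrative only: the names `PARAMETERS`, `ESTIMATION_TYPES`, and `classifications` are hypothetical and are not part of the tool flow used in this thesis. It encodes the restriction that decision-directed estimation exists only in a feedback configuration.

```python
from itertools import product

# The four synchronization parameters named in this section.
PARAMETERS = ["timing", "phase", "frequency", "amplitude"]

# (configuration, side information) pairs. Decision-directed (DD) estimation
# needs detected symbols, so it appears only with the feedback configuration.
ESTIMATION_TYPES = [
    ("FF", "NDA"),
    ("FF", "DA"),
    ("FB", "NDA"),
    ("FB", "DA"),
    ("FB", "DD"),
]


def classifications():
    """Return every (parameter, configuration, side-information) triple."""
    return [(p, cfg, aid) for p, (cfg, aid) in product(PARAMETERS, ESTIMATION_TYPES)]


if __name__ == "__main__":
    classes = classifications()
    print(len(classes))  # 20
    for entry in classes:
        print(entry)
```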
The component exploration in Chapter 5 is continued in Chapter 7. A major component of
most timing recovery algorithms of any type is a timing interpolator to perform the parameter
adjustment. Therefore, a study of timing recovery algorithms relies on accurate power
consumption estimates of interpolators of various sizes and performance. Chapter 7 performs a
thorough study of the commonly used Farrow type of interpolator over a wide range of
parameters. The results of this work can be used to conduct the study of timing recovery
algorithms of all types.
The other three chapters explore entire synchronization systems rather than just a single block.
Within these chapters, several types of synchronization algorithms are used. In Chapter 4, timing
estimation is performed in two steps. The coarse estimation is done with a feedforward data-aided
algorithm. The fine timing estimation is done jointly with the frequency estimation and uses a
different feedforward data-aided algorithm. Timing tracking is done with a non-data-aided
feedforward algorithm. Phase acquisition is performed using a data-aided feedback algorithm, and
phase tracking is done with a feedback decision-directed algorithm. In Chapter 6, different
frequency and phase estimation methods are compared. In addition to the method used in
Chapter 4, a feedforward non-data-aided phase estimation method is explored for initial
estimation and tracking. In Chapter 8, a feedforward data-aided timing recovery method is
compared to a feedback data-aided method. In addition, a feedforward non-data-aided
algorithm is used for amplitude estimation.
Figure 2-6: Synchronization algorithm classification. Highlighted blocks are those
addressed in this thesis.
2.3 Metrics for Comparing Algorithms
Systems in this thesis are compared on a cost vs. performance basis. For synchronization
algorithms, cost is a multifaceted metric. Three interrelated components usually are considered:
power consumption, area, and component cost. Area and component cost are usually inextricably
tied because each square millimeter of silicon area costs more money. However, area also
determines how small the package, and potentially the ultimate system, can be. Component
cost also includes the cost of external components, such as off-chip filters and crystal oscillators
(whose cost scales with required accuracy). Power consumption affects size and component cost
through the size of the battery, or through cooling mechanisms to dissipate the generated heat. Power
consumption also affects quality of service, in that the device may need the batteries recharged
more often.
Variance and convergence time are the main metrics used to measure the performance of a
synchronization algorithm. Specifically, the variance measured is that of the parameter estimate
produced (assuming the estimation is unbiased). If the estimation is biased, MMSE may be a
more appropriate metric. Convergence time is the number of symbols required to achieve that
variance. Bounds, called the Cramer-Rao bounds (CRB), are available to determine what
variance is theoretically possible for different synchronization parameters given the input SNR
and convergence time. The actual Cramer-Rao bounds, especially for timing estimation, depend
on the actual received waveform, so are dependent on modulation rate among other things, and
can be difficult to calculate exactly. Approximations are available, called modified Cramer-Rao
bounds (MCRB), given in [DAN] for phase and frequency.
$$\mathrm{MCRB}(\phi) = \frac{1}{2N\,(E_s/N_0)} \qquad (2\text{-}1)$$
$$\mathrm{MCRB}(\Omega) = \frac{6}{N(N^2-1)\,(E_s/N_0)} \qquad (2\text{-}2)$$
where N is the estimation length or convergence time, and $E_s/N_0$ is the signal to noise ratio for
symbols.
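As a quick numerical illustration of the two modified bounds above, the following Python sketch evaluates them for a given estimation length and SNR. The function names are hypothetical, and $E_s/N_0$ is assumed to be supplied in dB. The units of the returned variances follow the parameter normalization used in [DAN], which is not restated here.

```python
def _db_to_linear(value_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)


def mcrb_phase(n_symbols, es_n0_db):
    """Modified Cramer-Rao bound on phase variance: 1 / (2 N Es/N0)."""
    return 1.0 / (2.0 * n_symbols * _db_to_linear(es_n0_db))


def mcrb_frequency(n_symbols, es_n0_db):
    """Modified Cramer-Rao bound on frequency variance: 6 / (N (N^2 - 1) Es/N0)."""
    if n_symbols < 2:
        raise ValueError("the frequency bound needs at least two symbols")
    return 6.0 / (n_symbols * (n_symbols ** 2 - 1) * _db_to_linear(es_n0_db))


if __name__ == "__main__":
    # Illustrative sweep: the frequency bound falls roughly as 1/N^3,
    # the phase bound only as 1/N.
    for n in (8, 16, 32, 64):
        print(n, mcrb_phase(n, 10.0), mcrb_frequency(n, 10.0))
```

The sweep makes the trade-off visible: doubling the estimation length halves the phase bound but reduces the frequency bound by roughly a factor of eight.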
Tighter bounds are given in [TAV] for MPSK signals, but are more difficult to compute.
There are algorithms for phase and frequency estimation that are known to achieve the CRB at
high SNR. The CRB for timing is given in Meyr [MEY] under some realistic simplifying
assumptions: 1) independent noise samples, 2) signal pulse shape, g(t), is real, and 3) random
data.
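$$\mathrm{CRB}(\varepsilon) = \frac{1}{2N\,(E_s/N_0)} \cdot \frac{\int_{-\infty}^{+\infty} |G(\omega)|^2 \, d\omega}{T^2 \int_{-\infty}^{+\infty} \omega^2 |G(\omega)|^2 \, d\omega} \qquad (2\text{-}3)$$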
Of course, the SNR gives a direct measure of the amplitude variance for one symbol. Therefore,
$$\mathrm{CRB}(A) = \frac{1}{N\,(E_s/N_0)}.$$
Next, the block-level metrics of variance and convergence time are translated to system-level
metrics. Convergence time is the easiest, since the convergence time for all synchronization
blocks can be summed to get the total convergence time (assuming no synchronization blocks
operate in parallel). However, translating the different variances for each synchronization
parameter into a global system specification is more difficult.
The official goal of the inner receiver system, as defined by Meyr [MEY], is to produce output
Y such that the outer receiver performance is as close as possible to the case where the estimated
values are equal to the actual values, i.e.
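$$\{\hat{\theta}_\varepsilon, \hat{\theta}_\phi, \hat{\theta}_\Omega, \hat{\theta}_A\} = \{\theta_\varepsilon, \theta_\phi, \theta_\Omega, \theta_A\}. \qquad (2\text{-}4)$$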
This combined effect cannot be evaluated until the entire system is designed and simulated
together because it includes interactions between the synchronization parameters and the coding
used in the outer receiver. For this reason, it is impossible to partition this goal separately amongst
the different synchronization blocks. Instead, the SNR margin metric is used in practice. Typically,
a communication system will specify a data rate and uncoded BER requirement. The input SNR
to the inner receiver will contain some margin over the theoretical SNR required to achieve this
BER. This SNR margin is typically how synchronization systems are specified and evaluated.
The total SNR margin is usually divided amongst the synchronization blocks using designer
experience to get an initial partitioning, and iterating once preliminary design of the different
synchronization blocks is completed. This ad hoc process is not guaranteed to achieve the
optimal system design, but it is the method available using current information. The results of a
complete exploration of the synchronization space would allow this process to be deterministic
and achieve the optimal design. However, such an exploration is currently prohibitive for any
practical system.
Formulas that compute the SNR degradation versus the variance of different synchronization
algorithms are used to convert between the metric of the algorithm (variance) and the metric for
the system (SNR loss). The BER degradation due to amplitude is easy to calculate since it can be
directly tied to SNR. The BER degradation for timing and phase errors is more difficult and is
treated in [MEY] for M-PSK, M-PAM, and $M^2$-QAM modulation. [MEY] gives approximations
to the degradation, D (measured in dB), defined as the increase in $E_s/N_0$ required to maintain the
same BER as the receiver without synchronization errors. The approximation,
$$D = \frac{10}{\ln 10}\left[A + 2B\,\frac{E_s}{N_0}\right]\mathrm{var}(\psi) \quad \text{(dB)}, \qquad (2\text{-}5)$$
is officially valid for BER degradations < 0.2 dB (but is pretty accurate in most scenarios for D < 1
dB). Table 2-1 gives the parameters A and B for degradation due to carrier phase and timing
errors for M-PSK, M-PAM, and $M^2$-QAM.
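Table 2-1: SNR degradation due to carrier phase and timing errors for PSK and QAM
modulation
Modulation | Carrier phase errors, A | Carrier phase errors, B | Timing errors, A | Timing errors, B
M-PSK | 1 | $\cos^2(\pi/M)$ | $-h''(0)\,T^2$ | $\sum_m \left(h'(mT)\,T\right)^2$ for $M = 2$; $\tfrac{1}{2}\sum_m \left(h'(mT)\,T\right)^2$ for $M > 2$
M-PAM | 1 | 0 | $-h''(0)\,T^2$ | $\sum_m \left(h'(mT)\,T\right)^2$
$M^2$-QAM | 1 | ½ | $-h''(0)\,T^2$ | $\tfrac{1}{2}\sum_m \left(h'(mT)\,T\right)^2$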
The quantity, A, accounts for a reduction in the useful signal. The quantity, B, accounts for an
increase in the variance at the input of the decision device. Observe that the degradation due to
timing errors is dependent on the transmit pulse shape, h(t).
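The degradation approximation and the carrier-phase columns of the table above can be exercised numerically. The Python sketch below is a minimal illustration, not the evaluation flow used in this thesis: the function names are hypothetical, $E_s/N_0$ is assumed to be given in dB, and the phase-error variance is assumed to be in rad². Timing errors are omitted because their A and B values depend on the pulse shape h(t).

```python
import math


def phase_error_params(modulation, m=None):
    """Return (A, B) for carrier phase errors, per the table above.

    `modulation` is one of "PSK", "PAM", "QAM"; `m` is the PSK order.
    """
    if modulation == "PSK":
        if m is None:
            raise ValueError("PSK needs the constellation order m")
        return 1.0, math.cos(math.pi / m) ** 2
    if modulation == "PAM":
        return 1.0, 0.0
    if modulation == "QAM":
        return 1.0, 0.5
    raise ValueError("unknown modulation: " + str(modulation))


def snr_degradation_db(a, b, es_n0_db, var_psi):
    """Approximate degradation D = (10/ln 10) [A + 2 B Es/N0] var(psi), in dB."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    return 10.0 / math.log(10.0) * (a + 2.0 * b * es_n0) * var_psi


if __name__ == "__main__":
    var_phase = 1e-3  # assumed phase-error variance in rad^2
    for name, order in (("PSK", 2), ("PSK", 4), ("PSK", 8), ("QAM", None)):
        a, b = phase_error_params(name, order)
        print(name, order, snr_degradation_db(a, b, 10.0, var_phase))
```

Because B multiplies $E_s/N_0$, the same estimator variance costs more SNR margin at higher operating SNR and for denser PSK constellations. The approximation should only be trusted for small D, as noted above.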
2.4 Wireless Channel Models
The simplest channel is the additive white Gaussian noise (AWGN) channel where noise with
a Gaussian distribution of zero mean and variance $\sigma^2$ is added to symbols in the channel. This
channel is often used when exploring outer receiver functionality. To explore inner receiver
functionality, a more complicated channel must be considered. This channel model must include
the effects of the transmitter and receiver front-ends in addition to the effects of the channel (over
the air).
Effects of the transmitter and receiver local oscillators and carrier references can be modeled
in a straightforward manner using just one offset that is the sum of the errors in both the
transmitter and receiver. To model timing offset in simulation, an interpolation filter can be used.
To model carrier frequency offset, the modulated waveform is multiplied by a rotating phasor.
The clock accuracy (specified in parts per million or ppm) is an important parameter because
it determines how often the timing needs to be re-estimated. If the required timing estimation
resolution is εT, where T is the symbol period, clocks can drift up to ½ εT over the course of the
packet before needing to be re-estimated. If the crystal accuracy in ppm is lower than 1e6*ε/(2N),
where N is the maximum number of symbols in a packet and ε is the fractional timing resolution
requirement, then no timing tracking is needed. A similar equation can be used for determining
whether the frequency estimation is essentially static over the course of the packet.
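The timing-tracking criterion can be written as a small check. In the Python sketch below, the function names and the example numbers (a resolution of 1/16 of a symbol and a 1000-symbol packet) are illustrative assumptions. The clock error passed in is taken to be the combined transmitter-plus-receiver offset described above.

```python
def max_clock_error_ppm(epsilon, n_symbols):
    """Largest combined clock error (ppm) for which timing can be treated as static.

    Drift over the packet is ppm * 1e-6 * N * T and must stay below epsilon * T / 2.
    """
    return 1e6 * epsilon / (2.0 * n_symbols)


def needs_timing_tracking(clock_ppm, epsilon, n_symbols):
    """True if the clock error is not lower than the static-timing limit."""
    return clock_ppm >= max_clock_error_ppm(epsilon, n_symbols)


if __name__ == "__main__":
    limit = max_clock_error_ppm(epsilon=1.0 / 16.0, n_symbols=1000)
    print(limit)  # 31.25
    print(needs_timing_tracking(20.0, 1.0 / 16.0, 1000))  # False
    print(needs_timing_tracking(40.0, 1.0 / 16.0, 1000))  # True
```

Under these assumed numbers, a 20 ppm combined offset would allow a single timing estimate per packet, while 40 ppm would require tracking or shorter packets.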
Amplitude and phase models are more complicated since they depend on a combination of
factors in the wireless environment. This work uses the approach given in [ITU], reproducing
here the general channel modeling equations. However, in the interest of brevity, only the actual
coefficients for a 2 GHz indoor office channel are given because that is the one used in this thesis
wherever a channel model is required.
Path loss effects are divided into two effects: average path loss, and associated shadow fading
statistics. Average path loss is that loss that is common to all multipath arrivals and is given by
$$L_{total} = 20\log_{10} f + P\log_{10} d + L_f(n) - 28 \ \text{dB} \qquad (2\text{-}6)$$
where P is the distance power loss coefficient, f is the frequency (in MHz), d is the separation
distance in meters between the two terminals (d > 1 m), $L_f$ is the floor penetration loss factor in
decibels, and n is the number of floors in a multistory building between the two terminals ($L_f$ is only
included when $n \ge 1$). Table 2-2 outlines the parameter values used for the indoor office
environment at 2 GHz.
Table 2-2: Average path loss parameters for an indoor office environment at 2 GHz [ITU]
Parameter | Value
P | 30
f | 2,000 MHz
d | 1-100 m
$L_f$ | 15 + 4(n - 1)
Paths with a line of sight (LOS) component are dominated by free-space loss and have P = 20.
The indoor shadow fading statistics are log-normal with a standard deviation of 10 dB for our
channel.
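The average path loss model with the indoor office parameters can be evaluated as follows. This Python sketch is illustrative: the function names are hypothetical, the defaults are the office values listed above (P = 30, f = 2000 MHz), the floor loss is taken as 15 + 4(n - 1) dB for n ≥ 1, and shadow fading is not included.

```python
import math


def floor_loss_db(n_floors):
    """Floor penetration loss L_f in dB; included only when n >= 1."""
    if n_floors < 1:
        return 0.0
    return 15.0 + 4.0 * (n_floors - 1)


def average_path_loss_db(d_m, f_mhz=2000.0, p=30.0, n_floors=0):
    """Average path loss: 20 log10(f) + P log10(d) + L_f(n) - 28, in dB."""
    if d_m <= 1.0:
        raise ValueError("the model is stated for d > 1 m")
    return (20.0 * math.log10(f_mhz)
            + p * math.log10(d_m)
            + floor_loss_db(n_floors)
            - 28.0)


if __name__ == "__main__":
    for distance in (2.0, 10.0, 50.0, 100.0):
        same_floor = average_path_loss_db(distance)
        one_floor = average_path_loss_db(distance, n_floors=1)
        print(distance, round(same_floor, 1), round(one_floor, 1))
```

Passing `p=20.0` gives the line-of-sight case mentioned above.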
The radio propagation channel varies in time and with spatial displacement. Even in the static
case where the transmitter and receiver locations are fixed, the channel can be dynamic since
scatterers and reflectors are likely to be in motion. The term multipath arises from the fact that,
through reflection, diffraction, and scattering, radio waves can travel from a transmitter to a
receiver by multiple paths. There is a time delay associated with each of these paths that is
proportional to the path length. Each delayed signal has an associated amplitude (with real and
imaginary parts) and together they form a linear filter with time-varying characteristics. Since the
radio channel is linear, it is fully described by its impulse response. The impulse response is
usually represented as a power density that is a function of excess delay, relative to the first
detectable signal.
Although the r.m.s. delay spread is very widely used, it is not always a sufficient
characterization of the delay profile. However, if an exponentially decaying profile can be
assumed, it is sufficient to express the r.m.s. delay spread instead of the power delay profile. In
this case, the impulse response can be reconstructed approximately as:
$$h(t) = \begin{cases} e^{-t/\tau_{r.m.s.}} & \text{for } 0 \le t \le t_{max} \\ 0 & \text{otherwise} \end{cases} \qquad (2\text{-}7)$$
where $\tau_{r.m.s.}$ is the r.m.s. delay spread and $t_{max}$ is the maximum delay ($t_{max} \gg \tau_{r.m.s.}$). Table 2-3
outlines the r.m.s. delay spreads used for the example channel. Within a given building, the delay
spread tends to increase as the distance between antennas increases.
Table 2-3: R.m.s. delay spread for 2 GHz indoor office environment [ITU]
 | Low value, appearing frequently | Median value, appearing frequently | High value, appearing rarely
$\tau_{r.m.s.}$ | 35 ns | 100 ns | 460 ns
One way to model the statistical nature of the channel is to replace the many scattered paths
that may exist in a real channel with only a few (N) multipath components in the model. With this
method, a complex Gaussian time-variant process $g_n(t)$ models the superposition of unresolved
multipath components arriving from different angles with different delays close to the delay $\tau_n$ of
the nth multipath component. Then, the impulse response h(t) is given by:
$$h(t) = \sum_{n=1}^{N} p_n\, g_n(t)\, \delta(t - \tau_n), \qquad (2\text{-}8)$$
where $p_n$ is the received power of the nth model multipath component.
The JTC channel models [JTC] give three different instantiations of the channel for
simulations of indoor office environments. Channel A is the least severe, Channel B is
intermediate, and Channel C is extremely severe. The coefficients for the model are given in
Table 2-4. Note that the indoor channel models use a flat Doppler spectrum, whereas models for an
outdoor channel usually use the Jakes Doppler Spectrum [DEN] to determine the correlation in
channel coefficients over time.
Table 2-4: JTC indoor office environment channel models [JTC]
Tap | Channel A: Relative Delay (ns) | Channel A: Average Power (dB) | Channel B: Relative Delay (ns) | Channel B: Average Power (dB) | Channel C: Relative Delay (ns) | Channel C: Average Power (dB) | Doppler Spectrum
1 | 0 | 0 | 0 | 0 | 0 | 0 | Flat
2 | 50 | -3.6 | 50 | -1.6 | 100 | -0.9 | Flat
3 | 100 | -7.2 | 150 | -4.7 | 150 | -1.4 | Flat
4 | | | 325 | -10.1 | 500 | -2.6 | Flat
5 | | | 550 | -17.1 | 550 | -5.0 | Flat
6 | | | 700 | -21.7 | 1125 | -1.2 | Flat
7 | | | | | 1650 | -10.0 | Flat
8 | | | | | 2375 | -21.7 | Flat
$\tau_{r.m.s.}$ | 43 | | 116 | | 598 | |
The Doppler spectrum (whether Jakes or flat) is defined by the maximum Doppler frequency
shift in the channel given by:
$$f_{Dmax} = v \cdot f_c / c \qquad (2\text{-}9)$$
where $f_c$ is the carrier frequency, c is the speed of light, and v is the maximum speed of objects in
the channel (whether the transmitter, receiver, or scattering or reflecting elements in the channel).
For the 2 GHz indoor channel, 10 Hz is a common value for $f_{Dmax}$ (translating to a speed of
around 6 km/hr).
The maximum Doppler frequency is an important parameter because it dictates how quickly
the channel is changing and therefore whether the phase and amplitude synchronization
parameters are static or slowly varying. Specifically, $1/f_{Dmax}$ is the coherence time of the
channel, or the time at which channel estimates become uncorrelated with each other. Therefore,
if the estimate made at the start of the packet is to be, say, x% correlated with the last symbol in
the packet, the packet length ($t_{packet}$) must be shorter than
$$t_{packet} < (1 - x\%)/f_{Dmax}. \qquad (2\text{-}10)$$
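The Doppler relations above lend themselves to a short numerical check. The Python sketch below uses hypothetical function names and treats x% as the fraction x/100 in the packet-length bound, which is an interpretation of the notation rather than something stated explicitly in the text.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def max_doppler_hz(speed_m_per_s, carrier_hz):
    """Maximum Doppler shift f_Dmax = v * f_c / c."""
    return speed_m_per_s * carrier_hz / SPEED_OF_LIGHT_M_PER_S


def coherence_time_s(f_dmax_hz):
    """Coherence time of the channel, 1 / f_Dmax."""
    return 1.0 / f_dmax_hz


def max_packet_duration_s(f_dmax_hz, x_percent):
    """Upper bound on packet length for x% correlation: (1 - x/100) / f_Dmax."""
    return (1.0 - x_percent / 100.0) / f_dmax_hz


if __name__ == "__main__":
    f_d = max_doppler_hz(6.0 / 3.6, 2e9)  # 6 km/hr at 2 GHz
    print(f_d)                            # roughly 11 Hz
    print(coherence_time_s(10.0))         # 0.1 s for the 10 Hz value quoted above
    print(max_packet_duration_s(10.0, 90.0))
```

With the 10 Hz figure quoted above and a 90% correlation target, packets shorter than about 10 ms would let phase and amplitude be treated as static over the packet.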
The r.m.s. delay spread is also an important parameter because it determines whether the
channel is frequency selective or frequency nonselective (flat). A synchronization system for a
frequency selective channel must combat multipath effects (for instance with the use of an
equalizer or RAKE receiver), but no such complexity is required for the flat channel.
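Specifically, $1/\tau_{r.m.s.}$ is the coherence bandwidth of the channel, or the frequency difference over
which channel estimates become uncorrelated with each other. Therefore, if the bandwidth of
the signal is less than 10% of the coherence bandwidth, we say the channel is flat and multipath
effects need not be considered. However, if the signal bandwidth is greater than 10% of the
coherence bandwidth, the channel is frequency selective, and multipath effects must be taken into
account:
$$BW \; \begin{cases} < 0.1\,(1/\tau_{r.m.s.}) & \text{flat} \\ > 0.1\,(1/\tau_{r.m.s.}) & \text{frequency selective} \end{cases} \qquad (2\text{-}11)$$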
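The flat versus frequency-selective test can be sketched in a few lines of Python. The function names are hypothetical, and the example bandwidths are arbitrary illustrations. The 100 ns delay spread is the median value from the delay spread table earlier in this section.

```python
def coherence_bandwidth_hz(tau_rms_s):
    """Coherence bandwidth of the channel, 1 / tau_rms."""
    return 1.0 / tau_rms_s


def is_frequency_flat(signal_bw_hz, tau_rms_s, fraction=0.1):
    """True if the signal bandwidth is below `fraction` of the coherence bandwidth."""
    return signal_bw_hz < fraction * coherence_bandwidth_hz(tau_rms_s)


if __name__ == "__main__":
    tau = 100e-9  # median indoor office r.m.s. delay spread
    for bw in (100e3, 500e3, 5e6, 20e6):
        label = "flat" if is_frequency_flat(bw, tau) else "frequency selective"
        print(bw, label)
```

For a 100 ns delay spread, the threshold sits near 1 MHz, so narrowband signals avoid the equalizer or RAKE complexity while wideband signals do not.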
3
Evaluation and Exploration Environment
3.1 Introduction
Simulation and implementation tools are an important component of this research. First, a
rich simulation environment for communication algorithms is required. Second, the ability to
move quickly from simulated algorithm to implementation is also desired. Lastly, two levels of
power estimation are needed. The first is to accurately estimate the absolute power consumption
of an algorithm. The second is to compare different synchronization systems in a framework that
considers total system power consumption. A flow diagram of the different tools used in this
research is shown in Figure 3-1. To compare two algorithms, only relatively accurate estimations
are required. However, an absolutely accurate power estimation, though more difficult to
achieve, is necessary for use in the system framework, where power consumption of
other components is included.
Figure 3-1: Flow diagram of tools used in this thesis
Packet-based communication systems often require resynchronization with every packet.
Especially in ad hoc networks where transmitters communicate with different receivers at
different times, the synchronization parameters are different every time and therefore cannot be
stored between packets. In this case, the synchronization convergence time can be a significant
portion of the packet length. The energy expended in the synchronization, along with the energy
spent transmitting and receiving the synchronization header, must be calculated into a system-level
metric. Different synchronization algorithms may take different amounts of time to converge to
the required accuracy. In this case, the algorithms must be compared in a system framework.
Higher power algorithms with shorter convergence times may be favored over lower power
algorithms with longer convergence times. In order for the designer to make the appropriate
tradeoff, the power estimates must be absolutely accurate, and the power consumption of other
subsystems, such as the transmitter and receiver front-end power, must be known.
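The tradeoff described above can be made concrete with a back-of-the-envelope energy calculation. The Python sketch below is only an illustration of the reasoning. Every number in it (power levels, header lengths, symbol rate) is hypothetical and is not a result from this thesis, and the function name is invented for the example.

```python
def sync_energy_per_packet_j(sync_power_w, rx_frontend_power_w, tx_power_w,
                             convergence_symbols, symbol_rate_hz):
    """Energy spent per packet while the synchronization header is on the air.

    Counts the synchronization block itself plus the receiver front-end and
    the transmitter, all of which stay on for the convergence time.
    """
    header_time_s = convergence_symbols / symbol_rate_hz
    total_power_w = sync_power_w + rx_frontend_power_w + tx_power_w
    return total_power_w * header_time_s


if __name__ == "__main__":
    # Hypothetical system: 1 Msymbol/s, 50 mW receiver front-end, 30 mW transmitter.
    common = dict(rx_frontend_power_w=50e-3, tx_power_w=30e-3, symbol_rate_hz=1e6)
    # Hypothetical algorithm A: low power, slow convergence.
    energy_a = sync_energy_per_packet_j(1e-3, convergence_symbols=64, **common)
    # Hypothetical algorithm B: three times the power, four times faster.
    energy_b = sync_energy_per_packet_j(3e-3, convergence_symbols=16, **common)
    print(energy_a, energy_b)
```

Under these assumed numbers, the higher power algorithm costs less energy per packet because the front-end and transmitter dominate. This is why absolute, not merely relative, power estimates are needed.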
3.2 Simulation and HDL Description of Algorithms
Synchronization algorithm implementation costs (area and power) are often dominated by
datapath operations such as multipliers and adders, with relatively simple control requirements.
Mathworks Simulink [MAT] was chosen for algorithm simulation. It is a graphical data flow tool
with many provided library functions which make it easy to simulate and analyze communication
systems. An additional program, Stateflow [MAT], is integrated into Simulink to allow graphical
entry of state machines for programming control functions. An example of a Simulink
synchronization system simulation is shown in Figure 3-2.
Figure 3-2: Example synchronization system in Simulink
For hardware coding, Synopsys Module Compiler [SYN] was chosen as the entry point for the
datapath portions of the algorithms. Its high-level hardware description language allows an algorithm to be
parameterized and later synthesized in different configurations. (For instance, it is possible to
synthesize a frequency estimation algorithm for different input SNRs and estimation lengths.) It
is built to optimize datapath operations with features such as allowing adder implementations to
be easily customized between carry-save and ripple-carry. It is known to achieve better area for
datapath blocks than standard HDL synthesizers [HAI].
The use of Module Compiler enables reuse of many smaller modules within larger designs.
Some built-in functions in Module Compiler have facilitated easy implementation of
communication algorithms in this thesis:
• Various adder types (carry-save, ripple-carry, etc.)
• Various multiplier types (Booth, signed/unsigned, +/- A*B, A*(B+C) where C is one
bit, pipelined/unpipelined, etc.)
• Square (special multiplier for two identical inputs)
• Scalar Multiply ACcumulate (MAC)
• Comparators/Muxes/Selectors
• Shift registers
A small library of the following parameterized blocks built on the basic blocks has served to
implement most blocks in this thesis:
• Filters (fixed and adaptive coefficients are automatically detected by Module
Compiler)
• CORDIC (a single parameterized CORDIC slice is arrayed in several configurations
to implement iterative/pipelined angle-finder/rotator blocks)
• Complex MAC
By creating a Simulink library of corresponding parameterized blocks, larger designs can be
implemented and simulated in Simulink with good assurance that they can be quickly translated
to the equivalent behavior in hardware. Verification testbenches ensure that the Simulink and
hardware versions are equivalent through simulation.
For control flow, a tool called SF2VHD [CAM] automatically converts Matlab
Stateflow diagrams into VHDL for synthesis. Since the control is usually a small part of the
synchronization algorithm, no effort was spent optimizing these state machine implementations
beyond compilation in standard synthesis tools.
For power comparison, each algorithm is coded as a parameterized module in Module
Compiler. Each module is synthesized as a gate-level VHDL netlist in Module Compiler for a
range of parameters, such as input SNR and estimation length. Realistic input vectors for each
block are generated in MATLAB by simulating the block inside a realistic system and capturing
the inputs. Each synthesized VHDL netlist from Module Compiler is sent through the gate-level
power estimation tool using the input vectors from MATLAB.
For the examples in this work, power estimation is done assuming a 0.13 µm technology. In
the component exploration sections of this thesis (Chapters 5 and 7), the impact of changing
technology on the presented results is discussed. In all cases, the highly automated flow allows
automatic re-characterization in a new process once new libraries are available.
3.3 Gate-level Power Estimation
The most accurate power estimation method available with current tools is to extract parasitics
from a post-placed-and-routed design and simulate in a switch-level simulator like PowerMill or
NanoSim (called the Extracted Physical or EP estimation method). Our own experience and reports
from our foundry show this method of power estimation to be accurate to within 15% of the
power consumption of actual chips. However, placing and routing a design can take considerable
time, and switch-level simulation is very slow. It can take up to two days to complete the
placement, routing, extraction, and simulation of a moderately sized block with today’s
computers and tools. Since this research relies on the accurate power estimation of several
algorithms across many different parameter sets (for instance, over 100 frequency estimation
blocks), this research would be impossible with power estimation this slow. Therefore, a faster
power estimation method was required. The method should automatically characterize the same
algorithm over a set of parameters, and make as much use as possible of existing tools. In this
way, this power estimation flow benefits from the constant improvements made in the existing
tools.
Faster power estimation methods than the EP method are available, but they typically incur
errors in proportion to their estimation speed. Therefore, to get the fastest estimation feasible for
this research, it is necessary to examine the required power estimation accuracy. To reach the
correct conclusion when comparing two items, the accuracy of the estimate must be better than
the difference between the two items being compared. As stated in Chapter 1, synchronization
systems can consume around 15% of the physical layer power. In order to make an impact on
system power consumption (say greater than 5%), synchronization power consumption has to
improve by at least 30%. Estimation accuracy should be on the order of (or better than) this
desired improvement. Figure 3-3 shows an estimation accuracy of 30% and the desired
improvement of the original versus the revised system of 30%. In order to guarantee that the
actual revised system is at least 30% better than the original system, the estimates have to show
an improvement of almost a factor of two (y=50%). Since test chips are not available for
comparison, the proposed power estimation method will be compared to the EP estimation
method. Therefore, a method which is accurate to within 15% of the EP method is required.
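The 30% target follows from a simple proportion. As a rough worked check, taking the synchronization share of physical layer power as 15% and the desired system-level saving as about 5% (both figures from the text above):

$$\text{required synchronization improvement} \approx \frac{0.05}{0.15} \approx 0.33, \qquad 0.30 \times 0.15 = 0.045 \approx 5\%$$

so an improvement of roughly one third in synchronization power corresponds to a system-level saving of about 5%, consistent with the 30% figure quoted.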
Figure 3-3: Estimation accuracy requirements
The fastest, but least accurate, power estimation methods are statistical gate-level methods
(called PG for probabilistic gate-level). Here, the gate-level netlist is analyzed assuming
statistical activity factors on the inputs, which are propagated throughout the design to produce a
statistical activity factor for each net. Statistical activity factors are multiplied by statistical wire
load models and statistical switching probabilities of the gates to produce a power estimate.
Because communication data is often highly correlated, these statistical methods, which assume
randomness, are not accurate enough for our purposes. For instance, in an illustrative experiment
a complex MAC with 8-bit inputs and 23-bit outputs consumes 22 uW with random inputs, but
only 10 uW with realistic inputs as would occur in a frequency estimator of a communication
system.
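The effect of data correlation on switching activity can be illustrated with a small sketch. It does not reproduce the MAC experiment above; it only counts bit toggles on an 8-bit bus for uniformly random words and for a hypothetical slowly varying signal (a sinusoid in offset binary, chosen here purely for illustration). The function name `mean_toggles` and all parameter values are assumptions.

```python
import math
import random


def mean_toggles(samples, width=8):
    """Average number of bit flips between consecutive words on a bus."""
    mask = (1 << width) - 1
    flips = [bin((a ^ b) & mask).count("1") for a, b in zip(samples, samples[1:])]
    return sum(flips) / len(flips)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 4096

# Uniformly random 8-bit words, as a statistical (PG-style) analysis would assume.
random_words = [rng.randrange(256) for _ in range(n)]

# Hypothetical correlated stream: slow sinusoid, 256 samples per period, offset binary.
correlated_words = [128 + round(100 * math.sin(2 * math.pi * k / 256)) for k in range(n)]

random_activity = mean_toggles(random_words)
correlated_activity = mean_toggles(correlated_words)
print(f"random: {random_activity:.2f} toggles/word, correlated: {correlated_activity:.2f}")
```

Fewer toggles per word mean less dynamic power on that bus, which is why activity factors obtained from realistic vectors can differ substantially from those assumed for random data.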
To capture the power savings from correlations in the data stream, the design must be
simulated to determine the actual activity factors on each net and within each gate. Gate-level
simulation is around 50 times faster than switch-level simulation, not including time to place and
route, and requires fewer tools. Existing synthesis tools, such as Synopsys Design/Power
Compiler, have the built-in capability to use gate-level simulation information to produce a power
estimate. However, typical gate-level power estimation with simulation (called SG for simulated
gate-level) is still not accurate enough because there are some critical components missing from
all gate-level estimation methods.
A typical flow for taking a gate-level netlist to a placed-and-routed netlist is:
• Place the gates in standard cell rows
• Insert a clock tree and route the clock net
• Insert hold time buffers to eliminate race conditions between register stages
• Route the signal nets
In comparison to the EP method, the SG power estimation is missing three pieces of information,
which makes the estimates less accurate. First and most important is the clock tree, which often
accounts for 30-50% of the block power. Second, the power of the hold time buffers can be
significant, especially where there is little combinational logic between registers (such as in
communication system components like filters and delay chains). Third, the exact wire loads are
not known until the design is placed and routed.
An accurate gate-level power estimation method (called AG for accurate gate-level) has to
address these three issues. The easiest issue to address is the wire loads. Although the exact
length of each wire is unknown before placement and routing, current tools do a good job of
estimating the average load of a wire in the system. These estimates are based on the technology
and the number of gates in the block. Since placement tools don’t use information about the
activity factor on the nets, they are just as likely to force long routes on high-activity wires as
on low-activity wires. Therefore, the statistical wire load models are used. The second issue to
address is the hold time buffers. Hold time buffers are avoided if there is enough combinational
circuit delay or wire delay between registers. Hold time buffers are placed assuming
statistical wire load models. Insertion of hold time buffers is achieved in Synopsys Design
Compiler with a built-in function that fixes hold times on specified nodes. The last issue to
address is the clock tree insertion. It turns out that the exact clock tree is not necessary for power
estimation purposes. It is possible to force the tools to insert a “good enough” clock tree into the
gate-level netlist. This is achieved by tagging the clock as a high-fanout node in Synopsys Design
Compiler. By placing constraints on the rise and fall times of the clock net, the tool inserts a
“good enough” clock tree into the design. By addressing these three issues, gate-level power
estimation accuracy can be within 15% of the EP method as will be shown below. Of course, the
accuracy of the estimation relies on the accuracy of the standard cell library characterization. To
achieve these results, no extra characterization was required. The foundry-supplied libraries were
characterized well enough to meet the power estimation accuracy goals.
As process technology scales, leakage power is becoming a significant source of power
consumption both when blocks are in use and when they are in standby mode. Because leakage
power can be significant, it is included in the power consumption estimates produced by the AG
method. In standby mode, aggressive low power designs have block-level gated clocks and
power rails. By gating both the clocks and power rails, standby power is reduced to near zero and
need not be considered in the system power framework.
The new AG estimation flow is shown in Figure 3-4. Each VHDL netlist is incrementally
compiled in Synopsys Design Compiler to insert a clock tree and to add buffer delays to fix hold
time violations. The block is then simulated at the gate level in ModelSim using realistic input
vectors to verify functionality and to determine the switching activity on each node. Synopsys
Power Compiler is used to estimate the power consumption of the block using the back-annotated
switching activity and statistical wire load models.
Figure 3-4: Accurate gate-level power estimation flow
Five frequency estimation blocks with a wide range of parameters were compared using the
AG method versus the EP method. The results are shown in Figure 3-5 along with the SG power
estimation method. Over a wide range of block sizes, the AG estimation is within 15% of the EP
estimation (see error bars); however, the SG method had errors of 30-50%. The Makefile and
scripts for running the AG power estimation for a range of frequency estimation blocks are given
in Appendix A.
[Figure 3-5 bar chart, "EP vs. AG Power Estimation Method": Power (uW) versus L = 1, 2, 3 for Group 1 and Group 2 under the EP, AG, and SG methods]
Figure 3-5: Proposed power estimation method comes within 15% of the EP method for a
wide range of block sizes.
The AG power estimation method is over 50 times faster than the EP method (not including the
time required to place-and-route the block and thereby extract accurate parasitics). The total time
to characterize 21 different chosen instantiations of a frequency estimation algorithm is less than
3 hours using the AG method. Execution time will vary with the size of the block, the duration of
the simulation interval, and different server processor and memory configurations.
3.4 Analog to Digital Converter Power Estimation
The analog to digital converter (ADC) is often a significant power-consuming component of a
communication system. Because different synchronization systems place different requirements
on the ADC, a method to estimate the power consumption of ADCs with different specifications
is required. In a survey of over 100 ADCs published in the literature from 1978 to 1999 [WAL],
a simple but accurate architecture-independent figure of merit (FOM) is determined for
comparing them:
$$\mathrm{FOM} = \frac{2^{\mathrm{SNRbits}} \cdot f_{samp}}{P_{diss}} \tag{3-1}$$
Here $f_{samp}$ is the sampling rate, $P_{diss}$ is the power dissipation, and SNRbits is the equivalent
number of bits given by:
$$\mathrm{SNRbits} = \frac{\mathrm{SNR(dB)} - 1.76}{6.02} \tag{3-2}$$
FOMs from the surveyed ADCs range between $1\times10^{10}$ and $1.2\times10^{12}$ with a mean around
$1\times10^{11}$. Given that the designer does the best design possible with the given process technology,
the FOM is dependent on how extreme are the given ADC specs relative to the fundamental
process capabilities. For instance, an $f_{samp}$ that is closer to the maximum frequency of a process is
likely to achieve a lower FOM than one that has a much lower $f_{samp}$. Therefore, to predict the
power consumption of an ADC with arbitrary specifications, one needs to find an appropriate
FOM. This can be achieved by finding a similar ADC in the literature and using the same FOM,
or by extrapolating an FOM by determining how extreme the required specs are relative to the
fundamental process capabilities.
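Once an FOM has been chosen, the two relations above can be inverted to predict power dissipation. The following is a minimal sketch of that calculation; the function names and the example operating point (about 8 equivalent bits at 100 Msps with the survey's mean FOM) are illustrative assumptions, not a characterization of any particular converter.

```python
def snr_bits(snr_db):
    """Equivalent number of bits for a given SNR in dB: (SNR(dB) - 1.76) / 6.02."""
    return (snr_db - 1.76) / 6.02


def adc_power_watts(snr_db, f_samp_hz, fom):
    """Solve FOM = 2^SNRbits * f_samp / P_diss for P_diss."""
    return (2.0 ** snr_bits(snr_db)) * f_samp_hz / fom


# Hypothetical operating point: 49.92 dB SNR (8 equivalent bits), 100 Msps, FOM = 1e11.
example_bits = snr_bits(49.92)
example_power = adc_power_watts(snr_db=49.92, f_samp_hz=100e6, fom=1e11)
print(f"{example_bits:.2f} bits, {example_power * 1e3:.0f} mW")
```

Because predicted power scales inversely with the assumed FOM, the spread of roughly two orders of magnitude in surveyed FOMs carries directly into the estimate, which is why choosing an FOM from a comparable design matters.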
3.5 System Power Estimation Tool
To compare algorithms with different convergence times, a system-level power estimation
tool is required. For the purposes of this work, two communication system variables are
generally considered: the length of the header, and the transmit power. Other variables, such as
the number of bits per packet, and the required BER are typically fixed for a given scenario.
Because the synchronization system is well within the physical layer, a sensible metric is energy
per useful bit (EPUB), taking energy over the physical layer components. EPUB may not be the
right metric for upper levels of the protocol stack, like the network or MAC layer (where network
uptime or latency may also be considered). For instance, the number of packet collisions
increases with increasing packet length. Therefore, packets with more header bits will incur more
packet collisions, and therefore, more energy per useful bit. However, for comparisons where the
difference in packet lengths is within 10%, the increased power consumption due to increased
packet collisions can be safely ignored. Therefore, EPUB is used because it is simple and
adequate for the purposes of this research.
The energy consumed by the system per packet includes the power in the transmitter and
receiver and is equal to:
$$E_P = (B_S + B_D)(P_{Diss,TX} + P_{Diss,RX}) + B_S P_S + B_D P_D \tag{3-3}$$
where $B_S$ is the number of synchronization bits, $B_D$ is the number of data bits, $P_{Diss,TX}$ is the
transmitter power dissipation including radiated power, $P_{Diss,RX}$ is the receiver frontend power,
$P_S$ is the baseband power when synchronizing, and $P_D$ is the baseband power when receiving
data. Energy per useful bit is computed by dividing $E_P$ by $B_D$:
$$\mathrm{EPUB} = \frac{E_P}{B_D} \tag{3-4}$$
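The trade-off that EPUB captures (a longer header versus a more power-hungry synchronizer) can be sketched numerically. The code below implements the two expressions as written; since they multiply bit counts by powers, the sketch assumes each power term has already been scaled by the bit duration, so that it behaves as an energy per bit. All numeric values are hypothetical and in arbitrary units.

```python
def packet_energy(b_s, b_d, p_tx, p_rx, p_s, p_d):
    """E_P = (B_S + B_D)(P_Diss,TX + P_Diss,RX) + B_S*P_S + B_D*P_D.

    Assumption: every P term is pre-scaled by the bit period (energy per bit).
    """
    return (b_s + b_d) * (p_tx + p_rx) + b_s * p_s + b_d * p_d


def epub(b_s, b_d, p_tx, p_rx, p_s, p_d):
    """Energy per useful bit: packet energy divided by the number of data bits."""
    return packet_energy(b_s, b_d, p_tx, p_rx, p_s, p_d) / b_d


# Hypothetical comparison: a long header with a cheap synchronizer versus
# a short header with a synchronizer that costs twice as much per bit.
long_header = epub(b_s=100, b_d=1000, p_tx=6, p_rx=4, p_s=2, p_d=1)
short_header = epub(b_s=50, b_d=1000, p_tx=6, p_rx=4, p_s=4, p_d=1)
print(long_header, short_header)
```

In this made-up case the faster-converging synchronizer wins despite its higher power, because every header bit also costs transmitter and receiver frontend energy.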
3.6 Conclusion
The MATLAB Simulink and Stateflow tools are used for simulation and analysis of
communication algorithms. SF2VHD and the developed libraries in Synopsys Module Compiler
allow quick translation into implementation. An accurate and fast power estimation method has
been developed. The key steps to getting accurate power estimation at the gate level are to add a
clock tree, to add hold time buffers, and to simulate with realistic input vectors. These steps are
achieved using Synopsys Design Compiler, Power Compiler, and ModelSim. This method has
proven to be accurate to within 15% of the EP method, and is believed to be accurate to within
30% of real chip measurements.
Use of parameterized modules in Simulink and Module Compiler allows one hardware
description to be automatically synthesized, verified, and characterized over a wide parameter
space. Because the block-level estimation is accurate in absolute terms, it is possible to compare
algorithms in a system-level framework using the EPUB metric.
4 PNII System
4.1 Introduction
The PNII system is a 1.6 Mbps personal area network system designed to carry voice over
short distances (10-30 m) for wireless intercom-type applications [AMM]. PNII was the impetus
for this research on low power synchronization. Much effort was expended to make PNII a low
power synchronization system. Despite these efforts, the synchronization system still consumed
18% of the physical layer power. Most of the power reduction effort was centered on circuit
implementation, such as choosing the right adder types and complex multiply structures, using
the lowest possible supply voltage, and gating clocks on unused blocks. Therefore, it was
determined that to further reduce synchronization power consumption it was necessary to move to
higher levels of design, such as up to algorithm selection or system design.
The preliminary design of the synchronization system was documented in [HUS]. Much of
the structure of the physical layer, including the data rate, modulation scheme, and ADC oversampling
rate, was dictated by the frontend and system designers [YEE]. This is not an uncommon
scenario in radio design. Often the synchronization system is designed to accommodate
constraints dictated by other radio subsystems rather than the other way around. One goal of this
thesis is to show that this is not always an advantageous design methodology from a system
energy perspective.
This chapter is devoted to describing the original PNII synchronization system and some of
the power saving implementation methods employed. This is not to say that this system is in any
way optimal. In fact, parts of the system are provably suboptimal (as will be shown in Chapter
7). Rather, the goals here are threefold: 1) to provide an example of the design and
implementation of a complete synchronization system, 2) to provide a baseline design against
which the gains of later refinements can be illustrated, and 3) to motivate the necessity of this research.
4.2 System Details
The protocol used in PNII, called Intercom, allows for ad-hoc peer-to-peer communication of
64 kbps uplink/downlink between 20 sensor/communicator nodes [AMM]. A data rate of
1.6 Mbps and a BER of 1e-5 is required to support this functionality.
The physical layer is made compatible with a commercially available RF frontend
(performing carrier up/down conversion), ADC, and DAC. Although the commercial
components have high power consumption resulting from their tight design specs, the PHY
accommodates significantly relaxed specs for integration with a custom, low-power front end
[YEE] (e.g. by only requiring a free-running clock with 50 ppm accuracy). The chip integrates
all other PHY receiver and transmitter functions, such as carrier detect, synchronization, and
detection.
The air interface is direct sequence spread spectrum (DSSS) with a length-31 spreading code
at 25 Mcps (million chips per second) and QPSK modulation, resulting in a raw data rate of 1.6
Mbps. The primary receiver specifications are ±100 KHz maximum carrier frequency offset
(+/-50 ppm from a 2 GHz carrier reference) and a 50 ppm ADC sample clock. The transmit filter
is root-raised cosine with alpha=0.3. The minimum input SNR (per chip) at the input to the ADC is
-2.9 dB for an SNR per symbol of 12 dB¹. Ideal detection of QPSK symbols requires 9.6 dB to
achieve a BER of 1e-5. Therefore, the 12 dB input SNR gives a realistic (if overly generous)
2.4 dB implementation target. The PNII supports a typical indoor frequency-selective wireless
channel with mobile units traveling at foot speeds as described in Chapter 2.
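The figures in this paragraph are tied together by the spreading gain, and the arithmetic can be checked with a short sketch. It assumes the per-symbol SNR equals the per-chip SNR plus the processing gain of the length-31 code, 10·log10(31); under that assumption a 12 dB per-symbol SNR corresponds to about -2.9 dB per chip. Variable names are illustrative.

```python
import math

chip_rate_hz = 25e6      # 25 Mcps
code_length = 31         # chips per symbol
bits_per_symbol = 2      # QPSK

# Raw data rate: symbols per second times bits per symbol.
raw_bit_rate = chip_rate_hz / code_length * bits_per_symbol

# Processing gain of the spreading code in dB (assumed to map chip SNR to symbol SNR).
processing_gain_db = 10 * math.log10(code_length)

snr_symbol_db = 12.0
snr_chip_db = snr_symbol_db - processing_gain_db

# Margin over the 9.6 dB quoted for ideal QPSK detection at a BER of 1e-5.
implementation_margin_db = snr_symbol_db - 9.6

print(f"raw rate {raw_bit_rate / 1e6:.2f} Mbps, gain {processing_gain_db:.1f} dB, "
      f"chip SNR {snr_chip_db:.1f} dB, margin {implementation_margin_db:.1f} dB")
```

The same relation accounts for the earlier design point mentioned in the footnote: 5 dB per chip plus about 14.9 dB of gain gives the quoted 19.9 dB per symbol.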
4.3 Synchronization System
Figure 4-1: PNII system block diagram.
A block diagram is shown in Figure 4-1. The RX/TX Controller interfaces with the protocol
processor and controls the data flow from one data path block to another. During receive, the
baseband signal is sampled by dual off-chip 8-bit ADCs at 100 Msps (4 samples per chip) using a
free-running clock. These 100 MHz streams are each split into 4 parallel streams of 25 MHz each
so that the BBP can operate off the slower 25 MHz chip clock, reducing power by allowing a
lower operating voltage. Parallel filter techniques interpolate the streams to increase the receiver
timing resolution to 8 samples per chip. Performing on-chip interpolation of the signal is lower
power than running the ADC at twice the rate. However, further reduction of the ADC sampling
rate is prohibited by the frontend filter specs.
¹ The original synchronization design required an input SNR per chip of 5 dB for an SNR per symbol of 19.9 dB. However, as the
system specs were dictated to require 1e-5 BER, it was determined that the original SNR spec was grossly wasteful, and a lower input
SNR could achieve the design goals.
4.3.1 Timing Recovery
Figure 4-2: Flow diagram of the PNII synchronization system
A flow diagram of the synchronization system is shown in Figure 4-2. The overall goal of the
timing recovery unit is to select the best of 8 timing instances per chip. This is completed in two
steps: a coarse timing estimation which estimates the timing to within 3/8 chip and a fine timing
estimation which estimates timing to within 1/8 chip. The timing variance due to quantization is
$$\mathrm{var}(\varepsilon_Q) = \left(\frac{1}{2\,\mathrm{OSR}}\right)^2 \tag{4-1}$$
where OSR is the relative symbol oversampling ratio. In the final timing estimation step, the
OSR is 248 (8 samples per chip * 31 chips per symbol). Therefore, the variance for the final
timing estimate is 4.1e-6. In the initial timing estimation step, the OSR is 8/3 * 31 =
83, for a variance of 3.7e-5. The variance of the selection process must be lower than the
variance caused by the quantization in order for the final result to be quantization-limited. If the
system is not quantization-limited, energy has been wasted in the ADC, interpolation filter, and
synchronization hardware to accommodate the unnecessarily high oversampling ratio. The SNR
degradation due to timing recovery for this DSSS signal with root-raised cosine data with
alpha=0.3 is determined by simulation to be 0.3 dB.
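The two quantization variances quoted above can be checked numerically. The sketch assumes the quantization variance takes the form (1/(2·OSR))², which is the form that reproduces both quoted values; the function name is illustrative.

```python
def quantization_variance(osr):
    """Timing variance due to quantization, assuming var = (1 / (2 * OSR))**2."""
    return (1.0 / (2.0 * osr)) ** 2


chips_per_symbol = 31

# Fine timing: 8 samples per chip.
fine_osr = 8 * chips_per_symbol
fine_variance = quantization_variance(fine_osr)

# Coarse timing: resolution of 3/8 chip, i.e. 8/3 effective samples per chip.
coarse_osr = (8 / 3) * chips_per_symbol
coarse_variance = quantization_variance(coarse_osr)

print(f"fine: OSR={fine_osr}, var={fine_variance:.2e}")
print(f"coarse: OSR={coarse_osr:.1f}, var={coarse_variance:.2e}")
```

These two numbers act as budgets: each estimator's own variance should fall below the quantization variance of its stage, otherwise the oversampling is not being fully used.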
4.3.2 Coarse Timing Estimation
Before coarse timing estimation, the system performs carrier detect using an algorithm that
compares the code-matched filter output to an adaptive threshold, set using the RSSI
measurement. If the code-matched filter output exceeds the threshold twice with a delay of one
symbol between threshold crossings, it is assumed that the correct code is being sent and carrier
detect status is declared. The coarse timing block then estimates timing to within 3/8 chip by
selecting the best of streams 2, 4, and 7 using a data-aided feedforward algorithm (Figure 4-3).
The variance of this algorithm is treated in [MEY], and for root-raised cosine data with α=0.3 is
given by:
$$\mathrm{var}(\varepsilon_T) = \frac{1}{L^2}\left(\frac{1}{C} \cdot 0.3 + \frac{1}{2\,\mathrm{SNR}\,C^2} \cdot 0.8\right) \tag{4-2}$$
where C is the number of chips used in the estimate, L is the number of chips per symbol, and
SNR is given per chip. The $L^2$ factor is due to the estimate being produced in fractions of chips,
whereas we are interested in fractions of symbols. A variance of $1.1\times10^{-5}$ is achieved with
estimation performed over one symbol (C=31 chips). This is lower than the quantization error of
$3.7\times10^{-5}$ for this stage, so the performance is sufficiently quantization-limited.
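The quoted coarse timing variance can be checked numerically under stated assumptions. The sketch assumes the variance expression has the form (1/L²)·(0.3/C + 0.8/(2·SNR·C²)) with SNR as a linear per-chip ratio, and evaluates it at a per-chip SNR of -2.9 dB (12 dB per symbol less the spreading gain). This form is an assumption chosen because it reproduces the quoted value; [MEY] remains the reference for the derivation.

```python
def coarse_timing_variance(c, l, snr_chip_db):
    """Assumed form: (1/L^2) * (0.3/C + 0.8/(2*SNR*C^2)), SNR linear per chip."""
    snr = 10 ** (snr_chip_db / 10)
    return (1.0 / l**2) * (0.3 / c + 0.8 / (2.0 * snr * c**2))


# One-symbol estimate: C = 31 chips, L = 31 chips per symbol, -2.9 dB per chip.
coarse_var = coarse_timing_variance(c=31, l=31, snr_chip_db=-2.9)

# Quantization variance quoted for the coarse stage.
coarse_quantization_var = 3.7e-5

print(f"estimator variance {coarse_var:.2e}, quantization {coarse_quantization_var:.1e}")
```

With these inputs the first term dominates, so in this regime the estimate is limited more by the pulse shape than by noise.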
Figure 4-3: Coarse timing block diagram
4.3.3 Fine Timing and Frequency Estimation
The fine timing block estimates timing to within 1/8 chip and the carrier frequency offset to
within 2.5 KHz using the unweighted Meyr algorithm (Figure 4-4). While this algorithm is
typically used solely for frequency estimation, Meyr suggests its use as a joint frequency and
timing estimator [MEY]. The timing variance of this method is not computed analytically by
Meyr, but simulation shows it to be lower than 1e-6 under worst case frequency offset conditions.
This is sufficiently smaller than the required variance of 4.1e-6. The variance of the frequency
estimation is 4.5e5 with 35 symbol estimation, giving a 3-sigma residual offset of less than the
2.5 KHz required by the pull-in range of the PLL.
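The 3-sigma claim can be checked directly. The sketch assumes the quoted frequency-estimation variance of 4.5e5 is expressed in Hz², which is the interpretation under which the stated margin against the 2.5 KHz pull-in range holds.

```python
import math

freq_variance_hz2 = 4.5e5     # assumed to be in Hz^2
pll_pull_in_hz = 2.5e3        # pull-in range of the PLL

sigma_hz = math.sqrt(freq_variance_hz2)
three_sigma_hz = 3 * sigma_hz

print(f"sigma = {sigma_hz:.0f} Hz, 3-sigma = {three_sigma_hz:.0f} Hz, "
      f"limit = {pll_pull_in_hz:.0f} Hz")
```

Under this assumption the 3-sigma residual offset is roughly 2 KHz, leaving about 500 Hz of margin before the PLL pull-in limit.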
Figure 4-4: Joint frequency and fine timing estimation
4.3.4 Frequency Correction and Timing Tracking
The rotate and correlate block corrects the frequency offset, correlates the incoming signal
with the spreading code, and performs early/late detection to track the optimal timing instant
(using a FF NDA algorithm to choose the best of the chosen stream or one of its direct neighbors).
Since 50 ppm clocks are used, the system should switch streams no more frequently than once
every 40 symbols.
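The 40-symbol figure follows from the clock tolerance and the 1/8-chip timing resolution. The sketch below assumes a worst case in which the transmitter and receiver clocks are each off by 50 ppm in opposite directions, giving 100 ppm of relative drift; that worst-case combination is an assumption made here for illustration.

```python
clock_tolerance_ppm = 50
chips_per_symbol = 31
timing_step_chips = 1 / 8     # spacing between adjacent timing streams

# Assumed worst case: transmitter and receiver errors add.
relative_drift = 2 * clock_tolerance_ppm * 1e-6

# Number of chips needed to drift by one timing step, then converted to symbols.
chips_per_switch = timing_step_chips / relative_drift
symbols_per_switch = chips_per_switch / chips_per_symbol

print(f"{chips_per_switch:.0f} chips, about {symbols_per_switch:.1f} symbols per stream switch")
```

A stream switch is therefore needed at most about once every 40 symbols, so the tracking logic can run at a small fraction of the symbol rate.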
The coarse frequency offset needs to be corrected before entering the PLL for two reasons.
First, the pull-in range of the PLL is limited. Second, the frequency offset must be corrected
before entering the code correlator to avoid the power loss associated with correlation in the
presence of a large frequency offset. Figure 4-5 shows post-correlation power loss as a function