Topic 2
Signal Processing Review
(Some slides are adapted from Bryan Pardo's course slides on Machine Perception of Music)
Recording Sound
Mechanical vibration → pressure waves → motion → voltage (via a transducer) → voltage over time
ECE 492

Computer Audition and Its Applications in Music, Zhiyao Duan 2013
Microphones
http://www.mediacollege.com/audio/microphones/how-microphones-work.html
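Pure Tone = Sine Wave
$x(t) = A\sin(2\pi f t + \phi)$, where $A$ is the amplitude, $f$ the frequency, $t$ the time, and $\phi$ the initial phase.
[Figure: a 440 Hz sine wave, amplitude from -1 to 1, over time (ms) from 0 to 6, with one period T marked]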
Reminders
• Frequency, $f = 1/T$, is measured in cycles per second, a.k.a. Hertz (Hz).
• One cycle contains $2\pi$ radians.
• Angular frequency, $\Omega$, is measured in radians per second and is related to frequency by $\Omega = 2\pi f$.
• So we can rewrite the sine wave as $x(t) = A\sin(\Omega t + \phi)$.
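A quick numerical check of these reminders, as a minimal Python sketch (the function names are illustrative, not from the slides): it evaluates $A\sin(2\pi f t + \phi)$ and $A\sin(\Omega t + \phi)$ with $\Omega = 2\pi f$, and confirms that the two forms agree and that the waveform repeats after $T = 1/f$.

```python
import math

def pure_tone(t, freq, amp=1.0, phase=0.0):
    """A*sin(2*pi*f*t + phi) at time t (seconds); freq in Hz."""
    return amp * math.sin(2 * math.pi * freq * t + phase)

def pure_tone_angular(t, omega, amp=1.0, phase=0.0):
    """The same tone written with angular frequency Omega (rad/s)."""
    return amp * math.sin(omega * t + phase)

f = 440.0                  # frequency in Hz (cycles per second)
T = 1.0 / f                # period in seconds
omega = 2 * math.pi * f    # angular frequency in rad/s

for t in (0.0, 0.0003, 0.001):
    # both forms give the same value ...
    assert math.isclose(pure_tone(t, f, 0.8, 0.3),
                        pure_tone_angular(t, omega, 0.8, 0.3), abs_tol=1e-12)
    # ... and the waveform repeats after one period T
    assert math.isclose(pure_tone(t, f, 0.8, 0.3),
                        pure_tone(t + T, f, 0.8, 0.3), abs_tol=1e-9)

print(f"T = {T * 1000:.4f} ms, Omega = {omega:.1f} rad/s")
```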
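Fourier Transform
$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j2\pi f t}\, dt$
[Figure: left, the 440 Hz sine wave over time (ms); right, its amplitude spectrum over frequency (Hz), with components at -440 Hz and 440 Hz]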
We can also write
$X(\Omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\Omega t}\, dt$
[Figure: left, the 440 Hz sine wave over time (ms); right, its amplitude spectrum over angular frequency (rad/s), with components at $\pm 440 \times 2\pi$]
Complex Tone = Sum of Sine Waves
[Figure: sine waves at 220 Hz, 660 Hz, and 1100 Hz (each with amplitude from -1 to 1, time axis 0 to 100) are added together (220 Hz + 660 Hz + 1100 Hz) to give a complex tone with amplitude from -2.5 to 2.5]
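A minimal sketch of building the complex tone above by summing sampled sines. The 44.1 kHz sampling rate and the 10 ms duration are assumptions for illustration; the slide does not specify them.

```python
import math

def complex_tone(freqs, fs, duration, amps=None):
    """Sum of sines at the given frequencies (Hz), sampled at fs Hz.
    Amplitudes default to 1 for every component."""
    amps = amps or [1.0] * len(freqs)
    n_samples = int(round(fs * duration))
    return [sum(a * math.sin(2 * math.pi * f * n / fs) for f, a in zip(freqs, amps))
            for n in range(n_samples)]

fs = 44_100                                           # assumed sampling rate (Hz)
x = complex_tone([220, 660, 1100], fs, duration=0.01)  # 10 ms of the complex tone
print(len(x), max(abs(v) for v in x))
```

Each component can reach at most 1, so the sum stays within ±3; in practice the peaks are lower because the components rarely align.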
Frequency Domain
$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j2\pi f t}\, dt$
[Figure: the complex tone over time (ms), amplitude from -2.5 to 2.5, and its amplitude spectrum over frequency (Hz), with components at 220, 660, and 1100 Hz]
Harmonic Sound
• 1 or more sine waves
• Strong components at integer multiples of a fundamental frequency (F0) in the range of human hearing (20 Hz to 20,000 Hz)
• Examples
– 220 + 660 + 1100 is harmonic (1, 3, and 5 times F0 = 220 Hz)
– 220 + 375 + 770 is not harmonic
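A minimal sketch of the definition above, assuming the component frequencies are whole numbers of Hz: the largest F0 whose integer multiples cover every component is their greatest common divisor, and the set counts as harmonic only if that F0 lies in the 20 Hz to 20,000 Hz range. The function names are hypothetical, and real recordings would need a tolerance, since measured partials are never exact integers.

```python
from functools import reduce
from math import gcd

HEARING_RANGE_HZ = (20, 20_000)

def fundamental(freqs_hz):
    """Largest F0 (Hz) such that every component is an integer multiple of it."""
    return reduce(gcd, freqs_hz)

def is_harmonic(freqs_hz):
    """Harmonic if the common F0 lies in the range of human hearing."""
    low, high = HEARING_RANGE_HZ
    return low <= fundamental(freqs_hz) <= high

print(fundamental([220, 660, 1100]), is_harmonic([220, 660, 1100]))  # 220 True
print(fundamental([220, 375, 770]), is_harmonic([220, 375, 770]))    # 5 False
```

For 220 + 375 + 770 the only common divisor is 5 Hz, which is below the range of hearing, so no audible F0 explains all three components.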
Noise
• Lots of sines at random frequencies = NOISE
• Example: 100 sines with random frequencies $f$, such that $100 < f < 10000$ (Hz).
[Figure: the resulting noise waveform, amplitude roughly between -30 and 30, horizontal axis from 0 to 3.5 × 10^4]
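A sketch of the noise example: 100 unit-amplitude sines with frequencies drawn uniformly between 100 Hz and 10,000 Hz. The 44.1 kHz sampling rate, the random phases, and the fixed seed are assumptions added for illustration.

```python
import math
import random

def random_sine_noise(n_sines, f_low, f_high, fs, n_samples, seed=None):
    """Sum of n_sines unit sines with uniformly random frequencies in (f_low, f_high)
    and uniformly random phases (the phases are an assumption of this sketch)."""
    rng = random.Random(seed)
    freqs = [rng.uniform(f_low, f_high) for _ in range(n_sines)]
    phases = [rng.uniform(0, 2 * math.pi) for _ in range(n_sines)]
    return [sum(math.sin(2 * math.pi * f * n / fs + p) for f, p in zip(freqs, phases))
            for n in range(n_samples)]

noise = random_sine_noise(100, 100, 10_000, fs=44_100, n_samples=4_410, seed=0)
```

Because the random frequencies do not line up, the powers of the components add rather than their amplitudes, so the RMS of the sum comes out near $\sqrt{100/2} \approx 7$ instead of anything close to 100.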
How strong is the signal?
• Instantaneous value?
• Average value?
• Something else?
[Figure: two signals $x(t)$: the 440 Hz pure tone (amplitude -1 to 1) and the noise signal (amplitude roughly -30 to 30)]
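Acoustical or Electrical
• Acoustical: view $x(t)$ as sound pressure. Average intensity:
$I = \frac{1}{\rho c}\,\frac{1}{T_D}\int_0^{T_D} x^2(t)\,dt$, where $\rho$ is the density and $c$ the sound speed.
• Electrical: view $x(t)$ as electric voltage. Average power:
$P = \frac{1}{R}\,\frac{1}{T_D}\int_0^{T_D} x^2(t)\,dt$, where $R$ is the resistance.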
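Root-Mean-Square (RMS)
$\text{RMS} = \sqrt{\frac{1}{T_D}\int_0^{T_D} x^2(t)\,dt}$
• $T_D$ should be long enough, i.e. span many periods of the signal, so that the average is stable.
• $x(t)$ should have zero mean, otherwise the DC component will be integrated too.
• For sinusoids (unit amplitude, period $T$):
$\sqrt{\frac{1}{T}\int_0^{T} \sin^2\!\left(\frac{2\pi t}{T}\right) dt} = \frac{\sqrt{2}}{2} \approx 0.707$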
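The integral can be approximated by averaging many samples. The sketch below (the 48 kHz sampling rate and the 440 Hz test tone are assumptions) checks the 0.707 value for a unit sine over a whole number of periods, and shows what happens when the zero-mean condition is violated by adding a DC offset.

```python
import math

def rms(samples):
    """Root mean square of a list of samples (discrete approximation of the integral)."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

fs = 48_000            # assumed sampling rate (Hz)
f = 440.0              # test tone (Hz)
n = fs                 # T_D = 1 s, i.e. 440 full periods
sine = [math.sin(2 * math.pi * f * k / fs) for k in range(n)]
with_dc = [v + 0.5 for v in sine]   # the same sine plus a DC offset of 0.5

print(round(rms(sine), 4))      # close to 1/sqrt(2) = 0.7071
print(round(rms(with_dc), 4))   # close to sqrt(0.5 + 0.25) = 0.866: the DC leaks in
```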
Sound Pressure Level (SPL)
• Softest audible sound intensity: 0.000000000001 watt/m² ($10^{-12}$ W/m²)
• Threshold of pain is around 1 watt/m²
• 12 orders of magnitude difference
• A log scale helps with this
• The decibel (dB) scale is a log scale, with respect to a reference value
The Decibel
• A logarithmic measurement that expresses the magnitude of a physical quantity (e.g. power or intensity) relative to a specified reference level.
• Since it expresses a ratio of two quantities with the same unit, it is dimensionless.
$L - L_{\text{ref}} = 10\log_{10}\frac{I}{I_{\text{ref}}} = 20\log_{10}\frac{p}{p_{\text{ref}}}$
(the factor is 20 for pressure because intensity is proportional to the square of pressure)
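A small sketch of the two forms of the definition; the helper names are hypothetical. With the softest audible intensity ($10^{-12}$ W/m²) as $I_{\text{ref}}$, the threshold of pain (1 W/m²) comes out at 120 dB, which is the 12 orders of magnitude expressed in decibels.

```python
import math

def db_from_intensity(intensity, ref):
    """Level in dB for power-like quantities (intensity, power): 10*log10(I/I_ref)."""
    return 10 * math.log10(intensity / ref)

def db_from_pressure(pressure, ref):
    """Level in dB for amplitude-like quantities (pressure, voltage): 20*log10(p/p_ref)."""
    return 20 * math.log10(pressure / ref)

I_REF = 1e-12   # softest audible sound intensity, W/m^2

print(db_from_intensity(1.0, I_REF))   # threshold of pain: about 120 dB
print(db_from_intensity(2.0, 1.0))     # doubling intensity: about +3.01 dB
print(db_from_pressure(2.0, 1.0))      # doubling pressure (4x intensity): about +6.02 dB
```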
Lots of references!
• dB SPL – A measure of sound pressure level. 0 dB SPL is approximately the quietest sound a human can hear, roughly the sound of a mosquito flying 3 meters away.
• dBFS – relative to digital full scale. 0 dBFS is the maximum allowable signal, so values are typically negative.
• dBV – relative to 1 Volt RMS. 0 dBV = 1 V.
• dBu – relative to 0.775 Volts RMS with an unloaded, open circuit.
• dBmV – relative to 1 millivolt across 75 Ω. Widely used in cable television networks.
• ……
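Expressing one RMS voltage in several of these units only changes the reference in $20\log_{10}(v/v_{\text{ref}})$. The sketch below uses the voltage references listed above; the 20 µPa reference for dB SPL (the standard reference pressure in air) is not stated on the slide and is added here as an assumption.

```python
import math

# Reference values (RMS) for the units listed above.
DBV_REF = 1.0       # dBV: 1 V
DBU_REF = 0.775     # dBu: 0.775 V
DBMV_REF = 1e-3     # dBmV: 1 mV (across 75 ohms in cable TV systems)
DB_SPL_REF = 20e-6  # dB SPL: 20 micropascals (assumption: standard reference in air)

def level_db(value_rms, ref):
    """Amplitude ratio in dB: 20*log10(value / ref)."""
    return 20 * math.log10(value_rms / ref)

v = 1.0  # a 1 V RMS signal
print(f"{level_db(v, DBV_REF):.2f} dBV")    # 0.00 dBV
print(f"{level_db(v, DBU_REF):.2f} dBu")    # 2.21 dBu
print(f"{level_db(v, DBMV_REF):.2f} dBmV")  # 60.00 dBmV
print(f"{level_db(1.0, DB_SPL_REF):.1f} dB SPL for 1 Pa RMS")  # 94.0
```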
Typical Values
• Jet engine at 3 m: 140 dB SPL
• Pain threshold: 130 dB SPL
• Loud motorcycle, 5 m: 110 dB SPL
• Vacuum cleaner: 80 dB SPL
• Quiet restaurant: 50 dB SPL
• Rustling leaves: 20 dB SPL
• Human breathing, 3 m: 10 dB SPL
• Hearing threshold: 0 dB SPL
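Digital Sampling
[Figure: a waveform (AMPLITUDE vs. TIME) sampled once per sample interval and quantized to levels one quantization increment apart, each level labeled with a 3-bit binary code; the RECONSTRUCTION from the quantized samples is overlaid]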
More quantization levels = more dynamic range
[Figure: a waveform (AMPLITUDE vs. TIME) sampled once per sample interval and quantized with 4-bit codes (e.g. 0000, 0001, 0010), giving more levels with a smaller quantization increment]
Bit Depth and Dynamics
• More bits = more quantization levels = better sound
• Compact Disc: 16 bits = 65,536 levels
• POTS (plain old telephone service): 8 bits = 256 levels
• Signal-to-quantization-noise ratio (SQNR) for a Q-bit quantizer, if the signal is uniformly distributed over the whole range:
$\text{SQNR} = 20\log_{10}\left(2^{Q}\right) \approx 6.02\,Q\ \text{dB}$
– E.g. a 16-bit depth gives about 96 dB SQNR.
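A sketch that checks the 6.02 dB-per-bit rule empirically. A uniform mid-rise quantizer (an assumed design; the slides do not specify one) is applied to a signal uniformly distributed over the full range [-1, 1), and the measured SQNR is compared with $20\log_{10}(2^Q)$. The function names, sample count, and seed are illustrative.

```python
import math
import random

def quantize(x, bits):
    """Uniform mid-rise quantizer for x in [-1, 1): returns the reconstructed value."""
    levels = 2 ** bits
    step = 2.0 / levels                               # quantization increment
    index = min(int((x + 1.0) / step), levels - 1)    # which of the 2**bits cells
    return -1.0 + (index + 0.5) * step                # centre of that cell

def measured_sqnr_db(bits, n=100_000, seed=1):
    rng = random.Random(seed)
    signal = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # uniform over the full range
    noise = [quantize(s, bits) - s for s in signal]       # quantization error
    p_signal = sum(s * s for s in signal) / n
    p_noise = sum(e * e for e in noise) / n
    return 10 * math.log10(p_signal / p_noise)

def theoretical_sqnr_db(bits):
    return 20 * math.log10(2 ** bits)                 # about 6.02 dB per bit

for b in (8, 16):
    print(b, round(theoretical_sqnr_db(b), 2), round(measured_sqnr_db(b), 2))
```

For this input the signal power is $1/3$ and the error power is $\Delta^2/12$ with $\Delta = 2/2^Q$, so their ratio is exactly $4^Q$, which is where the formula comes from.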
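$\text{RMS} = \sqrt{\frac{1}{N}\sum_{n=0}^{N-1} x^2[n]}$
[Figure: a sine wave, amplitude from -1 to 1 over time 0 to 6 ms, with its samples marked as red dots]
The red dots form the discrete signal $x[n]$.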
Aliasing and Nyquist
[Figure: a sinusoid (AMPLITUDE vs. TIME) sampled once per sample interval]
Nyquist-Shannon Sampling Theorem
• You can't reproduce the signal if your sample rate isn't faster than twice the highest frequency in the signal.
• Nyquist rate: twice the highest frequency in the signal.
– A property of the continuous-time signal.
• Nyquist frequency: half of the sampling rate.
– A property of the discrete-time system.
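To make the theorem concrete: when the sampling rate is not more than twice a tone's frequency, a different, lower-frequency tone produces exactly the same samples, so the samples alone cannot tell the two apart. The sketch uses cosines at 1800 Hz and 200 Hz sampled at 2000 Hz (the same numbers as the aliasing example later in this topic); cosines are chosen so that the phases match without adjustment.

```python
import math

def sample_cosine(freq, fs, n_samples):
    """x[n] = cos(2*pi*freq*n/fs) for n = 0 .. n_samples-1."""
    return [math.cos(2 * math.pi * freq * n / fs) for n in range(n_samples)]

fs = 2000                              # sampling rate, Hz (Nyquist frequency = 1000 Hz)
high = sample_cosine(1800, fs, 32)     # above the Nyquist frequency
low = sample_cosine(200, fs, 32)       # its alias: fs - 1800 = 200 Hz

max_diff = max(abs(a - b) for a, b in zip(high, low))
print(max_diff < 1e-9)   # True: the two sets of samples are indistinguishable

fs_ok = 8000                           # faster than 2 * 1800 Hz, so no aliasing
diff_ok = max(abs(a - b) for a, b in zip(sample_cosine(1800, fs_ok, 32),
                                         sample_cosine(200, fs_ok, 32)))
print(diff_ok > 0.5)     # True: now the samples differ clearly
```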
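Discrete-Time Fourier Transform (DTFT)
$X(\omega) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$
[Figure: left, the sampled sine wave (amplitude -1 to 1 over time); right, its amplitude spectrum over angular frequency $\omega$, marked from $-2\pi$ to $2\pi$]
The red dots form the discrete signal $x[n]$, where $n = 0, \pm 1, \pm 2, \ldots$
$X(\omega)$ is periodic with period $2\pi$. We often only show $[-\pi, \pi)$.
$\omega$ is a continuous variable.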
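A direct evaluation of the DTFT sum for a short, finite signal (all other samples are taken as zero), showing numerically that $X(\omega)$ repeats every $2\pi$. The example signal is arbitrary.

```python
import cmath
import math

def dtft(x, omega):
    """X(omega) = sum_n x[n] e^{-j omega n} for a finite signal x[0..N-1]."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(x))

x = [0.5, 1.0, -0.25, 0.75]          # a short, arbitrary discrete signal
for w in (0.0, 0.7, math.pi / 3, 2.5):
    # X(omega) and X(omega + 2*pi) coincide: the DTFT is 2*pi-periodic
    assert abs(dtft(x, w) - dtft(x, w + 2 * math.pi)) < 1e-12

# at omega = 0 the DTFT is simply the sum of the samples
print(abs(dtft(x, 0.0)))
```

Because $e^{-j(\omega + 2\pi)n} = e^{-j\omega n}$ for every integer $n$, the periodicity holds term by term; this is why only one $2\pi$-wide interval needs to be shown.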
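Relation between FT and DTFT
Sampling: $x[n] = x_c(nT)$
FT: $X_c(\Omega) = \int_{-\infty}^{\infty} x_c(t)\, e^{-j\Omega t}\, dt$
DTFT: $X(\omega) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$
$X(\omega) = \frac{1}{T}\sum_{k=-\infty}^{\infty} X_c\!\left(\frac{\omega + 2\pi k}{T}\right)$
• Scaling: $\omega = \Omega T$, i.e. $\omega = 2\pi$ corresponds to $\Omega = 2\pi/T = 2\pi f_s$, which corresponds to $f = f_s$.
• Repetition: $X(\omega)$ contains infinite copies of $X_c(\Omega)$, spaced by $2\pi$.
[Figure: a continuous sine wave and its samples, amplitude -1 to 1 over time (ms)]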
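To see the scaling $\omega = \Omega T$ numerically, the sketch samples a 440 Hz cosine at an assumed $f_s = 8000$ Hz, evaluates $|X(\omega)|$ on a grid over $[0, \pi]$, and compares the peak location with $2\pi f / f_s$. The 400-sample length and the grid step of 0.002 rad are arbitrary choices.

```python
import cmath
import math

def dtft_mag(x, omega):
    """|X(omega)| for a finite signal x[0..N-1]."""
    return abs(sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(x)))

fs = 8000.0                    # assumed sampling rate f_s = 1/T (Hz)
f = 440.0                      # continuous-time frequency (Hz)
T = 1.0 / fs
x = [math.cos(2 * math.pi * f * n * T) for n in range(400)]   # x[n] = x_c(nT)

grid = [k * 0.002 for k in range(int(math.pi / 0.002) + 1)]   # omega in [0, pi]
peak = max(grid, key=lambda w: dtft_mag(x, w))
expected = 2 * math.pi * f * T          # omega = Omega * T = 2*pi*f/f_s
print(round(peak, 3), round(expected, 4))
```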
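Aliasing
[Figure: the spectrum $X_c(\Omega)$ of the complex tone 900 Hz + 1800 Hz, with components at $\Omega = \pm 1800\pi$ and $\pm 3600\pi$ rad/s, and the DTFT $X(\omega)$ for two sampling rates. Sampling rate = 8000 Hz: the components land at $\omega = \pm 1800\pi/8000$ and $\pm 3600\pi/8000$, inside $[-\pi, \pi]$, so there is no aliasing. Sampling rate = 2000 Hz: the 900 Hz component lands at $\omega = 1800\pi/2000$, but the 1800 Hz component lands at $\omega = 3600\pi/2000 = 1.8\pi$, beyond $\pi$, and shows up as a 200 Hz component]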
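The folding in the figure can be computed directly: shift the frequency by multiples of $f_s$ (the $2\pi$ repetition of the DTFT) and reflect it into $[0, f_s/2]$ (the symmetry of real signals). A minimal sketch; the function name is hypothetical.

```python
def apparent_frequency(f, fs):
    """Frequency (Hz, in [0, fs/2]) at which a real tone at f Hz appears after sampling at fs Hz."""
    f_mod = f % fs                 # the spectrum repeats every fs (every 2*pi in omega)
    return min(f_mod, fs - f_mod)  # fold into [0, fs/2] (real signals are symmetric)

for fs in (8000, 2000):
    print(fs, [apparent_frequency(f, fs) for f in (900, 1800)])
# 8000 [900, 1800]
# 2000 [900, 200]
```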
Fourier Series
• FT and DTFT do not require the signal to be periodic, i.e. the signal may contain arbitrary frequencies, which is why the frequency domain is continuous.
• Now, if the signal is periodic: $x(t + T) = x(t),\ \forall t \in \mathbb{R}$
• It can be reproduced by a series of sine and cosine functions:
$x(t) = a_0 + \sum_{k=1}^{\infty}\left[a_k\cos(k\Omega_0 t) + b_k\sin(k\Omega_0 t)\right]$, with $\Omega_0 = 2\pi/T$
• In other words, the frequency domain is discrete.
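A sketch that estimates the coefficients numerically over one period of a 100 Hz square wave (an example signal chosen here), using $a_0 = \frac{1}{T}\int_0^T x(t)\,dt$, $a_k = \frac{2}{T}\int_0^T x(t)\cos(k\Omega_0 t)\,dt$ and $b_k = \frac{2}{T}\int_0^T x(t)\sin(k\Omega_0 t)\,dt$ with $\Omega_0 = 2\pi/T$, evaluated by the midpoint rule. For this square wave only the odd sine terms survive, with $b_k = 4/(\pi k)$.

```python
import math

def fourier_series_coeffs(x, period, k_max, n_points=20_000):
    """Estimate a_0, [a_1..a_kmax], [b_1..b_kmax] of a periodic x(t) with the midpoint rule."""
    omega0 = 2 * math.pi / period
    dt = period / n_points
    ts = [(i + 0.5) * dt for i in range(n_points)]   # midpoints of one period
    xs = [x(t) for t in ts]
    a0 = sum(xs) * dt / period
    a = [2 / period * sum(v * math.cos(k * omega0 * t) for v, t in zip(xs, ts)) * dt
         for k in range(1, k_max + 1)]
    b = [2 / period * sum(v * math.sin(k * omega0 * t) for v, t in zip(xs, ts)) * dt
         for k in range(1, k_max + 1)]
    return a0, a, b

T = 0.01  # period of a 100 Hz square wave

def square(t):
    """+1 during the first half of each period, -1 during the second half."""
    return 1.0 if (t % T) < T / 2 else -1.0

a0, a, b = fourier_series_coeffs(square, T, k_max=5)
print([round(v, 4) for v in b])   # odd k: close to 4/(pi*k); even k: close to 0
```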
Discrete Fourier Transform (DFT)
• FT and DTFT are great, but the infinite integrals or summations are hard to deal with.
• In digital computers, everything is discrete, including both the signal and its spectrum:
$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
where $k$ is the frequency-domain index, $n$ the time-domain index, and $N$ the length of the signal, i.e. the length of the DFT.
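The DFT sum can be coded directly in $O(N^2)$ operations. The sketch below applies it to a cosine that completes exactly 2 cycles in N = 8 samples (an illustrative choice).

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_{n=0}^{N-1} x[n] e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 8
x = [math.cos(2 * math.pi * 2 * n / N) for n in range(N)]   # exactly 2 cycles in N samples
X = dft(x)
print([round(abs(v), 6) for v in X])
```

The energy appears at k = 2 and, because the signal is real, at its mirror image k = N - 2 = 6, each with magnitude N/2 = 4; all other bins are zero up to rounding.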
DFT and IDFT
DFT: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
IDFT: $x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\, e^{j2\pi kn/N}$
• Both $x[n]$ and $X[k]$ are discrete and of length $N$.
• Treats $x[n]$ as if it were infinite and periodic.
• Treats $X[k]$ as if it were infinite and periodic.
• Only one period is involved in calculation.
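A round-trip check of the pair above, with direct $O(N^2)$ implementations for illustration only: the IDFT of the DFT returns the original samples, and evaluating the IDFT sum at $n + N$ returns $x[n]$ again, which is the periodic extension the DFT implicitly assumes.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x[n] = (1/N) sum_{k=0}^{N-1} X[k] e^{+j 2 pi k n / N}."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

x = [0.2, -1.0, 0.5, 3.0, 0.0, 1.5]
X = dft(x)
x_back = idft(X)
assert all(abs(a - b) < 1e-12 for a, b in zip(x, x_back))

# Evaluating the IDFT formula at n + N gives back x[n]: the signal is treated as periodic.
N = len(x)
x_shifted = sum(X[k] * cmath.exp(2j * math.pi * k * (1 + N) / N) for k in range(N)) / N
print(abs(x_shifted - x[1]) < 1e-12)  # True
```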
Discrete Fourier Transform
• If the time-domain signal has no imaginary part (like an audio signal), then the frequency-domain signal is conjugate symmetric around N/2.
[Figure: the time-domain signal $x[n]$ (real and imaginary portions, n = 0 to N-1) is mapped by the DFT to the frequency-domain signal $X[k]$ (real and imaginary portions, k = 0 to N-1), and back by the IDFT; in $X[k]$, k = 0 corresponds to DC and k = N/2 to $f_s/2$]
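A numerical check of the symmetry for a random real signal (the length and the seed are arbitrary): $X[N-k] = X^*[k]$, so bins 0 (DC) and N/2 ($f_s/2$) are real, and only bins 0 to N/2 carry independent information.

```python
import cmath
import math
import random

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

rng = random.Random(0)
N = 16
x = [rng.uniform(-1, 1) for _ in range(N)]     # a real-valued "audio" signal
X = dft(x)

# Conjugate symmetry around N/2: X[N - k] == conj(X[k]) for k = 1 .. N-1
assert all(abs(X[N - k] - X[k].conjugate()) < 1e-12 for k in range(1, N))
# Bin 0 (DC) and bin N/2 (f_s/2) are therefore real
print(abs(X[0].imag) < 1e-12, abs(X[N // 2].imag) < 1e-12)   # True True
```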
Kinds of Fourier Transforms
Fourier Transform
Signals: continuous, aperiodic
Spectrum: aperiodic, continuous
Fourier Series
Signals: continuous, periodic
Spectrum: aperiodic, discrete
Discrete Time Fourier Transform
Signals: discrete, aperiodic
Spectrum: periodic, continuous
Discrete Fourier Transform
Signals: discrete, periodic
Spectrum: periodic, discrete
The FFT
• Fast Fourier Transform
– A much, much faster way to do the DFT
– Introduced by Carl F. Gauss in 1805
– Rediscovered by J.W. Cooley and John Tukey in 1965
– The Cooley-Tukey algorithm is the one we use today (mostly)
– Big O notation for this is O(N log N)
– Matlab functions fft and ifft are standard.
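A minimal recursive radix-2 Cooley-Tukey sketch for lengths that are powers of two, checked against the direct DFT. It is for illustration only; in Python, NumPy's `numpy.fft.fft` and `numpy.fft.ifft` play the same role as Matlab's fft and ifft. Splitting into even- and odd-indexed halves at every level gives $\log_2 N$ levels of $O(N)$ work, hence $O(N \log N)$.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT. len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])          # N/2-point DFT of the even-indexed samples
    odd = fft(x[1::2])           # N/2-point DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle            # butterfly: first half of the spectrum
        out[k + N // 2] = even[k] - twiddle   # second half reuses the same product
    return out

def dft(x):
    """Direct O(N^2) DFT, used only to check the FFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [math.sin(0.3 * n) + 0.5 for n in range(64)]
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))) < 1e-9)   # True
```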
Windowing
• A function that is zero-valued outside of some chosen interval.
– When a signal (data) is multiplied by a window function, the product is zero-valued outside the interval: all that is left is the "view" through the window.
[Figure: x[n] × w[n] = z[n]. Example: windowing x[n] with a rectangular window]
Some famous windows
• Rectangular: $w[n] = 1$
• Triangular (Bartlett): $w[n] = \frac{2}{N-1}\left(\frac{N-1}{2} - \left|n - \frac{N-1}{2}\right|\right)$
• Hann: $w[n] = 0.5\left(1 - \cos\frac{2\pi n}{N-1}\right)$
Note: we assume w[n] = 0 outside the range 0 ≤ n ≤ N-1.
[Figure: each window plotted as amplitude vs. sample]
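The three windows above, plus the Hamming window, as a minimal sketch (the function names are illustrative, and N ≥ 2 is assumed). `apply_window` performs the windowing $z[n] = x[n]\,w[n]$; samples outside the window are simply not produced.

```python
import math

def rectangular(N):
    return [1.0] * N

def bartlett(N):
    """Triangular window: w[n] = 2/(N-1) * ((N-1)/2 - |n - (N-1)/2|)."""
    M = N - 1
    return [2 / M * (M / 2 - abs(n - M / 2)) for n in range(N)]

def hann(N):
    return [0.5 * (1 - math.cos(2 * math.pi * n / (N - 1))) for n in range(N)]

def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def apply_window(x, w):
    """z[n] = x[n] * w[n] for 0 <= n <= N-1 (the window is zero elsewhere)."""
    return [xn * wn for xn, wn in zip(x, w)]

print([round(v, 3) for v in hann(5)])      # [0.0, 0.5, 1.0, 0.5, 0.0]
print([round(v, 3) for v in bartlett(5)])  # [0.0, 0.5, 1.0, 0.5, 0.0]
print([round(v, 3) for v in hamming(5)])   # [0.08, 0.54, 1.0, 0.54, 0.08]
```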
Why window shape matters
• Don't forget that a DFT assumes the signal in the window is periodic.
• The boundary conditions mess things up… unless you manage to have a window whose length is exactly an integer number of periods of your signal.
• Making the edges of the window less prominent helps suppress undesirable artifacts.
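A sketch of spectral leakage measured with a direct DFT: the share of positive-frequency energy that lands more than 3 bins away from the true frequency. The 3-bin guard, N = 64, and the test frequencies are arbitrary choices. A sine with a whole number of periods in the window shows essentially no leakage; with half a period left over, the rectangular window leaks noticeably, and a Hann window, which tapers the edges, reduces it sharply.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def far_leakage(cycles, window, guard=3):
    """Share of positive-frequency energy more than `guard` bins away from the true
    frequency, for a sine with `cycles` periods inside the window."""
    N = len(window)
    x = [window[n] * math.sin(2 * math.pi * cycles * n / N) for n in range(N)]
    energy = [abs(v) ** 2 for v in dft(x)[: N // 2 + 1]]
    far = sum(e for k, e in enumerate(energy) if abs(k - cycles) > guard)
    return far / sum(energy)

N = 64
rect = [1.0] * N
hann = [0.5 * (1 - math.cos(2 * math.pi * n / (N - 1))) for n in range(N)]

print(far_leakage(8.0, rect))   # whole number of periods: essentially 0
print(far_leakage(8.5, rect))   # not periodic in the window: a few percent leaks far away
print(far_leakage(8.5, hann))   # tapered edges: far leakage drops sharply
```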
Fourier Transform of Windows
[Figure: amplitude (dB) vs. normalized angular frequency of a window's spectrum, with the main lobe and the sidelobes labeled]
We want:
• Narrow main lobe
• Low sidelobes
Which window is better?
Hann window: $w[n] = 0.5\left(1 - \cos\frac{2\pi n}{N-1}\right)$
Hamming window: $w[n] = 0.54 - 0.46 \times \cos\frac{2\pi n}{N-1}$
[Figure: amplitude (dB) vs. normalized angular frequency of the two windows' spectra]
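The sidelobe comparison can be reproduced by taking a heavily zero-padded DFT of each window and measuring the highest sidelobe relative to the main-lobe peak. A sketch under the assumptions N = 64 and 16 times zero padding; commonly quoted values are roughly -13 dB (rectangular), -31 dB (Hann) and -43 dB (Hamming), and the printed numbers should come out close to these.

```python
import cmath
import math

def highest_sidelobe_db(window, pad_factor=16):
    """Highest sidelobe level (dB, relative to the main-lobe peak) of a window's
    spectrum, estimated from a zero-padded DFT over frequencies 0 .. pi."""
    N = len(window)
    M = N * pad_factor                        # zero padding samples the spectrum finely
    mags = [abs(sum(w * cmath.exp(-2j * math.pi * k * n / M) for n, w in enumerate(window)))
            for k in range(M // 2 + 1)]
    k = 0
    while k + 1 < len(mags) and mags[k + 1] <= mags[k]:   # walk down the main lobe
        k += 1                                             # to its first null
    return 20 * math.log10(max(mags[k:]) / mags[0])

N = 64
windows = {
    "rectangular": [1.0] * N,
    "hann": [0.5 * (1 - math.cos(2 * math.pi * n / (N - 1))) for n in range(N)],
    "hamming": [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)],
}
levels = {name: highest_sidelobe_db(w) for name, w in windows.items()}
for name, level in levels.items():
    print(f"{name:12s} {level:7.1f} dB")
```

The lower sidelobes are not free: the Hann and Hamming main lobes are about twice as wide as the rectangular one, which is the tradeoff behind the question on this slide.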
Multiplication vs. Convolution
Time domain ↔ Frequency domain:
$x[n]\, w[n] \;\leftrightarrow\; \frac{1}{N}\, X[k] \circledast W[k]$
$x[n] \circledast w[n] \;\leftrightarrow\; X[k]\, W[k]$
• Windowing is multiplication in the time domain, so the spectrum will be a convolution between the signal's spectrum and the window's spectrum.
• Convolution in the time domain takes $O(N^2)$, but if we perform it in the frequency domain…
– FFT takes $O(N \log N)$
– Multiplication takes $O(N)$
– IFFT takes $O(N \log N)$
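A sketch of the frequency-domain route for linear convolution: FFT both inputs, multiply pointwise, inverse FFT. Both sequences are zero-padded to a power of two at least len(a) + len(b) - 1 long, so that the circular convolution computed by the DFT equals the ordinary linear convolution; a direct $O(N^2)$ version is included for comparison. The recursive radix-2 FFT here is for illustration only.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(sign * 2j * math.pi * k / N) * odd[k]
        out[k], out[k + N // 2] = even[k] + t, even[k] - t
    return out

def fft_convolve(a, b):
    """Linear convolution via FFT -> pointwise multiply -> IFFT, O(N log N)."""
    n_out = len(a) + len(b) - 1
    size = 1
    while size < n_out:          # pad to a power of two >= n_out so the
        size *= 2                # circular convolution equals the linear one
    A = fft(list(a) + [0.0] * (size - len(a)))
    B = fft(list(b) + [0.0] * (size - len(b)))
    y = fft([p * q for p, q in zip(A, B)], inverse=True)
    return [v.real / size for v in y[:n_out]]   # 1/N normalization of the IFFT

def direct_convolve(a, b):
    """Textbook O(N^2) linear convolution."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

print([round(v, 6) for v in fft_convolve([1, 2, 3], [1, 1])])   # [1.0, 3.0, 5.0, 3.0]
```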
Windowed Signal
[Figure: two waveform panels, amplitude vs. sample index (0 to 400), and the spectrum of the windowed signal, amplitude (dB) vs. frequency (Hz, 0 to 5000)]
• Two sinusoids: 1000 Hz + 1500 Hz
• Sampling rate: 10 kHz
• Window length: 100 samples (i.e. 100/10 kHz = 0.01 s)
• FFT length: 400 (i.e. 4 times zero padding)
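A sketch that recomputes the setup above with a direct DFT and reports the two strongest spectral peaks. The slide does not say which window was used; a rectangular window is assumed here, and the two sinusoids are assumed to be unit-amplitude, zero-phase cosines.

```python
import cmath
import math

fs = 10_000          # sampling rate (Hz)
L = 100              # window length (samples), i.e. 0.01 s
n_fft = 400          # FFT length: 4 times zero padding

# Two sinusoids at 1000 Hz and 1500 Hz, cut out with a rectangular window (assumption).
x = [math.cos(2 * math.pi * 1000 * n / fs) + math.cos(2 * math.pi * 1500 * n / fs)
     for n in range(L)]
padded = x + [0.0] * (n_fft - L)

spectrum = [abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(padded)))
            for k in range(n_fft // 2 + 1)]
freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]   # bin spacing fs/n_fft = 25 Hz

# local maxima of the magnitude spectrum, strongest first
peaks = sorted((k for k in range(1, n_fft // 2)
                if spectrum[k - 1] < spectrum[k] >= spectrum[k + 1]),
               key=lambda k: spectrum[k], reverse=True)
print(sorted(freqs[k] for k in peaks[:2]))
```

The 25 Hz bin spacing comes from the 400-point FFT; the width of each peak, however, is set by the 100-sample window.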
Zero Padding
• Add zeros after (or before) the signal to make it longer
• Perform DFT on the padded signal
[Figure: a windowed signal followed by padded zeros, amplitude vs. sample index (0 to 1600)]
Why Zero Padding?
• Zero padding in the time domain gives ideal interpolation in the frequency domain: the padded DFT samples the DTFT of the windowed signal more densely.
• It doesn't increase (the real) frequency resolution!
– 4 times is generally enough
– Here the resolution is always $f_s/L = 100$ Hz, where $L$ is the window length
[Figure: spectra of the windowed signal, amplitude (dB) vs. frequency (Hz, 0 to 5000), with no zero padding, 4 times zero padding, and 8 times zero padding]
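A numerical check of both points, using the two-sinusoid example (1000 Hz + 1500 Hz at 10 kHz) with an assumed rectangular window of L = 100 samples: every fourth bin of the 4-times zero-padded DFT coincides with a bin of the unpadded DFT, so padding only fills in points between the existing bins, while the resolution set by the window length stays $f_s/L = 100$ Hz.

```python
import cmath
import math

def dft(x, n_fft=None):
    """DFT of x, zero-padded with trailing zeros to n_fft points."""
    n_fft = n_fft or len(x)
    x = list(x) + [0.0] * (n_fft - len(x))
    return [sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(x))
            for k in range(n_fft)]

fs, L = 10_000, 100
x = [math.cos(2 * math.pi * 1000 * n / fs) + math.cos(2 * math.pi * 1500 * n / fs)
     for n in range(L)]

X = dft(x)                 # no zero padding: bins every fs/L = 100 Hz
X4 = dft(x, 4 * L)         # 4 times zero padding: bins every 25 Hz

# Every 4th padded bin is exactly an original bin; the padding only adds
# interpolated points in between and does not sharpen the peaks.
max_mismatch = max(abs(X4[4 * k] - X[k]) for k in range(L))
print(max_mismatch)        # tiny (floating-point error only)
print(fs / L, "Hz resolution in both cases")
```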
How to increase frequency resolution?
• Time-frequency resolution tradeoff
$\Delta t \cdot \Delta f = 1$ (seconds × Hz)
[Figure: spectra, amplitude (dB) vs. frequency (Hz, 0 to 5000), for window lengths of 10 ms, 20 ms, and 40 ms]
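For the three window lengths in the figure, treating the window length as $\Delta t$, the relation gives the frequency resolution directly (the 10 ms case matches the $f_s/L = 100$ Hz of the earlier example, where L = 100 samples at 10 kHz):

```latex
\Delta f = \frac{1}{\Delta t}:\qquad
\Delta t = 10\,\text{ms} \Rightarrow \Delta f = 100\,\text{Hz},\quad
\Delta t = 20\,\text{ms} \Rightarrow \Delta f = 50\,\text{Hz},\quad
\Delta t = 40\,\text{ms} \Rightarrow \Delta f = 25\,\text{Hz}.
```

Each doubling of the window halves $\Delta f$, but it also smears any change in the signal over twice as much time.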
Short-Time Fourier Transform
• Break signal into windows
• Calculate DFT of each window
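A minimal STFT sketch with a direct DFT per frame. The Hann default window, the hop size, and the optional `n_fft` zero padding are assumptions of this sketch; libraries such as SciPy provide ready-made versions (`scipy.signal.stft`). The test signal jumps from 500 Hz to 1500 Hz halfway through, and the strongest bin of each frame follows the jump.

```python
import cmath
import math

def stft(x, win_len, hop, n_fft=None, window=None):
    """Short-time Fourier transform: frames of win_len samples every `hop` samples,
    each multiplied by `window`, zero-padded to n_fft, then transformed with a DFT.
    Returns one list of bins 0 .. n_fft//2 per frame."""
    n_fft = n_fft or win_len
    if window is None:   # default: Hann window (an assumption; any window works)
        window = [0.5 * (1 - math.cos(2 * math.pi * n / (win_len - 1))) for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(win_len)] + [0.0] * (n_fft - win_len)
        frames.append([sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(seg))
                       for k in range(n_fft // 2 + 1)])
    return frames

fs = 8000
# a tone that jumps from 500 Hz to 1500 Hz halfway through (0.1 s in total)
x = [math.sin(2 * math.pi * (500 if n < 400 else 1500) * n / fs) for n in range(800)]
S = stft(x, win_len=80, hop=40, n_fft=160)
peak_freqs = [max(range(len(f)), key=lambda k: abs(f[k])) * fs / 160 for f in S]
print(len(S), peak_freqs)
```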
The Spectrogram
• There is a "spectrogram" function in Matlab, but you can't do zero padding using it.
A Fun Example (Thanks to Robert Remez)
Overlap-Add Synthesis
• IDFT on each spectrum
– Use the complex, full spectrum
– Don't forget the phase (often using the original phase).
– If you do it right, the time signal you get is real.
• Multiply with a synthesis window (e.g. Hamming)
– Do not divide by the analysis window
• Overlap and add different frames together.
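A minimal overlap-add sketch. To make the output equal the input exactly, it assumes a rectangular analysis window and a periodic Hann synthesis window with 50% overlap, because periodic Hann windows spaced N/2 apart sum to exactly 1. This is a swapped-in choice, not necessarily the one used in the course; a periodic Hamming window at 50% overlap sums to a constant 1.08 instead, which would then be divided out. Since the full spectrum (with its original phase) is inverted, the imaginary part of each IDFT frame is zero up to rounding.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def analysis_synthesis(x, N=32, hop=16):
    """Frame-by-frame DFT/IDFT with overlap-add resynthesis. Analysis window:
    rectangular. Synthesis window: periodic Hann (sums to 1 at 50% overlap)."""
    syn = [0.5 * (1 - math.cos(2 * math.pi * n / N)) for n in range(N)]
    y = [0.0] * len(x)
    max_imag = 0.0
    for start in range(0, len(x) - N + 1, hop):
        spectrum = dft(x[start:start + N])       # full complex spectrum, phase included
        # (spectral modifications would go here)
        frame = idft(spectrum)                    # IDFT of the full spectrum
        max_imag = max(max_imag, max(abs(v.imag) for v in frame))
        for n in range(N):                        # synthesis window, then overlap-add
            y[start + n] += frame[n].real * syn[n]
    return y, max_imag

x = [math.sin(0.05 * n) + 0.3 * math.sin(0.31 * n) for n in range(256)]
y, max_imag = analysis_synthesis(x)
# Away from the first and last half-frame, two frames always overlap and y == x.
err = max(abs(a - b) for a, b in zip(x[16:240], y[16:240]))
print(max_imag < 1e-9, err < 1e-9)   # True True
```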
Shepard Tones
[Figure panels: Continuous Risset scale; Barber's pole]
Shepard Tones
• Make a sound composed of sine waves spaced at octave intervals.
• Control their amplitudes by imposing a Gaussian (or something like it) filter in the (log) frequency dimension.
• Move all the sine waves up a musical ½ step.
• Wrap around in frequency.
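A sketch of the recipe above. The lowest frequency (27.5 Hz), the number of octaves (8), the Gaussian width (1.5 octaves), and the function names are hypothetical choices. Each step raises every component by a half step; components that pass the top octave wrap around to the bottom, where the Gaussian envelope keeps them nearly silent, so after 12 steps the spectrum is exactly where it started.

```python
import math

F_MIN = 27.5        # lowest component frequency in Hz (hypothetical choice)
N_OCTAVES = 8       # number of octave-spaced components (hypothetical choice)
SIGMA = 1.5         # width of the Gaussian envelope, in octaves (hypothetical choice)

def shepard_components(step):
    """(frequency, amplitude) pairs of a Shepard tone shifted up by `step` half steps.
    Components are one octave apart and wrap around after N_OCTAVES octaves; a Gaussian
    over log2-frequency fades them in at the bottom and out at the top."""
    center = N_OCTAVES / 2
    comps = []
    for i in range(N_OCTAVES):
        pos = (i + step / 12) % N_OCTAVES          # octaves above F_MIN, wrapped around
        freq = F_MIN * 2 ** pos
        amp = math.exp(-0.5 * ((pos - center) / SIGMA) ** 2)
        comps.append((freq, amp))
    return sorted(comps)

def shepard_tone(step, fs=16_000, duration=0.25):
    """Samples of one Shepard tone: the sum of its weighted sine components."""
    comps = shepard_components(step)
    return [sum(a * math.sin(2 * math.pi * f * n / fs) for f, a in comps)
            for n in range(int(round(fs * duration)))]

# Twelve half steps bring the spectrum back to where it started,
# so a sequence of steps can keep "rising" forever.
print(shepard_components(12) == shepard_components(0))   # True
```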