01 - Docjava.com

Artificial Intelligence and Robotics

17 Nov 2013




Professor Douglas Lyon


Lyon@docjava.com


Fairfield University


http://www.docjava.com

Voice and Signal Processing

Two Course Texts!


Java for Programmers


Available from:


http://www.docjava.com

Java Digital Signal Processing


Java for Programmers


Available from:


http://www.docjava.com

Grading


Midterm: 1/3


Homework: 1/3


Final: 1/3


Midterm and Final


Take home!

Email


Please send me an e-mail asking to be placed on the CR310 List


E-mail: lyon@docjava.com

Pre-reqs


You should have CS232 and MA 172


OR permission of the instructor


You need a working knowledge of Java!

What do I need to learn this?


Basic multimedia programming


It helps implement interesting programs


It enables active learning


It requires a good background in Java
programming

Preliminary Java Topics


exceptions (ch 11)


nested reference data types (ch 12)


threads (ch 13)

Preliminary IO Topics


files (ch 14)


streams (ch 15)


readers (ch 16)


writers (ch 17)

Preliminary GUI Topics


Swing (ch 18)


Events (ch 19)

What is Voice and Signal
Processing?


1D data processing


input sound


output sound


Time-varying functions are used as both input and output.

What is Digital Signal Processing?


A kind of data processing.


Typically numeric data processing


Look at the kind and DIMENSION of data.


1D in, 1D out -> DSP.


2D in, 2D out -> Image Processing


2D in, symbols out -> computer vision


3D in, 2D out -> computer graphics

What are some DSP examples?


If the input is images and the output is images we
call it
image processing


If the input is images and the output is symbols we
call it pattern recognition or machine vision


If the input is text and the output is voice we call it
voice synthesis.


If the input is voice and the output is text we call it
voice recognition


If the input is images and geometry and the output is
images we call it image warping


What are some 1D DSP
applications?


Analysis


weak variables -> strong variables


Synthesis


Strong variables -> weak variables


What are some kinds of 1D data?


Any form of energy that can be digitized.


Any source of data (a function in 1D).


Voice data


Sound data


Temperature data


Range, blood pressure, EEG (brain stuff), EKG
(heart stuff), weight, age…..

Non-physical phenomena and DSP


Anything that can produce a digital stream
of data is suitable for DSP


e.g., financial data,


statistical data,


network traffic, etc.

What is Audio?


Pressure wave that moves air.


Human auditory system (ear).


Audio is a sensation.

What is digitization?

A low-pass filter removes high frequencies

ADC samples the signal and quantizes it

The parallel-to-serial converter is a shift register

Sampling and Quantization

Quantization


One part of digitization


input v(t)


output Vq(t)


let N = the number of quantization levels.


Suppose minimum voltage is 0 vdc


Suppose max voltage is 1 vdc


What is the min quantization step?

Computing the quantization step


Quantization step = maximum voltage / total number of steps.


For example, a CD uses 16-bit audio sampling.


N = 2**16 = 65536


Voltage of quantization = 1/65536 ≈ 0.000015


For AU files, N = 2**8 = 256


Voltage of quantization = 1/256 ≈ 0.0039
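
The step computation can be sketched in Java. This is only a minimal illustration under the slide's assumptions (input range 0 to 1 vdc, N = 2**bits levels); the class and method names are made up for this example and do not come from the course texts.

```java
// Sketch: quantization step for an ADC whose input spans 0..maxVolts.
// QuantizationStep and step() are hypothetical names used for illustration.
class QuantizationStep {

    /** Returns maxVolts / N, where N = 2^bits is the number of levels. */
    static double step(double maxVolts, int bits) {
        long levels = 1L << bits;   // N = 2**bits
        return maxVolts / levels;
    }

    public static void main(String[] args) {
        // 16-bit samples (CD) and 8-bit samples (AU) over a 1 V range.
        System.out.println("16-bit step: " + step(1.0, 16) + " V");
        System.out.println(" 8-bit step: " + step(1.0, 8) + " V");
    }
}
```

With a 1 V range the 16-bit step is roughly 15 microvolts and the 8-bit step roughly 3.9 millivolts.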

What is the noise relative to the
signal?


SNR = signal to noise ratio


Log(Signal power / noise power) to base 10 gives the ratio in bels.


The bel is named after Alexander Graham Bell


One tenth of a bel is called the decibel (dB).


10 Log(65536 / (1/65536)) ≈ 96 dB


Usually about 6 dB per bit.
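
As a rough check of the "6 dB per bit" rule, the following sketch treats the SNR of an N-level quantizer as 10 Log(N / (1/N)) = 20 Log(N), which is how the 16-bit figure above is formed. The class name and method are hypothetical, and this simple ratio ignores the finer details of quantization noise.

```java
// Sketch: SNR estimate for an N = 2^bits level quantizer, using
// 10*log10(N / (1/N)) = 20*log10(N). Names are illustrative only.
class QuantizationSnr {

    /** Returns the estimated SNR in dB for the given number of bits. */
    static double snrDb(int bits) {
        double levels = Math.pow(2.0, bits);   // N = 2**bits
        return 20.0 * Math.log10(levels);
    }

    public static void main(String[] args) {
        for (int bits : new int[] {8, 16}) {
            double db = snrDb(bits);
            System.out.printf("%2d bits: %.1f dB (%.2f dB per bit)%n",
                    bits, db, db / bits);
        }
    }
}
```

Each added bit doubles N, so it adds 20 Log(2), about 6.02 dB.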

General Analysis for the ADC

The role of the low-pass filter


anti-aliasing filter


Nyquist frequency = sample freq / 2


only pass freqs below the Nyquist frequency

How do I reconstruct a signal?

[Figure: sample/reconstruction process. Labels: v(t), f_s, Amplifier, low-pass filter, output, R]

Digitizing Voice: PCM

Waveform Encoding


Nyquist Theorem: sample at no less than twice the highest frequency


Voice frequency range: 300-3400 Hz


Sampling frequency = 8000/sec (every 125 us)


Bit rate: (2 x 4 kHz) x 8 bits per sample = 64,000 bits per second (DS-0)


By far the most commonly used method
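
The PCM arithmetic above can be written out as a short Java sketch. The 4 kHz band limit, 8000 samples per second and 8 bits per sample are the slide's figures; the class and method names are invented for this illustration.

```java
// Sketch: PCM sampling rate, sample period and bit rate for voice.
// PcmBitRate and its methods are hypothetical names.
class PcmBitRate {

    /** Minimum sampling rate: twice the highest frequency to be captured. */
    static int nyquistRateHz(int highestFrequencyHz) {
        return 2 * highestFrequencyHz;
    }

    /** Time between samples, in microseconds. */
    static double samplePeriodMicros(int sampleRateHz) {
        return 1_000_000.0 / sampleRateHz;
    }

    /** Bits per second for uncompressed PCM. */
    static int bitRate(int sampleRateHz, int bitsPerSample) {
        return sampleRateHz * bitsPerSample;
    }

    public static void main(String[] args) {
        int rate = nyquistRateHz(4000);                       // 8000 samples/sec
        System.out.println("sample rate:   " + rate + " Hz");
        System.out.println("sample period: " + samplePeriodMicros(rate) + " us");
        System.out.println("bit rate:      " + bitRate(rate, 8) + " bps");
    }
}
```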

[Figure: CODEC producing PCM at 64 kbps = DS-0]

In 1D, DSP Is…


1D Digital signal processing is a kind of
data processing that operates on 1D PCM
data.

O-scope

Harmonics


The fundamental frequency of a sound is said to be the component of strongest magnitude.


Few sounds are just sine waves.


The extra waves in a sound refer to the
harmonic content or timbre.

Harmonic formula


A harmonic is an integer multiple of the fundamental pitch.


If 440 Hz is the 1st harmonic then


880 Hz is the 2nd harmonic


Individual sine waves are called partials.

Harmonic Motion

The frequency of the oscillations is given by







How do I model Spectra?


Suppose the continuous signal is v(t):

$$v(t) = a_0 + (a_1 \cos t + b_1 \sin t) + (a_2 \cos 2t + b_2 \sin 2t) + \cdots$$


Let the Fourier coefficients be denoted:

$$a_0, a_1, b_1, a_2, b_2, \ldots$$

Sawtooth Wave Form

K=10

Model of a Saw Wave

$$f(x) = \frac{2}{\pi} \sum_{n=1}^{K} (-1)^{n+1} \frac{\sin(nx)}{n}$$

Sawwave k=100
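
A partial sum of this kind is easy to evaluate in Java. The sketch below assumes the saw-wave model is f(x) = (2/pi) * sum for n = 1..K of (-1)^(n+1) sin(n x) / n; the class and method names are made up for illustration. Raising K from 10 to 100 adds more partials and brings the sum closer to the ideal ramp x/pi on (-pi, pi).

```java
// Sketch: K-term partial sum of a saw wave,
//   f(x) = (2/pi) * sum_{n=1}^{K} (-1)^(n+1) * sin(n x) / n.
// SawWave and saw() are hypothetical names.
class SawWave {

    /** Evaluates the K-term partial sum at x (radians). */
    static double saw(double x, int k) {
        double sum = 0.0;
        for (int n = 1; n <= k; n++) {
            double sign = (n % 2 == 1) ? 1.0 : -1.0;   // (-1)^(n+1)
            sum += sign * Math.sin(n * x) / n;
        }
        return (2.0 / Math.PI) * sum;
    }

    public static void main(String[] args) {
        double x = 1.0;
        System.out.println("K=10:  " + saw(x, 10));
        System.out.println("K=100: " + saw(x, 100));
        System.out.println("ramp:  " + x / Math.PI);   // limit as K grows
    }
}
```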

Example: a 4 voice synthesizer


Design a program that can:


Play sound


Provide a GUI for determining the amplitudes
of up to 7 harmonics


Enable the user to alter the frequency for the
fundamental tone.


Enable the playing of 4 voices


Enable the control of the overall volume.

Building an Oscillator in software


// The period of the waveform, in seconds, is

lambda = 1 / frequency;


// The number of samples per period is

samplesPerCycle = sampleRate * lambda;


// with sampleRate = 8000 samples/second
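
One way to turn these two formulas into a working oscillator is to fill a table with a single cycle of a sine wave and play it in a loop. The sketch below is an assumption about how that might look, not the course's implementation; the names are hypothetical, and rounding samplesPerCycle to a whole number shifts the actual pitch slightly.

```java
// Sketch: one cycle of a sine wave for a software oscillator.
// SineOscillator and oneCycle() are hypothetical names.
class SineOscillator {

    /** Returns one period of a unit-amplitude sine wave. */
    static double[] oneCycle(double frequencyHz, double sampleRateHz) {
        double lambda = 1.0 / frequencyHz;                    // period in seconds
        // Rounded to a whole number of samples, so the pitch is approximate.
        int samplesPerCycle = (int) Math.round(sampleRateHz * lambda);
        double[] cycle = new double[samplesPerCycle];
        for (int i = 0; i < samplesPerCycle; i++) {
            cycle[i] = Math.sin(2.0 * Math.PI * i / samplesPerCycle);
        }
        return cycle;
    }

    public static void main(String[] args) {
        double[] cycle = oneCycle(440.0, 8000.0);
        System.out.println("samples per cycle: " + cycle.length);
    }
}
```

Looping the table gives a tone at sampleRate / samplesPerCycle Hz, which matches the requested frequency only when the sample rate divides evenly.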

Fourier transform

$$V(f) = F[v(t)] = \int_{-\infty}^{\infty} v(t)\, e^{-2\pi i f t}\, dt$$

$$v(t) = F^{-1}[V(f)] = \int_{-\infty}^{\infty} V(f)\, e^{2\pi i f t}\, df$$

How do you compute the Fourier
Coefficients?


Use the Fourier transform!

$$v(t) = a_0 + (a_1 \cos t + b_1 \sin t) + (a_2 \cos 2t + b_2 \sin 2t) + \cdots$$

$$V(f) = F[v(t)] = \int_{-\infty}^{\infty} v(t)\, e^{-2\pi i f t}\, dt$$

$$v(t) = F^{-1}[V(f)] = \int_{-\infty}^{\infty} V(f)\, e^{2\pi i f t}\, df$$

Recall Euler’s identity


Complex numbers have a real and
imaginary part:

$$e^{i\theta} = \cos\theta + i \sin\theta$$

Another way to express a function

$$v(t) = a_0 + (a_1 \cos t + b_1 \sin t) + (a_2 \cos 2t + b_2 \sin 2t) + \cdots$$

$f_0$ = frequency

$n f_0$ = nth harmonic of $f_0$

Sine-Cosine Representation


$$x(t) = \sum_{n=0}^{\infty} a_n \cos(2\pi n f_0 t) + \sum_{n=1}^{\infty} b_n \sin(2\pi n f_0 t)$$

$f_0$ = frequency

$n f_0$ = nth harmonic of $f_0$


Correlation


Fourier coefficients are found by correlating the time-dependent function, x(t), with an nth-harmonic sine-cosine pair:

$$a_0 = \frac{1}{T} \int_0^T x(t)\, dt$$

$$a_n = \frac{2}{T} \int_0^T x(t) \cos(2\pi n f_0 t)\, dt$$

$$b_n = \frac{2}{T} \int_0^T x(t) \sin(2\pi n f_0 t)\, dt$$
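
The correlation integrals can be approximated numerically. The sketch below assumes a_0 = (1/T) ∫ x(t) dt, a_n = (2/T) ∫ x(t) cos(2 pi n f_0 t) dt and b_n = (2/T) ∫ x(t) sin(2 pi n f_0 t) dt over one period T = 1/f_0, and replaces each integral with a midpoint-rule sum. The class name, method names and the test signal are all invented for this example.

```java
import java.util.function.DoubleUnaryOperator;

// Sketch: Fourier coefficients by correlation, using a midpoint-rule sum
// in place of the integral over one period. Names are hypothetical.
class FourierCorrelation {

    /** a_n; for n == 0 the scale is 1/T, otherwise 2/T. */
    static double a(DoubleUnaryOperator x, int n, double period, int steps) {
        double dt = period / steps;
        double sum = 0.0;
        for (int k = 0; k < steps; k++) {
            double t = (k + 0.5) * dt;                        // midpoint of slice k
            sum += x.applyAsDouble(t) * Math.cos(2.0 * Math.PI * n * t / period);
        }
        double scale = (n == 0) ? 1.0 : 2.0;
        return scale * sum * dt / period;
    }

    /** b_n for n >= 1, scaled by 2/T. */
    static double b(DoubleUnaryOperator x, int n, double period, int steps) {
        double dt = period / steps;
        double sum = 0.0;
        for (int k = 0; k < steps; k++) {
            double t = (k + 0.5) * dt;
            sum += x.applyAsDouble(t) * Math.sin(2.0 * Math.PI * n * t / period);
        }
        return 2.0 * sum * dt / period;
    }

    public static void main(String[] args) {
        // Test signal with known coefficients: a_0 = 0.5, a_1 = 3, b_2 = 4.
        DoubleUnaryOperator x = t ->
                0.5 + 3.0 * Math.cos(2.0 * Math.PI * t) + 4.0 * Math.sin(4.0 * Math.PI * t);
        System.out.println("a0 = " + a(x, 0, 1.0, 1000));
        System.out.println("a1 = " + a(x, 1, 1.0, 1000));
        System.out.println("b2 = " + b(x, 2, 1.0, 1000));
    }
}
```

Correlating against a harmonic that is absent from the signal returns a value near zero, which is what makes the coefficients separable.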

amplitude-phase representation

$$x(t) = c_0 + \sum_{n=1}^{\infty} c_n \cos(2\pi n f_0 t - \phi_n)$$

$$c_0 = \frac{1}{T} \int_0^T x(t)\, dt$$

$$c_n = \sqrt{a_n^2 + b_n^2}$$

$$\phi_n = \tan^{-1} \frac{b_n}{a_n}$$
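
Converting a sine-cosine pair to amplitude and phase takes two library calls in Java. The sketch assumes the convention a cos(θ) + b sin(θ) = c cos(θ - φ) with c = sqrt(a² + b²) and φ = tan⁻¹(b/a); it uses Math.atan2 instead of a plain arctangent so that the phase lands in the correct quadrant when a is negative or zero. The names are hypothetical.

```java
// Sketch: sine-cosine coefficients (a, b) to amplitude-phase (c, phi),
// assuming a*cos(theta) + b*sin(theta) = c*cos(theta - phi).
// AmplitudePhase and its methods are hypothetical names.
class AmplitudePhase {

    static double amplitude(double a, double b) {
        return Math.hypot(a, b);       // sqrt(a^2 + b^2)
    }

    static double phase(double a, double b) {
        return Math.atan2(b, a);       // quadrant-aware tan^-1(b / a)
    }

    public static void main(String[] args) {
        double a = 3.0, b = 4.0, theta = 0.7;
        double c = amplitude(a, b);
        double phi = phase(a, b);
        // Both forms should print the same value.
        System.out.println(a * Math.cos(theta) + b * Math.sin(theta));
        System.out.println(c * Math.cos(theta - phi));
    }
}
```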






Average Power

$$P = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} |x(t)|^2\, dt$$

$$P = \frac{1}{T} \int_0^T |x(t)|^2\, dt$$

Periodic signal avg power
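
For sampled data the average-power integral becomes a mean of squared samples. The following sketch assumes the samples cover exactly one period at uniform spacing; the class and method names are made up for this example.

```java
// Sketch: average power of a sampled signal as the mean of x[i]^2,
// a discrete stand-in for (1/T) * integral of |x(t)|^2 dt.
// AveragePower and of() are hypothetical names.
class AveragePower {

    static double of(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s * s;
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        // One full cycle of a unit-amplitude sine wave, 100 samples.
        double[] sine = new double[100];
        for (int i = 0; i < sine.length; i++) {
            sine[i] = Math.sin(2.0 * Math.PI * i / sine.length);
        }
        System.out.println("average power: " + of(sine));   // close to 0.5
    }
}
```

A sine wave of amplitude A has average power A²/2, so the unit-amplitude case comes out near 0.5.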

PSD (Power Spectral Density)


$S(f)$ is the power at a specific frequency, $f$.

Linear combinations in the time
domain become linear combinations
in the frequency domain

$$a_1 V_1(f) + a_2 V_2(f) = F[a_1 v_1(t) + a_2 v_2(t)]$$

Delay in the time domain causes a
phase shift in the frequency domain

$$V(f)\, e^{-2\pi i f t_d} = F(v(t - t_d))$$

Scale change in the time domain
causes a reciprocal scale change in
the frequency domain

$$\frac{1}{|\alpha|} V\!\left(\frac{f}{\alpha}\right) = F(v(\alpha t)), \quad \alpha \neq 0$$

convolution theorem: multiplication
in the time domain causes
convolution in the frequency domain

$$V * W(f) = F(v(t)\, w(t))$$

Convolution between two functions
of the same variable is defined by

$$V * W(f) = \int_{-\infty}^{\infty} V(\lambda)\, W(f - \lambda)\, d\lambda$$
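
On sampled data the convolution integral is replaced by a finite sum. The sketch below is the discrete analogue, y[k] = sum over j of v[j] * w[k - j], rather than the continuous definition itself; it assumes both input arrays are non-empty, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Sketch: direct discrete convolution, y[k] = sum_j v[j] * w[k - j].
// DiscreteConvolution and convolve() are hypothetical names.
class DiscreteConvolution {

    /** Returns an array of length v.length + w.length - 1. */
    static double[] convolve(double[] v, double[] w) {
        double[] y = new double[v.length + w.length - 1];
        for (int j = 0; j < v.length; j++) {
            for (int m = 0; m < w.length; m++) {
                y[j + m] += v[j] * w[m];   // term with k = j + m
            }
        }
        return y;
    }

    public static void main(String[] args) {
        double[] y = convolve(new double[] {1, 2, 3}, new double[] {1, 1});
        System.out.println(Arrays.toString(y));   // [1.0, 3.0, 5.0, 3.0]
    }
}
```

This direct form costs on the order of v.length * w.length multiplications.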
  


 

Various Codec Bandwidth
Consumptions

Encoding/Compression        Result Bit Rate

G.711 PCM A-Law/u-Law       64 kbps (DS0)
G.726 ADPCM                 16, 24, 32, 40 kbps
G.727 E-ADPCM               16, 24, 32, 40 kbps
G.729 CS-ACELP              8 kbps
G.728 LD-CELP               16 kbps
G.723.1 CELP                6.3/5.3 kbps, variable

Standard Transmission Rate for Voice: 64 kbps (DS0)

A means to improve SNR


Compression uses a coder and a decoder.


One CODEC is called U-Law.


U-Law runs at 8 kHz sampling and 8 bits per digitized sample.


U-Law is meant for voice.

Voice grade audio - Application



voice over IP


Voice ranges up to about 3.4 kHz


Sample at 8 kHz, that should be plenty


Quantize to 8 bits of data (about 48 dB SNR)


Improve the SNR with compression

Voice Quality of Service (QoS)
Requirements


Loss

Delay

Delay Variation (Jitter)

Avoiding The 3 Main QoS Challenges

The u-law codec


X is a number whose range is 0..255


Log, to the base 2 of X is a number whose range is 0..8


U-law uses a scale factor (mu) that multiplies the input before log is taken.


Log (x), base 2 = Log(x)/Log(2)


Mu-law takes the log to the base 1+mu.
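
The bullets above describe the shape of the mu-law curve. A minimal Java sketch follows, assuming the usual continuous form y = sign(x) * Log(1 + mu*|x|) / Log(1 + mu) for x in [-1, 1] and mu = 255; neither the value of mu nor the "1 +" inside the log appears on the slide, and the class and method names are hypothetical. This is the companding curve only, not the segmented 8-bit encoding a real G.711 codec uses.

```java
// Sketch: continuous mu-law companding curve (not the bit-exact G.711 tables).
// MuLaw, MU, compress() and expand() are hypothetical names; MU = 255 is assumed.
class MuLaw {
    static final double MU = 255.0;

    /** Maps x in [-1, 1] to [-1, 1] using a log to the base (1 + mu). */
    static double compress(double x) {
        // Log base (1+mu) via the change-of-base rule Log(z) / Log(1+mu).
        return Math.signum(x) * Math.log(1.0 + MU * Math.abs(x)) / Math.log(1.0 + MU);
    }

    /** Inverse of compress(). */
    static double expand(double y) {
        return Math.signum(y) * (Math.pow(1.0 + MU, Math.abs(y)) - 1.0) / MU;
    }

    public static void main(String[] args) {
        for (double x : new double[] {0.01, 0.1, 0.5, 1.0}) {
            double y = compress(x);
            System.out.printf("x=%.2f  compressed=%.4f  expanded=%.4f%n",
                    x, y, expand(y));
        }
    }
}
```

Quiet inputs are stretched across a larger share of the output range, so they keep more resolution after 8-bit quantization, which is how companding improves SNR for voice.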