

Digital Signal Processing



1. Define statistical variance and covariance.


In probability theory and statistics, covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e. the variables tend to show similar behavior, the covariance is a positive number. In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e. the variables tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not that easy to interpret. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.

A distinction has to be made between the covariance of two random variables, a population parameter that can be seen as a property of the joint probability distribution, on one side, and the sample covariance, which serves as an estimated value of the parameter, on the other.
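
Since the question also asks for variance, the standard definitions are worth writing out. For a random variable X with mean E[X], and for two random variables X and Y,

    Var(X) = E[(X − E[X])²]
    Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

so that Var(X) = Cov(X, X), and the correlation coefficient mentioned above is ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y), which lies in [−1, 1].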

Relationship to inner products

Many of the properties of covariance can be extracted elegantly by observing that it satisfies similar properties to those of an inner product:

1. bilinear: for constants a and b and random variables X, Y, and U, Cov(aX + bY, U) = a Cov(X, U) + b Cov(Y, U)

2. symmetric: Cov(X, Y) = Cov(Y, X)

3. positive semi-definite: Var(X) = Cov(X, X) ≥ 0, and Cov(X, X) = 0 implies that X is a constant random variable (K).

In fact these properties imply that the covariance defines an inner product over the quotient vector space obtained by taking the subspace of random variables with finite second moment and identifying any two that differ by a constant. (This identification turns the positive semi-definiteness above into positive definiteness.) That quotient vector space is isomorphic to the subspace of random variables with finite second moment and mean zero; on that subspace, the covariance is exactly the L² inner product of real-valued functions on the sample space.
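
Concretely, on that mean-zero subspace the inner product is ⟨X, Y⟩ = E[XY] = Cov(X, Y). The three properties above can also be checked numerically on sample covariances; below is a minimal Python/NumPy sketch (the arrays x, y, u and the constants a, b are made-up illustration data, not from the text):

    import numpy as np

    rng = np.random.default_rng(0)
    x, y, u = rng.standard_normal((3, 10_000))  # illustration data
    a, b = 2.0, -3.0

    def cov(p, q):
        # sample covariance of two 1-D arrays
        return np.cov(p, q)[0, 1]

    # 1. bilinearity: Cov(aX + bY, U) = a Cov(X, U) + b Cov(Y, U)
    print(np.isclose(cov(a * x + b * y, u), a * cov(x, u) + b * cov(y, u)))
    # 2. symmetry: Cov(X, Y) = Cov(Y, X)
    print(np.isclose(cov(x, y), cov(y, x)))
    # 3. positive semi-definiteness: Cov(X, X) = Var(X) >= 0
    print(cov(x, x) >= 0)

All three checks print True, because the sample covariance inherits bilinearity, symmetry, and positive semi-definiteness exactly.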

2. How do you compute the energy of a discrete signal in time and frequency domains?

The Discrete-Time Fourier Transform (DTFT) is a version of the Fourier transform that is used to convert a discrete data set into a continuous-frequency representation. The DTFT is used mostly in theory, and less in practice, because computers are not usually capable of handling continuous-frequency data. The DTFT is also useful because it provides a theoretical basis for the Z transform.
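
The energy computation the question asks about is given by Parseval's relation for the DTFT (a standard identity, stated here to complete the answer): the energy of a discrete signal x[n] is the same whether computed in the time domain or in the frequency domain,

    E = Σ_n |x[n]|² = (1/2π) ∫_{−π}^{π} |X(e^{jω})|² dω

and for a finite-length signal of N samples the corresponding DFT form is

    Σ_{n=0}^{N−1} |x[n]|² = (1/N) Σ_{k=0}^{N−1} |X[k]|²

A quick Python/NumPy check of the DFT form (the test signal is made-up illustration data):

    import numpy as np

    x = np.random.default_rng(0).standard_normal(256)  # arbitrary test signal
    X = np.fft.fft(x)                                  # unnormalized DFT

    time_energy = np.sum(np.abs(x) ** 2)               # energy in the time domain
    freq_energy = np.sum(np.abs(X) ** 2) / len(x)      # energy in the frequency domain
    print(np.isclose(time_energy, freq_energy))        # True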

Conceptually, it is important to note that signal processing operates on an abstract representation of a physical quantity and not on the quantity itself. At the same time, the type of abstract representation we choose for the physical phenomenon of interest determines the nature of a signal processing unit. A temperature regulation device, for instance, is not a signal processing system as a whole. The device does however contain a signal processing core in the feedback control unit which converts the instantaneous measure of the temperature into an ON/OFF trigger for the heating element. The physical nature of this unit depends on the temperature model: a simple design is that of a mechanical device based on the dilation of a metal sensor; more likely, the temperature signal is a voltage generated by a thermocouple, and in this case the matched signal processing unit is an operational amplifier.

Finally, the adjective “digital” derives from digitus, the Latin word for finger: it concisely describes a world view where everything can be ultimately represented as an integer number. Counting, first on one’s fingers and then in one’s head, is the earliest and most fundamental form of abstraction; as children we quickly learn that counting does indeed bring disparate objects (the proverbial “apples and oranges”) into a common modeling paradigm, i.e. their cardinality. Digital signal processing is a flavor of signal processing in which everything, including time, is described in terms of integer numbers; in other words, the abstract representation of choice is a one-size-fits-all countability. Note that our earlier “thought experiment” about ambient temperature fits this paradigm very naturally: the measuring instants form a countable set (the days in a month) and so do the measures themselves (imagine a finite number of ticks on the thermometer’s scale). In digital signal processing the underlying abstract representation is always the set of natural numbers regardless of the signal’s origins; as a consequence, the physical nature of the processing device will also always remain the same, that is, a general digital (micro)processor. The extraordinary power and success of digital signal processing derives from the inherent universality of its associated “world view”.


3. Define sample autocorrelation function. Give the mean value of this estimate.

Autocorrelation is the cross-correlation of a signal with itself. Informally, it is the similarity between observations as a function of the time separation between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal which has been buried under noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

In statistics, the autocorrelation of a random process describes the correlation between values of the process at different points in time, as a function of the two times or of the time difference. Let X be some repeatable process, and i be some point in time after the start of that process. (i may be an integer for a discrete-time process or a real number for a continuous-time process.) Then X_i is the value (or realization) produced by a given run of the process at time i. Suppose that the process is further known to have defined values for mean μ_i and variance σ_i² for all times i. Then the definition of the autocorrelation between times s and t is

    R(s, t) = E[(X_t − μ_t)(X_s − μ_s)] / (σ_t σ_s)
where "E" is the

expected value

operator. Not
e that this expression is not well
-
defined for all time
series or processes, because the variance may be zero (for a constant process) or infinite. If the
function

R

is well
-
defined, its value must lie in the range [−1,

1], with 1 indicating perfect correl
ation
and −1 indicating perfect

anti
-
correlation
.If

X
t

is a

second
-
order stationary process

then the
mean

μ

and the variance

σ
2

are time
-
independent, and further the autocorrelation depends only on
the difference between

t

and

s
: the correlation depends only on the time
-
distance between the pair of

values but not on their position in time. This further implies that the autocorrelation can be expressed
as a function of the time
-
lag, and that this would be an

even function

of the lag τ

=

s



t
. This gives
the more familiar form


and the fact that this is an

even function

can be stated as


It is common practice in some disciplines, other than statistics and time series analysis, to drop the normalization by σ² and use the term "autocorrelation" interchangeably with "autocovariance". However, the normalization is important both because the interpretation of the autocorrelation as a correlation provides a scale-free measure of the strength of statistical dependence, and because the normalization has an effect on the statistical properties of the estimated autocorrelations.
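
To complete the answer in the signal-processing convention (the zero-mean, unnormalized convention assumed here is common in DSP texts, not stated in the text above): given N samples x[0], …, x[N−1] of a zero-mean stationary process with autocorrelation sequence r(k) = E[x[n] x[n+k]], a standard biased sample autocorrelation estimate is

    r̂(k) = (1/N) Σ_{n=0}^{N−1−k} x[n] x[n+k],   k = 0, 1, …, N−1

Taking the expectation term by term gives the mean value of this estimate,

    E[r̂(k)] = ((N − k)/N) r(k) = (1 − k/N) r(k)

so the estimator is biased by the triangular (Bartlett) factor 1 − k/N, but the bias vanishes as N → ∞; dividing by N − k instead of N gives the unbiased variant, at the cost of higher variance at large lags.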


4. What is the basic principle of the Welch method to estimate the power spectrum?

In physics, engineering, and applied mathematics, Welch's method, named after P. D. Welch, is used for estimating the power of a signal at different frequencies: that is, it is an approach to spectral density estimation. The method is based on the concept of using periodogram spectrum estimates, which are the result of converting a signal from the time domain to the frequency domain. Welch's method is an improvement on the standard periodogram spectrum estimating method and on Bartlett's method, in that it reduces noise in the estimated power spectra in exchange for reducing the frequency resolution. Due to the noise caused by imperfect and finite data, the noise reduction from Welch's method is often desired.

The Welch method is based on Bartlett's method and differs in two ways:

1. The signal is split up into overlapping segments: the original data segment is split up into L data segments of length M, overlapping by D points.

   1. If D = M/2, the overlap is said to be 50%.
   2. If D = 0, the overlap is said to be 0%. This is the same situation as in Bartlett's method.

2. The overlapping segments are then windowed: after the data is split up into overlapping segments, the individual L data segments have a window applied to them (in the time domain).

   1. Most window functions afford more influence to the data at the center of the set than to data at the edges, which represents a loss of information. To mitigate that loss, the individual data sets are commonly overlapped in time (as in the above step).
   2. The windowing of the segments is what makes the Welch method a "modified" periodogram.

After doing the above, the periodogram is calculated by computing the discrete Fourier transform, and then computing the squared magnitude of the result. The individual periodograms are then time-averaged, which reduces the variance of the individual power measurements. The end result is an array of power measurements vs. frequency "bin".
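
A minimal Python sketch of the procedure using SciPy's implementation (the test signal, the sampling rate fs, and the segment parameters nperseg/noverlap are illustrative choices, not values from the text):

    import numpy as np
    from scipy.signal import welch

    fs = 1000.0                              # assumed sampling rate, Hz
    t = np.arange(0, 2.0, 1 / fs)
    rng = np.random.default_rng(0)
    x = np.sin(2 * np.pi * 50 * t) + rng.standard_normal(t.size)  # 50 Hz tone in noise

    # Split into segments of length 256 with 50% overlap (D = M/2), window each
    # segment, compute squared-magnitude DFTs, and average the periodograms.
    f, Pxx = welch(x, fs=fs, window="hann", nperseg=256, noverlap=128)

    print(f[np.argmax(Pxx)])                 # peak lies near 50 Hz

Setting noverlap=0 with a rectangular window (window="boxcar") recovers Bartlett's method.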


5. How do you find the ML estimate?

In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters. The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, one may be interested in the heights of adult female giraffes, but be unable, due to cost or time constraints, to measure the height of every single giraffe in a population. Assuming that the heights are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while only knowing the heights of some sample of the overall population. MLE would accomplish this by taking the mean and variance as parameters and finding particular parametric values that make the observed results the most probable (given the model).

In general, for a fixed set of data and underlying statistical model, the method of maximum likelihood selects values of the model parameters that produce a distribution that gives the observed data the greatest probability (i.e., parameters that maximize the likelihood function). Maximum-likelihood estimation gives a unified approach to estimation, which is well-defined in the case of the normal distribution and many other problems. However, in some complicated problems difficulties do occur: in such problems, maximum-likelihood estimators are unsuitable or do not exist.

Suppose there is a sample x_1, x_2, …, x_n of n independent and identically distributed observations, coming from a distribution with an unknown pdf f_0(·). It is however surmised that the function f_0 belongs to a certain family of distributions { f(·|θ), θ ∈ Θ }, called the parametric model, so that f_0 = f(·|θ_0). The value θ_0 is unknown and is referred to as the "true value" of the parameter. It is desirable to find some estimator θ̂ which would be as close to the true value θ_0 as possible. Both the observed variables x_i and the parameter θ can be vectors.

To use the method of maximum likelihood, one first specifies the joint density function for all observations. For an iid sample this joint density function will be

    f(x_1, x_2, …, x_n | θ) = f(x_1 | θ) f(x_2 | θ) ⋯ f(x_n | θ)


Now we look at this function from a different perspective by considering the observed values x_1, x_2, …, x_n to be fixed "parameters" of this function, whereas θ will be the function's variable and allowed to vary freely. From this point of view this distribution function will be called the likelihood:

    L(θ ; x_1, …, x_n) = f(x_1, x_2, …, x_n | θ) = ∏_{i=1}^{n} f(x_i | θ)
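The ML estimate is then the value of θ that maximizes this likelihood; in practice one usually maximizes the log-likelihood ln L(θ; x_1, …, x_n), which has the same maximizer. For the Gaussian example above the maximization can be done in closed form, giving

    μ̂ = (1/n) Σ x_i,   σ̂² = (1/n) Σ (x_i − μ̂)²

A minimal Python sketch comparing this closed form against direct numerical maximization of the log-likelihood (the data are made-up illustration values standing in for the giraffe heights):

    import numpy as np
    from scipy.optimize import minimize

    x = np.random.default_rng(0).normal(loc=4.5, scale=0.3, size=500)  # made-up "heights"

    # Closed-form Gaussian MLE
    mu_hat = x.mean()
    var_hat = ((x - mu_hat) ** 2).mean()

    # Numerical MLE: minimize the negative log-likelihood over (mu, log sigma)
    def neg_log_likelihood(params):
        mu, log_sigma = params        # optimize log(sigma) so that sigma stays positive
        sigma = np.exp(log_sigma)
        return 0.5 * np.sum(np.log(2 * np.pi * sigma ** 2) + (x - mu) ** 2 / sigma ** 2)

    res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
    print(mu_hat, var_hat)                    # closed form
    print(res.x[0], np.exp(res.x[1]) ** 2)    # numerical result, agrees with closed form
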
6. Give the basic principle of Levinson recursion.

7. Why are FIR filters widely used for adaptive filters?

8. Express the Widrow-Hoff LMS adaptive algorithm. State its properties.