CS 395T: Computational Statistics with Application to Bioinformatics



4th IMPRS Astronomy Summer School

Drawing Astrophysical Inferences from Data Sets

William H. Press

The University of Texas at Austin


Lecture 2






Many, though not all, common distributions look sort-of like this:

Suppose we want to summarize p(x) by a single number a, its "center". Let's find the value a that minimizes the mean-square distance of the "typical" value x:

We already saw the beta distribution with a, b > 0 as an example on the interval [0,1]. We'll see more examples soon.


Expectation notation:
$$\langle \mathrm{anything} \rangle \equiv \int_x (\mathrm{anything})\, p(x)\, dx$$
Expectation is linear, etc.

Minimize:
$$\Delta^2 \equiv \left\langle (x - a)^2 \right\rangle = \left\langle x^2 - 2ax + a^2 \right\rangle = \left( \langle x^2 \rangle - \langle x \rangle^2 \right) + \left( \langle x \rangle - a \right)^2$$

The first term is the variance Var(x), but all we care about here is that it doesn't depend on a. (In physics this is called the "parallel axis theorem".)

The minimum is obviously $a = \langle x \rangle$. (Take the derivative with respect to a and set it to zero if you like mechanical calculations.)


Higher moments and centered moments are defined by
$$\mu_i \equiv \langle x^i \rangle = \int x^i\, p(x)\, dx \qquad M_i \equiv \left\langle (x - \langle x \rangle)^i \right\rangle = \int (x - \langle x \rangle)^i\, p(x)\, dx$$

But it is generally wise to be cautious about using high moments: otherwise perfectly good distributions don't have them at all (divergent), and (related) it can take a lot of data to measure them accurately.

Third and fourth moments also have "names" (given below).

The centered second moment M2, the variance, is by far the most useful:

$$M_2 \equiv \mathrm{Var}(x) \equiv \left\langle (x - \langle x \rangle)^2 \right\rangle = \langle x^2 \rangle - \langle x \rangle^2$$
$$\sigma(x) \equiv \sqrt{\mathrm{Var}(x)}$$

The "standard deviation" summarizes a distribution's half-width (r.m.s. deviation from the mean).
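The "names" of the third and fourth moments were shown only as images in the original; these are the standard dimensionless definitions (the -3 convention makes the kurtosis of a Gaussian zero):
$$\mathrm{Skew}(x) \equiv \frac{M_3}{\sigma^3} \qquad \mathrm{Kurtosis}(x) \equiv \frac{M_4}{\sigma^4} - 3$$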


Certain combinations of higher moments are also additive. These are called semi-invariants.

Mean and variance are additive over independent random variables:

(note "bar" notation, equivalent to ⟨ ⟩)

Skew and kurtosis are dimensionless combinations of semi-invariants.

A Gaussian has all of its semi-invariants higher than I2 equal to zero.

A Poisson distribution has all of its semi-invariants equal to its mean.
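The additivity relations referred to above appeared as an image in the original; these are the standard identities, with the bar denoting expectation:
$$\overline{x + y} = \bar{x} + \bar{y} \qquad \mathrm{Var}(x + y) = \mathrm{Var}(x) + \mathrm{Var}(y) \quad (x, y\ \text{independent})$$

A quick Monte Carlo check in Mathematica (a sketch; the two distributions here are arbitrary examples, not from the slides):

(* additivity of mean and variance for independent r.v.'s *)
x = RandomVariate[ExponentialDistribution[1], 10^6];
y = RandomVariate[GammaDistribution[3, 2], 10^6];
{Mean[x + y] - (Mean[x] + Mean[y]),
 Variance[x + y] - (Variance[x] + Variance[y])}
(* both components come out ~0, up to sampling error *)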



This is a good time to review some standard (i.e., frequently occurring) distributions:

Normal (Gaussian): tails fall off "as fast as possible"

Cauchy (Lorentzian): tails fall off "as slowly as possible"
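The densities themselves were images in the original; in standard form:
$$\mathrm{Normal}(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \qquad \mathrm{Cauchy}(x \mid \mu, \sigma) = \frac{1}{\pi\sigma}\, \frac{1}{1 + \left(\frac{x - \mu}{\sigma}\right)^2}$$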


Student:

"Bell shaped", but you get to specify the power with which the tails fall off. Normal and Cauchy are limiting cases. (Also occurs in some statistical tests.)

We'll see uses for "heavy-tailed" distributions later.

Note that s is not (quite) the standard deviation!
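The density was an image in the original; in standard form, with location μ, scale s, and tail exponent ν (parameter names assumed):
$$\mathrm{Student}(t \mid \nu, \mu, s) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}\, s} \left(1 + \frac{1}{\nu}\left(\frac{t - \mu}{s}\right)^2\right)^{-\frac{\nu + 1}{2}}$$
For ν > 2 the standard deviation is $s\sqrt{\nu/(\nu - 2)}$, which is why s is not quite the standard deviation; ν → ∞ recovers the Normal, and ν = 1 is the Cauchy.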

"Student" was actually William Sealy Gosset (1876-1937), who spent his entire career at the Guinness brewery in Dublin, where he rose to become the company's Master Brewer.


Common distributions on the positive real line:

Exponential:
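The density was an image in the original; in standard form, with an assumed rate parameter β:
$$\mathrm{Exponential}(x \mid \beta) = \beta e^{-\beta x}, \quad x > 0$$
Its mean and standard deviation are both $1/\beta$.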


Lognormal:
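The density was an image in the original; in standard form:
$$\mathrm{Lognormal}(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0$$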


Gamma distribution:

Gamma and Lognormal are both commonly used as convenient 2-parameter fitting functions for "peak with tail" positive distributions.

Both have parameters for peak location and width.

Neither has a separate parameter for how the tail decays.

Gamma: exponential decay. Lognormal: long-tailed (exponential of square of log).
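The density was an image in the original; in standard form, using the rate parameterization consistent with the Gamma(n/2, 1/2) identification on the next slide:
$$\mathrm{Gamma}(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \quad x > 0$$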


Chi-square distribution (we'll use this a lot!)

Has only one parameter n, which determines both peak location and width. n is often an integer, called "number of degrees of freedom" or "DF".

The independent variable is χ², not χ.

It's actually just a special case of Gamma, namely Gamma(n/2, 1/2).
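Written out (the standard form, i.e., Gamma(n/2, 1/2) in the rate convention above):
$$p(\chi^2 \mid n) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, (\chi^2)^{\frac{n}{2} - 1}\, e^{-\chi^2/2}, \quad \chi^2 > 0$$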


A deviate from N(0, 1) is called a t-value. χ²(n) is exactly the distribution of the sum of the squares of n t-values.

Let's prove the case of n = 1. Let Y = X² with X ~ N(0, 1); then for y > 0, with x = y^{1/2} (the factor 2 because both ±x map to the same y),
$$p_Y(y)\, dy = 2\, p_X(x)\, dx$$
So,
$$p_Y(y) = y^{-1/2}\, p_X(y^{1/2}) = \frac{1}{\sqrt{2\pi y}}\, e^{-\frac{1}{2} y}$$

Why this will be important: we will know the distribution of any "statistic" that is the sum of a known number of t²-values.
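A quick Mathematica cross-check of the n = 1 case (a sketch, not from the slides):

(* distribution of X^2 for X ~ N(0,1) vs. chi-square with 1 DF *)
PDF[TransformedDistribution[u^2, u \[Distributed] NormalDistribution[0, 1]], y]
PDF[ChiSquareDistribution[1], y]
(* both give E^(-y/2)/(Sqrt[2 Pi] Sqrt[y]) for y > 0 *)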


Characteristic function of a distribution

The characteristic function of a distribution is its Fourier transform:
$$\phi_X(t) \equiv \int_{-\infty}^{\infty} e^{itx}\, p_X(x)\, dx$$

(Statisticians often use the notational convention that X is a random variable, x its value, p_X(x) its distribution.)

$$\phi_X(0) = 1$$
$$\phi_X'(0) = \int i x\, p_X(x)\, dx = i\mu$$
$$-\phi_X''(0) = \int x^2\, p_X(x)\, dx = \sigma^2 + \mu^2$$

So, the coefficients of the Taylor series expansion of the characteristic function are the (uncentered) moments.
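Spelled out (a standard identity the slide implies rather than displays):
$$\phi_X(t) = \left\langle e^{itx} \right\rangle = \sum_{n=0}^{\infty} \frac{(it)^n}{n!}\, \langle x^n \rangle$$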


Properties of characteristic functions:

Addition of independent r.v.'s (Fourier convolution theorem):
$$\mathrm{let}\ S = X + Y \qquad p_S(s) = \int p_X(u)\, p_Y(s - u)\, du \qquad \phi_S(t) = \phi_X(t)\, \phi_Y(t)$$

Scaling law for r.v.'s: $p_{aX}(x) = \frac{1}{a}\, p_X\!\left(\frac{x}{a}\right)$

Scaling law for characteristic functions:
$$\phi_{aX}(t) = \int e^{itx}\, p_{aX}(x)\, dx = \int e^{itx}\, \frac{1}{a}\, p_X\!\left(\frac{x}{a}\right) dx = \int e^{i(at)(x/a)}\, p_X\!\left(\frac{x}{a}\right) d\!\left(\frac{x}{a}\right) = \phi_X(at)$$


Proof of convolution theorem:

Fourier transform pair:
$$\phi_X(t) \equiv \int_{-\infty}^{\infty} e^{itx}\, p_X(x)\, dx$$
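The proof steps themselves were an image in the original; the standard argument runs:
$$\phi_S(t) = \int e^{its}\, p_S(s)\, ds = \iint e^{its}\, p_X(u)\, p_Y(s - u)\, du\, ds = \iint e^{itu} p_X(u)\; e^{itv} p_Y(v)\, du\, dv = \phi_X(t)\, \phi_Y(t)$$
(substituting v = s - u in the inner integral).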


What’s the characteristic function of a Gaussian?

Tell Mathematica that sig is positive; otherwise it gives "cases" when taking the square root of sig^2.
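The slide's Mathematica input was an image; a minimal sketch of the computation, assuming parameters named mu and sig:

(* characteristic function of N(mu, sig): integrate e^(I t x) against the density *)
Integrate[Exp[I t x] PDF[NormalDistribution[mu, sig], x],
 {x, -Infinity, Infinity},
 Assumptions -> {sig > 0, Element[t, Reals], Element[mu, Reals]}]
(* returns E^(I mu t - (sig^2 t^2)/2), i.e. phi(t) = e^(i mu t - sig^2 t^2/2) *)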


What's the characteristic function of χ²(n)?

Since we already proved that χ²(1) is the distribution of a single t²-value, this proves that the general-n case is the sum of n t²-values.
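The computed result (standard; the slide's derivation was an image):
$$\phi_{\chi^2(n)}(t) = (1 - 2it)^{-n/2} = \left[\phi_{\chi^2(1)}(t)\right]^n$$
so, by the convolution theorem, χ²(n) is the sum of n independent χ²(1) deviates.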


Cauchy distribution has an ill-defined mean and infinite variance, but it has a perfectly good characteristic function:

Matlab and Mathematica both sadly fail at computing the characteristic function of the Cauchy distribution, but you can use fancier methods* and get:

$$\phi_{\mathrm{Cauchy}}(t) = e^{i\mu t - \sigma|t|}$$

(note: non-analytic at t = 0)

*If t > 0, close the contour in the upper half-plane with a big semicircle, which adds nothing. So the integral is just the residue at the pole (x - μ)/σ = i, which gives exp(-σt). Similarly, close the contour in the lower half-plane for t < 0, giving exp(σt). So the answer is exp(-σ|t|). The factor exp(iμt) comes from the change of x variable to x - μ.