Basic statistics

European Molecular Biology Laboratory
Predoc Bioinformatics Course
17th Nov 2009

Tim Massingham, tim.massingham@ebi.ac.uk


Introduction

Basic statistics
- What is a statistical test
- Count data and Simpson's paradox
- A Lady drinking tea and statistical power
- Thailand HIV vaccine trial and missing data

Correlation

Nonparametric statistics
- Robustness and efficiency
- Paired data
- Grouped data

Multiple testing adjustments
- Family Wise Error Rate and simple corrections
- More powerful corrections
- False Discovery Rate

What is a statistic?

Anything that can be measured, or calculated from measurements.

Branches of statistics:
- Frequentist
- Neo-Fisherian
- Bayesian
- Lies
- Damn lies
(and each of these can be split further)

"If your experiment needs statistics, you ought to have done a better experiment." (Ernest Rutherford)

"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." (H. G. Wells)

"He uses statistics as a drunken man uses lamp-posts, for support rather than illumination." (Andrew Lang, among others)

Classical statistics

Imagine repeating the experiment: from the initial experiment to 100, 1,000 or 10,000 repeats, there is some variation in the statistic.

Frequentist inference

Repeated experiments are at the heart of frequentist thinking:
- Have a "null hypothesis"
- The null distribution is what the distribution of the statistic would look like if we could sample repeatedly
- Compare the actual statistic to the null distribution: does it fall in a likely or an unlikely region?

Classical statistics is largely about finding statistics for which the null distribution is known.

Anatomy of a Statistical Test

The p-value of a test is the probability of observing the statistic, or something more extreme, under the null distribution. It is equal to the area under the density curve beyond the observed value of the statistic. Density measures relative probability; the total area under the curve equals exactly one.
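For a standard normal null distribution, that tail area can be computed directly; a minimal sketch in R (the observed statistic 2.1 is an arbitrary illustrative value):

```r
# One-sided p-value: area under the standard normal density
# to the right of the observed statistic
z <- 2.1                            # observed statistic (illustrative)
p <- pnorm(z, lower.tail = FALSE)   # upper-tail area
p
# [1] 0.01786442
```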

Some antiquated jargon

Critical values date back to when we used printed tables. The old way:
- Calculate the statistic
- Look up the critical values for the test
- Report "significant at the 99% level", or "rejected the null hypothesis at ..."

Pre-calculated critical values (standard normal distribution):

Size            0.05   0.01   0.001   0.0001
Critical value  1.64   2.33   3.09    3.72
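The table of critical values can be reproduced with the normal quantile function; a quick check in R:

```r
# Critical values are upper-tail quantiles of the null distribution;
# here, the standard normal at the sizes listed in the table
sizes <- c(0.05, 0.01, 0.001, 0.0001)
crit  <- qnorm(sizes, lower.tail = FALSE)
round(crit, 2)
# [1] 1.64 2.33 3.09 3.72
```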

Power

Power is the probability of correctly rejecting the null hypothesis (for a given size). It is computed under an alternative distribution rather than the null: the critical value splits outcomes into "reject null" and "accept null", and power is the probability mass of the alternative falling in the rejection region. Power changes with size.
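As a sketch with an assumed, illustrative alternative: if the null is N(0,1) and the alternative is N(3,1), the power of a one-sided size-0.05 test is the upper-tail mass of the alternative beyond the critical value:

```r
# Power = P(statistic exceeds the critical value | alternative is true)
crit  <- qnorm(0.05, lower.tail = FALSE)                    # 1.644854
power <- pnorm(crit, mean = 3, sd = 1, lower.tail = FALSE)  # ~0.91
power
```

Making the test stricter (a smaller size) pushes the critical value to the right and reduces power.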

Confidence intervals

The same construction generates confidence intervals: a confidence interval is the region which excludes unlikely values. For the null distribution, the "confidence interval" is the region in which we accept the null hypothesis; the tails, where we reject the null hypothesis, are the critical region.

Count data

(Almost) the simplest form of data we can work with: each experiment gives us a discrete outcome, and we have some "null" hypothesis about what to expect.

Example: are all jelly babies equally liked by PhD students?

           Yellow  Black  Red  Green  Orange
Observed     19     27    15    22     17
Expected     20     20    20    20     20

Chi-squared goodness of fit test

Summarize results and expectations in a table, as above, then:

jelly_baby <- c(19, 27, 15, 22, 17)
expected_jelly <- c(20, 20, 20, 20, 20)
chisq.test(jelly_baby, p=expected_jelly, rescale.p=TRUE)

        Chi-squared test for given probabilities

data:  jelly_baby
X-squared = 4.4, df = 4, p-value = 0.3546

The p-value can be recovered from the chi-squared distribution directly:

pchisq(4.4, 4, lower.tail=FALSE)
[1] 0.3546
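The statistic itself is just the sum of squared deviations over expectations; a quick check against the output above:

```r
# X-squared = sum over categories of (observed - expected)^2 / expected
observed <- c(19, 27, 15, 22, 17)
expected <- c(20, 20, 20, 20, 20)
x2 <- sum((observed - expected)^2 / expected)
x2
# [1] 4.4
```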

Chi-squared goodness of fit test: degrees of freedom

What is the "df = 4" in the output above? Consider how much we need to know to reconstruct the observed table:

           Yellow  Black  Red  Green  Orange
Observed     19     27    15    22     17
             19     27    15    22      ?

- The number of samples
- Any four of the observations (or equivalent information, ratios for example)

With the total fixed, only four of the five counts are free: four degrees of freedom.

More complex models

Specifying the null hypothesis entirely in advance is very restrictive.

                   Yellow  Black  Red   Green  Orange
Observed (4 df)      19     27    15     22     17
Expected (0 df)      20     20    20     20     20

We can instead allow expected models that take some features from the data, e.g. the Red : Green ratio. Each such feature costs one degree of freedom:

                   Yellow  Black  Red    Green  Orange
Observed (4 df)      19     27    15      22     17
Expected (1 df)      20     20    16.2   23.8    20

Here 16.2 : 23.8 = 15 : 22 and 16.2 + 23.8 = 40.

Example: Chargaff's parity rule

Chargaff's 2nd Parity Rule: in a single strand of dsDNA, %A ≈ %T and %C ≈ %G.

Helicobacter pylori:

           A        C        G        T
Count   2021445  1308204  1285029  2056142

From the data, %AT = 61% and %CG = 39%. Applying Chargaff's rule gives the expected %{A,C,G,T}:

              A        C        G        T
Proportion   0.305    0.194    0.194    0.305
Number      2038794  1296616  1296616  2038794

The null hypothesis has one degree of freedom; the alternative hypothesis has three degrees of freedom. Difference: two degrees of freedom.
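A sketch of the test in R (the statistic is computed by hand so that the p-value can use the two degrees of freedom derived above, rather than the default):

```r
# Test Chargaff's 2nd parity rule for H. pylori
obs <- c(A = 2021445, C = 1308204, G = 1285029, T = 2056142)
# Null: %A == %T and %C == %G, so expected counts average the pairs
at <- unname((obs["A"] + obs["T"]) / 2)
cg <- unname((obs["C"] + obs["G"]) / 2)
exp_counts <- c(at, cg, cg, at)
x2 <- sum((obs - exp_counts)^2 / exp_counts)   # chi-squared statistic
x2                                             # ~502
pval <- pchisq(x2, df = 2, lower.tail = FALSE) # 2 df, as derived above
pval
```

The statistic comes out near 500 on 2 degrees of freedom, so even small departures from exact parity are highly significant with counts in the millions.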

Contingency tables

             Pie  Fruit
Custard       a     c
Ice-cream     b     d

Observe two variables in pairs: is there a relationship between them? Silly example: is there a relationship between desserts and toppings? It is a test of row / column independence.

Real example: the McDonald-Kreitman test, on mutations at the Drosophila ADH locus:

                Between  Within
Nonsynonymous      7        2
Synonymous        17       42
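A sketch of testing row / column independence for the McDonald-Kreitman table in R (fisher.test is used here for illustration; the original analysis used a different test, so the exact p-value may differ):

```r
# McDonald-Kreitman table: rows are nonsynonymous / synonymous changes,
# columns are between-species / within-species
adh <- rbind(c(7, 2), c(17, 42))
p_mk <- fisher.test(adh)$p.value
p_mk   # small: rows and columns are not independent
```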

Contingency tables

A contingency table is a chi-squared test in disguise. Write p for the marginal probability of Custard (so Ice-cream has probability 1-p) and q for the marginal probability of Pie (Fruit has 1-q); the marginals sum to 1. The null hypothesis is that rows and columns are independent, so the cell probabilities are products of the marginals. Multiplying probabilities by the number of observations n:

            Pie & Custard   Pie & Ice-cream   Fruit & Custard   Fruit & Ice-cream
Observed         a                b                 c                  d
Expected        npq            n(1-p)q           np(1-q)          n(1-p)(1-q)

Contingency tables

With the total n fixed, d = n - a - b - c, so:

            Pie & Custard   Pie & Ice-cream   Fruit & Custard   Fruit & Ice-cream
Observed         a                b                 c              n - a - b - c
Expected        npq            n(1-p)q           np(1-q)          n(1-p)(1-q)

Observed: three degrees of freedom (a, b & c)
Expected: two degrees of freedom (p & q)

In general, for a table with r rows and c columns:

Observed:    rc - 1           degrees of freedom
Expected:    (r-1) + (c-1)    degrees of freedom
Difference:  (r-1)(c-1)       degrees of freedom

For a 2x2 table, this gives a chi-squared test with one degree of freedom.
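R's chisq.test reports exactly this (r-1)(c-1) figure as the parameter of the test; a quick check on an arbitrary 2x3 table of counts:

```r
# For a 2x3 table, (r-1)(c-1) = 1 * 2 = 2 degrees of freedom
tab <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2, ncol = 3)
df_tab <- unname(chisq.test(tab)$parameter)
df_tab
# [1] 2
```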

Bisphenol A

Bisphenol A is an environmental estrogen and a monomer used to manufacture polycarbonate plastics:
- lining for food cans
- dental sealants
- food packaging

There are many in vivo studies on whether it is safe: could the polymer break down? And is the result of a study independent of who performed it?

F vom Saal and C Hughes (2005) An Extensive New Literature Concerning Low-Dose Effects of Bisphenol A Shows the Need for a New Risk Assessment. Environmental Health Perspectives 113(8):928

             Harmful  Non-harmful
Government     94         10
Industry        0         11

Bisphenol A

Observed table (with marginal proportions):

             Harmful  Non-harmful
Government     94         10        90.4%
Industry        0         11         9.6%
             81.7%       18.2%      n = 115

Expected table:

             Harmful  Non-harmful
Government    85.0       19.0       90.4%
Industry       9.0        2.0        9.6%
             81.7%       18.2%

E.g. 0.817 × 0.904 × 115 = 85.0

Chi-squared statistic = 48.6; the test with 1 d.f. gives p-value = 3.205e-12.
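This can be reproduced with chisq.test, whose default Yates continuity correction for 2x2 tables yields the 48.56 statistic quoted above:

```r
# Government/industry vs harmful/non-harmful, as a 2x2 contingency table
bpa <- rbind(c(94, 10), c(0, 11))
res <- chisq.test(bpa)   # Yates-corrected by default for 2x2 tables
unname(res$statistic)    # ~48.56
res$p.value              # ~3.2e-12
```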

Bisphenol A: association measure

Having discovered that we have dependence, how strong is it? The coefficient of association for a 2x2 table is φ = √(X²/n), the chi-squared statistic divided by the number of observations: a number running from 0 (independent) to 1 (complete dependence). For the Bisphenol A study data, φ = √(48.6/115) ≈ 0.65.

Should the test really be one degree of freedom? Is it reasonable to assume that government / industry was randomly assigned? Perhaps the null model only has one degree of freedom:

pchisq(48.5589, df=1, lower.tail=FALSE)
[1] 3.205164e-12
pchisq(48.5589, df=2, lower.tail=FALSE)
[1] 2.854755e-11
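The association coefficient is a one-liner (a sketch, using the Yates-corrected statistic from above):

```r
# phi coefficient for a 2x2 table: sqrt(chi-squared / n)
phi <- sqrt(48.5589 / 115)
phi   # ~0.65: a fairly strong association
```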

Simpson's paradox

A famous example of Simpson's paradox:

C R Charig, D R Webb, S R Payne, and J E Wickham (1986) Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy. BMJ 292:879-882

Compare two treatments for kidney stones:
- open surgery
- percutaneous nephrolithotomy (surgery through a small puncture)

               Success  Fail
open             273     77   350   (78% success)
perc. neph.      289     61   350   (83% success)
                 562    138   700

Percutaneous nephrolithotomy appears better (but not significantly so, p-value 0.15).

Simpson's paradox

Small kidney stones:

               Success  Fail
open              81      6    87   (93% success)
perc. neph.      234     36   270   (87% success)
                 315     42   357

p-value 0.15

Large kidney stones:

               Success  Fail
open             192     71   263   (73% success)
perc. neph.       55     25    80   (69% success)
                 247     96   343

p-value 0.55

We missed a clinically relevant factor: the size of the stones. Within each size group the order of the treatments is reversed, even though combined, open surgery succeeds 78% of the time against 83% for percutaneous nephrolithotomy.

Simpson's paradox: what's happened?

- Failure of randomisation: this was actually an observational study, and treatment allocation depended on stone size.
- Small and large stones have a different prognosis (p-value < 1.2e-7): 88% success overall for small stones against 72% for large.
- Open surgery was used mainly on the harder, large stones (93% and 73% success on small and large stones respectively), percutaneous nephrolithotomy mainly on the easier, small ones (87% and 69%).
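The reversal is easy to verify directly from the tables; a sketch in R:

```r
# Success / failure counts from Charig et al. (1986)
small <- rbind(open = c(81, 6),   pn = c(234, 36))
large <- rbind(open = c(192, 71), pn = c(55, 25))
combined <- small + large

rate <- function(m) m[, 1] / rowSums(m)   # success proportion per treatment
rate(small)     # open 0.93 > pn 0.87
rate(large)     # open 0.73 > pn 0.69
rate(combined)  # open 0.78 < pn 0.83 -- the order reverses
```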

A Lady Tasting Tea

"A LADY declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup ... Our experiment consists in mixing eight cups of tea, four in one way and four in the other, and presenting them to the subject for judgment in a random order. The subject has been told in advance of what the test will consist, namely that she will be asked to taste eight cups, that these shall be four of each kind, and that they shall be presented to her in a random order."

Fisher, R. A. (1956) Mathematics of a Lady Tasting Tea

- Eight cups of tea
- Exactly four one way and four the other
- The subject knows there are four of each
- The order is randomised

              Guess tea  Guess milk
Tea first         ?          ?       4
Milk first        ?          ?       4
                  4          4       8

Fisher's exact test

The table above looks like a chi-squared test, but the experimental design fixes the marginal totals: eight cups, exactly four of each kind, and the subject knows there are four of each. Fisher's exact test gives exact p-values when the marginal totals are fixed. It is often incorrectly used when the marginal totals are not known.

Sidedness

We are not interested if she can't tell the difference. There are two possible ways of being significant, and we are only interested in one:

Exactly right:

              Guess tea  Guess milk
Tea first         4          0       4
Milk first        0          4       4
                  4          4       8

Exactly wrong:

              Guess tea  Guess milk
Tea first         0          4       4
Milk first        4          0       4
                  4          4       8

tea <- rbind( c(0,4) , c(4,0) )
tea
     [,1] [,2]
[1,]    0    4
[2,]    4    0
fisher.test(tea)$p.value
[1] 0.02857143
fisher.test(tea, alternative="greater")$p.value
[1] 1
fisher.test(tea, alternative="less")$p.value
[1] 0.01428571

The two-sided test counts both tails, "more correct" and "more wrong" than observed; if we are only interested in significantly greater skill, we just use the area in one tail.

Statistical power

Are eight cups of tea enough?

A perfect score, p-value 0.01429:

              Guess tea  Guess milk
Tea first         4          0       4
Milk first        0          4       4
                  4          4       8

Better than chance, p-value 0.2429:

              Guess tea  Guess milk
Tea first         3          1       4
Milk first        1          3       4
                  4          4       8
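Both p-values follow from the one-sided Fisher test on the corresponding tables:

```r
# One-sided p-values for a perfect score and for three out of four correct
perfect <- rbind(c(4, 0), c(0, 4))
three   <- rbind(c(3, 1), c(1, 3))
p_perfect <- fisher.test(perfect, alternative = "greater")$p.value
p_three   <- fisher.test(three,   alternative = "greater")$p.value
p_perfect   # 1/70  = 0.01428571
p_three     # 17/70 = 0.2428571
```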


Statistical power

Assume the lady correctly guesses a proportion p of the cups, with n cups of each kind (2n in total):

              Guess tea  Guess milk
Tea first        pn        (1-p)n      n
Milk first     (1-p)n        pn        n
                  n           n        2n

To investigate:
- simulate 10,000 experiments
- calculate the p-value for each experiment
- take the mean

Mean p-value, by n (cups of each kind) and accuracy p:

  n:    4     6     8     10    12    14    16    18    20
70%   0.39  0.31  0.25  0.20  0.17  0.14  0.12  0.11  0.09
80%   0.25  0.16  0.11  0.07  0.05  0.04  0.03  0.02  0.01
90%   0.12  0.06  0.02  0.01  0.007 0.003 0.001 8e-5  4e-5
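A sketch of that simulation, under the simplifying assumption that the lady labels each tea-first cup correctly with probability p and her remaining guesses are forced by the fixed margins:

```r
# Mean one-sided Fisher p-value when the lady guesses a proportion p
# of the tea-first cups correctly; the margins force the rest of the table
mean_pvalue <- function(p, n, reps = 10000) {
  k <- rbinom(reps, n, p)   # correct guesses among the n tea-first cups
  pv <- vapply(k, function(kk) {
    tab <- rbind(c(kk, n - kk), c(n - kk, kk))
    fisher.test(tab, alternative = "greater")$p.value
  }, numeric(1))
  mean(pv)
}

set.seed(1)
m_sim <- mean_pvalue(0.7, 4)
m_sim   # ~0.39, matching the first entry of the table
```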

Thailand HIV vaccine trials

News story from the end of September 2009: "Phase III HIV trial in Thailand shows positive outcome". 16,402 heterosexual volunteers were tested every six months for three years; 16,395 were included in the analysis.

            Sero +ve  Sero -ve
Vaccine        51       8197
Control        74       8198

fisher.test(hiv)

        Fisher's Exact Test for Count Data

data:  hiv
p-value = 0.04784
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.4721409 0.9994766
sample estimates:
odds ratio
  0.689292

Rerks-Ngarm, Pitisuttithum et al. (2009) New England Journal of Medicine 10.1056/NEJMoa0908492

Thailand HIV vaccine trial

"Oh my God, it's amazing."

"The significance that has been established in this trial is that there is a 5% chance that this is a fluke. So we are 95% certain that what we are seeing is real and not down to pure chance. And that's great."

Significant? Would you publish? Should you publish?

The study was randomized and double-blinded, which deals with many possible complications:
- male / female balance between the arms of the trial
- high / low risk lifestyles
- volunteers who weren't honest about their sexuality (or changed their mind mid-trial)
- genetic variability in the population (e.g. CCL3L1)
- incorrect HIV tests / mixed-up samples
- vaccine improperly administered

Thailand HIV vaccine trial

The initially published data were based on the modified Intent To Treat population:
- Intent To Treat: people count as soon as they are enrolled in the trial
- modified Intent To Treat: excludes people found to be sero +ve at the beginning of the trial
- Per-Protocol: only counts people who completed the course of vaccines

A multiple testing issue? How many unsuccessful HIV vaccine trials have there been? One or more, and these results are toast.

Missing data

Some people go missing during, or drop out of, trials. This could be informative: e.g. someone finds out they have HIV from another source and stops attending check-ups. Double-blinded trials help a lot, as does extensive follow-up (hospital records of death, etc.).

- Missing Completely At Random: the missing data are completely unrelated to the trial
- Missing At Random: the missing data can be imputed
- Missing Not At Random: the missing data are informative about effects in the trial

Check-up:     1  2  3  4  5  6
Volunteer 1:  -  -  -  -  ?  ?
Volunteer 2:  -  -  +  +  +  +

Correlation

Various types of data and their correlation coefficients (Wikimedia, public domain):
http://commons.wikimedia.org/wiki/File:Correlation_examples.png

Correlation does not imply causation. If A and B are correlated, then one or more of the following are true:
- A causes B
- B causes A
- A and B have a common cause (which might be surprising)

Do pirates cause global warming?
http://commons.wikimedia.org/wiki/File:PiratesVsTemp_English.jpg

R. Matthews (2001) Storks Deliver Babies (p = 0.008). Teaching Statistics 22(2):36-38

Pearson correlation coefficient

The standard measure of correlation, the "correlation coefficient". It measures linear correlation; the statistic lies in [-1, +1]:
- 0: no linear correlation (independent variables give 0)
- +1: perfect positive correlation
- -1: perfect negative correlation

cor.test( gene1, gene2, method="pearson")

        Pearson's product-moment correlation

data:  gene1 and gene2
t = 851.4713, df = 22808, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9842311 0.9850229
sample estimates:
      cor
 0.984632

The squared coefficient is roughly the proportion of variance of gene1 explained by gene2 (or vice versa).

Pearson: when things go wrong

Pearson correlation is sensitive to outliers: a single observation can change the outcome of many tests. In an example of observations drawn from normal distributions (x ~ normal(0,1), y ~ normal(1,3)), versions of the data differing by a single observation give:

- Correlation = -0.05076632 (p-value = 0.02318)
- Correlation = -0.06499109 (p-value = 0.003632)
- Correlation = -0.1011426  (p-value = 5.81e-06)
- Correlation =  0.1204287  (p-value = 6.539e-08)
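The sensitivity is easy to reproduce; a sketch (the data and the outlier position are illustrative, not the slide's):

```r
# One extreme point can dominate the Pearson coefficient,
# while the rank-based Spearman coefficient barely moves
set.seed(42)
x <- rnorm(200)
y <- rnorm(200, mean = 1, sd = 3)
cor(x, y)                                # near zero: independent data
x_out <- c(x, 10); y_out <- c(y, 30)     # add a single outlier
cor(x_out, y_out)                        # jumps well away from zero
cor(x_out, y_out, method = "spearman")   # still near zero
```

The rank-based coefficient, introduced next, limits the influence any single extreme point can have.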

Spearman's test

A nonparametric test for correlation:
- doesn't assume the data are normal
- insensitive to outliers
- the coefficient has roughly the same meaning
- the raw observations (e.g. raw expression values) are replaced by their ranks

cor.test( gene1, gene2, method="spearman")

        Spearman's rank correlation rho

data:  gene1 and gene2
S = 30397631415, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
 0.984632

Comparison

Looking at gene expression data:

cor.test(lge1, lge2, method="pearson")

        Pearson's product-moment correlation

data:  lge1 and lge2
t = 573.0363, df = 22808, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9661278 0.9678138
sample estimates:
      cor
0.9669814

cor.test(lge1, lge2, method="spearman")

        Spearman's rank correlation rho

data:  lge1 and lge2
S = 30397631415, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
     rho
0.984632

cor.test(lge1, lge2, method="kendall")

        Kendall's rank correlation tau

data:  lge1 and lge2
z = 203.0586, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau
0.8974576

Comparison

Spearman and Kendall are scale invariant:

           Log-Log   Normal-Log   Normal-Normal
Pearson      0.97       0.56          0.99
Spearman     0.98       0.98          0.98
Kendall      0.90       0.90          0.90
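The invariance follows because ranks are unchanged by any monotone transform; a sketch with simulated (assumed, not the slide's) correlated lognormal data:

```r
# Ranks are invariant under monotone transforms such as log,
# so Spearman's rho is identical on the raw and log scales;
# Pearson's coefficient is not
set.seed(1)
z <- rnorm(2000)
x <- exp(z)
y <- exp(z + 0.5 * rnorm(2000))

s_diff <- cor(x, y, method = "spearman") - cor(log(x), log(y), method = "spearman")
s_diff   # exactly zero
p_diff <- cor(x, y) - cor(log(x), log(y))
p_diff   # clearly non-zero
```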