Introduction to Bioinformatics
6. Statistical Analysis of Gene Expression Matrices II
Course 341
Department of Computing
Imperial College, London
Moustafa Ghanem
Lecture Overview
Motivation
– Get a feel for t-values and how they change
Volcano plots
– Visual method for differential gene expression analysis
– Meaning of x and y axes
– Interpretation of results
Interpretation of t-test
The higher the absolute t-value, the lower the p-value, and the more confident you can be that the observed difference is not due to chance.
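Calculating t-test (t statistic)
First calculate the t statistic value, and then calculate the p-value.
For the paired t-test, t is calculated using the following formula:
$$t = \frac{\bar{d}}{\sigma_d / \sqrt{n}}$$
where $\bar{d}$ is the mean of the pairwise differences $d_i$, calculated by
$$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$$
$\sigma_d$ is the standard deviation of the differences, and n is the number of pairs being tested.
For an unpaired (independent group) t-test, the following formula is used (in its standard unequal-variance form):
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma(x_1)^2}{n(x_1)} + \dfrac{\sigma(x_2)^2}{n(x_2)}}}$$
where σ(x) is the standard deviation of x and n(x) is the number of elements in x.
Remember these formulae!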
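As a minimal sketch of the two statistics, the following Python computes the paired t (mean of differences divided by S.D. of differences/sqrt(n)) and an unpaired t. The expression values are hypothetical, the sample standard deviation (n − 1 denominator) is an assumption, and the unpaired function assumes the usual unequal-variance form built from σ(x) and n(x).

```python
import math
from statistics import mean, stdev  # stdev is the sample S.D. (n - 1), an assumption


def paired_t(x, y):
    """Paired t: mean of the pairwise differences over its standard error."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)  # number of pairs being tested
    return mean(d) / (stdev(d) / math.sqrt(n))


def unpaired_t(x1, x2):
    """Unpaired t (assumed unequal-variance form) for two independent groups."""
    se = math.sqrt(stdev(x1) ** 2 / len(x1) + stdev(x2) ** 2 / len(x2))
    return (mean(x1) - mean(x2)) / se


# Hypothetical expression values for one gene, 4 WT and 4 KO samples
wt = [12.0, 15.0, 11.0, 14.0]
ko = [10.0, 12.0, 10.0, 11.0]

print(f"paired t   = {paired_t(wt, ko):.2f}")
print(f"unpaired t = {unpaired_t(wt, ko):.2f}")
```

With these numbers the paired statistic is larger than the unpaired one, because pairing removes the sample-to-sample variation that both groups share.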
Calculating and Interpreting t-values
Consider the following examples, and assume a paired experiment. In each example, the average difference is the mean of the pairwise differences d.
High t-value
Take Gene A, assuming a paired test (the same reasoning holds for either type of test):
– Average difference = 100, S.D. = 0
– t-value is near infinity, p is extremely low
Consider Gene M for a paired experiment:
– Average difference = 0
– t-value is zero. What does this mean?
Consider Gene T for a paired experiment:
– t-value = signal/noise ratio
Graphical Interpretation of t-test (Paired)
t = Mean of differences / (S.D. of differences / sqrt(n))
[Figure: three cases, each with two panels: Value against Sample ID, and the differences d1 to d4 (d = Diff) against Sample ID with their average d_avg.]
Case 1: Low variation around mean of differences
Case 2: Moderate variation around mean of differences
Case 3: Large variation around mean of differences
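To get a feel for how the t-value changes with the spread of the differences, here is a small sketch with three hypothetical sets of four differences. All three have the same mean difference (10); only the variation around that mean changes. The numbers are invented for illustration, and the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def paired_t_from_diffs(d):
    """t = mean of differences / (S.D. of differences / sqrt(n))."""
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


# Hypothetical differences d1..d4; every case has mean difference 10
cases = {
    "low variation": [9.0, 10.0, 11.0, 10.0],
    "moderate variation": [5.0, 10.0, 15.0, 10.0],
    "large variation": [-10.0, 10.0, 30.0, 10.0],
}

t_values = {name: paired_t_from_diffs(d) for name, d in cases.items()}
for name, t in t_values.items():
    print(f"{name}: mean diff = {mean(cases[name]):.1f}, t = {t:.2f}")
```

The signal (mean difference) is identical in each case, so the drop in t comes entirely from the growing noise term. If the S.D. were exactly zero the division would fail, which corresponds to the "t near infinity" situation.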
Back to our problem
– 5000 rows represent genes
– Columns represent samples
– 4 Wild Type (WT) samples (Blue)
– 4 Knockout (KO) samples (Red)
Hypothesis Testing
Uses hypothesis testing methodology.
For each gene (>5,000):
– Pose Null Hypothesis (Ho) that gene is not affected
– Pose Alternative Hypothesis (Ha) that gene is affected
– Use statistical techniques to calculate the p-value: the probability of seeing a difference at least this large if Ho were true
– If p-value < some critical value, reject Ho and accept Ha
The issues:
– Large number of genes (or experiments)
– Need a quick way to filter out significant genes that have high fold change
– Need also to sort genes by fold change and significance
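The per-gene procedure can be sketched as follows, using only the Python standard library. The p-value is obtained by numerically integrating the Student t density with Simpson's rule, which is one possible approach rather than the method used in the course. The gene names, expression values, the two-sided test, and the critical value of 0.05 are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density from 0 to |t|."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * (total * h / 3))


def assess_gene(wt, ko, alpha=0.05):
    """Paired t-test for one gene; returns (t, p, reject_Ho)."""
    d = [a - b for a, b in zip(wt, ko)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    p = two_sided_p(t, n - 1)  # n - 1 degrees of freedom
    return t, p, p < alpha


# Hypothetical genes: (WT samples, KO samples)
genes = {
    "gene_1": ([12.0, 15.0, 11.0, 14.0], [10.0, 12.0, 10.0, 11.0]),
    "gene_2": ([10.0, 12.0, 9.0, 11.0], [11.0, 10.0, 10.0, 11.0]),
}

results = {name: assess_gene(wt, ko) for name, (wt, ko) in genes.items()}
for name, (t, p, reject) in results.items():
    print(f"{name}: t = {t:.2f}, p = {p:.4f}, reject Ho = {reject}")
```

Here gene_2 has an average difference of zero, so t = 0 and p = 1: there is no evidence against Ho. With more than 5,000 genes the same loop applies, although testing that many genes at once raises multiple-testing questions that this sketch does not address.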
Volcano Plots
– For each gene, calculate the significance of the change (t-test, p-value)
– For each gene, compare the value of the effect between population WT vs. KO (fold change)
– Identify genes with high effect and high significance
Volcano Plot
Volcano plots are a graphical means for visualising the results of large numbers of t-tests, allowing us to plot both the effect and the significance of each test in an easy-to-interpret way.
Volcano plots
In a volcano plot:
The x-axis represents effect measured as fold change:
$$\text{Effect} = \log_2(\text{WT}) - \log_2(\text{KO}) = \log_2(\text{WT} / \text{KO})$$
If WT = KO, Effect (fold change) = 0; if WT = 2 KO, Effect (fold change) = 1; ...
Numerical Interpretation (Effect)
Using log2 for the x-axis:
– Effect = 1: WT is 2^1 times KO (2 raised to the power of 1); the value has doubled, a two-fold change
– Effect = −1: WT is 2^−1 = 0.5 times KO (2 raised to the power of −1); the value has halved
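A short sketch of the x-axis calculation, Effect = log2(WT / KO). The expression values are hypothetical, and using a single (for example, mean) expression value per condition is an assumption.

```python
import math


def log2_fold_change(wt, ko):
    """Effect = log2(WT) - log2(KO) = log2(WT / KO); both values must be > 0."""
    return math.log2(wt / ko)


print(log2_fold_change(100.0, 100.0))  # 0.0: no change
print(log2_fold_change(200.0, 100.0))  # 1.0: WT is double KO
print(log2_fold_change(50.0, 100.0))   # -1.0: WT is half of KO
```

On the log2 scale, doubling and halving are symmetric about zero (+1 and −1), which is why the effect on the x-axis can be read as positive or negative.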
Volcano plots
Calculate significance as $-\log_{10}(\text{p\_value})$
If p = 0.1, −log10(0.1) = 1 (1 decimal place)
If p = 0.01, −log10(0.01) = 2 (2 decimal places)
...
In a volcano plot:
The y-axis represents the number of zeroes in the p-value (for p = 0.01 this is 2, for p = 0.0001 it is 4)
– Remember that with a p-value of 0.0001 you are more confident than with a p-value of 0.01
– This is just a trick so that higher values on the graph are more important
Numerical Interpretation (Significance)
Using log10 for the y-axis:
– y > 1: p < 0.1 (1 decimal place)
– y > 2: p < 0.01 (2 decimal places)
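The y-axis transform, significance = −log10(p), can be sketched in a few lines. The p-values below are simply the example values used in the text.

```python
import math


def significance(p_value):
    """y-axis value of a volcano plot: -log10(p); requires 0 < p <= 1."""
    return -math.log10(p_value)


for p in (0.1, 0.01, 0.0001):
    print(f"p = {p}: significance = {significance(p):.1f}")
```

Smaller p-values map to larger values, so the most significant genes end up at the top of the plot.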
Visualise the Result: Volcano Plot
[Figure: volcano plot of Effect (x-axis, from −ve effect to +ve effect) vs. Significance (y-axis, from Low Significance to High Significance). Region labels: "High Effect & Significance" and "Boring stuff".]
Selections of items that have both a large effect and are highly significant can be identified easily.
Choosing log scales is a matter of convenience.
Effect can be both +ve or −ve.
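Selecting the interesting region of a volcano plot amounts to thresholding both axes. The sketch below computes the (x, y) coordinates for a few hypothetical genes and keeps those with a large effect in either direction and high significance. The gene names, values, and both thresholds (at least a two-fold change and p ≤ 0.01) are assumptions chosen for illustration.

```python
import math


def volcano_point(wt_mean, ko_mean, p_value):
    """Return (effect, significance) = (log2(WT/KO), -log10(p))."""
    return math.log2(wt_mean / ko_mean), -math.log10(p_value)


def select_genes(points, min_abs_effect=1.0, min_significance=2.0):
    """Genes with |effect| and significance above the (assumed) thresholds,
    sorted by significance, then by size of effect."""
    chosen = [
        name for name, (x, y) in points.items()
        if abs(x) >= min_abs_effect and y >= min_significance
    ]
    return sorted(chosen, key=lambda g: (-points[g][1], -abs(points[g][0])))


# Hypothetical genes: (mean WT expression, mean KO expression, p-value)
genes = {
    "gene_up": (400.0, 100.0, 0.001),     # large +ve effect, significant
    "gene_down": (50.0, 200.0, 0.005),    # large -ve effect, significant
    "gene_noisy": (300.0, 100.0, 0.4),    # large effect, not significant
    "gene_flat": (100.0, 105.0, 0.0005),  # significant, tiny effect
}

points = {name: volcano_point(*vals) for name, vals in genes.items()}
selected = select_genes(points)
print(selected)
```

Only the genes that pass both thresholds are kept; a gene with a large but unreliable effect, or a reliable but tiny one, falls into the "boring stuff" region. The same (x, y) pairs could be passed to any scatter-plot routine to draw the plot itself.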
Summary
t-test good for small samples (in our case 4 paired observations)
– t distribution approximates to normal distribution when degrees of freedom > 30
– Remember formulae for paired/unpaired
Volcano plot: simple method for visualising large sets of such observations
– Remember formula for x-axis
– Remember formula for y-axis