Gene Expression
• Methods
  – Unsupervised Clustering
    • Hierarchical clustering
    • K-means clustering
• Expression data
  – GEO
  – UCSC
• EPCLUST
Microarray - Reminder
Expression Data Matrix
• Each column represents all the gene expression levels from a single experiment.
• Each row represents the expression of a gene across all experiments.
         Exp 1   Exp 2   Exp 3   Exp 4   Exp 5   Exp 6
Gene 1   -1.2    -2.1    -3      -1.5     1.8     2.9
Gene 2    2.7     0.2    -1.1     1.6    -2.2    -1.7
Gene 3   -2.5     1.5    -0.1    -1.1    -1       0.1
Gene 4    2.9     2.6     2.5    -2.3    -0.1    -2.3
Gene 5    0.1     2.6     2.2     2.7    -2.1            (one value missing)
Gene 6   -2.9    -1.9    -2.4    -0.1    -1.9     2.9
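The row and column reading of the matrix can be sketched in a few lines of Python. This is only an illustration, assuming the matrix is stored as a list of rows; the names `matrix`, `gene_profile` and `experiment_column` are hypothetical, and the numbers are example values.

```python
# Illustrative expression matrix: rows are genes, columns are experiments.
# The values are examples only.
matrix = [
    [-1.2, -2.1, -3.0],  # Gene 1 across Exp 1..3
    [2.7, 0.2, -1.1],    # Gene 2 across Exp 1..3
]


def gene_profile(matrix, gene_index):
    """Return one row: the expression of a gene across all experiments."""
    return list(matrix[gene_index])


def experiment_column(matrix, exp_index):
    """Return one column: all gene expression levels from one experiment."""
    return [row[exp_index] for row in matrix]


print(gene_profile(matrix, 0))       # profile of the first gene
print(experiment_column(matrix, 1))  # all genes in the second experiment
```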
Each element is a log ratio: log2(T/R).
T - the gene expression level in the testing sample
R - the gene expression level in the reference sample
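As a small worked example of the log ratio, the sketch below computes log2(T/R) for a few made-up expression levels. The function name `log_ratio` and the input numbers are assumptions for illustration, not values from the slides.

```python
import math


def log_ratio(test_level, reference_level):
    """Return log2(T/R) for positive expression levels T and R."""
    if test_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test_level / reference_level)


print(log_ratio(400.0, 100.0))  # T is four times R: positive log ratio
print(log_ratio(100.0, 100.0))  # T equals R: log ratio of zero
print(log_ratio(25.0, 100.0))   # T is a quarter of R: negative log ratio
```

Using a logarithm makes up- and down-regulation symmetric: a fourfold increase and a fourfold decrease have the same magnitude with opposite signs.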
Microarray Data Matrix
Black indicates a log ratio of zero, i.e. T ≈ R
Green indicates a negative log ratio, i.e. T < R
Red indicates a positive log ratio, i.e. T > R
Grey indicates missing data
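The color convention can be written as a small lookup. This is a sketch only: the function name `heatmap_color`, the use of `None` for missing data, and the `tolerance` used to decide what counts as "approximately zero" are all assumptions, not part of the slides.

```python
def heatmap_color(log_ratio, tolerance=0.05):
    """Map a log ratio (or None for missing data) to a color name."""
    if log_ratio is None:
        return "grey"             # missing data
    if abs(log_ratio) <= tolerance:
        return "black"            # T is approximately equal to R
    return "red" if log_ratio > 0 else "green"


for value in (1.8, -2.1, 0.0, None):
    print(value, heatmap_color(value))
```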
Microarray Data: Different representations
[Figure: plots with axes Exp (1 to 6) and Log ratio; regions marked T<R and T>R]
Microarray Data: Clusters
How to determine the similarity between two genes? (for clustering)
Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23, 1499-1501 (2005), http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
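Two measures commonly used for this purpose are Euclidean distance and Pearson correlation. The slides do not say which one is used, so the sketch below simply shows both on made-up profiles; the function names and data are illustrative assumptions.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Correlation of two profiles (undefined for constant profiles)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same shape as gene_a, twice the amplitude
gene_c = [4.0, 3.0, 2.0, 1.0]  # opposite trend to gene_a

print(euclidean_distance(gene_a, gene_b))
print(pearson_correlation(gene_a, gene_b))
print(pearson_correlation(gene_a, gene_c))
```

The two measures answer different questions: Euclidean distance is sensitive to the magnitude of the values, while correlation compares only the shape of the profiles.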
Microarray Data: Clustering
Hierarchical Clustering: genes with similar expression patterns are grouped together and are connected by a series of branches (dendrogram).
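One common way to build such a tree is bottom-up (agglomerative): start with every gene in its own cluster and repeatedly join the two closest clusters. The sketch below is a minimal illustration assuming Euclidean distance and average linkage, neither of which is specified in the slides; all names and data are hypothetical.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def build_dendrogram(profiles):
    """Repeatedly merge the two closest clusters; return the merge list.

    Each merge is (members_a, members_b, distance), in the order performed.
    This is the information a dendrogram draws as branches.
    """
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


profiles = [
    [0.0, 0.0],  # gene 0
    [0.0, 1.0],  # gene 1
    [5.0, 5.0],  # gene 2
    [5.0, 6.5],  # gene 3
]
for members_a, members_b, distance in build_dendrogram(profiles):
    print(members_a, members_b, round(distance, 2))
```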
[Figure: dendrogram; leaf order 1, 6, 3, 5, 2, 4]
Leaves (the shapes in our case) represent genes, and the lengths of the paths between leaves represent the distances between genes. Similar genes lie within the same sub-trees.
If we want a certain number of clusters, we need to cut the tree at the level that yields that number (in this case, four).
Hierarchical clustering finds an entire hierarchy of clusters.
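Cutting the tree can be sketched as replaying the merges from the bottom of the tree and stopping once the requested number of clusters remains. The example assumes the merges are listed in the order they were made, with distances that never decrease, so that stopping early corresponds to a horizontal cut. The function name `cut_tree` and the merge list are hypothetical.

```python
def cut_tree(n_leaves, merges, k):
    """Replay the first n_leaves - k merges, leaving exactly k clusters.

    `merges` lists pairs of leaf groups in the order they were joined,
    from the closest pair (bottom of the tree) to the root.
    """
    if not 1 <= k <= n_leaves:
        raise ValueError("k must be between 1 and the number of leaves")
    clusters = [frozenset([i]) for i in range(n_leaves)]
    for group_a, group_b in merges[: n_leaves - k]:
        a, b = frozenset(group_a), frozenset(group_b)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical merge order for six leaves, bottom of the tree first.
merges = [
    ((0,), (1,)),
    ((2,), (3,)),
    ((4,), (5,)),
    ((0, 1), (2, 3)),
    ((0, 1, 2, 3), (4, 5)),
]
print(cut_tree(6, merges, 4))  # four clusters
print(cut_tree(6, merges, 2))  # two clusters
```

Because the whole hierarchy is stored, any number of clusters can be read off the same tree without clustering again.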
[Figure: hierarchical clustering result]
[Figure: five clusters]
K-means clustering is an algorithm that partitions the data into K groups.
[Figure: K = 4]
How?
The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a set of K groups in which each gene is assigned to its nearest group center.
1. k initial "means" (in this case k = 3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
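The four steps can be sketched directly in Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, uses a fixed random seed for repeatability, and stops when the assignments no longer change; the names `k_means`, `centroid` and the sample points are hypothetical.

```python
import math
import random


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(values) / len(points) for values in zip(*points)]


def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)      # fixed seed: an assumption, for repeatability
    means = rng.sample(points, k)  # step 1: pick k initial means from the data
    assignment = None
    for _ in range(max_iterations):
        # Step 2: associate every observation with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, means[c])) for p in points
        ]
        if new_assignment == assignment:  # step 4: stop once nothing changes
            break
        assignment = new_assignment
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old mean if a cluster ends up empty
                means[c] = centroid(members)
    return assignment, means


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
assignment, means = k_means(points, k=2)
print(assignment)
print(means)
```

The cluster labels themselves are arbitrary, and on less cleanly separated data a different seed can lead to a different final grouping.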
Different types of clustering: different results
How to search for expression profiles
• GEO (Gene Expression Omnibus) http://www.ncbi.nlm.nih.gov/geo/
• Human genome browser http://genome.ucsc.edu/
[Screenshot annotations:]
– Like Series, but further curated and suitable for analysis with GEO tools
– Expression profiles by gene
– Microarray experiments
– Probe sets
– Groups of related microarray experiments
Searching for expression profiles in GEO
[Screenshot annotations: Download dataset; Clustering; Statistical analysis]
The expression distribution for different lines in the cluster
Searching for expression profiles in the Human Genome Browser
Keratin 10 is highly expressed in skin
What can we do with all the expression profiles?
Clusters!
How?
EPCLUST
http://www.bioinf.ebc.ee/EP/EP/EPCLUST/
Edit the input matrix: Transpose, Normalize, Randomize
Hierarchical clustering
K-means clustering
In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).
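If the data are stored with genes as rows and experiments as columns, a transpose produces the layout described here. The sketch below only illustrates the operation on a list-of-rows matrix with example values; the function name `transpose` and the variable names are hypothetical, and EPCLUST's own Transpose option is not being reproduced.

```python
def transpose(matrix):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


# Rows are genes, columns are experiments (illustrative values).
genes_by_experiments = [
    [-1.2, -2.1, -3.0],  # Gene 1
    [2.7, 0.2, -1.1],    # Gene 2
]

# After transposing, each row is an experiment and each column is a gene.
experiments_by_genes = transpose(genes_by_experiments)
print(experiments_by_genes)
```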
[Screenshot annotations: Graphical representation of the cluster; Samples found in cluster]
[Screenshot annotations: Initial seeds; Final seeds; 10 clusters, as requested]
FINAL PROJECT - Key dates
9.1 – last day to decide on a project!
18, 23, 24/1 – presenting a proposed project in small groups
  A very short presentation (max 5 minutes):
  • Title
  • Background
  • Main question
  • Major tools you are planning to use to answer the questions
6.3 – final submission