tutorial10_11

sharpfarts, Artificial Intelligence and Robotics

8 Nov 2013

Gene Expression

Methods
Unsupervised Clustering
Hierarchical clustering
K-means clustering
Expression data
GEO
UCSC
EPCLUST

Microarray - Reminder

Expression Data Matrix

Each column represents all the gene expression levels from a single experiment.
Each row represents the expression of a gene across all experiments.

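          Exp 1   Exp 2   Exp 3   Exp 4   Exp 5   Exp 6
Gene 1    -1.2    -2.1    -3      -1.5     1.8     2.9
Gene 2     2.7     0.2    -1.1     1.6    -2.2    -1.7
Gene 3    -2.5     1.5    -0.1    -1.1    -1       0.1
Gene 4     2.9     2.6     2.5    -2.3    -0.1    -2.3
Gene 5     0.1     2.6     2.2     2.7    -2.1
Gene 6    -2.9    -1.9    -2.4    -0.1    -1.9     2.9

Gene 5 has only five values in the source; they are listed in source order, and the position of the missing cell is not recoverable.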

Each element is a log ratio: log2(T/R).
T - the gene expression level in the testing sample
R - the gene expression level in the reference sample
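The log ratio defined above can be computed directly. The following is a minimal sketch; the function name and the intensity values are hypothetical and chosen only so that the results are easy to check by hand.

```python
import math

def log_ratio(test, reference):
    """Return log2(T/R) for positive expression levels T and R."""
    if test <= 0 or reference <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test / reference)

# Hypothetical intensities, for illustration only.
print(log_ratio(400.0, 100.0))  # 2.0  (T is four times R)
print(log_ratio(100.0, 100.0))  # 0.0  (T equals R)
print(log_ratio(50.0, 100.0))   # -1.0 (T is half of R)
```

A useful property of the log scale is its symmetry: a two-fold increase gives +1 and a two-fold decrease gives -1.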


Microarray Data Matrix

Black indicates a log ratio of zero, i.e. T ≈ R
Green indicates a negative log ratio, i.e. T < R
Red indicates a positive log ratio, i.e. T > R
Grey indicates missing data
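The color legend can be written as a small lookup. This is only a sketch: the function name, the use of None for a missing value, and the tolerance that decides when T ≈ R are all assumptions. A real heat map would also vary the color intensity with the size of the log ratio, which is left out here.

```python
def cell_color(log_ratio, tolerance=0.05):
    """Map one matrix cell to the color named in the legend.

    None stands for a missing value (assumption).
    tolerance is an assumed cutoff for treating T and R as equal.
    """
    if log_ratio is None:
        return "grey"
    if abs(log_ratio) <= tolerance:
        return "black"
    return "red" if log_ratio > 0 else "green"

row = [-1.2, 0.0, 2.9, None]
print([cell_color(value) for value in row])
# ['green', 'black', 'red', 'grey']
```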

Microarray Data: Different representations

[Figure: two plots of log ratio (-4 to 4) against experiment (Exp 1 to 6); T<R marks negative log ratios and T>R positive ones.]

Microarray Data: Clusters

How to determine the similarity between two genes? (for clustering)


Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23, 1499-1501 (2005), http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
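Two measures that are commonly used to compare expression profiles are the Euclidean distance and the Pearson correlation. The sketch below is illustrative only and is not presented as the method used in the tutorial. The function names are hypothetical, and the two profiles are the Gene 1 and Gene 6 rows of the expression matrix.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two profiles (0 means identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Correlation of two profiles, from -1 (opposite) to +1 (same shape).

    Raises ZeroDivisionError if one profile is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

gene1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene6 = [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9]

print(euclidean_distance(gene1, gene6))
print(pearson_correlation(gene1, gene6))
```

The two measures answer different questions. Euclidean distance is sensitive to the magnitude of the values, while Pearson correlation compares only the shape of the profiles, so a gene and a scaled copy of it have correlation 1 but a non-zero distance.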


Microarray Data: Clustering

Hierarchical Clustering: genes with similar expression patterns are grouped together and are connected by a series of branches (dendrogram).

[Figure: dendrogram with leaves in the order 1, 6, 3, 5, 2, 4.]

Leaves (the shapes in our case) represent genes, and the length of the path between two leaves represents the distance between the genes. Similar genes lie within the same subtrees.

If we want a certain number of clusters, we need to cut the tree at the level that yields that number (in this case, four).

Hierarchical clustering finds an entire hierarchy of clusters.
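A minimal sketch of bottom-up (agglomerative) hierarchical clustering follows. It assumes Euclidean distance and average linkage, neither of which is specified in the slides, and the function names and toy profiles are hypothetical. Merging stops once the requested number of clusters remains, which for this linkage gives the same groups as building the whole tree and cutting it at that level.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]  # start: one gene per cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy two-experiment profiles, for illustration only.
profiles = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
print(agglomerate(profiles, 3))  # [[0, 1], [2, 3], [4]]
```

Recording each merge and its distance, instead of only the final groups, would give the information needed to draw the dendrogram.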

Hierarchical clustering result

Five clusters

K-means clustering is an algorithm that partitions the data into K groups.

K = 4

How?

The algorithm iteratively divides the genes into K groups and recalculates the center of each group. The result is a partition into K clusters that minimizes, at least locally, the distances between the genes and their cluster centers.

1. k initial "means" (in this case k = 3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
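The four steps can be sketched in plain Python as follows. This is an illustrative sketch rather than a reference implementation: Euclidean distance, the fixed random seed, the iteration cap, and the rule of keeping the old mean when a cluster becomes empty are all assumptions, and the points are toy data.

```python
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(column) / len(points) for column in zip(*points)]

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k initial means at random from the data set.
    means = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 2: associate every observation with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, means[c]))
            for p in points
        ]
        # Step 4: stop when the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old mean
                means[c] = centroid(members)
    return assignment, means

# Toy data: two well-separated groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, means = k_means(points, k=2)
print(labels)
print(means)
```

Because the initial means are chosen at random, different seeds can give different cluster numbering and, on less clearly separated data, different clusters. Running the algorithm several times is a common way to check how stable the result is.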

Different types of clustering: different results

How to search for expression profiles

GEO (Gene Expression Omnibus): http://www.ncbi.nlm.nih.gov/geo/
Human genome browser: http://genome.ucsc.edu/

Like Series, but further curated and suitable for analysis with GEO tools
Expression profiles by gene
Microarray experiments
Probe sets
Groups of related microarray experiments

Searching for expression profiles in GEO

Download dataset
Clustering
Statistical analysis

The expression distribution for different lines in the cluster

Searching for expression profiles in the Human Genome Browser

Keratin 10 is highly expressed in skin

What can we do with all the expression profiles?

Clusters!

How?

EPCLUST: http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

Edit the input matrix: Transpose, Normalize, Randomize

Hierarchical clustering
K-means clustering

In the input matrix each column should represent a gene and each row should represent an experiment (or individual).
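If a matrix is stored with genes as rows and experiments as columns, a transpose puts it into the orientation described here. A minimal sketch, with a hypothetical helper name and a small excerpt of values (Gene 1 and Gene 2 in Exp 1 to Exp 3):

```python
def transpose(matrix):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]

genes_by_experiments = [
    [-1.2, -2.1, -3.0],  # Gene 1 in Exp 1-3
    [2.7, 0.2, -1.1],    # Gene 2 in Exp 1-3
]

experiments_by_genes = transpose(genes_by_experiments)
print(experiments_by_genes)
# [[-1.2, 2.7], [-2.1, 0.2], [-3.0, -1.1]]
```

Each row of the result now holds one experiment, and each column holds one gene.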

Graphical representation of the cluster
Samples found in cluster

Initial seeds
Final seeds
10 clusters, as requested



FINAL PROJECT - Key dates

9.1: last day to decide on a project!

18, 23, 24/1: presenting a proposed project in small groups
A very short presentation (max 5 minutes):
Title
Background
Main question
Major tools you are planning to use to answer the questions

6.3: final submission