Bioinformatics Tools

Biotechnology

2 Oct 2013

From motif search to gene expression analysis

Protein Motifs

Protein motifs are usually 6-20 amino acids long and can be represented as a consensus/profile:

P[ED]XK[RW][RK]X[ED]

or as a position weight matrix (PWM).
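A consensus such as P[ED]XK[RW][RK]X[ED] maps almost directly onto a regular expression. The following is a minimal sketch, assuming that X stands for any residue and that bracketed groups list the allowed residues at a position. The function names and the test sequence are made up for illustration.

```python
import re


def consensus_to_regex(consensus):
    """Turn a consensus like P[ED]XK[RW][RK]X[ED] into a compiled regex.

    Assumption: 'X' means "any residue" and never appears inside brackets.
    """
    return re.compile(consensus.replace("X", "."))


def find_motif(consensus, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = consensus_to_regex(consensus)
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Hypothetical protein fragment, invented for this example.
sequence = "MAPEAKRRGDLLPDQKWKTEV"
hits = find_motif("P[ED]XK[RW][RK]X[ED]", sequence)
print(hits)
```

A regular expression only answers yes or no for each position. A PWM would instead score every window, which allows partial matches to be ranked.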

Protein Domains


In addition to short motifs, proteins are characterized by domains.

Domains are long motifs (30-100 aa) and are considered the building blocks of proteins (evolutionary modules).




The zinc-finger domain

Some domains can be found in many proteins with different functions...

...while other domains are only found in proteins with a certain function.

MBD = Methylated DNA Binding Domain

Varieties of protein domains

Page 228

Extending along the length of a protein

Occupying a subset of a protein sequence

Occurring one or more times

Pfam

Pfam is a database that contains a large collection of multiple sequence alignments of protein domains.

It is based on profile hidden Markov models (HMMs).

Compared with a PWM, a profile HMM also models insertions, deletions and the transitions between neighboring columns (residue positions) of the alignment, and is thus much more powerful.

http://pfam.sanger.ac.uk/

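A profile HMM (Hidden Markov Model) can accurately represent a multiple sequence alignment (MSA).

[Figure: profile HMM over alignment columns 16-19, with match states (M16-M19), insert states (I16-I19) and delete states (D16-D19)]

Alignment (columns 16 17 18 19):

D R T R
D R T S
S - - S
S P T R
D R T R
D P T S
D - - S
D - - S
D - - S
D - - R

Match-state emission probabilities:

M16: D 0.8, S 0.2
M17: P 0.4, R 0.6
M18: T 1.0
M19: R 0.4, S 0.6

Insert states (I16-I19): X

Transitions: after column 16, 50% of the sequences continue through the match states of columns 17-18 and 50% through the delete states; the other transitions shown are 100%.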
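The emission probabilities in the figure can be recovered by counting residues per alignment column. The sketch below is a simplification: it uses plain frequencies over the ten aligned sequences from the slide, whereas real profile-HMM software typically adds pseudocounts and sequence weighting. The function names are illustrative only.

```python
from collections import Counter

# The ten aligned sequences from the slide (columns 16-19), '-' marks a gap.
alignment = [
    "DRTR", "DRTS", "S--S", "SPTR", "DRTR",
    "DPTS", "D--S", "D--S", "D--S", "D--R",
]


def match_emissions(rows):
    """Per-column residue frequencies, ignoring gaps (no pseudocounts)."""
    emissions = []
    for column in zip(*rows):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        emissions.append({aa: n / len(residues) for aa, n in counts.items()})
    return emissions


def gap_fractions(rows):
    """Fraction of sequences with a gap in each column (the delete path)."""
    return [column.count("-") / len(column) for column in zip(*rows)]


emissions = match_emissions(alignment)
gaps = gap_fractions(alignment)
for position, (probs, gap) in enumerate(zip(emissions, gaps), start=16):
    print(position, probs, "gap fraction:", gap)
```

Under these assumptions the counts reproduce the values on the slide, including the 50% of sequences that skip columns 17 and 18.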

Gene Expression Analysis

Gene Expression

DNA → RNA → protein

[Figure: polyadenylated (AAAAAAA) mRNA transcripts of gene 1, gene 2 and gene 3]

Studying Gene Expression, 1987-2011

- Spotted microarray (first high-throughput gene expression experiments)
- DNA chips
- RNA-seq (Next Generation Sequencing)

Classical versus modern technologies to study gene expression

Classical methods (spotted microarray, DNA chips)
- Require prior knowledge of the RNA transcript
- Good for studying the expression of known genes

New-generation RNA sequencing
- Does not require prior knowledge
- Good for discovering new transcripts

Classical Methods

1. Spotted Microarray
   Two-channel cDNA microarrays
2. DNA Chips
   One-channel microarrays (Affymetrix, Agilent)

http://www.bio.davidson.edu/courses/genomics/chip/chip.html

Experimental Protocol

Two-channel cDNA arrays

1. Design an experiment (probe design)
2. Extract RNA molecules from the cell
3. Label molecules with fluorescent dye
4. Pour solution onto the microarray, then wash off excess molecules
5. Shine laser light onto the array and scan for the presence of fluorescent dye
6. Analyze the microarray image

Transforming raw data to ratio of expression

Expression ratio = $\log_2(\mathrm{Cy5} / \mathrm{Cy3})$

The ratio of expression is indicated by the intensity of the color:

Red = high mRNA abundance in the experiment sample
Green = high mRNA abundance in the control sample
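As a small worked example of the transformation, the sketch below computes the log2 ratio of the two channels for a few spots. The intensity values are invented, and the function name is illustrative. Red (Cy5) is taken as the experiment sample and green (Cy3) as the control, as stated on the slide.

```python
import math


def log_ratio(cy5, cy3):
    """log2 of experiment (Cy5, red) over control (Cy3, green) intensity."""
    return math.log2(cy5 / cy3)


# Hypothetical spot intensities: (Cy5, Cy3)
spots = {"gene1": (800.0, 200.0), "gene2": (200.0, 800.0), "gene3": (500.0, 500.0)}
ratios = {gene: log_ratio(cy5, cy3) for gene, (cy5, cy3) in spots.items()}
print(ratios)
```

A positive value means higher abundance in the experiment sample (red), a negative value means higher abundance in the control (green), and zero means no change. The log scale makes a 4-fold increase (+2) and a 4-fold decrease (-2) symmetric.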

One-channel DNA chips

- Each sequence is represented by a probe set labeled with one fluorescent dye
- Target hybridizes to complementary probes only
- The fluorescence intensity is indicative of the expression of the target sequence

Affymetrix Chip

RNA-seq

NEXT...

Clustering genes according to their expression profiles.

[Figure: expression matrix with axes Genes and Experiments]

WHY?

What can we learn from the clusters?

Identify gene function
- Similar expression can imply similar function

Diagnostics and therapy
- Differential gene expression can indicate a disease state
- Genes which change expression in a disease can be good candidates for drug targets


HOW?

Different clustering approaches

Unsupervised
- Hierarchical Clustering
- Partition Methods
  K-means

Supervised Methods
- Analysis of variance
- Discriminant analysis
- Support Vector Machine (SVM)


Clustering

Clustering organizes things that are close into groups.

- What does it mean for two genes to be close?
- Once we know this, how do we define groups?

What does it mean for two genes to be close?

We need a mathematical definition of distance between the expression of two genes:

Gene 1 = $(E_{11}, E_{12}, \ldots, E_{1N})'$

Gene 2 = $(E_{21}, E_{22}, \ldots, E_{2N})'$

For example, the distance between gene 1 and gene 2 can be the Euclidean distance:

$$d(\text{Gene 1}, \text{Gene 2}) = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}$$
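The Euclidean distance between two expression vectors takes only a few lines to compute. This is a minimal sketch. The expression values are invented, and each vector holds one gene's expression across N experiments.

```python
import math


def euclidean_distance(gene_a, gene_b):
    """Square root of the summed squared differences over the N experiments."""
    if len(gene_a) != len(gene_b):
        raise ValueError("both genes must be measured in the same experiments")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(gene_a, gene_b)))


# Hypothetical expression values of two genes in N = 2 experiments.
gene1 = [0.0, 0.0]
gene2 = [3.0, 4.0]
print(euclidean_distance(gene1, gene2))
```

Euclidean distance is only one possible choice. Correlation-based distances are also common for expression data because they compare the shape of a profile rather than its magnitude.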

Once we know this, how do we define groups?

Hierarchical Clustering

Michael Eisen, 1998:

- Generate a tree based on similarity (similar to a phylogenetic tree)
- Each gene is a leaf on the tree
- Distances reflect similarity of expression
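The tree-building idea can be sketched as agglomerative clustering: start with every gene as its own leaf and repeatedly merge the two closest clusters. The version below is an illustration under stated assumptions, not necessarily the procedure on the slide. It assumes Euclidean distance and average linkage, and the gene names and expression values are invented.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    total = sum(euclidean(profiles[g], profiles[h])
                for g in cluster_a for h in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clustering(profiles):
    """Return the merges (cluster_a, cluster_b, distance), closest first."""
    clusters = [(name,) for name in profiles]  # every gene starts as a leaf
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], profiles))
        merges.append((a, b, average_linkage(a, b, profiles)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)  # the new internal node of the tree
    return merges


# Hypothetical expression profiles over three experiments.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [-1.0, -2.0, -3.0],
    "geneD": [-1.2, -1.9, -3.1],
}
merges = hierarchical_clustering(profiles)
for a, b, distance in merges:
    print(a, "+", b, "at distance", round(distance, 3))
```

Each merge corresponds to an internal node of the tree, and the merge distance gives the height at which the two branches join.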

[Figure: clustered expression matrix with axes Genes and Experiments; a gene cluster is highlighted]

Internal nodes represent different functional groups (A, B, C, D, E).

One gene may belong to more than one cluster.

Clusters can be presented by graphs

What can we learn from clusters with similar gene expression?

EXAMPLE - hnRNP A1 and SRp40

[Figure: two bar charts, HNRPA1 (axis 0 to 4000) and SRp40, showing expression across tissues: pancreas, bone marrow, whole blood, adrenal gland, ovary, uterus, prostate, testis, heart, lung, liver, skeletal muscle, smooth muscle, salivary gland, skin, thyroid, tonsil, trachea, kidney, whole brain]

hnRNP A1 and SRp40 are not clear homologs based on BLAST e-value, but have a very similar gene expression pattern in different tissues.

Are hnRNP A1 and SRp40 functional homologs?

[Figure: SRp40 and hnRNP A1 shown among elements labeled SF]

YES!

What can we learn from clusters with similar gene expression?

Similar expression between genes can mean:

- The genes have similar function
- One gene controls the other
- All genes are controlled by a common regulatory gene

How can we use microarrays for diagnostics?

Gene-Expression Profiles in Hereditary Breast Cancer

cDNA Microarrays: Parallel Gene Expression Analysis

- Breast tumors studied: BRCA1, BRCA2, sporadic tumors
- Log-ratio measurements of 3226 genes for each tumor after initial data filtering

RESEARCH QUESTION

Can we distinguish BRCA1 from BRCA2 cancers based solely on their gene expression profiles?

How can microarrays be used as a basis for diagnostics?

5 breast cancer patients

        Patient 1   Patient 2   Patient 3   Patient 4   Patient 5
Gen1        +           -           -           +           +
Gen2        +           +           -           +           -
Gen3        -           +           +           +           -
Gen4        +           +           +           -           -
Gen5        -           -           +           -           +

How can microarrays be used as a basis for diagnostics?

        Patient 1   Patient 2   Patient 4   Patient 3   Patient 5
Gen1        +           -           +           -           +
Gen3        -           +           +           +           -
Gen4        +           +           -           +           -
Gen2        +           +           +           -           -
Gen5        -           -           -           +           +

Figure labels: Informative Genes; BRCA1, BRCA2

Specific Examples

Cancer Research

Ramaswamy et al, 2003, Nat Genet 33:49-54

Hundreds of genes that differentiate between cancer tissues in different stages of the tumor were found.

The arrow shows an example of tumor cells which were not detected correctly by histological or other clinical parameters.





Supervised approaches for predicting gene function based on microarray data

Support Vector Machine (SVM)

An SVM would begin with a set of genes that have a common function (red dots). In addition, a separate set of genes that are known not to be members of the functional class (blue dots) is specified.

Using this training set, an SVM would learn to differentiate between the members and non-members of a given functional class based on expression data.

Having learned the expression features of the class, the SVM could recognize new genes as members or as non-members of the class based on their expression data.
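The train-then-classify workflow described above can be illustrated with a very small linear SVM. This is a sketch under stated assumptions rather than a production implementation. It minimizes the regularized hinge loss by plain sub-gradient descent, and the expression profiles, learning rate, regularization strength and number of epochs are all invented for the example. In practice one would normally use an established SVM library.

```python
def train_linear_svm(profiles, labels, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the L2-regularized hinge loss (linear SVM)."""
    n, dim = len(profiles), len(profiles[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(profiles, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1:  # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """+1 = member of the functional class, -1 = non-member."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical log-ratio expression profiles over three experiments.
members = [[2.0, 1.8, 2.2], [1.9, 2.1, 2.0], [2.2, 2.0, 1.7]]              # "red dots"
non_members = [[-1.0, -0.8, -1.2], [-0.9, -1.1, -1.0], [-1.2, -0.7, -1.1]]  # "blue dots"
profiles = members + non_members
labels = [1] * len(members) + [-1] * len(non_members)

w, b = train_linear_svm(profiles, labels)
new_gene = [1.5, 1.7, 1.6]  # an uncharacterized gene, values invented
print(predict(w, b, new_gene))
```

The same code applies unchanged to the tumor-diagnosis setting, where each vector is a patient's expression pattern and the labels come from histology.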



Using SVMs to diagnose tumors based on expression data

Each dot represents a vector of the expression pattern taken from a microarray experiment, for example the expression pattern of all genes from a cancer patient.

How do SVMs work with expression data?

In this example, red dots can be primary tumors and blue dots are from the metastasis stage.

The SVM is trained on data that was classified based on histology.

After training the SVM, we can use it to diagnose the unknown tumor.

Projects 2012-13

Key dates

13.12 Lists of suggested projects published*

*You are highly encouraged to choose a project yourself or find a relevant project which can help in your research.

22.1 Submission of project overview (one page)
- Title
- Main question
- Major tools you are planning to use to answer the questions

Final week: meetings on projects

12.3 Poster submission

20.3 Poster presentation




Instructions for the final project

Introduction to Bioinformatics 2012-13

2. Planning your research

After you have described the main question or questions of your project, you should carefully plan your next steps.

A. Make sure you understand the problem and read the necessary background to proceed.

B. Formulate your working plan, step by step.

C. After you have a plan, start by extracting the necessary data and decide on the relevant tools to use at the first step.

When running a tool, make sure to summarize the results and extract the relevant information you need to answer your question. It is recommended to save the raw data for your records, but don't present raw data in your final project.

Your initial results should guide you towards your next steps.

D. When you feel you have explored all the tools you can apply to answer your question, you should summarize and draw conclusions. Remember that NO is also an answer, as long as you are sure it is NO. Also remember that this is a course project, not only a homework exercise.

3. Summarizing the final project in a poster (in pairs)

Prepare in PPT, poster size 90-120 cm

Title of the project

Names and affiliation of the students presenting

The poster should include 5 sections:

Background: should include a description of your question (can add a figure)

Goal and Research Plan: describe the main objective and the research plan

Results (main section): present your results in 3-4 figures, describe each figure (figure legends) and give a title to each result

Conclusions: summarize in points the conclusions of your project

References: list the references of papers/databases/tools used for your project

Examples of posters will be presented in class