MeV


Gene Cluster Analysis with MeV.


Luca Zammataro

luca.zammataro@iit.it



Aim of practice:


1. Analyze gene expression datasets by means of cluster analysis algorithms.

2. Identify common trends among regulated genes in a particular experimental condition.



Background

The ratios of fluorescence on a microarray must be optically analyzed to determine whether there has been repression or induction. The fluorescence of both colors (red and green) at each spot is quantified by an image scanner. Recall that red represents the control mRNA, and green represents the experimental mRNA. Each spot is then given a ratio of green:red, which tells us whether that particular gene was produced in greater quantities (induced) or produced in smaller quantities (repressed) in comparison to the baseline amount of expression. This expression ratio is subsequently divided to give a decimal value and then converted to a logarithmic (base 2) scale.



The resulting data can be converted into images in order to quickly assess repression or induction visually. Green and red represent the changes in expression and not the initial fluorescence from mRNA hybridization. A typical scale would look something like this, where a black spot on the array would indicate that equal amounts of red and green fluorescence were observed on the original spot, thereby giving an equal expression ratio of 1:1, and subsequently a logarithmic value of 0.
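As a small worked example of the conversion described above, the following sketch turns green and red intensities into log2 expression ratios. The function name `log2_ratio` and the intensity values are illustrative assumptions, not part of the tutorial data.

```python
import math

def log2_ratio(green, red):
    """Return log2(green / red) for one spot; both intensities must be positive."""
    if green <= 0 or red <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(green / red)

# Hypothetical (green, red) intensities for three spots.
spots = {
    "equal": (500.0, 500.0),       # 1:1 ratio
    "induced": (2000.0, 500.0),    # 4:1 ratio
    "repressed": (250.0, 1000.0),  # 1:4 ratio
}

ratios = {name: log2_ratio(g, r) for name, (g, r) in spots.items()}
for name, value in ratios.items():
    print(name, value)
```

On the log2 scale, a fourfold induction and a fourfold repression end up at the same distance from zero, with opposite signs, which is why the logarithmic scale is convenient for a symmetric color scheme.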




In this example we will use the Pearson correlation coefficient to cluster our genes. The Pearson correlation is a similarity metric whose values vary from -1 (perfect anticorrelation) to +1 (perfect correlation). High correlation values thus indicate strong correspondence between expression profiles. However, the hierarchical clustering algorithm requires a distance matrix, where high values indicate strong differences between two objects (expression profiles). Pearson's correlation can be transformed into a distance metric by subtracting it from 1.


distPearson = 1 - corPearson


The Pearson distance varies from 0 (perfect correlation) to 2 (perfect anti-correlation).
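The two extremes of the Pearson distance can be checked with a minimal sketch. The gene profiles below are made-up log2 values, and the function names are assumptions chosen for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

def pearson_distance(x, y):
    """Distance derived from the correlation: 1 - corPearson."""
    return 1.0 - pearson(x, y)

# Hypothetical log2 expression profiles across four conditions.
gene_a = [-1.0, 0.0, 1.0, 2.0]
gene_b = [-2.0, 0.0, 2.0, 4.0]   # same trend as gene_a, larger amplitude
gene_c = [1.0, 0.0, -1.0, -2.0]  # mirror image of gene_a

d_ab = pearson_distance(gene_a, gene_b)  # perfectly correlated profiles
d_ac = pearson_distance(gene_a, gene_c)  # perfectly anti-correlated profiles
print(d_ab, d_ac)
```

Note that `gene_a` and `gene_b` differ in amplitude but still have a distance of 0: the Pearson distance compares the shape of the profiles, not their magnitude.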


By retracing the order in which the genes were progressively joined into clusters and by knowing the correlation value of each step, you can map out which genes are related to each other closely and which genes are related only distantly. This is best represented graphically, as shown by the following hypothetical diagram.
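The order of joining can also be traced in code. The sketch below is a deliberately simple agglomerative clustering that uses the Pearson distance and average linkage; the choice of average linkage, the gene names and the profile values are all assumptions made for illustration, and this is not the implementation used by MeV.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage(profiles):
    """Join the two closest clusters until one remains.

    Returns the merges in order as (members_a, members_b, distance).
    """
    clusters = [(name,) for name in profiles]
    merges = []

    def dist(c1, c2):
        # Average linkage: mean of all pairwise distances between members.
        pairs = [pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        d, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical log2 profiles: two rising genes and two falling genes.
profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.0, 1.0, 2.0, 3.5],
    "geneC": [3.0, 2.0, 1.0, 0.0],
    "geneD": [3.0, 2.0, 1.0, 0.2],
}

merges = average_linkage(profiles)
for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 4))
```

Genes joined early at a small distance are closely related; the last merge, at a distance close to 2, joins the two groups with opposite trends. A dendrogram is a drawing of exactly this list of merges.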









MeV (MultiExperiment Viewer)

MeV is a desktop application for the analysis, visualization and data-mining of large-scale genomic data. It is a versatile microarray tool, incorporating sophisticated algorithms for clustering, visualization, classification, statistical analysis and biological theme discovery.

http://www.tm4.org/



1. We will start by uploading an expression data set in which all data are normalized and background-subtracted. All the data must be expressed as log2 values. (The tutorial file is “GSE5099-GPL97-Complete_Cleaned_median.txt”, which represents a matrix with adjusted values from the Eset.txt file and the 250TopList.txt file derived from GEO2R.)

Always take into consideration the groups that you created during the statistical evaluation using GEO2R, as in the case of this example:

G0, G1, G2, G3, G4





2. After uploading, the file appears as a spreadsheet. Follow the guideline: click on the first value representing the starting point of the matrix you want to process, then click on “load”.









3. From the “Display” menu, set the “Color Scheme” to -1, 0, 1 to rescale all values to a green/red range in which the lower values correspond to green and the upper values to red. The midpoint value (zero) is black.
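To make the effect of the -1, 0, 1 limits concrete, the sketch below maps a log2 value to an RGB color. The linear interpolation and the function name `expression_to_rgb` are assumptions made for illustration; the gradient MeV actually draws may differ.

```python
def expression_to_rgb(value, low=-1.0, mid=0.0, high=1.0):
    """Map a log2 value to (R, G, B): green at `low`, black at `mid`, red at `high`.

    Values outside [low, high] are clipped, so they saturate at pure green or red.
    """
    clipped = max(low, min(high, value))
    if clipped >= mid:
        intensity = (clipped - mid) / (high - mid)
        return (round(255 * intensity), 0, 0)
    intensity = (mid - clipped) / (mid - low)
    return (0, round(255 * intensity), 0)

# Hypothetical log2 values, including two that fall outside the limits.
for value in (-3.0, -1.0, 0.0, 1.0, 2.5):
    print(value, expression_to_rgb(value))
```

Under this assumption, every value above +1 looks identical to +1. Narrow limits make small changes visible, at the cost of hiding differences among strongly regulated genes.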
































4. Now, choose “K-means/Median Clustering” from the Clustering algorithm menu:






5. The KMC menu lets you choose the distance metric. Choose Pearson Correlation, uncheck the “Sample Tree” checkbox and check the box for the construction of “Hierarchical Trees” for our exercise, and then click on OK.
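The idea behind K-means clustering with a Pearson-based distance can be sketched as follows. This is a simplified illustration, not MeV's KMC implementation: seeding the centroids with the first k profiles, treating constant profiles as uncorrelated, and the gene names and values are all assumptions.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation; constant profiles are treated as uncorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # assumption: no trend means no correlation
    return 1.0 - cov / (sx * sy)

def kmeans_pearson(profiles, k, iterations=20):
    """Assign each gene to the nearest of k centroids, then update the centroids."""
    names = list(profiles)
    # Assumption: seed with the first k profiles so the example is deterministic.
    centroids = [list(profiles[name]) for name in names[:k]]
    assignment = {}
    for _ in range(iterations):
        new_assignment = {
            name: min(range(k), key=lambda c: pearson_distance(profiles[name], centroids[c]))
            for name in names
        }
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [profiles[name] for name in names if assignment[name] == c]
            if members:
                # New centroid: the mean profile of the cluster members.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical log2 profiles: two rising genes and two falling genes.
profiles = {
    "up1": [-1.0, 0.0, 1.0, 2.0],
    "down1": [2.0, 1.0, 0.0, -1.0],
    "up2": [-0.5, 0.1, 0.9, 1.5],
    "down2": [1.8, 1.1, 0.2, -0.7],
}

clusters = kmeans_pearson(profiles, k=2)
print(clusters)
```

Unlike hierarchical clustering, K-means needs the number of clusters up front and returns a flat partition, so the result depends on both k and the starting centroids.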











Finally, you can browse the clusters, studying differences or common behaviours among selected genes across the various experimental conditions offered by the microarray. Save the analysis using the menu.













Questions:


1. Identify clusters in which all genes are upregulated and clusters in which all genes are downregulated in a particular condition.

a.




2. Choosing “biological function” from the “Display” menu, identify the clusters that contain at least two genes with similar functions among the following:


a. Transport (e.g.)
   i. Cluster 1: SLC30A4, SLC16A6
   ii. Cluster 4: TNPO1, SLC41A2
   iii. Cluster 5: SLC41A2, SLC38A5
   ...


Complete the exercise…

b. Immune response
c. Skeletal system development
d. Regulation of cell growth
e. Cell adhesion
f. Signal transduction
g. Apoptosis
h. Transcription/regulation of transcription