J-Express Pro Practicals

naivenorth, Artificial Intelligence and Robotics

8 Nov 2013

J-Express Practicals 2

J-Express Pro Practicals
PART 2

Data analysis using J-Express

Preface

The dataset in this practical microarray exercise is a subset from a study of the malaria transcriptome by DeRisi et al., in which the Intraerythrocytic Developmental Cycle was monitored every hour for the complete cycle (ca. 48 h). The paper and dataset can be found on the website http://malaria.ucsf.edu/


Title: The Transcriptome of the Intraerythrocytic Developmental Cycle of Plasmodium falciparum

Zbynek Bozdech, Manuel Llinás, Brian Lee Pulliam, Edith D. Wong, Jingchun Zhu, Joseph L. DeRisi



The parasite enters a red blood cell in the ring stage and multiplies until it reaches the merozoite stage, where it exits to infect new cells. This takes about 48 hours. In the practical we have used every 5th hour.


Normally you will know a lot more about the organism and data being analyzed than you do in this session. If you look at the paper referenced above you will probably get a few surprises.




A few components used frequently in this text:

This is a Combobox. Click it to change the value.

This is a tab.

All buttons have a tool-tip. Hold the mouse pointer over a button to see what it does.



Step 0 Downloading data


For the Pathway analysis, you need the raw KEGG data files. These can be downloaded from ftp://ftp.genome.ad.jp/pub/kegg/pathways/pfa/ (only the pfa_gene_map.tab, pfa_synonym, .gif and .conf files are needed) and must be placed in your PW folder under J-Express in a new folder called pfa (the complete directory for the pathway files downloaded from KEGG should be “<J-Express folder>/PW/pfa”).

REMEMBER THAT THE KEGG DATA IS LICENSE PROTECTED FOR COMMERCIAL USERS AND THAT YOU MAY NEED TO PURCHASE A LICENSE IF YOU ARE IN A COMMERCIAL ENVIRONMENT. PLEASE READ MORE AT http://www.genome.ad.jp/kegg/kegg5.html. If this is a problem, you can skip the step involving pathway analysis.


Two other files are needed in this tutorial:

annotation.txt for annotation mapping.

plasmodiumGO for Gene Ontology mapping. Put this file in the J-Express/go/goassociations folder.

Both files can be downloaded from the molmine learning page (www.molmine.com)
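
Before starting J-Express it can help to confirm that the downloaded files ended up in the folders described above. The following is a minimal sketch, not part of J-Express: the function name check_layout is made up for this example, and the patterns ending in * are an assumption for files whose extension the text does not spell out.

```python
from pathlib import Path


def check_layout(jexpress_dir):
    """Return a list of problems found under the given J-Express folder."""
    root = Path(jexpress_dir)
    pfa = root / "PW" / "pfa"
    go = root / "go" / "goassociations"
    # (folder, glob pattern, description). Patterns ending in '*' are an
    # assumption: the tutorial does not state the exact file extension.
    expected = [
        (pfa, "pfa_gene_map.tab", "KEGG gene map"),
        (pfa, "pfa_synonym*", "KEGG synonym file"),
        (pfa, "*.gif", "KEGG pathway images"),
        (pfa, "*.conf", "KEGG pathway configuration files"),
        (go, "plasmodiumGO*", "Gene Ontology association file"),
    ]
    problems = []
    for folder, pattern, description in expected:
        if not folder.is_dir():
            message = f"missing folder: {folder}"
        elif not any(folder.glob(pattern)):
            message = f"no {description} ({pattern}) in {folder}"
        else:
            continue
        if message not in problems:  # report each problem only once
            problems.append(message)
    return problems


if __name__ == "__main__":
    import sys

    for problem in check_layout(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(problem)
```

Run it with the path to your J-Express folder as the only argument; no output means every expected file was found.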


Step 1 Importing and preparing the data

1. Open J-Express Pro
2. Choose Load Project from the File menu
3. Select the Project File from Practicals 1 (Malaria1.pro)

If you do not have this file, look for malaria1.pro in your data folder or download it from: http://www.molmine.com - microarray software - J-Express tutorials and education




Step 2 Data Clustering and analysis


4. Select the dataset called “SpotPix Data”
5. Under the Methods menu, choose Annotation manager (IDLinker)
6. Click the Add annotation tab
7. Click and locate the file annotation.txt (if you cannot find this file, download it from: http://www.molmine.com - microarray software - J-Express tutorials and education)

8. Click the Autoset button to automatically find a key column in the annotation file that maps to a key column in the dataset.
9. Click once on column 3 and column 4 so that they turn green.
10. Click Create Mapping and put into dataset
11. Click the Current Annotation tab
12. In columns E and F, rename annotation 3 to Gene Product and annotation 4 to Kegg ID by double-clicking in the cells.
13. Close the IDLinker


We have now added two new Identifier Columns with gene product information and KEGG IDs. The id file could be any tab-delimited file with identifiers as long as there is a column that links the rows in the dataset to the rows in the text file (see the user help for more information about the Annotation manager).
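
The idea behind the annotation mapping can be sketched in a few lines of Python. This is only an illustration of joining a tab-delimited file to a dataset through a shared key column, not how the Annotation manager is implemented; the function name add_annotation, the identifiers and the gene products below are all made up.

```python
import csv
import io


def add_annotation(dataset_rows, annotation_text, dataset_key, annotation_key, columns):
    """Append the chosen annotation columns to each dataset row, matched by key."""
    reader = csv.reader(io.StringIO(annotation_text), delimiter="\t")
    lookup = {row[annotation_key]: row for row in reader if len(row) > annotation_key}
    merged = []
    for row in dataset_rows:
        hit = lookup.get(row[dataset_key])
        # Rows without a match in the annotation file get empty cells.
        extra = [hit[c] if hit and c < len(hit) else "" for c in columns]
        merged.append(list(row) + extra)
    return merged


# Made-up example data, not taken from the malaria dataset.
dataset = [["oligo_1", "0.8"], ["oligo_2", "-1.2"], ["oligo_3", "0.1"]]
annotation = (
    "oligo_1\tx\tribosomal protein L3\tpfa:GENE_A\n"
    "oligo_2\ty\tkinase\tpfa:GENE_B\n"
)
# Columns 3 and 4 of the annotation file are indexes 2 and 3 when counting from 0.
merged = add_annotation(dataset, annotation, dataset_key=0, annotation_key=0, columns=[2, 3])
for row in merged:
    print(row)
```

Rows whose key has no counterpart in the annotation file simply keep empty annotation cells, so the dataset never loses rows.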



14. Click the filter button or select filter data set from the Data set menu.
15. Check the Min total distance from y=0.0. Set the value to 1.5 and set the distance measure to Euclidean
16. Click Try filter to see how many genes pass the filter (ca. 4700).
17. Click Create DataSet

In this analysis we are only interested in genes with a significant expression change.
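
To make the filter concrete, here is a minimal sketch under one assumption: that "Min total distance from y=0.0" with the Euclidean measure means the Euclidean distance between a gene's expression profile and the all-zero profile. The help files are the authority on what J-Express actually computes; the gene names and values below are made up.

```python
import math


def distance_from_zero(profile):
    """Euclidean distance between an expression profile and the line y = 0."""
    return math.sqrt(sum(value * value for value in profile))


def filter_profiles(profiles, min_distance=1.5):
    """Keep the profiles whose distance from y = 0 is at least min_distance."""
    return {
        gene: profile
        for gene, profile in profiles.items()
        if distance_from_zero(profile) >= min_distance
    }


# Made-up log-ratio profiles for illustration.
profiles = {
    "flat_gene": [0.1, -0.2, 0.1, 0.0],
    "changing_gene": [1.0, 0.5, -1.0, -0.5],
}
kept = filter_profiles(profiles)
print(sorted(kept))
```

A profile that stays close to zero at every time point has a small distance and is dropped, while a profile that moves away from zero in either direction is kept.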


18. Close the filtering dialog
19. Select the SpotPix Data Filtered dataset in the Project window
20. Click the Line Chart button or select Gene Graph from the Methods menu
21. Move the line chart to the right of the screen
22. Click the Search and sort button or select from the Data Set menu

23. Select the columns (comma delimited) radiobutton and enter 3 in the textfield to only search in column 5 (where we inserted the gene products)
24. In the Search Phrase field, type .*ribosomal protein.* (remember the dots and stars). This is a regular expression that will hit any product that contains ribosomal protein. Click Search
25. In the Line Chart window, click the Shadow Unselected button
26. In the search and sort window, at the bottom, click the red button with a star (asterisk). This selects all rows with a hit. View the selection in the line chart.

You can now see only the genes involved in ribosomal protein. Is there any similarity in the expression profiles? Why?






27. Try searching for something else
28. Close the search and sort window
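
The search phrase used in the steps above is an ordinary regular expression, so its behaviour can be tried outside J-Express. The sketch below assumes that the pattern has to match the whole gene product text, which would explain why the leading and trailing .* are needed, and that the search is case sensitive; both are assumptions about J-Express, and the product names are made up.

```python
import re

pattern = re.compile(r".*ribosomal protein.*")

# Made-up gene product descriptions for illustration.
products = [
    "60S ribosomal protein L3, putative",
    "hypothetical protein",
    "ribosomal protein S12",
    "Ribosomal RNA methyltransferase",
]

# fullmatch requires the pattern to cover the entire string.
hits = [product for product in products if pattern.fullmatch(product)]
print(hits)

# Without the dots and stars, a whole-string match fails on longer descriptions.
print(re.fullmatch(r"ribosomal protein", products[0]))
```

Other searches follow the same scheme: .*kinase.* hits any product containing kinase, and so on.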



Hierarchical Clustering


29. Click the Hierarchical clustering button or select from the Methods menu
30. Choose the Manhattan Distance measure and turn off the cluster columns checkbox. Click OK in the clustering dialog
31. Wait for the clustering to finish and view the dendrogram (tree).
32. Select a sub-tree by clicking a branch.





33. In the Zoomed dendrogram, click and select all the rows in the right table.
34. Take a look at the gene graph window; it now shows the new selection






The plot shows only the selected genes

35. Click the Create Group of Dendrogram button in the hierarchical clustering window
36. Call the group dendrogram and click ok in the “create group” dialog.
37. Close the dendrogram window
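
For readers who want to see what happens behind the dendrogram, the following is a small sketch of agglomerative clustering of rows with the Manhattan distance. The linkage rule is an assumption: average linkage is used here because the tutorial does not say which one J-Express applies. The profiles are made up and the naive search is only suitable for a handful of genes.

```python
def manhattan(a, b):
    """Manhattan (city block) distance between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_linkage(profiles):
    """Agglomerative clustering of rows; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(manhattan(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Made-up profiles: rows 0 and 1 rise over time, rows 2 and 3 fall.
profiles = [[0, 1, 2], [0, 1, 2.5], [2, 1, 0], [2.5, 1, 0]]
merges = average_linkage(profiles)
for members_a, members_b, distance in merges:
    print(members_a, members_b, distance)
```

Each merge corresponds to one branch point in the dendrogram, and the merge distance is the height at which the two sub-trees join.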


K-Means Clustering

38. Click the K-Means Clustering button or select from methods
39. Use the default settings (16 clusters, max 200 iterations, Forgy initialization and Euclidean distance). Click OK







(this displays all clusters resulting from the k-means clustering)

40. Click the Show all Profiles button
41. On the J-Express main button bar, select the Group Controller
42. Click the color box in the Dendrogram group and select a red color
43. Click Update All Components
44. In the K-Means window, click the Toggle group Colors button

You can now see if the hierarchical clustering produced the same cluster as the k-means clustering. Are all the red profiles in the same cluster?


45. In the Group Controller window, uncheck the All Indexes group and press update all components. Look at the k-means clustering window.





46. Turn on All Indexes again.
47. In the K-Means window, double-click on one of the clusters with a significant expression change and few or no red profiles
48. Click on the new tab (cl.[X])
49. Click the Shadow Unselected button
50. Select different rows in the right table to see individual expression profiles
51. Select all the rows and click the Create Group(s) button. Call the new group kmeans, give it a green color (click the color rectangle) and click ok in the create group dialog.


52. Close the k-means clustering window
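
The default settings named in the k-means dialog (Forgy initialization, Euclidean distance, a maximum number of iterations) can be illustrated with a short sketch of the standard Lloyd algorithm. This is a textbook version under those assumptions, not the J-Express implementation; the seed parameter and the tiny example data are made up.

```python
import random


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(profiles, k, max_iterations=200, seed=0):
    """Lloyd's algorithm with Forgy initialization; returns (assignments, centroids)."""
    rng = random.Random(seed)
    # Forgy initialization: pick k of the profiles at random as starting centroids.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in profiles
        ]
        if new_assignments == assignments:
            break  # no profile changed cluster, so the result is stable
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(profiles, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return assignments, centroids


# Made-up profiles forming two obvious groups.
profiles = [[0, 0], [0, 1], [10, 10], [10, 11]]
assignments, centroids = k_means(profiles, k=2)
print(assignments)
```

Because the starting centroids are random, cluster numbers can differ between runs even when the grouping itself is the same, which is worth keeping in mind when comparing k-means clusters with a dendrogram group.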


Principal Component Analysis



53. Click the Principal Component Analysis button or select from the methods menu
54. Locate the red and green spots.

Each spot is an expression profile projection. Similar profiles will be close in the scatter plot


55. Point on some spots to see identifiers. Click them to show the label in the plot.
56. In the PCA menu, uncheck the show Tooltip box
57. In the PCA menu, select show location thumb
58. Point on some spots to see the profiles
59. Close the location thumb
60. In the PCA menu, select Show principal components
61. Click the Shadow Unselected button
62. Click the first and second item in the list

These are the most significant principal components in the dataset. With these two, 75-80 % of the variance can be explained (for this dataset!)


63. Close the line chart window
64. Right-click the PCA scatter plot
65. In the Chart area combobox, select Density Map. In the density map box, set the paint threshold (%) to 10 and click OK. Close the properties dialog

The plot will now only show outlier profiles







66. Click the Frame contents to line chart button
67. Press the mouse button somewhere on the plot and drag to create a frame selection; select some (30-70) spots
68. Go back to the PCA plot by clicking the PCA tab
69. Select all the spots
70. Click the new cluster (cluster2)
71. Click the Zoomed 1 tab
72. Click the Branch Dataset button

You can now continue the analysis on the selected subset (outside the most dense area in the PCA plot)


73. Close the PCA window
74. Select the new dataset (Branched) in the J-Express Project tree
75. Click the Principal Component Analysis button again or select from the methods menu

How is this plot different from the previous plot? Sometimes you are only looking for profiles that differ significantly from all the others.

76. Click the 3D PCA plot button


77. Click the 3D tab and drag the mouse inside the plot to rotate

With 3 dimensions, you get more of the variance

78. Close the PCA plot window
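
The statement that a few principal components explain most of the variance can be checked numerically. The sketch below computes the fraction of variance carried by the leading components using power iteration with deflation on the covariance matrix; this is one simple textbook route, chosen here because it needs no external library, and not necessarily what J-Express does. All function names and the example profiles are made up.

```python
import math


def covariance(rows):
    """Sample covariance matrix of the columns (time points) of rows (profiles)."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
            for i in range(m)]


def multiply(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def top_components(matrix, count, iterations=500):
    """Leading (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    m = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(count):
        v = [float(i + 1) for i in range(m)]  # arbitrary non-zero start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iterations):
            w = multiply(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, multiply(work, v)))
        pairs.append((eigenvalue, v))
        # Deflate: remove the component just found before searching for the next.
        work = [[work[i][j] - eigenvalue * v[i] * v[j] for j in range(m)]
                for i in range(m)]
    return pairs


def explained_variance(rows, count=2):
    """Fraction of the total variance carried by each of the leading components."""
    cov = covariance(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return [value / total for value, _vector in top_components(cov, count)]


# Made-up profiles that vary mostly along the first time point.
profiles = [[3, 0, 0], [-3, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 0.5], [0, 0, -0.5]]
ratios = explained_variance(profiles)
print(ratios, sum(ratios))
```

Adding a third component, as in the 3D plot, adds the next ratio to the sum, which is why three dimensions capture more of the variance than two.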


Self Organizing Maps


79. Select the SpotPix Data filtered dataset in the J-Express Project tree.
80. Click the Self-Organizing Map button or select from the Methods menu
81. Select 25 clusters and click OK

The data is now clustered in a similar way as with k-means clustering. This method is much faster than k-means clustering and the clusters are ordered. See how neighbouring clusters are similar.


82. Click one of the clusters
83. Select the tab for the new cluster (SW0 CL.[X])
84. Select all rows in the list and click the Create Group(s) button
85. Call the new group SOM and give it a blue color
86. View the group controller again (the window saying groups).




87. On the thumbnails menu, click the all mean value cell view

This is a classical SOM view and can be used to see how similar samples are in regard to mean SOM cell expression. This view is, however, better suited to finding similarity in gene expression between classes of samples, such as tumor samples versus normal samples, than between steps in a time series.

88. Close the SOM window
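
Why neighbouring SOM clusters look alike becomes clearer from the training rule: every profile pulls not only its best-matching cell towards itself but also the cells next to it on the grid. The sketch below is a generic small SOM under assumed settings (rectangular grid, Gaussian neighbourhood, linearly decaying learning rate); the actual J-Express parameters are not given in this text, and the data are made up.

```python
import math
import random


def best_cell(weights, profile):
    """Grid cell whose weight vector is closest to the profile."""
    return min(
        weights,
        key=lambda cell: sum((w - x) ** 2 for w, x in zip(weights[cell], profile)),
    )


def train_som(profiles, rows=5, cols=5, epochs=50, seed=0):
    """Train a small rectangular SOM; returns weight vectors keyed by (row, col)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    weights = {cell: list(rng.choice(profiles)) for cell in cells}
    total_steps = epochs * len(profiles)
    step = 0
    for _ in range(epochs):
        for profile in rng.sample(profiles, len(profiles)):
            fraction = step / total_steps
            rate = 0.5 * (1.0 - fraction)  # learning rate decays towards 0
            radius = max(rows, cols) / 2.0 * (1.0 - fraction) + 0.5  # shrinking neighbourhood
            winner = best_cell(weights, profile)
            for cell in cells:
                grid_dist2 = (cell[0] - winner[0]) ** 2 + (cell[1] - winner[1]) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius * radius))
                w = weights[cell]
                for i, value in enumerate(profile):
                    # Cells near the winner move most; distant cells barely move.
                    w[i] += rate * influence * (value - w[i])
            step += 1
    return weights


# Made-up profiles: two rising and two falling.
profiles = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]
weights = train_som(profiles, rows=5, cols=5)
for profile in profiles:
    print(profile, "->", best_cell(weights, profile))
```

With 5 x 5 cells this gives the 25 clusters chosen in the dialog; each gene belongs to the cell returned by best_cell.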


Using external data


If you do not have the KEGG or Gene Ontology data installed, jump to Chromosome viewer.



Gene Ontology Mapping

89. Click the Gene Ontology Mapping button or select from the methods menu
90. In the mapping File combobox, select PlasmodiumGO.
91. In the DataSet identifier combobox, select Kegg ID.
92. Click Map DataSet.
93. If you do not have a gene graph viewer open, click the Line Chart button or select Gene Graph from the Methods menu.
94. Move the Gene ontology window and the Gene Graph window so that you can see both at the same time.
95. Make sure “shadow unselected” is on in the gene graph viewer (you can check this by selecting a few genes in the gene graph and seeing whether the unselected genes are shown in the background; if not, click the shadow unselected button).
96. Open the different Gene Ontology Objects (by double-clicking or clicking the open handle) and browse the trees. See how the gene graph profiles change as you select a node (if the number of genes in the object is more than 1 and less than 100). Is there any co-regulation associated with the Gene Ontology Term?





KEGG Pathway Mapping

97. Click the Open Pathways Analysis button or choose Pathway analysis from the Methods menu.
98. Set the Pathway Set to P.falciparum and the Dataset Name ID column to Kegg ID
99. Click OK






100. Scroll down the pathway list and find a cluster with similar expression profiles and more than 7 members (the red number to the left in the cluster component)

You can select members of a pathway by clicking the chart. The members will appear as a selection in other windows such as the gene graph

101. Click the View Pathway button on the selected pathway
102. Resize the window and click some of the items in the table to see where in the pathway they are (you can select multiple rows)

The selected genes are shown in the chart by a yellow dashed line (red, green or blue if it is a member of a group).

You can use this component to find clusters of co-expressed genes sharing the same pathway. This can give you a clue as to why they are co-expressed.






103. Point on a circle or frame in the pathway so it becomes yellow and click the mouse. This should take you to an external database
104. Close the pathway window
105. In the pathways window, click score groups
106. Are there some statistical relations between your groups (from the dendrogram and SOM analysis) that coincide with the pathway groups?

Not in this case.

107. Close all the pathway analysis windows and the Score window
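
This text does not say how the group score is calculated. One common way to ask whether a group and a pathway share more genes than expected by chance is the hypergeometric tail probability, sketched below purely as an assumption and not as the J-Express scoring method. The function name and all numbers in the example are made up, apart from the approximate size of the filtered dataset.

```python
from math import comb


def overlap_p_value(total_genes, pathway_size, group_size, overlap):
    """P(X >= overlap) when group_size genes are drawn at random without replacement."""
    upper = min(pathway_size, group_size)
    tail = sum(
        comb(pathway_size, k) * comb(total_genes - pathway_size, group_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, group_size)


# Hypothetical numbers: about 4700 filtered genes, a pathway with 20 members,
# a group of 50 genes, 5 of which are in the pathway.
print(overlap_p_value(4700, 20, 50, 5))
```

A small value suggests the overlap is unlikely to be accidental; a value near 1 means the group and the pathway are unrelated as far as this test can tell.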

108. View the Group Controller again

109. Click on the row (not the “ALL” group) with the highest count (most members)
110. Right-click the selected row and choose show group in table
111. You can now see the expression values
112. Select the top 10-20 in this table

113. On the J-Express main menu, select View Image Spots from the Raw Data menu
114. Click Get Spots from selection and let the component finish (this will not work if the links to the raw data files are missing)

You can now see the spots that created these values. This is a good quality control


115. Close the SpotView window
116. Select all rows in the Group: <group> window (you selected the top 10-20 in a step above)


Chromosome viewer


117. On the J-Express main Toolbar, click the Open Chromosome View
118. Select the Plasmodium_falciparum item, right-click it and select Set Selected Folder
119. Set the Use Info Column combobox to 6
120. Click Find selected genes in selected folder
121. Scroll down the list. Are some of the chromosomes overrepresented in your group?





122. Double-click one of the rows with a hit in this chromosome
123. Scale the view by dragging the slider. Zoom in on areas in the chromosome by dragging the mouse over the chart.



124. Click on a gene with a hit (blue or red box) to get details
125. Close the Chromosome view windows
126. Click the similarity search button or select find most similar from the methods menu
127. Select a gene in the left table and slide the tolerance slider to ca. 3%.
128. Select a different gene.
129. Click the create profile button.
130. Drag some of the green squares up and click the Perform Search button.




131. Continue on your own. Use the help menu to get info on the various methods. Good luck



