Spotfire DecisionSite for Functional Genomics

Microarray analysis course

Spotfire: getting started

Judith Boer




Getting started: basic gene expression data analysis

Introduction

Spotfire DecisionSite for Functional Genomics is a powerful analytic application that
helps you organize, explore, and understand the results of large gene expression
experiments. It lets you access and merge expression results and current biological
annotation information. Spotfire supports direct import of GenePix and Affymetrix data,
with access to quality control information. Several preconfigured guides make it simple
to quickly perform common tasks in gene expression analysis, such as generating heat maps
with hierarchical clustering and generating cluster groups using K-means clustering. Other
analysis tools include treatment comparison, which compares two or more groups for one
variable. New columns can be added from calculations on expression data columns.

Availability

Spotfire runs on a SARA server at the University of Amsterdam. Spotfire licenses are
available via the BioASP. For more information, please contact Han Rauwerda,
University of Amsterdam (rauwerda@science.uva.nl), or Steven Naarding, Spotfire
(steven.naarding@spotfire.com).

Example dataset

As an example, we will use part of the DMD data described by Pescatori et al.
(FASEB J 2007, 21:1210-26). The data set consists of RMA-normalized Affymetrix
HG-U133A expression data of 33 muscle samples from 2 different groups: 19 Duchenne
muscular dystrophy (DMD) patients and 14 controls. In principle, all Affymetrix
probesets (22K) could be used; here we use a filtered set of 1663 to shorten the
calculation time.

Objectives

The objective of this exercise is to get started with Spotfire. You will get to know some
basic functions of the program, including:

- How to import data using copy and paste
- How to identify genes that distinguish between groups of samples using ANOVA
- How to mark a subset of interesting genes
- How to visualize the expression of the selected genes using hierarchical clustering

The aim of this introduction is not to explain clustering algorithms; that part is covered in
the Clustering practical on Day 2.

Preconditions



- Spotfire DecisionSite 7.2 or higher is running.
- Spotfire DecisionSite for Functional Genomics is running in the Navigator display.

Open dataset in Excel

1. Open DMD_data_filtered_forSpotfire.xls in MS Excel. The dataset is structured as
follows:

Column descriptions:

- Column A: Gene_Id, unique ProbeSet Id
- Column B: Gene_Name, gene symbol and gene name (when known)
- Column C: Accession number
- Column D: Gene symbol
- Column E: Cytoband
- Columns F-AL: RMA-normalized intensities for 1663 probesets in 14 Control and
  19 DMD muscle samples.
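
If you want to see what this layout amounts to outside Spotfire, the following is a minimal Python sketch, not part of the Spotfire workflow. It reads a tab-delimited table (the form in which Excel puts a copied worksheet on the clipboard) and splits it into five annotation columns and the sample value columns. The probe IDs, sample names, and intensities are made up for illustration and are not taken from the course file.

```python
import csv
import io

# Made-up miniature of the worksheet: 5 annotation columns, then sample columns.
# IDs, names, and values are illustrative only, not from the course dataset.
pasted = (
    "Gene_Id\tGene_Name\tAccession\tSymbol\tCytoband\t"
    "Control_1\tControl_2\tDMD_1\tDMD_2\n"
    "probe_A\tGENEA: example gene A\tACC0001\tGENEA\t1p11\t7.1\t7.3\t9.8\t10.1\n"
    "probe_B\tGENEB: example gene B\tACC0002\tGENEB\t2q22\t8.4\t8.2\t8.3\t8.5\n"
)

N_ANNOTATION = 5  # columns A-E hold annotation; the remaining columns hold intensities

reader = csv.reader(io.StringIO(pasted), delimiter="\t")
header = next(reader)
sample_names = header[N_ANNOTATION:]

annotation = {}  # probe set ID -> annotation fields
expression = {}  # probe set ID -> {sample name: intensity}
for row in reader:
    probe_id = row[0]
    annotation[probe_id] = dict(zip(header[1:N_ANNOTATION], row[1:N_ANNOTATION]))
    expression[probe_id] = {
        name: float(value) for name, value in zip(sample_names, row[N_ANNOTATION:])
    }

# Group the sample columns by the prefix of their (hypothetical) names.
groups = {}
for name in sample_names:
    groups.setdefault(name.split("_")[0], []).append(name)

print(groups)
```

In the real worksheet the value columns run from F to AL, which is 33 columns: 14 Control and 19 DMD samples.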

Copy & Paste data from Excel to Spotfire

2. In Excel, select the whole worksheet. Then Edit > Copy.

3. Launch Spotfire, and select Edit > Paste. Spotfire has five panels: Guides and Tools
on the left, Query Devices and Details-on-Demand on the right, and visualizations in the
middle. Your screen should look like this:



Set up visualization

Spotfire has many visualization tools, such as scatter plots, heatmaps and profile charts, which
can be displayed simultaneously. Genes selected in one view are highlighted in all views.


4. Create new 2D scatter plots by selecting different samples under the axes tabs. Try
both within-group and between-group comparisons.

5. Decrease the size of the data points in the scatter plot by right-clicking on the plot,
then go to Trellis... > Markers > Size and move the bar to Min.

6. Create an additional scatter plot (Visualizations > New Scatter Plot).

7. Show both scatter plots by activating the auto tile function: Window > Auto Tile.

8. Try moving the mouse over a marker (gene); it will be highlighted in all views.

9. Try clicking a marker to view the details in the Details-on-Demand display.






10. If you wish to view the data table, use Visualization > New Table.

Identify genes that are differentially expressed in DMD

The t-test/ANOVA tool compares the means within the groups with the total mean value. The
results are obtained by testing the null hypothesis, i.e., the hypothesis that the mean
values of the two groups are equal. The p-value is the observed significance level: the
lower the p-value, the more significant the difference.
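
To make the comparison of group means concrete, here is a minimal Python sketch for a single probe set. It computes a pooled-variance two-sample t statistic and a one-way ANOVA F statistic, and shows that for two groups F equals t squared. The choice of the pooled-variance form is an assumption; this handout does not state which variant Spotfire implements. The intensities are made up, and no p-value is computed here: that would require the t distribution with n1 + n2 - 2 degrees of freedom.

```python
from statistics import mean, variance


def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance (an assumed variant)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand = mean(values)  # the total mean value
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up log-scale intensities for one probe set
control = [7.1, 7.3, 6.9, 7.2]
dmd = [9.8, 10.1, 9.5, 9.9, 10.0]

t = t_statistic(dmd, control)
f = anova_f(dmd, control)
print(f"t = {t:.2f}, F = {f:.2f}, t^2 = {t * t:.2f}")
```

A large absolute t (or large F) corresponds to a small p-value, which is why sorting on the p-value column later in this exercise brings the most clearly separated genes to the top.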


11. Click the Treatment Comparison tool (Data > Pattern Detection > Treatment
Comparison in the menu).

12. Create two New Groups and rename them to DMD and Control.

13. Select DMD under Grouped Value Columns. Select all DMD Value Columns by
using the Shift and/or Control keys.

14. Move the DMD samples into the DMD group by clicking Add >>.




15. Repeat steps 13 and 14 for the Control samples. Work with All Records, and replace
empty values with Empty value (this dataset does not have empty values).

16. Under Calculation options, we use t-test/ANOVA.

17. Optional: get an overview of the t-test/ANOVA method and algorithms under Help.

18. Click OK to start the t-test.

19. A new profile chart is generated and one new column is added to the data table.

Visualization of differential gene expression using clustering

To analyze the result, we will select the 40 most significant genes and view their expression
pattern in the data set using different exploratory tools.
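
The selection described here (rank by p-value, keep the most significant records, and label them) can be sketched in a few lines of Python. This is only an illustration of the idea, not what Spotfire does internally; the function name `mark_top`, the probe IDs, and the p-values are all hypothetical.

```python
# Hypothetical p-values keyed by probe set ID (made-up numbers)
p_values = {
    "probe_A": 0.0004,
    "probe_B": 0.31,
    "probe_C": 0.0021,
    "probe_D": 0.048,
    "probe_E": 0.77,
}


def mark_top(p_values, n):
    """Label each record "Yes" if it is among the n smallest p-values, else "No"."""
    ranked = sorted(p_values, key=p_values.get)  # ascending: most significant first
    top = set(ranked[:n])
    return {probe: ("Yes" if probe in top else "No") for probe in p_values}


# With the course data you would use n=40; n=2 keeps the toy example readable.
top2 = mark_top(p_values, 2)

# Equivalent of unticking "No": only the marked records stay visible.
visible = [probe for probe, label in top2.items() if label == "Yes"]
print(visible)
```

The resulting Yes/No labels play the same role as the "Top40" column created in the steps below.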


20. Open the Table and scroll to the t-test column.

21. Sort the p-values by clicking on the column header.

22. Select the top 40 genes by using Shift and the Arrow Down keys. The status bar at the
bottom of the screen shows how many genes have been marked.

23. Choose Edit > New Column > From Marked Records. Name the new column "Top40".
Label for marked records "Yes". Label for unmarked records "No".

24. Scroll down in the Query Devices pane on the upper right to the Top40 column.
Untick the box for "No". Now only the 40 selected genes are visible.

25. In the Guides pane on the upper left, choose Group genes using hierarchical
clustering. Continue without normalization, since the data has been normalized
already.

26. Perform Hierarchical clustering on all sample Value columns using the default
settings. Click on the bullet Selected records to cluster only the 40 selected genes.




27. Leave the heatmap colors as they are and click Continue. You get a heatmap with
dendrograms.

28. Maximize the Hierarchical Clustering window. The data have been clustered in two
dimensions: the columns represent the muscle samples, and the dendrogram (or tree) on
top of the heatmap shows the similarity between the different samples. The rows
represent the 40 selected genes; their cluster tree is shown on the left. Red is relative
upregulation, green is relative downregulation.

29. Right-click on the sample names at the bottom and select Label Orientation >
Vertical. Move the heatmap up a bit to show the full sample names. Right-click on the
samples again and select More Labels until all labels are shown.

30. You can now verify that the hierarchical clustering algorithm using the Top40
differentially expressed genes indeed separated the data set into the DMD and control
groups.




31. Next in the guide, you can add gene id and annotation from data in my dataset.
Select the appropriate columns and click Add annotations. Note that the genes are
highlighted in the Gene Annotation window when you mouse over the Hierarchical
Clustering window, and vice versa.


32. Continuing with the guide, you can use the gene dendrogram on the left of the
heatmap to group genes into distinct clusters. Enter number of clusters: 7.

33. In the Gene Profile plots, genes showing the same profile over the two classes are
grouped together. On the horizontal axis, the muscle samples are shown in their original
order (improve the labels, as above). The gene expression values are shown on the
vertical axis.

34. Using AutoTile, you can view the Gene Profile plots together with the Hierarchical
Clustering and Gene Annotation windows.

35. Finally, the guide lets you do a Principal Component Analysis of the 40 selected
genes. If all data points appear green in the 3D plot, go to Edit > Marked Record(s) >
Reset to unmark them. Hold Ctrl and use the right mouse button to turn the 3D plot!
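
Although clustering algorithms are covered in the Day 2 practical, a small sketch may help you read the dendrograms produced in this section. The Python code below performs agglomerative clustering of samples: it repeatedly merges the two closest clusters until one tree remains. Average linkage and Euclidean distance are assumptions made for this sketch; the handout only says that Spotfire's default settings are used. Sample names and values are made up.

```python
from itertools import combinations
from math import dist


def average_linkage(profiles):
    """Agglomerative clustering; returns the merge tree as nested tuples of names."""
    # Each cluster is a pair: (tree built so far, list of member names)
    clusters = [(name, [name]) for name in profiles]

    def cluster_distance(a, b):
        # Average of all pairwise Euclidean distances between the members
        pairs = [(x, y) for x in a[1] for y in b[1]]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find and merge the two closest clusters
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Made-up expression profiles: one list of gene intensities per sample
samples = {
    "Control_1": [7.1, 8.4, 5.0],
    "Control_2": [7.3, 8.2, 5.2],
    "DMD_1": [9.8, 8.3, 3.1],
    "DMD_2": [10.1, 8.5, 2.9],
}

tree = average_linkage(samples)
print(tree)
```

In this toy example the two Control samples and the two DMD samples each merge first, and the two groups join only at the top of the tree. That is the pattern you look for in the sample dendrogram in step 30. Clustering the genes works the same way, with one profile per gene instead of one per sample.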

Adding genes to a list in the Portfolio

You may now store the 40 genes that have the lowest p-values in the portfolio. The portfolio is
persistent across Spotfire sessions, which means that you can use the list to identify these genes
when you open a new dataset.


36. Show the Portfolio (click Portfolio in the Tools pane).

37. Mark the 40 genes with the lowest p-values in the ANOVA test. Some ways you may
do this:

a. Use the Top40 Yes/No filter.
b. Use the sorted table and the up/down arrow keys.
c. Draw a rectangle around the 40 genes, for instance in the PCA plot.

38. Verify in the status bar that you marked 40 genes.

39. In the portfolio, click the button Add new list from marked.

40. Call the list "top40 DMD".





Data export and reporting

You can save the entire session as an *.sfs file that can be opened again in Spotfire. In addition,
Spotfire has several functions to export your analyzed data and graphs and use them in other
programs.


41. To export your data table including the new columns, click on File > Export > Data.
Give your file a name. The file type is *.csv and can be opened in Excel.

42. To present visualizations in a PowerPoint presentation, a Word file, or a web page,
click on the appropriate link in Tools pane > Reporting.

43. Visualizations can also be exported as *.bmp, *.jpg, *.png, or *.emf files under
File > Export > Current Visualization.