Analyzing Agilent Gene Expression data in GeneSpring GX9 using the VSMC data
Overview of sections and how later sections depend upon earlier ones.
Importing Data into GeneSpring GX, Preprocessing, and Creating an “Experiment”
Import data and create an experiment
Setting up the Experiment (Experiment Setup)
Define Experiment Parameters and assign parameter values to each sample.
Skip the section on manual entry.
Create experimental interpretations to group replicate samples into conditions
Allows for averaging over replicates
Viewing Expression Data in GeneSpring GX
View expression data in a Profile Plot
Defining the order in which you want groupings displayed
View expression data in Spreadsheet View
View expression data in Scatterplot View
View data for a single entity
Perform quality control
Exercise 2: Use the hierarchical clustering algorithm to create a condition tree.
The sample data you receive has already been assessed for quality.
Perform quality control on probes (individual entities)
Filter for probes with reliable intensity measurements
Assembles QC probes list.
Identifying probes of interest.
These steps are required in order to complete Sections 7 and 8.
Find candidates for differential expression using statistics.
Requires QC probes list.
Saves 3 ANOVA lists
imilar effects of IL1β in artery and vein
Filter probes based on fold change
Requires an ANOVA list
Saves FC greater than 1.5 in probes with significant interaction p
Find other genes with similar expression profiles to a target gene.
Clustering Gene Expression Data
Use the hierarchical clustering algorithm to build an entity and condition tree.
Saves hierarchical cluster output
Use the K-means clustering algorithm to group probes with similar expression profiles.
Requires FC list
Saves K-means clustering output
Creates lists of clustered entities
Perform Gene Ontology (GO) analysis to determine the biological functions of your genes of interest.
Exercises 2 and 3
As you work through the tutorial, remember the biological problem you are attempting to solve and the
You might enjoy working in pairs. That is fine!
Section 1 before we start lecture this morning.
For your reference, the table on p. 18 of the tutorial shows what each sample is.
There are 9 questions for you to answer as you progress through the tutorial this afternoon. See how far you can get. You are responsible for as far as you can get with diligent use of your time. Type your answers directly on the worksheet and email to me at the end of the period today.
We will stop at ~3:15 for a short lecture and discussion on RNA
Viewing Expression Data
Exercise 1, when you have reached p. 28.
Each line in the 3 profile plots you have just generated represents the data from one entity or spot (= gene) on the arrays, averaged over the replicate arrays for a specific condition. In these plots, what is the significance of the slope of any given line?
Each row represents data at one location (feature, entity) on the microarray.
Look at the spreadsheet view as described in the tutorial, p. 28. There were 12 samples in the experiment, but in the spreadsheet view, there are only 4 columns with numerical values. How would you expect the number of columns with numerical values to change if you change your interpretation to Tissue Type? To all samples?
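If it helps to think about what an interpretation does outside of GeneSpring, here is a minimal Python sketch of averaging replicates by grouping samples on chosen parameters. It is only an illustration, not GeneSpring's implementation: the sample layout (2 tissue types x 2 treatments x 3 replicates), the parameter names, and the intensity values are all assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample annotations: 2 tissues x 2 treatments x 3 replicates = 12 samples.
samples = []
for tissue in ("artery", "vein"):
    for treatment in ("control", "IL1b"):
        for replicate in (1, 2, 3):
            samples.append(
                {"tissue": tissue, "treatment": treatment, "replicate": replicate}
            )

# Made-up log2 intensities for a single probe, one value per sample (same order).
values = [7.1, 7.3, 6.9, 9.0, 9.2, 8.8, 7.0, 7.2, 7.1, 7.6, 7.4, 7.5]


def average_by(parameters):
    """Average the values of all samples that share the same parameter values."""
    groups = defaultdict(list)
    for sample, value in zip(samples, values):
        key = tuple(sample[p] for p in parameters)
        groups[key].append(value)
    return {key: mean(group) for key, group in groups.items()}


# Grouping on both parameters gives one averaged column per condition.
for condition, average in average_by(["tissue", "treatment"]).items():
    print(condition, round(average, 2))
```

Calling `average_by` with a different list of parameters is one way to explore how the choice of grouping changes the number of averaged columns.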
Bottom of page 31.
You selected genes from the scatterplot view that met specific criteria. You are now viewing the results with those genes in Profile view. Describe how the Profile view agrees with your expectations given how you selected genes from the scatterplot view.
In general, how does the effect of IL1β on these same genes in veins compare to the effect in arteries?
Does this result lend support to your hypothesis that some genes respond differently to IL1β in arteries than in veins?
Page 32, last bullet has an error. Change the settings on the right in the search box to be identical to those shown in Figure 24.
Examining a single gene.
Page 35. Does the profile plot for VCAM 1 suggest that this gene should be among those in your entity list for genes down-regulated in arteries by IL1β?
QC of probes
One reason why an entity (probe) might be eliminated from consideration in this step is that the gene it represents is not expressed in either arterial or venous smooth muscle cells in either the presence or absence of IL1β.
What other reasons could you think of that would give a result that would lead to the elimination you have
In this section you have generated lists of genes that show differential expression across conditions and that show changes in expression level of greater than 1.5-fold. In the process, you have made it possible to eliminate genes that don’t meet these criteria from further consideration. Explain why having
software that performs these steps is important to the successful solution of the biological problem.
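For reference, a fold-change filter can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not GeneSpring's own code: it assumes the averaged intensities are on a log2 scale, it uses a signed fold-change convention (negative means down), and the probe names and values are invented. Only the 1.5 cutoff comes from the tutorial.

```python
# Hypothetical averaged log2 intensities per probe: (control, IL1b-treated).
log2_means = {
    "probe_A": (7.0, 8.0),   # higher after treatment
    "probe_B": (9.0, 8.9),   # nearly unchanged
    "probe_C": (10.0, 9.0),  # lower after treatment
}

FOLD_CHANGE_CUTOFF = 1.5  # cutoff used in the tutorial


def fold_change(control_log2, treated_log2):
    """Return the fold change as a magnitude >= 1; the sign gives the direction."""
    ratio = 2 ** (treated_log2 - control_log2)
    return ratio if ratio >= 1 else -1 / ratio


def passes_filter(control_log2, treated_log2, cutoff=FOLD_CHANGE_CUTOFF):
    """Keep a probe only if it changes by more than the cutoff in either direction."""
    return abs(fold_change(control_log2, treated_log2)) > cutoff


kept = [
    probe
    for probe, (control, treated) in log2_means.items()
    if passes_filter(control, treated)
]
print(kept)
```

With real data there would be tens of thousands of probes, so the same test would be applied to every row of the list rather than to three hand-picked examples.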
Hierarchical clustering. To make this cluster tree, you used the list of genes that show at least a 1.5-fold change in expression level.
What 3 samples clustered together at the far right of the tree? (Move the cursor over the sample number and compare to the table on p. 18 of the tutorial.)
What is the evidence from the tree that some genes are highly expressed under these conditions relative to others, and that other genes are expressed at very low levels under these conditions relative to other conditions? (You can see this best if you compress the rows multiple times.)
Is the general effect of IL1β the same on all genes?
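If you are curious how a condition tree can be built, the following is a minimal Python sketch of agglomerative (bottom-up) hierarchical clustering. It is an illustration only: the choice of Euclidean distance and average linkage is an assumption for this example and may differ from the settings you chose in GeneSpring, and the sample names and expression values are made up.

```python
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(cluster_a, cluster_b, data):
    """Mean pairwise distance between the members of two clusters."""
    distances = [euclidean(data[i], data[j]) for i in cluster_a for j in cluster_b]
    return sum(distances) / len(distances)


def condition_tree(data):
    """Merge the two closest clusters repeatedly; return the tree as nested tuples."""
    # Each cluster is (subtree, member_names); start with one cluster per sample.
    clusters = [(name, [name]) for name in data]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]][1], clusters[pair[1]][1], data
            ),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Made-up expression values for 3 probes in 4 hypothetical samples.
data = {
    "artery_ctrl": [1.0, 1.0, 1.0],
    "artery_IL1b": [3.0, 3.1, 0.9],
    "vein_ctrl": [1.1, 0.9, 1.0],
    "vein_IL1b": [2.9, 3.0, 1.1],
}
print(condition_tree(data))
```

Samples that are merged early sit close together in the tree, which is the same idea behind reading neighbors in the tree you generated.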
If you have followed the instructions in the tutorial to the letter, the samples in the K-means clusters are displayed from left to right in each cluster in the same order as they are presented in the table on p. 18.
in the highest and lowest peaks?
Is this result (a) in agreement with the hierarchical clustering?
If you made a Venn diagram of the 4 clusters, would you expect any overlaps?
No matter how many individual gene profiles you look at in any of the 4 clusters, you will never see a flat line. Explain why.
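For those who want to see the mechanics, here is a minimal Python sketch of the K-means idea: assign each profile to the nearest centroid, then move each centroid to the mean of its members, and repeat. It is a simplified illustration, not GeneSpring's implementation. The starting centroids (simply the first k profiles), the fixed number of iterations, and the example profiles are all assumptions chosen to keep the example small and repeatable.

```python
from statistics import mean


def squared_distance(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(profiles, k, iterations=20):
    """Group profiles into k clusters; returns one cluster index per profile.

    Centroids start at the first k profiles so the result is repeatable;
    this is a simplification chosen for the example.
    """
    centroids = [list(p) for p in profiles[:k]]
    assignments = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignments) if a == c]
            if members:
                centroids[c] = [mean(column) for column in zip(*members)]
    return assignments


# Made-up profiles: one value per condition, 4 conditions per probe.
profiles = [
    [0.0, 2.0, 0.1, 0.2],    # rises in the second condition
    [0.0, -1.9, 0.1, -2.0],  # falls in the second and fourth conditions
    [0.1, 1.8, 0.0, 0.1],    # similar shape to the first profile
    [0.2, -2.1, 0.0, -1.8],  # similar shape to the second profile
]
print(k_means(profiles, k=2))
```

Note that every profile is assigned to exactly one cluster index, which is worth keeping in mind when you think about how the clusters relate to one another.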
Biological Queries
Run the GO analysis on Cluster 1 from K-means, as described in the tutorial. Then repeat the analysis with a different cluster. What cluster did you use, and are the biological categories the same or different?