WORKSHOP
SPOTTED 2-channel ARRAYS
DATA PROCESSING AND
QUALITY CONTROL
Eugenia Migliavacca and
Mauro Delorenzi,
ISREC, December 11, 2003
AIMS
Discussion
Information
Introduction to the use of the webpage for
automated normalization
interface between experimentalists and analysts
feedback
resource allocation
Acknowledgments
some slides originally provided by:
Terry Speed (Berkeley / WEHI)
Sandrine Dudoit (Berkeley)
Yee Hwa Yang (Berkeley)
Natalie Thorne (WEHI)
Otto Hagenbuechle
Eugenia Migliavacca
Darlene Goldstein
and others
RNA ISOLATION
(AMPLIFICATION)
AND LABELING
WITH FLUORO-DYES
Preparation
Hybridisation
Binding labelled samples (targets) to
complementary probes on a slide
Hybridise for 5-12 hours
Wash
Mix
Scanning
Adjust scanner parameters; the following can frequently be adapted:
1. excitation wave (laser) intensity
2. "gain" (amplification) of the photon detection system
Human 10K
cDNA Array
How to extract data ?
How to recognize problems ?
Part of the image of one channel, false-coloured on a scale from white (v. high) and red (high) through yellow and green (medium) to blue (low) and black.
Scanner's Spots
Steps of a Microarray Experiment
RNA preparation and Labeling
Hybridisation
Slide scanning
Image analysis
Normalization
Data for further analysis
Why perform an experiment ?
What is the aim ?
Which conclusions do you want to reach ?
first: DESIGN !
mRNA abundance
rRNA 80%
tRNA
mRNA 1%
Abundance classes: 1-50, 50-500, 500+
approx. 300'000 mRNA Molecules/cell
approx. 10-20'000 different genes
What do you want to measure ?
RNA mass: different in different cells
Relative vs Absolute changes
200'000 mRNA Molecules/cell
200 for gene X (0.1%)
400'000 mRNA Molecules/cell
400 for gene X (0.1%)
Is gene X differentially expressed ?
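The question above can be made concrete with the numbers on the slide. The following is a minimal sketch; the variable names are illustrative, and it assumes that "relative" means the share of gene X in the total mRNA pool of the cell.

```python
import math

# Numbers from the slide: total mRNA molecules per cell and copies of gene X.
cell_a = {"total_mrna": 200_000, "gene_x": 200}
cell_b = {"total_mrna": 400_000, "gene_x": 400}


def relative_abundance(cell):
    """Fraction of the mRNA pool contributed by gene X."""
    return cell["gene_x"] / cell["total_mrna"]


# Absolute change: copies per cell.
absolute_log2_ratio = math.log2(cell_b["gene_x"] / cell_a["gene_x"])

# Relative change: share of the mRNA pool.
relative_log2_ratio = math.log2(
    relative_abundance(cell_b) / relative_abundance(cell_a)
)

print(f"absolute log2 ratio: {absolute_log2_ratio:.2f}")
print(f"relative log2 ratio: {relative_log2_ratio:.2f}")
```

In absolute terms gene X doubles, while its relative abundance stays at 0.1%. Which of the two answers an experiment gives depends on what is actually measured, which is why the aim has to be fixed first.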
Steps of a Microarray Experiment
RNA preparation and Labeling
Hybridisation
Slide scanning => 16-bit TIFF files
Image analysis => (Rfg, Rbg), (Gfg, Gbg), etc.
Normalization => R, G, M, A, etc.
Data for further analysis
What is needed for high quality data ?
Which are the critical steps ?
Steps of a Microarray Experiment
RNA preparation and Labeling: spike-in RNA in known conc. and ratios
Hybridisation
Slide scanning: adjust / balance channels approx.; avoid saturation
Image analysis
Normalization: check normalized and unnormalized data of exp RNA and of spiked RNA
Data for further analysis
Why avoid saturation ?
Why balance channels ?
Why perform "normalization" ?
What to check before and after normalization ?
Why calculate ratios ?
Why calculate log ratios ?
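One way to approach the saturation question is a small numerical sketch. It assumes a hypothetical detector that is linear up to the largest value a 16-bit pixel can hold (65535) and clips above it; the true signals are made-up numbers.

```python
import math

SATURATION = 2**16 - 1  # largest value a 16-bit pixel can hold (65535)


def scanner_reading(true_signal):
    """Hypothetical detector: linear up to the 16-bit ceiling, then clipped."""
    return min(true_signal, SATURATION)


def log2_ratio(red, green):
    return math.log2(red / green)


# Made-up true signals: red is 2-fold above green in both spots.
spots = {"moderate spot": (20_000, 10_000), "bright spot": (100_000, 50_000)}

observed = {}
for name, (red, green) in spots.items():
    observed[name] = log2_ratio(scanner_reading(red), scanner_reading(green))
    print(
        f"{name}: true M = {log2_ratio(red, green):.2f}, "
        f"observed M = {observed[name]:.2f}"
    )
```

Under these assumptions the bright spot loses most of its 2-fold difference because only the red channel is clipped, so saturated spots pull the ratios towards 1.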
Aim: Gene Expression Data
Gene expression data on p genes for n samples
Rows are genes, columns are slides. Each entry is $M = \log_2(\text{Red intensity} / \text{Green intensity})$

Genes   slide 1  slide 2  slide 3  slide 4  slide 5  …
1        0.46     0.30     0.80     1.51     0.90    ...
2       -0.10     0.49     0.24     0.06     0.46    ...
3        0.15     0.74     0.04     0.10     0.20    ...
4       -0.45    -1.03    -0.79    -0.56    -0.32    ...
5       -0.06     1.06     1.35     1.09    -1.09    ...

Highlighted entry: gene expression level of gene 5 in slide 4
These values are conventionally displayed on a red (>0), yellow (0), green (<0) scale.
Objectives for high quality
Important aspects include:
• Tentatively separating
  • systematic sources of variation ("artefacts"), that bias the results,
  • from random sources of variation ("noise"), that hide the truth.
• Removing the former as well as possible and quantifying the latter
Only if this is done can we hope to reach good quality and make valid statements about the confidence in the results
Typical Statistical Approach
Measured value = real value + systematic errors + noise
Corrected value = real value + noise
• Analysis of Corrected value => (unbiased) CONCLUSIONS
• Estimation of Noise => quality of CONCLUSIONS, statistical significance (level of confidence) of the conclusions
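The decomposition above can be illustrated with a small simulation. This is only a sketch: the real value, the size of the systematic error and the noise level are invented, and the systematic error is treated as known, whereas in practice it has to be estimated (which is what normalization attempts).

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

REAL_VALUE = 1.0        # hypothetical true log ratio
SYSTEMATIC_ERROR = 0.4  # hypothetical bias, e.g. a dye effect
NOISE_SD = 0.2          # hypothetical random measurement noise
N_REPLICATES = 200

# Measured value = real value + systematic error + noise
measured = [
    REAL_VALUE + SYSTEMATIC_ERROR + random.gauss(0.0, NOISE_SD)
    for _ in range(N_REPLICATES)
]

# Corrected value = real value + noise (systematic error assumed known here)
corrected = [m - SYSTEMATIC_ERROR for m in measured]

estimate = statistics.mean(corrected)          # basis for the conclusions
noise_sd = statistics.stdev(corrected)         # estimation of the noise
standard_error = noise_sd / N_REPLICATES ** 0.5  # confidence in the estimate

print(f"mean of measured values:  {statistics.mean(measured):.3f}")
print(f"mean of corrected values: {estimate:.3f} +/- {standard_error:.3f}")
print(f"estimated noise sd:       {noise_sd:.3f}")
```

Averaging more replicates shrinks the standard error but leaves the systematic error untouched, which is why the two kinds of variation have to be handled separately.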
Step 1: a) Background Correction
b) Calculation of (log) ratios
Image Analysis => Rfg; Rbg; Gfg; Gbg
(fg = foreground, bg = background.)
For each spot on the slide calculate:
$\text{Red intensity} = R = \mathrm{Rfg} - \mathrm{Rbg}$
$\text{Green intensity} = G = \mathrm{Gfg} - \mathrm{Gbg}$
$M = \log_2(\text{Red intensity} / \text{Green intensity})$
Subtraction of background values (additive background model, with the background assumed to be locally constant …)
Sources of background: probe unspecifically sticking on slide, irregular / dirty slide surface, dust, and noise / errors in the scanner measurement
Not included: real cross-hybridisation and unspecific hybridisation to the probe
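Step 1 can be sketched for a few spots as follows. The function name, the option to skip subtraction and the intensity values are illustrative assumptions, not the output of any particular image analysis software.

```python
import math


def spot_log_ratio(rfg, rbg, gfg, gbg, subtract_background=True):
    """Return (R, G, M) for one spot.

    M is None when an intensity is not positive, because the log ratio
    is then undefined.
    """
    red = rfg - rbg if subtract_background else rfg
    green = gfg - gbg if subtract_background else gfg
    if red <= 0 or green <= 0:
        return red, green, None
    return red, green, math.log2(red / green)


# Made-up foreground / background values for three spots
spots = [
    {"Rfg": 4200, "Rbg": 200, "Gfg": 1200, "Gbg": 200},  # bright spot
    {"Rfg": 350, "Rbg": 200, "Gfg": 275, "Gbg": 200},    # weak spot
    {"Rfg": 190, "Rbg": 200, "Gfg": 260, "Gbg": 200},    # foreground below background
]

for s in spots:
    args = (s["Rfg"], s["Rbg"], s["Gfg"], s["Gbg"])
    print("subtracted:", spot_log_ratio(*args),
          "| not subtracted:", spot_log_ratio(*args, subtract_background=False))
```

For weak spots the subtraction changes M considerably, and a foreground below the background leaves no usable log ratio at all.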
Comment on Background Correction
Subtraction of background has frequently been shown not to improve performance:
while it brings the average of many measurements closer to the true values (reduced bias or systematic error),
it causes higher variability (lower reproducibility)
A. High variance - Unbiased Estimator
when you take many measurements: the average will be closer to the true value more frequently
B. Low variance - slightly biased Estimator
when you take one or a few measurements: the average will be closer to the true value more frequently
(Figure: spread of single measurements and of their average for estimators A and B)
DAF Microarrays 2002: we preferred no subtraction, should be re-evaluated with Agilent scanner (and GenePix IAS)
Which is better ?
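The trade-off between A and B can be sketched with the mean squared error of an average of n independent measurements, which equals bias squared plus variance divided by n. Mean squared error is used here as one possible summary of "closer to the true value"; the bias and standard deviation values are hypothetical.

```python
def mse_of_average(bias, sd, n):
    """Mean squared error of the average of n independent measurements."""
    return bias ** 2 + sd ** 2 / n


# Hypothetical numbers, chosen only for illustration
ESTIMATOR_A = {"bias": 0.0, "sd": 1.0}  # A: unbiased, high variance
ESTIMATOR_B = {"bias": 0.2, "sd": 0.4}  # B: slightly biased, low variance

for n in (1, 3, 10, 30, 100):
    mse_a = mse_of_average(n=n, **ESTIMATOR_A)
    mse_b = mse_of_average(n=n, **ESTIMATOR_B)
    better = "A" if mse_a < mse_b else "B"
    print(f"n = {n:3d}: MSE(A) = {mse_a:.4f}, MSE(B) = {mse_b:.4f} -> {better}")
```

With these numbers B is preferable for one or a few measurements and A only wins once many measurements are averaged, so the answer depends on the number of replicates available.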
A reminder on logarithms
A numerical example
$M = \log_2(R/G) = \log_2 R - \log_2 G$
$A = (\log_2 R + \log_2 G) / 2$
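A few values worked through in code may help as a reminder. The intensities are arbitrary powers of two so that the logarithms come out as round numbers; base 2 is assumed throughout.

```python
import math


def m_and_a(red, green):
    """Return (M, A) with M = log2(R/G) and A = (log2 R + log2 G) / 2."""
    log_r, log_g = math.log2(red), math.log2(green)
    return log_r - log_g, (log_r + log_g) / 2


examples = [(1024, 1024), (2048, 1024), (1024, 2048), (4096, 256)]
for red, green in examples:
    m, a = m_and_a(red, green)
    print(f"R = {red:4d}, G = {green:4d}: R/G = {red / green:7.4f}, "
          f"M = {m:+.1f}, A = {a:.1f}")
```

A 2-fold increase and a 2-fold decrease give R/G values of 2 and 0.5 but M values of +1 and -1, so on the log scale up and down changes are symmetric around zero.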
Step 2: An M vs A (MVA) Plot
Plot legend: positive controls (spotted in varying concentrations), negative controls, blanks, lowess curve
Why use an M vs A plot ?
1. Logs stretch out region we are most interested in.
2. Can more clearly see features of the data such as intensity dependent variation, and dye-bias.
3. Differentially expressed genes more easily identified.
4. Intuitive interpretation
S1.n. Control Slide: Dye Effect, Spread.
MVA plot: looking at data
Plot labels: lowess curve, spot identifier
Step 3: Normalisation - global median centering
• Assumption: Changes roughly symmetric
• First panel: smooth density of $\log_2 G$ and $\log_2 R$.
• Second panel: M vs A plot with median put to zero (common median)
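Global median centering amounts to subtracting one number from every log ratio on the slide. A minimal sketch, with made-up M values and an illustrative function name:

```python
import statistics


def median_center(m_values):
    """Subtract the slide-wide median so that the median log ratio becomes zero."""
    center = statistics.median(m_values)
    return [m - center for m in m_values]


# Made-up log ratios for one slide, shifted upwards by a dye effect
m_raw = [0.9, 0.4, 0.5, 0.6, 0.3, 0.5, 2.1, 0.5, -0.8, 0.7]
m_norm = median_center(m_raw)

print("median before:", statistics.median(m_raw))
print("median after: ", statistics.median(m_norm))
```

Differences between spots are unchanged; only the common offset is removed, which is appropriate only under the symmetry assumption stated above.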
Step 4: Normalisation - lowess - local median centering
• Assumption: changes roughly symmetric at all intensities.
What is this normalization doing?
Local regression
• Classical (global) regression: draws a single line through the entire set of points
• Local regression: draws a curve through noisy data by smoothing
• Lowess (LOcally WEighted Scatterplot Smoothing) is a type of local regression
• Can correct for both print-tip and intensity-dependent bias with lowess fits to the data within print-tip groups
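The idea of local regression can be sketched in a few lines. The function below is a simplified stand-in for lowess, not a full implementation: it fits a tricube-weighted straight line around each point and omits the robustness iterations that lowess normally adds. The function name, the `fraction` parameter and the simulated data are assumptions for illustration.

```python
def local_linear_fit(a_values, m_values, fraction=0.4):
    """Tricube-weighted local straight-line fit evaluated at every A value."""
    n = len(a_values)
    k = min(n, max(3, round(fraction * n)))  # neighbours used in each local fit
    fitted = []
    for a0 in a_values:
        distances = sorted(abs(a - a0) for a in a_values)
        bandwidth = distances[k - 1] or 1e-12  # distance to the k-th nearest point
        # Tricube weights: 1 at a0, falling to 0 at the bandwidth
        weights = [(1 - min(abs(a - a0) / bandwidth, 1.0) ** 3) ** 3
                   for a in a_values]
        total = sum(weights)
        mean_a = sum(w * a for w, a in zip(weights, a_values)) / total
        mean_m = sum(w * m for w, m in zip(weights, m_values)) / total
        var_a = sum(w * (a - mean_a) ** 2 for w, a in zip(weights, a_values))
        cov = sum(w * (a - mean_a) * (m - mean_m)
                  for w, a, m in zip(weights, a_values, m_values))
        slope = cov / var_a if var_a > 0 else 0.0
        fitted.append(mean_m + slope * (a0 - mean_a))
    return fitted


# Simulated slide: curved intensity-dependent dye bias plus a small wiggle
a_values = [6 + 0.25 * i for i in range(40)]
m_raw = [0.04 * (a - 11) ** 2 - 0.5 + (0.05 if i % 2 == 0 else -0.05)
         for i, a in enumerate(a_values)]

trend = local_linear_fit(a_values, m_raw)
m_norm = [m - t for m, t in zip(m_raw, trend)]  # normalized log ratios

mean_abs_raw = sum(abs(m) for m in m_raw) / len(m_raw)
mean_abs_norm = sum(abs(m) for m in m_norm) / len(m_norm)
print(f"mean |M| before: {mean_abs_raw:.3f}, after: {mean_abs_norm:.3f}")
```

To correct print-tip effects as well, the same fit would be applied separately to the spots of each print-tip group. For real data an established lowess implementation is the safer choice.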
Local regression illustrated
Lowess line
Step 5: Normalisation - spatial corrections
• After within-slide global lowess normalization.
• Likely to be a spatial effect.
Figure axis: print-tip groups
Normalization between groups (ctd)
• After print-tip location- and scale-normalization.
Figure axis: print-tip groups
normalized values look nice, but .....
Effects of Location Normalisation (example)
Panels: Before, After
Boxplots of log ratios by pin group
Lowess lines through points from each pin group
Identifying sub-array effects
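The numbers behind boxplots by pin group can be computed directly, which gives a quick check for sub-array effects. The sketch below uses made-up log ratios for three groups and centers each group on its median; this is a simplification of the per-group lowess fit and ignores any dependence on intensity. All names are illustrative.

```python
import statistics
from collections import defaultdict


def split_by_group(group_ids, m_values):
    """Collect the log ratios of each print-tip (pin) group."""
    groups = defaultdict(list)
    for g, m in zip(group_ids, m_values):
        groups[g].append(m)
    return dict(groups)


def boxplot_summary(values):
    """Median and interquartile range, the core numbers of a boxplot."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {"median": statistics.median(values), "iqr": q3 - q1}


def center_groups(groups):
    """Location normalization per group: subtract each group's own median."""
    centered = {}
    for g, values in groups.items():
        center = statistics.median(values)
        centered[g] = [m - center for m in values]
    return centered


# Made-up data: group 2 is shifted, group 3 has a wider spread
group_ids = [1] * 5 + [2] * 5 + [3] * 5
m_values = [-0.2, -0.1, 0.0, 0.1, 0.2,
            0.3, 0.4, 0.5, 0.6, 0.7,
            -0.6, -0.3, 0.0, 0.3, 0.6]

groups = split_by_group(group_ids, m_values)
summary = {g: boxplot_summary(v) for g, v in groups.items()}
centered = center_groups(groups)

for g in sorted(summary):
    print(g, summary[g])
```

A group whose median differs from the others points to a location effect, and a group with a clearly larger interquartile range points to a scale effect that centering alone does not remove.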
Step 6: Rescaling (Spread-Normalisation)
Taking varying scale into account
Assumption: All (print-tip-)groups should have the same spread in M
True ratio is $\mu_{ij}$, where i represents different (print-tip-)groups and j represents different spots.
Observed is $M_{ij}$, where $M_{ij} = a_i \cdot \log(\mu_{ij})$
Robust estimate of $a_i$ is
Corrected values are calculated as:
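A sketch of the rescaling step follows. It assumes that the scale factor of each group is estimated as the MAD of that group divided by the geometric mean of the MADs of all groups, in line with the recommendation later in these slides to adjust MAD to the geometric mean, and that corrected values are the observed values divided by this factor. The function names and data are illustrative.

```python
import math
import statistics


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


def scale_normalize(groups):
    """Rescale each print-tip group to a common spread.

    groups maps a group label to its location-normalized M values.
    Assumes every group has a MAD greater than zero.
    """
    mads = {g: mad(values) for g, values in groups.items()}
    geometric_mean = math.exp(
        sum(math.log(m) for m in mads.values()) / len(mads)
    )
    factors = {g: mads[g] / geometric_mean for g in groups}  # assumed estimate of a_i
    rescaled = {g: [m / factors[g] for m in values]
                for g, values in groups.items()}
    return rescaled, factors


# Made-up, already centered log ratios for two print-tip groups
groups = {
    "tip 1": [-0.2, -0.1, 0.0, 0.1, 0.2],
    "tip 2": [-0.8, -0.4, 0.0, 0.4, 0.8],
}
rescaled, factors = scale_normalize(groups)

for g in groups:
    print(g, "factor:", round(factors[g], 3), "MAD after:", round(mad(rescaled[g]), 3))
```

After rescaling both groups have the same MAD. Whether that is desirable depends on the assumption that all groups should have the same spread in M.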
Illustration: print-tip-group-Normalisation
Assumption: For every print group: changes roughly symmetric at all intensities.
Glass Slide
Array of bound cDNA probes
4x4 blocks = 16 pin groups
Which normalization to use?
Case 1: A few genes that are likely to change and / or a random large collection of genes (expect as many up as down):
Each slide per se:
– Location: print-tip-group lowess normalization.
– Scale: for all print-tip-groups, adjust MAD to equal the geometric mean of the MADs of all print-tip-groups.
Case 2: Non-random gene collection and / or many genes do change appreciably:
– USE DYE-SWAP APPROACH
– Self-normalization: take the difference of the two log-ratios.
– Check using controls or known information.
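Self-normalization with a dye-swap pair can be sketched as follows. Two assumptions are made that the slide does not state: the dye bias of a spot is the same on both slides, and the difference of the two log ratios is halved so that the result is on the scale of a single log ratio. The data are made up.

```python
def dye_swap_self_normalize(m_forward, m_swapped):
    """Half the difference of the two log ratios, spot by spot."""
    return [(mf - ms) / 2 for mf, ms in zip(m_forward, m_swapped)]


# Made-up example: true log ratios plus a spot-specific dye bias
true_m = [1.0, 0.0, -2.0, 0.5]
dye_bias = [0.3, 0.3, 0.1, -0.2]

# Slide 1: sample 1 in red, sample 2 in green
m_forward = [t + b for t, b in zip(true_m, dye_bias)]
# Slide 2: dyes exchanged, so the true log ratio changes sign but the bias does not
m_swapped = [-t + b for t, b in zip(true_m, dye_bias)]

m_self = dye_swap_self_normalize(m_forward, m_swapped)
print([round(m, 3) for m in m_self])
```

Under these assumptions the bias cancels without any symmetry assumption about the genes, which is why the approach suits Case 2. If the bias differs between the two slides it cancels only in part, hence the check with controls.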
MVA plots: what to look at ?
How to use the spikes ?
Points:
signal intensity
background
saturation
homogeneity, normalizability
problem diagnosis
Webpage
How to use the plots ?
Use of the different options
Quality control before normalization (?)
Choice of normalization