SCT2011.Handout v2 - Duke University




Development of multiplex biomarkers: Statistical considerations with high dimensional data


Workshop Instructors:

Herbert Pang
Duke University
Durham, NC
herbert.pang@duke.edu

William Barry
Duke University
Durham, NC
bill.barry@duke.edu


Abstract


Many phase II and III clinical studies have begun to incorporate DNA microarrays and other high throughput biotechnologies in the discovery and development of molecular biomarkers in clinical oncology and other complex diseases. This workshop gives an overview of some of the challenges and advanced topics particular to high-dimensional data, including:


- Multiple testing considerations in the detection of associations with clinical outcomes at the gene- and pathway-level

- Class discovery among samples/subjects, avoiding the detection of noise and batch effects

- Proper validation techniques when building predictive models of clinical outcome

- Methods of quality control and pre-processing for the prospective use of microarray-based biomarkers
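
As an illustration of the first topic, the following is a minimal sketch of the Benjamini-Hochberg step-up adjustment for controlling the false discovery rate across many gene-level tests. It is written in plain Python for self-containment and is not the workshop's R/Bioconductor code; the function name `benjamini_hochberg` and the ten p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, so that the
    # adjusted values are forced to be monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustrative only, not workshop data).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adj = benjamini_hochberg(p)

# Genes declared significant at a 5% false discovery rate.
selected = [i for i, q in enumerate(adj) if q <= 0.05]
print(selected)
```

With these made-up values, five genes have raw p-values below 0.05, but only the first two remain significant after adjustment, which shows how unadjusted testing overstates the number of discoveries.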


To illustrate standards and solutions for conducting reproducible research, all pre-processing, visualization and analysis will be performed using statistical and genomic packages in R/Bioconductor.


The code used to generate all results presented in the workshop will be made available to the attendees.


Course Instructors: Drs. Pang and Barry are faculty members in the Department of Biostatistics and Bioinformatics and the Statistical Center for the Cancer and Leukemia Group B (CALGB). The CALGB is a multi-institutional cooperative group funded by the National Cancer Institute. The workshop faculty members have experience in designing, implementing and analyzing gene expression microarrays in single and multi-institutional trials in oncology. The faculty will share their extensive experience with the attendees of the workshop.




Slides and R Code used in the Workshop:

URL: http://www.duke.edu/~dinbarry/SCT2011/


References:


Books for microarray analysis:

1. R. Simon, E. L. Korn, L. M. McShane, et al. Design and Analysis of DNA Microarray Investigations. Springer Verlag, 2005.

2. S. Draghici. Data Analysis Tools for DNA Microarrays. Chapman and Hall/CRC, New York, New York, 1st edition, 2003.

3. R. Gentleman, V. Carey, W. Huber, R. Irizarry, and S. Dudoit, editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005.


mRNA microarray reviews and datasets:

4. Golub, T.R., et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537 (1999).

5. Miller, L.D., et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci U S A 102, 13550-13555 (2005).

6. Barker, A.D., et al. I-SPY 2: an adaptive breast cancer trial design in the setting of neoadjuvant chemotherapy. Clin Pharmacol Ther 86, 97-100 (2009).

7. Schena, M., Shalon, D., Davis, R.W. & Brown, P.O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467-470 (1995).


Technical platforms and pre-processing:

8. Bolstad, B.M., Irizarry, R.A., Astrand, M. & Speed, T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185-193 (2003).

9. Irizarry, R.A., et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249-264 (2003).

10. Owzar, K., Barry, W.T., Jung, S.H., Sohn, I. & George, S.L. Statistical challenges in preprocessing in microarray experiments in cancer. Clin Cancer Res 14, 5959-5966 (2008).

11. Benito, M., et al. Adjustment of systematic microarray data biases. Bioinformatics 20, 105-114 (2004).

12. Johnson, W.E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118-127 (2007).

13. Shi, L., et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol 28, 827-838 (2010).

14. Shi, L.M., et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24, 1151-1161 (2006).


Statistical Inference and Microarrays

15. Smyth, G.K. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3, Article3 (2004).

16. Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98, 5116-5121 (2001).




Multiple Comparisons

17. Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological 57, 289-300 (1995).

18. Reiner, A., Yekutieli, D. & Benjamini, Y. Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19, 368-375 (2003).

19. Westfall, P.H. & Young, S.S. P-Value Adjustments for Multiple Tests in Multivariate Binomial Models. J Am Stat Assoc 84, 780-786 (1989).

20. Jung, S.H. Sample size for FDR-control in microarray data analysis. Bioinformatics 21, 3097-3104 (2005).



Pathway Analysis

21. Barry, W.T., Nobel, A.B. & Wright, F.A. Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 21, 1943-1949 (2005).

22. Barry, W.T., Nobel, A.B. & Wright, F.A. A Statistical Framework for Testing Functional Categories in Microarray Data. Annals of Applied Statistics 2, 286-315 (2008).

23. Gatti, D.M., Barry, W.T., Nobel, A.B., Rusyn, I. & Wright, F.A. Heading down the wrong pathway: on the influence of correlation within gene sets. BMC Genomics 11, 574 (2010).

24. Subramanian, A., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550 (2005).


Classification


25. Pang, H., Lin, A., Holford, M., Enerson, B.E., Lu, B., Lawton, M.P., Floyd, E. & Zhao, H. Pathway analysis using random forests classification and regression. Bioinformatics 22(16), 2028-2036 (2006).

26. Shrout, P.E. & Fleiss, J.L. Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin 86, 420-428 (1979).

27. Landis, J.R. & Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 33(1), 159-174 (1977).

28. Donner, A. & Eliasziw, M. Sample size requirements for reliability studies. Statistics in Medicine 6, 441-448 (1987).

29. Barry, W.T., et al. Intratumor Heterogeneity and Precision of Microarray-Based Predictors of Breast Cancer Biology and Clinical Outcome. J Clin Oncol 28, 2198-2206 (2010).


Genomics and Clinical Trials



30. Heagerty, P., Lumley, T. & Pepe, M.S. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337-344 (2000).

31. Harrell, F.E. Jr, Califf, R.M., Pryor, D.B., Lee, K.L. & Rosati, R.A. Evaluating the yield of medical tests. JAMA 247(18), 2543-2546 (May 1982).

32. Pang, H., Datta, D. & Zhao, H. Pathway analysis using random forests with bivariate node-split for survival outcomes. Bioinformatics 26(2), 250-258 (2010). Epub 2009 Nov 18.

33. Simon, R. Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data. Br J Cancer 89, 1599-1604 (2003).

34. Simon, R. Using DNA microarrays for diagnostic and prognostic prediction. Expert Rev Mol Diagn 3, 587-595 (2003).

35. Simon, R. Validation of pharmacogenomic biomarker classifiers for treatment selection. Cancer Biomark 2, 89-96 (2006).

36. Dupuy, A. & Simon, R.M. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst 99, 147-157 (2007).

37. Dobbin, K. & Simon, R. Sample size determination in microarray experiments for class comparison and prognostic classification. Biostatistics 6, 27-38 (2005).

38. Subramanian, J. & Simon, R. Gene expression-based prognostic signatures in lung cancer: ready for clinical use? J Natl Cancer Inst 102, 464-474 (2010).