Analyzing Microarray Data with Methods from Statistics and Machine Learning

B-IT IPEC Winter School 2008


Prof. Dr. A. B. Cremers

Jörg Zimmermann

DNA Microarray Data


Genome chips containing a collection of microscopic DNA spots

Simultaneous determination of > 10^5 gene expression levels

Dramatic acceleration of data acquisition

New possibilities for disease diagnosis, treatment studies, network analysis, …

DNA Microarray Data

The resulting data have the form:


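$$
\begin{array}{cccc|c}
x_{11} & x_{12} & \cdots & x_{1n} & (L_1) \\
\vdots & \vdots &        & \vdots & \vdots \\
x_{p1} & x_{p2} & \cdots & x_{pn} & (L_p)
\end{array}
$$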


n = number of measured cell states (e.g. gene expression levels)

p = number of samples

$x_{ij}$ = real number, e.g. representing the expression level of gene $j$ in sample $i$

$L_i$ = label of sample $i$
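
The layout above can be sketched in code as one record per sample, holding its n measurements and its label. This is only an illustrative sketch: the `Sample` class, the helper `x`, the labels "tumor" and "normal", and all numeric values are made-up assumptions, not data from the course.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One row of the data matrix: n measurements plus a class label."""
    expression: List[float]  # x_i1 ... x_in
    label: str               # L_i


# Hypothetical toy data: p = 3 samples, n = 4 genes (all values are made up).
samples = [
    Sample([2.1, 0.3, 5.6, 1.0], "tumor"),
    Sample([1.9, 0.4, 5.1, 1.2], "tumor"),
    Sample([0.2, 3.3, 0.9, 1.1], "normal"),
]

p = len(samples)                # number of samples (rows)
n = len(samples[0].expression)  # number of measured values per sample (columns)


def x(i: int, j: int) -> float:
    """Return x_ij using the 1-based indices of the slide notation."""
    return samples[i - 1].expression[j - 1]


print(p, n, x(3, 2), samples[2].label)
```

In real microarray data n is far larger than p, so each row would hold many thousands of values while the number of rows stays small.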

Challenges for Data Analysis


Normalization (removing systematic measurement effects)

Variable selection (identification of relevant variables)

Large sample effects: Type I and Type II errors (false positives / false negatives)

Dimensionality reduction

Identification of new disease classes

Classification of data into known disease classes
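
To see why Type I errors matter at this scale, the following sketch counts the false positives one would expect when testing each gene separately. It assumes one independent test per gene, that no gene is truly differentially expressed, and a significance level of 0.05; none of these assumptions come from the slides. The Bonferroni threshold shown is one common correction and is not mentioned in the slides either.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Expected number of Type I errors if every null hypothesis is true."""
    return num_tests * alpha


def bonferroni_threshold(num_tests: int, alpha: float) -> float:
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / num_tests


num_genes = 10**5  # order of magnitude quoted for a genome chip
alpha = 0.05       # assumed significance level

print(expected_false_positives(num_genes, alpha))
print(bonferroni_threshold(num_genes, alpha))
```

A stricter threshold reduces false positives but increases false negatives (Type II errors), which is the trade-off the slide points at.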

Cluster Analysis

Finding structure in data without labels (unsupervised learning)

Does a cluster characterize a (new) disease type?
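
As one possible illustration of clustering without labels, here is a minimal k-means sketch in plain Python. The function names, the fixed iteration count, the seed, and the two-dimensional toy profiles are all assumptions made for the example; real expression profiles have thousands of dimensions.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def k_means(points: List[Point], k: int, iterations: int = 20,
            seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Alternate assignment and update steps for a fixed number of iterations."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, k)]  # k distinct start points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        assignment = [
            min(range(k), key=lambda c: squared_distance(pt, centers[c]))
            for pt in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean_point(members)
    return centers, assignment


# Made-up toy profiles forming two well-separated groups.
profiles = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
            (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centers, assignment = k_means(profiles, k=2)
print(assignment)
```

Whether a cluster found this way corresponds to a disease type is a separate question that the algorithm itself cannot answer.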

Prediction Problem


Classify data into known disease classes:

Supervised learning

Split the data into a training set and a test set

Learn a model on the training set

Evaluate the model on the test set
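
The three steps above can be sketched as follows. The choice of a 1-nearest-neighbor classifier, the 25% test fraction, the seed, and the toy data with classes "A" and "B" are assumptions made for illustration only.

```python
import random
from typing import List, Sequence, Tuple

LabeledSample = Tuple[Sequence[float], str]


def split(data: List[LabeledSample], test_fraction: float = 0.25,
          seed: int = 0) -> Tuple[List[LabeledSample], List[LabeledSample]]:
    """Shuffle a copy of the data and split it into (training set, test set)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train: List[LabeledSample], x: Sequence[float]) -> str:
    """The 'model' is the training set itself: return the label of the nearest sample."""
    nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return nearest[1]


def accuracy(train: List[LabeledSample], test: List[LabeledSample]) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(predict_1nn(train, x) == label for x, label in test)
    return hits / len(test)


# Made-up, well-separated toy data with two classes.
data = [
    ((1.0, 1.1), "A"), ((0.9, 1.0), "A"), ((1.2, 0.8), "A"), ((1.1, 1.3), "A"),
    ((5.0, 5.2), "B"), ((5.1, 4.9), "B"), ((4.8, 5.0), "B"), ((5.3, 5.1), "B"),
]
train, test = split(data)
print(len(train), len(test), accuracy(train, test))
```

The test samples play no role in building the model, so the accuracy measured on them estimates how well the model generalizes to unseen samples.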


Prediction Problem

Under- and Overlearning (underfitting and overfitting)

Data Analysis Methods

Dimension Reduction



PCA (Principal Component Analysis)



ICA (Independent Component Analysis)



Multidimensional Scaling
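
As a small illustration of the first method in this list, the sketch below finds the first principal component by power iteration on the sample covariance matrix. Power iteration is just one convenient way to do this in plain Python and is an assumption of the example, as are the function names and the toy data. The sketch also assumes the data are not constant and that the start vector is not orthogonal to the leading eigenvector.

```python
import math
from typing import List, Sequence


def center(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the column mean from every column."""
    means = [sum(col) / len(data) for col in zip(*data)]
    return [[v - m for v, m in zip(row, means)] for row in data]


def covariance(centered: List[List[float]]) -> List[List[float]]:
    """Sample covariance matrix of already centered data."""
    rows, dims = len(centered), len(centered[0])
    return [
        [sum(r[a] * r[b] for r in centered) / (rows - 1) for b in range(dims)]
        for a in range(dims)
    ]


def first_principal_component(data: Sequence[Sequence[float]],
                              iterations: int = 100) -> List[float]:
    """Approximate the leading eigenvector of the covariance matrix."""
    cov = covariance(center(data))
    vec = [1.0] * len(cov)  # assumed start vector
    for _ in range(iterations):
        vec = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


# Made-up toy data lying exactly on the line y = 2x.
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc = first_principal_component(data)
print(pc)
```

Projecting each centered sample onto this direction gives a one-dimensional summary of the two-dimensional toy data; for expression data the same idea reduces thousands of genes to a few components.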

Unsupervised Learning



K-Means / K-Medoids



Hierarchical Clustering Algorithms

Supervised Learning



Linear Discriminant Analysis



Maximum Likelihood Discrimination



Nearest Neighbor Methods



Decision Trees



Random Forests

Organisation

Schedule: 31.3.2008 to 4.4.2008, B-IT Building

Language: German and English (slides in English)

Talk: 45 min + 15 min discussion

Documentation: 10 to 15 pages (German or English)

Area (DPO Bonn): B

Summer Course: Gene Mining and Network Analysis

Summer School: Programming Data Analysis Algorithms with R

Contact: jz@iai.uni-bonn.de

Background Literature:

Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001