# N

8 Nov 2013

Presentations: 07/11/08, 07/14/08, 07/15/08, 07/16/08

Room 020

07/09/08, 07/11/08, 07/17/08, 07/18/08: 9 AM to 1 PM

07/10/08, 07/14/08, 07/15/08, 07/16/08: 2 PM to 5 PM

Presentation #3

Classifier Design

From a sample, form an estimate $\psi_n$ of $\psi_{\mathrm{opt}}$.

Design cost: $\Delta_n = \varepsilon_n - \varepsilon_{\mathrm{opt}}$

Key issue: it is often impossible to get samples large enough to sufficiently reduce $E[\Delta_n]$.

Constraint

To lower the design cost, optimization is constrained to a filter subclass $C$.

Constraint cost: $\Delta_C = \varepsilon_C - \varepsilon_d$.

The savings in design error must exceed the cost of constraint.
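Written out term by term, the trade-off reads as follows. This is a sketch under assumptions the slides do not state: $\varepsilon_{n,C}$ is the error of the classifier designed from the sample within $C$, $\varepsilon_C$ is the error of the best classifier in $C$, $\Delta_{n,C} = \varepsilon_{n,C} - \varepsilon_C$ is the design cost within $C$, and $\varepsilon_d$ in the constraint cost is taken to be the same unconstrained optimal error as $\varepsilon_{\mathrm{opt}}$.

```latex
\begin{aligned}
E[\varepsilon_n]     &= \varepsilon_{\mathrm{opt}} + E[\Delta_n] \\
E[\varepsilon_{n,C}] &= \varepsilon_C + E[\Delta_{n,C}]
                      = \varepsilon_{\mathrm{opt}} + \Delta_C + E[\Delta_{n,C}] \\
E[\varepsilon_{n,C}] < E[\varepsilon_n]
  &\iff \Delta_C < E[\Delta_n] - E[\Delta_{n,C}]
\end{aligned}
```

The right-hand side of the last line is the savings in design error; under these assumptions, constraining pays off only when that saving exceeds the constraint cost $\Delta_C$.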

Small samples demand low complexity

Key problem: find appropriate constraints.

- A constraint may be defined in accordance with a model, or experience may have shown that a certain constraint works well in a given setting.

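Classifier Design Error

[Figure: expected error versus sample size $N$ for unconstrained design, $E[\varepsilon_n]$, and constrained design, $E[\varepsilon_{n,C}]$, with reference levels $\varepsilon_{\mathrm{opt}}$ and $\varepsilon_{\mathrm{opt},C}$; sample sizes $N_0$, $N_1$, $N_2$ are marked on the axis.]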
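The behaviour in the figure can be explored with a small Monte Carlo experiment. This is a hypothetical setup, not the authors' simulation: two Gaussian classes in five dimensions whose means differ by 2 along one axis, a constrained rule (nearest sample mean, which yields a linear classifier) and a more flexible rule (1-nearest-neighbour). The names `draw`, `nearest_mean`, `one_nn`, `expected_error`, and the parameter `s1` (spread of class 1) are illustrative. `expected_error` averages the true error, estimated on a large independent test set, over many training samples to approximate $E[\varepsilon_n]$ at a given sample size.

```python
import numpy as np

def draw(n_per_class, mu, rng, s1=1.0):
    """n_per_class points from N(0, I) (label 0) and from N(mu, s1^2 I) (label 1)."""
    d = mu.size
    x0 = rng.standard_normal((n_per_class, d))
    x1 = s1 * rng.standard_normal((n_per_class, d)) + mu
    return np.vstack([x0, x1]), np.repeat([0, 1], n_per_class)

def nearest_mean(xtr, ytr, xte):
    """Constrained rule: assign each test point to the class with the closer sample mean."""
    m0, m1 = xtr[ytr == 0].mean(axis=0), xtr[ytr == 1].mean(axis=0)
    d0 = ((xte - m0) ** 2).sum(axis=1)
    d1 = ((xte - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def one_nn(xtr, ytr, xte):
    """Flexible rule: copy the label of the nearest training point."""
    d2 = ((xte[:, None, :] - xtr[None, :, :]) ** 2).sum(axis=2)
    return ytr[d2.argmin(axis=1)]

def expected_error(rule, n_per_class, mu, s1=1.0, reps=200, n_test=2000, seed=0):
    """Approximate E[eps_n]: the true error of the designed classifier, estimated on a
    large independent test set, averaged over `reps` independent training samples."""
    rng = np.random.default_rng(seed)
    xte, yte = draw(n_test // 2, mu, rng, s1)
    errors = [np.mean(rule(*draw(n_per_class, mu, rng, s1), xte) != yte)
              for _ in range(reps)]
    return float(np.mean(errors))

mu = np.zeros(5)
mu[0] = 2.0  # hypothetical setting; with s1 = 1 the Bayes error is Phi(-1), about 0.159
for n in (2, 5, 20):
    print(n, expected_error(nearest_mean, n, mu), expected_error(one_nn, n, mu))
```

In this equal-covariance setting the optimal classifier is itself linear, so the nearest-mean rule pays no constraint cost and should typically come out ahead at every sample size. Giving class 1 a larger spread (for example `s1=3.0`) makes the optimal boundary nonlinear and introduces a constraint cost, which is where a crossover like the one in the figure can appear.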

Error Estimation

[Figure: misclassification error $E[\varepsilon_{d,n}]$ versus number of variables $d$.]

Strong Features *

- Analytical misclassification error representation
- Analytic classifier design

[Figure labels: “Sensitive!” and “Less sensitive”]

* Kim et al., J. Computational Biology, 9(1), 2002

Clustering

Data clustering algorithms can be hierarchical or partitional. Hierarchical algorithms find successive clusters using previously established clusters. Partitional algorithms determine all clusters at once.

Hierarchical algorithms can be agglomerative ("bottom-up") or divisive ("top-down"). Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters. Divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
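A minimal pure-Python sketch of the agglomerative (bottom-up) procedure just described. The function name, the Euclidean distance, and the average-linkage rule for comparing two clusters are illustrative assumptions; single or complete linkage can be selected instead. The divisive (top-down) variant is not shown.

```python
import math
from itertools import combinations

def agglomerative(points, k, linkage="average"):
    """Start with every point as its own cluster and repeatedly merge the closest
    pair of clusters until k clusters remain. Returns clusters as lists of indices."""
    clusters = [[i] for i in range(len(points))]
    combine = {"single": min,
               "complete": max,
               "average": lambda ds: sum(ds) / len(ds)}[linkage]

    def cluster_distance(a, b):
        # linkage applied to all pairwise Euclidean distances between the two clusters
        return combine([math.dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > k:
        # combinations yields (i, j) with i < j, so deleting j leaves index i valid
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)], 3))
```

For the five points above the two tight pairs are merged first and the isolated point stays alone, giving `[[0, 1], [2, 3], [4]]`. The sketch recomputes every cluster distance at each merge, so it is only meant for small examples.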

Clustering

The model is the clustering algorithm itself: a set of spatial points is observed and a clustering algorithm is applied.

The model is “inferred” by experience.

The model is checked by applying it to data from known class distributions.

Can we infer the correct partition of the data from the clusters?

Time Course Model

[Figure: time-course measurements for Class 1 (low variance) and Class 2 (high variance).]

Objectives

- Examine the precision of sample-based clustering relative to population inference
- Study the effects of the number of replicates of microarray experiments
- Compare the various clustering methods

Clustering algorithms

- K-means
- Fuzzy c-means
- Self-organizing map
- Hierarchical clustering with Euclidean distance
- Hierarchical clustering with correlation
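Because the results that follow come from fuzzy c-means, a minimal numpy sketch of the standard algorithm may help: every point has a membership in every cluster, centers are membership-weighted means, and memberships are recomputed from relative distances. The fuzzifier `m = 2`, the random initialisation, and the stopping rule are assumptions here, not settings taken from the study; hard cluster labels are read off as the cluster of largest membership.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, iters=300, tol=1e-6, seed=0):
    """x: (n, d) data, c: number of clusters, m > 1: fuzzifier.
    Returns (centers, u) where u[i, j] is the membership of point i in cluster j."""
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)              # random memberships, rows sum to 1
    for _ in range(iters):
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]    # membership-weighted means
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                      # guard against zero distance
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return centers, u

rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
centers, u = fuzzy_c_means(x, 2)
labels = u.argmax(axis=1)   # hard assignment from the fuzzy memberships
```

Larger noise variance pushes memberships toward 1/c for points between the clusters, which is one way to see why clusters "start mixing" as the variance grows.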

Results from Fuzzy c-means

Single experiment, $\sigma^2 = 0.25$, $N = 1$

- No error!
- Tighter clusters due to small variance

Results from Fuzzy c-means

Single experiment, $\sigma^2 = 3.0$, $N = 1$

- Many misclassifications: clusters start mixing
- 22 misclassifications (8.8%)

Results from Fuzzy c-means

Replicated experiment, $\sigma^2 = 3.0$, $N = 3$

- Clusters well separated due to the replication
- Very few misclassifications
- 2 misclassifications (0.8%)
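Counting misclassifications requires matching the arbitrary cluster ids to the known classes first. A minimal sketch that tries every assignment of classes to clusters and keeps the one with the fewest errors; the function name is hypothetical, it assumes there are no more clusters than classes, and exhaustive search is only practical for a handful of clusters (the Hungarian algorithm scales better). As an arithmetic check, 22/250 = 8.8% and 2/250 = 0.8%, so both reported percentages are consistent with 250 clustered points.

```python
from itertools import permutations

def misclassifications(true_labels, cluster_labels):
    """Smallest number of disagreements over all matchings of cluster ids to class ids.
    Assumes the number of clusters does not exceed the number of classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    best = len(true_labels)
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))   # cluster id -> class id
        errors = sum(mapping[c] != t for c, t in zip(cluster_labels, true_labels))
        best = min(best, errors)
    return best

print(misclassifications([0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0]))  # 1
```

Here relabelling cluster 1 as class 0 and cluster 0 as class 1 leaves a single point in the wrong group.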

Real Data Applications

- Initial clustering to generate templates: means and variance (individual or pooled)
- Simulate time-course data based on the templates generated by initial clustering, with different numbers of replicates
- Apply various clustering methods and compute the expected clustering error for each method
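The simulation step can be sketched as follows, under assumptions made here rather than taken from the study: two hypothetical linear templates over eight time points, a pooled noise variance `sigma2`, replicates averaged before clustering, and plain k-means standing in for the clustering methods compared. Averaging `n_reps` replicates reduces the noise variance to `sigma2 / n_reps`, which is the mechanism behind the tighter clusters reported for $N = 3$. All function names and the default sizes are illustrative.

```python
import numpy as np

def simulate(templates, n_per_template, sigma2, n_reps, rng):
    """Noisy profiles around each template; each profile is the mean of n_reps
    replicates, so its noise variance per time point is sigma2 / n_reps."""
    xs, ys = [], []
    for label, t in enumerate(templates):
        noise = rng.normal(0.0, np.sqrt(sigma2), size=(n_per_template, n_reps, t.size))
        xs.append((t + noise).mean(axis=1))
        ys.append(np.full(n_per_template, label))
    return np.vstack(xs), np.concatenate(ys)

def kmeans(x, k, rng, iters=100):
    """Lloyd's algorithm, initialised with k distinct data points chosen at random."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def clustering_error(templates, sigma2, n_reps, n_per_template=125, trials=20, seed=0):
    """Average error over simulated data sets. Valid for two templates only: cluster
    ids are matched to classes by taking the better of the two possible labelings."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        x, y = simulate(templates, n_per_template, sigma2, n_reps, rng)
        e = float(np.mean(kmeans(x, 2, rng) != y))
        errs.append(min(e, 1.0 - e))
    return float(np.mean(errs))

# hypothetical rising and falling profiles over eight time points
templates = [np.linspace(0.0, 2.0, 8), np.linspace(2.0, 0.0, 8)]
print(clustering_error(templates, 3.0, 1), clustering_error(templates, 3.0, 3))
```

With `sigma2 = 3.0` the averaged replicates (`n_reps = 3`) should give a markedly smaller error than a single experiment, mirroring the direction of the fuzzy c-means results; the exact values depend on the made-up templates.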

Website for full-scale analysis: http://gspsnap.tamu.edu/clustering/jcb/

- All five clustering methods analyzed
- Extensive study of the variance and the replicates
- Many graphs and images
- Many error measures, including the confusion matrix

Goals of Functional Genomics

- Process signals generated by the genome to characterize their regulatory effects and their relationship to changes at both the genotypic and phenotypic levels.
- Find analytical tools for expression profile data that can detect the types of multivariate influences on decision-making produced by complex genetic networks.
- Model relationships within the genetic regulatory network.

Regulatory Genetic Function?

Cell-line
Condition
RCH1
BCL3
FRA1
REL-B
ATF3
IAP-1
PC-1
MBP-1
SSAT
MDM2
p21
p53
AHA
OHO
IR
MMS
UV
ML-1
IR
-1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
ML-1
MMS
0
0
0
0
1
0
0
0
0
1
1
1
1
0
0
1
0
Molt4
IR
-1
0
0
1
1
0
1
0
0
1
1
1
1
1
1
0
0
Molt4
MMS
0
0
1
0
1
0
0
0
0
0
1
1
1
0
0
1
0
SR
IR
-1
0
0
1
1
1
1
1
0
1
1
1
1
1
1
0
0
SR
MMS
0
0
0
0
1
0
0
0
0
1
1
1
1
0
0
1
0
A549
IR
0
0
0
0
0
0
0
0
0
1
1
1
0
0
1
0
0
A549
MMS
0
0
0
0
1
0
0
0
0
0
1
1
1
0
0
1
0
A549
UV
0
0
0
0
1
0
0
0
0
0
1
1
1
0
0
0
1
MCF7
IR
-1
0
1
1
0
0
0
0
0
1
1
1
0
1
1
0
0
MCF7
MMS
0
0
1
0
1
0
0
0
0
1
1
1
1
0
0
1
0
MCF7
UV
0
0
1
1
1
0
0
0
0
1
1
1
1
0
0
0
1
RKO
IR
0
1
0
1
1
1
1
0
0
1
1
1
1
0
1
0
0
RKO
MMS
0
0
0
0
1
0
0
0
0
0
1
1
1
0
0
1
0
RKO
UV
0
0
0
0
1
0
0
0
0
0
1
1
1
0
0
0
1
CCRF-CEM
IR
-1
1
1
1
1
0
1
0
0
0
0
-1
-1
0
1
0
0
CCRF-CEM
MMS
0
0
0
0
1
0
0
0
0
0
0
-1
0
0
0
1
0
HL60
IR
-1
1
0
1
1
0
1
0
1
0
1
-1
-1
-1
1
0
0
HL60
MMS
0
0
1
0
1
0
0
0
0
1
1
-1
0
1
0
1
0
K562
IR
0
0
0
0
0
0
0
0
0
0
0
-1
0
0
1
0
0
K562
MMS
0
0
0
0
1
0
0
0
0
0
0
-1
0
0
0
1
0
H1299
IR
0
0
0
1
0
0
1
0
0
0
0
-1
0
0
1
0
0
H1299
MMS
0
0
0
0
1
0
0
0
0
0
1
-1
0
1
0
1
0
H1299
UV
0
0
0
0
1
0
1
0
0
0
1
-1
0
1
0
0
1
RKO/E6
IR
-1
1
0
1
0
1
1
0
0
0
0
-1
-1
0
1
0
0
RKO/E6
MMS
-1
0
0
0
1
0
0
0
0
0
1
-1
-1
1
0
1
0
RKO/E6
UV
-1
0
0
0
1
0
0
0
0
0
1
-1
-1
1
0
0
1
T47D
IR
0
0
0
1
0
0
0
0
0
0
1
-1
0
-1
1
0
0
T47D
MMS
0
0
0
0
1
0
0
0
0
0
1
-1
0
1
0
1
0
T47D
UV
0
0
0
0
1
0
0
0
0
0
1
-1
0
1
0
0
1
Rows are cell lines subjected to different experimental conditions.
Comparisons are to the same cell line not exposed to the experimental treatment.
-1 means expression goes down relative to untreated
0 means expression is unchanged relative to untreated
+1 means expression goes up relative to untreated
Genes
Condition
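Each entry in the table is a discretized comparison with the untreated control. One way such ternary values could be computed is sketched below, assuming expression is compared through a log2 ratio and that a two-fold change (`threshold=1.0`) is the cut-off; the slides state neither, and the function name is hypothetical.

```python
import numpy as np

def ternary(treated, untreated, threshold=1.0):
    """Map each gene's change relative to the untreated control to -1, 0, or +1.
    threshold applies to |log2(treated / untreated)|; 1.0 (two-fold) is an assumed cut-off."""
    lr = np.log2(np.asarray(treated, dtype=float) / np.asarray(untreated, dtype=float))
    return np.where(lr >= threshold, 1, np.where(lr <= -threshold, -1, 0))

print(ternary([4.0, 1.0, 0.25, 1.5], [1.0, 1.0, 1.0, 1.0]).tolist())  # [1, 0, -1, 0]
```

A 1.5-fold increase stays at 0 under this cut-off, so the choice of threshold directly controls how many genes are reported as changed.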
“If gene $X_1$ is active and gene $X_2$ is suppressed, gene $Y$ would be activated”

Can we infer regulatory genetic function from the cDNA microarray data, for both known and unknown functions?

Genetic Regulatory Networks

- Genes interact via multiprotein complexes, feedback regulation, and pathway networks.
- Complex molecular networks underlie biological function.
- Most diseases do not result from a single gene product.
- These interrelationships among genes constitute gene regulatory networks.

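Gene Regulation

[Figure: gene regulation diagram with nodes p53, E1A, Rb, E2F, Myc, and MDM2; inputs DNA damage and hypoxia; labels for transcription, translation, protein, and gene regulatory controls. Caption fragment: “Gene expression: the process by which gene products”]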

Gene Networks Inference

[Figure: biochemical interaction network spanning the gene space (Genes 1 to 4), the protein space (Proteins 1 to 4 and Complex 3-4), and the metabolic space (Metabolites 1 and 2); additional labels: biological phenomena, biochemical model, relationship, variable.]

From Brazhnik et al., “Gene networks: how to put the function in genomics”, TRENDS in Biotechnology, 20(11), 2002

Gene Networks Inference

[Figure: the biochemical interaction network across gene, protein, and metabolic spaces, with a projection to the gene space (Genes 1 to 4) giving the gene regulatory network model.]

From Brazhnik et al., “Gene networks: how to put the function in genomics”, TRENDS in Biotechnology, 20(11), 2002

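Predictive Relationships *

To discover predictive relationships, we let each gene be the target and design filters for each combination of one, two, three, and four predictor genes.

Coefficient of Determination (COD):

$$\theta = \frac{\varepsilon_0 - \varepsilon_{\mathrm{opt}}}{\varepsilon_0}$$

where $\varepsilon_0$ is the error of the best prediction of the target made without any predictor genes and $\varepsilon_{\mathrm{opt}}$ is the error of the optimal predictor using the chosen predictor genes.

* Kim et al., J. Biomedical Optics, 5(4), Oct. 2000.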
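A minimal sketch of how the COD, $\theta = (\varepsilon_0 - \varepsilon_{\mathrm{opt}})/\varepsilon_0$, can be estimated on ternary data and used to score every combination of predictor genes for one target. Assumptions, not the cited method: the predictor is a full lookup table that outputs the most frequent target value for each observed predictor pattern, errors are counted on the same sample (a resubstitution estimate, which is optimistic for small samples), and a constant target is assigned COD 0. Function names and the toy rows, which encode the rule “$X_1$ active and $X_2$ suppressed implies $Y$ activated”, are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def best_error(rows, target, predictors):
    """Resubstitution error of the best lookup-table predictor of `target`:
    for each observed pattern of predictor values, predict the most common target value."""
    groups = defaultdict(Counter)
    for r in rows:
        groups[tuple(r[p] for p in predictors)][r[target]] += 1
    correct = sum(c.most_common(1)[0][1] for c in groups.values())
    return 1.0 - correct / len(rows)

def cod(rows, target, predictors):
    """theta = (eps_0 - eps_opt) / eps_0, where eps_0 uses no predictors."""
    eps0 = best_error(rows, target, ())
    if eps0 == 0.0:
        return 0.0  # constant target in the sample: COD undefined, reported as 0 here
    return (eps0 - best_error(rows, target, predictors)) / eps0

def rank_predictor_sets(rows, target, genes, max_size=4):
    """Score every combination of 1..max_size predictor genes for one target gene."""
    candidates = [g for g in genes if g != target]
    scored = [(cod(rows, target, combo), combo)
              for k in range(1, max_size + 1)
              for combo in combinations(candidates, k)]
    return sorted(scored, reverse=True)

rows = [  # hypothetical ternary observations
    {"X1": 1, "X2": -1, "Y": 1},
    {"X1": 1, "X2": 0, "Y": 0},
    {"X1": 0, "X2": -1, "Y": 0},
    {"X1": 1, "X2": -1, "Y": 1},
    {"X1": 0, "X2": 0, "Y": 0},
]
print(rank_predictor_sets(rows, "Y", ["X1", "X2", "Y"], max_size=2)[0])  # (1.0, ('X1', 'X2'))
```

On these rows each single gene reaches a COD of about 0.5, while the pair reaches 1.0, which is the kind of multivariate influence the slide is after.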

Course Project

- Formulate the question
- Organize and clean data
- Normalize data
- Analyze data
- Interpret results
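For the "Normalize data" step, one common choice for two-channel cDNA arrays is global median centering of log ratios. The sketch below assumes that method (the slides do not say which normalization the project uses), a genes-by-arrays layout, and a hypothetical function name.

```python
import numpy as np

def median_center_log_ratios(red, green):
    """log2(red / green) for each gene (row) on each array (column), shifted so that
    every array's median log ratio is 0."""
    lr = np.log2(np.asarray(red, dtype=float) / np.asarray(green, dtype=float))
    return lr - np.median(lr, axis=0, keepdims=True)

print(median_center_log_ratios([[2, 8], [4, 2], [8, 4]], np.ones((3, 2))))
```

Centering removes a constant dye or intensity offset per array, so that an unchanged gene sits near 0 before any up or down call is made.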