
Presentations

07/11/08, 07/14/08, 07/15/08, 07/16/08
Room 020
07/09/08, 07/11/08, 07/17/08, 07/18/08
9 AM - 1 PM
07/10/08, 07/14/08, 07/15/08, 07/16/08
2 PM - 5 PM

Presentation #3

Classifier Design

- From a sample, form an estimate $\psi_n$ of $\psi_{\mathrm{opt}}$.
- Design cost: $\Delta_n = \varepsilon_n - \varepsilon_{\mathrm{opt}}$.
- Key issue: It is often impossible to get large enough samples to sufficiently reduce $E[\Delta_n]$.
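The design cost can be illustrated with a small simulation. The sketch below is not from the slides: it assumes a toy model (two unit-variance Gaussian classes with means 0 and 2 and equal priors) and a simple design rule (threshold at the midpoint of the two sample means), then estimates the expected design cost for several sample sizes. All names and parameter values are hypothetical.

```python
import math
import random

MU0, MU1 = 0.0, 2.0  # assumed class means (unit variance, equal priors)

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def true_error(t):
    """Exact error of the rule 'label 1 if x > t' under the assumed model."""
    return 0.5 * (1.0 - phi(t - MU0)) + 0.5 * phi(t - MU1)

# Error of the optimal classifier for this toy model (threshold at the midpoint).
EPS_OPT = true_error((MU0 + MU1) / 2.0)

def design_threshold(n, rng):
    """Design a classifier from n points per class: midpoint of the sample means."""
    m0 = sum(rng.gauss(MU0, 1.0) for _ in range(n)) / n
    m1 = sum(rng.gauss(MU1, 1.0) for _ in range(n)) / n
    return (m0 + m1) / 2.0

def expected_design_cost(n, trials=1000, seed=0):
    """Monte Carlo estimate of the expected design cost for sample size n."""
    rng = random.Random(seed)
    costs = [true_error(design_threshold(n, rng)) - EPS_OPT for _ in range(trials)]
    return sum(costs) / trials

for n in (2, 5, 20, 100):
    print(n, round(expected_design_cost(n), 4))
```

Under these assumptions the estimated cost shrinks as n grows, which is the behaviour the key issue refers to: with very small samples the designed classifier can sit well above the optimal error.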

Constraint

- To lower design cost, optimization is constrained to a filter subclass $C$.
- Constraint cost: $\Delta_C = \varepsilon_C - \varepsilon_d$.
- The savings in design error must exceed the cost of constraint.
- Small samples demand low complexity.
- Key problem: find appropriate constraints.
  - A constraint may be defined in accordance with a model, or experience may have shown that a certain constraint works well in a given setting.

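Classifier Design Error

[Figure: expected errors $E[\varepsilon_n]$ and $E[\varepsilon_{n,C}]$ plotted against sample size $N$ (marked points $N_0$, $N_1$, $N_2$), with reference levels $\varepsilon_{\mathrm{opt}}$ and $\varepsilon_{\mathrm{opt},C}$.]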

Error Estimation

[Figure: misclassification error plotted against the number of variables $d$, showing $E[\varepsilon_{d,n}]$ and $\varepsilon_d$.]

Strong Features *

- Method: Spread sample points
  - Analytical misclassification error representation
  - Analytic classifier design

[Figure labels: "Sensitive!", "Less sensitive"]

* Kim et al., Computational Biology, 9(1), 2002

Clustering

- Data clustering algorithms can be hierarchical or partitional.
- Hierarchical algorithms find successive clusters using previously established clusters.
- Partitional algorithms determine all clusters at once.

Hierarchical algorithms can be agglomerative ("bottom-up") or divisive ("top-down"). Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters. Divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
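The contrast between the two families can be sketched in a few lines. The code below is an illustration only, not the implementation used in the study: the agglomerative routine uses single linkage and the partitional routine is a plain k-means with a naive initialisation (the first k points), both of which are assumptions made here for brevity.

```python
import math

def agglomerative(points, k):
    """Bottom-up: start from singletons and merge the two closest clusters
    (single linkage) until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def kmeans(points, k, iters=20):
    """Partitional: all k clusters are re-estimated together on every pass."""
    centers = list(points[:k])  # naive initialisation (assumption)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if a group happens to be empty.
        centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerative(pts, 2))
print(kmeans(pts, 2))
```

On well-separated toy data like this, both routines recover the same two groups; they differ in how they get there, by successive merges versus simultaneous reassignment.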


Clustering

- The model is the clustering algorithm itself: a set of spatial points is observed and a clustering algorithm is applied.
- The model is “inferred” by experience.
- The model is checked by applying it to data from known distributions of the classes.
- Can we infer the correct partition of the data from the clusters?


Time Course Model

[Figure: time-course measurement plots for Class 1 (low variance) and Class 2 (high variance).]

Objectives

- Examine the precision of sample-based clustering relative to population inference
- Study the effects of the number of replicates of microarray experiments
- Compare the various clustering methods

Clustering algorithms

- K-means
- Fuzzy c-means
- Self-Organizing Map
- Hierarchical clustering with Euclidean distance
- Hierarchical clustering with correlation
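Since the results that follow come from fuzzy c-means, a minimal sketch of that algorithm may help. It uses the standard alternating updates (memberships from relative distances, centers from membership-weighted means); the fuzzifier m = 2, the fixed iteration count, and the caller-supplied initial centers are assumptions of this sketch, not settings reported in the slides.

```python
import math

def memberships(points, centers, m):
    """Membership of each point in each cluster; every row sums to 1."""
    c = len(centers)
    rows = []
    for p in points:
        # Clamp distances so a point sitting exactly on a center is handled.
        d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
        rows.append([
            1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
            for i in range(c)
        ])
    return rows

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Return (centers, u), where u[j][i] is the degree of point j in cluster i."""
    centers = [tuple(ctr) for ctr in centers]
    dims = range(len(points[0]))
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            # Each center is the mean of all points weighted by membership**m.
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wj * p[dim] for wj, p in zip(w, points)) / total
                for dim in dims
            ))
        centers = new_centers
    return centers, memberships(points, centers, m)

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(pts, [(0, 0), (10, 10)])
hard_labels = [max(range(len(centers)), key=lambda i: row[i]) for row in u]
print(hard_labels)
```

Taking the largest membership per point gives a hard label, which is one way misclassifications can be counted against known classes.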

Results from Fuzzy c-means: single experiment, $\sigma^2 = 0.25$, $N = 1$

- No error!
- Tighter clusters due to small variance

Results from Fuzzy c-means: single experiment, $\sigma^2 = 3.0$, $N = 1$

- Many misclassifications: 22 misclassifications (8.8%)
- Clusters start mixing

Results from Fuzzy c-means: replicated experiment, $\sigma^2 = 3.0$, $N = 3$

- Very few misclassifications: 2 misclassifications (0.8%)
- Clusters well separated due to the replication

Real Data Applications

- Initial clustering to generate templates
  - means
  - variance (individual or pooled)
- Simulate time course data based on the templates generated by initial clustering
  - different numbers of replicates
- Apply various clustering methods
  - expected clustering error for each method
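The procedure above can be sketched as a small simulation. Everything concrete here is hypothetical: the three templates, the number of genes per class, and the use of nearest-template assignment as a stand-in for a real clustering algorithm. The sketch only shows the mechanics of simulating replicated time courses from templates and scoring the error.

```python
import random

def simulate_and_score(templates, genes_per_class, sigma2, n_rep, seed=0):
    """Simulate noisy time courses around each template, average n_rep
    replicates, and return the fraction assigned to the wrong template."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    errors = 0
    total = 0
    for label, template in enumerate(templates):
        for _ in range(genes_per_class):
            # One profile = mean of n_rep noisy copies of the template.
            profile = [
                sum(rng.gauss(mu, sd) for _ in range(n_rep)) / n_rep
                for mu in template
            ]
            # Nearest-template assignment (stand-in for a clustering method).
            guess = min(
                range(len(templates)),
                key=lambda k: sum((x - y) ** 2
                                  for x, y in zip(profile, templates[k])),
            )
            errors += guess != label
            total += 1
    return errors / total

# Hypothetical templates: three classes observed at four time points.
TEMPLATES = [(0, 1, 2, 3), (3, 2, 1, 0), (0, 2, 0, 2)]
for sigma2, n_rep in ((0.25, 1), (3.0, 1), (3.0, 3)):
    print(sigma2, n_rep, simulate_and_score(TEMPLATES, 100, sigma2, n_rep))
```

Averaging N replicates divides the noise variance by N, so the replicated setting behaves like a single experiment with a smaller variance, which is consistent with the drop in misclassifications reported for N = 3.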


Website for full scale analysis

http://gspsnap.tamu.edu/clustering/jcb/
Username: clustering
Password: clustering

- All five clustering methods analyzed
- Extensive study on the variance and the replicates
  - lots of graphs and images
  - lots of error measures, including confusion matrix

Goals of Functional Genomics

- Process signals generated by the genome to characterize their regulatory effects and their relationship to changes at both the genotypic and phenotypic levels.
- Find analytical tools for expression profile data that can detect the types of multivariate influences on decision-making produced by complex genetic networks.
- Model relationships within the genetic regulatory network.


Regulatory Genetic Function?

Cell-line Condition  RCH1  BCL3  FRA1 REL-B  ATF3 IAP-1  PC-1 MBP-1  SSAT  MDM2   p21   p53   AHA   OHO    IR   MMS    UV
ML-1      IR           -1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     0     0
ML-1      MMS           0     0     0     0     1     0     0     0     0     1     1     1     1     0     0     1     0
Molt4     IR           -1     0     0     1     1     0     1     0     0     1     1     1     1     1     1     0     0
Molt4     MMS           0     0     1     0     1     0     0     0     0     0     1     1     1     0     0     1     0
SR        IR           -1     0     0     1     1     1     1     1     0     1     1     1     1     1     1     0     0
SR        MMS           0     0     0     0     1     0     0     0     0     1     1     1     1     0     0     1     0
A549      IR            0     0     0     0     0     0     0     0     0     1     1     1     0     0     1     0     0
A549      MMS           0     0     0     0     1     0     0     0     0     0     1     1     1     0     0     1     0
A549      UV            0     0     0     0     1     0     0     0     0     0     1     1     1     0     0     0     1
MCF7      IR           -1     0     1     1     0     0     0     0     0     1     1     1     0     1     1     0     0
MCF7      MMS           0     0     1     0     1     0     0     0     0     1     1     1     1     0     0     1     0
MCF7      UV            0     0     1     1     1     0     0     0     0     1     1     1     1     0     0     0     1
RKO       IR            0     1     0     1     1     1     1     0     0     1     1     1     1     0     1     0     0
RKO       MMS           0     0     0     0     1     0     0     0     0     0     1     1     1     0     0     1     0
RKO       UV            0     0     0     0     1     0     0     0     0     0     1     1     1     0     0     0     1
CCRF-CEM  IR           -1     1     1     1     1     0     1     0     0     0     0    -1    -1     0     1     0     0
CCRF-CEM  MMS           0     0     0     0     1     0     0     0     0     0     0    -1     0     0     0     1     0
HL60      IR           -1     1     0     1     1     0     1     0     1     0     1    -1    -1    -1     1     0     0
HL60      MMS           0     0     1     0     1     0     0     0     0     1     1    -1     0     1     0     1     0
K562      IR            0     0     0     0     0     0     0     0     0     0     0    -1     0     0     1     0     0
K562      MMS           0     0     0     0     1     0     0     0     0     0     0    -1     0     0     0     1     0
H1299     IR            0     0     0     1     0     0     1     0     0     0     0    -1     0     0     1     0     0
H1299     MMS           0     0     0     0     1     0     0     0     0     0     1    -1     0     1     0     1     0
H1299     UV            0     0     0     0     1     0     1     0     0     0     1    -1     0     1     0     0     1
RKO/E6    IR           -1     1     0     1     0     1     1     0     0     0     0    -1    -1     0     1     0     0
RKO/E6    MMS          -1     0     0     0     1     0     0     0     0     0     1    -1    -1     1     0     1     0
RKO/E6    UV           -1     0     0     0     1     0     0     0     0     0     1    -1    -1     1     0     0     1
T47D      IR            0     0     0     1     0     0     0     0     0     0     1    -1     0    -1     1     0     0
T47D      MMS           0     0     0     0     1     0     0     0     0     0     1    -1     0     1     0     1     0
T47D      UV            0     0     0     0     1     0     0     0     0     0     1    -1     0     1     0     0     1

Rows are cell lines subjected to different experimental conditions.
Comparisons are to the same cell line not exposed to the experimental treatment.
-1 means expression goes down relative to untreated
0 means expression is unchanged relative to untreated
+1 means expression goes up relative to untreated
Column groups: Genes (RCH1 through OHO) and Condition (IR, MMS, UV).
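
The -1 / 0 / +1 coding described above can be produced from raw measurements by thresholding the treated-to-untreated ratio. The sketch below is one plausible way to do that; the two-fold cutoff is a hypothetical choice, since the slides do not say which threshold was used.

```python
import math

def ternarize(treated, untreated, fold=2.0):
    """Map an expression ratio to -1 (down), 0 (unchanged) or +1 (up).

    `fold` is a hypothetical threshold; the source does not state one.
    """
    log_ratio = math.log2(treated / untreated)
    cut = math.log2(fold)
    if log_ratio >= cut:
        return 1
    if log_ratio <= -cut:
        return -1
    return 0

# Made-up intensities for illustration only.
print(ternarize(400.0, 100.0))  # ratio 4.0  -> 1
print(ternarize(100.0, 400.0))  # ratio 0.25 -> -1
print(ternarize(120.0, 100.0))  # ratio 1.2  -> 0
```

Working on the log ratio keeps the cutoff symmetric, so a two-fold increase and a two-fold decrease are treated alike.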

“If gene $X_1$ is active and gene $X_2$ is suppressed, gene $Y$ would be activated.”

Can we infer regulatory genetic function from the cDNA microarray data, for both known and unknown functions?
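
A rule of the quoted form can be written as a function on ternary values and scored against observations. In this sketch, "active" is read as +1 and "suppressed" as -1, and the rule predicts 0 (unchanged) whenever its condition is not met; that default is an assumption, because the quoted rule says nothing about the other cases. The sample triples are invented for illustration.

```python
def rule(x1, x2):
    """Predict Y = +1 when X1 is up (+1) and X2 is down (-1); otherwise 0.

    The 'otherwise 0' branch is an assumption of this sketch.
    """
    return 1 if (x1 == 1 and x2 == -1) else 0

def rule_error(samples):
    """Fraction of (x1, x2, y) observations the rule predicts incorrectly."""
    wrong = sum(rule(x1, x2) != y for x1, x2, y in samples)
    return wrong / len(samples)

# Hypothetical observations: (X1, X2, Y) in ternary coding.
samples = [(1, -1, 1), (1, -1, 1), (0, 0, 0), (1, 0, 0), (1, -1, 0)]
print(rule_error(samples))  # one of five predictions is wrong
```

Inference then amounts to searching for the rule, over candidate predictor genes, whose error on the data is smallest.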

Genetic Regulatory Networks

- Genes interact via multiprotein complexes, feedback regulation, and pathway networks.
- Complex molecular networks underlie biological function.
- Most diseases do not result from a single gene product.
- These interrelationships among genes constitute gene regulatory networks.

Gene Regulation

[Figure: gene regulatory controls. Labels: p53, E1A, Rb, E2F, Myc, MDM2, DNA damage, Hypoxia, transcription, translation, protein.]

Gene expression: the process by which gene products (proteins) are made.

Gene Networks Inference

[Figure: biochemical interaction network. Gene space: Gene 1, Gene 2, Gene 3, Gene 4. Protein space: Protein 1, Protein 2, Protein 3, Protein 4, Complex 3-4. Metabolic space: Metabolite 1, Metabolite 2. Other labels: Biological phenomena, Biochemical model, Relationship, Variable.]

From Brazhnik et al., “Gene networks: how to put the function in genomics”, TRENDS in Biotechnology, 20(11), 2002


Gene Networks Inference

[Figure: the biochemical interaction network (gene space, protein space, metabolic space) and its projection to the gene space (Gene 1, Gene 2, Gene 3, Gene 4), labeled "Gene Regulatory Network Model".]

From Brazhnik et al., “Gene networks: how to put the function in genomics”, TRENDS in Biotechnology, 20(11), 2002


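Predictive Relationships *

- To discover predictive relationships, we let each gene be the target and design filters for each combination of one, two, three, and four predictor genes.
- Coefficient of Determination (COD): $\mathrm{COD} = \dfrac{\varepsilon_0 - \varepsilon_{\mathrm{opt}}}{\varepsilon_0}$, where $\varepsilon_{\mathrm{opt}}$ is the error of the optimal filter for the chosen predictor genes and $\varepsilon_0$ is the error of the best estimate of the target made without any predictors.

* Kim et al., J. Biomedical Optics, 5(4), Oct 2000.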
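
A rough sketch of how a COD might be estimated from ternary data follows. It assumes the usual definition, COD = (e0 - e_opt) / e0, where e0 is the error of the best guess of the target without predictors and e_opt is the error of the best predictor built from the chosen genes. Two simplifications are made here and are not taken from the cited paper: error is measured as misclassification rate, and both errors are estimated on the same data used to design the predictor (resubstitution), which tends to be optimistic for small samples.

```python
from collections import Counter, defaultdict

def cod(predictors, target):
    """Resubstitution estimate of the coefficient of determination.

    predictors: list of tuples of ternary values, one tuple per sample
    target:     list of ternary target values, same length
    """
    n = len(target)
    # e0: error of the best constant guess (the most common target value).
    e0 = 1.0 - Counter(target).most_common(1)[0][1] / n
    if e0 == 0.0:
        return 0.0  # constant target, nothing to explain (convention assumed here)
    # e_opt: for each observed predictor pattern, guess the target value
    # most often seen together with that pattern.
    by_pattern = defaultdict(Counter)
    for x, y in zip(predictors, target):
        by_pattern[x][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_pattern.values())
    e_opt = 1.0 - correct / n
    return (e0 - e_opt) / e0

# Hypothetical data: two predictor genes and one target gene.
X = [(1, -1), (1, -1), (0, 0), (0, 0), (1, 0), (-1, 0)]
y = [1, 1, 0, 0, 0, 0]
print(cod(X, y))  # the target is fully determined by the pattern here
```

A COD near 1 means the predictor genes remove most of the error in guessing the target; a COD near 0 means they add little beyond the target's own distribution.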

Course Project

- Formulate the question
- Organize and clean data
- Normalize data
- Analyze data
- Interpret results