Theoretical Analysis of Multi-Instance Learning

张敏灵 (Min-Ling Zhang)    周志华 (Zhi-Hua Zhou)

zml@ai.nju.edu.cn

National Key Laboratory for Novel Software Technology, Nanjing University

2002.10.11

Outline

Introduction
Theoretical analysis
  PAC learning model
  PAC learnability of APR
  Real-valued multi-instance learning
Future work


Introduction

Origin

Multi-instance learning originated from the problem of drug activity
prediction, and was first formalized by T. G. Dietterich et al. in their
seminal paper "Solving the multiple-instance problem with axis-parallel
rectangles" (1997).

Later, in 2001, J.-D. Zucker and Y. Chevaleyre extended the concept of
multi-instance learning to multi-part learning, and pointed out that many
previously studied problems are multi-part problems rather than
multi-instance ones.

Introduction - cont'd

Comparisons

Fig.1. The shape of a molecule changes as it rotates its bonds
Fig.2. Classical and multi-instance learning frameworks

Drug activity prediction problem
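Under the multi-instance framework sketched in Fig.2, the learner sees
labeled bags rather than labeled instances: a molecule (bag) is active if
at least one of its low-energy shapes (instances) binds to the receptor.
A minimal sketch of this bag-labeling assumption (illustrative only;
instance_positive is a hypothetical instance-level predicate, not part of
the original slides):

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")

def bag_label(bag: Sequence[X],
              instance_positive: Callable[[X], bool]) -> bool:
    """The standard multi-instance assumption: a bag is positive
    iff at least one of its instances is positive."""
    return any(instance_positive(x) for x in bag)
```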







Introduction - cont'd

Experiment data

Data set   #dim   #bags   #pos bags   #neg bags   #instances   #instances/bag (max / min / ave)
musk1      166    92      47          45          476          40 / 2 / 5.17
musk2      166    102     39          63          6598         1044 / 1 / 64.69


APR(Axis
-
Parallel Rectangles) algorithms

Fig.3. APR algorithms


GFS elim
-
count APR(standard)


GFS elim
-
kde APR(outside
-
in)


Iterated discrim APR(inside
-
out)


musk1: 92.4%


musk2: 89.2%



Introduction - cont'd

Various algorithms

APR (T. G. Dietterich et al. 1997)
MULTINST (P. Auer 1997)
Diverse Density (O. Maron 1998)
Bayesian-kNN, Citation-kNN (J. Wang et al. 2000; bag distance sketched below)
Relic (G. Ruffo 2000)
EM-DD (Q. Zhang & S. A. Goldman 2001)
……
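As one concrete ingredient from this list: the lazy-learning variants
Bayesian-kNN and Citation-kNN measure distance between bags with the
minimal Hausdorff distance. A small sketch (assumes NumPy arrays and a
Euclidean instance metric):

```python
import numpy as np

def minimal_hausdorff(bag_a, bag_b):
    """Distance between two bags = distance between their closest
    pair of instances, as used by the multi-instance kNN variants."""
    a, b = np.asarray(bag_a, float), np.asarray(bag_b, float)
    diffs = a[:, None, :] - b[None, :, :]   # all instance pairs
    return float(np.sqrt((diffs ** 2).sum(-1)).min())
```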

Introduction - cont'd

Comparison on benchmark data sets

Algorithms               Musk1 (%correct)   Musk2 (%correct)
iterated-discrim APR     92.4               89.2
Citation-kNN             92.4               86.3
Diverse Density          88.9               82.5
RELIC                    83.7               87.3
MULTINST                 76.7               84.0
BP                       75.0               67.7
C4.5                     68.5               58.8

Fig.4. A comparison of several multi-instance learning algorithms

Introduction - cont'd

Application areas

Drug activity prediction (T. G. Dietterich et al. 1997)
Stock prediction (O. Maron 1998)
Learning a simple description of a person from a series of images
(O. Maron 1998)
Natural scene classification (O. Maron & A. L. Ratan 1998)
Event prediction (G. M. Weiss & H. Hirsh 1998)
Data mining and computer security (G. Ruffo 2000)
……

Multi-instance learning has been regarded as the fourth machine learning
framework, parallel to supervised learning, unsupervised learning, and
reinforcement learning.

Theoretical analysis

PAC learning model
  Definition and its properties
  VC dimension
PAC learnability of APR
Real-valued multi-instance learning
Theoretical Analysis

PAC model

Computational learning theory
  L. G. Valiant (1984), "A theory of the learnable"
  Deductive learning
  Used for constructing a mathematical model of a cognitive process.


Fig.5. Diagram of a framework for learning (the world W supplies actual
examples according to distribution P; coded examples are fed to learning
machine M, which outputs 0/1)

PAC model - cont'd

Definition of PAC learning

We say that a learning algorithm L is a pac (probably approximately
correct) learning algorithm for the hypothesis space H if, given

  a confidence parameter δ (0 < δ < 1);
  an accuracy parameter ε (0 < ε < 1);

there is a positive integer m_L = m_L(δ, ε) such that

  for any target concept t ∈ H,
  for any probability distribution µ on X,

whenever m ≥ m_L,

  µ^m{ s ∈ S(m, t) | er_µ(L(s), t) < ε } > 1 − δ.
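As a worked instance of this definition (a standard computational learning
theory result, not stated on the original slide): for a finite hypothesis
space H, any consistent learning algorithm L is pac with

\[
m_L(\delta, \varepsilon) \;=\; \left\lceil \frac{1}{\varepsilon}
\ln \frac{|H|}{\delta} \right\rceil,
\]

since the probability that some hypothesis h ∈ H with er_µ(h, t) ≥ ε
agrees with all m random examples is at most
|H|(1 − ε)^m ≤ |H| e^{−εm} ≤ δ.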

PAC model - cont'd

Properties of a pac learning algorithm

  It is probable that a useful training sample is presented.
  One can only expect that the output hypothesis is approximately correct.
  m_L depends upon δ and ε, but not on t or µ.

If there is a pac learning algorithm for a hypothesis space H, then we say
that H is pac-learnable.

Efficient pac learning algorithm

  If the running time of a pac learning algorithm L is polynomial in 1/δ
  and 1/ε, then L is said to be efficient.
  It is usually necessary to require a pac learning algorithm to be
  efficient.

PAC model - cont'd

VC dimension

The VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion
originally defined by Vapnik and Chervonenkis (1971), and was introduced
into computational learning theory by Blumer et al. (1986).

The VC dimension of H, denoted by VCdim(H), describes the expressive power
of H in a sense. Generally, the greater VCdim(H) is, the greater the
expressive power of H, and the more difficult H is to learn.
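A concrete case directly relevant to this talk (a standard fact, not from
the original slide): the class of axis-parallel rectangles in R^d has

\[
\mathrm{VCdim}(\mathrm{APR}_d) \;=\; 2d .
\]

The 2d points ±e_1, ..., ±e_d can be shattered by moving each of the 2d
bounds just inside or outside the corresponding point, while for any 2d+1
points the smallest enclosing rectangle is fixed by at most 2d
coordinate-wise extreme points, so the remaining point can never be
labeled negative while the extreme points are labeled positive.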


PAC model - cont'd

Consistency

If for any target concept t ∈ H and any training sample
s = ((x_1, b_1), (x_2, b_2), ..., (x_m, b_m)) for t, the corresponding
hypothesis L(s) ∈ H agrees with s, i.e. L(s)(x_i) = t(x_i) = b_i, then we
say that L is a consistent algorithm.

VC dimension and pac learnability

If L is a consistent learning algorithm for H and H has finite VC
dimension, then H is pac-learnable.
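In quantitative form (the standard Blumer et al. bound, not stated on the
original slide), a consistent learning algorithm L for H is pac with
sample complexity

\[
m_L(\delta, \varepsilon) \;=\; O\!\left( \frac{1}{\varepsilon} \left(
\mathrm{VCdim}(H) \log\frac{1}{\varepsilon} + \log\frac{1}{\delta}
\right) \right),
\]

which is finite exactly when VCdim(H) is finite.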

Theoretical Analysis

PAC learning of APR

Early work

While T. G. Dietterich et al. proposed three APR algorithms for
multi-instance learning, P. M. Long & L. Tan (1997) gave a theoretical
analysis of the pac learnability of APR and showed that if

  each instance in a bag is drawn from a product distribution, and
  all instances in a bag are drawn independently,

then APR is pac-learnable under the multi-instance learning framework
with sample complexity O((d²n⁶/ε¹⁰) log(nd/δ)) and time complexity
O((d⁵n¹²/ε²⁰) log²(nd/δ)).
PAC learning of APR - cont'd

A hardness result

Via an analysis of the VC dimension, P. Auer et al. (1998) gave a much
more efficient pac learning algorithm with sample complexity
O((d²n²/ε²) log(d/δ)) and time complexity O(d³n²/ε²).

More importantly, they proved that if the instances in a bag are not
independent, then learning APR under the multi-instance learning framework
is as hard as learning DNF formulas, which is an NP-complete problem.

PAC learning of APR - cont'd

A further reduction

A. Blum & A. Kalai (1998) further studied the problem of pac learning APR
from multi-instance examples, and proved that

  if H is pac-learnable from 1-sided (or 2-sided) random classification
  noise, then H is pac-learnable from multi-instance examples;

  via a reduction to the Statistical Query model (M. Kearns 1993), APR is
  pac-learnable from multi-instance examples with sample complexity
  O(d²n/ε²) and time complexity O(d³n²/ε²).
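The idea behind the reduction can be sketched in a few lines: draw one
instance from each bag and give it the bag's label. An instance inside the
target concept always comes from a positive bag, so only instances outside
the target can be mislabeled, which is exactly 1-sided noise. This is an
illustrative sketch under the independence assumption, not Blum & Kalai's
exact construction:

```python
import random

def to_one_sided_noise_sample(bags):
    """Flatten multi-instance examples into ordinary noisy examples.

    `bags` is a list of (instances, bag_label) pairs.  Picking one
    instance per bag keeps the resulting examples independent; an
    instance outside the target concept may still inherit a positive
    label when some OTHER instance in its bag is positive, giving
    1-sided random classification noise.
    """
    return [(random.choice(list(instances)), label)
            for instances, label in bags]
```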

PAC learning of APR - cont'd

Summary

                    Sample complexity          Time complexity             Constraints             Theoretical tools
P. M. Long et al.   O((d²n⁶/ε¹⁰) log(nd/δ))    O((d⁵n¹²/ε²⁰) log²(nd/δ))   product distribution,   p-concept,
                                                                           independent instances   VC dimension
P. Auer et al.      O((d²n²/ε²) log(d/δ))      O(d³n²/ε²)                  independent instances   VC dimension
A. Blum et al.      O(d²n/ε²)                  O(d³n²/ε²)                  independent instances   statistical query model,
                                                                                                   VC dimension

Fig.6. A comparison of three theoretical algorithms

Theoretical Analysis

Real-valued multi-instance learning

It is worth noting that in several applications of the multiple-instance
problem, the actual predictions desired are real-valued. For example, the
binding affinity between a molecule and a receptor is quantitative, so a
real-valued label of binding strength is preferable.

S. Ray & D. Page (2001) showed that the problem of multi-instance
regression is NP-complete; furthermore, D. R. Dooly et al. (2001) showed
that learning from real-valued multi-instance examples is as hard as
learning DNF.

At nearly the same time, R. A. Amar et al. (2001) extended the kNN,
Citation-kNN, and Diverse Density algorithms to real-valued multi-instance
learning; they also provided a flexible procedure for generating
chemically realistic artificial data sets and studied the performance of
these modified algorithms on them.

Future work

Further theoretical analysis of multi-instance learning.
Design multi-instance modifications for neural networks, decision trees,
and other popular machine learning algorithms.
Explore more issues which can be translated into multi-instance learning
problems.
Design appropriate bag generating methods.
……

Thanks