Theoretical Analysis of Multi-Instance Learning
Min-Ling Zhang    Zhi-Hua Zhou
zml@ai.nju.edu.cn
National Key Laboratory for Novel Software Technology, Nanjing University
2002.10.11
Outline
Introduction
Theoretical analysis
  PAC learning model
  PAC learnability of APR
  Real-valued multi-instance learning
Future work
Introduction
Origin
Multi-instance learning originated from the problem of "drug activity prediction", and was first formalized by T. G. Dietterich et al. in their seminal paper "Solving the multiple-instance problem with axis-parallel rectangles" (1997).
Later, in 2001, J.-D. Zucker and Y. Chevaleyre extended the concept of "multi-instance learning" to "multi-part learning", and pointed out that many previously studied problems are "multi-part" problems rather than "multi-instance" ones.
Introduction (cont'd)
Comparisons
Fig.1. The shape of a molecule changes as it rotates its bonds
Fig.2. Classical and multi-instance learning frameworks
Drug activity prediction problem
Introduction (cont'd)
Experiment data

Data set | #dim | #bags | #pos bags | #neg bags | #instances | #instances/bag (max / min / ave)
musk1    | 166  | 92    | 47        | 45        | 476        | 40 / 2 / 5.17
musk2    | 166  | 102   | 39        | 63        | 6598       | 1044 / 1 / 64.69
APR (Axis-Parallel Rectangles) algorithms
Fig.3. APR algorithms
GFS elim-count APR (standard)
GFS elim-kde APR (outside-in)
Iterated discrim APR (inside-out)
musk1: 92.4%
musk2: 89.2%
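The multi-instance flavor of these algorithms can be sketched in a few lines: learn a rectangle from the instances of the positive bags, then call a bag positive iff at least one of its instances falls inside. The following is a deliberately simplified "bounding-box" illustration of that idea, not Dietterich et al.'s actual GFS or iterated-discrim procedures:

```python
def fit_apr(pos_bags):
    """Illustrative 'bounding-box' APR: the smallest axis-parallel
    rectangle covering every instance of every positive bag.
    (A simplification; the real GFS algorithms grow or shrink the
    rectangle instance by instance.)"""
    insts = [inst for bag in pos_bags for inst in bag]
    dims = range(len(insts[0]))
    lo = [min(x[i] for x in insts) for i in dims]
    hi = [max(x[i] for x in insts) for i in dims]
    return lo, hi

def predict_bag(apr, bag):
    """The multi-instance rule: a bag is positive iff SOME instance
    of it falls inside the rectangle."""
    lo, hi = apr
    return any(all(l <= x <= h for x, l, h in zip(inst, lo, hi))
               for inst in bag)

# toy 2-dimensional bags
pos_bags = [[(1.0, 1.0), (9.0, 9.0)], [(2.0, 3.0)]]
apr = fit_apr(pos_bags)
print(predict_bag(apr, [(5.0, 5.0)]))    # an instance lies inside -> True
print(predict_bag(apr, [(20.0, 20.0)]))  # every instance outside -> False
```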
Introduction (cont'd)
Various algorithms
APR (T. G. Dietterich et al. 1997)
MULTINST (P. Auer 1997)
Diverse Density (O. Maron 1998)
Bayesian-kNN, Citation-kNN (J. Wang et al. 2000)
Relic (G. Ruffo 2000)
EM-DD (Q. Zhang & S. A. Goldman 2001)
...
Introduction (cont'd)
Comparison on benchmark data sets

Algorithms           | Musk1 (%correct) | Musk2 (%correct)
iterated-discrim APR | 92.4             | 89.2
Citation-kNN         | 92.4             | 86.3
Diverse Density      | 88.9             | 82.5
RELIC                | 83.7             | 87.3
MULTINST             | 76.7             | 84.0
BP                   | 75.0             | 67.7
C4.5                 | 68.5             | 58.8

Fig.4. A comparison of several multi-instance learning algorithms
Introduction (cont'd)
Application areas
Drug activity prediction (T. G. Dietterich et al. 1997)
Stock prediction (O. Maron 1998)
Learn a simple description of a person from a series of images (O. Maron 1998)
Natural scene classification (O. Maron & A. L. Ratan 1998)
Event prediction (G. M. Weiss & H. Hirsh 1998)
Data mining and computer security (G. Ruffo 2000)
...
Multi-instance learning has been regarded as the fourth machine learning framework, parallel to supervised learning, unsupervised learning, and reinforcement learning.
Theoretical analysis
PAC learning model
  Definition and its properties
  VC dimension
PAC learnability of APR
Real-valued multi-instance learning
Theoretical Analysis: PAC model
Computational learning theory
L. G. Valiant (1984), "A theory of the learnable"
Deductive learning
Used for constructing a mathematical model of a cognitive process.

W --(actual example)--> P --(coded example)--> M --> 0/1
Fig.5. Diagram of a framework for learning
PAC model (cont'd)
Definition of PAC learning
We say that a learning algorithm L is a PAC (probably approximately correct) learning algorithm for the hypothesis space H if, given
  a confidence parameter δ (0 < δ < 1);
  an accuracy parameter ε (0 < ε < 1);
there is a positive integer m_L = m_L(δ, ε) such that
  for any target concept t ∈ H,
  for any probability distribution µ on X,
whenever m ≥ m_L,
  µ^m { s ∈ S(m, t) : er_µ(L(s), t) < ε } > 1 − δ
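For a finite hypothesis space, the abstract m_L(δ, ε) above has a concrete, well-known form (a standard bound, not stated on the slide): a consistent algorithm is PAC whenever m ≥ (1/ε)(ln|H| + ln(1/δ)). A minimal sketch:

```python
import math

def m_L(h_size, delta, eps):
    """Sample size sufficient for a consistent learner over a FINITE
    hypothesis space H to be PAC: m >= (1/eps) * (ln|H| + ln(1/delta)).
    (Standard bound; for infinite H the analogous bound uses the
    VC dimension instead of ln|H|.)"""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# e.g. |H| = 2**20 hypotheses, 95% confidence, 10% error
print(m_L(2**20, delta=0.05, eps=0.1))  # prints 169
```

Note that, as the next slide emphasizes, the bound depends only on δ and ε (and on H), not on the target concept t or the distribution µ.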
PAC model (cont'd)
Properties of a PAC learning algorithm
It is probable that a useful training sample is presented.
One can only expect that the output hypothesis is approximately correct.
m_L depends upon δ and ε, but not on t and µ.
If there is a PAC learning algorithm for a hypothesis space H, then we say that H is PAC-learnable.
Efficient PAC learning algorithm
If the running time of a PAC learning algorithm L is polynomial in 1/δ and 1/ε, then L is said to be efficient.
It is usually necessary to require a PAC learning algorithm to be efficient.
PAC model (cont'd)
VC dimension
The VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion originally defined by Vapnik and Chervonenkis (1971), and was introduced into computational learning theory by Blumer et al. (1986).
The VC dimension of a hypothesis space H, denoted by VCdim(H), describes the 'expressive power' of H in a sense. Generally, the greater VCdim(H), the greater the 'expressive power' of H, and so the more difficult H is to learn.
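This can be made concrete for the hypothesis space of this talk: axis-parallel rectangles in the plane have VC dimension 4, which a brute-force shattering check can verify on small point sets (an illustrative sketch; the helper names are invented here):

```python
from itertools import combinations

def rect_can_pick(points, subset):
    """Can some axis-parallel rectangle contain exactly `subset` of
    `points`?  Any such rectangle must contain subset's bounding box,
    so the bounding box itself is the best candidate: it works iff no
    excluded point falls inside it."""
    if not subset:
        return True  # a tiny rectangle away from all points
    xs = [p[0] for p in subset]
    ys = [p[1] for p in subset]
    lo, hi = (min(xs), min(ys)), (max(xs), max(ys))
    others = [p for p in points if p not in subset]
    return not any(lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]
                   for p in others)

def shattered(points):
    """True iff axis-parallel rectangles realize all 2^|points| labelings."""
    subs = [list(c) for r in range(len(points) + 1)
            for c in combinations(points, r)]
    return all(rect_can_pick(points, s) for s in subs)

diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
print(shattered(diamond))             # 4 points in a diamond: shattered
print(shattered(diamond + [(0, 0)]))  # adding the center: not shattered
```

The second call fails because the bounding box of the four outer points necessarily contains the center, so VCdim of axis-parallel rectangles in the plane is exactly 4 (and 2d in d dimensions).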
PAC model (cont'd)
Consistency
If for any target concept t ∈ H and any training sample s = ((x_1, b_1), (x_2, b_2), ..., (x_m, b_m)) for t, the corresponding hypothesis L(s) ∈ H agrees with s, i.e. L(s)(x_i) = t(x_i) = b_i, then we say that L is a consistent algorithm.
VC dimension and PAC learnability
If L is a consistent learning algorithm for H, and H has finite VC dimension, then H is PAC-learnable.
Theoretical Analysis: PAC learning of APR
Early work
While T. G. Dietterich et al. proposed three APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) gave a theoretical analysis of the PAC learnability of APR and showed that if
Each instance in a bag is drawn from a product distribution.
All instances in a bag are drawn independently.
then APR is PAC-learnable under the multi-instance learning framework with sample complexity O((d²n⁶/ε¹⁰) log(nd/δ)) and time complexity O((d⁵n¹²/ε²⁰) log(nd/δ)).
PAC learning of APR (cont'd)
A hardness result
Via an analysis of the VC dimension, P. Auer et al. (1998) gave a much more efficient PAC learning algorithm than Long & Tan's, with sample complexity O((d²n²/ε²) log(d/δ)) and time complexity O(d³n²/ε²).
More importantly, they proved that if the instances in a bag are not independent, then learning APR under the multi-instance learning framework is as hard as learning DNF formulas, an NP-complete problem.
PAC learning of APR (cont'd)
A further reduction
A. Blum & A. Kalai (1998) further studied the problem of PAC learning APR from multi-instance examples, and proved that
If H is PAC-learnable from 1-sided (or 2-sided) random classification noise, then H is PAC-learnable from multi-instance examples.
Via a reduction to the "Statistical Query" model (M. Kearns 1993), APR is PAC-learnable from multi-instance examples with sample complexity O(d²n/ε²) and time complexity O(d³n²/ε²).
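The intuition behind the reduction can be sketched directly: labeling every instance with its bag's label yields ordinary examples whose noise is one-sided, because instances from negative bags are always truly negative, while an instance taken from a positive bag may be a mislabeled negative (only some instance in the bag need be truly positive). A toy illustration (the helper name is invented here):

```python
def mi_to_noisy_examples(bags):
    """Sketch of the reduction idea: tag every instance with its
    bag's label.  Negative-bag instances are correctly labeled;
    positive-bag instances carry one-sided classification noise."""
    return [(inst, label) for bag, label in bags for inst in bag]

bags = [([(0.5,), (7.0,)], 1),   # positive bag: SOME instance is positive
        ([(6.0,), (8.0,)], 0)]   # negative bag: EVERY instance is negative
print(mi_to_noisy_examples(bags))
# [((0.5,), 1), ((7.0,), 1), ((6.0,), 0), ((8.0,), 0)]
```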
PAC learning of APR (cont'd)
Summary

                  | Sample complexity        | Time complexity          | Constraints                                 | Theoretical tools
P. M. Long et al. | O((d²n⁶/ε¹⁰) log(nd/δ))  | O((d⁵n¹²/ε²⁰) log(nd/δ)) | product distribution, independent instances | p-concept, VC dimension
P. Auer et al.    | O((d²n²/ε²) log(d/δ))    | O(d³n²/ε²)               | independent instances                       | VC dimension
A. Blum et al.    | O(d²n/ε²)                | O(d³n²/ε²)               | independent instances                       | statistical query model, VC dimension

Fig.6. A comparison of three theoretical algorithms
Theoretical Analysis: Real-valued multi-instance learning
Real-valued multi-instance learning
It is worth noting that in several applications of the multiple-instance problem, the actual predictions desired are real-valued. For example, the binding affinity between a molecule and a receptor is quantitative, so a real-valued label of binding strength is preferable.
S. Ray & D. Page (2001) showed that the problem of multi-instance regression is NP-complete; furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF.
At nearly the same time, R. A. Amar et al. (2001) extended the kNN, Citation-kNN, and Diverse Density algorithms to real-valued multi-instance learning; they also provided a flexible procedure for generating chemically realistic artificial data sets and studied the performance of these modified algorithms on them.
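The kNN-style extensions above rest on a distance between bags; the Citation-kNN family uses the minimal Hausdorff distance, i.e. the distance between the closest pair of instances from the two bags. A minimal sketch of bag-level kNN regression in that spirit (illustrative only, not Amar et al.'s exact procedure):

```python
import math

def minimal_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance between two bags of feature
    vectors: the distance between their closest pair of instances
    (the variant used by the Citation-kNN family)."""
    return min(math.dist(p, q) for p in bag_a for q in bag_b)

def knn_regress(train, query_bag, k=2):
    """Predict a real-valued bag label (e.g. binding affinity) as the
    mean label of the k nearest training bags under the bag distance."""
    ranked = sorted(train,
                    key=lambda bl: minimal_hausdorff(bl[0], query_bag))
    return sum(label for _, label in ranked[:k]) / k

# toy training bags with real-valued (affinity-like) labels
train = [([(0.0, 0.0), (1.0, 1.0)], 0.1),
         ([(0.2, 0.1)], 0.3),
         ([(9.0, 9.0), (8.0, 8.5)], 0.9)]
print(knn_regress(train, [(0.1, 0.1)], k=2))  # averages the two nearest bags
```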
Future work
Further theoretical analysis of multi-instance learning.
Design multi-instance modifications for neural networks, decision trees, and other popular machine learning algorithms.
Explore more problems that can be translated into multi-instance learning problems.
Design appropriate bag generating methods.
...
Thanks