# Theoretical Analysis of Multi-Instance Learning

zml@ai.nju.edu.cn

2002.10.11

## Outline

- Introduction
- Theoretical analysis
  - PAC learning model
  - PAC learnability of APR
  - Real-valued multi-instance learning
- Future work

## Introduction

Origin

Multi-instance learning originated from the problem of drug activity prediction, and was first formalized by T. G. Dietterich et al. in their seminal paper "Solving the multiple-instance problem with axis-parallel rectangles" (1997).

Later, in 2001, J.-D. Zucker and Y. Chevaleyre extended the concept of multi-instance learning to multi-part learning, and pointed out that many previously studied problems are multi-part problems rather than multi-instance ones.

## Introduction (cont'd)

Comparisons

Fig.1. The shape of a molecule changes as it rotates its bonds

Fig.2. Classical and multi-instance learning frameworks

Drug activity prediction problem
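
The contrast in Fig.2 comes down to the bag-level labeling rule. As a minimal sketch (added here, not from the slides): under the standard multi-instance assumption, a bag is positive iff at least one of its instances satisfies the underlying concept.

```python
from typing import Callable, Sequence

Instance = Sequence[float]

def bag_label(bag: Sequence[Instance],
              concept: Callable[[Instance], bool]) -> bool:
    """Standard multi-instance assumption: a bag (e.g., a molecule, with
    one feature vector per low-energy conformation) is positive iff at
    least one instance satisfies the underlying concept."""
    return any(concept(instance) for instance in bag)

# Toy 1-d example: the (unknown) concept is the interval [0.4, 0.6].
concept = lambda x: 0.4 <= x[0] <= 0.6
print(bag_label([[0.1], [0.5]], concept))  # True: one conformation "binds"
print(bag_label([[0.1], [0.9]], concept))  # False: no conformation "binds"
```

The learner never sees instance labels, only bag labels, which is what distinguishes this framework from classical supervised learning.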

## Introduction (cont'd)

Experiment data

| Data set | #dim | #bags | #pos bags | #neg bags | #instances | max #inst/bag | min #inst/bag | ave #inst/bag |
|----------|------|-------|-----------|-----------|------------|---------------|---------------|---------------|
| musk1    | 166  | 92    | 47        | 45        | 476        | 40            | 2             | 5.17          |
| musk2    | 166  | 102   | 39        | 63        | 6598       | 1044          | 1             | 64.69         |

## APR (Axis-Parallel Rectangles) algorithms

Fig.3. APR algorithms

- GFS elim-count APR (standard)
- GFS elim-kde APR (outside-in)
- Iterated discrim APR (inside-out)

Best accuracy: musk1: 92.4%, musk2: 89.2%
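
As a minimal sketch of the hypothesis class itself (illustrative, not Dietterich et al.'s actual algorithms): an APR is one lower and one upper bound per feature, and a bag is classified positive when any of its instances falls inside the rectangle.

```python
import numpy as np

class APR:
    """A bare axis-parallel rectangle hypothesis: per-feature bounds.
    The GFS / iterated-discrim algorithms above are strategies for
    choosing these bounds."""
    def __init__(self, lower: np.ndarray, upper: np.ndarray):
        self.lower, self.upper = lower, upper

    def instance_inside(self, x: np.ndarray) -> bool:
        return bool(np.all((self.lower <= x) & (x <= self.upper)))

    def predict_bag(self, bag: np.ndarray) -> bool:
        # multi-instance rule: a bag is positive iff some instance is inside
        return any(self.instance_inside(x) for x in bag)

def fit_all_positive(pos_bags):
    """Naive baseline (an "all-positive APR"): the tightest box containing
    every instance of every positive bag. Illustrative only."""
    all_instances = np.vstack(list(pos_bags))
    return APR(all_instances.min(axis=0), all_instances.max(axis=0))
```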

## Introduction (cont'd)

Various algorithms

- APR (T. G. Dietterich et al. 1997)
- MULTINST (P. Auer 1997)
- Diverse Density (O. Maron 1998)
- Bayesian-kNN, Citation-kNN (J. Wang et al. 2000)
- Relic (G. Ruffo 2000)
- EM-DD (Q. Zhang & S. A. Goldman 2001)
- ……

## Introduction (cont'd)

Comparison on benchmark data sets

| Algorithm | Musk1 (%correct) | Musk2 (%correct) |
|-----------|------------------|------------------|
| iterated-discrim APR | 92.4 | 89.2 |
| Citation-kNN | 92.4 | 86.3 |
| Diverse Density | 88.9 | 82.5 |
| RELIC | 83.7 | 87.3 |
| MULTINST | 76.7 | 84.0 |
| BP | 75.0 | 67.7 |
| C4.5 | 68.5 | 58.8 |

Fig.4. A comparison of several multi-instance learning algorithms

## Introduction (cont'd)

Application areas

- Drug activity prediction (T. G. Dietterich et al. 1997)
- Stock prediction (O. Maron 1998)
- Learning a simple description of a person from a series of images (O. Maron 1998)
- Natural scene classification (O. Maron & A. L. Ratan 1998)
- Event prediction (G. M. Weiss & H. Hirsh 1998)
- Data mining and computer security (G. Ruffo 2000)
- ……

Multi-instance learning has been regarded as the fourth machine learning framework, parallel to supervised learning, unsupervised learning, and reinforcement learning.

## Theoretical analysis

- PAC learning model
  - Definition and its properties
  - VC dimension
- PAC learnability of APR
- Real-valued multi-instance learning

## Theoretical Analysis: PAC model

Computational learning theory

- L. G. Valiant (1984), "A theory of the learnable"
- Deductive learning
- Used for constructing a mathematical model of a cognitive process.

Fig.5. Diagram of a framework for learning (an actual example from the world W is coded by P and fed to the learning machine M, which outputs 0/1)

## PAC model (cont'd)

Definition of PAC learning

We say that a learning algorithm $L$ is a PAC (probably approximately correct) learning algorithm for the hypothesis space $H$ if, given

- a confidence parameter $\delta$ ($0 < \delta < 1$);
- an accuracy parameter $\varepsilon$ ($0 < \varepsilon < 1$);

there is a positive integer $m_L = m_L(\delta, \varepsilon)$ such that

- for any target concept $t \in H$,
- for any probability distribution $\mu$ on $X$,

whenever $m \ge m_L$, $\mu^m \{ \mathbf{s} \in S(m, t) \mid \mathrm{er}_\mu(L(\mathbf{s}), t) < \varepsilon \} > 1 - \delta$.
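
For concreteness (a standard result from computational learning theory, not stated on the slide): when $H$ is finite and $L$ always outputs a hypothesis consistent with the sample, a sufficient sample size is

```latex
% sufficient sample size for a consistent learner over a finite hypothesis space H
m_L(\delta, \varepsilon) \;=\; \left\lceil \frac{1}{\varepsilon} \, \ln\frac{|H|}{\delta} \right\rceil
```

since the probability that some $\varepsilon$-bad hypothesis stays consistent with $m$ random examples is at most $|H|(1-\varepsilon)^m \le |H| e^{-\varepsilon m}$.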

## PAC model (cont'd)

Properties of a PAC learning algorithm

- It is *probable* that a useful training sample is presented.
- One can only expect that the output hypothesis is *approximately* correct.
- $m_L$ depends upon $\delta$ and $\varepsilon$, but not on $t$ and $\mu$.

If there is a PAC learning algorithm for a hypothesis space $H$, then we say that $H$ is PAC-learnable.

Efficient PAC learning algorithm

- If the running time of a PAC learning algorithm $L$ is polynomial in $1/\delta$ and $1/\varepsilon$, then $L$ is said to be efficient.
- It is usually necessary to require a PAC learning algorithm to be efficient.

## PAC model (cont'd)

VC dimension

The VC (Vapnik-Chervonenkis) dimension of a hypothesis space $H$ is a notion originally defined by Vapnik and Chervonenkis (1971), and was introduced into computational learning theory by Blumer et al. (1986).

The VC dimension of $H$, denoted by VCdim($H$), describes the expressive power of $H$ in a sense. Generally, the greater VCdim($H$) is, the greater the expressive power of $H$, and so the more difficult $H$ is to learn.
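
As a quick illustration of the notion (a standard example, not on the slide): axis-parallel rectangles in the plane have VC dimension 4, and APRs in $\mathbb{R}^d$, the hypothesis space studied below, have VC dimension $2d$. The brute-force check below verifies that four points in a diamond configuration are shattered by rectangles:

```python
from itertools import product

# The classic "diamond" configuration: 4 points shattered by
# axis-parallel rectangles, witnessing VCdim >= 4 in the plane.
points = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def realizable(pos):
    """True if some axis-parallel rectangle contains exactly `pos`.
    The bounding box of the positive points is the canonical witness."""
    if not pos:
        return True  # an empty rectangle realizes the all-negative labeling
    xs = [x for x, _ in pos]
    ys = [y for _, y in pos]
    lo_x, hi_x, lo_y, hi_y = min(xs), max(xs), min(ys), max(ys)
    negatives = [p for p in points if p not in pos]
    return all(not (lo_x <= x <= hi_x and lo_y <= y <= hi_y)
               for x, y in negatives)

# Every one of the 2^4 labelings must be realizable.
assert all(realizable([p for p, bit in zip(points, bits) if bit])
           for bits in product([0, 1], repeat=4))
print("All 16 labelings realized: VCdim(rectangles in R^2) >= 4")
```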

## PAC model (cont'd)

Consistency

If for any target concept $t \in H$ and any training sample $\mathbf{s} = ((x_1, b_1), (x_2, b_2), \ldots, (x_m, b_m))$ for $t$, the corresponding hypothesis $L(\mathbf{s}) \in H$ agrees with $\mathbf{s}$, i.e. $L(\mathbf{s})(x_i) = t(x_i) = b_i$, then we say that $L$ is a consistent algorithm.

VC dimension and PAC learnability

If $L$ is a consistent learning algorithm for $H$ and $H$ has finite VC dimension, then $H$ is PAC-learnable.

## Theoretical Analysis: PAC learning of APR

Early work

While T. G. Dietterich et al. proposed three APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) gave a theoretical analysis of the PAC learnability of APR and showed that if

- each instance in a bag is drawn from a product distribution, and
- all instances in a bag are drawn independently,

then APR is PAC learnable under the multi-instance learning framework with sample complexity $O\!\left(\frac{d^2 n^6}{\varepsilon^{10}} \log\frac{nd}{\delta}\right)$ and time complexity $O\!\left(\frac{d^5 n^{12}}{\varepsilon^{20}} \log^2\frac{nd}{\delta}\right)$.

## PAC learning of APR (cont'd)

A hardness result

Via the analysis of VC dimension, P. Auer et al. (1998) gave a much more efficient PAC learning algorithm than Long & Tan's, with sample complexity $O\!\left(\frac{d^2 n^2}{\varepsilon^{2}} \log\frac{d}{\delta}\right)$ and time complexity $O\!\left(\frac{d^3 n^2}{\varepsilon^{2}}\right)$.

More importantly, they proved that *if the instances in a bag are not independent, then learning APR under the multi-instance learning framework is as hard as learning DNF formulas, which is an NP-complete problem*.

## PAC learning of APR (cont'd)

A further reduction

A. Blum & A. Kalai (1998) further studied the problem of PAC learning APR from multi-instance examples, and proved that

- if $H$ is PAC learnable from 1-sided (or 2-sided) random classification noise, then $H$ is PAC learnable from multi-instance examples;
- via a reduction to the Statistical Query model (M. Kearns 1993), APR is PAC learnable from multi-instance examples with sample complexity $O\!\left(\frac{d^2 n}{\varepsilon^{2}}\right)$ and time complexity $O\!\left(\frac{d^3 n^2}{\varepsilon^{2}}\right)$.

## PAC learning of APR (cont'd)

Summary

| | Sample complexity | Time complexity | Constraints | Theoretical tools |
|---|---|---|---|---|
| P. M. Long et al. | $O\!\left(\frac{d^2 n^6}{\varepsilon^{10}} \log\frac{nd}{\delta}\right)$ | $O\!\left(\frac{d^5 n^{12}}{\varepsilon^{20}} \log^2\frac{nd}{\delta}\right)$ | product distribution, independent instances | p-concept, VC dimension |
| P. Auer et al. | $O\!\left(\frac{d^2 n^2}{\varepsilon^{2}} \log\frac{d}{\delta}\right)$ | $O\!\left(\frac{d^3 n^2}{\varepsilon^{2}}\right)$ | independent instances | VC dimension |
| A. Blum et al. | $O\!\left(\frac{d^2 n}{\varepsilon^{2}}\right)$ | $O\!\left(\frac{d^3 n^2}{\varepsilon^{2}}\right)$ | independent instances | statistical query model, VC dimension |

Fig.6. A comparison of three theoretical algorithms
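
To get a feel for how differently these bounds scale, here is a quick numeric comparison (added for illustration; constants are ignored, and the musk1-like parameter values are assumptions):

```python
from math import log

# d = feature dimension, n = max instances per bag; values roughly musk1-sized.
d, n, eps, delta = 166, 40, 0.1, 0.05

bounds = {
    "Long & Tan":   d**2 * n**6 / eps**10 * log(n * d / delta),
    "Auer et al.":  d**2 * n**2 / eps**2 * log(d / delta),
    "Blum & Kalai": d**2 * n / eps**2,
}
for name, m in bounds.items():
    print(f"{name:>12}: ~{m:.2e} examples")
```

Each successive result shaves polynomial factors off the sample complexity, which is why the later reductions matter in practice.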

## Theoretical Analysis: Real-valued multi-instance learning

It is worthwhile to note that in several applications of the multiple-instance problem, the actual predictions desired are real-valued. For example, the binding affinity between a molecule and a receptor is quantitative, so a real-valued label of binding strength is preferable.

S. Ray & D. Page (2001) showed that the problem of multi-instance regression is NP-complete; furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF.

At nearly the same time, R. A. Amar et al. (2001) extended the kNN, Citation-kNN, and Diverse Density algorithms to real-valued multi-instance learning; they also provided a flexible procedure for generating chemically realistic artificial data sets and studied the performance of these modified algorithms on them.
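
As a minimal sketch of the kNN extension (an illustrative assumption; Amar et al.'s actual algorithms differ in their details): bags can be compared with the minimal Hausdorff distance used by Citation-kNN, and a real-valued bag label predicted by averaging the labels of the nearest bags.

```python
import numpy as np

def min_hausdorff(bag_a: np.ndarray, bag_b: np.ndarray) -> float:
    """Minimal Hausdorff distance between two bags (2-d arrays of
    instances): the distance between their closest pair of instances."""
    diffs = bag_a[:, None, :] - bag_b[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(-1)).min())

def knn_predict(query: np.ndarray, bags, labels, k: int = 3) -> float:
    """Predict a real-valued label for `query` as the mean label of the
    k training bags nearest under the minimal Hausdorff distance."""
    dists = [min_hausdorff(query, bag) for bag in bags]
    nearest = np.argsort(dists)[:k]
    return float(np.mean([labels[i] for i in nearest]))
```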

## Future work

- Further theoretical analysis of multi-instance learning.
- Design multi-instance modifications for neural networks, decision trees, and other popular machine learning algorithms.
- Explore more issues which can be translated into multi-instance learning problems.
- Design appropriate bag generating methods.
- ……

Thanks