# Conjunctive Formulation of the Random Set Framework for Multiple Instance Learning: Application to Remote Sensing

Jeremy Bolton, Paul

CSI Laboratory, University of Florida

CSI Laboratory, Jeremy Bolton, Paul, 2010

Highlights

Conjunctive forms of Random Sets for Multiple Instance Learning:

Random sets can be used to solve the MIL problem when multiple concepts are present.

Previously developed formulations assume a disjunctive relationship between the concepts learned.

The new formulation provides for a conjunctive relationship between concepts; its utility is demonstrated on a Ground Penetrating Radar (GPR) data set.


Outline

I. Multiple Instance Learning
    I. MI Problem
    II. RSF-MIL
    III. Multiple Target Concepts
II. Experimental Results
    I. GPR Experiments
III. Future Work

Multiple Instance Learning


Standard Learning vs. Multiple Instance Learning

Standard supervised learning

Optimize some model (or learn a target concept) given training samples and corresponding labels:

$X = \{x_1, \dots, x_n\}, \quad Y = \{y_1, \dots, y_n\}$

MIL

Learn a target concept given multiple sets of samples and corresponding labels for the sets:

$X_i = \{x_{i1}, \dots, x_{in_i}\}, \quad Y_i = 1, \quad \{y_{i1}, \dots, y_{in_i}\} = \{?, \dots, ?\}$

Interpretation: learning with uncertain labels / a noisy teacher

Multiple Instance Learning (MIL)

Given:

Set of $I$ bags, each labeled + or -: $B = \{B_1, \dots, B_i, B_{i+1}, \dots, B_I\}$

The $i$th bag is a set of $J_i$ samples in some feature space: $B_i = \{x_{i1}, \dots, x_{iJ_i}\}$

Interpretation of labels:

$B_i^{+}: \ \exists j, \ \mathrm{label}(x_{ij}) = 1$

$B_i^{-}: \ \forall j, \ \mathrm{label}(x_{ij}) = 0$

Goal: learn concept

What characteristic is common to the positive bags that is not observed in the negative bags?
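The label interpretation above can be sketched in a few lines of Python. This is only an illustration of the standard MIL assumption (a bag is positive when at least one of its instances is positive); the function name `bag_label` and the example bags are hypothetical and do not come from the slides.

```python
def bag_label(instance_labels):
    """Return 1 if any instance in the bag has label 1, otherwise 0."""
    return int(any(label == 1 for label in instance_labels))


# Hypothetical bags: instance labels are normally hidden during training.
positive_bag = [0, 0, 1, 0]   # one positive instance is enough
negative_bag = [0, 0, 0]      # every instance must be negative

print(bag_label(positive_bag), bag_label(negative_bag))  # prints: 1 0
```

In practice only the bag labels are observed; the sketch simply shows how bag labels would follow from instance labels if those were known.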

Multiple Instance Learning

[Figure: five example instances $x_1, \dots, x_5$ with instance labels 1, 1, 0, 0, 1, and three example bags drawn from $\{x_1, x_2, x_3, x_4\}$ with bag labels 1, 1, 0]


MIL Application: Example GPR

Collaboration: Frigui, Collins, Torrione

Construction of bags

Collect 15 EHD feature vectors from the 15 depth bins: $x_1, x_2, x_3, x_4, \dots, x_{15}$

Mine images = + bags

FA (false alarm) images = - bags

[Figure: EHD feature vector]


Standard vs. MI Learning: GPR Example

Standard Learning

Each training sample (feature vector) must have a label.

There are many feature vectors per image, and multiple images.

Labeling is difficult given GPR echoes, ground-truthing errors, etc.

The label of each vector may not be known.

[Figure: EHD feature vectors $x_1, x_2, x_3, x_4, \dots, x_n$, each with its own label $y_1, y_2, y_3, y_4, \dots, y_n$]

Standard vs. MI Learning: GPR Example

Multiple Instance Learning

Each training bag must have a label.

There is no need to label all feature vectors; just identify the images (bags) where targets are present.

Implicitly accounts for class label uncertainty.

[Figure: EHD feature vectors $x_1, x_2, x_3, x_4, \dots, x_{15}$ forming one bag with a single label $Y$]

Random Set Framework for Multiple
Instance Learning


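Random Set Brief

Random variable: a measurable mapping from a probability space to the reals, $(\Omega, \mathcal{B}(\Omega), P) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$

Random set: a measurable mapping from a probability space to a space of sets, $(\Omega, \mathcal{B}(\Omega), P) \to (\mathcal{F}, \mathcal{B}(\mathcal{F}))$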


How can we use Random Sets for MIL?

Random set for MIL: bags are sets (multi-sets)

The idea of finding the commonality of positive bags is inherent in the random set formulation.

Sets have either an empty or a non-empty intersection relationship.

Find commonality using the intersection operator.

The governing functional of a random set is based on the intersection operator.

Capacity functional $T$, for a bag $X = \{x_1, \dots, x_n\}$:

$T(X) = 1 - \prod_{x \in X} \big(1 - T(\{x\})\big)$

It is NOT the case that EACH element is NOT the target concept.

A.K.A. noisy-OR gate (Pearl 1988)
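A minimal sketch of the noisy-OR combination described above, assuming the per-instance capacities $T(\{x\})$ are already available as numbers in $[0, 1]$. The function name `noisy_or` and the example values are illustrative assumptions, not part of the original presentation.

```python
from math import prod


def noisy_or(point_capacities):
    """Bag capacity: 1 - prod(1 - T({x})) over the instances of the bag."""
    return 1.0 - prod(1.0 - t for t in point_capacities)


# Hypothetical per-instance capacities for one bag.
print(noisy_or([0.5, 0.5]))  # prints: 0.75
```

A single instance with capacity near 1 drives the bag capacity near 1, which is the "at least one instance is the target" reading of the formula.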


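Random Set Functionals

Capacity functionals for intersection calculation:

$T_\Gamma(X) = P(\Gamma \cap X \neq \emptyset)$

Use the germ and grain model to model the random set, with multiple ($J$) concepts:

$\Gamma = \bigcup_{j=1}^{J} \big(\{\theta_j\} \oplus \Xi_j\big)$

Random set model parameters: $\{\theta_j, \Sigma_j\}$ (germ, grain)

Grains are governed by random radii with assumed cumulative:

$T_{\Gamma_j}(\{x\}) = P(R_j \ge r_j) = 1 - P(R_j < r_j) = \exp\!\big(-\tfrac{1}{2}\, r_j^{T} \Sigma_j^{-1} r_j\big), \quad r_j = |x - \theta_j|$

Calculate the probability of intersection given $X$ and the germ and grain pairs:

$T_{\Gamma_j}(X) = 1 - \prod_{x \in X} \big(1 - T_{\Gamma_j}(\{x\})\big)$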


RSF-MIL: Germ and Grain Model

[Figure: samples (x) and germs (T) in feature space. Positive bags = blue; negative bags = orange; distinct shapes = distinct bags.]

Multiple Instance Learning with
Multiple Concepts


Multiple Concepts: Disjunction or Conjunction?

Disjunction

When you have multiple types of concepts

When each instance can indicate the presence of a target

Conjunction

When you have a target type that is composed of multiple (necessary) concepts

When each instance can indicate a concept, but not necessarily the composite target type


Conjunctive RSF-MIL

Previously developed disjunctive RSF-MIL (RSF-MIL-d): noisy-OR combination across concepts and samples

$T_\Gamma(X) = 1 - \prod_{j} \prod_{x \in X} \big(1 - T_{\Gamma_j}(\{x\})\big)$

Conjunctive RSF-MIL (RSF-MIL-c): standard noisy-OR for one concept $j$, then noisy-AND combination across concepts

$T_\Gamma(X) = \prod_{j} \Big(1 - \prod_{x \in X} \big(1 - T_{\Gamma_j}(\{x\})\big)\Big)$
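The difference between the two combinations can be illustrated with a small sketch. It assumes the per-concept point capacities are given as a nested list, `point_caps[j][k]` for sample `k` under concept `j`; that layout, the function names, and the example numbers are hypothetical and chosen only for illustration.

```python
from math import prod


def capacity_disjunctive(point_caps):
    """RSF-MIL-d style: one noisy-OR across all concepts and samples."""
    return 1.0 - prod(1.0 - t for concept in point_caps for t in concept)


def capacity_conjunctive(point_caps):
    """RSF-MIL-c style: noisy-OR per concept, then a product (noisy-AND) across concepts."""
    return prod(1.0 - prod(1.0 - t for t in concept) for concept in point_caps)


# Hypothetical bag of two samples scored against two concepts.
only_first_concept = [[0.9, 0.0], [0.0, 0.0]]   # concept 2 never observed
both_concepts = [[0.9, 0.0], [0.0, 0.9]]        # each concept observed once

for caps in (only_first_concept, both_concepts):
    print(round(capacity_disjunctive(caps), 2), round(capacity_conjunctive(caps), 2))
```

With only one concept present, the disjunctive form still scores the bag highly while the conjunctive form returns zero; with both concepts present, both forms score the bag highly.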


Synthetic Data Experiments

The Extreme Conjunct data set requires that a target bag exhibits two distinct concepts, rather than one or none.

[Table: AUC (AUC when initialized near solution)]

Application to Remote Sensing


Disjunctive Target Concepts

[Diagram: Target Concept Type 1, Target Concept Type 2, ..., Target Concept Type n each feed a noisy-OR; the noisy-OR outputs are combined by an OR to answer "Target concept present?"]

Using large overlapping bins (gross extraction), the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists.


What if we want features with finer granularity?

Fine Extraction

More detail about the image and more shape information, but we may lose the disjunctive nature between (multiple) instances.

[Diagram: Constituent Concept 1 (top of hyperbola) and Constituent Concept 2 (wings of hyperbola) each feed a noisy-OR; the noisy-OR outputs are combined by an AND to answer "Target concept present?"]

Our features have finer granularity; therefore our concepts may be constituents of a target, rather than encapsulating the target concept.


GPR Experiments

Extensive GPR data set

~800 targets

~5,000 non-targets

Experimental Design

Run RSF-MIL-d (disjunctive) and RSF-MIL-c (conjunctive)

Compare both feature extraction methods

Gross extraction: large enough to encompass the target concept

Fine extraction: non-overlapping bins

Hypothesis

RSF-MIL-d will perform well when using gross extraction, whereas RSF-MIL-c will perform well when using fine extraction


Experimental Results

Highlights

RSF-MIL-d using gross extraction performed best.

RSF-MIL-c performed better than RSF-MIL-d when using fine extraction.

Other influencing factors: the optimization methods for RSF-MIL-d and RSF-MIL-c are not the same.

[Figure panels: Gross Extraction; Fine Extraction]


Future Work

Implement a general form that can learn a disjunctive or conjunctive relationship from the data

Implement a general form that can learn the number of concepts

Incorporate spatial information

Develop an improved optimization scheme for RSF-MIL-c

Backup Slides


MIL Example (AHI Imagery)

Robust learning tool

MIL tools can learn a target signature with limited or incomplete ground truth.

Which spectral signature(s) should we use to train a target model or classifier?

1. Spectral mixing

2. Background signal

3. Ground truth not exact


MI-RVM

Addition of set observations and inference using noisy-OR to an RVM model:

$P(y = 1 \mid X) = 1 - \prod_{j=1}^{K} \big(1 - \sigma(w^{T} x_j)\big), \quad \sigma(z) = \frac{1}{1 + \exp(-z)}$

Prior on the weight $w$:

$p(w) = N(w \mid 0, A^{-1})$
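As a rough illustration of the MI-RVM bag posterior, the sketch below evaluates a noisy-OR over per-instance logistic outputs $\sigma(w^{T} x_j)$. It assumes plain linear scores with no kernel expansion; the names `sigmoid` and `bag_posterior` and the example data are hypothetical.

```python
from math import exp, prod


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + exp(-z))


def bag_posterior(w, bag):
    """P(y = 1 | X): noisy-OR over the per-instance values sigmoid(w . x_j)."""
    scores = (sum(wi * xi for wi, xi in zip(w, x)) for x in bag)
    return 1.0 - prod(1.0 - sigmoid(s) for s in scores)


# With zero weights every instance contributes 0.5, so two instances give 0.75.
print(bag_posterior([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]]))  # prints: 0.75
```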

SVM review

Classifier structure

$y(x) = w^{T} \phi(x) + b$

Optimization

$\min_{w, b} \ \frac{1}{2} \|w\|^{2} + C \sum_i \xi_i \quad \text{s.t.} \ \forall i: \ t_i \big(w^{T} \phi(x_i) + b\big) \ge 1 - \xi_i, \ \xi_i \ge 0$

MI-SVM Discussion

The RVM was altered to fit the MIL problem by changing the form of the target variable's posterior to model a noisy-OR gate.

The SVM can be altered to fit the MIL problem by changing how the margin is calculated.

Boost the margin between the bag (rather than samples) and the decision surface.

Look for the MI-separating linear discriminant: there is at least one sample from each bag in the half space.


mi-SVM

Enforce the MI scenario using extra constraints:

$\min_{\{t_i\}} \min_{w, b} \ \frac{1}{2} \|w\|^{2} + C \sum_i \xi_i \quad \text{s.t.} \ \forall i: \ t_i \big(w^{T} \phi(x_i) + b\big) \ge 1 - \xi_i, \ \xi_i \ge 0, \ t_i \in \{-1, 1\}$

$\sum_{i \in I} \frac{t_i + 1}{2} \ge 1, \ \forall I: T_I = 1$

$t_i = -1, \ \forall I: T_I = -1$

Mixed integer program: must find the optimal hyperplane and the optimal labeling set.

At least one sample in each positive bag must have a label of 1.

All samples in each negative bag must have a label of -1.
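The two labeling constraints can be checked directly for a candidate instance labeling. The sketch below is only a feasibility check for one bag, not the mixed integer optimization itself; the function name `satisfies_mi_constraints` is a hypothetical helper.

```python
def satisfies_mi_constraints(instance_labels, bag_label):
    """Check the MI labeling constraints for one bag.

    instance_labels: candidate labels t_i in {-1, +1} for the bag's samples.
    bag_label: the observed bag label T_I in {-1, +1}.
    """
    if bag_label == 1:
        # sum over the bag of (t_i + 1) / 2 must be at least 1
        return sum((t + 1) // 2 for t in instance_labels) >= 1
    # negative bag: every sample must be labeled -1
    return all(t == -1 for t in instance_labels)


print(satisfies_mi_constraints([-1, 1, -1], 1))   # prints: True
print(satisfies_mi_constraints([-1, -1], 1))      # prints: False
```

A solver would search over labelings that pass this check while also fitting the hyperplane, which is what makes the problem a mixed integer program.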


Current Applications

I. Multiple Instance Learning
    I. MI Problem
    II. MI Applications
II. Multiple Instance Learning: Kernel Machines
    I. MI-RVM
    II. MI-SVM
III. Current Applications
    I. GPR imagery
    II. HSI imagery


HSI: Target Spectra Learning

Given labeled areas of interest: learn the target signature

Given test areas of interest: classify the set of samples


Overview of MI-RVM Optimization

Two-step optimization

1. Estimate the optimal w, given the posterior of w

There is no closed-form solution for the parameters of the posterior, so a gradient update method is used.

Iterate until convergence. Then proceed to step 2.

2. Update the parameter on the prior of w

The distribution on the target variable has no specific parameters.

Until system convergence, continue at step 1.


1) Optimization of w

Optimize the posterior (Bayes' rule) of $w$:

$\hat{w}_{MAP} = \arg\max_{w} \ \log p(X \mid w) + \log p(w)$

Update the weights using the Newton-Raphson method:

$w^{t+1} = w^{t} - H^{-1} g$
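The Newton-Raphson update can be sketched generically with NumPy. The objective below is a toy concave quadratic standing in for the log-posterior, not the MI-RVM likelihood; the function name `newton_raphson`, the callbacks `grad` and `hess`, and the matrices `A` and `m` are all assumptions made for illustration.

```python
import numpy as np


def newton_raphson(grad, hess, w0, n_iter=20, tol=1e-10):
    """Iterate w <- w - H^{-1} g until the step norm falls below tol."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        # Solve H * step = g instead of forming the inverse explicitly.
        step = np.linalg.solve(hess(w), grad(w))
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    return w


# Toy objective: -0.5 * (w - m)^T A (w - m), maximized at w = m.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
m = np.array([1.0, -2.0])
grad = lambda w: -A @ (w - m)   # gradient g of the toy objective
hess = lambda w: -A             # Hessian H of the toy objective

w_map = newton_raphson(grad, hess, w0=[0.0, 0.0])
print(w_map)
```

For a quadratic objective the update reaches the maximizer in a single step; a real log-posterior would generally need several iterations.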


2) Optimization of Prior

Optimization of the covariance of the prior:

$\hat{A} = \arg\max_{A} \ p(X \mid A) = \arg\max_{A} \int p(X \mid w) \, p(w \mid A) \, dw$

Making a large number of assumptions, the diagonal elements of $A$ can be estimated:

$a_i^{new} = \frac{1}{w_i^{2} + H^{-1}_{ii}}$

Random Sets: Multiple Instance Learning

Random set framework for multiple instance learning

Bags are sets

The idea of finding the commonality of positive bags is inherent in the random set formulation.

Find commonality using the intersection operator.

The governing functional of a random set is based on the intersection operator:

$T_\Gamma(K) = P(\Gamma \cap K \neq \emptyset)$

MI issues

MIL approaches

Some approaches are biased to believe that only one sample in each bag caused the target concept.

Some approaches can only label bags.

It is not clear whether anything is gained over supervised approaches.


RSF-MIL

[Figure (MIL-like): samples (x) and germs (T) in feature space. Positive bags = blue; negative bags = orange; distinct shapes = distinct bags.]


Side Note: Bayesian Networks

Noisy-OR assumption

Bayesian network representation of noisy-OR

Polytree: singly connected DAG


Side Note

The full Bayesian network may be intractable.

Occurrences of causal factors are rare (sparse co-occurrence).

So assume a polytree.

So assume the result has a Boolean relationship with the causal factors.

Absorb $I$, $X$ and $A$ into one node, governed by the randomness of $I$.

These assumptions greatly simplify the inference calculation.

Calculate $Z$ based on probabilities rather than constructing a distribution using $X$:

$P(Z = 1 \mid \{X_1, X_2, X_3, X_4\}) = 1 - \prod_{j} \big(1 - P(Z = 1 \mid X_j)\big)$

Diverse Density (DD)

Probabilistic approach

Goal:

Standard statistical approaches identify areas in a feature space with a high density of target samples and a low density of non-target samples.

DD: identify areas in a feature space with a high "density" of samples from EACH of the positive bags ("diverse"), and a low density of samples from negative bags.

Identify attributes or characteristics similar to positive bags and dissimilar to negative bags.

Assume $t$ is a target characterization.

Goal:

$\arg\max_{t} \ P(B_1^{+}, \dots, B_n^{+}, B_1^{-}, \dots, B_m^{-} \mid t)$

Assuming the bags are conditionally independent:

$\arg\max_{t} \ \prod_i P(B_i^{+} \mid t) \prod_j P(B_j^{-} \mid t)$

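Diverse Density

Calculation (noisy-OR model), for a bag $B_i = \{x_{i1}, \dots, x_{iJ_i}\}$:

$P(t \mid B_i^{+}) = 1 - \prod_j \big(1 - P(t \mid B_{ij}^{+})\big)$

It is NOT the case that EACH element is NOT the target concept.

$P(t \mid B_i^{-}) = \prod_j \big(1 - P(t \mid B_{ij}^{-})\big)$

$P(t \mid B_{ij}) = \exp\big(-\|B_{ij} - t\|^{2}\big) = \exp\big(-\|x_{ij} - t\|^{2}\big)$

Optimization

$\arg\max_{t} \ \prod_i P(B_i^{+} \mid t) \prod_j P(B_j^{-} \mid t)$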
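A small sketch of the diverse density score for a candidate target point $t$, using the noisy-OR model with the unscaled Gaussian-like instance term $\exp(-\|x - t\|^{2})$. The function names and the toy bags are hypothetical, and the score multiplies the per-bag noisy-OR terms directly, which matches the arg max objective only under the usual uniform-prior assumption.

```python
from math import exp, prod


def instance_prob(x, t):
    """P(t | x) = exp(-||x - t||^2) for one instance x and candidate target t."""
    return exp(-sum((xi - ti) ** 2 for xi, ti in zip(x, t)))


def diverse_density(t, positive_bags, negative_bags):
    """Noisy-OR term for each positive bag times an all-miss term for each negative bag."""
    pos = prod(1.0 - prod(1.0 - instance_prob(x, t) for x in bag) for bag in positive_bags)
    neg = prod(prod(1.0 - instance_prob(x, t) for x in bag) for bag in negative_bags)
    return pos * neg


# Toy data: both positive bags have a sample near (0, 0); the negative bag sits near (5, 5).
positive_bags = [[(0.0, 0.0), (5.0, 5.0)], [(0.1, 0.0), (-4.0, 3.0)]]
negative_bags = [[(5.0, 5.1), (9.0, 9.0)]]

print(diverse_density((0.0, 0.0), positive_bags, negative_bags))
print(diverse_density((5.0, 5.0), positive_bags, negative_bags))
```

The candidate near the origin is close to an instance of every positive bag and far from the negative bag, so it scores much higher than the candidate at (5, 5).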

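Random Set Functionals

Capacity and avoidance functionals:

$T_\Gamma(K) = P(\Gamma \cap K \neq \emptyset)$

$Q_\Gamma(K) = P(\Gamma \cap K = \emptyset)$

$T_\Gamma(K) = 1 - Q_\Gamma(K)$

Given a germ and grain model:

$\Gamma = \bigcup_{i} \bigcup_{j=1}^{n_i} \big(\{\theta_{ij}\} \oplus \Xi_{ij}\big)$

$P(\{x\} \mid \Gamma_{ij}) = T_{\Gamma_{ij}}(\{x\}) = P(R_{ij} \ge r_{ij}) = 1 - P(R_{ij} < r_{ij}) = \exp\!\big(-\tfrac{1}{2}\, r_{ij}^{T} \Sigma_{ij}^{-1} r_{ij}\big), \quad r_{ij} = |x - \theta_{ij}|$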


When disjunction makes sense

Using large overlapping bins, the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists.

[Diagram: instance-level outputs combined by an OR to decide "Target concept present"]


Theoretical and Developmental Progress

Previous optimization: did not necessarily promote diverse density

Current optimization: better for context learning and MIL

Objective in both cases: $\arg\max$ over the model parameters of an expression in the capacity functional $T(B_i^{+})$ over positive bags $i$ and the avoidance functional $Q(B_j^{-})$ over negative bags $j$

Previously no feature relevance or selection (hypersphere)

Improvement: included learned weights on each feature dimension

Previous TO DO list

Improve existing code

Develop joint optimization for context learning and MIL

Apply MIL approaches (broad scale)

Learn similarities between feature sets of mines

Aid in training existing algorithms: find "best" EHD features for training / testing

Construct set-based classifiers?


How do we impose the MI scenario? Diverse Density (Maron et al.)

Calculation (noisy-OR model), for a bag $B_i = \{x_{i1}, \dots, x_{iJ_i}\}$:

$P(t \mid B_i^{+}) = 1 - \prod_j \big(1 - P(t \mid B_{ij}^{+})\big)$

It is NOT the case that EACH element is NOT the target concept.

$P(t \mid B_i^{-}) = \prod_j \big(1 - P(t \mid B_{ij}^{-})\big)$

$P(t \mid B_{ij}) = \exp\big(-\|B_{ij} - t\|^{2}\big) = \exp\big(-\|x_{ij} - t\|^{2}\big)$

Inherent in the random set formulation

Optimization

$\arg\max_{t} \ \prod_i P(t \mid B_i^{+}) \prod_j P(t \mid B_j^{-})$

Combination of exhaustive search and gradient ascent


How can we use Random Sets for MIL?

Random set for MIL: bags are sets

The idea of finding the commonality of positive bags is inherent in the random set formulation.

Sets have either an empty or a non-empty intersection relationship.

Find commonality using the intersection operator.

The governing functional of a random set is based on the intersection operator.

Example:

Bags with target

{l, a, e, i, o, p, u, f}

{f, b, a, e, i, z, o, u}

{a, b, c, i, o, u, e, p, f}

{a, f, t, e, i, u, o, d, v}

Bags without target

{s, r, n, m, p, l}

{z, s, w, t, g, n, c}

{f, p, k, r}

{q, x, z, c, v}

{p, l, f}

Intersection of the bags with target: {a, e, i, o, u, f}

Union of the bags without target: {f, s, r, n, m, p, l, z, w, t, g, c, v, q, x, k}

Target concept = intersection \ union = {a, e, i, o, u}
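The letter example can be reproduced with Python's built-in set operations. The bags below are copied from the example; the variable names are illustrative only.

```python
# Bags with the target (each string lists the elements of one bag).
positive_bags = [set("laeiopuf"), set("fbaeizou"), set("abciouepf"), set("afteiuodv")]
# Bags without the target.
negative_bags = [set("srnmpl"), set("zswtgnc"), set("fpkr"), set("qxzcv"), set("plf")]

common = set.intersection(*positive_bags)        # elements shared by every positive bag
seen_in_negatives = set.union(*negative_bags)    # elements seen in any negative bag
target_concept = common - seen_in_negatives      # set difference: intersection \ union

print(sorted(common))          # prints: ['a', 'e', 'f', 'i', 'o', 'u']
print(sorted(target_concept))  # prints: ['a', 'e', 'i', 'o', 'u']
```

The element f is common to all positive bags but also occurs in negative bags, so the set difference removes it from the target concept.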