Conjunctive Formulation of the Random Set Framework for Multiple Instance Learning: Application to Remote Sensing
Jeremy Bolton, Paul Gader
CSI Laboratory, University of Florida
2010
Highlights
• Conjunctive forms of random sets for multiple instance learning (MIL):
  – Random sets can be used to solve the MIL problem when multiple concepts are present
  – Previously developed formulations assume a disjunctive relationship between the concepts learned
  – The new formulation provides for a conjunctive relationship between concepts; its utility is demonstrated on a ground-penetrating radar (GPR) data set
Outline
I. Multiple Instance Learning
  I. MI Problem
  II. RSF-MIL
  III. Multiple Target Concepts
II. Experimental Results
  I. GPR Experiments
III. Future Work
Multiple Instance Learning
Standard Learning vs. Multiple Instance Learning
• Standard supervised learning
  – Optimize some model (or learn a target concept) given training samples and corresponding labels:
    $X = \{x_1, \dots, x_n\}, \quad Y = \{y_1, \dots, y_n\}$
• MIL
  – Learn a target concept given multiple sets of samples and corresponding labels for the sets:
    $X_i = \{x_{i1}, \dots, x_{in_i}\}, \quad Y_i = 1, \quad \{y_{i1}, \dots, y_{in_i}\} = \{?, \dots, ?\}$
  – Interpretation: learning with uncertain labels / a noisy teacher
Multiple Instance Learning (MIL)
• Given:
  – A set of $I$ bags: $B = \{B_1, \dots, B_i, \dots, B_I\}$
  – Each bag is labeled + or −
  – The $i$th bag is a set of $J_i$ samples in some feature space: $B_i = \{x_{i1}, \dots, x_{iJ_i}\}$
  – Interpretation of labels:
    $B_i^{+} : \exists j,\ \mathrm{label}(x_{ij}) = 1$
    $B_i^{-} : \forall j,\ \mathrm{label}(x_{ij}) = 0$
• Goal: learn the concept
  – What characteristic is common to the positive bags that is not observed in the negative bags?
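The label interpretation above can be sketched in a few lines of Python. This is only an illustration: the function name `bag_label` and the example labels are hypothetical, and in a real MIL problem the instance labels are hidden from the learner.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Return 1 if at least one instance in the bag is positive, else 0."""
    return int(any(label == 1 for label in instance_labels))


# Hypothetical instance labels, used here only to show the bag-level rule.
positive_bag = [0, 0, 1, 0]   # one positive instance is enough
negative_bag = [0, 0, 0]      # every instance must be negative

print(bag_label(positive_bag))
print(bag_label(negative_bag))
```

The learner only ever sees the value returned by `bag_label`, which is why MIL can be read as learning with uncertain instance labels.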
Multiple Instance Learning
Traditional Classification (each sample has its own label)
  $x_1$: label = 1
  $x_2$: label = 1
  $x_3$: label = 0
  $x_4$: label = 0
  $x_5$: label = 1
Multiple Instance Learning (each bag has one label)
  $\{x_1, x_2, x_3, x_4\}$: label = 1
  $\{x_1, x_2, x_3, x_4\}$: label = 1
  $\{x_1, x_2, x_3, x_4\}$: label = 0
MIL Application: Example GPR
• Collaboration: Frigui, Collins, Torrione
• Construction of bags
  – Collect 15 EHD feature vectors from the 15 depth bins: $\{x_1, x_2, x_3, x_4, \dots, x_{15}\}$
  – Mine images = + bags
  – FA images = − bags
[Figure: EHD feature vector]
Standard vs. MI Learning: GPR Example
• Standard learning
  – Each training sample (feature vector) must have a label: $(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4), \dots, (x_n, y_n)$
• Arduous task
  – Many feature vectors per image, and multiple images
  – Difficult to label given GPR echoes, ground-truthing errors, etc.
  – The label of each vector may not be known
[Figure: EHD feature vector]
Standard vs. MI Learning: GPR Example
• Multiple instance learning
  – Each training bag must have a label: $\{x_1, x_2, x_3, x_4, \dots, x_{15}\} \rightarrow Y$
  – No need to label all feature vectors; just identify the images (bags) where targets are present
  – Implicitly accounts for class-label uncertainty
[Figure: EHD feature vector]
Random Set Framework for Multiple Instance Learning
Random Set Brief
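• Random set
  – A random variable is a measurable map $(\Omega, \mathcal{B}(\Omega), P) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$
  – A random set is a measurable map $(\Omega, \mathcal{B}(\Omega), P) \to (\mathcal{F}, \mathcal{B}(\mathcal{F}))$, where $\mathcal{F}$ is a family of sets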
How can we use Random Sets for MIL?
• Random set for MIL: bags are sets (multi-sets)
  – The idea of finding the commonality of positive bags is inherent in the random set formulation
    • Sets have either an empty or a non-empty intersection relationship
    • Find commonality using the intersection operator
    • The governing functional of a random set is based on the intersection operator
  – Capacity functional $T$: for a bag $X = \{x_1, \dots, x_n\}$,
    $T(X) = 1 - \prod_{x \in X} \left(1 - T(\{x\})\right)$
    It is NOT the case that EACH element is NOT the target concept
    A.k.a. noisy-OR gate (Pearl 1988)
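A minimal Python sketch of the noisy-OR combination described above. The function name `noisy_or` and the example probabilities are illustrative assumptions; each input stands for $T(\{x\})$, the probability that a single element hits the target concept.

```python
from typing import Iterable


def noisy_or(instance_probs: Iterable[float]) -> float:
    """Noisy-OR: 1 minus the probability that every element misses the concept."""
    prob_all_miss = 1.0
    for p in instance_probs:
        prob_all_miss *= 1.0 - p
    return 1.0 - prob_all_miss


# Two elements, each with a 50% chance of being the target concept.
print(noisy_or([0.5, 0.5]))
```

A single element with probability 1 drives the bag value to 1 regardless of the others, which matches the "at least one" reading of a positive bag.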
Random Set Functionals
• Capacity functionals for intersection calculation
  $T_\Gamma(X) = P(\Gamma \cap X \neq \emptyset)$
• Use a germ and grain model to model the random set
  – Multiple ($J$) concepts: $\Gamma = \bigcup_{j=1}^{J} \left(\{\mu_j\} \oplus \Gamma_j\right)$
  – Calculate the probability of intersection given $X$ and the germ and grain pairs:
    $T_j(\{x\}) = P(R_j \ge r_j) = 1 - P(R_j < r_j), \quad r_j = \lVert x - \mu_j \rVert$
    $T_j(X) = 1 - \prod_{x \in X} \left(1 - T_j(\{x\})\right)$
  – Grains are governed by random radii with an assumed cumulative:
    $P(R_j < r_j) = 1 - \exp(-r_j^2 / \sigma_j^2)$
  – Random set model parameters: $\{\mu_j, \sigma_j\}$ (germ, grain)
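The germ and grain calculation can be sketched as follows. This assumes an exponential form for the radius distribution, so that a point at distance $r$ from the germ is covered with probability $\exp(-r^2 / \sigma^2)$; that form, and the names `point_capacity`, `bag_capacity`, `germ`, and `scale`, are assumptions made for illustration rather than the authors' exact model.

```python
import math
from typing import Sequence


def point_capacity(x: Sequence[float], germ: Sequence[float], scale: float) -> float:
    """Probability that the random grain around `germ` covers the point x.

    Assumed form: P(R >= r) = exp(-r^2 / scale^2), with r the Euclidean distance.
    """
    r_squared = sum((a - b) ** 2 for a, b in zip(x, germ))
    return math.exp(-r_squared / scale ** 2)


def bag_capacity(bag: Sequence[Sequence[float]], germ: Sequence[float], scale: float) -> float:
    """Noisy-OR over the bag: probability that at least one sample is covered."""
    prob_all_miss = 1.0
    for x in bag:
        prob_all_miss *= 1.0 - point_capacity(x, germ, scale)
    return 1.0 - prob_all_miss


bag = [[0.2, 0.1], [4.0, 5.0]]          # hypothetical two-sample bag
print(bag_capacity(bag, germ=[0.0, 0.0], scale=1.0))
```

Samples near the germ dominate the bag value, while distant samples contribute almost nothing.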
RSF-MIL: Germ and Grain Model
• Positive bags = blue
• Negative bags = orange
• Distinct shapes = distinct bags
[Figure: samples (x) and target concepts (T) in feature space]
Multiple Instance Learning with Multiple Concepts
Multiple Concepts: Disjunction or Conjunction?
• Disjunction
  – When you have multiple types of concepts
  – When each instance can indicate the presence of a target
• Conjunction
  – When you have a target type that is composed of multiple (necessary) concepts
  – When each instance can indicate a concept, but not necessarily the composite target type
Conjunctive RSF-MIL
• Previously developed disjunctive RSF-MIL (RSF-MIL-d)
  $T(X) = 1 - \prod_{j} \prod_{x \in X} \left(1 - T_j(\{x\})\right)$
  Noisy-OR combination across concepts and samples
• Conjunctive RSF-MIL (RSF-MIL-c)
  $T(X) = \prod_{j} \left(1 - \prod_{x \in X} \left(1 - T_j(\{x\})\right)\right)$
  Standard noisy-OR for one concept $j$; noisy-AND combination across concepts
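The difference between the two combinations can be seen in a small Python sketch. The function names and the example values are hypothetical; `per_concept[j]` holds the per-sample values $T_j(\{x\})$ for concept $j$.

```python
from typing import Sequence


def rsf_mil_d(per_concept: Sequence[Sequence[float]]) -> float:
    """Disjunctive form: noisy-OR across all concepts and all samples."""
    prob_all_miss = 1.0
    for sample_probs in per_concept:
        for p in sample_probs:
            prob_all_miss *= 1.0 - p
    return 1.0 - prob_all_miss


def rsf_mil_c(per_concept: Sequence[Sequence[float]]) -> float:
    """Conjunctive form: noisy-OR within each concept, product (noisy-AND) across concepts."""
    result = 1.0
    for sample_probs in per_concept:
        prob_miss = 1.0
        for p in sample_probs:
            prob_miss *= 1.0 - p
        result *= 1.0 - prob_miss
    return result


# A bag whose samples match concept 1 well but never match concept 2.
per_concept = [[0.9, 0.1], [0.0, 0.0]]
print(rsf_mil_d(per_concept))  # high: one concept is enough
print(rsf_mil_c(per_concept))  # zero: both concepts are required
```

The same bag scores high under the disjunctive form and zero under the conjunctive form, which is the behaviour wanted when a target is composed of several necessary constituent concepts.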
Synthetic Data Experiments
• The Extreme Conjunct data set requires that a target bag exhibit two distinct concepts rather than one or none
[Table: AUC, with the AUC when initialized near the solution in parentheses]
Application to Remote Sensing
Disjunctive Target Concepts
[Diagram: Target Concept Type 1, Target Concept Type 2, …, Target Concept Type n each feed a noisy-OR; the noisy-OR outputs are combined by an OR to answer "Target concept present?"]
• Using large overlapping bins (gross extraction), the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists
What if we want features with finer granularity?
• Fine extraction
  – More detail about the image and more shape information, but may lose the disjunctive nature between (multiple) instances
[Diagram: Constituent Concept 1 (top of hyperbola) and Constituent Concept 2 (wings of hyperbola) each feed a noisy-OR; the noisy-OR outputs are combined by an AND to answer "Target concept present?"]
Our features have more granularity; therefore our concepts may be constituents of a target, rather than encapsulating the target concept
GPR Experiments
• Extensive GPR data set
  – ~800 targets
  – ~5,000 non-targets
• Experimental design
  – Run RSF-MIL-d (disjunctive) and RSF-MIL-c (conjunctive)
  – Compare both feature extraction methods
    • Gross extraction: large enough to encompass the target concept
    • Fine extraction: non-overlapping bins
• Hypothesis
  – RSF-MIL-d will perform well when using gross extraction, whereas RSF-MIL-c will perform well using fine extraction
Experimental Results
• Highlights
  – RSF-MIL-d using gross extraction performed best
  – RSF-MIL-c performed better than RSF-MIL-d when using fine extraction
  – Other influencing factors: the optimization methods for RSF-MIL-d and RSF-MIL-c are not the same
[Figures: Gross Extraction; Fine Extraction]
Future Work
• Implement a general form that can learn a disjunctive or conjunctive relationship from the data
• Implement a general form that can learn the number of concepts
• Incorporate spatial information
• Develop an improved optimization scheme for RSF-MIL-c
Backup Slides
MIL Example (AHI Imagery)
• Robust learning tool
  – MIL tools can learn a target signature with limited or incomplete ground truth
Which spectral signature(s) should we use to train a target model or classifier?
1. Spectral mixing
2. Background signal
3. Ground truth not exact
MI-RVM
• Addition of set observations and inference using noisy-OR to an RVM model
  $P(y = 1 \mid X) = 1 - \prod_{j=1}^{K} \left(1 - \sigma(w^T x_j)\right), \quad \sigma(z) = \frac{1}{1 + \exp(-z)}$
• Prior on the weight $w$
  $p(w) = N(w \mid 0, A^{-1})$
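A short Python sketch of the bag-level posterior above, assuming a plain linear score $w^T x_j$ per sample. The names `sigmoid` and `bag_posterior` and the example values are illustrative only.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def bag_posterior(w: Sequence[float], bag: Sequence[Sequence[float]]) -> float:
    """P(y = 1 | X): noisy-OR of the per-sample logistic outputs."""
    prob_all_negative = 1.0
    for x in bag:
        score = sum(w_k * x_k for w_k, x_k in zip(w, x))
        prob_all_negative *= 1.0 - sigmoid(score)
    return 1.0 - prob_all_negative


# With zero weights every sample has probability 0.5, so a two-sample bag gives 0.75.
print(bag_posterior([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]]))
```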
SVM review
• Classifier structure
  $y(x) = w^T \varphi(x) + b$
• Optimization
  $\min_{w, b} \frac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i \quad \text{s.t.}\ \forall i:\ t_i \left(w^T \varphi(x_i) + b\right) \ge 1 - \xi_i,\ \xi_i \ge 0$
MI-SVM Discussion
• RVM was altered to fit the MIL problem by changing the form of the target variable's posterior to model a noisy-OR gate
• SVM can be altered to fit the MIL problem by changing how the margin is calculated
  – Boost the margin between the bag (rather than the samples) and the decision surface
  – Look for the MI-separating linear discriminant
    • There is at least one sample from each positive bag in the positive half space
mi-SVM
• Enforce the MI scenario using extra constraints
  $\min_{\{t_i\}} \min_{w, b, \xi} \frac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i$
  $\text{s.t.}\ \forall i:\ t_i \left(w^T \varphi(x_i) + b\right) \ge 1 - \xi_i,\ \xi_i \ge 0,\ t_i \in \{-1, 1\}$
  $\sum_{i \in I} \frac{t_i + 1}{2} \ge 1,\ \forall I : T_I = 1$ (at least one sample in each positive bag must have a label of 1)
  $t_i = -1,\ \forall i \in I : T_I = -1$ (all samples in each negative bag must have a label of −1)
  Mixed integer program: must find the optimal hyperplane and the optimal labeling set
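The labeling constraints can be checked with a small Python helper. This is only a sketch of the constraint set, not of the mixed integer solver; the name `labeling_is_feasible` and the data layout (one list of instance labels per bag) are assumptions.

```python
from typing import Sequence


def labeling_is_feasible(bag_labels: Sequence[int],
                         instance_labels: Sequence[Sequence[int]]) -> bool:
    """Check the MI constraints on a candidate instance labeling.

    bag_labels[I] is T_I in {-1, 1}; instance_labels[I] holds t_i for the samples in bag I.
    """
    for bag_label, labels in zip(bag_labels, instance_labels):
        if any(t not in (-1, 1) for t in labels):
            return False
        if bag_label == 1:
            # Sum of (t_i + 1) / 2 counts the positive samples in the bag.
            if sum((t + 1) // 2 for t in labels) < 1:
                return False
        elif any(t != -1 for t in labels):
            return False
    return True


print(labeling_is_feasible([1, -1], [[-1, 1], [-1, -1]]))  # feasible
print(labeling_is_feasible([1], [[-1, -1]]))                # positive bag with no positive sample
```

The outer minimization over $\{t_i\}$ searches only over labelings for which this check holds.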
Current Applications
I. Multiple Instance Learning
  I. MI Problem
  II. MI Applications
II. Multiple Instance Learning: Kernel Machines
  I. MI-RVM
  II. MI-SVM
III. Current Applications
  I. GPR imagery
  II. HSI imagery
HSI: Target Spectra Learning
• Given labeled areas of interest: learn the target signature
• Given test areas of interest: classify the set of samples
Overview of MI-RVM Optimization
• Two-step optimization
  1. Estimate the optimal $w$, given the posterior of $w$
    • There is no closed-form solution for the parameters of the posterior, so a gradient update method is used
    • Iterate until convergence, then proceed to step 2
  2. Update the parameter on the prior of $w$
    • The distribution on the target variable has no specific parameters
    • Until system convergence, continue at step 1
1) Optimization of w
• Optimize the posterior (Bayes' rule) of $w$
  $\hat{w}_{MAP} = \arg\max_w \left[\log p(X \mid w) + \log p(w)\right]$
• Update weights using the Newton-Raphson method
  $w^{t+1} = w^{t} - H^{-1} g$
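The Newton-Raphson update can be illustrated on a deliberately simplified problem. The sketch below uses a single scalar weight, an ordinary per-sample logistic likelihood (not the MI-RVM noisy-OR likelihood), and a zero-mean Gaussian prior with precision `a`; all names and data are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gradient(w: float, xs: Sequence[float], ts: Sequence[int], a: float) -> float:
    """g: derivative of log p(X | w) + log p(w) with respect to w."""
    return sum((t - sigmoid(w * x)) * x for x, t in zip(xs, ts)) - a * w


def hessian(w: float, xs: Sequence[float], a: float) -> float:
    """H: second derivative of the log posterior (always negative here)."""
    return -sum(sigmoid(w * x) * (1.0 - sigmoid(w * x)) * x * x for x in xs) - a


def newton_map(xs: Sequence[float], ts: Sequence[int], a: float, steps: int = 25) -> float:
    """Iterate w <- w - H^{-1} g starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        w = w - gradient(w, xs, ts, a) / hessian(w, xs, a)
    return w


xs, ts = [1.0, 2.0, -1.0], [1, 1, 0]
w_map = newton_map(xs, ts, a=1.0)
print(w_map, gradient(w_map, xs, ts, a=1.0))  # gradient is close to zero at the MAP estimate
```

In the multi-dimensional case `g` becomes the gradient vector and the division becomes a linear solve with the Hessian matrix.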
2) Optimization of Prior
• Optimization of the covariance of the prior
  $\hat{A} = \arg\max_A p(X \mid A) = \arg\max_A \int p(X \mid w)\, p(w \mid A)\, dw$
• Making a large number of assumptions, the diagonal elements of $A$ can be estimated
  $a_i^{new} = \frac{1}{w_i^2 + H^{-1}_{ii}}$
Random Sets: Multiple Instance Learning
• Random set framework for multiple instance learning
  – Bags are sets
  – The idea of finding the commonality of positive bags is inherent in the random set formulation
    • Find commonality using the intersection operator
    • The governing functional of a random set is based on the intersection operator
      $T_\Gamma(K) = P(\Gamma \cap K \neq \emptyset)$
MI issues
• MIL approaches
  – Some approaches are biased to believe that only one sample in each bag caused the target concept
  – Some approaches can only label bags
  – It is not clear whether anything is gained over supervised approaches
RSF-MIL
• MIL-like
• Positive bags = blue
• Negative bags = orange
• Distinct shapes = distinct bags
[Figure: samples (x) and target concepts (T) in feature space]
Side Note: Bayesian Networks
• Noisy-OR assumption
  – Bayesian network representation of noisy-OR
  – Polytree: singly connected DAG
Side Note
• The full Bayesian network may be intractable
  – Occurrences of causal factors are rare (sparse co-occurrence)
    • So assume a polytree
    • So assume the result has a Boolean relationship with the causal factors
  – Absorb $I$, $X$ and $A$ into one node, governed by the randomness of $I$
    • These assumptions greatly simplify the inference calculation
    • Calculate $Z$ based on probabilities rather than constructing a distribution using $X$
  $P(Z = 1 \mid \{X_1, X_2, X_3, X_4\}) = 1 - \prod_{j} \left(1 - P(Z = 1 \mid X_j)\right)$
Diverse Density (DD)
• Probabilistic approach
  – Goal:
    • Standard statistics approaches identify areas in a feature space with a high density of target samples and a low density of non-target samples
    • DD: identify areas in a feature space with a high "density" of samples from EACH of the positive bags ("diverse"), and a low density of samples from negative bags
  – Identify attributes or characteristics similar to positive bags and dissimilar to negative bags
  – Assume $t$ is a target characterization
  – Goal: $\arg\max_t P(t \mid B_1^{+}, \dots, B_n^{+}, B_1^{-}, \dots, B_m^{-})$
  – Assuming the bags are conditionally independent: $\arg\max_t \prod_i P(t \mid B_i^{+}) \prod_j P(t \mid B_j^{-})$
Diverse Density
• Calculation (noisy-OR model), for a bag $B_i = \{x_{i1}, \dots, x_{iJ_i}\}$:
  $P(t \mid B_i^{+}) = 1 - \prod_j \left(1 - P(t \mid B_{ij}^{+})\right)$
  It is NOT the case that EACH element is NOT the target concept
  $P(t \mid B_i^{-}) = \prod_j \left(1 - P(t \mid B_{ij}^{-})\right)$
  $P(t \mid B_{ij}) = \exp\left(-\lVert B_{ij} - t \rVert^2\right) = \exp\left(-\lVert x_{ij} - t \rVert^2\right)$
• Optimization
  $\arg\max_t \prod_i P(t \mid B_i^{+}) \prod_j P(t \mid B_j^{-})$
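The Diverse Density objective can be evaluated with a few lines of Python. This sketch only computes the objective at a candidate point $t$; it does not implement the search over $t$. Function names and the toy bags are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def instance_prob(x: Point, t: Point) -> float:
    """P(t | B_ij) = exp(-||x - t||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)))


def diverse_density(t: Point,
                    positive_bags: Sequence[Sequence[Point]],
                    negative_bags: Sequence[Sequence[Point]]) -> float:
    """Product of noisy-OR terms for positive bags and all-miss terms for negative bags."""
    dd = 1.0
    for bag in positive_bags:
        prob_all_miss = 1.0
        for x in bag:
            prob_all_miss *= 1.0 - instance_prob(x, t)
        dd *= 1.0 - prob_all_miss
    for bag in negative_bags:
        for x in bag:
            dd *= 1.0 - instance_prob(x, t)
    return dd


positive_bags = [[[0.0, 0.0], [5.0, 5.0]], [[0.0, 0.0], [-5.0, 3.0]]]
negative_bags = [[[5.0, 5.0]]]
print(diverse_density([0.0, 0.0], positive_bags, negative_bags))  # near 1: shared by every positive bag
print(diverse_density([5.0, 5.0], positive_bags, negative_bags))  # 0: coincides with a negative sample
```

The candidate at the origin scores high because every positive bag has a sample there and no negative sample is nearby.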
Random Set Functionals
• Capacity and avoidance functionals
  $T_\Gamma(K) = P(\Gamma \cap K \neq \emptyset)$
  $Q_\Gamma(K) = P(\Gamma \cap K = \emptyset)$
  $T(K) = 1 - Q(K)$
  – Given a germ and grain model: $\Gamma = \bigcup_{i} \bigcup_{j=1}^{n_i} \left(\{\mu_{ij}\} \oplus \Gamma_{ij}\right)$
  – Assumed random radii:
    $T_{ij}(\{x\}) = P(R_{ij} \ge r_{ij}) = 1 - P(R_{ij} < r_{ij}), \quad r_{ij} = \lVert x - \mu_{ij} \rVert$
    $P(R_{ij} < r_{ij}) = 1 - \exp(-r_{ij}^2)$
When disjunction makes sense
• Using large overlapping bins, the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists
[Diagram: instances combined by an OR to answer "Target concept present?"]
Theoretical and Developmental Progress
• Previous optimization:
  [Objective: $\arg\max$ over the model parameters of a combination of $T(B_i)$ over bags $i$ and $Q(B_j)$ over bags $j$]
  • Did not necessarily promote diverse density
• Current optimization
  [Objective: $\arg\max$ over the model parameters of a combination of $T(B_i)$ over bags $i$ and $Q(B_j)$ over bags $j$]
  • Better for context learning and MIL
• Previously no feature relevance or selection (hypersphere)
  – Improvement: included learned weights on each feature dimension
• Previous TO DO list
  • Improve existing code
    – Develop joint optimization for context learning and MIL
  • Apply MIL approaches (broad scale)
  • Learn similarities between feature sets of mines
  • Aid in training existing algorithms: find "best" EHD features for training / testing
  • Construct set-based classifiers?
How do we impose the MI scenario?: Diverse Density (Maron et al.)
• Calculation (noisy-OR model), for a bag $B_i = \{x_{i1}, \dots, x_{iJ_i}\}$:
  – Inherent in the random set formulation
  $P(t \mid B_i^{+}) = 1 - \prod_j \left(1 - P(t \mid B_{ij}^{+})\right)$
  It is NOT the case that EACH element is NOT the target concept
  $P(t \mid B_i^{-}) = \prod_j \left(1 - P(t \mid B_{ij}^{-})\right)$
  $P(t \mid B_{ij}) = \exp\left(-\lVert B_{ij} - t \rVert^2\right) = \exp\left(-\lVert x_{ij} - t \rVert^2\right)$
• Optimization
  – Combination of exhaustive search and gradient ascent
  $\arg\max_t \prod_i P(t \mid B_i^{+}) \prod_j P(t \mid B_j^{-})$
How can we use Random Sets for MIL?
• Random set for MIL: bags are sets
  – The idea of finding the commonality of positive bags is inherent in the random set formulation
    • Sets have either an empty or a non-empty intersection relationship
    • Find commonality using the intersection operator
    • The governing functional of a random set is based on the intersection operator
• Example:
  Bags with target:
    {l, a, e, i, o, p, u, f}
    {f, b, a, e, i, z, o, u}
    {a, b, c, i, o, u, e, p, f}
    {a, f, t, e, i, u, o, d, v}
  Intersection of the bags with target: {a, e, i, o, u, f}
  Bags without target:
    {s, r, n, m, p, l}
    {z, s, w, t, g, n, c}
    {f, p, k, r}
    {q, x, z, c, v}
    {p, l, f}
  Union of the bags without target: {f, s, r, n, m, p, l, z, w, t, g, c, v, q, x, k}
  Target concept = intersection \ union = {a, e, i, o, u}
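The letter example can be reproduced with Python's built-in `set` type. The variable names are illustrative; the bags are the ones listed on the slide.

```python
# Bags with the target: every one of them contains the vowels.
bags_with_target = [
    set("laeiopuf"),
    set("fbaeizou"),
    set("abciouepf"),
    set("afteiuodv"),
]

# Bags without the target: none of them contains a vowel.
bags_without_target = [
    set("srnmpl"),
    set("zswtgnc"),
    set("fpkr"),
    set("qxzcv"),
    set("plf"),
]

common = set.intersection(*bags_with_target)   # elements shared by every positive bag
seen_in_negatives = set().union(*bags_without_target)
target_concept = common - seen_in_negatives     # remove anything also seen in a negative bag

print(sorted(common))
print(sorted(target_concept))
```

The letter f survives the intersection but is removed by the set difference, because it also appears in bags without the target.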