
Nov 7, 2013

Conjunctive Formulation of the Random Set Framework for Multiple Instance Learning: Application to Remote Sensing

Jeremy Bolton
Paul Gader
CSI Laboratory
University of Florida

CSI Laboratory, Jeremy Bolton, Paul Gader, 2010

Highlights


Conjunctive forms of Random Sets for Multiple Instance Learning:

- Random Sets can be used to solve the MIL problem when multiple concepts are present
- Previously developed formulations assume a disjunctive relationship between the learned concepts
- The new formulation provides for a conjunctive relationship between concepts; its utility is exhibited on a Ground Penetrating Radar (GPR) data set


Outline


I. Multiple Instance Learning
   I. MI Problem
   II. RSF-MIL
   III. Multiple Target Concepts
II. Experimental Results
   I. GPR Experiments
III. Future Work


Multiple Instance Learning


Standard Learning vs. Multiple
Instance Learning


- Standard supervised learning
  - Optimize some model (or learn a target concept) given training samples and corresponding labels
- MIL
  - Learn a target concept given multiple sets of samples and corresponding labels for the sets
  - Interpretation: learning with uncertain labels / a noisy teacher




Standard learning: $X = \{x_1, \ldots, x_n\}, \quad Y = \{y_1, \ldots, y_n\}$

MIL: $X_i = \{x_{i1}, \ldots, x_{in_i}\}, \quad Y_i = \{y_{i1} = ?, \ldots, y_{in_i} = ?\}$

Multiple Instance Learning (MIL)


- Given:
  - A set of I bags, each labeled + or −
  - The i-th bag is a set of J_i samples in some feature space
  - An interpretation of the labels
- Goal: learn the concept
  - What characteristic is common to the positive bags that is not observed in the negative bags?

$$B = \{B_1, \ldots, B_i, \ldots, B_I\}, \qquad B_i = \{x_{i1}, \ldots, x_{iJ_i}\}$$

$$B_i^+ : \exists j \text{ such that } \mathrm{label}(x_{ij}) = 1$$

$$B_i^- : \forall j, \ \mathrm{label}(x_{ij}) = 0$$
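The bag-label semantics above can be sketched in a few lines of Python; the instance labels here are hypothetical, purely for illustration (in practice only the bag labels are observed):

```python
def bag_label(instance_labels):
    """A bag is labeled 1 if any instance is 1, else 0 (standard MIL assumption)."""
    return int(any(lbl == 1 for lbl in instance_labels))

# Bags are sets (here lists) of instances; in practice the per-instance
# labels are unknown -- only the bag label is given to the learner.
positive_bag = [0, 0, 1, 0]   # at least one target instance present
negative_bag = [0, 0, 0, 0]   # no target instances

print(bag_label(positive_bag))  # 1
print(bag_label(negative_bag))  # 0
```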

Multiple Instance Learning

Traditional Classification:
x_1 : label = 1
x_2 : label = 1
x_3 : label = 0
x_4 : label = 0
x_5 : label = 1

Multiple Instance Learning:
{x_1, x_2, x_3, x_4} : label = 1
{x_1, x_2, x_3, x_4} : label = 1
{x_1, x_2, x_3, x_4} : label = 0


MIL Application: Example GPR


- Collaboration: Frigui, Collins, Torrione
- Construction of bags:
  - Collect 15 EHD feature vectors $\{x_1, x_2, x_3, x_4, \ldots, x_{15}\}$ from the 15 depth bins
  - Mine images = positive (+) bags
  - False-alarm (FA) images = negative (−) bags


Standard vs. MI Learning: GPR Example


- Standard Learning
  - Each training sample (EHD feature vector) $x_1, x_2, \ldots, x_n$ must have a label $y_1, y_2, \ldots, y_n$
  - An arduous task:
    - many feature vectors per image, and multiple images
    - difficult to label given GPR echoes, ground-truthing errors, etc.
    - the label of each vector may not be known

Standard vs. MI Learning: GPR Example


- Multiple Instance Learning
  - Each training bag must have a label: $Y$ for the set $\{x_1, x_2, x_3, x_4, \ldots, x_{15}\}$ of EHD feature vectors
  - No need to label all feature vectors; just identify the images (bags) where targets are present
  - Implicitly accounts for class-label uncertainty

Random Set Framework for Multiple
Instance Learning


Random Set Brief



A random set is a set-valued random element: a measurable mapping

$$\Gamma : (\Omega, \sigma(\Omega), P) \to (\mathcal{R}, \mathcal{B}(\mathcal{R}))$$

from a probability space to a family of sets $\mathcal{R}$ with σ-algebra $\mathcal{B}(\mathcal{R})$.





How can we use Random Sets for MIL?


- Random set for MIL: bags are sets (multi-sets)
- The idea of finding the commonality of positive bags is inherent in the random set formulation
  - Sets have an empty-intersection or non-empty-intersection relationship
  - Find commonality using the intersection operator
  - The random set's governing functional is based on the intersection operator
- Capacity functional T: it is NOT the case that EACH element is NOT the target concept










$$T(X) = 1 - \prod_{x \in X} \bigl(1 - T(\{x\})\bigr), \qquad X = \{x_1, \ldots, x_n\}$$

A.K.A.: the noisy-OR gate (Pearl 1988)
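As a minimal sketch, the noisy-OR capacity functional above can be computed directly from the per-instance probabilities T({x}); the probability values below are made up for illustration:

```python
def noisy_or(p):
    """T(X) = 1 - prod_x (1 - T({x})): 'it is NOT the case that EACH
    element is NOT the target concept'."""
    prod = 1.0
    for pi in p:
        prod *= 1.0 - pi
    return 1.0 - prod

# Per-instance capacities T({x}) for one bag X (illustrative values):
print(noisy_or([0.1, 0.05, 0.9]))  # high: one instance strongly matches the concept
print(noisy_or([0.1, 0.05, 0.1]))  # low: no instance matches well
```

Note that a single confident instance dominates the bag score, which is exactly the MIL positive-bag semantics.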


Random Set Functionals


- Capacity functionals for the intersection calculation
- Use a germ-and-grain model to model the random set, with multiple (J) concepts
- Calculate the probability of intersection given X and the germ-and-grain pairs
- Grains are governed by random radii with an assumed cumulative distribution



$$T(X) = P(\Gamma \cap X \neq \emptyset)$$

$$\Gamma = \bigcup_{j=1}^{J} \bigl(\{c_j\} \oplus G_j\bigr) \qquad \text{(germ } c_j\text{, grain } G_j\text{)}$$

$$T_j(\{x\}) = P\bigl(R_j \geq r_{x,j}\bigr) = 1 - F_{R_j}(r_{x,j}), \qquad F_{R_j}(r) = 1 - \exp\!\left(-\frac{r^2}{2\sigma_j^2}\right)$$

$$T_j(X) = 1 - \prod_{x \in X} \bigl(1 - T_j(\{x\})\bigr)$$

Random set model parameters: $\{c_j, \sigma_j\}$ (germ and grain).
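A minimal sketch of the per-concept capacity under these assumptions (Euclidean distance to the germ and the Gaussian-type cumulative for the radii; the germ location and sigma below are illustrative, not fitted values):

```python
import math

def instance_capacity(x, germ, sigma):
    """T_j({x}) = P(R_j >= ||x - germ||) = 1 - F(r), with the assumed
    cumulative F(r) = 1 - exp(-r^2 / (2 sigma^2)); the capacity decays
    as a Gaussian in the distance from the germ."""
    r2 = sum((xi - gi) ** 2 for xi, gi in zip(x, germ))
    return math.exp(-r2 / (2.0 * sigma ** 2))

def concept_capacity(X, germ, sigma):
    """Noisy-OR over the instances of bag X for one concept j."""
    prod = 1.0
    for x in X:
        prod *= 1.0 - instance_capacity(x, germ, sigma)
    return 1.0 - prod

bag = [(0.0, 0.0), (2.0, 2.0)]
print(concept_capacity(bag, germ=(0.0, 0.0), sigma=1.0))  # 1.0: a sample sits on the germ
```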


RSF-MIL: Germ and Grain Model


- Positive bags = blue
- Negative bags = orange
- Distinct shapes = distinct bags

[Figure: scatter of samples (x) with learned target-concept germs (T).]

Multiple Instance Learning with
Multiple Concepts


Multiple Concepts:

Disjunction or Conjunction?


- Disjunction
  - When you have multiple types of concepts
  - When each instance can indicate the presence of a target
- Conjunction
  - When you have a target type that is composed of multiple (necessary) concepts
  - When each instance can indicate a concept, but not necessarily the composite target type


Conjunctive RSF-MIL


- Previously developed disjunctive RSF-MIL (RSF-MIL-d), a noisy-OR combination across concepts and samples:

$$T(X) = 1 - \prod_{j=1}^{J} \prod_{x \in X} \bigl(1 - T_j(\{x\})\bigr)$$

- Conjunctive RSF-MIL (RSF-MIL-c), a noisy-AND combination across concepts, where each bracketed factor is the standard noisy-OR for one concept j:

$$T(X) = \prod_{j=1}^{J} \left[\, 1 - \prod_{x \in X} \bigl(1 - T_j(\{x\})\bigr) \right]$$
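The difference between the two combinations can be seen numerically; a sketch with hypothetical per-concept, per-instance capacities T[j][k]:

```python
def noisy_or(ps):
    prod = 1.0
    for p in ps:
        prod *= 1.0 - p
    return 1.0 - prod

def rsf_mil_d(T):
    """Disjunctive: one noisy-OR across all concepts AND all samples.
    T[j][k] is the capacity T_j({x_k}) of instance k under concept j."""
    return noisy_or([p for row in T for p in row])

def rsf_mil_c(T):
    """Conjunctive: noisy-AND (product) across concepts of per-concept noisy-ORs."""
    out = 1.0
    for row in T:
        out *= noisy_or(row)
    return out

# A bag whose instances hit concept 0 strongly but concept 1 not at all:
T = [[0.95, 0.05], [0.05, 0.05]]
print(rsf_mil_d(T))  # high: any single concept suffices
print(rsf_mil_c(T))  # low: both concepts are required
```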


Synthetic Data Experiments


The extreme-conjunct data set requires that a target bag exhibit two distinct concepts, rather than one or none.

[Table: AUC (AUC when initialized near the solution)]

Application to Remote Sensing


Disjunctive Target Concepts

[Diagram: target concept types 1 through n each feed a noisy-OR; the noisy-OR outputs are combined by an OR to decide "target concept present?"]

Using large overlapping bins (gross extraction), the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists.


What if we want features with finer granularity?

- Fine extraction
  - More detail about the image and more shape information, but we may lose the disjunctive nature between (multiple) instances

[Diagram: constituent concept 1 (top of hyperbola) and constituent concept 2 (wings of hyperbola) each feed a noisy-OR; the outputs are combined by an AND to decide "target concept present?"]

Our features have more granularity; therefore our concepts may be constituents of a target, rather than encapsulating the target concept.


GPR Experiments


- Extensive GPR data set
  - ~800 targets
  - ~5,000 non-targets
- Experimental design
  - Run RSF-MIL-d (disjunctive) and RSF-MIL-c (conjunctive)
  - Compare both feature extraction methods:
    - Gross extraction: bins large enough to encompass the target concept
    - Fine extraction: non-overlapping bins
- Hypothesis
  - RSF-MIL-d will perform well when using gross extraction, whereas RSF-MIL-c will perform well using fine extraction


Experimental Results


- Highlights
  - RSF-MIL-d using gross extraction performed best
  - RSF-MIL-c performed better than RSF-MIL-d when using fine extraction
  - Other influencing factors: the optimization methods for RSF-MIL-d and RSF-MIL-c are not the same

[ROC figures: gross extraction, fine extraction]


Future Work


- Implement a general form that can learn a disjunctive or conjunctive relationship from the data
- Implement a general form that can learn the number of concepts
- Incorporate spatial information
- Develop an improved optimization scheme for RSF-MIL-c




Backup Slides


MIL Example (AHI Imagery)


- A robust learning tool: MIL tools can learn a target signature with limited or incomplete ground truth
- Which spectral signature(s) should we use to train a target model or classifier?
  1. Spectral mixing
  2. Background signal
  3. Ground truth is not exact


MI-RVM


- Addition of set observations, with inference using a noisy-OR, to an RVM model
- A prior is placed on the weight w



$$P(y = 1 \mid X) = 1 - \prod_{j=1}^{K} \bigl(1 - \sigma(w^T x_j)\bigr), \qquad \sigma(z) = \frac{1}{1 + \exp(-z)}$$

$$p(w) = N(w \mid 0, A^{-1})$$
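A sketch of the MI-RVM bag posterior, the RVM's sigmoid link composed with a noisy-OR over the bag's instances (the weight vector and bag below are invented for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bag_posterior(w, X):
    """P(y=1|X) = 1 - prod_j (1 - sigmoid(w^T x_j)): the bag is positive
    unless every instance fails to fire the sigmoid."""
    prod = 1.0
    for x in X:
        z = sum(wi * xi for wi, xi in zip(w, x))
        prod *= 1.0 - sigmoid(z)
    return 1.0 - prod

w = [1.0, -1.0]
bag = [(3.0, 0.0), (-2.0, 0.0)]  # one instance with a large positive score
print(bag_posterior(w, bag))     # dominated by the strongest instance
```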

SVM review


- Classifier structure:

$$y(x) = w^T \varphi(x) + b$$

- Optimization:

$$\min_{w,b} \ \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \qquad \text{s.t. } \forall i: \ t_i\bigl(w^T \varphi(x_i) + b\bigr) \geq 1 - \xi_i, \quad \xi_i \geq 0$$
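The soft-margin primal objective can be evaluated directly; a sketch using the identity feature map (phi(x) = x) and made-up one-dimensional data:

```python
def svm_primal_objective(w, b, X, t, C):
    """(1/2)||w||^2 + C * sum_i xi_i, with hinge slacks
    xi_i = max(0, 1 - t_i (w^T x_i + b)); phi is the identity here."""
    obj = 0.5 * sum(wi * wi for wi in w)
    for x, ti in zip(X, t):
        margin = ti * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        obj += C * max(0.0, 1.0 - margin)  # slack is paid only inside the margin
    return obj

X = [(2.0,), (-2.0,)]
t = [+1, -1]
print(svm_primal_objective([1.0], 0.0, X, t, C=1.0))  # 0.5: both margins >= 1, no slack
```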

MI-SVM Discussion


- The RVM was altered to fit the MIL problem by changing the form of the target variable's posterior to model a noisy-OR gate.
- The SVM can be altered to fit the MIL problem by changing how the margin is calculated:
  - Boost the margin between the bag (rather than the samples) and the decision surface
  - Look for the MI separating linear discriminant: there is at least one sample from each bag in the half space



mi-SVM


- Enforce the MI scenario using extra constraints:

$$\min_{\{t_i\}} \ \min_{w,b} \ \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$$

$$\text{s.t. } \forall i: \ t_i\bigl(w^T \varphi(x_i) + b\bigr) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad t_i \in \{-1, 1\}$$

$$\sum_{i \in I^+} \frac{t_i + 1}{2} \geq 1 \ \text{ for each positive bag } I^+, \qquad t_i = -1 \ \ \forall i \in I^-$$

This is a mixed integer program: we must find the optimal hyperplane and the optimal labeling set. At least one sample in each positive bag must have a label of 1; all samples in each negative bag must have a label of -1.
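The mixed integer program is commonly attacked by alternating between training an SVM and re-imputing the instance labels; a sketch of just the label-update step under the two bag constraints (the scores and bag structure below are invented for illustration):

```python
def update_labels(scores, bag_ids, bag_labels):
    """One label-update step of an mi-SVM-style heuristic: threshold the
    decision values, force every negative-bag instance to -1, and force at
    least one +1 in each positive bag (the instance with the highest score)."""
    t = [1 if s >= 0 else -1 for s in scores]
    for bag, y in bag_labels.items():
        idx = [i for i, b in enumerate(bag_ids) if b == bag]
        if y == -1:
            for i in idx:
                t[i] = -1                       # all samples in a negative bag are -1
        elif all(t[i] == -1 for i in idx):
            best = max(idx, key=lambda i: scores[i])
            t[best] = 1                         # at least one +1 per positive bag
    return t

scores  = [-0.2, -0.5, 0.3, -0.1]
bag_ids = ["A", "A", "B", "B"]
labels  = {"A": +1, "B": -1}
print(update_labels(scores, bag_ids, labels))  # [1, -1, -1, -1]
```

After each such step, the SVM would be retrained on the imputed labels until the labeling stops changing.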


Current Applications


I. Multiple Instance Learning
   I. MI Problem
   II. MI Applications
II. Multiple Instance Learning: Kernel Machines
   I. MI-RVM
   II. MI-SVM
III. Current Applications
   I. GPR imagery
   II. HSI imagery



HSI: Target Spectra Learning


- Given labeled areas of interest: learn the target signature
- Given test areas of interest: classify the set of samples


Overview of MI-RVM Optimization


Two-step optimization:

1. Estimate the optimal w, given the posterior of w
   - There is no closed-form solution for the parameters of the posterior, so a gradient update method is used
   - Iterate until convergence; then proceed to step 2.
2. Update the parameter on the prior of w
   - The distribution on the target variable has no specific parameters.
   - Until system convergence, continue at step 1.


1) Optimization of w

- Optimize the posterior (Bayes' rule) of w
- Update the weights using the Newton-Raphson method







$$\hat{w}_{MAP} = \arg\max_w \ \log p(X \mid w) + \log p(w)$$

$$w_{t+1} = w_t - \mathrm{H}^{-1} g$$

where $g$ is the gradient and $\mathrm{H}$ the Hessian of the log-posterior.


2) Optimization of Prior

- Optimization of the covariance of the prior
- Making a number of assumptions, the diagonal elements of A can be estimated

$$\hat{A} = \arg\max_A \ p(A \mid X) = \arg\max_A \int p(X \mid w)\, p(w \mid A)\, dw$$

$$a_i^{new} = \frac{1}{w_i^2 + [\mathrm{H}^{-1}]_{ii}}$$
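Under the reconstruction above, the precision update is elementwise; a sketch (the weight and posterior-covariance values are invented for illustration):

```python
def update_prior_precisions(w, Sigma_diag):
    """EM-style update a_i <- 1 / (w_i^2 + Sigma_ii), where Sigma = H^{-1}
    is the posterior covariance approximation from the Newton step."""
    return [1.0 / (wi * wi + sii) for wi, sii in zip(w, Sigma_diag)]

# A small weight with small posterior variance gets a large precision,
# which shrinks (effectively prunes) that dimension on the next pass.
print(update_prior_precisions([2.0, 0.1], [0.5, 0.09]))
```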

Random Sets: Multiple Instance
Learning


- Random set framework for multiple instance learning
  - Bags are sets
  - The idea of finding the commonality of positive bags is inherent in the random set formulation
  - Find commonality using the intersection operator
  - The random set's governing functional is based on the intersection operator:

$$T(K) = P(\Gamma \cap K \neq \emptyset)$$

MI issues


- MIL approaches
  - Some approaches are biased to believe that only one sample in each bag caused the target concept
  - Some approaches can only label bags
  - It is not clear whether anything is gained over supervised approaches


RSF-MIL

- MIL-like
- Positive bags = blue
- Negative bags = orange
- Distinct shapes = distinct bags

[Figure: scatter of samples (x) with learned target-concept germs (T).]

Side Note: Bayesian Networks


- Noisy-OR assumption
- Bayesian network representation of the noisy-OR
- Polytree: a singly connected DAG



Side Note


- A full Bayesian network may be intractable
- Occurrences of causal factors are rare (sparse co-occurrence)
  - So assume a polytree
  - So assume the result has a Boolean relationship with the causal factors
  - Absorb I, X, and A into one node, governed by the randomness of I
- These assumptions greatly simplify the inference calculation
  - Calculate Z based on probabilities rather than constructing a distribution using X









$$P\bigl(Z = 1 \mid \{X_1, X_2, X_3, X_4\}\bigr) = 1 - \prod_j \bigl(1 - P(Z = 1 \mid X_j)\bigr)$$

Diverse Density (DD)


- A probabilistic approach
- Goal:
  - Standard statistical approaches identify areas in a feature space with a high density of target samples and a low density of non-target samples
  - DD: identify areas in a feature space with a high "density" of samples from EACH of the positive bags ("diverse"), and a low density of samples from negative bags
  - i.e., identify attributes or characteristics similar to the positive bags and dissimilar to the negative bags
- Assume t is a target characterization
- Goal (assuming the bags are conditionally independent):



$$\hat{t} = \arg\max_t \ P\bigl(t \mid B_1^+, \ldots, B_n^+, B_1^-, \ldots, B_m^-\bigr) = \arg\max_t \ \prod_i P(t \mid B_i^+) \prod_j P(t \mid B_j^-)$$

Diverse Density


- Calculation (noisy-OR model), for bags $B_i = \{x_{i1}, \ldots, x_{iJ_i}\}$:

$$P(t \mid B_i^+) = 1 - \prod_j \bigl(1 - P(t \mid B_{ij}^+)\bigr) \qquad \text{(it is NOT the case that EACH element is NOT the target concept)}$$

$$P(t \mid B_i^-) = \prod_j \bigl(1 - P(t \mid B_{ij}^-)\bigr)$$

$$P(t \mid B_{ij}) = \exp\bigl(-\|B_{ij} - t\|^2\bigr)$$

- Optimization:

$$\hat{t} = \arg\max_t \ \prod_i P(t \mid B_i^+) \prod_j P(t \mid B_j^-)$$
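The DD objective can be sketched directly from these formulas (all coordinates below are invented; a real implementation would maximize over t, e.g. by gradient ascent from each positive instance):

```python
import math

def p_t_given_instance(t, x):
    """P(t | B_ij) = exp(-||x - t||^2), the DD Gaussian-like kernel."""
    return math.exp(-sum((xi - ti) ** 2 for xi, ti in zip(x, t)))

def diverse_density(t, pos_bags, neg_bags):
    """prod_i [1 - prod_j (1 - P(t|B+_ij))] * prod_i prod_j (1 - P(t|B-_ij))."""
    dd = 1.0
    for bag in pos_bags:
        prod = 1.0
        for x in bag:
            prod *= 1.0 - p_t_given_instance(t, x)
        dd *= 1.0 - prod          # noisy-OR: some instance matches the concept
    for bag in neg_bags:
        for x in bag:
            dd *= 1.0 - p_t_given_instance(t, x)  # no negative instance matches
    return dd

# t near (0,0) is shared by every positive bag but no negative bag:
pos = [[(0.0, 0.0), (5.0, 5.0)], [(0.1, -0.1), (9.0, 1.0)]]
neg = [[(5.0, 5.0), (9.0, 1.0)]]
print(diverse_density((0.0, 0.0), pos, neg) > diverse_density((5.0, 5.0), pos, neg))  # True
```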

Random Set Brief



A random set is a set-valued random element: a measurable mapping

$$\Gamma : (\Omega, \sigma(\Omega), P) \to (\mathcal{R}, \mathcal{B}(\mathcal{R}))$$

from a probability space to a family of sets $\mathcal{R}$ with σ-algebra $\mathcal{B}(\mathcal{R})$.

Random Set Functionals


- Capacity and avoidance functionals:

$$T(K) = P(\Gamma \cap K \neq \emptyset), \qquad Q(K) = 1 - T(K) = P(\Gamma \cap K = \emptyset)$$

- Given a germ-and-grain model with assumed random radii:

$$\Gamma_i = \bigcup_{j=1}^{n_i} \bigl(\{c_{ij}\} \oplus G_{ij}\bigr)$$

$$P(x_{ij} \in \Gamma) = T(\{x_{ij}\}) = P\bigl(R_{ij} \geq r_{ij}\bigr) = 1 - F_{R_{ij}}(r_{ij}), \qquad F_{R_{ij}}(r) = 1 - \exp\!\left(-\frac{r^2}{2\sigma_{ij}^2}\right)$$





When disjunction makes sense


Using large overlapping bins, the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists.

[Diagram: instances feed an OR to decide "target concept present?"]

Theoretical and Developmental
Progress


- Previous optimization: did not necessarily promote diverse density
- Current optimization: better for context learning and MIL; maximize the capacity on positive bags and the avoidance on negative bags:

$$\arg\max_{c,\sigma} \ \prod_i T(B_i^+) \prod_j Q(B_j^-)$$

- Previously there was no feature relevance or selection (a hypersphere); improvement: learned weights are now included on each feature dimension

Previous TO DO list

- Improve existing code
  - Develop a joint optimization for context learning and MIL
- Apply MIL approaches (broad scale)
  - Learn similarities between the feature sets of mines
  - Aid in training existing algorithms: find the "best" EHD features for training / testing
  - Construct set-based classifiers?


How do we impose the MI scenario?: Diverse Density (Maron et al.)

- Calculation (noisy-OR model), inherent in the Random Set formulation
- Optimization: a combination of exhaustive search and gradient ascent


For bags $B_i = \{x_{i1}, \ldots, x_{iJ_i}\}$:

$$P(t \mid B_i^+) = 1 - \prod_j \bigl(1 - P(t \mid B_{ij}^+)\bigr) \qquad \text{(it is NOT the case that EACH element is NOT the target concept)}$$

$$P(t \mid B_i^-) = \prod_j \bigl(1 - P(t \mid B_{ij}^-)\bigr)$$

$$P(t \mid B_{ij}) = \exp\bigl(-\|B_{ij} - t\|^2\bigr)$$

$$\hat{t} = \arg\max_t \ \prod_i P(t \mid B_i^+) \prod_j P(t \mid B_j^-)$$


How can we use Random Sets for MIL?


- Random set for MIL: bags are sets
- The idea of finding the commonality of positive bags is inherent in the random set formulation
  - Sets have an empty-intersection or non-empty-intersection relationship
  - Find commonality using the intersection operator
  - The random set's governing functional is based on the intersection operator



Example:

Bags with target:
{l, a, e, i, o, p, u, f}
{f, b, a, e, i, z, o, u}
{a, b, c, i, o, u, e, p, f}
{a, f, t, e, i, u, o, d, v}

Bags without target:
{s, r, n, m, p, l}
{z, s, w, t, g, n, c}
{f, p, k, r}
{q, x, z, c, v}
{p, l, f}

Intersection of the bags with target: {a, e, i, o, u, f}
Union of the bags without target: {f, s, r, n, m, p, l, z, w, t, g, c, v, q, k, x}

Target concept = {a, e, i, o, u, f} \ {f, s, r, n, m, p, l, z, w, t, g, c, v, q, k, x} = {a, e, i, o, u}
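This set-intersection view maps one-to-one onto Python's built-in set operations:

```python
pos_bags = [
    {"l", "a", "e", "i", "o", "p", "u", "f"},
    {"f", "b", "a", "e", "i", "z", "o", "u"},
    {"a", "b", "c", "i", "o", "u", "e", "p", "f"},
    {"a", "f", "t", "e", "i", "u", "o", "d", "v"},
]
neg_bags = [
    {"s", "r", "n", "m", "p", "l"},
    {"z", "s", "w", "t", "g", "n", "c"},
    {"f", "p", "k", "r"},
    {"q", "x", "z", "c", "v"},
    {"p", "l", "f"},
]

common = set.intersection(*pos_bags)   # what every positive bag shares
spurious = set.union(*neg_bags)        # anything a negative bag can explain
target = common - spurious
print(sorted(common))   # ['a', 'e', 'f', 'i', 'o', 'u']
print(sorted(target))   # ['a', 'e', 'i', 'o', 'u']
```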