Probabilistic Shape Parsing for View-Based Object Recognition

21st International Conference on Pattern Recognition (ICPR 2012)
November 11-15, 2012. Tsukuba, Japan
Diego Macrini, Chris Whiten, Robert Laganière, Michael Greenspan
We present a novel probabilistic model for parsing shapes into several distinguishable parts for accurate shape recognition. This shape parsing is based on ro[…]curacy. Although modelling shapes is an inherently uncertain process, our approach is lenient, in that the desired parse of a shape only needs to be within its $k$ most probable parses. Using this set of shape decompositions, we can improve recognition accuracy even further […]
Figure 1. Example output of the algorithm: (a) Each shape cut (orange segments) joins a concave corner to another concave corner, or to its closest boundary point across selected axes of symmetry. (b) and (c) are the two most probable parses of the shapes in (a), according to a parameterization that encodes preferences for both short and parallel inter-part boundaries.

1 Introduction

A fundamental problem in the representation and matching of shapes is to determine whether or not a shape should be divided into parts, and if so, what those parts should be. Part-based approaches are attractive because they provide a mechanism for comparing both shape contours and part structures, but they require expensive matching algorithms to account for similar shapes having markedly different decompositions. Alternatively, whole-contour approaches require simpler matching algorithms, but are only effective when objects have little articulation and within-class variation.

Mi and DeCarlo [5] suggest decomposing a shape into parts iteratively, based on the “stability” of skeleton points, to obtain simple shapes. Other approaches [6, 4, 2] find shape parts using boundary analysis and rules for corner associations, and then iteratively remove parts from the shape to compute the unperturbed medial axis. These approaches all provide a unique part decomposition for each shape, impose hard constraints on which regions of a shape can become parts, and depend on fixed decomposition rules. In addition, having only one representation per shape demands a matching approach that can deal with incorrect choices made during the shape decomposition process, which can lead to large representational differences among similar shapes. Our approach aims to solve these problems.

We propose a novel approach that combines the benefits of both whole-contour and part-based approaches by considering multiple ways of parsing a shape. We achieve this by probabilistically selecting boundaries to cut a shape into parts, where a scenario of no cuts is equivalent to working with the whole shape contour. We do not advocate a specific parameterization of the model or a specific choice of shape descriptor, and therefore evaluate the approach using known shape decomposition principles. For example, Figure 1
shows the two most probable parses, (b) and (c), for the shapes in (a) based on a parameterization of the model in which short and parallel inter-part boundaries are more likely than long and/or perpendicular ones.

We recognize that selecting a single parse of a shape by its MAP estimate can lead to unforeseen results due to small perturbations in the shape. To combat this and increase matching accuracy, we propose a matching paradigm where the $k$ most probable parses of a shape are considered during matching, rather than the single most probable parse. This allows for looser probabilistic encodings and higher within-class variation.

Finally, most classes of objects have some parts that are characteristic of the class. The absence of such parts should make it unlikely that a given shape belongs to the class. We present an algorithm for learning the importance of each shape part according to how characteristic that part is of its object class.

978-4-9906441-1-6 ©2012 IAPR

2 The Shape Parsing Framework

We present a probabilistic model for shape parsing based on the assumption that the inter-part boundaries are a subset of the shape cuts proposed by Singh et al. [8]. We begin by presenting a novel definition of shape cuts, from which a graphical representation of a shape is built. Then, standard graph matching techniques for part-based representations are used to match shapes.

2.1 Shape Cut Identification

The first step in the parsing process involves identifying all shape cuts. These are line segments connecting two boundary points of a silhouette that can be selected as part boundaries. The proposed cut detection algorithm is not sensitive to boundary curvature, scale, discretization artifacts or the precise location of corners.

Before defining shape cuts, we briefly introduce the two geometric features that define them: concave corners and the medial axis transform (MAT). In simple terms, for a silhouette boundary $B$, a concave corner $C \subseteq B$ is a maximal interval of connected boundary points whose negative curvatures are below some threshold. We often consider only the peak boundary point of a concave corner, denoted $\pi(C)$.

The medial axis transform [3] is the union of maximally inscribed discs in a shape. Building on this, a skeletal point is the center of a maximally inscribed disc that touches the boundary at two or more points, and a regular skeletal point is the center of such a disc that touches exactly two points, $p_i$ and $p_j$. The curves formed by connected regular skeletal points are called skeletal branches. These skeletal branches are used as axes of symmetry, such that the symmetry axis of points $p_i$ and $p_j$ is the skeletal branch containing the center of the maximally inscribed disc bitangent to $p_i$ and $p_j$ (Fig. 2).

We are interested in keeping only one shape cut per corner-symmetry axis pairing. Let the tuple $(p_i, p_j, b)$ denote that the boundary point $p_i$ is symmetric to the boundary point $p_j$ with respect to skeletal branch $b$. Shape cuts are formed by pairing the peak of each concave corner with selected boundary points across different axes of symmetry. Let $W = \{(p_i, p_j, b)\}$ be the set of all possible tuples encoding pairs of symmetric points, such that $p_i$ belongs to some concave corner $C$. Also, let $Z = \{(p_i, p_j)\}$ be the set of all shape cuts, such that for each $(p_i, p_j) \in Z$, there is a concave corner $C$ with $\pi(C) = p_i$, and a symmetry axis $b$ for which one of the following two conditions is satisfied:

(a) for some $(p'_i, p'_j, b) \in W$ and some concave corner $D$: $p'_i \in C$, $p'_j \in D$, $\pi(D) = p_j$, and $p_i < p_j$;
(b) $p_j$ is not a concave corner point, and the distance between points $p_i$ and $p_j$ is shorter than the distance between $p'_i$ and $p'_j$, $\forall (p'_i, p'_j, b) \in W$ with $p'_i \in C$.

That is, shape cut $(i, j)$ connects either the peaks of two concave corners that contain symmetric boundary points, or the peak of corner $C$ and its closest point among all points symmetric to $C$ across axis $b$. Figure 2 illustrates the set of cuts for a sample silhouette.

Note that $(i, j) \in Z$ does not necessarily imply that points $i$ and $j$ are symmetric, because of condition (a) above, where we “snap” to the corner peaks if two concave corners have symmetric boundary points. This is advantageous because it makes the identification of shape cuts less dependent on exact point symmetries. For example, the cut $(3, 14)$ in Figure 2 is formed by two corner peaks that are not strictly symmetric.

2.2 Shape Parse Graphs

It is assumed that the “hidden” boundaries separating a shape into parts correspond to a subset of the shape cuts in $Z$. The uncertainty about which cuts are inter-part boundaries is represented by a joint probability distribution over a collection of binary random variables $(X_{ij})$, called cut variables, such that $X_{ij} = 1$ if shape cut $(i, j)$ is an inter-part boundary, and $X_{ij} = 0$ otherwise. Then, each possible shape parse can be expressed by a joint instantiation of all the $X_{ij}$.

A given instantiation of the cut variables implies a set of inter-part boundaries, which implicitly yields a set of parts representing a given shape. These boundaries and shape parts give rise to a shape parse graph (SPG), an undirected graph $G_0(V_0, E_0)$
with $V_0 = \{1, \dots, n\}$, representing each maximal shape region with no interior inter-part boundaries, and $E_0 = \{(p_i, p_j)\}$, representing the inter-part boundaries $(p_i, p_j) \in Z$ connecting two adjacent shape parts in $V_0$.

Each shape maps to several SPGs, each corresponding to a different joint instantiation of the cut variables. This graph representation permits enforcing spatial constraints between parts when matching SPGs, and casts the recognition problem as graph matching.

2.3 Shape Matching Against the k-Most Probable Parses

Given all cut variables for a target shape, it is necessary to decide which to activate to maximize matching accuracy. That is, which shape cuts should be used to break a silhouette into shape parts? We propose using a probabilistic measure to decide which parses of a shape are more probable than others. This probabilistic measure can be described by natural potential functions along connected shape cuts. Although we do not advocate a specific parameterization, an example could be “from each cycle of shape cuts, select the most parallel cuts” and “for each maximal cycle formed by the shape cuts, select the shortest cut in that cycle”. This is similar to the parameterization used in Figure 1. The cycles we describe are purely in the sense of cycles on graphs, where two shape cuts are connected if they share a node $p_i$.

Once such a parameterization is established, all possible parses of a shape can be ranked by their probabilities. Rather than selecting the single most probable parse and always matching against it, we propose to keep the $k$ most probable parses of each shape, where $k$ is a small integer. Although this increases the size of the search space from $O(n)$ to $O(kn)$, the improvement in matching performance makes this beneficial. By keeping more than a single parse of a shape, the matching algorithm becomes more resistant to parameterizations under which unexpected perturbations in the shape cause an incorrect parse to be ranked as more probable than the expected parse. Such an outlier parse is unlikely to match well against other query shapes, but the $k-1$ other parses may yield the desired match.

Figure 2. LEFT: Concave corner points (colored cyan). Example symmetric points $p_i$ and $p_j$ with respect to skeletal point $s$. Subset of skeletal spokes (pink line segments). RIGHT: Shape cuts (orange segments). The shape-cut endpoints correspond to peaks of concave corners and/or to the nearest symmetric points to a corner.

3 Shape Part Training

The effects of outlier parses can be further minimized by exploiting how common a single part is among all objects of its class. This is achieved by penalizing graph matches that exclude shape parts that are “characteristic” of a given class. Each node within each SPG is weighted based on how well it matches other SPGs within the same model class. For each part $p$, all SPGs from all shapes in the same class as $p$ are gathered. We denote this set as $S$. For example, if $p$ belongs to the shape of a dog, $S$ is the set of all SPGs representing dogs. The SPG of $p$ is then matched to each SPG $s \in S$, using the method described in [2]. Based on these matches, a graph $G_1(V_1, E_1)$ is built, where $V_1$ is the set containing $p$ and all SPG nodes that $p$ matched with, and $E_1$ is the set of edges connecting $p$ to each $v \in V_1$. Then, every $v \in V_1$ is matched against the SPG of every other node in $V_1$. If $v$ is matched to a node already in $V_1$, these two nodes are connected by an edge in $G_1$.

Finally, we define the weight of shape part $p$ as the cardinality of the maximum clique of the final graph $G_1$, built from the nodes that $p$ matches with. Larger cliques indicate good, representative parts for matching. A byproduct of this weighting scheme is that spurious outlier parses of a shape tend to generate smaller cliques, and thus receive poor weights. This encourages the matching algorithm to select one of the other $k-1$ parses of the shape for matching.

Although finding the maximum clique of a graph is NP-hard, we are more concerned with which graphs have larger maximum cliques than others. This can be approximated by representing the nodes in a graph as a bit string and randomly permuting the bits to generate subgraphs. It is easy to test whether these subgraphs form a clique. Running this for the same number of iterations on every generated graph gives a good approximation for the weights, and generates favourable results in practice.
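The clique-based part weight can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not say exactly how permuting the bits generates subgraphs, so the sketch assumes a randomized greedy reading in which each iteration visits the nodes of $G_1$ in a random order and keeps a node only if it is adjacent to every node kept so far. Adjacency is stored as one integer bitmask per node, in the spirit of the bit-string representation; the function names `adjacency_bitmasks` and `approx_max_clique_size` are hypothetical. Every pass returns a valid clique, so the estimate is a lower bound on the true maximum-clique size, and using the same iteration count for every graph keeps the resulting weights comparable across parts.

```python
import random


def adjacency_bitmasks(n: int, edges: list[tuple[int, int]]) -> list[int]:
    """Encode an undirected graph on nodes 0..n-1 as one bitmask per node.

    Bit u of adj[v] is set when the edge (v, u) exists (hypothetical encoding).
    """
    adj = [0] * n
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u
    return adj


def approx_max_clique_size(adj: list[int], iterations: int = 200, seed: int = 0) -> int:
    """Estimate the maximum clique size with randomized greedy passes.

    Each pass shuffles the node order and grows a clique by accepting a node
    only if it is adjacent to every node accepted so far. The best pass is a
    lower bound on the true maximum clique size.
    """
    rng = random.Random(seed)
    nodes = list(range(len(adj)))
    best = 0
    for _ in range(iterations):
        rng.shuffle(nodes)
        clique = 0  # bitmask of the nodes accepted in this pass
        size = 0
        for v in nodes:
            # v may join only if all accepted nodes are among its neighbours
            if (clique & adj[v]) == clique:
                clique |= 1 << v
                size += 1
        best = max(best, size)
    return best


# Hypothetical G_1 for a part p (node 0): p matched parts 1-4, parts 1, 2 and 3
# also matched one another, and part 4 matched only p.
adj = adjacency_bitmasks(5, [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3)])
weight = approx_max_clique_size(adj)
print(weight)  # 4
```

In this toy graph the representative parts 0 to 3 form a clique of size 4, while the poorly matched part 4 can only ever join a clique of size 2, which is the behaviour the weighting relies on to demote outlier parses.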
4 Shape Matching Experiments

Our object recognition database consists of 1065 different images, spanning 43 different object classes [7]. Intra-class object views have high variation, including different scales, rotations, views, and occlusions. We test recognition performance by taking a single query view out of the database and computing its SPGs. Then, the $k$ most probable SPGs are retrieved and a graph matching algorithm [2] is used to compute the dissimilarity between the query SPGs and all SPGs in the database. The descriptor used to match shape parts is shape contexts [1]. The result of the graph matching is a distance between the two graphs. The final value given to an SPG match can be denoted $v = f(\delta, \omega)$, a function of the graph distance $\delta$ and of the weights $\omega$ obtained by computing the maximum cliques on our database shapes. After $v$ is computed between the query and all other SPGs, the scores are ranked in increasing order. If the class of the top-ranked SPG matches that of the query, recognition is deemed successful.

Figure 3. Object recognition results showing the accuracy of matching SPGs on a 1065-view, 43-class database. Considering how common a specific shape part is within its class (blue) achieves measurably better matching accuracy. On average, we see a 5% accuracy improvement over unweighted matching (green).
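The evaluation protocol above, combined with the $k$-most-probable-parse matching of Section 2.3, can be sketched as follows. This is a hypothetical illustration rather than the authors' code: the graph matcher of [2], the clique weights and the combination function $f$ are passed in as callables because the text does not specify them, and a parse is assumed to be a `(probability, spg)` pair. The names `top_k`, `classify` and `Parse` are illustrative. Each shape keeps only its $k$ most probable parses, every pairing of a query parse with a model parse is scored with $v = f(\delta, \omega)$, and the class of the lowest-scoring (top-ranked) model SPG is returned.

```python
import heapq
from typing import Callable, Optional, Sequence

Parse = tuple[float, object]  # (probability, SPG); the SPG type is left abstract


def top_k(parses: Sequence[Parse], k: int) -> list[Parse]:
    """Return the k most probable parses of one shape, most probable first."""
    return heapq.nlargest(k, parses, key=lambda parse: parse[0])


def classify(
    query_parses: Sequence[Parse],
    models: Sequence[tuple[str, Sequence[Parse]]],
    k: int,
    delta: Callable[[object, object], float],  # graph distance between two SPGs
    omega: Callable[[object], float],          # clique-based weight of a model SPG
    f: Callable[[float, float], float],        # combines delta and omega into v
) -> Optional[str]:
    """Score every pairing of the query's and each model's k most probable
    parses with v = f(delta, omega) and return the class of the model SPG
    with the smallest v (the top of the increasing ranking)."""
    best_label: Optional[str] = None
    best_v = float("inf")
    query_top = top_k(query_parses, k)
    for label, parses in models:
        for _, m in top_k(parses, k):
            for _, q in query_top:
                v = f(delta(q, m), omega(m))
                if v < best_v:
                    best_label, best_v = label, v
    return best_label


# Toy usage: SPGs are plain numbers, the distance is their absolute difference,
# and f ignores the weights (unweighted matching).
query = [(0.6, 1.0), (0.3, 5.0), (0.1, 9.0)]
models = [("dog", [(0.7, 5.2), (0.3, 0.0)]), ("cat", [(0.9, 1.5)])]
toy = dict(delta=lambda q, m: abs(q - m), omega=lambda m: 0.0, f=lambda d, w: d)
print(classify(query, models, k=1, **toy))  # cat
print(classify(query, models, k=2, **toy))  # dog
```

In the toy data, the query's most probable parse lies closest to the cat model, but its second parse nearly coincides with the dog model's best parse, so raising $k$ from 1 to 2 changes the decision. A weighted variant would supply the clique weights through `omega` and an `f` that uses them; the exact form of $f$ is not given in the text, so any concrete choice is an assumption.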
In Figure 3, the impact of considering not only the graph distance $\delta$, but also the maximum-clique weights $\omega$, is outlined. When matching against the 4 most probable SPGs for each model in the database, we achieved just over 90% recognition accuracy. However, this result is boosted to 95% when the maximum-clique weights $\omega$ are accounted for. We have observed similar improvements at all tested values of $k$.

Finally, we evaluate the unweighted recognition performance on our data set for different values of $k$, which gives the following results:

k                      1       2       3       4       5
Unweighted accuracy    82.3%   89.2%   90.7%   90.9%   91.9%

This showcases the power of considering more than the single most probable parse.

5 Conclusions

We have proposed a model for parsing shapes that allows for different parameterizations and measures of shape similarity, based on arbitrary representations of shape parts. The main requirement for instantiating the model is a parameterization under which the desired parses of a shape are within its $k$ most probable parses, a requirement made lenient by our proposal of keeping not only the single most probable parse of a shape, but the $k$ most probable ones. Experiments on a challenging dataset showed that we are able to achieve 95% accuracy with a simple parameterization. We expect that more sophisticated models for selecting shape cuts will yield even better performance. Investigating these more elaborate models is a topic of future work.

References

[1] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24(4):509–522, 2002.
[2] D. Macrini, S. Dickinson, D. Fleet, and K. Siddiqi. Object categorization using bone graphs. CVIU, 2011.
[3] P. Dimitrov, C. Phillips, and K. Siddiqi. Robust and efficient skeletal graphs. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, SC, June 2000.
[4] R. Juengling and L. Prasad. Parsing silhouettes without boundary curvature. In ICIAP, pages 665–670, 2007.
[5] X. Mi and D. DeCarlo. Separating parts from 2D shapes using relatability. In ICCV, 2007.
[6] H. Rom and G. Medioni. Hierarchical decomposition and axial shape description. PAMI, 15(10):973–981, 1993.
[7] T. B. Sebastian, P. N. Klein, and B. B. Kimia. Shock-based indexing into large shape databases. In ECCV, pages 731–746, 2002.
[8] M. Singh, G. Seyranian, and D. Hoffman. Parsing silhouettes: The short-cut rule. Perception and Psychophysics, 61(4):636–660, 1999.
Source code dmac/spg