21st International Conference on Pattern Recognition (ICPR 2012)
November 11-15, 2012. Tsukuba, Japan
Probabilistic Shape Parsing for View-Based Object Recognition
Diego Macrini¹, Chris Whiten¹, Robert Laganière¹, Michael Greenspan²
¹University of Ottawa, Ottawa, Canada
²Queen’s University, Kingston, Canada
¹{dmacrin2,cwhit025,laganier}@uottawa.ca, ²michael.greenspan@queensu.ca
Abstract

We present a novel probabilistic model for parsing shapes into several distinguishable parts for accurate shape recognition. This shape parsing is based on robust geometric features that permit high recognition accuracy. Although modelling shapes is an inherently uncertain process, our approach is lenient, in that the desired parse of a shape only needs to be within its k most probable parses. Using this set of shape decompositions, we can improve recognition accuracy even further by determining which parts of a shape are common across most views of objects in the same class.

Figure 1. Example output of the algorithm: (a) Each shape cut (orange segments) joins a concave corner to another concave corner, or to its closest boundary point across selected axes of symmetry. (b) and (c) are the two most probable parses of the shapes in (a), according to a parameterization that encodes preferences for both short and parallel inter-part boundaries.

1 Introduction

A fundamental problem in the representation and matching of shapes is to determine whether or not a shape should be divided into parts, and if so, what those parts should be. Part-based approaches are attractive because they provide a mechanism for comparing both shape contours and part structures, but require expensive matching algorithms to account for similar shapes having markedly different decompositions. Alternatively, whole-contour approaches require simpler matching algorithms, but are only effective when objects have little articulation and within-class variation.

Mi and DeCarlo [5] suggest an approach of decomposing a shape into parts iteratively based on the “stability” of skeleton points to get simple shapes. Other approaches [6, 4, 2] find shape parts using boundary analysis and rules for corner associations, and then iteratively remove parts from the shape to compute the unperturbed medial axis. These approaches all provide a unique part decomposition for each shape, impose hard constraints on which regions of a shape can become parts, and depend on fixed decomposition rules. In addition, having only one representation per shape demands a matching approach that can deal with incorrect choices made during the shape decomposition process, which can lead to large representational differences among similar shapes. Our approach aims to solve these problems.

We propose a novel approach that combines the benefits of both whole-contour and part-based approaches by considering multiple ways of parsing a shape. We achieve this by probabilistically selecting boundaries to cut a shape into parts, where a scenario of no cuts is equivalent to working with the whole shape contour. We do not advocate for a specific parameterization of the model or a specific choice of shape descriptor, and therefore evaluate the approach considering known shape decomposition principles. For example, Figure 1
shows the two most probable parses, (b) and (c), for the shapes in (a) based on a parameterization of the model in which short and parallel inter-part boundaries are more likely than long and/or perpendicular ones.

We recognize that selecting a parsing of a shape by the MAP estimate can lead to unforeseen results due to small perturbations in the shape. To combat this and increase matching accuracy, we propose a matching paradigm where the k most probable parsings of a shape are considered during matching, rather than the single most probable parse. This allows for looser probabilistic encodings and higher within-class variation.

Finally, most classes of objects have some parts that are characteristic of the class. The absence of such parts should make it unlikely that a given shape belongs to the class. We present an algorithm for learning the importance of each shape part according to how characteristic that part is in its object class.

978-4-9906441-1-6 ©2012 IAPR

2 The Shape Parsing Framework

We present a probabilistic model for shape parsing based on the assumption that the inter-part boundaries are a subset of the shape cuts proposed by Singh et al. [8]. We begin by presenting a novel definition of shape cuts, from which a graphical representation of a shape is built. Then, standard graph matching techniques for part-based representations are used to match shapes.

2.1 Shape Cut Identification

The first step in the parsing process involves identifying all shape cuts. These are line segments that connect two boundary points in a silhouette, which can be selected as part boundaries. The proposed cut detection algorithm is not sensitive to boundary curvature, scale, discretization artifacts or the precise location of corners.

Before defining shape cuts, we briefly introduce the two geometric features that define them: concave corners and the medial axis transform (MAT). In simple terms, for a silhouette boundary $B$, a concave corner $C \subseteq B$ is a maximal interval of connected boundary points whose negative curvatures are below some threshold. We often consider just the single peak boundary point in a concave corner, $\mathrm{peak}(C)$.

The medial axis transform [3] is the union of maximally inscribed discs in a shape. Building on this, a skeletal point is the center of a maximally inscribed disc that touches the boundary at two or more points, and a regular skeletal point is the center of such a disc that touches exactly two points, $p_i$ and $p_j$. The curves formed by connected regular skeletal points are called skeletal branches. These skeletal branches are used as axes of symmetry, such that the symmetry axis of points $p_i$ and $p_j$ is the skeletal branch containing the center of the maximally inscribed disc bitangent to $p_i$ and $p_j$ (Fig. 2).

We are interested in keeping only one shape cut per corner-symmetry axis pairing. Let the tuple $(p_i, p_j, b)$ denote that the boundary point $p_i$ is symmetric to the boundary point $p_j$ with respect to skeletal branch $b$. Shape cuts are formed by pairing the peak of each concave corner with selected boundary points across different axes of symmetry. Let $W = \{(p_i, p_j, b)\}$ be the set of all possible tuples encoding pairs of symmetric points, such that $p_i$ belongs to some concave corner $C$. Also, let $Z = \{(p_i, p_j)\}$ be the set of all shape cuts, such that for each $(p_i, p_j) \in Z$, there is a concave corner $C$ with $\mathrm{peak}(C) = p_i$, and a symmetry axis $b$ for which one of the following two conditions is satisfied:

(a) for some $(p'_i, p'_j, b) \in W$ and some concave corner $D$, $p'_i \in C$, $p'_j \in D$, $\mathrm{peak}(D) = p_j$, $p_i < p_j$;

(b) $p_j$ is not a concave corner point, and the distance between points $p_i$ and $p_j$ is shorter than the distance between $p'_i$ and $p'_j$, $\forall\, (p'_i, p'_j, b) \in W$ and $p'_i \in C$.

That is, shape cut $(i, j)$ connects either the peaks of two concave corners that contain symmetric boundary points, or the peak of corner $C$ and its closest point among all points symmetric to $C$ across axis $b$. Figure 2 illustrates the set of cuts for a sample silhouette.

Note that $(i, j) \in Z$ does not necessarily imply that points $i$ and $j$ are symmetric, because of condition (a) above, where we “snap” to the corner peaks if two concave corners have symmetric boundary points. This is advantageous because it makes the identification of shape cuts less dependent on exact point symmetries. For example, the cut $(3, 14)$ in Figure 2 is formed by two corner peaks that are not strictly symmetric.

2.2 Shape Parse Graphs

It is assumed that the “hidden” boundaries separating a shape into parts correspond to a subset of the shape cuts in $Z$. The uncertainty about which cuts are inter-part boundaries is represented by a joint probability distribution over a collection of binary random variables $(X_{ij})_{(i,j) \in Z}$, called cut variables, such that $X_{ij} = 1$ if shape cut $(i, j)$ is an inter-part boundary, and $X_{ij} = 0$ otherwise. Then, each possible shape parse can be expressed by a joint instantiation of each $X_{ij}$.

A given instantiation of the cut variables implies a set of inter-part boundaries, which implicitly yields a set of parts representing a given shape. These boundaries and shape parts give rise to a shape parse graph (SPG), an undirected graph $G_0(V_0, E_0)$
with $V_0 = \{1, \ldots, n\}$, representing each maximal shape region with no interior inter-part boundaries, and $E_0 = \{(p_i, p_j)\}$, representing the inter-part boundaries $(p_i, p_j) \in Z$ connecting two adjacent shape parts in $V_0$. Each shape maps to several SPGs, each corresponding to a different joint instantiation of the cut variables. This graph representation permits enforcing spatial constraints between parts when matching SPGs, and casts the recognition problem as graph matching.

Figure 2. LEFT: Concave corner points (colored cyan). Example symmetric points $p_i$ and $p_j$ wrt skeletal point $s$. Subset of skeletal spokes (pink line segments). RIGHT: Shape cuts (orange segments). The shape cut endpoints correspond to peaks of concave corners and/or to nearest symmetric points to a corner.

2.3 Shape Matching Against the k Most Probable Parses

Given all cut variables for a target shape, it is necessary to decide which to activate to maximize matching accuracy. That is, which shape cuts should be used to break a silhouette into shape parts? We propose using a probabilistic measure to decide which parses of a shape are more probable than others. This probabilistic measure can be described by natural potential functions along connected shape cuts. Although we do not advocate a specific parameterization, an example of this could be “from each cycle of shape cuts, select the most parallel cuts” and “for each maximal cycle formed by the shape cuts, select the shortest cut in that cycle”. This is similar to the parameterization used in Figure 1. The cycles we describe are purely in the sense of cycles on graphs, where two shape cuts are connected if they share a node $p_i$.

Once such a parameterization is established, all possible parses of a shape can be ranked by their probabilities. Rather than selecting the single most probable parse and always matching against it, we propose to keep the k most probable parses of each shape, where k is a small integer. Although this increases the size of the search space from $O(n)$ to $O(kn)$, the improvement in matching performance makes this beneficial. By keeping more than the single parsing of a shape, the matching algorithm becomes more resistant to choices of parameterizations where unexpected perturbations in the shape cause an incorrect parse to be ranked as more probable than an expected parse. This outlier is unlikely to match well against other query shapes, but the $k - 1$ other parsings may yield the desired matching.

3 Shape Part Training

The effects of outlier parses can be further minimized by exploiting how common a single part is among all objects of its class. This is achieved by penalizing graph matches that exclude shape parts that are “characteristic” of a given class. Each node within each SPG is weighted based on how well it matches other SPGs within the same model class. For each part $p$, all SPGs from all shapes in the same class as $p$ are gathered. We denote this set as $S$. For example, if $p$ corresponds to a shape of a dog, $S$ is the set of all SPGs representing dogs. The SPG of $p$ is then matched to each SPG $s \in S$, using the method described in [2]. Based on these matches, a graph $G_1(V_1, E_1)$ is built, where $V_1$ is the set containing $p$ and all SPG nodes that $p$ matched with, and $E_1$ is the set of edges connecting $p$ to each $v \in V_1$. Then, every $v \in V_1$ is matched against the SPG of every other node in $V_1$. If $v$ is matched to a node already in $V_1$, these two nodes are connected by an edge in $G_1$.

Finally, we define the weight of shape part $p$ as the cardinality of the maximum clique of the final graph $G_1$, built from the nodes that $p$ matches with. Larger cliques indicate good, representative parts for matching. A byproduct of this weighting scheme is that spurious outlier parses of a shape tend to generate smaller cliques, and will get poor weights. This encourages the matching algorithm to select one of the other $k - 1$ parses of the shape for matching.

Although finding the maximum clique on a graph is NP-complete, we are more concerned with which graphs have larger max-cliques than others. This can be approximated by representing the nodes in a graph as a bit string and randomly permuting the bits to generate subgraphs. It is easy to test if these subgraphs form a clique. Running this for the same number of iterations on every generated graph gives a good approximation for the weights, and generates favourable results in practice.
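The random-sampling approximation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the adjacency-set graph encoding, the function names `is_clique` and `approx_max_clique_size`, the iteration count, and the toy graph `g1` are all hypothetical. Each iteration builds a bit string over the nodes, shuffles it to select a random subgraph, and records the subset size whenever the subset passes the clique test.

```python
import random
from itertools import combinations


def is_clique(nodes, adjacency):
    """Return True if every pair of the given nodes is connected."""
    return all(v in adjacency[u] for u, v in combinations(nodes, 2))


def approx_max_clique_size(adjacency, iterations=2000, seed=0):
    """Estimate the maximum clique size by sampling random node subsets.

    adjacency maps each node to the set of its neighbours (undirected graph).
    The iteration budget and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    n = len(nodes)
    if n == 0:
        return 0
    best = 1  # a single node is trivially a clique
    for _ in range(iterations):
        # Bit string with a random number of ones, then randomly permuted.
        ones = rng.randint(1, n)
        bits = [1] * ones + [0] * (n - ones)
        rng.shuffle(bits)
        subset = [node for node, bit in zip(nodes, bits) if bit]
        # Only test subsets that could improve on the best clique so far.
        if len(subset) > best and is_clique(subset, adjacency):
            best = len(subset)
    return best


# Hypothetical match graph G_1: part "p" plus the nodes it matched with.
g1 = {
    "p": {"a", "b", "c"},
    "a": {"p", "b"},
    "b": {"p", "a"},
    "c": {"p"},
}
weight = approx_max_clique_size(g1)
```

Because only verified cliques are counted, the estimate never exceeds the true maximum clique size. Using the same iteration budget for every graph keeps the resulting weights comparable across parts.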
4 Shape Matching Experiments

Our object recognition database consists of 1065 different images, spanning 43 different object classes [7]. Intra-class object views have high variation, including different scales, rotations, views, and occlusions. We test recognition performance by taking a single query view out of the database and computing its SPGs. Then, the k most probable SPGs are retrieved and a graph matching algorithm [2] is used to compute the dissimilarity between the query SPGs and all SPGs in the database. The descriptor used to match shape parts is shape contexts [1]. The result of the graph matching is a distance $d$ between the two graphs. The final value given to an SPG matching can be denoted $v = f(d, \omega)$, a function of the graph distance $d$ and the weights $\omega$ defined by computing the maximum cliques on our database shapes. After $v$ is computed between the query and all other SPGs, scores are ranked in increasing order. If the class of the top-ranked SPG matches the query, then recognition is deemed successful.¹

In Figure 3, the impact of considering not only the graph distance $d$, but also the maximum clique weights $\omega$, is outlined. When matching against the 4 most probable SPGs for each model in the database, we achieved just over 90% recognition accuracy. However, this result is boosted to 95% when the maximum-clique weights $\omega$ are accounted for. We have observed similar improvements at all tested values of k.

Figure 3. Object recognition results showing the accuracy of matching SPGs on a 1065-view, 43-class database. Considering how common a specific shape part is within its class (blue) achieves measurably better matching accuracy. On average, we see a 5% accuracy improvement over unweighted matching (green).

Finally, we evaluate the unweighted recognition performance on our data set for different values of k, which gives the following results. This showcases the power of considering more than the single most probable parse.

k          1       2       3       4       5
Accuracy   82.3%   89.2%   90.7%   90.9%   91.9%

5 Conclusions

We have proposed a model for parsing shapes that allows for different parameterizations and measures of shape similarity, based on arbitrary representations of shape parts. The main requirement for instantiating the model is to give a parameterization such that the desired parses of a shape are within its k most probable parses, which is a benefit granted by our proposal of keeping not only the single most probable parse of a shape, but the k most probable ones. Experiments on a challenging dataset showed that we are able to achieve 95% accuracy based on a simple parameterization. We expect that more sophisticated models for selecting shape cuts will yield even better performance. Investigating these more elaborate models is a topic of future work.

References

[1] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24(4):509–522, 2002.
[2] D. F. D. Macrini, S. Dickinson, and K. Siddiqi. Object categorization using bone graphs. CVIU, 2011.
[3] P. Dimitrov, C. Phillips, and K. Siddiqi. Robust and efficient skeletal graphs. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, SC, June 2000.
[4] R. Juengling and L. Prasad. Parsing silhouettes without boundary curvature. In ICIAP, pages 665–670, 2007.
[5] X. Mi and D. DeCarlo. Separating parts from 2d shapes using relatability. In ICCV, 2007.
[6] H. Rom and G. Medioni. Hierarchical decomposition and axial shape description. PAMI, 15(10):973–981, 1993.
[7] T. B. Sebastian, P. N. Klein, and B. B. Kimia. Shock-based indexing into large shape databases. In ECCV, pages 731–746, 2002.
[8] M. Singh, G. Seyranian, and D. Hoffman. Parsing silhouettes: The short-cut rule. Perception and Psychophysics, 61(4):636–660, 1999.

¹ Source code at www.cs.toronto.edu/~dmac/spg