21st International Conference on Pattern Recognition (ICPR 2012)

November 11-15, 2012. Tsukuba, Japan

Probabilistic Shape Parsing for View-Based Object Recognition

Diego Macrini¹, Chris Whiten¹, Robert Laganière¹, Michael Greenspan²

¹ University of Ottawa, Ottawa, Canada

² Queen's University, Kingston, Canada

¹ {dmacrin2, cwhit025, laganier}@uottawa.ca, ² michael.greenspan@queensu.ca

Abstract

We present a novel probabilistic model for parsing shapes into several distinguishable parts for accurate shape recognition. This shape parsing is based on robust geometric features that permit high recognition accuracy. Although modelling shapes is an inherently uncertain process, our approach is lenient, in that the desired parse of a shape only needs to be within its k most probable parses. Using this set of shape decompositions, we can improve recognition accuracy even further by determining which parts of a shape are common across most views of objects in the same class.

Figure 1. Example output of the algorithm: (a) Each shape cut (orange segments) joins a concave corner to another concave corner, or to its closest boundary point across selected axes of symmetry. (b) and (c) are the two most probable parses of the shapes in (a), according to a parameterization that encodes preferences for both short and parallel inter-part boundaries.

1 Introduction

A fundamental problem in the representation and matching of shapes is to determine whether or not a shape should be divided into parts, and if so, what those parts should be. Part-based approaches are attractive because they provide a mechanism for comparing both shape contours and part structures, but require expensive matching algorithms to account for similar shapes having markedly different decompositions. Alternatively, whole-contour approaches require simpler matching algorithms, but are only effective when objects have little articulation and within-class variation.

Mi and DeCarlo [5] suggest an approach of decomposing a shape into parts iteratively based on the "stability" of skeleton points to get simple shapes. Other approaches [6, 4, 2] find shape parts using boundary analysis and rules for corner associations, and then iteratively remove parts from the shape to compute the unperturbed medial axis. These approaches all provide a unique part decomposition for each shape, impose hard constraints on which regions of a shape can become parts, and depend on fixed decomposition rules. In addition, having only one representation per shape demands a matching approach that can deal with incorrect choices made during the shape decomposition process, which can lead to large representational differences among similar shapes. Our approach aims to solve these problems.

We propose a novel approach that combines the benefits of both whole-contour and part-based approaches by considering multiple ways of parsing a shape. We achieve this by probabilistically selecting boundaries to cut a shape into parts, where a scenario of no cuts is equivalent to working with the whole shape contour. We do not advocate for a specific parameterization of the model or a specific choice of shape descriptor, and therefore evaluate the approach considering known shape decomposition principles. For example, Figure 1
shows the two most probable parses, (b) and (c), for the shapes in (a) based on a parameterization of the model in which short and parallel inter-part boundaries are more likely than long and/or perpendicular ones.

We recognize that selecting a parsing of a shape by the MAP estimate can lead to unforeseen results due to small perturbations in the shape. To combat this and increase matching accuracy, we propose a matching paradigm where the $k$ most probable parsings of a shape are considered during matching, rather than the single most probable parse. This allows for looser probabilistic encodings and higher within-class variation.

Finally, most classes of objects have some parts that are characteristic of the class. The absence of such parts should make it unlikely that a given shape belongs to the class. We present an algorithm for learning the importance of each shape part according to how characteristic that part is in its object class.

978-4-9906441-1-6 ©2012 IAPR

2 The Shape Parsing Framework

We present a probabilistic model for shape parsing based on the assumption that the inter-part boundaries are a subset of the shape cuts proposed by Singh et al. [8]. We begin by presenting a novel definition of shape cuts, from which a graphical representation of a shape is built. Then, standard graph matching techniques for part-based representations are used to match shapes.

2.1 Shape Cut Identification

The first step in the parsing process involves identifying all shape cuts. These are line segments that connect two boundary points in a silhouette, which can be selected as part boundaries. The proposed cut detection algorithm is not sensitive to boundary curvature, scale, discretization artifacts or the precise location of corners.

Before defining shape cuts, we briefly introduce the two geometric features that define them: concave corners and the medial axis transform (MAT). In simple terms, for a silhouette boundary $B$, a concave corner $C \subset B$ is a maximal interval of connected boundary points whose negative curvatures are below some threshold. We often consider just the single peak boundary point in a concave corner, $\operatorname{peak}(C)$.

The medial axis transform [3] is the union of maximally inscribed discs in a shape. Building on this, a skeletal point is the center of a maximally inscribed disc such that the disc is bitangent to at least 2 points, and a regular skeletal point is the center of such a disc that touches exactly 2 points, $p_i$ and $p_j$. The curves formed by connected regular skeletal points are called skeletal branches. These skeletal branches are used as axes of symmetry, such that the symmetry axis of points $p_i$ and $p_j$ is the skeletal branch containing the center of the maximally inscribed disc bitangent to $p_i$ and $p_j$ (Fig. 2).

We are interested in keeping only one shape cut per corner-symmetry axis pairing. Let the tuple $(p_i, p_j, b)$ denote that the boundary point $p_i$ is symmetric to the boundary point $p_j$ with respect to skeletal branch $b$. Shape cuts are formed by pairing the peak of each concave corner with selected boundary points across different axes of symmetry. Let $W = \{(p_i, p_j, b)\}$ be the set of all possible tuples encoding pairs of symmetric points, such that $p_i$ belongs to some concave corner $C$. Also, let $Z = \{(p_i, p_j)\}$ be the set of all shape cuts, such that for each $(p_i, p_j) \in Z$, there is a concave corner $C$ with $\operatorname{peak}(C) = p_i$, and a symmetry axis $b$ for which one of the following two conditions is satisfied:

(a) for some $(p_i', p_j', b) \in W$ and some concave corner $D$, $p_i' \in C$, $p_j' \in D$, $\operatorname{peak}(D) = p_j$, $p_i < p_j$;

(b) $p_j$ is not a concave corner point, and the distance between points $p_i$ and $p_j$ is shorter than the distance between $p_i'$ and $p_j'$, $\forall\, (p_i', p_j', b) \in W$ and $p_i' \in C$.

That is, shape cut $(i, j)$ connects either the peaks of two concave corners that contain symmetric boundary points, or the peak of corner $C$ and its closest point among all points symmetric to $C$ across axis $b$. Figure 2 illustrates the set of cuts for a sample silhouette.

Note that $(i, j) \in Z$ does not necessarily imply that points $i$ and $j$ are symmetric, because of condition (a) above, where we "snap" to the corner peaks if two concave corners have symmetric boundary points. This is advantageous because it makes the identification of shape cuts less dependent on exact point symmetries. For example, the cut $(3, 14)$ in Figure 2 is formed by two corner peaks that are not strictly symmetric.

2.2 Shape Parse Graphs

It is assumed that the "hidden" boundaries separating a shape into parts correspond to a subset of the shape cuts in $Z$. The uncertainty about which shape cuts are inter-part boundaries is represented by a joint probability distribution over a collection of binary random variables $(X_{ij})_{(i,j) \in Z}$, called cut variables, such that $X_{ij} = 1$ if shape cut $(i, j)$ is an inter-part boundary, and $X_{ij} = 0$ otherwise. Then, each possible shape parse can be expressed by a joint instantiation of the $X_{ij}$.

A given instantiation of the cut variables implies a set of inter-part boundaries, which implicitly yields a set of parts representing a given shape. These boundaries and shape parts give rise to a shape parse graph (SPG), an undirected graph $G_0(V_0, E_0)$
with $V_0 = \{1, \dots, n\}$, representing each maximal shape region with no interior inter-part boundaries, and $E_0 = \{(p_i, p_j)\}$, representing the inter-part boundaries $(p_i, p_j) \in Z$ connecting two adjacent shape parts in $V_0$. Each shape maps to several SPGs, each corresponding to a different joint instantiation of the cut variables. This graph representation permits enforcing spatial constraints between parts when matching SPGs, and casts the recognition problem as graph matching.

Figure 2. LEFT: Concave corner points (colored cyan). Example symmetric points $p_i$ and $p_j$ wrt skeletal point $s$. Subset of skeletal spokes (pink line segments). RIGHT: Shape cuts (orange segments). The shape cut endpoints correspond to peaks of concave corners and/or to nearest symmetric points to a corner.

2.3 Shape Matching Against the k-Most Probable Parses

Given all cut variables for a target shape, it is necessary to decide which to activate to maximize matching accuracy. That is, which shape cuts should be used to break a silhouette into shape parts? We propose using a probabilistic measure to decide which parses of a shape are more probable than others. This probabilistic measure can be described by natural potential functions along connected shape cuts. Although we do not advocate a specific parameterization, an example of this could be "from each cycle of shape cuts, select the most parallel cuts" and "for each maximal cycle formed by the shape cuts, select the shortest cut in that cycle". This is similar to the parameterization used in Figure 1. The cycles we describe are purely in the sense of cycles on graphs, where two shape cuts are connected if they share a node $p_i$.

Once such a parameterization is established, all possible parses of a shape can be ranked by their probabilities. Rather than selecting the single most probable parse and always matching against it, we propose to keep the $k$ most probable parses of each shape, where $k$ is a small integer. Although this increases the size of the search space from $O(n)$ to $O(kn)$, the improvement in matching performance makes this beneficial. By keeping more than the single parsing of a shape, the matching algorithm becomes more resistant to choices of parameterizations where unexpected perturbations in the shape cause an incorrect parse to be ranked as more probable than an expected parse. This outlier is unlikely to match well against other query shapes, but the $k - 1$ other parsings may yield the desired matching.

3 Shape Part Training

The effects of outlier parses can be further minimized by exploiting how common a single part is among all objects of its class. This is achieved by punishing graph matches that exclude shape parts that are "characteristic" of a given class. Each node within each SPG is weighted based on how well it matches other SPGs within the same model class. For each part $p$, all SPGs from all shapes in the same class as $p$ are gathered. We denote this set as $S$. For example, if $p$ corresponds to a shape of a dog, $S$ is the set of all SPGs representing dogs. The SPG of $p$ is then matched to each SPG $s \in S$, using the method described in [2]. Based on these matches, a graph $G_1(V_1, E_1)$ is built, where $V_1$ is the set containing $p$ and all SPG nodes that $p$ matched with, and $E_1$ is the set of edges connecting $p$ to each $v \in V_1$. Then, every $v \in V_1$ is matched against the SPG of every other node in $V_1$. If $v$ is matched to a node already in $V_1$, these two nodes are connected by an edge in $G_1$.

Finally, we define the weight of shape part $p$ as the cardinality of the maximum clique of the final graph $G_1$, built based on the nodes that $p$ matches with. Larger cliques indicate good, representative parts for matching. The byproduct of this weighting scheme is that spurious outlier parses of a shape tend to generate smaller cliques, and will get poor weights. This encourages the matching algorithm to select one of the other $k - 1$ parses of the shape for matching.

Although finding the maximum clique of a graph is NP-hard, we are more concerned with which graphs have larger max-cliques than others. This can be approximated by representing the nodes in a graph as a bit string and randomly permuting the bits to generate subgraphs. It is easy to test if these subgraphs form a clique. Running this for the same number of iterations on every generated graph gives a good approximation for the weights, and generates favourable results in practice.
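The sampling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `is_clique` and `approx_max_clique_size`, the adjacency-set input format, the default iteration count, and the choice to draw a random number of set bits before shuffling the bit string are all assumptions made for this example.

```python
import random
from itertools import combinations


def is_clique(adj, nodes):
    """Return True if every pair of nodes in `nodes` is adjacent in `adj`."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def approx_max_clique_size(adj, iterations=2000, seed=0):
    """Estimate the maximum clique size of an undirected graph by sampling.

    `adj` maps each node to the set of its neighbours (hypothetical format).
    Each iteration shuffles a bit string with a random number of ones; the
    ones select a node subset, which counts only if it forms a clique.
    The returned value is a lower bound on the true maximum clique size.
    """
    nodes = sorted(adj)
    n = len(nodes)
    if n == 0:
        return 0
    rng = random.Random(seed)
    best = 1  # a single node is trivially a clique
    for _ in range(iterations):
        ones = rng.randint(1, n)              # how many nodes to select
        bits = [1] * ones + [0] * (n - ones)
        rng.shuffle(bits)                     # random permutation of the bits
        subset = [v for v, b in zip(nodes, bits) if b]
        # Only test subsets that would improve the current estimate.
        if len(subset) > best and is_clique(adj, subset):
            best = len(subset)
    return best


if __name__ == "__main__":
    # Toy graph: nodes 0-3 are mutually connected, node 4 hangs off node 0.
    toy = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
    print(approx_max_clique_size(toy))
```

Because the estimate is a lower bound that depends on how many subsets are sampled, weights are only comparable across graphs when the same `iterations` value is used for each, which mirrors the requirement stated in the text.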

4 Shape Matching Experiments

Our object recognition database consists of 1065 different images, spanning 43 different object classes [7]. Intra-class object views have high variation, including different scales, rotations, views, and occlusions. We test recognition performance by taking a single query view out of the database and computing its SPGs. Then, the $k$ most probable SPGs are retrieved and a graph matching algorithm [2] is used to compute the dissimilarity between the query SPGs and all SPGs in the database. The descriptor used to match shape parts is shape contexts [1]. The result of the graph matching is a distance $d$ between the two graphs. The final value given to an SPG matching can be denoted $v = f(d, \omega)$, a function of the graph distance and the weights defined by computing the maximum cliques on our database shapes, $\omega$. After $v$ is computed between the query and all other SPGs, scores are ranked in increasing order. If the class of the top ranked SPG matches the query, then recognition is deemed successful.¹

In Figure 3, the impact of considering not only the graph distance $d$, but also the maximum clique weights $\omega$ is outlined. When matching against the 4-most probable SPGs for each model in the database, we achieved just over 90% recognition accuracy. However, this result is boosted to 95% when the maximum-clique weights $\omega$ are accounted for. We have observed similar improvements at all tested values of $k$.

Figure 3. Object recognition results showing the accuracy of matching SPGs on a 1065 view, 43 class database. Considering how common a specific shape part is within its class (blue) achieves measurably better matching accuracy. On average, we see a 5% accuracy improvement over unweighted matching (green).

Finally, we evaluate the unweighted recognition performance on our data set for different values of $k$, which give the following results. This showcases the power of considering more than the single most probable parse.

k = 1    k = 2    k = 3    k = 4    k = 5
82.3%    89.2%    90.7%    90.9%    91.9%

5 Conclusions

We have proposed a model for parsing shapes that allows for different parameterizations and measures of shape similarity, based on arbitrary representations of shape parts. The main requirement for instantiating the model is to give a parameterization such that the desired parses of a shape are within its $k$ most probable parses, which is a benefit granted by our proposal of keeping not only the single most probable parse of a shape, but the $k$ most probable ones. Experiments on a challenging dataset showed that we are able to achieve 95% accuracy based on a simple parameterization. We expect that more sophisticated models for selecting shape cuts will yield even better performance. Investigating these more elaborate models is a topic of future work.

References

[1] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24(4):509–522, 2002.

[2] D. F. D. Macrini, S. Dickinson and K. Siddiqi. Object categorization using bone graphs. CVIU, 2011.

[3] P. Dimitrov, C. Phillips, and K. Siddiqi. Robust and efficient skeletal graphs. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, SC, June 2000.

[4] R. Juengling and L. Prasad. Parsing silhouettes without boundary curvature. In ICIAP, pages 665–670, 2007.

[5] X. Mi and D. DeCarlo. Separating parts from 2d shapes using relatability. In ICCV, 2007.

[6] H. Rom and G. Medioni. Hierarchical decomposition and axial shape description. PAMI, 15(10):973–981, 1993.

[7] T. B. Sebastian, P. N. Klein, and B. B. Kimia. Shock-based indexing into large shape databases. In ECCV, pages 731–746, 2002.

[8] M. Singh, G. Seyranian, and D. Hoffman. Parsing silhouettes: The short-cut rule. Perception and Psychophysics, 61(4):636–660, 1999.

¹ Source code at www.cs.toronto.edu/~dmac/spg
