Describing Texture Directions with Von Mises Distributions





Costantino Grana, Daniele Borghesani, Rita Cucchiara

Dipartimento di Ingegneria dell’Informazione

Università degli Studi di Modena e Reggio Emilia

name.surname@unimore.it



Abstract


A new approach for document analysis is proposed. The goal is to find a coherent way to describe texture within generic documents, in order to classify text (mainly), background and images. Autocorrelation matrices are computed, and an elegant description through a mixture of Von Mises distributions is implemented.

1. Introduction

Document image analysis has a long history in pattern recognition. Several techniques have been proposed for content and layout segmentation to provide the basis for semantic annotation, classification and retrieval. When these are applied to a large collection of digital documents, both the accuracy of the analysis and the computational effort required are significant. A specific use case is that of ancient books or illuminated manuscripts, which cannot be flipped through by the public due to their value and delicacy. Computer science has the power to fill the gap between people and all these precious libraries of masterpieces: digital versions of the artistic works can be made publicly accessible, either locally or remotely, giving users the freedom to choose their own way to navigate and enjoy them.

The quality of images of illuminated manuscripts heavily depends on the way they have been acquired and on the preservation status of the work. Small rotations or scaling can occur, pages can be spoiled, and grayscale or low quality acquisition is also possible, generally resulting in a set of noisy textures. Moreover, different manuscripts have different contents and layout. For all these reasons, a simple approach based on color, shape or layout would not be effective enough for a large scale implementation. In this paper, a flexible approach to the problem is proposed, based on texture analysis.

Images are analyzed by blocks through autocorrelation, and a direction histogram is computed for each block. We then describe it with a mixture of Von Mises distributions (MoVM), a statistical formulation that is more suitable for angular data than a mixture of Gaussians. We implemented an EM algorithm for parameter extraction and finally exploited a Support Vector Machine (SVM) for block classification. The goal is to provide a very fast and compact representation for each class, in order to make the retrieval as fast and effective as possible.

The test set used for our experiments is composed of illuminated manuscripts provided by Franco Cosimo Panini S.P.A., precisely the photo collection of The Borso D’Este Holy Bible.

2. Related works

Document segmentation is normally based on partitioning the image into blocks, followed by texture analysis. Several works on text segmentation have been proposed: a clustering approach is presented in [1], while in [2] a classification using Gabor filters is used. A comprehensive survey is provided by Busch et al. [3], exploring and comparing several techniques and situations. More general approaches, dealing also with background and picture segmentation, have been proposed. Some works exploit geometric constraints over the layout: for a literature survey, please refer to [4]. Many others compute specific descriptors followed by classification: an example is provided by [5] with hidden tree Markov models.

The majority of works have been developed for printed documents, while only a limited set of works has been carried out on illuminated manuscripts. A reference paper is the description of the DEBORA system [6], a complete system for the analysis of Renaissance books. In [7] an effective technique for texture characterization in old books has been proposed, exploiting the autocorrelation matrix in order to extract the relevant directions within the texture (called directional rose). In our work, we extended this approach by formulating a more elegant description of the different directions within the block with the use of mixtures of Von Mises distributions.

3. Texture analysis

Documents are mainly characterized by three kinds of texture: a noisy background, the text, and colored images or decorations (Fig. 1). The quality of these textures heavily depends on the way the image has been acquired. Moreover, each document can have various contents and layout. For all these reasons, a simple approach based on color, shape or layout would not be effective enough for a large scale implementation. A flexible approach to the problem is necessary, so we chose to look at the texture structure using a well known texture feature, the autocorrelation matrix.

Autocorrelation is a very powerful and discriminative feature in our context because textual textures have a pronounced orientation that differs markedly from that of background, pictures or decorations. The autocorrelation function is a typical signal processing technique. Formally, it is the cross-correlation of a signal with itself, and it measures the similarity between the signal and a shifted copy of itself. Applied to a grayscale image, it produces a centrally symmetric matrix that gives an idea of how regular the texture is.

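The image is divided into square blocks whose size $bs$ must be set according to the scale at which the texture should be analyzed. The definition of the autocorrelation for a block is:

$$C(k,l) = \sum_{x}\sum_{y} I(x,y)\, I(x+k,\,y+l) \qquad (1)$$

where $I(x,y)$ is the gray level at pixel $(x,y)$, the sums run over the pixels of the block, and $l$ and $k$ are the displacements, whose range is bounded by the block size.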
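As an illustration of the definition above, the following is a minimal pure-Python sketch of the block autocorrelation. The function name `block_autocorrelation`, the offset range of plus or minus half the block size, and the choice of treating pixels outside the block as zero are all assumptions made for this example, not details confirmed by the text.

```python
def block_autocorrelation(block):
    """Return a dict mapping offsets (k, l) to sum of I(x, y) * I(x + k, y + l).

    `block` is a square list of lists of gray levels, indexed as block[x][y].
    Offsets run over [-bs/2, bs/2] (assumed range); pixels that fall outside
    the block are treated as zero (assumed boundary handling).
    """
    bs = len(block)
    half = bs // 2
    corr = {}
    for k in range(-half, half + 1):
        for l in range(-half, half + 1):
            total = 0.0
            for x in range(bs):
                for y in range(bs):
                    xs, ys = x + k, y + l
                    if 0 <= xs < bs and 0 <= ys < bs:
                        total += block[x][y] * block[xs][ys]
            corr[(k, l)] = total
    return corr


# Toy block with stripes running along the second index.
stripes = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
corr = block_autocorrelation(stripes)
```

With this direct form the cost grows with the fourth power of the block size; an FFT-based computation would be the usual choice for large blocks. On the striped toy block, the value for a shift along the stripes is high while the value for a one-pixel shift across them is zero, which is the orientation cue the method relies on.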

The result of the autocorrelation can be analyzed by extracting an estimate of the relevant directions within the texture (a similar approach has been proposed in [7]). Each angle determines a direction, and the sum of all the pixels along each direction is computed to form a polar representation of the autocorrelation matrix, called the direction histogram. In this way, each direction is characterized by a weight indicating its importance within the block:

$$w(\theta) = \sum_{r} C(r\cos\theta,\ r\sin\theta), \qquad (2)$$

where $C$ denotes the autocorrelation matrix of the block.

Since the autocorrelation matrix has a central symmetry by definition, we consider only the first half of the direction histogram, from 0° to 179°. $\theta$ and $r$ are quantized: the step of $\theta$ is set to 1 degree (so we obtain 180 values, representing 180 possible directions), while the range of $r$ is defined in terms of the block size. A text block will be characterized by peaks around 0° and 180° because the dominant direction is horizontal, and this behavior is different from image textures (described by a generic monomodal or multimodal distribution) and background texture (described by a nearly uniform flat distribution).
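The construction of the direction histogram can be sketched as follows. This is only an illustration: the function name `direction_histogram`, the parameter `max_radius`, the nearest-integer rounding of the sampled offsets, and the dictionary layout of the autocorrelation values are assumptions of this example.

```python
import math


def direction_histogram(corr, max_radius, n_bins=180):
    """Sum autocorrelation values along each quantized direction.

    `corr` maps integer offsets (k, l) to autocorrelation values. For each
    angle (1 degree steps by default) the values met at radii 1..max_radius
    are accumulated; offsets are rounded to the nearest integer (assumption).
    """
    hist = []
    for deg in range(n_bins):
        theta = math.radians(deg)
        total = 0.0
        for r in range(1, max_radius + 1):
            k = round(r * math.cos(theta))
            l = round(r * math.sin(theta))
            total += corr.get((k, l), 0.0)
        hist.append(total)
    return hist


# Synthetic autocorrelation that is non-zero only along the l == 0 axis.
toy_corr = {(k, l): (1.0 if l == 0 else 0.0)
            for k in range(-4, 5) for l in range(-4, 5)}
toy_hist = direction_histogram(toy_corr, max_radius=4)
```

For the synthetic input all the weight falls in the 0° bin and none in the 90° bin, mirroring the peak that a text block is expected to show along its dominant direction.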

4. Texture characterization

The polar distribution obtained by autocorrelation in the previous steps can be easily modeled using Von Mises distributions. Gaussian distributions are inappropriate to model periodic datasets: setting the origin at 0°, elements on either side of this angle would be assigned to two distinct directions, even if they express almost the same one. The choice of the origin becomes critical, and for this reason could be a weak point of the fit. A Von Mises distribution, instead, is defined on the circle, so it can correctly represent angular datasets. The probability density function is defined as follows:

$$V(\theta \mid \theta_0, m) = \frac{1}{2\pi I_0(m)}\, e^{\,m\cos(\theta-\theta_0)}. \qquad (3)$$




Figure 1. Example of illuminated manuscripts and relative ground truth. White identifies background, red identifies text and blue identifies pictures or decorations.


The parameter $m$ denotes how concentrated the distribution is around the mean angle $\theta_0$. In our context, we used a slightly different formulation (we simply multiply the angles by 2) with a periodicity of $\pi$ instead of $2\pi$, considering only angles in $[0, \pi)$, which are representative of valuable and meaningful directions. $I_0$ is the modified Bessel function of order 0, defined as:

$$I_0(m) = \frac{1}{2\pi}\int_0^{2\pi} e^{\,m\cos\theta}\, d\theta. \qquad (4)$$
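The variant with doubled angles can be sketched in a few lines. This is a minimal illustration under stated assumptions: the names `bessel_i0` and `von_mises_pi` are hypothetical, the Bessel function is evaluated with its power series rather than with any routine used by the authors, and the normalization constant $1/(\pi I_0(m))$ is the one that makes the doubled-angle density integrate to 1 over $[0, \pi)$.

```python
import math


def bessel_i0(m, terms=60):
    """Modified Bessel function of order 0 via its power series."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (m * m / 4.0) / ((j + 1) ** 2)
    return total


def von_mises_pi(theta, theta0, m):
    """Von Mises density with period pi, obtained by doubling the angles."""
    return math.exp(m * math.cos(2.0 * (theta - theta0))) / (math.pi * bessel_i0(m))


# Numerical check that the density integrates to 1 over [0, pi).
n = 1800
area = sum(von_mises_pi(i * math.pi / n, 0.4, 3.0) for i in range(n)) * math.pi / n
```

Because the density has period $\pi$, directions at 1° and 179° are equally close to a mean of 0°, which is exactly the wrap-around behavior a Gaussian fit cannot provide.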

To capture the general multimodal behavior of the input datasets, we chose a mixture of Von Mises distributions. We used mixtures with only 2 components, because they proved to be sufficient to recognize the two most meaningful directions (horizontal and vertical) while keeping the computational cost affordable. An example of fitting for the three types of texture analyzed is shown in Fig. 2.

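Generally, a mixture of $K$ Von Mises distributions is defined as follows:

$$MoVM(\theta) = \sum_{k=1}^{K} \pi_k\, V(\theta \mid \theta_{0,k}, m_k), \qquad (5)$$

where $V(\theta \mid \theta_{0,k}, m_k)$ is the Von Mises density with mean angle $\theta_{0,k}$ and concentration $m_k$, and $\pi_k$ represents the weight of the $k$-th distribution within the mixture. A standard way to get the maximum likelihood estimates of the mixture parameters is the Expectation-Maximization algorithm [8]. In the E step the expected values of the likelihood are computed; then a set of parameters that maximizes such values is obtained, and the process is repeated until convergence or until a maximum number of iterations is reached. To maximize the likelihood, a set of responsibilities of the bins for each Von Mises is necessary. Let $i$ be the index of the bin. The responsibilities are computed as follows:

$$\gamma_{ik} = \frac{\pi_k\, V(\theta_i \mid \theta_{0,k}, m_k)}{\sum_{s=1}^{K} \pi_s\, V(\theta_i \mid \theta_{0,s}, m_s)}. \qquad (6)$$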

A new set of weights for the Von Mises of the mixture can now be computed:

$$\pi_k = \frac{\sum_i w_i\, \gamma_{ik}}{\sum_i w_i}, \qquad (7)$$

where $\gamma_{ik}$ is the responsibility of component $k$ for bin $i$ and $w_i$ is the direction histogram weight of that bin. This formulation differs from the one in [8], and the motivation lies in the dataset we used: we do not have a general distribution of angular data to fit, but a sampling of directions and relative weights. For this reason, we consider the weight as a multiplier value for each angle, so formally we have $w_i$ times the angle $\theta_i$ in our dataset.

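In the M step, we compute the new $\theta_0$ and $m$ values for each Von Mises within the mixture. In particular, $\theta_{0,k}$ is computed by maximization of the relative likelihood as follows:

$$\theta_{0,k} = \frac{1}{2}\arctan\!\left(\frac{\sum_i w_i\,\gamma_{ik}\,\sin 2\theta_i}{\sum_i w_i\,\gamma_{ik}\,\cos 2\theta_i}\right), \qquad (8)$$

where $w_i$ is the weight of bin $i$ and $\gamma_{ik}$ is the responsibility of component $k$ for it. Note the multiplication by 2 in order to relate to a $\pi$ periodicity. The retrieval of $m$ by maximization is a bit more complicated, due to the presence of the Bessel functions. Given the derivative of the modified Bessel function, defined as:

$$\frac{d}{dm} I_0(m) = I_1(m), \qquad (9)$$

the problem can be solved mathematically using this formulation:

$$A(m_k) = \frac{I_1(m_k)}{I_0(m_k)} = \frac{\sum_i w_i\,\gamma_{ik}\,\cos\big(2(\theta_i-\theta_{0,k})\big)}{\sum_i w_i\,\gamma_{ik}}. \qquad (11)$$

The value of $m_k$ can be found by the numerical inversion of $A(m_k)$. In particular we use the approximation proposed in [9].

Figure 2. Example of directional histograms and the corresponding fitting with Von Mises mixtures.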
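The E and M steps described above can be put together in a compact weighted EM loop. The sketch below is an illustration, not the authors' implementation: the names `fit_movm`, `vm_pdf` and `bessel_i0` are hypothetical, the initial values are arbitrary, and the inversion of the Bessel ratio uses the common closed-form approximation $m \approx R(2 - R^2)/(1 - R^2)$ instead of the method of [9]. The mean angle is computed with `atan2` so that the quadrant is resolved before halving.

```python
import math


def bessel_i0(m, terms=60):
    """Modified Bessel function of order 0 via its power series."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (m * m / 4.0) / ((j + 1) ** 2)
    return total


def vm_pdf(theta, theta0, m):
    """Von Mises density with period pi (angles doubled)."""
    return math.exp(m * math.cos(2.0 * (theta - theta0))) / (math.pi * bessel_i0(m))


def fit_movm(thetas, weights, init_means, iters=50, m_max=30.0):
    """Weighted EM for a mixture of pi-periodic Von Mises components.

    `thetas` are bin angles in radians, `weights` the histogram values.
    Returns (mixture weights, mean angles, concentrations).
    """
    K = len(init_means)
    pis = [1.0 / K] * K
    means = list(init_means)
    ms = [1.0] * K
    w_sum = sum(weights)
    for _ in range(iters):
        # E step: responsibility of each component for each bin.
        resp = []
        for t in thetas:
            p = [pis[k] * vm_pdf(t, means[k], ms[k]) for k in range(K)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M step: every bin counts w_i times.
        for k in range(K):
            wk = [w * r[k] for w, r in zip(weights, resp)]
            nk = sum(wk)
            if nk <= 0.0:
                continue
            pis[k] = nk / w_sum
            s2 = sum(w * math.sin(2.0 * t) for w, t in zip(wk, thetas))
            c2 = sum(w * math.cos(2.0 * t) for w, t in zip(wk, thetas))
            means[k] = (0.5 * math.atan2(s2, c2)) % math.pi
            # Mean resultant length equals the Bessel ratio at the optimum.
            R = min(math.hypot(s2, c2) / nk, 0.999999)
            ms[k] = min(R * (2.0 - R * R) / (1.0 - R * R), m_max)
    return pis, means, ms


# Synthetic histogram: strong peak at 0 degrees, weaker peak at 90 degrees.
thetas = [math.radians(d) for d in range(180)]
weights = [10.0 * vm_pdf(t, 0.0, 8.0) + 3.0 * vm_pdf(t, math.pi / 2, 8.0)
           for t in thetas]
pis, means, ms = fit_movm(thetas, weights, [0.1, 1.4])
```

The six returned numbers (two weights, two mean angles, two concentrations) are the compact block descriptor; on the synthetic histogram the heavier component settles on the horizontal direction and the lighter one on the vertical direction.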

At this point, we have 6 parameters to work with: $\pi$, $\theta_0$, and $m$ of both Von Mises. This represents a very consistent and compact way to describe a whole distribution, making the retrieval faster and more effective.

The similarity between two Von Mises distributions can be defined using the Bhattacharyya distance. Given two Von Mises distributions $V_1$ and $V_2$, the formulation is shown in Eq. 10. No explicit form is available for mixtures, so we propose a new metric that also takes into account the relative weights of the components of the mixture. Given two mixture distributions, we test the Bhattacharyya distance between the two components of one distribution and the two of the other, select the best matching pair (call them $b$, and the other two $o$) and then measure the distance as:

(12)

where

(13)

This metric takes into account the fact that two components can be very similar while their contribution to the mixtures is quite low.
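For a single pair of components, the Bhattacharyya coefficient of two Von Mises densities has a closed form, because the product of the square roots is again proportional to a Von Mises kernel. The sketch below evaluates it for the doubled-angle variant. The function names are hypothetical, and turning the coefficient into a distance as $\sqrt{1 - BC}$ is one common choice assumed here for illustration; it is not claimed to match the exact form of Eq. 10, and the mixture metric of Eqs. 12 and 13 is not reproduced.

```python
import math


def bessel_i0(m, terms=60):
    """Modified Bessel function of order 0 via its power series."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (m * m / 4.0) / ((j + 1) ** 2)
    return total


def bhattacharyya_coefficient(theta1, m1, theta2, m2):
    """Overlap of two pi-periodic Von Mises densities (1.0 means identical)."""
    squared = m1 * m1 + m2 * m2 + 2.0 * m1 * m2 * math.cos(2.0 * (theta1 - theta2))
    resultant = math.sqrt(max(0.0, squared))
    return bessel_i0(resultant / 2.0) / math.sqrt(bessel_i0(m1) * bessel_i0(m2))


def bhattacharyya_distance(theta1, m1, theta2, m2):
    """Assumed distance form: sqrt(1 - coefficient)."""
    bc = bhattacharyya_coefficient(theta1, m1, theta2, m2)
    return math.sqrt(max(0.0, 1.0 - bc))


same = bhattacharyya_coefficient(0.3, 4.0, 0.3, 4.0)
orthogonal = bhattacharyya_coefficient(0.0, 4.0, math.pi / 2, 4.0)
```

Two identical components give a coefficient of 1 and a distance of 0, while a horizontal and a vertical component with the same concentration overlap very little.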

5. Experimental results

Tests have been performed using 20 uncompressed 24-bit high definition pictures of biblical illustrated manuscripts, provided in high resolution (8373x6039) at 400 dpi. For each image, a ground truth has been manually annotated, focusing on the three main characteristics of these images: text, images and background. Images have been divided using different fixed window sizes, then a suitable single window size has been chosen. For each window (block) over the image, the direction histogram has been computed. A preprocessing stage is necessary to highlight the real shape of the distribution, regardless of the numerical values assumed by the bins. A simple normalization approach has the major drawback of exaggerating the shape, and thus the noise, in low variability data.

In order to represent the data distribution coherently with the real shape of the distribution, we chose to change the baseline reference of the values, subtracting from each bin the minimum of the entire block. In this way, a noisy block like the background will show a nearly flat distribution with a very small variance. A text block, instead, with a dominant direction along the horizontal axis, will continue to show a coherent distribution, with peaks near 0° and 180°.
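The baseline change amounts to a single subtraction per bin. A minimal sketch follows; the function name `subtract_baseline` and the sample values are illustrative only.

```python
def subtract_baseline(hist):
    """Shift a direction histogram so that its smallest bin becomes zero."""
    base = min(hist)
    return [value - base for value in hist]


# A nearly flat, background-like histogram and a peaked, text-like one.
flat = subtract_baseline([1000.0, 1002.0, 1001.0, 1000.0])
peaked = subtract_baseline([1500.0, 1010.0, 1000.0, 1490.0])
```

After the shift the flat histogram keeps only its small fluctuations, while the peaked one keeps its peaks at full height, so the two remain easy to tell apart without amplifying the noise of the flat one.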


Moreover, we would like to keep the same scale in all blocks, so that the shapes can be compared. For this reason the direction histogram bins have been quantized to a fixed step. The normalized distribution has been used to fit a mixture of Von Mises distributions. In our experiments, we tested mixtures with different numbers of Von Mises distributions, and we observed that even a very limited set is sufficient to produce good retrieval results in terms of precision and recall, without affecting the computational time too much.

To perform a first evaluation of the proposed characterization, a confusion matrix has been computed using a 1-nearest neighbor approach. The training set was analyzed, then a test set was classified: for each block within the test set, the class of the most similar block within the training set has been chosen as the classification of the block. The results are shown in Table 1. Results were quite promising: this feature has a good discriminative power with all three kinds of texture used.




(10)


             text   background   image   recall
text          431        5         54    0.879592
background      7      446        172    0.7136
image          27       63        631    0.875173

Table 1. Confusion matrix relative to a 1-nearest neighbor classification.

             recall     precision
text         0.931183   0.911579
background   0.854086   0.87976
image        0.826138   0.898477

Table 2. SVM classification using radial basis function as kernel.
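The recall column of the confusion matrix can be re-derived from the raw counts, which also confirms how the matrix should be read. The sketch below assumes that rows are the ground truth classes and columns the assigned classes; this orientation is an assumption, supported by the fact that it reproduces the reported recall values. The names `confusion` and `recall` are illustrative.

```python
# Counts copied from the confusion matrix; outer key = ground truth class
# (assumed), inner key = class assigned by the 1-nearest neighbor rule.
confusion = {
    "text":       {"text": 431, "background": 5,   "image": 54},
    "background": {"text": 7,   "background": 446, "image": 172},
    "image":      {"text": 27,  "background": 63,  "image": 631},
}


def recall(matrix, label):
    """Fraction of blocks of class `label` that were assigned to `label`."""
    row = matrix[label]
    return row[label] / sum(row.values())


recalls = {label: recall(confusion, label) for label in confusion}
```

For example, the text class gives 431 / (431 + 5 + 54), which agrees with the tabulated recall, and the background row shows that most of its errors are blocks assigned to the image class.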


In order to produce a more generic classifier, we implemented an SVM classification. The best results (in terms of recall and precision) have been obtained exploiting the radial basis kernel. The results of this classification are shown in Table 2, and a visual representation of the classification is shown in Fig. 3.

The text is the main focus of this feature, because of its typical horizontal orientation, and these experiments reveal a good discriminative power for this kind of texture. The background also has very peculiar characteristics that make this feature suitable for retrieval: a flat distribution is very different from the monomodal distribution of the text. This feature, instead, is not designed to be effective for image classification: in practice, pictures present a generic multimodal distribution, and only occasionally show a particular symmetry, which is not representative of the entire class. For this reason, the good classification results in this case have to be considered a side effect of the limited number of Von Mises within the mixture: the algorithm tends to describe every image with a bimodal distribution (horizontal and vertical orientations), which has proved to be sufficiently distinct from the other two classes.

6. Conclusions

In this paper a new technique for document analysis is presented, characterizing each texture with a 6-element feature vector. The autocorrelation computation and the fitting of the direction histogram with the mixture of Von Mises distributions have a reasonable processing time for this application: a high resolution page with more than 1800 blocks can be processed in about 100 seconds on a standard PC. The result of the feature extraction provides a very compact description of blocks. For this reason, these features can be easily exploited by a content-based image retrieval system: the computational time needed for comparison and searching will be extremely low.

Finally, we thank Franco Cosimo Panini S.P.A., which gave us the possibility to analyze an invaluable piece of art such as The Borso D’Este Holy Bible for this work.

References


[1] Bres, S.; Eglin, W.; Gagneux, A., "Unsupervised clustering of text entities in heterogeneous grey level documents," Pattern Recognition, 2002. Proceedings. 16th International Conference on, vol. 3, pp. 224-227, 2002.

[2] A. K. Jain and S. Bhattacharjee, "Text segmentation using Gabor filters for automatic document processing," Machine Vision and Applications, vol. 5, no. 3, pp. 169-184, 1992.

[3] Busch, A.; Boles, W.W.; Sridharan, S., "Texture for script identification," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 11, pp. 1720-1732, Nov. 2005.

[4] Mao, S.; Rosenfeld, A.; Kanungo, T., "Document structure analysis algorithms: a literature survey," Proc. SPIE Electronic Imaging 5010, pp. 197-207, 2003.

[5] Diligenti, M.; Frasconi, P.; Gori, M., "Hidden tree Markov models for document image classification," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, no. 4, pp. 519-523, April 2003.

[6] F. Le Bourgeois, H. Emptoz, "DEBORA: Digital AccEss to BOoks of the RenAissance," International Journal on Document Analysis and Recognition, vol. 9, n. 2-4, pp. 193-221, 2007.

[7] Journet, N.; Eglin, V.; Ramel, J.Y.; Mullot, R., "Dedicated texture based tools for characterisation of old books," Document Image Analysis for Libraries, 2006. DIAL '06. Second International Conference on, pp. 10 pp.-, 27-28 April 2006.

[8] A. Prati, S. Calderara, R. Cucchiara, "Using Circular Statistics for Trajectory Analysis," in Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, Alaska (USA), June 24-26, 2008.

[9] G.W. Hill, "Evaluation and Inversion of the Ratios of Modified Bessel Functions, $I_1(x)/I_0(x)$ and $I_{1.5}(x)/I_{0.5}(x)$," ACM Transactions on Mathematical Software, vol. 7, n. 2, June 1981, pp. 199-208.







Figure 3. Visualization of the proposed classification. A filtering technique has been applied to clean out the results.