Describing Texture Directions with Von Mises Distributions
Costantino Grana, Daniele Borghesani, Rita Cucchiara
Dipartimento di Ingegneria dell’Informazione
Università degli Studi di Modena e Reggio Emilia
name.surname@unimore.it
Abstract
A new approach for document analysis is proposed. The goal is to find a coherent way to describe texture within generic documents, in order to classify text (mainly), background and images. Autocorrelation matrices are computed, and an elegant description through mixtures of Von Mises distributions is implemented.
1. Introduction
Document image analysis has a long history in pattern recognition. Several techniques have been proposed for content and layout segmentation to provide the basis for semantic annotation, classification and retrieval. For deployment over a large collection of digital documents, both the accuracy of the analysis and the computational effort required are significant. A specific use case is that of ancient books or illuminated manuscripts, which cannot be flipped through by the public because of their value and delicacy. Computer science can fill the gap between people and these precious libraries of masterpieces: digital versions of the artistic works can be made publicly accessible, either locally or remotely, giving users the freedom to choose their own way to navigate and enjoy them.
The quality of images of illuminated manuscripts heavily depends on the way they have been acquired and on the preservation status of the work. Small rotations or scaling can occur, pages can be spoiled, and grayscale or low-quality acquisition is also possible, generally resulting in a set of noisy textures. Moreover, different manuscripts have different contents and layouts. For all these reasons, a simple approach based on color, shape or layout would not be effective enough for a large-scale implementation. In this paper, a flexible approach to the problem is proposed, based on texture analysis.
Images are analyzed by blocks through autocorrelation, and a direction histogram is computed for each block. We then describe it with a mixture of Von Mises distributions (MoVM), a statistical formulation that is more suitable for angular data than a mixture of Gaussians. We implemented an EM algorithm for parameter extraction, and finally we used a Support Vector Machine (SVM) for block classification. The goal is to provide a very fast and compact representation for each class, in order to make retrieval as fast and effective as possible.
The test set used for our experiments is composed of illuminated manuscripts provided by Franco Cosimo Panini S.P.A., precisely the photo collection of The Borso D’Este Holy Bible.
2. Related works
Document segmentation is normally based on partitioning the image into blocks, followed by texture analysis. Several works on text segmentation have been proposed: a clustering approach is presented in [1], while in [2] a classification based on Gabor filters is used. A comprehensive survey is given by Busch et al. [3], exploring and comparing several techniques and situations. More general approaches that also deal with background and picture segmentation have been proposed. Some works exploit geometric constraints over the layout: for a literature survey, please refer to [4]. Many others compute specific descriptors followed by classification: an example is provided by [5] with hidden tree Markov models.
The majority of works have been developed for printed documents, while only a limited set of works has been carried out on illuminated manuscripts. A reference paper is the description of the DEBORA system [6], a complete system for the analysis of Renaissance books. In [7] an effective technique for texture characterization in old books is proposed, exploiting the autocorrelation matrix in order to extract the relevant directions within the texture (called directional rose). In our work, we extend this approach by formulating a more elegant description of the different directions within the block, using mixtures of Von Mises distributions.
3. Texture analysis
Documents are mainly characterized by three kinds of textures: a noisy background, the text, and colored images or decorations (Fig. 1). The quality of these textures heavily depends on the way the image has been acquired. Moreover, each document can have various contents and layouts. For all these reasons, a simple approach based on color, shape or layout would not be effective enough for a large-scale implementation. A flexible approach to the problem is necessary, so we chose to look at the texture structure using a well-known texture feature, the autocorrelation matrix. Autocorrelation is a very powerful and discriminative feature in our context because textual textures have a pronounced orientation that differs heavily from that of background, pictures or decorations. The autocorrelation function is a typical signal processing technique. Formally, it is the cross-correlation of a signal with itself, and it measures the similarity between the signal and shifted copies of itself. Applied to a grayscale image, it produces a centrally symmetric matrix that gives an idea of how regular the texture is.
The image is divided into square blocks whose size bs must be set according to the scale at which the texture should be analyzed. The definition of the autocorrelation for a block is:
$$C(k, l) = \sum_{x=0}^{bs-1} \sum_{y=0}^{bs-1} I(x, y)\, I(x + k, y + l) \qquad (1)$$
where l and k are defined in $[-bs/2, bs/2]$.
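The block autocorrelation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `autocorrelation` is hypothetical, the block is a list of rows of gray levels, the offsets range over [-bs/2, bs/2], and pixels falling outside the block are treated as zero (the paper does not state how borders are handled).

```python
def autocorrelation(block):
    """Return C[(k, l)] = sum over x, y of I(x, y) * I(x + k, y + l).

    Assumptions: block[x][y] is a gray level (x indexes rows, y columns)
    and pixels outside the block contribute zero.
    """
    bs = len(block)
    half = bs // 2
    corr = {}
    for k in range(-half, half + 1):
        for l in range(-half, half + 1):
            total = 0
            for x in range(bs):
                for y in range(bs):
                    xs, ys = x + k, y + l
                    if 0 <= xs < bs and 0 <= ys < bs:
                        total += block[x][y] * block[xs][ys]
            corr[(k, l)] = total
    return corr


# Toy "text-like" block: rows alternate between dark (0) and bright (10).
stripes = [[(row % 2) * 10] * 4 for row in range(4)]
C = autocorrelation(stripes)
```

On this toy block the response stays high for shifts along the stripes and drops to zero for a one-row shift across them, which is the kind of orientation cue the method relies on. The result is also centrally symmetric, C(k, l) = C(-k, -l).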
The result of the autocorrelation can be analyzed by extracting an estimate of the relevant directions within the texture (a similar approach has been proposed in [7]). Each angle determines a direction, and the sum of all the pixels along each direction is computed to form a polar representation of the autocorrelation matrix, called direction histogram. In this way, each direction is characterized by a weight, indicating its importance within the block:
$$w(\theta) = \sum_{r \in (0,\, bs/2]} C(r\cos\theta,\, r\sin\theta). \qquad (2)$$
Since the autocorrelation matrix has a central symmetry by definition, we consider only the first half of the direction histogram, from 0° to 179°. θ and r are quantized: the step of θ is set to 1 degree (so we obtain 180 values, representing 180 possible directions), while r is defined as a fraction of the block size. A text block is characterized by peaks around 0° and 180° because its dominant direction is horizontal, and this behavior differs from that of image textures (described by a generic monomodal or multimodal distribution) and of background textures (described by a nearly uniform, flat distribution).
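As a rough illustration of the direction histogram, the sketch below sums autocorrelation values along rays at 0° to 179° in 1-degree steps. The function name `direction_histogram`, the dictionary representation of the autocorrelation, the nearest-integer sampling along each ray and the unit step of r are all assumptions made for the example; the paper does not specify the sampling scheme.

```python
import math


def direction_histogram(corr, max_r, n_dirs=180):
    """Sum autocorrelation values along rays covering half a turn.

    Assumptions: `corr` maps integer offsets (k, l) to values, with k the
    horizontal offset; each ray is sampled at r = 1..max_r and rounded to
    the nearest integer offset.
    """
    hist = []
    for d in range(n_dirs):
        theta = math.radians(d * 180.0 / n_dirs)
        total = 0.0
        for r in range(1, max_r + 1):
            k = round(r * math.cos(theta))
            l = round(r * math.sin(theta))
            total += corr.get((k, l), 0.0)
        hist.append(total)
    return hist


# Synthetic autocorrelation with energy only along the horizontal axis.
corr = {(k, 0): 1.0 for k in range(-8, 9)}
w = direction_histogram(corr, max_r=8)
```

With this synthetic input the histogram peaks at both ends of the 0° to 179° range and vanishes around 90°, mirroring the behavior described for text blocks.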
4. Texture characterization
The polar distribution obtained by autocorrelation in the previous step can be easily modeled using Von Mises distributions. Gaussian distributions are inappropriate to model periodic datasets: setting the origin at 0°, elements on either side of this angle would be classified into two distinct directions, even if they express almost the same one. The choice of the origin becomes very critical, and for this reason could be a weak point of the fit. A Von Mises distribution, instead, is defined on the circle, so it can correctly represent angular datasets. The probability density function is defined as follows:
$$V(\theta \mid \theta_0, m) = \frac{1}{2\pi I_0(m)}\, e^{m\cos(\theta - \theta_0)}. \qquad (3)$$
Figure 1. Example of illuminated manuscripts and relative ground truth. White identifies background, red identifies text and blue identifies pictures or decorations.
The parameter m denotes how concentrated the distribution is around the mean angle $\theta_0$. In our context, we used a slightly different formulation (we simply multiply the angles by 2) with a periodicity of $\pi$ instead of $2\pi$, considering only angles in $[0, \pi)$ as representative of valuable and meaningful directions. $I_0$ is the modified Bessel function of order 0, defined as:
$$I_0(m) = \frac{1}{2\pi}\int_0^{2\pi} e^{m\cos\theta}\, d\theta. \qquad (4)$$
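A small sketch of the π-periodic Von Mises density may help make the "multiply the angles by 2" remark concrete. The names `bessel_i0` and `von_mises_pi` are hypothetical, the Bessel function is evaluated by its power series rather than by the integral definition, and the normalization 1/(π I0(m)) is what follows from restricting the doubled-angle density to [0, π); it is an assumption about the authors' exact formulation.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of order 0 via its power series."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        total += term
        term *= (x * x / 4.0) / (k * k)
    return total


def von_mises_pi(theta, theta0, m):
    """Von Mises density with period pi: the angle difference is doubled."""
    return math.exp(m * math.cos(2.0 * (theta - theta0))) / (math.pi * bessel_i0(m))


# Numerical check that the density integrates to one over [0, pi).
n = 1800
area = sum(von_mises_pi(i * math.pi / n, 0.3, 4.0) for i in range(n)) * math.pi / n
```

Because the angle difference is doubled, directions θ and θ + π receive the same density, so a peak near 0° and one near 180° are treated as a single horizontal direction.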
To capture the general multimodal behavior of input datasets, we chose a mixture of Von Mises distributions. We used mixtures with only 2 components, because they proved sufficient to recognize the two most meaningful directions (horizontal and vertical) while keeping an affordable computational cost. An example of fitting for the three types of texture analyzed is shown in Fig. 2. Generally, a mixture of K Von Mises distributions is defined as follows:
$$MoVM(\theta) = \sum_{k=1}^{K} \pi_k\, V(\theta \mid \theta_{0,k}, m_k), \qquad (5)$$
where $\pi_k$ represents the weight of the k-th distribution within the mixture. A standard way to get the maximum likelihood estimates of the mixture parameters is the Expectation-Maximization algorithm [8]. In the E step the expected values for the likelihood are computed; then a set of parameters that maximizes such values is obtained, repeating the process until convergence or until a maximum number of iterations is reached. To maximize the likelihood, a set of responsibilities of the bins for each Von Mises is necessary. Let i be the index of the bin. The responsibilities are computed as follows:
$$\gamma_{ik} = \frac{\pi_k\, V(\theta_i \mid \theta_{0,k}, m_k)}{\sum_{s=1}^{K} \pi_s\, V(\theta_i \mid \theta_{0,s}, m_s)}. \qquad (6)$$
A new set of weights for the Von Mises components of the mixture can now be computed:
$$\pi_k = \frac{\sum_i w_i\, \gamma_{ik}}{\sum_i w_i}. \qquad (7)$$
This formulation differs from the one in [8], and the motivation lies in the dataset we used: we do not have a general distribution of angular data to fit, but a sampling of directions and relative weights. For this reason, we consider the weight as a multiplier for each angle, so formally we have $w_i$ times the angle $\theta_i$ in our dataset.
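The E step with weighted bins can be sketched as follows. This is a minimal illustration, assuming the responsibilities take the usual mixture-model form and that each bin weight acts as a multiplicity in the mixture-weight update, as the text describes. The function names, the π-periodic density helper and the toy histogram are hypothetical.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of order 0 via its power series."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        total += term
        term *= (x * x / 4.0) / (k * k)
    return total


def von_mises_pi(theta, theta0, m):
    """Von Mises density with period pi (doubled angle difference)."""
    return math.exp(m * math.cos(2.0 * (theta - theta0))) / (math.pi * bessel_i0(m))


def e_step(thetas, weights, pis, theta0s, ms):
    """Return responsibilities gamma[i][k] and the updated mixture weights."""
    gammas = []
    for t in thetas:
        joint = [p * von_mises_pi(t, t0, m) for p, t0, m in zip(pis, theta0s, ms)]
        norm = sum(joint)
        gammas.append([j / norm for j in joint])
    total_w = sum(weights)
    # Each bin counts w_i times, so responsibilities are averaged with w_i.
    new_pis = [
        sum(w * g[k] for w, g in zip(weights, gammas)) / total_w
        for k in range(len(pis))
    ]
    return gammas, new_pis


# Toy histogram with more weight near 0 and 180 degrees (text-like block).
thetas = [math.radians(d) for d in range(180)]
weights = [1.0 + 4.0 * math.cos(t) ** 2 for t in thetas]
gammas, new_pis = e_step(thetas, weights, [0.5, 0.5], [0.1, 1.5], [1.0, 1.0])
```

Starting from equal mixture weights, the component placed near the horizontal direction gains weight after one update, since most of the histogram mass lies there.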
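In the M step, we compute the new $\theta_{0,k}$ and $m_k$ values for each Von Mises within the mixture. In particular, $\theta_{0,k}$ is computed by maximization of the relative likelihood as follows:
$$\theta_{0,k} = \frac{1}{2}\arctan\left(\frac{\sum_i w_i\, \gamma_{ik} \sin 2\theta_i}{\sum_i w_i\, \gamma_{ik} \cos 2\theta_i}\right). \qquad (8)$$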
Note the multiplication by 2 in order to relate to a $\pi$ periodicity. The retrieval of m by maximization is a bit more complicated, due to the presence of the Bessel functions. Given the derivative of the modified Bessel function, defined as:
$$\frac{d}{dm} I_0(m) = I_1(m), \qquad (9)$$
the problem can be solved mathematically using this formulation:
$$A(m_k) = \frac{I_1(m_k)}{I_0(m_k)} = \frac{\sum_i w_i\, \gamma_{ik} \cos 2(\theta_i - \theta_{0,k})}{\sum_i w_i\, \gamma_{ik}}. \qquad (11)$$
Figure 2. Example of directional histograms and the corresponding fitting with Von Mises mixtures.
The value of $m_k$ can be found by the numerical inversion of A. In particular we use the approximation proposed in [9].
At this point, we have 6 parameters to work with: $\pi_k$, $\theta_{0,k}$ and $m_k$ of both Von Mises components. This represents a very consistent and compact way to describe a whole distribution, making retrieval fast and effective.
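The M step for one component can be sketched as below. Note the swapped-in technique: instead of the approximation of [9], this example inverts the Bessel ratio A(m) = I1(m)/I0(m) by plain bisection, which is simpler to verify but slower. The function names, the search interval [0, 50] and the use of `atan2` to resolve the quadrant are assumptions of the sketch.

```python
import math


def bessel_ratio(m, terms=80):
    """A(m) = I1(m) / I0(m), both evaluated by their power series."""
    q = m * m / 4.0
    i0 = i1 = 0.0
    t0, t1 = 1.0, m / 2.0
    for k in range(1, terms + 1):
        i0 += t0
        i1 += t1
        t0 *= q / (k * k)
        t1 *= q / (k * (k + 1))
    return i1 / i0


def invert_ratio(a, lo=0.0, hi=50.0, iters=60):
    """Solve A(m) = a by bisection; A is increasing for m >= 0."""
    a = min(max(a, 0.0), bessel_ratio(hi))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bessel_ratio(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def m_step(thetas, weights, gamma_k):
    """Update (theta0, m) of one component from weighted responsibilities."""
    s = sum(w * g * math.sin(2 * t) for t, w, g in zip(thetas, weights, gamma_k))
    c = sum(w * g * math.cos(2 * t) for t, w, g in zip(thetas, weights, gamma_k))
    n = sum(w * g for w, g in zip(weights, gamma_k))
    theta0 = (0.5 * math.atan2(s, c)) % math.pi
    # With theta0 chosen this way, the weighted mean of cos 2(theta - theta0)
    # equals the resultant length hypot(s, c) / n.
    return theta0, invert_ratio(math.hypot(s, c) / n)


# Histogram shaped like a single Von Mises with theta0 = 0.5 rad and m = 3.
thetas = [math.radians(d) for d in range(180)]
weights = [math.exp(3.0 * math.cos(2 * (t - 0.5))) for t in thetas]
theta0, m = m_step(thetas, weights, [1.0] * 180)
```

When the histogram is itself shaped like a single Von Mises density and all responsibilities are one, the update recovers the generating mean direction and concentration, which is a convenient sanity check for an implementation.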
The similarity between two Von Mises distributions can be defined using the Bhattacharyya distance. Given two Von Mises distributions V1 and V2, the formulation is shown in Eq. 10. No explicit form is available for mixtures, so we propose a new metric that also takes into account the relative weights of the components of the mixture. Given two mixture distributions, we test the Bhattacharyya distance between the two components of one distribution and the two of the other, select the best matching pair (call them b, and the other two o) and then measure the distance as:
,
(
12
)
where
(
13
)
This metric takes into account the fact that two components can be very similar while their contribution to the mixtures is quite low.
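For a single pair of components, the Bhattacharyya coefficient between two π-periodic Von Mises densities has a closed form that follows directly from the density definition. The sketch below is not the paper's formula: turning the coefficient into a distance as sqrt(1 - BC) is an assumption, and the weighted mixture metric of Eq. 12 and Eq. 13 is not reproduced. The code cross-checks the closed form against numerical integration.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of order 0 via its power series."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        total += term
        term *= (x * x / 4.0) / (k * k)
    return total


def von_mises_pi(theta, theta0, m):
    """Von Mises density with period pi (doubled angle difference)."""
    return math.exp(m * math.cos(2.0 * (theta - theta0))) / (math.pi * bessel_i0(m))


def bhattacharyya_coefficient(t1, m1, t2, m2):
    """Closed form of the integral of sqrt(V1 * V2) over [0, pi)."""
    r = math.sqrt(m1 * m1 + m2 * m2 + 2.0 * m1 * m2 * math.cos(2.0 * (t1 - t2)))
    return bessel_i0(r / 2.0) / math.sqrt(bessel_i0(m1) * bessel_i0(m2))


def bhattacharyya_distance(t1, m1, t2, m2):
    """Distance as sqrt(1 - BC); this particular form is an assumption."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(t1, m1, t2, m2)))


# Numerical cross-check of the closed form on a fine grid.
n = 3600
numeric = sum(
    math.sqrt(von_mises_pi(i * math.pi / n, 0.2, 3.0)
              * von_mises_pi(i * math.pi / n, 1.1, 1.5))
    for i in range(n)
) * math.pi / n
closed = bhattacharyya_coefficient(0.2, 3.0, 1.1, 1.5)
```

The coefficient equals 1 for identical components and decreases as the mean directions move apart or the concentrations diverge.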
5. Experimental results
Tests have been performed using 20 uncompressed 24-bit high-definition pictures of biblical illustrated manuscripts, provided in high resolution (8373x6039) at 400 dpi. For each image, a ground truth has been manually annotated, focusing on the three main characteristics of these images: text, images and background. Images have been divided using different fixed window sizes, then a suitable single window size has been chosen. For each window (block) over the image, the direction histogram has been computed.
A preprocessing stage is necessary to highlight the real shape of the distribution, regardless of the numerical value assumed by the bins. A simple normalization approach has the major drawback of exaggerating the shape, and thus the noise, in low-variability data. In order to represent the data distribution coherently with its real shape, we chose to change the baseline reference of the values, subtracting from each bin the minimum of the entire block. In this way, a noisy block like the background shows a nearly flat distribution with a very small variance. A text block, instead, with a dominant direction along the horizontal axis, continues to show a coherent distribution, with peaks near 0° and 180°.
Moreover, we would like to keep the same scale in all blocks, so that the shapes can be compared. For this reason the direction histogram bins have been quantized to a fixed step.
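The effect of the two preprocessing choices can be illustrated with a toy example. The function names, the four-bin histograms and the step value of 50 are hypothetical; the paper does not report the quantization step it used, and rounding to the nearest multiple is only one possible reading of "quantized to a fixed step".

```python
def min_max(hist):
    """Plain min-max normalization (assumes the bins are not all equal)."""
    lo, hi = min(hist), max(hist)
    return [(v - lo) / (hi - lo) for v in hist]


def preprocess(hist, step):
    """Subtract the block minimum, then round each bin to a multiple of step."""
    base = min(hist)
    return [round((v - base) / step) * step for v in hist]


# Hypothetical 4-bin histograms: a nearly flat background and a peaked text block.
background = [1000.0, 1002.0, 1001.0, 1000.0]
text = [1000.0, 1800.0, 1200.0, 1000.0]

stretched = min_max(background)        # noise is stretched to full range
flat = preprocess(background, 50.0)    # noise collapses to a flat profile
peaked = preprocess(text, 50.0)        # the dominant direction survives
```

Min-max normalization turns the two-unit ripple of the background into a full-range peak, while baseline subtraction followed by quantization leaves the background flat and keeps the peak of the text block on the same scale.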
The normalized distribution has been used to fit a mixture of Von Mises distributions. In our experiments, we tested mixtures with different numbers of Von Mises distributions, and we observed that even a very limited set is sufficient to produce good retrieval results in terms of precision and recall, without excessively increasing the computational time.
To perform a first evaluation of the proposed characterization, a confusion matrix has been computed using a 1-nearest neighbor approach. The training set was analyzed, then a test set was classified: for each block within the test set, the class of the most similar block within the training set has been chosen as the classification of the block. The results are shown in Table 1. They were quite promising: this feature has good discriminative power with all three kinds of texture used.
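The 1-nearest neighbor rule used for this first evaluation can be sketched as follows. The feature values are invented for illustration, and the Euclidean distance is only a placeholder: the paper compares blocks with its own mixture metric, which would be passed in as the `distance` argument.

```python
import math


def nearest_neighbor_label(query, training, distance):
    """Return the label of the training item closest to `query`."""
    best_label, best_dist = None, float("inf")
    for features, label in training:
        d = distance(query, features)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical 6-parameter descriptors: (pi_1, theta_1, m_1, pi_2, theta_2, m_2).
training = [
    ((0.7, 0.0, 6.0, 0.3, 1.57, 2.0), "text"),
    ((0.5, 0.4, 0.1, 0.5, 1.2, 0.1), "background"),
]
query = (0.65, 0.05, 5.5, 0.35, 1.5, 1.8)

# math.dist (Euclidean) stands in for the paper's mixture metric.
label = nearest_neighbor_label(query, training, math.dist)
```

Passing the distance as a parameter keeps the classifier independent of the metric, so the same loop works once a mixture-aware distance is available.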
             text   background   image   recall
text          431            5      54   0.879592
background      7          446     172   0.7136
image          27           63     631   0.875173
Table 1. Confusion matrix relative to a 1-nearest neighbor classification.
             recall     precision
text         0.931183   0.911579
background   0.854086   0.87976
image        0.826138   0.898477
Table 2. SVM classification using radial basis function as kernel.
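The recall values of the 1-nearest neighbor confusion matrix can be recomputed from the counts reported above. The sketch assumes that rows are the true classes and columns the predicted ones, which is the reading consistent with the recall figures. The per-class precision computed here is derived from the same counts and is not a figure reported in the paper.

```python
labels = ["text", "background", "image"]

# Counts from the 1-nearest neighbor confusion matrix
# (assumed layout: rows = true class, columns = predicted class).
confusion = [
    [431, 5, 54],
    [7, 446, 172],
    [27, 63, 631],
]

# Recall: correct predictions divided by the row total.
recall = {
    labels[i]: confusion[i][i] / sum(confusion[i])
    for i in range(len(labels))
}

# Precision: correct predictions divided by the column total.
precision = {
    labels[j]: confusion[j][j] / sum(row[j] for row in confusion)
    for j in range(len(labels))
}

for name in labels:
    print(f"{name:<12} recall={recall[name]:.6f} precision={precision[name]:.6f}")
```

Under this reading, most of the errors are background blocks classified as image, which lowers both the background recall and the image precision.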
In order to produce a more generic classifier, we implemented an SVM classification. The best results (in terms of recall and precision) have been obtained with the radial basis function kernel. The results of this classification are shown in Table 2, and a visual representation of the classification is shown in Fig. 3.
The text is the main focus of this feature, because of its typical horizontal orientation, and these experiments reveal a good discriminative power for this kind of texture. The background also has very peculiar characteristics that make this feature suitable for retrieval: a flat distribution is very different from the monomodal distribution of the text. This feature, instead, is not designed to be effective for image classification: in practice, pictures present a generic multimodal distribution; occasionally they show a particular symmetry, but it is not representative of the entire class. For this reason, the good classification results in this case have to be considered a side effect of the limited number of Von Mises components within the mixture: the algorithm tends to describe every image with a two-component distribution (horizontal and vertical orientations), which has proved to be sufficiently distinct from the other two classes.
6. Conclusions
In this paper a new technique for document analysis is presented, characterizing each texture with a 6-dimensional feature vector. The autocorrelation computation and the fitting of the direction histogram with the mixture of Von Mises distributions have a reasonable processing time for this application: a high-resolution page with more than 1800 blocks can be processed in about 100 seconds on a standard PC. The feature extraction provides a very compact description of blocks. For this reason, these features can be easily exploited by a content-based image retrieval system: the computational time needed for comparison and searching will be extremely low.
Finally, we thank Franco Cosimo Panini S.P.A., which gave us the possibility to analyze invaluable pieces of art such as The Borso D’Este Holy Bible for this work.
References
[1] S. Bres, W. Eglin, A. Gagneux, "Unsupervised clustering of text entities in heterogeneous grey level documents," in Proceedings of the 16th International Conference on Pattern Recognition, vol. 3, pp. 224-227, 2002.
[2] A. K. Jain, S. Bhattacharjee, "Text segmentation using Gabor filters for automatic document processing," Machine Vision and Applications, vol. 5, no. 3, pp. 169-184, 1992.
[3] A. Busch, W. W. Boles, S. Sridharan, "Texture for script identification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 11, pp. 1720-1732, Nov. 2005.
[4] S. Mao, A. Rosenfeld, T. Kanungo, "Document structure analysis algorithms: a literature survey," Proc. SPIE Electronic Imaging, vol. 5010, pp. 197-207, 2003.
[5] M. Diligenti, P. Frasconi, M. Gori, "Hidden tree Markov models for document image classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 4, pp. 519-523, April 2003.
[6] F. Le Bourgeois, H. Emptoz, "DEBORA: Digital AccEss to BOoks of the RenAissance," International Journal on Document Analysis and Recognition, vol. 9, no. 2-4, pp. 193-221, 2007.
[7] N. Journet, V. Eglin, J. Y. Ramel, R. Mullot, "Dedicated texture based tools for characterisation of old books," in Second International Conference on Document Image Analysis for Libraries (DIAL '06), 10 pp., 27-28 April 2006.
[8] A. Prati, S. Calderara, R. Cucchiara, "Using Circular Statistics for Trajectory Analysis," in Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, Alaska (USA), June 24-26, 2008.
[9] G. W. Hill, "Evaluation and Inversion of the Ratios of Modified Bessel Functions, I_1(x)/I_0(x) and I_{1.5}(x)/I_{0.5}(x)," ACM Transactions on Mathematical Software, vol. 7, no. 2, pp. 199-208, June 1981.
Figure 3. Visualization of the proposed classification. A filtering technique has been applied to clean out the results.