Recognition of Isolated Handwritten Telugu Numeral Characters
Moka Sree
Sree.moka@gmail.com
Department of Computer Science and Engineering
National Institute of Technology Calicut
Kerala - 673601, India

V.K. Govindan
vkg@nitc.ac.in
Department of Computer Science and Engineering
National Institute of Technology Calicut
Kerala - 673601, India
Abstract: India is a multi-lingual and multi-script country, and Telugu is one of its most popular languages; its script consists of nearly 5000 characters and 234 strokes. The recognition of handwritten Telugu script is difficult because of the complexity of its strokes, and handwritten character recognition in general is a challenging task for computers. Character recognition involves feature extraction and classification as the main processing steps. In this paper, we propose chain code feature extraction and K-nearest neighbour classification for the recognition of Telugu numerals. Performance evaluation carried out on 700 test samples gave an overall accuracy of 97.85%. A novel post-processing technique was employed to further improve the performance, leading to an accuracy of 99.14%. The advantage of the proposed method is that it does not require a thinning process.
Keywords: Telugu numeric characters, K-nearest neighbour, Chain code histogram, Euclidean distance
I. Introduction
Handwritten character recognition (HCR) is an important area in image processing and pattern recognition. Its popularity is due to its extensive use in both academic and production fields. Handwritten character recognition can be divided into on-line and off-line recognition. In on-line recognition, information such as the order in which strokes are written and the pen pressure while writing is also considered, whereas off-line recognition is carried out on characters that have already been written. Recognition of handwritten characters has been a popular research area for many decades because of its potential application areas such as postal automation, bank cheque processing and automatic data entry. The difficulty of HCR is due to the variations in the writing styles of different writers; even the same writer may write the same numeric character in different styles at different times. Much work on handwritten character recognition has been done on languages such as English, Arabic, Chinese, Japanese and Latin; some of the work on these languages is reported in [1, 2, 3, 4, 5]. There is not much work on Telugu character recognition. Some of the important works on Indian character recognition are those by Benne et al. [5], Rajput et al. [6], Jagadeesh et al. [7], Dhandra et al. [8] and Rajashekararadhya et al. [9]. These are reviewed in the following.
Benne et al. [5] proposed a recognition scheme suitable for Telugu, Kannada and Devanagari numeral sets. The features used are directional density of pixels, water reservoirs, maximum profile distances and fill hole density. K-nearest neighbour classification with the Euclidean distance criterion is employed for recognition. Rajput et al. [6] extracted Fourier and chain code descriptors as features for Marathi handwritten numeral recognition; these features are fed to a support vector machine (SVM) for classification. Jagadeesh et al. [7] proposed an approach based on Hidden Markov Models (HMM) for classification, employing a combination of time-domain and frequency-domain feature extraction for the recognition of handwritten Telugu symbols. Another work on Telugu, Kannada and Devanagari numeral sets is that by Dhandra et al. [8]. The features used are similar to those employed by Benne et al. [5]; however, the classifier used is the Probabilistic Neural Network (PNN). The novelty of their method is that it does not require any thinning or size normalization. The work by Rajashekararadhya et al. [9] employs a zone and distance metric based feature extraction system for handwritten Kannada and Telugu numerals; a feed-forward back-propagation neural network is designed for classification and recognition.
It is clear from the above survey that there is only a little work on Telugu numeral recognition in the literature, and there is considerable scope for the design of a robust system for the recognition of isolated handwritten Telugu numeral characters. This motivated us to propose a method for the recognition of isolated Telugu numeral characters using chain code features and a K-nearest neighbour classifier. The novelty of the approach is that it does not require thinning, which is a time-consuming process.

The rest of the paper is organized as follows. In Section II, the properties of the Telugu language are discussed. The data set and the pre-processing approach employed are presented in Section III. Sections IV and V deal with the feature extraction and recognition approaches. The experimental results are given in Section VI, and the work is concluded in Section VII.
II. Telugu Language
Among Indian languages, Telugu is one of the most popular; its script consists of 5000 characters and 234 strokes. Each handwritten character is represented as a sequence of strokes. Like most Indian scripts, Telugu script is generally written in a non-cursive style. Telugu is an official language of the state of Andhra Pradesh, and it is also spoken in the neighbouring states of Chhattisgarh, Karnataka, Maharashtra, Orissa and Tamil Nadu. According to the 2001 Census of India, Telugu is the language with the second largest number of native speakers in India. Nevertheless, not much work has been done on handwritten character recognition for Telugu script. In this paper, we propose an approach for handwritten Telugu numeral character recognition. Table 1 shows the Telugu numeral characters from 0 to 9.

Table 1. Telugu numeral characters
III. Data set and pre-processing
There is no standard database available for south Indian scripts, so we created our own database with the help of different writers. In the pre-processing phase, the character image undergoes several processing steps that bring it to a form acceptable to the feature extractor. Pre-processing steps generally include binarization, size normalisation and thinning; among these, size normalisation and thinning are the more important. As our proposed method does not require thinning, we perform only size normalisation. Normalisation is required because different writers write characters in different sizes, and even the same writer may write the same character in different sizes at different times. The character images are normalised by finding the bounding box of each numeral character image and scaling it to a size of 100 x 100 pixels. The Canny method is then used for edge detection.
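The normalisation and edge steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the image is a list of rows of 0/1 values, the rescaling uses nearest-neighbour sampling (the paper does not say which interpolation was used), and a simple boundary test stands in for the Canny detector. The function names are hypothetical.

```python
def bounding_box(img):
    """Return (top, bottom, left, right) of the foreground (value 1) pixels."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return rows[0], rows[-1], cols[0], cols[-1]


def normalise(img, size=100):
    """Crop to the bounding box and rescale to size x size (nearest neighbour)."""
    top, bottom, left, right = bounding_box(img)
    h, w = bottom - top + 1, right - left + 1
    return [[img[top + r * h // size][left + c * w // size]
             for c in range(size)]
            for r in range(size)]


def boundary(img):
    """Keep foreground pixels that touch the background or the image border.

    Assumption: this is a simple stand-in for the Canny detector named in
    the text, not an implementation of Canny.
    """
    n_rows, n_cols = len(img), len(img[0])

    def is_background(r, c):
        return not (0 <= r < n_rows and 0 <= c < n_cols) or img[r][c] == 0

    return [[1 if img[r][c] and any(is_background(r + dr, c + dc)
                                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))
             else 0
             for c in range(n_cols)]
            for r in range(n_rows)]


# Toy input: a 6 x 4 filled rectangle inside a 10 x 10 binary image.
toy = [[1 if 2 <= r <= 7 and 3 <= c <= 6 else 0 for c in range(10)]
       for r in range(10)]
scaled = normalise(toy)    # 100 x 100, rectangle now fills the whole frame
edges = boundary(scaled)   # only the outline of the rectangle remains
```

Because the bounding box is stretched to the full 100 x 100 frame, the aspect ratio of the character is not preserved in this sketch; whether the original system preserved it is not stated.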
Figure 1 depicts the sequence of steps in the pre-processing phase. Picture A in Figure 1 is a query image. As shown in picture B, this query image is normalised to a size of 100 x 100 after finding its bounding box. Picture C shows the image obtained after applying the edge detection method.

A: Query Image   B: Size Normalisation   C: Edge Detection
Figure 1. Steps in pre-processing phase

The next step is feature extraction and recognition.
IV. Feature extraction
The feature extraction method selected plays a major role in the performance of a character recognition system. The input to feature extraction is the output image of the pre-processing step: the scaled binary image on which edge detection has been performed. After the contour points of the image are obtained, the left-most pixel in the upper-most row is taken as the first pixel. Starting from this pixel, the contour points of the image are traversed in either the clockwise or the counter-clockwise direction. At each contour point on the traversal path, a Freeman chain code [10] is obtained by assigning one of 8 possible direction codes to the point, depending on the direction of the move to the next point. The process terminates when the next pixel in the contour is the same as the starting pixel. After the chain code has been computed along the edge, the image is divided into 16 blocks, each of size 25 x 25. For each block, the chain code histogram is calculated. A feature vector with a total of 16 x 8 = 128 elements is thus obtained.
The algorithm is as given below:

Input : Binary Telugu numeral image
Output : Feature vector
Method : Chain code histogram
1: Take the input image and resize it to 100 x 100
2: Apply the edge detection method
3: Fix the first pixel as the left-most pixel in the upper-most row
4: Apply the 8-direction code method by traversing the contour in either the clockwise or the anti-clockwise direction
5: Divide the image into 16 blocks, each of size 25 x 25
6: Compute the histogram of direction codes in each block

Finally, we obtain a feature vector of 4 x 4 x 8 = 128 elements for the given numeral character image.
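The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a square binary image given as a list of rows of 0/1 values, the common Freeman numbering with code 0 pointing east and codes increasing counter-clockwise (the numbering in the paper's figure is not recoverable from the text), and a standard 8-connected border-following rule. The function names are hypothetical. Each code is counted in the block of the pixel it departs from.

```python
# Freeman 8-direction offsets as (d_row, d_col). Assumed convention:
# 0 = east, 1 = north-east, 2 = north, ... increasing counter-clockwise.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def trace_chain_code(img):
    """Follow the contour and return a list of (row, col, code) steps.

    Starts at the left-most foreground pixel of the upper-most row and stops
    when the start pixel is reached again, as described in the text.
    """
    n_rows, n_cols = len(img), len(img[0])
    start = next(((r, c) for r in range(n_rows) for c in range(n_cols)
                  if img[r][c]), None)
    if start is None:
        return []
    steps, (r, c), direction = [], start, 7
    while len(steps) < 4 * n_rows * n_cols:      # safety cap on the trace length
        # Begin the neighbour search just behind the previous move direction.
        first = (direction + (7 if direction % 2 == 0 else 6)) % 8
        for k in range(8):
            code = (first + k) % 8
            nr, nc = r + OFFSETS[code][0], c + OFFSETS[code][1]
            if 0 <= nr < n_rows and 0 <= nc < n_cols and img[nr][nc]:
                break
        else:
            return steps                         # isolated pixel: nothing to follow
        steps.append((r, c, code))
        r, c, direction = nr, nc, code
        if (r, c) == start:
            break
    return steps


def chain_code_histogram(img, block=25):
    """Per-block histogram of direction codes, flattened into one vector."""
    blocks_per_side = len(img) // block
    features = [0] * (blocks_per_side * blocks_per_side * 8)
    for r, c, code in trace_chain_code(img):
        block_index = (r // block) * blocks_per_side + (c // block)
        features[block_index * 8 + code] += 1
    return features


# Toy input: an 80 x 80 filled square centred in a 100 x 100 image.
square = [[1 if 10 <= r <= 89 and 10 <= c <= 89 else 0 for c in range(100)]
          for r in range(100)]
features = chain_code_histogram(square)          # 4 x 4 x 8 = 128 values
```

With this search rule the contour is followed counter-clockwise; the text allows either direction, provided the same one is used for training and test images.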
Figure 2 depicts the 8-connectivity chain code for the Telugu numeral character 2.

A: 8-connectivity direction code   B: Direction code representation of Telugu numeral ‘2’
Figure 2: Direction code representation of a character sample
V. K-Nearest Neighbour Classifier
In this phase, an unlabelled query image is classified by assigning it the label that is most frequent among the K training samples nearest to it, where K is a user-defined constant. To classify a given unknown image, its feature vector is extracted using the chain code histogram method. The Euclidean distance is then calculated between the feature vector of the unknown image and that of every image in the training set (the known images). The training image with the smallest distance to the input image is the most similar one. The K training images (prototypes) that best match the unknown image are chosen, and the identity of the unknown image is taken as the class to which the majority of these K prototypes belong. The algorithm is as given below:
Input: Unknown binary Telugu numeral character
Output: Recognition of the character
Method: Chain code and K-NN classifier
1: Take a query image
2: Extract the chain code features of the image (i.e., 4 x 4 x 8 features)
3: Compute the Euclidean distance between the query image and the training samples
4: Choose the K nearest prototypes to the input character
5: Classify the input image to the class of the majority of the images chosen in step 4
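The classification steps above can be sketched as follows. This is a minimal brute-force version under stated assumptions: feature vectors are plain Python lists, the training set is a list of (feature vector, label) pairs, and the function names are hypothetical. The toy data uses 2-element vectors purely for readability; the method in the text uses 128-element chain code histograms.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, training, k=7):
    """Return the majority label among the k training samples nearest to query."""
    nearest = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set with made-up 2-element vectors.
toy_training = [
    ([0, 0], "0"), ([0, 1], "0"), ([1, 0], "0"),
    ([5, 5], "7"), ([5, 6], "7"), ([6, 5], "7"),
]
label_a = knn_classify([0.2, 0.1], toy_training, k=3)
label_b = knn_classify([5.5, 5.5], toy_training, k=3)
```

The text does not say how ties in the vote are broken. In this sketch a tie goes to the label that appears first among the nearest neighbours, which is a side effect of `Counter.most_common` and not a choice taken from the paper; using an odd K, as in the experiments, reduces but does not eliminate ties when there are more than two classes.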
VI. Experimental Results
We carried out our evaluations with different values of K (K = 1, 3, 5, 7 and 9) and obtained promising results, which show that K-NN classification with chain code feature extraction gives the highest accuracy on Telugu numerals when K = 7. The experimental evaluation of the combination of chain code feature extraction and the K-NN classifier was carried out on a total of 1400 samples of Telugu numeral characters, of which 700 samples were used for training and 700 for testing. The recognition performance on 70 samples of each of the 10 numerals is shown in Table 2. The average recognition accuracy obtained with the combination of chain code feature extraction and the K-nearest neighbour classifier is 97.85%.
Table 2: Experimental results with different K-values (per-class and overall recognition accuracy, %)

K    Class 0  Class 1  Class 2  Class 3  Class 4  Class 5  Class 6  Class 7  Class 8  Class 9  Overall
1    97.14    97.14    94.28    95.71    98.57    95.71    95.71    94.28    95.71    87.14    95.13
3    98.57    97.14    100      95.71    98.57    97.14    97.14    95.71    98.57    94.28    97.28
5    98.57    100      100      97.14    100      98.57    95.71    96.0     98.57    92.85    97.74
7    98.57    100      100      95.71    100      100      98.57    97.14    100      88.57    97.85
9    98.57    100      100      95.71    100      100      98.57    97.14    100      87.14    97.71
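As a quick check on how the overall figure relates to the per-class figures, the K = 7 row can be reproduced with a few lines of arithmetic. The sketch assumes, as stated in the text, 70 test samples per numeral; the variable names are hypothetical, and the per-class counts are inferred from the reported percentages, not taken from the paper.

```python
# Per-class accuracies (%) for K = 7, classes 0 to 9, as reported in Table 2.
per_class_k7 = [98.57, 100, 100, 95.71, 100, 100, 98.57, 97.14, 100, 88.57]
samples_per_class = 70  # stated in the text

# Overall accuracy as the mean of the per-class accuracies (equal class sizes).
overall = sum(per_class_k7) / len(per_class_k7)

# Number of correctly recognised samples implied by each percentage.
correct = [round(a * samples_per_class / 100) for a in per_class_k7]
total_correct = sum(correct)
```

Under these assumptions the mean of the row agrees with the reported 97.85% to within rounding, and the implied total is 685 correct out of 700 test samples.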
From the experimental results we found that the recognition accuracies of the Telugu numerals three, seven and nine are lower than those of the other numerals, owing to the similarity between these three numerals. The numerals three and nine are sometimes misclassified as seven, and the numeral seven is sometimes misclassified as three or nine. To improve the recognition accuracy in these cases, we propose a post-processing technique, presented in the following subsection.
Post-processing
We used the frequencies of the eight direction codes as features in the above experiment. In the post-processing phase, we added 2 additional features. Referring to Fig. 3, we used the sum of the frequencies of direction codes 1, 2 and 3 to form the 9th feature, and the sum of the frequencies of direction codes 5, 6 and 7 to form the 10th feature.

Fig 3: Additional features (9th and 10th feature) in post-processing phase
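The two additional features can be written down directly. The sketch below assumes that the eight frequencies form a single histogram indexed by direction code (index 0 for code 0, and so on), which is one reading of the text; whether the histogram is taken over the whole image or per block is not stated. The function name is hypothetical.

```python
def extended_features(hist8):
    """Append the 9th and 10th features to an 8-bin direction code histogram."""
    ninth = hist8[1] + hist8[2] + hist8[3]   # sum of codes 1, 2 and 3
    tenth = hist8[5] + hist8[6] + hist8[7]   # sum of codes 5, 6 and 7
    return list(hist8) + [ninth, tenth]


# Made-up histogram for illustration: only codes 0, 2, 4 and 6 occur.
example = extended_features([79, 0, 79, 0, 79, 0, 79, 0])
# example == [79, 0, 79, 0, 79, 0, 79, 0, 79, 79]
```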
The process of post-processing is as follows. Using the training samples of the numerals 3, 7 and 9, a 10-element prototype feature vector is computed for each class as the average of all training samples of that class. A three-class classifier is thus defined by the prototypes of these three numerals. After the classification phase, the samples classified as three, seven or nine are fed to this three-class classifier, which re-classifies them using the minimum Euclidean distance criterion.
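The post-processing stage described above amounts to a nearest-mean classifier applied only to samples that the K-NN stage labelled three, seven or nine. A minimal sketch follows, assuming 10-element feature vectors stored as Python lists and string labels; the function names and the toy numbers are hypothetical and not taken from the paper.

```python
import math


def class_prototypes(samples):
    """Average the feature vectors of each class: {label: mean vector}."""
    grouped = {}
    for vector, label in samples:
        grouped.setdefault(label, []).append(vector)
    return {label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in grouped.items()}


def nearest_prototype(vector, prototypes):
    """Label of the prototype with minimum Euclidean distance to vector."""
    def distance(label):
        return math.sqrt(sum((x - p) ** 2
                             for x, p in zip(vector, prototypes[label])))
    return min(prototypes, key=distance)


def post_process(knn_label, vector, prototypes):
    """Re-classify only samples that K-NN assigned to a confusable class."""
    if knn_label in prototypes:
        return nearest_prototype(vector, prototypes)
    return knn_label


# Toy 10-element training vectors for the three confusable numerals
# (made-up values; the last two entries are the 9th and 10th features).
toy_training = [
    ([4, 0, 6, 0, 4, 0, 6, 0, 6, 6], "3"),
    ([6, 0, 8, 0, 6, 0, 8, 0, 8, 8], "3"),
    ([9, 1, 2, 1, 9, 1, 2, 1, 4, 4], "7"),
    ([11, 1, 2, 1, 11, 1, 2, 1, 4, 4], "7"),
    ([2, 3, 5, 3, 2, 3, 5, 3, 11, 11], "9"),
    ([2, 5, 5, 5, 2, 5, 5, 5, 15, 15], "9"),
]
prototypes = class_prototypes(toy_training)

# A sample that K-NN called "9" but that lies closest to the "7" prototype.
corrected = post_process("9", [10, 1, 2, 1, 10, 1, 2, 1, 4, 4], prototypes)
# A sample outside the three confusable classes keeps its K-NN label.
unchanged = post_process("5", [0] * 10, prototypes)
```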
Table 3: Experimental results after post-processing when K=7 (per-class and overall recognition accuracy, %)

K=7                     Class 0  Class 1  Class 2  Class 3  Class 4  Class 5  Class 6  Class 7  Class 8  Class 9  Overall
Before post-processing  98.57    100      100      95.71    100      100      98.57    97.14    100      88.57    97.85
After post-processing   98.57    100      100      98.57    100      100      98.57    100      100      95.71    99.14
After the above post-processing, the recognition accuracies of the numerals three, seven and nine improved to a great extent. The overall accuracy increased from 97.85% to 99.14%, as shown in Table 3.
VII. Conclusion
Telugu numeral character recognition is a difficult task. In this paper, we have presented a chain code histogram feature extraction method with a K-nearest neighbour classifier for isolated Telugu numeral characters. The average recognition accuracy obtained is 97.85%. The experimental results reveal that the errors are mainly due to similarly shaped numeral characters. We improved the accuracy by incorporating context-based post-processing and achieved 99.14%. The advantage of the method is that it does not require the costly thinning process.
REFERENCES
[1] R. Plamondon and S. N. Srihari: “On-line and off-line handwritten character recognition: A comprehensive survey”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63-84, 2000.
[2] Nafiz Arica and Fatos T. Yarman-Vural: “An Overview of character recognition focused on off-line handwriting”, IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, vol. 31, no. 2, pp. 216-233, 2001.
[3] Liana M. Lorigo and Venu Govindaraju: “Offline Arabic handwriting recognition: A survey”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 712-724, 2006.
[4] Toru Wakahara, Yoshimasa Kimura and Mutsuo Sano: “Handwritten Japanese Character Recognition Using Adaptive Normalization by Global Affine Transformation”, Proceedings of the Sixth International Conference on Document Analysis and Recognition, 2001.
[5] R. G. Benne, B. V. Dhandra and Mallikarjun Hangarge: “Tri-scripts handwritten numeral recognition: a novel approach”, Advances in Computational Research, ISSN: 0975-3273, vol. 1, issue 2, 2009, pp. 47-51.
[6] G. G. Rajput and S. M. Mali: “Marathi Handwritten Numeral Recognition using Fourier Descriptors and Normalized Chain Code”, IJCA Special Issue on “Recent Trends in Image Processing and Pattern Recognition” (RTIPPR), 2010.
[7] V. Jagadeesh Babu, L. Prasanth, R. Sharma, G. V. Rao and A. Bharat: “HMM-based Online Handwriting Recognition System for Telugu Symbols”, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 0-7695-2822-8/07.
[8] B. V. Dhandra, R. G. Benne and Mallikarjun Hangarge: “Kannada, Telugu and Devanagari Handwritten Numeral Recognition with Probabilistic Neural Network: A Novel Approach”, IJCA Special Issue on “Recent Trends in Image Processing and Pattern Recognition” (RTIPPR), 2010.
[9] S. V. Rajashekararadhya and P. Vanaja Ranjan: “Neural Network Based Handwritten Numeral Recognition of Kannada and Telugu Scripts”, TENCON 2008 - 2008 IEEE Region 10 Conference, 9-21 Nov. 2008.
[10] Nor Amizam Jusoh and Jasni Mohamad Jain: “Application of Freeman Chain Codes: An Alternative Recognition Technique for Malaysian Car Plates”, IJCSNS International Journal of Computer Science and Network Security, vol. 9, no. 11, November 2009.