Recognition of Isolated Handwritten Telugu Numeral Characters


Oct 20, 2013






Moka Sree

Sree.moka@gmail.com

Department of Computer Science and Engineering

National Institute of Technology Calicut

Kerala - 673601, India

V.K.Govindan

vkg@nitc.ac.in

Department of Computer Science and Engineering

National Institute of Technology Calicut

Kerala - 673601, India


Abstract


India is a multi-lingual and multi-script country, and Telugu is one of its most popular languages; its script consists of nearly 5000 characters and 234 strokes. The recognition of handwritten Telugu script is difficult because of the complexity of its strokes, and handwritten character recognition in general is a challenging task for computers. Character recognition involves feature extraction and classification as the main processing steps. In this paper, we propose chain code feature extraction and K-nearest neighbour classification for the recognition of Telugu numerals. Performance evaluation on 700 test samples gave an overall accuracy of 97.85%. A novel post-processing technique was employed to further improve the performance, leading to an accuracy of 99.14%. The advantage of the proposed method is that it does not require a thinning process.

Keywords


Telugu numeric characters, K-nearest neighbour, Chain code histogram, Euclidean distance

I. Introduction

Handwritten character recognition (HCR) is an important area of image processing and pattern recognition. Its popularity is due to its extensive use in both academic and production fields. Handwritten character recognition can be divided into on-line and off-line recognition. In on-line recognition, information such as the order in which strokes are written and the pen pressure while writing is also considered, whereas off-line recognition is carried out on characters that have already been written. Recognition of handwritten characters has been a popular research area for many decades because of its many potential application areas, such as postal automation, bank cheque processing and automatic data entry. The difficulty in HCR is due to the variations in writing style between writers; even the same writer may write the same numeral in different styles at different times. Much work on handwritten character recognition has been done for languages such as English, Arabic, Chinese, Japanese and Latin; some of the work in these languages is reported in [1, 2, 3, 4, 5]. There is not much work on Telugu character recognition. Some of the important works on Indian character recognition are those by Benne et al. [5], Rajput et al. [6], Jagadeesh et al. [7], Dhandra et al. [8] and Rajashekararadhya et al. [9]. These are reviewed below.

Benne et al. [5] proposed a recognition scheme suitable for Telugu, Kannada and Devanagari numeral sets. The features used are directional density of pixels, water reservoirs, maximum profile distances and fill hole density. K-nearest neighbour classification with the Euclidean distance criterion is employed for recognition. Rajput et al. [6] extracted Fourier and chain code descriptors as features for Marathi handwritten numeral recognition and fed them to a Support Vector Machine (SVM) for classification. Jagadeesh et al. [7] proposed an approach based on Hidden Markov Models (HMM) for classification, employing a combination of time-domain and frequency-domain feature extraction approaches for recognition of handwritten Telugu symbols. Another work for Telugu, Kannada and Devanagari numeral sets is that by Dhandra et al. [8]. The features used are similar to those employed by Benne et al. [5]; however, the classifier used is a Probabilistic Neural Network (PNN). The novelty of their method is that it does not require any thinning or size normalization. The work of Rajashekararadhya et al. [9] employs a zone and distance metric based feature extraction system for handwritten Kannada and Telugu numerals. A feed-forward back-propagation neural network is designed for classification and recognition.


It is clear from the above survey that there is only a little work on Telugu numeral recognition in the literature, and there is a lot of scope for the design of a robust system for recognition of isolated handwritten Telugu numeral characters. This motivated us to propose a method for recognition of isolated Telugu numeral characters using chain code features and a K-nearest neighbour classifier. The novelty of the approach is that it does not require thinning, which is a time-consuming process.


The rest of the paper is organized as follows. In Section II, the properties of the Telugu language are discussed. The data set and the pre-processing approach employed are presented in Section III. Sections IV and V deal with the feature extraction and recognition approaches. The experimental results are given in Section VI, and the work is concluded in Section VII.

II. Telugu Language

Among Indian languages, Telugu is one of the most popular; its script consists of 5000 characters and 234 strokes. Each handwritten character is represented as a sequence of strokes. Like most Indian scripts, Telugu is generally written in a non-cursive style. Telugu is an official language of the state of Andhra Pradesh and is also spoken in the neighbouring states of Chhattisgarh, Karnataka, Maharashtra, Orissa and Tamil Nadu. According to the 2001 Census of India, Telugu is the language with the second largest number of native speakers in India. However, not much work has been done on handwritten character recognition for Telugu script. In this paper, we propose an approach for handwritten Telugu numeral character recognition. Table 1 depicts the Telugu numeral characters from 0 to 9.


Table 1. Telugu numeral characters


III. Data set and pre-processing

There is no standard database available for South Indian scripts, so we created our own database with the help of different writers. In the pre-processing phase, the character image undergoes several processing steps that bring it into a form acceptable to the feature extractor. Pre-processing steps typically include binarization, size normalisation and thinning; among these, size normalisation and thinning are the more important. As our proposed method is free from thinning, we perform only size normalisation. Normalisation is required because different writers write characters in different sizes, and even the same writer may write the same character in different sizes at different times. The character images are normalised by finding the bounding box of each numeral character image and scaling it to a size of 100 x 100. The Canny method is used for edge detection.
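The size normalisation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the image is a list of rows of 0/1 pixels with 1 as foreground, and it uses nearest-neighbour rescaling, which the paper does not specify. The function names `bounding_box` and `normalise` are hypothetical, and Canny edge detection is left out.

```python
def bounding_box(img):
    """Return (top, bottom, left, right) of the foreground (value 1) pixels."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        raise ValueError("image contains no foreground pixels")
    return rows[0], rows[-1], cols[0], cols[-1]


def normalise(img, size=100):
    """Crop to the bounding box and rescale to size x size.

    Nearest-neighbour sampling is an assumption made for this sketch.
    """
    top, bottom, left, right = bounding_box(img)
    height = bottom - top + 1
    width = right - left + 1
    out = []
    for r in range(size):
        src_row = img[top + (r * height) // size]
        out.append([src_row[left + (c * width) // size] for c in range(size)])
    return out


# A small 6 x 8 binary image containing a 2 x 3 blob of foreground pixels.
sample = [[0] * 8 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4, 5):
        sample[r][c] = 1

normalised = normalise(sample)  # 100 x 100 image filled by the blob
```

Cropping to the bounding box before rescaling removes the dependence on where the writer placed the numeral inside the scanned cell, so only the shape itself is stretched to the common 100 x 100 grid.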


Figure 1 depicts the sequence of steps in the pre-processing phase. Picture A in Figure 1 is a query image. As shown in picture B, this query image is normalised to a size of 100 x 100 after its bounding box has been found. Picture C shows the image obtained after applying the edge detection method.





Figure 1. Steps in pre-processing phase. A: Query image; B: Size normalisation; C: Edge detection




The next step is feature extraction and recognition.


IV. Feature extraction


The feature extraction method selected plays a major role in the performance of a character recognition system. The input to feature extraction is the output image of the pre-processing step, that is, a size-normalised binary image to which edge detection has been applied. Once the contour points of the image are available, the leftmost pixel in the uppermost row is taken as the first pixel. Starting from this pixel, the contour points of the image are traversed in either the clockwise or the counter-clockwise direction. At each contour point on the traversal path, a Freeman chain code [10] is obtained by assigning one of 8 possible direction codes to the point, depending on the direction of the step to the next contour point. The process terminates when the next pixel on the contour is the starting pixel. After the chain code has been applied along the edge, the image is divided into blocks, each of size 25 x 25 (16 blocks in total). For each block, the chain code histogram is calculated. A feature vector of 16 x 8 = 128 elements is thus obtained.


The algorithm is as given below:

Input : Binary Telugu numeral images
Output : Feature vector
Method : Chain code histogram

1: Take an input image and resize it to 100x100.
2: Apply the edge detection method.
3: Fix the first pixel as the leftmost pixel in the uppermost row.
4: Apply the 8-direction code method by traversing the contour in either the clockwise or the anti-clockwise direction.
5: Divide the image into 16 blocks, each of size 25x25.
6: Compute the histogram of direction codes in each block.


Finally, we obtain a feature vector of 4x4x8 = 128 elements for the given numeral character image.
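The feature computation can be sketched in a few lines, under several assumptions that the text does not fix. The contour is taken to be already traced as an ordered, closed list of (row, column) pixels starting at the leftmost pixel of the uppermost row. Direction code 0 is taken to point east, with codes increasing counter-clockwise. Each step is counted in the block that contains its starting pixel. The names `DIRECTION_CODE`, `chain_codes`, `chain_code_histogram` and `square_contour` are hypothetical helpers for illustration only.

```python
# Freeman direction code for a step (d_row, d_col) between neighbouring
# contour pixels. Assumed convention: 0 = east, codes increase
# counter-clockwise, rows grow downwards.
DIRECTION_CODE = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def chain_codes(contour):
    """Yield (pixel, code) for each step of a closed, ordered contour."""
    n = len(contour)
    for i in range(n):
        (r0, c0), (r1, c1) = contour[i], contour[(i + 1) % n]
        yield (r0, c0), DIRECTION_CODE[(r1 - r0, c1 - c0)]


def chain_code_histogram(contour, size=100, block=25):
    """Concatenate the 8-bin direction histograms of all blocks."""
    per_side = size // block
    features = [0] * (per_side * per_side * 8)
    for (r, c), code in chain_codes(contour):
        block_index = (r // block) * per_side + (c // block)
        features[block_index * 8 + code] += 1
    return features


def square_contour(top, left, side):
    """Ordered boundary of a square, starting at its top-left pixel."""
    bottom, right = top + side - 1, left + side - 1
    pts = [(top, c) for c in range(left, right)]          # moving east
    pts += [(r, right) for r in range(top, bottom)]       # moving south
    pts += [(bottom, c) for c in range(right, left, -1)]  # moving west
    pts += [(r, left) for r in range(bottom, top, -1)]    # moving north
    return pts


# Toy contour: a square spanning rows and columns 10..90.
contour = square_contour(top=10, left=10, side=81)
features = chain_code_histogram(contour)  # 4 x 4 x 8 = 128 counts
```

For the toy square, the histogram contains only the codes for east, south, west and north, and the counts in each block reflect how much of each edge passes through that block. This positional information is what a single whole-image histogram would lose.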

Figure 2 depicts the 8-connectivity chain code on the Telugu numeral character 2.


Figure 2: Direction code representation of a character sample. A: 8-connectivity direction code; B: Direction code representation of Telugu numeral '2'



V. K-Nearest Neighbour Classifier


In this phase, an unlabelled query image is classified by assigning it the label that is most frequent among the K training samples nearest to it, where K is a user-defined constant. To classify a given unknown image, its feature vector is first extracted using the chain code histogram method. The Euclidean distance is then calculated between the feature vector of the unknown image and that of every image in the training set (the known images). The training image with the smallest distance is the one most similar to the input image. The K training images (prototypes) with the smallest distances are chosen, and the identity of the unknown image is taken as the majority label among these K prototypes. The algorithm is as given below:



Input: Unknown binary Telugu numeral character
Output: Recognition of the character
Method: Chain code and K-NN classifier

1: Take a query image.
2: Extract the chain code features of the image (i.e., 4x4x8 features).
3: Compute the Euclidean distance between the query image and the training samples.
4: Choose the K nearest prototypes to the input character.
5: Classify the input image to the class of the majority of the images chosen in step 4.
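The classification steps above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the names `euclidean` and `knn_classify` are hypothetical, the toy vectors have 2 elements instead of 128, and the paper does not say how ties between labels are broken. Here a tie goes to the label that appears first among the nearest neighbours.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, training, k=7):
    """Label a query vector by majority vote among its k nearest samples.

    training is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # Ties resolve to the label seen first, i.e. the closer neighbour
    # (an assumption; the paper does not specify tie-breaking).
    return votes.most_common(1)[0][0]


# Toy 2-element vectors standing in for 128-element chain code features.
training = [
    ([0.0, 0.0], "3"), ([0.1, 0.2], "3"), ([0.2, 0.1], "3"),
    ([5.0, 5.0], "7"), ([5.1, 4.9], "7"), ([4.8, 5.2], "7"),
]
predicted = knn_classify([0.3, 0.3], training, k=3)
```

Using an odd K, as in the experiments below, reduces the chance of a tied vote but does not eliminate it when more than two classes are present.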



VI. Experimental Results


We carried out our evaluations with different values of K (K = 1, 3, 5, 7 and 9) and obtained promising results; K-NN classification with chain code feature extraction on Telugu numerals gives the highest accuracy when K = 7. The experimental evaluation of the combination of chain code feature extraction and the K-NN classifier was carried out on a total of 1400 samples of Telugu numeral characters, of which 700 samples were used for training and 700 for testing. The recognition performance on 70 test samples of each of the 10 numerals is shown in Table 2. The average recognition accuracy obtained with the combination of chain code feature extraction and the K-nearest neighbour classifier is 97.85%.


Table 2: Experimental results with different K-values (recognition accuracy in % for each class)

K    '0'    '1'    '2'    '3'    '4'    '5'    '6'    '7'    '8'    '9'    Overall
1    97.14  97.14  94.28  95.71  98.57  95.71  95.71  94.28  95.71  87.14  95.13
3    98.57  97.14  100    95.71  98.57  97.14  97.14  95.71  98.57  94.28  97.28
5    98.57  100    100    97.14  100    98.57  95.71  96.0   98.57  92.85  97.74
7    98.57  100    100    95.71  100    100    98.57  97.14  100    88.57  97.85
9    98.57  100    100    95.71  100    100    98.57  97.14  100    87.14  97.71
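As a quick arithmetic check, the overall figure in each row appears to be the plain mean of the ten per-class accuracies, which is what one expects when every class has the same number of test samples. The short sketch below recomputes the K = 7 row. The variable names are illustrative, and the figure of 70 test samples per class is taken from the text above.

```python
# Per-class accuracies (%) for K = 7, copied from the table above.
per_class_k7 = [98.57, 100, 100, 95.71, 100, 100, 98.57, 97.14, 100, 88.57]
samples_per_class = 70  # 700 test samples over 10 numerals

# With equal class sizes the overall accuracy is the unweighted mean.
overall = sum(per_class_k7) / len(per_class_k7)

# Approximate number of correctly recognised test samples in each class.
correct = [round(acc / 100 * samples_per_class) for acc in per_class_k7]
total_correct = sum(correct)
```

The mean comes to about 97.856, which matches the reported 97.85% when truncated to two decimals, and corresponds to 685 of the 700 test samples being recognised correctly.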


From the experimental results we found that the recognition accuracy for the Telugu numerals three, seven and nine is lower than for the other numerals, owing to the similarity between these three numerals. The numerals three and nine are sometimes misclassified as seven, and the numeral seven is sometimes misclassified as three or nine. To improve the recognition accuracy in these cases, we propose a post-processing technique, presented in the following subsection.


Post-processing


We used the frequencies of the eight direction codes as features in the above experiment. In the post-processing phase, we add 2 additional features. Referring to Fig 3, the sum of the frequencies of direction codes 1, 2 and 3 forms the 9th feature, and the sum of the frequencies of direction codes 5, 6 and 7 forms the 10th feature.






Fig 3: Additional features (9th and 10th feature) in post-processing phase

The process of post-processing is as follows. Using the training samples of the numerals 3, 7 and 9, a 10-element prototype feature vector is computed for each class as the average of all training samples of that class. A three-class classifier is thus defined by the prototypes for these three numerals. After the classification phase, the samples classified as three, seven or nine are fed to this three-class classifier, which re-classifies them using the minimum Euclidean distance criterion.
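A minimal sketch of this post-processing step is given below. It rests on an assumption the text leaves open: the 10-element vector is taken to be an 8-bin direction code histogram of the whole image, indexed by code number, followed by the two sums. The names `augment`, `class_prototypes` and `post_process`, and the toy histograms, are hypothetical and not data from the paper.

```python
import math


def augment(hist8):
    """Append the 9th and 10th features to an 8-bin direction histogram."""
    return list(hist8) + [hist8[1] + hist8[2] + hist8[3],
                          hist8[5] + hist8[6] + hist8[7]]


def class_prototypes(samples):
    """Average the 10-element vectors of each class.

    samples is a list of (hist8, label) pairs for the confusable classes.
    """
    grouped = {}
    for hist8, label in samples:
        grouped.setdefault(label, []).append(augment(hist8))
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def post_process(hist8, knn_label, prototypes):
    """Re-classify a sample only if K-NN assigned a confusable class."""
    if knn_label not in prototypes:
        return knn_label
    vec = augment(hist8)

    def distance(label):
        return math.sqrt(sum((x - y) ** 2
                             for x, y in zip(vec, prototypes[label])))

    return min(prototypes, key=distance)


# Toy 8-bin histograms (invented for illustration).
train = [
    ([4, 9, 1, 0, 4, 1, 1, 0], "3"), ([4, 7, 3, 0, 4, 1, 1, 0], "3"),
    ([4, 0, 1, 1, 4, 8, 2, 0], "7"), ([4, 0, 1, 1, 4, 6, 2, 0], "7"),
    ([4, 3, 3, 3, 4, 3, 3, 3], "9"), ([4, 3, 3, 3, 4, 3, 3, 3], "9"),
]
prototypes = class_prototypes(train)
corrected = post_process([4, 8, 2, 0, 4, 1, 1, 0], "7", prototypes)
```

Samples that K-NN assigned to any other numeral pass through unchanged, so the second stage can only alter decisions within the group of three confusable classes.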

Table 3: Experimental results after post-processing when K=7 (recognition accuracy in % for each class)

K=7                     '0'    '1'    '2'    '3'    '4'    '5'    '6'    '7'    '8'    '9'    Overall
Before post-processing  98.57  100    100    95.71  100    100    98.57  97.14  100    88.57  97.85
After post-processing   98.57  100    100    98.57  100    100    98.57  100    100    95.71  99.14


After the above post-processing, the recognition accuracies of the numerals three, seven and nine improved considerably. The overall accuracy increased from 97.85% to 99.14%, as shown in Table 3.



VII. Conclusion


Telugu numeral character recognition is a difficult task. In this paper we have presented a chain code histogram feature extraction method with a K-nearest neighbour classifier for isolated Telugu numeral characters. The average recognition accuracy obtained is 97.85%. The experimental results reveal that the errors are mainly due to similarly shaped numeral characters. We attempted to improve the accuracy by incorporating context-based post-processing and achieved 99.14%. The advantage of the above method is that it does not require the costly thinning process.


REFERENCES


[1] R. Plamondon and S. N. Srihari: "On-line and off-line handwritten character recognition: A comprehensive survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63-84, 2000.

[2] Nafiz Arica and Fatos T. Yarman-Vural: "An Overview of character recognition focused on off-line handwriting", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 31, no. 2, pp. 216-233, 2001.

[3] Liana M. Lorigo and Venu Govindaraju: "Offline Arabic handwriting recognition: A survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 5, pp. 712-724, 2006.


[4] Toru Wakahara, Yoshimasa Kimura and Mutsuo Sano: "Handwritten Japanese Character Recognition Using Adaptive Normalization by Global Affine Transformation", Proceedings of the Sixth International Conference on Document Analysis and Recognition, 2001.

[5] R. G. Benne, B. V. Dhandra and Mallikarjun Hangarge: "Tri-scripts handwritten numeral recognition: a novel approach", Advances in Computational Research, ISSN: 0975-3273, vol. 1, issue 2, pp. 47-51, 2009.

[6] G. G. Rajput and S. M. Mali: "Marathi Handwritten Numeral Recognition using Fourier Descriptors and Normalized Chain Code", IJCA Special Issue on "Recent Trends in Image Processing and Pattern Recognition" (RTIPPR), 2010.

[7] V. Jagadeesh Babu, L. Prasanth, R. Sharma, G. V. Rao and A. Bharat: "HMM-based Online Handwriting Recognition System for Telugu Symbols", Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 0-7695-2822-8/07.


[8] B. V. Dhandra, R. G. Benne and Mallikarjun Hangarge: "Kannada, Telugu and Devanagari Handwritten Numeral Recognition with Probabilistic Neural Network: A Novel Approach", IJCA Special Issue on "Recent Trends in Image Processing and Pattern Recognition" (RTIPPR), 2010.

[9] S. V. Rajashekararadhya and P. Vanaja Ranjan: "Neural Network Based Handwritten Numeral Recognition of Kannada and Telugu Scripts", TENCON 2008 - 2008 IEEE Region 10 Conference, 9-21 Nov. 2008.

[10] Nor Amizam Jusoh and Jasni Mohamad Jain: "Application of Freeman Chain Codes: An Alternative Recognition Technique for Malaysian Car Plates", IJCSNS International Journal of Computer Science and Network Security, vol. 9, no. 11, November 2009.