Web OCR Services in Indian Language


15 Aug 2012



Presented By:

Tushar Patnaik
School of Information Technology, CDAC, Noida
tusharpatnaik@cdacnoida.in

Bhupendra Kumar
School of Information Technology, CDAC, Noida
bhupendra.kumar@cdac.in

Deepak Kumar Arya
School of Information Technology, CDAC, Noida
deepakarya@cdacnoida.in

Introduction to OCR


OCR translates scanned document images into machine-encoded text.

[Figure: Input image -> OCR system -> Output text]

Introduction to Web Services


A web service is a software system designed to support interoperable
machine-to-machine interaction over the network.

Builds a distributed computing platform for the web.

Can combine different web applications and components.

A web application is an application that is accessed over a network
such as the internet or an intranet. Web applications:

Do not require any complex procedure to deploy in large organizations.

Need little or no disk space on the client.

Are easy to upgrade, since all new features are implemented on the
server.

Offer cross-platform compatibility.

Integrate with other web applications.


Web OCR Services


The OCR is implemented as a web application where a user can upload
an image and generate text output on the fly through the web.


Challenges:


Multi-user support with session handling


Handling of non-standard documents


Administrative control over the user and application


Controlled access to resources


Scalability issues



Need for Web OCR Service


The need to develop a web-based OCR arose so as to make the
stand-alone OCR for Indian scripts available online and to get
feedback from users about such a service.

There was also a need to provide an online service so that users
globally can take advantage of OCR for Indian scripts.

To maintain large volumes of data through a digital library.

To preserve old and historical documents in electronic format.


Why ASP.NET??

Key criteria compared for Visual Studio, Netbeans and PHP:

Nature
  Visual Studio: ASP.NET has dynamic nature & has broken new grounds
    by entering into new languages (even developing some of its own).
    It is form oriented, object oriented and precompiled.
  Netbeans: It is also form oriented, object oriented and precompiled
    as VS.
  PHP: PHP is still stuck to its scripting language. It is an old
    software with no newer versions.

Programming Languages
  Visual Studio: Visual Studio supports different programming
    languages by means of language services, which allow the code
    editor and debugger to support nearly any programming languages.
  Netbeans: Written in Java but can run anywhere a JVM is installed.
  PHP: PHP code is embedded into the HTML source document and
    interpreted by a web server, which generates the web page
    document. It also has evolved to include a command-line interface
    capability and can be used in standalone graphical applications.

Compiler
  Visual Studio: Parallel compilation on multicore systems does
    improve performance by a good 25-30% over previous versions on C#
    apps. It integrates web services hosting, which earlier had to be
    done separately by the users.
  Netbeans: Newer Lexer makes faster runtime compilation.
  PHP: PHP is a loosely typed, objects optional, fixed syntax,
    component-less, runtime interpreted, structured programming model.
    It is not precompiled and form oriented as VS.

Space utilization
  Visual Studio: ASP.NET utilizes server space while running.
  Netbeans: It uses server space and not inbuilt memory.
  PHP: Inbuilt memory space is used by PHP while running.

Security
  Visual Studio: ASP.NET is reputed for creating sophisticated
    techniques to ensure the safety of confidential data. It is
    professional in nature and is used for corporate projects.
  Netbeans: Security techniques present but not as great as VS.
  PHP: PHP provides security but does not ensure as much as DOT Net.
    It is not professional and secured as required for corporate
    projects.


Architecture


The OCR Web portal can be accessed by two different types of users.

The administrator user controls other user activities and has write
access to web applications.

The Web portal provides services to the users through back-end web
applications such as OCR services, preprocessing services, text
editing and other image processing facilities.
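The split between the administrator and the normal user can be pictured as a small permission check. The sketch below is only illustrative: the role names, the permission strings and the `can` helper are assumptions made for this example, not part of the portal described on the slide.

```python
from enum import Enum


class Role(Enum):
    USER = "user"
    ADMIN = "admin"


# Hypothetical permission sets; the slide only states that the administrator
# controls other users and has write access to the web applications.
PERMISSIONS = {
    Role.USER: {"upload_file", "run_ocr", "edit_text", "download_output"},
    Role.ADMIN: {"upload_file", "run_ocr", "edit_text", "download_output",
                 "manage_users", "write_application_config"},
}


def can(role: Role, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS[role]
```

A request handler would call `can(current_role, "manage_users")` before exposing any administrative page.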



Web OCR Service: a View

[Screenshots: user's dashboard with no files to display; file upload;
preview of the uploaded file; file editing; edited file; OCR of the
file; OCRed file with output; user's dashboard after the file has
been OCRed]

Web OCR Services using Grid


Grid computing follows a service-oriented architecture and provides
hardware and software services and infrastructure for secure and
uniform access to heterogeneous resources, and enables the formation
and management of virtual organizations.


“A computational grid is a hardware and software infrastructure that
provides dependable, consistent, pervasive, and inexpensive access to
high-end computational capabilities.”

- “The Grid: Blueprint for a New Computing Infrastructure”,
Kesselman & Foster



What can grid computing provide?


Exploit underutilized resources: the application must be executable
remotely, and the remote machine must meet any special hardware,
software, or resource requirements imposed by the application.


Parallel CPU capacity


Access to additional resources


Resource balancing


Reliability: multiple copies; automatically resubmit jobs


[Figure: Basic Grid Architecture, showing Users, the Internet, Web OCR
Services, a Manager, and Worker Agents on the Internet or an intranet]
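The manager and worker-agent roles, together with the "automatically resubmit jobs" point, can be sketched as a small scheduling loop. This is a simplified illustration under stated assumptions: workers are plain Python callables, failure is signalled by `RuntimeError`, and the names `run_on_grid`, `healthy_worker` and `broken_worker` are invented for the example. A real grid middleware would handle this across machines.

```python
from collections import deque
from typing import Callable, Dict, List


def run_on_grid(jobs: List[str],
                workers: List[Callable[[str], str]],
                max_attempts: int = 3) -> Dict[str, str]:
    """Hand each job to a worker; resubmit failed jobs to the next worker."""
    queue = deque((job, 0) for job in jobs)    # (job, attempts so far)
    results: Dict[str, str] = {}
    turn = 0
    while queue:
        job, attempts = queue.popleft()
        worker = workers[turn % len(workers)]  # simple round-robin balancing
        turn += 1
        try:
            results[job] = worker(job)
        except RuntimeError:
            if attempts + 1 < max_attempts:
                queue.append((job, attempts + 1))  # automatic resubmission
            else:
                results[job] = "FAILED"
    return results


def healthy_worker(job: str) -> str:
    """Stand-in for a worker agent that runs the OCR engine."""
    return f"text of {job}"


def broken_worker(job: str) -> str:
    """Stand-in for a worker agent that is currently unreachable."""
    raise RuntimeError("worker agent unavailable")
```

Calling `run_on_grid(["page1.png", "page2.png"], [broken_worker, healthy_worker])` still completes both jobs, because a job that fails on one worker is queued again and picked up by another.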

Future Scope..


Multiple file upload with status bar.


Animation/progress indicator at the time of OCR execution.


Batch processing of files.


Deciding the process-flow and saving the workflow for future use.


Dictionary based corrections in the output of OCR.


Controls for applying multiple types of text formatting like Bold,
Italics, Underline etc.


Zoom-in and zoom-out functions for both input and output images.


Conversion of the exe files of all the OCRs to dll library files and
integrating them.


Authenticating user login through OCR CAPTCHA.


CONCLUSION



The proposed system has been designed and implemented
providing the services defined.


At present, OCRs for five scripts have been integrated.


OCRs for seven more scripts are planned to be integrated during the
next two years.


The computation for the OCR engine will be provided by the grid
architecture.


The number of users of the Web OCR services may not be large at
present, but as more facilities and OCRs are included, a large number
of users will benefit.


References


Software Works, “Comparison of dot net, J2ee, PHP”,
http://software-orks.blogspot.com/2008/12/comparison-chart-net-j2ee-php.html


MSDN, http://msdn.microsoft.com/en-us/netframework/aa496123


I. Foster, C. Kesselman, and S. Tuecke, “The Anatomy of the Grid:
Enabling Scalable Virtual Organizations”, International Journal of
Supercomputer Applications, 15(3), Sage Publications, 2001, USA.


Rajkumar Buyya and Srikumar Venugopal, “A Gentle Introduction to Grid
Computing and Technologies”, CSI Communications, Vol. 9, July 2005.



Abstract


In this paper, a development methodology for Web OCR services is
proposed.


The term Web services describes a standardized way of integrating
Web-based applications using XML, SOAP, WSDL and HTTP.


Rather than providing a user interface of their own, Web services
share business logic, data and processes through a programmatic
interface across a network.


Developers can then add the Web service to a GUI (such as a Web
page or an executable program) to offer specific functionality to
users. Services like optical character recognition, where a user can
upload an image and get the text output on the fly through the web,
are still not available on the web for Indian languages.


Keywords - Web Services, OCR Services, Image processing





INTRODUCTION



A web application is an application that is accessed over a network
such as the internet or an intranet. Web applications are popular due to
the ubiquity of web browsers, and the convenience of using a web
browser as a client, sometimes called a thin client.


Common web applications include webmail, online retail sales, online
auctions, wikis and many other functions.


Services like optical character recognition, where a user can upload
an image and get the text output on the fly through the web, are still
not available on the web for Indian languages.


The framework for the OCR services will use ASP.NET for the
middle-tier application logic. The framework supports multiple users,
authentication, session handling, multiple file upload, user control
over the technical flow, session saving and multilingual facilities
for the user.


It also supports handling of non-standard images, administrative
control over client requests and resources, multilevel priorities for
users, horizontal and vertical scalability, and transparency when
replacing, repairing and upgrading the application.



The Ministry of Information and Technology has constituted a
consortium to develop Indian language OCR, through which digitization
of all Indian languages can be done. CDAC Noida, as a consortium
member, has developed a Web OCR service portal for internet users.


The comparative study led us to select Visual Studio .NET, as it has
a dynamic nature and has broken new ground by entering into new
languages (even developing some of its own). It is form oriented,
object oriented and precompiled, unlike PHP.


VS Team System Database Edition has excellent database-code
integration tools. LINQ code generators are another excellent
feature. WinForms and ASP forms are great and better than those of
Netbeans.


In this paper, we propose a development methodology for the Web OCR
services using Visual Studio 2010 .NET tools with ASP.NET version 4
as the development technology.



Architecture



The architecture in Figure 1 shows that the OCR Web portal can be
accessed by two different types of users.


The administrator's functions and controls are different from the
normal user's controls.


The Web portal provides services to the users through back-end web
applications such as OCR services, preprocessing services, text
editing and other image processing facilities.


Figure 1


The use-case diagram is shown in Figure 2. The diagram describes the
set of actions that the system can perform in collaboration with
external users or actors.


Figure 2



The OCR Web Portal is to incorporate an end-to-end OCR system for
different scripts, preprocessing modules and different levels of
access for the end user and the administrator.


The user can upload an input file or files through the web portal
after proper login to the server and can then select the OCR or
preprocessing module for execution.


The text outputs can be edited by the user through the web portal,
which therefore requires an online keyboard for each script.


The administrator control of the web portal provides the facility of
controlling other users' activity and of controlling the configuration
of the OCR and preprocessing modules.


Fig 3. Workflow
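The workflow (login, upload, module selection, execution) can be sketched as a dispatch over a registry of processing modules. This is a minimal illustration, not the portal's implementation: the registry, the `register` decorator, `process_upload`, and the stand-in modules that only return strings are all assumptions for the example.

```python
from typing import Callable, Dict

# Hypothetical module registry; real OCR and preprocessing engines would be
# plugged in here. The stand-ins below only build descriptive strings.
MODULES: Dict[str, Callable[[bytes], str]] = {}


def register(name: str):
    """Decorator that makes a processing module selectable by name."""
    def wrap(func: Callable[[bytes], str]) -> Callable[[bytes], str]:
        MODULES[name] = func
        return func
    return wrap


@register("ocr")
def fake_ocr(image: bytes) -> str:
    return f"<recognized text from {len(image)} bytes>"


@register("preprocess")
def fake_preprocess(image: bytes) -> str:
    return f"<cleaned image, {len(image)} bytes>"


def process_upload(logged_in: bool, image: bytes, module: str) -> str:
    """Mirror the workflow: login check, module selection, execution."""
    if not logged_in:
        raise PermissionError("login required before upload")
    if module not in MODULES:
        raise ValueError(f"unknown module: {module}")
    return MODULES[module](image)
```

Keeping the modules behind a name-based registry means a new script's OCR engine can be added without touching the upload handling code.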


MODULES AND PROCESSES



This section provides a general description of the modules
and where each fits in the global picture. The OCR Web
Portal comprises the following modules.


User activity and control


Login module


New registration module


Keyboard


Administrator activity and control


OCR modules


Preprocessing Modules


Log creation and maintenance


Output Generation modules


A. User activity and control



This module defines the role and accessibility of the end user
of the OCR Web Portal.


The user module interfaces with the login and new registration
modules.


The login module checks the credentials and verifies the user.


The new registration module defines the method for issuing
credentials to a new user.


The user module specifies the services provided to the end user and
maintains sessions. The services include uploading one or more files,
downloading the output data, selecting an OCR or preprocessing module
to execute on the input file, editing the output text file using the
online keyboard, and logout.
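One common way to realize the registration and login checks described above is to store a salted password hash and compare it on login. The sketch below uses only the Python standard library. The in-memory `_users` table and the function names are assumptions for illustration, since the text does not say how the portal stores or verifies credentials.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple

# In-memory stand-in for the portal's user table: name -> (salt, hash).
_users: Dict[str, Tuple[bytes, bytes]] = {}


def _hash(password: str, salt: bytes) -> bytes:
    """Derive a salted hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register_user(name: str, password: str) -> bool:
    """Create credentials for a new user; refuse duplicate names."""
    if name in _users:
        return False
    salt = secrets.token_bytes(16)
    _users[name] = (salt, _hash(password, salt))
    return True


def login(name: str, password: str) -> bool:
    """Verify the supplied credentials against the stored salted hash."""
    if name not in _users:
        return False
    salt, stored = _users[name]
    return hmac.compare_digest(stored, _hash(password, salt))
```

After a successful `login`, a web framework would normally issue a session identifier so that later uploads and edits are tied to the same user.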

B. Online Keyboard module



The module specifies the design and usage of the online keyboard
used by the user module for text editing through the OCR web portal.


This module will generate an online keyboard for every script
included in the OCR module.


It also interfaces with the OCR selected by the user so that it can
initialize the correct keyboard for the user on the web portal.
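Choosing the keyboard from the selected OCR script can be sketched by mapping each script to its Unicode block and listing the assigned characters. The block ranges below come from the Unicode Standard. The three scripts shown and the `keyboard_keys` helper are assumptions for the example, because the text does not name the supported scripts or describe the keyboard layout.

```python
import unicodedata
from typing import List

# Unicode block ranges for a few Indic scripts (from the Unicode Standard).
# Which scripts the portal actually supports is not stated in the text.
SCRIPT_BLOCKS = {
    "devanagari": (0x0900, 0x097F),
    "bengali": (0x0980, 0x09FF),
    "gurmukhi": (0x0A00, 0x0A7F),
}


def keyboard_keys(script: str) -> List[str]:
    """Return the assigned characters of the script's Unicode block."""
    start, end = SCRIPT_BLOCKS[script]
    keys = []
    for code in range(start, end + 1):
        ch = chr(code)
        if unicodedata.name(ch, ""):  # skip unassigned code points
            keys.append(ch)
    return keys
```

A real keyboard would arrange these characters into rows and handle conjuncts and vowel signs. The sketch only shows how the selected script determines the key set.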


C. Administrator activity and control



The module specifies the control mechanism for the Web portal.


The administrator-privileged user needs to provide valid credentials
to access the services. The services include checking the input and
output files of normal users and controlling the configuration files
for the OCR and preprocessing modules.


The OCR and preprocessing modules read the configuration file before
processing the input, and the file controls the technical flow.


The administrator can change the configuration files to help generate
better output for the user.


The administrator can also access the various log files, as this
module interfaces with the log creation and maintenance module.
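Reading a configuration file before execution and letting it steer the technical flow could look like the following sketch. The JSON format, the setting names (`binarize`, `deskew`, `script`) and both helper functions are hypothetical. The text does not specify the format or contents of the configuration files.

```python
import json
from typing import Any, Dict, List

# Hypothetical defaults; the actual configuration keys are not given.
DEFAULTS: Dict[str, Any] = {"binarize": True, "deskew": True, "script": "devanagari"}


def load_config(text: str) -> Dict[str, Any]:
    """Merge administrator-supplied JSON settings over the defaults."""
    supplied = json.loads(text) if text.strip() else {}
    unknown = set(supplied) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    return {**DEFAULTS, **supplied}


def plan_steps(config: Dict[str, Any]) -> List[str]:
    """Decide the technical flow for one input from the configuration."""
    steps = []
    if config["binarize"]:
        steps.append("binarize")
    if config["deskew"]:
        steps.append("deskew")
    steps.append(f"ocr:{config['script']}")
    return steps
```

With this arrangement the administrator changes behaviour by editing the file, for example `{"deskew": false}`, and the modules pick up the change the next time they run.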


D. OCR Module/Preprocessing Module



The module is responsible for generating XML files according to
schemas, which in turn provide a global interface for any OCR and
preprocessing module.


The current version of Web OCR contains OCR engines for five scripts.


Other image editing facilities, such as image rotation, brightness
control and image cropping, are also provided in the Web portal.
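Exchanging results as XML gives every OCR and preprocessing module the same interface. The sketch below serializes recognized lines with Python's `xml.etree.ElementTree`. The element and attribute names (`ocrResult`, `line`, `script`, `n`) are invented for illustration and do not reproduce the schemas used by the portal.

```python
import xml.etree.ElementTree as ET
from typing import List


def ocr_result_to_xml(script: str, lines: List[str]) -> bytes:
    """Serialize recognized lines into a small UTF-8 XML document."""
    root = ET.Element("ocrResult", {"script": script})  # hypothetical names
    for number, text in enumerate(lines, start=1):
        line = ET.SubElement(root, "line", {"n": str(number)})
        line.text = text
    return ET.tostring(root, encoding="utf-8")


def xml_to_lines(data: bytes) -> List[str]:
    """Read the recognized lines back from the XML document."""
    root = ET.fromstring(data)
    return [line.text or "" for line in root.findall("line")]
```

Because the document is encoded as UTF-8, text in any of the supported scripts survives the round trip unchanged.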



E. Log creation and Maintenance



This module interfaces with the user module and the administrator
module and logs information about the activities on the Web portal.


The interface for the user module is used for creating a log of user
activities, while the interface for the administrator module is used
for retrieving the log information.


This module also provides important information about the text
editing done by the user.


The log information contains all the activity performed on the text
output of the document image.


This information is very useful for improving OCR engine performance,
as it can identify the most frequent errors made by the OCR itself at
the character and word level.
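One way the edit log could reveal frequent character-level errors is to compare each OCR output with the user's corrected text and count the substitutions. The sketch below does this with `difflib.SequenceMatcher` from the standard library. The function name and the (OCR text, corrected text) log format are assumptions, and only same-length replacements are counted to keep the example simple.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def confusion_counts(edits: Iterable[Tuple[str, str]]) -> Counter:
    """Count character substitutions between OCR output and user corrections."""
    counts: Counter = Counter()
    for ocr_text, corrected in edits:
        matcher = SequenceMatcher(None, ocr_text, corrected, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only same-length replacements map cleanly to character pairs.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                counts.update(zip(ocr_text[i1:i2], corrected[j1:j2]))
    return counts
```

Calling `confusion_counts(log).most_common(10)` would then list the character pairs the engine confuses most often, which is the kind of feedback the module is meant to provide.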


F. Output Generation module



The module defines the format of the output text generated by the
OCR engine and the other text editing facilities to be provided.


The output text should be in Unicode so that the output for all
scripts is standardized and accessible everywhere.


The text editing service is provided by a rich text control, where
the user edits the output with bold, italics, underline, coloring and
other services.


This control also provides a print control, so the user can send the
output to the printer without saving it to the local disk.


The dictionary module can also be embedded into the rich text
control.