Intelligent Document Capture Solution



Census Data Capture Challenge



Intelligent Document Capture Solution


UNSD Workshop - Minsk, Dec 2008


Amir Angel

Director of Government Projects




The evolution of data capture in census projects

From OCR to an IDR solution

eFLOW

Five steps:


Manual data entry (Key from paper)

Slow process

High error rate in the data entry process

Recruitment, training and management of personnel


Key from Image:

Archive

Approx 20% faster than key from paper


[Timeline figure: Key From Paper, Key From Image]

OMR (Hardware readers for checkbox)


Requires special scanners and specially printed forms


Cannot handle handwritten/printed data


Forms are not user-friendly


OMR requires more answers => more space => increased paper expenditure => more handling and printing costs


Not flexible, difficult to adjust to other applications once census is over


No possibility to add business rules: imputation, validations, coding

[Timeline figure: OMR]


Automated Data Capture


Requires less human intervention, so the census data capture can be completed much faster (less space, lower salary costs, less hardware)


Full flexibility in the type of data gathered (checkbox, OMR, handwritten, alpha and numeric, barcode…)

Ensures data integrity: enables both automatic and manual online validations, exception handling and coding


The most advanced and proven technology for censuses, recommended by the UN and used by all modern countries for census projects


Creates a correlation between the image and the actual form


Remote capabilities enable all forms to be scanned locally and then sent to a central site for processing

[Timeline figure: eFLOW Automated Data Capture]


Intelligent data capture platform (IDR)

Uses OCR/ICR/OMR/PDA/Web/email:

Automated data capture
+
Automatic classification of documents

Understands and differentiates between various types of documents and languages, based on state-of-the-art machine learning algorithms

Artificial intelligence algorithms provide enough information for the system to find the location of the fields on its own


[Timeline figure: eFLOW Intelligent Data Capture]

Traditional Data Capture

Mail Room => Scanning => Data Entry => Back-Office => End Users

Document prep
Sorting
Manual
Key from image

Intelligent Document Capture

Mail Room => Scanning => Data Entry => Back-Office => End Users

Document prep
No sorting
Reduce manual data entry by 40-70%
Increase accuracy and consistency

India 2001
Turkey 1997
Brazil 2000
South Africa 2001
Ireland 2002
Italy 2002
Cyprus 2002
Turkey 2000
Kenya 2000
Slovak Republic 2001
Hong Kong 2001
Thailand 2008 (Community)
Slovenia 2006
Hong Kong 2006
South Africa Survey 2007
Ireland 2006

Automated Data Capture = time saving

Manual
Saving of 25%
Saving of 50%
(Source: CSO, Central Statistics Office Ireland)

The technology is there

No need to reinvent the wheel

Reduce risk by using off-the-shelf technologies

Data Types: OCR, OMR, ICR

Automatic Recognition

A * C * E F

1 2 3 4 5 * 7

ICR

* = Unrecognized character

Improve Recognition


Voting mechanism



Voting

Single Engine vs. Virtual Engines


Figure Of Merit Example

Figure Of Merit = % of characters recognized - (10 * % of characters misclassified)

A system recognizes 90% of the characters contained in a batch, but misclassifies 4%

90 - (10 * 4) = 50

The Figure Of Merit in this example is 50

A system recognizes 80% of the characters contained in a batch, but misclassifies 1%

80 - (10 * 1) = 70

The Figure Of Merit in this example is 70

The second system is more efficient
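
The arithmetic in the two examples can be written as a small function. This is only a sketch: the function name and the `penalty` parameter are illustrative, and the weight of 10 per misclassified percentage point is taken from the worked examples rather than from any formal definition.

```python
def figure_of_merit(recognized_pct, misclassified_pct, penalty=10):
    """Recognition rate minus a weighted penalty for misclassified characters.

    Both rates are percentages of the characters in a batch. The default
    penalty of 10 is the multiplier used in the slide's examples.
    """
    return recognized_pct - penalty * misclassified_pct


first_system = figure_of_merit(90, 4)   # 90 - (10 * 4)
second_system = figure_of_merit(80, 1)  # 80 - (10 * 1)
print(first_system, second_system)
```

Because a misclassified character counts ten times as much as a recognized one, the system with the lower recognition rate but fewer misclassifications scores higher.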

Benefits of Multiple ICRs

2 8 9 5 6 3 7 4 3 1 6 7 8 5


Identify false positives


Alpha & Numeric fields


Highlight for verifications


Quality control for ICR

Unique Tiling station


Checking for false positives


Voting Methods Example

Assume we have a virtual engine (V. engine) that includes 4 engines

We want to identify the following number: 253478

The results of each engine are:

Engine   Result
1        25***8
2        2*5378
3        253478
4        2*34*8

The final results of the V. engine will be:

Safe:      2****8
Normal:    25**78
Majority:  253478
Order:     255378
Equalizer: ??????
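
The slide lists the voting results without stating the rules behind them. The sketch below assumes one plausible reading that reproduces the Safe, Normal, Majority and Order results for the four engine outputs: Safe accepts a character only when every engine reports the same one, Normal accepts it when all engines that recognized something agree, Majority takes the most frequent recognized character, and Order takes the first recognized character in engine order. These rule definitions and the `vote` function are assumptions, not eFLOW's documented behaviour; Equalizer is left out because the slide gives no result for it.

```python
from collections import Counter

UNKNOWN = "*"  # marker the slides use for an unrecognized character


def vote(results, method):
    """Combine equal-length engine outputs position by position."""
    combined = []
    for column in zip(*results):
        recognized = [ch for ch in column if ch != UNKNOWN]
        distinct = set(recognized)
        if method == "safe":
            # Every engine must recognize the character and all must agree.
            agreed = len(recognized) == len(column) and len(distinct) == 1
            combined.append(column[0] if agreed else UNKNOWN)
        elif method == "normal":
            # Engines that recognized something must not contradict each other.
            combined.append(recognized[0] if len(distinct) == 1 else UNKNOWN)
        elif method == "majority":
            # Most frequent recognized character; a tie stays unknown.
            counts = Counter(recognized).most_common()
            tie = len(counts) > 1 and counts[0][1] == counts[1][1]
            combined.append(counts[0][0] if counts and not tie else UNKNOWN)
        elif method == "order":
            # First engine, in priority order, that recognized the character.
            combined.append(recognized[0] if recognized else UNKNOWN)
        else:
            raise ValueError(f"unknown voting method: {method}")
    return "".join(combined)


engines = ["25***8", "2*5378", "253478", "2*34*8"]
for name in ("safe", "normal", "majority", "order"):
    print(name, vote(engines, name))
```

Under these assumptions the stricter methods leave more characters for manual completion, while Majority and Order fill more positions at a higher risk of false positives, as the Order result shows.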

ICR 1 to ICR 4 read the same character as: 3, 3, 8, 3

Majority = 3

Safe = *

Processing Example

Automatic Recognition Time + Completion Time + Correction Time = THROUGHPUT


[Figure: Fuzzy/Approximate Search compared with Other Approaches; each panel shows Image, Recognition and Completion]


Auto Coding


Coding tasks and data validations are performed on the data capture platform: a ‘cost-effective’ solution

Use artificial intelligence and statistical software to “understand” sentences

Q: “What do you do for a living?”

A: “I am guiding children” => “Teacher” => 2030

Use approximate search tools to improve results via a database lookup (Exorbyte)
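
The approximate-search step can be illustrated with a minimal sketch that matches a slightly misrecognized answer against a code list. It uses Python's standard `difflib` module as a stand-in, not Exorbyte, whose interface is not described here. Only the pair "Teacher" and 2030 comes from the slide; the other entries, their codes and the `auto_code` function are hypothetical. The sketch covers spelling-level matching only, not the step that interprets "I am guiding children" as "Teacher".

```python
from difflib import get_close_matches

# Hypothetical code list: only "teacher" -> 2030 appears on the slide;
# the remaining entries and codes are invented for illustration.
OCCUPATION_CODES = {
    "teacher": 2030,
    "nurse": 9001,
    "farmer": 9002,
    "driver": 9003,
}


def auto_code(answer, cutoff=0.6):
    """Return the code of the closest known occupation, or None if nothing is close."""
    key = answer.strip().lower()
    matches = get_close_matches(key, list(OCCUPATION_CODES), n=1, cutoff=cutoff)
    return OCCUPATION_CODES[matches[0]] if matches else None


print(auto_code("Teachar"))  # a recognition error in one character
print(auto_code("xyzzy"))    # nothing similar in the code list
```

Answers that return no match would go to an operator for manual coding; raising `cutoff` trades fewer wrong codes for more manual work.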





Scanning => OCR => Validation => Export

Process integrity and questionnaire integrity: a workflow according to the client's needs

Flexibility




Thank You


Census Data Capture Platform