SEARCH IN CLASSROOM VIDEOS

WITH OPTICAL CHARACTER RECOGNITION

FOR VIRTUAL LEARNING



A Thesis

Presented to

the Faculty of the Department of Computer Science

University of Houston


In Partial Fulfillment

of the Requirements for the Degree

Master of Science


By

Tayfun Tuna

December 2010




SEARCH IN CLASSROOM VIDEOS

WITH OPTICAL CHARACTER RECOGNITION

FOR VIRTUAL LEARNING


Tayfun Tuna

APPROVED:


Dr. Jaspal Subhlok, Advisor
Dept. of Computer Science, University of Houston


Dr. Shishir Shah
Dept. of Computer Science, University of Houston


Dr. Lecia Barker
School of Information, University of Texas at Austin


Dean, College of Natural Sciences and Mathematics






Acknowledgements

I am very grateful to my advisor, Dr. Jaspal Subhlok, for his guidance, encouragement, and support during this work. He kept me motivated with his insightful suggestions for solving many problems that would otherwise have seemed impossible. I would not have been able to complete my work on time without his guidance and encouragement.

I would like to express my deepest gratitude to Dr. Shishir Shah, who gave me innumerable suggestions in our weekly meetings and in his image processing class; both helped me many times to solve difficult problems in this research.

I am heartily thankful to Dr. Lecia Barker for her support and for agreeing to be a part of my thesis committee.

Without the love and support of my wife, it would have been hard to finish my thesis on time. I am forever indebted to my wife, Naile Tuna.














SEARCH IN CLASSROOM VIDEOS

WITH OPTICAL CHARACTER RECOGNITION

FOR VIRTUAL LEARNING



An Abstract of a Thesis

Presented to

the Faculty of the Department of Computer Science

University of Houston


In Partial Fulfillment

of the Requirements for the Degree

Master of Science


By

Tayfun Tuna

December 2010




Abstract

Digital videos have been extensively used for educational purposes and distance learning. Tablet PC based lecture videos have been in common use at the University of Houston for many years. To enhance the user experience and improve the usability of classroom lecture videos, we designed an indexed, captioned, and searchable (ICS) video player. The focus of this thesis is search.

Searching inside a lecture is especially useful for long videos; instead of spending an hour watching the entire video, it allows users to find the relevant scenes instantly. This feature requires extracting the text from video screenshots using Optical Character Recognition (OCR). Since ICS video frames include complex images, graphs, and shapes in different colors with non-uniform backgrounds, our text detection requires a more specialized approach than is provided by off-the-shelf OCR engines, which are designed primarily for recognizing text within scanned black-and-white documents.

In this thesis, we describe how we used these OCR engines and improved their detection rates for the ICS video player. We surveyed current OCR engines on ICS video frames and found that recognition accuracy could be increased by preprocessing the images. By applying image processing techniques such as resizing, segmentation, and inversion to the images, we increased the search accuracy of the ICS video player.












Table of Contents

CHAPTER 1. INTRODUCTION .......................................................... 1
1.1 MOTIVATION ................................................................... 1
1.2 BACKGROUND ................................................................... 2
1.2.1 VIDEO INDEXER .............................................................. 3
1.2.2 OVERVIEW OF OPTICAL CHARACTER RECOGNITION (OCR) TOOL ...................... 5
1.2.3 ICS VIDEO PLAYER ........................................................... 7
1.3 RELATED WORK ................................................................ 10
1.3.1 VIDEO PLAYERS ............................................................. 11
1.3.2 OCR IMPLEMENTATION IN VIDEOS .............................................. 12
1.4 THESIS OUTLINE .............................................................. 14
CHAPTER 2. SURVEY OF OCR TOOLS .................................................. 15
2.1 POPULAR OCR TOOLS ........................................................... 15
2.2 THE CRITERIA FOR A “GOOD” OCR TOOL FOR ICS VIDEO IMAGES ..................... 16
2.3 SIMPLE OCR .................................................................. 18
2.4 ABBYY FINEREADER ............................................................ 20
2.5 TESSERACT OCR ............................................................... 21
2.6 GOCR ........................................................................ 22
2.7 MICROSOFT OFFICE DOCUMENT IMAGING (MODI) .................................... 24
2.8 CONCLUSION .................................................................. 27
CHAPTER 3. OCR CHALLENGES AND ENHANCEMENTS ...................................... 28
3.1 WHAT IS OCR? ................................................................ 28
3.2 HOW DOES OCR WORK? .......................................................... 29
3.3 CAUSES OF FALSE DETECTION ................................................... 31
CHAPTER 4. ENHANCEMENTS FOR OCR ................................................. 34
4.1 SEGMENTATION ................................................................ 34
4.1.1 THRESHOLDING .............................................................. 37
4.1.2 EROSION AND DILATION ...................................................... 39
4.1.3 EDGE DETECTION ............................................................ 41
4.1.4 BLOB EXTRACTION ........................................................... 43
4.2 RESIZING FOR TEXT FONT SIZE ................................................. 44
4.3 INVERSION ................................................................... 46
4.4 RESIZING IMAGE .............................................................. 48
4.5 INVERSION ................................................................... 49
CHAPTER 5. OCR ACCURACY CRITERIA AND TEST RESULTS ............................... 51
5.1 TEST DATA ................................................................... 51
5.1.1 THE IMAGES FOR THE OCR DETECTION TEST ..................................... 51
5.1.2 THE TEXT FOR THE OCR DETECTION TEST ....................................... 53
5.1.3 SEARCH ACCURACY ........................................................... 54
5.2 WORD ACCURACY AND SEARCH ACCURACY ........................................... 54
5.3 PREPARING AND TESTING TOOLS ................................................. 55
5.3.1 TEXTPICTURE EDITOR ........................................................ 56
5.3.2 OCR TOOL MANAGER AND ACCURACY TESTER ...................................... 57
5.3.3 SEARCH ACCURACY ........................................................... 59
5.4 EXPERIMENTS AND TEST RESULTS ................................................ 60
CHAPTER 6. CONCLUSION ........................................................... 69
REFERENCES ...................................................................... 72




List of Figures

FIGURE 1.1: BLOCK DIAGRAM OF THE VIDEO INDEXER ................................... 3
FIGURE 1.2: A SNAPSHOT FROM THE VIDEO INDEXER .................................... 4
FIGURE 1.3: A SNAPSHOT OF AN OUTPUT FROM THE VIDEO INDEXER ....................... 4
FIGURE 1.4: THE OCR TOOL FUNCTION IN THE ICS VIDEO PLAYER ........................ 5
FIGURE 1.5: A SNAPSHOT OF AN OUTPUT FOLDER OF THE OCR TOOL ....................... 5
FIGURE 1.6: A SNAPSHOT OF THE RUNNING OCR TOOL ................................... 6
FIGURE 1.7: A SNAPSHOT OF THE ICS VIDEO PLAYER XML OUTPUT OF THE OCR TOOL ........ 6
FIGURE 1.8: FLOW OF THE ICS VIDEO PLAYER ......................................... 7
FIGURE 1.9: A SNAPSHOT OF THE VIDEO PLAYER SCREEN ................................ 8
FIGURE 1.10: LIST VIEW OF THE SEARCH FEATURE ..................................... 9
FIGURE 1.11: ICS VIDEO PLAYER PROGRESS BAR ...................................... 10
FIGURE 2.1: SIMPLE OCR DETECTION EXAMPLE 1 ...................................... 18
FIGURE 2.2: SIMPLE OCR DETECTION EXAMPLE 2 ...................................... 19
FIGURE 2.3: SIMPLE OCR DETECTION EXAMPLE 3 ...................................... 19
FIGURE 2.4: ABBYY FINEREADER DETECTION EXAMPLE .................................. 20
FIGURE 2.5: USER INTERFACE OF ABBYY FINEREADER .................................. 20
FIGURE 2.6: TESSERACT OCR DETECTION EXAMPLE 1 ................................... 21
FIGURE 2.7: TESSERACT OCR DETECTION EXAMPLE 2 ................................... 22
FIGURE 2.8: TESSERACT OCR DETECTION EXAMPLE 3 ................................... 22
FIGURE 2.9: GOCR TOOL DETECTION EXAMPLE 1 ....................................... 23
FIGURE 2.10: GOCR TOOL DETECTION EXAMPLE 2 ...................................... 23
FIGURE 2.11: GOCR TOOL DETECTION EXAMPLE 3 ...................................... 24
FIGURE 2.12: USING THE MODI OCR ENGINE IN THE C PROGRAMMING LANGUAGE ............ 25
FIGURE 2.13: MODI DETECTION EXAMPLE 1 ........................................... 25
FIGURE 2.14: MODI DETECTION EXAMPLE 2 ........................................... 26
FIGURE 2.15: MODI DETECTION EXAMPLE 3 ........................................... 26
FIGURE 3.1: PATTERN RECOGNITION STEPS FOR CLASSIFICATION ........................ 30
FIGURE 3.2: CHARACTER REPRESENTATION FOR FEATURE EXTRACTION ..................... 30
FIGURE 3.3: DISTORTED IMAGE ANALYSIS ............................................ 31
FIGURE 3.4: CONTRAST AND COLOR DIFFERENCES IN CHARACTERS IN AN IMAGE ............ 32
FIGURE 3.5: SIZE DIFFERENCES IN CHARACTERS IN AN IMAGE .......................... 32
FIGURE 4.1: BLACK FONT TEXT ON A WHITE BACKGROUND ............................... 35
FIGURE 4.2: COMPLEX BACKGROUND WITH DIFFERENT COLOR FONTS ....................... 35
FIGURE 4.3: OCR RESULTS FOR A WHOLE IMAGE ....................................... 36
FIGURE 4.4: OCR RESULTS FOR A SEGMENTED IMAGE ................................... 36
FIGURE 4.5: SIS THRESHOLD EXAMPLE 1 ............................................. 38
FIGURE 4.6: SIS THRESHOLD EXAMPLE 2 ............................................. 38
FIGURE 4.7: STRUCTURAL ELEMENT MOVEMENT FOR MORPHOLOGICAL OPERATIONS ............ 40
FIGURE 4.8: STRUCTURED ELEMENT FOR EROSION AND DILATION ......................... 40
FIGURE 4.9: DILATION EFFECT ON AN IMAGE ......................................... 41
FIGURE 4.10: EDGE DETECTION EFFECT ON A DILATED IMAGE ........................... 42
FIGURE 4.11: BLOB EXTRACTION EXAMPLE ON AN IMAGE ................................ 44
FIGURE 4.12: RESIZE PROCESS EXAMPLE ............................................. 45
FIGURE 4.13: RESIZE PROCESS WITH INTERPOLATION .................................. 45
FIGURE 4.14: RESIZE PROCESS WITH BILINEAR INTERPOLATION ......................... 46
FIGURE 4.15: RGB COLOR MODEL .................................................... 47
FIGURE 4.16: THE INVERSION OPERATION ON THE LEFT INPUT IMAGE .................... 48
FIGURE 4.17: INVERSION EQUATIONS AND THEIR EFFECT ON THE IMAGES ................. 49
FIGURE 4.18: OCR ENGINE DETECTIONS FOR THE ORIGINAL IMAGE ....................... 50
FIGURE 5.1: EXAMPLE ICS VIDEO IMAGE ............................................. 52
FIGURE 5.2: EXAMPLES OF SOME IMAGES THAT ARE NOT INCLUDED IN THE TEST ........... 52
FIGURE 5.3: AN EXAMPLE OF SOME TEXT THAT IS NOT INCLUDED IN THE TEST ............ 53
FIGURE 5.4: SCREENSHOT OF THE TEXTPICTURE EDITOR TOOL ........................... 56
FIGURE 5.5: INPUT FOLDER FOR THE OCR TEST CREATED BY THE TEXTPICTURE EDITOR TOOL ... 57
FIGURE 5.6: SCREENSHOT OF THE OCR TOOL MANAGER AND ACCURACY TESTER .............. 58
FIGURE 5.7: SCREENSHOT OF THE OCR TOOL MANAGER AND ACCURACY TESTER .............. 58
FIGURE 5.8: EXCEL FILE CREATED BY THE OCR MANAGER TOOL FOR A FOLDER ............. 59
FIGURE 5.9: EXCEL FILE CREATED BY THE OCR MANAGER TOOL FOR AN IMAGE ............. 59
FIGURE 5.10: EXAMPLE SCREENS FROM THE VIDEOS WITH THE HIGHEST FALSE POSITIVES ... 67
FIGURE 5.11: EXAMPLE SCREENS FROM THE VIDEOS WITH THE HIGHEST WORD DETECTIONS ... 67
FIGURE 5.12: EXAMPLE SCREENS FROM THE VIDEOS WITH THE LOWEST DETECTION .......... 67






List of Graphs

GRAPH 5.1: OCR ACCURACY TEST GRAPH FOR WORD ACCURACY ............................ 63
GRAPH 5.2: GRAPH FOR OCR TEST RESULTS OF SEARCH ACCURACY ........................ 64
GRAPH 5.3: GRAPH FOR OCR TEST RESULTS OF EXECUTION TIMES ........................ 64
GRAPH 5.4: OCR TEST RESULTS FOR FALSE POSITIVES ................................. 65
GRAPH 5.5: GRAPH FOR OCR TEST RESULTS OF SEARCH ACCURACY RATE FOR ALL VIDEOS ... 68







List of Tables

TABLE 2.1: POPULAR OCR TOOLS .................................................... 15
TABLE 2.2: SELECTED OCR TOOLS TEST .............................................. 18
TABLE 5.1: FORMULATION OF WORD ACCURACY ......................................... 54
TABLE 5.2: FORMULATION OF SEARCH ACCURACY ....................................... 55
TABLE 5.3: OCR ACCURACY TEST RESULTS FOR WORD ACCURACY .......................... 61
TABLE 5.4: NUMBER OF UNDETECTED WORDS WITH METHODS .............................. 62
TABLE 5.5: OCR ACCURACY TEST RESULTS FOR SEARCH ACCURACY ........................ 62
TABLE 5.6: TEST RESULTS FOR EXECUTION TIMES ..................................... 64
TABLE 5.7: NUMBER OF FALSE POSITIVES ............................................ 65
TABLE 5.8: VIDEOS WITH THE HIGHEST FALSE POSITIVES .............................. 64





Chapter 1:

Introduction

1.1 Motivation



There is a huge database of digital videos in any school that employs lecture video recording. Traditionally, students would download a video and watch it with a basic video player. This method is not suitable for users such as students who want to quickly refer to a specific topic in a lecture video, as it is hard to tell exactly when that topic was taught. It is also not suitable for deaf students. To make these videos more accessible and engaging, we needed to make the content inside videos easily navigable and searchable, and to associate closed captions with the videos through a visually attractive and easy-to-use video player interface.


To provide easy access to video content and enhance the user experience, we designed a video player in the ICS video project, focused on making video content more accessible and navigable. This video player allows users to search for a topic they want in a lecture video, which saves time because users do not need to view the whole lecture stream to find what they are looking for.


To provide search capability in our video player, we need the text of each video frame. This can be done using optical character recognition (OCR). Since ICS video frames include complex images, graphs, and shapes in different colors with non-uniform backgrounds, our text detection requires a more specialized approach than is provided by off-the-shelf OCR software, which is designed primarily for recognizing text within scanned black-and-white documents. Apart from choosing the right OCR tool for the ICS video player, basic image preprocessing techniques are required to improve accuracy.

1.2 Background


Digital videos in education have been a successful medium for students to study or revise the subject matter taught in a classroom [1]. Although a practical method of education, video was never meant to replace live classroom interaction, as a live lecture and student-instructor interaction cannot be retained in a video; but we still provide anytime-anywhere accessibility by allowing web-based access to lecture videos [2]. We wanted to enhance the user experience and make the content of video lectures easily accessible to students by designing a player that supports indexing (of visual transition points), search, and captioning.

At the University of Houston, video recordings have been used for many years for distance learning. In all those years, lecture videos have only grown in popularity [2, 3, 4]. A problem that students face while viewing these videos is that it is difficult to access specific content. To solve this problem we started a project known as Indexed, Captioned and Searchable (ICS) Videos. Indexing (a process of locating visual transitions in video), searching, and captioning have been incorporated in the project to attain the goal of making lecture videos accessible to a wide variety of users in an easy-to-use manner.

We are looking at the project from the perspective of an end user (most likely a student). To increase the usefulness of the ICS Video project, all videos have meta-information associated with them. This meta-information includes a description of the lecture, a series of points in the video timeline where a visual transition exists (also known as index points), keywords needed for search, and closed caption text. The indexer, explained in the following sections, creates the index and transition points of the video as image files for the OCR tool. The OCR tool detects the text in these images and stores it so that the ICS Video Player, also explained in the following sections, can organize this meta-information in a manner that is practical to the end user while preserving the emphasis on the video.

As stated earlier, this work is a culmination of the larger ICS Video project. In this section we present a summary of contributions made by others to this project.

1.2.1 Video Indexer

The job of the indexer is to divide the video into segments, where each division occurs at a visual transition, as shown in Figure 1.1. By dividing a video in this manner we get a division of the topics taught in a lecture, because the visual transitions in a video are essentially slide transitions. The indexer is also supposed to eliminate duplicate transition points and place index points at approximately regular time intervals.


Figure 1.1: Block diagram of the video indexer. The output from the indexer is a set of image files and a textual document that contains a list of index points, i.e., time stamps where a visual transition exists.




Joanna Li [3] outlined a method to identify visual transitions and eliminate duplicates by filtering. Later, this approach was enhanced with new algorithms [4].


Figure 1.2: A snapshot from the video indexer as it runs to find index points and transition points.





Figure 1.3: Output from the video indexer. It has created all transition points and a data file indicating which ones are index points.



Figure 1.2 shows a snapshot from the video indexer. After it finishes processing, it creates the outputs in a folder for the OCR tool, as shown in Figure 1.3.



1.2.2 Overview of the Optical Character Recognition (OCR) Tool

We will discuss OCR in depth in the following chapters. Figure 1.4 shows a workflow with a short description.


Figure 1.4: The OCR tool takes each frame where an index (or visual transition) exists and extracts a list of keywords written on it. This list is then organized in such a way that it can be cross-referenced by the index points.


After the video indexer creates the index points and transition points, which are image files, the OCR module runs to extract keywords from the text written on these video frames (which are essentially PowerPoint slides). As a result we get all the keywords for each video segment from this tool. These keywords, among other data, are then used to power the search function in the video player.
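To make this step concrete, here is a minimal sketch of turning raw OCR output for one frame into a keyword list. The tokenization rules (strip punctuation, lowercase, drop very short or non-alphanumeric tokens) are illustrative assumptions, not the exact filtering used by the ICS OCR tool:

```python
def keywords_from_ocr_text(text):
    """Turn raw OCR output for one frame into a sorted list of unique keywords."""
    tokens = [t.strip(".,:;!?()\"'").lower() for t in text.split()]
    # Keep alphanumeric tokens of reasonable length as searchable keywords.
    return sorted({t for t in tokens if len(t) >= 3 and t.isalnum()})

# Example: raw text as an OCR engine might return it for a slide.
raw = "Binary Search Trees\nInsertion, deletion and search in O(log n)"
print(keywords_from_ocr_text(raw))
```

In practice the keyword list for each frame is then attached to its index point, as described below.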


Figure 1.5: The OCR tool renames files according to their index point numbers. L1-082310_i_1_1 refers to the first index point and its first transition point; L1-082310_t_1_2 refers to the first index point and its second transition point.
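The naming scheme above can be decoded mechanically; the sketch below assumes only the underscore-separated layout shown in the caption (lecture prefix, `i`/`t` flag, index number, transition number):

```python
import re

def parse_frame_name(filename):
    """Parse a frame filename like 'L1-082310_i_1_1' into its parts."""
    m = re.match(r"(?P<lecture>.+)_(?P<kind>[it])_(?P<index>\d+)_(?P<transition>\d+)$",
                 filename)
    if m is None:
        raise ValueError("unexpected filename: " + filename)
    return {
        "lecture": m.group("lecture"),
        # 'i' marks an index point, 't' a plain transition point.
        "is_index_point": m.group("kind") == "i",
        "index": int(m.group("index")),
        "transition": int(m.group("transition")),
    }

print(parse_frame_name("L1-082310_i_1_1"))
```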




Figure 1.6: The OCR tool running, extracting text from images.


The OCR tool finishes extracting text from all images one by one and then creates an XML output file that includes the keywords for each transition point, as shown in Figure 1.7.


Figure 1.7: XML file, the output of the OCR tool.


Once the XML file is ready, the ICS Video Player can use it in its interface. We discuss in the next section how the information supplied by the indexer and the OCR tool is used in the ICS Video Player.
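Generating such an XML file is straightforward once the per-segment keyword lists exist. The element names below are illustrative placeholders, not the exact schema of the ICS tool's output shown in Figure 1.7:

```python
import xml.etree.ElementTree as ET

def build_keyword_xml(segments):
    """Build an XML document mapping each index point to its keywords.

    `segments` maps an index-point id to a list of keywords; the element
    names here are assumptions for illustration only.
    """
    root = ET.Element("video")
    for index_id, words in segments.items():
        seg = ET.SubElement(root, "indexpoint", id=str(index_id))
        for w in words:
            ET.SubElement(seg, "keyword").text = w
    return ET.tostring(root, encoding="unicode")

print(build_keyword_xml({1: ["binary", "search"], 2: ["recursion"]}))
```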



1.2.3 ICS Video Player


Figure 1.8: The video player is fed the meta-information consisting of index points and keywords, along with some information about the course the lecture belongs to and about the lecture itself. The caption file is an optional resource; if present, it is displayed below the video as shown in Figure 1.9.



In essence, the ICS Videos project aims at providing three features - indexing, captioning, and search - for distance digital video education. Here is an overview of how those three features were integrated in the video player:

1. Indexing

The recorded video lectures were divided into segments where each division occurs at a visual transition (which is assumed to be a change of topic in a lecture). These segments (or index points) are organized in a list of index points in the player interface (see Figure 1.9 (d)).

2. Captioning

The video player was designed to contain a panel, which can be minimized, that displays closed captions if a closed caption file is associated with the video (see Figure 1.9 (f)). At the time of writing, the captions file needs to be generated manually by the author.



3. Search

Users can search for a topic of interest in a video by searching for a keyword. The result shows all occurrences of that keyword among the video segments. This is implemented by incorporating the indexer and the OCR tool discussed earlier in the video processing pipeline. The search result allows users to easily navigate to the video segment where a match for the search keyword was found, as shown in Figure 1.9 (b) and Figure 1.10.




Figure 1.9: A snapshot of the video player screen. Highlighted components: (a) video display, (b) search box, (c) lecture title, (d) index list, (e) playhead slider, (f) closed captions, (g) video controls.



Figure 1.9 shows a running example of the video player. The lecture in Figure 1.9 belongs to COSC 1410 - Introduction to Computer Science, taught by Dr. Nouhad Rizk at the University of Houston. The figure gives a view of the player as a whole along with every component. The player interface is mostly self-explanatory, but we should clarify some of the functionality. The video display (Figure 1.9 (a)) shows the current status of the video. If the video is paused, it shows a gray overlay over the video with a play button.

The index list (Figure 1.9 (d)) contains a descriptive entry for each index point (also known as a visual transition) in the video. Each entry in the index list is made up of a snapshot image of the video at the index point, the name of the index, and its description, as shown in Figure 1.10.


Figure 1.10: The list of results when the user searches for the keyword "program" in the lecture.


One component that is not shown in Figure 1.9 is the search result component. When the user searches for a keyword, all indices that contain that keyword in their keyword lists are displayed in the list of search results. The user can then click on a result to go to that index point in the video. As shown in Figure 1.10, every result also contains the snapshot of the video at the index point along with the name and description of the index point. It also shows where the keyword was found - in the keyword list (along with the number of matches), in the title, or in the description of the index point. All of this information comes from the XML file created by the OCR tool, as explained in the previous section.


Figure 1.11: The progress bar of the ICS Video Player. In this case the video is playing at index point 1.



One thing we need to point out about the search feature in this player is that when a user searches for a keyword and finds it in a keyword list, the progress bar pointer goes to the beginning of that index region; it does not go to the exact position within the video.
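The search behavior described here amounts to matching a query against each index point's keyword list and returning the matching index points, whose start times the player then jumps to. A minimal sketch (the data layout and names are illustrative assumptions):

```python
def search_indices(index_points, query):
    """Return the ids of index points whose keyword lists contain the query.

    Mirrors the player's behavior of jumping to the start of the matching
    index region rather than to an exact video position.
    """
    q = query.lower()
    return [ip["id"] for ip in index_points
            if q in (k.lower() for k in ip["keywords"])]

points = [
    {"id": 1, "title": "Intro", "keywords": ["program", "syntax"]},
    {"id": 2, "title": "Loops", "keywords": ["while", "for"]},
    {"id": 3, "title": "Functions", "keywords": ["program", "return"]},
]
print(search_indices(points, "Program"))  # → [1, 3]
```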


We have briefly covered the workflow of the ICS Video Player and how the OCR tool is used in the project. The purpose of the work done in this thesis is to create an OCR tool for the video player that provides the text of the video frames, so that the user is able to search inside a video using the ICS video player. There are several ways to design an OCR tool that creates text for the ICS video player; our main goal is to make it accurate enough for the end user to find a keyword in the right place in the video content. To this end, we tested the current OCR tools and used some pre-processing techniques to improve their accuracy. From the experiments and results, we concluded that the OCR tools we present here can be used for the ICS Video Player. Modifying images with image processing techniques before sending them to the OCR tool increases the accuracy of these tools.
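Two of the preprocessing steps evaluated in Chapter 4 can be illustrated on a grayscale image represented as a list of pixel rows: inversion maps each pixel p to 255 - p (turning light text on a dark background into dark-on-light, which document-oriented OCR engines prefer), and resizing enlarges small slide fonts. The thesis uses bilinear interpolation for resizing; nearest-neighbor sampling is used below only to keep the sketch short:

```python
def invert(img):
    """Invert a grayscale image: light text on dark becomes dark on light."""
    return [[255 - p for p in row] for row in img]

def resize_nearest(img, factor):
    """Upscale by an integer factor with nearest-neighbor sampling.

    OCR engines tuned for scanned documents often detect small slide
    fonts better after the image is enlarged.
    """
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]

tiny = [[0, 255],
        [255, 0]]
print(invert(tiny))  # → [[255, 0], [0, 255]]
print(resize_nearest(tiny, 2))
```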


1.3 Related Work

There have been efforts across the industry along the lines of video indexing and using OCR to extract text from videos. We will look at each of them separately.



1.3.1 Video Players

Google Video is one of the most famous video players; it is a free video sharing website developed by Google Inc. [5]. Google Video has incorporated an indexing feature that allows users to search for a particular type of video among any video available publicly on the internet. However, they index videos based on the context in which a video was found, not on the content of the video itself. This means that if a video is located on a website about animals and one searches for a video about animals, there is a chance that this video will appear in the search results. The main difference here is that the search result does not guarantee that the videos appearing in it do indeed have the required content (attributable to the fact that indexing was not done on the video content). This method does not suit our needs, because one of the main requirements of the project was to allow students to locate the topic they are looking for inside a video.


Another implementation, known as Project Tuva and implemented by Microsoft Research, features searchable videos along with closed captions and annotations [6]. It also divides the video timeline into segments, where a segment represents a topic taught in the lecture. However, the division of the video into segments is done manually in Project Tuva. Tuva also offers an Enhanced Video Player to play videos.

There exists a related technology known as hypervideo, which can synchronize content inside a video with annotations and hyperlinks [7]. Hypervideo allows a user to navigate between video chunks using these annotations and hyperlinks. Detail-on-demand video is a type of hypervideo which allows users to locate information in an interrelated video [8]. For editing and authoring detail-on-demand hypervideo there exists a video editor known as Hyper-Hitchcock [8, 9, 10]. The Hyper-Hitchcock video player can support indexing because it plays hypervideos, but one still has to manually put annotations and hyperlinks in the hypervideo to index it.

There has also been some research on implementing search over topics inside a video. The authors of [11] developed a system known as iView that features intelligent searching of English and Chinese content inside a video. It uses image processing to extract keywords; in addition, iView also features speech processing techniques.


Searchinsidevideo is another implementation of indexing, searching, and captioning videos. Searchinsidevideo automatically transcribes the video content and lets search engines accurately index it, so that they can include it within their search results. Users can also find all of the relevant results for their searches across all of the content (text, audio, and video) in a single, integrated search [12].

1.3.2 OCR Implementations in Videos
OCR is used on video in many applications, such as license plate recognition in surveillance cameras or text recognition in news and sports videos. There are also many university projects that aim at better OCR detection in videos.


SRI International (SRI) has developed ConTEXTract™, a text recognition technology that can find and read text (such as street signs, name tags, and billboards) in real scenes. This optical character recognition (OCR) for text within imagery and video requires a more specialized approach than is provided by off-the-shelf OCR software, which is designed primarily for recognizing text within documents. ConTEXTract distinguishes lines of text from other contents in the imagery, processes the lines, and then sends them to an OCR submodule, which recognizes the text. Any OCR engine can be integrated into ConTEXTract with minor modifications [14]. The idea of segmenting in our work is inspired by this work.

The authors of [13] proposed a fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides. They use changes of text in the slides as a means to segment the video into semantic shots. Unlike previous approaches, their method does not depend on the availability of the electronic source of the slides, but rather extracts and recognizes the text directly from the video. Once text regions are detected within key frames, a novel binarization algorithm, Local Adaptive Otsu (LOA), is employed to deal with the low quality of video scene text. We are inspired by this work in its application of thresholding to images and its use of the Tesseract OCR tool.

The authors of [15] worked on automatic video text localization and recognition for content-based video indexing in sports applications, using a multi-modal approach. They used segmentation based on dilation methods for localization. The segmentation method in our work is inspired by this work.


The authors of [16] from Augsburg University worked on a project named MOCA, in which they published a paper on automatic text segmentation (also known as text localization) and text recognition for video indexing. They used OCR engines to detect the text in TV programs. To increase OCR engine accuracy, they presented a new approach to text segmentation and text recognition in digital video and demonstrated its suitability for indexing and retrieval. Their idea of using different snapshots of the same scene is not applicable to our work, since our videos are indexed and only these indexed screenshots are available to the OCR tool.

1.4 Thesis Outline

This thesis is organized as follows: Chapter 2 gives an introduction to commonly used OCR engines and explains the reasons for using the three OCR engines MODI, Tesseract OCR, and GOCR. In Chapter 3 we discuss OCR challenges for ICS video images and explain our approaches to dealing with these challenges. The methods we used to enhance text recognition are discussed in Chapter 4. The criteria for evaluating enhancements in text detection are explained in Chapter 5, where we also show the results of our experiments. The work is finally concluded in Chapter 6.



















Chapter 2: Survey of OCR Tools

Developing a proprietary OCR system is a complicated task and requires a lot of effort. Instead of creating a new OCR tool, it is better to use existing ones.


In the previous chapter we mentioned that there are many OCR tools that allow us to extract text from an image. In this chapter, we discuss the criteria for a good OCR tool suitable for our goals, then describe some of the tools we tested and justify our choice(s).

2.1 Popular OCR Tools

Table 2.1 shows some popular OCR tools.

ABBYY FineReader                      Puma.NET
AnyDoc Software                       Readiris
Brainware                             ReadSoft
CuneiForm/OpenOCR                     RelayFax
ExperVision TypeReader & RTK          Scantron Cognition
GOCR                                  SimpleOCR
LEADTOOLS                             SmartScore
Microsoft Office Document Imaging     Tesseract
Ocrad                                 Transym OCR
OCRopus                               Zonal OCR
OmniPage

Table 2.1: Popular OCR tools


These tools can be classified into two types:

a) Tools that can be integrated into our project

Open source tools, and some commercial tools whose OCR module can be integrated into a project, such as Office 2007 MODI.

b) Tools that cannot be integrated into our project




Commercial tools such as ABBYY FineReader that encapsulate OCR mainly aim at scanning, printing, and editing. They are successful at extracting text, but their OCR component cannot be imported as a module into a project, as they have their own custom user interface and everything must be done through this interface.


2.2 The Criteria for a "Good" OCR Tool for ICS Video Images

There are many criteria for a good OCR tool in general, such as design, user interface, performance, accessibility, etc. The priorities for our project are accessibility, usability, and accuracy. So the criteria for being a good tool for our project are:

1. Accessibility-Usability: In the ICS video project we will process many image files, and we need to do so automatically. We cannot process them one by one or go through a sequence of clicking a program -> browsing files -> running the tool -> getting the text -> putting the text where we can use it. Accessibility is our first concern. How can we access the tool? Can we call it from a command prompt so that we can access it from C++, C#, or Java programs? Can we run it in the Windows operating system with parameters, as many times and at any time we want? Can we include the tool as a package, a DLL, or a header in our project, so that we can import it and use it as part of our project?
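As a concrete illustration of this criterion, a command-line OCR engine such as Tesseract can be driven automatically from a program. The sketch below is in Python rather than the C# used elsewhere in this thesis, and the frame file names are hypothetical; Tesseract writes its result for an image to `<outputbase>.txt`.

```python
import shutil
import subprocess

def ocr_image(image_path, out_base):
    """Run the Tesseract command-line tool on one image.

    Tesseract writes its recognized text to '<out_base>.txt'.
    Returns the text, or None when the tool is not installed.
    """
    if shutil.which("tesseract") is None:
        return None
    # Equivalent to typing at a prompt: tesseract frame_001.jpg frame_001
    subprocess.run(["tesseract", image_path, out_base], check=True)
    with open(out_base + ".txt", encoding="utf-8") as fh:
        return fh.read()

# The same call can be issued for every extracted frame automatically,
# with no clicking or manual copy-paste, which is exactly the
# accessibility this criterion asks for.
frames = ["frame_001.jpg", "frame_002.jpg"]          # hypothetical names
commands = [["tesseract", f, f.rsplit(".", 1)[0]] for f in frames]
print(commands[0])   # prints: ['tesseract', 'frame_001.jpg', 'frame_001']
```

The same pattern applies to any OCR tool that exposes a command-line interface.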

2. Accuracy: The tool should also have a reasonable rate of accuracy in converting images to text. It is also important that the accuracy we are looking for concerns only our project's inputs. In other words, most OCR tools are designed for scanned images, which are mostly black and white. They may claim accuracy up to 95%, but what about the accuracy for colored images? So accuracy on our inputs is another important criterion for deciding whether the tool is good.


3. Complexity: A program that does one task can be considered simple, while a program that does many tasks can be considered complex. In this sense, we only need the tool to extract text from images; anything else it does increases its complexity.


4. Updatability: No algorithm can be considered the final algorithm. Can we change the tool so that it works better for our project, to increase its accuracy or performance? It may be a good tool, but it may not support our input type (JPEG files). Can we update it so that it can process our inputs?

5. Performance and Space Use: Most tools we examined have reasonable performance and use reasonable memory and hard drive space. For the ICS video project, the OCR module will run server-side as a web service, which means the speed of converting images to text and the required disk space or memory usage of the OCR tool are less important.

Now we can look at the tools in the next section. Since testing all tools, which mostly run in different environments and operating systems, would be time consuming, we tested the tools below as examples of their groups. We filtered the popular tools shown in Table 2.1 down to Table 2.2 according to the classification of accessibility and complexity in the previous section, presenting one example for each group: non-importable tools (both large commercial tools and small applications) and importable tools such as MODI. Among open source tools we tested two OCR tools that work in the Windows environment: GOCR and Tesseract OCR.



NAME                  CATEGORY
SimpleOCR             Not importable - Small Application
ABBYY FineReader      Not importable - Big Application
Tesseract OCR         Importable - Open Source
GOCR                  Importable - Open Source
Office 2007 MODI      Importable - Big Application

Table 2.2: Selected OCR tools to test
2.3 SimpleOCR

SimpleOCR is a free tool that can read bi-level and grayscale images and create TIFF files containing bi-level (i.e., black & white) images. It works with all fully compliant TWAIN scanners and also accepts input from TIFF files. With this tool, it is expected that one could easily and accurately convert a paper document into editable electronic text for use in any application, including Word and WordPerfect, with 99% accuracy [17].



Figure 2.1: SimpleOCR detection example 1: a) Input, b) Output


SimpleOCR has a user interface in which files are opened by clicking and browsing, and the output is copied and pasted manually. In other words, it has no command-line usability, and it is not importable into our tool. Hence it could not be adopted for our project. It also failed to create text for colored images like those in Figure 2.1 and Figure 2.2, giving the error message "Could not convert the page to text."


Figure 2.2: SimpleOCR detection example 2: a) Input, b) Output



SimpleOCR was able to detect some of the text in colored images like Figure 2.3, but with low accuracy: only the word "Agents" was detected correctly.



Figure 2.3: SimpleOCR detection example 3: a) Input, b) Output


SimpleOCR fails to be a good OCR tool for our project on the first and second criteria: Accessibility & Usability, and Accuracy.


2.4 ABBYY FineReader


ABBYY is a leading provider of document conversion, data capture, and linguistic software and services. The key areas of ABBYY's research and development include document recognition and linguistic technologies [18].




Figure 2.4: ABBYY FineReader detection example: a) Input, b) Output


ABBYY showed good accuracy for our test images (Figure 2.4), but it is not applicable to our project: to use the OCR part of ABBYY FineReader, one must use its own interface to get the text, opening the files from the menu, running the OCR engine, and checking whether the text is accurate or has to be corrected manually, as shown in Figure 2.5.







Figure 2.5: User interface of ABBYY FineReader


Even though the accuracy of ABBYY FineReader is high, it is not a "good" tool for our ICS Video Project: it does not satisfy our first criterion, Accessibility & Usability.

2.5 Tesseract OCR


The Tesseract OCR engine is one of the most accurate open source OCR engines available. The source code reads a binary, grey, or color image and outputs text. A TIFF reader is built in that reads uncompressed TIFF images, and libtiff can be added to read compressed images. Most of the work on Tesseract is sponsored by Google [19].




Figure 2.6: Tesseract OCR detection example 1: a) Input, b) Output



The Tesseract OCR engine is updated frequently, and its accuracy on colored images is good. Figure 2.6 and Figure 2.7 are good examples of the detection capabilities of Tesseract OCR.





Figure 2.7: Tesseract OCR detection example 2: a) Input, b) Output


Tesseract OCR may be the most accurate open source tool, but its accuracy rate is not perfect: in Figure 2.6 the last line is not recognized at all. The image in Figure 2.7 is recognized precisely, whereas in Figure 2.8 the word "Summary" is missed. However, it is accessible, easy to use, and can be called from a command prompt from any programming language.





Figure 2.8: Tesseract OCR detection example 3: a) Input, b) Output




2.6 GOCR


GOCR is an OCR program developed under the GNU Public License, initially written by Jörg Schulenburg; it is also called JOCR. It converts scanned images to text files [20].


The GOCR engine assumes no colors (black on white only), no rotation, a single font, and that all characters are separated; each character is recognized empirically based on its pixel pattern [21].


Figure 2.9: GOCR detection example 1: a) Input, b) Output





Figure 2.10: GOCR detection example 2: a) Input, b) Output





Figure 2.11: GOCR detection example 3: a) Input, b) Output

GOCR is also accessible, easy to use, and can be called from a command prompt from any programming language, like Tesseract, and its detection accuracy for the images in Figures 2.9-2.11 is similar to Tesseract's. However, GOCR is not updated as regularly as Tesseract.

2.7 Microsoft Office Document Imaging (MODI)



Microsoft Office Document Imaging (MODI) is a Microsoft Office application that supports editing documents scanned by Microsoft Office Document Scanning. It was first introduced in Microsoft Office XP and is included in later Office versions, including Office 2007.

Via COM, MODI provides an object model based on 'document' and 'image' (page) objects. One feature that has elicited particular interest on the Web is MODI's ability to convert scanned images to text under program control, using its built-in OCR engine.

The MODI object model is accessible from development tools that support the Component Object Model (COM) by adding a reference to the Microsoft Office Document Imaging 11.0 Type Library. The MODI Viewer control is accessible from any development tool that supports ActiveX controls by adding Microsoft Office Document Imaging Viewer Control 11.0 or 12.0 (MDIVWCTL.DLL) to the application project.

When optical character recognition (OCR) is performed on a scanned document, text is recognized using sophisticated pattern-recognition software that compares scanned text characters with a built-in dictionary of character shapes and sequences. The dictionary supplies all uppercase and lowercase letters, punctuation, and accent marks used in the selected language [22].

In the images we tested, the accuracy of MODI was very good and it was easy to access from code. After importing the Microsoft Office Document Imaging 12.0 Type Library, it is accessible from any development tool that supports ActiveX:



MODI.Document md = new MODI.Document();
md.Create(fileName);
md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
MODI.Image image = (MODI.Image)md.Images[0];
writeFile.Write(image.Layout.Text);



Figure 2.12: Using the MODI OCR engine in the C# programming language



Figure 2.13: MODI detection example 1: a) Input, b) Output



Figure 2.14: MODI detection example 2: a) Input, b) Output



Figure 2.15: MODI detection example 3: a) Input, b) Output



The accessibility and usability of MODI made it easy to import into a C# project, and it has a very good accuracy rate on ICS video images. In Figure 2.15 it was even able to detect the thumbnail of the video on the left. We found MODI to be a friendly engine for the type of images in ICS videos, which are generally fully colored text and images.



2.8 Conclusion


We presented some popular OCR tools and checked whether they can be integrated into our ICS video project, and we justified our choices with examples: three input images and the results for each OCR tool. We conclude that any of the three tools GOCR, Tesseract OCR, and MODI can be integrated. It is hard to say which one is best by looking at the outputs of three examples, so we decided to include all of them in our experiments. A large-scale test on ICS video images with these tools will give us a better perspective; this is done in the experiments and results section of Chapter 5. Before that, in the following chapter we look at the challenges of OCR and our proposed methods to enhance detection.






















Chapter 3: OCR and Challenges


In the previous chapter we surveyed OCR tools and decided which tools to use: MODI, GOCR, and Tesseract OCR. In the examples provided in the previous chapter, we saw that colored images confuse OCR tools and lower their accuracy. That means there are issues we need to deal with in OCR engines and ICS video images. We introduce what OCR is, how OCR works, and what the challenges are in OCR detection for ICS video frames, so that we can discuss how to deal with them in the next chapter.

3.1 What is OCR?

Optical character recognition, more commonly known as OCR, is the interpretation of scanned images of handwritten, typed, or printed text into text that can be edited on a computer. Various components work together to perform optical character recognition; these elements include pattern identification, artificial intelligence, and machine vision. Research in this area continues, developing more effective read rates and greater precision [23].

In 1929 Gustav Tauschek obtained a patent on OCR in Germany, followed by Handel, who obtained a US patent on OCR in the USA in 1933 (U.S. Patent 1,915,993). In 1935 Tauschek was also granted a US patent on his method (U.S. Patent 2,026,329). Tauschek's machine was a mechanical device that used templates and a photodetector.

In 1949, RCA engineers worked on the first primitive computer-type OCR to help blind people for the US Veterans Administration; but instead of converting the printed characters to machine-readable text, their device spoke the letters aloud. It proved far too expensive and was not pursued after testing [24].



Since that time, OCR has been used for reading credit card imprints for billing purposes, digitizing the serial numbers on coupons returned from advertisements, sorting mail in the United States Postal Service, converting text for blind people so that a computer can read it to them out loud, and digitizing and storing scanned documents in archives such as those of hospitals and libraries.

3.2 How Does OCR Work?

OCR engines are good pattern recognition engines and robust classifiers, with the ability to generalize in decision making based on imprecise input data. They offer suitable solutions to a variety of character classification problems.


There are two basic methods used for OCR: matrix matching and feature extraction. Of the two ways to recognize characters, matrix matching is the simpler and more common. Matrix matching compares what the OCR scanner sees as a character with a library of character matrices or templates. When an image matches one of these prescribed matrices of dots within a given level of similarity, the computer labels that image as the corresponding ASCII character. Matrix matching works best when the OCR engine encounters a limited repertoire of type styles, with little or no variation within each style. Where the characters are less predictable, feature (or topographical) analysis is superior.

Feature extraction is OCR without strict matching to prescribed templates. Also known as Intelligent Character Recognition (ICR) or Topological Feature Analysis, this method varies by how much "computer intelligence" is applied by the manufacturer. The computer looks for general features such as open areas, closed shapes, diagonal lines, line intersections, etc. This method is much more versatile than matrix matching, but it needs a pattern recognition process as shown in Figure 3.1 [25].



Figure 3.1: Pattern recognition steps for classification
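To make the matrix matching idea concrete, the following sketch compares a candidate dot matrix against a small template library and labels it when the match exceeds a similarity level. It is illustrative only: the 5x5 glyphs and the 0.9 similarity level are assumptions, not taken from any real OCR engine.

```python
# Minimal matrix-matching sketch: each template is a binary dot matrix,
# and a candidate glyph is labeled with the best-matching template.
TEMPLATES = {
    "E": ["11111",
          "10000",
          "11110",
          "10000",
          "11111"],
    "F": ["11111",
          "10000",
          "11110",
          "10000",
          "10000"],
}

def similarity(a, b):
    """Fraction of pixels on which two equally sized glyphs agree."""
    cells = [(x, y) for y in range(len(a)) for x in range(len(a[0]))]
    hits = sum(a[y][x] == b[y][x] for x, y in cells)
    return hits / len(cells)

def classify(glyph, threshold=0.9):
    """Return the best-matching character, or None below the threshold."""
    best = max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))
    return best if similarity(glyph, TEMPLATES[best]) >= threshold else None

noisy_e = ["11111",
           "10000",
           "11110",
           "10000",
           "11101"]          # one flipped pixel in the last row
print(classify(noisy_e))      # prints: E
```

A real engine stores templates for every supported font and size, which is why matrix matching degrades when type styles vary.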



In OCR engines, for feature extraction, the computer needs to define which pixels are part of a stroke path and which are not. In other words, it needs to classify all pixels as path or not path; path pixels can be considered 1, the others 0. In Figure 3.2, the paths form the character E.

Figure 3.2: Character representation for feature extraction: a) Black and white image, b) Binary representation of the image [26]





3.3 Causes of False Detection

OCR engines perform pattern recognition on images; before recognizing paths they need to classify each image pixel as path (1) or not path (0). Like most images, ICS video images are colored in different shades and are sometimes distorted or noisy. This makes pattern recognition fail at some level. Even though OCR engines have some resilience against distortions in the input data and the capability to learn, they have a limit: after a certain degree of distortion they start to make mistakes.




Figure 3.3: Distorted image pattern analysis: a) is distorted but could be detected and recognized as b); c) is distorted more and could not be detected.


This pattern recognition and machine learning problem is related to the computer vision problem, which in turn is related to the human visual system. In that sense, we can say that if a picture is hard to read for humans, it is also hard to read for computers. (The reverse is not applicable: an irony of computer science is that tasks humans struggle with can be performed easily by computer programs, while tasks humans perform effortlessly remain difficult for computers. We can write a computer program to beat the very best human chess players, but we cannot write a program that identifies objects in a photo or understands a sentence with anywhere near the precision of even a child.)



The human visual system and visibility are affected by the following factors:

1) Contrast: the relationship between the luminance of an object and the luminance of the background. The luminance (the proportion of incident light reflected into the eye) can be affected by the location of light sources and room reflectance (glare problems).

2) Size: the larger the object, the easier it is to see. However, it is the size of the image on the retina, not the size of the object per se, that is important. Therefore we bring smaller objects closer to the eye to see details.

3) Color: not really a factor in itself, but closely related to both the contrast and luminance factors.
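The contrast factor above can be quantified. One common definition is the Weber contrast, the luminance difference between an object and its background relative to the background; the luminance values below are illustrative assumptions for the situation in Figure 3.4, not measurements.

```python
def weber_contrast(l_object, l_background):
    """Weber contrast: luminance difference relative to the background.

    Luminances are relative values in [0, 1]; the sign tells whether the
    object is brighter (+) or darker (-) than its background.
    """
    return (l_object - l_background) / l_background

# Hypothetical values: white text on a mid-gray slide background versus
# blue text on the same background, as in Figure 3.4.
white_on_gray = weber_contrast(1.00, 0.40)   # high contrast, easy to read
blue_on_gray  = weber_contrast(0.45, 0.40)   # low contrast, hard to read
print(round(white_on_gray, 2), round(blue_on_gray, 2))
```

The larger the magnitude of the contrast value, the easier the text is to separate from its background, for humans and for OCR engines alike.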



For humans, it is essential to have a certain amount of contrast in an image to define what it is, and the same holds for computers: computers need a certain amount of contrast between shapes to be able to detect differences. This is also important for character recognition; characters in the image should have enough contrast to be distinguishable.


Figure 3.4: Contrast and color difference of characters in an image. White text has high contrast and is easy to read; blue text has low contrast and is hard to read.



Figure 3.5: Size difference of characters in an image. Bigger text is easier to read.


The best OCR results depend on various factors, the most important being the font and size used. Other noted factors are color, contrast, brightness, and density of content. OCR engines fail at pattern recognition on low-contrast, small, or complexly colored text. We can apply image processing techniques to modify images before OCR in order to reduce the number of OCR failures.


OCR engine detection is also affected by the font style of the text. To detect a character in a certain font style, that style must be previously defined and stored. Since the fonts in our ICS video frames are of the types that most OCR engines support, such as Tahoma, Arial, Sans Serif, and Times New Roman, font style problems will barely affect the detection of our OCR engines. Our enhancements therefore concern segmentation, text size, color, contrast, brightness, and density of content. We discuss the approaches we used in the next chapter.
















Chapter 4: Enhancements for OCR Detection

In the previous chapter, we looked at the challenges in OCR detection for ICS video frames. Here, we discuss the approach and the methods we used to obtain better recognition from the OCR engines.


OCR engines possess complex algorithms, predefined libraries, and training datasets. Modifying an OCR algorithm requires an understanding of the algorithm from beginning to end. Apart from that, ICS video images sometimes become too complex; therefore, for better OCR engine results, enhancements can be applied to an image before sending it to the OCR engine.

4.1 Segmentation

In the previous chapter, we stated that OCR engines use segmentation mostly designed for scanned images with a black font on a white background. Segmentation of text has two phases: detection of words and detection of characters. Detecting a word can be considered locating the word's place in the image, and detecting a character likewise means locating the character within the word. While using the OCR engines, we saw that this segmentation is not sufficient for some ICS video images. Due to this lack of segmentation, OCR engines make mistakes on images with complex colored objects; the mistakes are introduced while setting the threshold of the image for correct binarization.

A successful image segmentation for a black-and-white image and a failed segmentation for a colored image on a uniform background are shown in Figure 4.1 and Figure 4.2, respectively. In these figures, segmentation 1 can be considered the segmentation into words and segmentation 2 the segmentation into characters.


Figure 4.1: Black font text on a white background: segmentation for OCR character recognition.



In Figure 4.1, segmentation 1 and segmentation 2 are done successfully, so all text can be separated into words and then into characters. In Figure 4.2, due to the lack of difference between text 1 and the background, text 1 is not segmented as a word, and therefore not segmented into characters either. Text 2 in Figure 4.2 is on a distinguishable background and could be segmented into a word and then into characters. The backgrounds of text 3 and text 4 in Figure 4.2 are very close to each other; because of that, they are considered a single word, and since their font and background colors are close to one another, character segmentation could not be done.


Figure 4.2: Complex background with text in different font colors: segmentation for OCR.


We need to remember that these figures are used only to illustrate OCR segmentation. We should look at ICS video image examples and OCR outputs to see the importance of segmentation. Figure 4.3 shows that without segmentation the OCR engines fail, whereas in Figure 4.4 a segmented input allows better performance from the OCR engines.

(The raw OCR outputs on the unsegmented input are mostly unreadable strings of symbols and stray punctuation; only fragments such as "smoothed mean" survive.)

Figure 4.3: OCR results for a whole image containing complex objects with different colors: a) input image, b) GOCR result, c) MODI OCR result, d) Tesseract OCR result



(On the segmented input, all three engines recover most of the words: "squared responses", "classification", "horizontal", "smoothed mean". Only occasional errors remain, such as "vertical" read as "ven ca I".)

Figure 4.4: OCR results for a segmented image containing complex objects with different colors: a) segmented part of the input, b) GOCR result, c) MODI OCR result, d) Tesseract OCR result


Image segmentation is probably the most widely studied topic in computer vision and image processing; hence, many studies have targeted segmentation for a particular application, and we mentioned some of them in Chapter 1. In our approach, we simply group the objects on the screen by thresholding, dilating, and blob coloring extraction, which are explained in the following sections.
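Of these steps, blob coloring can be sketched generically. The code below is a plain 4-connected flood fill, an illustration of the general technique rather than the thesis implementation: it labels connected foreground pixels of an already-thresholded image so that each group of pixels can later be treated as one candidate text region.

```python
def blob_color(binary):
    """Label 4-connected foreground (1) pixels; returns (labels, count)."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] == 1 and labels[sy][sx] == 0:
                next_label += 1                      # start a new blob
                stack = [(sx, sy)]
                while stack:                         # flood fill
                    x, y = stack.pop()
                    if 0 <= x < w and 0 <= y < h \
                            and binary[y][x] == 1 and labels[y][x] == 0:
                        labels[y][x] = next_label
                        stack += [(x+1, y), (x-1, y), (x, y+1), (x, y-1)]
    return labels, next_label

# Toy thresholded image with two separate foreground groups.
image = [[1, 1, 0, 0, 1],
         [1, 0, 0, 0, 1],
         [0, 0, 0, 1, 1]]
labels, count = blob_color(image)
print(count)   # prints: 2
```

Each labeled blob can then be cropped and sent to the OCR engine on its own, which is the effect the segmented input of Figure 4.4 illustrates.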


4.1.1 Thresholding

Images in ICS videos are colored; therefore we need to convert the colored images to black-and-white images for segmentation and morphological operations. We do so by performing image binarization, known as thresholding. We used the SIS filter in the AForge Image Library, a free open-source image processing library for C#, which performs image thresholding, calculating the threshold automatically using the simple image statistics method. For each pixel:



- two gradients are calculated:
  ex = |I(x + 1, y) - I(x - 1, y)| and ey = |I(x, y + 1) - I(x, y - 1)|;
- the weight is calculated as the maximum of the two gradients;
- the sum of weights is updated (weightTotal += weight);
- the sum of weighted pixel values is updated (total += weight * I(x, y)).

The resulting threshold is calculated as the sum of weighted pixel values divided by the sum of weights [27].
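The steps above can be sketched directly in code. This is an illustrative reimplementation under our reading of the description (in Python rather than C#, and skipping border pixels, which have no central gradient), not the AForge source itself.

```python
def sis_threshold(img):
    """Simple image statistics (SIS) threshold: the weighted mean of the
    pixel values, where each pixel's weight is the larger of its two
    central gradients."""
    h, w = len(img), len(img[0])
    weight_total = 0.0
    total = 0.0
    for y in range(1, h - 1):          # borders have no central gradient
        for x in range(1, w - 1):
            ex = abs(img[y][x + 1] - img[y][x - 1])   # horizontal gradient
            ey = abs(img[y + 1][x] - img[y - 1][x])   # vertical gradient
            weight = max(ex, ey)
            weight_total += weight
            total += weight * img[y][x]
    return total / weight_total if weight_total else 0.0

def binarize(img, threshold):
    """Foreground (1) where the pixel exceeds the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in img]

# Toy 6x6 "slide": bright text pixels (200) on a dark background (30).
img = [[30, 30, 30, 30, 30, 30],
       [30, 30, 30, 30, 30, 30],
       [30, 30, 200, 200, 30, 30],
       [30, 30, 200, 200, 30, 30],
       [30, 30, 30, 30, 30, 30],
       [30, 30, 30, 30, 30, 30]]
t = sis_threshold(img)
print(30 < t < 200)   # prints: True
```

Because the weights concentrate at strong edges, the threshold lands between the text and background intensities, which is what makes the subsequent binarization separate the two.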

Figure 4.5: SIS threshold example 1: a) Input image, b) Thresholded image. The output result is white foreground on a black background.

Sis thresholding

results can be different as shown in Figure 4.5 and Figure 4.6. In
Figure 4.5, output is white foreground and black background; in Figure 4.6, it is reverse