Optical Character Recognition:
Using the Ullman Algorithm for Graphical Matching

Iddo Aviram

Uploaded by beeuppity on 19 Oct 2013, under Artificial Intelligence and Robotics.

OCR - a Brief Review

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text.

OCR is a task, and not a mathematically defined problem.

Many disciplines are used for OCR:

Fourier Transforms
Pattern Matching
Machine Learning
Differential Geometry
Computer Vision
Neural Networks
Expert Systems
Optimization Problems
Topology
Decision Making

We will show just one simple, non-representative approach that deals with part of the OCR task.

The task can be very hard, and state-of-the-art algorithms may not be good enough for some practical purposes. In several cases, however, OCR tools can perform well and be useful.

Harder                     Easier
Handwritten                Printed
Cursive                    Block letters
Free handwritten           Scribe script
Offline                    Online
Connected writing          Non-connected writing
Degraded manuscripts       Well-preserved manuscripts
Non-restricted writing     Restricted writing

The human brain does amazingly well with OCR tasks, so computer results are usually evaluated by comparison with manually created ground-truth data.

However, sometimes even humans are not capable of recognition.

Can you read these scripts?

[Figure: real estate in Petah Tikva (top: from yad1.co.il, 2012; bottom: from "Havatzelet", 1912)]

Can you read this script?

[Figure: early version of "Shir Ke'ev" + "Shir Mayim Rabim", Meir Ariel, late 1970s]

Can you read this script?

אמר למלך אמר לבלבל: השלם את? והברכתך לקוס. ועת תן את האכל אשר עמד אחאמה [ ] והרם ע[ז]אל על מז[בח קוס פן י]חמר האכל.

[Figure: inscription on pottery (ostracon), Horvat Uza, Iron Age II, 7th century BCE. Ink on pottery. Israel Antiquities Authority.]


OCR - Motivation for Graphical Matching

Using graphical tools for object recognition.

A possible scheme:

Binarization
Segmentation by connected components
Thinning
Graphical modeling
Graphical matching
Rule-Based Selection
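The first two steps of the scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the slides: the fixed threshold of 128, the 8-connectivity, and the function names `binarize` and `connected_components` are all hypothetical choices.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.
    The fixed threshold is an assumption; real pipelines often adapt it."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def connected_components(binary):
    """Group ink pixels into 8-connected components (lists of (row, col))."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

# A tiny 3x5 "scan" with two separate dark strokes.
gray = [
    [255,  10, 255, 255, 255],
    [255,  20, 255,  30, 255],
    [255, 255, 255,  40, 255],
]
components = connected_components(binarize(gray))
print(len(components))
```

Each component would then be thinned and turned into a graph before matching.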


Binarization: [figure]

Segmentation -> Thinning -> Graphical modeling: [figure]


Given a historical manuscript, a blessing of Brit Milah: [figure]

We're interested in finding the occurrences of the letter Mem (not final): [figure]

By subgraph matching we can find candidates: [figure, with panels "Graphical modeling" and "Graphical matching"]


Subgraph Isomorphism Problem

Given two graphs H and G as input, the problem is to decide whether H has a subgraph that is isomorphic to G.

In this example the answer is 'yes', since there is an isomorphic correspondence: $1_G - 1_H$, $2_G - 3_H$, $3_G - 2_H$. (There are additional isomorphic correspondences.)




Graph isomorphism

Graphs $G(V_G, E_G)$ and $H(V_H, E_H)$ are isomorphic if $|V_G| = |V_H|$ and there is an invertible function $F$ from $V_G$ to $V_H$ such that for all nodes $u$ and $v$ in $V_G$, $(u,v) \in E_G$ if and only if $(F(u), F(v)) \in E_H$.

Such a function $F$ is said to be an isomorphic correspondence.
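The definition translates almost word for word into a check. The sketch below assumes undirected graphs given as vertex lists and edge pairs, and a candidate F given as a dict; the function name `is_isomorphic_correspondence` and the sample graphs are made up for illustration.

```python
from itertools import combinations

def is_isomorphic_correspondence(F, V_G, E_G, V_H, E_H):
    """Check the definition directly: F is a bijection V_G -> V_H and
    (u, v) is an edge of G exactly when (F(u), F(v)) is an edge of H."""
    if len(V_G) != len(V_H):
        return False
    if set(F) != set(V_G) or set(F.values()) != set(V_H):
        return False  # not an invertible function from V_G onto V_H
    edges_g = {frozenset(e) for e in E_G}  # undirected edges
    edges_h = {frozenset(e) for e in E_H}
    return all(
        (frozenset((u, v)) in edges_g) == (frozenset((F[u], F[v])) in edges_h)
        for u, v in combinations(V_G, 2)
    )

# G is the path 1 - 2 - 3; H is the path b - a - c.
V_G, E_G = [1, 2, 3], [(1, 2), (2, 3)]
V_H, E_H = ["a", "b", "c"], [("a", "b"), ("a", "c")]

good = is_isomorphic_correspondence({1: "b", 2: "a", 3: "c"}, V_G, E_G, V_H, E_H)
bad = is_isomorphic_correspondence({1: "a", 2: "b", 3: "c"}, V_G, E_G, V_H, E_H)
print(good, bad)
```

The second mapping fails because the edge (2, 3) of G would land on (b, c), which is not an edge of H.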

The subgraph isomorphism problem is NP-complete.

There is a very simple reduction: CLIQUE $\le_P$ Subgraph Isomorphism (to ask whether H has a clique of size k, take G to be the complete graph on k vertices).

However, for many specific types of practical problems (even with 'big' inputs), algorithms do answer fast.
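The reduction can be illustrated with a brute-force subgraph test. This is a sketch for intuition only: `has_subgraph` simply tries every injective mapping (exponential time, unlike the practical algorithms mentioned above), and both function names are hypothetical.

```python
from itertools import combinations, permutations

def has_subgraph(V_H, E_H, V_G, E_G):
    """Brute force: is there an injective map of V_G into V_H that sends
    every edge of G to an edge of H?"""
    edges_h = {frozenset(e) for e in E_H}
    for image in permutations(V_H, len(V_G)):
        F = dict(zip(V_G, image))
        if all(frozenset((F[u], F[v])) in edges_h for u, v in E_G):
            return True
    return False

def has_clique(V_H, E_H, k):
    """CLIQUE reduced to subgraph isomorphism: G is the complete graph K_k."""
    V_G = list(range(k))
    E_G = list(combinations(V_G, 2))
    return has_subgraph(V_H, E_H, V_G, E_G)

# H: a triangle 1-2-3 with an extra vertex 4 attached to 3.
V_H = [1, 2, 3, 4]
E_H = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(has_clique(V_H, E_H, 3), has_clique(V_H, E_H, 4))
```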

An Algorithm for Subgraph Isomorphism, J. R. Ullmann, Journal of the ACM, 1976.

Although old, this algorithm is still very popular and gives good results in practice.

The Ullman Algorithm


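There are algebraic formulations for graph isomorphism and subgraph isomorphism, which we will make use of.

The adjacency matrix $A_H$ of a graph H has entry $(A_H)_{i,j} = 1$ if $(i,j) \in E_H$ and 0 otherwise. For an example graph H it would be: [figure]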
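As a small illustration, an adjacency matrix can be built from an edge list as follows. The sketch assumes an undirected graph with vertices numbered from 0; the helper name `adjacency_matrix` and the sample graph are illustrative only, not the graph from the slide.

```python
def adjacency_matrix(n, edges):
    """Adjacency matrix of an undirected graph on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1  # undirected, so the matrix is symmetric
    return A

# A triangle 0-1-2 with a fourth vertex attached to vertex 2.
A_H = adjacency_matrix(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
for row in A_H:
    print(row)
```

The symmetry of the matrix is what the later derivation of the isomorphism criterion relies on.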



We will use the notion of a permutation matrix: a square 0/1 matrix with exactly one 1 in each row and in each column.

Any permutation matrix is equivalent to an isomorphic correspondence.
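The equivalence is easy to see in code. The sketch below assumes 0-based vertex numbers and the convention that row i has its 1 in column F(i); the two helper names are hypothetical.

```python
def permutation_matrix(F, n):
    """Matrix M' with M'[i][j] = 1 exactly when F maps vertex i to vertex j."""
    M = [[0] * n for _ in range(n)]
    for i, j in F.items():
        M[i][j] = 1
    return M

def correspondence(M):
    """Recover F from a permutation matrix: the column of the 1 in each row."""
    return {i: row.index(1) for i, row in enumerate(M)}

# The correspondence 1-1, 2-3, 3-2 written with 0-based vertex numbers.
F = {0: 0, 1: 2, 2: 1}
M_prime = permutation_matrix(F, 3)
print(M_prime)
print(correspondence(M_prime) == F)
```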


[Figure: an isomorphic correspondence F and the equivalent permutation matrix M', written F ~ M']

Two graphs $H$ and $H_2$ are isomorphic with a correspondence $F$ if and only if $A_H$ is similar to $A_{H_2}$, and the similarity matrix is $M' \sim F$:

$$A_{H_2} = M' A_H M'^{-1}$$

[Figure: an isomorphic correspondence F and the equivalent permutation matrix M', F ~ M']

Isomorphism criterion:

$$A_{H_2} = M' (M' A_H)^T$$

iff $H_2$ is isomorphic to $H$, with a correspondence $F \sim M'$.

We can develop this equation that defines an isomorphism:

$$A_{H_2} = M' A_H M'^{-1} = M' A_H^T M'^{-1} = M' (M' A_H)^T$$

The second equality holds since $A_H$ is a symmetric matrix. The third holds since $M'$ is an orthogonal matrix, thus $M' M'^T = I$ and $M'^{-1} = M'^T$.

Isomorphism criterion: $A_{H_2} = M' (M' A_H)^T$ iff $H_2$ is isomorphic to $H$, with a correspondence $F \sim M'$.

In a similar fashion (without proof) we have an algebraic criterion for a subgraph isomorphism.
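The isomorphism criterion can be checked numerically with plain list-based matrices. The sketch assumes 0-based vertices and rows of M' indexed by the vertices of $H_2$; the helper names and the two small path graphs are illustrative.

```python
def matmul(X, Y):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def satisfies_isomorphism_criterion(A2, M, A):
    """True when A2 == M (M A)^T."""
    return A2 == matmul(M, transpose(matmul(M, A)))

# H is the path 0 - 1 - 2; H2 is the path 1 - 0 - 2 (centre vertex is 0).
A_H = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_H2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]

# F maps H2 to H: 0 -> 1, 1 -> 0, 2 -> 2, so row i of M has its 1 in column F(i).
M = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(satisfies_isomorphism_criterion(A_H2, M, A_H))
print(satisfies_isomorphism_criterion(A_H2, identity, A_H))
```

The identity matrix fails here because the two paths have different centre vertices, so the trivial correspondence does not preserve edges.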

[Figure: the isomorphic correspondence F ($1_G - 1_H$, $2_G - 3_H$, $3_G - 2_H$, with the fourth vertex matched to nothing, $\varphi$) and the equivalent rectangular permutation matrix M', F ~ M']

Subgraph isomorphism criterion:

$$A_G = M' (M' A_H)^T$$

iff G is subgraph isomorphic to H, with a correspondence $F \sim$ rectangular $M'$.
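A rectangular M' can be tested against this criterion in the same way. In the sketch below the 4-vertex graph H is an assumed example (the slide's figure is not available), vertices are 0-based, and "rectangular permutation matrix" is taken to mean exactly one 1 per row and at most one 1 per column.

```python
def matmul(X, Y):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def is_rectangular_permutation(M):
    """Exactly one 1 per row, at most one 1 per column, only 0/1 entries."""
    rows_ok = all(set(row) <= {0, 1} and sum(row) == 1 for row in M)
    cols_ok = all(sum(col) <= 1 for col in zip(*M))
    return rows_ok and cols_ok

def satisfies_subgraph_criterion(A_G, M, A_H):
    """True when M is a rectangular permutation matrix and A_G == M (M A_H)^T."""
    return is_rectangular_permutation(M) and A_G == matmul(M, transpose(matmul(M, A_H)))

# G: a triangle. H (assumed example): a triangle 0-1-2 plus vertex 3 attached to 2.
A_G = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
A_H = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]

# The correspondence 1-1, 2-3, 3-2 in 0-based numbering: 0 -> 0, 1 -> 2, 2 -> 1.
M_good = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
# Sending the last vertex of G to vertex 3 of H breaks the triangle.
M_bad = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

print(satisfies_subgraph_criterion(A_G, M_good, A_H))
print(satisfies_subgraph_criterion(A_G, M_bad, A_H))
```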


We have a graph G and a graph H, and we want to know if G is subgraph isomorphic to H.

So, we will search for a permutation matrix M* of size $|V_G| \times |V_H|$ (exactly one 1 in each row, at most one 1 in each column) that satisfies the subgraph isomorphism criterion.

We will enumerate over candidate permutation matrices of the same size, denoting a candidate by M', from a set of candidates that satisfies:

(The set of all M*-s) $\subseteq$ (The set of all M'-s).

During the enumeration, we check the isomorphism criterion for each candidate. If a candidate satisfies the criterion, we return 'yes'. If we do not find such a candidate, we return 'no'.


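Ullmann's algorithm I

Construction of another matrix $M^{(0)}$ with the same size as the M'-s. In Ullmann's paper, $m^{(0)}_{i,j} = 1$ if the degree of the j-th vertex of H is at least the degree of the i-th vertex of G, and 0 otherwise.

Generation of all M'-s by setting to 0 all but one 1 in each row of $M^{(0)}$.

A subgraph isomorphism has been found if M' satisfies the subgraph isomorphism criterion:

$$A_G = M' (M' A_H)^T$$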


Ullmann's algorithm I - Example

[Figure: the search tree. Root: $M^{(0)}$. Inner nodes: M-s. Leaves: M'-s.]
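The enumeration of algorithm I can be sketched as a depth-first walk of this tree. This is an illustrative sketch rather than Ullmann's original procedure: it assumes the degree-based $M^{(0)}$ from Ullmann's paper, 0-based vertices, the equality criterion $A_G = M'(M'A_H)^T$ used in these slides, and it never reuses a column so that every leaf is a rectangular permutation matrix. All function names are hypothetical.

```python
def matmul(X, Y):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def initial_matrix(A_G, A_H):
    """M0[i][j] = 1 when vertex j of H has at least the degree of vertex i of G."""
    return [[1 if sum(A_H[j]) >= sum(A_G[i]) else 0 for j in range(len(A_H))]
            for i in range(len(A_G))]

def ullmann_algorithm_1(A_G, A_H):
    """Return a leaf M' satisfying A_G == M'(M' A_H)^T, or None if none exists."""
    n_g, n_h = len(A_G), len(A_H)
    M0 = initial_matrix(A_G, A_H)

    def descend(rows, used):
        if len(rows) == n_g:  # leaf: every row keeps exactly one 1
            ok = A_G == matmul(rows, transpose(matmul(rows, A_H)))
            return rows if ok else None
        i = len(rows)
        for j in range(n_h):
            if M0[i][j] == 1 and j not in used:
                # Keep only the 1 in column j of row i, zero the rest.
                row = [1 if y == j else 0 for y in range(n_h)]
                found = descend(rows + [row], used | {j})
                if found is not None:
                    return found
        return None

    return descend([], frozenset())

# G: a triangle. H (assumed example): a triangle 0-1-2 plus vertex 3 attached to 2.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
A_H = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]

print(ullmann_algorithm_1(triangle, A_H))  # a matching M'
print(ullmann_algorithm_1(K4, A_H))        # None: H has no 4-clique
```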

Ullmann's algorithm II

Construction of another matrix $M^{(0)}$ with the same size as the M'-s. In Ullmann's paper, $m^{(0)}_{i,j} = 1$ if the degree of the j-th vertex of H is at least the degree of the i-th vertex of G, and 0 otherwise.

Generation of all M'-s by setting to 0 all but one 1 in each row of $M^{(0)}$. However, in this version we will also prune all inner nodes M-s that have at least one 1 entry that does not comply with the refinement rule (to be defined). We are guaranteed to end up with the right answer, since it still holds that:

(The set of all M*-s) $\subseteq$ (The set of all M'-s)

A subgraph isomorphism has been found if there is an M' that satisfies

$$A_G = M' (M' A_H)^T$$


Ullmann's refinement rule for pruning the search tree:

Observation:

If a vertex of G, $v_G$, corresponds to a vertex of H, $v_H$, then for each adjacent vertex of $v_G$ in G, denoted $v_G^A$, there must be a vertex in H, denoted $v_H^A$, that satisfies:

A. $v_H^A$ is adjacent to $v_H$ in H
B. $v_H^A$ corresponds to $v_G^A$

[Figure: $v_G$ and $v_G^A$ in G, matched to $v_H$ and $v_H^A$ in H]


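Algebraic notation:

For all $m_{i,j} = 1$ (that is already fixed):

$$\forall x,\ 1 \le x \le |V_G|:\quad (a^G_{i,x} = 1) \Rightarrow \exists y,\ 1 \le y \le |V_H|:\ (m_{x,y} \cdot a^H_{y,j} = 1)$$

where $a^G$ and $a^H$ denote the entries of $A_G$ and $A_H$.

Any inner node M that does not satisfy this rule is pruned, because none of its descendants is an M*.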
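The refinement rule and the pruned search of algorithm II can be sketched together. As before this is an illustration under assumptions, not Ullmann's exact procedure: vertices are 0-based, $M^{(0)}$ is the degree-based matrix from Ullmann's paper, columns are never reused, and the rule is checked only for the rows that are already fixed, as stated above. The names `complies` and `ullmann_algorithm_2` are hypothetical.

```python
def complies(M, fixed_rows, A_G, A_H):
    """Refinement rule for every fixed entry m[i][j] = 1: each neighbour x of i
    in G must still have a candidate y (m[x][y] = 1) adjacent to j in H."""
    for i in range(fixed_rows):
        j = M[i].index(1)
        for x in range(len(A_G)):
            if A_G[i][x] == 1 and not any(
                M[x][y] == 1 and A_H[y][j] == 1 for y in range(len(A_H))
            ):
                return False
    return True

def ullmann_algorithm_2(A_G, A_H):
    """Depth-first search over the tree rooted at M0, pruning inner nodes
    that violate the refinement rule. Returns a matching M' or None."""
    n_g, n_h = len(A_G), len(A_H)
    M0 = [[1 if sum(A_H[j]) >= sum(A_G[i]) else 0 for j in range(n_h)]
          for i in range(n_g)]

    def descend(M, row, used):
        if not complies(M, row, A_G, A_H):
            return None  # prune: no descendant of M can be an M*
        if row == n_g:
            F = [r.index(1) for r in M]
            # Entry (i, k) of M'(M' A_H)^T equals A_H[F[i]][F[k]] for symmetric A_H.
            ok = all(A_G[i][k] == A_H[F[i]][F[k]]
                     for i in range(n_g) for k in range(n_g))
            return M if ok else None
        for j in range(n_h):
            if M[row][j] == 1 and j not in used:
                child = [r[:] for r in M]
                child[row] = [1 if y == j else 0 for y in range(n_h)]
                found = descend(child, row + 1, used | {j})
                if found is not None:
                    return found
        return None

    return descend(M0, 0, frozenset())

# G: the path 0 - 1 - 2. H (assumed example): triangle 0-1-2 plus vertex 3 attached to 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_H = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]

print(ullmann_algorithm_2(path, A_H))
print(ullmann_algorithm_2(K4, A_H))
```

In the K4 case the very first assignment is pruned, because the only remaining candidate column for its neighbours is the column it has just taken, and a vertex is not adjacent to itself.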