Computer Vision Systems for the Blind and Visually Disabled

Oct 19, 2013

STATS 19 SEM 2. 263057202. Talk 3.

Alan Yuille.

UCLA. Dept. Statistics and Psychology.

www.stat.ucla/~yuille

Computer Vision Systems


Digital Camera + Portable Computer + Speech Synthesizer.


(I) Input image from camera.


(II) Algorithm on PC searches the image to detect and read text.


(III) Speech Synthesizer speaks the text.


LED Reader


LED/LCD displays are very common, but impossible for the Blind to use.


Controlled domain: design a system to detect and read the displays.

LED Reader.


Prototype System. (1999).


Subjects using the LED Reader.


An implementation using special-purpose hardware is being built.

Blind Volunteer with Camera


Blind volunteers take photographs with a still digital camera or a video camera.


Automatic camera settings. Gain control.


The dynamic range of the eye is far larger than that of a camera.

Gain Control: Digital Cameras


Limitation due to the quality of the input images.


Blind users cannot point camera, focus, adjust camera gain, or keep the camera steady.


Enormous variation in intensity in natural images: the range is 10,000,000, while the camera range is 100.
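
To put the two figures on a common scale, they can be converted to decades (powers of ten) and photographic stops (doublings). This is only a small sketch: it assumes the slide's numbers are ratios of brightest to darkest intensity, which the slide does not state explicitly, and the names `decades`, `stops`, and `shortfall` are illustrative.

```python
import math

# Figures taken from the slide, read here as max/min intensity ratios (assumption).
SCENE_RANGE = 10_000_000
CAMERA_RANGE = 100


def decades(ratio):
    """Powers of ten spanned by an intensity ratio."""
    return math.log10(ratio)


def stops(ratio):
    """Doublings (photographic stops) spanned by an intensity ratio."""
    return math.log2(ratio)


# Factor by which the scene range exceeds what the camera captures
# at a single gain setting.
shortfall = SCENE_RANGE / CAMERA_RANGE

print(f"scene:  {decades(SCENE_RANGE):.1f} decades, {stops(SCENE_RANGE):.1f} stops")
print(f"camera: {decades(CAMERA_RANGE):.1f} decades, {stops(CAMERA_RANGE):.1f} stops")
print(f"shortfall factor: {shortfall:,.0f}")
```

Under this reading the camera covers 2 of the scene's 7 decades at any one gain setting, which is why choosing the gain well matters so much.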

Biologically Inspired Cameras.


Ideal: cameras with the ability of the human retina:

(I) Large gain control (from 100 to 100,000,000).

(II) More than 30 frames/second (to decrease motion blur).


Companies are designing cameras with these abilities. (Carver Mead).



Images taken by the Blind

Top two rows are images taken by blind volunteers.

Bottom two rows are images taken by scientists.

Scientists are better at orienting the camera and centering text.

Experiments with Blind Volunteers


Experiments with blind volunteers in San Francisco.


Experiments showed:

1. Blind volunteers could keep the camera approximately horizontal.

2. They could hold it steady so there is little motion blur.

3. Automatic gain control was usually sufficient to give good quality images.

Visual Search to Detect Text.


The human visual system has mechanisms for directing attention to “interesting parts” of images.


Known as “Visual Attention”.


Visual attention causes eye movements and directs gaze.


We need a form of visual attention to detect text.


This must be fast. We want to quickly reject non-text areas of the image.


Strategy I: Twenty Questions.


Divide the image up into many small windows.


Apply “filter tests” to each window.


If the window fails the test, then eliminate it.


If it passes, then proceed to the next test.


Apply tests until there are only a few (1-5) windows in the image which pass all tests.
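
The elimination procedure above can be sketched as a cascade in which each window is dropped at the first test it fails. This is a minimal illustration, not the system's actual code: the image is a plain list of pixel rows, and the two tests and their thresholds (`> 20`, `> 500`) are hypothetical stand-ins for real filter tests.

```python
def split_into_windows(image, size):
    """Yield (row, col, window) for non-overlapping size x size windows."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]


def mean_brightness(window):
    pixels = [p for row in window for p in row]
    return sum(pixels) / len(pixels)


def variability(window):
    """Variance of the pixel values inside the window."""
    pixels = [p for row in window for p in row]
    m = sum(pixels) / len(pixels)
    return sum((p - m) ** 2 for p in pixels) / len(pixels)


# Hypothetical filter tests, applied in this order.
TESTS = [
    lambda w: mean_brightness(w) > 20,   # reject very dark windows
    lambda w: variability(w) > 500,      # reject flat, textureless windows
]


def cascade(image, size, tests):
    """Return the (row, col) of every window that passes all tests."""
    survivors = []
    for r, c, window in split_into_windows(image, size):
        # all() stops at the first failed test, so rejected windows are cheap.
        if all(test(window) for test in tests):
            survivors.append((r, c))
    return survivors


# Toy image: a flat grey window, a striped window, and a dark window.
flat, stripes, dark = [100] * 4, [0, 255, 0, 255], [0] * 4
image = [flat + stripes + dark for _ in range(4)]
print(cascade(image, 4, TESTS))
```

Only the striped window at column 4 survives: the flat window fails the variability test and the dark window fails the brightness test. Ordering cheap tests first keeps the cost per rejected window low.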

Strategy II: Test Selection.


Choose a vocabulary of tests, e.g. average image brightness, local image variability.


Use a Machine Learning algorithm, “AdaBoost”, to select and combine tests.


Requires a training dataset of text and non-text. (Learning with a teacher).


AdaBoost combines “weak tests” into a “strong test”.
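
As a rough illustration of how weak tests are selected and combined, here is a minimal sketch of standard discrete AdaBoost in plain Python. The training samples, the third feature (edge density), and all thresholds are invented for the example; the talk does not give the actual vocabulary or data.

```python
import math


def adaboost(samples, labels, weak_tests, rounds):
    """Discrete AdaBoost. Labels and weak-test outputs are +1 (text) or -1.

    Returns a list of (alpha, test) pairs forming the strong test.
    """
    n = len(samples)
    weights = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        # Select the weak test with the lowest weighted error.
        best_test, best_err = None, float("inf")
        for test in weak_tests:
            err = sum(w for w, x, y in zip(weights, samples, labels)
                      if test(x) != y)
            if err < best_err:
                best_test, best_err = test, err
        if best_err >= 0.5:
            break  # nothing better than chance is left
        err = max(best_err, 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        strong.append((alpha, best_test))
        # Increase the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * y * best_test(x))
                   for w, x, y in zip(weights, samples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return strong


def classify(strong, x):
    score = sum(alpha * test(x) for alpha, test in strong)
    return 1 if score >= 0 else -1


def stump(index, threshold):
    """Weak test: +1 if feature `index` exceeds `threshold`, else -1."""
    return lambda x: 1 if x[index] > threshold else -1


# Hypothetical samples: (brightness, variability, edge density).
samples = [(80, 60, 0.5), (40, 60, 0.5), (80, 20, 0.5),
           (30, 10, 0.5), (30, 10, 0.1), (20, 5, 0.05)]
labels = [1, 1, 1, -1, -1, -1]
weak_tests = [stump(0, 50), stump(1, 30), stump(2, 0.2)]

strong = adaboost(samples, labels, weak_tests, rounds=3)
print([classify(strong, x) for x in samples])
```

In this toy set each weak test misclassifies exactly one sample, yet the weighted vote of all three classifies every sample correctly, which is the sense in which weak tests combine into a strong one.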

AdaBoost Example: Face Detection.


AdaBoost was used in Computer Vision to detect faces.


Best test: forehead brighter than eyes.

Example Sequence I:


Series of tests, selected by AdaBoost.

Example II.


Results of AdaBoost.

Strong Performance: Very High Detection Rate.

Failures of AdaBoost.


AdaBoost fails to detect some text.

Next Stage: Binarization.


AdaBoost detects regions of text in windows of the image.


Apply a binarization algorithm. Label the points within the window as letters/digits or as background.


Extend the binarization beyond the window to include letters/digits that lie just outside it.
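
The talk does not say which binarization algorithm is used. As one possible sketch, the example below substitutes Otsu's global threshold, computed from the detected window and then applied to a slightly enlarged region. The function names, the `margin` parameter, and the assumption of dark text on a light background are all illustrative.

```python
def otsu_threshold(pixels):
    """Otsu's method: the threshold that maximises between-class variance.

    Pixels are integers in 0..255; values <= threshold form the dark class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(image, top, left, size, margin):
    """Label pixels 1 (letter/digit) or 0 (background).

    The threshold comes from the size x size window at (top, left); labels
    are returned for the window grown by `margin` pixels on each side,
    clipped to the image. Assumes dark text on a light background.
    """
    window = [image[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    t = otsu_threshold(window)
    r0, r1 = max(0, top - margin), min(len(image), top + size + margin)
    c0, c1 = max(0, left - margin), min(len(image[0]), left + size + margin)
    return [[1 if image[r][c] <= t else 0 for c in range(c0, c1)]
            for r in range(r0, r1)]


# Toy image: dark strokes in columns 1 and 4; the window covers columns 0-3.
image = [[200, 30, 200, 200, 30, 200] for _ in range(4)]
for row in binarize(image, top=0, left=0, size=4, margin=1):
    print(row)
```

The stroke in column 4 lies just outside the window but is still labelled as a letter, because the threshold learned inside the window is reused in the margin.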

Results of Binarization.

Optical Character Recognition (OCR)


OCR has been developed for reading text on documents.


Black and white images. High resolution.


We apply it to the binarized regions detected by AdaBoost.


OCR will read the text and reject regions which are non-text.

Text detected by AdaBoost, binarized, and read by OCR.

Text detected, but not read.

Non-text detected, rejected by OCR.

Non-text detected, read by OCR.

Performance


Can detect text within our dataset (San Francisco) with a false negative rate of 2.8%.


Detected text is read correctly 93.0% of the time.


Detected non-text is read as text 1.0% of the time.
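
A rough end-to-end figure can be derived from the first two numbers. The sketch below assumes the 93.0% reading rate is conditional on the text having been detected, so that the two stages multiply; the slide does not state this, so the result is indicative only.

```python
# Figures quoted on the slide (San Francisco dataset).
FALSE_NEGATIVE_RATE = 0.028   # text regions missed by detection
READ_CORRECT_RATE = 0.930     # detected text that is read correctly

detection_rate = 1.0 - FALSE_NEGATIVE_RATE

# Assumption: the reading rate applies only to detected text,
# so the detection and reading stages multiply.
end_to_end_rate = detection_rate * READ_CORRECT_RATE

print(f"detection rate:  {detection_rate:.1%}")
print(f"end-to-end rate: {end_to_end_rate:.1%}")
```

Under that assumption, roughly 90% of the text in the dataset would be both found and read correctly.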


Prototype System: room for improvement.

Summary


It will soon be practical to build Computer Vision systems for text detection and reading that work in unconstrained domains.