Computer Vision Systems for the Blind and Visually Disabled


Oct 19, 2013





STATS 19 SEM 2. 263057202. Talk 3.

Alan Yuille.

UCLA. Dept. Statistics and Psychology.


Computer Vision Systems

Digital Camera + Portable Computer +

Speech Synthesizer.

(I) Input image from camera.

(II) Algorithm on PC searches the image
to detect and read text.

(III) Speech Synthesizer speaks the text.

LED Reader

LED/LCD displays are very common, but
impossible for the Blind to use.

Controlled domain. Design system to
detect and read the displays.

LED Reader.

Prototype System. (1999).

Subjects using the LED Reader.

Implementation using special purpose
hardware being built.

Blind Volunteer with Camera

Blind volunteers take
photographs. Still digital
camera, or video

Automatic camera
settings. Gain control.

Dynamic range of the
eye is far larger than the
range of a camera.

Gain Control: Digital Cameras

Limitation due to the quality of the input

Blind users cannot point camera, focus,
adjust camera gain, or keep the camera

Enormous variation in the intensity in
natural images:

range 10,000,000,

camera range is 100.
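
The gap between the two ranges quoted above can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper names and the sensor limits of 1 to 100 are assumptions, not part of the talk. It shows that the scene spans five more orders of magnitude than the camera, and that a single fixed gain must clip either the dark or the bright end.

```python
import math

SCENE_RANGE = 10_000_000  # intensity ratio in natural images (from the slide)
CAMERA_RANGE = 100        # intensity ratio a camera captures (from the slide)

def orders_of_magnitude(ratio):
    """Express an intensity ratio as a power of ten."""
    return math.log10(ratio)

def apply_gain(intensities, gain, max_value=100.0):
    """Scale intensities by one gain, then clip to the assumed sensor range [1, max_value]."""
    return [min(max(v * gain, 1.0), max_value) for v in intensities]

# How many orders of magnitude the camera cannot cover in one exposure.
gap = orders_of_magnitude(SCENE_RANGE) - orders_of_magnitude(CAMERA_RANGE)

# Hypothetical scene values: a shadow, a mid-tone, and a highlight.
captured = apply_gain([0.5, 50.0, 5000.0], gain=1.0)
print(gap, captured)
```

With a gain of 1.0 only the mid-tone survives unclipped, which is why the choice of gain matters so much for users who cannot check the image themselves.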

Biologically Inspired Cameras.

Ideal: cameras with the ability of the

human retina:

Large gain control (from 100 to

More than 30 frames/second (to
decrease motion blur).

Companies are designing cameras with
these abilities. (Carver Mead).

Images taken by the Blind

Top two rows are images taken by blind volunteers.

Bottom two rows are images by scientists.

Scientists are better at orienting the camera and centering text.

Experiments with Blind Volunteers

Experiments with Blind Volunteers. In San

Experiments showed:

Blind volunteers could keep the camera
approximately horizontal.

They could hold it steady so there is little
motion blur.

Automatic gain control was usually sufficient to
give good quality images.

Visual Search to Detect Text.

The human visual system has mechanisms for
directing attention to “interesting parts” of images.

Known as “Visual Attention”.

Visual attention causes eye movements and
directs gaze.

We need a form of visual attention to detect text.

This must be fast. We want to quickly
reject non-text areas of the image.

Strategy I: Twenty Questions.

Divide the image up into many small windows.

Apply “filter tests” to each window.

If the window fails the test, then eliminate it.

If it passes, then proceed to the next test.

Apply tests until there are only a few (1
windows in the image which pass all tests.
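
The Twenty Questions strategy can be sketched as a rejection cascade. This is a minimal illustration, not the system described in the talk: the window size, the two tests, and their thresholds are all hypothetical, and the image is a plain list of rows of grey levels.

```python
def split_into_windows(image, size):
    """Yield (row, col, window) for non-overlapping size x size windows."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]

def mean(window):
    values = [v for row in window for v in row]
    return sum(values) / len(values)

def variance(window):
    values = [v for row in window for v in row]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Hypothetical filter tests and thresholds, applied in order.
TESTS = [
    lambda w: variance(w) > 100.0,      # text windows show contrast
    lambda w: 20.0 < mean(w) < 235.0,   # not uniformly dark or saturated
]

def surviving_windows(image, size, tests):
    """Return the top-left corners of windows that pass every test."""
    survivors = []
    for r, c, window in split_into_windows(image, size):
        # all() stops at the first failed test, so rejected windows cost little.
        if all(test(window) for test in tests):
            survivors.append((r, c))
    return survivors

image = [
    [10, 10, 0, 255],
    [10, 10, 255, 0],
    [255, 255, 250, 255],
    [255, 255, 255, 250],
]
print(surviving_windows(image, 2, TESTS))
```

Only the high-contrast window at the top right passes both tests here; the flat and saturated windows are eliminated by the first test.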

Strategy II: Test Selection.

Choose a vocabulary of tests. E.g. average
image brightness, local image variability.

Use a Machine Learning algorithm
“AdaBoost” to select and combine tests.

Requires a training dataset of text and
non-text. (Learning with a teacher).

AdaBoost combines “weak tests” into a
“strong test”.
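
As a rough illustration of how weak tests are combined, here is a minimal sketch of discrete AdaBoost over threshold tests. It assumes each window has already been reduced to two features (average brightness, local variability); the feature values, labels, and function names are invented for the example and are not the talk's data or code.

```python
import math

def stump_predict(stump, x):
    """A weak test: compare one feature against a threshold."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def candidate_stumps(samples):
    """Enumerate threshold tests at every observed feature value."""
    stumps = []
    for j in range(len(samples[0])):
        for t in sorted({x[j] for x in samples}):
            stumps.append((j, t, 1))
            stumps.append((j, t, -1))
    return stumps

def train_adaboost(samples, labels, rounds):
    n = len(samples)
    weights = [1.0 / n] * n
    stumps = candidate_stumps(samples)
    strong = []
    for _ in range(rounds):
        def weighted_error(stump):
            return sum(w for w, x, y in zip(weights, samples, labels)
                       if stump_predict(stump, x) != y)
        best = min(stumps, key=weighted_error)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(weighted_error(best), 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        strong.append((alpha, best))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(best, x))
                   for w, x, y in zip(weights, samples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return strong

def strong_predict(strong, x):
    """The strong test: a weighted vote of the selected weak tests."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in strong)
    return 1 if score > 0 else -1

# Hypothetical (brightness, variability) features; +1 = text, -1 = non-text.
samples = [(120, 5), (200, 8), (60, 3), (130, 900), (90, 1200), (180, 700)]
labels = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(samples, labels, rounds=3)
print(model[0][1], strong_predict(model, (100, 1000)))
```

In this toy dataset brightness alone cannot separate the classes, so the first test selected is a threshold on variability.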

AdaBoost Example: Face Detection.

AdaBoost was used in Computer Vision to detect faces.

Best test: forehead brighter than eyes.

Example Sequence I:

Series of tests, selected by AdaBoost.

Example II.

Results of AdaBoost.

Strong Performance: Very High Detection Rate.

Failures of AdaBoost.

AdaBoost fails to detect some text.

Next Stage: Binarization.

AdaBoost detects regions of text in
windows of the image.

Apply a binarization algorithm. Label the
points within the window as letters/digits
or as background.

Extend the binarization to areas outside
the window to include letters/digits that
are just outside the window.
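
The talk does not say which binarization algorithm is used. As a stand-in, the sketch below uses Otsu's global threshold, a standard choice, to label the points of one window as letter/digit (1) or background (0). It assumes 8-bit grey levels and dark text on a light background by default, and it does not cover the extension beyond the window.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0..255) that maximizes between-class variance,
    treating pixels <= t as one class and pixels > t as the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below t
    sum0 = 0    # sum of their grey levels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(window, dark_text=True):
    """Label each point 1 (letter/digit) or 0 (background)."""
    t = otsu_threshold([p for row in window for p in row])
    return [[1 if (p <= t) == dark_text else 0 for p in row] for row in window]

# Hypothetical window: a dark stroke across a light background.
window = [
    [200, 210, 205],
    [30, 40, 35],
    [198, 202, 207],
]
print(binarize(window))
```

Setting dark_text=False flips the labels for light text on a dark background, such as many LED displays.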

Results of Binarization.

Optical Character Recognition (OCR)

OCR has been developed for reading text
on documents.

Black and white images. High resolution.

We apply it to the binarized output of

OCR will read the text and reject regions
which are not text.

Text detected by AdaBoost,
Binarized, and read by OCR.

Text detected, but not read.

Non-text detected, rejected by OCR.

Non-text detected, read by OCR.


Can detect text within our dataset (San
Francisco) with false negative rate of

We can read the detected text correctly at

Read detected non-text as text at 1.0%.
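
For reference, rates of this kind are ratios of counts. The sketch below shows how a false negative rate and a false positive rate might be computed; the function names and the counts are purely illustrative and are not the figures from the San Francisco dataset.

```python
def false_negative_rate(missed_text, total_text):
    """Fraction of true text regions that the detector failed to find."""
    return missed_text / total_text

def false_positive_rate(nontext_read_as_text, total_nontext):
    """Fraction of detected non-text regions that were read as text."""
    return nontext_read_as_text / total_nontext

# Illustrative counts only (assumed, not measured).
fnr = false_negative_rate(missed_text=5, total_text=200)
fpr = false_positive_rate(nontext_read_as_text=10, total_nontext=1000)
print(f"false negative rate: {fnr:.1%}, false positive rate: {fpr:.1%}")
```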

Prototype System: room for improvement.


It will soon be practical to build Computer

Vision systems for text detection and
reading that work in unconstrained