Computer Vision Systems for the Blind
STATS 19 SEM 2. 263057202. Talk 3.
UCLA. Dept. Statistics and Psychology.
Computer Vision Systems
Digital Camera + Portable Computer + Speech Synthesizer.
(I) Input image from camera.
(II) Algorithm on PC searches the image
to detect and read text.
(III) Speech Synthesizer speaks the text.
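The three stages above can be sketched as a small driver function. This is only an illustration: the names `run_reader`, `detect_and_read`, and `speak` are hypothetical and are not taken from the talk, and the detection and speech stages are passed in as plain callables.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale image: rows of pixel intensities


def run_reader(image: Image,
               detect_and_read: Callable[[Image], List[str]],
               speak: Callable[[str], None]) -> List[str]:
    """Run the three stages on one camera image (hypothetical interface)."""
    # (I) `image` is the input from the camera.
    # (II) Search the image to detect and read text.
    texts = detect_and_read(image)
    # (III) Hand each text string to the speech synthesizer.
    for text in texts:
        speak(text)
    return texts
```

Any detector and any speech back end with these signatures could be plugged in, for example `run_reader(img, my_detector, print)` during testing.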
LED/LCD displays are very common, but
impossible for the Blind to use.
Controlled domain. Design system to
detect and read the displays.
Prototype System. (1999).
Subjects using the LED Reader.
Implementation using special purpose
hardware being built.
Blind Volunteer with Camera
Blind volunteers take
photographs. Still digital
camera, or video
settings. Gain control.
Dynamic range of the
eye is far larger than the
range of a camera.
Gain Control: Digital Cameras
Limitation due to the quality of the input
Blind users cannot point camera, focus,
adjust camera gain, or keep the camera
Enormous variation in the intensity in
camera range is 100.
Biologically Inspired Cameras.
Ideal: cameras with the ability of the
Large gain control (from 100 to
More than 30 frames/second (to
decrease motion blur).
Companies are designing cameras with
these abilities. (Carver Mead).
Images taken by the Blind
Top two rows are
Images taken by
Bottom two rows
are images by
at orienting the
Experiments with Blind Volunteers
Experiments with Blind Volunteers in San
Francisco.
Blind volunteers could keep the camera
They could hold it steady so there is little
motion blur.
Automatic gain control was usually sufficient to
give good quality images.
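As a rough software illustration of gain control, the sketch below rescales pixel intensities so that the mean brightness reaches a target value and clips at the maximum level. It is an assumption-laden toy: real automatic gain control acts in the camera before the signal is digitized, and the function name `apply_gain` and the defaults `target_mean=128` and `max_value=255` are hypothetical choices, not values from the talk.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of pixel intensities


def apply_gain(image: Image, target_mean: float = 128.0,
               max_value: int = 255) -> Image:
    """Scale intensities so the mean brightness approaches target_mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    if mean == 0:
        # A completely black image carries no signal to amplify.
        return [row[:] for row in image]
    gain = target_mean / mean
    # Clip to the representable range; clipped pixels are saturated.
    return [[min(max_value, round(p * gain)) for p in row] for row in image]
```

The clipping step shows why a limited camera range matters: once a bright region saturates, no later processing can recover the detail in it.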
Visual Search to Detect Text.
The human visual system has mechanisms for
directing attention to “interesting parts” of images.
Known as “Visual Attention”.
Visual attention causes eye movements and
We need a form of visual attention to detect
text.
This must be fast. We want to quickly
text areas of the image.
Strategy I: Twenty Questions.
Divide the image up into many small
windows.
Apply “filter tests” to each window.
If the window fails the test, then eliminate
it.
If it passes, then proceed to the next test.
Apply tests until there are only a few (1
windows in the image which pass all tests.
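The twenty-questions strategy can be sketched as a cascade: each window is tested in turn and dropped at the first test it fails. The code below is a minimal sketch; the window size, the step, and the two example tests (mean brightness and variance thresholds) are assumptions made for illustration, not the filters used in the talk.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Image = List[List[int]]          # grayscale image: rows of pixel intensities
Window = Tuple[int, int, int]    # (x, y, size) of a square window


def sliding_windows(image: Image, size: int, step: int) -> Iterator[Window]:
    """Divide the image into square windows of the given size."""
    height, width = len(image), len(image[0])
    for y in range(0, height - size + 1, step):
        for x in range(0, width - size + 1, step):
            yield (x, y, size)


def crop(image: Image, window: Window) -> List[int]:
    """Return the pixels inside a window as a flat list."""
    x, y, size = window
    return [p for row in image[y:y + size] for p in row[x:x + size]]


def mean(pixels: Sequence[int]) -> float:
    return sum(pixels) / len(pixels)


def variance(pixels: Sequence[int]) -> float:
    m = mean(pixels)
    return sum((p - m) ** 2 for p in pixels) / len(pixels)


def surviving_windows(image: Image, windows: Iterable[Window],
                      tests: Sequence[Callable[[List[int]], bool]]
                      ) -> List[Window]:
    """Keep only the windows that pass every test, in order."""
    survivors = []
    for window in windows:
        patch = crop(image, window)
        # all() stops at the first failed test, so a rejected window
        # costs only as many tests as it took to eliminate it.
        if all(test(patch) for test in tests):
            survivors.append(window)
    return survivors
```

Ordering the cheap tests first keeps the search fast, because most windows are eliminated before the expensive tests ever run.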
Strategy II: Test Selection.
Choose a vocabulary of tests. E.g. average
image brightness, local image variability.
Use a Machine Learning algorithm
“AdaBoost” to select and combine tests.
Requires a training dataset of text and
non-text. (Learning with a teacher).
AdaBoost combines “weak tests” into a
“strong test”.
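A minimal sketch of how AdaBoost selects and combines tests is given below. It assumes each window has already been reduced to a feature vector (for example brightness and variability), uses single-feature threshold tests as the weak tests, and labels text as +1 and everything else as -1. The function names and the toy data in the usage note are hypothetical; this is textbook discrete AdaBoost, not the specific system from the talk.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, int, float, int]  # (alpha, feature, threshold, polarity)


def train_adaboost(X: List[List[float]], y: List[int],
                   rounds: int) -> List[Stump]:
    """Select `rounds` weak threshold tests and weight them."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble: List[Stump] = []
    for _ in range(rounds):
        best = None
        # Search the vocabulary of tests for the lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    preds = [pol if x[f] >= thr else -pol for x in X]
                    err = sum(w for w, p, t in zip(weights, preds, y)
                              if p != t)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, preds)
        err, f, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # Raise the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * t * p)
                   for w, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Stump], x: List[float]) -> int:
    """Weighted vote of the selected weak tests: +1 text, -1 otherwise."""
    score = sum(alpha * (pol if x[f] >= thr else -pol)
                for alpha, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

For example, with `X = [[0.2, 0.9], [0.8, 0.8], [0.5, 0.7], [0.3, 0.1], [0.7, 0.2], [0.5, 0.3]]` and `y = [1, 1, 1, -1, -1, -1]`, the first test selected thresholds the second feature (variability), since brightness alone does not separate the two classes.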
AdaBoost Example: Face Detection.
AdaBoost is used in Computer
Vision to detect faces.
Example Sequence I:
Series of tests, selected by AdaBoost.
Results of AdaBoost.
Strong Performance: Very
High Detection Rate.
Failures of AdaBoost.
AdaBoost fails to detect some text.
Next Stage: Binarization.
AdaBoost detects regions of text in
windows of the image.
Apply a binarization algorithm. Label the
points within the window as letters/digits
or as background.
Extend the binarization to areas outside
the window to include letters/digits that
lie just beyond its border.
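The talk does not say which binarization algorithm is used. As a stand-in, the sketch below applies Otsu's global threshold to the pixels of one detected window and labels each pixel as letter/digit (1) or background (0). Treating the brighter class as text is an assumption (it suits bright-on-dark displays; dark-on-light text would need the labels flipped), and the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale window: rows of intensities in 0..255


def otsu_threshold(pixels: List[int], levels: int = 256) -> int:
    """Return the threshold that maximizes between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]          # pixels with intensity <= t
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0        # pixels with intensity > t
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(window: Image) -> List[List[int]]:
    """Label each pixel 1 (letter/digit) or 0 (background)."""
    t = otsu_threshold([p for row in window for p in row])
    # Assumption: text is brighter than the background.
    return [[1 if p > t else 0 for p in row] for row in window]
```

Because the threshold is computed per window, the same code adapts to windows with different overall brightness.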
Results of Binarization.
Optical Character Recognition (OCR)
OCR has been developed for reading text
Black and white images. High resolution.
We apply it to the binarized output of
OCR will read the text and reject regions
which are not text.
Text detected by AdaBoost,
Binarized, and read by OCR.
Text detected, but not read.
text detected, rejected by OCR.
text detected, read by OCR.
Can detect text within our dataset (San
Francisco) with false negative rate of
We can read the detected text correctly at
Read detected non-text as text at 1.0%.
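For readers unfamiliar with the terms, the sketch below shows how a false negative rate (text that was missed) and a false positive rate (non-text accepted as text) are computed from labeled outcomes. The function name and the labels in the usage note are made up for illustration and are not results from the talk.

```python
from typing import List, Tuple


def error_rates(truth: List[bool], predicted: List[bool]
                ) -> Tuple[float, float]:
    """Return (false negative rate, false positive rate).

    truth[i] is True when region i really contains text;
    predicted[i] is True when the system reported text there.
    """
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    missed = sum(1 for t, p in zip(truth, predicted) if t and not p)
    false_alarms = sum(1 for t, p in zip(truth, predicted) if p and not t)
    return missed / positives, false_alarms / negatives
```

With four text regions of which one is missed, and six non-text regions of which one is accepted, `error_rates` returns a false negative rate of 0.25 and a false positive rate of 1/6.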
Prototype System: room for improvement.
It will soon be practical to build Computer
Vision systems for text detection and
reading that work in unconstrained