CSC 425/525: No
As with natural language understanding and speech recognition, the task of computer vision, or vision comprehension and understanding, is complex. To solve the entire problem (from image, say a bitmap, to identification of the scene) requires numerous substeps and mappings between them. Unlike speech recognition though, which has found some adequate solutions using HMMs, or by combining several distinct approaches like neural networks, HMMs, fuzzy reasoning and symbolic knowledge, each solving different levels of the problem, the computer vision problem has not yet been suitably solved. Only pieces of it have concrete solutions and other pieces continue to be researched. What is true is that there are specific algorithms that might be applicable to solve specific tasks:
Color and intensity detection
Optical character recognition
Detection of a specific object (e.g., a person versus furniture)
We do not have the time in this class to look at all of the algorithms that have thus far been employed to solve these various problems. In fact, a study of computer vision could take up a full semester-long class and not even come close to looking at the depth of research both past and present. So instead, we will consider the full vision problem and look at the mappings required. We will then highlight just a few algorithms. Any further study will have to wait for CSC 625 or your own personal study.
Computer Vision: the Mappings
Like speech recognition, computer vision, as a complete problem, is considered to consist of numerous mappings. That is, the input data (say a bitmap) must be broken down into smaller parts, each of which is then analyzed in several ways. First, edges are detected. From those edges, shapes have to be identified (that is, which edge belongs to which shape). Shapes are then used to determine where the distinct objects are. Then, object identification takes place, where each object is classified as to what type of thing it is. Finally, the entire scene must be classified. For instance, is this collection of identified objects that of students in a classroom sitting at tables, of people drinking coffee in a Starbucks, or of military personnel in a battlefield?
However, even within this series of mappings, there are more steps. Edge detection requires first determining shadows, lines, and contours. Between two objects, will the “edge” be a solid black line? Not necessarily. It might be a transition from one color to another. If both objects are similarly colored, the edge may be barely discernible. Additionally, based on the shape of the objects, the edge may not be a straight line, and certainly the edge might consist of a number of bends and angles.
Object classification requires first joining together all of the edges that make up the border of the object, plus internal edges used to indicate curvature and whether portions of the object protrude (come out of the image when considering the image in 3-D) or indent. This requires yet another series of steps in the mapping. First, common edges must be identified. Next, the edges are combined to form a 2-D image. From there, shadows are used to form a 3-D image (in fact, since the image is flat, the image will be known as a 2½-D image, 2 dimensions plus depth being “simulated” or guessed at). Object identification will be complicated, not only because an object may not appear in the expected orientation (for instance, the computer may know what a cup looks like, but what if the cup is upside down?), but the object may also be partially occluded by other objects.
Below, the mapping shows that the raw data may first require filtering. Filtering might include simple A-to-D conversion, although we will assume that the input is a bitmap. Other filtering algorithms though may “clean up” the image to make the other steps easier. Segmentation breaks the image into edges and identifies textures, and from there, the edges and textures are grouped into regions (objects). Regions are then enhanced into 2½-D or 3-D and objects are recognized. Finally, the entire image must be classified.
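The sequence of stages just described can be sketched as a pipeline of functions. The stage names and return values below are hypothetical placeholders chosen for illustration, not real implementations:

```python
# A sketch of the vision pipeline described above: filtering, segmentation,
# grouping into regions, 2.5-D/3-D enhancement, and recognition.
# Every stage here is a hypothetical stand-in for a much harder algorithm.

def filter_image(bitmap):
    """Clean up the raw pixel data (noise removal, normalization)."""
    return bitmap  # placeholder: pass the bitmap through unchanged

def segment(image):
    """Break the image into edges and textures."""
    return {"edges": [], "textures": []}

def group_regions(segments):
    """Group edges and textures into candidate regions (objects)."""
    return ["region-1"]

def enhance(regions):
    """Enhance each region into a 2.5-D description (depth is guessed)."""
    return [{"region": r, "depth": "guessed"} for r in regions]

def recognize(objects):
    """Classify each object, then the scene as a whole."""
    return {"objects": objects, "scene": "unknown"}

def vision_pipeline(bitmap):
    # Each mapping feeds the next, mirroring the stages in the notes.
    return recognize(enhance(group_regions(segment(filter_image(bitmap)))))

result = vision_pipeline([[0, 255], [255, 0]])
```

The point of the sketch is only the shape of the computation: each mapping consumes the output of the previous one, and knowledge sources would feed into several stages at once.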
Aside from the algorithms that perform the mappings, knowledge about objects (their appearance, how they might vary), understanding interactions of visual images (how shadows form, textures, how images might occlude each other), and how images go together in terms of identifying a scene are all required to solve the entire problem. This information can come in many different forms as indicated by the set of knowledge listed on the left of the above figure.
The problem is this: given a bitmap (of black & white, grey-scale or color images), identify the edges. This can be a challenge even in a black & white image because edges may not be a unique color. Instead, pixels must be scanned with the algorithm looking for changes in intensity and color (if a color image). Not all changes in intensity and color will denote an edge however, but large changes usually do. There are mathematical approaches to edge detection, so the actual solution is omitted here. However, the identification of edges is only the start of the problem. Which edges go with which edges to make up an object? This step is critical yet far more challenging.
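The idea of scanning for large intensity changes can be sketched very simply. This is a minimal toy detector, assuming a grey-scale bitmap as a list of pixel rows; real edge detectors use gradient operators and smoothing:

```python
# Minimal edge detection by intensity change: mark a pixel as an edge when
# the intensity difference to its right or lower neighbor exceeds a threshold.
# Real detectors (Sobel, Canny) are far more robust than this sketch.

def detect_edges(bitmap, threshold=50):
    rows, cols = len(bitmap), len(bitmap[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Large change to the right neighbor?
            if c + 1 < cols and abs(bitmap[r][c] - bitmap[r][c + 1]) > threshold:
                edges[r][c] = True
            # Large change to the lower neighbor?
            if r + 1 < rows and abs(bitmap[r][c] - bitmap[r + 1][c]) > threshold:
                edges[r][c] = True
    return edges

# A dark square on a light background: edges appear only at the boundary.
image = [
    [200, 200, 200, 200],
    [200,  20,  20, 200],
    [200,  20,  20, 200],
    [200, 200, 200, 200],
]
edge_map = detect_edges(image)
```

Notice that a uniform region (inside the dark square) produces no edge pixels even though its color differs from the background; only the transitions do, which is exactly the point made above.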
Below to the right is an idea of identifying an object based on combining edges. Waltz came up with an algorithm based on junction points. A junction point is a point in the image where an edge meets another edge. By labeling edges as either convex or concave, Waltz came up with a limited number of ways that edges could combine. Thus, some of the possible combinations of edge connections, and therefore possible collections of edges that make up an object, can be ruled out due to constraints of legal junction points. Waltz’s algorithm then is a constraint-based one. Notice in the image above on the right, the inner cube must be indented, not extended, because only the indented interpretation is consistent with the legal junction points. Multiple interpretations of a set of edges are often possible, therefore constraints must be used to eliminate some of the interpretations of those edges.
Below is a collection of some of the ways that three edges, so-called trihedrals, can fit together. In Waltz’s algorithm, there are 27 different ways that 3 edges can be combined. The + signs indicate convex edges and the – signs indicate concave edges.
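The constraint idea can be sketched as follows. The set of "legal" junctions below is a small illustrative subset, not Waltz's actual catalog of trihedral junctions; the point is only how a catalog of legal label combinations prunes the space of interpretations:

```python
from itertools import product

# A toy version of Waltz-style constraint filtering. Each edge at a junction
# is labeled '+' (convex) or '-' (concave). LEGAL_JUNCTIONS is an
# illustrative subset of allowed 3-edge label combinations -- NOT Waltz's
# real catalog of 27 trihedral junction types.
LEGAL_JUNCTIONS = {
    ('+', '+', '+'),
    ('-', '-', '-'),
    ('+', '-', '+'),
}

def consistent_labelings(num_edges=3):
    """Enumerate every labeling of a 3-edge junction; keep only legal ones."""
    all_labelings = product('+-', repeat=num_edges)
    return [lab for lab in all_labelings if lab in LEGAL_JUNCTIONS]

survivors = consistent_labelings()
# 2**3 = 8 candidate labelings exist, but the junction constraints
# rule out 5 of them, leaving 3 consistent interpretations.
```

In the full algorithm this filtering is propagated: a label ruled out at one junction constrains the labels available at the junctions its edges connect to, which is what eliminates the "extended" reading of the inner cube.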
Waltz’s algorithm is limited to trihedral edge connections. Thus, other algorithms would be required if our image contains blobs, contours, curves, or other connections of edges that are more or fewer than 3.
Often, these approaches use such mathematical models as eigenmodels (eigenform, eigenface), quadratics or superquadratics, distance measures, closest point computations, and so forth.
Other Computer Vision Algorithms
Aside from those discussed above, a number of different algorithms have been utilized to solve various
specific computer vision problems. We touch on those here:
Machine-produced character recognition has been solved satisfactorily through neural networks. Hand-written character recognition has many potential solutions, none of which has satisfactorily solved the problem, particularly when you consider cursive writing. Among the solutions are neural networks, genetic algorithms, HMMs, Bayesian probabilities, and nearest neighbor matching. Two approaches are shown below.
Face recognition also has many potential solutions. The problem is, given an image that contains a person’s face, map the face’s contours and textures into mathematical equations and Gaussian distributions. Then use some form of matching (perhaps nearest neighbor) to identify the face in a database of faces.
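The nearest-neighbor matching step can be sketched as below. The feature vectors are hand-made numbers standing in for the contour and texture measurements the text describes; a real system would derive them from the image:

```python
import math

# A sketch of nearest-neighbor face matching: each face is reduced to a
# feature vector (the numbers here are illustrative stand-ins for measured
# contour/texture features), and an unknown face is matched to the closest
# database entry by Euclidean distance.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_face(query, database):
    """Return the name of the database face closest to the query vector."""
    return min(database, key=lambda name: euclidean(query, database[name]))

faces = {
    "alice": [0.9, 0.1, 0.4],
    "bob":   [0.2, 0.8, 0.7],
}
match = nearest_face([0.85, 0.15, 0.5], faces)  # closest to "alice"
```

In practice the match would also carry a distance threshold, so that a face sufficiently far from every database entry is reported as unknown rather than forced onto the nearest name.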
Image stabilization and image (object) tracking, which are often used in such problems as digital cameras or automated video cameras, use a variety of solutions as well. Among them are neural networks, fuzzy logic, and best-fit search using some well-defined metric.
Autonomous vehicle input generally does not use images themselves but instead radar and sonar (mostly sonar). Thus, roadside detection and obstacle detection are required algorithms. However, in some cases, roadways are detected by video images looking for the lines (white and yellow lines). Again, algorithms might include neural networks or mathematical approaches.
Here are two solutions to hand-written character recognition (of printed characters, not cursive). In the first, multiple neural networks are employed, for instance one to detect horizontal versus vertical lines and another to detect primitive shapes. Then, the neural networks supply information that is “voted” upon based on which neural network outputs are most convincing and which features are required for the character being recognized. Below, a “4” is being recognized. It might consist of a near vertical line, a horizontal line and another vertical line half as long, but it could also include curves. The neural networks are trained only on the shapes, not the characters. It is up to voting to decide how well the expected features are identified by the neural networks.
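The voting step might be sketched like this. The detector names, confidence values, and feature lists are invented for illustration; in a real system the confidences would come from trained networks:

```python
# A sketch of the voting scheme: several (hypothetical) shape detectors each
# report how strongly their feature is present, and each character is scored
# by summing the confidences of the features it expects. The character whose
# expected features are best supported wins the vote.

# Stand-in detector outputs (confidences 0..1) for an input image of a "4".
detector_outputs = {
    "vertical_line": 0.9,
    "horizontal_line": 0.8,
    "short_vertical": 0.7,
    "curve": 0.1,
}

# Expected features per character -- illustrative, not a real feature model.
expected_features = {
    "4": ["vertical_line", "horizontal_line", "short_vertical"],
    "1": ["vertical_line"],
}

def vote(outputs, expected):
    """Sum detector confidences over each character's expected features."""
    scores = {}
    for char, features in expected.items():
        scores[char] = sum(outputs[f] for f in features)
    return max(scores, key=scores.get)

winner = vote(detector_outputs, expected_features)  # "4" outscores "1"
```

Note the detectors know nothing about characters, matching the point above: they only report shapes, and the voting layer is what maps well-supported feature combinations onto a character.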
The second possible approach uses abduction and symbolic knowledge to look for explicit features of written characters. Here, a character recognizer exists for every character to be recognized. Each character recognizer searches for features that it expects to find or expects not to find, and scores that character appropriately. For instance, if a letter looks like it could be an E or F, we might score an E as “highly likely” and an F as “somewhat likely.” The algorithm will be explored in more detail in next
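The scoring idea behind the E-versus-F example can be sketched as follows. The feature names and scoring rule are invented for illustration, not the actual algorithm's details:

```python
# A sketch of feature-based scoring for character recognition: each
# character recognizer lists features it expects to find and features it
# expects NOT to find, then scores the observed features against both lists.
# Feature names and the +1/-1 scoring rule are illustrative assumptions.

recognizers = {
    "E": {"expect": {"top_bar", "middle_bar", "bottom_bar"}, "reject": set()},
    "F": {"expect": {"top_bar", "middle_bar"}, "reject": {"bottom_bar"}},
}

def score(observed, spec):
    """+1 per expected feature found, -1 per rejected feature found."""
    hits = len(observed & spec["expect"])
    misses = len(observed & spec["reject"])
    return hits - misses

# An input that shows all three horizontal bars (i.e., looks like an E).
observed_features = {"top_bar", "middle_bar", "bottom_bar"}
scores = {char: score(observed_features, spec)
          for char, spec in recognizers.items()}
# E scores 3 ("highly likely"); F scores 2 - 1 = 1 ("somewhat likely").
```

The bottom bar both raises E's score and lowers F's, which is how the two candidates end up ranked "highly likely" versus "somewhat likely" rather than tied.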