CSC 425/525: Notes on Computer Vision



1. Introduction


As with natural language understanding and speech recognition, the task of computer vision, that is, the comprehension and understanding of visual scenes, is complex. Solving the entire problem (from an input image, say a bitmap, to identification of the scene) requires numerous substeps and mappings between them. Unlike speech recognition, which has found some adequate solutions using HMMs, or by combining several distinct approaches such as neural networks, HMMs, fuzzy reasoning, and symbolic knowledge, each solving a different level of the problem, the computer vision problem has not yet been suitably solved. Only pieces of it have concrete solutions, and other pieces continue to be researched. What is true is that there are specific algorithms that might be applicable to solve specific tasks:



- Edge detection
- Color and intensity detection
- Shape detection
- Optical character recognition
- Detection of a specific object (e.g., a person versus furniture)
- Face detection


Unfortunately, we do not have the time in this class to look at all of the algorithms that have thus far been employed to solve these various problems. In fact, a study of computer vision could take up a full semester-long class and not even come close to covering the depth of research, both past and present. So instead, we will consider the full vision problem and look at the mappings required. We will then highlight just a few algorithms. Any further study will have to wait for CSC 625 or your own personal research.


2. Computer Vision: the Mappings


Like speech recognition, computer vision, as a complete problem, is considered to consist of numerous mappings. That is, the input data (say a bitmap) must be broken down into smaller parts, each of which is then analyzed in several ways. First, edges are detected. From those edges, shapes have to be identified (that is, which edge belongs to which shape). Shapes are then used to determine where the distinct objects are. Then object identification takes place, where each object is classified as to what type of thing it is. Finally, the entire scene must be classified. For instance, is this collection of identified objects that of students in a classroom sitting at tables, people drinking coffee in a Starbucks, or a group of military personnel on a battlefield?


However, even within this series of mappings, there are more steps. Edge detection requires first determining shadows, lines, and contours. Between two objects, will the “edge” be a solid black line? Not necessarily. It might be a transition from one color to another. If both objects are similarly colored, the edge may be barely discernible. Additionally, depending on the shapes of the objects, the edge may not be a straight line, and it might well consist of a number of bends and angles.


Object classification requires first joining together all of the edges that make up the border of the object, plus internal edges used to indicate curvature and whether portions of the object protrude (come out of the image when considering the image in 3-D) or indent. This requires yet another series of steps in the mapping. First, common edges must be identified. Next, the edges are combined to form a 2-D image. From there, shadows are used to form a 3-D image (in fact, since the image is flat, the result is known as a 2 ½-D image: 2 dimensions plus depth being “simulated” or guessed at). Object classification will be complicated, not only because an object may not appear in the expected orientation (for instance, the computer may know what a cup looks like, but what if the cup is upside down?), but also because the object may be partially occluded by other objects.


Below, the mapping shows that the raw data may first require filtering. Filtering might include simple A-to-D conversion, although we will assume that the input is a bitmap. Other filtering algorithms, though, may “clean up” the image to make the later steps easier. Segmentation breaks the image into edges and identifies textures, and from there the edges and textures are grouped into regions (objects). Regions are then enhanced into 2 ½-D or 3-D, and objects are recognized. Finally, the entire image must be classified.


[Figure: the sequence of vision mappings, from raw data through filtering, segmentation, and grouping to object recognition and scene classification, with the relevant knowledge sources listed on the left.]

Aside from the algorithms that perform the mappings, knowledge about objects (their appearance, how they might vary), an understanding of the interactions of visual images (how shadows form, textures, how objects might occlude each other), and knowledge of how images go together in terms of identifying a scene are all required to solve the entire problem. This information can come in many different forms, as indicated by the set of knowledge sources listed on the left of the above figure.
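
To make the sequence of mappings concrete, here is a minimal sketch of the pipeline as a chain of Python function stubs. The function names and data representations are illustrative assumptions, not any standard library; each stage stands in for what would be a substantial algorithm in its own right.

    # A skeletal sketch of the vision pipeline described above.
    # Every stage body is a placeholder, not a real implementation.

    def filter_image(bitmap):
        """Clean up the raw input (noise removal, simple A-to-D)."""
        return bitmap  # placeholder: assume the input is already a clean bitmap

    def segment(bitmap):
        """Break the image into edges and textures."""
        return {"edges": [], "textures": []}  # placeholder

    def group_regions(segments):
        """Group edges and textures into regions (candidate objects)."""
        return []  # placeholder

    def enhance_depth(regions):
        """Use shadows and shading to lift 2-D regions to 2 1/2-D."""
        return regions  # placeholder

    def recognize_objects(regions, knowledge):
        """Classify each region using stored knowledge about objects."""
        return ["unknown" for _ in regions]  # placeholder

    def classify_scene(objects, knowledge):
        """Label the whole scene from the identified objects."""
        return "unknown scene"  # placeholder

    def vision_pipeline(bitmap, knowledge):
        segments = segment(filter_image(bitmap))
        regions = enhance_depth(group_regions(segments))
        objects = recognize_objects(regions, knowledge)
        return classify_scene(objects, knowledge)

Note that the knowledge sources enter the pipeline only at the later stages; the early stages work on the raw pixels alone.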


3. Edge Detection


The problem is this: given a bitmap (of a black & white, grey-scale, or color image), identify the edges. This can be a challenge even in a black & white image because edges may not be a unique color. Instead, pixels must be scanned, with the algorithm looking for changes in intensity and color (if a color image). Not all changes in intensity and color denote an edge, but large changes usually do. There are mathematical approaches to edge detection, so the full solution is omitted here, although a brief sketch of one gradient-based approach is given below. However, the identification of edges is only the start of the problem. Which edges go with which edges to make up an object? This step is critical yet far more challenging.
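
As one taste of the mathematics, here is a minimal sketch of gradient-based edge detection using Sobel filters, written in Python with NumPy and SciPy. The threshold value is an arbitrary assumption; practical detectors (e.g., Canny) tune it and add smoothing and edge thinning.

    import numpy as np
    from scipy.ndimage import convolve

    def sobel_edges(image, threshold=100.0):
        """Mark pixels whose intensity gradient is large.

        image: 2-D NumPy array of grey-scale intensities.
        Returns a boolean array, True where an edge is likely.
        """
        # Sobel kernels approximate the horizontal and vertical
        # derivatives of image intensity.
        kx = np.array([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]])
        ky = kx.T
        gx = convolve(image.astype(float), kx)
        gy = convolve(image.astype(float), ky)
        # A large gradient magnitude signals a sharp change in
        # intensity, which usually (but not always) marks an edge.
        return np.hypot(gx, gy) > threshold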
Below to the right is an idea of identifying an
object based on combining edges.


Waltz came up with an algorithm based on junction points. A junction point is a point in the image where an edge meets another edge. By labeling edges as either convex or concave, Waltz came up with a limited number of ways that edges could combine. Thus, some of the possible combinations of edge connections, and therefore possible collections of edges that make up an object, can be ruled out due to the constraints of legal junction points. Waltz’s algorithm is therefore a constraint-based one. Notice in the image above on the right that the inner cube must be indented, not extended, because only the indented interpretation makes sense; constraints must therefore be used to eliminate some of the interpretations of those edges. Below is a collection of some of the ways that three edges, so-called tri-hedrals, can fit together. In Waltz’s algorithm, there are 27 different ways that 3 edges can be combined. The + signs indicate convex edges and the − signs indicate concave edges.


[Figure: legal tri-hedral junction labelings, with + marking convex edges and − marking concave edges.]

Waltz’s algorithm is limited to tri-hedral edge connections. Thus, other algorithms would be required if our image contains blobs, contours, curves, or junctions of more or fewer than three edges. Often, these approaches use such mathematical models as eigen-models (eigenforms, eigenfaces), quadrics or superquadrics, distance measures, closest-point computations, and so forth.
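
To give the flavor of the constraint-based idea, here is a toy sketch of junction-label pruning in Python. The catalog below is a tiny invented subset, not Waltz’s actual table of 27 tri-hedral combinations (which also uses boundary-arrow labels); the point is only that edge labelings inconsistent with every legal junction are ruled out.

    # Edge labels: '+' for convex, '-' for concave.
    # Hypothetical catalog of the labelings each junction type permits.
    LEGAL = {
        "L":    {("+", "+"), ("-", "-")},
        "fork": {("+", "+", "+"), ("-", "-", "-")},
    }

    def prune(junction_type, candidate_labelings):
        """Keep only the labelings legal for this junction type."""
        return [c for c in candidate_labelings if c in LEGAL[junction_type]]

    # A fork junction whose three edges were tentatively labeled three ways:
    guesses = [("+", "+", "+"), ("+", "-", "+"), ("-", "-", "-")]
    print(prune("fork", guesses))    # [('+', '+', '+'), ('-', '-', '-')]

In the full algorithm this pruning is propagated: an edge shared by two junctions must carry the same label at both, so ruling out a labeling at one junction can in turn rule out labelings at its neighbors.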


4. Other Computer Vision Algorithms


Aside from those discussed above, a number of different algorithms have been utilized to solve various
specific computer vision problems. We touch on those here:



- Machine-produced character recognition has been solved satisfactorily through neural networks.

- Hand-written character recognition has many potential solutions, none of which has satisfactorily solved the problem, particularly when you consider cursive writing. Among the solutions are neural networks, genetic algorithms, HMMs, Bayesian probabilities, nearest-neighbor matching approaches, and various symbolic approaches. Two approaches are shown below.

- Face recognition also has many potential solutions. The problem is, given an image that contains a person’s face, to map the face’s contours and textures into mathematical equations and Gaussian distributions, and then use some form of matching (perhaps nearest neighbor) to identify the face in a database of faces. A minimal sketch of the nearest-neighbor step follows this list.

- Image stabilization and image (object) tracking, which are often used in such problems as digital cameras or automated video cameras, use a variety of solutions as well. Among them are neural networks, fuzzy logic controllers, and best-fit search using some well-defined heuristic.

- Autonomous vehicle input usually does not use images themselves but instead radar and sonar (mostly sonar). Thus, roadside detection and obstacle detection are the required algorithms. However, in some cases, roadways are detected from video images by looking for the lane lines (white and yellow lines). Again, algorithms might include neural networks or mathematical approaches.
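
As an illustration of the nearest-neighbor matching step mentioned under face recognition above, here is a minimal sketch in Python. It assumes feature vectors have already been extracted from the face images (for example, by projecting onto eigenfaces); the names and the plain Euclidean metric are illustrative choices.

    import numpy as np

    def nearest_face(query, database):
        """Return the name of the stored face closest to the query.

        query:    1-D feature vector extracted from the input face.
        database: dict mapping a person's name to a stored feature vector.
        """
        best_name, best_dist = None, float("inf")
        for name, features in database.items():
            # Euclidean distance; real systems often use a learned
            # or weighted metric instead.
            dist = np.linalg.norm(query - features)
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name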


Here are two solutions to hand-written character recognition (of printed characters, not cursive). In the first, multiple neural networks are employed, for instance one to detect horizontal versus vertical lines and another to detect primitive shapes. The neural networks then supply information that is “voted” upon, based on which neural network outputs are most convincing and which features are required for the character being recognized. Below, a “4” is being recognized. It might consist of a near-vertical line, a horizontal line, and another vertical line half as long, but it could also include curves. The neural networks are trained only on the shapes, not the characters. It is up to the voting to decide how well the expected features are identified by the neural networks.

[Figure: multiple neural networks detecting line and shape features of a hand-written “4”, with a voting stage combining their outputs.]

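A minimal sketch of the voting idea in Python: each network reports a confidence for the features it detects, and a character is scored by how convincingly its required features were found. The networks’ outputs, the feature names, and the required-feature table are all invented for illustration.

    # Hypothetical per-network confidences for detected features.
    net_outputs = [
        {"vertical_line": 0.9, "horizontal_line": 0.2},
        {"horizontal_line": 0.8, "curve": 0.1},
        {"short_vertical_line": 0.7},
    ]

    # Features each character requires (an illustrative subset).
    REQUIRED = {
        "4": ["vertical_line", "horizontal_line", "short_vertical_line"],
        "1": ["vertical_line"],
    }

    def vote(character):
        """Sum every network's confidence in the character's required features."""
        return sum(net.get(feature, 0.0)
                   for feature in REQUIRED[character]
                   for net in net_outputs)

    # The character whose expected features were detected most convincingly:
    print(max(REQUIRED, key=vote))    # prints "4"

A real voting scheme would also normalize by the number of required features and penalize features that should be absent, so that characters with longer feature lists do not win by default.
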
The second possible approach uses abduction and symbolic knowledge to look for explicit features of hand-written characters. Here, a character recognizer exists for every character to be recognized. Each character recognizer searches for features that it expects to find or expects not to find, and scores that character appropriately. For instance, if a letter looks like it could be an E or an F, we might score the E as “highly likely” and the F as “somewhat likely.” The algorithm will be explored in more detail in next week’s lecture.
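
As a preview, here is a minimal sketch of such a per-character recognizer in Python. The feature names, the expect/forbid tables, and the integer scoring are all invented for illustration; the actual abductive algorithm is the subject of next week’s lecture.

    # Hypothetical feature expectations for two recognizers. Each lists
    # features it expects to find and features it expects NOT to find.
    RECOGNIZERS = {
        "E": {"expect": {"top_bar", "middle_bar", "bottom_bar", "vertical_line"},
              "forbid": set()},
        "F": {"expect": {"top_bar", "middle_bar", "vertical_line"},
              "forbid": {"bottom_bar"}},
    }

    def score(character, observed):
        """Count expected features found, minus forbidden features found."""
        spec = RECOGNIZERS[character]
        return len(spec["expect"] & observed) - len(spec["forbid"] & observed)

    # An input with all four strokes scores E above F, roughly
    # "highly likely" versus "somewhat likely":
    observed = {"top_bar", "middle_bar", "bottom_bar", "vertical_line"}
    for ch in RECOGNIZERS:
        print(ch, score(ch, observed))    # E 4, F 2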