









DEPARTMENT OF COMPUTER SCIENCE

FISH IDENTIFICATION SYSTEM

By

Diego Mushfieldt


A mini-thesis submitted in partial fulfilment of the
requirements for the degree of

BSc. Honours

Supervisor: Mehrdad Ghaziasgar
Co-Supervisor: James Connan

Date: 2011-11-02



Acknowledgements

I would like to take this opportunity to thank my parents for their constant motivation and inspiration, and for granting me the opportunity to study at the University of the Western Cape. I would also like to thank those dear to my heart, including my family, for their patience and understanding when I could not always be in their presence. Last, but certainly not least, I want to thank my supervisors, Mehrdad Ghaziasgar and James Connan, for believing in me and giving me the confidence I needed to keep working so hard at this project. Your dedication is truly an inspiration to me and to other students at UWC.


ABSTRACT

Aquariums display a large number of different kinds of fish, and visitors regularly wish to know more about a particular kind. Visitors can currently obtain this information either by asking an expert or by scanning through the documentation in and around the aquarium. However, this information is not always readily available. It is therefore desirable to create a system that makes such information available in an interactive manner. The project aims to develop a system that uses a video stream of a wide range of fish as its input. The user clicks on a particular fish; the system then classifies the fish and displays information about it accordingly. The system aims to give users an enjoyable and educational experience by allowing them to interact with it via the click of a mouse. The project is designed so that functionality such as a touch screen can be incorporated to improve interaction and enhance the user experience.






Table of Contents

CHAPTER 1: INTRODUCTION
1.1 Computer Vision and Image Processing
1.2 OpenCV (Open Computer Vision)
1.3 Current Research

CHAPTER 2: USER REQUIREMENTS
2.1 User’s view of the problem
2.2 Description of the problem
2.3 Expectations from the software solution
2.4 Not expected from the software solution

CHAPTER 3: REQUIREMENTS ANALYSIS
3.1 Designer’s Interpretation and Breakdown of the Problem
3.2 Complete Analysis of the Problem
3.3 Current Solutions
3.4 Suggested Solution

CHAPTER 4: USER INTERFACE SPECIFICATION (UIS)
4.1 The complete user interface
4.2 The input video frame
4.3 How the user interface behaves

CHAPTER 5: HIGH LEVEL DESIGN (HLD)
5.1 Description of Concepts
5.2 Relationships between objects
5.3 Subsystems of HLD
5.4 Complete Subsystem

CHAPTER 6: LOW LEVEL DESIGN (LLD)
6.1 Low Level Description of Concepts
6.2 Detailed Methodology

CHAPTER 7: TESTING AND RESULTS

CHAPTER 8: USER MANUAL
8.1 Starting the System
8.2 Load Video

CHAPTER 9: CODE DOCUMENTATION















CHAPTER 1

INTRODUCTION

1.1 Computer Vision and Image Processing

Computer vision [1] is the study of techniques that can be used to make machines see. In this context, ‘see’ refers to a machine being able to extract from an image the information necessary to solve some task. The image can take many forms, such as a video sequence or views from multiple cameras. Image processing [2] is any kind of signal processing in which the input is an image and the output is either another image or a set of parameters related to the image.


1.2 OpenCV (Open Computer Vision)

OpenCV [3] is an open-source computer vision library written in C and C++. OpenCV runs on the following platforms: Linux, Windows and Mac OS. The library helps people build complicated vision applications with its simple-to-use vision infrastructure of over 500 functions. OpenCV also contains a machine learning library, since computer vision and machine learning often go hand in hand, and this can be applied to a wide range of machine learning problems.


1.3 Current Research

Information about specific fish within an aquarium is not always readily available. At the moment, people can obtain information either by scanning the documentation in the aquarium or by asking an expert. It is therefore desirable to develop a system that provides instant information about specific fish in an interactive manner. The proposed system identifies a fish, using OpenCV’s libraries, by creating an image of the fish when the user clicks on it with a mouse. The image is processed using various algorithmic techniques, and the necessary information is then displayed on the screen.













CHAPTER 2

USER REQUIREMENTS

The following section describes the problem from the user’s point of view. It is critical to gather information from the user in order to produce a meaningful solution.


2.1 User’s view of the problem

The user requires the system to provide an easy mechanism for selecting a fish (in this case via a click of a mouse) on a live or pre-recorded video stream. The system should be able to classify the fish when it is clicked on, while it is in motion within the video stream. The system should also provide the user with information that is structured in a sensible manner and easy to understand. User-friendliness is very important: the information must be presented with clarity so that it remains unambiguous.


2.2 Description of the problem

The main purpose of this project is to develop an interactive system capable of providing instant feedback about a particular fish that the user is interested in. The system should help educate its users about different fish species by presenting facts about the particular type of fish the user has selected.


2.3 Expectations from the software solution

The system is expected to classify one fish at a time when the user clicks on it in the live video stream. The focus of this project is not only on classifying the fish, but also on the kind of information displayed to the user and the manner in which it is displayed.


2.4 Not expected from the software solution

The system is not expected to display information about more than one fish at the same time; it can only process and analyse one fish at a time. Since a single camera is used to capture the fish, the system can only process the fish in two dimensions from one camera angle. The system is therefore not expected to do its processing in three dimensions.



CHAPTER 3

REQUIREMENTS ANALYSIS

The following section describes the problem from the designer’s perspective and uses the previous chapter (Chapter 2) as a starting point.


3.1 Designer’s Interpretation and Breakdown of the Problem

The aquarium hosts many visitors each year and has various fish on display. However, viewers are not always able to obtain instant information about the specific fish they are interested in learning more about. The input to the final system is a live or pre-recorded video stream. Using a live video feed rather than a recorded video file is ideal and more practical, but difficult to implement in terms of the efficiency of the algorithms and the number of frames per second. A camera is used to capture the fish while it is swimming. This allows the user to observe the fish and decide which particular fish he/she is interested in learning more about. The user interacts with the system by moving the mouse cursor over a specific fish and clicking on it. The location of the click is used to determine which fish was clicked on, and an image of that fish is created. The system uses image processing techniques and functions from the OpenCV (Open Computer Vision) libraries to classify the fish accordingly. The system then displays the necessary output on the screen. The difficulty lies in processing the image to determine which fish was clicked on, and this should be done fast enough to ensure real-time processing.


3.2 Complete Analysis of the Problem

3.2.1 Recording the fish in real-time

A camera is used to record the fish swimming in the fish tank. The system uses this camera input to display the live video feed of the fish in two dimensions on a screen, and to enable the user to select a particular fish via the click of a mouse.


3.2.2 Processing the image of the selected fish

After the user clicks on a specific fish, the system uses the location of the click to determine which fish was clicked on. The region of interest (ROI) is of importance in achieving this goal.

3.2.3 Displaying information to the user

The system is not required to display critical biological features of each fish. However, the information should be precise enough to educate the user, yet concise enough that the user does not feel overwhelmed by too much information. Therefore, the system only displays the fish type.


3.3 Current Solutions

There are a few systems similar to this one. They are used to perform fish surveys and to count fish in order to protect marine ecosystems. However, no existing system solves the exact same problem as the proposed Fish Identification System. The similar systems do make use of functions and techniques comparable to those needed to solve this problem. Some of these systems are described below.


3.3.1 Real-time fish detection based on improved adaptive background [4]

This system is a kind of fish behaviour monitoring system. It proposes a new approach to updating the background, based on frame difference and background difference, in order to detect fish in a real-time video sequence. The system combines the background difference and frame difference to update the background more correctly and completely using shorter computation times.

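The frame-difference half of that approach can be sketched with a minimal, stdlib-only C++ function (the function name and the use of plain integer vectors as grayscale frames are illustrative, not taken from the cited system):

```cpp
#include <vector>
#include <cstdlib>

// Minimal frame differencing: a pixel is marked as moving (255) when it
// differs from the corresponding pixel of the previous frame by more
// than `thresh`; otherwise it is treated as background (0).
std::vector<int> frameDifference(const std::vector<int>& prev,
                                 const std::vector<int>& curr,
                                 int thresh) {
    std::vector<int> mask(curr.size(), 0);
    for (std::size_t i = 0; i < curr.size(); ++i)
        mask[i] = (std::abs(curr[i] - prev[i]) > thresh) ? 255 : 0;
    return mask;
}
```

The cited system goes further by combining such a frame-difference mask with a background-difference mask to update its background model.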

3.3.2 Recognizing Fish in Underwater video
[
5
]

This system uses a deformable template object recogn
ition
method for classifying fish species in an underwater video. The
method used can be a component of a system that automatically
identifies fish by species, improving upon previous works, which
only detect and track fish. In order to find globally optim
al
correspondences between the template model and an unknown
image, Distance Transforms are used. Once the query images
have been transformed into estimated alignment with the
template, they are processed to extract texture properties.


3.3.3 Field Programmable Gate Array (FPGA) Based Fish Detection Using Haar Classifiers [6]

The quantification of the abundance, size and distribution of fish is critical to properly managing and protecting marine ecosystems and regulating marine fisheries. This system is designed to automatically detect fish using a method based on the Viola-Jones Haar-like feature object detection method, running on a field programmable gate array (FPGA). The method generates Haar classifiers for different fish species using OpenCV’s Haar training code, which is based on the Viola-Jones detection method. This code allows a user to generate a Haar classifier for any object that is consistently textured and mostly rigid.



3.4 Suggested Solution

The system will work effectively at classifying various types of fish. The suggested solution is easy to modify, so additional functionality can be added when necessary. It is also cost-effective, since only one camera is used along with open-source software (OpenCV).









CHAPTER 4

USER INTERFACE SPECIFICATION (UIS)

The following section describes exactly what the user interface does, what it looks like and how the user interacts with the program.


4.1 The complete user interface

The complete user interface is a Graphical User Interface (GUI). The user does not interact with the system via text commands. The figure below shows the user interface as it appears to the user.

Figure 1: User Interface

4.2 The input video frame

Once the system starts running, the video feed is displayed to the user within a window on the screen, as shown above. The user can then click on any fish within this window.


4.3 How the user interface behaves

The system displays the video feed once it is executed. It then waits for input from the user via the click of a mouse. If the user clicks on a fish, the system responds by displaying an additional window that shows the classification of the fish.

Figure 2: Behaviour of User Interface

CHAPTER 5

HIGH LEVEL DESIGN (HLD)

In this section a high level design (HLD) view of the problem is applied. Since the programming language of choice is C/C++, Object Oriented Analysis is not applied. A very high level abstraction of the system is constructed as we analyse the methodology behind the construction of the system.
















5.1 Description of Concepts

Consider the system objects and their corresponding descriptions in the following table:

Table 1: System Objects and their descriptions

OBJECT: DESCRIPTION

Ffmpeg: Ffmpeg is free and provides libraries for handling multimedia data. It is a command line program used for transcoding multimedia files.

OpenCV: OpenCV is a library of programming functions aimed mainly at real-time computer vision, with a focus on real-time image processing.

BGR2HSV: Converts an image from the BGR colour space to the HSV colour space.

Adaptive Threshold: This is the simplest method of image segmentation. The technique is used to create a binary image in which there exist only black and white pixels. This method includes ways to adapt to dynamic lighting conditions.

Region of Interest (ROI): The ROI is set around the fish to segment it, since only the fish is of interest, not the entire image. The coordinates of the user’s click are used to set the ROI.

Contour Detection: The edge pixels are assembled into contours. The largest contour is detected and is the only contour used to represent the final shape of the fish.

Histogram: The histogram represents the distribution of colour within an image.

Support Vector Machine (SVM): An SVM is used to recognize patterns in the intensity of the pixels; it classifies which class a certain pixel belongs to, making its decision based on data analysis. The ROI as well as the histogram values are sent to the SVM for training and testing the system.

5.2 Relationships between objects

The figure below depicts the relationships between the objects:

Figure 3: Object Relations

5.3 Subsystems of HLD

Figure 4: Subsystems

5.4 Complete Subsystem

The figure below shows the high level design and its sub-components, which include more detail about the subsystem.

Figure 5: Complete Subsystem









CHAPTER 6

LOW LEVEL DESIGN (LLD)

In this section explicit detail of all data types and functions is provided. Pseudo code is also provided, as well as all aspects of the programming effort, without resorting to actual code.

6.1 Low Level Description of Concepts


Table 2: Low Level view

CLASS: ATTRIBUTES

BGR2HSV: cvCvtColor(bgrImg, hsvImg, CV_BGR2HSV)

Adaptive Threshold: cvAdaptiveThreshold(hsvImg, hsvThresh, CV_ADAPTIVE_THRESH_MEAN_C, CV_ADAPTIVE_THRESH_GAUSSIAN_C, 139, 0)

Region of Interest (ROI): cvResize(hsvThresh, threshROI)

Contour Detection: cvFindContours(threshROI, storage, &first_contour, sizeof(CvContour), CV_RETR_CCOMP)

Draw Histogram: DrawHistogram(histImg)

6.2 Detailed Methodology

This section describes the methodology used to create this system by analysing each component in detail.


6.2.1 Video feed

The figure below depicts how the video feed is captured. The consecutive frames make up the video which is displayed on the user’s monitor. This is illustrated in Figure 6.

Figure 6: Video feed

6.2.2 User clicks on fish

Processing starts once the user clicks on the fish.








6.2.3 BGR (Blue, Green, Red) to HSV (Hue, Saturation, Value)

Once a frame is captured from the video feed, it is converted from BGR to HSV. This is done because the HSV colour space is not as sensitive to dynamic lighting conditions as BGR.

Figure 7: User clicks on fish

Figure 8: Convert BGR to HSV

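In OpenCV this conversion is a single call (cvCvtColor with CV_BGR2HSV, as listed in Table 2). The per-pixel arithmetic behind it can be sketched in plain C++ as follows; the struct and function names are illustrative, and the output ranges follow OpenCV’s 8-bit convention (H in [0, 180), S and V in [0, 255]):

```cpp
#include <algorithm>
#include <cmath>

// Convert a single BGR pixel (0-255 per channel) to HSV.
struct Hsv { int h, s, v; };

Hsv bgrToHsv(int b, int g, int r) {
    double bf = b / 255.0, gf = g / 255.0, rf = r / 255.0;
    double vmax = std::max({bf, gf, rf});
    double vmin = std::min({bf, gf, rf});
    double delta = vmax - vmin;

    // Hue: which sector of the colour wheel the dominant channel lies in.
    double h = 0.0;
    if (delta > 0.0) {
        if (vmax == rf)      h = 60.0 * (gf - bf) / delta;
        else if (vmax == gf) h = 120.0 + 60.0 * (bf - rf) / delta;
        else                 h = 240.0 + 60.0 * (rf - gf) / delta;
        if (h < 0.0) h += 360.0;
    }
    // Saturation: how far the colour is from grey; Value: brightness.
    double s = (vmax > 0.0) ? delta / vmax : 0.0;

    // Scale to OpenCV's 8-bit ranges (hue is halved to fit in a byte).
    return Hsv{ static_cast<int>(h / 2.0 + 0.5),
                static_cast<int>(s * 255.0 + 0.5),
                static_cast<int>(vmax * 255.0 + 0.5) };
}
```

Because lighting changes mostly affect the V channel, the hue of a fish stays comparatively stable, which is the property exploited in the next step.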
6.2.4 Adaptive Threshold

This method takes individual pixels and marks them as object pixels if their value is greater than some threshold value, and as background pixels otherwise. The resulting image is a binary image which consists only of black and white pixels. The most important part is the selection of the threshold value. In this system the threshold value was selected manually, using trial and error to observe which value removes the most noise. The Hue component of the HSV colour space is used, and an adaptive threshold is then applied to the single-channel image.

Figure 9: Adaptive Threshold

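The mean-based variant of adaptive thresholding can be sketched as follows. This is a stdlib-only illustration of the general technique, not the exact cvAdaptiveThreshold parameters used by the system: each pixel is compared against the mean of its local neighbourhood minus an offset, rather than against one global threshold, which is what makes the method robust to uneven lighting.

```cpp
#include <vector>

// Mean adaptive threshold over a row-major grayscale image of size w x h.
// Each pixel is compared against the mean of its (2*radius+1)^2
// neighbourhood minus an offset C; pixels above that become white (255),
// the rest black (0). Border pixels use a window clamped to the image.
std::vector<int> adaptiveThreshold(const std::vector<int>& img,
                                   int w, int h, int radius, int C) {
    std::vector<int> out(img.size(), 0);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            long sum = 0; int count = 0;
            for (int dy = -radius; dy <= radius; ++dy)
                for (int dx = -radius; dx <= radius; ++dx) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
                    sum += img[yy * w + xx];
                    ++count;
                }
            double mean = static_cast<double>(sum) / count;
            out[y * w + x] = (img[y * w + x] > mean - C) ? 255 : 0;
        }
    }
    return out;
}
```

A bright pixel on a dark background survives the threshold even if the whole image is dim, because only the local mean matters.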
6.2.5 Region of Interest

This method uses x and y coordinates to set borders around the object of interest (the fish). The larger image is then cropped in order to do further segmentation on a smaller image in which only the fish is displayed.

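Cropping a rectangle centred on the clicked point can be sketched like this (names and the fixed-size rectangle are illustrative; the system itself works through OpenCV’s ROI facilities):

```cpp
#include <vector>
#include <algorithm>

// A cropped region of interest with its own dimensions.
struct Roi { std::vector<int> pixels; int w, h; };

// Crop a roiW x roiH rectangle, centred on the clicked point, out of a
// row-major grayscale image. The rectangle is clamped to the image
// borders so a click near an edge still yields a valid crop.
Roi cropRoi(const std::vector<int>& img, int imgW, int imgH,
            int clickX, int clickY, int roiW, int roiH) {
    int x0 = std::max(0, clickX - roiW / 2);
    int y0 = std::max(0, clickY - roiH / 2);
    int x1 = std::min(imgW, x0 + roiW);
    int y1 = std::min(imgH, y0 + roiH);
    Roi roi{ {}, x1 - x0, y1 - y0 };
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            roi.pixels.push_back(img[y * imgW + x]);
    return roi;
}
```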


6.2.6 Contour Detection and Flood Fill

This method computes contours from a binary image. In this case the binary image is the threshold image, in which the image edges are implicit as boundaries between positive and negative regions. The largest contour, which is the shape of the fish, is extracted to remove background noise, and it is then filled with white pixels to represent the shape of the fish.

Figure 10: Set ROI

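The effect of keeping only the largest contour and flood-filling it can be approximated with a connected-component sweep. This sketch is an analogue, not the cvFindContours call itself: it labels each 4-connected white region with a BFS flood fill and blanks everything but the largest one.

```cpp
#include <vector>
#include <queue>

// Keep only the largest 4-connected white (255) region of a binary
// image, discarding smaller white regions as background noise.
std::vector<int> keepLargestRegion(std::vector<int> img, int w, int h) {
    std::vector<int> label(img.size(), 0);
    std::vector<int> sizes(1, 0);
    int best = 0, bestSize = 0, next = 0;
    for (int i = 0; i < (int)img.size(); ++i) {
        if (img[i] != 255 || label[i] != 0) continue;
        ++next;
        sizes.push_back(0);
        std::queue<int> q;                 // BFS flood fill from this seed
        q.push(i); label[i] = next;
        while (!q.empty()) {
            int p = q.front(); q.pop();
            ++sizes[next];
            int x = p % w, y = p / w;
            int nbs[4][2] = {{1,0},{-1,0},{0,1},{0,-1}};
            for (auto& d : nbs) {
                int nx = x + d[0], ny = y + d[1];
                if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
                int np = ny * w + nx;
                if (img[np] == 255 && label[np] == 0) {
                    label[np] = next; q.push(np);
                }
            }
        }
        if (sizes[next] > bestSize) { bestSize = sizes[next]; best = next; }
    }
    for (int i = 0; i < (int)img.size(); ++i)   // fill: largest stays white
        img[i] = (best != 0 && label[i] == best) ? 255 : 0;
    return img;
}
```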



6.2.7 Histogram

The histogram values are used to represent the colour distribution of the fish. The figure below illustrates how dominant ‘orange’ is within the RGB image.

Figure 11: Contour Detection and Flood Fill

Figure 12: Draw Histogram

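A colour histogram over the hue channel can be sketched as follows (an illustrative stdlib-only version; the bin whose count dominates indicates the fish’s dominant colour, e.g. the ‘orange’ band in the figure):

```cpp
#include <vector>

// Build a hue histogram: count how many pixels fall into each of `bins`
// equal-width ranges over [0, 180), OpenCV's 8-bit hue range.
std::vector<int> hueHistogram(const std::vector<int>& hue, int bins) {
    std::vector<int> hist(bins, 0);
    for (int v : hue) {
        int b = v * bins / 180;        // map hue value to a bin index
        if (b >= bins) b = bins - 1;   // clamp the top edge
        ++hist[b];
    }
    return hist;
}
```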
6.2.8 Sending the Shape and Colour representations to the Support Vector Machine (SVM) [7]

Once the shape and colour distribution of the fish have been determined, this data is combined and sent to the SVM. Each fish is given a label (e.g. fish A has label 1, fish B has label 2, etc.) and the corresponding features (shape and colour) are combined for each label. The SVM then trains the system to recognize every fish that is clicked on, each fish having its own unique set of features.

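The combination step amounts to concatenating the two feature groups under one label, which can be sketched as (the struct layout is illustrative, not the exact format the SVM library expects):

```cpp
#include <vector>

// A labelled training sample: shape features (e.g. contour measurements)
// concatenated with colour features (e.g. histogram bins).
struct Sample {
    int label;                    // e.g. 1 = fish A, 2 = fish B, ...
    std::vector<double> features; // shape features followed by colour features
};

Sample makeSample(int label,
                  const std::vector<double>& shape,
                  const std::vector<double>& colour) {
    Sample s;
    s.label = label;
    s.features = shape;                                              // shape first
    s.features.insert(s.features.end(), colour.begin(), colour.end()); // then colour
    return s;
}
```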

6.2.8.1 SVM Cross-validation and Grid-search [7]

The RBF kernel nonlinearly maps samples into a higher dimensional space, so, unlike the linear kernel, it can handle the case in which the relation between class labels and attributes is nonlinear. The two parameters used in the RBF kernel are C and γ. Some form of model selection needs to be done in order to decide which C and γ are best for a given problem; the aim is to choose a good C and γ so that the classifier accurately predicts testing data. In v-fold cross-validation, the training set is divided into v subsets of equal size. Each subset is tested in turn using the classifier trained on the remaining v-1 subsets. Each instance of the whole training set is therefore predicted once, so the cross-validation accuracy is the percentage of data which are correctly classified. This kind of cross-validation procedure can prevent the overfitting problem. Figure 14 illustrates the overfitting problem, whereby the classifier overfits the training data. In contrast, the classifier shown in Figure 13 does not overfit the training data and gives better cross-validation as well as testing accuracy.

Figure 13: Better Classifier

A “grid-search” on C and γ using cross-validation is recommended. Different pairs of (C, γ) values are tried and the pair with the best accuracy is chosen. In order to identify good parameters, exponentially growing sequences of C and γ are tried; for example, C = 2^-5, 2^-3, ..., 2^15 and γ = 2^-15, 2^-13, ..., 2^3.


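The v-fold split and the exponential grid described above can be sketched in C++. Training an actual RBF-kernel SVM is abstracted behind a caller-supplied accuracy function, so only the search logic itself is shown; all names here are illustrative:

```cpp
#include <vector>
#include <cmath>
#include <functional>
#include <utility>

// Grid-search over (C, gamma) with v-fold cross-validation. The caller
// supplies accuracy(C, gamma, trainIdx, testIdx), which would train an
// SVM on trainIdx and report its accuracy on testIdx.
std::pair<double, double> gridSearch(
        int nSamples, int vFolds,
        const std::function<double(double, double,
                                   const std::vector<int>&,
                                   const std::vector<int>&)>& accuracy) {
    double bestC = 0, bestGamma = 0, bestAcc = -1e300;
    for (int ce = -5; ce <= 15; ce += 2) {        // C = 2^-5, 2^-3, ..., 2^15
        for (int ge = -15; ge <= 3; ge += 2) {    // gamma = 2^-15, ..., 2^3
            double C = std::pow(2.0, ce), gamma = std::pow(2.0, ge);
            double acc = 0;
            for (int fold = 0; fold < vFolds; ++fold) {
                std::vector<int> train, test;
                // Every sample lands in the test set of exactly one fold.
                for (int i = 0; i < nSamples; ++i)
                    (i % vFolds == fold ? test : train).push_back(i);
                acc += accuracy(C, gamma, train, test);
            }
            acc /= vFolds;                        // cross-validation accuracy
            if (acc > bestAcc) { bestAcc = acc; bestC = C; bestGamma = gamma; }
        }
    }
    return {bestC, bestGamma};
}
```

Plugging in a real SVM for `accuracy` turns this directly into the model-selection procedure recommended in [7].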
Figure 14: Overfitting problem

6.2.8.2 Training the System

The videos used in the system were captured at the aquarium. The camera is placed on a tripod in order to keep it stationary. Since the tanks within the aquarium are so large, it is not easy to record a fish swimming at a constant distance from the camera all the time. Therefore, the only frames used in the system are those in which the fish appears at a reasonable distance from the camera and maintains this distance for at least three or four seconds. Since most fish appeared only once in the videos at the desired distance, some of the training and testing videos are the same. This is acceptable since the duration of this project is only one year, and capturing different training and testing videos, though not impossible, is a complicated and tedious process. In order to have totally different training and testing sets, the tank should not be too large, because the shape and orientation of a fish definitely change as it swims away from the camera. Each fish was trained with a total of about 40 training samples, comprising both the shape and colour data sent to the SVM. Each label in the SVM corresponds to a certain fish name, so when testing takes place, the SVM returns a value (the label) that is stored in a file; the system reads that label and prints the desired output to make the classification.






Figure 15: Send Shape and Colour Distribution to SVM

6.2.9 System Classification

If the user clicks on a particular fish, its features (shape and colour distribution) are sent to the SVM. Since the system has been trained prior to testing, the SVM allows the system to know more or less what each fish’s features ‘look’ like. The SVM responds by giving the system a label; this label corresponds to a certain fish species, and the corresponding classification output is displayed to the user. Figure 16 below illustrates the classification process.

Figure 16: System Classification

CHAPTER 7

TESTING AND RESULTS

In order to correctly assess the accuracy of the system, each fish was clicked on at least 10 times. This amounts to a total of 200 clicks: 10 clicks for each of 20 fish. The result of the test is represented in the graph below, which illustrates the accuracy of each individual fish, showing what percentage of clicks on each fish were classified correctly and how each fish contributes to the overall accuracy of the system. The overall accuracy of the system amounts to 88%. This is a reasonable result, taking into account the number of different types of fish that need to be classified.

Figure 17: Individual Accuracy (per-fish classification accuracy; x-axis: Fish Type, y-axis: Percentage, 0-100%)

The graph shows that some fish achieve an accuracy of 100%. This can be attributed to these particular fish having very unique shapes and distinctive shape features. There are also fish with a low accuracy of 50% or less, which can be attributed to them taking on shapes similar to those of other fish. Nevertheless, the system’s performance is good overall.

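The overall figure can be reproduced from the per-fish counts with a short helper. This is a sketch: the thesis does not list its per-fish counts, so the function name and example values below are illustrative.

```cpp
#include <vector>

// Overall accuracy from per-fish test results: each fish was clicked
// `clicks` times; correct[i] counts the correct classifications for
// fish i. Returns the percentage of all clicks classified correctly.
double overallAccuracy(const std::vector<int>& correct, int clicks) {
    int total = 0;
    for (int c : correct) total += c;
    return 100.0 * total / (correct.size() * clicks);
}
```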












CHAPTER 8

USER MANUAL

The demonstration mode of the GUI is illustrated in the figures that follow.

8.1 Starting the System

A window appears at start-up, asking whether or not the user wants to start the system. The system starts if the user clicks ‘yes’ and exits if the user clicks ‘no’.

8.2 Load Video

The system then requests a video from the user. After a video is selected, the user clicks ‘Open’. The system then uses the selected video and displays it on the screen. The user can now interact with the system by clicking on any fish within the video.




CHAPTER 9

CODE DOCUMENTATION

The code has been fully documented, with comments inserted at each statement and each method. A description of all inputs and outputs is given, as well as the caveats of all methods. The final source code is stored on a CD and placed in an envelope.













Conclusion

Chapter 2 gave a detailed description of the problem as well as of the software solution to it. The user requires an easy-to-use, interactive system. Since the system includes a GUI, it is simple and easy to use: the user navigates through the system by mouse clicks instead of typed commands, which also makes the system interactive. The problem stated is that visitors at the aquarium do not have instant access to information about specific fish. The final system clearly meets this requirement by providing an easy-to-use, interactive system that gives the user instant information about specific fish. Such a system is also educational, and its interactivity attracts people.








Bibliography

[1] Haslum, P. (n.d.). Computer Vision.

[2] Rapp, C. S. (1996). Image Processing and Image Enhancement. Johnson City, Texas, USA.

[3] Kaehler, G. B. (2008). Learning OpenCV. USA.

[4] Zhou Hongbin, X. G. (n.d.). Real-time fish detection based on improved adaptive background. HangZhou, Zhejiang Province, China.

[5] Andrew Rova, G. M. (n.d.). Recognizing Fish in Underwater Video.

[6] Bridget Benson, J. C. (n.d.). Field Programmable Gate Array (FPGA) Based Fish Detection Using Haar Classifiers. California San Diego, USA.

[7] Chih-Wei Hsu, C.-C. C.-J. (2003). A Practical Guide to Support Vector Classification. Taipei, Taiwan.