Marauder’s Map Surveillance System
A Senior Design Project Proposal
Security is a very important issue in the world.
devoted towards the development and installation
of security systems
. People use
guard their home
, companies, government properties
even vehicles such as
When it is used for the protection and safety of
, it is
constantly monitor and
activities inside or outside of the
. There are two
in the market
One option is that
cameras are fixed and monitor one particular location
The other option is
180 degree plane. Moving cameras
are possible to deceive and
if an individual
of the camera closely although
they do provide better coverage than stationary cameras
ras are installed i
they will provid
video streams which are displayed on a screen
each individual video stream will be displayed separately on the screen requiring the user to
different videos where N is the number of cameras in
The existing surveillance
systems also do not
indicate if the person entering the building
to be there or
often very difficult to
if the person monitoring
the cameras cannot see
the face clearly
Marauder’s Map Surveillance System
able to provide
monitoring as well as more
bird’s eye view
each and every
together on a 2D plane and t
of any people within the building. Each individual will be represented by a moving set of feet with a name attached to the feet.
To provide this
type of identification, the
face recognition software and
sends its data to the central server where
images are sent
through the face recognition and motion detection software.
Next to each set of feet will be
information about the individual that is contained in the employee
is not authorized to be in the
is recognized by the surveillance
set of feet
with the name
will represent the individual
the system is if a person is authorized to be in the building but not in a particular area, then when the individual is located in that unauthorized area, the person’s feet will turn red on the monitor
last additional feature is to enable the system’s user to view the real-time video stream of a particular person or location in case he/she would like to look closer at a particular person or
This system can be thought of as providing a real-time blueprint of a building that shows everyone within the building and their individual movements.
To provide video surveillance, some form of video camera must be used. There are two
types of cameras that are of interest to this system. The first is a digital IP video camera and the second is an analog video camera with a video-to-IP converter.
Analog video cameras are already in existence in some older surveillance systems and, with the video-to-IP converter, a preexisting surveillance system can be outfitted with the Marauder’s Map
camera is a
type of digital video camera commonly used for surveillance and it can send and
receive data via
. There are two
types of surveillance systems for IP cameras. The first is
hold and display
each camera and
the alarm management. The second is
where each camera can store,
and display its own video streams
many advantages for using
30 frames per second
at 3 Mbps.
offer secure data transmission through encryption and authentication methods such as WEP,
WPA, WPA2, TKIP, AES. Live video feeds from selected cameras can be seen from an
connected to the internet
. IP cameras have the ability to operate without an additional power supply because they are powered through the Ethernet connection
The cheapest camera
has a 20 meter viewing range.
One of the issues with the Marauder’s Map Surveillance System is how the video streams from the cameras will be sent to the central server. The only feasible option is wireless transmission because laying wires between each camera and the server would be an expensive and difficult task. The next dilemma is how the video cameras will wirelessly transmit their data to the central server. After careful deliberation, a wireless mesh network was chosen. A wireless mesh network will be the most energy- and cost-effective solution. A wireless mesh network can utilize WiFi standards, making it easy to configure each node. The network is self-configuring, meaning that if additional nodes are added, the nodes automatically configure themselves to identify the new node and an updated route to the server. A wireless mesh network is also self-healing, meaning that if a node goes down for some reason, the other nodes will find an alternate path for routing the data.
The wireless mesh network will work as follows. Each IP digital video camera will be connected to its own node in the network. The nodes will communicate with each other in ad hoc mode using the 2.4 GHz band. The nodes will create a shortest-path route to the central server using a routing protocol (either OLSR or B.A.T.M.A.N.). The video streams will then hop from node to node until they reach the central server. Each node will have its own unique IP address so that the central server will know which camera the data is from and where the camera is located. Since the cameras will be stationary for the system, each IP address will have a corresponding physical location which describes where in the building the camera is located. There are multiple ways to implement a wireless mesh network. The following section describes three options. At the end a final decision will be made regarding which solution is the best for the surveillance system. Table 1 also lists statistics to compare the three options.
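The routing and self-healing behaviour described above can be illustrated with a small sketch. It is not OLSR or B.A.T.M.A.N.; it uses a plain breadth-first search as a stand-in for "fewest hops to the server", and the IP addresses, link layout and camera locations are hypothetical values chosen only for illustration.

```python
from collections import deque


def shortest_path(links, source, target, down=frozenset()):
    """Fewest-hop path from source to target, ignoring nodes listed in `down`."""
    if source in down or target in down:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links.get(node, ()):
            if neighbour not in previous and neighbour not in down:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no route is currently available


# Hypothetical topology: three camera nodes and the central server.
LINKS = {
    "10.0.0.1": ["10.0.0.2", "10.0.0.3"],
    "10.0.0.2": ["10.0.0.1", "server"],
    "10.0.0.3": ["10.0.0.1", "server"],
    "server": ["10.0.0.2", "10.0.0.3"],
}

# Hypothetical lookup from a node's IP address to the camera's physical location.
CAMERA_LOCATION = {
    "10.0.0.1": "lobby",
    "10.0.0.2": "hallway A",
    "10.0.0.3": "hallway B",
}

normal_route = shortest_path(LINKS, "10.0.0.1", "server")
# Self-healing: if node 10.0.0.2 goes down, a new route is found through 10.0.0.3.
healed_route = shortest_path(LINKS, "10.0.0.1", "server", down={"10.0.0.2"})
print(CAMERA_LOCATION["10.0.0.1"], normal_route, healed_route)
```

A real mesh protocol would also weigh link quality rather than hop count alone, so this should be read only as a picture of the idea.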
The first option for a mesh network is to build mesh nodes out of Zigbee modules. Zigbee is a wireless protocol based on an IEEE standard. Zigbee was first designed for low-rate personal area networks, but recent modules have increased its range. If Zigbee is used, then nodes would have to be individually built to hold the Zigbee modules. Right now there are two different Zigbee modules with different specifications: a low power module and a high power module. The low power module consumes very little power (about 1 mW) and is relatively cheap. The high power module consumes more power (100 mW) and costs a little more. Both modules can only achieve a maximum data rate of 250 Kbps at 2.4 GHz, which will cause a considerable delay in the playout of the video at the server. The main problem with using Zigbee is that the surveillance system will not be able to utilize one of Zigbee’s primary features, which is its low power setting. When the Zigbee module is not in use, it goes into low power mode, which saves considerably on energy, especially if a battery is being used to power the device. With video, however, there needs to be a constant stream of data being transmitted by the wireless mesh node, so the Zigbee device will never be in power save mode.
The second option for a mesh network is to use a private company’s predesigned wireless mesh nodes. The company Firetide produces high-end wireless mesh equipment. All of their products come with software that enables the user to configure the nodes from a laptop or computer. One of the advantages of Firetide’s wireless mesh nodes is that they can transmit between nodes on the 5 GHz frequency band while communicating with client devices on the 2.4 GHz frequency band. This allows for less interference between the client devices and the mesh nodes. For the Marauder’s Map Surveillance System, the IP cameras will connect via Ethernet to the mesh nodes and the nodes will transmit between each other in the 5 GHz band. The mesh nodes consume quite a bit of power at 400 mW, but they have a significantly higher maximum data rate of 54 Mbps at 5 GHz. Since Firetide’s mesh nodes are for commercial use, they have many additional functions that would not be needed for the Marauder’s Map Surveillance System. They are much more durable and can withstand harsh temperatures, and their data rates may be unnecessarily high for the surveillance system.
The third option for a mesh network is to use an ordinary Linksys WRT54G router and install the open source Freifunk firmware. Freifunk is an initiative started in Germany to support free, community-run wireless networks. When the firmware is installed on the Linksys router, the router is able to act as a node within a mesh network. The firmware uses the OLSR routing protocol to find the shortest path to the central server. The Linksys router consumes very little power at only 42 mW but has a significantly worse operating range than Firetide’s mesh nodes. The router does have a maximum data rate of 11 Mbps, which is significantly better than the Zigbee modules but not as high as Firetide’s devices.
Wireless Mesh Network Decision
Table 1 compares all of the specifications for the four mesh network choices (counting the Zigbee low and high power modules separately). The best decision for the Marauder’s Map Surveillance System is the Linksys router with Freifunk firmware. The router combines low cost with reasonably high data rates and receiver sensitivities. Since the Freifunk firmware is open source, it is not difficult to install and configure the devices. Also, since the IP cameras will be located indoors, there is very little need for a router as durable as the ones that Firetide supplies. Since cost is a big determinant, there is also very little justification for spending nearly $2000 per node. In the case of Zigbee, although it would be cost effective, the data rates are too low to maintain real-time video at the server.
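The data rate argument can be checked with simple arithmetic. The sketch below assumes each camera produces a 3 Mbps stream (the figure quoted earlier for IP cameras) and compares it with the maximum data rates quoted for each mesh option. It ignores protocol overhead and multi-hop losses, so real capacity would be lower.

```python
# Assumed per-camera stream bitrate in kbps (the 3 Mbps figure quoted for IP cameras).
STREAM_KBPS = 3000

# Maximum data rates quoted in this section, in kbps.
LINK_KBPS = {
    "Zigbee": 250,
    "Linksys with Freifunk": 11000,
    "Firetide": 54000,
}


def streams_supported(link_kbps, stream_kbps=STREAM_KBPS):
    """Whole number of full-rate camera streams that fit in one link (no overhead)."""
    return link_kbps // stream_kbps


capacity = {name: streams_supported(rate) for name, rate in LINK_KBPS.items()}
for name, count in capacity.items():
    print(f"{name}: {count} stream(s)")
```

Under these assumptions a Zigbee link cannot carry even one full-rate stream, while the Linksys option carries a few and Firetide carries far more than an indoor deployment is likely to need.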
Table 1: Comparison of wireless mesh network technology.
Zigbee ZMN2405 Module | 915 MHz or | 92 dBm at 250
Zigbee ZMN2405HP Module | 915 MHz or | 95 dBm at 250
Firetide mesh node | 900 MHz, 2.4 GHz, or 5 GHz | 95 dBm at 1
Linksys router with Freifunk | | 89 dBm at 1
With the Marauder’s Map Surveillance System it is very important to detect who is in the desired area: whether the individual is in the database or not and, if he or she is, the details we have on that individual. This is where face recognition comes in. When a video camera detects an individual in its field of view, the face recognition software will try to determine who that person is and, if he or she is in the database, will present a name on the GUI. If that person cannot be identified, he or she will be shown as “Unknown” on the GUI. Software for face recognition can be purchased online through big companies that specialize in this technology. Usually such software is used in government areas or private businesses. This, however, is very
There are three tasks that have to be dealt with when using face recognition: document control, access control, and database retrieval. Document control is the verification of a person by comparing his or her actual camera image with a document photo stored in the database. Access control determines whether the individual detected in the vicinity has permission to be in that area. Database retrieval is the retrieval of the documented information on the individual if he or she is in the database.
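As a rough illustration of the access control task, the sketch below decides how a detected person might be labelled once a name (or no name) comes back from recognition. The status strings, the `AUTHORIZED_AREAS` table and the names in it are hypothetical and only stand in for whatever the employee database would actually hold.

```python
# Hypothetical access table: person name -> set of areas they may enter.
AUTHORIZED_AREAS = {
    "Alice": {"lobby", "lab"},
    "Bob": {"lobby"},
}


def access_status(name, area, authorized_areas=AUTHORIZED_AREAS):
    """Classify a detection as unknown, authorized, or in an unauthorized area."""
    if name is None or name not in authorized_areas:
        return "unknown"  # shown as "Unknown" on the GUI
    if area in authorized_areas[name]:
        return "authorized"
    return "unauthorized_area"  # e.g. the feet are drawn in red


print(access_status("Bob", "lab"))
```

The GUI layer could then map each status to a display style, keeping the recognition code independent of how the result is drawn.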
There are many algorithms and software packages that are already able to do some kind of face recognition, e.g. Apple’s iPhoto and Google’s Picasa. Nevertheless, many of these packages are copyrighted and do not provide their source code. The goal was not to find open source software that is able to do everything with one click, but to find algorithms and programming libraries that can be changed to fit the requirements of the project and make the project more manageable. Two different algorithms were found that not only show different approaches to face recognition but also make the face recognition part of our project feasible.
The first algorithm is Appearance-Based Recognition and the second is Scale Invariant Feature Transform (SIFT). The two methods take different approaches to face recognition.
The concept of appearance-based recognition is to create a set of possible appearances of a certain object. The set of appearances consists of 2D images, under different illumination and angles, of a 3D object. When the camera takes the image of the object that is to be recognized, we check which set of appearances the image most probably belongs to. The set with the highest probability is chosen and the object is recognized.
The first step for this algorithm is to create the set of possible appearances for a given face. Different images are taken of the face at different illumination and angles with a blank background. The background must be blank since the background
Each image can be represented as an n x n matrix, where each image is assumed to be the same height and width. The matrix is then represented as an n^2 vector where each entry of the vector is a pixel value of the image. In other words, the image has become a point in the n^2-dimensional space.
This process is repeated for the other images in the set. As it happens, the points of the different images start to form a “cloud”. A weighted representation could be used so that outlying points do not have a great effect on the cloud. When the camera takes the image of the scene, the image of the face to be recognized will have to be extracted from the image of the scene. This can be done by using motion detection and face detection in OpenCV.
After the image of the face is extracted it is also represented as a point in the n^2-dimensional space. To find the set with the greatest probability, the cloud closest to the point representing the new face will be the match. This is done by checking the distance between the point and the cloud; the center of the cloud could be chosen to be the point that is used to check the distance.
Similarly, a threshold must be used so that if no distance is found to be lower than the threshold then the face is tagged as “Unknown”. The threshold will need to be decided later when more knowledge is gained on the subject. After the cloud with the closest distance below the threshold is found, the face is tagged with the appropriate person.
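The matching step can be sketched as a nearest-center search with a rejection threshold. This is a minimal illustration, assuming each cloud is summarized by its unweighted center and distances are Euclidean; the tiny 4-value "images", the names and the threshold of 3.0 are made up for the example.

```python
import math


def center(points):
    """Unweighted center of a cloud of equally long vectors."""
    count = len(points)
    return [sum(p[i] for p in points) / count for i in range(len(points[0]))]


def recognize(face, clouds, threshold):
    """Return the name of the closest cloud, or "Unknown" if none is near enough."""
    best_name, best_distance = None, float("inf")
    for name, points in clouds.items():
        distance = math.dist(face, center(points))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance < threshold else "Unknown"


# Hypothetical clouds: each "image" is flattened to a 4-value vector.
CLOUDS = {
    "Alice": [[0, 0, 0, 0], [2, 2, 2, 2]],
    "Bob": [[10, 10, 10, 10], [12, 12, 12, 12]],
}

print(recognize([1, 1, 1, 2], CLOUDS, threshold=3.0))
print(recognize([6, 6, 6, 6], CLOUDS, threshold=3.0))
```

The first probe lies one unit from Alice's center and is accepted; the second is ten units from both centers and is rejected as "Unknown".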
There are several obvious problems if the algorithm is used just as it has been described above. The first problem is that n^2 can be a very large number. Checking the distance of a point to a given cloud can be very time consuming for a computer
. This will result in the program
for a laptop to do. Another concern is that
a lot of memory
To deal with the first problem, the images of the face will need to be represented in eigenspace, thus obtaining an eigenface representation of the faces. To do this, the images are first normalized and then the mean face of the entire database is found. This can be done by adding each vector and then dividing by the number of faces in the database. Hence, a face can be represented by:

$$f = \bar{f} + \sum_{i=1}^{n} g_i e_i$$

where $f$ is the given face, $\bar{f}$ is the mean face, $g_i$ are the weights of the given face along each eigenface, $e_i$ are the eigenfaces and $n$ is the number of faces in the database.
By replacing n by a smaller integer, k, only the components associated with the largest eigenvalues will be kept. Consequently, this will reduce the problem to a lower dimension while keeping a reasonable representation of the faces.
Also, by reducing the dimension it will
the algorithm to
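The reduction can be written out as a short worked form. The symbols here are chosen for illustration: $f$ is a face vector, $\bar{f}$ the mean face, $f_j$ the faces in the database, $e_i$ the eigenfaces (assumed orthonormal and ordered by decreasing eigenvalue) and $g_i$ the weight of the face along $e_i$.

```latex
\begin{align}
  \bar{f} &= \frac{1}{n} \sum_{j=1}^{n} f_j
    && \text{(mean face of the database)} \\
  g_i &= e_i^{\top} \left( f - \bar{f} \right)
    && \text{(weight of } f \text{ along eigenface } e_i\text{)} \\
  f &\approx \bar{f} + \sum_{i=1}^{k} g_i e_i, \quad k < n
    && \text{(keep only the } k \text{ leading eigenfaces)}
\end{align}
```

Under these assumptions each face is stored as $k$ weights instead of $n^2$ pixel values, and distances between faces can be computed between weight vectors rather than full images.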
Another obstacle with this method is that faces can be at different distances from the camera. Each face must then be scaled to a specified size. A further obstacle is that the appearance of a person can change over time. A person may change their hair or grow older. Hence this method would not be useful in a general real-world application. Nevertheless, it can be useful for this project proposal, since the required environment will be somewhat static.
Scale Invariant Feature Transform (SIFT)
Another possible approach for face recognition is SIFT. The basic concept is to find certain features within the face that are invariant to scale and rotation and partially invariant to illumination. When a new image of the face comes in, these features are compared to a database. If a match occurs then the face has been successfully recognized. This algorithm follows four steps to find the features.
The first step is called scale-space extrema detection, which basically applies a difference of Gaussian filter to the image. To do this the image is first convolved with Gaussian filters at different scales. In other words, the image is low-pass filtered at different frequencies. Adjacent filtered images are then subtracted to obtain the scale-space representation of the image. The next step involves finding the key features within the image. The key points/features are maximum or minimum points across the scales. Each pixel is compared to its adjacent neighbors as well as the neighboring scales. Yet not every maximum or minimum point is chosen. Key points with low contrast are removed, as well as key points along edges. This is what makes SIFT scale invariant, since it considers the image at different scales. The next step is to assign an orientation to each key point. The orientation is obtained from image gradients computed around the key point. A gradient orientation histogram is then computed. The highest peak, as well as any other peak within 80% of it, is assigned to the key point as an orientation. The invariance to orientation is achieved because the properties of the key point are expressed relative to the chosen orientation. The last step is to generate a key point descriptor. This descriptor uses a set of 16 orientation histograms to generate a description of the key point.
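The orientation assignment step can be sketched in a few lines. This is a simplified illustration, not a full SIFT implementation: it assumes a 36-bin histogram (10 degrees per bin), uses central differences for the gradient, skips the Gaussian weighting, and keeps every bin within 80% of the highest one rather than only local peaks.

```python
import math


def dominant_orientations(patch, bins=36, peak_ratio=0.8):
    """Start angle (degrees) of every histogram bin within peak_ratio of the top bin."""
    width = 360.0 / bins
    histogram = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]  # horizontal central difference
            dy = patch[y + 1][x] - patch[y - 1][x]  # vertical central difference
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            histogram[int(angle // width) % bins] += magnitude
    top = max(histogram)
    if top == 0:
        return []  # flat patch: no gradient, no orientation
    return [i * width for i, value in enumerate(histogram) if value >= peak_ratio * top]


# Hypothetical 5x5 patch whose brightness increases from left to right.
RAMP = [[x for x in range(5)] for _ in range(5)]
print(dominant_orientations(RAMP))
```

For the left-to-right ramp every gradient points along the positive x direction, so the only orientation returned is the 0 degree bin.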
Face Recognition Decision
After reviewing what each algorithm does and what each is capable of, it makes the most sense to use SIFT. SIFT does not require a lot more data storage, and there is no need to capture a perfect screen shot of the person for recognition to be successful.
How the software will work
A simplified overview of the software can be seen in Figure 1.
Fig. 1: Overview of software structure.
Cameras will take photos of the scene which are then fed into the software. The first part of the software is the motion detection and object tracking part. To capture only motion within the scene, OpenCV will be used. There are tutorials showing how to use the library to attain this effect. Face detection (used to separate the face from the scene) can also be done with OpenCV, which uses an algorithm based on Haar-like features to detect a face. Haar-like features are adjacent rectangular regions that differ by illumination. For instance, the region of the eyes is darker than the cheeks, thus a set of two adjacent rectangles, one on the eyes and another one on the cheeks, counts as a Haar-like feature. After the moving face has been detected, the system asks whether the face has been tagged or not. This is done so that the software will not need to recognize the same face repeatedly. This is useful because recognition is not necessary for object tracking.
Later it will be decided whether the tag will expire after some time so that the system can minimize any error of losing track of the person. If the face is not tagged or its tag is expired, then the images of the faces are sent to the object recognition part of the software. This uses a database and one of the algorithms described above to match the image with the database. The multiple images of the face will be used to reduce error, since they should be matched to the same face. If the face has been tagged then it skips the object recognition and proceeds to construct a bird’s eye view of where the person is.
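The tag check in this flow can be sketched as follows. The structure of the tag store, the track identifiers and the 30 second lifetime are all hypothetical; the proposal leaves open whether tags expire at all and after how long.

```python
def needs_recognition(track_id, tags, now, ttl_seconds=30.0):
    """True if the tracked face has no tag yet or its tag is older than ttl_seconds."""
    tag = tags.get(track_id)
    return tag is None or now - tag["tagged_at"] > ttl_seconds


# Hypothetical tag store: tracker id -> recognized name and time of tagging (seconds).
TAGS = {7: {"name": "Alice", "tagged_at": 100.0}}

print(needs_recognition(7, TAGS, now=110.0))  # tag is still fresh
print(needs_recognition(7, TAGS, now=140.0))  # tag has expired
print(needs_recognition(8, TAGS, now=110.0))  # face was never tagged
```

Only detections for which this returns True would be passed on to the recognition stage; all others go straight to the bird's eye view construction.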
Data Storage/Central Server
A central processor is essential to developing the Marauder’s Map Surveillance System. The central server will consist of a PC running the Linux operating system. Linux Enterprise is a distribution that includes Apache web server software, which is well suited for building a server. Linux also supports web programming languages such as PHP, Perl, and Python. This allows the server to “talk” with the web applications. Once this has been set up, users can access data from the central server remotely, if the need arises. The central server will also store incoming data from the video cameras, and must maintain a database of known people working for a company. For this, the PC will run database management software like MySQL. The advantage of using MySQL is that it supports multiple databases and allows multiple users to access the different databases. An external hard drive can also be easily added to store GBs of data. This method of designing a central server is cost efficient and capable of handling the tasks necessary for our system.
Video Stream Construction
Once the object has been tracked and tagged, one coherent output on the screen needs to be displayed. However, each camera tracks the same object from a different perspective. To coordinate the multiple images onto a common coordinate system, the cameras will first need to be calibrated. Camera calibration involves estimating external and internal parameters. Knowing these parameters helps in determining camera geometry and finding a common coordinate plane to map all the images. External parameters refer to the location and orientation of the camera. Internal parameters are defined by focal length, image format, distortion, and principal point. In this system, cameras will remain in a fixed location and will not be rotating. Therefore, the cameras will only need to be calibrated once. Also, to estimate the camera parameters, there is free software available for use. There is a camera calibration toolbox for Matlab.
The next step in reconstructing the images is solving the correspondence problem: finding the corresponding points in the multiple images. Since each of the cameras will be tracking the same objects, there will be a common set of points, in the common coordinate frame, corresponding to all of the images. There exists a feature-based algorithm and a correlation-based algorithm to solve this correspondence problem. The feature-based algorithm requires matching features such as line segments and edge points within images, whereas the correlation-based algorithm matches images based on the intensity of images. Both algorithms are adequate for our purposes in reconstructing the 3D view.
Once the corresponding points are found, a 3D representation of the image can be constructed using a method known as camera triangulation. It is a means by which a point in 3D space is located given that corresponding points exist. The 3D point can be found because it will exist at the point where the rays from each camera’s focal point through its corresponding image point intersect. Using algorithms like the Direct Linear Transformation and the Mid-Point Method, the geometry can be solved to construct a 3D view of the scene. To design a 2D bird’s eye view of the room, however, one of the axes can be removed without losing any valuable data (the object being tracked).
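A minimal sketch of the mid-point method is given below. It assumes calibration has already turned each camera's observation into a ray (a camera position plus a direction toward the tracked point) in a shared coordinate frame, and that the z axis is the height axis dropped for the bird's eye view; the camera positions in the example are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def midpoint_triangulate(p1, d1, p2, d2):
    """Midpoint of the shortest segment joining rays p1 + s*d1 and p2 + t*d2."""
    w = [u - v for u, v in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denominator = a * c - b * b
    if abs(denominator) < 1e-12:
        return None  # rays are parallel, so no unique closest point exists
    s = (b * e - c * d) / denominator
    t = (a * e - b * d) / denominator
    q1 = [p + s * v for p, v in zip(p1, d1)]  # closest point on ray 1
    q2 = [p + t * v for p, v in zip(p2, d2)]  # closest point on ray 2
    return [(u + v) / 2 for u, v in zip(q1, q2)]


def birds_eye(point3d):
    """Drop the height (z) axis to obtain floor-plan coordinates."""
    return (point3d[0], point3d[1])


# Hypothetical setup: two ceiling cameras 3 m up, both seeing a point on the floor.
point = midpoint_triangulate([0, 0, 3], [2, 2, -3], [4, 0, 3], [-2, 2, -3])
print(point, birds_eye(point))
```

With noisy measurements the two rays rarely intersect exactly, which is why the midpoint of the closest approach is used instead of a true intersection.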
The graphical user interface for Marauder’s Map Surveillance makes the system u
compared to other available surveillance systems
users, namely security
viewing. The main design of the interface will allow the
user to get a bird’s eye view of the entire floor of a building.
Each person in the room will be
displayed on screen with a pair of footprints, and his/her name will be visible alongside the
footprints. In addition, if the user clicks on the footprints, an icon will pop up giving a detailed
description of the person of interest. Along with the person’s name and photo, some other
important data will be featured; this can include information such as the department the person
works in, the name of that person’s manager, followed by contact information.
However, if a person has not been identified, either because they are not in the system database or because the cameras were unable to recognize the person, the name written next to the footprints will be “Unknown”. An additional feature of this system will give the user the option to view the live video footage by double clicking the footprints. This would be helpful in identifying people who aren’t tagged with a name. For example, if this system were used to track college students in SERC, Figure 2 shows how they would appear on the dashboard.
Fig. 2: Dashboard for Marauder’s Map Surveillance System.