MaraudersMap - Winlab


Feb 16, 2014







Marauder’s Map Surveillance System

A Senior Design Project Proposal





By

Eric Wasserman

Hiran Patel

Nishit Raval

Jose Suriel

Sapan Sharma








Introduction


Security is an important concern throughout the world, and a large part of the technology industry is devoted to developing and installing security systems. People use surveillance systems to guard their homes, companies, government properties, and even vehicles such as Rutgers buses. When a surveillance system protects a building, it must constantly monitor and record activity inside and outside of that building. Two options currently exist in the market. The first uses fixed cameras that each monitor one particular location. The second uses cameras that rotate constantly through a 180-degree plane. Rotating cameras provide better coverage than stationary ones, but an individual who follows a camera's motion closely can deceive it. When such cameras are installed in a building, each one produces its own video stream, and each stream is typically displayed separately, requiring the user to observe N different videos, where N is the number of cameras in use.

Existing surveillance systems also do not indicate whether a person entering the building is authorized to be there. It is often very difficult to identify an individual if the person monitoring the cameras cannot see the face clearly.


The Marauder's Map Surveillance System will provide bird's-eye-view monitoring as well as more accurate security surveillance. The bird's-eye-view representation of the building will bring every video stream together on a single 2D plane and track the movements of all people within the building. Each individual will be represented by a moving or stationary set of feet with a name attached. To provide this identification, the system will incorporate face recognition software and an employee database. Each camera sends its data to a central server, where images are run through face recognition and motion detection software. Next to each set of feet will appear the person's name and other information about the individual drawn from the employee database. If the surveillance system detects someone who is not authorized to be in the building, a red set of feet labeled "Unknown" will represent that individual. Similarly, if a person is authorized to be in the building but not in a particular area, that person's feet will turn red on the monitor when he or she enters the unauthorized area. One final feature lets the system's user view the real-time video stream of a particular person or location in order to take a closer look. The system can thus be thought of as a real-time blueprint of a building that shows everyone inside and their individual movements.
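The color and labeling rules just described can be sketched as a small decision function. The record layout and zone names below are hypothetical placeholders, not part of the proposal:

```python
# Sketch of the footprint display rules described above.
# Person records and zone names are hypothetical placeholders.

def footprint(person, current_zone):
    """Return (label, color) for a tracked individual.

    person: dict with 'name' (None if unrecognized) and
            'authorized_zones' (set of zone names), or None.
    """
    if person is None or person.get("name") is None:
        return ("Unknown", "red")          # not in the employee database
    if current_zone not in person["authorized_zones"]:
        return (person["name"], "red")     # authorized for building, not this area
    return (person["name"], "green")       # normal, authorized individual

# Example usage:
alice = {"name": "Alice", "authorized_zones": {"lobby", "lab"}}
print(footprint(alice, "lobby"))    # ('Alice', 'green')
print(footprint(alice, "server"))   # ('Alice', 'red')
print(footprint(None, "lobby"))     # ('Unknown', 'red')
```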



Video Surveillance

To provide video surveillance, some form of video camera must be used. Two types of cameras are of interest to this system: digital IP video cameras, and analog video cameras paired with a video-to-IP converter. Analog video cameras still exist in some older surveillance systems, and with a video-to-IP converter a preexisting installation can be outfitted with the Marauder's Map Surveillance System. An IP camera is a type of digital video camera commonly used for surveillance that can send and receive data via Ethernet. There are two types of surveillance systems for IP cameras. The first is a centralized surveillance system, which requires a central server to hold and display the video streams from each camera and to handle alarm management. The second is a decentralized surveillance system, in which each camera stores, processes, and displays its own video streams.

There are many advantages to using digital IP cameras. They offer high image resolutions of 640x480 and HDTV image quality at 10 to 30 frames per second at 3 Mbps. They provide secure data transmission through encryption and authentication methods such as WEP, WPA, WPA2, TKIP, and AES. Live video feeds from selected cameras can be viewed from any computer connected to the Internet. IP cameras can also operate without an additional power supply because they are powered through the Ethernet connection. The cheapest camera has a 20-meter viewing range.
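As a rough illustration of how a server could pull frames out of an IP camera's stream, the sketch below splits an MJPEG byte stream into individual JPEG frames using the standard JPEG start/end-of-image markers. The synthetic byte string stands in for data a real client would read from the camera's HTTP or RTSP socket:

```python
# Minimal sketch: split an MJPEG byte stream from an IP camera into
# individual JPEG frames using the JPEG start/end-of-image markers.
# (A real client would read these bytes from an HTTP or RTSP socket.)

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def extract_frames(stream: bytes):
    frames = []
    pos = 0
    while True:
        start = stream.find(SOI, pos)
        if start == -1:
            break
        end = stream.find(EOI, start + 2)
        if end == -1:
            break
        frames.append(stream[start:end + 2])
        pos = end + 2
    return frames

# Example: two tiny fake "frames" separated by multipart padding.
fake = SOI + b"frame-one" + EOI + b"--boundary--" + SOI + b"frame-two" + EOI
print(len(extract_frames(fake)))  # 2
```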


Video Transmission


One of the issues with the Marauder's Map Surveillance System is how the video streams from the cameras will be sent to the central server. The only feasible option is wireless transmission, because laying wires between each camera and the server would be expensive and difficult. The next question is how the video cameras will wirelessly transmit their data to the central server. After careful deliberation, a wireless mesh network was chosen as the most energy- and cost-effective solution. A wireless mesh network can utilize WiFi standards, making each node easy to configure. The network is self-configuring: if additional nodes are added, the existing nodes automatically identify the new node and compute an updated route to the server. A wireless mesh network is also self-healing: if a node goes down for some reason, the other nodes will find an alternate path for routing the data.


The wireless mesh network will work as follows. Each IP digital video camera will be connected to its own node in the network. The nodes will communicate with each other in ad-hoc mode using the 2.4 GHz band and will create a shortest-path route to the central server using a routing protocol (either OLSR or B.A.T.M.A.N.). Each video stream will then hop from node to node until it reaches the central server. Each node will have its own unique IP address so that the central server knows which camera the data came from and where that camera is located. Since the cameras will be stationary, each IP address will have a corresponding physical location describing where in the building the camera sits. There are multiple ways to implement a wireless mesh network; the following sections describe three options, after which a final decision is made regarding the best solution for the surveillance system. Table 1 lists statistics comparing the three options.
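The shortest-path routing that OLSR or B.A.T.M.A.N. performs automatically can be illustrated with a small Dijkstra sketch over a hypothetical link table. The IP addresses, link costs, and location labels below are made up; real mesh protocols discover and update these links on their own:

```python
import heapq

# Hypothetical mesh: camera nodes identified by IP, with symmetric link
# costs. OLSR/B.A.T.M.A.N. discover these links dynamically; here they
# are fixed for illustration. The direct 10.0.0.3 <-> server link is
# given a high cost to stand in for a weak radio link.
links = {
    "10.0.0.1": {"10.0.0.2": 1, "10.0.0.3": 1},
    "10.0.0.2": {"10.0.0.1": 1, "server": 1},
    "10.0.0.3": {"10.0.0.1": 1, "server": 5},
    "server":   {"10.0.0.2": 1, "10.0.0.3": 5},
}
# Each camera IP maps to a fixed physical location, as described above.
location = {"10.0.0.1": "Lobby", "10.0.0.2": "Hallway", "10.0.0.3": "Lab"}

def shortest_path(src, dst):
    """Dijkstra over the link table; returns the node sequence src -> dst."""
    dist, prev, seen = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in links[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

# The Lab camera routes through two intermediate nodes rather than its
# expensive direct link:
print(shortest_path("10.0.0.3", "server"))
# ['10.0.0.3', '10.0.0.1', '10.0.0.2', 'server']
```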


Zigbee


The first option for a mesh network is to build mesh nodes out of Zigbee modules. Zigbee is a wireless protocol based on the IEEE 802.15.4-2003 standard. Zigbee was first designed for low-rate personal area networks, but recent modules have increased its range. If Zigbee is used, nodes would have to be individually built to hold the Zigbee modules. There are currently two Zigbee modules with different specifications: a low-power module, which consumes very little power (about 1 mW) and is relatively cheap, and a high-power module, which consumes more power (100 mW) and costs a little more. Both modules can only achieve a maximum data rate of 250 Kbps at 2.4 GHz, which would cause considerable delay in the playout of the video at the server. The main problem with using Zigbee is that the surveillance system cannot take advantage of one of Zigbee's primary features: its low-power setting. When a Zigbee module is not in use, it enters a low-power mode that saves considerable energy, especially when the device runs on a battery. With video, however, the wireless mesh node must transmit a constant stream of data, so the Zigbee device would never be in power-save mode.


Firetide


The second option for a mesh network is to use a private company's predesigned wireless mesh nodes. The company Firetide produces high-end wireless mesh equipment, and all of its products come with software that lets the user configure the nodes from a laptop or desktop computer. One advantage of Firetide's mesh nodes is that they can transmit between nodes on the 5 GHz frequency band while communicating with client devices on the 2.4 GHz band, which reduces interference between the client devices and the mesh nodes. For the Marauder's Map Surveillance System, the IP cameras would connect via Ethernet to the mesh nodes, and the nodes would transmit between each other in the 5 GHz band. The mesh nodes consume quite a bit of power at 400 mW, but they have a significantly higher maximum data rate of 54 Mbps at 5 GHz. Since Firetide's mesh nodes are built for commercial use, they have many additional functions that the Marauder's Map Surveillance System would not need: they are much more durable, can withstand harsh temperatures, and offer data rates that may be unnecessarily high for the surveillance system.



Freifunk


The third option for a mesh network is to use an ordinary Linksys WRT54G router and install the open-source Freifunk firmware. Freifunk is an initiative, started in Germany, to provide free community wireless networks. With the firmware installed, the Linksys router is able to act as a node within a mesh network; the firmware uses the OLSR routing protocol to find the shortest path to the central server. The Linksys router consumes very little power at only 42 mW, but it has a significantly worse operating range than Firetide's mesh nodes. The router does have a maximum data rate of 11 Mbps, which is significantly better than the Zigbee modules but not as high as Firetide's devices.


Wireless Mesh Network Decision


Table 1 compares the specifications of the four mesh network choices (counting the Zigbee low- and high-power modules separately). The best decision for the Marauder's Map Surveillance System is the Linksys router with Freifunk firmware. The router combines low cost with reasonably high data rates and receiver sensitivities, and since the Freifunk firmware is open source, the devices are not difficult to install and configure. Because the IP cameras will be located indoors, there is little need for routers as durable as the ones Firetide supplies; and since cost is a major determinant, there is little justification for spending nearly $2000 per node. Zigbee, although cost effective, has data rates too low to maintain real-time video at the server.

Table 1: Comparison of wireless mesh network technology.

| Device                                 | Frequency Band             | Output Power (mW) | Cost   | Maximum Data Rate | Receiver Sensitivity |
|----------------------------------------|----------------------------|-------------------|--------|-------------------|----------------------|
| Zigbee ZMN2405 Module                  | 915 MHz or 2.4 GHz         | 1                 | $22.00 | 250 Kbps          | -92 dBm at 250 Kbps  |
| Zigbee ZMN2405HP Module                | 915 MHz or 2.4 GHz         | 100               | $37.50 | 250 Kbps          | -95 dBm at 250 Kbps  |
| Firetide                               | 900 MHz, 2.4 GHz, or 5 GHz | 400               | $1795  | 54 Mbps           | -95 dBm at 1 Mbps    |
| Linksys router with Freifunk firmware  | 2.4 GHz                    | 42                | $50    | 11 Mbps           | -89 dBm at 1 Mbps    |
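A quick back-of-the-envelope check, using the 3 Mbps camera rate and the link rates quoted above, shows why Zigbee is ruled out and roughly how few camera streams a single Freifunk link can carry. The 50% real-world efficiency factor is an assumption, not a figure from the comparison:

```python
# Back-of-the-envelope capacity check using the rates quoted above.
# The 50% efficiency factor for real-world WiFi throughput is assumed.
camera_rate_mbps = 3.0          # one IP camera stream
link_rate_mbps = 11.0           # Linksys/Freifunk maximum data rate
effective = 0.5 * link_rate_mbps            # rough usable throughput
cameras_per_link = int(effective // camera_rate_mbps)
print(cameras_per_link)                     # 1

# Zigbee, by contrast, cannot carry even one stream:
zigbee_rate_mbps = 0.25                     # 250 Kbps
print(zigbee_rate_mbps >= camera_rate_mbps) # False
```

This suggests each routed path should carry only one or two streams at a time, which is consistent with choosing the 11 Mbps router over the 250 Kbps Zigbee modules.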

Face Recognition

With the Marauder's Map Surveillance System, it is very important to determine who is in the monitored area: whether the individual is in the database, and if so, what details the system has on that individual. This is where face recognition comes in. When the video cameras detect an individual within their coverage, the face recognition software will try to determine who that person is and will present a name on the GUI if he or she is in the database. If the person cannot be identified, he or she will be shown as "Unknown" on the GUI. Face recognition software can be purchased from large companies that specialize in this technology, usually for government facilities or private businesses, but it is very expensive.

There are three tasks to deal with when using face recognition: document control, access control, and database retrieval [1]. Document control is the verification of a person by comparing his or her live camera image with a document photo stored in the database. Access control determines whether the detected individual has permission to be in that area. Database retrieval returns the documented information on the individual if he or she is in the database. Many algorithms and software packages can already perform some kind of face recognition, e.g., Apple's iPhoto and Google's Picasa. Nevertheless, much of this software is copyrighted and does not provide its source code. The goal was not to find open-source software that can do everything in one click, but to find algorithms and programming libraries that can be adapted to the requirements of the project. Two algorithms were found that not only illustrate different approaches to face recognition but also make the face recognition part of the project feasible: the first is Appearance-Based Recognition and the second is the Scale Invariant Feature Transform (SIFT). The two methods take quite different approaches to face recognition.


Appearance
-
Based Recognition



The basic concept of Appearance-Based Recognition is to create a set of possible appearances of a certain object. The set of appearances consists of 2-D images of a 3-D object under different illuminations and from different angles. When the camera captures an image of the object to be recognized, the algorithm checks which set of appearances the image most probably belongs to; the set with the highest probability is chosen and the object is recognized.


The first step of the algorithm is to create the set of possible appearances for a given face. Different images of the face are taken under different illuminations and from different angles against a blank background; the background must be blank because it would otherwise affect the algorithm. Each image, assumed to have the same height and width, can be represented as an n x n matrix. The matrix is then flattened into an n^2 vector whose entries are the pixel values of the image; in other words, the image becomes a point in n^2-dimensional space. This process is repeated for the other images in the set. The different point representations of the images start to form a "cloud," and a weighted representation could be used so that outlier points do not have a great effect on the cloud. When the camera captures an image of the scene, the face to be recognized must first be extracted from the scene, which can be done using motion detection and face detection in OpenCV. The extracted face is likewise represented as a point in n^2-dimensional space. To find the set with the greatest probability, the cloud closest to the point representing the new face is taken as the match. This is done by checking the distance between the point and the different clouds; the center of each cloud could be chosen as the point used to measure distance. A threshold must also be used so that if no distance falls below the threshold, the face is tagged as unknown. (The threshold will be decided later, when more knowledge has been gained on the subject.) Once the closest cloud below the threshold is found, the face is tagged with the appropriate person's name.


There are several obvious problems if the algorithm is used exactly as described above. The first is that n^2 can be a very large number, so checking the distance of a point from a given cloud can be very time consuming; the program would run very slowly and become almost impractical on a laptop. Another concern is that storing all of the images will require a great deal of memory. To deal with the first problem, the images of the faces are represented in eigenspace, yielding an eigenface representation of the faces. To do this, the images are first normalized and then the mean face of the entire database is found, which can be done by summing the face vectors and dividing by the number of faces in the database. A face can then be represented by:

x_j = x̄ + Σ_{i=1..n} g_ji e_i

where x_j is the given face, x̄ is the mean face, the g_ji are the eigenvalue coefficients for the given face, the e_i are the eigenfaces, and n is the number of faces in the database. By replacing n with a smaller integer k, only the components associated with the largest eigenvalues are kept. Consequently, this reduces the problem to a lower dimension while keeping a reasonable representation of the faces, and the reduced dimension allows the algorithm to be run on a laptop.


Another obstacle with this method is that faces can be at different distances from the camera, so each face must be scaled to a specified size. A further obstacle is that a person's appearance can change over time: people may change their hair or simply look older. Hence this method would not be useful in a general real-world application. Nevertheless, it can be useful in the environment required by this project proposal, since people's appearances will be relatively static.
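The whole pipeline above (mean face, eigenfaces, projection, nearest cloud with a threshold) can be sketched with a toy numerical example. The image size, the synthetic "faces," and the threshold value are all illustrative placeholders:

```python
import numpy as np

# Toy eigenface sketch of the appearance-based method described above.
n_pix = 16                                   # stands in for the n^2 pixels

base = {"Alice": np.linspace(0.0, 1.0, n_pix),
        "Bob":   np.linspace(1.0, 0.0, n_pix)}
shifts = np.array([-0.03, 0.0, 0.03])        # three "illuminations" each

# Database: three slightly shifted images per person, flattened to vectors.
db = {name: b + shifts[:, None] for name, b in base.items()}
all_faces = np.vstack(list(db.values()))     # 6 x n_pix
mean_face = all_faces.mean(axis=0)

# Eigenfaces: top-k principal directions of the centered face matrix.
k = 2
_, _, vt = np.linalg.svd(all_faces - mean_face, full_matrices=False)
eigenfaces = vt[:k]

def project(x):
    return eigenfaces @ (x - mean_face)      # k-dimensional coordinates

# Summarize each person's "cloud" by its center in eigenspace.
centers = {name: project(imgs.mean(axis=0)) for name, imgs in db.items()}

def recognize(x, threshold=0.6):
    p = project(x)
    name, dist = min(((n, np.linalg.norm(p - c)) for n, c in centers.items()),
                     key=lambda t: t[1])
    return name if dist < threshold else "Unknown"

print(recognize(base["Alice"] + 0.01))       # Alice
print(recognize(np.full(n_pix, 5.0)))        # Unknown
```

Reducing to k = 2 dimensions here plays the role of keeping only the components with the largest eigenvalues, exactly as described above.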


Scale Invariant Feature Transform (SIFT)



Another possible approach to face recognition is SIFT. The basic concept is to find certain features within the face that are invariant to scale and rotation and partially invariant to illumination. When a new image of the face arrives, these features are extracted and compared to a database; if a match occurs, the face has been successfully recognized. The algorithm follows four steps to find the features.


The first step is called scale-space extrema detection, which applies a difference-of-Gaussian filter to the image. The image is first convolved with Gaussian filters at different scales; in other words, the image is blurred by different amounts. Adjacent blurred images are then subtracted to obtain the scale-space representation of the image. The next step involves finding the key features within the image. The key points are the maximum or minimum points across the scales: each pixel is compared to its adjacent neighbors as well as to the neighboring scales. Not every extremum is chosen, however; key points with low contrast are removed, as are key points lying along edges. This is what makes SIFT scale invariant, since it considers the image at different scales. The next step is to assign an orientation to each key point. The orientation is obtained by computing the image gradient around the key point and building a gradient-orientation histogram; the highest peak, as well as any peak within 80% of it, is assigned as the key point's orientation. Invariance to orientation is achieved because the properties of the key point are defined relative to the chosen orientation. The last step is to generate a key point descriptor, which uses a set of 16 orientation histograms to build a description of the key point.
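The orientation-assignment step can be sketched on a synthetic patch. This is a simplification of real SIFT, which also weights by a Gaussian window, requires peaks to be local maxima, and interpolates their positions:

```python
import math

# Sketch of SIFT's orientation-assignment step: build a magnitude-weighted
# gradient-orientation histogram and keep the highest peak plus any peak
# within 80% of it. The patch values here are synthetic.

def orientation_peaks(patch, bins=36):
    """patch: 2D list of intensities. Returns dominant orientations (deg)."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            ang = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(ang // (360 // bins)) % bins] += mag
    top = max(hist)
    return [i * (360 // bins) for i, b in enumerate(hist) if b >= 0.8 * top]

# A left-to-right intensity ramp: every gradient points along 0 degrees.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
print(orientation_peaks(ramp))  # [0]
```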


Face Recognition Decision

Given what each algorithm does and is capable of, it makes the most sense to use SIFT. SIFT does not require much additional data storage, and it avoids the problem of having to capture a perfect screenshot of the person in order to succeed.








How the software will work

A simplified overview of the software can be seen in Figure 1.


Fig. 1:
Overview of Software structure.

Cameras will take photos of the scene, which are then fed into the software. The first part of the software handles motion detection and object tracking. OpenCV will be used to capture only the motion within the scene; many tutorials show how to use the library to achieve this effect. Face detection (used to separate the face from the scene) can also be done with OpenCV, which uses an algorithm based on Haar-like features to detect a face. Haar-like features are adjacent rectangular regions that differ in illumination: for instance, the eye region is darker than the cheeks, so a pair of adjacent rectangles, one on the eyes and one on the cheeks, counts as a Haar-like feature. After a moving face has been detected, the system checks whether the face has already been tagged. This is done so that the software does not continually try to recognize the same person; recognition is not necessary for object tracking. It will be decided later whether a tag should expire after some period of time so that the system can minimize the error of losing track of a person. If the person is not tagged, or the tag has expired, the images of the face are sent to the object recognition part of the software, which uses the database and one of the algorithms described earlier to match the image against the database. Multiple images of the face will be used to reduce error, since they should all match the same face. If the face is already tagged, the software skips object recognition and proceeds to construct the bird's eye view of where the person is located.
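The tag bookkeeping described above can be sketched as a small cache with optional expiry. The 30-second lifetime and the track-ID scheme are assumed placeholders, since the proposal leaves the expiry period undecided:

```python
import time

# Sketch of the tag bookkeeping described above: once a face is recognized
# and tagged, recognition is skipped until the tag expires. The 30-second
# lifetime is an assumed placeholder, not a value from the proposal.

class TagCache:
    def __init__(self, lifetime_s=30.0, clock=time.monotonic):
        self.lifetime = lifetime_s
        self.clock = clock            # injectable for testing
        self.tags = {}                # track_id -> (name, tagged_at)

    def tag(self, track_id, name):
        self.tags[track_id] = (name, self.clock())

    def lookup(self, track_id):
        """Return the name if the tag is still fresh, else None."""
        entry = self.tags.get(track_id)
        if entry is None:
            return None
        name, t0 = entry
        if self.clock() - t0 > self.lifetime:
            del self.tags[track_id]   # expired: force re-recognition
            return None
        return name

# Usage with a fake clock so expiry is deterministic:
now = [0.0]
cache = TagCache(lifetime_s=30.0, clock=lambda: now[0])
cache.tag("track-7", "Alice")
print(cache.lookup("track-7"))   # Alice -> skip recognition
now[0] = 31.0
print(cache.lookup("track-7"))   # None  -> run recognition again
```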


Data Storage/Central Server


A central processor is essential to the Marauder's Map Surveillance System. The central server will consist of a PC running the Linux operating system. Linux Enterprise is a distribution that includes the Apache web server software, which is optimal for building a server. Linux also supports web programming languages such as PHP, Perl, and Python, which allows the server to "talk" with web applications. Once this has been set up, users can access data from the central server remotely if the need arises. The central server will also store incoming data from the video cameras and must maintain a database of the people known to work for the company. For this, the PC will run database management software such as MySQL. The advantage of MySQL is that it supports multiple databases and allows multiple users to access them. An external hard drive can easily be added to store gigabytes of data. This method of designing a central server is cost efficient and capable of handling the tasks our system requires.
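A minimal sketch of the employee database the server would keep is below. MySQL is the proposal's choice; sqlite3 (from the Python standard library) stands in here so the sketch is self-contained, and the column names are illustrative:

```python
import sqlite3

# Sketch of the employee database the central server would maintain.
# sqlite3 stands in for MySQL; the schema and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT,
        manager TEXT,
        authorized_zones TEXT      -- e.g. comma-separated zone names
    )""")
conn.execute(
    "INSERT INTO employees (name, department, manager, authorized_zones) "
    "VALUES (?, ?, ?, ?)",
    ("Alice", "Engineering", "Bob", "lobby,lab"))
conn.commit()

# The GUI's click-on-footprints popup would run a lookup like this:
row = conn.execute(
    "SELECT name, department, manager FROM employees WHERE name = ?",
    ("Alice",)).fetchone()
print(row)  # ('Alice', 'Engineering', 'Bob')
```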


Video Stream Construction


Once the object has been tracked and tagged, one coherent output will need to be displayed on the screen. However, each camera tracks the same object from a different perspective. To map the multiple images onto a common coordinate system, the cameras will first need to be calibrated. Camera calibration involves estimating external and internal parameters; knowing these parameters helps determine the camera geometry and find a common coordinate plane onto which all the images can be mapped. External parameters refer to the location and orientation of the camera, while internal parameters are defined by the focal length, image format, distortion, and principal point. In this system the cameras will remain in fixed locations and will not rotate, so they will only need to be calibrated once. The camera parameters can be determined with freely available software; for example, there is a camera calibration toolbox available for Matlab.


The next step in reconstructing the images is called stereo reconstruction, which involves finding the corresponding points in the multiple images. Since each camera will be tracking the same objects, there will be a common set of points, in the common coordinate frame, corresponding to all of the camera images. There are feature-based and correlation-based algorithms for solving this correspondence problem: the feature-based algorithm matches features such as line segments and edge points within the images, whereas the correlation-based algorithm matches images based on their intensities. Both algorithms are adequate for our purpose of reconstructing the 3D view.


Once the corresponding points are found, a 3D representation of the scene can be reconstructed using a method known as camera triangulation, by which a point in 3D space is located given corresponding points in the images. The common point can be found because it lies where the cameras' viewing rays intersect. Using known algorithms such as the Direct Linear Transformation and the Mid-Point Method, the geometry problems can be solved and a 3D view of the room or building constructed. To produce the 2D bird's eye view of the room, one of the axes can then be removed without losing any valuable data (the object being tracked).
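The Mid-Point Method mentioned above can be sketched directly: given two calibrated cameras' centers and their viewing rays toward the tracked person, find the point midway between the rays' closest points, then drop the height axis for the bird's eye view. The camera positions and target below are made up for illustration:

```python
import numpy as np

# Mid-point triangulation sketch: recover a 3D point from two viewing
# rays, then project to the 2D bird's eye view by dropping the height.
# Camera centers, ray directions, and the target are illustrative.

def midpoint_triangulate(c1, d1, c2, d2):
    """Closest point between rays c1 + t*d1 and c2 + s*d2 (midpoint)."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    # Solve for t, s minimizing the distance between the two rays.
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w = c1 - c2
    denom = a * c - b * b                    # nonzero for non-parallel rays
    t = (b * (d2 @ w) - c * (d1 @ w)) / denom
    s = (a * (d2 @ w) - b * (d1 @ w)) / denom
    p1, p2 = c1 + t * d1, c2 + s * d2
    return (p1 + p2) / 2.0

target = np.array([2.0, 3.0, 1.5])           # true 3D position of the person
cam1, cam2 = np.array([0.0, 0.0, 3.0]), np.array([5.0, 0.0, 3.0])
p = midpoint_triangulate(cam1, target - cam1, cam2, target - cam2)
birds_eye = p[:2]                            # drop the height axis
print(np.round(birds_eye, 3))                # [2. 3.]
```

With noisy real correspondences the two rays do not meet exactly, and the midpoint between their closest points is taken as the estimate; that is the case the method is designed for.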



GUI/Dashboard


The graphical user interface makes the Marauder's Map Surveillance System unique compared to other available surveillance systems and gives its users, namely security personnel, multiple viewing options. The main view of the interface will give the user a bird's eye view of an entire floor of a building. Each person in the room will be displayed on screen as a pair of footprints, with his or her name visible alongside. In addition, if the user clicks on the footprints, an icon will pop up giving a detailed description of the person of interest: along with the person's name and photo, other important data will be featured, such as the department the person works in, the name of that person's manager, and contact information.

However, if a person has not been identified, either because he or she is not in the system database or because the cameras were unable to recognize the person, the name next to the footprints will be labeled "Unknown." As an additional feature, the system will give the user the option to view live video footage by double-clicking the footprints, which is helpful for identifying people who are not tagged with a name. For example, if this system were used to track college students in SERC, Figure 2 gives an example of how they would appear on screen and what the dashboard will look like.




Fig. 2: Dashboard for Marauder’s Map Surveillance System.















References

Wikipedia, Camera Resectioning, August 2010. http://en.wikipedia.org/wiki/Camera_resectioning

Jean-Yves Bouguet. Camera Calibration Toolbox for Matlab. Updated July 2010. http://www.vision.caltech.edu/bouguetj/calib_doc/

http://www.ehow.com/how_2266701_build-linux-based-web-server.html

Wikipedia, Correspondence Problem. Last modified October 2010. http://en.wikipedia.org/wiki/Correspondence_problem

http://www.cse.unr.edu/~bebis/CS791E/Notes/StereoCorrespondenceProblem.pdf

Wikipedia, Triangulation (Computer Vision). Last modified March 2010. http://en.wikipedia.org/wiki/Triangulation_(computer_vision)

Emanuele Trucco & Alessandro Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall, Chapter 10.

David G. Lowe. Object Recognition from Local Scale-Invariant Features. University of British Columbia.

F. Estrada, A. Jepson & D. Fleet. Local Features Tutorial. Nov. 8, 2004.

Wikipedia, Haar-like Features. http://en.wikipedia.org/wiki/Haar-like_features