Face and Speech Identification System (FASIS)


17 Nov 2013


George Liao, Andrew Au, Ching-Hsin Chen

Overview


Project Overview


Ztitch Solutions Team


Motivation


Design Solution


Design Alternatives


Software Design


Hardware Design


Finance


Schedule


Future Work


What we learned


Conclusion


Acknowledgements / Questions


Demo overview


Ztitch Solutions Team

Andrew Au (Team Leader):

5th-year computer engineering student

16 months of development experience at Nokia & Sierra Wireless

4 months as an NSERC research assistant for Dr. Jie Liang

Freelance mobile developer; published the “Ztitch” app for Windows Phone 7

George Liao:

5th-year electronics engineering student

Experience in MATLAB image processing

Software debug and test

Audio processing

Ching-Hsin (Danny) Chen:

5th-year electronics engineering student

12 months of research experience at Broadcom

Hardware designer

QA and debugging

Motivation

Number of smart phones worldwide: ~200M [Park Associates, April 2010]

Mobile internet usage will exceed fixed-line internet usage by 2014 [Morgan Stanley]

Steady growth in demand for mobile applications; market value estimated at ~$14.5B USD by 2012 [CNET]



Motivation

Despite high smart phone demand, there hasn’t been much innovation in the area of mobile log-in and security

The username/password scheme is difficult to use on a phone

Example: AndrewAu1986@hotmail.com / enter123

Any process or method that allows the user to execute a task faster is highly desirable

Example: PayPal: fast payment system

Google: efficient search engine

SMS: fast messaging protocol

It’s all about speed and efficiency

Motivation (cont’d)

Our goal:

Implement a new method of secure mobile log-in

Eliminate the need for tedious typing on tiny touch screens or keypads

Secure, fast, and efficient


Design Solution

Face recognition

Ease of access

It’s quick to snap a photo

But we need a secondary solution to make it more secure...

Voice recognition

Providing a spoken phrase is also quick


Design Solution (cont’d)

We combine face and voice recognition as follows

Note: our original goal was to grant access to the server using a face photo alone, but we still had concerns about the security of that approach. Instead, these are the steps we use:

1) The user snaps a picture of their face using the phone, which relays the image to the server via a cellular internet connection

2) The server recognizes the face and requests the voice password

3) The user speaks a specific keyword to the phone as a password, and the phone relays the speech data to the server via VoIP
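The three steps above can be sketched as a small server-side routine. This is only an illustration under stated assumptions, not the FASIS implementation: `recognize_face`, `transcribe_speech`, `get_speech`, and the `KEYWORDS` table are hypothetical stand-ins for the face recognition stage, the voice recognition stage, the request to the phone, and the enrolment database.

```python
from typing import Callable, Optional

# Hypothetical enrolment table: user id -> spoken keyword (illustrative only).
KEYWORDS = {"andrew": "open sesame"}


def login(
    face_image: bytes,
    get_speech: Callable[[], bytes],
    recognize_face: Callable[[bytes], Optional[str]],
    transcribe_speech: Callable[[bytes], str],
) -> bool:
    """Two-step log-in: the face identifies the user, the keyword verifies them."""
    # Steps 1-2: the server tries to recognize the face sent by the phone.
    user = recognize_face(face_image)
    if user is None or user not in KEYWORDS:
        return False  # unknown face: the voice password is never requested

    # Step 3: the server requests the spoken keyword and checks it.
    spoken = transcribe_speech(get_speech())
    return spoken.strip().lower() == KEYWORDS[user]
```

Passing the recognizers in as functions keeps the flow independent of how each stage is implemented, so the same routine works with stubs during testing.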


Design Solution (cont’d)

Processing will be done remotely on the server as an online service

Reasons:

More secure than client processing

Independent of the phone’s processing power

Easier to apply software upgrades (updating a server vs. updating thousands of users’ phones)

[Diagram labels: 1; 2 (send voice and face image); 3 (grant access)]

Design Solution (cont’d)

That was a simplified model

There are many other details to be considered, e.g.:

Image compression

Key encryption / decryption

Reducing ambient noise during voice recognition

Face localization

Handling multiple failed attempts

Image and voice data

For the proof-of-concept, we don’t have time to do all of these, only some of them plus the basic model

Design Solution (cont’d)

In our model, the face is the identifier, replacing the username

The spoken phrase replaces the password “enter123”

Design Alternatives

Besides face and voice recognition, the other alternatives are:

1) Conventional typed username/password

Slow and tedious, as mentioned before

2) Fingerprint

Requires hardware modification to existing phones

Our system requires only software, but the demo prototype has a hardware modification for the purpose of 3rd-party control of the phone during the demo

3) Eye-Iris

Complex and requires a special camera


Design Alternatives (cont’d)

Besides server-side processing, the alternative is client-side processing

Client-side processing executes the face and voice recognition on the phone, rather than on the server

Main disadvantage:

Identifier and password are stored on the phone and are therefore vulnerable to mobile theft


Software Design





The software is divided into three parts:


1) Face Localization


2) Face Recognition


3) Voice Recognition

Software Design #1 Face Localization

The first step of the software is face localization: finding where the face is in the image

Where is my face in this image?

The computer does not know!


Software Design #1 Face Localization

There are a few different methods of face localization, but some of them require additional equipment such as two cameras (stereo). There are many research papers in this area.

Our method is simple and fast, and can be done in real time.

First, we define the range of colors that counts as human skin color




Software Design #1 Face Localization

Second, we filter the image, keeping only the pixels that match our definition of skin color

Filter
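The skin-color filter can be sketched in a few lines. The slides do not give the actual color range, so the RGB thresholds in `is_skin` below are an assumed rule of thumb for illustration, and both function names are hypothetical.

```python
def is_skin(r, g, b):
    """Rule-of-thumb RGB skin test; the thresholds are illustrative assumptions."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_mask(image):
    """Return a binary mask (1 = skin-colored pixel) for a 2-D grid of RGB tuples."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]


# Tiny 2x3 test image: dark gray, skin tone, blue / skin tone, skin tone, near white.
image = [
    [(30, 30, 30), (220, 170, 140), (10, 90, 200)],
    [(200, 150, 120), (210, 160, 130), (240, 240, 240)],
]
mask = skin_mask(image)
```

The result is a binary image in which only skin-colored pixels survive, which is the input to the noise removal step.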

Software Design #1 Face Localization

Third, we remove the noise from the filtered image.

Noise removal

Software Design #1 Face Localization

Unfortunately, removing the noise also removes some data

So fourth, we expand the remaining regions with dilation

Expand
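A minimal sketch of noise removal followed by expansion on a binary mask. The slides name dilation but not the noise removal method; this example assumes it is a 3x3 morphological erosion, so the pair together forms a morphological opening.

```python
def erode(mask):
    """3x3 erosion: keep a pixel only if it and all 8 neighbors are set.

    Pixels on the image border are always cleared.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def dilate(mask):
    """3x3 dilation: set a pixel if it or any of its 8 neighbors is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                        out[y][x] = 1
    return out


# A 3x3 "face" block plus one isolated noise pixel at row 2, column 5.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
eroded = erode(mask)      # noise pixel gone, but the block has shrunk to one pixel
cleaned = dilate(eroded)  # dilation grows the block back to its original size
```

In this example the erosion removes the stray pixel but also shrinks the block, and the dilation restores the block without bringing the noise back.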

Software Design #1 Face Localization

Finally, we have a face “blob”, and we can determine the center of this blob in x-y coordinates by stacking

Stack up the pixels along the two axes
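Pixel stacking can be sketched as summing the mask along each axis. The slides do not say how the center is read off the two stacks; this example assumes the weighted mean position of each stack (the peak would be another option), and `blob_center` is a hypothetical name.

```python
def blob_center(mask):
    """Locate a blob by stacking its pixels along each axis.

    Returns (x, y) as the weighted mean position of the column and row sums,
    or None if the mask is empty.
    """
    col_sums = [sum(col) for col in zip(*mask)]  # pixels stacked onto the x axis
    row_sums = [sum(row) for row in mask]        # pixels stacked onto the y axis
    total = sum(row_sums)
    if total == 0:
        return None
    x = sum(i * s for i, s in enumerate(col_sums)) / total
    y = sum(i * s for i, s in enumerate(row_sums)) / total
    return x, y


# A 3x3 blob occupying rows 1-3 and columns 2-4.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
center = blob_center(mask)
```

Because only two one-dimensional sums are needed, this step is cheap enough to run on every frame.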

Software Design #1 Face Localization



Now the problem of face localization is solved, and the
computer knows where my face is









However, this is a simple case only...

Crop

Software Design #1 Face Localization



What if there are multiple faces in the image?











We can use the same steps as before except replace pixel
stacking with Hough circle detection

Software Design #1 Face Localization



Same as before:

Software Design #1 Face Localization

An algorithm called the Circular Hough Transform is used

Detects the edge points that lie along the outline of a circle

We can generalize this method to detect arbitrary shapes

Slower than the previous method, but covers more scenarios
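A minimal sketch of the voting idea behind the Circular Hough Transform. It assumes the circle radius is known in advance, which is a simplification not stated in the slides; the function names and the test data are illustrative.

```python
import math
from collections import Counter


def hough_circle_centers(edge_points, radius, steps=72):
    """Vote for circle centers: each edge point votes for every cell at `radius` from it."""
    votes = Counter()
    for x, y in edge_points:
        cells = set()
        for k in range(steps):
            theta = 2 * math.pi * k / steps
            cx = round(x - radius * math.cos(theta))
            cy = round(y - radius * math.sin(theta))
            cells.add((cx, cy))  # at most one vote per cell per edge point
        votes.update(cells)
    return votes


def circle_points(cx, cy, radius, n=36):
    """Sample n points on a circle outline, rounded to the pixel grid (test data)."""
    points = set()
    for k in range(n):
        theta = 2 * math.pi * k / n
        points.add((round(cx + radius * math.cos(theta)),
                    round(cy + radius * math.sin(theta))))
    return points


# Two circular outlines of radius 8, standing in for two face blobs.
edges = circle_points(20, 15, 8) | circle_points(45, 30, 8)
votes = hough_circle_centers(edges, radius=8)
centers = {cell for cell, _ in votes.most_common(2)}
```

Every edge point of a circle votes for its true center, so the centers show up as the cells with the most votes. The extra loop over angles for each edge point is why this is slower than pixel stacking.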

Software Design #2 Face Recognition

We chose a method of face recognition called Eigenface

Easy to implement and fits our tight development schedule

Can be upgraded to Eigenfeatures for higher accuracy (as part of our future work)

Software Design #2 Face Recognition



First, add a set of images of the user’s face to the database


Usually 5 or more images with slight variations in angle
and lighting conditions


We add our first image:

Compute mean face

Image #1

Mean face

Software Design #2 Face Recognition



Now, we add a second image to the database

Compute mean face

Image #1

Mean face

Image #2

Software Design #2 Face Recognition

Now, we add a third image

Compute mean face

Image #1

Mean face

Image #2

Image #3
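The mean face is the pixel-wise average of the training images. A minimal sketch, using tiny 2x2 grayscale grids with made-up pixel values in place of real face images:

```python
def mean_face(images):
    """Pixel-wise average of equally sized grayscale images (lists of rows)."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / n for x in range(w)] for y in range(h)]


# Illustrative 2x2 "images"; the values are invented for the example.
img1 = [[10, 20], [30, 40]]
img2 = [[30, 40], [50, 60]]
img3 = [[50, 60], [70, 80]]

mean_after_1 = mean_face([img1])              # equals image #1
mean_after_2 = mean_face([img1, img2])
mean_after_3 = mean_face([img1, img2, img3])
```

With a single image the mean face is that image, and each added image pulls the mean toward the average appearance of the training set.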

Software Design #2 Face Recognition

We can add a few more images until we finally have our database, a.k.a. the training set

Now, we execute face recognition as follows:

Compare the input image with the mean face, and compute two error values: the difference, and the difference from face space

If the error is above a certain threshold: recognition fails

If the error is below a certain threshold: recognition succeeds

Software Design #2 Face Recognition

Calculate two values: difference, and difference from face space

In this example

Difference = 4418.3

Difference from face space = 316.4081

normalize((input - mean) - projection)

Mean face (from database)

Input image
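The two values can be sketched for flattened images as follows. Several points are assumptions, since the slides do not define them: "normalize" is read as the Euclidean norm, "difference" is taken to be the distance between the input and the mean face, and the eigenfaces are assumed to be orthonormal vectors. The tiny 3-pixel vectors are made up for the example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def face_distances(image, mean, eigenfaces):
    """Return (difference, dffs) for a flattened image.

    `eigenfaces` must be orthonormal vectors spanning the face space.
    """
    phi = [p - m for p, m in zip(image, mean)]           # input - mean
    weights = [dot(phi, u) for u in eigenfaces]          # coordinates in face space
    projection = [
        sum(w * u[i] for w, u in zip(weights, eigenfaces)) for i in range(len(phi))
    ]
    residual = [p - q for p, q in zip(phi, projection)]  # (input - mean) - projection
    difference = math.sqrt(dot(phi, phi))                # assumed: distance to the mean face
    dffs = math.sqrt(dot(residual, residual))            # difference from face space
    return difference, dffs


# Made-up 3-pixel example with a 2-D face space.
mean = [1, 1, 1]
eigenfaces = [[1, 0, 0], [0, 1, 0]]
difference, dffs = face_distances([4, 5, 13], mean, eigenfaces)
```

Here the input minus the mean is (3, 4, 12); its projection onto the face space is (3, 4, 0), so the residual (0, 0, 12) is the part of the input that the face space cannot explain.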

Software Design #2 Face Recognition



We set our threshold values via trial and error.


From our test:



When the input face image is the real owner


Difference < 500


Difference from face space < 5000



When the input face image is NOT the real owner


500 < Difference < 2000


5000 < Difference from face space < 10000



When the input image is not a face


2000 < Difference


10000 < Difference from face space
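The three ranges above translate directly into a decision rule. A minimal sketch, using the thresholds listed on this slide; how to treat an input whose two values fall in different ranges is not covered by the slide, so this example assumes the more cautious label wins. `classify` is a hypothetical name.

```python
def classify(difference, dffs):
    """Classify an input from its two error values using the slide's thresholds."""
    if difference < 500 and dffs < 5000:
        return "owner"
    if difference < 2000 and dffs < 10000:
        # Also reached when the two values disagree (assumed: be cautious).
        return "face, not owner"
    return "not a face"


results = [
    classify(300, 4000),     # both values in the owner range
    classify(1200, 7000),    # both values in the non-owner range
    classify(2500, 12000),   # both values beyond the face range
    classify(300, 7000),     # values disagree
]
```

Only the first case would let the log-in continue to the voice password step.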


Software Design #2 Face Recognition



Our results parallel the results from other Eigenface
recognition researchers


The following is from cnx.org [Rice University]

Software Design #3 Voice Recognition

The core of our voice recognition is the Microsoft Speech SDK, which is free and comprehensive

Does not require the developer to have extensive knowledge of voice pattern science

Provides a high-level application programming interface (API) for third-party developers to use speech recognition in their applications








FASIS

Speech SDK 5.1

Hardware Design

We stress that the final commercialized product requires no hardware modification

We need to modify the hardware in the prototype to control the phone OS

We do not have the underlying permission to control the phone’s functionality, such as sending an image automatically or signalling it to lock/unlock

Hardware Design

In this prototype, the hardware consists of:


The phone itself


Hardware board to relay image to the server


The PC acting as server


Microcontroller to control the phone

Summary:




Transceiver

Microcontroller

Hardware Design

The phone:

Nokia N96

Non-touch screen

320x240 resolution front-facing camera

The transceiver:

RS232 serial connection

The board uses the MAX3222 IC

Low power consumption and high data rate

Requires four 0.1 μF external charge pump capacitors

Guaranteed 120 kbps while maintaining standard RS232 levels

2 receivers and 2 drivers


Hardware Design

The server:

Executes the software design

Runs mainly in MATLAB

Also needs drivers to talk to the transceiver and MCU

Alerts the owner of intruders via email (sends a picture attachment)










Hardware Design

The BOE kit

Parallax Board of Education (BOE) kit

Comes with an MCU and a breadboard for our custom circuit, which wires the phone to the MCU

USB connection for programming and for communication during run-time

We wired the phone’s buttons to the MCU so that we can control those buttons from the PC












Hardware Design

BASIC Stamp 2 module:

Processor speed: 20 MHz

RAM size: 32 bytes

Number of I/O pins: 16 + 2 dedicated serial

PBASIC commands: 42

Package: 24-pin DIP

Hardware Design

When the switch is on, a metal spring makes contact with two wires, allowing current to flow.

The phone’s internal MCU cycles through the input lines B0-B3:

It pulls one line, e.g. B1, high (and the others low). If a key on that line is pressed, the voltage is transferred to the corresponding row wire, so if A1 goes high, the MCU knows that button SW1 is pressed.
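The scanning scheme can be illustrated with a small simulation. This is a sketch only: the number of sensed A lines (four) and the `scan_keypad` name are assumptions, since the slide names only lines B0-B3 and A1.

```python
def scan_keypad(pressed, n_cols=4, n_rows=4):
    """Simulate a keypad matrix scan.

    `pressed` is a set of (column, row) index pairs whose switch is closed.
    Columns are the driven lines B0..B3; rows are the sensed lines A0..A3.
    """
    detected = []
    for col in range(n_cols):              # drive one B line high at a time
        for row in range(n_rows):          # read every A line
            if (col, row) in pressed:      # a closed switch passes the voltage
                detected.append((f"B{col}", f"A{row}"))
    return detected


# The switch joining B1 and A1 is closed.
keys = scan_keypad({(1, 1)})
```

Each detected pair identifies exactly one switch, which is why the prototype can fake a key press by driving the right wire at the right time.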



Hardware Design

We can create a current to simulate the key press as follows:

General-purpose I/O pins P0-P15: each can sink 25 mA and source 20 mA.

The HIGH command sets the specified pin to 1 (a +5 volt level) and then sets its mode to output.

HIGH 14
















Hardware Design

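Main:
  HIGH 0       ' drive pin P0 high (+5 V) to simulate a key press
  PAUSE 500    ' hold for 500 ms
  LOW 0        ' drive pin P0 low to release the key
  PAUSE 500    ' wait another 500 ms
  END          ' stop the program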

Hardware Design



The integrated hardware:

Finance

The cost of this project was substantially reduced because Nokia provided us with the N96

Much of the software is also free for students via DreamSpark (e.g. Visual Studio 2010)

Total cost came to about $250 CAD








Finance

The overhead cost for commercializing this product is low because it is entirely software-based

We can charge users either a one-time fee or an annual subscription fee

Most of the expense comes from hosting dedicated servers to execute the software algorithms and store users’ training sets (face images)

There are many dedicated hosting services available for ~$100 per month, allowing us to rent this expensive equipment, which is located elsewhere

Schedule

Key notes:

The project began last semester

Research took the longest, then development

Documentation took a lot of time, but it was well worth it

Future Work

Improve the algorithms

Eigenfeatures: combines facial metrics (measuring the distance between facial features) with the Eigenface approach

Further enhance localization methods

Collaborate with Symbian to get low-level OS access

Symbian is the Nokia phone’s operating system, and FASIS needs permission from the company in order to become a reality

Set up our dedicated servers

This demo uses a laptop, but the final product requires commercial-grade servers to handle thousands of users

Future Work



Generalize the system for other brands, not just Nokia

What We Learned

Professional documentation (ENSC305)

Group dynamics and team management

How to create a product from scratch

From research to commercialization

Programming

Low-level (Microcontroller, C)

High-level (SAPI, C#, .NET)

Scripting (Batch files, MATLAB)


Conclusion

The Face and Speech Identification System (FASIS) fills the need for a rapid, secure mobile log-in solution that eliminates tedious typing on small touchscreens/keypads

Efficient while maintaining a level of security

With further improvements, we firmly believe that FASIS could become a marketable product considering the current trend in the mobile industry...

Conclusion





There are 200 million smart phones in the world,
and this number is rising rapidly...



...even if we capture only 1% of the market, our
business can become huge

Acknowledgements

Ali & Carlyn

Excellent feedback and comments in our marked documents

Dr. Rawicz & Mike

Excellent feedback during oral progress reports

The idea for voice recognition

Nokia Vancouver

Simon Wong, who provided us with the phone

Microsoft

Free software tools for students via the DreamSpark program



Questions

Thank you

- Ztitch Solutions


Live Demonstration

Overview:

1. Face localization

2. FASIS: Try to authenticate real owner (Andrew Au)

3. FASIS: Try to authenticate non-face object (hand)

4. FASIS: Try to authenticate an audience member