The Study of Voiceprint Technology as a Means of Voice Verification


Roy Morris


Instructor: Charles C. Tappert



Abstract

This technical paper explains the definition and functionality of a voiceprint as a means of authentication and verification. It describes the methods used to create a voiceprint, as well as the methods and techniques used to perform a spectral analysis. The work also lists some current and possible future applications. This study explains some of the methods used in measuring a voice sample, such as the analysis of a spectrogram.

Introduction

As the number of internet users and the amount of information increase, there is a parallel increase in the need for stronger security. Passwords alone no longer provide enough security, and many users are more seriously considering biometrics as a form of protection through verification. As the study of biometrics expands, voice authentication is being looked at as a form of biometrics that can provide that extra security.

More financial institutions are considering voice biometrics as a way to fight call center fraud, because other forms of authentication are proving ineffective at a time when socially engineered attacks against call centers are on the rise [3]. This study focuses on how well a person's voice can be individualized and used as a means of authentication and verification.

The fundamental premise of voice verification is that every voice can be individualized enough to distinguish one person from the next through the analysis of a voiceprint [5]. A person's voiceprint is defined as "a graphic record made by a sound spectrograph of the energy patterns emitted by speech. No two voiceprints are alike" [6]. Through recording, editing, and analyzing the spectrographic features of a voice, we can study what makes the voice a sufficient means of verification.



For this study, each participant was recorded repeating the phrase "My name is Name" twenty times. Participants were instructed to speak in a relaxed but clear manner in order to obtain clean voice samples. Once twenty utterances were captured from each subject, the background noise was filtered out so that each sample utterance was free of interfering noise. This is referred to as energy thresholding. The phrase "My name is" was also isolated so that all samples contained the same utterance, which keeps the comparison between speakers consistent. Next, a spectral analysis of the utterance was conducted.
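The idea behind energy thresholding can be illustrated with a short sketch. The study itself removed noise with Audacity, so the code below is only an assumed, minimal illustration: the recording is split into short frames, and frames whose average energy falls below a threshold are discarded. The function names, the frame length, and the threshold value are hypothetical choices made for the example, not settings taken from the study.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def energy_threshold(samples, frame_len=4, threshold=0.01):
    """Keep only the frames whose energy is at or above the threshold.

    frame_len and threshold are illustrative values (assumptions),
    not settings used in the study.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame and frame_energy(frame) >= threshold:
            kept.extend(frame)
    return kept


# Made-up recording: quiet background, a louder "speech" burst, quiet again.
recording = [0.01, -0.02, 0.01, 0.0,
             0.5, -0.6, 0.4, -0.5,
             0.02, 0.0, -0.01, 0.01]
speech_only = energy_threshold(recording)
```

With these made-up samples only the middle frame survives, because its energy is far above the threshold while the two quiet frames fall below it.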


A spectrogram is defined as a visual representation of sound that shows the amplitude of the frequency components of a signal over time. A spectrogram is computed with the fast Fourier transform (FFT), applied to successive short segments of the signal. Each segment is decomposed into its frequency components, with time shown on the x-axis and frequency displayed on the y-axis. Below is an example of a spectrogram created from one of the subjects in the study.
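A minimal sketch of how such a time-frequency grid can be computed is given here. It is not the procedure WaveSurfer uses internally; it assumes a Hann window, a frame length of 64 samples, and a hop of 32 samples, all chosen only for illustration. For readability it evaluates the discrete Fourier transform directly, whereas an FFT routine would produce the same values faster.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of one frame (naive DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of columns; column i holds the bin magnitudes of frame i."""
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Hann window reduces leakage between neighbouring frequency bins.
        windowed = [frame[t] * (0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len))
                    for t in range(frame_len)]
        columns.append(dft_magnitudes(windowed))
    return columns


# Synthetic test tone: 1000 Hz sampled at 8000 Hz (hypothetical values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)

# Strongest bin of the first column, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each column of `spec` corresponds to one position on the time axis and each entry in a column to one position on the frequency axis, which is the layout a spectrogram image displays.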



Speech recognition can be placed into three main categories: the acoustic phonetic approach, the pattern recognition method, and the artificial intelligence technique [2]. The acoustic phonetic approach is based on the theory that specific phonetic sounds can be found within the speech sample. The pattern recognition method works on the speech patterns directly, without explicit feature determination and segmentation. The greatest advantage of the artificial intelligence technique is that it allows for parallel computation.

One of the predominant spectral analysis techniques used in voice verification is cepstral analysis. This technique essentially separates the excitation from the vocal tract response. The speech signal is given as

$$s(n) = g(n) * v(n) \qquad (1)$$

where $v(n)$ is the vocal tract impulse response, $g(n)$ is the excitation, and $*$ denotes convolution. In the frequency domain this becomes

$$S(\omega) = G(\omega) \cdot V(\omega) \qquad (2)$$
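Cepstral analysis relies on the fact that a convolution in time becomes a product in the frequency domain, as in equation (2), and a sum once the logarithm of the magnitude is taken. The sketch below computes a real cepstrum with a direct DFT and checks this additive property on two short made-up sequences. The names `g` and `v` follow the equations, but their values, the helper names, and the use of circular convolution are assumptions made to keep the example small.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of x."""
    n = len(x)
    # The floor avoids log(0) for bins with no energy.
    log_mag = [math.log(max(abs(c), floor)) for c in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]


# Made-up "excitation" and "vocal tract" sequences (illustrative only).
g = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
v = [1.0, -0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
s = circular_convolve(g, v)

c_g, c_v, c_s = real_cepstrum(g), real_cepstrum(v), real_cepstrum(s)
# In the cepstral domain the two components add instead of convolving.
max_gap = max(abs(c_s[i] - (c_g[i] + c_v[i])) for i in range(len(s)))
```

Because the two contributions add in the cepstral domain, they can be separated there with simple operations, which is what makes the technique useful for pulling apart excitation and vocal tract effects.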

Other analysis techniques used for speech verification include mel cepstrum analysis, human factor cepstrum analysis, linear predictive coding (LPC) analysis, perceptual linear prediction (PLP) analysis, and temporal analysis.


The advantage of the mel-frequency cepstrum is that it more closely approximates the response of the human auditory system. It does this by positioning the frequency bands logarithmically along the mel scale, which matches the response of the human auditory system better than the linearly spaced frequency bands derived directly from the FFT.
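The spacing of bands on the mel scale can be sketched as follows. The conversion formula used here, m = 2595 log10(1 + f/700), is one commonly used definition of the mel scale and is an assumption on my part, since the paper does not state which variant it has in mind. The number of bands and the frequency range are likewise illustrative.

```python
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to mels (one commonly used formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands, f_min, f_max):
    """Centre frequencies (Hz) of n_bands bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Ten bands between 0 Hz and 4000 Hz (hypothetical range).
centers = mel_band_centers(10, 0.0, 4000.0)
# Distance in Hz between neighbouring band centres.
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

The bands are evenly spaced in mels, so the gaps in Hz grow with frequency: resolution is fine at low frequencies and coarse at high ones, which is the behaviour the paragraph above attributes to human hearing.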

LPC analysis is based on the idea that a speech sample can be predicted from the samples that came before it. It states that a given speech sample of a signal at time $n$, $s(n)$, can be represented as a linear combination of the previous $p$ speech samples:

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$$

[2].
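The source does not say how the coefficients of the linear combination are obtained. One standard way is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below under that assumption. The function names and the synthetic test signal are hypothetical and serve only to show the prediction at work.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def lpc_coefficients(x, order):
    """Predictor coefficients a_1..a_p via the Levinson-Durbin recursion."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are the coefficients
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:
            break  # silent input: nothing left to predict
        # Reflection coefficient for this order.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:]


def predict_next(history, coeffs):
    """Predict s(n) from the most recent samples: sum of a_j * s(n - j)."""
    return sum(a_j * history[-j] for j, a_j in enumerate(coeffs, start=1))


# Synthetic signal in which each sample is 0.9 times the previous one.
signal = [0.9 ** n for n in range(200)]
coeffs = lpc_coefficients(signal, 2)
prediction = predict_next(signal[:2], coeffs)  # estimate of signal[2]
```

For this synthetic signal the first coefficient comes out very close to 0.9 and the second close to zero, so the predictor recovers the rule that generated the data.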

In studying the spectrograms for this work, the voice samples were segmented into phonetic sounds. The phrase "My name is", for instance, has seven phonetic sounds. Dynamic Time Warping (DTW) can also be used to segment the individual sounds. DTW is a non-linear pattern recognition algorithm and has become one of the main algorithms used in modern speech recognition. It measures the similarity between two voice samples by establishing an alignment between two sequences of feature vectors [2].
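A minimal sketch of the alignment that DTW computes is given here, using the classic dynamic programming formulation with a Euclidean distance between feature vectors. The one-dimensional "feature vectors" are made up for the example; real systems would align frames of spectral or cepstral features.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b):
    """Total cost of the best non-linear alignment between two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i and first j vectors.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Same "utterance" spoken slowly and quickly, plus a different one.
slow = [[1.0], [1.0], [2.0], [3.0], [3.0]]
fast = [[1.0], [2.0], [3.0]]
other = [[3.0], [2.0], [1.0]]

same_speaker_cost = dtw_distance(slow, fast)
different_cost = dtw_distance(fast, other)
```

The slow and fast versions align with zero cost even though their lengths differ, while the reversed sequence does not, which is why DTW is useful for comparing utterances spoken at different speeds.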


Many companies today are creating speech recognition software for all sorts of consumer products. For instance, a company that considers itself among the leaders in speech technologies has incorporated speech recognition and verification software into computers, automobiles, Bluetooth devices, mobile phones, and even home appliances. The list of uses for speech verification appears to be nearly endless. Sensory reports a False Acceptance Rate of 0.01% for its products, while the False Reject Rate rests just under 5% [1]. These numbers look ideal, but do they hold up in practice? Most companies that manufacture speech verification software report similar numbers, but their products do not all operate at the same efficiency.
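To make the two error rates concrete, the sketch below shows how a False Acceptance Rate and a False Reject Rate could be computed from match scores at a given decision threshold. The scores and thresholds are invented for illustration and have no connection to the vendor figures cited above. The convention that a higher score means a better match is also an assumption.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for a system that accepts scores >= threshold."""
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    frr = false_rejects / len(genuine_scores)
    far = false_accepts / len(impostor_scores)
    return far, frr


# Made-up match scores: true speakers and impostors.
genuine = [0.91, 0.85, 0.40, 0.78, 0.95]
impostor = [0.10, 0.35, 0.82, 0.20, 0.05]

far_strict, frr_strict = error_rates(genuine, impostor, 0.9)
far_loose, frr_loose = error_rates(genuine, impostor, 0.3)
```

With the strict threshold no impostor is accepted but three of five genuine attempts are rejected, while the loose threshold reverses the trade-off. A reported pair of rates is therefore only meaningful together with the operating point at which it was measured.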

Conclusion

In conclusion, many people today are seeking out biometrics as a means to protect their information on and offline. With the help of a quality voiceprint, speech verification can be used to help secure one's information. Speech recognition can be placed into three main categories: the acoustic phonetic approach, the pattern recognition method, and the artificial intelligence technique. For this voiceprint study, the analysis of the recorded voice samples was only partially completed. A spectral analysis must still be done in order to obtain the numeric values of the voice samples. The science behind speech recognition is growing, and the applications of this technology appear to be endless once it is perfected.

A research company named Opus Research will be holding its annual Voice Biometric Conference in Singapore. "We're very pleased to showcase the ever-expanding set of present solutions and future opportunities for voice biometrics to support speaker identification and verification around the world. With enrolled voiceprints already exceeding 20+ million, we're witnessing an accelerated deployment of voice biometric-based solutions to support trusted commerce" [4].

References

[1] savedelete.com/7-best-free-speech-recognition-software.html

[2] Krishan Kant Lavania, "Reviewing Human-Machine Interaction through Speech Recognition approaches and Analyzing an approach for Designing an Efficient System," International Journal of Computer Applications (0975 - 8887), Volume 38, No. 3, January 2012.

[3] Tracy Kitten, "Voice Biometrics as a Fraud Fighter: Could Emerging Technology Play New Role in Call Centers?", May 22, 2012.

[4] Dan Miller, "Global Growth Brings Voice Biometrics Conference To Singapore," August 22, 2012, http://voicebiocon.com/

[5] Steve Cain, http://expertpages.com/news/voiceprint identification.htm, updated August 23, 2012.

[6] Kären M. Hess and Christine Hess Orthmann, Criminal Investigation, 2010.



Appendix

Specifications of the Study

Each subject was asked to record the phrase "My name is Name". Twenty utterances were recorded from each participant. Before recording, each subject was instructed to speak naturally but clearly. Utterances considered "bad" were deleted and re-recorded.

Hardware used

Computer: Hewlett Packard Model G71 - Notebook PC

Microphone: IDT High Definition Audio CODEC (Internal Microphone)

Software used

WaveSurfer version 8.5.8 - WaveSurfer is an open source tool for sound visualization and manipulation. Some of its applications are speech analysis and sound annotation. WaveSurfer was chosen for its ease of use and user-friendly interface. It was mainly used for creating wav files and viewing spectrograms of the samples collected. http://www.speech.kth.se/wavesurfer/

Audacity 2.0.1 - Audacity is a free tool available for download at http://audacity.sourceforge.net/. It runs on many different operating systems and can record and mix sounds, along with a number of other functions. I chose Audacity for the study for its ability to cleanly erase background noise and to erase or cut the portions of sound that were not needed.

Subject Demographics

The subjects are from somewhat different ethnic backgrounds, but all of them speak English as their primary language, and none of them have identifiable foreign accents.

Name               Age   Gender
Andrew Calipa      21    M
Anita Valencia     22    F
Cody Tacktil       29    M
Desiree Morris     27    F
Michael Valencia   19    M
Noemi Valdovinos   22    F
Pauly Fidun        26    M
Rob Sorensen       21    M
Rob Weinstein      28    M
Roy Morris         30    M
Tony Federici      31    M