Gender and accent classification for soft biometric authentication







CSE 717




PRELIMINARY REPORT FOR BIOMETRICS SEMINAR


Gender and accent classification
for soft biometric authentication









SUBMITTED BY





SWAPNIL KHEDEKAR




SHAMALEE DESHPANDE







PROJECT OBJECTIVE


Speech is a behavioral biometric. The pitch of the voice can be used to recognize the gender of a speaker. The accent of a person depends on his/her ethnic origin and geographical and social conditions, as well as on individual conscious/unconscious deviations from them. Thus the main objective of this project is to use these and other speech-based features for soft-biometric authentication in a real-time system.

We aim to extract soft-biometric features from speech data, to be further used for applications including gender determination and accent recognition/classification.




TASKS



Gender Determination:


For gender determination, the feature being used is the pitch, in other words the fundamental frequency of the speaker's voice. Pitch is the most important characteristic that can be used to differentiate between speakers.

Pitch can be extracted from the signal in either the time domain or the frequency domain. On average, the male pitch frequency lies within the interval of 100-200 Hz, while the female pitch frequency lies within 200-300 Hz.

Because a person's pitch may vary by a certain interval (+/- 30 Hz) due to various factors, instead of using a hard predefined threshold to decide whether the speaker is male or female, we propose a method that provides a loose, fuzzy classification using a combination of weak classifiers over pitch representations in different domains. The input to these classifiers is the pitch calculated from different spectral representations of the speech signal, namely the Discrete Fourier Transform (DFT), Linear Predictive Coding (LPC), and Cepstral Analysis. This gives an adaptive boundary for the decision rule to be implemented, and requires analyzing every representation to provide an individual confidence. The method can also be extended to further sub-classify the male and female classes into rough/normal/soft voices such as shrill, high, low, and bass tones. A sketch of such a soft combination is given below.
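As an illustration only, the following Python sketch shows one way the pitch estimates from the three representations could be fused into a soft male/female score. The 200 Hz boundary, the +/- 30 Hz fuzzy band, and the equal default weights are assumptions made for this sketch; the per-representation weights would come from the confidence analysis mentioned above.

import numpy as np

def soft_gender_score(pitch_estimates_hz, weights=None):
    """Fuse per-representation pitch estimates (e.g. from DFT, LPC and
    cepstral analysis) into one fuzzy score in [0, 1]:
    0 = strongly male (~100-200 Hz), 1 = strongly female (~200-300 Hz).
    The 200 Hz boundary and +/- 30 Hz fuzzy band are assumed values."""
    boundary, fuzz = 200.0, 30.0
    pitches = np.asarray(pitch_estimates_hz, dtype=float)
    if weights is None:
        weights = np.ones_like(pitches) / len(pitches)  # equal confidence
    # each weak classifier ramps linearly from 0 (male) to 1 (female)
    # across the fuzzy band around the boundary
    memberships = np.clip((pitches - (boundary - fuzz)) / (2 * fuzz), 0.0, 1.0)
    return float(np.dot(weights, memberships))

# Example: DFT estimate 185 Hz, LPC estimate 210 Hz, cepstral estimate 195 Hz
score = soft_gender_score([185.0, 210.0, 195.0])
print("soft score = %.2f -> %s" % (score, "female" if score > 0.5 else "male"))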













Implementation Details:



1. Pitch calculation using Linear Predictive Coding (LPC)

The LPC coefficients are found using the Levinson-Durbin algorithm; their roots represent the frequencies produced by the vocal tract. To find the pitch, we plot these roots on a pole-zero plot. The pitch frequency has the highest magnitude and is therefore closest to the unit circle, so we obtain the pitch frequency by finding the pole pair closest to the unit circle.

The formula used to calculate the pitch frequency is

Pitch = Sampling frequency * ( θ / 2π )

where θ is the angle of the pitch pole on the unit circle.
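A minimal sketch of this procedure, assuming a Hamming-windowed frame, an LPC order of 12, and scipy's Toeplitz solver (which uses a Levinson-type recursion) to obtain the coefficients:

import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_pitch(frame, fs, order=12):
    """Estimate pitch as fs * theta / (2*pi), theta being the angle of the
    LPC pole closest to the unit circle (frame, window and order assumed)."""
    x = frame * np.hamming(len(frame))
    # autocorrelation of the windowed frame
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    # solve the autocorrelation (Yule-Walker) equations for the LPC coefficients
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    # poles are the roots of the prediction-error polynomial A(z) = 1 - sum a_k z^-k
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]            # keep positive-frequency poles
    strongest = roots[np.argmax(np.abs(roots))]  # pole pair closest to unit circle
    theta = np.angle(strongest)
    return fs * theta / (2 * np.pi)              # Pitch = fs * (theta / 2*pi)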





2. Pitch calculation using the Discrete Fourier Transform (DFT)

Frequency analysis of the speech signal involves resolving the signal into its frequency (sinusoidal) components. The DFT of a speech signal gives a pictorial representation of the frequency components of the signal, including the formant structure. The pitch can be calculated from this frequency structure.
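One possible reading of this step, sketched below, is to pick the strongest DFT magnitude peak inside an assumed 50-400 Hz pitch band; the band limits and the Hamming window are assumptions, not specifications from the report:

import numpy as np

def dft_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate pitch as the frequency of the strongest DFT magnitude peak
    inside an assumed pitch band (fmin, fmax)."""
    x = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[band][np.argmax(spectrum[band])])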





3. Pitch calculation using Cepstral Analysis

In the cepstrum, the formant frequencies, which may overlap with the pitch information in the spectrum, are separated out, resulting in a more reliable detection of the pitch period. The cepstral coefficients are plotted against quefrency (the time-like axis of the cepstrum), and the fundamental period between the highest peaks in the quefrency domain gives the pitch period of the speech signal.

Pitch = 1 / (Fundamental period calculated from the peaks)
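A minimal sketch of this step, assuming the real cepstrum is computed as the inverse FFT of the log magnitude spectrum and that the peak is searched in an assumed 50-400 Hz band:

import numpy as np

def cepstral_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate pitch as 1 / (quefrency of the strongest cepstral peak).
    The 50-400 Hz search band is an assumed value."""
    x = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.fft(x))
    cepstrum = np.real(np.fft.ifft(np.log(spectrum + 1e-12)))
    # quefrency indices corresponding to the assumed pitch range
    q_lo, q_hi = int(fs / fmax), min(int(fs / fmin), len(cepstrum) // 2)
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi]))
    fundamental_period = peak / fs        # seconds per pitch cycle
    return 1.0 / fundamental_period       # Pitch = 1 / fundamental period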










Accent Classification:


The accent of a person is affected by his/her native language. Thus, in a voice sample, features inherent to the native language of the speaker are always evident. We hope to extract features that are inherent to the native language of the speaker rather than to the language being spoken, using various spectral representations of the speech signal.



Database:


For the accent classification part, which is still to be implemented, we are currently collecting speech samples from individuals to work with temporarily, until we receive the standard speech database, the 'Foreign Accented English Corpus (FAE)', from the Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology. The corpus consists of American English utterances by non-native speakers. It contains 4925 telephone-quality utterances from native speakers of 23 languages.

We are also in the process of obtaining the NIST TIMIT database, which provides speech data for the acquisition of acoustic-phonetic knowledge. It consists of speech sampled from 630 speakers of 8 American dialects.