REPORT FOR BIOMETRICS SEMINAR
Gender and accent classification
for soft biometric authentication
Speech is a behavioral biometric. The pitch of the voice can be used to recognize the gender of the speaker.
The accent of a person depends on his/her ethnic origin and geographical and social conditions,
as well as on individual conscious/unconscious deviations. Thus the main objective of
this project is to use these and other speech-based features for soft biometric
authentication in a real-time system.
This requires the extraction of soft biometric features from speech data, which are then
used for applications including gender determination and accent recognition/classification.
For gender determination, the feature used is pitch, in other
words the fundamental frequency of the speech signal. Pitch is the most important
characteristic that can be used to differentiate between speakers.
Pitch is extracted from the signal in either the time domain or the frequency domain.
On average, the male pitch frequency lies within the interval of 100-200 Hz, while the female
interval is 200-300 Hz.
Because a person's pitch may vary by a certain interval (+/- 30 Hz) due to various factors,
instead of using a hard threshold to finally determine whether the speaker is
male or female, we propose a method that provides a loose, fuzzy classification using a
combination of weak classifiers over pitch representations in different domains.
The input to these classifiers is the pitch calculated using different spectral representations of the
speech signal: the Discrete Fourier Transform (DFT), Linear Predictive Coding
(LPC), and Cepstral Analysis.
This gives an adaptive boundary to the decision control.
It also requires analysis of every representation to assign it a confidence.
The method can also be extended to further sub-classify the male and female
classes into rough/normal/soft sounds such as shrill, high, low, and bass tones.
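The fusion of weak classifiers described above could be sketched as follows. This is a minimal illustration, not the report's implementation: the class means, spreads, Gaussian membership functions, and equal default weights are all illustrative assumptions, not fitted values.

```python
import numpy as np

# Illustrative pitch distributions per class (mean Hz, spread Hz),
# loosely based on the 100-200 Hz / 200-300 Hz intervals above.
CLASSES = {"male": (150.0, 40.0), "female": (250.0, 40.0)}

def membership(pitch_hz, mean, spread):
    """Unnormalized Gaussian membership of a pitch estimate in a class."""
    return np.exp(-0.5 * ((pitch_hz - mean) / spread) ** 2)

def fuzzy_gender(pitch_estimates, weights=None):
    """Combine pitch estimates from several domains (e.g. DFT, LPC,
    cepstrum) into soft class scores instead of a hard threshold."""
    weights = weights or [1.0] * len(pitch_estimates)
    scores = {}
    for label, (mean, spread) in CLASSES.items():
        votes = [w * membership(p, mean, spread)
                 for p, w in zip(pitch_estimates, weights)]
        scores[label] = sum(votes) / sum(weights)
    return scores

# Example: three pitch estimates from the three spectral representations
scores = fuzzy_gender([120.0, 135.0, 128.0])
```

Because each classifier contributes a graded membership rather than a binary vote, the decision boundary adapts to how consistently the different representations agree.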
Pitch calculation using Linear Predictive Coding (LPC)
The LPC coefficients are found using the Levinson-Durbin algorithm. Their roots
represent the frequencies produced by the vocal tract. To find the pitch, we plot these
roots on a pole-zero plot. The pitch frequency will have the highest magnitude and
will thus be closest to the unit circle. We thus obtain the pitch frequency by finding the
pole pair closest to the unit circle.
The formula used to calculate the pitch frequency is
Pitch = Sampling frequency * (θ / 2π)
where θ is the angle of the pitch pole on the unit circle.
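A minimal sketch of this procedure is shown below, assuming autocorrelation-based LPC solved with `scipy.linalg.solve_toeplitz` (a stand-in for the Levinson-Durbin recursion, which solves the same Toeplitz system); the function name, frame windowing, and order of 12 are illustrative choices, not taken from the report.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_pitch(frame, fs, order=12):
    """Estimate pitch as the frequency of the LPC pole closest to the unit circle."""
    # Window the frame and compute its autocorrelation
    sig = frame * np.hamming(len(frame))
    r = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    # Solve the Toeplitz (Yule-Walker) system for the LPC coefficients
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    # Prediction-error polynomial A(z) = 1 - a1*z^-1 - ... - ap*z^-p
    roots = np.roots(np.concatenate(([1.0], -a)))
    # Keep the upper-half-plane roots (positive frequencies)
    roots = roots[np.imag(roots) > 0]
    # The pole closest to the unit circle marks the strongest resonance
    best = roots[np.argmin(np.abs(1.0 - np.abs(roots)))]
    # Pitch = fs * (theta / 2*pi), the formula above
    return fs * np.angle(best) / (2 * np.pi)
```

The final line is a direct transcription of the formula Pitch = Sampling frequency * (θ / 2π).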
Pitch calculation using Discrete Fourier Transform (DFT)
Frequency analysis of the speech signal involves the resolution of the signal into its
frequency (sinusoidal) components. The DFT of a speech signal gives a pictorial
representation of the frequency components of the signal, including the formant structure.
The pitch can be calculated from this frequency structure.
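One simple way to read the pitch off this frequency structure is to pick the strongest spectral peak within the plausible fundamental-frequency range. This is a sketch under that assumption (the 50-400 Hz search band and Hamming window are illustrative choices, not stated in the report):

```python
import numpy as np

def dft_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate pitch as the strongest DFT peak in the plausible F0 band."""
    # Window to reduce spectral leakage, then take the magnitude spectrum
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(windowed), d=1.0 / fs)
    # Restrict the search to the expected pitch range
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(spectrum[band])]
```

Note that the frequency resolution of this estimate is fs/N, so short frames give coarse pitch values.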
Pitch calculation using Cepstral Analysis
In the cepstrum, the formant frequencies, which may overlap the pitch period, are
removed, resulting in a more reliable detection of the pitch period.
The cepstral coefficients are plotted against quefrency (the cepstral time axis), and the
fundamental period between the peaks in the cepstrum gives the pitch period of
the speech signal:
Pitch = 1 / (fundamental period calculated from the peaks)
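The cepstral method above can be sketched as follows, using the real cepstrum (inverse transform of the log magnitude spectrum) and searching only the quefrency range corresponding to plausible pitch periods; the 50-400 Hz bounds and the function name are illustrative assumptions.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate pitch from the dominant peak of the real cepstrum."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    # Real cepstrum: inverse transform of the log magnitude spectrum;
    # the log separates the slowly varying formant envelope (low
    # quefrency) from the pitch harmonics (high quefrency)
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-12))
    # Search only quefrencies that correspond to plausible pitch periods
    qmin = int(fs / fmax)
    qmax = int(fs / fmin)
    period = qmin + np.argmax(cepstrum[qmin:qmax])
    # Pitch = 1 / (fundamental period), expressed in samples
    return fs / period
```

The final line is the formula Pitch = 1 / (fundamental period), with the period measured in samples and therefore scaled by the sampling frequency.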
The accent of a person is affected by his/her native language. Thus, in a voice sample, the
inherent features of the speaker's native language are always evident. We hope to extract
features that are inherent to the native language of the speaker, and not to the
language being spoken, using various spectral representations of the speech signal.
For the accent classification part, which is still to be implemented, we are currently
collecting speech samples from individuals to work on temporarily, until we obtain a
standard speech database, the 'Foreign Accented English Corpus (FAE)'
from the Center for Spoken Language Understanding, Oregon Graduate Institute of
Science and Technology. The corpus consists of American English utterances by
non-native speakers. It contains 4925 telephone-quality utterances from native speakers of 23
languages.
We are also in the process of obtaining the NIST TIMIT database, which provides speech
data for the acquisition of acoustic-phonetic knowledge. It consists of speech sampled
from 630 speakers of 8 American English dialects.