A NOVEL DATA DESCRIPTION KERNEL BASED ON ONE-CLASS SVM
FOR SPEAKER VERIFICATION
*
Yufeng Shen and Yingchun Yang
College of Computer Science and Technology,
Zhejiang University, Hangzhou, P.R. China, 310027
* Corresponding author
1-4244-0728-1/07/$20.00 ©2007 IEEE. ICASSP 2007.

ABSTRACT

In this paper we develop a novel data description kernel based on One-Class SVM (the OCSVM-DD kernel) for text-independent SVM speaker verification. The basic idea of the new kernel is to combine the data description model OCSVM with an SVM discriminant classifier. Each utterance is first mapped to the normal vector of the separating hyperplane of its OCSVM model. An SVM classifier with a linear kernel is then applied to the mapped vectors. Experimental results on the NIST 2001 SRE database show that the new kernel outperforms the Generalized Linear Discriminant Sequence (GLDS) kernel and is comparable to the UBM-MAP-GMM method.

Index Terms: Speaker verification, SVM, kernel, One-Class SVM

1. INTRODUCTION

The Support Vector Machine (SVM) [1] has been widely used in speaker verification for its excellent classification ability and generalization capacity. Its performance is comparable to that of state-of-the-art classifiers such as GMMs [2], while it requires relatively less training data.

Early work on speaker recognition with SVMs by Schmidt and Gish [3] and by Wan and Campbell [4] employed frame-level classification: training and testing are performed at the frame level, and the per-frame scores are combined to obtain the overall score of an utterance. This approach has two main disadvantages. First, the amount of frame data is too large for efficient computation. Second, the sequence information contained in the utterance is lost when each frame is treated individually.

Because of these drawbacks, utterance-based kernel methods are now the mainstream in SVM speaker verification. The basic idea is to map a whole utterance to a single vector in feature space and to perform SVM classification on the mapped vectors. Some mapping methods are straightforward, such as the Generalized Linear Discriminant Sequence (GLDS) kernel of Campbell [5], where the mapping is a simple polynomial expansion. Other methods rely on data description models to map utterances, such as the Fisher kernel of Jaakkola and Haussler [6], the Probabilistic Distance Kernels of Moreno and Ho [7], and the pair HMM kernels of Durbin et al. [8].

Although the GLDS mapping through polynomial expansion is simple and computationally cheap, it does little to model the utterance and does not extract enough feature information from it. Mapping methods based on data description models, on the other hand, benefit from the data-characterizing ability of those models. Based on these observations, we develop a new kernel whose feature space is constructed in a way similar to the GLDS kernel, while the characterizing ability of the mapped vectors is improved by a new data description model: One-Class SVM (OCSVM) [9].

OCSVM is a variation of the standard SVM for the situation where examples of only one class are available. Its objective is to find a hyperplane that separates the positive examples from the origin with maximum margin. We choose OCSVM as the data description model in the kernel construction for its strong descriptive ability, and we therefore call the new kernel the One-Class SVM based Data Description (OCSVM-DD) kernel.

This paper is organized as follows: Section 2 provides some background knowledge; Section 3 gives a detailed description of the OCSVM-DD kernel; experimental evaluation and results are presented in Section 4; finally, Section 5 concludes.

2. BACKGROUND KNOWLEDGE

2.1. GLDS kernels

For a sequence of observations $x_1^n : x_1, x_2, \ldots, x_n$, the mapping $x_1^n \to b$ is defined as
$$x_1^n \to \frac{1}{n} \sum_{i=1}^{n} b(x_i) \qquad (2.3)$$

where $b(x)$ is an expansion of the input space into a vector of scalar functions. Usually $b(x)$ is chosen to be the vector of polynomial basis terms of the input vector $x$.

Given two sequences of speech feature vectors, $x_1^n$ and $y_1^m$, the GLDS kernel is defined as

$$K_{GLDS}(x_1^n, y_1^m) = b_x^t R^{-1} b_y \qquad (2.4)$$

where $b_x$ and $b_y$ are the mapped vectors of the two sequences. The matrix $R$ is trained from the speech data of both speakers and impostors and in essence normalizes the mapped vectors $b_x$ and $b_y$.

2.2. Data description model and discriminant classifier

For a discriminant classifier to achieve good performance, the extracted feature vectors must convey enough information about the example data. The central idea of utterance-based methods in speaker verification is to map the whole utterance to a single vector that serves as the input of the discriminant classifier. A good mapping should therefore extract the useful information contained in an utterance and encode it into the single mapped vector.

A data description model is a good tool for implementing such a mapping: a well-constructed descriptive model can accurately characterize the utterance features, and well-selected model parameters can be used as the feature vector that represents the model.

Classic descriptive models such as GMMs and HMMs have been used in kernel construction [6] [7] [8]. Both GMMs and HMMs are probabilistic models. In the next section, we construct our kernel using a descriptive but non-probabilistic model: One-Class SVM.

3. ONE-CLASS SVM BASED DATA DESCRIPTION KERNEL

3.1. Review of One-Class SVM

The idea of OCSVM [9] is to separate the positive examples from the origin with maximum margin. We can view OCSVM as a descriptive model because it actually estimates the distribution of the positive examples in the high-dimensional space induced by the kernel mapping.

We first introduce terminology and notation conventions. We consider training data

$$x_1, x_2, \ldots, x_\ell \in \mathcal{X} \qquad (3.1)$$

where $\ell$ is the number of observations. Let $\Phi$ be a feature map $\mathcal{X} \to F$. By evaluating a simple kernel function we can then compute the inner product of the images under $\Phi$ in the feature space $F$:

$$k(x, y) = (\Phi(x) \cdot \Phi(y)) \qquad (3.2)$$

OCSVM's objective of finding the optimal hyperplane can be formulated as a quadratic programming (QP) problem

$$\min_{w \in F,\ \xi \in R^\ell,\ \rho \in R} \ \frac{1}{2} \|w\|^2 + \frac{1}{\nu \ell} \sum_i \xi_i - \rho \qquad (3.3)$$

subject to

$$(w \cdot \Phi(x_i)) \ge \rho - \xi_i, \quad \xi_i \ge 0 \qquad (3.4)$$

where $w$ is the normal vector of the separating hyperplane and the parameter $\nu$ controls the trade-off between the norm term $\|w\|^2$ and the slack variables $\xi_i$.

After solving this QP problem, the final decision function is

$$f(x) = \mathrm{sgn}\Big(\sum_i \alpha_i k(x_i, x) - \rho\Big) \qquad (3.5)$$

where all patterns $x_i$ in equation (3.5) are support vectors.

A distinctive feature of OCSVM is that the framework of a two-class classifier is recast to do the job of one-class data description. The data-characterizing ability of OCSVM is comparable to that of classic probabilistic models such as GMMs and HMMs.

3.2. Conception of the OCSVM-DD kernel

Substituting equation (3.2) into equation (3.5) gives

$$f(x) = \mathrm{sgn}\Big(\sum_i \alpha_i k(x_i, x) - \rho\Big) \qquad (3.5)$$
$$= \mathrm{sgn}\Big(\sum_i \alpha_i \Phi(x_i) \cdot \Phi(x) - \rho\Big) \qquad (3.6)$$
$$= \mathrm{sgn}\big((w \cdot \Phi(x)) - \rho\big) \qquad (3.7)$$

where $w = \sum_i \alpha_i \Phi(x_i)$ is the normal vector of the separating hyperplane in OCSVM.

Viewed in another way, the inner product $w \cdot \Phi(x)$ can be thought of as the similarity between the test point $x$ and the trained model, with the constant $\rho$ as the threshold. The normal vector $w$ is thus a weight vector that reflects the contribution of each dimension of $\Phi(x)$ to the total similarity $w \cdot \Phi(x)$. In other words, $w$ characterizes the OCSVM model well.

With this observation, we have good reason to believe that the normal vector $w$ represents the whole utterance well. This leads to the idea of our new kernel: map each utterance to its normal vector $w$, and then use $w$ as the input of the SVM classifier.

From the definition $w = \sum_i \alpha_i \Phi(x_i)$ we can see that, to compute $w$, the concrete form of $\Phi$ must be known first. Usually an SVM performs the mapping implicitly through a simple kernel function, and it is hard to get the concrete expression of $\Phi$.
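As a small numerical illustration of equations (3.5) to (3.7), the Python sketch below assumes a homogeneous polynomial kernel of degree 2, for which the feature map can be written out explicitly as all ordered products of the input entries. The coefficients, the offset, and the vectors are made-up illustration values rather than the output of a trained OCSVM, and the helper names (phi_d, normal_vector, and so on) are hypothetical.

```python
from itertools import product
from math import prod


def phi_d(x, d):
    """Explicit feature map: all ordered degree-d products of the entries of x."""
    return [prod(x[j] for j in idx) for idx in product(range(len(x)), repeat=d)]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, y, d):
    """Homogeneous polynomial kernel (x . y)^d."""
    return dot(x, y) ** d


def normal_vector(alphas, support_vectors, d):
    """w = sum_i alpha_i * Phi_d(x_i), accumulated in the explicit feature space."""
    w = [0.0] * (len(support_vectors[0]) ** d)
    for alpha, sv in zip(alphas, support_vectors):
        for k, value in enumerate(phi_d(sv, d)):
            w[k] += alpha * value
    return w


# Illustrative values only (assumed, not taken from a trained model).
d = 2
support_vectors = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
alphas = [0.5, 0.3, 0.2]
rho = 0.7
x = [1.5, -0.5]

# Argument of sgn in (3.5): kernel expansion over the support vectors.
kernel_score = sum(
    a * poly_kernel(sv, x, d) for a, sv in zip(alphas, support_vectors)
) - rho

# Argument of sgn in (3.7): inner product with the explicit normal vector.
w = normal_vector(alphas, support_vectors, d)
explicit_score = dot(w, phi_d(x, d)) - rho
```

The two scores agree up to floating-point rounding, which is the sense in which the single vector w carries everything the decision function needs from the support vectors.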
Some special polynomial kernels are exceptions, and we will use a specific kind of polynomial kernel function to accomplish the mapping.

We define $\Phi_d$ to map $x \in R^N$ to the vector $\Phi_d(x)$ whose entries are all possible $d$th-degree ordered products of the entries of $x$. The corresponding kernel, which computes the dot product of vectors mapped by $\Phi_d$, is

$$k(x, y) = \Phi_d(x) \cdot \Phi_d(y) = (x \cdot y)^d \qquad (3.8)$$

The proof is straightforward:

$$\Phi_d(x) \cdot \Phi_d(y) = \sum_{j_1=1}^{N} \cdots \sum_{j_d=1}^{N} x_{j_1} \cdots x_{j_d} \, y_{j_1} \cdots y_{j_d}$$
$$= \sum_{j_1=1}^{N} x_{j_1} y_{j_1} \cdots \sum_{j_d=1}^{N} x_{j_d} y_{j_d}$$
$$= \Big( \sum_{j=1}^{N} x_j y_j \Big)^d = (x \cdot y)^d$$

So if the kernel function in OCSVM is chosen to be of the form $k(x, y) = (x \cdot y)^d$, then the map $\Phi_d$ has an explicit expression and $w = \sum_i \alpha_i \Phi_d(x_i)$ can be computed explicitly.

The OCSVM-DD kernel is defined as

$$k(A, B) = w_A \cdot w_B \qquad (3.9)$$

where A and B represent two utterances and $w_A$ and $w_B$ are the normal vectors of the OCSVM models of A and B.

The mapped space of the OCSVM-DD kernel is similar to that of the GLDS kernel in that both are explicitly constructed through polynomial expansion. The difference is that for the GLDS kernel, once all the frames are mapped to feature vectors, they are simply summed and averaged (see equation 2.3), while for the OCSVM-DD kernel a descriptive OCSVM model is built on the mapped frames and a representative vector (the normal vector $w$) is chosen as the feature vector. The experiments in the next section show how this difference affects the performance of the SVM classifier.

4. EXPERIMENTS

4.1. Database and front-end processing

Experiments are performed on the NIST 2001 SRE database according to the rules of the one-speaker detection evaluation described in the evaluation plan [10]. The database contains 174 target speakers, of which 74 are male and 100 are female. For training, each speaker has about 1 to 2 minutes of speech. For testing, there are about 2200 test segments, and each is evaluated against 11 hypothesized speakers of the same sex as the segment speaker.

In the front-end processing, a 16-dimensional mel-cepstral vector is extracted from the speech signal every 16 ms using a 32 ms window. Delta-cepstral coefficients are then computed and appended to the cepstral vector to form a 32-dimensional feature vector. Lastly, to make the features more robust to different channel and noise effects, we map the raw features to the standard normal distribution using the feature warping described in [11].

4.2. OCSVM-DD kernel based system

OCSVM is implemented using LIBSVM [12]. Polynomial kernel functions of both degree 2 and degree 3 are tried, and we set the penalty parameter C = 1 (the value that gave the best performance in our experience). For the SVM classification, we use a linear kernel and set C = 1.

In a practical implementation, one optimization of the mapping function $\Phi_d$ is possible. The dimension of the feature space after mapping with $\Phi_d$ is $p^d$, where $p$ is the dimension of the original input space. Since the mapping $\Phi_d$ is ordered, the mapped vector $\Phi_d(x)$ contains many redundant components: many components of $\Phi_d(x)$ are products of the same entries of $x$ in different orders. Using an unordered version of $\Phi_d(x)$ in the computation of $w$ can reduce the dimension of the feature space to about $1/d!$ of its original size.

After $w$ has been computed for all utterances, normalization is desirable to control the variability between the $w$ of different speakers. In our experiments a simple normalization $\hat{w} = (w - \mu) / \sigma$ is used, where $\mu$ is the mean of $w$ over all utterances and $\sigma$ is the standard deviation, both computed separately along each dimension over all utterances.

4.3. Reference systems

4.3.1 GLDS kernel based SVM system

The first reference system is an SVM system with the GLDS kernel. Comparing the GLDS kernel with the OCSVM-DD kernel shows how modeling the input data in the mapping process affects the performance of the classifier. In the experiments, we try GLDS kernels with both degree-2 and degree-3 polynomial expansions. The matrix R in equation (2.4) is trained on the DEVTEST database, and a diagonal matrix is used.
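With a diagonal R, equation (2.4) reduces to a dimension-wise weighted dot product of the averaged expansions of equation (2.3). The Python sketch below illustrates this under stated assumptions: the expansion b(x) is taken to be the monomials up to degree 2 including the constant term, and the diagonal of R is estimated as the mean of the squared expansion entries over a pool of background frames. That estimate is one plausible choice and is not specified in the text. The frames are toy values and all function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def expand(x, degree=2):
    """b(x): monomials of the entries of x up to `degree`, constant term first."""
    terms = [1.0]
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), k):
            terms.append(prod(x[j] for j in idx))
    return terms


def map_utterance(frames, degree=2):
    """Equation (2.3): average of b(x_i) over the frames of one utterance."""
    expanded = [expand(f, degree) for f in frames]
    n = len(expanded)
    return [sum(column) / n for column in zip(*expanded)]


def diagonal_r(background_frames, degree=2):
    """Assumed estimate of diag(R): mean squared expansion entry per dimension."""
    expanded = [expand(f, degree) for f in background_frames]
    n = len(expanded)
    return [sum(v * v for v in column) / n for column in zip(*expanded)]


def glds_kernel(frames_x, frames_y, r_diag, degree=2):
    """Equation (2.4) with diagonal R: sum_k b_x[k] * b_y[k] / R[k][k]."""
    b_x = map_utterance(frames_x, degree)
    b_y = map_utterance(frames_y, degree)
    return sum(a * b / r for a, b, r in zip(b_x, b_y, r_diag))


# Toy two-dimensional "utterances" (illustrative values only).
utt_a = [[1.0, 0.0], [0.0, 2.0]]
utt_b = [[1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]

r_diag = diagonal_r(utt_a + utt_b)
k_ab = glds_kernel(utt_a, utt_b, r_diag)
k_ba = glds_kernel(utt_b, utt_a, r_diag)
```

Because each utterance is reduced to one averaged vector before the kernel is evaluated, the cost of a kernel evaluation does not depend on the number of frames, which is the computational advantage the GLDS mapping is credited with.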
4.3.2 UBM-MAP-GMM system

The other reference system is based on UBM-MAP-GMM [13]. UBM-MAP-GMM represents the state of the art in speaker verification, so a comparison with it tests the validity of our new kernel. In our experiments, Gaussian Mixture Models (GMMs) with 2048 components and diagonal covariance matrices are used. The male and female background models are trained separately on the DEVTEST database, and each target speaker's model is then derived from the corresponding background model according to a MAP criterion [14].

4.4. Results

System                    EER (%)   Min DCF
GLDS kernel (d=2)         14.2      0.068
GLDS kernel (d=3)         11.4      0.061
OCSVM-DD kernel (d=2)     14.0      0.063
OCSVM-DD kernel (d=3)     9.6       0.049
UBM-MAP-GMM               10.5      0.044

Table 1. Experimental results comparing the OCSVM-DD kernel with GLDS kernels and UBM-MAP-GMM, in terms of Equal Error Rate (EER) and minimum DCF.

Figure 2. DET plots comparing the OCSVM-DD kernel with the GLDS kernel and the UBM-MAP-GMM system.

Experimental results are shown in Table 1, and the Detection Error Tradeoff (DET) curves are presented in Figure 2. The metrics are the Equal Error Rate and the Detection Cost Function [10].

From the results we can see that, at the same polynomial degree, the OCSVM-DD kernel is superior to the GLDS kernel in terms of both EER and min DCF (for d=3, 9.6% versus 11.4% EER and 0.049 versus 0.061 min DCF). This supports the claim that modeling the utterance with OCSVM is better than the simple polynomial expansion used in the GLDS kernel. Although the UBM-MAP-GMM system has a lower min DCF (0.044 versus 0.049), our new kernel achieves a better EER (9.6% versus 10.5%). Overall, the performance of the OCSVM-DD kernel based system is comparable to that of the UBM-MAP-GMM system.

5. CONCLUSION

In this paper we present a new OCSVM-DD kernel for SVM speaker verification. By exploiting the good modeling ability of OCSVM, our new kernel outperforms the widely used GLDS kernel and achieves results comparable to those of the UBM-MAP-GMM system. One main drawback of our new method is that it takes a long time to train an OCSVM for each utterance. In future work, we will focus on decreasing the time complexity of OCSVM training while improving, or at least retaining, the performance of our new kernel.

Acknowledgments. This work is supported by National Science Fund for Distinguished Young Scholars 60525202, Program for New Century Excellent Talents in University NCET-04-0545, Key Program of Natural Science Foundation of China 60533040, Zhejiang Provincial Natural Science Foundation (Y106705), and National 863 Plans (2006AA01Z136).

6. REFERENCES

[1] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[2] G. Doddington, M. Przybocki, A. Martin, and D. Reynolds, "The NIST speaker recognition evaluation - Overview, methodology, systems, results, perspective," Speech Communication, vol. 31, no. 2-3, pp. 225-254, 2000.
[3] M. Schmidt and H. Gish, "Speaker identification via support vector classifiers," in Proc. ICASSP, vol. 1, 1996, pp. 105-108.
[4] V. Wan and W. M. Campbell, "Support vector machines for speaker verification and identification," in Proc. Neural Networks for Signal Processing X, 2000, pp. 775-784.
[5] W. M. Campbell, "Generalized linear discriminant sequence kernels for speaker recognition," in Proc. ICASSP, 2002.
[6] T. S. Jaakkola and D. Haussler, "Exploiting generative models in discriminative classifiers," in Advances in Neural Information Processing Systems 11, M. S. Kearns, S. A. Solla, and D. A. Cohn, Eds., MIT Press, 1999.
[7] P. J. Moreno and P. P. Ho, "A new SVM approach to speaker identification and verification using probabilistic distance kernels," in Proc. Eurospeech, Geneva, Switzerland, 2003.
[8] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis, Cambridge University Press, 1998.
[9] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Technical Report MSR-TR-99-87, Microsoft Research.
[10] "The NIST Year 2001 Speaker Recognition Evaluation Plan," http://www.nist.gov/speech/tests/spk/2001/
[11] J. Pelecanos and S. Sridharan, "Feature warping for robust speaker verification," in Proc. Speaker Odyssey 2001 conference, June 2001.
[12] Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[13] F. Bimbot et al., "A tutorial on text-independent speaker verification," EURASIP Journal on Applied Signal Processing, 2004:4, pp. 430-451.
[14] J. L. Gauvain and C. H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Process., vol. 2, 1994, pp. 291-298.