Mathematical and Computer Modelling 52 (2010) 1910-1920

Contents lists available at ScienceDirect

Mathematical and Computer Modelling

journal homepage: www.elsevier.com/locate/mcm

A novel approach to HMM-based speech recognition systems using particle swarm optimization

Negin Najkar a,*, Farbod Razzazi a, Hossein Sameti b

a Department of Electrical Engineering, Faculty of Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran
b Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

a r t i c l e   i n f o

Article history:
Received 23 September 2009
Received in revised form 25 December 2009
Accepted 31 January 2010

Keywords:
Hidden Markov model (HMM)
Particle swarm optimization (PSO)
HMM-based speech recognition
Viterbi algorithm

a b s t r a c t

The main core of HMM-based speech recognition systems is the Viterbi algorithm. The Viterbi algorithm uses dynamic programming to find the best alignment between the input speech and a given speech model. In this paper, dynamic programming is replaced by a search method based on the particle swarm optimization algorithm. The major idea is to generate an initial population of segmentation vectors in the solution search space and to improve the location of segments by an updating algorithm. Several methods are introduced and evaluated for the representation of particles and their corresponding movement structures. In addition, two segmentation strategies are explored. The first is standard segmentation, which tries to maximize the likelihood function for each competing acoustic model separately. In the second, a global segmentation is tied between several models and the system tries to optimize the likelihood using a common tied segmentation. The results show that the effect of these factors is noticeable in finding the global optimum while maintaining the system accuracy. The idea was tested on isolated word recognition and phone classification tasks and shows significant performance in both accuracy and computational complexity.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Hidden Markov model (HMM) is the basis of a set of successful techniques for acoustic modeling in speech recognition systems. The main reasons for this success are the model's analytic ability to describe the speech phenomenon and its accuracy in practical speech recognition systems. Another major feature of HMM is its convergent and reliable parameter training procedure. Spoken utterances are represented as a non-stationary sequence of feature vectors. Therefore, to evaluate a speech sequence statistically, it is required to segment the speech sequence into stationary states. An HMM is a finite state machine. Each state may be modeled as a single Gaussian or a multi-modal Gaussian mixture. Due to the continuous nature of speech observations, continuous density pdfs are often used in this model. The topology of an HMM for speech is considered to be left-to-right to meet the observations arrangement criterion. This left-to-right topology allows transitions from each state to itself and to its right-hand neighbors. HMM parameters are usually estimated in the training phase by maximum likelihood based [1] or discriminative based training algorithms [2,3] using sufficient training data sets. A continuous left-to-right HMM with $N$ states and $M$ mixtures can be stated by its parameter set $\lambda = \{\pi, A, B\}$. $\pi = \{\pi_i\}$ is the initial state distribution matrix, and $A = \{a_{ij}\}$ is the state transition probability distribution matrix. The transition probabilities are defined as follows: $a_{ij} = P[q_{t+1} = j \mid q_t = i]$ is the transition probability from state $i$

* Corresponding author.
E-mail addresses: nnajkar@gmail.com (N. Najkar), razzazi@sr.iau.ac.ir (F. Razzazi), sameti@sharif.edu (H. Sameti).

0895-7177/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.mcm.2010.03.041


Fig. 1. The overall block diagram of an automatic speech recognition system.

at time $t$ to state $j$ at time $t+1$, satisfying the following constraints:

$$a_{ij} \ge 0, \qquad \sum_{j=1}^{N} a_{ij} = 1; \qquad 1 \le i, j \le N. \tag{1}$$

$B = \{b_j(o_t)\}$ is the set of observation probability densities per state, which may be represented by a multi-modal Gaussian mixture model as

$$b_j(o_t) = \sum_{m=1}^{M} C_{jm}\, G(o_t; \mu_{jm}, \Sigma_{jm}) \tag{2}$$

where $C_{jm}$ is the mixture coefficient for the $m$th mixture in state $j$. $C_{jm}$ satisfies the following constraints:

$$C_{jm} \ge 0, \qquad \sum_{m=1}^{M} C_{jm} = 1; \qquad 1 \le j \le N,\ 1 \le m \le M. \tag{3}$$

$G(\cdot)$ is a Gaussian distribution with mean vector $\mu_{jm}$ and covariance matrix $\Sigma_{jm}$.

Fig. 1 shows the overall block diagram of an automatic speech recognition system in the recognition phase. The continuous input speech utterance is segmented into frames by the preprocessing module. In the next step, the feature extraction module extracts a feature vector from each frame to represent its acoustic information. Hence, a discrete sequence of feature vectors (observations), $O = (o_1 o_2 \ldots o_T)$, is obtained. In an utterance classification task with vocabulary size $v$, the unknown input speech is compared with all of the HMMs $\lambda_i$ according to some search algorithm, and finally, the input speech is identified as one of the reference HMMs with the highest score. In most HMM-based systems, the Viterbi algorithm [1] is the core of the recognition procedure. The Viterbi algorithm is a full search method that tries all possible solutions to find the best alignment path of the state sequence between the input utterance and a given HMM. The full search in HMM can be formulated as

$$LL = P(O \mid \lambda) = \max_{q_1 q_2 \ldots q_T} P[q_1 q_2 \ldots q_T,\ o_1 o_2 \ldots o_T \mid \lambda] = \max_{q_1 q_2 \ldots q_T} \Big[ \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t) \Big] \tag{4}$$

where $q_t$ is the state at time $t$. The sequence $q_1 q_2 \ldots q_T$ denotes an alignment of the observation sequence with the speech HMM, and $T$ is the length of the observation sequence. Obviously, as the search space increases, the computational cost increases exponentially with $O(N^T)$; therefore, it is impractical to solve this NP-complete problem by full search. The Viterbi algorithm extracts the alignment path dynamically by a recursive procedure:

$$LL_t(j) = \max_{1 \le i \le N} \big[ LL_{t-1}(i)\, a_{ij} \big]\, b_j(o_t) \tag{5}$$

where $LL_t(j)$ is the partial cost function of the alignment path in state $j$ at time $t$, and $LL_{t-1}(i)$ is the score of the best path among all paths that start from the first state and end in the $i$th state at time $t-1$. Fig. 2 shows a Viterbi trellis diagram in which the horizontal axis represents the time axis of the input utterance and the vertical axis represents the possible states of the reference HMM.

The computational complexity of this method is $O(N^2 T)$. Although the Viterbi algorithm saves computational cost and memory compared with the full search, it is only practical when the length of the input utterance is short and the number of HMM reference models is small. In particular, for continuous speech recognition, this is usually not the case. Hence, to overcome this deficiency, the Viterbi beam search [4] has been presented. The main idea of beam search is to keep and extend only the paths with higher scores. This approach may sacrifice the optimality of the algorithm.
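The recursion in Eq. (5) is usually evaluated in the log domain, where the products become sums. The following is a minimal Python sketch, not the authors' Matlab implementation; the `log_a`/`log_b` array layout and the single start state are assumptions for illustration.

```python
import math

NEG_INF = float("-inf")

def viterbi_loglik(log_a, log_b):
    """Log-domain Viterbi recursion of Eq. (5).

    log_a[i][j] : log transition probability from state i to state j
    log_b[t][j] : log observation probability of frame t in state j
    All paths start in state 0 (the first state of a left-to-right HMM).
    Returns the best alignment-path log-likelihood over all end states.
    """
    N = len(log_a)
    T = len(log_b)
    # Initialization: only the first state is reachable at t = 0.
    ll = [log_b[0][j] if j == 0 else NEG_INF for j in range(N)]
    # Recursion: LL_t(j) = max_i [LL_{t-1}(i) + log a_ij] + log b_j(o_t)
    for t in range(1, T):
        ll = [max(ll[i] + log_a[i][j] for i in range(N)) + log_b[t][j]
              for j in range(N)]
    return max(ll)
```

The nested loops make the $O(N^2 T)$ cost of the method visible: $N^2$ candidate transitions are scored at each of the $T$ frames.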


Fig. 2. Viterbi trellis diagram.

Recently, evolutionary algorithms (EAs) have been extended to speech recognition problems. However, there is little research on using these algorithms in the recognition phase of HMM-based recognizers; most studies have focused on the training phase. EAs are based on generating a random population of possible solutions and using a collaborative search in each generation to achieve better solutions than the previous ones. In HMM training, genetic algorithms (GAs) [5-9] and particle swarm optimization (PSO) [10-12] have been studied in recent years, where each individual solution is represented as an HMM and is encoded as a string of real numbers. The studies have revealed that PSO can yield better recognition performance and more capability to find the global optimum in comparison with GA and the well-known Baum-Welch algorithm. These algorithms have also been applied to optimizing the nonlinear time alignment of template-based speech recognition in the recognition phase [13,14]. In these works, to solve the optimal warping path search problem, each potential solution is considered as a possible warping path in the search space. It was shown that using PSO with a pruning strategy causes a considerable reduction in recognition time while maintaining the system accuracy. In contrast, using a direct GA without pruning is not a promising approach. PSO has been used to solve many NP-complete optimization problems [15,16].

In this paper, a novel approach is proposed that applies a particle swarm optimization strategy in the recognition phase of a speech recognition system instead of the traditional Viterbi algorithm, in order to examine PSO performance in finding the globally optimal segmentation. Preliminary results of this work were reported in [17]. To explore the performance of the proposed system, experiments were conducted on isolated word recognition and on stop consonant phones. Stop consonant classification is one of the most challenging tasks in speech recognition. In addition, a new classification method based on a tied segmentation strategy is introduced. The method can be generalized to the continuous speech recognition case.

The remainder of this paper is organized as follows. The next section provides the details of the proposed PSO-based recognition procedure. Section 3 presents the experimental results, and the last section concludes the paper.

2. PSO trellis recognition approach

The particle swarm optimization algorithm was originally introduced by Eberhart and Kennedy [18] in 1995. PSO is a population-based evolutionary algorithm which is initialized with a population of random solutions (particles) that are then flown through the hyperspace. To solve an optimization problem with the PSO algorithm, the problem should be appropriately modeled and mapped to PSO notation before the start of the optimization. Each particle can be represented as a multidimensional vector $X_i = (x_{i1}\, x_{i2} \ldots x_{in})$, where the vector elements and the fitness function of each particle are determined by the problem domain. In each iteration, each particle keeps track of the coordinates in the hyperspace associated with the best solution it has achieved so far, called $pbest$: $pbest_i = (p_{i1}\, p_{i2} \ldots p_{in})$. It also keeps the overall best location obtained thus far by any particle in the population, called $gbest$: $gbest = (g_1\, g_2 \ldots g_n)$. The position of each particle changes toward $pbest$ and $gbest$ based on the particle velocity, which is obtained in the next iteration as follows:

$$V_i^{k+1} = \omega V_i^k + \alpha r_1 (pbest_i^k - X_i^k) + \beta r_2 (gbest^k - X_i^k) \tag{6}$$

$$X_i^{k+1} = X_i^k + V_i^{k+1} \tag{7}$$

where $\omega$ is the inertia weight, and $\alpha$ and $\beta$ are constants which guide the particles toward the improved positions. $r_1$ and $r_2$ are uniformly distributed random variables between 0 and 1, and $k$ denotes the evolution iteration. The PSO algorithm is terminated


Fig. 3. The concept of modification of searching points in the solution space.

Fig. 4. An example of a four-state trellis diagram with the corresponding segmentation vector and state sequence vector representations.

after a maximal number of iterations or when the best particle position of the entire swarm cannot be improved further after a sufficient number of iterations. Fig. 3 shows the concept of modification of searching points through iterations.
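The standard update of Eqs. (6) and (7) can be sketched as one swarm iteration. This is a generic PSO step, not the paper's recognizer; the parameter values `w`, `c1`, `c2` are common defaults chosen here for illustration.

```python
import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One iteration of the standard PSO update of Eqs. (6)-(7).

    positions, velocities, pbest : lists of particle vectors (updated in place)
    gbest : best position found by the whole swarm so far
    w is the inertia weight; c1 and c2 scale the random pulls toward
    each particle's pbest and the swarm's gbest, respectively.
    """
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            r1, r2 = random.random(), random.random()
            # Eq. (6): inertia + cognitive pull + social pull
            v[d] = (w * v[d]
                    + c1 * r1 * (pbest[i][d] - x[d])
                    + c2 * r2 * (gbest[d] - x[d]))
            # Eq. (7): move the particle by its new velocity
            x[d] += v[d]
    return positions, velocities
```

A driver loop would re-evaluate the fitness after each step and refresh `pbest` and `gbest` before calling `pso_step` again.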

2.1. Defining particles

To evaluate the performance of the PSO-based recognizer, two methods of defining particles were applied. In the first method (SS), each particle is one of the allowable state sequences in the trellis of the problem's solution space. The trellis diagram, as shown in Fig. 4, is a state transition diagram which represents all of the possible states over a sequence of time intervals.

An allowable path is specified by two properties. First, the path should be left-to-right in its sequence of states (i.e. as time increases, the state index does not decrease). Second, each path starts from the first state of the HMM and may (or may not) reach the final state.

However, as we will show in the next section, this method of defining the particles leads to a local optimum solution, even though the same technique was used in [14] and provided good performance; this indicates the importance of choosing the particle representation according to the problem.

The major idea of the second method (Segment) is to correct and improve the segmentation of the utterance. This definition of particles gives better results in finding the global optimum solution. In this procedure, each particle is a segmentation vector; that is, its components are the transition locations from one segment to the next. Hence, the length of the particle vectors is reduced and equals the number of the corresponding HMM segments. Consequently, due to the reduction of the search space, the exploratory search may be performed better. Repeated elements in the segmentation vector represent jumps in the state sequence. These repeated elements keep the path vector at a constant length, which is essential for defining a proper movement of particles.


Table 1
Movement structure.

Condition                                        Action
$x < \alpha$                                     Move in the direction of $pbest$
$\alpha \le x < \alpha + \beta$                  Move in the direction of $gbest$
$\alpha + \beta \le x < \alpha + \beta + \gamma$ Move in a random direction
Else                                             Stay at the current position

In the first type of particle definition, the length of the path vector $X_i = (q_{i1}\, q_{i2} \ldots q_{it} \ldots q_{iT})$ is equal to the duration of the utterance. In the second type, the length of each segmentation vector $X_i = (s_{i1}\, s_{i2} \ldots s_{in} \ldots s_{iN})$ is equal to the number of reference HMM states. Fig. 4 shows both types of particle generation.
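Under one plausible convention, assumed here purely for illustration (each component of the segmentation vector stores the last frame occupied by its state, with repeated values encoding skipped states), expanding a segmentation vector into its frame-level state sequence looks as follows:

```python
def segments_to_states(seg, T):
    """Expand a segmentation vector into a frame-level state sequence.

    seg[n] : last frame index (exclusive, 0-based) occupied by state n;
             a repeated value means the corresponding state is skipped
             (a "jump" in the state sequence).
    T      : utterance length in frames; seg must be non-decreasing
             and its last element must equal T.
    """
    states, start = [], 0
    for n, end in enumerate(seg):
        states.extend([n] * (end - start))  # state n covers frames start..end-1
        start = end
    assert len(states) == T, "segmentation must cover the whole utterance"
    return states
```

For example, `segments_to_states([3, 3, 5], 5)` assigns frames 0-2 to state 0, skips state 1, and assigns frames 3-4 to state 2.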

2.2. Fitness function

To evaluate each particle in both of the mentioned methods, a single fitness function is used. However, the particles that are represented as segmentation vectors must first be converted to their corresponding state sequences to calculate the fitness. We can rewrite Eq. (4) in the logarithmic domain over one of the possible paths as a fitness function:

$$LL(X_i) = \log(b_1(o_1)) + \sum_{t=2}^{T} \big\{ \log(a_{q_{i(t-1)} q_{it}}) + \log(b_{q_{it}}(o_t)) \big\} \tag{8}$$

where $LL(X_i)$ is the $i$th particle's log-likelihood, $a_{q_{i(t-1)} q_{it}}$ is the transition probability from state $q_{i(t-1)}$ to state $q_{it}$, and $b_{q_{it}}(o_t)$ is the probability of producing the observation $o_t$ in state $q_{it}$. Since all paths start from the first state in the search space, the first element of the $\pi$ matrix is one (in the logarithmic domain it is zero). Therefore, the first term of Eq. (4) has been neglected in Eq. (8).
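Given a frame-level state sequence, Eq. (8) is a single pass over the frames. A minimal sketch, assuming 0-based state indices and precomputed log-probability tables (the array layout is an assumption for illustration, not the authors' code):

```python
import math

def fitness(states, log_a, log_b):
    """Log-likelihood fitness of Eq. (8) for one particle.

    states      : frame-level state sequence q_1 ... q_T (0-based indices)
    log_a[i][j] : log transition probability from state i to state j
    log_b[t][j] : log observation probability of frame t in state j
    The initial-state term is dropped: every path starts in the first
    state, whose log initial probability is zero.
    """
    ll = log_b[0][states[0]]
    for t in range(1, len(states)):
        ll += log_a[states[t - 1]][states[t]] + log_b[t][states[t]]
    return ll
```

Note that this costs $O(T)$ per particle, which is where the $O(M \cdot I \cdot T)$ total of Section 2.5 comes from.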

2.3. Movement structures

To update the position of a particle at each time instant, two methods are commonly applied. In the first method (LCPSO), the particle movement is modified as follows:

$$V_i^{k+1} = \alpha\, (pbest_i^k - X_i^k) + \beta\, (gbest^k - X_i^k) + \gamma\, (randX_i^k - X_i^k) \tag{9}$$

$$X_i^{k+1} = X_i^k + V_i^{k+1}. \tag{10}$$

This method is similar to the standard PSO algorithm mentioned above; however, we added a random term to make the algorithm robust against local optima, and we omitted the inertia weight that scales the velocity of the previous time step. We can rewrite Eqs. (9) and (10) as

$$X_i^{k+1} = \big(1 - (\alpha + \beta + \gamma)\big) X_i^k + \alpha\, pbest_i^k + \beta\, gbest^k + \gamma\, randX_i^k. \tag{11}$$

The coefficients $\alpha$, $\beta$ and $\gamma$, which scale the influence of $pbest$, $gbest$ and the random term, should satisfy the following constraints:

$$\alpha + \beta + \gamma < 1; \qquad \alpha, \beta, \gamma > 0. \tag{12}$$

These conditions guarantee the consistency of the particles over the generations, so that each represents a left-to-right continuous warping path; when the velocity update rule of standard PSO is used for this problem, it is possible to generate a discontinuous path. This method still leads to a local optimum, while the second method (ProbPSO) is a solution to avoid this problem. We used this updating method before in [14]; it is explained in Table 1, where $\alpha$, $\beta$, $\gamma$ are three constant parameters that have been introduced to move a particle. A movement is carried out according to the value of a uniformly distributed random parameter $x$ between zero and one.
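Table 1 can be sketched as a single draw of a uniform random number that selects the movement target. The one-frame step toward the chosen target is a hypothetical choice made here for illustration (the text does not specify the step size), and `alpha`, `beta`, `gamma` are assumed to already be scaled to the unit interval.

```python
import random

def prob_move(x, pbest_i, gbest, alpha, beta, gamma, rand_pos):
    """Probabilistic particle movement of Table 1 (ProbPSO).

    Draws u ~ U(0, 1) and moves the particle toward pbest, toward gbest,
    toward a random position, or keeps it in place. All positions are
    segmentation vectors of equal length.
    """
    u = random.random()
    if u < alpha:
        target = pbest_i             # move in the direction of pbest
    elif u < alpha + beta:
        target = gbest               # move in the direction of gbest
    elif u < alpha + beta + gamma:
        target = rand_pos            # move in a random direction
    else:
        return x[:]                  # stay at the current position
    # hypothetical step rule: shift each coordinate one frame toward the target
    return [xi + (1 if ti > xi else -1 if ti < xi else 0)
            for xi, ti in zip(x, target)]
```

Because only one target is chosen per move, the update touches a single direction per iteration, which is also why the authors report a lower cost than the standard PSO update.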

2.4. Segmentation strategies

In the tied segmentation method, a new method for classification is presented. In the private segmentation method (Fig. 1), to assess the likelihood of the input speech against each of the classes, the PSO-based algorithm is simulated separately for each class. In each class, the particles modify their positions using the $gbest$ obtained for the corresponding class in every iteration. However, in the tied segmentation method, a global segmentation is achieved by comparing the particle scores of all classes. The particles are updated in the direction of this global segmentation. This comparison causes the score of the true class containing the global segmentation to increase over the generations, and the scores of the other classes under this segmentation to decrease. The Viterbi algorithm is not able to provide this flexibility.
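The tied strategy amounts to choosing one shared $gbest$ across all class swarms instead of one per class. A conceptual sketch follows; the data layout (one particle list and one score list per class) is an assumption for illustration.

```python
def tied_gbest(swarms, fitnesses):
    """Pick the shared gbest of the tied segmentation strategy.

    swarms[c]    : list of particle positions (segmentation vectors) for class c
    fitnesses[c] : list of current particle scores for class c
    Instead of a per-class gbest, every class is then updated toward the
    single best segmentation found across *all* classes.
    """
    best_class = max(range(len(swarms)), key=lambda c: max(fitnesses[c]))
    best_idx = max(range(len(swarms[best_class])),
                   key=lambda i: fitnesses[best_class][i])
    return swarms[best_class][best_idx]  # shared gbest for all classes
```

Each iteration would recompute the per-class scores of this shared segmentation, so the true class's score rises over the generations while the others fall.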

2.5. Computational complexity

The computational complexity of Segment-LCPSO is similar to that of the standard PSO algorithm. The Segment-LCPSO algorithm gets trapped in a local optimum; furthermore, it is a very slow process for this recognition task. However, in the


Table 2
Implementation parameters of the isolated word recognizer.

Test words: /water/, /like/, /year/, /wash/, /greasy/, /dark/, /carry/, /oily/
#States: 4    #Mixtures: 9    Frame shift (ms): 12    Frame length (ms): 20

Table 3
Implementation parameters of the phone classifier.

Test stop phones: /b/, /d/, /g/, /p/, /t/, /k/
#States: 3    #Mixtures: 16    Frame shift (ms): 10    Frame length (ms): 16

proposed Segment-ProbPSO algorithm, due to its probability-based movement structure, the computational complexity is less than that of the standard PSO algorithm. To save the computational cost of generating the initial particles, we produce a pre-saved look-up table of possible particles. In addition, we re-use the look-up table to move the particles in a random direction. In summary, most of the calculations are focused on the fitness function computation (Eq. (8)). Since each particle is evaluated at every iteration, and the summations are proportional to the length of the input speech, the computational order of the proposed Segment-ProbPSO is $O(M \cdot I \cdot T)$, where $M$ is the population size, $I$ is the required number of generations and $T$ is the length of the observation vector. It outperforms the classic Viterbi algorithm if the following constraint is satisfied: $M \cdot I < N^2$.

3. Experimental results

3.1. Experimental setup

To evaluate the performance of the idea in an utterance classifier, a set of experiments was conducted on the eight most frequently occurring words of the standard TIMIT speech database, which are presented in Table 2. Each word occurs more than 460 times in the training set and more than 160 times in the test set. Although TIMIT is a continuous speech recognition benchmark, the variety of words and speakers in TIMIT makes it a good benchmark for our task.

In addition, the idea was evaluated on a stop consonant phone classifier for the six TIMIT stop phones presented in Table 3. We eliminated the phones shorter than three frames from the test set. The total number of phones in the resulting test set for the six stop phones was 5184. The overall block diagram of both HMM-based recognition systems is similar to Fig. 1. The Baum-Welch algorithm was applied to train a continuous density HMM for each word and phone. The numbers of states in each word and phone model were assumed to be four and three, respectively. There are 9 mixtures per state in the word models and 16 mixtures per state in the phone models.

In the preprocessing stage of both systems, the audio signal is transformed into 26-dimensional MFCC feature vectors. In the word recognizer, feature vectors are extracted from 20 ms windows of the utterance using overlapping sliding frames with an 8 ms overlap. In contrast, in our phone classification test bed, the preprocessor produced the feature vectors every 10 ms from 16 ms windows. The first 12 features are based on 25 mel-scaled filter bank coefficients, the 13th element is a log energy coefficient, and the 13 remaining features are their first derivatives. The tests were implemented in Matlab 7.6. A summary of implementation parameters is given in Tables 2 and 3, and Fig. 5 shows the block diagram of our system.

3.2. Isolated word recognizer results

Figs. 6 and 7 describe the overall behavior of the suggested system. Fig. 6 shows the effect of the particle definition on the convergence ratio with respect to the Viterbi path likelihood. When the particles are represented as state sequence vectors, the initial step starts from lower ratios. In addition, the recognition system is easily trapped in a local optimum.

The experiments show that if the particles are defined as segmentation vectors and the movement update follows the second method of Section 2.3, the probability of finding the global optimum and approaching the Viterbi path likelihood increases. Therefore, Segment-ProbPSO is chosen as the baseline method in all of the experiments, and the optimizations have also been performed using this method.

Fig. 7 shows the effect of the movement structure. Although both curves start from a common point, their convergence rates are different. Therefore, if the particle movement over the generations is defined by a probabilistic structure, it is more probable to find the global optimum.

The results in Table 4 reveal that the Viterbi and Segment-ProbPSO algorithms have equal error rates on average. Although we used the Viterbi recognition procedure as the benchmark for our system, the proposed method provides better results in some cases. This observation indicates the major drawback of the traditional recognition process, which makes the decision by comparing the best paths between the unknown uttered word and the given word models.

Fig. 8 shows an example comparison between the Viterbi and Segment-ProbPSO recognition processes over 10 iterations. The unknown input utterance belongs to word model 1. This test sample is recognized correctly by the Viterbi algorithm, while


Fig. 5. Speech recognizer block diagram.

[Plot: (LL.Viterbi/LL.PSO) × 100 versus iteration, for Segment-ProbPSO and SS-ProbPSO.]

Fig. 6. The influence of the particle definition on the convergence percentage to the Viterbi path likelihood.

by the proposed algorithm it is recognized as the second word model after 10 iterations. However, it is obvious that, after more generations, the system achieves the correct result. Therefore, more iterations were required for obtaining sufficient


[Plot: (LL.Viterbi/LL.PSO) × 100 versus iteration, for Segment-ProbPSO and SS-LCPSO.]

Fig. 7. The influence of the movement construction on the convergence percentage to the Viterbi path likelihood.

Table 4
Comparison of error rates.

#Reference words    Viterbi error rate (%)    Segment-ProbPSO error rate (%)
2                   0.86                      0.86
4                   0.58                      0.44
6                   0.87                      0.66
8                   0.73                      0.73

convergence. In most cases, the difference between the likelihoods of the competing models is large, and the desired accuracy is reached in the first few iterations.

Figs. 9 and 10 report the system optimization results. Fig. 9 shows the influence of the $\alpha$, $\beta$ and $\gamma$ coefficients on the recognition error rate under fixed conditions for eight reference word models in 20 iterations. The optimum value of $\alpha$ is 5 for $\beta = 15$ and $\gamma = 15$. The optimum value of $\beta$ is 5 for $\alpha = 5$, $\gamma = 15$, and the optimum value of $\gamma$ is 10 for $\alpha = 5$ and $\beta = 5$.

Fig. 10 shows the effect of the population size on the overall error rate. The computational cost increases with the population size. Therefore, the optimum population size is the point where the curve saturates. This value is about eight particles in the empirical curve depicted in Fig. 10.

3.3. Phone recognition results

Considering the good performance of Segment-ProbPSO in the isolated word recognition system, phone classification experiments were conducted using Segment-ProbPSO with the optimum values of $\alpha$, $\beta$ and $\gamma$ obtained in the previous section. The results are depicted in Fig. 11 and Table 5, and they are compatible with the stop recognition results in the literature [19].

Fig. 11 shows the outstanding performance of this recognizer, which is even better than that of the PSO-based isolated word recognizer in achieving the global optimum. Furthermore, it is evident that, already in the initial iteration, the $gbest$ likelihood values of the phone classes get close to Viterbi's best path value. In addition, the system error rate in the first iteration, as shown in Table 5, is 32.45%, which differs only slightly from the 30.29% error rate of the baseline system based on the Viterbi recognition procedure. Therefore, with a few additional iterations, the system can easily achieve the desired accuracy.

Table 5 shows the variation of the system error rate for different population sizes and iteration counts. If we neglect a 1% or 2% difference in error rate, we can claim that, in the phone classification task, the computational cost of the proposed algorithm is almost equal to or even less than that of the Viterbi algorithm.

3.4. Tied segmentation method results

The results of the tied segmentation method in Table 6 show that both classifier types have almost the same performance. However, with this method, the convergence rate toward the desired accuracy is higher than with the previously proposed methods.


[Four panels: negative log-likelihood versus iteration (0-12) for reference word models 1-4, each comparing Segment-ProbPSO against the Viterbi path likelihood.]

Fig. 8. An example of comparison of likelihood values versus number of iterations for the Viterbi and Segment-ProbPSO methods for four reference word models.

[Plot: error rate (%) versus coefficient value (5-50), with curves for the α, β and γ influences.]

Fig. 9. The influence of $\alpha$, $\beta$ and $\gamma$ on the recognition error rate for eight reference word models in 20 iterations.

[Plot: error rate (%) versus initial population size (0-18).]

Fig. 10. The effect of the population size on the recognition error rate for 20 iterations and $\alpha = \beta = 5$ and $\gamma = 10$.


[Plot: (LL.Viterbi/LL.PSO) × 100 versus iteration (0-40).]

Fig. 11. Convergence percentage of Segment-ProbPSO's gbest likelihood values to the Viterbi path likelihood.

Table 5
Phone classifier error rates (%).

#Particles    #Iterations
              1        2        3        4        5        6
1             39.12    36.94    35.63    34.88    34.22    33.12
2             36.65    34.86    33.72    33.04    32.64    32.31
3             35.38    33.35    32.48    31.85    31.58    31.46
4             34.26    32.54    31.98    31.69    31.27    31.15
5             33.04    32.37    31.69    31.19    30.92    30.90
6             32.45    31.79    31.53    30.74    30.63    30.55

Table 6
Phone classifier error rates (%) with the tied segmentation method.

#Particles    #Iterations
              1        2        3        4        5        6
1             39.12    37.58    36.34    34.68    33.97    33.02
2             36.65    34.95    33.31    32.91    32.43    31.91
3             35.38    33.24    32.48    31.73    31.60    31.20
4             34.26    32.20    31.46    31.15    31.02    30.94
5             33.04    31.83    31.15    30.90    30.64    30.22
6             32.45    31.13    30.83    30.63    30.55    30.13

4. Conclusion and future work

Although there are several methods for speech recognition, it is still an open problem due to the lack of a fast and accurate algorithm. In this paper, a new recognition approach was introduced based on particle swarm optimization. The major idea of this approach is to generate an initial population of segmentation vectors in the solution search space and to correct and improve the location of the segments. The algorithm was tested on both isolated word speech recognition and phone classification tasks. The experimental results show that this idea successfully moves toward the global optimum while maintaining the accuracy of the Viterbi system.

Considering the computational complexity of the PSO-based recognition procedure and its pruning capability before achieving the best path, it seems that this method could be well employed in continuous speech recognition tasks. Therefore, we are pursuing our research on continuous speech recognition.

References

[1] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257-286.
[2] S. Mizuta, K. Nakajima, A discriminative training method for continuous mixture density HMMs and its implementation to recognize noisy speech, Journal of the Acoustical Society of Japan 13 (6) (1992) 389-393.
[3] Q.Y. Hong, S. Kwong, A training method for hidden Markov model with maximum model distance and genetic algorithm, in: Proceedings of the IEEE International Conference on Neural Networks and Signal Processing, 2003, pp. 465-468.
[4] H. Ney, Dynamic programming parsing for context free grammars in continuous speech recognition, IEEE Transactions on Signal Processing 39 (2) (1991) 336-340.
[5] S. Kwong, C.W. Chau, K.F. Man, K.S. Tang, Optimisation of HMM topology and its model parameters by genetic algorithms, Pattern Recognition 34 (2) (2001) 509-522.
[6] C.W. Chau, S. Kwong, C.K. Diu, W.R. Fahrner, Optimization of HMM by a genetic algorithm, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, vol. 3, 1997, pp. 1727-1730.
[7] F. Yang, C. Zhang, G. Bai, A novel genetic algorithm based on tabu search for HMM optimization, in: Proceedings of the 4th International Conference on Natural Computation, vol. 4, 2008, pp. 57-61.
[8] S. Kwong, Q.H. He, K.W. Ku, T.M. Chan, K.F. Man, K.S. Tang, A genetic classification error method for speech recognition, Signal Processing 82 (5) (2002) 737-748.
[9] P. Bhuriyakorn, P. Punyabukkana, A. Suchato, A genetic algorithm-aided hidden Markov model topology estimation for phoneme recognition of Thai continuous speech, in: Proceedings of the 9th International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2008, pp. 475-480.
[10] L. Xue, J. Yin, Z. Ji, L. Jiang, A particle swarm optimization for hidden Markov model training, in: Proceedings of the 8th International Conference on Signal Processing, vol. 1, 2006, pp. 791-794.
[11] H. Sajedi, H. Sameti, H. Beigy, B. Babaali, Discriminative training of hidden Markov model using PSO algorithm, in: Proceedings of the 12th Annual International CSI Computer Conference, 2007, pp. 295-302.
[12] F. Yang, C. Zhang, T. Sun, Comparison of particle swarm optimization and genetic algorithm for HMM training, in: Proceedings of the International Conference on Pattern Recognition, 2008, pp. 1-4.
[13] S. Kwong, C.W. Chau, W.A. Halang, Genetic algorithm for optimizing the nonlinear time alignment of automatic speech recognition systems, IEEE Transactions on Industrial Electronics 43 (5) (1996) 559-566.
[14] S. Rategh, F. Razzazi, A.M. Rahmani, S.O. Gharan, A time warping speech recognition system based on particle swarm optimization, in: Proceedings of the International Conference on Modeling and Simulation, 2008, pp. 585-590.
[15] Y. Shi, R.C. Eberhart, A modified particle swarm optimizer, in: Proceedings of the IEEE International Conference on Evolutionary Computation, IEEE Press, Piscataway, NJ, 1998, pp. 69-73.
[16] J. Yan, H. Tieson, H. Chongchao, W. Xianing, A modified particle swarm optimization algorithm, in: Proceedings of the IEEE International Conference on Computational Intelligence and Security, 2006, pp. 421-424.
[17] N. Najkar, F. Razzazi, H. Sameti, A novel approach to HMM-based speech recognition system using particle swarm optimization, in: Proceedings of the IEEE International Conference on Bio-Inspired Computing: Theories and Applications, 2009, pp. 1-6.
[18] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks, IEEE, Piscataway, NJ, 1995, pp. 1942-1948.
[19] P.K. Ghosh, S.S. Narayanan, Closure duration analysis of incomplete stop consonants due to stop-stop interaction, Journal of the Acoustical Society of America 126 (1) (2009) EL1-EL7.
