SCYR 2010 - 10th Scientific Conference of Young Researchers, FEI TU of Košice

Speech recognition using classifiers based upon hidden Markov models

1 Radoslav Bučko, 2 Ján Molnár

1,2 Dept. of Theoretical Electrotechnics and Electrical Measurement, FEI TU of Košice, Slovak Republic

1 radoslav.bucko@tuke.sk, 2 jan.molnar@tuke.sk


Abstract: This paper describes the recognition of spoken speech and the problems connected with it, the basic parts of a recognizer, and especially classifiers based on statistical methods called hidden Markov models. Hidden Markov models are described using an example.





Keywords: speech recognition, classifiers, hidden Markov model.


I. INTRODUCTION

Communication using verbal speech is the most basic, most natural and most important form of information transfer between people. If a computer or other device is to be commanded by voice, the acquisition of the acoustic signal from the speaker, voice recognition and understanding of the information have to be solved technically and algorithmically. This problem has been worked on since the last century and is still not finished. The main reasons are:

a) A speaker's voice can differ under various conditions. Stress, illness, loudness of voice and even ageing change the voice signal. Coarticulation changes the phonetic properties of the beginning and end of a word, depending upon the context with other words.

b) Different speakers have different voices. Every speaker has a different voice colour, accent, speed of speech and more. Voice recognition is therefore divided into two groups: speaker dependent and speaker independent.

c) A changing environment causes trouble for speech recognition. With increased levels of interference, the identification of the beginning and end of a word is more difficult.

d) The recorded voice can be degraded by the quality of the microphone or by the distance from it. [1]



Voice recognition software consists of two main parts. The first part is signal processing, which results in a sequence of observations (mostly vectors). The second part is the classifiers, which assign the most suitable word from the dictionary to each sequence. Based upon the number of words in the dictionary there are two main groups: recognition with a small vocabulary and with a large vocabulary.



From the point of view of the applied methods we can divide classifiers into:

a) classifiers which process the word as a whole and assign it to the class nearest to the example image; this distance is usually defined by applied methods of dynamic programming,

b) classifiers which use classification based upon statistical methods; words are modeled using so-called hidden Markov models. [2]

II. HIDDEN MARKOV MODELS

A Markov process G with a hidden Markov model can be expressed using the quintuple:

$$G = (S, O, A, B, \pi) \qquad (1)$$



Fig. 1. Markov model with 3 states and given transition probabilities


where:

- S = {s_1, s_2, …, s_N} is the set of individual states of the Markov model,

- O = {o_1, o_2, …, o_L} is the alphabet of L output symbols of the vector quantizer,

- A = [a_ij] is the transition matrix; its elements define the probability of the system crossing from state s_i at time t to state s_j at time t+1. We can say that:

$$a_{ij} = P\left(s_{t+1} = s_j \mid s_t = s_i\right), \qquad 1 \le i, j \le N,$$

- B = [b_jl] = [b_j(l)] is the matrix of output probabilities. It determines the probability of generating the l-th entry of the finite set of spectral examples when the system is in state s_j. We can say that:

$$b_{jl} = b_j(l) = P\left(o_t = o_l \mid s_t = s_j\right), \qquad 1 \le l \le L,\; 1 \le j \le N,$$

- π = [π_i] is the column vector of initial state probabilities. We can say that:

$$\pi_i = P(s_1 = s_i), \qquad 1 \le i \le N.$$


For the parameters π_i, a_ij and b_j(l) the following conditions hold:

$$\sum_{i=1}^{N} \pi_i = 1,$$

$$\sum_{j=1}^{N} a_{ij} = 1 \qquad \text{for } i = 1, \dots, N,$$

$$\sum_{l=1}^{L} b_j(l) = 1 \qquad \text{for } j = 1, \dots, N.$$ [3]


We will denote the model as λ = (A, B, π).


We have 3 enclosed opaque containers with an opening for the arm. These containers represent the individual states of the Markov model (set S): s1, s2 and s3. In every container there are 7 balls labelled A, B, C and D. These balls represent the alphabet of four output symbols of the vector quantizer: o1 = A, o2 = B, o3 = C, o4 = D. [3]

There is a rotary arrow on every container, which points to 3 different parts of a circle labelled s1, s2 and s3. The arrow represents the randomness of the choice of the container from which the next ball will be drawn. Every container has its own arrow because the probabilities differ.


The probability of crossing from state s1 to state s1 (picking a ball from container s1) is a11 = 0.6 (60%).

Similarly:

a12 = 0.2, a13 = 0.2,  a11 + a12 + a13 = 1,
a22 = 0.6, a21 = 0.3, a23 = 0.1,  a22 + a21 + a23 = 1,
a33 = 0.7, a31 = 0.1, a32 = 0.2,  a33 + a31 + a32 = 1,














$$a_{ij} = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.3 & 0.6 & 0.1 \\ 0.1 & 0.2 & 0.7 \end{pmatrix}$$


There is a 100% probability that we can get somewhere from every state, either by picking a ball or by crossing to another container.


In container s1 there are 2 balls labelled A, so the probability of picking ball A is b1(A) = 2/7.

Similarly:

b1(B) = 2/7, b1(C) = 1/7, b1(D) = 2/7,  b1(A) + b1(B) + b1(C) + b1(D) = 1,
b2(A) = 3/7, b2(B) = 1/7, b2(C) = 2/7, b2(D) = 1/7,  b2(A) + b2(B) + b2(C) + b2(D) = 1,
b3(A) = 2/7, b3(B) = 3/7, b3(C) = 1/7, b3(D) = 1/7,  b3(A) + b3(B) + b3(C) + b3(D) = 1,















$$b_{jl} = \begin{pmatrix} 2/7 & 2/7 & 1/7 & 2/7 \\ 3/7 & 1/7 & 2/7 & 1/7 \\ 2/7 & 3/7 & 1/7 & 1/7 \end{pmatrix}$$
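The example model can be written down directly in code. The following minimal Python/NumPy sketch (illustrative, not from the paper) collects the matrices above; the uniform initial vector pi is an assumption, since the example does not specify the starting-container probabilities:

    import numpy as np

    # Transition matrix A: A[i, j] = probability of crossing from
    # container s_(i+1) to container s_(j+1).
    A = np.array([[0.6, 0.2, 0.2],
                  [0.3, 0.6, 0.1],
                  [0.1, 0.2, 0.7]])

    # Emission matrix B: B[j, l] = probability of picking ball l in
    # container s_(j+1); columns are ordered A, B, C, D.
    B = np.array([[2, 2, 1, 2],
                  [3, 1, 2, 1],
                  [2, 3, 1, 1]]) / 7.0

    # Initial-state vector pi: assumed uniform, since it is not given in the text.
    pi = np.full(3, 1.0 / 3.0)

    # Every row of A and B must sum to 1, as the conditions above require.
    assert np.allclose(A.sum(axis=1), 1.0)
    assert np.allclose(B.sum(axis=1), 1.0)

The assertions simply verify the normalization conditions stated earlier.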


Every time, only one ball is picked up and then it is returned. A starting container is chosen and one ball is picked; that ball is then returned. The sequence of states and picks is recorded:

- state: s1,
- choice: C.



Fig. 2. Markov model for the choice of the first ball


The arrow is spun and the result determines whether a ball will be picked up or whether we cross to the next container. The probability of picking a ball is 3 times higher than the probability of crossing to the next container, so the sequence will be:

- s1, s1, s1, s1,
- C, A, C, D.

Next a ball is picked and, after spinning, we cross to s2 (fig. 3). The sequence:

- s1, s1, s1, s1, s1, s1,
- C, A, C, D, D, C.



Fig. 3. Markov model for crossing to s2



A pick from s2 (fig. 4):

- s1, s1, s1, s1, s1, s1, s2,
- C, A, C, D, D, C, A.


Fig. 4. Markov model, picking from s2


Repeating this process, we arrive at the final form of the sequence of states and choices:

- s1, s1, s1, s1, s1, s1, s2, s2, s2, s2, s2, s3, s3, s3, s3, s3, s3, s1,
- C, A, C, D, D, C, A, C, C, C, A, A, A, A, A, C, B, C.
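The ball-drawing procedure just described is exactly how a hidden Markov model generates data. Below is a short sketch of the generative process, reusing the A, B and pi arrays from the previous sketch (the seed and the helper name sample_sequence are arbitrary choices, not from the paper):

    import numpy as np

    symbols = ["A", "B", "C", "D"]
    rng = np.random.default_rng(0)

    def sample_sequence(A, B, pi, length, rng):
        """Draw one state/observation sequence from the model."""
        states, picks = [], []
        s = rng.choice(3, p=pi)                          # choose the starting container
        for _ in range(length):
            states.append(s + 1)                         # record the 1-based state label
            picks.append(symbols[rng.choice(4, p=B[s])]) # pick (and return) a ball
            s = rng.choice(3, p=A[s])                    # spin the arrow for the next step
        return states, picks

    states, picks = sample_sequence(A, B, pi, 18, rng)

Each run yields a different pair of sequences; the one quoted above is just one such realization.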


However, if we use a hidden Markov model we do not see the sequence of states, only the sequence of picks: C, A, C, D, D, C, A, C, C, C, A, A, A, A, A, C, B, C.


For this sequence of picks we have to determine the containers from which the balls were picked. We look for the Markov model hypothesis which most accurately describes the probabilities of crossings and picks. We first choose a random hypothesis:

- s3, s2, s2, s2, s1, s1, s1, s2, s2, s2, s1, s1, s1, s3, s3, s2, s2, s3.


pick:        C     A     C     D     D     C     A     ...   C
state:       s3    s2    s2    s2    s1    s1    s1    ...   s3
P(pick):     1/7   3/7   2/7   1/7   2/7   1/7   2/7   ...   1/7
P(crossing): 2/10  6/10  6/10  3/10  6/10  6/10  2/10  ...

(The P(pick) values follow from the matrix b_jl above, the P(crossing) values from the matrix a_ij.)

P(s3, s2, s2, s2, s1, s1, s1, s2, s2, s2, s1, s1, s1, s3, s3, s2, s2, s3) = 1.6768×10^-19.

Now we choose the hypothesis which corresponds to reality:

- s1, s1, s1, s1, s1, s1, s2, s2, s2, s2, s2, s3, s3, s3, s3, s3, s3, s1.


pick:        C     A     C     D     D     C     A     ...   C
state:       s1    s1    s1    s1    s1    s1    s2    ...   s1
P(pick):     1/7   2/7   1/7   2/7   2/7   1/7   3/7   ...   1/7
P(crossing): 6/10  6/10  6/10  6/10  6/10  2/10  6/10  ...

P_real(s1, s1, s1, s1, s1, s1, s2, s2, s2, s2, s2, s3, s3, s3, s3, s3, s3, s1) = 8.7350×10^-16.
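Both hypothesis probabilities are simply products of one starting probability, the picking probabilities and the crossing probabilities along the path. A minimal sketch of that computation follows, reusing A, B and pi from the earlier sketch; because pi is an assumed uniform vector, the absolute numbers will not reproduce the figures quoted above exactly, but the ordering of the two hypotheses is preserved:

    SYM = {"A": 0, "B": 1, "C": 2, "D": 3}

    def path_probability(states, picks, A, B, pi):
        """Joint probability of a fixed state path and its picks."""
        s = [x - 1 for x in states]          # convert to 0-based indices
        o = [SYM[x] for x in picks]
        p = pi[s[0]] * B[s[0], o[0]]         # starting state and first pick
        for t in range(1, len(s)):
            p *= A[s[t - 1], s[t]]           # crossing probability
            p *= B[s[t], o[t]]               # picking probability
        return p

    picks = list("CACDDCACCCAAAAACBC")
    guess = [3, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 3, 3, 2, 2, 3]
    real  = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1]
    print(path_probability(guess, picks, A, B, pi))   # the smaller of the two
    print(path_probability(real,  picks, A, B, pi))   # the larger of the two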


The probability of the hypothesis which corresponds to reality is higher than the probability of our randomly chosen hypothesis. If we wanted to determine the most probable hypothesis, we would have to compute the probability of every hypothesis, and their number is very high. Because we do not want to compute the probability of every hypothesis, we can use the Viterbi algorithm, which determines the most probable option directly. Consider the partial sequences created by the first n containers, which end with containers s1, s2 and s3. Suppose we know their probabilities for a specific n; for n+1 we compute (fig. 5):

$$p'_{s_3} = p_{s_3 C} \cdot \max\left(p_{s_3} p_{s_3 s_3},\; p_{s_2} p_{s_2 s_3},\; p_{s_1} p_{s_1 s_3}\right),$$
$$p'_{s_2} = p_{s_2 C} \cdot \max\left(p_{s_3} p_{s_3 s_2},\; p_{s_2} p_{s_2 s_2},\; p_{s_1} p_{s_1 s_2}\right),$$
$$p'_{s_1} = p_{s_1 C} \cdot \max\left(p_{s_3} p_{s_3 s_1},\; p_{s_2} p_{s_2 s_1},\; p_{s_1} p_{s_1 s_1}\right),$$

where p_{s_j C} denotes the probability of picking the observed ball (here C) in container s_j.


Inductively (fig. 6):

$$p = \max\left(p_{s_3},\; p_{s_2},\; p_{s_1}\right).$$


To recover the sequence itself, we follow the branch with the highest probability (fig. 7), underlined below:

$$p'_{s_3} = p_{s_3 C} \cdot \max\left(p_{s_3} p_{s_3 s_3},\; p_{s_2} p_{s_2 s_3},\; \underline{p_{s_1} p_{s_1 s_3}}\right),$$
$$p'_{s_2} = p_{s_2 C} \cdot \max\left(p_{s_3} p_{s_3 s_2},\; \underline{p_{s_2} p_{s_2 s_2}},\; p_{s_1} p_{s_1 s_2}\right),$$
$$p'_{s_1} = p_{s_1 C} \cdot \max\left(p_{s_3} p_{s_3 s_1},\; \underline{p_{s_2} p_{s_2 s_1}},\; p_{s_1} p_{s_1 s_1}\right).$$



Fig. 5. Viterbi algorithm.

Fig. 6. Viterbi algorithm, next step.

Fig. 7. Viterbi algorithm, branch with the highest probability.
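The recursion above is the whole algorithm: for every state, keep the probability of the best partial path ending there, and remember which branch achieved the maximum so that the path can be read back at the end. Below is a compact sketch, reusing A, B, pi and the SYM mapping from the earlier sketches; probabilities are multiplied directly, which is adequate for short sequences, while real recognizers work with log-probabilities:

    import numpy as np

    def viterbi(obs, A, B, pi):
        """Most probable state path for the observation sequence obs."""
        n, N = len(obs), A.shape[0]
        p = np.zeros((n, N))                 # p[t, j]: best path probability ending in j
        back = np.zeros((n, N), dtype=int)   # back[t, j]: best predecessor of j at time t
        p[0] = pi * B[:, obs[0]]
        for t in range(1, n):
            for j in range(N):
                cand = p[t - 1] * A[:, j]            # extend every branch into state j
                back[t, j] = int(np.argmax(cand))    # keep only the best branch
                p[t, j] = B[j, obs[t]] * cand[back[t, j]]
        path = [int(np.argmax(p[-1]))]               # best final state
        for t in range(n - 1, 0, -1):                # follow the pointers backwards
            path.append(int(back[t, path[-1]]))
        return [s + 1 for s in reversed(path)]       # 1-based state labels

    obs = [SYM[c] for c in "CACDDCACCCAAAAACBC"]
    print(viterbi(obs, A, B, pi))

Under the assumed uniform pi this returns one most probable path; it need not coincide exactly with the sequence quoted below, which was produced with the paper's unstated starting probabilities.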



This way we can determine the most probable variant of
sequence from our example.


The sequences:

- determined: s1, s1, s1, s1, s2, s2, s2, s2, s1, s3, s3, s3, s3, s3, s3, s1, s1, s1,
- real: s1, s1, s1, s1, s1, s1, s2, s2, s2, s2, s2, s3, s3, s3, s3, s3, s3, s1.


The determined sequence does not have to correspond to reality (as in our example), but it is the most probable variant. In speech recognition, mostly left-to-right Markov models are used. These are well suited for modeling processes whose progression is connected with time. [4]



Fig. 8. Model of the word “STOP”


Figure 8 shows the model of the word “STOP”, which is constructed from 6 states, of which only states s2, s3, s4 and s5 are emitting states (states which are able to generate the output vector of observations). Each of these states corresponds to one articulatory position of the speech apparatus during the creation of the individual phonemes of the word “STOP”. Every state has its transition probabilities as well as its output probability distribution function.

For example, to the phoneme [s] corresponds state s2, the transition probabilities a22 and a23, and the output probability distribution function b_s(). [2]
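The left-to-right structure can be captured by an upper-triangular transition matrix: each emitting state either stays in place or advances to the next state. A sketch of such a structure for the six-state “STOP” model follows; the numeric values are illustrative assumptions, not taken from the paper:

    import numpy as np

    n_states = 6                       # s1 (entry), s2..s5 (emitting), s6 (exit)
    A_stop = np.zeros((n_states, n_states))
    A_stop[0, 1] = 1.0                 # non-emitting entry state goes straight to s2
    for j in range(1, n_states - 1):
        A_stop[j, j] = 0.6             # a_jj: stay in the same phoneme state (assumed value)
        A_stop[j, j + 1] = 0.4         # a_j,j+1: advance to the next phoneme (assumed value)
    # s6 is the non-emitting exit state, so its row has no outgoing transitions.

Because the transitions only run forwards, the state sequence is forced to follow the order of the phonemes of “STOP” in time.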

III. CONCLUSION

Classifiers based upon hidden Markov models are used for speech recognition of isolated words. Their main advantage lies in the possibility of recognizing fluent speech, where it is possible to model separately the initial, middle and final part of a phoneme, which is very important due to coarticulation effects.

ACKNOWLEDGMENT

The paper has been prepared with the support of the Slovak grant projects VEGA No. 1/0660/08, KEGA 3/6386/08 and KEGA 3/6388/08.



References

[1] J. Psutka, “Communication with PC using spoken speech,” ACADEMIA, Praha, 1995.
[2] J. Psutka, “Speaking Czech with computer,” ACADEMIA, Praha, 2006.
[3] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, 1989.
[4] A. Lúčny, “Skryté Markovove modely” (Hidden Markov models), http://www.microstep-mis.com/~andy/hmm2.ppt.