
Nov 7, 2013


SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE

Jaume Escofet Carmona


IDIAP, Martigny, Switzerland

UPC, Barcelona, Spain

Contents

- Bayesian Networks
- Automatic Speech Recognition using Dynamic BNs
- Auxiliary variables
- Experiments with energy as an auxiliary variable
- Conclusions

What is a Bayesian Network?

A BN is a type of graphical model composed of:

- A directed acyclic graph (DAG)
- A set of variables V = {v_1, ..., v_N}
- A set of probability density functions P(v_n | parents(v_n))

Example (a three-node network with arcs v_2 → v_1 and v_2 → v_3):

P(V) = P(v_1, v_2, v_3) = P(v_1 | v_2) P(v_2) P(v_3 | v_2)

In general, the joint distribution of V factorizes as:

P(V) = ∏_{n=1}^{N} P(v_n | parents(v_n))
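As a minimal sketch of the factorization above, the joint of the three-node example can be evaluated directly from its conditional tables. The probability values below are made up for illustration (binary variables):

```python
# Minimal sketch of the three-node BN above, with hypothetical
# (made-up) probability tables for binary variables v1, v2, v3.
# The joint factorizes as P(v1, v2, v3) = P(v1|v2) P(v2) P(v3|v2).

P_v2 = {0: 0.4, 1: 0.6}                      # P(v2)
P_v1_given_v2 = {0: {0: 0.7, 1: 0.3},        # P(v1 | v2=0)
                 1: {0: 0.2, 1: 0.8}}        # P(v1 | v2=1)
P_v3_given_v2 = {0: {0: 0.5, 1: 0.5},
                 1: {0: 0.9, 1: 0.1}}

def joint(v1, v2, v3):
    """P(v1, v2, v3) via the BN factorization."""
    return P_v1_given_v2[v2][v1] * P_v2[v2] * P_v3_given_v2[v2][v3]

# Sanity check: the joint sums to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 10))  # 1.0
```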

Automatic Speech Recognition (ASR)

[Figure: recognition pipeline — speech → feature extraction (LPC, MFCC, ...) → observations X = {x_1, ..., x_T} → statistical models (HMM, ANN, ...) → recognized word M_j]

Given word models {M_k} (M_1: 'cat', M_2: 'dog', ..., M_K: 'tiger'), the recognizer selects:

M_j = argmax_k P(M_k | X) = argmax_k P(X | M_k) P(M_k)

P(X | M_k) = ∏_{t=1}^{T} p(x_t | q_t) p(q_t | q_{t-1})
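The decision rule above can be sketched with the HMM forward algorithm, which computes P(X | M_k) by summing over state sequences (the product on the slide corresponds to a single state sequence). The two word models, priors, and discrete observations below are toy values invented for illustration:

```python
import math

# Sketch: score candidate word models with the HMM forward algorithm
# and pick the argmax of P(X | M_k) P(M_k). All model parameters and
# the observation sequence are hypothetical toy values.

def forward_loglik(obs, pi, A, B):
    """log P(X | M) for a discrete-emission HMM.
    pi[i]: initial state prob, A[i][j]: transition, B[i][o]: emission."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return math.log(sum(alpha))

# Two toy 2-state models over a 2-symbol alphabet: (pi, A, B).
models = {
    'cat': ([0.9, 0.1], [[0.8, 0.2], [0.1, 0.9]], [[0.9, 0.1], [0.2, 0.8]]),
    'dog': ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.3, 0.7], [0.7, 0.3]]),
}
prior = {'cat': 0.5, 'dog': 0.5}

X = [0, 0, 1, 1]
scores = {w: forward_loglik(X, *m) + math.log(prior[w])
          for w, m in models.items()}
best = max(scores, key=scores.get)
print(best)  # 'cat'
```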

ASR with Dynamic Bayesian Networks

[Figure: DBN unrolled over t = 1, ..., 4 — a hidden phone variable q_t (taking values /k/, /a/, /a/, /t/) with an arc to the acoustics x_t in each frame, and arcs q_{t-1} → q_t between frames]

This structure is equivalent to a standard HMM.

ASR with Dynamic Bayesian Networks

[Figure: two consecutive time slices — arcs q_{t-1} → q_t and q_t → x_t]

Transition probabilities: P(q_t | q_{t-1})

Emission densities: p(x_t | q_t = k) ~ N_x(μ_k, Σ_k)
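A minimal sketch of the Gaussian emission density above, assuming a diagonal covariance Σ_k (the slide leaves the covariance structure unspecified); all parameter values are made up:

```python
import math

# Sketch: per-state Gaussian emission density
# p(x_t | q_t = k) = N(x_t; mu_k, Sigma_k), with a diagonal covariance
# (a common simplification; the slide does not specify the structure).

def log_gaussian(x, mu, var):
    """log N(x; mu, diag(var)) for a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))

# One emission score for a toy 2-dimensional state model.
mu_k = [0.0, 1.0]   # state-k mean (made up)
var_k = [1.0, 4.0]  # state-k diagonal variances (made up)
x_t = [0.5, 2.0]    # one observation frame
print(log_gaussian(x_t, mu_k, var_k))
```

The density is maximal at the state mean, so `log_gaussian(mu_k, mu_k, var_k)` upper-bounds the score of any other frame.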

Auxiliary information (1)

The main advantage of BNs is their flexibility in defining dependencies between variables.

Energy can damage system performance if it is simply appended to the feature vector. BNs allow us to use it in an alternative way:

- Conditioning the emission distributions upon this auxiliary variable
- Marginalizing it out in recognition


Auxiliary information (2)

p(x_t | q_t = k, a_t = z) ~ N_x(μ_k + B_k z, Σ_k)

[Figure: arcs q_t → x_t and a_t → x_t]

The value of a_t affects the value of x_t.

Auxiliary information (3)

p(a_t | q_t = k) ~ N_a(μ_ak, Σ_ak)

p(x_t | q_t = k, a_t = z) ~ N_x(μ_k + B_k z, Σ_k)

[Figure: arcs q_t → a_t, q_t → x_t, and a_t → x_t]

The value of the auxiliary variable can be influenced by the hidden state q_t.
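A minimal scalar sketch of the conditional emission model above: the emission mean is shifted linearly by the auxiliary value z via μ_k + B_k z. In general B_k is a regression matrix; the scalar parameters here are made up:

```python
import math

# Sketch of the conditional emission: the mean of
# p(x_t | q_t=k, a_t=z) is shifted by the auxiliary value z as
# mu_k + B_k * z. Scalars for readability; all numbers are made up.

def log_norm(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

mu_k, B_k, var_k = 0.0, 0.5, 1.0   # state-k emission parameters (toy)

def log_emission(x, z):
    """log p(x | q=k, a=z) = log N(x; mu_k + B_k * z, var_k)."""
    return log_norm(x, mu_k + B_k * z, var_k)

# A high-energy frame (z = 2) shifts the expected acoustics upward:
# x = 1.0 is more likely under z = 2 (mean 1.0) than under z = 0.
print(log_emission(1.0, 2.0) > log_emission(1.0, 0.0))  # True
```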

Auxiliary information (4)

p(x_t, a_t | q_t = k) ~ N_xa(μ_k^xa, Σ_k^xa)

[Figure: arcs q_t → a_t, q_t → x_t, and a_t → x_t, with x_t and a_t jointly Gaussian]

This is equivalent to appending the auxiliary variable to the feature vector.

Hiding auxiliary information

We can also marginalize out (hide) the auxiliary variable in recognition. This is useful when:

- It is noisy
- It is not accessible

p(x_t | q_t) = ∫ p(x_t | q_t, a_t) p(a_t | q_t) da_t
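For the linear-Gaussian model on the previous slides this integral has a closed form: if a ~ N(μ_a, σ_a²) and x | a ~ N(μ + B a, σ_x²), then marginally x ~ N(μ + B μ_a, σ_x² + B² σ_a²). A Monte Carlo sketch (scalar case, toy numbers) checks this:

```python
import math, random

# Sketch: marginalizing the auxiliary variable in the linear-Gaussian
# model. If a ~ N(mu_a, var_a) and x | a ~ N(mu + B*a, var_x), then
# x ~ N(mu + B*mu_a, var_x + B^2 * var_a). Toy scalar parameters.

random.seed(0)
mu, B, var_x = 0.0, 0.5, 1.0       # emission parameters for state k
mu_a, var_a = 2.0, 0.25            # p(a | q = k)

def norm_pdf(x, m, v):
    return math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)

x = 1.3
# Monte Carlo: average p(x | q, a) over samples a ~ p(a | q).
samples = [random.gauss(mu_a, math.sqrt(var_a)) for _ in range(200000)]
mc = sum(norm_pdf(x, mu + B * a, var_x) for a in samples) / len(samples)

# Closed form of the marginal.
exact = norm_pdf(x, mu + B * mu_a, var_x + B ** 2 * var_a)
print(mc, exact)  # the two agree to a few decimal places
```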



Experimental setup

- Isolated word recognition
- Small vocabulary (75 words)
- Feature extraction: Mel Frequency Cepstral Coefficients (MFCC)
- p(x_t | q_t) modeled with a 4-component mixture of Gaussians
- p(a_t | q_t) modeled with a single Gaussian

Experiments with Energy as an auxiliary variable

Word Error Rate (WER):

           Observed Energy   Hidden Energy
System 1       6.9 %             5.3 %
System 2       6.1 %             5.6 %
System 3       5.8 %             5.9 %
Baseline       5.9 %

Energy feature: E = log Σ_{n=1}^{N} s^2[n] w^2[n]
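The energy feature can be sketched per frame as below, assuming s[n] is the frame's signal samples and w[n] an analysis window (a Hamming window in this sketch; the slide does not name the window):

```python
import math

# Sketch of the slide's energy feature for one frame:
# E = log( sum_{n=1}^{N} s^2[n] * w^2[n] ),
# i.e. the log energy of a windowed signal frame. The Hamming window
# is an assumption; the slide does not specify w[n].

def frame_energy(s):
    N = len(s)
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    return math.log(sum((si * wi) ** 2 for si, wi in zip(s, w)))

# Toy frames: a low-amplitude frame has lower log energy than a loud one.
quiet = [0.01 * math.sin(0.3 * n) for n in range(160)]
loud = [0.5 * math.sin(0.3 * n) for n in range(160)]
print(frame_energy(quiet) < frame_energy(loud))  # True
```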

Conclusions


BNs are more flexible than HMMs. You can easily:

- Change the topology of the distributions
- Hide variables when necessary

Energy can improve system performance if used in a non-traditional way.

Questions?