# SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE

Artificial Intelligence and Robotics, 7 Nov 2013

Jaume Escofet Carmona

IDIAP, Martigny, Switzerland

UPC, Barcelona, Spain

## Contents

- Bayesian Networks
- Automatic Speech Recognition using Dynamic BNs
- Auxiliary variables
- Experiments with energy as an auxiliary variable
- Conclusions

## What is a Bayesian Network?

A BN is a type of graphical model composed of:

- A directed acyclic graph (DAG)
- A set of variables $V = \{v_1, \dots, v_N\}$
- A set of probability density functions $P(v_n \mid \mathrm{parents}(v_n))$

Example (graph: $v_2 \to v_1$ and $v_2 \to v_3$):

$$P(V) = P(v_1, v_2, v_3) = P(v_1 \mid v_2)\, P(v_2)\, P(v_3 \mid v_2)$$

Joint distribution of V:

$$P(V) = \prod_{n=1}^{N} P(v_n \mid \mathrm{parents}(v_n))$$
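As a check on this factorization, here is a minimal sketch of the three-node example in Python. The CPD numbers are hypothetical, not from the talk; the structure ($v_2 \to v_1$, $v_2 \to v_3$) is the one shown above.

```python
# Binary CPDs for the graph v2 -> v1, v2 -> v3 (hypothetical numbers).
P_v2 = {0: 0.4, 1: 0.6}
P_v1_given_v2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # P_v1_given_v2[v2][v1]
P_v3_given_v2 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # P_v3_given_v2[v2][v3]

def joint(v1, v2, v3):
    """P(V) as the product of each node's CPD given its parents."""
    return P_v1_given_v2[v2][v1] * P_v2[v2] * P_v3_given_v2[v2][v3]

# The factorized joint must sum to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
```

Summing the product of local CPDs over every assignment recovers total probability 1, which is exactly what the DAG factorization guarantees.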

## Automatic Speech Recognition (ASR)

Pipeline: feature extraction (LPC, MFCC, ...) maps the speech signal to $X = \{x_1, \dots, x_T\}$, which is scored against statistical models (HMM, ANN, ...) $\{M_k\}$, e.g. $M_1$: 'cat', $M_2$: 'dog', ..., $M_K$: 'tiger'.

The recognized model is

$$M_j = \arg\max_k P(M_k \mid X) = \arg\max_k P(X \mid M_k)\, P(M_k)$$

$$P(X \mid M_k) = \prod_{t=1}^{T} p(x_t \mid q_t)\, p(q_t \mid q_{t-1})$$
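The argmax decision rule can be sketched in a few lines, working in log space as recognizers do in practice. The per-model log-likelihoods and priors below are made-up placeholders, not results from the talk:

```python
import math

# Hypothetical scores: log P(X|M_k) for each word model, and priors P(M_k).
log_likelihoods = {"cat": -120.3, "dog": -118.7, "tiger": -125.1}
priors = {"cat": 0.5, "dog": 0.3, "tiger": 0.2}

def decode(log_likelihoods, priors):
    """M_j = argmax_k [log P(X|M_k) + log P(M_k)]."""
    return max(log_likelihoods, key=lambda k: log_likelihoods[k] + math.log(priors[k]))

best = decode(log_likelihoods, priors)
```

Note that 'dog' wins here despite a smaller prior than 'cat', because the acoustic score dominates.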

## ASR with Dynamic Bayesian Networks

(Figure: the network unrolled over $t = 1, \dots, 4$; each hidden phone node $q_t$, here taking the values /k/, /a/, /a/, /t/, emits the acoustics $x_t$.)

Equivalent to a standard HMM.

## ASR with Dynamic Bayesian Networks

(Figure: $q_{t-1} \to q_t$ and $q_t \to x_t$ at each frame.)

- Transition: $P(q_t \mid q_{t-1})$
- Emission: $p(x_t \mid q_t = k) \sim \mathcal{N}_x(\mu_k, \Sigma_k)$
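A toy sketch of how $P(X \mid M_k)$ is accumulated frame by frame under these two distributions, using the standard forward recursion with 1-D Gaussian emissions. All parameters are hypothetical:

```python
import math

def gauss(x, mu, var):
    """1-D Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical 2-state model: transitions P(q_t|q_{t-1}), initial P(q_1),
# and per-state emission parameters (mu_k, var_k).
trans = [[0.9, 0.1], [0.2, 0.8]]
init = [0.6, 0.4]
means, vars_ = [0.0, 3.0], [1.0, 1.0]

def likelihood(X):
    """P(X|M): sum over state paths of prod_t p(x_t|q_t) p(q_t|q_{t-1})."""
    alpha = [init[k] * gauss(X[0], means[k], vars_[k]) for k in range(2)]
    for x in X[1:]:
        alpha = [gauss(x, means[k], vars_[k]) *
                 sum(alpha[j] * trans[j][k] for j in range(2))
                 for k in range(2)]
    return sum(alpha)
```

The recursion gives the same value as brute-force summation over all state paths, but in time linear in $T$.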

## Auxiliary information (1)

BNs offer flexibility in defining dependencies between variables.

Energy can damage system performance if it is simply appended to the feature vector. BNs allow us to use it in an alternative way:

- Conditioning the emission distributions upon this auxiliary variable
- Marginalizing it out during recognition

## Auxiliary information (2)

$$p(x_t \mid q_t = k,\, a_t = z) \sim \mathcal{N}_x(\mu_k + B_k z,\, \Sigma_k)$$

The value of $a_t$ affects the value of $x_t$.

(Figure: $q_t \to x_t$ and $a_t \to x_t$.)
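The conditioning amounts to a linear shift of the state's emission mean by the auxiliary value. A two-line sketch with hypothetical $\mu_k$ and $B_k$ (2-D features, scalar auxiliary variable):

```python
# Hypothetical parameters: state mean mu_k and regression matrix B_k
# mapping the scalar auxiliary value z into feature space.
mu_k = [1.0, -0.5]
B_k = [[0.2], [0.1]]

def conditional_mean(z):
    """Mean of p(x_t | q_t=k, a_t=z): mu_k + B_k * z."""
    return [mu_k[i] + B_k[i][0] * z for i in range(len(mu_k))]
```

The covariance $\Sigma_k$ is unchanged; only the mean moves with $z$.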

## Auxiliary information (3)

$$p(a_t \mid q_t = k) \sim \mathcal{N}_a(\mu_{ak},\, \Sigma_{ak})$$

$$p(x_t \mid q_t = k,\, a_t = z) \sim \mathcal{N}_x(\mu_k + B_k z,\, \Sigma_k)$$

The value of the auxiliary variable can be influenced by the hidden state $q_t$.

(Figure: $q_t \to a_t$, $q_t \to x_t$, and $a_t \to x_t$.)

## Auxiliary information (4)

(Figure: $q_t \to x_t$ and $q_t \to a_t$.)

Equivalent to appending the auxiliary variable to the feature vector:

$$p(x_t, a_t \mid q_t = k) \sim \mathcal{N}_{xa}(\mu_k^{xa},\, \Sigma_k^{xa})$$

## Hiding auxiliary information

We can also marginalize out (hide) the auxiliary variable in recognition. This is useful when:

- It is noisy
- It is not accessible

$$p(x_t \mid q_t) = \int p(x_t \mid q_t, a_t)\, p(a_t \mid q_t)\, da_t$$
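For the linear-Gaussian model of the previous slides this integral has a closed form: the marginal is Gaussian with mean $\mu_k + B_k \mu_{ak}$ and covariance $\Sigma_k + B_k \Sigma_{ak} B_k^\top$. A 1-D Monte Carlo sketch (all parameters hypothetical) can verify this:

```python
import math
import random

random.seed(0)

# Hypothetical 1-D parameters: p(x|q=k, a=z) ~ N(mu_k + B_k*z, S_k),
# p(a|q=k) ~ N(mu_ak, S_ak).
mu_k, B_k, S_k = 1.0, 0.5, 1.0
mu_ak, S_ak = 2.0, 0.25

# Sample a ~ p(a|q), then x ~ p(x|q,a); the x samples follow the marginal.
samples = []
for _ in range(200_000):
    a = random.gauss(mu_ak, math.sqrt(S_ak))
    samples.append(random.gauss(mu_k + B_k * a, math.sqrt(S_k)))

mc_mean = sum(samples) / len(samples)
mc_var = sum((s - mc_mean) ** 2 for s in samples) / len(samples)
# Closed form: mean = mu_k + B_k*mu_ak = 2.0, var = S_k + B_k**2 * S_ak = 1.0625
```

So hiding the auxiliary variable at recognition time costs nothing extra in this model: the marginal emission is still a single Gaussian with adjusted parameters.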

## Experimental setup

- Isolated word recognition
- Small vocabulary (75 words)
- Feature extraction: Mel-Frequency Cepstral Coefficients (MFCC)
- $p(x_t \mid q_t)$ modeled with a mixture of 4 Gaussians
- $p(a_t \mid q_t)$ modeled with a single Gaussian

## Experiments with Energy as an auxiliary variable

Energy of a windowed frame:

$$E = \log \sum_{n=1}^{N} s^2[n]\, w^2[n]$$

WER results:

| System | Observed Energy | Hidden Energy |
|---|---|---|
| System 1 | 6.9 % | 5.3 % |
| System 2 | 6.1 % | 5.6 % |
| System 3 | 5.8 % | 5.9 % |

Baseline WER: 5.9 %
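The frame energy formula can be sketched directly. The signal values below are illustrative assumptions, and a Hamming window stands in for $w[n]$ (the talk does not specify the window):

```python
import math

# Hypothetical short frame s[n] and a Hamming window w[n] of the same length.
N = 8
s = [0.1, 0.4, -0.3, 0.8, -0.6, 0.2, 0.0, -0.1]
w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

# E = log sum_{n=1}^{N} s^2[n] * w^2[n]
E = math.log(sum(s[n] ** 2 * w[n] ** 2 for n in range(N)))
```

This scalar is the auxiliary variable $a_t$ in the experiments above, computed once per frame.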

## Conclusions

BNs are more flexible than HMMs. You can easily:

- Change the topology of the distributions
- Hide variables when necessary

Energy can improve the system performance if used in a non-