The Comparison of Vector Quantization Algorithms in Fish Species Acoustic Voice Recognition Using Hidden Markov Model

Diponegoro A.D. 1) and Fawwaz Al Maki W. 1)

1) Department of Electrical Engineering, University of Indonesia, Indonesia


Abstract—
Vector Quantization (VQ) was implemented in fish acoustic voice recognition using the Hidden Markov Model (HMM) in order to reduce the memory capacity and the computation time. Three VQ algorithms were implemented in the fish voice recognition, namely Traditional K-Means Clustering, LBG (Linde, Buzo, and Gray), and Successive Binary Split. In the recognition processing, the input fish voice waveform was converted to a discrete signal, and its spectral characteristics were extracted using Mel Frequency Cepstrum Coefficients (MFCC). The vector components of the fish voice spectrum were quantized using the three VQ algorithms. The performance of these VQ algorithms was examined during fish voice recognition by means of HMM. Based on the experimental results, the Successive Binary Split algorithm was the optimum algorithm because it had the highest accuracy compared to the two other algorithms. During the recognition processing, the Successive Binary Split algorithm required the lowest memory capacity and time consumption.

Keywords: vector quantization, HMM, fish acoustic voice

I. INTRODUCTION

Every kind of soniferous fish is able to produce a specific acoustic voice that distinguishes it from other species and signals its behaviour, such as courtship behaviour [1], mating behaviour [2], spawning behaviour [3] [4], and reproductive behaviour [5]. During the recognition processing, the wave characteristics of the observed fish were compared to the wave characteristics of a number of fish voices in a database. When the number of fishes is large, searching the vector components in the database requires a long computation time. To solve this problem, the nearest vector components of each spectrum were combined into one value called a centroid or codeword. The combination of several vector components into one codeword was performed by a VQ algorithm. Three VQ algorithms were considered, namely Traditional K-Means Clustering, LBG, and Successive Binary Split. The question is which of the three VQ algorithms gives the optimum performance in terms of the smallest memory capacity, the shortest computation time and, most importantly, the highest recognition accuracy.
II. VECTOR QUANTIZATION


The vector components of the extracted fish voice spectra were mapped from a large vector space to a finite number of regions of that space. Each region was called a cluster. Within a cluster, the vector components were called sample points. The nearest-neighbor sample points were quantized to a centroid, or codeword, by means of VQ (see Fig. 1). The distance between a sample point and its centroid is called the VQ distortion. Increasing the number of sample points caused the VQ distortion to become smaller, which means that the accuracy became higher. For a given number of sample points (vector components), a small VQ distortion required a large number of centroids, which means that the computation time became longer and the storage capacity became bigger. This also depended on the number of extracted waves. The relation between the VQ distortion and the acoustic waves depended on the number of extracted waves produced from the acoustic wave concerned. If the acoustic waves of the observed kinds of fish differed greatly from each other, the duration of the extracted waves was longer than when the acoustic waves of the kinds of fish were nearly the same. The choice of VQ algorithm determined the performance of fish species recognition based on the acoustic voices that the fishes produced. The VQ algorithms used in this paper were K-Means Clustering (Traditional K-Means Clustering), Successive Binary Split (Binary Split), and LBG (Linde, Buzo, and Gray).
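
The quantization step described above can be illustrated with a minimal sketch. It assumes Euclidean distance and reports the mean distance between each sample point and its nearest codeword as the VQ distortion; the paper does not state which distance measure or averaging was used, and the 2-D vectors and function names below are hypothetical stand-ins for MFCC frames.

```python
import math

def nearest_codeword(vector, codebook):
    """Return (index, distance) of the codeword closest to `vector`."""
    best_index, best_dist = -1, math.inf
    for index, codeword in enumerate(codebook):
        dist = math.dist(vector, codeword)  # Euclidean distance (assumption)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist

def quantize(vectors, codebook):
    """Map every vector to a codeword index and compute the mean distortion."""
    indices, total = [], 0.0
    for vector in vectors:
        index, dist = nearest_codeword(vector, codebook)
        indices.append(index)
        total += dist
    return indices, total / len(vectors)

# Hypothetical 2-D sample points and a two-codeword codebook.
samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
codebook = [(0.0, 1.0), (10.0, 11.0)]
symbols, distortion = quantize(samples, codebook)
```

After quantization only the codeword indices (`symbols`) need to be passed on to the HMM, which is where the saving in memory and search time comes from.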

A. K-Means Clustering algorithm [7]
The K-Means Clustering algorithm used an iterative method to build the codewords. The procedure of the K-Means Clustering algorithm is explained in the flow chart shown in Fig. 2.
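
As an illustration of the iterative method, the following is a minimal K-Means sketch, not the authors' implementation. It assumes Euclidean distance, takes the first k training vectors as the initial codewords, and runs a fixed number of passes instead of testing a distortion threshold; all names and data are hypothetical.

```python
import math

def kmeans_codebook(vectors, k, iterations=10):
    """Plain K-Means: assign each vector to its nearest codeword, then move
    each codeword to the mean of its cluster, for a fixed number of passes."""
    # Assumption: the first k training vectors are the initial codewords.
    codebook = [tuple(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: math.dist(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[i] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return codebook

# Hypothetical training vectors forming two well-separated groups.
train = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)]
codebook = kmeans_codebook(train, k=2)
```

The result depends on the initial codewords, which is one reason splitting-based initializations such as LBG are often preferred.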



Fig. 1. VQ processing [6]

Fig. 2. Flow chart of K-Means Clustering algorithm


B. Successive Binary Split algorithm [7]

In the Binary Split algorithm, the initial codebook is set at the random value M. The Successive Binary Split algorithm procedure is shown in the flow chart in Fig. 3.

C. LBG algorithm [7] [8]

The LBG algorithm procedure is shown in Fig. 4. Each codeword $C_m$ of the current codebook is split according to the rule

$$C_m^{+} = C_m (1 + \varepsilon)$$

$$C_m^{-} = C_m (1 - \varepsilon)$$

where $\varepsilon$ is a splitting parameter (here $\varepsilon = 0.01$).
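
The splitting rule can be sketched as follows. This is an illustrative version under stated assumptions, not the authors' code: it starts from the centroid of all training vectors, doubles the codebook by multiplying every codeword by (1 + ε) and (1 − ε), and refines each doubled codebook with a fixed number of nearest-neighbor and centroid passes rather than a distortion threshold. The function names and data are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dim))

def refine(vectors, codebook, iterations):
    """Alternate nearest-codeword assignment and centroid update."""
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda i: math.dist(v, codebook[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous codeword.
        codebook = [centroid(c) if c else cw for c, cw in zip(clusters, codebook)]
    return codebook

def lbg_codebook(vectors, size, eps=0.01, iterations=10):
    """Grow a codebook by repeated splitting; `size` should be a power of two."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        split = []
        for cw in codebook:
            split.append(tuple(x * (1 + eps) for x in cw))  # C_m^+
            split.append(tuple(x * (1 - eps) for x in cw))  # C_m^-
        codebook = refine(vectors, split, iterations)
    return codebook

# Hypothetical training vectors forming two groups.
train = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (9.0, 11.0)]
codebook = lbg_codebook(train, size=2)
```

Because the split is multiplicative, a codeword at the origin would not be separated; an additive perturbation is a common alternative in that case.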



























Fig. 3. Flow chart of Successive Binary Split algorithm

Fig. 4. Flow chart of LBG algorithm [8]

[Flow chart box labels recovered from the figures, listed in extraction order; their assignment to Figs. 2 to 4 is not recoverable. First group: determine initial codeword; cluster vectors; find codeword; update codeword; compute distortion (D); test D < D'; End. Second group: Start; determine initial codebook; establish new codeword with centroid and cluster; quantize all the training vectors; determine centroid of new cluster; test D - D' < t; compute distortion (D); test m < M (Yes/No); End.]

III. RECOGNITION PROCESSING

In the recognition processing, the characteristics (vector components and HMM parameters) of the extracted wave of the observed fish acoustic voice were determined based on the characteristics in the database. The comparison between the characteristics of the observed fish acoustic voice and those stored in the database was used to recognize the species of the observed fish. The kind of fish with the highest log-probability value was used to decide the name of the observed fish. The block diagram of the recognition processing is shown in Fig. 5.











Fig. 5. Recognition processing procedure

The notation of an HMM can be written as follows [9]:

$$\lambda = (A, B, \pi) \tag{1}$$

where
$A = \{a_{ij}\}$, $a_{ij} = P[q_{t+1} = j \mid q_t = i]$, is the state-transition probability distribution,
$B = \{b_j(k)\}$, $b_j(k) = P[o_t = v_k \mid q_t = j]$, is the observation symbol probability distribution,
$\pi = \{\pi_i\}$, $\pi_i = P[q_1 = i]$, is the initial state distribution.

The observation sequence is given by

$$O = (o_1\, o_2 \ldots o_T) \tag{2}$$

The state sequence is given by

$$q = (q_1\, q_2 \ldots q_T) \tag{3}$$

The HMM probability, whose logarithm is used as the recognition score, is given by

$$P(O \mid \lambda) = \sum_{q} P(O \mid q, \lambda)\, P(q \mid \lambda) \tag{4}$$

where the probability of the observation sequence given a state sequence can be written as

$$P(O \mid q, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T) \tag{5}$$

and the probability of a state sequence $q$ can be written as

$$P(q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T} \tag{6}$$
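
To make Eqs. (4) to (6) concrete, the sketch below evaluates P(O|λ) in two ways: by summing over every state sequence exactly as the equations are written, and by the standard forward recursion, which gives the same value at much lower cost. It then picks the model with the highest log-probability, as in the decision rule of Section III. The two models, their parameters and the names `species_a` and `species_b` are hypothetical and are not taken from the experiment.

```python
import itertools
import math

def brute_force_likelihood(obs, A, B, pi):
    """P(O|lambda) summed over every state sequence q, following Eqs. (4)-(6)."""
    n_states = len(pi)
    total = 0.0
    for q in itertools.product(range(n_states), repeat=len(obs)):
        p_q = pi[q[0]]
        for t in range(1, len(q)):
            p_q *= A[q[t - 1]][q[t]]      # Eq. (6)
        p_o = 1.0
        for state, symbol in zip(q, obs):
            p_o *= B[state][symbol]       # Eq. (5)
        total += p_o * p_q                # Eq. (4)
    return total

def forward_likelihood(obs, A, B, pi):
    """The same quantity computed with the forward recursion."""
    n_states = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)

def log_score(obs, model):
    """Log-probability of the observation sequence; -inf if it is impossible."""
    p = forward_likelihood(obs, *model)
    return math.log(p) if p > 0.0 else -math.inf

def recognize(obs, models):
    """Return the name of the model with the highest log-probability."""
    return max(models, key=lambda name: log_score(obs, models[name]))

# Hypothetical two-state models over a two-symbol codebook: (A, B, pi).
models = {
    "species_a": ([[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]], [0.6, 0.4]),
    "species_b": ([[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.8], [0.3, 0.7]], [0.5, 0.5]),
}
observation = [0, 0, 1]  # codeword indices produced by the VQ stage
winner = recognize(observation, models)
```

The brute-force sum grows exponentially with the sequence length T, so it is only useful as a check on short sequences. For long sequences the forward recursion is usually scaled or carried out in the log domain to avoid underflow.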
IV. EXPERIMENT RESULTS


The experiment used 5 (five) kinds of fish acoustic voice, namely:
- Cynoscion regalis – drumming
- Cynoscion regalis – chattering
- Conodon nobilis
- Opsanus tau
- Cynoscion jamaicensis
Each of the 5 (five) kinds of fish acoustic voice was segmented into 60 (sixty) bursts of the extracted wave over a certain time period. The training process was executed 12 (twelve) times. In this experiment, 3 (three) durations of the extracted waves were used, namely
1) a duration of less than 0.4 second
2) a duration between 0.6 and 2.3 seconds
3) the above durations combined with bursts of random duration
The experiment was run for 3 (three) codebook sizes, namely
1) 32 bit codebook size
2) 64 bit codebook size
3) 128 bit codebook size

A. The accuracy level performance
The experiment measured the accuracy level of each VQ algorithm. The results are shown in Table I to Table III.

TABLE I.
Accuracy level (%) of fish voice recognition for the Traditional K-Means Clustering algorithm

Number of    Codebook    Accuracy level (%)
iterations   size        0.4 s    2.3 s    Comb
10           32          23.33    20       20
10           64          33.33    46.67    36.67
10           128         50       53.33    60
30           32          26.67    26.67    36.67

TABLE II.
Accuracy level (%) of fish voice recognition for the Successive Binary Split algorithm

Number of    Codebook    Accuracy level (%)
iterations   size        0.4 s    2.3 s    Comb
10           32          46.67    63.33    56.67
10           64          80       76.67    83.33
30           32          53.33    70       56.67

The tables show that the LBG algorithm was the most accurate of the three algorithms for the combination burst type, the largest codebook size, and 10 iterations.
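
[Block labels recovered from Fig. 5. Training path: fish acoustic signal, discrete signal processing, VQ, HMM for training, database. Recognition path: fish acoustic signal, discrete signal processing, VQ, HMM for recognition, decision.]
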
TABLE III.
Accuracy level (%) of fish voice recognition for the LBG algorithm

Number of    Codebook    Accuracy level (%)
iterations   size        0.4 s    2.3 s    Comb
10           32          40       43.33    50
10           64          63.33    60       56.67
10           128         86.67    86.67    90
30           32          46.67    46.67    56.67

B. Relative time consumption
The time consumption was measured as the computer time from entering the data until the results were completely displayed on the monitor.

The relative time results of HMM training for each VQ algorithm are shown in Table IV. The table shows that the LBG algorithm consumed the least execution time.

TABLE IV.
Execution time of each VQ algorithm for each codebook size and number of iterations


C. VQ distortion
The VQ distortion for several codebook sizes and numbers of iterations is shown in Table V. The table shows that the VQ distortion became smaller for a bigger codebook size and also for a bigger number of iterations.

TABLE V.
VQ distortion for several codebook sizes and numbers of iterations

Number of    Codebook    VQ
iterations   size        distortion
10           2           3.922
10           4           1.594
10           8           0.927
10           16          0.571
10           32          0.388
10           64          0.277
10           128         0.204
30           2           3.922
30           4           1.594
30           8           0.928
30           16          0.561
30           32          0.383
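
The trend in Table V can be checked directly from the tabulated values. The short sketch below only uses the numbers printed in the table; the variable and function names are hypothetical.

```python
# VQ distortion values copied from Table V, keyed by codebook size.
distortion_10 = {2: 3.922, 4: 1.594, 8: 0.927, 16: 0.571, 32: 0.388, 64: 0.277, 128: 0.204}
distortion_30 = {2: 3.922, 4: 1.594, 8: 0.928, 16: 0.561, 32: 0.383}

def is_strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))

sizes = sorted(distortion_10)
decreasing_with_size = is_strictly_decreasing([distortion_10[s] for s in sizes])

# Reduction in distortion when going from 10 to 30 iterations,
# for the codebook sizes that were measured at both settings.
iteration_gain = {
    s: round(distortion_10[s] - distortion_30[s], 3) for s in sorted(distortion_30)
}
```

With these values the distortion falls at every doubling of the codebook size, while the gain from 30 instead of 10 iterations is at most 0.010 (codebook size 16) and is absent or marginally negative for the smallest codebooks.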


Based on the above results, for the same number of repetitions and the same duration, increasing the codebook size generally increased the recognition accuracy. This happened because increasing the number of codewords in a codebook made the VQ distortion smaller, which means that the probability of error also became lower.

V. CONCLUSION

Based on the results, the LBG algorithm had the smallest execution time, and it was also the most accurate of the three algorithms for the combination burst type, the largest codebook size, and 10 iterations.


REFERENCES

[1] Gerald, J. W., "Sound production during courtship in six species of sunfish", Evolution 25: 75-87, 1971.
[2] Fine, M. L., "Seasonal and geographical variation of mating call of oyster toad-fish", Oecologia, 36: 45-47, 1978.
[3] Philip S. Lobel, "Sound produced by spawning fish", Environmental Biology of Fishes 33: 351-358, 1992.
[4] Lugli Marco, Gianni Pavan, Patrizia Torricelli, Laura Bobbio, "Spawning vocalization in male freshwater gobiids", Environmental Biology of Fishes 43: 219-231, 1995.
[5] Stout, J. F., "Sound communication during the reproductive behavior of Notropis analostanus", Amer. Midl. Nat. 94: 296-325, 1975.
[6] Batri, Nadim, "Robust Spectral Parameter Coding in Speech Recognition", Thesis, Department of Electrical Engineering, McGill University, Montreal, Canada, 1998.
[7] Thomas M. Parks, "Vector Quantization Codebook Design Using Neural Networks", Air Force Office of Scientific Research (AFOSR/JSEP), December 1990.
[8] Liu, Zhongmin, Yin, Qizhang, Zhang, Weimin, "A Speaker Identification and Verification System", EEL6586 Final Project, 2002.
[9] Rabiner, L., Juang, Bing Hwang, "Fundamentals of Speech Recognition", Prentice Hall, Inc., New Jersey, 1993.













Relative time consumption of HMM training processing (body of the execution time table, TABLE IV)

Number of       Codebook    Trad. K-Means              Succ. Binary
iterations      size        Clust.           LBG       Split
(Baum-Welch)
10              32          42.05            34.75     37.17
10              64          55.56            51.84     53.42
10              128         92.53            86.11     -
30              32          123.79           108.14    111.76

TABLE IV.
Accuracy level (%) for the various types of signal used as input

Number of iterations        Accuracy level (%)
(Baum-Welch algorithm)      B        P        C
10                          23.33    20       20
40                          26.67    26.67    26.67