The Comparison of Vector Quantization Algorithms in Fish Species Acoustic Voice Recognition Using Hidden Markov Model

Diponegoro A.D. 1) and Fawwaz Al Maki W. 1)

1) Department of Electrical Engineering, University of Indonesia, Indonesia

Abstract—

Vector Quantization (VQ) was implemented in fish acoustic voice recognition based on the Hidden Markov Model (HMM) in order to reduce the required memory capacity and the computation time. Three VQ algorithms were implemented in the fish voice recognition, namely Traditional K-Means Clustering, LBG (Linde, Buzo, and Gray), and Successive Binary Split. In the recognition processing, the input fish voice waveform was converted to a discrete signal, and its spectral characteristics were extracted using Mel Frequency Cepstrum Coefficients (MFCC). The vector components of the fish voice spectrum were quantized using the three VQ algorithms. The performance of these VQ algorithms was examined during fish voice recognition by means of HMM. Based on the experimental results, the Successive Binary Split algorithm was the optimum algorithm because it had the highest accuracy compared with the two other algorithms. During the recognition processing, the Successive Binary Split algorithm required the lowest memory capacity and time consumption.

Keywords: vector quantization, HMM, fish acoustic voice

I. INTRODUCTION

Every species of soniferous fish is able to produce a specific acoustic voice that distinguishes it from other species and signals its behaviour, such as courtship behaviour [1], mating behaviour [2], spawning behaviour [3] [4], and reproductive behaviour [5]. During the recognition processing, the wave characteristics of the observed fish were compared with the wave characteristics of a number of fish voices stored in a database. When the number of fish was large, searching the vector components in the database required a long computation time. To solve this problem, neighbouring vector components of each spectrum were combined into one value, called a centroid or codeword. The combination of several vector components into one codeword was performed by means of a VQ algorithm. Three VQ algorithms were considered, namely Traditional K-Means Clustering, LBG, and Successive Binary Split. The question addressed in this paper is which of the three VQ algorithms gives the optimum performance in terms of the smallest memory capacity, the shortest computation time and, most importantly, the highest recognition accuracy.

II. VECTOR QUANTIZATION

The vector components of the extracted fish voice spectra were mapped from a large vector space to a finite number of regions in that space. Each region was called a cluster. Within a cluster, the vector components were called sample points. Neighbouring sample points were quantized to a centroid, or codeword, by means of VQ (see Fig. 1). The distance between a sample point and its centroid was called the VQ distortion. Increasing the number of sample points made the VQ distortion smaller, which means that the accuracy became higher. For a given number of sample points (vector components), a small VQ distortion required a large number of centroids, which means that the computation time became longer and the storage capacity became larger. It also depended on the number of extracted waves. The relation between the VQ distortion and the acoustic waves depended on the number of extracted waves produced from the acoustic wave concerned. If the acoustic waves of the observed fish species differed greatly from one another, the duration of the extracted waves was longer than when the acoustic waves of the species were nearly the same. The choice of VQ algorithm determined the performance of fish species recognition based on the acoustic voices produced by the fish. The VQ algorithms used in this paper were K-Means Clustering (Traditional K-Means Clustering), Successive Binary Split (Binary Split), and LBG (Linde, Buzo, and Gray).
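As an illustration of the mapping described above, the following minimal Python sketch assigns each feature vector to its nearest codeword and computes the mean VQ distortion. The squared Euclidean distance, the function names, and the toy two-dimensional vectors are assumptions made for the example; the paper does not state which distance measure was used.

```python
import math


def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(vector, codebook[k]))


def quantize(vectors, codebook):
    """Replace every feature vector by the index of its nearest codeword."""
    return [nearest_codeword(v, codebook) for v in vectors]


def mean_distortion(vectors, codebook):
    """Average squared distance between each vector and its assigned codeword."""
    total = 0.0
    for v in vectors:
        k = nearest_codeword(v, codebook)
        total += math.dist(v, codebook[k]) ** 2
    return total / len(vectors)


# Toy 2-D vectors standing in for MFCC frames (made-up numbers, not paper data).
vectors = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
codebook = [(0.0, 1.0), (10.0, 11.0)]

symbols = quantize(vectors, codebook)        # discrete symbols passed on to the HMM
distortion = mean_distortion(vectors, codebook)
```

Only the codeword indices need to be stored and compared after this step, which is where the saving in memory and search time comes from.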

A. K-Means Clustering algorithm [7]

The K-Means Clustering algorithm used an iterative method to build the codewords. The procedure of the K-Means Clustering algorithm is explained in the flow chart shown in Fig. 2.

Fig. 1. VQ processing [6]

Fig. 2. Flow chart of K-Means Clustering algorithm
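The iterative procedure can be sketched as follows. This is a generic K-Means codebook builder under stated assumptions, not the exact procedure of Fig. 2: the initial codewords are drawn at random from the training vectors, the loop runs for a fixed number of iterations instead of testing the distortion, and the names `kmeans_codebook` and `centroid` are hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans_codebook(vectors, size, iterations=10, seed=0):
    """Build a codebook of `size` codewords with plain K-Means (assumed variant)."""
    rng = random.Random(seed)
    # Assumption: initial codewords are randomly chosen training vectors.
    codebook = rng.sample(vectors, size)
    for _ in range(iterations):
        # Step 1: cluster every vector around its nearest codeword.
        clusters = [[] for _ in range(size)]
        for v in vectors:
            k = min(range(size), key=lambda i: math.dist(v, codebook[i]))
            clusters[k].append(v)
        # Step 2: move each codeword to the centroid of its cluster;
        # an empty cluster keeps its previous codeword.
        codebook = [centroid(c) if c else codebook[i] for i, c in enumerate(clusters)]
    return codebook


# Toy training vectors (made-up numbers, not paper data).
vectors = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
codebook = kmeans_codebook(vectors, 2)
```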

B. Successive Binary Split algorithm [7]

In the Binary Split algorithm, the initial codebook was set at the random value M. The Successive Binary Split algorithm procedure is shown in the flow chart in Fig. 3.

C. LBG algorithm [7] [8]

The LBG algorithm procedure is shown in Fig. 4. Each entry $C_m$ of the current codebook was split according to the rule

$$C_m^{+} = C_m (1 + \varepsilon)$$

$$C_m^{-} = C_m (1 - \varepsilon)$$

where $\varepsilon$ is a splitting parameter (here $\varepsilon = 0.01$).
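A minimal sketch of LBG codebook growth with the splitting rule above is given below. It assumes squared Euclidean distortion, a stopping threshold `tol` on the change in distortion, and a target codebook size that is a power of two; the function names are hypothetical and the details may differ from the procedure in Fig. 4.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def lbg_codebook(vectors, size, eps=0.01, tol=1e-6, max_iter=100):
    """Grow a codebook by repeated splitting (assumes `size` is a power of two)."""
    codebook = [centroid(vectors)]            # start from the global centroid
    while len(codebook) < size:
        # Split every codeword C into C(1 + eps) and C(1 - eps).
        codebook = [tuple(c * (1 + sign * eps) for c in cw)
                    for cw in codebook for sign in (1, -1)]
        previous = math.inf
        for _ in range(max_iter):
            # Quantize all training vectors and accumulate the distortion D.
            clusters = [[] for _ in codebook]
            total = 0.0
            for v in vectors:
                k = min(range(len(codebook)), key=lambda i: math.dist(v, codebook[i]))
                clusters[k].append(v)
                total += math.dist(v, codebook[k]) ** 2
            distortion = total / len(vectors)
            # Move each codeword to the centroid of its new cluster.
            codebook = [centroid(c) if c else codebook[i] for i, c in enumerate(clusters)]
            # Stop refining once the distortion no longer improves by more than tol.
            if previous - distortion < tol:
                break
            previous = distortion
    return codebook


# Toy training vectors (made-up numbers, not paper data).
vectors = [(1.0, 1.0), (1.0, 3.0), (11.0, 11.0), (11.0, 13.0)]
codebook = lbg_codebook(vectors, 2)
```

Because the split is multiplicative, a codeword lying exactly at the origin would not separate; the toy data avoids that case.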

Fig. 3. Flow chart of Successive Binary Split algorithm

Fig. 4. Flow chart of LBG algorithm [8]

[Flow chart box labels: Start; determine initial codeword; cluster vectors; find codeword; update codeword; compute distortion (D); D < D'; End]

[Flow chart box labels: Start; determine initial codebook; establish new codeword with centroid and cluster; quantize all the training vectors; determine centroid of new cluster; compute distortion (D); D - D' < t; m < M; End]

III. RECOGNITION PROCESSING

In the recognition processing, the characteristics (vector components and HMM parameters) of the extracted wave of the observed fish acoustic voice were determined and compared with the characteristics stored in the database. The result of this comparison between the acoustic voice characteristics of the observed fish and those in the database was used to recognize the species name of the observed fish. The species with the highest log-probability value was chosen as the name of the observed fish. The block diagram of the recognition processing is shown in Fig. 5.

Fig. 5. Recognition processing procedure

The HMM can be written as follows [9]:

$$\lambda = (A, B, \pi) \tag{1}$$

where

$A = \{a_{ij}\}$, with $a_{ij} = P[q_{t+1} = j \mid q_t = i]$, is the state-transition probability distribution;

$B = \{b_j(k)\}$, with $b_j(k) = P[o_t = v_k \mid q_t = j]$, is the observation symbol probability distribution;

$\pi = \{\pi_i\}$, with $\pi_i = P[q_1 = i]$, is the initial state distribution.

The observation sequence is given by

$$O = (o_1\, o_2 \ldots o_T) \tag{2}$$

The state sequence is given by

$$q = (q_1\, q_2 \ldots q_T) \tag{3}$$

The HMM probability of the observation sequence (whose logarithm gives the log-probability) is given by

$$P(O \mid \lambda) = \sum_{q} P(O \mid q, \lambda)\, P(q \mid \lambda) \tag{4}$$

where the probability of the observation sequence for a given state sequence can be written as

$$P(O \mid q, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T) \tag{5}$$

and the probability of a state sequence $q$ can be written as

$$P(q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T} \tag{6}$$
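Equations (4) to (6) sum over every possible state sequence, and the number of sequences grows exponentially with T. A common way to obtain the same value is the forward algorithm. The sketch below is an assumption about how the log-probability could be computed, since the paper does not name the procedure. The two toy models and the names `species_a` and `species_b` are made up for illustration.

```python
import math
from itertools import product


def brute_force_prob(obs, A, B, pi):
    """P(O | lambda) by direct enumeration of all state sequences, Eqs. (4)-(6)."""
    n, T = len(pi), len(obs)
    total = 0.0
    for q in product(range(n), repeat=T):
        p_q = pi[q[0]]                        # Eq. (6): probability of the state path
        for t in range(1, T):
            p_q *= A[q[t - 1]][q[t]]
        p_o = 1.0                             # Eq. (5): emissions along the path
        for t in range(T):
            p_o *= B[q[t]][obs[t]]
        total += p_o * p_q                    # Eq. (4): sum over all paths
    return total


def forward_log_prob(obs, A, B, pi):
    """log P(O | lambda) with the scaled forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for o in obs[1:]:
        scale = sum(alpha)                    # rescale to avoid numerical underflow
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return log_prob + math.log(sum(alpha))


def recognize(obs, models):
    """Return the model name with the highest log-probability for `obs`."""
    return max(models, key=lambda name: forward_log_prob(obs, *models[name]))


# Hypothetical two-state models over a two-symbol codebook: (A, B, pi).
models = {
    "species_a": ([[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]], [0.6, 0.4]),
    "species_b": ([[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5]),
}
obs = [0, 0, 0]                               # VQ symbol indices of one burst
best = recognize(obs, models)
```

Here `B[j][k]` plays the role of $b_j(k)$, and the observation symbols are the codeword indices produced by the VQ stage.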

IV. EXPERIMENTAL RESULTS

The experiments used 5 (five) kinds of fish acoustic voice, namely:

- Cynoscion regalis – drumming

- Cynoscion regalis – chattering

- Conodon nobilis

- Opsanus tau

- Cynoscion jamaicensis

Each of the 5 (five) kinds of fish acoustic voice was segmented into 60 (sixty) bursts of the extracted wave over a certain time period. The training processing was executed 12 (twelve) times. In this experiment, 3 (three) burst durations of the extracted waves were used, namely

1) A duration of less than 0.4 second

2) A duration between 0.6 and 2.3 seconds

3) The above durations combined, giving bursts of random duration

3 (three) codebook sizes were applied in this experiment, namely

1) Codebook size of 32

2) Codebook size of 64

3) Codebook size of 128

A. The accuracy level performance

The accuracy level of each VQ algorithm was measured in the experiment. The results are shown in Table I to Table III.

TABLE I.

Accuracy level (%) of fish voice recognition for the Traditional K-Means Clustering algorithm

Number of     Codebook    Accuracy level (%)
iterations    size        < 0.4 s    0.6-2.3 s    Combined
10            32          23.33      20           20
10            64          33.33      46.67        36.67
10            128         50         53.33        60
30            32          26.67      26.67        36.67

TABLE II.

Accuracy level (%) of fish voice recognition for the Successive Binary Split algorithm

Number of     Codebook    Accuracy level (%)
iterations    size        < 0.4 s    0.6-2.3 s    Combined
10            32          46.67      63.33        56.67
10            64          80         76.67        83.33
30            32          53.33      70           56.67

The tables show that the LBG algorithm was the most accurate of the three algorithms for the combined burst type, the largest codebook size, and 10 iterations.

[Fig. 5 block labels. Training: fish acoustic signal, VQ, discrete signal processing, HMM for training, database. Recognition: fish acoustic signal, VQ, discrete signal processing, HMM for recognition, decision.]

TABLE III.

Accuracy level (%) of fish voice recognition for the LBG algorithm

Number of     Codebook    Accuracy level (%)
iterations    size        < 0.4 s    0.6-2.3 s    Combined
10            32          40         43.33        50
10            64          63.33      60           56.67
10            128         86.67      86.67        90
30            32          46.67      46.67        56.67

B. Relative time consumption

The time consumption was measured as the computer time elapsed from entering the data until the results were completely displayed on the monitor.

The relative time of HMM training for each VQ algorithm is shown in Table IV. The table shows that the LBG algorithm consumed the least execution time.

TABLE IV.

Execution time of each VQ algorithm for each codebook size and number of iterations

C. VQ distortion

The VQ distortion for several codebook sizes and numbers of iterations is shown in Table V. The table shows that the VQ distortion became smaller for larger codebook sizes and also, at codebook sizes of 16 and 32, for the larger number of iterations.

TABLE V.

VQ distortion for several codebook sizes and numbers of iterations

Number of     Codebook    VQ
iterations    size        distortion
10            2           3.922
10            4           1.594
10            8           0.927
10            16          0.571
10            32          0.388
10            64          0.277
10            128         0.204
30            2           3.922
30            4           1.594
30            8           0.928
30            16          0.561
30            32          0.383

Based on the above results, for the same number of iterations and the same burst duration, increasing the codebook size generally increased the recognition accuracy. This happened because increasing the number of codewords in a codebook made the VQ distortion smaller, which means that the probability of error also became lower.

V. CONCLUSION

Based on the results, the LBG algorithm had the smallest execution time, and it was also the most accurate of the three algorithms for the combined burst type, the largest codebook size, and 10 iterations.

REFERENCES

[1] Gerald, J. W., "Sound production during courtship in six species of sunfish", Evolution 25: 75-87, 1971.

[2] Fine, M. L., "Seasonal and geographical variation of mating call of oyster toadfish", Oecologia 36: 45-47, 1978.

[3] Lobel, P. S., "Sound produced by spawning fish", Environmental Biology of Fishes 33: 351-358, 1992.

[4] Lugli, M., Pavan, G., Torricelli, P., Bobbio, L., "Spawning vocalization in male freshwater gobiids", Environmental Biology of Fishes 43: 219-231, 1995.

[5] Stout, J. F., "Sound communication during the reproductive behavior of Notropis analostanus", Amer. Midl. Nat. 94: 296-325, 1975.

[6] Batri, N., "Robust Spectral Parameter Coding in Speech Recognition", Thesis, Department of Electrical Engineering, McGill University, Montreal, Canada, 1998.

[7] Parks, T. M., "Vector Quantization Codebook Design Using Neural Networks", Air Force Office of Scientific Research (AFOSR/JSEP), December 1990.

[8] Liu, Z., Yin, Q., Zhang, W., "A Speaker Identification and Verification System", EEL6586 Final Project, 2002.

[9] Rabiner, L., Juang, B. H., "Fundamentals of Speech Recognition", Prentice Hall, Inc., New Jersey, 1993.

Relative time consumption of HMM training processing

Number of Baum-Welch    Codebook    Trad. K-Means    LBG       Succ. Binary
iterations              size        Clustering                 Split
10                      32          42.05            34.75     37.17
10                      64          55.56            51.84     53.42
10                      128         92.53            86.11     -
30                      32          123.79           108.14    111.76

TABLE IV.

Accuracy level (%) for the various types of signal used as input

Number of Baum-Welch    B        P        C
algorithm iterations
10                      23.33    20       20
40                      26.67    26.67    26.67
