Automatic detection of microchiroptera echolocation calls from field recordings using machine learning algorithms

Mark D. Skowronski and John G. Harris

Computational NeuroEngineering Lab

Electrical and Computer Engineering

University of Florida, Gainesville, FL, USA

May 19, 2005

Overview


Motivations for acoustic bat detection


Machine learning paradigm


Detection experiments


Conclusions


Bat detection motivations


Bats are among the most diverse yet least studied mammals (~25% of all mammal species are bats).


Bats affect agriculture and carry diseases (directly or through parasites).


Acoustical domain is significant for echolocating bats and is non-invasive.


Recorded data can be voluminous → automated algorithms desired for objective and repeatable detection & classification.


Conventional methods


Conventional bat detection/classification parallels the acoustic-phonetic paradigm of automatic speech recognition from the 1970s.


Characteristics of acoustic phonetics:


Originally mimicked human expert methods


First, boundaries between regions are determined


Second, features for each region are extracted


Third, features are compared with decision trees or DFA (discriminant function analysis); see the sketch after this slide


Limitations:


Boundaries ill-defined, sensitive to noise


Many feature extraction algorithms with varying
degrees of noise robustness
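
A minimal sketch of this three-step acoustic-phonetic pipeline (boundaries, per-region features, decision-tree comparison), assuming energy-threshold boundary detection and two illustrative region features; the function names, threshold, and features are hypothetical, not the methods of any particular study.

```python
# Illustrative sketch of the conventional pipeline: boundaries -> per-region
# features -> decision tree / DFA. All names and values are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def find_call_boundaries(frame_energy_db, threshold_db=10.0):
    """Step 1: mark (start, end) frame regions where energy exceeds a threshold."""
    active = np.asarray(frame_energy_db) > threshold_db
    edges = np.diff(np.r_[0, active.astype(int), 0])
    return list(zip(np.where(edges == 1)[0], np.where(edges == -1)[0]))

def region_features(log_spectrogram, region, frame_rate_hz, freqs_hz):
    """Step 2: simple per-region features (duration, peak frequency)."""
    start, end = region
    seg = log_spectrogram[:, start:end]                  # (freq bins, frames)
    duration_ms = 1000.0 * (end - start) / frame_rate_hz
    peak_freq_khz = freqs_hz[np.argmax(seg.max(axis=1))] / 1000.0
    return [duration_ms, peak_freq_khz]

# Step 3: compare features with a decision tree trained on expert-labeled
# regions (X_train, y_train are placeholders for hand-labeled data):
# clf = DecisionTreeClassifier().fit(X_train, y_train)
# species = clf.predict([region_features(S, r, frame_rate_hz, freqs_hz)
#                        for r in find_call_boundaries(energy_db)])
```

The boundary step is exactly the part the slide calls ill-defined: a fixed threshold shifts region edges whenever the noise floor changes.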


Machine learning


Acoustic phonetics gave way to machine learning for ASR in the 1980s:


Advantages:


Decisions based on more information


Mature statistical foundation for algorithms


Frame-based features, from expert knowledge


Improved noise robustness


For bats: increased detection range


Detection experiments


Database of bat calls


7 different recording sites, 8 species


1265 hand-labeled calls (from spectrogram readings)


Detection experiment design


Discrete events: 20-ms bins


Discrete outcomes: Yes or No: does a bin
contain any part of a bat call?
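
A minimal sketch of how the 20-ms yes/no bins could be derived from hand-labeled call times; the (start, end) interval format in seconds is an assumption for illustration.

```python
# Sketch: convert hand-labeled call intervals into yes/no labels for 20-ms bins.
# The (start, end) interval format in seconds is assumed for illustration.
import numpy as np

def bin_labels(call_intervals_s, recording_dur_s, bin_ms=20.0):
    """True for each bin that contains any part of a bat call."""
    bin_s = bin_ms / 1000.0
    n_bins = int(np.ceil(recording_dur_s / bin_s))
    labels = np.zeros(n_bins, dtype=bool)
    for start, end in call_intervals_s:
        first = int(np.floor(start / bin_s))
        last = min(int(np.floor(end / bin_s)), n_bins - 1)
        labels[first:last + 1] = True
    return labels

# Example: a call from 0.103 s to 0.108 s lies entirely in bin 5 of a 1-s
# recording, so 1 of the 50 bins is labeled "yes".
labels = bin_labels([(0.103, 0.108)], recording_dur_s=1.0)
print(labels.sum(), "of", labels.size, "bins contain part of a call")
```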

Detectors


Baseline


Threshold for frame energy


Gaussian mixture model (GMM)


Model of probability distribution of call features


Threshold for model output probability


Hidden Markov model (HMM)


Similar to GMM, but includes temporal constraints through piecewise-stationary states


Threshold for model output probability along Viterbi
path
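
A minimal sketch of the baseline and GMM detectors described above, using scikit-learn's GaussianMixture; the feature arrays, mixture size, and thresholds are placeholders. The HMM detector would add a Viterbi pass over piecewise-stationary states, which is not shown.

```python
# Sketch: baseline energy detector and GMM detector. Feature arrays, the
# number of mixture components, and thresholds are placeholders.
from sklearn.mixture import GaussianMixture

def baseline_detect(frame_power_db, threshold_db):
    """Baseline: threshold frame power (noise floor normalized to 0 dB)."""
    return frame_power_db > threshold_db

def train_call_gmm(call_features, n_components=8, seed=0):
    """Fit a GMM to frame features taken from hand-labeled call frames."""
    return GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=seed).fit(call_features)

def gmm_detect(gmm, features, threshold):
    """Flag frames whose log-likelihood under the call model exceeds a threshold."""
    return gmm.score_samples(features) > threshold   # per-frame log p(x | call)

# Usage with placeholder data (6-D features: P, F and their derivatives):
# gmm = train_call_gmm(training_call_frames)
# hits = gmm_detect(gmm, test_frames, threshold=-20.0)
```

Sweeping the threshold in either detector traces out the specificity/sensitivity trade-off compared in the results.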

Feature extraction


Baseline


Normalization: session noise floor at 0 dB


Feature: frame power


Machine learning


Blackman window, zero-padded FFT


Normalization: log amplitude mean subtraction


From ASR: analogous to cepstral mean subtraction


Removes transfer function of recording environment


Mean across time for each FFT bin


Features:


Maximum FFT amplitude, dB


Frequency at maximum amplitude, Hz


First and second temporal derivatives (slope, concavity)
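
A minimal sketch of this frame-based feature extraction (Blackman window, zero-padded FFT, per-bin log-amplitude mean subtraction, peak amplitude and frequency, and their first and second temporal derivatives). Frame length, hop size, and zero-padding factor are assumed values chosen for illustration.

```python
# Sketch of the six frame-based features: P, F, and their first and second
# temporal derivatives. Frame length, hop, and FFT padding are assumptions.
import numpy as np

def extract_features(x, fs, frame_len=256, hop=64, fft_pad=4):
    """Return per-frame [P, F, dP, dF, ddP, ddF]."""
    n_fft = frame_len * fft_pad                       # zero-padded FFT
    win = np.blackman(frame_len)                      # Blackman window
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    log_spec = 20.0 * np.log10(np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) + 1e-12)
    log_spec -= log_spec.mean(axis=0, keepdims=True)  # mean across time per FFT bin
                                                      # (~cepstral mean subtraction)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    peak_bin = log_spec.argmax(axis=1)
    P = log_spec[np.arange(n_frames), peak_bin]       # maximum FFT amplitude, dB
    F = freqs[peak_bin]                               # frequency at maximum, Hz
    dP, dF = np.gradient(P), np.gradient(F)           # slope
    ddP, ddF = np.gradient(dP), np.gradient(dF)       # concavity
    return np.column_stack([P, F, dP, dF, ddP, ddF])
```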


Feature extraction examples


Six features: Power (P), Frequency (F), ΔP, ΔF, ΔΔP, ΔΔF

Detection example

Experiment results

Conclusions




Machine learning algorithms improve detection when specificity is high (>0.6).


HMM slightly superior to GMM, uses more
temporal information, but slower to train/test.


Hand labels determined using spectrograms, biased towards high-power calls.


Machine learning models applicable to other
species.

Bioacoustic applications




To apply machine learning to other species:


Determine ground truth training data through expert hand labels


Extract relevant frame-based features, considering domain-specific noise sources (echoes, propeller noise, other biological sources)


Train models of features from hand-labeled data


Consider training “silence” models for discriminant detection/classification (see sketch below)
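
A minimal sketch of the "silence" model idea: fit one GMM to call frames and one to background frames, then detect by comparing log-likelihoods. Data names, mixture sizes, and the decision margin are placeholders.

```python
# Sketch: discriminant detection with a call model and a "silence"/background
# model. Training arrays, component counts, and the margin are placeholders.
from sklearn.mixture import GaussianMixture

def fit_gmm(features, n_components=8, seed=0):
    return GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=seed).fit(features)

def discriminant_detect(call_gmm, silence_gmm, features, margin=0.0):
    """Label a frame as 'call' when the call model out-scores the background model."""
    llr = call_gmm.score_samples(features) - silence_gmm.score_samples(features)
    return llr > margin

# call_gmm = fit_gmm(call_frames)           # hand-labeled call frames (placeholder)
# silence_gmm = fit_gmm(background_frames)  # echoes, propeller noise, other sources
# hits = discriminant_detect(call_gmm, silence_gmm, test_frames)
```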

Further information


http://www.cnel.ufl.edu/~markskow


markskow@cnel.ufl.edu


Acknowledgements

Bat data kindly provided by:

Brock Fenton, U. of Western Ontario, Canada