Using Blackboard Systems for
Polyphonic Transcription

A Literature Review

by Cory McKay

Outline


Intro to polyphonic transcription


Intro to blackboard systems


Keith Martin’s work


Kunio Kashino’s work


Recent contributions


Conclusion

Polyphonic Transcription


Represent an audio signal as a score


Must segregate notes belonging to different
voices


Problems: variations of timbre within a
voice, voice crossing, identification of
correct octave


No successful general purpose system to
date

Polyphonic Transcription


Can use simplified models:


Music for a single instrument (e.g. piano)


Extract only a given instrument from mix


Use music which obeys restrictive rules


Simplified systems have had success rates
of between 80% and 90%


These rates may be exaggerated, since only
very limited test suites are generally used

Polyphonic Transcription


Systems to date generally identify only
rhythm, pitch and voice


Would like systems that also identify other
notated aspects such as dynamics and
vibrato


Ideal is to have system that can identify and
understand parameters of music that
humans hear but do not notate


Blackboard Systems


Used in AI for decades but only applied to music
transcription in the early 1990s


Term “blackboard” comes from notion of a group
of experts standing around a blackboard working
together to solve a problem


Each expert writes contributions on blackboard


Experts watch problem evolve on blackboard,
making changes until a solution is reached

Blackboard Systems


“Blackboard” is a central dataspace


Usually arranged in hierarchy so that input is at
lowest level and output is at highest


“Experts” are called “knowledge sources”


KSs generally consist of a set of heuristics and a
precondition whose satisfaction results in a
hypothesis that is written on blackboard


Each KS forms hypotheses based on information
from front end of system and hypotheses
presented by other KSs
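
The structure described above can be sketched in a few lines. This is a minimal illustration, not any of the reviewed systems: the names `Blackboard`, `KnowledgeSource`, `run` and the two toy knowledge sources are assumptions made for the example. Each KS has a precondition and an action that writes hypotheses, and the loop stops once no KS adds anything new.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, str]  # (level, value), e.g. ("note", "C")


@dataclass
class Blackboard:
    """Central dataspace holding hypotheses at several levels."""
    hypotheses: List[Hypothesis] = field(default_factory=list)

    def at_level(self, level: str) -> List[str]:
        return [value for lvl, value in self.hypotheses if lvl == level]

    def post(self, hyp: Hypothesis) -> bool:
        """Write a hypothesis; return True only if it was new."""
        if hyp in self.hypotheses:
            return False
        self.hypotheses.append(hyp)
        return True


@dataclass
class KnowledgeSource:
    """Hypothetical KS: a precondition plus an action that proposes hypotheses."""
    name: str
    precondition: Callable[[Blackboard], bool]
    action: Callable[[Blackboard], List[Hypothesis]]


def run(blackboard: Blackboard, sources: List[KnowledgeSource], max_rounds: int = 100) -> int:
    """Visit the KSs in order until a full round adds nothing new."""
    for rounds_done in range(1, max_rounds + 1):
        changed = False
        for ks in sources:
            if ks.precondition(blackboard):
                for hyp in ks.action(blackboard):
                    changed = blackboard.post(hyp) or changed
        if not changed:
            return rounds_done
    return max_rounds


def pair_notes(bb: Blackboard) -> List[Hypothesis]:
    """Toy KS action: propose an interval for every pair of notes."""
    notes = sorted(bb.at_level("note"))
    return [("interval", f"{a}-{b}") for i, a in enumerate(notes) for b in notes[i + 1:]]


sources = [
    KnowledgeSource("intervals", lambda bb: len(bb.at_level("note")) >= 2, pair_notes),
    KnowledgeSource("chords", lambda bb: "C-G" in bb.at_level("interval"),
                    lambda bb: [("chord", "C")]),
]
board = Blackboard([("note", "C"), ("note", "G")])
run(board, sources)
print(board.hypotheses)
```

Note hypotheses at the lowest level lead to an interval hypothesis, which in turn satisfies the precondition of the chord KS, so information moves up the hierarchy without a KS ever calling another KS directly.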


Blackboard Systems


Problem is solved when all KSs are satisfied
with all hypotheses on blackboard to within
a given margin of error


Eliminates need for global control module


Each KS can be easily updated and new
KSs can be added with little difficulty


Combines top-down and bottom-up processing

Blackboard Systems


Music has a naturally hierarchical structure
that lends itself well to blackboard systems


Allow integration of different types of
expertise:


signal processing KSs at low level


human perception KSs at middle level


musical knowledge KSs at upper level

Blackboard Systems


Limitation: giving upper level KSs too much
specialized knowledge and influence limits
generality of transcription systems


Ideal system would not use knowledge above the
level of human perception and the most
rudimentary understanding of music


Current trend is to increase significance of upper-level
musical KSs in order to increase success rate


Keith Martin (1996 a)


“A Blackboard System for Automatic
Transcription of Simple Polyphonic Music”


Used a blackboard system to transcribe a four-voice
Bach chorale with appropriate segregation
of voices


Limited input signal to synthesized piano
performances


Gave system only rudimentary musical
knowledge, although choice of Bach chorale
allowed the use of generally unacceptable
assumptions by lower level KSs

Keith Martin (1996 a)


Front-end system used short-time Fourier
transform on input signal


Equivalent to a filter bank that is a gross
approximation of the way the human cochlea
processes auditory signals


Blackboard system fed sets of associated
onset times, frequencies and amplitudes
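
As a rough illustration of such a front end, the sketch below windows one frame of a signal, takes its discrete Fourier transform, and reports the frequency and magnitude of the strongest bin. It is an assumption-laden simplification: the function names are hypothetical, only a single frame is analysed, and onset detection is left out entirely.

```python
import cmath
import math


def frame_magnitudes(frame):
    """Hann-window one frame and return DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))
        magnitudes.append(abs(acc))
    return magnitudes


def strongest_partial(frame, sample_rate):
    """Return (frequency in Hz, magnitude) of the largest spectral bin."""
    magnitudes = frame_magnitudes(frame)
    k = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return k * sample_rate / len(frame), magnitudes[k]


# Hypothetical test signal: a 440 Hz sine sampled at 8 kHz, one 400-sample frame.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(400)]
frequency, magnitude = strongest_partial(frame, sample_rate)
print(frequency, magnitude)
```

Repeating this over successive frames gives the time axis; a real front end would also track peaks across frames before handing frequency and amplitude sets to the blackboard.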


Keith Martin (1996 a)


Knowledge sources made five classes of
hierarchically organized hypotheses:


“Tracks”


Partials


Notes


Intervals


Chords


Keith Martin (1996 a)


Three types of knowledge sources:


Garbage collection


Physics


Musical practice


Thirteen knowledge sources in all


Each KS only authorized to make certain
classes of hypotheses


Keith Martin (1996 a)


KSs with access to upper-level hypotheses can put
“pressure” on KSs with lower-level access to
make certain hypotheses and vice versa


Example: if the hypotheses have been made that
the notes C and G are present in a beat, a KS with
information about chords might put forward the
hypothesis that there is a C chord, thus putting
pressure on other KSs to find an E or Eb.



Used a sequential scheduler to coordinate KSs
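
The C and G example can be made concrete with a small sketch. This is not Martin's code; `chord_pressure` and its rule (root and perfect fifth present, third missing) are assumptions chosen to mirror the example on the slide.

```python
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def chord_pressure(notes_present):
    """Map each hypothesized chord root to the thirds that lower-level KSs should look for.

    A root qualifies when it and its perfect fifth are present but neither
    the major nor the minor third has been found yet.
    """
    present = {NOTE_NAMES.index(name) for name in notes_present}
    requests = {}
    for root in sorted(present):
        fifth = (root + 7) % 12
        minor_third = (root + 3) % 12
        major_third = (root + 4) % 12
        if fifth in present and not ({minor_third, major_third} & present):
            requests[NOTE_NAMES[root]] = [NOTE_NAMES[major_third], NOTE_NAMES[minor_third]]
    return requests


print(chord_pressure(["C", "G"]))  # {'C': ['E', 'Eb']}
```

Once an E has been found, the same call returns no requests for C, which corresponds to the pressure being released.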

Keith Martin (1996 b)


“Automatic Transcription of Simple
Polyphonic Music: Robust Front End
Processing”


Previous system often misidentified octaves


Attempted to improve performance by
shifting octave identification task from a
top-down process to a bottom-up process

Keith Martin (1996 b)


Proposes the use of log-lag correlograms in front
end


Models the inner hair cells in the cochlea with a
bank of filters


Determines pitch by measuring the periodic
energy in each filter channel as a function of lag


Correlograms now basic unit fed to blackboard
system


No definitive results as to which approach is better
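
The idea of measuring periodic energy as a function of lag can be sketched with a plain autocorrelation. This is a heavily simplified stand-in for a correlogram: it works on a single channel rather than a filter bank, uses linearly spaced lags rather than a log-lag axis, and the function names and pitch range are assumptions.

```python
import math


def autocorrelation(signal, lag):
    """Unnormalized autocorrelation of the signal with itself shifted by `lag` samples."""
    return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))


def estimate_pitch(signal, sample_rate, min_hz=50.0, max_hz=1000.0):
    """Pick the lag with the most periodic energy and convert it to a frequency."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(signal, lag))
    return sample_rate / best_lag


# Hypothetical test tone: 200 Hz fundamental plus a weaker second harmonic.
sample_rate = 8000
tone = [
    math.sin(2 * math.pi * 200 * i / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * i / sample_rate)
    for i in range(800)
]
print(estimate_pitch(tone, sample_rate))
```

Because the strongest peak falls at the lag of the fundamental period even when harmonics are present, this kind of representation speaks directly to the octave errors mentioned above.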


Kashino, Nakadai, Kinoshita and
Tanaka (1995)


“Application of Bayesian Probability Networks to
Music Scene Analysis”


Work slightly preceded that of Martin


Used test patterns involving more than one
instrument


Uses principles of stream segregation from
auditory scene analysis


Implements more high-level musical knowledge


Uses Bayesian network instead of Martin’s simple
scheduler to coordinate KSs

Kashino, Nakadai, Kinoshita and
Tanaka (1995)


Knowledge sources used:


Chord transition dictionary


Chord-note relation


Chord naming rules


Tone memory


Timbre models


Human perception rules


Used very specific instrument timbres and musical
rules, so has limited general applicability


Kashino, Nakadai, Kinoshita and
Tanaka (1995)


Tone memory: frequency components of
different instruments played with different
parameters


Found that the integration of tone memory
with the other KSs greatly improved
success rates


Kashino, Nakadai, Kinoshita and
Tanaka (1995)


Bayesian networks well known for finding good
solutions despite noisy input or missing data


Often used in implementing learning methods that
trade off prior belief in a hypothesis against its
agreement with current data


Therefore seem to be a good choice for
coordinating KSs
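
The trade-off between prior belief and agreement with the data is Bayes' rule. The sketch below applies it once to two competing chord hypotheses; it is not a full Bayesian network with propagation, and the prior and likelihood values are invented for illustration.

```python
def posterior(priors, likelihoods):
    """Combine prior beliefs with data likelihoods and renormalize (Bayes' rule)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical numbers: musical context favours C major, but the observed
# spectrum (a partial near Eb) is more probable under C minor.
priors = {"C major": 0.7, "C minor": 0.3}
likelihoods = {"C major": 0.2, "C minor": 0.6}
beliefs = posterior(priors, likelihoods)
print(beliefs)
```

With these numbers the data outweighs the prior and C minor ends up with a belief of 0.5625; noisier evidence (likelihoods closer together) would leave the prior in control.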


Kashino, Nakadai, Kinoshita and
Tanaka (1995)


No experimental comparisons of this
approach and Martin’s simple scheduler


Only used simple test patterns rather than
real music


Kashino and Hagita (1996)


“A Music Scene Analysis System with the MRF-Based
Information Integration Scheme”


Suggests replacing Bayesian networks with
Markov Random Field hypothesis network


Successful in correcting the two most common
problems in previous system:


Misidentification of instruments


Incorrect octave labelling

Kashino and Hagita (1996)


MRF-based networks use simulated annealing to
converge to a low-energy state


MRF approach enables information to be
integrated on a multiply connected hypothesis
network


Bayesian networks only allow singly connected
networks


Could now deal with two kinds of transition
information within a single hypothesis network:



chord transitions


note transitions
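
How simulated annealing settles into a low-energy labelling can be sketched on a toy problem. The energy function, the cooling schedule and all names below are assumptions for illustration and are not taken from Kashino and Hagita: each site is a note with a per-label cost (disagreement with the signal), and neighbouring sites pay a penalty for an octave jump.

```python
import math
import random


def energy(labels, unary, pairwise):
    """Total energy: per-site costs plus costs between neighbouring sites."""
    total = sum(unary[i][label] for i, label in enumerate(labels))
    total += sum(pairwise(a, b) for a, b in zip(labels, labels[1:]))
    return total


def anneal(unary, pairwise, label_set, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Metropolis-style annealing; returns the best labelling seen and its energy."""
    rng = random.Random(seed)
    # Start from the labelling that is best when each site is considered alone.
    labels = [min(label_set, key=lambda lab, site=site: site[lab]) for site in unary]
    current = energy(labels, unary, pairwise)
    best, best_energy = list(labels), current
    for step in range(steps):
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(len(labels))
        old = labels[i]
        labels[i] = rng.choice(label_set)
        proposed = energy(labels, unary, pairwise)
        accept = proposed <= current or rng.random() < math.exp((current - proposed) / temperature)
        if accept:
            current = proposed
            if current < best_energy:
                best, best_energy = list(labels), current
        else:
            labels[i] = old  # reject the move
    return best, best_energy


# Hypothetical octave ambiguity: the middle note looks slightly more like C5
# in isolation, but an octave jump between neighbours costs 1.0.
unary = [{"C4": 0.1, "C5": 0.9}, {"C4": 0.6, "C5": 0.5}, {"C4": 0.2, "C5": 0.8}]
octave_jump = lambda a, b: 0.0 if a == b else 1.0
print(anneal(unary, octave_jump, ["C4", "C5"]))
```

The per-site choice alone would label the middle note C5; the neighbour term is what lets context override it, which is the kind of octave correction reported for the MRF-based system.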

Kashino and Hagita (1996)


Instrument and octave identification errors
corrected, but some new errors introduced


Overall, performed roughly 10% better than
Bayesian-based system at transcribing 3-part
arrangement of Auld Lang Syne


Still only had a recognition rate of 71.7%

Kashino and Murase (1998)


Shifts some work away from blackboard system
by feeding it higher-level information


Simplifies and mathematically formalizes notion
of knowledge sources


Switches back to Bayesian network


Perhaps not truly a blackboard system anymore


Has very good recognition rate


Scalability of system is seriously compromised by
new approach

Kashino and Murase (1998)


Uses adaptive template matching


Implemented using a bank of filters arranged in
parallel and a number of templates corresponding
to particular notes played by particular instruments


The correlation between the outputs of the
filters is calculated and a match is then
made to one of the templates
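
The matching step can be sketched as a normalized correlation between the filter outputs and each stored template. This is a simplified reading of the slide: the adaptive part of the method is omitted, and the template values, names and four-channel layout are invented for the example.

```python
import math


def normalized_correlation(a, b):
    """Cosine-style correlation between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_template(filter_outputs, templates):
    """Return the (instrument, note) key of the best-matching template and its score."""
    key = max(templates, key=lambda k: normalized_correlation(filter_outputs, templates[k]))
    return key, normalized_correlation(filter_outputs, templates[key])


# Hypothetical templates: relative energy in four filter channels.
templates = {
    ("piano", "C4"): [1.0, 0.5, 0.25, 0.1],
    ("flute", "C4"): [1.0, 0.1, 0.05, 0.0],
    ("violin", "C4"): [1.0, 0.8, 0.6, 0.4],
}
observed = [0.9, 0.12, 0.04, 0.01]
print(best_template(observed, templates))
```

Every additional instrument adds one template per note, and every template must be compared against every observation, which is the scalability concern raised for this approach.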

Kashino and Murase (1998)


Achieved recognition rate of 88.5% on real
recordings of piano, violin and flute


Including templates for many more instruments
could make adaptive template matching intractable


Particularly a problem for instruments with


Similar frequency spectra


A great deal of spectral variation from note to note

Hainsworth and Macleod (2001)


“Automatic Bass Line Transcription from
Polyphonic Music”


Wanted to be able to extract a single given
instrument from an arbitrary musical signal


Contrast to previous approaches of using
recordings of only one instrument or a set of
pre-defined instruments


Hainsworth and Macleod (2001)


Chose to work with bass


Can filter out high frequencies


Notes usually fairly steady


Used simple mathematical relations to trim
hypotheses rather than a true blackboard
system


Had a 78.7% success rate on a Miles Davis
recording

Bello and Sandler (2000)


“Blackboard Systems and Top-Down Processing
for the Transcription of Simple Polyphonic
Music”


Return to a true blackboard system


Based on Martin’s implementation, using a
conventional scheduler


Refines knowledge sources and adds high-level
musical knowledge


Implements one of the knowledge sources as a neural
network

Bello and Sandler (2000)


The chord recognizer KS is a feedforward network


Trained using the spectrograms of different chords
of a piano


Trained network is fed a spectrogram and outputs
possible chords


Can therefore output more than one hypothesis at
each iteration


Gives other KSs more information and allows
parallel exploration of solution space
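
The point about several hypotheses per iteration can be illustrated with a tiny feedforward pass whose outputs are thresholded rather than reduced to a single winner. The weights below are set by hand purely for illustration (nothing here is trained, and none of it is Bello and Sandler's network); the input layout of four note energies is also an assumption.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, output_weights):
    """One hidden layer, sigmoid activations, no biases."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in hidden_weights]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in output_weights]


def chord_hypotheses(spectrum, hidden_weights, output_weights, chord_names, threshold=0.5):
    """Return every chord whose score reaches the threshold, strongest first."""
    scores = forward(spectrum, hidden_weights, output_weights)
    ranked = sorted(zip(chord_names, scores), key=lambda pair: pair[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= threshold]


# Hand-set, hypothetical weights. Inputs are energies at [C, Eb, E, G].
hidden_weights = [[0.0, -4.0, 4.0, 0.0],   # responds to E without Eb
                  [0.0, 4.0, -4.0, 0.0]]   # responds to Eb without E
output_weights = [[4.0, -2.0], [-2.0, 4.0]]
chord_names = ["C major", "C minor"]

clear = chord_hypotheses([1.0, 0.0, 1.0, 1.0], hidden_weights, output_weights, chord_names)
ambiguous = chord_hypotheses([1.0, 1.0, 1.0, 1.0], hidden_weights, output_weights, chord_names)
print(clear)
print(ambiguous)
```

For the clear input only C major passes the threshold, while the ambiguous input yields both chords, leaving it to the other KSs to decide between them.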

Bello and Sandler (2000)


Could automatically retrain network to recognize
spectrograms of other instruments with no manual
modifications needed


Preliminary testing showed tendency to
misidentify octaves and make incorrect
identification of note onsets


These problems could potentially be corrected by
signal processing system that feeds blackboard
system

Conclusions


Bass transcription system and more recent
work of Kashino useful for specific
applications, but limited potential for
general transcription purposes


True blackboard approach scales well and
appears to hold the most potential for
general-purpose polyphonic transcription

Conclusions


Use of adaptive learning in knowledge
sources seems promising


Interchangeable modules could be
automatically trained to specialize in
different areas


Could have semi-automatic transcription,
where user chooses correct modules and
system performs transcription using them