SAME OLD SONG

LOUDNESS AND TEMPO ACROSS FIVE DECADES OF MUSIC


IS 733 - Dr. Vandana Janeja
Karim Said [AG86361]
2013-10-15



ABSTRACT:

Music is a highly complex expression of culture, so it makes intuitive sense that music would evolve
alongside culture. This is anecdotally supported by clashes between generations that relentlessly defend the
music of their formative years while simultaneously decrying the music of their ancestors and progeny. But
to what extent does music actually change in terms of the many complex characteristics that may be used
to understand music in a mathematically decomposable way?

This study attempts to explore the composition of music across five decades (1960s-2000s) along the
dimensions of loudness and tempo. A basic knowledge discovery lifecycle approach is used to analyze a
collection of one million songs, and results are presented that indicate distinct similarities in the way music
is composed across the evaluated dataset. To conclude, a brief discussion of challenges related to the
mathematical evaluation of music is presented, along with a potential roadmap for future investigations.

KEYWORDS:

Music Information Retrieval; Data Mining; Million Song Dataset


INTRODUCTION

Music is a highly complex artistic construct representing the confluence of acoustic science and human
expression that can say a great deal about the societies from which it emerges. It seems trivial to state that
as societies evolve their music does as well. It seems equally trivial to observe, even if only anecdotally,
that each generation seems to prefer the music of their formative years over the “boring” music of their
parents or the “crazy” music of their progeny. But how much of this phenomenon is rooted in actual sonic
differences between various generations of music, and how much is just generational head-butting? Can
patterns be discerned in the way music changes over time? Or are the differences between songs negligible
when analyzed mathematically?

A great deal of work has been done both to create datasets of musical information and to analyze the data
in meaningful ways. The Million Song Dataset (MSD) is particularly noteworthy and well referenced in
this field [2]. The result of a collaborative effort between Columbia University’s LabROSA and The Echo
Nest project, MSD is a freely available dataset containing metadata information (e.g., song and artist
names) as well as musical feature analysis data.

The MSD has been used for various academic works, and researchers working with the dataset have fallen
into several camps. Specifically, there are those researchers interested in scalable algorithms for use with
high-dimensional data, such as spatial tree search techniques [8] and alternative machine learning
techniques [4]. A second (and perhaps more prolific) camp is interested in understanding the actual
structure of music in the dataset to various ends including the identification of certain genres of music
[3][7], the evolution of music in certain cultures [10], and general analysis of music [9].

In essence, this work seeks to extend the 2012 research conducted by Serrà et al. [10] by looking at a
larger subset of the MSD and using more grounded data mining approaches to the analysis of the set, and
specifically the attributes of song loudness and tempo (considered to be the musical characteristics at the
root of generational bickering regarding “turning that racket down” and why “old music is sooo slow”).

As may be hinted by the title of this work, it is anticipated that conflicts between generations regarding
whose music is “better” stem from psychological and egocentric reasons (i.e., developed tastes and
arbitrary preferences), and that no significant differences will be found in the acoustical structure of music
across generations. It stands to reason that music, especially popular music, will tend to follow a basic
template of construction that appeals to a vast majority of people (hence its popularity), and that themes
and motifs in those templates are pervasive throughout time.

But why bother? Music and musical preferences are arguably the purest argument for the adoption of an
interpretivist worldview and no manner of mathematical analysis will convince a Simon & Garfunkel fan
that The Transplants is a superior group, nor vice versa.

However, understanding the mathematical structure of music offers many potential advantages, especially
if patterns in those structures become discernible. Specifically, recommendation systems, which currently
mainly operate according to crowd-sourced listening behaviors, could potentially be augmented by
machine learning algorithms that “recognize” unique characteristics of specific types of music and can
find corresponding matches from among a corpus of new songs. These types of recommendations would
be useful for services like Pandora Internet Radio, and many online music storefronts, and would offer an
advantage over other methods in that even unpopular (i.e., rarely purchased) music could be
deconstructed and analyzed.

Moreover, the ready availability of music creation tools (e.g., GarageBand, Reason, and Cakewalk) as
well as Internet-based media sharing services (e.g., YouTube and SoundCloud) has democratized the
production and distribution of music. Unfortunately, resources are widely considered to have surpassed
ability, and the results of an in-depth analysis of the mathematical composition of popular music may be
the first step towards balancing the scales. Specifically, one can envision the use of proven mathematical
patterns in music to guide the production of new songs, or correction tools that help artists know when
their music has veered far beyond the thresholds of what is considered listenable across generations.

METHODOLOGY

In order to guide this study, a basic knowledge discovery lifecycle approach was followed [Figure 1].
The steps of such an approach are highly iterative, and require a flexible toolset. As such, this study used
a series of tools, both custom and off-the-shelf.

The data integration phase in particular presented an interesting hurdle. Specifically, downloading the
nearly 300 GB MSD for local use would not be convenient, nor for some (this researcher included) even
feasible. To bypass this issue, an Amazon Elastic Compute Cloud (EC2) instance of Ubuntu Linux was
stood up, and an image of the MSD was mounted using an Elastic Block Store (EBS) volume.

The entire MSD dataset is archived as an HDF5 [6] directory-file structure with each file representing a
song. Basic Linux command line tools were used to slice the dataset into a smaller test set (i.e., a subset of
files). The size of the test set was arbitrary, though purposefully small (approximately 10,000 songs) to
reduce the amount of time necessary to perform preliminary investigations of the file structures and
attributes used within the MSD. Once this test set was prepared, a number of custom Python scripts were
written to begin preliminary dissection of the 51 attributes of each MSD song record [5].
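
As a rough illustration of the slicing step, the following Python sketch gathers a small test set of song
files from a directory tree. It is not the tooling used in this study (which relied on Linux command line
tools); the function names, the ".h5" extension filter, and the sorted traversal order are all assumptions
made for the example.

```python
import itertools
import os


def iter_song_files(root, extension=".h5"):
    """Yield paths of song files under root in a stable, sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        for name in sorted(filenames):
            if name.endswith(extension):
                yield os.path.join(dirpath, name)


def take_test_set(root, size=10000):
    """Return up to `size` song file paths to serve as a small test set."""
    return list(itertools.islice(iter_song_files(root), size))
```

Because the generator is consumed lazily, the walk stops as soon as the requested number of files has been
collected, which matters when the full directory tree holds on the order of a million files.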

[Figure 1 diagram labels: Knowledge, Integration, Cleaning, Mining, Analysis]

Figure 1: The basic knowledge discovery approach has highly iterative steps, and the overall process is
cyclical.


During initial exploration of the MSD, several less-than-desirable characteristics began to emerge.
Specifically, a number of text attributes contained errant escape and control characters that needed to be
resolved using various encodings, and a fair amount of missing data had to be accounted for. These
realizations informed the repeated revision of the Python scripts being used, and influenced the selection
of loudness (calculated as a floating-point value representing the average decibel range throughout a song)
and tempo (calculated as a floating-point value representing the average beats per minute throughout a
song) as the key attributes to be investigated (as they appeared to be consistently available, well-formatted,
and uniformly collected using The Echo Nest API [11]).
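
The kind of cleaning described above might be sketched in Python as follows. This is a minimal
illustration rather than the scripts actually used [5]: the record layout (a dictionary with "title",
"loudness", and "tempo" keys), the helper names, and the choice to treat NaN as missing are assumptions.

```python
import math
import unicodedata


def clean_text(value):
    """Decode bytes if needed and drop control characters from a text field."""
    if isinstance(value, bytes):
        value = value.decode("utf-8", errors="replace")
    # Unicode category "Cc" covers control characters such as \x00, \t, and \n.
    return "".join(ch for ch in value if unicodedata.category(ch) != "Cc").strip()


def to_float(value):
    """Return value as a float, or None when it is missing or not a number."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def clean_record(record):
    """Return a cleaned copy of record, or None if loudness or tempo is missing."""
    loudness = to_float(record.get("loudness"))
    tempo = to_float(record.get("tempo"))
    if loudness is None or tempo is None:
        return None
    title = clean_text(record.get("title") or "")
    return {"title": title, "loudness": loudness, "tempo": tempo}
```

Returning None for unusable records lets the calling script simply skip them, so that only songs with both
attributes present reach the later binning and clustering steps.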

As the principal intention of this study was to investigate potential differences in music across generations,
a decadal binning strategy was used to organize the data gleaned from the MSD. As such, a set of final
modifications was made to the Python scripts to appropriately parse and format all data of interest into
appropriately named CSV files (i.e., 1960.csv through 2000.csv), before the scripts were run on the full
dataset. Considering the file-delimited nature of MSD records, processing of the full dataset clearly
presented a file I/O processing hurdle. To mitigate this, the EC2 instance was configured as a “High-Memory
Quadruple Extra Large Instance” with approximately 70 GB of memory, and eight virtual cores [1]. Even
then, processing of the full dataset took nearly 7.5 hours, most of which was spent on file open and close
operations.
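
A decadal binning step of this kind could look roughly like the Python sketch below. The file names follow
the 1960.csv through 2000.csv convention described above; the function names, the (year, loudness, tempo)
input tuples, and the CSV header row are assumptions made for the example and are not taken from the
study's scripts.

```python
import csv
import os


def decade_of(year):
    """Map a release year to the first year of its decade (e.g., 1967 -> 1960)."""
    return (year // 10) * 10


def write_decadal_bins(records, out_dir, first=1960, last=2000):
    """Write (year, loudness, tempo) records to one CSV file per decade.

    Records whose decade falls outside first..last are skipped.
    Returns a dict mapping each decade to the number of rows written.
    """
    counts = {}
    writers = {}  # decade -> (open file handle, csv writer)
    try:
        for year, loudness, tempo in records:
            decade = decade_of(year)
            if not first <= decade <= last:
                continue
            if decade not in writers:
                path = os.path.join(out_dir, "%d.csv" % decade)
                handle = open(path, "w", newline="")
                writers[decade] = (handle, csv.writer(handle))
                writers[decade][1].writerow(["loudness", "tempo"])
            writers[decade][1].writerow([loudness, tempo])
            counts[decade] = counts.get(decade, 0) + 1
    finally:
        for handle, _ in writers.values():
            handle.close()
    return counts
```

Each output file is opened once and kept open for the whole run, which keeps the number of file open and
close operations on the output side small even when the input spans a very large number of song files.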

The yielded CSV files were then downloaded for local manipulation using WEKA. To account for
WEKA’s memory constraints each decadal-binned file was trimmed to 10,000 records (representing
50,000 songs in total).

Moreover, as the intention of this study was to investigate the potential similarities across generationally
diverse music, an unsupervised clustering technique was used. Centroids were considered a good place to
begin with for cross-decade comparison, and as outliers were not of immediate interest, a technique that
clustered every point was selected; namely, Simple K-Means. To determine the ideal K value of each of the
binned data sets, a simple Java application was written to call various functions using the WEKA API.
Once determined, an ideal K was used to produce scatter plots colored to represent various clusters within
each of the decadal bins. Centroids were identified, and compared as representatives of their respective
clusters, and distances were calculated between proximal centroids across all the decadal bins. Finally, a
brief qualitative evaluation was performed, comparing the relative similarity of centroid songs.
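
To make the ideal K search more concrete, the sketch below implements a plain k-means (Lloyd's algorithm)
in Python and reports the sum of squared errors (SSE) for a range of K values. It is a stand-in for
illustration only: the study itself called WEKA from a Java application, and the function names, the
random seeding, the iteration cap, and the absence of any attribute scaling here are assumptions of this
example rather than a description of WEKA's behavior.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=100, seed=0):
    """Cluster a list of point tuples with Lloyd's algorithm.

    Returns (centroids, sse), where sse is the sum of squared distances
    from each point to its nearest centroid.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for centroid, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                updated.append(centroid)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, sse


def sse_by_k(points, k_values, seed=0):
    """Return {k: SSE} so that the bend in the SSE curve can be inspected."""
    return {k: k_means(points, k, seed=seed)[1] for k in k_values}
```

Calling sse_by_k on the (loudness, tempo) pairs of each decadal bin for K from 2 through 10 would give one
SSE curve per bin, and an ideal K is then read off where adding further clusters stops reducing SSE
appreciably. Since k-means depends on its starting centroids, several seeds would normally be tried.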


RESULTS AND ANALYSIS

An analysis of the information collected throughout the knowledge discovery approach taken appears to
reveal several compelling results.


First, the decision to use the unsupervised clustering algorithm, Simple K-Means, required the
identification of an ideal K value for each of the decadal bins. Not only was an ideal K value of five (5)
found for each of the binned datasets, but a quick plotting of the ideal K trend lines also reveals
consistently close SSE (sum of squared errors) across bins for various values of K [Figure 2]. While very
little can be determined from gross SSE for an entire clustered bin, the proximity of SSE values across all
of the bins hints that the loudness and tempo data points being evaluated are similarly distributed around
their respective centroids. In terms of supporting the Same Old Song hypothesis, this critical first step
indicates that songs from the decadal bins are not distributed entirely differently from one another.

Moreover, once ideal K values were calculated, each decadal bin was plotted and colored to differentiate
between clusters [Figure 3]. The similarity in shapes between clusters is striking, and seems to further
indicate that with regards to loudness and tempo songs appear to be consistently distributed around
similar centroids across the evaluated decades. Additionally, features of clusters that aren’t of immediate
interest nonetheless appear to recur across decadal bins (e.g., the scattering of potential outliers around
low-loudness/moderate-tempo clusters).

A further analysis of cluster centroids shows that they are all proximally situated [Figure 4].
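
One way to quantify how proximally the centroids sit is to pair each centroid from one decadal bin with
its nearest centroid in another bin and record the distance between them. The Python sketch below shows
the idea; the function name is an assumption, the centroid values in the usage note are invented purely
for illustration, and the sketch is not a reconstruction of the calculation performed in this study.

```python
import math


def match_centroids(reference, other):
    """Pair each reference centroid with its nearest centroid from another bin.

    Returns a list of (reference_centroid, nearest_centroid, distance) tuples,
    where distance is the Euclidean distance in (loudness, tempo) space.
    """
    matches = []
    for centroid in reference:
        nearest = min(other, key=lambda c: math.dist(centroid, c))
        matches.append((centroid, nearest, math.dist(centroid, nearest)))
    return matches
```

For example, with made-up centroids [(-20.0, 90.0), (-8.0, 125.0)] for one bin and
[(-8.0, 128.0), (-23.0, 94.0)] for another, the first reference centroid pairs with (-23.0, 94.0) at a
distance of 5.0 and the second with (-8.0, 128.0) at a distance of 3.0. Because loudness and tempo are
measured in different units, such raw distances are dominated by whichever attribute has the wider range.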

In terms of the Same Old Song hypothesis, the consistent similarity of clustered data and the proximity of
respective cluster centroids across decadal bins seem to indicate that music across the evaluated decades
exhibits similar patterns in terms of loudness and tempo.


[Figure 2 plot, titled “Ideal K”: axis ticks from 0 to 250 and from 2 to 10]

Figure 2: The decadal bins from the MSD appear to share an ideal K value when analyzed using the
unsupervised clustering algorithm, Simple K-Means. Note the consistent closeness of SSE across bins as K
is changed.


Lending further support to the Same Old Song hypothesis are the results of a brief qualitative evaluation of
the songs represented by each of the decadal bin cluster centroids [Figure 5]. Specifically, a simple coding
of songs according to their placement along the loudness and tempo dimensions, and a subsequent
sampling of the actual songs, revealed that songs from proximal clusters resemble each other regardless of
their decadal bin. For example, the cluster centroids representing high-loudness/moderate-tempo music
(i.e., the upper-middle group in Figure 4) contain songs by Mahalia Jackson (1980s), The Jackson 5
(1970s), and Eric Clapton (1990s); all are popular vocal performers with ties to Motown, soulful blues, or
gospel influences. Similarly, high-loudness/high-tempo songs by Nino Bravo (1970s), The Pointer Sisters
(1980s), and Tommy Roe (1960s) all represent energetic pop music, while low-loudness/low-tempo Sleepy
John Estes (1960s), Todd Rundgren (2000s), and Barclay Harvest (1970s) represent slower rock ballads
and guitar solo driven folksy blues.


[Figure 3 plots: three scatter-plot panels, each with axis ticks from -45 to 0 and from 0 to 300]

Figure 3: Plots of clustered data points from 1960, 1980, and 2000 decadal bins (left to right respectively).
Note ancillary similarities such as the scattering of potential outlier points near the
low-loudness/moderate-tempo clusters.

[Figure 4 plot: axis ticks from -25 to 0 and from 60 to 200; legend entries 1960, 1970, 1980, 1990, 2000]

Figure 4: A plot of cluster centroids across decadal bins shows further similarity in the distribution of
songs, and hints that there is a degree of consistency regarding loudness and tempo across the generations
of music evaluated.



Columns: Low-Tempo | Moderate-Tempo | High-Tempo

Low-Loudness:
  Sleepy John Estes, Barclay Harvest, Wes Montgomery Trio, Another Level, Todd Rundgren
  Bobby Darin, Leon Redbone, Pulp, ATB, Toshinori Kondo
  No Cluster

High-Loudness:
  Jorgen Ingmann, The Jackson 5, Mahalia Jackson, Eric Clapton, UNKLE
  Jeff Beck, The J. Geils Band, Cerrone, Kirk Whalum, Joana Zimmer
  Tommy Roe, Nino Bravo, The Pointer Sisters, Red Giant, Haggard

Figure 5: Artists of cluster centroid songs organized by relative location of the respective cluster along
loudness and tempo dimensions. All artists are listed in ascending order according to the decadal bin to
which they belong (i.e., 1960-2000).

In all, an analysis of the mathematical proximity of cluster centroids across decadal bins seems reinforced
further by a ground truth qualitative evaluation of representative songs, and overall the Same Old Song
hypothesis appears supported.

CHALLENGES AND FUTURE WORK

While both basic data mining techniques and the qualitative evaluation carried out by this study seem to
support the overall Same Old Song hypothesis, there are a number of potential confounds regarding the
approach taken, and these challenges, while not insurmountable, warrant discussion.

In particular, the qualitative evaluation, though admittedly simplistic in its approach, made clear the
number of ancillary musical characteristics that could potentially affect perception of tempo and loudness.
The wide range of sounds produced by a rock ballad and the many instruments and production techniques
used throughout it, for instance, may seem louder than the relatively limited vocal range of a blues singer;
however, these songs may nonetheless be mathematically very similar. Loudness in particular is
calculated by The Echo Nest API using a number of normalizing filters that account for many audible
subtleties that may contribute to a person’s perception of loudness. For these reasons, loudness is difficult
to analyze rigorously through mathematics alone, and future works should include more robust
qualitative evaluations. This may prove especially useful in the evaluation of outlier songs to begin asking
questions along the lines of “is the loudest song from the last decade significantly louder than the loudest
song from the 1960s?”.

The filters used by The Echo Nest API to calculate loudness bring to light another significant challenge.
Specifically, manipulation of data was required at several stages throughout this particular investigation,
and each manipulation inherently distances any conclusions from the original data through added
complexity. The most evident example of this phenomenon is the simplification of arrays of data stored by
the MSD into representative averages. While taking, for instance, the average tempo of a song may seem
to be relatively straightforward, much richness is lost in such a manipulation. Imagine, for example, an
electronic dance song that undulates dramatically between very fast and very slow verses and that, when
averaged, appears to be proximal to an easy listening lounge crooner. Future works should account for
such shifts by avoiding overly simplistic statistical summaries for dimensions of songs that are inherently
highly varied. A number of data mining techniques could be used to analyze arrays representing changing
musical patterns; for example, one can imagine a classification algorithm being used to predict various
patterns within a song based on any number of the complex attributes provided by the MSD.
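
The loss of richness through averaging can be shown with a small worked example in Python. The
per-section tempo values below are hypothetical and are not drawn from the MSD; the point is only that
two very different songs can share the same mean tempo while differing sharply in spread.

```python
import statistics


def summarize_tempo(section_tempos):
    """Return (mean, population standard deviation) of per-section tempos."""
    return statistics.mean(section_tempos), statistics.pstdev(section_tempos)


# Hypothetical per-section tempos in beats per minute (invented for illustration).
dance_track = [60, 180, 60, 180]     # alternates between very slow and very fast
lounge_track = [118, 122, 119, 121]  # nearly constant throughout

dance_mean, dance_spread = summarize_tempo(dance_track)
lounge_mean, lounge_spread = summarize_tempo(lounge_track)
```

Both tracks average 120 beats per minute, yet the first has a standard deviation of 60 while the second
stays below 2. Retaining even one measure of spread alongside the mean would therefore separate songs
that a mean-only summary places at the same point.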

Finally, the MSD itself also presents a number of challenges for studies that rely heavily on certain specific
subsets of attributes (and understandably so; a dataset so large can’t be expected to be flawless across all
dimensions). Missing, miscalculated, and malformed data all the same present a number of issues that
may be overcome through the creation of more focused datasets that concentrate on isolating less messy
data for a smaller group of attributes. Future work in such an area could potentially branch in many
directions that would allow for more focused investigations of specific musical characteristics. Moreover,
there are a number of ways the current MSD could be modified to include analysis of lyrical composition
and other musical characteristics that might lend themselves to analysis through data mining.

BIBLIOGRAPHY

[1] Amazon EC2 Instance Types: http://aws.amazon.com/ec2/instance-types/. Accessed: 2012-12-01.

[2] Bertin-Mahieux, T. et al. 2011. The Million Song Dataset. Proceedings of the 12th International Society
for Music Information Retrieval Conference (2011).

[3] Bertin-Mahieux, T. and Ellis, D. 2011. Large-Scale Cover Song Recognition Using Hashed Chroma
Landmarks. Applications of Signal …. (2011).

[4] Dieleman, S. et al. 2011. Audio-Based Music Classification with a Pretrained Convolutional Network.
Proceedings of the 12th International Society for Music Information Retrieval Conference (2011), 669-674.

[5] egyptiankarim/Same_Old_Song: https://github.com/egyptiankarim/Same_Old_Song. Accessed:
2012-12-01.

[6] Hierarchical Data Format: http://en.wikipedia.org/wiki/Hierarchical_Data_Format#HDF5. Accessed:
2012-12-01.

[7] Levy, M. 2011. Improving Perceptual Tempo Estimation with Crowd-Sourced Annotations.
Proceedings of the 12th International Society for Music Information Retrieval Conference (2011), 317-322.

[8] McFee, B. and Lanckriet, G. 2011. Large-Scale Music Similarity Search with Spatial Trees.
Proceedings of the 12th International Society for Music Information Retrieval Conference (2011), 1-6.


[9] McVicar, M. and Bie, T. De 2012. CCA and a Multi-way Extension for Investigating Common
Components between Audio, Lyrics and Tags. 9th International Symposium on Computer Music
Modelling and Retrieval (2012), 19-22.

[10] Serrà, J. et al. 2012. Measuring the Evolution of Contemporary Western Popular Music.

[11] The Echo Nest Developer Center: http://developer.echonest.com. Accessed: 2012-12-01.