Song Genre Classification

  • Published : May 6, 2013
Music Genre Classification with the Million Song Dataset
15-826 Final Report
Dawen Liang,† Haijie Gu,‡ and Brendan O’Connor‡
† School of Music, ‡ Machine Learning Department, Carnegie Mellon University

December 3, 2011

1 Introduction

The field of Music Information Retrieval (MIR) draws from musicology, signal processing, and artificial intelligence. A long line of work addresses problems including music understanding (extracting musically meaningful information from audio waveforms), automatic music annotation (measuring song and artist similarity), and others. However, very little work has scaled to commercially sized datasets. The algorithms and data are both complex. An extraordinary range of information is hidden inside music waveforms, ranging from perceptual to auditory, which inevitably makes large-scale applications challenging. There are a number of commercially successful online music services, such as Pandora, Last.fm, and Spotify, but most of them are based merely on traditional text IR.

Our course project focuses on large-scale data mining of music information with the recently released Million Song Dataset (Bertin-Mahieux et al., 2011) [1], which consists of 300GB of audio features and metadata. This dataset was released to push the boundaries of Music IR research to commercial scales. In addition, the associated musiXmatch dataset [2] provides textual lyrics information for many of the MSD songs.

[1] http://labrosa.ee.columbia.edu/millionsong/

Combining these two datasets, we propose a cross-modal retrieval framework that combines the music and textual data for the task of genre classification: Given N song-genre pairs $(S_1, G_1), \ldots, (S_N, G_N)$, where $S_i \in F$ for some feature space $F$ and $G_i \in G$ for some genre set $G$, output the classifier with the highest classification accuracy on the hold-out test set. The raw feature space F contains multiple domains of sub-features, which can be of variable length. The genre label set G is discrete.
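The task statement above can be illustrated with a minimal Python sketch. It is not the report's implementation: it assumes each song has already been reduced to a fixed-length numeric vector (the raw MSD features are variable-length), and the names `split_holdout`, `majority_baseline`, `nearest_centroid`, and `select_best` are hypothetical. The two toy classifiers stand in for whatever models are actually compared.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Song = Sequence[float]                    # assumption: fixed-length feature vector
Pair = Tuple[Song, Hashable]              # (S_i, G_i)
Classifier = Callable[[Song], Hashable]   # maps a song to a predicted genre


def split_holdout(pairs: Sequence[Pair], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle the pairs and reserve a fraction as the hold-out test set."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(classifier: Classifier, test_pairs: Sequence[Pair]) -> float:
    """Fraction of hold-out songs whose predicted genre matches the label."""
    correct = sum(1 for song, genre in test_pairs if classifier(song) == genre)
    return correct / len(test_pairs)


def majority_baseline(train_pairs: Sequence[Pair]) -> Classifier:
    """Always predict the most frequent training genre."""
    top_genre = Counter(g for _, g in train_pairs).most_common(1)[0][0]
    return lambda song: top_genre


def nearest_centroid(train_pairs: Sequence[Pair]) -> Classifier:
    """Predict the genre whose mean feature vector is closest to the song."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Counter = Counter()
    for song, genre in train_pairs:
        counts[genre] += 1
        acc = sums.setdefault(genre, [0.0] * len(song))
        for i, x in enumerate(song):
            acc[i] += x
    centroids = {g: [v / counts[g] for v in acc] for g, acc in sums.items()}

    def predict(song: Song) -> Hashable:
        return min(centroids, key=lambda g: sum(
            (a - b) ** 2 for a, b in zip(song, centroids[g])))
    return predict


def select_best(trainers: Dict[str, Callable[[Sequence[Pair]], Classifier]],
                pairs: Sequence[Pair]) -> Tuple[str, Dict[str, float]]:
    """Train each candidate and return the one with the best hold-out accuracy."""
    train, test = split_holdout(pairs)
    scored = {name: accuracy(fit(train), test) for name, fit in trainers.items()}
    return max(scored, key=scored.get), scored
```

Calling `select_best({"majority": majority_baseline, "nearest_centroid": nearest_centroid}, pairs)` returns the name of the winning classifier together with every candidate's hold-out accuracy.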

1.1 Motivation
Genre classification is a standard problem in Music IR research. Most music genre classification techniques employ pattern recognition algorithms to classify feature vectors, extracted from short-time recording segments, into genres. Commonly used classifiers include Support Vector Machines (SVMs), Nearest-Neighbor (NN) classifiers, Gaussian Mixture Models, and Linear Discriminant Analysis (LDA). Several common audio datasets have been used in experiments to make the reported classification accuracies comparable, for example the GTZAN dataset (Tzanetakis and Cook, 2002), which is the most widely used dataset for music genre classification. However, the datasets involved in those studies are very small compared to the Million Song Dataset. In fact, most Music IR research still focuses on very small datasets, such as GTZAN, with only 1000 audio tracks, each 30 seconds long, or CAL-500 (Turnbull et al., 2008), a set of 1700 human-generated musical annotations describing 500 popular western musical tracks. Both of these datasets are widely used in state-of-the-art Music IR research, but they are far from practical application. Furthermore, most research on genre classification focuses only on music features and ignores lyrics (mostly due to the difficulty of collecting large-scale lyric data).

[2] http://labrosa.ee.columbia.edu/millionsong/musixmatch

Nevertheless, besides the musical features (styles, forms), genre is also closely related to lyrics: songs in different genres may involve different topics or moods, which could be recoverable from word frequencies in lyrics. This motivates us to join the musical and lyrics information from the two databases for this task.
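As a rough illustration of joining the two sources, the sketch below turns a lyric string into relative word frequencies over a fixed vocabulary and appends them to an audio feature vector. This is an assumption-laden toy, not the report's pipeline: the function names, the whitespace tokenizer, the example vocabulary, and plain concatenation as the joining step are all hypothetical choices.

```python
from collections import Counter
from typing import List, Sequence


def word_frequencies(lyrics: str, vocabulary: Sequence[str]) -> List[float]:
    """Relative frequency of each vocabulary word in the lyrics (bag of words)."""
    tokens = lyrics.lower().split()       # naive whitespace tokenizer (assumption)
    counts = Counter(tokens)
    total = len(tokens) or 1              # avoid division by zero for empty lyrics
    return [counts[word] / total for word in vocabulary]


def join_features(audio: Sequence[float], lyric: Sequence[float]) -> List[float]:
    """Concatenate audio and lyric features into one cross-modal vector."""
    return list(audio) + list(lyric)


# Hypothetical example: a 2-dimensional audio vector and a 3-word vocabulary.
vocabulary = ["love", "road", "night"]
lyric_vector = word_frequencies("Love love night", vocabulary)
song_vector = join_features([0.5, 0.1], lyric_vector)
```

The resulting `song_vector` has one entry per audio feature followed by one entry per vocabulary word, so a single classifier can consume both modalities.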

1.2 Contribution
To the best of our knowledge, there have been no published works that perform large-scale genre classification using cross-modal methods.
• We proposed a cross-modal retrieval framework of model...