Digital Music Lab - Analysing Big Music Data (Category: Research)

Name of Submitter(s): Tillman Weyde, Stephen Cottrell, Jason Dykes, Nicolas Gold, Simon Dixon, Emmanouil Benetos, Mark Plumbley
Name of Team: Digital Music Lab
Organisation: City University London, City University London, City University London, University College London, Queen Mary University of London, Queen Mary University of London & City University London, University of Surrey

The research project "Digital Music Lab - Analysing Big Music Data" (DML) has enabled, and continues to enable, new forms of research on the music audio collections of the British Library. By providing computational analyses at collection level and by creating an interactive graphical web interface, the DML enables for the first time: the computational analysis of music audio collections at the BL; large-scale analysis for recognising patterns and trends within and across collections; and remote access to analysis systems without copyright infringement. The current system incorporates more than 49,000 recordings from the British Library Sound Archive, more than 5,000 historical music recordings from the CHARM collection, and more than 280,000 recordings from the I Like Music collection, which is the main source of content for the BBC. The DML web interface has been presented at several workshops at the British Library and beyond, and feedback from musicologists, music researchers, and music enthusiasts has been overwhelmingly positive. Our plan is to enhance the system continuously (e.g. music similarity visualisations were recently added to the interface, following the AHRC-funded ASyMMuS project) and to keep adding music collections from the British Library as they become available.

URL for Entry:


Twitter: @tweyde @jsndyks @emmanouilb @markplumbley

Job Title: Senior Lecturer, Professor, Professor, Professor, Senior Lecturer, Reader, RAEng Research Fellow, Professor

Background of Submitter:

The Music Informatics research group at City University London is a specialised research team within the Department of Computer Science. Music Informatics includes the study of computational models for the analysis and generation of music and sound, and of music performance. Interests of the Music Informatics Research Group include music information retrieval and computational musicology, music signal analysis, music knowledge representation, and music applications, such as e-learning and games. Team members + URLs with publication lists:
- Dr Tillman Weyde -
- Prof Jason Dykes -
- Prof Stephen Cottrell -
- Dr Emmanouil Benetos -
The Centre for Digital Music (C4DM) at Queen Mary University of London is a world-leading multidisciplinary research group in the field of Music & Audio Technology, which is ranked among EPSRC's top 5 groups in People and Interactivity. C4DM has more than 50 members working on signal processing of digital music, music informatics, machine listening, audio engineering, interactive sound, and music cognition. Research funding since 2007 totals over £22m, from EPSRC, AHRC, EU, Royal Society, Leverhulme Trust, TSB, JISC, Mellon Foundation and industry sources, including a £1.1m Platform Grant and a £5.1m Programme Grant. Team members + URLs with publication lists:
- Prof Mark Plumbley -
- Dr Simon Dixon -
The Music Systems Engineering Research Team at University College London carries out research on computational approaches to the understanding, analysis, use, and performance of music, developing theoretical techniques and embodying them in systems. This includes representations and systems for large-scale music analysis, algorithmic and generative composition methods, and interactive systems for live performance. We are also bringing together the UCL Laptop Orchestra (UCLOrk). Team members + URLs with publication lists:
- Dr Nicolas Gold -

Problem / Challenge Space:

The main research question of the project is: how can Big Music Data be used in music performance research? Studies of cultures of music performance are examples of research that can hardly be conducted without large amounts of data. With the resources and technologies currently available to music researchers, such studies are hard to realise, even though most of the required technologies (e.g. transcription, audio analysis) and datasets (e.g. the collections at the BL) exist.
What is missing is a methodology for working on large amounts of data in the analysis of musical audio, the tools and infrastructure that bring needed technologies together, and access to data collections in a way that is practical and does not infringe copyright. The higher degree of automation in such a system requires different approaches to music performance research with different trade-offs on large and small scales of analysis. The specific research questions asked here are therefore:
- How can music research make use of audio transcription and analysis to conduct music performance analysis on large data collections?
- How can we provide an infrastructure that enables researchers to make use of large data collections and to create reusable open datasets?
- How can computational tools be made usable for music researchers, musicians and other users, who are not necessarily computer scientists?

Approach / Methodology:

With regard to methods, the team is exploring relevant questions in musicology that can be specifically addressed with large data collections, e.g. how music performance develops over time, how it differs between regions, and where the similarities between different cultures lie. The technologies needed for Big Data applications have evolved rapidly in the last few years; the most popular approach is parallelisation with a Map-Reduce framework. We are embedding VAMP plug-ins and other audio feature extractors into the Map step and performing collection-level analysis in the Reduce step. We are also working on visualisations that support non-technical users in defining queries and analysing the results.
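The shape of this Map-Reduce pipeline can be illustrated with a minimal sketch. The feature extractor below is a toy stand-in for a VAMP plug-in (it just counts pitch classes in a hypothetical note list, rather than analysing real audio), but the structure is the point: the Map step runs independently per recording, which makes it trivially parallelisable, and the Reduce step merges per-recording features into a collection-level summary.

```python
from functools import reduce
from collections import Counter

def extract_pitch_profile(recording):
    """Toy stand-in for a VAMP feature extractor: count pitch classes.

    `recording` is a (identifier, pitch_classes) pair, where pitch
    classes are integers 0-11. A real extractor would decode audio.
    """
    _, notes = recording
    return Counter(notes)

def map_step(collection):
    # Map: run the extractor independently on each recording.
    # Each call is self-contained, so the work parallelises trivially.
    return [extract_pitch_profile(r) for r in collection]

def reduce_step(profiles):
    # Reduce: merge per-recording profiles into one collection-level
    # summary (Counter addition sums counts key by key).
    return reduce(lambda a, b: a + b, profiles, Counter())

# A hypothetical two-recording "collection".
collection = [
    ("rec-001", [0, 0, 7, 4]),
    ("rec-002", [7, 7, 2]),
]
summary = reduce_step(map_step(collection))
print(summary.most_common(1))  # → [(7, 3)]: pitch class 7 dominates
```

In a production Map-Reduce framework the `map_step` list comprehension would be distributed across workers, but the extractor/aggregator division of labour is the same.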
A major output of the project is a software service infrastructure with two prototype installations. One installation enables researchers, musicians and general users to explore, analyse and extract information from audio data stored at the British Library (BL), which cannot be used outside the BL for copyright reasons. The other installation is hosted by the Centre for Digital Music at Queen Mary University of London and provides facilities to analyse the audio collections of I Like Music and other freely accessible datasets. The combination of state-of-the-art music transcription methods and music analysis, on both the audio and the symbolic level, with collection-level analyses allows for exploration and quantitative research on music that was not previously possible at this scale.

Extent of showcasing BL Digital Content:

The DML project has used and extracted information from over 49,000 music recordings from the British Library Sound Archive. In particular, over 29,000 recordings from the BL's World and Traditional Music Collections were used, along with over 18,000 recordings from the Classical Music collections, over 400 recordings of Oral History, and over 700 radio broadcasts. In addition to the recordings, relevant metadata for each recording was used and curated (either in METS/XML or TXT format).
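Curating METS/XML metadata of this kind typically means pulling descriptive fields out of each record programmatically. The sketch below assumes a simplified, hypothetical METS record with a Dublin Core title wrapped in a `dmdSec` (not the BL's actual schema, whose structure is richer) and shows how such a field can be extracted with the standard library.

```python
import xml.etree.ElementTree as ET

# Toy METS record (not the BL's actual schema) with a Dublin Core title.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="dmd1">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Field recording, 1952</dc:title>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>
"""

NS = {
    "mets": "http://www.loc.gov/METS/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def recording_title(mets_xml):
    """Pull the Dublin Core title out of a METS descriptive metadata section."""
    root = ET.fromstring(mets_xml)
    title = root.find(".//mets:xmlData/dc:title", NS)
    return title.text if title is not None else None

print(recording_title(SAMPLE))  # → Field recording, 1952
```

Consistency checks during curation (e.g. flagging records where such a lookup returns `None`) are one way inconsistent metadata surfaces early.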
As part of the project, we also used over 5,000 recordings from the CHARM collection of historic recorded music and over 280,000 recordings of popular music from the I Like Music collection, which is the main source of content for the BBC.

Impact of Project:

Further funding includes:
- The £77k AHRC-funded project "An Integrated Audio-Symbolic Model of Music Similarity" (ASyMMuS), which aims to apply the DML infrastructure to the musicological question of what constitutes and contributes to similarity in music.
- The £6k project entitled "Automatic segmentation of audio recordings to speech and music" (MuSpeak), funded by the City University London Pump Priming Fund. The project is currently using ethnographic recordings from the BL's World and Traditional Music collections in order to separate music and speech segments.
- A £505k research fellowship entitled "A machine learning framework for audio analysis and retrieval", funded by the Royal Academy of Engineering. The 5-year project aims to utilise the existing DML infrastructure and data collections to create novel methods for the computational analysis of audio, including music transcription and instrument separation.
Workshops organised by the team:
- Digital Music Lab 1st Workshop on Analysing Big Music Data, British Library, 19 March 2014
- Digital Music Lab Final Workshop on Analysing Big Music Data, British Library, 13 March 2015
Press coverage:
- Digital Music and Big Data:
- Digital music to feel impact of Big Data:
- UCL and Big Data: funding announcement:
- Digital Music Events at the British Library:
Presentations at conferences/workshops:
- "Big Data: Challenges and Applications" workshop, 17-19th February, London, UK
- European Conference on Data Analysis (ECDA 2014): "An iterative learning approach to dataset demarcation in music analysis", "Digital Music Lab – A Framework for Analysing Big Music Data", "Machine Learning for the Analysis of a Large Collection of Musical Scales"
- "Big Data, Big Models, it is a Big Deal" workshop, Coventry, September 2014
- "Cutting Edge Research – from City University and King’s College London", City University London, 14th October 2014.
- International Society for Music Information Retrieval Conference, "Visualising chord progressions in music collections: a big data approach". At the Unconference session of the conference, we organised a discussion on "Big Data for Music Analysis and Musicology"
- British Library Labs Symposium 2014, 3rd November 2014
- 59th annual meeting of the Society for Ethnomusicology, 13-16 November, Pittsburgh, Pennsylvania, USA.
- Workshop on Musical Timbre, Paris, France, 14th November 2014. The talk was entitled "Instrument transcription & instrumentation recognition".
- "Towards analysing big music data – Progress on the DML research project", Digital Music Research Network Workshop 2014 (DMRN+9), December 2014, London, UK
- THATCamp British Library Labs, 13th February 2015
- "Numbers, Noises and Notes: Quantitative Data and Music Research" symposium, 16th June, Sussex Humanities Lab, University of Sussex, UK.
- T. Weyde, S. Cottrell, J. Dykes, E. Benetos, D. Wolff, D. Tidhar, N. Gold, S. Abdallah, M. Plumbley, S. Dixon, M. Barthet, M. Mahey, A. Tovell, A. Alancar-Brayner. Big Data for Musicology, 1st International Digital Libraries for Musicology workshop (DLfM 2014)
- D. Wolff, D. Tidhar, E. Benetos, E. Dumon, S. Cherla, T. Weyde. Incremental Dataset Definition for Large Scale Musicological Research, 1st International Digital Libraries for Musicology workshop (DLfM 2014)
- D. Tidhar, S. Dixon, E. Benetos, and T. Weyde, "The Temperament Police", Early Music, 42(4):579-590, November 2014.
- M. Barthet, M. Plumbley, A. Kachkaev, J. Dykes, D. Wolff and T. Weyde, "Big Chord Data Extraction and Mining", 9th Conference on Interdisciplinary Musicology (CIM 2014)
- A. Kachkaev, D. Wolff, M. Barthet, M. Plumbley, J. Dykes and T. Weyde, "Visualising Chord Progressions in Music Collections: A Big Data Approach", 9th Conference on Interdisciplinary Musicology (CIM 2014)
- S. Abdallah, A. Alencar-Brayner, E. Benetos, S. Cottrell, J. Dykes, N. Gold, A. Kachkaev, M. Mahey, D. Tidhar, A. Tovell, T. Weyde, and D. Wolff, "Automatic transcription and pitch analysis of the British Library World & Traditional Music Collection",
- A. Leroi, M. Mauch, P. Savage, E. Benetos, J. P. Bello, M. Panteli, J. Six, and T. Weyde, "The deep history of music project", 5th International workshop on Folk Music Analysis, Paris, France, June 2015

Issues / Challenges faced during project(s):

A major challenge of the project was delivering the complete toolchain with the web interface in time for the project's final workshop, since participants would be asked to carry out tasks using the interface and to evaluate the system's capabilities. This required close collaboration between the various project teams: automatically extracting audio features; integrating the features into the project's semantic web framework; curating the metadata (some of which had inconsistencies) and importing it into the same framework; building the visualisation interface in conjunction with the project's musicologists; and finally linking the visualisation interface to the semantic web framework. The task was solved through frequent meetings (over Skype or in person) during the last stage of the project - and also involved a few sleepless nights! In the end, the system was ready on time, with a back-up system operational in case the first copy crashed; the workshop exercises using the web interface were a success, and we are now documenting the whole system for reproducibility purposes.