Polina Proutskova

Submitted Entry for 2013 Competition

Assessing potential mutual benefits for British Library and Music Information Research (MIR)


The British Library possesses unique holdings of music recordings from around the world, among the largest anywhere. While a huge effort goes into digitizing, annotating and maintaining these recordings for future generations, the holdings remain underused and under-explored. Meanwhile, the booming field of Music Information Research (MIR) has recently shown particular interest in non-Western data, both to test existing approaches and to develop novel ones for extracting information from large music corpora and automating aspects of their analysis. Researchers are actively looking for annotated datasets to experiment with.

My project aims to bridge the gap between the librarians' and the researchers' worlds in order to facilitate the development of MIR tools targeted specifically at the needs of British Library users and archivists. Digital music collections with potential for computational processing will be identified, and the practical modalities of re-using their content and metadata will be investigated. On the one hand, user needs will be assessed for the chosen collections based on their current and potential use; on the other hand, the collections' content and annotations will be explored in terms of their suitability for MIR applications.

In most cases, for an archival collection to become a dataset that can be directly exploited by computational methods, its annotations have to be cleaned and brought into a single, clearly documented, machine-readable format. Some of this work requires manual processing, sometimes by a field expert, while other tasks can be handled by a software routine. My project will investigate the current state of existing annotations and the resources needed to bring them into a digitally exploitable form. Where no annotations exist that address an identified user need, it will also assess whether they can be extracted or purchased from other sources and what manual creation would cost.
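As a rough illustration of the kind of task a software routine could handle, the sketch below maps differently labelled annotation records onto one documented schema and lists the fields that would still need manual attention. Everything in it is an assumption made for illustration: the `Annotation` schema, the `FIELD_ALIASES` table, the `normalise` function and the sample records are invented and do not describe any actual British Library metadata format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional, Tuple


# Hypothetical target schema; field names are illustrative assumptions only.
@dataclass
class Annotation:
    recording_id: str
    title: str
    country: str
    year: Optional[int]


# Hypothetical mapping from labels used in different source catalogues
# to the field names of the target schema.
FIELD_ALIASES: Dict[str, str] = {
    "id": "recording_id", "shelfmark": "recording_id",
    "title": "title", "item title": "title",
    "country": "country", "location": "country",
    "year": "year", "date": "year",
}


def normalise(raw: Dict[str, str]) -> Tuple[Annotation, List[str]]:
    """Map one raw record onto the schema and list fields left for manual review."""
    mapped: Dict[str, str] = {}
    for key, value in raw.items():
        target = FIELD_ALIASES.get(key.strip().lower())
        if target and str(value).strip():
            mapped[target] = str(value).strip()

    # Accept a year only if the value starts with four digits (e.g. "1998-07-14").
    year_text = mapped.get("year", "")
    year = int(year_text[:4]) if year_text[:4].isdigit() else None

    record = Annotation(
        recording_id=mapped.get("recording_id", ""),
        title=mapped.get("title", ""),
        country=mapped.get("country", ""),
        year=year,
    )
    missing = [name for name, value in asdict(record).items() if value in ("", None)]
    return record, missing


# Invented sample records, standing in for two differently structured catalogues.
raw_records = [
    {"Shelfmark": "REC-001", "Item title": "Wedding song",
     "Location": "Georgia", "Date": "1998-07-14"},
    {"ID": "REC-002", "Title": "Lament", "Country": "", "Year": "unknown"},
]

for raw in raw_records:
    record, missing = normalise(raw)
    print(json.dumps(asdict(record)), "| needs manual review:", missing)
```

In this sketch the second record comes back with `country` and `year` flagged, which is the sort of gap that would be passed to an archivist or field expert rather than resolved automatically.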

The outcomes of the project will take the form of use case descriptions of current or potential British Library processes that may benefit from MIR applications, with clear guidance on the resources required to prepare the existing data for direct use by MIR researchers. The outcomes will be disseminated at MIR conferences, through publications and via online media. We then expect strong interest among MIR researchers in collaborating with the British Library, attracted by access to a unique data pool with appropriate annotations and by the opportunity to work on real-life problems and produce applications that change the experience of millions of British Library users. Future collaborations will also rest on informed decisions by both librarians and computer scientists, grounded in the preparatory work provided by this project. This should lead to a higher rate of positive outcomes and to goals being reached in less time.

Assessment Criteria

The research question / problem you are trying to answer

Please focus on the clarity and quality of the research question / problem posed:

I would like to address the need for interdisciplinary expertise that combines archival, musicological and MIR knowledge. Bringing this combined expertise together will facilitate the development of new tools, approaches and processes that could not be envisioned or developed from the viewpoint of any one of these fields alone.

Please explain the ways your idea will showcase British Library digital collections

Please ensure you include details of British Library digital collections you are showcasing (you may use several collections if you wish), a sample can be found at http://labs.bl.uk/Digital+Collections

- British Library collections of music recordings that can easily be turned into digitally exploitable datasets will be identified and communicated to MIR researchers
- British Library-specific scenarios, use cases and processes that would benefit from MIR applications will be identified and communicated to MIR researchers
- the above would lead to much greater exposure of British Library holdings to MIR researchers; this in turn could lead to the development of MIR tools targeted at the British Library's specifics and holdings, which would then enable novel, improved and possibly unexpected ways for users to interact with BL digital content
- I plan to work with collections of music recordings from the Traditional Music Section that can legally be used for research purposes. Examples are all collections currently available to members of UK higher education institutions.

Please detail the approach(es) / method(s) you are going to use to implement your idea, detailing clearly the research methods / techniques / processes involved

Indicate and describe any research methods / processes / techniques and approaches you are going to use, e.g. text mining, visualisations, statistical analysis etc.

- Qualitative approach: unstructured and semi-structured interviews with members of the various teams who work with music recordings, and with their users
- Formulating use case scenarios
- Determining the technological requirements for addressing the use cases
- Analysing data contents, formats and metadata, and detecting elements that are missing for the technological approaches to be implemented
- Possibly an inductive approach (if it seems promising after the use case formulation stage): I submitted an audiovisual collection of field recordings to the British Library. As is often the case with such collections, it has detailed musicological and ethnographic annotations, but in its present form it is not suitable for direct digital analysis. Knowing this collection very well would help me to develop practices for creating datasets from richly annotated collections. These practices can then be tested on other collections and adjusted, extended and improved.

Please provide evidence of how you / your team have the skills, knowledge and expertise to successfully carry out the project by working with the Labs team

E.g. work you may have done, publications, a list with dates and links (if you have them)

I am a PhD student in the field of MIR, with previous experience of working with music and audiovisual archives on metadata interoperability.

I previously submitted my ethnomusicological field recordings and annotations to the British Library and have had a number of interesting discussions with the archivists. I have also presented at British Library events.

Please provide evidence of how you think your idea is achievable on a technical, curatorial and legal basis

Indicate the technical, curatorial and legal aspects of the idea (you may want to check with Labs team before submitting your idea first).


- No technical work is planned within this project; I might have questions for the technical team about formats and data extraction


- Assistance from archivists will be required
- Interviews with members of various teams to assess user needs


- Legal guidance on which collections and which annotations can be used for which purposes. Since some of the collections are currently available for download by higher education users, it seems plausible that some of the data can be used for research purposes

Please provide a brief plan of how you will implement your project idea by working with the Labs team

You will be given the opportunity to work on your winning project idea between July 6th - October 31st 2013

July 2013 (from July 6)
Choice of collections

August 2013
User needs assessment, interviews, analysis, use case formulation

September 2013
Content and metadata analysis

October 2013
Writing up