
Alf Eaton

Submitted Entry for 2013 Competition

Resonance FM Audio/Schedule Parser/Matcher for Archiving and Playback (RFMASPMAP)

Abstract

This project analyses three sources: a) the British Library collection of Resonance FM audio recordings (MP3), b) the current and historical Resonance FM schedules (Google Calendar), and c) the HTML archive of Resonance FM show metadata.

The output will be an index of every show that was broadcast, the position of each show within the daily audio files, and a browsable interface for searching and playing back individual shows.

Assessment Criteria

The research question / problem you are trying to answer*

Please focus on the clarity and quality of the research question / problem posed:

Can a historical archive of broadcast media be reconstructed using audio data and metadata from multiple sources?

Please explain the ways your idea will showcase British Library digital collections*

Please ensure you include details of British Library digital collections you are showcasing (you may use several collections if you wish), a sample can be found at http://labs.bl.uk/Digital+Collections

This index of broadcast episodes, together with their positions within individual MP3 files, will allow researchers to listen immediately to any of the recorded shows from the 10 years of audio in the British Library's Resonance FM archive.

Please detail the approach(es) / method(s) you are going to use to implement your idea, detailing clearly the research methods / techniques / processes involved*

Indicate and describe any research methods / processes / techniques and approaches you are going to use, e.g. text mining, visualisations, statistical analysis etc.

The Google Calendar feed and HTML schedules will be parsed with PHP scripts and stored in MongoDB, then augmented with data (length, bitrate) from analysis of the audio files. The position of each show will be identified in the corresponding audio file(s). The compiled data will then be output as CSV for archiving and as HTML for interactive playback of the audio files.
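
As a rough sketch of the matching step, the hypothetical PHP below parses a simplified iCal export of the schedule and writes a CSV index of each show's offset from the start of the corresponding daily audio file. The file names, the field handling and the assumption that each daily MP3 begins at midnight are placeholders rather than the final implementation, and the MongoDB storage and bitrate analysis steps are omitted.

<?php
// Sketch only: derive each show's offset within its daily recording from a
// simplified iCal schedule export, and write the result as CSV.
// Assumes each daily MP3 begins at midnight and that DTSTART values look like
// 20130706T210000 (both assumptions, not facts about the real data).

$ical = file_get_contents('schedule.ics'); // placeholder path

// Pull out each VEVENT block.
preg_match_all('/BEGIN:VEVENT(.*?)END:VEVENT/s', $ical, $events);

$out = fopen('index.csv', 'w');
fputcsv($out, array('date', 'show', 'offset_seconds'));

foreach ($events[1] as $event) {
    if (!preg_match('/DTSTART[^:]*:(\d{8}T\d{6})/', $event, $start)) continue;
    if (!preg_match('/SUMMARY:(.+)/', $event, $summary)) continue;

    $begin = DateTime::createFromFormat('Ymd\THis', $start[1]);

    $midnight = clone $begin;
    $midnight->setTime(0, 0, 0);

    // Seconds from the start of that day's recording to the start of the show.
    $offset = $begin->getTimestamp() - $midnight->getTimestamp();

    fputcsv($out, array($begin->format('Y-m-d'), trim($summary[1]), $offset));
}

fclose($out);

For constant-bitrate MP3s, a time offset can be converted to an approximate byte offset as offset × bitrate ÷ 8, which is one reason the bitrate is collected during the audio analysis step.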

Some manipulation of the data will be required to account for truncated or split audio files, and text mining of the show descriptions will allow the names of participants in each show to be indexed (where available).
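
One possible starting point for the participant indexing is a simple heuristic over the show descriptions; the sketch below looks for capitalised name pairs after cue words such as "with" or "presented by". The cue words, the name pattern and the example description are all hypothetical, and a real pass over the archive would need something more robust.

<?php
// Sketch only: a naive heuristic for extracting participant names from a
// show description. The cue words and name pattern are assumptions.

function extract_participants($description) {
    $names = array();

    // Capitalised Firstname Surname pairs following a cue word, optionally
    // joined by commas or "and".
    $pattern = '/(?:with|presented by|featuring)\s+' .
               '([A-Z][a-z]+\s+[A-Z][a-z]+(?:(?:\s*,\s*|\s+and\s+)[A-Z][a-z]+\s+[A-Z][a-z]+)*)/';

    if (preg_match_all($pattern, $description, $matches)) {
        foreach ($matches[1] as $group) {
            foreach (preg_split('/\s*,\s*|\s+and\s+/', $group) as $name) {
                $names[] = trim($name);
            }
        }
    }

    return array_unique($names);
}

// Hypothetical description, for illustration only.
print_r(extract_participants('A discussion programme presented by Jane Doe and John Smith.'));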

Please provide evidence of how you / your team have the skills, knowledge and expertise to successfully carry out the project by working with the Labs team*

E.g. work you may have done, publications, a list with dates and links (if you have them)

I maintain a rolling 2-day archive of current Resonance FM broadcasts, which uses many of the same techniques to identify the position of each show within the audio stream: http://resonance.macropus.org/

A prototype of the data analysis was produced during the recent British Library Hack Day.

Please provide evidence of how you think your idea is achievable on a technical, curatorial and legal basis*

Indicate the technical, curatorial and legal aspects of the idea (you may want to check with the Labs team before submitting your idea).

Technical + Curatorial

The input data formats are all readily parsable with everyday scripting languages. The output format is defined, and current web server software is able to host the audio files appropriately for playback.
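
As an illustration of what the interactive playback might look like, the hypothetical PHP page below emits an HTML5 audio player and seeks to a show's offset once the file's metadata has loaded. The file path, offset and title are made-up values standing in for rows of the compiled index; seeking within a large MP3 relies on the server answering HTTP Range requests, which standard web servers such as Apache and nginx do for static files.

<?php
// Sketch only: a minimal playback page for a single show. The values below
// are placeholders; in practice they would come from the compiled index.
$file   = 'audio/2013-07-06.mp3'; // placeholder daily recording
$offset = 75600;                  // placeholder offset in seconds (21:00)
$title  = 'Example Show';
?>
<!DOCTYPE html>
<html>
<head><title><?php echo htmlspecialchars($title); ?></title></head>
<body>
  <h1><?php echo htmlspecialchars($title); ?></h1>
  <audio id="player" src="<?php echo htmlspecialchars($file); ?>" controls preload="none"></audio>
  <script>
    // Seek to the start of the show once the audio metadata is available.
    var player = document.getElementById('player');
    player.addEventListener('loadedmetadata', function () {
      player.currentTime = <?php echo (int) $offset; ?>;
    });
  </script>
</body>
</html>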

Legal

All code written will be placed in the public domain and made available through GitHub.

The audio may contain copyrighted content, so the recordings will initially be available only within the British Library. However, knowing the position of each show within the audio files may allow the identification of audio that is freely licensed (particularly speech-only content), which can then be made available more widely.

Please provide a brief plan of how you will implement your project idea by working with the Labs team*

You will be given the opportunity to work on your winning project idea between July 6th and October 31st 2013

First, I will complete the conversion of the input data and verify that the output index is accurate. From then on, enhancements to the data or the playback interface can be made according to the needs of the archivists and end users.