
Kim Foale

Submitted Entry for 2013 Competition

Computers, and especially the web, handle audio very badly. This project will develop a new audio shorthand helping rethink how we visualise, browse, and manage audio data, in order to bring it up to speed with where image handling has been for years. Images and video are the de facto web media formats, and are generally handled very well on modern websites; browsing large amounts of audio data, however, is still a chore.
Audio files should have semantically rich thumbnails that convey information about the sound within, communicating meaningful properties of the sounds, in order to allow quick comparison and evaluation of the salient points of a very large number of files. They should also have a robust, widely accepted metadata format for storing this information.

Some of the properties are:

• Duration
• Rough frequency content
• Type (music, audio recording, podcast, etc)
• Volume/dynamic range
• Degree of consonance/dissonance

They could be represented abstractly with things like:

• Shape
• Colour
• Opacity
• Pattern
• Small icon

This process will work in two stages.

An audio metadata format will be created or adapted to store data about audio files. Some of this data is directly based on the audio content and can be automatically generated (e.g. frequency content, volume, duration); some will be semantic data which will need to be manually entered (e.g. list of sounds, weather conditions); and some will be semantic data which can either be automatically generated or entered manually (e.g. GPS trace, time of recording).
This metadata can then be used to create the audio thumbnails. Because the format is open, the project can provide a sensible default rendering while still allowing other coders to reinterpret or re-imagine the audio within. Within this project, though, I aim to find a visually simple, easy-to-interpret format that can be readily used with most applications.
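As a rough illustration of the two stages, here is a minimal sketch in Python. It is not a proposed standard: the record layout, the field names (`duration_s`, `peak_frequency_hz`, `loudness_db` and so on), the `glyph_svg` function and the particular mapping from properties to radius, hue and opacity are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AudioMetadata:
    """Hypothetical record; field names are illustrative only."""
    # Stage 1a: derived automatically from the audio content
    duration_s: float
    peak_frequency_hz: float
    loudness_db: float  # assumed range: -60.0 (quiet) to 0.0 (full scale)
    # Stage 1b: entered manually
    sounds_heard: list[str] = field(default_factory=list)
    weather: str = ""
    # Stage 1c: either automatic or manual
    recorded_at: str = ""  # e.g. an ISO 8601 timestamp


def glyph_svg(meta: AudioMetadata, size: int = 64) -> str:
    """Stage 2: render one possible default glyph, a coloured circle.

    radius  <- duration (capped at 10 minutes)
    hue     <- peak frequency on a log scale (20 Hz = 0, 20 kHz = 270)
    opacity <- loudness (-60 dB = 0.2, 0 dB = 1.0)
    """
    radius = max((size / 2) * min(meta.duration_s, 600.0) / 600.0, 2.0)
    freq = min(max(meta.peak_frequency_hz, 20.0), 20000.0)
    hue = 270.0 * math.log10(freq / 20.0) / 3.0
    level = min(max(meta.loudness_db, -60.0), 0.0)
    opacity = 0.2 + 0.8 * (level + 60.0) / 60.0
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        f'<circle cx="{size / 2:.0f}" cy="{size / 2:.0f}" r="{radius:.1f}" '
        f'fill="hsl({hue:.0f}, 70%, 50%)" fill-opacity="{opacity:.2f}"/></svg>'
    )
```

Another coder could keep the same metadata record and swap `glyph_svg` for a completely different rendering, which is the point of keeping the format open.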

I have written two previous articles on this topic, although this application is at a more developed stage: http://alliscalm.net/2012/05/16/sounds-and-the-web-dont-work/ and http://alliscalm.net/2012/06/18/sound-and-the-web-part-2/

Assessment Criteria

The research question / problem you are trying to answer*

Please focus on the clarity and quality of the research question / problem posed:

How can very large collections of audio data be quickly and easily searched, analysed, and catalogued?

Please explain the ways your idea will showcase British Library digital collections*

Please ensure you include details of British Library digital collections you are showcasing (you may use several collections if you wish), a sample can be found at http://labs.bl.uk/Digital+Collections

Audio data is very hard to effectively parse without directly listening to it, a process which is very slow for large quantities. There are therefore many applications for this technology: any situation where there are more than a handful of audio files to process. It's key to note here that my use case is very different from cymatics [See, for instance, http://www.youtube.com/watch?v=CsjV1gjBMbQ ], or audio visualisation. Existing audio visualisation works on an artistic level: it is a form of visual feedback or manipulation of audio data, generally to make beautiful or interesting patterns as the primary aim. The aim of this project, however, is instead to make semantically rich thumbnails that connote information about the internal data, and not necessarily to make anything beautiful or aesthetically pleasing in the first instance.

The aim therefore is to allow the listener to select appropriate sounds to listen to out of a potentially unwieldy sound archive. Here are some examples of how this can be applied using collections from the British Library, and my own research.

“Soundscapes” collection
There is already a map of soundscape recordings, and a separate XML database on the British Library website. However, reading the descriptions of each sound or seeing where they are located on the map gives little useful information about the data set as a whole. A useful visualisation here would be to directly superimpose sound glyphs onto the existing map. I've done a very rough mockup [ https://dl.dropboxusercontent.com/u/14314148/docs/map-draft.svg ], with some examples of how I would go about using the space.

“Accents” collection
In a similar way, the “accents” collection has a huge number of files but no practical way of identifying an accent heard in the real world short of listening to every file. A well designed glyph would show important parts of the audio recording, and may show where vowels or consonants are emphasised more. This use case would, however, necessitate a specialised application and analysis, likely focusing heavily on the frequency bands for vowels and consonants. Again, superimposed glyphs on the map may show interesting regional or national patterns that would be hard to discover or navigate with a manual listening exercise.

Similar experiments with manual coding show fascinating results, for example Joshua Katz's “Dialect Survey Maps” [ http://spark-1590165977.us-west2.elb.amazonaws.com/jkatz/SurveyMaps/ ]. Imagine a map like this with sound recordings layered on top!

Birdsong database
Identifying a birdsong in the wild can be very difficult. Some birds are known to imitate others, and the songs can be very similar. This is another example of a specialised application: with birdsong we are only interested in a fairly specific, high pitched section of the frequency range.

An ideal thumbnail here would show the pattern and pitch of each bird, in a much more literal way than the general soundscape recordings. A simple thumbnail list with filters is the best visualisation here -- there is no need for map data. Through listening to a bird in your garden, the thumbnails would allow you to vastly narrow down the number of recordings to compare the birdsong with. With filters for, for instance, geographical region and rarity, a well designed format should point you to relevant files very quickly. Again, to reiterate the project goals -- the thumbnails are an aid to focussed listening, not a replacement for listening.

My research
I have about 400 audio recordings from my PhD fieldwork of people's day-to-day lives, with associated log book entries for what they were and what people thought about them. Without cross-referencing a large spreadsheet, there is no way for me to tell the content of the files without listening to them all. Suppose I want to find audio files that consist of broadband noise, in order to compare responses to them -- there is currently no way of doing this without resorting to one of the above solutions. I have metadata for all the sounds the person heard in the recording -- if I could select, say, all recordings of traffic and compare the thumbnails to see the range of experience here, it would greatly assist my critical comparison skills.
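To make the two selections described above concrete, here is a small Python sketch. The records, file names, tags and the `spectral_flatness` field are all made up for illustration. Spectral flatness is one common measure of how noise-like a sound is (values near 1 are closer to broadband noise), and I am assuming it would be one of the automatically generated metadata values.

```python
# Hypothetical metadata records; file names, tags and values are invented.
recordings = [
    {"file": "rec_001.wav", "sounds_heard": ["traffic", "voices"], "spectral_flatness": 0.62},
    {"file": "rec_002.wav", "sounds_heard": ["birdsong"], "spectral_flatness": 0.08},
    {"file": "rec_003.wav", "sounds_heard": ["traffic", "rain"], "spectral_flatness": 0.71},
    {"file": "rec_004.wav", "sounds_heard": ["music"], "spectral_flatness": 0.15},
]


def with_sound(records, sound):
    """Keep recordings whose manually entered log mentions the given sound."""
    return [r for r in records if sound in r["sounds_heard"]]


def broadband(records, min_flatness=0.5):
    """Keep recordings that look noise-like; the 0.5 threshold is arbitrary."""
    return [r for r in records if r["spectral_flatness"] >= min_flatness]


traffic_files = [r["file"] for r in with_sound(recordings, "traffic")]
noisy_files = [r["file"] for r in broadband(recordings)]
```

The thumbnails for `traffic_files` could then be laid out side by side for comparison, rather than cross-referencing the spreadsheet by hand.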

Please detail the approach(es) / method(s) you are going to use to implement your idea, detailing clearly the research methods / techniques / processes involved*

Indicate and describe any research methods / processes / techniques and approaches you are going to use, e.g. text mining, visualisations, statistical analysis etc.

The basic framework would be:

1. Research existing audio metadata formats, and see if there is either an existing suitable one, or one that can be extended. Talk to other developers about a best practice for establishing a format if one doesn't exist.
2. Talk to people who access the audio collections to establish user stories about what they need, how, and when. I've given several use cases here, but a more exhaustive approach will yield more breadth and depth of outputs.
3. Engage with the public and artists to establish the basis for an intuitive thumbnail (see below).
4. Investigate, modify and/or write a plugin to encode the audio data appropriately.
5. Create a proof-of-concept jQuery/HTML5 plugin that implements a few chosen outputs of this in time for the Labs event in November.

Public engagement -- Sounding Shapes
Members of the public will be shown shapes of different sizes and colours, and asked what they think they sound like. They will be played sounds (on headphones) and asked to draw them. This will be a fun, portable public engagement to do at the British Library, on the net and other sites as available.

After collecting the data we can publish the results and see what similarities and differences there are between both the ways people draw sounds, and describe the sounds of drawings. This will hopefully be a fun, viral project that can potentially get some traction with the Sound Art community, local news, and blogs on web and interactivity.
Please provide evidence of how you / your team have the skills, knowledge and expertise to successfully carry out the project by working with the Labs team*

E.g. work you may have done, publications, a list with dates and links (if you have them)
I'm a soundscape researcher at the University of Salford, with the PhD title "A listener-centered approach to soundscape evaluation". I have given people sound diaries for two weeks and then interviewed them about their experiences.

My background is sound engineering and music technology, and I'm also a web developer. The technical side of the audio file processing needs more research into the best solution, but I have a strong idea how this project would look and feel. I feel we live in a time where there is huge priority given to the visual, and audio data needs to be given special attention to bring it back into contention.

I would ideally work with an experienced JavaScript developer for development help, and a graphic designer to both make a visual identity for the project and develop the thumbnails themselves.

• Foale, K. and Davies, W. J. (2012), A listener-centred approach to soundscape evaluation, in 'Institute of Acoustics 2012', Nantes.

• 2013 “Elephants in the Dark”, multimedia film and audio installation. The Penthouse, Manchester, UK. http://thepenthousenq.com/post/52049396575 . Available online soon.

• 2012 “IN/(from the out)”, invited speaker. Two day symposium organised by Ben Gwilliam & Helmut Lemke. Blankspace, Manchester, UK. http://infromtheout.com/

Please provide evidence of how you think your idea is achievable on a technical, curatorial and legal basis*

Indicate the technical, curatorial and legal aspects of the idea (you may want to check with Labs team before submitting your idea first).


Backend - Either the new HTML5 Audio API (preferable) or server-side scripting with GNU/Linux tools. Ideally store metadata as YAML or another lightweight flat-file format.

Frontend - HTML5 Canvas, or write to .png using a tool like ImageMagick, depending on the application.
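As a hedged sketch of what the server-side route could look like, the Python below reads a 16-bit PCM WAV file using only the standard library and derives three automatic values: duration, RMS level, and a zero-crossing rate as a very crude stand-in for "rough frequency content". The function name `basic_features` and the returned keys are my own assumptions. A real implementation would more likely use proper spectral analysis, or the browser-side audio API that I prefer for this project.

```python
import array
import math
import sys
import wave


def basic_features(wav_file) -> dict:
    """Return simple automatic metadata for a 16-bit PCM WAV.

    wav_file may be a file path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = w.getframerate()
        channels = w.getnchannels()
        n_frames = w.getnframes()
        raw = w.readframes(n_frames)

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    mono = samples[::channels]  # first channel only, for simplicity

    duration_s = n_frames / rate
    rms = math.sqrt(sum(s * s for s in mono) / len(mono)) if len(mono) else 0.0
    level_dbfs = 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")

    # A pure tone of frequency f crosses zero about 2f times per second.
    crossings = sum(1 for a, b in zip(mono, mono[1:]) if (a < 0) != (b < 0))
    zero_crossing_hz = crossings / (2 * duration_s) if duration_s else 0.0

    return {
        "duration_s": duration_s,
        "level_dbfs": level_dbfs,
        "zero_crossing_hz": zero_crossing_hz,
    }
```

The resulting dictionary could be written straight into whichever flat-file metadata format is chosen and then passed to the thumbnail renderer.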


On your approval, all source code will be released under a GPL license on a site like GitHub, making it usable and adaptable by anyone who wants to use it. Hopefully we are inventing a new web standard here, and development will continue after the end of the initial project.


As this can be applied to any content, and all the collections in my example applications are free to listen to on the British Library website, there should be no legal issues. Ideally, the source code would be released under the GPL.
Please provide a brief plan of how you will implement your project idea by working with the Labs team*

You will be given the opportunity to work on your winning project idea between July 6th - October 31st 2013

Develop and run the public engagement, create a microsite outlining the project, get feedback from developers, people working with audio, sound artists and the public. This stage would be in collaboration with British Library Labs for organising engagement, and a designer to make some initial engaging mockups, a proper name, and a visual identity for the project. I would create the microsite.

Create some mockups and user stories to guide development. I'm on holiday this month so will have limited time. This would be in collaboration with any interested stakeholders, designer and developer, to make sure we're all on the same page.

Initial development: work with the developer to get a working prototype running against a fixed dataset.

Further development, depending on progress: either introduce more datasets or generate further outputs. Do further design iterations.