Bob Nicholson

Winner's blog posts:
The Victorian Meme Machine
Finding Jokes - The Victorian Meme Machine
Victorian Meme Machine - Extracting and Converting Jokes

Submitted Entry for 2014 Competition

Abstract

What would it take to make a Victorian joke funny again?

Nothing short of a miracle, you might think. After all, there are few things worse than a worn-out joke. Some provoke a laugh, and the best are retold to friends, but even the most delectable gags are soon discarded. While the great works of Victorian art and literature have been preserved and celebrated by successive generations, even the period’s most popular jokes have now been lost or forgotten. Fortunately, a few thousand of these endangered jests have been preserved (largely by accident) in the margins of the British Library’s digital collections. This project aims to find these forgotten jokes and bring them back to life.

An initial sample of 1,000 jokes will be extracted from the British Library’s Nineteenth Century Newspaper Database. These jokes will be manually corrected and encoded with new metadata and descriptive tags. In future stages of the project these texts will become part of a much larger crowd-sourced online archive of Victorian jokes. However, for this competition, I want to do something a bit more creative and tap into the spirit of the Library’s forthcoming Comics Unmasked exhibition. I propose to develop an application that converts Victorian jokes into ‘memes’ – humorous new images that are designed to circulate virally over social media.

The ‘Victorian Meme Machine’ [VMM] will analyse the content and metadata of a joke and automatically pair it with an appropriate image (or series of images) drawn from the British Library’s digital collections and other participating archives. The joke will then be superimposed automatically onto these images, either as a caption or via speech bubbles. Examples of how these pairings might be formatted appear on the project website (figs 2-12). Users will be able to re-generate the pairings until they discover a good match (or a humorously bizarre one) – at this point, the new ‘meme’ will be saved to a public gallery and distributed via social media. An associated Twitter account will periodically tweet randomly generated examples, much like the Mechanical Curator. The project will monitor which memes go viral and fine-tune the VMM in response to popular tastes.
http://www.digitalvictorianist.com/vmm
https://twitter.com/DigiVictorian

Assessment Criteria

The research question / problem you are trying to answer

Please focus on the clarity and quality of the research question / problem posed:

Firstly, the VMM offers a playful way to explore the differences between our sense of humour and that of our Victorian ancestors. While some of the pairings generated by the software may highlight the ‘otherness’ of nineteenth-century jokes, they will also demonstrate that the subjects, situations, and relationships that characterised Victorian humour are remarkably similar to the ones that appear in our own.

Secondly, the project invites us to consider new ways in which ephemeral forms of popular culture might be preserved and repurposed. If the works of Shakespeare can be reinvigorated by continually situating them in new contexts, can the same principle be applied to jokes? If the act of creative ‘remixing’ proposed by this project is successful, could similar approaches be used to breathe new life into some of the library’s other underused holdings?

Finally, it asks whether the creation of these new jokes can be successfully automated, or whether a human eye is needed in order to identify viable pairings and fine-tune their presentation. Can a piece of software develop a good sense of humour? Can a computer write a comic?

Please explain the ways your idea will showcase British Library digital collections

Please ensure you include details of the British Library digital collections you are showcasing (you may use several collections if you wish). A sample can be found at http://labs.bl.uk/Digital+Collections

The core digital resource used by the VMM will be the 19th Century British Library Newspapers database. It will offer an example of how the raw data held in this archive can be used for creative new projects – ones that are not possible using the archive’s conventional interface. It will also determine an effective way to extract a particular type of content from the archive (in this case jokes) and develop a system for adapting the data and feeding it into new projects. Similar procedures could be used in future projects to isolate and extract other journalistic genres from the database.

The VMM will also pull its images from the British Library’s digital collections. The one million images recently uploaded to Flickr seem an excellent place to start. However, I would also be interested in using the Early Photographically Illustrated Books and Evanion collections. Copyright restrictions permitting, I’m also keen to pull images from the library’s digital newspaper archives and from external projects such as the Database of Mid-Victorian Illustration [DMVI]. The DMVI’s images have been given detailed descriptive tags, which would make them well suited to the operating principles of the VMM.

Please detail the approach(es) / method(s) you are going to use to implement your idea, detailing clearly the research methods / techniques / processes involved

Indicate and describe any research methods / processes / techniques and approaches you are going to use, e.g. text mining, visualisations, statistical analysis etc.

The development of the VMM will take place across three stages:
Firstly, I will identify and extract a sample of 1,000 jokes from the 19th Century British Library Newspaper archive. These will be taken from two of the Hampshire Telegraph’s weekly humour columns: ‘Jonathan’s Jokes’ and ‘John Bull’s Jokes’. If I am able to gain access to the XML underpinning the archive then this process should be fairly straightforward – it will be possible to isolate the jokes using the ‘article title’ field and pull them from the database. If this access is not granted within the timeframe of the project, I will manually transcribe the jokes. Once the jokes have been extracted I will mark them up using XML (see Fig 1 on project website). I’m prepared to do this in a basic XML editor if necessary, though I would like to work with your technical expert to develop an efficient, user-friendly web interface for the process.
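
A minimal sketch of how filtering on the ‘article title’ field might work, using only Python's standard library. The element names (article, title, text), the sample document, and the assumption that jokes within a column are separated by blank lines are all hypothetical; the real structure of the archive's XML is not yet known and may differ considerably. Only the two column titles come from this proposal.

```python
import re
import xml.etree.ElementTree as ET

# Column titles named in the proposal (straight apostrophes assumed).
JOKE_COLUMNS = {"Jonathan's Jokes", "John Bull's Jokes"}

# Hypothetical sample; the archive's actual XML schema may differ.
SAMPLE_XML = """
<issue>
  <article>
    <title>Jonathan's Jokes</title>
    <text>First joke.

Second joke.</text>
  </article>
  <article>
    <title>Shipping Intelligence</title>
    <text>Arrivals at Portsmouth.</text>
  </article>
</issue>
"""


def extract_jokes(xml_string, column_titles=JOKE_COLUMNS):
    """Return (column title, joke text) pairs from articles whose title matches."""
    root = ET.fromstring(xml_string)
    jokes = []
    for article in root.iter("article"):
        title = (article.findtext("title") or "").strip()
        if title not in column_titles:
            continue  # not one of the humour columns
        body = article.findtext("text") or ""
        # Assumption: each joke is a paragraph separated by a blank line.
        for chunk in re.split(r"\n\s*\n", body):
            joke = " ".join(chunk.split())  # collapse stray whitespace
            if joke:
                jokes.append((title, joke))
    return jokes
```

Calling extract_jokes(SAMPLE_XML) would return the two jokes from the ‘Jonathan’s Jokes’ column and skip the shipping article. In practice OCR errors in titles would probably call for fuzzier matching than the exact comparison used here.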

Secondly, the project will explore how we can pair jokes with appropriate images. The methodology used in this process will depend on the structure of the image archives. I anticipate using pre-existing tags and/or keywords to select images, though it would also be interesting to experiment with facial recognition software. This process will either require me to extract a collection of images from the archives and integrate them into the VMM, or develop an API to connect to these databases each time the VMM looks to generate a new meme. Once the jokes and images have been paired, they need to be combined – the formats outlined on the project website would all require slightly different approaches. While the captions should be fairly straightforward, the speech bubbles would require the software to detect appropriate spaces on the image to place the text. I’m keen to find a way to fully automate this process, but have a range of semi-automated solutions in mind if this proves too difficult.
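
One simple way the tag-based pairing could work is sketched below. It scores each image by how much its tags overlap with the joke's tags, using Jaccard similarity. This is only one possible measure, chosen here for illustration and not a method the project is committed to. The image records, identifiers, and tags are invented; real tags would come from the image archives.

```python
import random

# Hypothetical hand-tagged records standing in for an image archive.
IMAGES = [
    {"id": "img-001", "tags": {"man", "woman", "parlour"}},
    {"id": "img-002", "tags": {"policeman", "street", "boy"}},
    {"id": "img-003", "tags": {"ship", "sailor"}},
]


def jaccard(a, b):
    """Tag overlap between 0 (nothing shared) and 1 (identical tag sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_images(joke_tags, images):
    """Return (score, image id) pairs for images sharing at least one tag, best first."""
    scored = [(jaccard(joke_tags, img["tags"]), img["id"]) for img in images]
    scored = [pair for pair in scored if pair[0] > 0]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def pick_image(joke_tags, images, rng=random, top_n=3):
    """Choose at random among the best matches, so a pairing can be re-generated."""
    ranked = rank_images(joke_tags, images)
    if not ranked:
        return None  # no image shares a tag with this joke
    return rng.choice(ranked[:top_n])[1]
```

Picking at random from the top few matches, and not always taking the single best one, is what would let users re-generate a pairing until they find one they like.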

Finally, the process of generating and publishing new memes needs to be converted into a user-friendly web interface with connections to key social media platforms. I would also like to develop a system that measures the success of the memes and monitors which pairings and subjects are most popular. These preferences could then be fed back into the VMM’s programming.
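
As an illustration of how popularity might be measured, the sketch below averages the number of shares each tag receives across published memes. The log format and the figures in it are invented for the example; how share counts would actually be collected from social media platforms is left open.

```python
from collections import defaultdict


def tag_popularity(meme_log):
    """Average share count per tag.

    meme_log is an iterable of (tags, share_count) pairs, one per published meme.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for tags, shares in meme_log:
        for tag in tags:
            totals[tag] += shares
            counts[tag] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


def most_popular(popularity, n=3):
    """Return the n tags with the highest average shares (ties broken alphabetically)."""
    ordered = sorted(popularity, key=lambda tag: (-popularity[tag], tag))
    return ordered[:n]


# Hypothetical log of three published memes.
EXAMPLE_LOG = [
    ({"policeman", "boy"}, 10),
    ({"boy"}, 2),
    ({"ship"}, 0),
]
```

Scores like these could then be used to favour popular subjects when choosing which jokes and images to pair, though a simple average will be unreliable for tags that have only appeared once or twice.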

Please provide evidence of how you / your team have the skills, knowledge and expertise to successfully carry out the project by working with the Labs team

E.g. work you may have done, publications, a list with dates and links (if you have them)

I have been researching the history of Victorian humour for the last five years, including research on the transnational circulation of jokes during the nineteenth century. I have also published several articles on digital research methodologies, with a particular focus on archives of Victorian newspapers. I was awarded the inaugural Gale Dissertation Research Fellowship for my innovative work in this area and now chair its prize committee.

In addition to my academic research, I have also spent the last six months tweeting humorous and peculiar items from the Victorian press. In the course of this activity I’ve developed a good sense of how to present this content for modern audiences, and have attracted 2,600 Twitter followers who seem to enjoy it. This pre-existing platform will be useful when promoting the VMM.

Finally, it’s important to stress that I am not a programmer. I’m competent with computers and have done some basic XML and HTML coding, but I will require the assistance of your technical lead in order to develop this project. One of the main reasons I am applying for this competition is to develop my programming skills in the hope that I can put them to use in future digital projects.

For a list of my relevant publications, see: http://www.digitalvictorianist.com/publications

Please provide evidence of how you think your idea is achievable on a technical, curatorial and legal basis

Indicate the technical, curatorial and legal aspects of the idea (you may want to check with the Labs team before submitting your idea).

Technical
I am convinced that each stage of this process is technically feasible. A fully-automated version of the VMM will require us to overcome some challenging technical problems surrounding the pairing of texts and images. This, of course, is part of the fun of the project. However, if the fully-automated approach is unfeasible then I have a range of other approaches in mind that will lead to a successful completion. If needs be, I will hand-pick and tag a collection of appropriate images and program the VMM to append the jokes as captions below them. This outcome would be less interesting than automatically generating comic strips with speech bubbles, but would produce results using a fairly simple process and provide a foundation upon which to build a more advanced version of the VMM. Similarly, if these automated processes do not function satisfactorily then we could explore ways for users to combine jokes and images more deliberately using a comic-building web application.

Curatorial
All of the digital collections required for this project are accessible on site at the British Library. In addition to working with the BL Labs team, I would also be keen to collaborate with the curators of the library’s newspaper collections.

Legal
All of the jokes and images used by the VMM are out of copyright and controlled by the British Library. However, I understand that access to the XML underpinning the 19th Century British Library Newspapers database will remain uncertain until new copyright legislation is passed later this year. If it is not possible to obtain access during the timeframe of the pilot project then the jokes will be transcribed manually. Access to the library’s image collections should be more straightforward. Access to the DMVI will depend on negotiations with its director – I will initiate these conversations if/when the project is commissioned.

Please provide a brief plan of how you will implement your project idea by working with the Labs team

You will be given the opportunity to work on your winning project idea between May 26th - Oct 31st, 2014.

May 26, 2014 onwards
Initial discussions with the BL Labs team and the curators of the library’s digital newspaper archives to determine the best way to approach the project.
Initiate dialogue with the directors of the Database of Mid-Victorian Illustration.
Initiate dialogue with Gale Cengage to obtain permission to use data from the 19th Century Newspaper Database (if their permission is required).

June 2014
Working with the BL Labs’ technical lead to:
(a) Understand how the XML underpinning the 19th Century Newspaper Database is structured.
(b) Develop a system for extracting jokes from the database.
(c) Develop a process for marking up the jokes with new tags and metadata.

Working independently to:
(a) Transcribe a sample of jokes (if access to metadata is not permitted).
(b) Mark up a sample of jokes (see fig 1 on project website).

July 2014
Working with the BL Labs’ technical lead to:
(a) Explore the XML underpinning the library’s digital image collections.
(b) Develop a process for pairing jokes with images.
(c) Program a basic prototype of this pairing mechanism.

August 2014
Working with the BL Labs’ technical lead to:
(a) Discuss the pros and cons of the ‘meme’ formats proposed on the project website.
(b) Create a program that adds the joke as a caption below its paired image.
(c) If possible, enhance this program to support speech bubbles and comic-strip formats.

September 2014
Working independently to:
(a) Manually tweet/share outputs from the prototype.
(b) Compile data about which jokes and formats are most successful.
(c) Feed these preferences back into the VMM.

Working with the BL Labs’ technical lead to:
(a) Convert the prototype into a user-friendly web application.
(b) Create a user manual and support users of the application should problems arise.

October 2014
(a) Working with the wider BL community to promote the VMM and share its results at conferences and in the media.
(b) Working independently to monitor the usage of the VMM and track the success of its memes.
(c) Meeting with the BL Labs team to discuss ways of developing the next stage of the project.