Jennifer Batt

Submitted Entry for 2016 Competition


Abstract

This project is designed to interrogate the British Library’s collections of digitized newspapers in order to recover a complex, expansive, ephemeral poetic culture that has been lost to us for well over 250 years.

In the eighteenth century, thousands of poems appeared in the newspapers that were printed the length and breadth of the country. Poems in newspapers were extraordinarily varied: some were light and inconsequential pieces designed to provide momentary diversion and elicit a smile or a raised eyebrow; others were topically engaged works commenting on contemporary cultural or political events; and still others were literary verses in a range of different genres. Some of these poems were the work of established and professional writers; some were composed by amateur contributors; and others still were by countless anonymous individuals. Though much of this verse disappeared into obscurity after appearing in a single newspaper issue, a number of poems that began their printed lives in newspapers achieved a far wider dissemination, being copied from one paper into another and another (going viral, we might say) before making their way into magazines, miscellanies, songbooks, and manuscripts.

The rich, dynamic, ephemeral and responsive poetic culture that found a home in eighteenth-century newspapers has long been overlooked by literary scholars and cultural historians and excluded from conventional literary histories. This project, targeting in the first instance a decade’s worth of newspapers, is an experiment designed to discover how far digital techniques – particularly data-mining and visualization – can be used to bring this lost literary culture back to life so it can receive the scholarly reassessment it warrants.

Focusing on the newspapers printed in the middle years of the eighteenth century, this project will interrogate two major digital repositories of newspapers co-curated by the British Library: the Burney Collection (which is very strong in its coverage of London-printed papers) and the British Newspaper Archive (which has a fair coverage of regionally-produced papers). The project has three goals: firstly, to use data-mining techniques to develop an index to the verse printed in newspapers in the mid-eighteenth century which will not only bring long-forgotten poems back to scholarly notice but will also shed new light on works with which literary scholars are already familiar; secondly, to use visualization tools to underpin a data-driven analysis of the nature and scope of newspaper-enabled poetic culture in the mid-eighteenth century; and thirdly, to develop a workflow and a set of tools and approaches that could, in subsequent projects, be put to use to explore a much larger dataset.

The research question you are trying to answer

How far is it possible to use digital tools to effectively map the poetic culture that existed in eighteenth-century newspapers? In nineteenth-century studies, research into poetry in newspapers is attracting increasing attention in, for example, the work of Andrew Hobbs on local newspapers (e.g. http://hobbb.tumblr.com/) and that of Ryan Cordell and the Viral Texts Project on North American newspapers (http://fugitiverses.viraltexts.org/). In eighteenth-century studies, by contrast, newspaper poetry continues to be largely neglected. One driver of this sustained neglect is the logistical challenge – whether one works with the original documents, with microfilm, or with databases such as the Burney Collection and the British Newspaper Archive – involved in collecting and collating data from tens of thousands of issues of newspapers.

Data-mining techniques, used in association with data visualization, seem to offer a solution to this problem. If verse can be automatically identified, if key information about it can be tabulated, and if visualization tools can be used to reveal trends and patterns within the data, this area of enquiry could be opened up like never before.

This project will survey ten years’ worth of newspapers in order to develop hypotheses about the nature and scope of newspaper-based poetic culture in the mid-eighteenth century which have the potential to challenge our accepted narratives of literary history. Among the questions that will be asked of this data are: how big, and how widespread, is the newspaper-based poetic culture? What can be recovered about the authors of this verse? What quantitative or qualitative variations exist in the poetry published in these newspapers? Do different papers develop different approaches to printing poetry? Is there a single, nationwide poetic culture that exists in newspapers, or are there regional variations? Do these trends change over time?

The identification of those poems that were published across multiple papers will enable a further set of questions to be asked. How many poems appear only in a single newspaper, and how many go ‘viral’, being reprinted in many? When poems are reprinted from newspaper to newspaper, how quickly do they spread, and what is the geographic extent of that spread? Is such popularity driven by topicality, literary quality, or something else? What does this reprinting reveal about the editorial cultures of different newspapers, and what does it reveal about the relationships between newspapers? Do patterns of reprinting suggest that a nationwide, shared, ephemeral poetic culture existed in the eighteenth century?

Please explain the extent and way your idea will showcase British Library digital content

This project will work intensively with two major digital collections of eighteenth-century newspapers:
1) Gale Cengage’s Burney Collection Newspapers, which contains a wide range of London-based papers including the General Evening Post, the London Evening Post, the London Gazette, the Penny London Post, Read’s Weekly Journal, the Westminster Journal, and the Whitehall Evening Post;
and 2) findmypast’s British Newspaper Archive, which contains a range of regional papers including the Caledonian Mercury, the Derby Mercury, the Edinburgh Courant, the Ipswich Journal, and the Newcastle Courant.
By bringing together data from both digital collections it will be possible to develop a fuller picture of the nature of newspaper-based poetic culture than might be gained from working with one collection in isolation: the Burney Collection’s strengths in London-printed papers are complemented by the British Newspaper Archive’s focus on provincial papers, enabling the investigation of the interaction between the national and regional press.

The project will shine a spotlight on a neglected element of these newspapers’ content: the thousands of poems they contain. One of the project’s key outputs will be an index to the verse printed in newspapers in the middle years of the eighteenth century, a finding aid which has the potential to transform how literary researchers approach these digital collections. To date, if researchers have been interested in discovering whether a particular poem, or works by a particular poet, have been printed in newspapers, they have been restricted to searching by keyword, an inefficient process given the unreliability of OCR for texts of this nature. The index will make it far easier for researchers to discover the content in which they are interested, opening up these newspaper collections to users in new ways.

Please detail the approach(es) / method(s) you are going to use to implement your idea, detailing clearly the research methods

This project will begin by focusing on a small sample of data - all newspapers included in the Burney Collection and the British Newspaper Archive from a single year (e.g. 1750) - in order to develop and test the procedures outlined below. The sample size will then be enlarged, to encompass a 10-year period which will both a) test how scalable the processes are and b) enable a broader picture to be constructed of newspaper poetic culture and its development over time. If time permits, the sample size could be expanded even further.

There will be three key stages to this project:

1. DATAMINING
- Develop strategies for identifying poems within newspapers. Poems appear irregularly in newspapers - not in every issue and seldom in the same page position in more than one issue - making this identification a particular challenge. The BL Labs team’s prior experience of data mining newspapers (for Bob Nicholson’s Victorian Meme Machine and Katrina Navickas’s Political Meetings Mapper) will be invaluable in this process.

- Once the poems are identified, re-run their image files against OCR software to generate as clean a text as possible. Manual intervention might be required to further improve the OCR-generated text; if so, this will be focused on the poems’ key identifying data (i.e. headers, footers, first lines and last lines). Record this information, together with publication details (newspaper published in, date of publication, place of publication), in a CSV file.
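
The two steps above can be sketched in Python. This is only an illustration under stated assumptions, not the BL Labs team's method: it assumes OCR output is available as plain-text lines for each newspaper item, and it uses a deliberately crude heuristic (a run of short lines that each begin with a capital letter) to flag likely verse. The function names, thresholds and CSV column names are hypothetical.

```python
import csv
import io

# Hypothetical column names for the index; the proposal does not fix a schema.
FIELDS = ["newspaper", "date", "place", "first_line", "last_line"]


def looks_like_verse(lines, max_len=60, min_run=4):
    """Return True if `lines` holds a run of short lines that each start with a capital."""
    run = 0
    for line in lines:
        # Ignore surrounding whitespace and any opening quotation mark.
        text = line.strip().lstrip("\"'“‘")
        if text and len(text) <= max_len and text[0].isupper():
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False


def poem_record(newspaper, date, place, lines):
    """Build one index row from a poem's non-empty lines and its publication details."""
    body = [line.strip() for line in lines if line.strip()]
    return {
        "newspaper": newspaper,
        "date": date,
        "place": place,
        "first_line": body[0],
        "last_line": body[-1],
    }


def write_index(records, handle):
    """Write index rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


# Illustrative samples only; neither is taken from the newspaper collections.
verse = [
    "The Curfew tolls the Knell of parting Day,",
    "The lowing Herd winds slowly o'er the Lea,",
    "The Plowman homeward plods his weary Way,",
    "And leaves the World to Darkness and to me.",
]
prose = [
    "Yesterday arrived a Mail from Holland, which",
    "brings advice that the States General have",
    "resolved to augment their forces by sea and",
    "land before the end of the summer.",
]

buffer = io.StringIO()
if looks_like_verse(verse):
    # Placeholder publication details, used purely to show the row layout.
    write_index([poem_record("Example Journal", "1750-01-01", "London", verse)], buffer)
```

A heuristic this simple would produce false positives (lists, advertisements, shipping tables) and miss verse set in long lines, so in practice it could only serve as a first filter ahead of manual checking.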

2. CLEANING UP AND ENHANCING THE DATA
- Develop an automated process for unpicking information that appears in the headers and footers of poems (e.g. author attributions, dates, locations) and recording this information in the CSV file.

- Investigate the feasibility of supplementing the data automatically harvested with further, researcher-generated information about the poems (e.g. pertaining to genre, subject, authorship where possible). The feasibility of this will depend on the quantity of verse recovered by data-mining.

- Develop an automated process for identifying possible duplicate poems (i.e. poems that appear in more than one newspaper) and develop strategies for recording that information.
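
As a rough illustration of the automated steps in this stage, the Python sketch below shows one way a footer might be unpicked and one way possible duplicates might be flagged. Both parts rest on assumptions not stated in the proposal: the footer pattern covers only a hypothetical "Place, Month Day, Year" layout, and duplicates are guessed by fuzzy comparison of normalised first lines using the standard library's `difflib`. The function names, the record keys (`id`, `first_line`) and the 0.85 threshold are illustrative choices.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Assumed footer layout, e.g. "Bristol, March 3, 1750." Real footers will vary widely.
FOOTER = re.compile(
    r"^(?P<place>[A-Z][A-Za-z .'-]+?),\s*"
    r"(?P<month>[A-Z][a-z]+)\.?\s+(?P<day>\d{1,2}),?\s+"
    r"(?P<year>1[67]\d{2})\.?$"
)


def parse_footer(text):
    """Return place/month/day/year from a footer, or None if it does not fit the pattern."""
    match = FOOTER.match(text.strip())
    return match.groupdict() if match else None


def normalise(line):
    """Lower-case, map long s to s, and drop punctuation so OCR noise matters less."""
    text = line.lower().replace("ſ", "s")
    text = re.sub(r"[^a-z ]+", " ", text)
    return " ".join(text.split())


def likely_duplicates(records, threshold=0.85):
    """Return (id, id, similarity) for record pairs whose first lines are near-identical."""
    pairs = []
    for a, b in combinations(records, 2):
        ratio = SequenceMatcher(
            None, normalise(a["first_line"]), normalise(b["first_line"])
        ).ratio()
        if ratio >= threshold:
            pairs.append((a["id"], b["id"], round(ratio, 2)))
    return pairs


# Illustrative records; the second simulates an OCR misreading of the first.
records = [
    {"id": "a", "first_line": "The Curfew tolls the Knell of parting Day,"},
    {"id": "b", "first_line": "The Curfew tolls the Knell of parting Dav"},
    {"id": "c", "first_line": "Ye distant Spires, ye antique Towers,"},
]
pairs = likely_duplicates(records)
```

Comparing every pair of records grows quadratically with the size of the index, so a ten-year dataset would probably need a cheaper first pass (for example, grouping on the first few normalised words) before any fuzzy comparison.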

3. VISUALIZING AND ANALYSING THE DATA
- Use visualization tools to reveal trends within the data, e.g. to map the chronological development of trends (e.g. regarding quantity of verse, genre, or approaches to attribution); to chart the geographical trajectory of those poems that go viral; and to plot the connections that exist between different newspapers.

- Use these visualizations to underpin an analysis of the data which will oscillate between what Franco Moretti termed ‘distant reading’ and a more traditional close reading of texts. This will both generate hypotheses about eighteenth-century newspaper-based poetic culture and identify individual poems or clusters of poems that warrant further investigation.
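
Before any visualization, the index would need to be summarised into figures that can be charted. The Python sketch below shows one possible summary, assuming duplicate poems have already been grouped under a shared identifier: for each poem it counts printings and measures the days between first and last appearance. The function name, the tuple layout and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def reprint_summary(printings):
    """Summarise (poem_id, newspaper, date) tuples into per-poem reprint statistics."""
    by_poem = defaultdict(list)
    for poem_id, paper, day in printings:
        by_poem[poem_id].append((day, paper))

    summary = {}
    for poem_id, items in by_poem.items():
        items.sort()  # earliest printing first
        first_day, first_paper = items[0]
        last_day = items[-1][0]
        summary[poem_id] = {
            "printings": len(items),
            "first_paper": first_paper,
            "spread_days": (last_day - first_day).days,
        }
    return summary


# Invented printings, used only to show the shape of the input and output.
sample = [
    ("poem-1", "Paper A", date(1750, 1, 2)),
    ("poem-1", "Paper B", date(1750, 1, 9)),
    ("poem-1", "Paper C", date(1750, 1, 20)),
    ("poem-2", "Paper B", date(1750, 3, 5)),
]
summary = reprint_summary(sample)
```

One caution for the 1746-55 period: Britain moved from the Julian to the Gregorian calendar in September 1752, so dates taken directly from mastheads would need to be normalised before intervals that span the change are calculated.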

Please provide evidence of how you / your team have the skills, knowledge and expertise to successfully carry out the project by working with the British Library

E.g. work you may have done, publications, a list with dates and links (if you have them)

I am a lecturer in English Literature at the University of Bristol; my research focuses on eighteenth-century poetry with a particular interest in the ways that verse is printed and reprinted across a range of different media. From 2010 to 2013, I was project manager and editor of the Digital Miscellanies Index (digitalmiscellaniesindex.org) based at the University of Oxford; this was a project that created a database of the verse printed in more than 1400 eighteenth-century verse miscellanies. This Digital Miscellanies Index project, together with my more recent work on eighteenth-century magazines, depended heavily on the manual transcription of metadata about poems (and the publications in which they appeared) into databases of various kinds. This BL Labs project provides the opportunity to develop this interest in poetic metadata in a new way, by automating (and hence expediting and making more efficient) the data-capture process.

My work on the Digital Miscellanies Index means I have several years’ experience of working on digital humanities projects. Though I do not have any particular programming skills and would require the support and expertise of the BL Labs team to implement the technical solutions necessary to this project, my deep investment in this project’s research questions is a strong motivation to learn more about these digital techniques.

Some relevant publications include:

‘Poems in Magazines’ in The Oxford Handbook of British Poetry, 1660-1800 ed. Jack Lynch (Oxford: Oxford University Press). Forthcoming.

‘ “A Muse / To grace the Page of weekly News”: Mary Leapor and the Periodical Press.’ Review of English Studies. Forthcoming.

‘"It ought not to be lost to the world": the Transmission and Consumption of Eighteenth-Century Lyric Verse.’ Review of English Studies 62 (255) (2011): 414-432.

Please provide evidence of how you think your idea is achievable on a technical, curatorial and legal basis

Technical
Since poems appear in newspapers irregularly (i.e. not in every issue and not always in the same place) the biggest technical challenge will be developing a robust and reliable strategy for identifying poems from the digital files. The BL Labs team have experience of identifying and extracting content of particular types from the Gale Cengage newspaper database, developed over the course of previous Labs projects (i.e. Bob Nicholson’s Victorian Meme Machine and Katrina Navickas’s Political Meetings Mapper). The BL Labs team also have experience of generating reliable OCR texts from newspaper images, and of translating these into databases. By working initially with a small sample size (i.e. the newspapers from a single year) it will be possible to develop robust strategies and processes, before scaling up to deal with a larger sample size.

In order to make sense of the data generated by data-mining, the project will make use of free (e.g. Tableau Public) or free and open source (Timemapper, Gephi) visualization tools.
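
To show how the index might feed a network tool such as Gephi, the Python sketch below counts how many poems each pair of newspapers has in common and writes the result as an edge list with `Source`, `Target` and `Weight` columns, the column names Gephi recognises when importing an edges table from CSV. Treating shared poems as an undirected link is an assumption made for illustration; the function names and sample data are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations


def shared_poem_edges(printings):
    """Count, for each pair of newspapers, how many poems both of them printed."""
    papers_by_poem = defaultdict(set)
    for poem_id, paper in printings:
        papers_by_poem[poem_id].add(paper)

    weights = Counter()
    for papers in papers_by_poem.values():
        # Sorting gives each undirected pair a single, stable orientation.
        for a, b in combinations(sorted(papers), 2):
            weights[(a, b)] += 1
    return weights


def write_edge_csv(weights, handle):
    """Write the weighted pairs as an edge list to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(weights.items()):
        writer.writerow([a, b, weight])


# Invented printings; the paper names are placeholders, not real titles.
sample = [
    ("poem-1", "Paper A"),
    ("poem-1", "Paper B"),
    ("poem-1", "Paper C"),
    ("poem-2", "Paper A"),
    ("poem-2", "Paper B"),
]
weights = shared_poem_edges(sample)
buffer = io.StringIO()
write_edge_csv(weights, buffer)
```

An undirected count says nothing about which paper copied from which; if printing dates are reliable, a directed edge from the earlier to the later printing would be an alternative, though it would still only suggest, not prove, a line of transmission.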

Curatorial
Both the Burney Collection and the British Newspaper Archive are accessible on site at the British Library. It will be necessary to liaise with the curators of the British Library newspaper collections to discuss the best way of proceeding with their material.

Legal
It will be necessary for the BL Labs team to negotiate with Gale Cengage and findmypast in order to gain access to their OCR texts, image files, and associated files to enable the processing of this data. The outputs of this project will be largely in the form of metadata; this can be made freely available for reuse under a Creative Commons licence.

Please provide a brief plan of how you will implement your idea by working with the Labs team

You will be given the opportunity to work on your winning idea between May 26th - November 4th 2016

June 2016
JB to meet BL team to discuss the scope of the project.

BL team to negotiate with Gale Cengage and findmypast to gain access to the data necessary to effect this project.

JB in collaboration with BL team to develop a set of protocols to identify poems in newspapers; to generate reliable, useable OCR texts; and to tabulate information about the poems in a database.

July 2016
JB in collaboration with BL team to begin work on a small sample dataset (i.e. one year’s worth) of newspapers.

JB to manually correct OCR-generated metadata as and when necessary.

JB to investigate the feasibility of supplementing the data automatically harvested with further information about the poems.

JB in collaboration with BL team to begin work on strategies to visualize trends and patterns within the initial data sample.

August 2016
JB in collaboration with the BL team to expand the dataset to include ten years’ worth of newspapers (1746-55).

September 2016
JB in collaboration with the BL team to continue to work on the expanded dataset. (Depending on the success or otherwise of the processes developed, it may be possible to expand the sample size beyond the initial projected ten-year period.)

JB in collaboration with the BL team to begin work on a user-facing platform to make this data (the index, together with some of the generated visualizations) available to researchers and the general public.

JB to begin work on an article describing the project’s findings.

October 2016
JB in collaboration with the BL team to continue to work on the expanded dataset.

JB in collaboration with the BL team to continue work on the user-facing platform.

JB to continue work on an article describing the project’s findings.