Palimpsest: Telling Edinburgh’s Stories with Maps (Category: Research)

Name of Submitter(s): Beatrice Alex, Miranda Anderson, Ian Fieldhouse, Claire Grover, David Harris-Birtill, Uta Hinrichs, James Loxley, Jon Oberlander, Nicola Osborne, Lisa Otty, Aaron Quigley, James Reid, Tara Thomson

Name of Team: Palimpsest

Organisation: University of Edinburgh, University of Edinburgh, Edina, University of Edinburgh, University of St. Andrews, University of St. Andrews, University of Edinburgh, University of Edinburgh, Edina, University of St. Andrews, Edina, University of Edinburgh

“Edinburgh Castle is a noble rock — so are the Salisbury Craigs noble craigs — and Arthur's Seat a noble lion couchant, who, were he to leap down on Auld Reekie, would break her back-bone and bury her in the Cowgate.”
This quotation from John Wilson’s The Recreations of Christopher North (1854) illustrates one of the many ways in which Edinburgh has been used as a literary setting. The first ever UNESCO City of Literature, it has a rich literary heritage which provides the backdrop for many novels and stories. Thanks to Palimpsest, Edinburgh’s literary history can now be explored interactively via the LitLong web interface and mobile app. The project is an AHRC-funded collaboration between the University of Edinburgh’s School of Literatures, Languages and Cultures and School of Informatics, EDINA, and the University of St Andrews’ SACHI lab. The LitLong tools link to more than 1,600 locations within Edinburgh mentioned in over 47,000 literary excerpts from around 550 books. The interfaces are aimed at scholarly and non-specialist audiences, including tourists exploring the streets of Edinburgh virtually or physically, locals who want to discover how authors described their city 150 years ago, and literary scholars who are interested in the relations between place and literature.
The data behind the interfaces was created by text mining out-of-copyright literary works as well as a select number of contemporary books, and includes work by Robert Louis Stevenson, Walter Scott, Muriel Spark and Irvine Welsh. It is also accessible via an API. 111 books and over 12,600 excerpts (over 20% of the Palimpsest data) were retrieved from the British Library Nineteenth Century Books collection. All location mentions within them were geo-referenced to a fine-grained Edinburgh gazetteer, and excerpts containing them are linked back to the original electronic documents of the data provider to enable close reading.
The project involved close collaboration between the different disciplines, as literary critical knowledge was vital in developing and fine-tuning not only the text processing but also the visualisation tools. Palimpsest makes it possible to see across numerous well-known and less celebrated works simultaneously, and to explore their treatment of place. The Palimpsest team also created an interface specific to works by Walter Scott. Both LitLong and LitLong Walter Scott were presented at the “Reading the City” event led by James Robertson at the Edinburgh Book Festival on August 15th 2015, the birthday of Scott.
URL for Entry: http://palimpsest.blogs.edina.ac.uk , http://litlong.org , https://github.com/LitPalimpsest

Email: balex@staffmail.ed.ac.uk

Twitter: @LitPalimpsest, @Lit_Palimpsest

Job Title: Research Fellow, Research Fellow, Software Engineer, Senior Research Fellow, Research Fellow, Lecturer, Professor of Early Modern Literature, Professor of Epistemics, Social Media Officer, Project Officer, Chair of Human Computer Interaction, Business Development Manager, Research Fellow

Background of Submitter:

The Palimpsest team is made up of 13 people from 4 partners: literary scholars from the School of Literatures, Languages and Cultures at the University of Edinburgh, computer scientists at the University of Edinburgh’s School of Informatics and the University of St. Andrews’ SACHI group, and data scientists and software engineers at EDINA. An up-to-date list of Palimpsest presentations (17), scientific papers (2) and one guest blog can be found at: http://litlong.org/about-litlong/talks-and-publications/
Palimpsest was the brainchild of Dr. Miranda Anderson (now School of History, Classics and Archaeology, University of Edinburgh). She came up with the idea and was involved in creating the very first Palimpsest prototype using a data set crowd-sourced by her colleagues. Prof. James Loxley (School of Literatures, Languages and Cultures) was the Palimpsest PI and is engaged in Digital Humanities research activities. Before Palimpsest, he led the Ben Jonson’s Walk project tracing Jonson’s 1618 walk from London to Edinburgh. His research fellows, Tara Thomson and Lisa Otty, were responsible for providing literary expertise when ranking the literature in order to bring the correct Edinburgh references to the top of the pile.
The text analysis work was carried out by members of the Edinburgh Language Technology Group at the School of Informatics at the University of Edinburgh and led by Prof. Jon Oberlander. Dr. Beatrice Alex and Dr. Claire Grover developed the natural language processing methods necessary to create the final geo-referenced Palimpsest data set. Both have previously been involved in Digital Humanities and Social Science collaborations such as Trading Consequences, DEEP and Text Mining Careers and are also mining scientific textual data such as brain scan reports. Aside from these core Informatics project team members, Ke Zhou, Richard Tobin, Kate Byrne and Colin Matheson also assisted in different aspects of the text processing work.
The SACHI lab at St. Andrews, headed by Prof. Aaron Quigley, includes visualisation experts who were in charge of the visualisation work required for Palimpsest. The user interfaces are branded as LitLong (http://litlong.org ). Dr. Uta Hinrichs developed the LitLong Edinburgh web interface and Dr. David Harris-Birtill programmed the mobile LitLong Edinburgh App.
Staff at Edina, including James Reid, Nicola Osborne and Ian Fieldhouse, assisted the project in creating and managing the database, preparing the API (http://litlong.edina.ac.uk/search/ ) and overseeing the project’s media and social media presence.

Problem / Challenge Space:

The aim of Palimpsest was to discover and make available for exploration a broad spectrum of books, including forgotten gems which are not part of the established canon of Edinburgh literature. The main literary research question was: is the characteristic of literary setting, and the detailed ways in which this is narratively established, sufficiently amenable to machine reading to allow us to work automatically across large scale collections of digitised texts? This involved finding ways to define when a book qualifies as being Edinburgh-centric, exploring the literary use of place names and their utility as a marker of setting, and developing a document retrieval and ranking tool to sift potential candidates out of the pool of literature to which we had access. The results of this retrieval and ranking process were then checked manually by literary curators. This combination of automatic and manual processing meant that we were able to identify a wide range of literary works set in Edinburgh while at the same time ensuring that all documents visualised by the LitLong tools contain Edinburgh place name mentions.
From a mining perspective, there were other challenges involved in the project. The goal was to conduct fine-grained geo-referencing to street, area, landmark and building names, as opposed to the city-level geo-referencing on which most prior work in the area had focussed. To achieve this, we created an Edinburgh gazetteer by aggregating a number of existing geo-referenced data sets containing information on locations in Edinburgh, including OS Locator, OpenStreetMap, and data from the Royal Commission on the Ancient and Historical Monuments of Scotland. We adapted the existing Edinburgh Geoparser to geo-reference using the Edinburgh gazetteer, which contains both coordinate point references for locations and polygons for areas within Edinburgh. The main challenge here was to identify the referents of ambiguous place names correctly. We spent a significant amount of time fine-tuning the mined output and are currently in the process of evaluating the performance of our fine-grained geo-referencing method.
One issue we encountered was that for some works we were able to extract many different excerpts, some of which mentioned Edinburgh only in passing while others really described the city and were much more interesting to explore. We therefore developed a method to rank the excerpts by an “interestingness” score (i-score). This was inspired by work on automatic prediction of text aesthetics and interestingness by Ganguly et al. 2014 (presented at Coling). The aim is to rank excerpts by interestingness based on the presence of a series of linguistic and other features, including word repetition, snippet length and presence of adjectives, verbs and multiple Edinburgh place names. Work on this problem is still in its infancy and we are seeking to evaluate our method in future work.
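As a rough illustration of how a feature-based ranking of this kind might look, the following minimal Python sketch combines the features named above into a single score. It is not the project's actual i-score: the feature weights, their signs, the normalisation, the coarse part-of-speech tags ("ADJ", "VERB") and the function name i_score are all assumptions made for the example, and the excerpt is assumed to have been tokenised and tagged beforehand.

```python
from collections import Counter

# Hypothetical weights: the real feature weights are not given in this document.
# Treating repetition as a penalty is also an assumption.
WEIGHTS = {
    "length": 1.0,
    "adjectives": 1.0,
    "verbs": 1.0,
    "places": 2.0,
    "repetition": -1.0,
}


def i_score(tagged_tokens, place_names, target_length=100):
    """Score an excerpt from simple surface features.

    tagged_tokens: list of (word, tag) pairs with coarse tags such as "ADJ" or "VERB".
    place_names: Edinburgh place names found in the excerpt (e.g. by a geoparser).
    target_length: assumed token count at which the length feature saturates.
    """
    n = len(tagged_tokens)
    if n == 0:
        return 0.0
    counts = Counter(word.lower() for word, _ in tagged_tokens)
    features = {
        # Longer snippets score higher, up to the assumed target length.
        "length": min(n / target_length, 1.0),
        # Share of tokens tagged as adjectives and as verbs.
        "adjectives": sum(1 for _, tag in tagged_tokens if tag == "ADJ") / n,
        "verbs": sum(1 for _, tag in tagged_tokens if tag == "VERB") / n,
        # Distinct place names, capped at three.
        "places": min(len(set(place_names)), 3) / 3,
        # Share of tokens that repeat an earlier token.
        "repetition": 1.0 - len(counts) / n,
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())
```

Excerpts would then be sorted by this score in descending order, so that descriptive passages naming several locations appear before passing mentions.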
From a visualisation and interface point of view, the main research question was how to represent and make explorable the data set resulting from this combined mining and curation process: real-world place names set in a literary context with all the ambiguity and tensions that this may involve. As part of this, we focused on how to promote interaction and discoveries, not only among literary scholars but also among general-interest readers from outside and within Edinburgh. To address this, we developed a web-based visual interface that provides several entry points (location, keywords, book titles, and authors) into the data set, and a mobile app which enables the exploration of the literary layer of the city in situ while strolling through the streets of Edinburgh.

Approach / Methodology:

The text processing work involved four distinct work packages: data preparation, document retrieval and ranking, creation of the Edinburgh gazetteer, and text mining and geo-location of the final Palimpsest document collection. The aim was to take all electronically accessible literature and identify and geo-reference only the subset of literature set in Edinburgh. As detailed above, we used an assisted curation method, in that we employed the computer to identify possible candidates but then asked literary scholars to have the final say over which of the candidates were of real interest. This approach included the creation and iterative refinement of a visual interface that could efficiently facilitate manual curation of the data. Using this methodology, we were able to uncover some relatively unknown literary gems such as John and Betty’s Scotch History Visit (Margaret Williamson, 1912), a fictional children’s story about Scottish history, and Noctes Ambrosianae (John Wilson, 1785-1854), a series of imaginary colloquies set in Edinburgh. At the same time, we include all the well-known works by Stevenson, Scott and others … even Mary Shelley’s Frankenstein:
“I visited Edinburgh with languid eyes and mind; and yet that city might have interested the most unfortunate being. Clerval did not like it so well as Oxford: for the antiquity of the latter city was more pleasing to him. But the beauty and regularity of the new town of Edinburgh, its romantic castle, and its environs, the most delightful in the world, Arthur's Seat, St. Bernard's Well, and the Pentland Hills, compensated him for the change, and filled him with cheerfulness and admiration.”
In order to provide different entry points to explore the data set which may address different audiences and interest groups, we created an interlinked visualisation which shows the literary works from multiple perspectives: a map view (as an obvious way to explore geo-referenced data, and therefore one central component of the Palimpsest visualisations available at www.litlong.org/locations) alongside a timeline and a booklist. These views can be explored individually but, at the same time, act as filters; for example, zooming into the map will update the booklist and timeline accordingly. Excerpts taken from the books and relating to individual Edinburgh locations are integrated into the visualisation in the form of pop-ups and link back to the original electronic collections for further reading. Additional filters are provided to enable more targeted searches of the literary texts by author name, location name and/or keyword. In addition, a mobile version of this interface offers the opportunity to stroll through the streets of Edinburgh and encounter literary excerpts based on the places one passes. Both visual interfaces underwent internal testing and refinement before being released to the general public. The Palimpsest data is also accessible via the LitLong Edinburgh mobile app available from the AppStore (https://itunes.apple.com/gb/app/litlong-edinburgh/id1004433531?mt=8 ).

Extent of showcasing BL Digital Content:

We were able to get access to over 380,000 full text electronic literary works from a number of different data providers including the British Library, HathiTrust, Project Gutenberg, the National Library of Scotland and the Oxford Text Archive. From the British Library we had access to the entire Nineteenth Century Books Collection (over 65,000 items), which we indexed and ranked based on Edinburgh location queries and available metadata. Over 5,500 items were retrieved as potential candidates and, of these, 111 were identified as being Edinburgh-specific. This means that over 20% of the material within the Palimpsest data is from the British Library. The excerpts from all BL content are linked out to URLs in Historical Texts (http://historicaltexts.jisc.ac.uk ). Our data providers, including BL, are mentioned in all of our presentations and on the LitLong site. We received great support from the BL Labs team in making the data available to us and answering questions on its format.

Impact of Project:

The project made a significant impact from a scholarly perspective. It has already led to funding from JISC for an ongoing project on geo-referencing and text mining data made accessible via Historical Texts. This includes the entire BL Nineteenth Century Books collection as well as the EEBO TCP collection and part of the ECCO TCP data. A Horizon 2020 proposal which aims to geo-reference literature set in Scotland and the Netherlands and to combine it with geo-referenced architectural information on points of interest is currently under review. If funded, this project will build directly on Palimpsest. We have initiated conversations with societies devoted to the promotion and celebration of particular authors’ works, and have created a version of our location visualiser specifically focused on Walter Scott to demonstrate the possibilities of this approach. We have worked closely with the Edinburgh World City of Literature Trust during the lifetime of the project, and are planning a follow-on project focusing on public engagement with LitLong tools and data.
Palimpsest also received media attention around the time of the launch of the user interfaces, in particular with an article in the Guardian on 28/03/2015 (http://gu.com/p/473pj/stw ). The extremely well attended launch event featured readings by some of the contemporary writers who granted us permission to include their works or who had participated in the writers’ competition which was organised as part of the project. Palimpsest captured the imagination of a number of Edinburgh-based authors. In her short story Candlemaker Row, the competition winner, Jane Alexander, approached all the past writing about Edinburgh by destroying the city with an unspecified disaster and exploring it through the Edinburgh Reboot project to rebuild it (http://litlong.org/content/candlemaker-row/ ). Palimpsest and LitLong have also featured at two events, in 2014 and 2015, at the Edinburgh International Book Festival, and have been featured at the Edinburgh and Dundee Science Festivals in 2015 and the Canongate Festival in 2014.
The release of the Palimpsest mobile app was also accompanied by a recent AHRC Press article on the project: http://www.ahrc.ac.uk/research/readwatchlisten/features/livingliteraryedinburgh/ .
Palimpsest was also presented at a number of conferences, workshops, symposiums and events, including DH2015 and DH2014, the Big Data Approaches to Intellectual, Cultural and Linguistic History workshop in Helsinki, and the British Library Labs Symposium 2014. A full list of publications, presentations and papers can be found at: http://litlong.org/about-litlong/talks-and-publications/
In terms of career development, the experience in the project has been beneficial to all team members. Most notably, Dr. Hinrichs secured a lectureship at the University of St. Andrews in Human Computer Interaction for Digital Humanities and Nicola Osborne moved from her social media officer role at EDINA to become its Jisc MediaHub Service Manager and Digital Education Manager.

Issues / Challenges faced during project(s):

Aside from the research questions already mentioned, we faced a couple of other challenges in this project. One issue that affected all of the out-of-copyright literary data that we processed was the poor quality of the optical character recognition (OCR). Where necessary, we applied two tools to improve the text by automatically removing end-of-line soft hyphenation and correcting the character confusion error which led to all “long s” characters being recognised as “f”. Due to time constraints we were not able to develop other OCR post-correction techniques in this project; however, the interestingness score of snippets indirectly helps to rank low quality OCR output lower. More work in this area is needed to improve already digitised textual collections for which there is insufficient funding to redo the OCR with newer, improved technology.
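The two corrections described above could be sketched along the following lines. This is a minimal illustration and not the tools used in the project: the function names and the use of a word list (lexicon) to decide when a correction is safe are assumptions made for the example.

```python
import re
from itertools import product


def join_soft_hyphens(text, lexicon):
    """Rejoin words split by a hyphen at a line end when the joined form is a known word.

    lexicon is an assumed set of lower-case known words; genuine hyphenated
    compounds that are not in it are left untouched.
    """
    def rejoin(match):
        joined = match.group(1) + match.group(2)
        return joined if joined.lower() in lexicon else match.group(0)

    return re.sub(r"(\w+)-\n(\w+)", rejoin, text)


def fix_long_s(word, lexicon):
    """Undo the long-s confusion: try turning 'f' back into 's' until a known word appears.

    Words already in the lexicon, or with no 'f' at all, are returned unchanged.
    """
    if "f" not in word or word.lower() in lexicon:
        return word
    positions = [i for i, ch in enumerate(word) if ch == "f"]
    # Try every combination of keeping 'f' or substituting 's' at each position.
    for choice in product("fs", repeat=len(positions)):
        chars = list(word)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        candidate = "".join(chars)
        if candidate.lower() in lexicon:
            return candidate
    return word
```

Checking candidates against a word list keeps the sketch from rewriting words that legitimately contain "f", at the cost of leaving unknown words (including many place names) uncorrected.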
Another issue was that some works were present several times in the final Palimpsest data set, either because they are contained in more than one of the aggregated collections or because they had several editions. In the final stages of the project we undertook a de-duplication process. We developed a script which automatically identifies duplicates and near duplicates by means of vocabulary comparison. Any book pair which was less than 5% different in terms of its vocabulary was presented to a literary scholar for manual de-duplication. This automatic step helped to speed up the manual work for real or near duplicates but was not used for collections of works in which one work was also contained in the data separately.
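A vocabulary comparison of this kind might be sketched as follows. The exact difference measure used by the project's script is not stated here, so the sketch assumes one minus the Jaccard overlap of the two books' word sets; the tokenisation and the function names are likewise assumptions for illustration. Only the 5% threshold comes from the description above.

```python
import re
from itertools import combinations


def vocabulary(text):
    """Return the set of distinct lower-case alphabetic words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def vocabulary_difference(text_a, text_b):
    """Difference between two vocabularies, assumed here to be 1 - Jaccard overlap."""
    vocab_a, vocab_b = vocabulary(text_a), vocabulary(text_b)
    union = vocab_a | vocab_b
    if not union:
        return 0.0
    return 1.0 - len(vocab_a & vocab_b) / len(union)


def candidate_duplicates(books, threshold=0.05):
    """List book pairs whose vocabularies differ by less than the threshold.

    books: dict mapping a book identifier to its full text.
    Returns (id_a, id_b, difference) tuples for manual review by a curator.
    """
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(books.items()), 2):
        difference = vocabulary_difference(text_a, text_b)
        if difference < threshold:
            pairs.append((id_a, id_b, difference))
    return pairs
```

Comparing every pair is quadratic in the number of books, which is manageable for a curated collection of a few hundred titles; the output is only a shortlist, with the final decision left to a literary scholar as described above.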