Title: Fabio Antonini – Post-doctoral research assistant, ARCHIves Project, Birkbeck, University of London

Abstract
The introduction of print into early modern Europe led to a radical transformation in the production, dissemination and reception of ideas. No longer restrained by the laborious processes of manuscript reproduction, the writings of Renaissance- and Reformation-era authors were now able to spread across the continent in new editions, translations and compendia with far greater speed and volume than ever before. Historians of the book have long debated the extent to which this technological revolution transformed the social, political and intellectual landscape of Europe during this time, whilst historians of the period more generally have repeatedly pointed to the publication of certain texts in particular cities and countries as the corollary, or even the cause, of social, political or intellectual shifts in that area. One need only look at the arrival of the vernacular Bible, for instance, to see how devastating such publication trends could be.
Taken in isolation, there is only so much we can learn from individual publications of a particular text. But what if we could map the entire spread of that same text, in every language and edition, over both space and time? Inversely, what could we learn about a particular city, publishing house or even individual editor if we could collate, quantify and analyse their entire corpus of printed works? There is a plethora of research questions to be asked and answered through a visualisation of publishing trends in early modern Europe, a visualisation which can be readily achieved through the digital resources held by the British Library.
The purpose of this project is to transform the bibliographic records of early modern Europe – for which the British Library catalogue and its affiliated Short Title Catalogues are an unrivalled source – into a single, refined and malleable dataset. Once plotted onto an interactive map of Europe and North America, this dataset will provide the single largest research tool for scholars interested in any author or text during the first three centuries of print, with each individual catalogue item serving as both a plot point and a series of metadata to be extracted, visualised and interrogated in whichever manner their research questions demand. In addition, this bibliographic metadata can also be integrated into the ever-growing corpus of digital facsimiles held by the British Library, and thus the Mapping Early Modern Print project will mark a new frontier in the relationship between library catalogues and scholarly research.

URLs:
https://www.google.com/fusiontables/DataSource?docid=12ju-sj44fQ6xqiOeWkmGr4G5_oAkxrOPYFUP8Vf1 (Jean Bodin trial)
https://www.google.com/fusiontables/DataSource?docid=1Rb9Y5yD3ubdkQmPdghy_wt5X03qRHwmPPCU3TPym (Niccolo Machiavelli trial)

The research question / problem you are trying to answer
As outlined in the abstract, the intention of this research project is to transform the bibliographic metadata from the BL catalogues of early modern printed works into a fully navigable map of publication trends. The principal purpose of this data visualisation is to provide a tool through which scholars can raise new research questions by viewing the geographical spread of their chosen author, title or genre of text over time. By presenting early modern publication trends in this unique and previously unseen way, users of the Mapping Early Modern Print project will be confronted with new challenges to explain and contextualise the relative popularity and readership of their particular subjects.
The visual trends and patterns highlighted by a mapped version of the BL catalogue will provoke scholars to ask new questions such as:
  • Why did the Spanish suddenly adopt a particular French philosopher several centuries after his death?
  • How did one thirteenth-century manuscript author find a new audience in print during the early seventeenth?
  • Why were female writers more successful in Edinburgh than Prague?
These are the kinds of new avenues which will be opened up by the creation of this vast visualisation project, and which will thus render the Mapping Early Modern Print site an extremely popular and well-used resource amongst early modern scholars.
My trial of the BL catalogue data relating to the sixteenth century French theorist Jean Bodin, for instance, highlights an example of the kind of research questions which will be raised by this resource. Under the tab ‘Bodin Demon-Mania’ (as one can see through the link above), I have performed a keyword search for the title of Bodin’s Of the Demon-Mania of the Sorcerers, a publication which has long been associated with the witch-craze of the late sixteenth and early seventeenth centuries. As the Fusion Map illustrates, the geographical spread of this work was mainly concentrated around Alsace, the Low Countries and parts of rural France, areas which were known for particularly high rates of prosecution during this time. Scholars of the European witch-craze can either use this visualisation as the starting point for examining the correlation between the arrival of this text in certain areas and the prosecution of those accused of sorcery, or else insert it into their own work as an illustration of the relative popularity of Bodin’s text.
Taken as a whole, moreover, a larger scale visualisation of the entire BL online catalogue (and its affiliated STCs) will also allow us to answer more general research questions about the nature of publishing and reproducing works across early modern Europe, including:
  • What countries, cities and publishing houses were most active and during which periods?
  • What role did individual editors and translators play in the dissemination of texts, and how can we retrace their careers through this bibliographic data?
  • Did known incidents of censorship (and, inversely, of patronage and support) by the Church and governments have a particular effect on the spread of certain works?
  • To what extent did the works of Western European authors spread into other parts of the world (Eastern Europe, America etc.), and how did these influence intellectual developments in these regions?
  • How regularly, and under what circumstances, were medieval manuscript authors edited and released in print?
  • Did certain authors find a greater spread and reception as part of larger edited volumes or miscellanies?
Please explain the extent and way your idea will showcase British Library digital content
By converting the early modern collections of the British Library into a single dataset, which can then be mapped, visualised and analysed by generations of future researchers, its catalogues and various STCs will be showcased as the single most comprehensive bibliographic resource currently available online. It will reintroduce the library catalogues to a substantial variety of early modern scholars as not simply a source for individual texts and publications, but as a body of metadata through which they can discover new approaches to the study of their favoured authors and titles. One potential application of the maps and visualisations created from this dataset would be to integrate them into the online catalogue of the British Library itself, allowing general keyword searches to lead to further information about how an author’s work was edited, translated and distributed across the continent, the history of the publishers and editors involved in its release, and even the potential correlations between its geographic spread and contemporary events.
By incorporating both the English and numerous other European and American Short Title Catalogues (http://www.bl.uk/reshelp/bldept/epc/earlyprinted/index.html) into the Mapping Early Modern Print dataset, the BL Labs will be able to further showcase and distribute the invaluable compendia compiled by these separate projects. In addition, there will be the opportunity to integrate, or at least link to, the ever-growing body of digital facsimiles of early modern texts, with the newly transformed catalogue entries serving as valuable metadata for each particular group of images. This would allow researchers to assess, for instance, how the frontispieces or even binding practices of individual publishing houses or editors changed and developed over time (using the BL database of book bindings: http://www.bl.uk/catalogues/bookbindings/Default.aspx), whilst also comparing how the works of a particular author were physically reproduced in different countries across Europe and the Americas. On a design level, this can easily be achieved by inserting links to the digital images as a separate field within each entry, so that those clicking on individual points on the map can then access and compare the physical features of each publication.
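The design idea described above can be sketched briefly in code. The following Python fragment is purely illustrative: the field names, the lookup structure and the facsimile URL are hypothetical placeholders, not the actual BL catalogue schema or image repository.

```python
# Illustrative sketch only: attach a link to a digital facsimile as an extra
# field on a catalogue entry, so that clicking a point on the map can surface
# the physical features of that edition. All names and URLs are hypothetical.
entry = {
    "Title": "Il principe",
    "Place of Publication": "Rome",
    "Date of Publication": "1532",
}

# A hypothetical lookup from (title, date) to a facsimile image URL.
facsimiles = {
    ("Il principe", "1532"): "https://example.org/facsimile/principe-1532",
}

key = (entry["Title"], entry["Date of Publication"])
entry["Facsimile"] = facsimiles.get(key, "")  # empty string if no image exists

print(entry["Facsimile"])
```

In practice the ‘Facsimile’ field would be populated from whichever repository the BL digital reproduction teams maintain, rather than a hand-built lookup.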
Similarly, this project can be extended to incorporate the datasets of the British National Bibliography (for scholars interested in publishing trends during the second half of the twentieth century – http://www.bl.uk/bibliographic/natbib.html) and the Integrated Archives and Manuscript System (for those entries within the collection which have a precise geographical provenance, and can thus be used to trace the spread of manuscript texts during the medieval and early modern periods).
Please detail the approach(es) / method(s) you are going to use to implement your idea, detailing clearly the research methods
As outlined in the ‘skills, knowledge and expertise’ section below, I have already trialled this project with the holdings of two renowned early modern authors (Niccolò Machiavelli and Jean Bodin) from the BL catalogue. My initial methodology for this trial was as follows:
  1. I exported the entire holdings for each of the two authors (as found through a refined search on the BL online catalogue) into the bibliographic software Zotero, storing each in its own designated folder. This dataset included all of the works attributed to Machiavelli and Bodin as authors, and I was careful to include each different edition of the same title.
  2. From the two Zotero folders, I exported the datasets once again as two separate CSV files.
  3. Within the CSVs themselves, I looked through the dataset for any gaps or misplaced entries (of which there were only a few in each case) and manually corrected the fields as appropriate.
  4. I then uploaded the two files to the data-cleaning software OpenRefine, through which I was able to standardise the entries for author and place of publication (both of which can be subject to different spellings according to language).
  5. Once the datasets were cleaned, I uploaded each to Google Fusion Tables. Whilst this is unlikely to be the software used in the project conducted with the BL Labs, this online programme allowed me to create a simple map of the data, in which the data field ‘Place of Publication’ served as a geocode for each individual plot point, whilst the data field ‘Date of Publication’ allowed me to use different markers on the map for different periods. This allows the user to better visualise the geographical spread of a publication across different time periods, which they can refine by date or title in order to identify more specific patterns and interrogate the data even further.
The results of these two trials are illustrated through the links included in this application, and are elaborated upon in the ‘skills, knowledge and expertise’ section below.
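The refining stage of this workflow can be sketched in code. The Python fragment below is a minimal illustration of the kind of standardisation OpenRefine performed in the trial; the variant place-name spellings, the ‘Place’ field name and the sample rows are assumptions for demonstration, not the actual catalogue schema.

```python
import csv
import io

# Hypothetical variant spellings of the same place of publication, as they
# appear across Latin, French and English imprints of the period.
PLACE_VARIANTS = {
    "Lyon": {"Lyon", "Lyons", "Lugduni", "A Lyon"},
    "Venice": {"Venice", "Venezia", "Vinegia", "Venetiis"},
    "London": {"London", "Londini", "Londres"},
}

# Invert to a flat lookup: variant spelling -> canonical place name.
CANONICAL = {v: place for place, vs in PLACE_VARIANTS.items() for v in vs}

def clean_rows(csv_text):
    """Read a Zotero-style CSV export and standardise the place field."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get("Place", "").strip().strip(".")
        row["Place"] = CANONICAL.get(raw, raw)  # leave unknown places untouched
        rows.append(row)
    return rows

# Two invented sample entries, standing in for the Bodin trial data.
sample = """Title,Place,Date
De la demonomanie des sorciers,A Lyon,1598
De daemonomania magorum,Londini,1581
"""
for row in clean_rows(sample):
    print(row["Place"], row["Date"])
```

A cleaned table of this shape, with one canonical place name per row, is exactly what the mapping software then geocodes into plot points.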
Although the basic structure of this methodology (exporting the entries as CSVs from the BL catalogue and affiliated STCs / refining the datasets in order to create uniform entries for author, title and place of publication / visualising and publishing the data using interactive mapping software) will remain in the proposed project, my initial trial has highlighted a number of areas for improvement and consideration:
  • Firstly, we will need to manually create a series of universal tags for each data entry regarding the title of the publication (possibly in the ‘Short Title’ field). This is because many editions of the same work have different wordings or translations of the same title, and thus a universal title field would improve the searchability of the final visualisation.
  • In addition, we will need to identify a more efficient method through which to convert the online catalogue into a series of CSVs. This is a process which would be more easily achievable for a far larger dataset (that is, the entire collection of early modern publications rather than those of individual authors) if we were to bypass the online interface of the BL catalogue site itself and instead extract the data directly from the catalogue servers. If it is decided, after consultation, that it is not technically feasible to export and convert the entire catalogue, then I will design the project around a series of key authors from early modern Europe. This will create a more manageable dataset with which to build our visualisations.
  • Finally, we will need to design a more bespoke mapping and visualisation software through which to publish our data. Whilst Fusion Tables is a useful tool for trialling the ways in which one can plot a series of CSVs according to place and date of publication, there are a number of drawbacks with the programme – most notably the fact that only the first publication in a particular city will be plotted on the map. There are other visualisation options already in place online, such as Neatline.org and even the Fusion Tables API function, through which we can have more control over the design and functionality of the maps themselves. Rather than performing keyword searches in order to find a particular author or title, for instance, we can insert a series of drop-down menus for each of the fields.
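The first and third points above can be illustrated together with a short sketch. The short-title tags and sample records below are invented for demonstration: the idea is that a universal tag lets variant titles and translations be searched as one work, and that counting every edition per city avoids the Fusion Tables limitation of plotting only the first publication.

```python
from collections import Counter

# Hypothetical universal short-title tags: variant titles and translations of
# the same work are mapped to a single searchable tag.
SHORT_TITLE = {
    "De la demonomanie des sorciers": "Demon-Mania",
    "De daemonomania magorum": "Demon-Mania",
    "Les six livres de la republique": "Six Books of the Commonwealth",
    "The six bookes of a common-weale": "Six Books of the Commonwealth",
}

# Invented sample records: (title as catalogued, city, year).
records = [
    ("De la demonomanie des sorciers", "Lyon", 1598),
    ("De daemonomania magorum", "Frankfurt", 1581),
    ("Les six livres de la republique", "Lyon", 1579),
]

# Count every edition per (tag, city), so a city with many editions of a work
# is represented by a weighted count rather than a single plot point.
counts = Counter(
    (SHORT_TITLE.get(title, title), city) for title, city, _year in records
)
for (tag, city), n in sorted(counts.items()):
    print(f"{tag} – {city}: {n} edition(s)")
```

A bespoke map could then scale or colour each city marker by these counts, and populate its drop-down menus from the set of universal tags.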
Please provide evidence how you / your team have the skills, knowledge and expertise to successfully carry out the project by working with the British Library
During the academic year 2015-2016, I have undertaken a series of training courses in Digital Humanities with both the ‘Venice Time Machine’ project in Italy and the Consortium of Humanities and Arts for the South East (CHASE) in the UK. Both of these courses have provided me with the necessary technical skills to carry out the data mining, cleaning and visualisation involved in my proposed project, as well as giving me the opportunity to trial, discuss and further develop the concept and design of my project as a whole.
I have developed a trial of my proposed project with the support and guidance of several DH researchers and lecturers associated with the CHASE training course, testing and refining my methodology as well as producing tangible results through the use of Google Fusion Tables. For this trial, I have focussed on two specific authors – Jean Bodin and Niccolò Machiavelli – whose geographic spread over the course of the early modern period demonstrates the myriad ways in which this bibliographic data can be visualised. I have included links to both of these Fusion Maps above. These allow users to see examples of the maps depicting the date of first publication in each city (1500-1800), and to perform searches on the date ranges and titles of the author’s various works (as well as view the original dataset in ‘Rows 1’).
The Mapping Early Modern Print project will publish its results through more bespoke visualisation software than that used in the two trials, but the trial results should illustrate the ways in which this larger body of bibliographic data could be mapped and interrogated by those interested in specific early modern authors, their dissemination and influence. As I explain in the research questions above, one could for instance search for the spread of Jean Bodin’s Of the Demon-Mania of the Sorcerers during the European witch-craze, and ask new questions about the actual influence of this work in the regions in which it was published.
The other purpose of this trial was to identify some of the immediate issues of data refinement and visualisation which will need to be addressed in the Mapping Early Modern Print project proper. Chief amongst these is the need to insert a standardised entity for author, place of publication and, above all, title into the dataset, in order to optimise the keyword search functions in each case. Similarly, author and editor entities within the catalogue data will need to be separated and disambiguated, as will the entities for publisher and place of publication (some of which may have been deliberately placed under false dates for the purposes of circumventing censorship at the time of publication). This process will require a contextual understanding of the authors and titles of this period, which I will be able to bring to the project through both my own knowledge and that of my colleagues in the field of early modern studies. It will also help to improve the navigability of the British Library catalogue itself, as it will refine many of the potentially ambiguous entries which routinely trouble collections of early modern material.
As part of my work and training with the ‘Venice Time Machine’, I have also assisted in running a short project aimed at creating network diagrams of commercial transactions extracted from annotated digitised images of the state archive’s financial registers. This involved converting a series of annotations (pertaining to the names, figures and dates associated with each transaction) into a set of CSVs, which our group then refined and plotted using the statistical computing software ‘R’. The methodology and outcomes of this project were presented as a conference poster at a recent ‘Big Data’ symposium in Rome in November 2015.

Please provide evidence of how you think your idea is achievable on a technical, curatorial and legal basis
Technical
The most important technical advantage of this project is that, for the most part, the bibliographic dataset is already fully digitised and ready to be mined, cleaned and visualised. Given the sometimes inconsistent structure of each catalogue entry (with regards to author names, place of publication and so on), certain aspects of the dataset CSVs will need to be manually edited before running through the refining software, but this can be planned in advance through consultation with those who organise and maintain the catalogue itself (see curatorial aspects below).
Curatorial
By discussing the structure, servers and malleability of the online catalogue with those who maintain it, I will be able to design a more effective method for the mass extraction of catalogue metadata than the Zotero-based approach described above. In addition, we will be better able to organise the storage and integration of the CSVs within the servers of the online catalogue, as well as providing links to the online repositories of texts and images held elsewhere on the BL websites.
Legal
By working with the metadata surrounding the library holdings, rather than the publications themselves, there will be far fewer rights issues concerning the reproduction and visualisation of this dataset. Similarly, any images associated with the data entries on the mapping software will be of early modern publications for which there are no concerns regarding copyright.


Please provide a brief plan of how you will implement your project idea by working with the Labs team
June 2016: Formulate a plan for mass data mining from the BL catalogue and associated STCs, in discussion with the BL Labs and curators of the collection / Begin preliminary designs for the visualisations to be used as the outcomes of this project (including the necessary user interfaces regarding keyword searching and filters) / Begin trials of large scale data mining from the collection.
July 2016: As the data mining from the catalogue is ongoing, I will formulate the most efficient method of data cleaning and short-title tagging in order to improve the searchability of the dataset once it is imported into the mapping software, and trial this on the first bodies of data to emerge from the process. I will then run a short trial of the visualisation (most likely based on a series of pre-determined popular authors and works).
August 2016: Continue the cleaning of the dataset, and begin constructing / refining the framework for the visualisation programme / Enter discussions with the digital reproduction teams regarding the incorporation of images (or links to images) into the dataset, or else the incorporation of this new bibliographic metadata into their collections.
September 2016: Continue to refine the visualisation programme, complete with the full dataset which has been able to be extracted from the catalogue during this time. Prepare beta version of this programme for the BL Labs presentations in November.