Georgi Marinov and Denitsa Blagoeva

Submitted Entry for 2015 Competition

Abstract

The dynamics and movement of books inside the British Library have always been hidden from the public eye. It is something only its members might get a fleeting sense of while being there. Yet the scale of it, i.e. the number of books being borrowed and the variety of topics inquired into, remains represented mainly by numbers: numbers that to many of us are difficult to grasp, register as noise, and often leave our brains numb.

The aim of this project is to visualise the live stream of machine data that flows through the British Library book ordering system, and to translate it back into human terms. Knowing the subjects with the highest priority to readers at a given moment allows us to build a kind of Readometer: a widget-like compass that monitors the current direction of readers' attention in real time. The Readometer mimics a mechanical device, and its interface could be implemented equally well as a website, a Twitter bot, an app, or a standalone physical object. Its finished appearance would be influenced by the research and further development of this project.

Acting as a visual indicator, it will be in constant motion, aiming to showcase the breadth of subjects the readers are delving into, and to promote the diversity of the BL's vast catalogue. We hope to provide people who haven't been to the British Library, or used its resources, with a sense of its pulse and of the immense wealth of knowledge that is being unlocked before their eyes by those exploring the BL's catalogue. By using the real-time ordering dataset and feeding it back into the public imagination, we aim to spark curiosity and ignite a new interest in the digital and physical catalogues of the British Library.

The live reading information will be collected, analysed, organised in a specific way, summarised and then shown to the public. This will be done by means of automated algorithms based on carefully devised rules. In an attempt to demonstrate the diversity of subjects being explored by the readers, the Readometer will send out its signal to remind us of how lively this world of knowledge actually is.

URL for project:
http://www.denitsablagoeva.com/readometer

Assessment Criteria

The research question / problem you are trying to answer

Please focus on the clarity and quality of the research question / problem posed:

The book ordering records contain a bigger story about the curiosity of the BL's readers, the diversity of its catalogue, the function of the modern library, and the extent of historical knowledge pursued or ignored by our society. Is there a way of revealing this story?

Computers with great processing capabilities are able to handle enormous amounts of data, but the detail present in that data obscures its patterns from the human mind. Such data is normally used for statistical purposes or planning, yet the greater stories contained within it can be revealed to a human gaze once the data is summarised and visualised.

There is a lot of hidden beauty in the British Library's book ordering data. In our research we will be exploring ways in which to extract this beauty from where it hides now and to share it with the readers, in an effort to spark a new interest in the BL's catalogues.

Additionally, we plan to evaluate ways of correlating the BL's book ordering data with other sets of British Library data or with third-party data, for example the weather forecast, trending hashtags on social networks, popular news items, etc.

Please explain the ways your idea will showcase British Library digital collections

Please ensure you include details of British Library digital collections you are showcasing (you may use several collections if you wish), a sample can be found at http://labs.bl.uk/Digital+Collections

The Readometer will serve as a visual indicator pointing at the diversity of subjects explored at the British Library. By utilising a particular set of data that isn't otherwise publicly visible, we hope to draw people's attention towards unexplored ways of seeing the human stories in machine data.


By promoting metadata about the movement of books, the Readometer visualises patterns that are not directly visible. It also acts as an open invitation to other researchers, developers and designers alike to explore the possibilities contained within the BL's digital collections.

Please detail the approach(es) / method(s) you are going to use to implement your idea, detailing clearly the research methods / techniques / processes involved


Indicate and describe any research methods / processes / techniques and approaches you are going to use, e.g. text mining, visualisations, statistical analysis etc.

The development of this project starts with studying book ordering data in more detail, to better understand some of its aspects, and to identify patterns beyond the immediately visible. This could be frequency of booking, the average time the readers spend, the reading location, and so on.

We also seek to build a database of keyword metadata that works in broader terms, better suited to everyday conversation, e.g. “Philosophy” or “the 1800s”. Much of this metadata may be inferred from the book ordering data alone, based on author, words in the title, reading location, year of publication, etc. However, if content keyword metadata is already present in digital resources at the Library, we would look to make use of such better-quality data.

For example, a title such as “Allende's Chile and the Inter-American Cold War” from 2011 would parse as relating to “cold war”, “politics”, “USA” and “Chile” (based on the title), “social sciences” (based on ‘politics’ or a reading location), “America” (based on ‘Chile’ and ‘USA’), “world” (based on ‘America’, as opposed to “UK”), and the “2010s” and “2000s” (based on the publishing year of the book). The book's author may also be considered, yielding additional keywords.
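The parsing described above can be sketched as a small rule-based function. This is only an illustration under stated assumptions: the rule tables, the function names and the choice to emit the publication decade plus the preceding one are all hypothetical, and Python is used purely for readability (the proposal names PHP, Redis and Lua for the actual processing).

```python
# Hypothetical rule tables: every entry is illustrative, not taken from BL data.
TITLE_KEYWORDS = {
    "cold war": ["cold war", "politics"],
    "inter-american": ["USA"],
    "chile": ["Chile"],
}

# Hypothetical "broader term" rules, applied repeatedly until nothing new appears.
BROADER_TERMS = {
    "politics": ["social sciences"],
    "USA": ["America"],
    "Chile": ["America"],
    "America": ["world"],
}


def decade_keywords(year):
    """Assumed rule: the decade of publication plus the decade before it."""
    decade = (year // 10) * 10
    return [f"{decade}s", f"{decade - 10}s"]


def infer_keywords(title, year):
    """Infer broad keywords from a title and a publication year."""
    lowered = title.lower()
    found = []
    # Step 1: direct matches on phrases in the title.
    for phrase, keywords in TITLE_KEYWORDS.items():
        if phrase in lowered:
            for keyword in keywords:
                if keyword not in found:
                    found.append(keyword)
    # Step 2: expand each keyword into its broader terms.
    pending = list(found)
    while pending:
        term = pending.pop(0)
        for broader in BROADER_TERMS.get(term, []):
            if broader not in found:
                found.append(broader)
                pending.append(broader)
    # Step 3: add decade keywords from the publication year.
    found.extend(decade_keywords(year))
    return found


keywords = infer_keywords("Allende's Chile and the Inter-American Cold War", 2011)
print(keywords)
```

In a real system the rule tables would be replaced by whatever keyword metadata the Library can supply; the structure of direct matches followed by broader-term expansion is the part this sketch is meant to show.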

The above will become our framework for data processing rules. The rules should aggregate records from the book ordering system in real time as they come in, integrating the results as counts over time while also keeping a short recent history. We will combine the output of the data processing rules into a set of simplified indicators (terms) suitable for visualisation. Our initial choices are broad terms such as “science”, “religion”, “politics”, “love”, etc., but over time we will develop a better understanding of which terms work better visually, and may opt to target those. Design-wise, we will also explore the possibility of visualising terms that are not fixed.
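As a rough sketch of the aggregation just described, the class below keeps running counts per term alongside a short history of the most recent records. The class name, method names and history size are assumptions made for illustration, and Python stands in for whichever language the final implementation uses.

```python
from collections import Counter, deque


class TermAggregator:
    """Hypothetical aggregator: totals over time plus a short recent history."""

    def __init__(self, history_size=100):
        self.totals = Counter()                    # counts since start-up
        self.recent = deque(maxlen=history_size)   # last N records only

    def add_record(self, keywords):
        # Count each keyword once per record, preserving first-seen order.
        unique = tuple(dict.fromkeys(keywords))
        self.totals.update(unique)
        self.recent.append(unique)                 # oldest record drops off when full

    def recent_counts(self):
        counts = Counter()
        for record in self.recent:
            counts.update(record)
        return counts

    def top_terms(self, n=3):
        # The simplified indicators a visualising client might display.
        return [term for term, _ in self.recent_counts().most_common(n)]


aggregator = TermAggregator(history_size=2)
aggregator.add_record(["politics", "science"])
aggregator.add_record(["politics"])
aggregator.add_record(["love"])
print(aggregator.totals["politics"], dict(aggregator.recent_counts()))
```

Keeping the totals and the recent history separate lets a display contrast what readers are ordering right now with the longer-term picture.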

We will aim to design and implement an engaging dynamic visualisation, inviting the public to spend a brief moment observing and contemplating the nature of the readers’ curiosity and the everyday dynamic of the British Library. Visitors may be virtual - e.g. to the BL website, or physically present at the information desk, just as the object can be a digital widget, or a physical device, “powered” by this data. We will aim to develop a fully functioning digital piece, as well as a physical prototype device (we will refer to these as “visualising clients”).

Please provide evidence of how you / your team have the skills, knowledge and expertise to successfully carry out the project by working with the Labs team

E.g. work you may have done, publications, a list with dates and links (if you have them)

Denitsa and Georgi are both experienced in digital technology and content and have worked as a team on previous occasions. Denitsa has a BA Design degree from Goldsmiths university, and previously studied 3D object design at Ravensbourne, so is well suited to design the look, feel, and function of the Readometer and its complications.

Georgi works as an interactive digital developer, and has more than 10 years of experience with code and data, including in corporate environments. His past experience working with British Library data for the unveiling event of the Living Knowledge strategy provides reality checks for the technical aspects of the project.

We are both team players, capable of self-direction, and we also recognise that some aspects of our work will be best informed by the Labs team and IT staff, as well as other specialists at the Library, with whom we will look to work to make this project happen.

Please provide evidence of how you think your idea is achievable on a technical, curatorial and legal basis

Indicate the technical, curatorial and legal aspects of the idea (you may want to check with Labs team before submitting your idea first).


Technical
The Readometer is sized for feasibility. The bigger challenge comes with our desire for it to be as accurate as possible, thus opening up possibilities for further design exploration by correlating with other realtime data. Identifying the keywords needed for rule-based data processing will be one of the deeper aspects of the work.

The data processing model will consume data that is readily available and already circulating within the BL’s own infrastructure, without interfering with existing IT systems. The pace of data showcased at the Living Knowledge event gives us confidence that processing can be handled in real time. This data will be queued upon entry into the Readometer’s summarising server, so at busy times it can be processed with a short delay.
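One possible shape for the entry queue is a bounded buffer that accepts records while there is room and drops them when the server is too busy. The sketch below uses Python's standard `queue` module; the class name, the capacity and the batch size are assumptions for illustration only.

```python
import queue


class IngestBuffer:
    """Hypothetical bounded entry queue for incoming book ordering records."""

    def __init__(self, max_pending=1000):
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0   # how many records were discarded while busy

    def offer(self, record):
        """Queue a record without blocking; discard it if the buffer is full."""
        try:
            self._pending.put_nowait(record)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, limit=100):
        """Take up to `limit` queued records for the summarising step."""
        batch = []
        while len(batch) < limit:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        return batch


buffer = IngestBuffer(max_pending=2)
for record in ("order-1", "order-2", "order-3"):
    buffer.offer(record)
print(buffer.drain(), buffer.dropped)
```

Because `offer` never blocks, the source of the data is never slowed down by the Readometer, which fits the aim of not interfering with existing systems.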

The availability of BLIC&BNB data provides options for additional subject/keyword association. Even without it, the data on authors and titles, along with publication year and reading location, which is already present in the book ordering data, allows for crude association, as well as integration based on e.g. system date.

We will utilise common, open source technologies such as PHP, Redis and Lua for data processing, and HTTP, JavaScript/JSON and HTML5 for the client widgets, thus allowing the project to be released to the open source community later. We have past experience of building digital resources that operate 24/7.

Visualising clients will access the summarised data via an API over the network, querying a self-contained micro-server. This must be pluggable, have no impact on existing BL IT infrastructure, and provide no useful vector for potential security risks.
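A minimal sketch of such a read-only micro-server, using only Python's standard library, might look as follows. The `/state` path, the payload fields and the sample numbers are hypothetical; the point is that clients can only issue GET requests for already summarised data and never touch the raw ordering records.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical summarised state; in practice the summarising loop would update it.
CURRENT_STATE = {"terms": {"politics": 12, "science": 7}, "window_minutes": 15}


def build_state_payload(state):
    """Serialise the summarised state as UTF-8 encoded JSON."""
    return json.dumps(state, sort_keys=True).encode("utf-8")


class StateHandler(BaseHTTPRequestHandler):
    """Read-only handler: only GET is implemented, and only for /state."""

    def do_GET(self):
        if self.path != "/state":
            self.send_error(404)
            return
        body = build_state_payload(CURRENT_STATE)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8080):
    """Create the server; the caller decides when to start serving."""
    return HTTPServer((host, port), StateHandler)
```

Calling `make_server().serve_forever()` would start it locally, after which a widget could fetch `http://127.0.0.1:8080/state` and draw the returned terms.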

Curatorial
The book ordering data is readily available at British Library and refers to the contents of the Library.

This data is refreshed and replenished on a daily basis by the readers for as long as the library is open.

Legal
All data is owned by the BL, and we should require no third-party processing, whether via API or otherwise.

By using open source data processing technologies, as well as readily available client technologies, this project will not burden BL from a licensing standpoint.

There is a small possibility of a surveillance aspect to how the public perceive the Readometer. This can be avoided by clearly explaining its aims and technology - we will work with the BL to structure and present this information. If realtime processing becomes a problem for any reason, it can be sufficiently delayed without impact on the Library’s IT operation, and results can be integrated over a long enough time window that the data is no longer realtime.
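One way to picture this time-window integration is to group events into fixed windows and publish only the windows that have fully elapsed. The sketch below is an assumption-laden illustration: the function names, the one-hour default window and the use of integer Unix-style timestamps are all hypothetical.

```python
from collections import Counter


def window_start(timestamp, window_seconds):
    """Round a timestamp (in seconds) down to the start of its window."""
    return timestamp - (timestamp % window_seconds)


def publishable_counts(events, now, window_seconds=3600):
    """Count terms per window, skipping the window that is still in progress.

    `events` is an iterable of (timestamp, term) pairs.
    """
    current = window_start(now, window_seconds)
    windows = {}
    for timestamp, term in events:
        start = window_start(timestamp, window_seconds)
        if start < current:   # only windows that have fully elapsed
            windows.setdefault(start, Counter())[term] += 1
    return windows


events = [(10, "politics"), (20, "politics"), (3700, "science")]
print(publishable_counts(events, now=3800))
```

With a one-hour window, nothing a reader orders is reflected publicly until the hour has closed, and then only as part of an aggregate count.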

Please provide a brief plan of how you will implement your project idea by working with the Labs team

You will be given the opportunity to work on your winning project idea between June 2015 - October 2015.

June

initial discussions and orientation
dialogue with key people at Labs
start dialogue with key technical people
request a recent sample of data to study
obtain and distil BLIC&BNB data of relevance to the project
visual ideas generation
develop code that ingests realtime data, queuing or discarding data when the system is busy

July

develop runtime loop that summarises/integrates newly ingested data, maintaining a set of key indicators
develop API to query current state of terms & metrics
evaluate feasibility of designs
design prototypes

- for website widget
- standalone (self-contained) digital or physical object

finalise designs
technical implementation of designed digital or physical objects

August

work with BL Labs’ IT:

- explore feasibility of widget version for main BL website in terms of accessibility/design/technical requirements, appropriateness of visuals, etc.
- establish security/data/internet robustness of the API (potential impact)
- install self-contained system within BL’s server infrastructure
- integrate web application

September

work with BL to promote the Readometer
independent monitoring of Readometer
evaluate ways of correlating with other realtime BL and third-party data
in broad strokes, evaluate alternative designs that combine two or more sets of data (for instance, whether the quality of British weather affects which books are ordered, or whether popular Twitter hashtags relate to people’s reading choices)
present findings to BL Labs and discuss

October

document the software for BL’s use
prepare software for opensourcing

- code cleanup
- code comments