Digital Collections - Datasets

Datasets

A comprehensive list of research resources, including catalogues, metadata, archives and records, which can be accessed either online or on site at the British Library, St Pancras.

Digital scholarship guides detailing some of the datasets about our collections (for content mining, image analysis and the UK Web Archive) are available on the main Library site.

The British National Bibliography

The BNB is the single most comprehensive listing of UK titles. UK and Irish publishers are obliged by law to send a copy of all new publications, including serial titles, to the Legal Deposit Office of the British Library. This material is catalogued by experienced staff using Resource Description and Access (RDA) and subject indexed using Library of Congress Subject Headings (LCSH) and the Dewey Decimal Classification system (23rd edition).

Read more about this dataset in this Data.gov.uk blog post.

Free data services include the Linked Open BNB data and the BNB in Basic RDF/XML. Files are available for download and updated monthly.
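As a rough illustration of working with a downloaded RDF/XML file, the sketch below reads record titles using only Python's standard library. The sample record, its example.org URIs and the use of dcterms:title are assumptions made for illustration; check the actual BNB files for the properties they use.

```python
import xml.etree.ElementTree as ET

# Standard RDF and Dublin Core Terms namespace URIs.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical sample data; a real download would be read from a file instead.
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/resource/001">
    <dcterms:title>An example title</dcterms:title>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/resource/002">
    <dcterms:title>Another example title</dcterms:title>
  </rdf:Description>
</rdf:RDF>
""".strip()


def extract_titles(rdf_xml):
    """Map each described resource URI to its dcterms:title, if it has one."""
    root = ET.fromstring(rdf_xml)
    about_attr = "{%s}about" % NS["rdf"]
    titles = {}
    for description in root.findall("rdf:Description", NS):
        title = description.find("dcterms:title", NS)
        if title is not None and title.text:
            titles[description.get(about_attr)] = title.text.strip()
    return titles


titles = extract_titles(SAMPLE)
for uri, title in titles.items():
    print(uri, "|", title)
```

For a large monthly file, a streaming parser such as ET.iterparse is likely to be a better fit than loading the whole document into memory at once.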

Possible ideas: Augmenting existing records in various ways with this data

Please get in touch with specific questions about this dataset, or reach us on Twitter using #BLMetadata.

Access Conditions

This material is under a Creative Commons 0 License.

Integrated Archives and Manuscripts System (IAMS)

IAMS metadata contains an inventory of the archives and manuscripts the British Library holds. There are over two million records in our IAMS catalogue system based on International Council on Archives (ICA) standards, with some very rich descriptive metadata records.

Possible ideas: Providing new visual interfaces to this data through the use of visualisation tools

Please get in touch if you have specific questions about this dataset and to request a sample.

Access Conditions

This material is under a Creative Commons: Public Domain licence.

British Library Book Ordering Data

Every day thousands of items are ordered up from the library stacks and delivered to researchers in our reading rooms. We can provide daily anonymised reports of these titles, including shelfmark information and reading room location. To request library ordering data, please get in touch.

Possible ideas: Combining this dataset with the BNB to get an idea of the breadth of items that are requested at our national library.
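One way to try this idea is to join the two datasets on a shared identifier and count requests per subject class. The sketch below is only an outline: the field names (record_id, shelfmark, reading_room, dewey), the identifiers and every row are made up, and the real reports and BNB files may not share a key in this form.

```python
from collections import Counter

# Hypothetical, made-up order rows; real reports may use different fields.
orders = [
    {"record_id": "A1", "shelfmark": "SM-0001", "reading_room": "Room A"},
    {"record_id": "A1", "shelfmark": "SM-0001", "reading_room": "Room B"},
    {"record_id": "B2", "shelfmark": "SM-0002", "reading_room": "Room A"},
    {"record_id": "C3", "shelfmark": "SM-0003", "reading_room": "Room A"},
]

# Hypothetical BNB extract keyed by the same made-up identifier.
bnb = {
    "A1": {"title": "Example title one", "dewey": "942.05"},
    "B2": {"title": "Example title two", "dewey": "510"},
}


def requests_by_dewey_class(orders, bnb):
    """Count requests per Dewey main class (000, 100, ..., 900)."""
    counts = Counter()
    for order in orders:
        record = bnb.get(order["record_id"])
        if record is None:
            # Keep track of orders that have no matching BNB record.
            counts["unmatched"] += 1
            continue
        # The first digit of a Dewey number identifies its main class.
        counts[record["dewey"][0] + "00"] += 1
    return counts


summary = requests_by_dewey_class(orders, bnb)
for dewey_class, total in sorted(summary.items()):
    print(dewey_class, total)
```

Counting unmatched orders separately gives a quick sense of how much of the ordering data falls outside the BNB.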

Please get in touch with specific questions about this dataset or to request a sample.

Access Conditions

Please email labs@bl.uk

Anonymised Reader Records

We are currently investigating ways we could extract anonymised British Library reader records from our reader systems for research purposes. If you would like to use this data in a Labs project, contact us with further information about the kind of data you are interested in.

Possible ideas: Benchmarking usage.

Please get in touch with specific questions about this dataset or to request a sample.

Access Conditions

Please email labs@bl.uk

UK Web Archive

A massive archive of the UK's Web history housed at the British Library, together with a number of tools that have been developed to help facilitate research. A full list of our datasets is published online.

Possible ideas: Using the archive to carry out historical research and comparing the results with other sources of data, e.g. books and newspapers.

A few current projects using the web archive are listed below:

Analytical Access to the Domain Dark Archive (AADDA)

Led by the Institute of Historical Research (with several partners, including the British Library), this project is working with researchers in contemporary history in particular, and digital humanities in general, to obtain feedback on the feasibility of using the UK Web Archive at an analytical level. The aim is to enable researchers to carry out unique and hitherto impossible research queries, particularly for collections that span a decade or more. More information is available at the project website.

The value of big data for social science

The project, led by the Oxford Internet Institute, examines the potential of web archives for link analysis research by processing and analysing domain-scale web collections. It aims to increase the visibility, accessibility and ease of use of the JISC UK Web Domain Dataset, a 30 terabyte web archive of the .uk country-code top-level domain (ccTLD) collected from 1996 to 2010. The project will extract link graphs from the data, assess the feasibility and impact of using the .uk ccTLD as a boundary for UK web presence, and conduct and disseminate high-quality social science research examples using the collection. For more information, see the project website.
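To make the idea of a link graph and a .uk boundary more concrete, here is a minimal sketch that turns page-to-page links into a host-level graph and counts how many links cross the .uk boundary. It is not the project's method: the link pairs are made up, and treating any hostname ending in .uk as "UK web presence" is a simplifying assumption.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical (source page, target page) link pairs.
LINKS = [
    ("http://www.example.co.uk/a", "http://news.example.org.uk/b"),
    ("http://www.example.co.uk/a", "http://www.example.com/c"),
    ("http://blog.example.com/x", "http://www.example.co.uk/a"),
    ("http://www.example.co.uk/a", "http://www.example.co.uk/d"),
]


def host(url):
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def in_uk(hostname):
    """Assumption: a host counts as UK if it sits under the .uk ccTLD."""
    return hostname.endswith(".uk")


def host_graph(links):
    """Collapse page links into weighted host-to-host edges, skipping self-links."""
    edges = Counter()
    for source, target in links:
        s, t = host(source), host(target)
        if s and t and s != t:
            edges[(s, t)] += 1
    return edges


def boundary_summary(edges):
    """Count links by whether each end falls inside the .uk boundary."""
    summary = Counter()
    for (s, t), weight in edges.items():
        if in_uk(s) and in_uk(t):
            summary["uk_to_uk"] += weight
        elif in_uk(s):
            summary["uk_to_other"] += weight
        elif in_uk(t):
            summary["other_to_uk"] += weight
        else:
            summary["other_to_other"] += weight
    return summary


edges = host_graph(LINKS)
summary = boundary_summary(edges)
print(dict(summary))
```

A high share of links leaving or entering .uk would suggest that the ccTLD alone is a leaky boundary, which is the kind of question the feasibility assessment raises.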

If you are interested in using the web archive for Labs, please contact us at labs@bl.uk to start a discussion about how this might be possible.

Access Conditions

Please email labs@bl.uk

Piers Plowman Electronic Archive

The medieval and Renaissance manuscript witnesses to Piers Plowman, a Middle English allegorical narrative poem by William Langland. Piers is considered by many critics to be one of the greatest works of English literature of the Middle Ages, along with Chaucer's Canterbury Tales and the Pearl Poet's Sir Gawain and the Green Knight.

The collection is available to browse online.

Access Conditions

This material belongs to the Public Domain.

Asian & African Studies Catalogue

The subject focus of this collection is published, manuscript and visual material from and about Asia and Africa, including unrivalled genealogical sources for Europeans in south Asia.

The catalogue is available in the British Library's Reading Rooms.

Access Conditions

This material is under a Creative Commons 0 License.

IMPACT

Approximately 48,515 images (ca 220 titles)

The images are available here:
www.digitisation.eu

Access Conditions

Out-of-copyright material is under a Public Domain (PD) licence; the ground truth and named entity files are under a CC0 licence.