
James Alexander

Submitted Entry for 2013 Competition

Visualisation Of Web archives (VOWs?)

Abstract

This project will create pathways into collections of archived websites, so that users can explore and intuitively understand both the content of those sites and how they are used. The pathways will take the form of visualisations based on analysis of content and usage. Popular sites in the archive might be highlighted to encourage exploration.

By disaggregating web archive files, automatically generating images of websites and visually examining usage statistics for visits to archived websites, it may be possible to provide useful, automatically generated pathways into collections of archived websites. These harvested collections present unique challenges to hosting organisations in that they may be self-describing, but not within the (archival) context in which they are presented.

Further exploration through exemplars is required to strengthen the toolset available to archivists and others working in this domain, and so make content more accessible. In particular, visual tools for exploring web archive sites by usage are currently rudimentary but might prove of value and interest to many.

Assessment Criteria

The research question / problem you are trying to answer

Please focus on the clarity and quality of the research question / problem posed:

What automatically generated visualisations can form useful and intuitive pathways into collections of web archives?

Please explain the ways your idea will showcase British Library digital collections

Please ensure you include details of British Library digital collections you are showcasing (you may use several collections if you wish), a sample can be found at http://labs.bl.uk/Digital+Collections

The project will use a sample of the web archive collection, provide visual pathways into it, and offer intuitive ways to indicate and examine the composition and usage of these resources.

Please detail the approach(es) / method(s) you are going to use to implement your idea, detailing clearly the research methods / techniques / processes involved

Indicate and describe any research methods / processes / techniques and approaches you are going to use, e.g. text mining, visualisations, statistical analysis etc.

The end product will be a series of visualisations, derived by combining statistical methods for parsing data with the programmatic creation of images.
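As a rough illustration of the usage side of this, the sketch below counts requests per archived site from access log lines and ranks the most visited. It is only a minimal sketch under stated assumptions: the log layout and the Wayback-style path (`/<prefix>/<14-digit timestamp>/<original URL>`) are assumed rather than taken from the actual BL logs, and the names `count_visits` and `most_popular` are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: requests for archived pages contain a 14-digit capture
# timestamp followed by the original URL, e.g.
#   /archive/20130101120000/http://www.example.org/
ARCHIVED_URL = re.compile(r"/(\d{14})/(https?://\S+)")


def count_visits(log_lines):
    """Count requests per archived host in an iterable of log lines."""
    visits = Counter()
    for line in log_lines:
        match = ARCHIVED_URL.search(line)
        if match is None:
            continue  # not a request for an archived page
        host = urlsplit(match.group(2)).hostname
        if host:
            visits[host.lower()] += 1
    return visits


def most_popular(log_lines, n=10):
    """Return the n most requested archived hosts as (host, count) pairs."""
    return count_visits(log_lines).most_common(n)
```

The resulting counts could then feed whichever visualisations are agreed, for example to highlight popular sites.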

Please provide evidence of how you / your team have the skills, knowledge and expertise to successfully carry out the project by working with the Labs team

E.g. work you may have done, publications, a list with dates and links (if you have them)
I have previously worked on a number of visualisation projects, including the generation of maps and overlaid data for the IDP project at the BL.

Before this, I developed a number of automated workflow and layout solutions for print and newspaper companies.

I have also worked with parsing data and automated generation of images, including development of a scene-change detection algorithm for a video management system at the Open University.

I have set up a web archiving workflow at the OU, including automated parsing of .warc files utilising Lucene-based search.

Please provide evidence of how you think your idea is achievable on a technical, curatorial and legal basis

Indicate the technical, curatorial and legal aspects of the idea (you may want to check with Labs team before submitting your idea first).

There may well be problems with provision of anonymised log data for parsing. This may be mitigated by there being no particular interest in the origin or identity of requests.
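Since the origin of requests is of no interest here, one possible mitigation is to strip the client address before the logs are handed over. The sketch below assumes that each log line begins with the client address, as in Common Log Format; the function name `anonymise` is hypothetical, and other fields (such as referrer or user agent, where logged) might need similar treatment.

```python
import re

# Assumption: the first whitespace-delimited field of each line is the
# client address, as in Common Log Format.
CLIENT_FIELD = re.compile(r"^\S+")


def anonymise(line):
    """Replace the leading client address of a log line with '-'."""
    return CLIENT_FIELD.sub("-", line, count=1)
```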

There may also be copyright problems associated with the automated generation of website images as thumbnails.

There are no obvious technical constraints beyond limitations on time.

Given the volume of available data, selection criteria will have to be applied and these may well constrain the time available.

Please provide a brief plan of how you will implement your project idea by working with the Labs team

You will be given the opportunity to work on your winning project idea between July 6th - October 31st 2013
Time to work on this project would only be allocated in the event of a successful bid, so it is difficult to give a firm timeframe for the phases of the project. It is anticipated that more time would be allocated in some months and less in others.

1. Project Setup

Work with Labs team and curators to select suitable web archive content within given constraints of time and resources.

Identify any obvious (existing) avenues for development by working with the relevant curators. Identify a relevant party to provide agile feedback on development in progress.

Contract Specification

Agree visualisations to be undertaken

2. Initial development

Parse and manipulate data to provide visualisations.

3. Iterative development phase

Present visualisations, develop automation of these and modify based on feedback and any further data requirements. To include a final iteration as -

4. Presentation of development and any additional documentation

5. Any adaptations of tool(s) for an 'into service' phase, if agreed, are anticipated to extend beyond the scope of the project.