The Order of Things (Category:Research, Creative / Artistic)

Name of Submitter(s): Mario Klingemann
Organisation: Quasimondo

The advent of over one million images from the British Library, added by the Mechanical Curator, opened up a treasure trove of imagery for creative use. But it came with one problem: initially the only way to find interesting images was to browse the collection either randomly or linearly, since the only useful metadata available were the book title, the publishing year and the author. This motivated me to use semi-automated image classification and machine learning techniques to add meaningful tags to the images and to start creating thematic collections. Over the course of a year I have tagged tens of thousands of images in the BL Flickr collection and also added many collections to the "Lost Visions" project by Cardiff University, which also uses the BL material. Furthermore, I have created various artworks with the found images, many of which are based on the principle of order.
URL for Entry:

Twitter: @quasimondo
Job Title: Artist and Researcher

Background of Submitter:

Whilst I love finding order and patterns everywhere, I am really bad at finding them in my own background. I've been using computers and code to generate or analyse images since I got my first computer back in 1984. I have no formal education in this field but I've worn many hats during my career: copywriter in advertising, graphic designer, motion designer, web designer, maker, researcher and speaker. Over the course of the last 10 years I've realized that I am probably best described as an artist, working in the fields of generative art, evolutionary art and computational art, as well as in artificial intelligence, machine learning and "augmented creativity". My ultimate goal is to develop autonomous creative machines which do not require human control to generate interesting or artistic output.

Problem / Challenge Space:

To what extent can image classification and machine learning techniques help in adding useful metadata to a huge unsorted image collection?
What kind of material is actually contained in the Mechanical Curator's selection?

Approach / Methodology:

I am using a hybrid approach that mixes automatic classification with manual confirmation. At its base is a 128-dimensional feature vector that is calculated for each image in the collection. The classifier calculates histogram statistics as well as Haralick texture features over various representations of an image, turning different aspects of the image into numbers: colors, contrast, edges, structure and information content.
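To make the idea of turning an image into numbers more concrete, here is a minimal sketch, not the classifier actually used in this project. It assumes a tiny grayscale image stored as nested lists of integer gray levels, and it computes only six illustrative values instead of 128: three histogram statistics plus three of Haralick's texture features derived from a gray-level co-occurrence matrix. All function names are hypothetical.

```python
from collections import Counter
from math import log2

def histogram_stats(image):
    """Mean, variance and Shannon entropy of the gray-level histogram."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    probs = {g: c / n for g, c in Counter(pixels).items()}
    mean = sum(g * p for g, p in probs.items())
    variance = sum((g - mean) ** 2 * p for g, p in probs.items())
    entropy = -sum(p * log2(p) for p in probs.values())
    return [mean, variance, entropy]

def glcm(image, levels, dx=1, dy=0):
    """Normalised, symmetric co-occurrence matrix for the pixel offset (dx, dy)."""
    h, w = len(image), len(image[0])
    m = [[0.0] * levels for _ in range(levels)]
    total = 0
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                a, b = image[y][x], image[y2][x2]
                m[a][b] += 1
                m[b][a] += 1  # count each pair in both directions
                total += 2
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[v / total for v in row] for row in m]

def haralick_subset(p):
    """Angular second moment, contrast and inverse difference moment."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    asm = sum(v * v for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    idm = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    return [asm, contrast, idm]

def feature_vector(image, levels=4):
    """Concatenate histogram statistics and texture features."""
    return histogram_stats(image) + haralick_subset(glcm(image, levels))

# Two toy 4x4 images with gray levels 0..3 (assumed data for illustration).
flat = [[1, 1, 1, 1] for _ in range(4)]
stripes = [[0, 3, 0, 3] for _ in range(4)]
print(feature_vector(flat))
print(feature_vector(stripes))
```

The flat image yields zero contrast and zero entropy, while the striped image scores high on both, which is the kind of difference a distance measure can later pick up.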
Using the feature vector, it calculates distances and similarities between images, which allows me to cluster them by various aspects or to find the nearest neighbors of a particular image. With the help of visual tools I've written specifically for this purpose, I can then create thematic collections, or find images that fit into a certain collection or have a similar style, very quickly and in a playful way.
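The nearest-neighbor lookup can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's own code: the distance measure is assumed to be plain Euclidean distance, the vectors have three dimensions instead of 128, and the image ids and values are made up.

```python
from math import dist  # Euclidean distance, available since Python 3.8

def nearest_neighbors(query_id, features, k=3):
    """Return the k image ids whose feature vectors lie closest to the query's."""
    query = features[query_id]
    candidates = (
        (dist(query, vector), image_id)
        for image_id, vector in features.items()
        if image_id != query_id
    )
    return [image_id for _, image_id in sorted(candidates)[:k]]

# Hypothetical feature vectors keyed by hypothetical image ids.
features = {
    "map_a": [0.10, 0.90, 0.20],
    "map_b": [0.15, 0.85, 0.25],
    "portrait_a": [0.80, 0.20, 0.70],
    "portrait_b": [0.75, 0.25, 0.65],
}
print(nearest_neighbors("map_a", features, k=2))
```

In a hybrid workflow like the one described here, the returned candidates would then be confirmed or rejected by hand before a tag is applied. With real features of very different scales, some normalisation before measuring distances would probably be needed.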

Extent of showcasing BL Digital Content:

The visual results of my research can be found in my Flickr collection:
The main contribution, the enhanced metadata, cannot really be shown on its own since it has become an integral part of the collection. The best way to experience it is to do a tag search, e.g. bldigital

Impact of Project:

Over the course of the research I have posted my progress on Twitter, Facebook, Flickr and Tumblr, which received very positive feedback and probably got many people interested in exploring the British Library collections by themselves.
I've also talked about my research and art practice at various conferences:
FITC, codemotion, Reasons to be Creative, Eyeo
In 2014 I was invited by MoMA New York to give a talk at "Archives as Instigator" about my work with the BL archives, followed by a one-day workshop:

Issues / Challenges faced during project(s):

The biggest challenge is the sheer amount of data. One million doesn't sound like much these days, but if you do not have access to academic computing resources or a sponsor for computation time, even seconds of calculation time per image add up to weeks or months.
The second challenge is that my classification approach works well in some cases but not in others, e.g. it can distinguish a portrait from a map, but only in rare cases can it tell whether a picture shows a man or a woman. So in the next phase I am planning to apply deep learning techniques to the same material to extract more detailed metadata for certain classes of images.