MML (minimal markup language) editor (Category: Research)

Name of Submitter(s): Desmond Schmidt
Organisation: University of Queensland

Transcription is by far the most expensive part of producing online editions of historical documents. This is because the markup languages used for transcription are too complex. And they are complex because historical documents are extremely variable in form and content, and many tags are needed to describe all possible textual features. As a result, transcriptions of historical documents cannot be easily created by ordinary members of the public. Nor can they be shared or reused, since each transcriber understands the coding system differently (if at all), uses different tags for the same textual phenomena, and makes personal choices on which features to record. One way to overcome this variability is to reduce the number of tags to the absolute minimum needed for a particular task, and to make those tags as short as possible. Transcription can then be performed more easily because the text is not obscured by verbose tagging, and the equivalence between the page image and the transcribed text is more easily seen.
The MML (minimal markup language) editor uses customisations of the simple Markdown language, which is already familiar to most Web users who have used wikis. The editor consists of three panels: a scrolling list of page images to be transcribed, an MML representation of the text, and a live web-page preview of the text. The three views are kept in sync while scrolling so that the user does not lose their place. Instead of confusing the user with cryptic error messages, the preview simply looks wrong when a mistake is made, which painlessly reinforces good coding practice. The customised MML markup language created for each transcription task is not saved. Instead, the transcription is saved either as a web page with added semantic properties, or as separate plain text and markup. Both formats are highly interoperable, archivable and reusable.
The MML editor is already mostly a working tool. All it requires is testing with real users and texts, of which there are many examples in the BL Labs archives. The unfinished annotation tool also allows editors to tag a text with comments. Completion of this tool will greatly facilitate transcription tasks, so that websites can be built more quickly and at far lower cost.
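
As an illustration of what saving "separate plain text and markup" could mean, the following is a minimal sketch only. The function name splitMarkup, the shape of the range objects, and the use of Markdown-style *...* for emphasis are all assumptions made for this example; they are not taken from the MML editor, whose actual tag set is customised per transcription task.

```javascript
// Hypothetical sketch: split a lightly marked-up string into plain text
// plus a list of standoff ranges that point into that plain text.
// The *...* emphasis convention is an assumption borrowed from Markdown.
function splitMarkup(source) {
  const ranges = [];
  let text = "";
  let open = -1; // offset in `text` where the current emphasis span began

  for (const ch of source) {
    if (ch === "*") {
      if (open < 0) {
        open = text.length; // opening marker: remember where the span starts
      } else {
        // closing marker: record the span and drop both markers from the text
        ranges.push({ name: "emphasis", offset: open, length: text.length - open });
        open = -1;
      }
    } else {
      text += ch;
    }
  }

  // An unclosed marker is restored as a literal character, so no input is lost.
  if (open >= 0) {
    text = text.slice(0, open) + "*" + text.slice(open);
  }
  return { text, ranges };
}

const result = splitMarkup("The *Creek* of the Four Graves");
console.log(result.text);
console.log(JSON.stringify(result.ranges));
```

Keeping the ranges apart from the text means the plain text can be searched or compared without any tags in the way, while the markup remains available for rendering.
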
URL for Entry: http://charles-harpur.org/main/mml_edit?docid=english/harpur/h642

Email: desmond.allan.schmidt@gmail.com

Twitter:

Job Title: Adjunct Fellow

Background of Submitter:

There is a list of publications about the tools, including the TILT tool, at http://ecdosis.net/main/node/16
There is also a partly complete description of the tools (but not yet descriptions of the interface elements) at http://ecdosis.net/main/node/1
There is a brief biography of me at multiversiondocs.blogspot.com

Problem / Challenge Space:

To make practical the large-scale transcription of historical documents, recording rich features without requiring a high level of technical expertise on the part of the user. To reduce the cost of transcription so that important historical documents can be put onto the Web quickly.

Approach / Methodology:

The tool is written in the lingua franca of Web interaction: pure JavaScript. As such it can be incorporated into any Web page and only requires the existence of a complementary web service.
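
To give a flavour of how little code such an approach can need, here is a minimal sketch of one way the scrolling panels could be kept in step. It is an assumption for illustration, not the editor's actual method: the helper name proportionalScrollTop is hypothetical, and the editor may well align panels by page boundaries rather than by simple proportion.

```javascript
// Hypothetical helper: given the scroll state of one panel, compute the
// matching scrollTop for another panel by simple proportion.
// `source` and `target` only need the standard DOM properties
// scrollTop, scrollHeight and clientHeight, so plain objects work too.
function proportionalScrollTop(source, target) {
  const sourceRange = source.scrollHeight - source.clientHeight;
  const targetRange = target.scrollHeight - target.clientHeight;
  if (sourceRange <= 0 || targetRange <= 0) {
    return 0; // one of the panels has nothing to scroll
  }
  const fraction = source.scrollTop / sourceRange;
  return Math.round(fraction * targetRange);
}

// In a browser the panels would be DOM elements, wired up along these lines
// (the variable names imagePanel and textPanel are placeholders):
// imagePanel.addEventListener("scroll", () => {
//   textPanel.scrollTop = proportionalScrollTop(imagePanel, textPanel);
// });

const images = { scrollTop: 500, scrollHeight: 1200, clientHeight: 200 };
const text = { scrollHeight: 600, clientHeight: 100 };
console.log(proportionalScrollTop(images, text));
```

Because the helper is a pure function of three numeric properties, it can be tested outside a browser and reused for any pair of panels.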

Extent of showcasing BL Digital Content:

Any of the manuscript or early printed book collections, such as the David Livingstone collection, Wrongdoing in Spain, John Gower Manuscripts, etc. Basically anything you need to transcribe accurately and for which OCR does not suffice, or for which you need to record rich markup, not just the content.

Impact of Project:

The work is part of current research grant applications (to the Volkswagen Stiftung in Germany and for a private grant at Loyola University Chicago) and features on the Ecdosis website, ecdosis.net. It was presented recently at a workshop in Verona (http://filologiadigitale-verona.it). The audience reaction was positive: many seemed excited by the prospect of simplifying the transcription process.

Issues / Challenges faced during project(s):

The biggest problem is how to turn a practical idea into a usable tool. For that we need users to transcribe texts using the tool. We have some students willing to do this in Italy (at Roma3, Uni Sassari) and at Loyola in the US.