
Andrew West

Expression of interest for 2013 Competition

Date: 4 May 2013
Develop an open source software tool to assist in identifying and processing Tangut characters in manuscripts and printed texts from Kharakhoto that are held by the British Library and other institutions and have been digitised as part of the International Dunhuang Project. The tool would look similar to the Chopper application, allowing the user to select individual Tangut characters in a manuscript image and annotate them. However, whereas Chopper only lets users chop and annotate characters manually, the proposed tool would include a custom OCR algorithm that attempts to identify the selected Tangut characters. The tool would present the user with a list of candidate characters, and once the user confirms the correct candidate, it would automatically fill in details of that character (e.g. Li Fanwen dictionary index, reconstructed reading, meanings). If the software cannot identify a selected character (for example, because of the manuscript's condition or cursive calligraphy), the tool would still offer functions to help the user identify it manually (e.g. by component selection).
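The proposed workflow (rank candidate characters, let the user confirm one, fill in its dictionary details, and fall back to component selection when recognition fails) could be sketched in C++ roughly as follows. This is only a minimal illustration under stated assumptions: the `TangutEntry` record, the glyph feature vectors and the component labels are hypothetical, and a plain nearest-neighbour ranking stands in for the custom Tangut OCR algorithm that the proposal has yet to develop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical dictionary record. The fields mirror the details the tool would
// fill in automatically, but the layout itself is an illustrative assumption.
struct TangutEntry {
    int liFanwenIndex;                  // Li Fanwen dictionary number
    std::string reading;                // reconstructed reading
    std::string meaning;                // gloss
    std::vector<double> features;       // assumed glyph features (e.g. a downsampled bitmap)
    std::set<std::string> components;   // assumed component labels
};

// Squared Euclidean distance between two feature vectors of equal length.
double squaredDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Return up to k dictionary entries, nearest feature vector first.
// A simple nearest-neighbour stand-in for the proposed OCR step.
std::vector<const TangutEntry*> rankCandidates(const std::vector<TangutEntry>& dict,
                                               const std::vector<double>& query,
                                               std::size_t k) {
    std::vector<const TangutEntry*> ranked;
    for (const auto& entry : dict) ranked.push_back(&entry);
    // stable_sort keeps dictionary order for equally distant candidates.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [&](const TangutEntry* x, const TangutEntry* y) {
                         return squaredDistance(x->features, query) <
                                squaredDistance(y->features, query);
                     });
    if (ranked.size() > k) ranked.resize(k);
    return ranked;
}

// Manual fallback: keep entries whose components include every component the
// user has selected. std::set is sorted, as std::includes requires.
std::vector<const TangutEntry*> filterByComponents(const std::vector<TangutEntry>& dict,
                                                   const std::set<std::string>& selected) {
    std::vector<const TangutEntry*> matches;
    for (const auto& entry : dict) {
        if (std::includes(entry.components.begin(), entry.components.end(),
                          selected.begin(), selected.end())) {
            matches.push_back(&entry);
        }
    }
    return matches;
}
```

Once the user confirms a candidate, the tool could copy that entry's `liFanwenIndex`, `reading` and `meaning` into the annotation. A real OCR algorithm would replace `rankCandidates`, but keeping the same "ranked list of entries" interface would let the confirmation and component-selection steps stay unchanged.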

Research method(s): Write the application in C++; develop a Tangut OCR algorithm.
British Library digital collection(s) being used: International Dunhuang Project
Other data to be used: Tangut character data extracted from modern Tangut dictionaries (copyright permitting)
Other notes / help needed: Need to select suitable manuscripts or printed texts with clear, neatly written Tangut characters on which to test the tool.