The Bertrand Russell Research Centre (http://russell.mcmaster.ca/) at McMaster University is creating an online edition of the Collected Letters of Bertrand Russell (http://russell.mcmaster.ca/brletters.htm). Over 40,000 letters written by Russell will be imaged, transcribed, and marked up in XML, following the guidelines of the TEI (http://www.tei-c.org/). During the poster session we will demonstrate our system for processing letters, which combines workflow, XML markup, and publishing in a single application.
The Problem
XML markup is notoriously error-prone. Hazards include inappropriate or inconsistent tag selection, simple syntax errors, incorrectly saved or named files, lost files, missed workflow steps, and incorrect or incomplete markup of references to authority files for people, places, and events. For a project as large as ours, these dangers are compounded. We are therefore developing a single Java client-server application to manage the entire digital lifecycle of a letter.
Workflow
A workflow engine will store and route letters as they move from stage to stage: through image linking, transcription, metadata tagging, reference tagging, annotation, proofreading, and online publication. The application includes a complete editing environment for each stage, integrated seamlessly with the workflow system. When a user finishes transcribing a letter, for example, they click a single 'DONE' button, which saves the letter and puts it in the queue for the next stage, proofreading. When a proofreader next logs in, they are automatically shown an editing screen containing the next letter in the proofreading queue. Users never interact directly with a file system or backend storage system. All steps are recorded, and versioning is provided by the Fedora backend.
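The queue-based routing described above can be illustrated with a minimal in-memory sketch. It is not the project's implementation: the real system defines its workflow in BPEL, runs it on Apache ODE, and stores letters in Fedora. The class name LetterWorkflow, its methods, and the letter id are hypothetical, and the sketch assumes strictly linear routing in the order the stages are listed, whereas the actual workflow definition may route differently (the example above sends a transcribed letter to proofreading).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical in-memory stand-in for the workflow engine. */
final class LetterWorkflow {

    /** Stages in the order the text lists them (linear routing is an assumption). */
    enum Stage {
        IMAGE_LINKING, TRANSCRIPTION, METADATA_TAGGING, REFERENCE_TAGGING,
        ANNOTATION, PROOFREADING, ONLINE_PUBLICATION
    }

    // One first-in, first-out queue of letter ids per stage.
    private final Map<Stage, Queue<String>> queues = new EnumMap<>(Stage.class);
    // Every step is recorded, mirroring "all steps are recorded".
    private final List<String> auditLog = new ArrayList<>();

    LetterWorkflow() {
        for (Stage stage : Stage.values()) {
            queues.put(stage, new ArrayDeque<>());
        }
    }

    /** Places a new letter in the queue of the first stage. */
    public void submit(String letterId) {
        queues.get(Stage.IMAGE_LINKING).add(letterId);
        auditLog.add(letterId + " entered " + Stage.IMAGE_LINKING);
    }

    /** Returns the next letter waiting at a stage, or null if the queue is empty. */
    public String nextFor(Stage stage) {
        return queues.get(stage).peek();
    }

    /** The 'DONE' button: removes the letter from its stage and queues it for the next one. */
    public void done(Stage stage, String letterId) {
        if (!queues.get(stage).remove(letterId)) {
            throw new IllegalStateException(letterId + " is not queued at " + stage);
        }
        int next = stage.ordinal() + 1;
        if (next < Stage.values().length) {
            Stage target = Stage.values()[next];
            queues.get(target).add(letterId);
            auditLog.add(letterId + " moved from " + stage + " to " + target);
        } else {
            auditLog.add(letterId + " completed " + stage);
        }
    }

    /** Returns a copy of the recorded steps. */
    public List<String> auditLog() {
        return new ArrayList<>(auditLog);
    }

    public static void main(String[] args) {
        LetterWorkflow workflow = new LetterWorkflow();
        workflow.submit("letter-0001");
        workflow.done(Stage.IMAGE_LINKING, "letter-0001");
        System.out.println("Next to transcribe: " + workflow.nextFor(Stage.TRANSCRIPTION));
        workflow.auditLog().forEach(System.out::println);
    }
}
```

The point of the design is that the user only ever calls the equivalent of nextFor and done. Saving, routing, and logging happen behind those two calls, so there is no file to misname or misplace.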
Consistent Tagging and Consistent References
Beyond workflow, one of the biggest challenges of the project will be ensuring that TEI tags are applied correctly and consistently. Our approach is to hide the XML tags. Forms are used to enter metadata for the TEI header, such as author, recipient, and date. Similarly, annotations that identify people, places, and events, or that discuss issues, are added to the letters using a popup form-based system. All form entries are converted to TEI transparently when the letter is stored. Finally, to ensure accurate and consistent references to people, places, events, and bibliographic items, the system uses a semi-automated, form-based lookup. Database ids for references are inserted automatically into the TEI tags (again, only when the letter is saved), based on the user-guided lookup.
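As a rough illustration of the lookup-then-tag step, the sketch below matches a typed name against an authority table, lets the caller choose one candidate, and emits a TEI persName element carrying the chosen database id. Everything here is an assumption made for illustration: the class ReferenceTagger, its methods, the sample records and ids, and the choice of the key attribute to hold the id. The project's actual encoding and lookup logic are not described in this abstract.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a form-based reference lookup that emits TEI. */
final class ReferenceTagger {

    // Stand-in for the authority database: id -> display name.
    private final Map<String, String> people = new LinkedHashMap<>();

    public void addPerson(String id, String name) {
        people.put(id, name);
    }

    /** Lookup step: ids of records whose name contains the query, ignoring case. */
    public List<String> candidates(String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : people.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    /** Save step: wraps the selected text in a persName element with the chosen id. */
    public String tagPerson(String selectedText, String chosenId) {
        if (!people.containsKey(chosenId)) {
            throw new IllegalArgumentException("Unknown authority id: " + chosenId);
        }
        return "<persName key=\"" + escapeXml(chosenId) + "\">"
                + escapeXml(selectedText) + "</persName>";
    }

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    public static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        ReferenceTagger tagger = new ReferenceTagger();
        // Sample records with made-up ids, for illustration only.
        tagger.addPerson("p1", "Ottoline Morrell");
        tagger.addPerson("p2", "Philip Morrell");
        // The user types "morrell", sees two candidates, and picks the first.
        List<String> matches = tagger.candidates("morrell");
        System.out.println(tagger.tagPerson("Ottoline", matches.get(0)));
    }
}
```

Because the id comes from the lookup rather than from typing, two editors tagging the same person in different letters end up with the same reference, and the tag itself is never typed by hand.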
Open Source
The project will adopt open standards and will be built entirely from open source components, including Fedora for repository management, Apache ODE for workflow enactment, BPEL for workflow definition, Lucene for full-text search, SOAP for client-server communication, MySQL as the relational database, Hibernate for object-relational mapping, Acegi for user security, Spring as the web application framework, Eclipse as the client platform, and TEI as the text encoding standard.