“Countless Links”: Qualitative Query Potential in Orlando
Susan Brown (University of Alberta and University of Guelph), Patricia Clements, Isobel Grundy, Stan Ruecker, Jeffery Antoniuk, and Sharon Balazs (all University of Alberta)
Abstract: This paper argues for the value of qualitative queries of a high-quality text archive by presenting preliminary explorations of several potential applications of an extensively encoded textbase of scholarly materials. We will explore three potential developments of the linking capacities of the Orlando interface, which currently permits access to materials via index and direct queries on the tags and content. We will consider to what degree spatio-temporal information in the encoding supports the production of speculative (rather than existing) linkages between individuals and entities. We will consider leveraging markup to analyze interconnections, and whether identifying points of densest interconnection might reveal unforeseen networks and associations. Lastly, we will explore the possibilities of ludic queries into degrees of separation between people, inflected by the kinds of relationships encoded in the textbase, e.g. by literary connection, by place, or by organization.
Bios: Susan Brown (University of Alberta and University of Guelph), Patricia Clements, and Isobel Grundy (both University of Alberta) are co-founders of the Orlando Project and co-editors of Orlando: Women’s Writing in the British Isles from the Beginnings to the Present (Cambridge, 2006), as well as authors of articles and books on digital humanities and literary topics. Stan Ruecker (University of Alberta) is an Assistant Professor of Humanities Computing with degrees in English, computer science, and visual communication design whose research focuses on the electronic book and rich prospect browsing. Jeffery Antoniuk, the Orlando Project systems analyst, holds an MSc in Computing Science; his interests include information retrieval and data-mining. Sharon (Balazs) Farnel, the project’s Textbase Manager, has a background in Russian and Library and Information Science; her interests include the information-seeking behaviours of humanists and the role of technology in managing information.
Full proposal:
Orlando: Women’s Writing in the British Isles from the Beginnings to the Present (Brown, Clements and Grundy) poses unique possibilities for exploring the query potential of well-encoded scholarly text. Because it is not an archive, and because it is critical, historical prose encoded with a custom semantic XML tagset designed to structure a literary history, this born-digital textbase poses different challenges and offers different opportunities than archive-oriented query experimentation.
Unlike an archive, which might aim for comprehensive coverage of a particular domain, a literary history is necessarily selective, even though Orlando is the most extensive resource of its kind in its field. Answers to quantitative queries are therefore of limited potential. The current interface is premised on the idea that quantitative results will only occasionally be of value in themselves, but will generally be useful in providing support for and progress towards qualitative results. In response to a query, you can generate a list of relevant entries, a set of chronological entries, a set of search results excerpted from entries, or a set of bibliographical entries. The system counts the results in all but the first case, but that is the sum total of the quantitative results we offer. Our logic here is plural. We wanted to explore the possibilities offered by digital humanities methods for the production of qualitative, readable literary history in new ways. We did not aim to map comprehensively the entire terrain of writing, or even women’s writing, as a field along the lines advocated by Franco Moretti: to attempt to capture exhaustively the complex interrelationships between lives, texts, and the world that we sought to represent would have been madness. As a result, we were well aware that the value of what we were doing lay in the sifting, the selecting, the relative weighting, the value judgments, the contextualizing, and the relating work of our history (which is about unusual, exceptional, anomalous cases, as well as the commonalities that cut across them) rather than in statistical results or even claims to broad representativeness. The results we offer are therefore qualitative, and they insistently bring users back to the text. That doesn’t mean that there aren’t interesting quantitative possibilities in Orlando. However, these are not our focus here.
This paper will briefly outline some important aspects of Orlando’s quantitative queries and the way they relate to the markup: our encoding, for instance, of race, ethnicity, and nationality deliberately made it impossible to produce a list of Jewish or lesbian or working-class or filthy rich or even Roman Catholic writers (as opposed to, say, a list of governesses, which the system will readily provide). We want users to do the work to construct their own lists of these categories because we need them to decide for themselves how they will deal with the relation between Jewish ethnicity and belief, or how rich is rich, or how to reckon crypto-Catholics, ex-Catholics, etc. Even if such searches end by producing quantitative results (Catholics so many, ex-Catholics so many), they will have tangled with complex questions about definitions of identity categories in a way that they would not have done if simply presented with a table, chart, or graph that provides a “distant reading” (Moretti 1).
But this doesn’t mean that Orlando markup supports only the straightforward XML queries (on tags, combinations of tags and content, or of tags within tags) offered by our current interface. This paper will lay out several possibilities that this collaborative team (of literary specialists, a graphic designer, a library scientist, and a systems analyst) is keen to pursue to leverage Orlando’s markup in new ways. All fall under the general rubric of “links,” a term that highlights one of the more innovative features of the textbase: the organization of hyperlinks of core tags according to the semantic markup.
We’ve been interested for some time in exploring how Orlando might support queries into relationships and networking, and present not just search results on a single relationship (e.g. a link to a particular journal or genre or place) but an entire field of interwoven relationships of the kind that strongly interests literary historians, e.g. of publishers, journals, annuals, the genre of poetry, particular authors and texts, etc. in the 1820s and 1830s. Rather than beginning from a set of networks or interconnections of which one is already aware, how might we exploit our markup to analyze interconnections, so as to reveal the points of densest interconnection? We’ve started to think that an interface for these sorts of queries might usefully leverage some of the representational strategies that are developing out of Web 2.0 social networking environments.
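One simple way to make "points of densest interconnection" concrete is to count how many distinct links touch each tagged entity. The following is a minimal sketch under stated assumptions, not a description of the Orlando system: the link pairs are invented placeholders, and the function name `densest_points` is hypothetical.

```python
from collections import Counter

# Invented placeholder links between tagged entities (not Orlando data).
LINKS = [
    ("Author A", "Journal J"),
    ("Author B", "Journal J"),
    ("Author C", "Journal J"),
    ("Author A", "Publisher P"),
    ("Author B", "Publisher P"),
    ("Author C", "Annual N"),
]


def densest_points(links, top=3):
    """Rank entities by the number of distinct links that touch them."""
    degree = Counter()
    for a, b in set(links):  # ignore exact duplicate pairs
        degree[a] += 1
        degree[b] += 1
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top]


print(densest_points(LINKS, top=1))
```

Counting links is only the crudest measure of density; it is used here because it needs no assumptions about how different kinds of relationship should be weighted.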
An inquiry into linkages can also be extended from the actual to the speculative via the encoding, which can posit possible conjunctions between individuals mentioned in the textbase. On the basis of the spatio-temporal information and the encoding, one can tell whether two people were in the same place at (or around) the same time, to a certain level of granularity. This could provide the basis for starting with a particular person and moving from everyone whom they explicitly met, according to the textbase, to those with whom they may have come into contact as a result of spatial and temporal proximity. Alternatively, one could begin from the people associated with a place at a particular time. Places are not always recorded at the accuracy of the street or address level, nor are dates often precise to the minute, but even conjunctions of people in the same town in the same year present interesting avenues for literary inquiry. The evidence for some “meetings” will be stronger and for others weaker, but confidence can be measured by the granularity of the spatio-temporal information and encoding. The paper will discuss initial results from our investigations into this kind of query.
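The speculative query described above might be sketched as follows. This is an illustration only: the records are invented placeholders rather than Orlando data, the record layout (person, place, year, granularity) is an assumption, and the granularity ranking is a hypothetical scoring scheme in which a conjunction is only as strong as its coarsest record.

```python
from itertools import combinations

# Invented placeholder records: (person, place, year, place granularity).
RECORDS = [
    ("Writer A", "Town X", 1820, "street"),
    ("Writer B", "Town X", 1820, "town"),
    ("Writer C", "Town X", 1821, "town"),
    ("Writer D", "Town Y", 1820, "town"),
]

# Assumed ranking: finer granularity counts as stronger evidence.
GRANULARITY_SCORE = {"street": 3, "town": 2, "region": 1}


def possible_meetings(records, year_window=0):
    """Pair up people recorded in the same place within year_window years."""
    meetings = []
    for (p1, place1, y1, g1), (p2, place2, y2, g2) in combinations(records, 2):
        if p1 == p2 or place1 != place2:
            continue
        if abs(y1 - y2) > year_window:
            continue
        # Confidence is limited by the coarser of the two records.
        confidence = min(GRANULARITY_SCORE[g1], GRANULARITY_SCORE[g2])
        meetings.append((p1, p2, place1, confidence))
    # Strongest candidate conjunctions first.
    return sorted(meetings, key=lambda meeting: -meeting[3])


print(possible_meetings(RECORDS))
print(possible_meetings(RECORDS, year_window=1))
```

Widening `year_window` corresponds to the "at (or around) the same time" relaxation: it surfaces more candidate conjunctions at the cost of weaker evidence for each.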
On a lighter note, we’re also interested in producing a degrees-of-separation game: how many steps does it take to get from one writer to another? The system would calculate the fewest links required to get from, say, Jane Austen to Angela Carter. It could then challenge the user by offering a range of links that could all lead from person one to the other, but via different routes of association. A user could then choose one of the offered links, see what the link was (i.e. view the prose from which the association was extracted) and move on to the next set of possible links, etc. Users would try to get to the target person in as few links as possible. For instance, you can get from Queen Elizabeth I to the revolutionary Maud Gonne in three links: Maria Edgeworth has her Lady Delacour in Belinda deliver a verdict on Elizabeth; Kate O’Brien discusses both Edgeworth and Gonne in My Ireland. A further refinement would be to break the links down into (or restrict a particular session to) different kinds of associations, e.g. by literary connection, by place, by organization, and so on. While apparently trivial in itself, this kind of game might serve interesting pedagogical purposes by helping students to understand the different mechanisms by which writers are linked, as well as informing them about the broader historical picture.
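The "fewest links" calculation is a shortest-path problem, and a breadth-first search is one standard way to solve it. The sketch below is illustrative only and does not describe how Orlando is implemented: the three links mirror the Elizabeth I to Maud Gonne example above, while the "literary" labels, the record layout, and the function names are assumptions.

```python
from collections import deque

# Hypothetical link records: (person, person, kind of association).
# The kind labels are illustrative assumptions, not Orlando tag names.
LINKS = [
    ("Elizabeth I", "Maria Edgeworth", "literary"),
    ("Maria Edgeworth", "Kate O'Brien", "literary"),
    ("Kate O'Brien", "Maud Gonne", "literary"),
]


def build_graph(links, kinds=None):
    """Build an undirected graph, optionally keeping only some link kinds."""
    graph = {}
    for a, b, kind in links:
        if kinds is not None and kind not in kinds:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def fewest_links(graph, start, target):
    """Breadth-first search: return a shortest path of people, or None."""
    if start == target:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbour in sorted(graph.get(person, ())):
            if neighbour in previous:
                continue
            previous[neighbour] = person
            if neighbour == target:
                # Walk back through the predecessors to recover the path.
                path = [neighbour]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


path = fewest_links(build_graph(LINKS), "Elizabeth I", "Maud Gonne")
print(path, "links:", len(path) - 1)
```

Passing `kinds={"place"}` to `build_graph` would restrict a session to one kind of association, as in the refinement described above; with these sample links no such route exists, so `fewest_links` returns `None`.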
Together, these several potential applications of Orlando’s markup reveal the power of extensive semantic encoding to offer diverse modes of inquiry into a scholarly textbase, yielding “countless links” (Emily Brontë) among various aspects of literary history.
References
Brown, Susan, Patricia Clements, and Isobel Grundy, ed. Orlando: Women’s Writing in the British Isles from the Beginnings to the Present. Cambridge: Cambridge Online, 2006.
Moretti, Franco. Graphs, Maps, Trees: Abstract Models for a Literary Theory. London: Verso, 2005.