From graveyard to graph
Technological developments in the field of textual scholarship have led to a renewed focus on textual variation. Variants are liberated from their peripheral place in appendices or footnotes and are given a more prominent position in the (digital) edition of a work. But what constitutes an informative and meaningful visualisation of textual variation? The present article takes the visualisation of the results of collation software as its point of departure, examining several visualisations of collation output that contains a wealth of information about textual variance. The newly developed collation software HyperCollate is used as a touchstone to study the issue of representing textual information to advance literary research. The article concludes with a set of recommendations for evaluating different visualisations of collation output.
Keywords: Collation software, Textual scholarship, Visualisation, Markup, Hypergraph for variation, Tool evaluation
Scholarly editors are fond of the truism that the detailed comparison (‘collation’) of literary texts is a tiresome, error-prone, and demanding activity for humans and a task suitable for computers. Accordingly, the past decades have borne witness to the development of a number of software programs which are able to collate large numbers of texts within seconds, thus significantly advancing the possibilities for textual research. These developments have led to a renewed focus on textual variation, liberating variants from their peripheral place in appendices or footnotes and giving them a more prominent position in the edition of a work. Still, automated collation continues to engross researchers and developers, as it touches upon universal topics including (but not limited to) the computational modelling of humanities objects, scholarly editing theory, and data visualisation. The present article takes the visualisation of collation results as its point of departure. We use the representation of the results of a newly developed collation tool, ‘HyperCollate’, as a use case to address the more general issue of using data visualisations as a means of advancing textual and literary research. The underlying data structure of HyperCollate is a hypergraph (hence the name), which means that it can store and process more information than string-based collation programs. Accordingly, HyperCollate’s output contains a wealth of detailed information about the variation between texts, both on a linguistic/semantic level and a structural level. It is a veritable challenge to visualise the entire collation hypergraph in any meaningful way, but the question is, really, do we want to? In particular, therefore, we investigate which representation(s) of automated collation results best clear the way for advanced research into textual variance.
The article is structured as follows. After a brief introduction of automated collation immediately below, we define a list of textual properties relevant for any study into the nature of text. We then consider the strengths and weaknesses of the prevailing representations of collation output, which allows us to define a number of requirements for a collation visualisation. Subsequently, the article explores the question of visual literacy in relation to using a collation tool. Since visualisations function simultaneously as instruments of study and as means of communication, it is vital they are understood and used correctly. In line with the idea of visual literacy, we conclude with a number of recommendations to evaluate the visualisations of collation output. The implications of creating and using visualisations to study textual variance are discussed in the final parts of the article. Before we go on, it is important to note that we define 'textual variance' in the broadest sense: it comprises any differences between two or more text versions, but also the revisions and other interventions within one version. Indeed, we do not make the traditional distinction between 'accidentals' and 'substantives'. This critical distinction is the editor's to make, for instance by interpreting the output of a collation software program.
2 Automated collation
Collation at its most basic level can be defined as the comparison of two or more texts to find (dis)similarities between or among them. Texts are collated for different reasons, but in general, collation is used to track the (historical) transmission of a text, to establish a critical text, or to examine an author’s creative writing process. Traditionally, collation has been considered an auxiliary task: it was an elementary part of preparing the textual material in order to arrive at a critically established text and not necessarily a part of the hermeneutics of textual criticism. The reader was presented with the end-result of this endeavour (a critical text), and the variant readings were stored in appendices or footnotes, the kind of repositories that would get so few visitors that they have been bleakly referred to as cemeteries (Vanhoutte 1999; De Bruijn 2002, 114). In the environment of a digital edition, however, users can manipulate transcriptions which are prepared and annotated by editors. Many digital editions have a functionality to compare text versions and, accordingly, collation has become a scholarly primitive, like searching and annotating text. The digital representation of the result of the comparison thus brings textual variants to the forefront instead of (respectfully) entombing them.
3 Properties of text
Although these versions of Krapp’s Last Tape are compared on the level of plain text only, the alignment table in Fig. 1 also shows the in-text variation of witnesses 07 and 10, thus neatly illustrating the informational role of visualisations. The main objective for the development of the collation engine HyperCollate was to include textual properties like in-text variation in the alignment in order to perform a more inclusive collation and to facilitate a deeper exploration of textual variation. A look at the drafts of Virginia Woolf’s Time Passes3 offers a good illustration of some textual features we'd like to include in the automated collation. For reasons of clarity, we limit the collation input to two small fragments: the initial holograph draft ‘IHD-155’ (witness 1) and the typescript ‘TS-4’ (witness 2). Both fragments are manually transcribed in TEI/XML. The transcriptions below are simplified for reasons of legibility.
A quick look at these fragments reveals that they contain linguistic variation between tokens with the same meaning as well as structural variation indicated by the markup. Here, the ampersand mark ‘&’ in witness 1 and the word token ‘and’ in witness 2 constitute linguistic variation: two different tokens with the same meaning. Furthermore, witness 1 presents a case of in-text or intradocumentary variation: variation within a witness’ text (see also Schäuble and Gabler 2016; Bleeker 2017, 63). If we look at the revision site that is highlighted in the XML transcription of witness 1, we see several orders in which we can read the text: including or excluding the added text; including or excluding the deleted text. In other words, there are multiple ‘paths’ through the text: the textual stream diverges at the point where revision occurs, indicated by the <del> element and the <add> element. When the text is parsed, the textual content of these different paths should be considered as being on the same level: they represent multiple, co-existing readings of the text. Intradocumentary variation can become highly complex, for instance in the case of a deletion inside a deletion inside a deletion, etc. The structural variation in this example becomes manifest if we compare the two witnesses: the excerpt in witness 1 is contained by one <s> element, while the phrase in witness 2 is contained by two <s> elements. However, structural variation does not only occur across documents: when an author indicates the start of a new chapter or paragraph by inserting a metamark of some sort, this is arguably a form of structural intradocumentary variation.
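The idea of multiple paths through one witness can be made concrete with a minimal Python sketch. It is not HyperCollate's implementation: the XML fragment is a hypothetical stand-in (not Woolf's text), the function names `read_path` and `paths` are invented for illustration, and the sketch reduces the revision site to just two readings, one before and one after revision, whereas the article treats all paths as co-existing.

```python
import xml.etree.ElementTree as ET


def read_path(elem, skip):
    """Collect the text of elem, leaving out every element whose tag is in skip."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in skip:
            parts.append(read_path(child, skip))
        # Tail text follows the child element and belongs to every path.
        parts.append(child.tail or "")
    return "".join(parts)


def paths(xml):
    """Return two readings of a witness: ignoring <add>, and ignoring <del>."""
    root = ET.fromstring(xml)
    return {
        "before revision": " ".join(read_path(root, skip={"add"}).split()),
        "after revision": " ".join(read_path(root, skip={"del"}).split()),
    }


# Hypothetical, simplified witness with one revision site.
witness = "<s>The <del>old</del><add>grey</add> house stood &amp; waited</s>"

for name, text in paths(witness).items():
    print(f"{name}: {text}")
```

Because a skipped element is left out together with everything inside it, a deletion nested in a deletion simply disappears from the later reading; a fuller model would have to enumerate each nested stage as a path of its own.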
To summarise, we can distinguish different forms of textual variance. Variation can occur on the level of the text characters (linguistic or semantic variation) and on the structure of the text (sentences, paragraphs, etc.). Furthermore, we distinguish between intradocumentary variation (within one witness) and interdocumentary variation (across witnesses). Arguably all forms are relevant for textual scholarship, but taking them into account when processing and comparing texts has both technical and conceptual consequences. These consequences have been discussed extensively elsewhere (Bleeker et al. 2018) and will be briefly repeated in section 5 below. The main goal of the present article is to focus on the question of visualisation. Assuming we have a software program that compares texts in great detail, including structural information and in-witness revisions, how can we best visualise its output? First and foremost, the additional information (structural and linguistic, intradocumentary and interdocumentary) needs to be visualised in an understandable way. The visualisations can be useful for a wide range of research objectives, such as (1) finding a change in markup indicating structural revision like sentence division, (2) presenting the different paths through one witness and the possible matches between tokens from any path, (3) examining complex revisions, like a deletion within a deletion within an addition, (4) studying patterns of revision, and so on. This raises the question: is it even possible or desirable to decide on one visualisation? Is there one ultimate visualisation that reflects the dynamic, temporal nature of the textual object(s) by demonstrating both structural and linguistic variation on an intradocumentary and interdocumentary level?
The existing field of Information Visualisation can certainly offer inspiration, but simply adopting its methods and techniques will not suffice, since it deals primarily with objects which are ‘self-identical, self-evident, ahistorical, and autonomous’ (Drucker 2012), adjectives which could hardly be applied to literary texts.
4 Existing visualisations of collation results
4.1 Alignment table
4.2 Synoptic viewers
A synoptic edition contains a visual representation of the collation results from the perspective of one witness, where the variants are indicated by means of a system of signs or diacritical marks. In contrast to an alignment table, a synoptic overview is more suitable for an overall examination of the patterns of variation. The following paragraphs discuss two ways of presenting textual variation synoptically: parallel segmentation and an inline apparatus. It may be clear that both are skeuomorphic in character, in the sense that they mimic the analogue examination and presentation of textual variants. This characteristic should not necessarily be considered negative, however, precisely because it is a tried and tested instrument for textual research.
4.2.1 Parallel segmentation
To be clear: this parallel segmentation visualisation concerns the presentation of variance; it is not a collation method in and of itself. The segments are encoded by the editor, for instance using the TEI <app>/<lem>/<rdg> construction to link matching segments. In contrast to the inline apparatus presentation (see 4.2.2 below), which uses a base text, parallel segmentation presents the witnesses as variations on one another. Most tools allow for an interactive visualisation in the sense that clicking on a segment in one witness highlights the corresponding segments in the other witness(es). As represented in Fig. 4, the parallel segmentation may also visualise intradocumentary variation by rendering deletions and additions (embedded in the corresponding <rdg> by means of <del> and <add> elements).
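To show what a parallel-segmented encoding contains, the following Python sketch rebuilds the running text of each witness from a single apparatus-encoded fragment. The fragment is hypothetical and simplified: it omits the TEI namespace and header that a real transcription would carry, the sigla `#w1` and `#w2` are invented, and `witness_text` is an illustrative helper rather than part of any of the tools discussed here.

```python
import xml.etree.ElementTree as ET


def witness_text(elem, wit):
    """Rebuild the text of one witness from a parallel-segmented fragment."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag in ("lem", "rdg"):
            # The wit attribute holds space-separated pointers to witnesses;
            # a reading is only followed if it lists the requested witness.
            if wit in child.get("wit", "").split():
                parts.append(witness_text(child, wit))
        else:
            parts.append(witness_text(child, wit))
        parts.append(child.tail or "")
    return "".join(parts)


# Hypothetical fragment: both witnesses share everything except one segment.
fragment = (
    "<p>The house stood "
    '<app><rdg wit="#w1">&amp;</rdg><rdg wit="#w2">and</rdg></app>'
    " waited</p>"
)
root = ET.fromstring(fragment)

for siglum in ("#w1", "#w2"):
    print(siglum, witness_text(root, siglum))
```

Neither witness serves as the base text in this encoding: the shared text is stored once and each `<rdg>` is an equal alternative, which is what allows a viewer to present the witnesses as variations on one another.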
4.2.2 Critical or inline apparatus
4.3 Variant graph
The variant graphs of CollateX in the figures directly above are non-interactive by design (since they are visual renderings of a collation output). However, the usefulness of interactive visualisations has been positively noted in several contributions (e.g., Andrews and Van Zundert 2013) and projects. TRAViz, for instance, lets users interact with the graph and adjust it to match their needs and interests, and the variant graphs generated by the Stemmaweb tool set7 allow for their nodes to be connected, input to be adjusted, and edges to be annotated with additional information about the type of variance. Such features emphasise the visualisation’s double function as a means of communication and a scholarly instrument: on the one hand, it allows the user to clarify and communicate her argument about textual variation. On the other, the possibility of adjusting the visualisation and thus the representation of variation foregrounds the idea that the output of a tool is open to interpretation.
4.4 Phylogenetic trees or stemmata
The kind of macrolevel visualisations provided by stemmata or genetic graphs present the necessary overview and invite more rigorous exploration. Diagrams, graphs, or coloured squares add new perspectives to the various ways in which we look at text.
HyperCollate, a newly developed collation tool at the R&D department of the Humanities Cluster of the Dutch Royal Academy of Science, examines textual variation in an inclusive way using a hypergraph model for textual variation. HyperCollate is an implementation of TAG, the data model also developed at the R&D department (Haentjens Dekker and Birnbaum 2017). A discussion of the collation tool’s technical specifications is not within the scope of the present article (see Bleeker et al. 2018); for now, it suffices to know that a hypergraph differs from traditional graphs, the edges of which can connect only two nodes with each other, because the edges in a hypergraph can connect more than two nodes with one another. These ‘hyperedges’ connect an arbitrary set of nodes, and the nodes in turn can have multiple hyperedges. Conceptually, then, the hyperedges in the TAG model can be considered as multiple layers of markup/information on a text. The hypergraph for variation used by HyperCollate is an evolved model based on the variant graph. By treating texts as a network, HyperCollate is able to process intradocumentary variation and store multiple hierarchies in an idiomatic manner. In other words, because HyperCollate doesn't require TEI/XML transcriptions to be transformed into plain text files, TEI tags indicating revision like <del> and <add> can be used to improve the collation result. HyperCollate accordingly uses valuable intelligence of the editor expressed by markup to improve the alignment of witnesses.
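The difference between an ordinary edge and a hyperedge can be sketched in a few lines of Python. This is a toy data structure under stated assumptions, not the TAG model or HyperCollate's internal representation: the class `Hypergraph`, its methods, the labels such as `s#1` and `l#1`, and the sample tokens are all invented for illustration. Each hyperedge is simply a labelled set of text nodes, so one node can belong to several markup layers at once.

```python
class Hypergraph:
    """Text tokens as nodes; each hyperedge is a labelled set of node indices."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.hyperedges = {}  # label -> frozenset of node indices

    def add_hyperedge(self, label, members):
        members = frozenset(members)
        if any(i < 0 or i >= len(self.tokens) for i in members):
            raise ValueError(f"{label}: node index out of range")
        self.hyperedges[label] = members

    def layers_on(self, index):
        """Return the labels of all hyperedges that contain the given node."""
        return sorted(
            label for label, members in self.hyperedges.items() if index in members
        )

    def text_of(self, label):
        """Return the tokens covered by one hyperedge, in text order."""
        return " ".join(self.tokens[i] for i in sorted(self.hyperedges[label]))


graph = Hypergraph(["The", "house", "stood", "It", "waited", "still"])
graph.add_hyperedge("s#1", range(0, 3))  # first sentence
graph.add_hyperedge("s#2", range(3, 6))  # second sentence
graph.add_hyperedge("l#1", range(0, 4))  # first manuscript line
graph.add_hyperedge("l#2", range(4, 6))  # second manuscript line

print(graph.layers_on(3))     # ['l#1', 's#2']
print(graph.text_of("l#1"))   # The house stood It
```

In this example the first line ends inside the second sentence, so the sentence and line hierarchies overlap. A single XML tree cannot nest both without workarounds, while sets of nodes accommodate them without any special treatment.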
Since the internal data model of HyperCollate is a hypergraph, the input text can be an XML file and doesn’t need to be transformed into plain text. The comparison of two data-centric XML files is relatively simple, and it is even a built-in of the oXygen XML editor, but as explained above, a typical TEI-XML transcription of a literary text with intradocumentary variation constitutes partially ordered information. In order to process this kind of information, HyperCollate first transforms the TEI-XML witnesses into separate hypergraphs and then collates the hypergraphs. Graph-to-graph collation ensures that the input text can be processed taking into account both the textual content and the structure of the text. For each witness, HyperCollate looks at the witness’ text, the different paths through the witness’ text, and the structure of the witness, and subsequently compares the witnesses on all these levels. Accordingly, the output of HyperCollate contains a plethora of information. Similar to CollateX,9 a widely used text collation tool, the output of HyperCollate could be visualised in different ways (e.g., an alignment table or a variant graph). By default, HyperCollate’s output is visualised as a variant graph, primarily because a variant graph does not have a single order so it is relatively straightforward to represent the different orders of the tokens as individual paths. The question is, how (and where) to include the additional information in the visualisations? A variant graph may be more flexible regarding the token order, but the nodes and edges can only contain so much extra information, as Fig. 12 below shows.
A favourable consequence of HyperCollate is that, in case of intradocumentary variation, each path through a witness is considered equally important. This feature is in stark contrast with current approaches to intradocumentary variation, which usually entail a manual selection of one revision stage per witness (see Bleeker 2017, 110–113). By means of illustration, let us take a look at another collation of two small fragments from Woolf’s Time Passes containing intradocumentary structural variation. The fragments are manually transcribed in TEI/XML and simplified for reasons of clarity. The XML files form the input of HyperCollate.
Witness 1 contains an interesting addition (highlighted): Woolf added a metamark and the number ‘2’ in the margin. The transcriber interpreted the added number as an indication that the running text should be split up and a new chapter should be started, so she tagged the number with the <head> element.10 This means that the tokens of this witness can be ordered in two ways: excluding the addition and including the addition. Furthermore, the <head> element in witness 1 is at the same relative position as the <head> element in witness 2, so that the two headers are a match (even though their content is not).
The visualisations of the collation hypergraph in Figs. 11 and 12 represent the collation output of two small and simplified witnesses. It may be clear that collating two larger TEI/XML transcriptions of literary text, each containing several stages of revisions and multiple layers of markup, results in a collation hypergraph that, in its entirety, cannot be visualised in any meaningful way. At the same time, the various types of information contained by the collation hypergraph are of instrumental value to a deeper study of the textual objects. For that reason, HyperCollate offers not one specific type but rather lets the user select from a wide variety of visualisations, ranging from alignment tables to variant graphs. In selecting the output visualisation, the user decides which information she prefers to see and which information can be ignored. She may consider an alignment table if she’s primarily interested in the relationships between witnesses on a microlevel, or a variant graph if an insightful overview of the various token orders is more relevant to her research. Furthermore, she may decide what markup layers she wants to see: arguably knowing that every token is part of the root element ‘text’ is of less concern than detecting changes in the structure of sentences. Making such decisions does require the user to have a basic knowledge of the underlying dataset and a clear idea of what she’s looking for.
6 Requirements for visualising textual variance
This overview allows us to draw a number of conclusions regarding the visualisation of textual variation and to what extent each visualisation considers the various dimensions of the textual object. We have seen that intradocumentary variation is as yet not represented by default; the editor is required to make certain adjustments to the visualisation. Alignment tables and parallel segmentation can be extended to some extent, for instance by using colours and visualising deletions and additions. Regular variant graphs may include intradocumentary variation if the different paths through the texts are collated as separate witnesses12; only HyperCollate’s variant graph output includes both intra- and interdocumentary variation. Structural variation is currently only taken into account by HyperCollate and consequently only visualised in HyperCollate’s variant graph. While the added value of studying this type of variation may be clear, it remains a challenge to visualise both linguistic/semantic and structural variation in an informative and clear manner. Fig. 11 may clearly convey the structural difference between witness 1 and witness 2 (i.e., the <head> element), but the raw collation output contains much more information which, if included, would probably overburden the user. A promising feature of visualisations intended to further explorations of textual variation is interactivity. One can imagine, for instance, the added value of discovering promising sites of revision through a graph representation, zooming in, and annotating the relationships between the witness nodes.
Acknowledging the various strengths and shortcomings of existing visualisations, we propose that there is not one, all-encompassing visualisation that pays heed to all properties of text. Instead, each visualisation highlights a different aspect of textual variance or provides another perspective on text. Each perspective puts another textual characteristic before the footlights, while (ideally) making users aware of the fact that there is much more happening behind the familiar scenes. As Tanya Clement argues, focusing on one aspect can be instrumental in our understanding of text, helping the user ‘get a better look at a small part of the text to learn something about the workings of the whole’ (Clement 2013, §3). Indeed, it seems that multiple and interactive representations (cf. Andrews and Van Zundert 2013; Jänicke et al. 2014; Sinclair et al. 2013) are a promising direction.
7 Visual literacy and code criticism
The process of visualising data is a scholarly activity in line with the process of modelling, hence the resulting visualisation influences the ways in which a text can be studied. Collation output can be visualised in different ways, which raises essential questions regarding the assessment and evaluation of visualisations. The function of a digital visualisation is two-fold: on the one hand, it serves as a means of communication and on the other hand it provides an instrument of research. The communicative aspect implies that visualisation is first and foremost an affair of the scholar(s) who create visualisations. The diversity of visualisations, each of which highlights different aspects of the text, reflects the hermeneutic aspect inherent to humanist textual research. Thus, by using visualisation to foreground textual variation, editors are able to better represent the multifocal nature of text. In order to choose an appropriate representation of collation output, then, scholars need to know what argument they want to make about their data set, and how the visualisation can support that argument by presenting and omitting certain information. Accordingly, they can estimate the value of a visualisation for a specific scholarly task and expose the inevitable bias embedded in technology.
When a visualisation is used as an instrument of study and exploration, it is vital to be critical about its workings and its (implicit) bias. This includes an awareness of which elements the visualisation highlights and, just as important, which elements are ignored. As Martyn Jessop has pointed out, humanist education often overlooks training in ‘visual literacy’, which can be defined as the effective use of images to explore and communicate ideas (Jessop 2008, 282). Visual literacy, then, denotes an understanding of the fact that a visualisation represents a scholarly argument. Jessop identifies four principles that facilitate the understanding of a visualisation: aims and methods, sources, transparency requirements, and documentation (Jessop 2008, 290). The documentation of a visualisation of collation output, then, could describe what research objective(s) it aims to achieve, on what witnesses it is based, and how these witnesses have been transcribed, tokenised, and aligned.13 Another suitable rationale for critically evaluating the visualisation process is offered by the domains of ‘tool criticism’ or ‘code criticism’ (Traub and van Ossenbruggen 2015; Van Zundert and Dekker 2017, 125). Tool criticism assumes that the code base of scholarly tools reflects certain scholarly decisions and assumptions, and it raises critical questions in order to further awareness of the relationships between code and scholarly intentions. Questions include (but are not limited to) ‘is documentation on the precision, recall, biases and pitfalls of the tool available’, or ‘is provenance data available on the way the tool manipulates the data set?’ (Traub and van Ossenbruggen 2015).
Indeed, when it comes to evaluating the visualisation of automated collation results, one may well ask to what extent these witnesses and the ways in which they have been processed by the collation tool are subject to bias and interpretation. Like transcription (and any operation on text for that matter), collation is not a neutral process: it is subject to the influence of the editor. This becomes clear if we look at the different steps in the collation workflow as identified by the Gothenburg model (GM; 2009). The GM consists of five steps: tokenisation, normalisation, alignment, analysis, and visualisation. For each step, the editor is required to make decisions, e.g. ‘what constitutes a token’, ‘do I normalise the tokens and, if so, do I present the original and the normalised tokens’, or ‘what is my definition of a match and how do I want to align the tokens?’ As Joris Van Zundert and Ronald Haentjens Dekker emphasise, not all decisions made by collation software are easily accessible to the user, simply because they are the result of ‘incredibly complex heuristics and algorithms’ (Van Zundert and Dekker 2017, 123). To illustrate this, we can look at the decision tree used by HyperCollate to calculate the alignment of two simple sentences.
The GM pipeline is not strictly chronological or linear. Although automated collation does start with tokenisation, not every user insists on normalising the tokens, and a step can be revisited if the outcome is considered unsatisfactory or not in line with the user’s expectations. Though visualisation comes last in the GM, this article has argued that it is surely not an afterthought to collation. In fact, the visual representation of textual variance entails an additional form of information modelling: editors are compelled to give physical form to an abstract idea of textual variation which exists at that point only in the transcription and (partly) in the collation result. Using the markup to obtain a more optimal alignment, as HyperCollate does, only emphasises this point: marking up texts entails making explicit the knowledge and assumptions that would otherwise have been left implicit. Visualising the markup elements, then, implies that these assumptions and thus a particular scholarly orientation to text are foregrounded.
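The five steps of the Gothenburg model, and the editorial decisions hidden in each of them, can be sketched as a chain of small Python functions. This is a minimal illustration under stated assumptions, not the algorithm of CollateX or HyperCollate: alignment is delegated to `difflib.SequenceMatcher` from the standard library as a stand-in, the normalisation rule (lower-casing and reading ‘&’ as ‘and’) is one possible editorial choice, and the two sample sentences are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenise(text):
    """Step 1: split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalise(token):
    """Step 2: an editorial decision, here lower-casing and '&' read as 'and'."""
    return "and" if token == "&" else token.lower()


def align(a, b):
    """Step 3: align on normalised tokens; None marks a gap in one witness."""
    matcher = SequenceMatcher(
        a=[normalise(t) for t in a], b=[normalise(t) for t in b], autojunk=False
    )
    rows = []
    for _, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = a[i1:i2], b[j1:j2]
        for k in range(max(len(left), len(right))):
            rows.append(
                (left[k] if k < len(left) else None,
                 right[k] if k < len(right) else None)
            )
    return rows


def analyse(rows):
    """Step 4: label each aligned pair with the kind of (non-)variation."""
    labelled = []
    for left, right in rows:
        if left is None or right is None:
            kind = "gap"
        elif left == right:
            kind = "identical"
        elif normalise(left) == normalise(right):
            kind = "normalised match"
        else:
            kind = "variant"
        labelled.append((left, right, kind))
    return labelled


def visualise(labelled):
    """Step 5: print a plain alignment table, one row per aligned position."""
    for left, right, kind in labelled:
        print(f"{left or '-':<8}{right or '-':<8}{kind}")


w1 = "The house stood & waited"
w2 = "The old house stood and waited"
table = analyse(align(tokenise(w1), tokenise(w2)))
visualise(table)
```

Changing `normalise` so that it returns the token unchanged turns the ‘&’/‘and’ pair from a match into a variant, which illustrates how a decision taken early in the pipeline resurfaces in the final visualisation.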
Interactivity. This may range from annotating the edges of a graph, adjusting the alignment by (re)moving nodes, to alternating between macro- and microlevel explorations of variance.
Readability and scalability. Especially in a case of many and/or long witnesses, alignment tables and variant graphs become too intricate to read: their function becomes primarily to indicate complex revision sites.
Transparency of the textual model. The visualisation not only represents textual variance, but simultaneously makes clear what scholarly model is intrinsic to the collation. It needs to be clear which scholarly perspective serves as a model for transcription and representation.
Transparency of the code. Visualisations represent the outcome of an internal collation process which is usually not available to the general user audience. A clear, step-by-step documentation of the algorithmic process helps users understand what scholarly assumptions are present in the code, what decisions have been made, what parameters have been used, and how these assumptions, decisions, and parameters may have influenced the outcome. Decision trees may be of additional use. This applies particularly to interactive visualisations: if it’s possible to adjust parameters or filters, these adjustments need to be made explicit.
Digital visualisation is sometimes regarded as an afterthought in humanities research, or even considered with a certain degree of suspicion. Some consider it a mere technical undertaking, an irksome habit of some digital humanists who recently learned to work with a flashy tool. Yet if used correctly, these flashy tools may also function as instruments of study and research, which means they should be evaluated accordingly. Within the framework of visualising collation output, visual literacy is key. Having a critical understanding of the research potential of visualisations facilitates our research into textual variance. After all, these representational systems produce an object which we use for research purposes; we need to take seriously the ways in which they do this. In addition to communicating a scholarly argument, digital visualisations of collation output foreground textual variation. The collation tool HyperCollate facilitates the examination of a text from multiple perspectives (some unfamiliar, some inspiring, some contrasting, but all of them highlighting a particular element of interest). This freedom of choice invites scholars to reappraise prevalent notions and continue exploring the dynamic nature of text in dialogue with other disciplines. Digital visualisations, then, give us a means to take variants out of the graveyard and into an environment in which they can be fully appreciated.
See Haentjens Dekker and Birnbaum (2017) for an exhaustive overview of textual features and the extent to which these can be represented in a computational model.
The TEI Guidelines offer the element <cert> to indicate the degree of certainty associated with some aspect of the text markup, but as Wout Dillen points out, this requires an elaborate encoding practice that is not always worth the effort (2015, 90) and furthermore the ambiguity is not always translatable to the qualifiers “low,” “medium,” and “high.”
Woolf, Virginia. Time Passes. The genetic edition of the manuscripts is edited by Peter Shillingsburg and available at www.woolfonline.com (last accessed on 2018, April 27). Excerpts from Woolf’s manuscripts are reused in this contribution with special acknowledgments to the Society of Authors as the Literary Representative of the Estate of Virginia Woolf.
See http://v-machine.org/ (last accessed 2018, March 30).
Downloadable on https://sourceforge.net/projects/evt-project/files/latest/download (last accessed 2018, March 30)
See http://www.juxtasoftware.org/juxta-commons/ (last accessed 2018, March 30).
Stemmaweb brings together several tools for stemmatology: https://stemmaweb.net/ (last accessed on 2018, April 27).
The Stemmaweb toolset allows users to root and reroot their stemmata to explore different outcomes, see https://stemmaweb.net/?p=27 (last accessed 2018, March 25).
Haentjens Dekker, Ronald and Gregor Middell. CollateX.
Arguably the transcriber could have added a <div>, but the TEI Guidelines do not allow for a <div> to be placed within an <add>. Nevertheless, contrasting the structure of witness 1 with the structure of witness 2 already alerts the reader to structural revisions and invites a closer inspection.
The edges in a hypergraph are called hyperedges. In contrast to edges in a DAG, hyperedges can connect a set of nodes.
This practice leads to some problematic issues in case of complex revisions, see De Bruijn et al. 2007; Bleeker 2017, 111–114.
Although the value of documenting a tool’s operations is uncontested, making use of documentation is not yet part of digital humanities’ best practice. In that respect, it is worthwhile to keep in mind the RTFM-mantra of software development (‘Read the F-ing Manual’).
- Andrews, T., & Mace, C. (2012). Trees of Texts: Models and Methods for an Updated Theory of Medieval Text Stemmatology. Paper presented at the Digital Humanities Conference, 2012, July 16–20, University of Hamburg. Abstract available at http://www.dh2012.uni-hamburg.de/conference/programme/abstracts/trees-of-texts-models-and-methods-for-an-updated-theory-of-medieval-text-stemmatology.1.html. Accessed 23 Dec 2018.
- Andrews, T., & Van Zundert, J. (2013). An Interactive Interface for Text Variant Graph Models. Paper presented at the Digital Humanities Conference, 2013, July 16–19, University of Nebraska–Lincoln. Abstract available at http://dh2013.unl.edu/abstracts/ab-379.html. Accessed 23 Dec 2018.
- Bleeker, E. (2017). Mapping invention in writing: Digital infrastructure and the role of the genetic editor. Ph.D. Dissertation, University of Antwerp.
- Bleeker, E., Buitendijk, B., Dekker, R. H., Neyt, V., & van Hulle, D. (2017). The challenges of automated collation of manuscripts. In Advances in digital scholarly editing, Leiden: Sidestone Press, pp. 241–249.
- Bleeker, E., Buitendijk, B., Dekker, R. H., & Kulsdom, A. (2018). Including XML Markup in the Automated Collation of Literary Texts. Proceedings of the XML Prague conference 2018, February 9–11, pp. 77–95.
- Burnard, L., Jannidis, F., Middell, G., Pierazzo, E., & Rehbein, M. (2010). An encoding model for genetic editions, accessible at http://www.tei-c.org/Activities/Council/Working/tcw19.html (last accessed 2018, March 30).
- Clement, T. (2013). Text analysis, data mining, and visualizations in literary scholarship. In Literary studies in the digital age: An evolving anthology. https://doi.org/10.1632/lsda.2013.0.
- De Bruijn, P. (2002). Dancing around the grave. A history of historical-critical editing in the Netherlands. In Plachta, B. & Van Vliet, H.T.M. (Eds.), Perspectives of scholarly editing/Perspektiven der Textedition (pp. 113–124). Berlin: Weidler Buchverlag.
- Dillen, W. (2015). Digital scholarly editing for the genetic orientation: The making of a genetic edition of Samuel Beckett’s works. Ph.D. thesis, University of Antwerp.
- Haentjens Dekker, R., & Birnbaum, D. J. (2017). It’s more than just overlap: Text as graph. Presented at Balisage: The Markup Conference 2017, Washington, DC, August 1 - 4, 2017. In Proceedings of Balisage: The Markup Conference 2017. Balisage Series on Markup Technologies, vol. 19. https://doi.org/10.4242/BalisageVol19.Dekker01.
- Jänicke, S., Gessner, A., Büchler, M., & Scheuermann, G. (2014). Design rules for visualizing text variant graphs. In Proceedings of the digital humanities 2014, edited by Clare Mills, Michael Pidd and Jessica Williams.
- Joyce, J. (1984–1986). Ulysses: A critical and synoptic edition, prepared by Hans Walter Gabler with Wolfhard Steppe and Claus Melchior, 3 vols. New York & London: Garland Publishing Inc.
- Schacht, P. (2016). ‘Introduction’ in: Thoreau, Henry David. Walden: A fluid-text edition. Digital Thoreau. http://digitalthoreau.org/fluid-text-toc. Accessed 27 May 2019.
- Schäuble, J., & Gabler, H. W. (2016). Visualising processes of text composition and revision across document borders. Paper presented at the symposium Digital Scholarly Editions as Interfaces, Graz, Austria, September 22–23.
- Sinclair, S., Ruecker, S., & Radzikowska, M. (2013). Information visualization for humanities scholars. In Literary studies in the digital age, an evolving anthology, edited by Kenneth Price and Ray Siemens. Available at https://dlsanthology.mla.hcommons.org/information-visualization-for-humanities-scholars. Accessed 23 Dec 2018.
- Traub, M., & van Ossenbruggen, J. (2015). Workshop on tool criticism in the digital humanities. CWI Techreport July 1, 2015. Available at https://pdfs.semanticscholar.org/d337/ce558c2fd1d8be793786c9cfc3fab6512dea.pdf. Accessed 27 May 2019.
- Vanhoutte, E. (1999). Where is the editor? Human IT, 3.1, 197–214.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.