Work Flows and Wish Lists: Reflections on Juxta as an Editorial Tool

I have had the opportunity to use Juxta Commons for several editorial projects, and while taking a breath between a Juxta-intensive term project last semester and my Juxta-intensive MA thesis this semester, I would like to offer a few thoughts on Juxta as an editorial tool.

For my term project for Jerome McGann’s American Historiography class last semester, I conducted a collation of Martin R. Delany’s novel, Blake, or, The Huts of America, one of the earliest African American novels published in the United States. Little did I know that my exploration would conduct me into an adventure as much technological as textual, but when Professor McGann recommended I use Juxta for conducting the collation and displaying the results, that is exactly what happened. I input my texts into Juxta Commons, collated them, and produced HTML texts of the individual chapters, each with an apparatus of textual variants, using Juxta’s Edition Starter. I linked these HTML files together into an easily navigable website to present the results to Professor McGann. I’ll be posting on the intriguing results themselves next week, but in the meantime, they can also be viewed on the website I constructed, hosted by GitHub: Blake Project home.

Juxta helped me enormously in this project. First, it was incredibly useful in helping me clean up my texts. My collation involved an 1859 serialization of the novel, and another serialization in 1861-62. The first, I was able to digitize using OCR; the second, I had to transcribe myself. Anyone who has done OCR work knows that every minute of scanning leads to (in my case) an average of five or ten minutes of cleaning up OCR errors. I also had my own transcription errors to catch and correct. By checking Juxta’s highlighted variants, I was able to—relatively quickly—fix the errors and produce reliable texts. Secondly, once collated, I had the results stored in Juxta Commons; I did not have to write down in a collation chart every variant to avoid losing that information, as I would if I were machine- or sight-collating. Juxta’s heat-map display allows the editor to see variants in-line, as well, which saves an immense amount of time when it comes to analyzing results: you do not have to reference page and line numbers to see the context of the variants. Lastly, Juxta enabled me to organize a large amount of text in individual collation sets—one for each chapter. I was able to jump between chapters and view their variants easily.
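To illustrate why those highlighted variants make proofreading so fast: a word-level comparison immediately surfaces the places where an OCR witness diverges from a transcription. The sketch below is a toy example using Python’s standard-library difflib, not Juxta’s actual collation engine, and the sentence is invented:

```python
import difflib

# Invented sample: the OCR'd 1859 witness contains a typical recognition
# error ("bis" for "his"); the hand-transcribed 1861 witness does not.
ocr_1859 = "Henry took bis seat by the fire".split()
transcribed_1861 = "Henry took his seat by the fire".split()

matcher = difflib.SequenceMatcher(None, ocr_1859, transcribed_1861)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        # Each non-equal opcode is a candidate variant to inspect by hand.
        print(tag, ocr_1859[i1:i2], "->", transcribed_1861[j1:j2])
```

Run against real witnesses, most of what such a comparison flags early on is digitization noise rather than true variation, which is exactly what makes it useful for cleanup.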

As helpful as Juxta was, however, I caution all those new to digital collation that no tool can perfectly collate or create an apparatus from an imperfect text. In this respect, there is still no replacement for human discretion—which is, ultimately, a good thing. For instance, while the Juxta user can turn off punctuation variants in the display, if the user does want punctuation displayed and the punctuation is not spaced exactly the same in both witnesses, the program highlights this anomalous spacing. Thus, when the 1859 text reads

‘ Henry, wat…

and the 1861 text reads

‘Henry, wat…

Juxta will show that punctuation spacing as a variant, while the human editor knows it is the result of typesetting idiosyncrasies rather than a meaningful variant. Such variants can carry over into the Juxta Edition Builder, as well, resulting in meaningless apparatus entries. For these reasons, you must make your texts perfect to get a perfect Juxta heat map and especially before using Edition Starter; otherwise, you’ll need to fix the spacing in Juxta and output another apparatus, or edit the text or HTML files to remove undesirable entries.
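The underlying issue is that an exact comparison has no notion of a “typesetting accident.” A toy sketch in Python makes the point; the normalize rule here is my own invention for illustration, not anything Juxta does:

```python
import re

# The two witnesses differ only in the space after the opening quotation
# mark, a typesetting idiosyncrasy rather than a substantive variant.
witness_1859 = "' Henry, wat"
witness_1861 = "'Henry, wat"

# An exact comparison flags the spacing as a difference:
assert witness_1859 != witness_1861

def normalize(text):
    # Toy rule: collapse any spaces after a straight single quote.
    return re.sub(r"' +", "'", text)

# After normalizing, the false variant disappears:
assert normalize(witness_1859) == normalize(witness_1861)
```

Absent such a normalization pass, the editor must either regularize the witnesses by hand or discount these hits when reading the results.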

Spacing issues can also result in disjointed apparatus entries, as occurred in my apparatus for Chapter XI in the case of the contraction needn’t. Notice how because of the spacing in needn t and need nt, Juxta recognized the two parts of the contraction as two separate variants (lines 130 and 131):

[Screenshot: disjointed apparatus entries for needn’t]

This one variant was broken into two apparatus entries because Juxta recognized it as two words. There is really no way of rectifying this problem except by checking and editing the text and HTML apparatuses after the fact.
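The split happens because a word-level collator tokenizes on whitespace before aligning anything. A toy position-by-position comparison in Python (not Juxta’s alignment algorithm, and with invented context around the two readings) shows how one broken contraction becomes two variant pairs:

```python
# Invented context around the two readings of the contraction.
tokens_a = "it needn t be".split()   # witness printed as "needn t"
tokens_b = "it need nt be".split()   # witness printed as "need nt"

# Pairing tokens position by position, the single contraction yields
# two mismatching word pairs, hence two apparatus entries:
variants = [(a, b) for a, b in zip(tokens_a, tokens_b) if a != b]
print(variants)  # [('needn', 'need'), ('t', 'nt')]
```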

I mean simply to caution scholars going into this sort of work so that they can better estimate the time required for digital collation. This being my first major digital collation project, I averaged about two hours per chapter (chapters ranging between 1000 and 4000 words each) to transcribe the 1861–62 text and then collate both witnesses in Juxta. I then needed an extra one or two hours per chapter to correct OCR and transcription errors.

While it did take me time to clean up the digital texts so that Juxta could do its job most efficiently, in the end, Juxta certainly saved me time—time I would have spent keeping collation records, constructing an apparatus, and creating the HTML files (as I wanted to do a digital presentation). I would be remiss, however, if I did not recommend a few improvements and future directions.

As useful as Juxta is, it nevertheless has limitations. One difficulty I had while cleaning my texts was that I could not correct them while viewing the collation sets; I had, rather, to open the witnesses in separate windows.

[Screenshot: witnesses opened for correction in separate windows]

The ability to edit the witnesses in the collation set directly would make correction of digitization errors much easier. This is not a serious impediment, though, and is easily dealt with in the manner I mentioned. The Juxta download does allow this in a limited capacity: the user can open a witness in the “Source” field below the collation visualization, then click “Edit” to enable editing in that screen. However, while editing is enabled for the “Source,” you cannot scroll in the visualization to navigate to the next error that may need correcting.

A more important limitation is the fact that the Edition Starter does not allow for the creation of eclectic texts, texts constructed with readings from multiple witnesses; rather, the user can only select one witness as the “base text,” and all readings in the edition are from that base text.

[Screenshot: the Edition Starter, with a single witness selected as base text]

Most scholarly editors, however, likely will need to adopt readings from different witnesses at some point in the preparation of their editions. Juxta’s developers need to mastermind a way of selecting which reading to adopt per variant; selected readings would then be adopted in the text in Edition Starter. For the sake of visualizing, I did some screenshot melding in Paint of what this function might look like:


Currently, an editor wishing to use the Edition Starter to construct an edition would need to select either the copy-text or the text with the most adopted readings as the base text. The editor would then need to adopt readings from other witnesses by editing the output DOCX or HTML files.

I do not know the intricacies of the code which runs Juxta. I looked at it on GitHub, but, alas! my very elementary coding knowledge was completely inadequate to the task. I intend to delve deeper as my expertise improves, and in the meantime, I encourage all the truly code-savvy scholars out there to look at the code and consider this problem. In my opinion, this is the one hurdle which, once overcome, would make Juxta the optimal choice as an edition-preparation tool—not just a collation tool.

Another feature which would be fantastic to include eventually would be a way of digitally categorizing variants: accidental versus substantive; printer errors, editor corrections, or author revisions; etc. Then, an option to adopt all substantives from text A, for instance, would—perhaps—leave nothing to be desired by the digitally inclined textual editor.

I am excited about Juxta. I am amazed by what it can do and exhilarated by what it may yet be capable of, and taking its limitations with its vast benefits, I will continue to use it for all future editorial projects.

Stephanie Kingsley is a second-year English MA student specializing in 19th-century American literature, textual studies, and digital humanities. She is one of this year’s Praxis Fellows [see Praxis blogs] and Rare Book School Fellows. For more information, visit, and remember to watch for Ms. Kingsley’s post next week on the results of her collation of Delany’s Blake.

Using Juxta in the Classroom: Scholar’s Lab Presentation

Director of NINES Andrew Stauffer and Project Manager Dana Wheeles will be joining the UVa Scholar’s Lab today to discuss Juxta Commons and possible uses for the software in the classroom. Below is a list of sets included in the demo to illustrate the numerous ways Juxta could draw students’ attention to textual analysis and digital humanities.

Traditional Scholarly Sets for Analysis and Research



Scholarly Sets for Classroom Engagement



Beyond traditional scholarship: born-digital texts



Our favorites from the user community



Digital Thoreau and Parallel Segmentation

[Cross-posted at]

Every now and then I like to browse the project list at, just to get an idea of what kind of work is being done in digital scholarship around the world. This really paid off recently, when I stumbled upon Digital Thoreau, an engaging and well-structured site created by a group from SUNY-Geneseo. This project centers around a TEI-encoded edition of Walden, which will, to quote their mission statement, “be enriched by annotations, links, images, and social tools that will enable users to create conversations around the text.” I highly recommend that anyone interested in text encoding take a look at their genetic text demo of “Solitude,” visualized using the Versioning Machine.

What really caught my attention, however, is that they freely offer a toolkit of materials from their project, including XML documents marked up in TEI. This allowed me to take a closer look at how they encoded the text featured in the demo, and try visualizing it myself.

This embed shows the same text featured on the Digital Thoreau site, now visualized in Juxta Commons. It is possible to import a file encoded in TEI Parallel Segmentation directly into Juxta Commons, and the software will immediately break down the file into its constituent witnesses (see this example of their base witness from Princeton) and visualize them as a comparison set.
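The format makes this kind of witness extraction straightforward: in TEI Parallel Segmentation, each point of variation is an `<app>` element holding one `<rdg>` per witness, so any single witness can be rebuilt by walking the file and keeping its readings. Here is a minimal sketch with Python’s standard library; the fragment and witness ids are invented for illustration, not Digital Thoreau’s actual markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical parallel-segmented fragment: each <app> holds every
# witness's reading at one point of variation.
tei = """
<p xmlns="http://www.tei-c.org/ns/1.0">I went to the woods
<app>
  <rdg wit="#princeton">because I wished</rdg>
  <rdg wit="#huntington">as I wished</rdg>
</app> to live deliberately</p>
"""

NS = "{http://www.tei-c.org/ns/1.0}"
root = ET.fromstring(tei)

def witness_text(root, wit):
    """Rebuild one witness by choosing its reading inside each <app>."""
    parts = [root.text or ""]
    for app in root.findall(NS + "app"):
        for rdg in app.findall(NS + "rdg"):
            if wit in rdg.get("wit", ""):
                parts.append(rdg.text or "")
        parts.append(app.tail or "")
    return " ".join(" ".join(parts).split())

print(witness_text(root, "#princeton"))
```

Collecting the output of `witness_text` for every witness id yields exactly the set of texts a collation tool would then compare.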

[Screenshot: uploading a Parallel Segmentation file]



[Screenshot: Parallel Segmentation file added and processed]


Once you’ve successfully added the file to your account, you have access to the heat map visualization (where changes are highlighted blue on the chosen base text), the side-by-side option, and a histogram to give you a global view of the differences between the texts in the set. In this way, the Juxta Commons R&D team hopes to enable the use of our software in concert with other open-source tools.

I should also note that Juxta Commons allows the user to export any other sets they have created as a parallel-segmented file. This is a great feature for starting an edition of your own, but it in no way includes the complexity of markup one would see in files generated by a rigorous project like Digital Thoreau. We like to think of the Parallel Segmentation and new experimental edition builder exports as building blocks for future scholarly editions.

Many thanks to the team at Digital Thoreau for allowing us to make use of their scholarship!

Authenticating Google Books with Juxta Commons

What do you get when you collate as many free Google versions of the same text as you can find? Those familiar with Google Books may suggest that you’ll quickly discover rampant OCR errors, or perhaps some truly astounding misinformation in the metadata fields. In my experiment using Juxta Commons to explore the versions of Alfred, Lord Tennyson’s long poem, The Princess, available online, I encountered my fill of both of these issues. But I also discovered a number of interesting textual variations – ones that led me to a deeper study of the poem’s publication history.

In the process of testing the efficacy of the software, I believe I stumbled upon a useful experiment that may prove helpful in the classroom: a new way to introduce students to textual scholarship, to the value of metadata, and to the modes of inquiry made possible by the digital humanities.

Many of the editions of Tennyson’s works offered in Google Books are modern, or modern reprints, and are thus available only in snippet view. Paging through the results, I chose six versions of the Princess that were available in e-book form, and I copied and pasted the text into the text editor in Juxta Commons*. Because the poem is relatively long, I chose to focus solely on its Prologue – not only to expedite the process of collation, but to see if one excerpt could give a more global view of changes to the poem across editions. Another important step was to click on the orange “i” button at the upper left of the screen to save original URLs and basic metadata about the object for future reference.

[Screenshot: the source information panel]

This step turned out to be invaluable, once I realized that the publication information offered on the title pages of the scanned documents didn’t always agree with the metadata offered by Google (see this example).

Once the set was complete and collated, I noticed right away that there were significant passages missing in the 1863 and 1900 editions of the poem.


Stepping chronologically through the set using the witness visibility feature (the eye icons on the left) showed no apparent timeline for this change (why would it be missing in 1863, present in 1866, 1872, 1875, and excised again in 1900?). The answer could only be found in a robust explanation of the revision and publication history of Tennyson’s work.

Without going too deeply into the reasons behind this set of differences (I’ll refer you to Christopher Ricks’ selected critical edition of Tennyson, if you’re interested), The Princess happens to be one of the most revised long poems of Tennyson’s career. The Prologue was expanded in the 5th edition (published in 1853), and it is that version that is generally considered the standard reading text today. However, as we have seen from the Google Books on offer, even in 1900, editions were offered that were based on earlier versions of the poem. Could the fact that both versions missing the stanzas are American editions be important?

I invite Tennyson scholars to help me continue to piece together this puzzle. However, I believe that in this one example we have seen just how powerful Juxta Commons can be for delving into seemingly innocuous editions of one of Tennyson’s poems and exposing a myriad of possible topics of study. Next time you’re wondering just *which* version of a text you’re looking at on Google Books, I hope you’ll consider Juxta Commons a good place to start.

* Please note that Juxta Commons can accept some e-book formats, but those offered by Google Books have image information only, and the text cannot be extracted.

We have a winner!

Congratulations to Tonya Howe, the winner of our Juxta Commons sharing competition, leading up to the MLA Conference in Boston (#MLA13). Be sure to have a look at the side-by-side view of her comparison set, Legend of Good Women, Prologues A and B.

We’ll be featuring the set in the Juxta Commons gallery in the very near future, along with some of the other sets that received lots of interest in the last month.

A Preview of Juxta Commons

The NINES R&D team is happy to announce a new phase of testing for Juxta online: Juxta Commons. We’re entering our final phase of intensive testing on this new site for using Juxta on the web, which breaks down the processes of the desktop application so you always have access to your raw source files and your witnesses, in addition to your comparison sets. We’ve even added more ways to work with XML, as well as an option to import and export files encoded in TEI Parallel Segmentation.

We have invited a group of scholars and users to try out Juxta Commons for the next two months, and share their experiences online. They’ll be exploring new ways to add source files, filter XML content and browse our newly-updated visualizations, previewed in the gallery above.

If you would like to be a part of this group of testers, please leave a comment below, and we’ll get in touch with you.

Juxta and excess: The case of Aimé Césaire

(Guest post by Alex Gil – read full entry at NINES)

I’m a PhD candidate in the English Department at the University of Virginia currently working on a digital edition of Aimé Césaire’s early works under the sponsorship of  l’Agence Universitaire de la Francophonie and ITEM. Some of this work also moonlights as my rather schizoid dissertation (read French poet/English Department) and I consider it part of my long-term goal of generating and sustaining enthusiasm for reliable digital editions of neo-canonical Caribbean literary texts. I am rather new to this blog, but not to Juxta. I started working with Juxta around the time when I started working with Aimé Césaire’s signature poem Cahier d’un retour au pays natal, roughly 2 years ago. At the time, Juxta saved me enormous amounts of time proofreading my retooled OCRs and generating an apparatus. It was later, when I started working with Et les chiens se taisaient, a longer text with substantially more variants and transpositions, that Juxta revealed to me both its current shortcomings and its ultimate promise.

We could say that Aimé Césaire was a migratory poet in the fullest sense: He had perfect pitch for context and used it to quickly adapt his voice to new audiences as his work traveled around three continents. As a student of literature he was as much a product of his Paris education as he was of the journey that brought him there and back to his home base in Martinique. His major works, and the many revisions they were subjected to during his lifetime, provide the final testimony to his restless poetic trajectory.

To the textual critic who approaches this corpus for the first time, one feature stands out above all others: The sheer number of transpositions from one version to another. In past conversations, I have likened his stanzas and lines to Lego blocks in order to quickly explain how he seems to have an utter disregard (or is it exactly the opposite?) for sequence. In the case of Et les chiens se taisaient the text begins its life as a three-act play on the Haitian Revolution, has an adolescence as a poetic oratorio with heavy Christian overtones and grows up to be a heavily abstract play about the struggle between universal Slave and Master figures. Throughout this transformation, stanzas and lines are bandied about without care for consistency, sometimes going from one speaker to his or her antagonist in a later version.

When I began using Juxta for Et les chiens se taisaient, I expected only the same functionality that had worked to a T for Cahier d’un retour au pays natal, but as soon as I started working with the first two instantiations of the text, the manuscript and the oratorio, obstacles and yearnings started cropping up. In its current build (1.3.1), Juxta struggles with long texts with many transpositions. After several meetings with NINES and Nick Laiacona, it became clear that a memory issue combined with the graphic rendering of connectors was the culprit. Apparently, Juxta has a built-in limit to the amount of internal memory it uses from the machine, and rendering the graphic connectors puts substantial pressure on these resources. To account for transpositions, Juxta allows you to mark “moves” manually from one text to the next, creating a list of these moves as you go along in one of the bottom panels. This system is intuitive and easy to use, and complements the automated functions nicely, but it becomes unwieldy in a collection with heavy traffic. While Cahier d’un retour au pays natal had a total of four, albeit significant, moves in its four major versions, Et les chiens se taisaient has an overwhelming 64 moves just between the manuscript and the first published version!

Click here to read the full entry at NINES.

Using Juxta in the Digital Variorum Edition of Ezra Pound’s Cantos

(Guest post by Mark Byron, University of Sydney, Australia)

I am currently assembling the digital variorum edition of Ezra Pound’s Cantos with Richard Taylor. This edition aims to collate all published versions of every canto, including page proofs and setting copy, where available, and to integrate digital reproductions of illustrated capitals in deluxe editions, audio and video recordings of Pound reading his poetry, and a very large cache of annals material pertaining to the production of his epic poem over the course of sixty years.

We have chosen to use Juxta to collate the very extensive set of variants for each canto – the total number of witness files runs into the thousands – because this application addresses a number of issues inherent in such a project.

The Juxta interface lists any chosen comparison set, which, for example, might be as small as ten witness files for Canto VI or as large as forty witness files for Canto IV. The degree of variation of each witness text from a chosen base text is visually represented next to each file in the comparison set list. This provides an efficient means to identify the more eccentric versions (bibliographically speaking) of a particular canto. A curious reader viewing the Edit Note in the figure below might choose to compare the 1922 version of Canto II published in The Dial with the so-called “Base text” – the 1975 New Directions edition of the Cantos that was adopted by Faber in place of its own edition, marking the end of the separate stemmatic lineage of the British edition of the text. (It should be noted that any witness file may be chosen as a base text for the purposes of a particular collation.)

Juxta’s elegant interface provides immediate visual information concerning the kind and degree of variation between the two witness files represented here: the reader is already aware of the canto’s changed status after 1922 from the “Eighth Canto” to Canto II, and can see – immediately – that the heaviest revision occurs in the opening lines, a revision that ushers in the now-iconic address to Robert Browning (the rhetorical and semantic implications of which can be processed by means of careful comparison of the two versions).

Variation is visualized in the integrated heat map, and is complemented by the Histogram function, allowing the reader to see exactly at which points the densest variation might occur in the canto. In this case, the beginning of the text bears the most acute variation, but other significant variations occur throughout the canto, including the final lines. To be able to see this at a glance is truly a powerful aid to scholars, even those intimately familiar with the textual state and history of this poem.

The complexity of Pound’s text is legendary, and not all bibliographic features can be captured in either codex or digital editions. Yet Juxta provides the means to collate Greek text, including diacritics (seen in the example above), and the increasingly substantial presence of Chinese in later instalments of the Cantos. Indeed, any element present in the Unicode palette can be deployed in a Juxta text file. While those ideograms drawn by hand (often incorrectly) and included in published editions of the Cantos are not represented in the text field, photographic reproductions of them can be added as Edit Notes precisely where they occur in a particular canto.

These features provide excellent reasons for the digital variorum edition of Pound’s Cantos to employ Juxta. Potential development of an HTML applet – allowing for an integrated collation function within a web-based edition – is exciting news indeed.

Mark Byron
Department of English
University of Sydney, Australia

Working with non-Roman alphabets in Juxta

Now that Juxta 1.3 has been refined and released, the development team at NINES has been discussing new directions for the software. First and foremost is the adaptation of Juxta’s collating power for texts in languages other than English. Comparisons of texts in French and Italian work pretty well, but we’re still investigating the necessary diacritics to make such operations more exact. However, it seems that scholars working with non-Roman alphabets have been left out of the conversation.
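One likely culprit behind the diacritic inexactness is Unicode normalization: the same accented letter can be encoded either as a single precomposed character or as a base letter plus a combining mark, and a byte-for-byte comparison will report a variant where a reader sees none. A quick Python illustration; this is my guess at the issue, not a diagnosis of Juxta’s code:

```python
import unicodedata

# "collationné" with a precomposed é (NFC form)...
nfc = "collationné"
# ...versus the same word with "e" plus a combining acute accent (NFD form).
nfd = unicodedata.normalize("NFD", nfc)

assert nfc != nfd               # a naive comparison flags a false variant
assert len(nfd) == len(nfc) + 1  # the combining accent is a separate character

# Normalizing both witnesses to the same form removes the false variant:
assert unicodedata.normalize("NFC", nfd) == nfc
```

Normalizing every witness to a single form (NFC, say) before collation would make such spurious variants disappear, for Greek and other combining-mark scripts as much as for French and Italian.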

Do any Juxta users out there have any experiences with foreign language collation to share with us?

Searching Tennyson

Below is a representative page from Christopher Ricks’s critical edition of the poems of Alfred, Lord Tennyson.

This excerpt from “The Lady of Shalott” illustrates traditional methods of textual collation: the base text is prominently displayed, with variants and annotations included in notes at the foot of the page. It provides a useful comparison to this screenshot of the same poem, collated in Juxta.

Two versions of the poem can be displayed in Juxta side-by-side, with a heat map of the differences (highlighted in green) making variants instantly recognizable. But in addition to these basic visualizations, the new Juxta 1.3 adds another useful feature: search.

Continue reading