Work Flows and Wish Lists: Reflections on Juxta as an Editorial Tool

I have had the opportunity to use Juxta Commons for several editorial projects, and while taking a breath between a Juxta-intensive term project last semester and my Juxta-intensive MA thesis this semester, I would like to offer a few thoughts on Juxta as an editorial tool.

For my term project for Jerome McGann’s American Historiography class last semester, I conducted a collation of Martin R. Delany’s novel, Blake, or, The Huts of America, one of the earliest African American novels published in the United States. Little did I know that my exploration would lead me into an adventure as much technological as textual, but when Professor McGann recommended I use Juxta for conducting the collation and displaying the results, that is exactly what happened. I input my texts into Juxta Commons, collated them, and produced HTML texts of the individual chapters, each with an apparatus of textual variants, using Juxta’s Edition Starter. I linked these HTML files together into an easily navigable website to present the results to Professor McGann. I’ll be posting on the intriguing results themselves next week, but in the meantime, they can also be viewed on the website I constructed, hosted by GitHub: Blake Project home.

Juxta helped me enormously in this project. First, it was incredibly useful in helping me clean up my texts. My collation involved an 1859 serialization of the novel, and another serialization in 1861-62. The first, I was able to digitize using OCR; the second, I had to transcribe myself. Anyone who has done OCR work knows that every minute of scanning leads to (in my case) an average of five or ten minutes of cleaning up OCR errors. I also had my own transcription errors to catch and correct. By checking Juxta’s highlighted variants, I was able to—relatively quickly—fix the errors and produce reliable texts. Secondly, once collated, I had the results stored in Juxta Commons; I did not have to write down in a collation chart every variant to avoid losing that information, as I would if I were machine- or sight-collating. Juxta’s heat-map display allows the editor to see variants in-line, as well, which saves an immense amount of time when it comes to analyzing results: you do not have to reference page and line numbers to see the context of the variants. Lastly, Juxta enabled me to organize a large amount of text in individual collation sets—one for each chapter. I was able to jump between chapters and view their variants easily.
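For readers curious about what this sort of variant-checking involves under the hood, here is a minimal Python sketch (mine, not Juxta’s) that uses the standard library’s difflib to flag word-level differences between two witness transcriptions; the sample sentences and variable names are invented for illustration, and Juxta’s own alignment is far more sophisticated.

```python
import difflib

# Two hypothetical witness readings of the same passage: one from OCR of the
# 1859 serialization, the other from my transcription of the 1861-62 text.
witness_1859 = "Henry, wat make you want to know so much about religion?"
witness_1861 = "Henry, what makes you want to know so mnch about religion?"

# Compare word by word; every non-matching span is a candidate OCR or
# transcription error to check against the printed originals.
words_1859 = witness_1859.split()
words_1861 = witness_1861.split()
matcher = difflib.SequenceMatcher(None, words_1859, words_1861)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(f"1859: {' '.join(words_1859[i1:i2])!r}  <->  "
              f"1861-62: {' '.join(words_1861[j1:j2])!r}")
```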

As helpful as Juxta was, however, I caution all those new to digital collation that no tool can perfectly collate or create an apparatus from an imperfect text. In this respect, there is still no replacement for human discretion—which is, ultimately, a good thing. For instance, while the Juxta user can turn off punctuation variants in the display, if the user does want punctuation included and the punctuation is not spaced exactly the same in both witnesses, the program highlights the anomalous spacing. Thus, when the 1859 text reads

‘ Henry, wat…

and the 1861-62 text reads

‘Henry, wat…

Juxta will show that punctuation spacing as a variant, while the human editor knows it is the result of typesetting idiosyncrasies rather than a meaningful variant. Such variants can carry over into the Juxta Edition Builder, as well, resulting in meaningless apparatus entries. For these reasons, you must perfect your texts to get a perfect Juxta heat map, and especially before using Edition Starter; otherwise, you will need to fix the spacing in Juxta and output another apparatus, or else edit the text or HTML files to remove the undesirable entries.
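One practical workaround is to normalize spacing around punctuation in the witness files before collating. Below is a rough sketch of such a cleanup pass; it is not a Juxta feature, the regular expressions are my own assumptions about the kinds of spacing at issue, and any automated normalization should still be spot-checked against the originals.

```python
import re

def normalize_spacing(text: str) -> str:
    """Collapse typesetting-only spacing so it is not flagged as a variant."""
    # Drop spaces that follow an opening quotation mark: "' Henry" -> "'Henry".
    text = re.sub(r"([‘'\"“])\s+", r"\1", text)
    # Drop spaces before trailing punctuation: "word ," -> "word,".
    text = re.sub(r"\s+([,;:.!?’”'\"])", r"\1", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

print(normalize_spacing("‘ Henry, wat…"))   # ‘Henry, wat…
print(normalize_spacing("‘Henry, wat…"))    # ‘Henry, wat… (unchanged)
```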

Spacing issues can also result in disjointed apparatus entries, as occurred in my apparatus for Chapter XI in the case of the contraction needn’t. Notice how because of the spacing in needn t and need nt, Juxta recognized the two parts of the contraction as two separate variants (lines 130 and 131):

[Screenshot: apparatus entries for “needn’t” split across lines 130 and 131]

This one variant was broken into two apparatus entries because Juxta recognized it as two words. There is really no way of rectifying this problem except by checking and editing the text and HTML apparatuses after the fact.

I mean simply to caution scholars going into this sort of work so that they can better estimate the time required for digital collation. This being my first major digital collation project, I averaged about two hours per chapter (chapters ranging between 1000 and 4000 words each) to transcribe the 1861-62 text and then collate both witnesses in Juxta. I then needed an extra one or two hours per chapter to correct OCR and transcription errors.

While it did take me time to clean up the digital texts so that Juxta could do its job most efficiently, in the end, Juxta certainly saved me time—time I would have spent keeping collation records, constructing an apparatus, and creating the HTML files (as I wanted to do a digital presentation). I would be remiss, however, if I did not recommend a few improvements and future directions.

As useful as Juxta is, it nevertheless has limitations. One difficulty I had while cleaning my texts was that I could not correct them while viewing the collation sets; I had, rather, to open the witnesses in separate windows.

[Screenshot: witnesses opened in separate windows for correction]

The ability to edit the witnesses in the collation set directly would make correction of digitization errors much easier. This is not a serious impediment, though, and is easily dealt with in the manner I mentioned. The Juxta download does allow this in a limited capacity: the user can open a witness in the “Source” field below the collation visualization, then click “Edit” to enable editing in that screen. However, while editing is turned on for the “Source,” you cannot scroll in the visualization to navigate to the next error that may need correcting.

A more important limitation is the fact that the Edition Starter does not allow for the creation of eclectic texts, texts constructed with readings from multiple witnesses; rather, the user can only select one witness as the “base text,” and all readings in the edition are from that base text.

[Screenshot: Juxta’s Edition Starter, with a single witness selected as base text]

Most scholarly editors, however, likely will need to adopt readings from different witnesses at some point in the preparation of their editions. Juxta’s developers need to mastermind a way of selecting which reading to adopt per variant; selected readings would then be adopted in the text in Edition Starter. For the sake of visualizing, I did some screenshot melding in Paint of what this function might look like:

[Mockup: selecting which witness’s reading to adopt for each variant]

Currently, an editor wishing to use the Edition Starter to construct an edition would need to select either the copy-text or the text with the most adopted readings as the base text. The editor would then need to adopt readings from other witnesses by editing the output DOCX or HTML files. I do not know the intricacies of the code which runs Juxta. I looked at it on GitHub, but, alas! my very elementary coding knowledge was completely inadequate to the task. I intend to delve further as my expertise improves, and in the meantime, I encourage all the truly code-savvy scholars out there to look at the code and consider this problem. In my opinion, this is the one hurdle which, once overcome, would make Juxta the optimal choice as an edition-preparation tool—not just a collation tool.

Another feature which would be fantastic to include eventually would be a way of digitally categorizing variants: accidental versus substantive; printer errors, editor corrections, or author revisions; etc. Then, an option to adopt all substantives from text A, for instance, would—perhaps—leave nothing to be desired by the digitally inclined textual editor. I am excited about Juxta. I am amazed by what it can do and exhilarated by what it may yet be capable of, and taking its limitations with its vast benefits, I will continue to use it for all future editorial projects.
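To make the wished-for feature a little more concrete, here is a rough Python sketch of how per-variant adoption of readings might look from the editor’s side. None of this is Juxta code; the data structure, the sample readings, and the “adopt all substantives from one witness” rule are hypothetical illustrations of the functionality described above.

```python
# Hypothetical data: each variant records the readings of two witnesses
# (here the 1859 and 1861-62 serializations) plus an editorial category.
variants = [
    {"loc": "ch. XI, l. 130", "1859": "needn't", "1861-62": "need nt",
     "category": "accidental"},
    {"loc": "ch. XI, l. 212", "1859": "huts", "1861-62": "cabins",
     "category": "substantive"},
]

def build_eclectic_readings(variants, base="1859", adopt_substantives_from="1861-62"):
    """Choose a reading per variant: keep the base text's accidentals,
    adopt another witness's substantive readings."""
    decisions = []
    for v in variants:
        witness = adopt_substantives_from if v["category"] == "substantive" else base
        decisions.append({"loc": v["loc"], "adopted": v[witness], "from": witness})
    return decisions

for d in build_eclectic_readings(variants):
    print(f"{d['loc']}: adopt {d['adopted']!r} from the {d['from']} text")
```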


Stephanie Kingsley is a second-year English MA student specializing in 19th-century American literature, textual studies, and digital humanities. She is one of this year’s Praxis Fellows [see Praxis blogs] and Rare Book School Fellows. For more information, visit http://stephanie-kingsley.github.io/, and remember to watch for Ms. Kingsley’s post next week on the results of her collation of Delany’s Blake.

Using Juxta in the Classroom: Scholars’ Lab Presentation

Director of NINES Andrew Stauffer and Project Manager Dana Wheeles will be joining the UVa Scholars’ Lab today to discuss Juxta Commons and possible uses for the software in the classroom. Below is a list of the sets included in the demo to illustrate the numerous ways Juxta could draw students’ attention to textual analysis and digital humanities.

Traditional Scholarly Sets for Analysis and Research

Scholarly Sets for Classroom Engagement

Beyond traditional scholarship: born-digital texts

Our favorites from the user community

Digital Thoreau and Parallel Segmentation

[Cross-posted at nines.org.]

Every now and then I like to browse the project list at DHCommons.org, just to get an idea of what kind of work is being done in digital scholarship around the world. This really paid off recently, when I stumbled upon Digital Thoreau, an engaging and well-structured site created by a group from SUNY-Geneseo. This project centers around a TEI-encoded edition of Walden, which will, to quote their mission statement, “be enriched by annotations, links, images, and social tools that will enable users to create conversations around the text.” I highly recommend that anyone interested in text encoding take a look at their genetic text demo of “Solitude,” visualized using the Versioning Machine.

What really caught my attention, however, is that they freely offer a toolkit of materials from their project, including XML documents marked up in TEI. This allowed me to take a closer look at how they encoded the text featured in the demo, and try visualizing it, myself.

This embed shows the same text featured on the Digital Thoreau site, now visualized in Juxta Commons. It is possible to import a file encoded in TEI Parallel Segmentation directly into Juxta Commons, and the software will immediately break down the file into its constituent witnesses (see this example of their base witness from Princeton) and visualize them as a comparison set.
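For those curious what “breaking down the file into its constituent witnesses” involves, a parallel-segmented file interleaves the readings of each witness in <app>/<rdg> elements within a single text stream. The toy sketch below (my own, far simpler than either Juxta’s importer or Digital Thoreau’s markup) pulls one witness at a time out of a tiny hand-made fragment.

```python
import xml.etree.ElementTree as ET

# A toy parallel-segmented fragment (not the Digital Thoreau file):
# readings from two witnesses, #A and #B, interleaved in <app>/<rdg> elements.
tei = """<p>I went to the woods because I
<app>
  <rdg wit="#A">wished</rdg>
  <rdg wit="#B">wanted</rdg>
</app>
to live deliberately.</p>"""

def extract_witness(elem, siglum):
    """Walk the tree, keeping shared text plus only one witness's readings."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "app":
            for rdg in child.findall("rdg"):
                if siglum in (rdg.get("wit") or ""):
                    parts.append(rdg.text or "")
        else:
            parts.append(extract_witness(child, siglum))
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(tei)
print(" ".join(extract_witness(root, "#A").split()))
print(" ".join(extract_witness(root, "#B").split()))
```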

[Screenshot: uploading a Parallel Segmentation file]

[Screenshot: Parallel Segmentation file added and processed]

Once you’ve successfully added the file to your account, you have access to the heat map visualization (where changes are highlighted blue on the chosen base text), the side-by-side option, and a histogram to give you a global view of the differences between the texts in the set. In this way, the Juxta Commons R&D team hopes to enable the use of our software in concert with other open-source tools.

I should also note that Juxta Commons allows the user to export any other sets they have created as a parallel-segmented file. This is a great feature for starting an edition of your own, but it in no way includes the complexity of markup one would see in files generated by a rigorous project like Digital Thoreau. We like to think of the Parallel Segmentation and the new experimental edition builder exports as building blocks for future scholarly editions.

Many thanks to the team at Digital Thoreau for allowing us to make use of their scholarship!

Featured Set: Wikipedia Article on Benghazi Attack

Guest post by NINES Fellow, Emma Schlosser. The full set is embedded at the end of this post.

Juxta Commons now offers a platform by which we can study the evolution of the most visited encyclopedia on the web—Wikipedia! The Wikipedia API feature allows users to easily collate variants that reveal changes made to articles, a useful tool when tracking the development of current events.   In light of President Obama’s recent nomination of Senator John Kerry to be Secretary of State following Susan Rice’s withdrawal of her bid for the position, I decided to trace Wikipedia’s article on the September 11th 2012 attack on the U.S. consulate in Benghazi.  The attack resulted in the tragic deaths of four Americans including Ambassador Christopher Stevens.
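For anyone who wants to try something similar outside Juxta’s built-in Wikipedia feature, the sketch below shows roughly how dated revisions can be pulled from the public MediaWiki API for comparison. It is not how Juxta Commons does it; it assumes the third-party requests library is installed, and it uses the article’s current title, which may differ from the title the page carried in 2012.

```python
import requests  # third-party HTTP library, assumed installed

API = "https://en.wikipedia.org/w/api.php"

def revision_as_of(title, timestamp):
    """Fetch the newest revision of `title` at or before `timestamp` (wikitext)."""
    params = {
        "action": "query", "format": "json", "formatversion": "2",
        "prop": "revisions", "titles": title,
        "rvprop": "ids|timestamp|content", "rvslots": "main",
        "rvlimit": "1", "rvdir": "older", "rvstart": timestamp,
    }
    data = requests.get(API, params=params, timeout=30).json()
    rev = data["query"]["pages"][0]["revisions"][0]
    return rev["timestamp"], rev["slots"]["main"]["content"]

# Two of the snapshot dates discussed in this post; the title is the
# page's current name, which may not match what it was called in 2012.
for stamp in ("2012-09-14T23:59:59Z", "2012-09-22T23:59:59Z"):
    when, wikitext = revision_as_of("2012 Benghazi attack", stamp)
    print(when, "-", len(wikitext), "characters of wikitext")
```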

I prepared thirteen witnesses taken from the course of the article’s history on Wikipedia, stemming back to September 14th, 2012. In selecting the variants, I chose to focus on information most pertinent to the role of Rice, who is U.S. Ambassador to the UN.  These witnesses for the most part fall under the article’s “U.S. Government Response” section.  As various editors added more information regarding the attack and its aftereffects, I noted that on September 22nd a section had been added to the article entitled “Criticism of U.S. Government Response.”

In a September 16th version of the article, an editor adds that the U.S. government has begun to doubt whether a low quality and poorly produced film circulated on YouTube entitled Innocence of Muslims was in fact behind the attack.

By September 22nd, an entire paragraph had been added to the “U.S. Government Response” section, including quotations from Senator John McCain (R, Arizona) who decried any claim that the attack was spontaneous: “Most people don’t bring rocket-propelled grenades and heavy weapons to demonstrations.  That was an act of terror.”  A September 27th version reports that Susan Rice appeared on five separate news shows on the 16th, asserting that the attacks were a “spontaneous reaction to a hateful and offensive video widely disseminated throughout the Arab and Muslim world.”  The 27th variant also affirms that the Benghazi attack had become a politically fueled issue during the heated presidential race.

The October 28th variant cites under the “Criticism of U.S. Government Response” section that Senator McCain specifically accused the administration of using Susan Rice to cover the true motives of the attack.

As the progression of this Wikipedia article shows, the U.S. government response to the Benghazi attack overshadowed, to some degree, the causes and nature of the attack itself.  This, of course, had much to do with the then raging U.S. presidential campaign.  Rice’s tangential role in the response to the Benghazi attack, as evidenced by the paucity of references to her within the article, implicitly reveals the nature of political scapegoating.  Thanks to Juxta’s Wikipedia API feature it was easy for me to trace the evolution of an article on a contemporary controversy, revealing the methods by which we continually modify and interpret our understanding of current events.

MLA 2013 Reception: Saturday 1/5

For all those attending the Modern Language Association Conference in Boston this year, please join NINES Director Andrew Stauffer and Performant Software for a reception in the exhibit hall (Booth 717) on Saturday, January 5. We’ll be running demos of Juxta Commons and answering your questions about NINES, Juxta and digital humanities software in general.

We have a winner!

Congratulations to Tonya Howe, the winner of our Juxta Commons sharing competition, leading up to the MLA Conference in Boston (#MLA13). Be sure to have a look at the side-by-side view of her comparison set, Legend of Good Women, Prologues A and B.

We’ll be featuring the set in the Juxta Commons gallery in the very near future, along with some of the other sets that received lots of interest in the last month.

Using the Critical Apparatus in Digital Scholarship

[Thumbnails: a page from a print edition of Tennyson’s poetry alongside the old Juxta HTML critical apparatus output]

As the Juxta R&D team has worked to take the desktop version of our collation software to the web, I’ve found myself thinking a great deal about the critical apparatus and its role when working with digital (digitized?) texts.

In the thumbnails above, you can see a page image from a traditional print edition (in this case, of Tennyson’s poetry) on the left, and a screenshot of the old Juxta critical apparatus output on the right. In the original, downloadable version of Juxta, we allowed users to browse several visualizations of the collation in order to target areas of interest, but we also offered them the ability to export their results in an HTML-encoded apparatus. This was an effort to connect digital scholarship to traditional methods of textual analysis, as well as a way to allow scholars to share their findings with others in a familiar medium.

It has become clear to me, based on the feedback from our users, that this HTML critical apparatus has been quite useful for a number of scholars. Even though our output could seem cryptic without being paired with the text of the base witness (as it is in the Tennyson edition), it was apparent that scholars still needed to translate their work in Juxta into the traditional format.

In the meantime, scholars working with the Text Encoding Initiative (TEI) developed Parallel Segmentation, a method of encoding the critical apparatus in XML. In her article, “Knowledge Representation and Digital Scholarly Editions in Theory and Practice,” Tanya Clement describes the effectiveness of using parallel segmentation to encode her digital edition of the work of the Baroness Elsa von Freytag-Loringhoven. Using a TEI apparatus along with a visualization tool called the Versioning Machine, Clement argued that her project “encourage[d] critical inquiry concerning how a digital scholarly edition represents knowledge differently than a print edition,” and illustrated the flexibility of working with full texts in tandem. Witnesses, or alternate readings, were not subsumed under a (supposedly static) base text, but were living, dynamic representations of the social and cultural networks within which the Baroness lived and wrote.

Working with digital texts can make generating a critical apparatus difficult. One could encode an apparatus manually, as Clement did, but most users of Juxta wanted us to take their plain text or XML-encoded files and transform them automatically. The traditional apparatus requires exact notations of line numbers and details about the printed page. How does one do that effectively when working with plain text files that bear no pagination and few (if any) hard returns denoting line breaks? Instead of hurriedly replicating the desktop apparatus online, knowing it would possess these weaknesses and more, the R&D team chose to offer TEI Parallel Segmentation output for Juxta Commons.

[Screenshot: Juxta TEI Parallel Segmentation export]

Any user of Juxta Commons can upload a file encoded in TEI Parallel Segmentation, and see their documents represented in Juxta’s heat map, side-by-side, and histogram views. Those working with plain text or XML files can also export the results of their collations as a downloadable TEI Parallel Segmentation file. In short, Juxta Commons can both read and write TEI Parallel Segmentation.
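As a rough illustration of what “writing” parallel segmentation amounts to, the toy sketch below wraps divergent readings in TEI <app>/<rdg> markup while leaving shared text alone. It is my own simplification, not the actual Juxta Commons exporter, and real TEI files carry far more structure (headers, witness lists, and so on).

```python
# Toy illustration of parallel segmentation: aligned segments from two witnesses
# are emitted as plain text where they agree and as <app>/<rdg> where they differ.
aligned = [
    ("That was an act of", "That was an act of"),   # agreement
    ("terror.", "terrorism."),                      # variant
]

def to_parallel_segmentation(aligned, sigla=("#w1", "#w2")):
    out = []
    for a, b in aligned:
        if a == b:
            out.append(a)
        else:
            out.append(
                f'<app><rdg wit="{sigla[0]}">{a}</rdg>'
                f'<rdg wit="{sigla[1]}">{b}</rdg></app>'
            )
    return " ".join(out)

print(to_parallel_segmentation(aligned))
```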

However, we’re not convinced that the traditional apparatus has lost its functionality. We’d like to ask you, our users, to tell us more about your needs. How do you use the critical apparatus in your studies? What other kind of apparatus could we offer to streamline and enhance your work in Juxta Commons?

On the Juxta Beta release, and taking collation online

In September of 2008, when I first became acquainted with Juxta as a collation tool, I wrote a blog post as a basic demonstration of the software. I hunted down transcriptions of two versions of one of my favorite poems, Tennyson’s “The Lady of Shalott,” and collated them alongside the abbreviated lyrics to the song adapted from work by Loreena McKennitt. Screenshots were all I had to illustrate the process and its results, however – anyone interested in exploring the dynamic collation in full would need to first download Juxta, then get the set of files from me. We had a great tool that encouraged discovery and scholarly play, but it didn’t facilitate collaboration and communication. Now, in 2012, I can finally show you that set in its entirety.

The dream of Juxta for the web has been a long time coming, and we couldn’t have done it without generous funding from the Google Digital Humanities Award and support from European scholars in the COST Action 32 group, TextGrid and the whole team behind CollateX. As Project Manager, I’m thrilled to be a part of the open beta release of the Juxta web service, accessed through version 1.6.5 of the desktop application.

I imagine at this point you’re wondering:  if I want to try out the web service, do I still have to download the desktop application? Why would I do that?

Over the past year, our development team’s efforts have been directed to breaking down the methods by which Juxta handles texts into ‘microservices’ following the Gothenburg Model for collation. We designed the web service to enable other tools and methods to make use of its output: in Bamboo CorporaSpace, for example, a text-mining algorithm could benefit from the tokenization performed by Juxta. We imagined Juxta not just as a standalone tool, but as one that could interact with a suite of other potential tools.
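The Gothenburg Model breaks collation into separable stages (tokenization, normalization, alignment, analysis, and visualization), which is what makes a microservice design possible. The sketch below is my own highly simplified, self-contained illustration of those stages; Juxta’s web service implements them quite differently and exposes them through the API mentioned below.

```python
import difflib
import re

# A toy walk through the Gothenburg collation stages for two short witnesses.
def tokenize(text):
    """Stage 1: split a witness into word tokens."""
    return re.findall(r"\S+", text)

def normalize(token):
    """Stage 2: produce a comparison form (lowercase, punctuation stripped)."""
    return re.sub(r"[^\w]", "", token.lower())

def align(tokens_a, tokens_b):
    """Stage 3: align the two token streams on their normalized forms."""
    norm_a = [normalize(t) for t in tokens_a]
    norm_b = [normalize(t) for t in tokens_b]
    return difflib.SequenceMatcher(None, norm_a, norm_b).get_opcodes()

def analyze(opcodes, tokens_a, tokens_b):
    """Stage 4: report the spans that differ (stage 5 would visualize them)."""
    for tag, i1, i2, j1, j2 in opcodes:
        if tag != "equal":
            yield " ".join(tokens_a[i1:i2]), " ".join(tokens_b[j1:j2])

a = tokenize("The Lady of Shalott looked down to Camelot.")
b = tokenize("The Lady of Shallott look'd down to Camelot.")
for reading_a, reading_b in analyze(align(a, b), a, b):
    print(f"{reading_a!r}  <->  {reading_b!r}")
```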

That part of our development is ready for testing, and the API documentation is available at GitHub.

However, the user workflow for Juxta as a destination site for collations on the web is still being implemented. Hence this new, hybrid beta, which leverages the desktop application’s interface for adding, subtracting and editing documents while also inviting users to share their curated comparison sets online.

This is where you come in, beta testers – we need you to tell us more about how you’d like to use Juxta online. We know that collation isn’t just for scholarly documents: we’ve seen how visualizing versions of Wikipedia pages can tell us something about evolving conversations in Digital Humanities, and we’ve thought about Juxta’s potential as a method for authenticating online texts. But as we design a fully online environment for Juxta, we want to get a better sense of what the larger community wants.

I want to thank everyone who has set up an account and tried out the newest version. We’ve seen some really exciting possibilities, and we’re taking in a lot of valuable feedback. If you’ve held off so far, I ask that you consider trying it out.

But I don’t have any texts to collate!

No worries! We’re slowly populating a Collation Gallery of comparison sets shared by other beta testers. You might just find something there that gets your creative juices flowing.

Explore Juxta Beta today!

** cross-posted on the NINES site **

Beta-release of Juxta includes online sharing

Calling all beta testers!

Over the past few months, NINES and the developers of Juxta have been busy adapting the application for use on the web. In order to expand our testing capabilities, we’re releasing a version of the desktop client that offers users the ability to share comparison sets online.

If you have any sets of witnesses to a particular work that you would like to collate and share, we invite you to sign up and download the beta version  to try out some of our online features. Please keep in mind that this is a trial version of the web-service, and may be subject to changes and updates over the next few months. Joining us now ensures that your feedback will make the full release of the software better than we could manage in-house.

 Please help us make Juxta better!

New Partnership with the Modernist Versions Project

Great news! Juxta is at the center of a new partnership agreement between NINES and the Modernist Versions Project (MVP). The agreement provides the MVP with programming support to integrate Juxta with a digital environment for collating and comparing modernist texts that exist in multiple textual variants. The MVP, a project based at the University of Victoria, will enjoy full access to the Juxta collation software, including the existing stand-alone application and the web service now under development. The MVP is expected to provide a robust environment for testing and enhancing both versions of Juxta.