Using the Critical Apparatus in Digital Scholarship

    

As the Juxta R&D team has worked to take the desktop version of our collation software to the web, I’ve found myself thinking a great deal about the critical apparatus and its role when working with digital (digitized?) texts.

In the thumbnails above, you can see a page image from a traditional print edition (in this case, of Tennyson’s poetry) on the left, and a screenshot of the old Juxta critical apparatus output on the right. In the original, downloadable version of Juxta, we allowed users to browse several visualizations of the collation in order to target areas of interest, but we also offered them the ability to export their results in an HTML-encoded apparatus. This was an effort to connect digital scholarship to traditional methods of textual analysis, as well as a way to allow scholars to share their findings with others in a familiar medium.

It has become clear to me, based on the feedback from our users, that this HTML critical apparatus has been quite useful for a number of scholars. Even though our output could seem cryptic without being paired with the text of the base witness (as it is in the Tennyson edition), it was apparent that scholars still needed to translate their work in Juxta into the traditional format.

In the meantime, scholars working with the Text Encoding Initiative (TEI) developed Parallel Segmentation, a method of encoding the critical apparatus in XML. In her article, “Knowledge Representation and Digital Scholarly Editions in Theory and Practice,” Tanya Clement describes the effectiveness of using parallel segmentation to encode her digital edition of the work of the Baroness Elsa von Freytag-Loringhoven. Using a TEI apparatus along with a visualization tool called the Versioning Machine, Clement argued that her project “encourage[d] critical inquiry concerning how a digital scholarly edition represents knowledge differently than a print edition,” and illustrated the flexibility of working with full texts in tandem. Witnesses, or alternate readings, were not subsumed under a (supposedly static) base text, but were living, dynamic representations of the social and cultural networks within which the Baroness lived and wrote.
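To give a sense of what the encoding looks like, here is a minimal parallel-segmentation fragment. The shared text sits outside the `<app>` element, and each `<rdg>` carries one witness's variant; the sigla and readings below are invented for illustration, not drawn from Clement's edition.

```xml
<!-- Invented fragment: two witnesses diverge on a single word -->
<listWit>
  <witness xml:id="MsA">Manuscript A</witness>
  <witness xml:id="MsB">Manuscript B</witness>
</listWit>
<l>Willows whiten, aspens
  <app>
    <rdg wit="#MsA">quiver</rdg>
    <rdg wit="#MsB">shiver</rdg>
  </app>
</l>
```

Because every witness is encoded inline rather than as variants against a privileged base, any reading can be reconstructed in full from the same file.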

Working with digital texts can make generating a critical apparatus difficult. You could encode your apparatus manually, as Clement did, but most users of Juxta wanted us to take their plain text or XML-encoded files and transform them automatically. The traditional apparatus requires exact notations of line numbers and details about the printed page. How does one do that effectively when working with plain text files that bear no pagination and few (if any) hard returns denoting line breaks? Instead of hurriedly replicating the desktop apparatus online, knowing it would possess these weaknesses and more, the R&D team chose to offer TEI Parallel Segmentation output for Juxta Commons.

Juxta TEI Parallel Segmentation export

Any user of Juxta Commons can upload a file encoded in TEI Parallel Segmentation, and see their documents represented in Juxta’s heat map, side-by-side, and histogram views. Those working with plain text or XML files can also export the results of their collations as a downloadable TEI Parallel Segmentation file. In short, Juxta Commons can both read and write TEI Parallel Segmentation.

However, we’re not convinced that the traditional apparatus has lost its functionality. We’d like to ask you, our users, to tell us more about your needs. How do you use the critical apparatus in your studies? What other kind of apparatus could we offer to streamline and enhance your work in Juxta Commons?

Juxta and the THEOT Project

The Textual History of the Ethiopic Old Testament Project (THEOT) is an international effort to identify and to trace textual trajectories found in Ethiopian manuscripts that contain books included in the canon of the Hebrew Bible. (The Ethiopian Orthodox church counts a number of other books as part of their canon, but another team is examining those texts.) Although we hope our efforts will eventually lead to full critical editions of each book, the immediate goal is more manageable. By employing profile methods similar to those used in the field of New Testament Textual Criticism, we will produce a preliminary textual history based on the collations of 15-70 select readings in 30 carefully chosen manuscripts per book.

Ted Erho, in consultation with the Principal Investigator (PI) assigned to a particular biblical work, selects the manuscripts based on age and significance. All manuscripts predating the 16th century are included by default. A representative sampling of later manuscripts and textual families (when known) makes up the remainder.

I also work with each PI to determine which passages to collate. We look for places where there is clear and significant variation in the Ethiopian tradition as well as in the sources that may have impacted the development of the text, such as the Greek, Hebrew, Syriac, Coptic, and Arabic versions. Collations of the Ethiopic texts will provide data for mapping out internal developments. Alignment of the subsequently isolated traditions with external versional evidence will establish the source of the original translation and perhaps what foreign influences subsequently affected Ethiopia’s transmission of sacred texts.

Once these elements are set, every selected passage in each manuscript is collated separately by a minimum of two scholars. We then use Juxta to compare the transcriptions. Juxta highlights in blue the areas where the collations disagree, which expedites comparison and final editing. The investigators work through the differences, aiming for a consensus on every reading. The end result is a transcription that is 99.9% pure.
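Juxta's own collation engine is not shown here, but the basic move it performs at this stage (flagging the spans where two transcriptions of the same passage disagree) can be sketched with Python's standard `difflib`. The Latin transcription strings below are placeholders, not THEOT data.

```python
import difflib

# Two hypothetical transcriptions of the same passage (placeholder
# Latin text, not actual THEOT data).
transcriber_a = "in principio creavit deus caelum et terram".split()
transcriber_b = "in principio creauit deus celum et terram".split()

# SequenceMatcher aligns the two token streams; anything that is not
# an "equal" span is a place where the transcribers disagree -- the
# same areas Juxta highlights in blue.
matcher = difflib.SequenceMatcher(a=transcriber_a, b=transcriber_b)
disagreements = [op for op in matcher.get_opcodes() if op[0] != "equal"]
for tag, i1, i2, j1, j2 in disagreements:
    print(tag, transcriber_a[i1:i2], "<->", transcriber_b[j1:j2])
# replace ['creavit'] <-> ['creauit']
# replace ['caelum'] <-> ['celum']
```

The two flagged spans are exactly the readings the investigators would then discuss until they reach consensus.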

We then collate these “pure” transcriptions of the thirty manuscripts to identify unique readings and family groupings. What once required a tremendous amount of effort is accomplished in mere seconds with Juxta. Relationships between groups of manuscripts are much easier to identify with the graded highlighting scheme employed in Juxta’s “Comparison Set” window. In addition, the words and phrases highlighted in the collation window facilitate the isolation of distinct variation units of value for mapping out Ethiopia’s textual history.

(On the off chance that someone will read this who knows Ethiopic, I should note that we create two copies of the final “pure” transcriptions. One version is retained in its original form, preserving all of the orthographical variants and scribal idiosyncrasies. These will be used later for publications and further research. The other version is “standardized” through a process that removes many of the orthographic variations, such as the frequent interchange of gutturals, that occur in Ethiopic manuscripts.)
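The project's actual standardization rules are not spelled out above, but the kind of mapping involved can be sketched as a simple character table. The letters below are real Ethiopic gutturals that scribes use interchangeably, though only first-order (ä-vowel) forms are shown; a real normalizer would have to cover every vocalic order of each letter.

```python
# Illustrative only: collapse a few Ethiopic letters that scribes use
# interchangeably. First-order forms only; a full normalizer would map
# every vocalic order of each letter.
GUTTURAL_MAP = str.maketrans({
    "ሐ": "ሀ",  # hawt  -> hoy
    "ኀ": "ሀ",  # harm  -> hoy
    "ዐ": "አ",  # 'ayn  -> 'alf
    "ሠ": "ሰ",  # shawt -> sat (another common interchange)
})

def standardize(text: str) -> str:
    """Return a copy of *text* with interchangeable letters collapsed."""
    return text.translate(GUTTURAL_MAP)

# Two spellings of the same word now compare as equal.
print(standardize("ሐገር") == standardize("ሀገር"))  # True
```

Running the collation on standardized copies keeps these purely orthographic interchanges from drowning out the genuine variation units.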

In addition to providing the data we need for our immediate goals, these units will be compiled into a list that will allow scholars in the future to classify quickly the textual affinity of other manuscripts.

We are very grateful to the Juxta team for providing the software and quickly responding to the particular needs of the THEOT Project. We eagerly anticipate using this new web version and the further refinements and development sure to come.

The Textual History of the Ethiopic Old Testament Project

Co-Directors: Steve Delamarter and Curt Niccum

Steering Committee: Jeremy Brown, Aaron Butts, Ted Erho, Martin Heide, Ralph Lee

A Preview of Juxta Commons

The NINES R&D team is happy to announce a new phase of testing for Juxta online: Juxta Commons. We’re entering our final phase of intensive testing on this new site for using Juxta on the web, which breaks down the processes of the desktop application so you always have access to your raw source files and your witnesses, in addition to your comparison sets. We’ve even added more ways to work with XML, as well as an option to import and export files encoded in TEI Parallel Segmentation.

We have invited a group of scholars and users to try out Juxta Commons for the next two months, and share their experiences online. They’ll be exploring new ways to add source files, filter XML content and browse our newly-updated visualizations, previewed in the gallery above.

If you would like to be a part of this group of testers, please leave a comment below, and we’ll get in touch with you.

On the Juxta Beta release, and taking collation online

In September of 2008, when I first became acquainted with Juxta as a collation tool, I wrote a blog post as a basic demonstration of the software. I hunted down transcriptions of two versions of one of my favorite poems, Tennyson’s “The Lady of Shalott,” and collated them alongside the abbreviated lyrics to the song adapted from work by Loreena McKennitt. Screenshots were all I had to illustrate the process and its results, however – anyone interested in exploring the dynamic collation in full would need to first download Juxta, then get the set of files from me. We had a great tool that encouraged discovery and scholarly play, but it didn’t facilitate collaboration and communication. Now, in 2012, I can finally show you that set in its entirety.

The dream of Juxta for the web has been a long time coming, and we couldn’t have done it without generous funding from the Google Digital Humanities Award and support from European scholars in the COST Action 32 group, TextGrid and the whole team behind CollateX. As Project Manager, I’m thrilled to be a part of the open beta release of the Juxta web service, accessed through version 1.6.5 of the desktop application.

I imagine at this point you’re wondering:  if I want to try out the web service, do I still have to download the desktop application? Why would I do that?

Over the past year, our development team’s efforts have been directed at breaking down the methods by which Juxta handles texts into ‘microservices’ following the Gothenburg Model for collation. We designed the web service to enable other tools and methods to make use of its output: in Bamboo CorporaSpace, for example, a text-mining algorithm could benefit from the tokenization performed by Juxta. We imagined Juxta not just as a standalone tool, but as one that could interact with a suite of other potential tools.
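The Gothenburg Model's point is that collation decomposes into independent stages (tokenization, normalization, alignment, analysis, visualization), each of which another tool could consume on its own. A toy walk through the first three stages, purely as an illustration of that separation of concerns and not Juxta's actual code:

```python
import difflib
import re

def tokenize(text):
    """Stage 1: split a witness into word tokens."""
    return re.findall(r"\w+", text)

def normalize(token):
    """Stage 2: fold case so trivial differences are not variants."""
    return token.lower()

def align(tokens_a, tokens_b):
    """Stage 3: align the normalized token streams (naive difflib
    alignment; production aligners are more sophisticated)."""
    matcher = difflib.SequenceMatcher(
        a=[normalize(t) for t in tokens_a],
        b=[normalize(t) for t in tokens_b])
    return matcher.get_opcodes()

w1 = tokenize("The curse is come upon me")
w2 = tokenize("the curse has come upon me")
variants = [op for op in align(w1, w2) if op[0] != "equal"]
print(variants)  # a single 'replace' at is/has; 'The'/'the' folds away
```

Because each stage has a clean input and output, a service like the one described above can hand tokenized text to an external tool without running the rest of the pipeline.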

That part of our development is ready for testing, and the API documentation is available at GitHub.

However, the user workflow for Juxta as a destination site for collations on the web is still being implemented. Hence this new, hybrid beta, which leverages the desktop application’s interface for adding, subtracting and editing documents while also inviting users to share their curated comparison sets online.

This is where you come in, beta testers – we need you to tell us more about how you’d like to use Juxta online. We know that collation isn’t just for scholarly documents: we’ve seen how visualizing versions of Wikipedia pages can tell us something about evolving conversations in Digital Humanities, and we’ve thought about Juxta’s potential as a method for authenticating online texts. But as we design a fully online environment for Juxta, we want to get a better sense of what the larger community wants.

I want to thank everyone who has set up an account and tried out the newest version. We’ve seen some really exciting possibilities, and we’re taking in a lot of valuable feedback. If you’ve held off so far, I ask that you consider trying it out.

But I don’t have any texts to collate!

No worries! We’re slowly populating a Collation Gallery of comparison sets shared by other beta testers. You might just find something there that gets your creative juices flowing.

Explore Juxta Beta today!

** cross-posted on the NINES site **

Beta-release of Juxta includes online sharing

Calling all beta testers!

Over the past few months, NINES and the developers of Juxta have been busy adapting the application for use on the web. In order to expand our testing capabilities, we’re releasing a version of the desktop client that offers users the ability to share comparison sets online.

If you have any sets of witnesses to a particular work that you would like to collate and share, we invite you to sign up and download the beta version to try out some of our online features. Please keep in mind that this is a trial version of the web service, and may be subject to changes and updates over the next few months. Joining us now ensures that your feedback will make the full release of the software better than we could manage in-house.

 Please help us make Juxta better!

New Partnership with the Modernist Versions Project

Great news! Juxta is at the center of a new partnership agreement between NINES and the Modernist Versions Project (MVP). The agreement provides the MVP with programming support to integrate Juxta with a digital environment for collating and comparing modernist texts that exist in multiple textual variants. The MVP, a project based at the University of Victoria, will enjoy full access to the Juxta collation software, including the existing stand-alone application and the web service now under development. The MVP is expected to provide a robust environment for testing and enhancing both versions of Juxta.

Juxta v1.6 Release

Juxta v1.6 is now available from the download page!

New features:

  • Building on Juxta’s existing support for <add>, <del>, <addspan>, and <delspan> tags, Juxta v1.6 now allows you to control the collation of revision sites by accepting or rejecting additions and deletions to the witness text.
  • The contents of TEI <note> tags now display in the right column of the Document Panel and are excluded from the text collation.
  • Default XML parsing templates are provided for TEI files. As in Juxta v1.4, you can customize these templates or create new ones.
  • A new edit window allows you to make changes to a witness text and save the altered version as a new witness.
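The accept/reject behavior in the first feature above can be illustrated with a short sketch (not Juxta's implementation): given a TEI fragment containing revision markup, build the witness reading with the revisions either accepted or rejected. The sample line is invented.

```python
import xml.etree.ElementTree as ET

# Invented TEI fragment with one revision site.
FRAGMENT = ("<l>The mirror <del>crack'd</del><add>shatter'd</add>"
            " from side to side</l>")

def reading(xml_fragment: str, accept: bool) -> str:
    """Flatten a revision site: keep <add> and drop <del> when
    accepting, or the reverse when rejecting."""
    root = ET.fromstring(xml_fragment)
    parts = [root.text or ""]
    for child in root:
        keep = (child.tag == "add") == accept  # <add> kept iff accepting
        if keep and child.text:
            parts.append(child.text)
        parts.append(child.tail or "")  # text after the tag always stays
    return "".join(parts).strip()

print(reading(FRAGMENT, accept=True))   # The mirror shatter'd from side to side
print(reading(FRAGMENT, accept=False))  # The mirror crack'd from side to side
```

Collating the accepted reading of one witness against the rejected reading of another is what lets you study the revision sites themselves.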

This development was made possible by the support of the Carolingian Canon Law project at the University of Kentucky.

Juxta Camp

On July 11-12, 2011, a group of Juxta users and collaborators met at the offices of Performant Software Solutions LLC in downtown Charlottesville, Virginia. The group included Abigail Firey of the Carolingian Canon Law Project at the University of Kentucky; Gregor Middell of Universität Würzburg; Ronald Dekker from the Huygens Institute; Jim Smith from the Maryland Institute for Technology in the Humanities (MITH); Dana Wheeles and Alex Gil of NINES; and Nick Laiacona and Lou Foster from Performant Software. The group previewed new features available in Juxta 1.6 (including changes to revision site display and TEI note tag support), then worked on planning for Juxta WS 1.0, the Juxta web service now in development.

Ronald, Gregor, Lou, and Jim (out of sight, with laptop) hacking at Juxta Camp


Abigail Firey and Alex Gil spoke about what the developers of Juxta could learn in general from considering the particular needs of their textual projects. Jim Smith gave a presentation on Corpora Space Architecture. Gregor Middell and Ronald Dekker spoke about their work on CollateX. Gregor talked about using an offset range model of text markup; Ronald spoke about the Gothenburg abstract model for collation. Lou Foster presented the features new to Juxta 1.6. Finally, Gregor, Ronald, Jim, Lou, and Nick put their heads together in hacking sessions to work on offset ranges, the Gothenburg pipeline model, and the Juxta web service.

You can read notes from Juxta Camp on the Juxta wiki.

Juxta v1.4 Release

Juxta v1.4 is now available in the files area!

In addition to importing UTF-8 encoded plain text files, this new version of Juxta supports direct import of XML source files in any well-formed schema, including TEI P4 and P5. No more preparing specialized versions of your witnesses for import into Juxta. Just import them and instantly start collating and learning things about your texts! You can configure how Juxta parses the tags it encounters: it can include them in the reading copy, exclude them, or collate the tag type. For example, if <b> changes to <i> for the same word across different witnesses, Juxta can help you detect this change. Complete details are in the online documentation on this website.

Other new features include:

  • The ability to pick a target XPath from which to read a document from an XML file.
  • The user can now easily examine the XML source of a difference and compare the XML of the source and the witness.
  • Support for <add>, <del>, <addspan>, and <delspan> TEI tags. These marks are now visible in the presentation of the document.
  • Automatically reads bibliographic data of TEI XML sources.
  • XML source files contained in the JXT file can now be exported by the user.
  • User can now take a screen shot of the currently displayed comparison.
  • The display font and font size are now configurable.
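The target-XPath feature in the list above works like this in spirit: a path expression picks out the one element whose contents become the witness text, so the rest of the file (headers, metadata) is ignored. A minimal sketch using Python's standard library, with an invented sample document and path:

```python
import xml.etree.ElementTree as ET

# Invented sample file: the witness text lives inside <text><body>.
SOURCE = """
<TEI>
  <teiHeader><title>Sample</title></teiHeader>
  <text><body><p>Willows whiten, aspens quiver</p></body></text>
</TEI>
"""

# A target path (here in ElementTree's limited XPath dialect) selects
# just the element whose text should be collated.
root = ET.fromstring(SOURCE)
target = root.find("./text/body/p")
print(target.text)  # Willows whiten, aspens quiver
```

Pointing different witnesses' paths at their respective text-bearing elements lets heterogeneously structured files be collated against each other.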

This development was made possible by the support of the SHANTI group at the University of Virginia.


Juxta Receives Google Digital Humanities Award

Good news!  Google has offered its support to help us develop Juxta into a web application:

http://googleblog.blogspot.com/2010/07/our-commitment-to-digital-humanities.html

We are thrilled to have received this competitive award, and look forward to working to optimize Juxta for the web.

Here is an abstract of our application for the Google Award:

With the support of a Google Digital Humanities Research Award, we propose to transform Juxta into a web-based application integrated with Google Books. Scholars could use such a tool to track changes in language over time and to test literary and historical theories through comparative analysis of texts.

As the largest single part of the general remediation of the global library to digital formats, the 12,000,000+ books digitized by Google represent a major opportunity for scholars interested in the history of texts and editions. We want to know how Charles Dickens and Henry James changed their novels as they went through different editions in their lifetimes; and we also want to see the changes introduced by later editors, in later printings.  We want to collate versions of poems published by Sylvia Plath and Walt Whitman to discover their revisions.  We want to compare digital texts of uncertain origin with known versions, as a mode of authentication.

Using Juxta, a scholar can answer these questions and many more. Juxta comes with several kinds of analytic visualizations. The primary collation gives a split frame comparison of a base text with a witness text, along with a display of the digital images from which the base text is derived. Juxta displays a heat map of all textual variants and allows the user to locate all witness variations from the base text. The histogram visualization displays the density of all variation from the base text and serves as a useful finding aid for specific variants.

A web-based Juxta would be very similar in function to the Juxta desktop application. Scholars could upload texts into a private storage area and compare them against books from the Google Books corpus. The scholar could also embed the collation into their own website (as with Google Maps) with an HTML code snippet that we will generate. Our goal would be to eventually integrate Juxta directly into the Google Books interface, allowing scholars to compare any two books for which they have access to the full text.