On the Juxta Beta release, and taking collation online

In September of 2008, when I first became acquainted with Juxta as a collation tool, I wrote a blog post as a basic demonstration of the software. I hunted down transcriptions of two versions of one of my favorite poems, Tennyson’s “The Lady of Shalott,” and collated them alongside the abbreviated lyrics of Loreena McKennitt’s musical adaptation. Screenshots were all I had to illustrate the process and its results, however: anyone interested in exploring the dynamic collation in full would first need to download Juxta and then get the set of files from me. We had a great tool that encouraged discovery and scholarly play, but it didn’t facilitate collaboration and communication. Now, in 2012, I can finally show you that set in its entirety.

The dream of Juxta for the web has been a long time coming, and we couldn’t have done it without generous funding from the Google Digital Humanities Award and support from European scholars in the COST Action 32 group, TextGrid and the whole team behind CollateX. As Project Manager, I’m thrilled to be a part of the open beta release of the Juxta web service, accessed through version 1.6.5 of the desktop application.

I imagine at this point you’re wondering: if I want to try out the web service, do I still have to download the desktop application? Why would I do that?

Over the past year, our development team’s efforts have been directed to breaking down the methods by which Juxta handles texts into ‘microservices’ following the Gothenburg Model for collation. We designed the web service to enable other tools and methods to make use of its output: in Bamboo CorporaSpace, for example, a text-mining algorithm could benefit from the tokenization performed by Juxta. We imagined Juxta not just as a standalone tool, but as one that could interact with a suite of other potential tools.
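For readers new to the Gothenburg Model: it splits collation into separable stages (tokenization, normalization, alignment, analysis, visualization), each of which could be offered as a service of its own. The Python sketch below only illustrates that decomposition. It is not Juxta’s code or its web service API; the function names are hypothetical, the standard library’s difflib stands in for a real aligner, and the visualization stage is left out.

```python
import difflib
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str        # surface form as it appears in the witness
    normalized: str  # form used when comparing witnesses


def tokenize(text: str) -> list[Token]:
    # Stage 1: whitespace tokenization (punctuation stays attached for now)
    return [Token(word, word) for word in text.split()]


def normalize(tokens: list[Token]) -> list[Token]:
    # Stage 2: ignore case and leading/trailing punctuation when comparing
    return [Token(t.text, re.sub(r"^\W+|\W+$", "", t.text).lower()) for t in tokens]


def align(base: list[Token], witness: list[Token]) -> list[tuple]:
    # Stage 3: align normalized forms; difflib stands in for a real collation aligner
    matcher = difflib.SequenceMatcher(
        a=[t.normalized for t in base],
        b=[t.normalized for t in witness],
        autojunk=False,
    )
    return matcher.get_opcodes()


def analyze(base, witness, opcodes) -> list[tuple[str, str, str]]:
    # Stage 4: keep only non-matching segments, reported in their surface forms
    return [
        (tag, " ".join(t.text for t in base[i1:i2]), " ".join(t.text for t in witness[j1:j2]))
        for tag, i1, i2, j1, j2 in opcodes
        if tag != "equal"
    ]


def collate(base_text: str, witness_text: str) -> list[tuple[str, str, str]]:
    # Each stage is a separate function, so any one could be swapped out or run as a service
    base = normalize(tokenize(base_text))
    witness = normalize(tokenize(witness_text))
    return analyze(base, witness, align(base, witness))
```

For example, `collate("On either side the river lie", "On either side the River lye")` returns `[("replace", "lie", "lye")]`. Normalization absorbs the capitalized “River,” while the spelling variant survives to the analysis stage.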

That part of our development is ready for testing, and the API documentation is available on GitHub.

However, the user workflow for Juxta as a destination site for collations on the web is still being implemented. Hence this new, hybrid beta, which leverages the desktop application’s interface for adding, removing and editing documents while also inviting users to share their curated comparison sets online.

This is where you come in, beta testers – we need you to tell us more about how you’d like to use Juxta online. We know that collation isn’t just for scholarly documents: we’ve seen how visualizing versions of Wikipedia pages can tell us something about evolving conversations in Digital Humanities, and we’ve thought about Juxta’s potential as a method for authenticating online texts. But as we design a fully online environment for Juxta, we want to get a better sense of what the larger community wants.

I want to thank everyone who has set up an account and tried out the newest version. We’ve seen some really exciting possibilities, and we’re taking in a lot of valuable feedback. If you’ve held off so far, I ask that you consider trying it out.

But I don’t have any texts to collate!

No worries! We’re slowly populating a Collation Gallery of comparison sets shared by other beta testers. You might just find something there that gets your creative juices flowing.

Explore Juxta Beta today!

** cross-posted on the NINES site **

Beta-release of Juxta includes online sharing

Calling all beta testers!

Over the past few months, NINES and the developers of Juxta have been busy adapting the application for use on the web. In order to expand our testing capabilities, we’re releasing a version of the desktop client that offers users the ability to share comparison sets online.

If you have any sets of witnesses to a particular work that you would like to collate and share, we invite you to sign up and download the beta version to try out some of our online features. Please keep in mind that this is a trial version of the web service and may be subject to changes and updates over the next few months. Joining us now ensures that your feedback will make the full release of the software better than we could manage in-house.

Please help us make Juxta better!

New Partnership with the Modernist Versions Project

Great news! Juxta is at the center of a new partnership agreement between NINES and the Modernist Versions Project (MVP). The agreement provides the MVP with programming support to integrate Juxta with a digital environment for collating and comparing modernist texts that exist in multiple textual variants. The MVP, a project based at the University of Victoria, will enjoy full access to the Juxta collation software, including the existing stand-alone application and the web service now under development. The MVP is expected to provide a robust environment for testing and enhancing both versions of Juxta.

Juxta v1.6 Release

Juxta v1.6 is now available from the download page!

New features:

  • Building on Juxta’s existing support for <add>, <del>, <addSpan>, and <delSpan> tags, Juxta v1.6 now allows you to control the collation of revision sites by accepting or rejecting additions and deletions to the witness text.
  • The contents of TEI <note> tags now display in the right column of the Document Panel and are excluded from the text collation.
  • Default XML parsing templates are provided for TEI files. As in Juxta v1.4, you can customize these templates or create new ones.
  • A new edit window allows you to make changes to a witness text and save the altered version as a new witness.
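To make “accepting or rejecting” revision sites concrete, here is a minimal Python sketch, assuming a simple TEI fragment, of how a reading text might be flattened before collation. Text inside <add> is kept or dropped, text inside <del> is dropped or kept, and <note> content is always excluded. This illustrates the idea and is not Juxta’s implementation; the function and parameter names are hypothetical, and span-based <addSpan>/<delSpan> markup is not handled.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def reading_text(xml_string, accept_additions=True, accept_deletions=True):
    """Flatten a TEI fragment into the text that would be collated.

    accept_additions: keep text inside <add> (True) or drop it (False).
    accept_deletions: carry out each <del>, dropping its text (True), or keep it (False).
    <note> content is always excluded from the collated text.
    """
    root = ET.fromstring(xml_string)
    parts = []

    def walk(elem):
        tag = elem.tag.replace(TEI_NS, "")  # works with or without the TEI namespace
        skip = (
            tag == "note"
            or (tag == "add" and not accept_additions)
            or (tag == "del" and accept_deletions)
        )
        if not skip:
            if elem.text:
                parts.append(elem.text)
            for child in elem:
                walk(child)
        # Tail text follows the element in its parent, so it is kept either way
        if elem.tail:
            parts.append(elem.tail)

    walk(root)
    # Collapse runs of whitespace left behind by removed elements
    return " ".join("".join(parts).split())
```

With the fragment `The <del>old</del> <add>grey</add> walls<note>…</note> and towers`, the defaults yield “The grey walls and towers,” while rejecting both revisions yields “The old walls and towers.”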

This development was made possible by the support of the Carolingian Canon Law project at the University of Kentucky.

Juxta Camp

On July 11-12, 2011, a group of Juxta users and collaborators met at the offices of Performant Software Solutions LLC in downtown Charlottesville, Virginia. The group included Abigail Firey of the Carolingian Canon Law Project at the University of Kentucky; Gregor Middell of Universität Würzburg; Ronald Dekker from the Huygens Institute; Jim Smith from the Maryland Institute for Technology in the Humanities (MITH); Dana Wheeles and Alex Gil of NINES; and Nick Laiacona and Lou Foster from Performant Software. The group previewed new features available in Juxta 1.6 (including changes to revision site display and TEI note tag support), then worked on planning for Juxta WS 1.0, the Juxta web service now in development.

Ronald, Gregor, Lou, and Jim (out of sight, with laptop) hacking at Juxta Camp


Abigail Firey and Alex Gil spoke about what the developers of Juxta could learn in general from considering the particular needs of their textual projects. Jim Smith gave a presentation on Corpora Space Architecture. Gregor Middell and Ronald Dekker spoke about their work on CollateX. Gregor talked about using an offset range model of text markup; Ronald spoke about the Gothenburg abstract model for collation. Lou Foster presented the features new to Juxta 1.6. Finally, Gregor, Ronald, Jim, Lou, and Nick put their heads together in hacking sessions to work on offset ranges, the Gothenburg pipeline model, and the Juxta web service.
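The offset range model keeps the text as a plain character sequence and records markup as ranges over it (often called standoff markup). Overlapping structures then pose no problem, and collation can run on the text alone. Here is a minimal Python sketch, assuming well-formed inline XML as input; the `Range` type and `to_standoff` function are hypothetical and are not CollateX or Juxta code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    name: str   # element name, e.g. "hi"
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)


def to_standoff(xml_string: str) -> tuple[str, list[Range]]:
    """Split inline XML into plain text plus character-offset ranges."""
    root = ET.fromstring(xml_string)
    parts: list[str] = []
    ranges: list[Range] = []
    offset = 0

    def emit(s):
        nonlocal offset
        if s:
            parts.append(s)
            offset += len(s)

    def walk(elem):
        start = offset  # where this element's content begins in the plain text
        emit(elem.text)
        for child in elem:
            walk(child)
            emit(child.tail)  # tail text belongs to the parent element
        ranges.append(Range(elem.tag, start, offset))

    walk(root)
    # Outer ranges come before inner ones when they start at the same offset
    ranges.sort(key=lambda r: (r.start, -r.end))
    return "".join(parts), ranges
```

For `<l>Long fields of <hi>barley</hi> and of <hi>rye</hi></l>`, the plain text is “Long fields of barley and of rye,” and the two `hi` ranges cover offsets 15 to 21 and 29 to 32.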

You can read notes from Juxta Camp on the Juxta wiki.

Juxta v1.4 Release

Juxta v1.4 is now available in the files area!

In addition to importing UTF-8 encoded plain text files, this new version of Juxta now supports direct import of well-formed XML source files in any schema, including TEI P4 and P5. No more preparing specialized versions of your witnesses for import into Juxta: just import them and instantly start collating and learning things about your texts! You can configure how Juxta parses the tags it encounters. It can include their content in the reading copy, exclude it, or collate the tag type. For example, if <b> changes to <i> for the same word across different witnesses, Juxta can help you detect this change. Complete details are in the online documentation on this website.
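As a rough illustration of the three tag-handling choices, the following Python sketch uses a hypothetical parsing template, a dict from tag name to action. It drops excluded elements, labels each word inside a “collate the tag type” element with that tag, and then reports words whose text aligns but whose tag differs. The template format, the action names, and the use of difflib for alignment are all assumptions for this example, not Juxta’s actual template syntax.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical parsing template: tag name -> "include", "exclude", or "collate_tag".
# Tags not listed are included.
TEMPLATE = {"note": "exclude", "b": "collate_tag", "i": "collate_tag"}


def tokens_with_tags(xml_string, template=TEMPLATE):
    """Return (word, tag) pairs, where tag is the nearest enclosing collate_tag element."""
    out = []

    def add_words(text, tag):
        for word in (text or "").split():
            out.append((word, tag))

    def walk(elem, tag):
        action = template.get(elem.tag, "include")
        if action == "exclude":
            return  # drop the element's content; its tail is handled by the parent
        if action == "collate_tag":
            tag = elem.tag
        add_words(elem.text, tag)
        for child in elem:
            walk(child, tag)
            add_words(child.tail, tag)

    walk(ET.fromstring(xml_string), None)
    return out


def tag_changes(base_xml, witness_xml, template=TEMPLATE):
    """Report words that align textually but carry different collated tags."""
    a = tokens_with_tags(base_xml, template)
    b = tokens_with_tags(witness_xml, template)
    matcher = difflib.SequenceMatcher(a=[w for w, _ in a], b=[w for w, _ in b], autojunk=False)
    changes = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            word, tag_a = a[block.a + k]
            _, tag_b = b[block.b + k]
            if tag_a != tag_b:
                changes.append((word, tag_a, tag_b))
    return changes
```

Comparing `<p>The <b>Lady</b> of Shalott</p>` with `<p>The <i>Lady</i> of Shalott</p>` reports `("Lady", "b", "i")`, which is the kind of change the paragraph above describes.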

Other new features include:

  • The ability to specify a target XPath identifying the part of an XML file from which a document is read.
  • The user can now easily examine the XML source of a difference and compare the XML of the source and the witness.
  • Support for the TEI <add>, <del>, <addSpan>, and <delSpan> tags. These marks are now visible in the presentation of the document.
  • Automatically reads bibliographic data from TEI XML sources.
  • XML source files contained in the JXT file can now be exported by the user.
  • Users can now take a screenshot of the currently displayed comparison.
  • The display font and font size are now configurable.
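Selecting a target XPath and reading TEI bibliographic data can be approximated with the limited XPath subset supported by Python’s `xml.etree.ElementTree`. This sketch assumes a TEI P5 document in the TEI namespace. The default target path, the function name, and the returned keys are hypothetical choices and do not describe Juxta’s behavior.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
HEADER = "./tei:teiHeader/tei:fileDesc/tei:titleStmt/"


def read_witness(xml_string, target="./tei:text/tei:body"):
    """Read witness text from the element matched by `target`, plus basic TEI header data."""
    root = ET.fromstring(xml_string)
    node = root.find(target, NS)  # ElementTree supports only a subset of XPath 1.0
    if node is None:
        raise ValueError(f"target path {target!r} matched nothing")
    # Concatenate all descendant text and collapse whitespace; this relies on the
    # source having whitespace between block-level elements such as <l>
    text = " ".join("".join(node.itertext()).split())
    return {
        "title": root.findtext(HEADER + "tei:title", default="", namespaces=NS).strip(),
        "author": root.findtext(HEADER + "tei:author", default="", namespaces=NS).strip(),
        "text": text,
    }
```

Passing a different `target`, such as `"./tei:text/tei:body/tei:lg"`, narrows the collated text to one part of the file while the header data is still read from the whole document.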

This development was made possible by the support of the SHANTI group at the University of Virginia.

Juxta Receives Google Digital Humanities Award

Good news! Google has offered its support to help us develop Juxta into a web application:

http://googleblog.blogspot.com/2010/07/our-commitment-to-digital-humanities.html

We are thrilled to have received this competitive award, and look forward to working to optimize Juxta for the web.

Here is an abstract of our application for the Google Award:

With the support of a Google Digital Humanities Research Award, we propose to transform Juxta into a web-based application integrated with Google Books. Scholars could use such a tool to track changes in language over time and to test literary and historical theories through comparative analysis of texts.

As the largest single part of the general remediation of the global library to digital formats, the 12,000,000+ books digitized by Google represent a major opportunity for scholars interested in the history of texts and editions. We want to know how Charles Dickens and Henry James changed their novels as they went through different editions in their lifetimes; and we also want to see the changes introduced by later editors, in later printings. We want to collate versions of poems published by Sylvia Plath and Walt Whitman to discover their revisions. We want to compare digital texts of uncertain origin with known versions, as a mode of authentication.

Using Juxta, a scholar can answer these questions and many more. Juxta comes with several kinds of analytic visualizations. The primary collation gives a split-frame comparison of a base text with a witness text, along with a display of the digital images from which the base text is derived. Juxta displays a heat map of all textual variants and allows the user to locate all witness variations from the base text. The histogram visualization displays the density of all variation from the base text and serves as a useful finding aid for specific variants.
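The histogram idea can be sketched in a few lines. First align the base and witness word by word, then flag every base token involved in a difference, and finally report the share of flagged tokens in each of a fixed number of bins along the base text. This Python sketch uses difflib for alignment and an arbitrary bin count. It illustrates the concept and is not how Juxta computes its histogram; the function name is hypothetical.

```python
import difflib


def variation_histogram(base_words, witness_words, bins=10):
    """Fraction of base-text tokens touched by a difference, per bin along the base text."""
    n = len(base_words)
    if n == 0:
        return [0.0] * bins
    changed = [False] * n
    matcher = difflib.SequenceMatcher(a=base_words, b=witness_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i1 == i2:
            # Pure insertion in the witness: attribute it to the adjacent base token
            changed[min(i1, n - 1)] = True
        else:
            for i in range(i1, i2):
                changed[i] = True
    counts = [0] * bins
    totals = [0] * bins
    for i, flag in enumerate(changed):
        b = i * bins // n  # bin index proportional to position in the base text
        totals[b] += 1
        counts[b] += flag
    return [c / t if t else 0.0 for c, t in zip(counts, totals)]
```

A result such as `[1.0, 0.0, 0.0, 0.0, 0.0]` would say that all of the variation falls in the opening fifth of the base text, which is how a density display can serve as a finding aid.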

A web-based Juxta would be very similar in function to the Juxta desktop application. Scholars could upload texts into a private storage area and compare them against books from the Google Books corpus. The scholar could also embed the collation into their own website (as with Google Maps) with an HTML code snippet that we will generate. Our goal would be to eventually integrate Juxta directly into the Google Books interface, allowing scholars to compare any two books for which they have access to the full text.

Juxta and excess: The case of Aimé Césaire

(Guest post by Alex Gil – read full entry at NINES)

I’m a PhD candidate in the English Department at the University of Virginia currently working on a digital edition of Aimé Césaire’s early works under the sponsorship of l’Agence Universitaire de la Francophonie and ITEM. Some of this work also moonlights as my rather schizoid dissertation (read French poet/English Department) and I consider it part of my long-term goal of generating and sustaining enthusiasm for reliable digital editions of neo-canonical Caribbean literary texts. I am rather new to this blog, but not to Juxta. I started working with Juxta around the time when I started working with Aimé Césaire’s signature poem Cahier d’un retour au pays natal, roughly 2 years ago. At the time, Juxta saved me enormous amounts of time proofreading my retooled OCRs and generating an apparatus. It was later, when I started working with Et les chiens se taisaient, a longer text with substantially more variants and transpositions, that Juxta revealed to me both its current shortcomings and its ultimate promise.

We could say that Aimé Césaire was a migratory poet in the fullest sense: He had perfect pitch for context and used it to quickly adapt his voice to new audiences as his work traveled around three continents. As a student of literature he was as much a product of his Paris education as he was of the journey that brought him there and back to his home base in Martinique. His major works, and the many revisions they were subjected to during his lifetime, provide the final testimony to his restless poetic trajectory.

To the textual critic who approaches this corpus for the first time, one feature stands out above all others: The sheer number of transpositions from one version to another. In past conversations, I have likened his stanzas and lines to Lego blocks in order to quickly explain how he seems to have an utter disregard (or is it exactly the opposite?) for sequence. In the case of Et les chiens se taisaient the text begins its life as a three-act play on the Haitian Revolution, has an adolescence as a poetic oratorio with heavy Christian overtones and grows up to be a heavily abstract play about the struggle between universal Slave and Master figures. Throughout this transformation, stanzas and lines are bandied about without care for consistency, sometimes going from one speaker to his or her antagonist in a later version.

When I began using Juxta for Et les chiens se taisaient, I only expected the same functionality that had worked to a T for Cahier d’un retour au pays natal, but as soon as I started working with the first two instantiations of the text, the manuscript and the oratorio, obstacles and yearnings started cropping up. In its current build (1.3.1), Juxta struggles with long texts with many transpositions. After several meetings with NINES and Nick Laiacona, it became clear that a memory issue combined with the graphic rendering of connectors was the culprit. Apparently, Juxta has a built-in limit on the amount of the machine’s memory it can use, and rendering the graphic connectors puts substantial pressure on these resources. To account for transpositions, Juxta allows you to mark “moves” manually from one text to the next, creating a list of these moves as you go along in one of the bottom panels. This system is intuitive and easy to use, and complements the automated functions nicely, but it becomes unwieldy in a collection with heavy traffic. While Cahier d’un retour au pays natal had a total of four, albeit significant, moves in its four major versions, Et les chiens se taisaient has an overwhelming 64 moves just between the manuscript and the first published version!
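Automatic move detection is one way to lighten the manual work described above. One common approach keeps the longest subsequence of shared lines that stays in the same relative order and reports the remaining shared lines as moves. The Python sketch below works under simplifying assumptions: lines are the units, and only lines that occur exactly once in each version are considered. It illustrates a general technique rather than Juxta’s behavior, and the function name is hypothetical.

```python
from bisect import bisect_left


def find_moves(base_lines, witness_lines):
    """Return lines present exactly once in both versions but out of relative order."""

    def unique_positions(lines):
        pos = {}
        for i, line in enumerate(lines):
            pos[line] = None if line in pos else i  # None marks a repeated line
        return {line: i for line, i in pos.items() if i is not None}

    a_pos = unique_positions(base_lines)
    b_pos = unique_positions(witness_lines)
    # Shared lines in base order, each paired with its position in the witness
    shared = sorted((a_pos[line], b_pos[line], line) for line in a_pos if line in b_pos)
    seq = [b for _, b, _ in shared]

    # Longest increasing subsequence of witness positions (patience method)
    tails, tails_idx, prev = [], [], [-1] * len(seq)
    for k, v in enumerate(seq):
        j = bisect_left(tails, v)
        if j == len(tails):
            tails.append(v)
            tails_idx.append(k)
        else:
            tails[j] = v
            tails_idx[j] = k
        prev[k] = tails_idx[j - 1] if j > 0 else -1

    in_order = set()
    k = tails_idx[-1] if tails_idx else -1
    while k != -1:
        in_order.add(k)
        k = prev[k]
    # Everything outside the longest in-order run is reported as a move
    return [shared[k][2] for k in range(len(seq)) if k not in in_order]
```

When several answers are possible, the longest-run choice picks the smallest set of moved lines. For example, moving one line ahead of two others reports that one line rather than the other two.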

Click here to read the full entry at NINES.

Using Juxta in the Digital Variorum Edition of Ezra Pound’s Cantos

(Guest post by Mark Byron, University of Sydney, Australia)

I am currently assembling the digital variorum edition of Ezra Pound’s Cantos with Richard Taylor. This edition aims to collate all published versions of every canto, including page proofs and setting copy, where available, and to integrate digital reproductions of illustrated capitals in deluxe editions, audio and video recordings of Pound reading his poetry, and a very large cache of annals material pertaining to the production of his epic poem over the course of sixty years.

We have chosen to use Juxta to collate the very extensive set of variants for each canto – the total number of witness files runs into the thousands – because this application addresses a number of issues inherent in such a project.

The Juxta interface lists any chosen comparison set, which, for example, might be as small as ten witness files for Canto VI or as large as forty witness files for Canto IV. The degree of variation of each witness text from a chosen base text is visually represented next to each file in the comparison set list. This provides an efficient means to identify the more eccentric versions (bibliographically speaking) of a particular canto. A curious reader viewing the Edit Note in the figure below might choose to compare the 1922 version of Canto II published in The Dial with the so-called “Base text” – the 1975 New Directions edition of the Cantos that was adopted by Faber in place of its own edition, marking the end of the separate stemmatic lineage of the British edition of the text. (It should be noted that any witness file may be chosen as a base text for the purposes of a particular collation.)
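A per-witness degree of variation like the one shown in the comparison set list can be approximated with a simple similarity score. This Python sketch assumes whitespace tokenization and uses 1 minus difflib’s `SequenceMatcher.ratio()`. Juxta’s own measure may be computed differently, and the function name is hypothetical.

```python
import difflib


def variation_scores(base_text, witnesses):
    """Score each witness against the base: 0.0 means identical, higher means more variation."""
    base = base_text.split()
    scores = {}
    for name, text in witnesses.items():
        matcher = difflib.SequenceMatcher(a=base, b=text.split(), autojunk=False)
        scores[name] = round(1.0 - matcher.ratio(), 3)
    # Most divergent witnesses first, the way an editor might scan for eccentric versions
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because any witness may serve as the base text, rerunning the function with a different `base_text` reorders the list from that witness’s point of view.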

Juxta’s elegant interface provides immediate visual information concerning the kind and degree of variation between the two witness files represented here: the reader is already aware of the canto’s changed status after 1922 from the “Eighth Canto” to Canto II, and can see – immediately – that the heaviest revision occurs in the opening lines, a revision that ushers in the now-iconic address to Robert Browning (the rhetorical and semantic implications of which can be processed by means of careful comparison of the two versions).

Variation is visualized in the integrated heat map, and is complemented by the Histogram function, allowing the reader to see exactly at which points the densest variation might occur in the canto. In this case, the beginning of the text bears the most acute variation, but other significant variations occur throughout the canto, including the final lines. To be able to see this at a glance is truly a powerful aid to scholars, even those intimately familiar with the textual state and history of this poem.

The complexity of Pound’s text is legendary, and not all bibliographic features can be captured in either codex or digital editions. Yet Juxta provides the means to collate Greek text, including diacritics (seen in the example above), and the increasingly substantial presence of Chinese in later instalments of the Cantos. Indeed, any element present in the Unicode palette can be deployed in a Juxta text file. While those ideograms drawn by hand (often incorrectly) and included in published editions of the Cantos are not represented in the text field, photographic reproductions of them can be added as Edit Notes at precisely where they occur in a particular canto.
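One practical detail with polytonic Greek is that the same accented letter can be stored either as a single precomposed code point or as a base letter followed by combining marks, and a naive comparison treats the two as different readings. A minimal Python sketch, assuming witnesses are compared as Unicode strings, normalizes both sides to NFC first. The function name is hypothetical.

```python
import unicodedata


def same_reading(a: str, b: str) -> bool:
    """Compare two readings after canonical (NFC) normalization.

    NFC makes precomposed and decomposed spellings of the same accented letter
    identical while keeping every diacritic, so real accent variants still differ.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u03bb\u03cc\u03b3\u03bf\u03c2"       # λόγος, omicron with tonos as one code point
decomposed = "\u03bb\u03bf\u0301\u03b3\u03bf\u03c2"  # same word, omicron + combining acute accent
```

Here `precomposed == decomposed` is False, but `same_reading(precomposed, decomposed)` is True. A reading without the accent still counts as a variant.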

These features provide excellent reasons for the digital variorum edition of Pound’s Cantos to employ Juxta. Potential development of an HTML applet – allowing for an integrated collation function within a web-based edition – is exciting news indeed.

Mark Byron
Department of English
University of Sydney, Australia

Working with non-Roman alphabets in Juxta

Now that Juxta 1.3 has been refined and released, the development team at NINES has been discussing new directions for the software. First and foremost is the adaptation of Juxta’s collating power for texts in languages other than English. Comparisons of texts in French and Italian work pretty well, but we’re still investigating how to handle diacritics so that such comparisons are more exact. However, it seems that scholars working with non-Roman alphabets have been left out of the conversation.
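One option for French or Italian is an optional normalization step that compares decomposed text with the combining accents removed and case folded. It is sketched here in Python as an assumption rather than a decided design. The trade-off is that genuine accent variants between witnesses would no longer be reported, so it would suit only some projects. The function names are hypothetical.

```python
import unicodedata


def fold(s: str) -> str:
    """Accent- and case-insensitive comparison key (an optional, lossy normalization)."""
    decomposed = unicodedata.normalize("NFD", s)  # e.g. é becomes e + combining acute
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped).casefold()


def same_ignoring_accents(a: str, b: str) -> bool:
    return fold(a) == fold(b)
```

With this key, “perché” and “perche” collate as the same reading. Characters without a canonical decomposition, such as “œ”, are left unchanged.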

Do any Juxta users out there have any experiences with foreign language collation to share with us?