<title>Title</title>
<short-title>Sigla</short-title>
<author>Author</author>
<editor>Editor</editor>
<source>Source</source>
<date>Date</date>
<notes>Your notes.</notes>
In addition to plain text files, documents can be loaded in a Juxta-specific XML format, which is depicted above. This format allows location markers (which could indicate pages, stanzas, paragraphs, lines, etc.) to be placed wherever they are needed and allows images to be associated with those markers. It also allows bibliographic information to be stored for each document (see Bibliographic Information Dialog).
The <bibliographic> section of the Juxta XML document is largely self-explanatory to scholars familiar with text markup. All fields must be included even if they contain no data.
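Because every field must be present even when it is empty, it can be useful to check a file before loading it. The following is a minimal sketch in Python, not part of Juxta. It assumes the field names shown in the sample at the top of this page; the function name and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Field names taken from the sample at the top of this page.
REQUIRED_FIELDS = ["title", "short-title", "author", "editor",
                   "source", "date", "notes"]


def missing_bibliographic_fields(bibliographic_xml):
    """Return the required field names that are absent from the section.

    An empty element such as <editor></editor> counts as present;
    only elements that are missing altogether are reported.
    """
    root = ET.fromstring(bibliographic_xml)
    present = {child.tag for child in root}
    return [name for name in REQUIRED_FIELDS if name not in present]


# Hypothetical section: <editor> and <notes> are empty, <source> is missing.
sample = """<bibliographic>
<title>Example Title</title>
<short-title>Ex</short-title>
<author>Example Author</author>
<editor></editor>
<date>1850</date>
<notes></notes>
</bibliographic>"""

missing = missing_bibliographic_fields(sample)
print(missing)
```

Here only the absent source field is reported; the empty editor and notes fields pass the check.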
The <text> section contains the actual text of the document. The text to be compared is not nested within a hierarchy of XML tags; it is still, essentially, plain text. Juxta continues to interpret each line break as the end of a prose paragraph or the end of a line of verse. Within this section, spans of text can be marked with location markers (those familiar with the Text Encoding Initiative may prefer to think of them as “milestones”). A location marker is specified using a pair of unary (empty) tags related by a common reference id. Juxta uses two unary tags instead of a single element with opening and closing tags to avoid the problems inherent in a strictly hierarchical marking system, in which marked spans cannot overlap. Location marker tags have the following format:
The <m_s> tag is a start tag (“m_s” stands for “milestone start”). All text following this tag will be included in the marked location until the corresponding end tag is found. The <m_s> tag has the following attributes:
“id” – This is a unique identifier for this location. No other <m_s> tag in this document may share this identifier.
“type” – This is usually a short prefix that denotes the type of marker. Its primary effect is on the numbering of locations. See the “Damozel” sample for examples of page, stanza, and line numbering.
“n” (number) – This is the ordinal number of this location. This attribute is optional; if it is not specified then the number for this location is considered to be one greater than the number of the previous section of the same type.
“img” (image) – This is the image to associate with this location. This attribute is also optional. If it is specified, the image file must reside in a sub-directory named “images” within the folder containing the XML file. Juxta can read JPG and GIF file formats.
This is an end tag. It ends the location marked by the <m_s> tag with the “id” attribute equal to this tag’s “refid”.
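The attribute rules above can be illustrated with a short Python sketch. It is not Juxta's own parser: it scans for <m_s> tags with regular expressions, ignores end tags, and makes several assumptions that the text above does not settle. It assumes the tags are written in self-closing form, that “n” is an integer, and that the first unnumbered location of a given type is numbered 1. The function name, file names, and type values are hypothetical.

```python
import re
from pathlib import Path

# Matches a start tag such as <m_s id="p1" type="page"/> (assumed syntax).
M_S_TAG = re.compile(r"<m_s\b([^>]*)/?>")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def read_locations(text, xml_path):
    """Collect the locations opened by <m_s> tags in a <text> section."""
    last_number = {}  # last number seen for each marker type
    seen_ids = set()
    locations = []
    for match in M_S_TAG.finditer(text):
        attrs = dict(ATTRIBUTE.findall(match.group(1)))
        marker_id = attrs.get("id")
        if marker_id in seen_ids:
            # "id" must be unique among the <m_s> tags of a document.
            raise ValueError("duplicate id: %s" % marker_id)
        seen_ids.add(marker_id)

        kind = attrs.get("type", "")
        if "n" in attrs:
            number = int(attrs["n"])
        else:
            # One greater than the previous location of the same type.
            # Starting from 1 is an assumption of this sketch.
            number = last_number.get(kind, 0) + 1
        last_number[kind] = number

        image = None
        if "img" in attrs:
            # Images live in an "images" folder next to the XML file.
            image = Path(xml_path).parent / "images" / attrs["img"]

        locations.append({"id": marker_id, "type": kind,
                          "n": number, "image": image})
    return locations


# Hypothetical text section with three start tags.
sample = (
    '<m_s id="p1" type="page" n="5" img="page5.jpg"/>First line\n'
    '<m_s id="p2" type="page"/>Second line\n'
    '<m_s id="s1" type="stanza"/>Third line\n'
)
locations = read_locations(sample, "texts/poem.xml")
for location in locations:
    print(location)
```

In this sample the second page marker has no “n” attribute, so it is numbered 6, one greater than the page before it, while the stanza marker starts its own sequence.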
There is one thing that every scholar who prepares digital texts should know about “plain text” files: there is no such thing as a plain text file. Every text file, whether or not it says so explicitly, is stored on disk in some character encoding. Common encodings include UTF-8 and CP-1252.
This can cause problems when a file moves from one computer or operating system to another, because different operating systems make different assumptions about how “plain text” or “ASCII” files are encoded. For example, files prepared on Windows machines with applications like Notepad may be saved in CP-1252, a Microsoft Windows™ specific encoding. If these files are then shared across the network to a Mac and opened as “plain text”, they may be displayed incorrectly. Characters such as “ö” and “æ” may appear like this:
We recommend using a cross-platform compatible encoding format such as UTF-8. When Juxta loads plain text files, it assumes that they are encoded in this format. Most text editing programs will allow you to specify the encoding type of a plain text file. For example, to generate a UTF-8 encoded text file from Microsoft Word 2002, take the following steps:
1) Select “Save” or “Save As...” and then select “Plain Text (.txt)” as the type of the file.

2) The dialog below will appear. Select “Other encoding” and then select “Unicode (UTF-8)” from the list of encodings.

3) Click OK to save.

Juxta also allows the user to specify the encoding of a file when it is loaded. See the section on the Add Document Dialog for more information.
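Files that already exist in another encoding can also be converted with a short script. The following Python sketch is one possible approach and is not part of Juxta. It assumes the source text is in CP-1252; the function name is hypothetical. The last lines show the kind of garbling that results when bytes are read with the wrong encoding.

```python
def convert_to_utf8(raw, source_encoding="cp1252"):
    """Re-encode raw bytes from source_encoding as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Bytes as a Windows editor might save them in CP-1252.
cp1252_bytes = "ö and æ".encode("cp1252")
utf8_bytes = convert_to_utf8(cp1252_bytes)

# Read back as UTF-8, the text is intact.
restored = utf8_bytes.decode("utf-8")

# Reading UTF-8 bytes as CP-1252 turns each accented character into
# two unrelated characters.
garbled = "ö".encode("utf-8").decode("cp1252")

print(restored)
print(garbled)
```

To convert a file on disk, read it in binary mode, pass the bytes to the function, and write the result back in binary mode. Keep a copy of the original, since decoding with the wrong source encoding may silently produce incorrect characters.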