The Dictionary of Love
Edited by Emily K. Davis

Center for Applied Technologies in the Humanities

Presenting the Dictionary of Love as a Digital Text using XML

In his Introduction to the Harleian Miscellany (1744) Samuel Johnson argues that "small tracts and fugitive pieces" are as requisite to understanding a people as are great works of literature:

From pamphlets, consequently, are to be learned the progress of every debate; the various state to which the questions have been changed; the artifices and fallacies which have been used, and the subterfuges by which reason has been eluded: in such writings may be seen how the mind has been opened by degrees, how one truth has led to another, how error has been disentangled, and hints improved to demonstration, which pleasure, and many others, are lost by him that only reads the larger writers, by whom these scattered sentiments are collected, who will see none of the changes of fortune which every opinion has passed through. (242)

The Dictionary of Love is one such example: it records obscure information about manners and linguistic usage that proves very useful for interpreting eighteenth-century plays, novels, and essays. Furthermore, a digital publication, unlike a printed volume, can make this information much more accessible and format it in ways more useful for modern students.

In an essay published in 1996, Peter Shillingsburg argues that traditional scholarly editions, that is, those in codex form, have failed to "[revolutionize] literary criticism and ... do not fulfill the needs or desires of the cognoscente..." (24). He writes that the most obvious advantage of printed editions over electronic ones—that "they can be read by the unassisted naked eye"—cannot compensate for the more comprehensive research possibilities offered by digital compilations (24). Although, technologically speaking, anything ten years old is considered obsolete, Shillingsburg's analysis of the functions and benefits of the electronic edition is still apposite, and I begin my justification for producing a digital edition of The Dictionary of Love (1753) here.

This project is a resource for studying eighteenth-century English gender roles and the rituals and guidelines of courtship. I am presenting not merely a text to be examined on its own but a reference work to be consulted while reading other texts. The digital medium is particularly useful for presenting reference works like dictionaries and encyclopedias, since it encourages users to, as Shillingsburg puts it, "copy parts and react to parts, and to reconfigure parts, and to leave our two cents' worth in the margins for the benefit of posterity" (29).

An electronic edition is more accessible than a book in a library. The Dictionary of Love (DOL) and its subsequent editions are not available as reprinted books; copies exist only on microfilm in several university libraries or in rare book collections. Microfilm provides a relatively primitive scanning and viewing apparatus, and special collections understandably place numerous restrictions on the handling of their materials. C. M. Sperberg-McQueen points out, in his rationale for the use of a standardized markup language, that "scholarship is materially easier when we can have both books [editions] out on our desk at the same time for direct comparison" (57). He recalls the past practice of "libraries... chain[ing] books to the shelves" to illustrate the problems of working with digital editions that are technologically incompatible with one another (57). The DOL is not even comparable to a book chained to a shelf; it is in fact more difficult to find and less convenient to study, having never been reprinted in modern times. An electronic edition therefore seems the most viable option for publishing this text now.

Printed editions, which can contain several volumes, are frequently cumbersome and expensive, and even the largest and costliest tome may not include all of the relevant scholarship associated with a particular text. The substantial monetary and time investments necessary to produce such editions often discourage their publication at anything more frequent than ten-year intervals (Sperberg-McQueen 46). Hence, "our libraries are full of current editions twenty-five, fifty, or one hundred years old, and of editions even older which continue to be consulted although no longer current" (Sperberg-McQueen 46). The work of preparing a digital edition is typically an ongoing process that continues even after publication; in what follows I describe the initial steps taken to put the DOL online in a form useful for students of eighteenth-century literature.

Even a cursory examination of text encoding schemes and webpage layout guidelines reveals a number of options for creating a digital edition. Among these, SGML has become the customary encoding mechanism in humanities research.

Sperberg-McQueen succinctly describes the benefits of this language: "the structure of SGML markup makes it possible for markup to become much more elaborate and subtle, without overwhelming the ability of either software or users to deal with the complexity" (56). An SGML-based editor permits users to define element tags that parse the text according to the types of analysis they wish to do or expect other users to do. In Hockey's words, SGML markup is "descriptive," and it allows programs to perform "functions, such as indexing, searching, printing and hypertext linking" on the semantically encoded text (33). Users can essentially customize their searches in digital text based on their research goals; many different applications can be carried out on a single encoded file without altering that file (Hockey 33). SGML files also require a DTD, which provides a description of the elements and their attributes and indicates how all of the elements are related. Moreover, the DTD provides instruction about how to organize the SGML files; otherwise, the files are just pieces of data that do not inherently suggest any particular relationship or convey any clear information (Hockey 34-35). This organization is represented by how the tags are nested, or hierarchically arranged, and is not necessarily pre-determined. In short, the SGML schema puts nearly all of the decisions (from creating and arranging the tags to deciding how that information should appear on a webpage) into the hands of the encoder.

My project utilizes XML, a subset of SGML. XML developed largely as a result of the work done on the Text Encoding Initiative (TEI), a project that aims to develop encoding guidelines, in the form of DTDs and tag sets, for digital humanities research (Short 17). The World Wide Web Consortium's approval of the first version of XML in 1998 suggested that it would also revolutionize data organization and representation in the mathematical and scientific fields (Mackenzie, Sikorski & Peters). XML is not only a technical means for disseminating documents; in many applications it is also a tool for analyzing the information contained in them. One could argue that XML's primary contribution is not related to computer science as much as it is to general critical thinking. It is an example of what Rockwell describes as "this intellectual process of iteratively trying questions and adapting tools to help us ask new questions" (211). Short's prediction for XML also reflects this sentiment:

As the wider XML 'revolution' gathers pace, there are signs that some of the long-term significance of the TEI will be related to XML, and the opportunities it is starting to bring to textual scholarship, not only in burgeoning quantities of encoded texts...but also in the development of new tools to exploit them. (17)

Thus, XML provides a flexible structure that users can optimize based on their research needs and a clear syntax that computers can understand and process. After choosing to use XML, the next step was to develop a tag set that would emphasize the pertinent elements of the DOL and create a foundation for putting it on the Web.

Parsing the Text of the Dictionary of Love using XML Tags

Escaping the "pre-defined tag sets" (Hockey 33) of markup languages like HTML is extremely advantageous because it allows editors to customize their tags based on the text and on what elements of the text they wish to highlight. However, the tags must still be arranged hierarchically, and balancing the issues of accurately representing the text, analyzing that text and creating a manageable, consistent tag set can be challenging.

I began by noting the broadest descriptions of the elements I wanted to encode: the main entry, which includes the word being defined, the definition and the analysis based on that definition. At the most basic level, every record in the XML file would include the entry number (<number>) and the main word of the dictionary entry (<mainword>). The next step was to decide which format- or content-related aspects of the entry I wanted to highlight. In making this decision, I considered who my audience was likely to be and what types of analysis the text might be useful for.
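As a rough illustration of that minimal structure, the following Python sketch uses the standard library's xml.etree.ElementTree to assemble one record. The element names <record> and <definition> also appear in the stylesheet discussed later in this essay; the helper make_record, the entry number and the abbreviated definition are hypothetical, chosen only to show the nesting.

```python
import xml.etree.ElementTree as ET

def make_record(number, mainword, definition_text):
    """Build a minimal <record> holding the fields every entry carries.

    Hypothetical helper for illustration; not the project's actual tooling.
    """
    record = ET.Element("record")
    ET.SubElement(record, "number").text = str(number)
    ET.SubElement(record, "mainword").text = mainword
    ET.SubElement(record, "definition").text = definition_text
    return record

# Hypothetical entry number; the text is the opening of 'Difficulties' quoted in this essay.
rec = make_record(1, "Difficulties", "They are the zest of a passion")
print(ET.tostring(rec, encoding="unicode"))
# <record><number>1</number><mainword>Difficulties</mainword><definition>They are the zest of a passion</definition></record>
```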

Students studying the literature or the courtship history of eighteenth-century England are my primary audience. I also concluded that sociolinguists and lexicographers might find the dictionary an interesting document for the study of language and gender and semantic development.

With these potential users in mind, I attempted to choose tags to be nested within the definition element that would facilitate various analyses of the text. For example, these tags would encode dialogue, sub-terms and sub-definitions, proper names, uses of foreign language phrases and uses of the main word in an exemplary context. Once the information was coded, users could search for relevant entries without having to scroll through a long document. Choosing the elements to be nested under the main analysis tag (<analysis>) was simpler because the tags did not have to correspond to an exact feature of the printed text; rather, these tags would contain pieces of information that I supplied.

The tags originally included <gender>, <pos> (part of speech of the main word) and <courtshipstage>. For example, as defined by my DTD, the content of the <gender> tag can be 'male' or 'female.' The <courtshipstage> tag is further divided into the segments <starting>, <negotiating> and <outcome>, each of which has three possible choices: starting (meeting, wooing, flattering); negotiating (dating, conflict, jealousy); and outcome (marriage, sex, breakup). For example, 'difficulties' is defined as follows:

They are the zest of a passion, that would often flatten, languish, and die without them. They are like hills, and tufts of trees, interspersed in a country, that interrupt the prospect, only to make it the more agreeable.

In the <courtshipstage> tag this entry might be represented like this: <courtshipstage><negotiating>conflict</negotiating></courtshipstage>. Users could then bring up all the entries that dealt with negotiating or, more specifically, with conflict. Also included under the <analysis> element are the <index>, <litex> (contemporary literary example), <comment> and <variation> (between editions) tags, which allow me to include additional information that may be useful in understanding or searching for entries.
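To make that kind of search concrete, here is a minimal Python sketch (standard-library ElementTree) that returns every entry filed under a given stage, optionally narrowed to one value such as 'conflict'. It assumes each record nests <courtshipstage> inside <analysis> and stores the stage as element text; the sample markup, including two <starting> elements for 'To Adore', is illustrative rather than copied from the project file.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: stage assignments follow the essay's examples,
# but the <number> values and the exact markup are assumptions.
SAMPLE = """<dictionary>
  <record>
    <number>1</number><mainword>Difficulties</mainword>
    <analysis><courtshipstage><negotiating>conflict</negotiating></courtshipstage></analysis>
  </record>
  <record>
    <number>2</number><mainword>To Adore</mainword>
    <analysis><courtshipstage><starting>wooing</starting><starting>flattering</starting></courtshipstage></analysis>
  </record>
</dictionary>"""

def entries_for_stage(root, stage, value=None):
    """Return main words whose <courtshipstage> contains <stage>,
    optionally restricted to one value such as 'conflict'."""
    hits = []
    for record in root.iter("record"):
        for elem in record.findall("analysis/courtshipstage/" + stage):
            if value is None or (elem.text or "").strip() == value:
                hits.append(record.findtext("mainword"))
                break  # list each entry only once
    return hits

root = ET.fromstring(SAMPLE)
print(entries_for_stage(root, "negotiating"))             # ['Difficulties']
print(entries_for_stage(root, "starting", "flattering"))  # ['To Adore']
```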

Other tags that do not nest within the <definition> or the <analysis> element include <syn> (synonym), <explanation>, <maxim> and <reference>. These elements contain information that is found, as I perceive it, outside of the body of the definition. For example, in the header, the main word 'forsake' is followed by the phrase "to quit, leave, desert, cast off," after which the definition of the word begins. To accommodate this header information consistently, I chose to eliminate most of the function words used in this context and encode each word as a separate synonym. The code for this entry would look like this:

<syn>quit</syn>
<syn>leave</syn>
<syn>desert</syn>
<syn>cast off</syn>
(start <definition> tag here)
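The header-splitting step for 'forsake' can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used for the edition: the helper header_to_syns and its small list of function words are hypothetical, and it drops only function words at the start of each comma-separated item.

```python
import xml.etree.ElementTree as ET

# Hypothetical stop list; the edition's real choices were made by hand.
FUNCTION_WORDS = {"to", "a", "an", "the"}

def header_to_syns(header):
    """Split a comma-separated header phrase into one <syn> element per term."""
    syns = []
    for part in header.split(","):
        words = part.split()
        # drop leading function words, e.g. 'to' in 'to quit'
        while words and words[0].lower() in FUNCTION_WORDS:
            words = words[1:]
        if words:
            syn = ET.Element("syn")
            syn.text = " ".join(words)
            syns.append(syn)
    return syns

syns = header_to_syns("to quit, leave, desert, cast off")
print([s.text for s in syns])  # ['quit', 'leave', 'desert', 'cast off']
```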
The final example shows how the <reference> tag functions:

In addition to the semantic tags, I also included several basic formatting tags to indicate paragraphs, emphasis and lists. These can occur within either the <definition> or the <analysis> elements. Such tags reflect the formatting of the original document, but they can also be used to reformat the information on a webpage by substituting HTML code for the XML tags.
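A minimal sketch of that substitution in Python, for illustration only: the formatting tag names on the left of TAG_MAP (<para>, <emph>, <list>, <item>) are hypothetical stand-ins, since the actual names are not given here, and any unmapped element is rendered as a <div>. In the finished site this renaming would more likely live in the XSLT stylesheet itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical formatting-tag names mapped to their HTML counterparts.
TAG_MAP = {"para": "p", "emph": "em", "list": "ul", "item": "li"}

def to_html(elem):
    """Return a renamed copy of elem: mapped tags become their HTML
    equivalents, anything else (e.g. <definition>) becomes <div>.
    Text and tails are kept so the prose reads the same."""
    out = ET.Element(TAG_MAP.get(elem.tag, "div"))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(to_html(child))
    return out

src = ET.fromstring("<definition><para>The <emph>zest</emph> of a passion.</para></definition>")
print(ET.tostring(to_html(src), encoding="unicode"))
# <div><p>The <em>zest</em> of a passion.</p></div>
```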

In the process of marking up the text, I discovered that certain tags or arrangements would not work and had to be revised or, in some cases, removed altogether. Many of the necessary changes involved rewriting code to allow a term to have more than one entry for analytical elements like <pos> or <courtshipstage>.

For instance, the definition of the term "To Adore" might qualify, depending on the reader's interpretation, as an example of 'flattering' or 'wooing' or both, within the <courtshipstage> tag:

This sacred word is adopted into the love-language, and proves two things.
First, That the men are perfectly knowing, and acquainted with the vanity of women, who are apt to take themselves for little goddesses, or at least divine creatures.
The Second, That they are not sparing for any expressions they think may make them lose the small share of sense their vanity may have left them.
I love: love did I say? I adore you! The true meaning of which fine speech is, "The secret of pleasing consists in flattering your self-love, at the expence of your understanding. I am straining hard to persuade you, that you have distracted my brain; not that it is so in the least; but, whilst I laugh at you in my sleeve, for your swallowing this stuff, I may gain wherewith to laugh at you in good earnest.

I therefore adjusted the DTD to allow the <courtshipstage> elements to occur more than once so that the tags could contain two or more stages. The code now reads:

<!ELEMENT courtshipstage (#PCDATA | starting | negotiating | outcome)*>
<!ELEMENT starting (meeting | wooing | flattering)*>
<!ELEMENT negotiating (dating | conflict | rivalry)*>
<!ELEMENT outcome (marriage | sex | breakup)*>

The asterisks indicate that the <courtshipstage> element can contain zero or more occurrences of the <starting>, <negotiating> and <outcome> elements and that, in turn, each of those elements can contain zero or more occurrences of the items listed within its corresponding parentheses.
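A caveat about these declarations: in an XML DTD, names inside a content model such as (dating | conflict | rivalry)* refer to child elements, and a DTD can enumerate permitted values only for attributes, not for element text. If stages are recorded as text, as in <negotiating>conflict</negotiating>, the controlled vocabulary has to be enforced some other way. One possibility, sketched here in Python under that assumption, is a small check that reports any stage value outside the lists given in the DTD. The helper check_stages and the record for the hypothetical main word 'Example' are illustrative; 'jealousy' comes from the original list of negotiating values.

```python
import xml.etree.ElementTree as ET

# Allowed values copied from the DTD above. This check assumes stages are
# stored as element text, e.g. <negotiating>conflict</negotiating>.
ALLOWED = {
    "starting": {"meeting", "wooing", "flattering"},
    "negotiating": {"dating", "conflict", "rivalry"},
    "outcome": {"marriage", "sex", "breakup"},
}

def check_stages(root):
    """Return (mainword, stage, value) triples whose value is not allowed."""
    problems = []
    for record in root.iter("record"):
        word = record.findtext("mainword")
        for cs in record.iter("courtshipstage"):
            for stage in cs:
                value = (stage.text or "").strip()
                if value not in ALLOWED.get(stage.tag, set()):
                    problems.append((word, stage.tag, value))
    return problems

doc = ET.fromstring(
    "<dictionary>"
    "<record><mainword>Difficulties</mainword><analysis><courtshipstage>"
    "<negotiating>conflict</negotiating></courtshipstage></analysis></record>"
    "<record><mainword>Example</mainword><analysis><courtshipstage>"
    "<negotiating>jealousy</negotiating></courtshipstage></analysis></record>"
    "</dictionary>"
)
print(check_stages(doc))  # [('Example', 'negotiating', 'jealousy')]
```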

It also became clear that some of the analytical categories would benefit from having expanded tags that included more specific details about that particular element. Within the <pos> tag, for example, I nested additional elements to describe, if the term was a noun, whether it was a person, emotion, behavior or thing. This would allow users to search only for, say, emotions, rather than having to pull up all the nouns and sift through them. Entries like 'Caprice,' 'Declaration' and 'Haughtiness' might be designated <pos><noun>behavior</noun></pos>; 'Delicacy,' 'Lust' and 'Zeal' may be represented as <pos><noun>emotion</noun></pos>, and so forth.
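The benefit of the nested noun types can be illustrated with a short Python sketch that groups entries by the subtype recorded in <pos><noun>...</noun></pos>. The three sample records reuse classifications suggested above; placing <pos> inside <analysis> follows the description of the analytical tags, and the helper name nouns_by_kind is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample records; the behavior/emotion labels follow the essay's examples.
SAMPLE = """<dictionary>
<record><mainword>Caprice</mainword><analysis><pos><noun>behavior</noun></pos></analysis></record>
<record><mainword>Zeal</mainword><analysis><pos><noun>emotion</noun></pos></analysis></record>
<record><mainword>Lust</mainword><analysis><pos><noun>emotion</noun></pos></analysis></record>
</dictionary>"""

def nouns_by_kind(root):
    """Group main words by the subtype inside <pos><noun>...</noun></pos>."""
    groups = defaultdict(list)
    for record in root.iter("record"):
        for noun in record.findall("analysis/pos/noun"):
            groups[(noun.text or "").strip()].append(record.findtext("mainword"))
    return dict(groups)

groups = nouns_by_kind(ET.fromstring(SAMPLE))
print(groups["emotion"])  # ['Zeal', 'Lust']
```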

I rejected several tag ideas because they would almost certainly have been too obscure or specific to be useful. For example, I considered tags such as <actor> and <acted on>, which would have been nested within <pos><verb></verb></pos> and would have contained the content 'male,' 'female' or 'both.' Attempting to encode the text with them called their utility into question: since there was no question of homosexuality in the DOL, at least as far as I have discovered, noting the gender of the participants in an exchange did not need to be a priority. Although it may be worthwhile to revisit this idea in the future, it was not a fundamental enough distinction to warrant inclusion in the first draft of this project. Moreover, the purpose of the tag was not lucid, and it is unlikely, in this case, that most people would have thought to search by the same terms that I used to encode. In short, these tags were neither functional nor universal enough to make the first cut.

Putting the Document on the Web

The last major stage of creating a website with XML involves the transformation language XSL. XSLT stylesheets act on XML files somewhat as CSS stylesheets act on HTML-encoded material. Webpages are generally written in some form of HTML. Unlike XML, however, HTML uses a predefined tag set and is therefore ill-suited to analytical markup. An XSLT stylesheet transforms XML files into XHTML, which follows the stricter syntax rules of XML but shares the tags of HTML, so the result can be displayed as a webpage.

According to the W3C (World Wide Web Consortium), XSL is made up of three parts: XSLT, a language for transforming XML documents (in this project, into XHTML); XPath, a language for selecting elements and attributes in an XML document; and XSL-FO, a language for formatting XML, which I am not using in this project.

The basic pages for individual letters display a simple table, with each word and its definition in two columns of a single row; however, I would like to modify that to display the entries more as they appear in the printed text. That is, I would like to list the main word, follow it with the list of synonyms, maxims, explanations or phrases, and place the definition last, in a vertical arrangement. Additionally, where an entry differs among editions, I would like to include a link to the variations; some words do not change from edition to edition.

Here is an example of the code for a page that displays, in a table, all of the terms that begin with the letter 'B' and their definitions:

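<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- DWXMLSource="AllWordsCode.xml" -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Page template: starts from the root element <dictionary> -->
<xsl:template name="BPage" match="/dictionary">
<html>
<head><title>DOL Entries "B"</title></head>
<body>
<div align="justify" id="header">
<h1><font size="+4" color="#CC0000">"B"</font></h1>
</div>
<table width="600" border="1">
<tr><th>Entry</th><th>Definition</th></tr>
<!-- Fill the table by applying the "record" template to each record -->
<xsl:apply-templates select="record"/>
</table>
</body>
</html>
</xsl:template>

<!-- Record template: one table row per record whose main word begins with 'B' -->
<xsl:template match="record">
<xsl:if test="starts-with(mainword, 'B')">
<tr>
<td><xsl:value-of select="mainword"/></td>
<td><xsl:value-of select="definition"/></td>
</tr>
</xsl:if>
</xsl:template>

</xsl:stylesheet>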

Tags such as <xsl:template> and <xsl:value-of> are examples of the XSLT component of XSL. The element <xsl:template> is fairly self-explanatory: its content describes how the XML code is to be represented in XHTML. As always in XML, it is all about nesting tags. For example, this template says that below the page heading, the body of the page should include a table that has two columns, one for the 'Entry' or dictionary term and one for the term's 'Definition.' Because all of the information is already recorded and encoded in the XML document, it is only necessary to direct the browser to it; it does not need to be written out again here. Thus, the code states that the content of the table is produced by applying a template to every <record> element in the XML file (<xsl:apply-templates select="record"/>).

Then, however, the coder must further specify what the "record" template is supposed to include. The attribute 'match' indicates which element the template applies to: here, the template nested within the main page is <xsl:template match="record">. For this page, I wanted to list only the 'B' terms, so the code indicates that, out of all the records in the XML file, this template should output only those whose <mainword> entry begins with the letter 'B': <xsl:if test="starts-with(mainword, 'B')">. Then, for each record that meets this criterion, the browser should pull the "mainword" entry and the "definition" entry and format them as cells in a table row:

<td><xsl:value-of select="mainword"/></td>
<td><xsl:value-of select="definition"/></td>
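Python's standard library has no XSLT processor, so the following is only a rough analogue of what the "record" template does, not a substitute for the stylesheet: it keeps the records whose <mainword> begins with a given letter and emits one table row per record. The helper names and the sample records (with placeholder definitions) are hypothetical; the letter 'D' is used because several entries cited in this essay begin with it.

```python
import xml.etree.ElementTree as ET

def letter_rows(root, letter):
    """Mimic the "record" template: one (mainword, definition) pair for each
    record whose <mainword> starts with `letter`."""
    rows = []
    for record in root.findall("record"):          # like select="record"
        word = record.findtext("mainword", default="")
        if word.startswith(letter):                 # like starts-with(mainword, 'B')
            rows.append((word, record.findtext("definition", default="")))
    return rows

def rows_to_html(rows):
    """Render the pairs as an HTML table string; ElementTree escapes
    special characters such as & and < in the cell text."""
    table = ET.Element("table", border="1")
    for word, definition in rows:
        tr = ET.SubElement(table, "tr")
        ET.SubElement(tr, "td").text = word
        ET.SubElement(tr, "td").text = definition
    return ET.tostring(table, encoding="unicode")

doc = ET.fromstring(
    "<dictionary>"
    "<record><mainword>Caprice</mainword><definition>[definition text]</definition></record>"
    "<record><mainword>Delicacy</mainword><definition>[definition text]</definition></record>"
    "</dictionary>"
)
print(rows_to_html(letter_rows(doc, "D")))
# <table border="1"><tr><td>Delicacy</td><td>[definition text]</td></tr></table>
```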

The way that the information is written in the <xsl:> tags illustrates XPath expressions. For example, the first line of code, after attaching the XML file and giving the XSL declaration, is <xsl:template name="BPage" match="/dictionary">. The forward slash before 'dictionary' tells the browser that in this template, everything is going to be specified from, or follow, the root element 'dictionary.' This is an example of an absolute path. If there is no leading slash, as in the tag <xsl:template match="record">, it is a relative path, and it says, basically, that this template starts at the node with this name (i.e. "record"), not at the root of the document. In this way, XPath expressions tell the browser where to find information: they follow the hierarchy of the XML document (as laid out in the DTD) and map a "path" based on how the elements are nested and arranged.
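ElementTree supports only a limited subset of XPath and evaluates every path relative to the element it is called on, so it cannot reproduce an XSLT match pattern exactly. Still, the contrast between starting at the root and resolving a name from the current node can be sketched this way; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<dictionary>"
    "<record><mainword>Caprice</mainword></record>"
    "<record><mainword>Zeal</mainword></record>"
    "</dictionary>"
)

# Paths are resolved from the element they are called on. Starting from the
# root element stands in for XSLT's absolute "/dictionary":
assert doc.tag == "dictionary"
records = doc.findall("record")           # roughly /dictionary/record

# Inside each record, "mainword" is resolved from that record, much as
# select="mainword" is resolved from the node matched by match="record".
words = [r.findtext("mainword") for r in records]
print(words)                              # ['Caprice', 'Zeal']

# ".//mainword" searches all descendants of the root, at any depth.
print(len(doc.findall(".//mainword")))    # 2
```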

The analytical elements will essentially serve as index terms, and the homepage will contain a menu from which users can choose part of speech, courtship stage, literary example, etc. For example, if a user chooses 'wooing' from the courtship stage menu, they will be taken to a list of linked terms that fall into that category. From there, they can navigate to each of the terms. Additionally, the homepage will contain links to the interpretive essays about the dictionary and the project. Any term from the DOL that appears in the essays will be linked so that users can select it and see the entry.
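One possible way to derive such a menu from the encoded file, sketched in Python under stated assumptions: build an index that maps each courtship-stage value to the entries tagged with it, which a page generator could then turn into menu items and lists of links. The function build_stage_index and the sample records are hypothetical, and generating the actual HTML links is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_stage_index(root):
    """Map each courtship-stage value (e.g. 'wooing') to the sorted main
    words tagged with it, with keys in alphabetical order for a menu."""
    index = defaultdict(set)
    for record in root.iter("record"):
        word = record.findtext("mainword")
        # '*' matches any stage element: <starting>, <negotiating>, <outcome>
        for stage in record.findall("analysis/courtshipstage/*"):
            index[(stage.text or "").strip()].add(word)
    return {value: sorted(words) for value, words in sorted(index.items())}

doc = ET.fromstring(
    "<dictionary>"
    "<record><mainword>To Adore</mainword><analysis><courtshipstage>"
    "<starting>wooing</starting><starting>flattering</starting>"
    "</courtshipstage></analysis></record>"
    "<record><mainword>Difficulties</mainword><analysis><courtshipstage>"
    "<negotiating>conflict</negotiating></courtshipstage></analysis></record>"
    "</dictionary>"
)
print(build_stage_index(doc))
# {'conflict': ['Difficulties'], 'flattering': ['To Adore'], 'wooing': ['To Adore']}
```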

Although my website at present is very basic, more detailed XSLTs, and later CSS, can be used to create stylistically sophisticated pages that focus as much on design as they do on content. I am still in the process of deciding how to present the analytical content of the XML document, such as the textual variations, parts of speech and literary examples. In the coming months, I will experiment with various fonts, page layouts, graphics and other web-design features to create a web resource that I hope provides useful material in an engaging and dynamic fashion.