About the project
Part I: Technology and Techniques
Our intention for this project is to provide a reader-friendly display of the novel Martie the Unconquered by Kathleen Norris and to develop tools that assist in gathering textual data for literary analysis. We achieved the accessible display by using XSLT, JavaScript, and CSS to transform and display the original TEI XML-encoded file; the website itself is built with XHTML and CSS. The site also provides two tools for literary analysis: a concordance tool and a keyword-in-context tool. Both tools can search the full text, and both offer the option of searching only within the spoken words of a single character. All tools created for this project were developed in the scripting language PHP. They are not text-specific, so they can be used to analyze other XML-encoded texts.
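As a rough illustration of what a keyword-in-context search does, the following Python sketch lists each occurrence of a word together with a few words on either side. It is not the project's PHP code: the function name `kwic`, the `width` parameter, and the sample sentence (invented for this example, not taken from the novel) are all assumptions.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, match, right context) for each occurrence of keyword."""
    # Split the text into simple word tokens (letters and apostrophes only).
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

# Invented sample sentence, used only to exercise the function.
sample = "Martie laughed. Nobody in Monroe ever laughed quite as Martie did."

for left, match, right in kwic(sample, "laughed", width=2):
    print(f"{left:>15} [{match}] {right}")
```

A production tool would also need to report where each hit occurs (page or chapter), which this sketch leaves out.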
Our group encountered material difficulties at each stage of the project's development. During our initial consideration of the novel, a fair number of tools and encoding techniques were proposed. All of the ideas at this stage were exciting, but not all, as we discovered, were doable, viable, or useful. Most of the impractical ideas, such as graphical displays of data, were recognized as beyond our abilities during the project's conceptualization. A few good ideas turned out to be obstacles to our progress and had to be revised to account for limitations of time and skill. One such instance occurred during the development of the user interface. Upon consultation with our professor, we learned that our user-friendly layout would be very difficult to develop using PHP. At this point the group decided to branch out into JavaScript, which we found more complicated but which is well suited to operations that run in the browser and do not require a server. Likewise, questions of proper encoding logic arose during the encoding process. Our decision to tag all quotes within the text presented difficulties when a quote spanned paragraphs or pages. We solved this problem by using a "part" attribute within the quote tag to mark the portions of a single quote that had been separated. In addition, our decision to encode page breaks, based on the first edition printing of the novel, led to a few problems. In this edition, many page breaks occur in the middle of words. Not only did this raise theoretical issues of maintaining the sanctity of the original text (described in the following abstract), but it also had the potential to impede the effectiveness of our search tools, since a word split across a page break would not match a search for the whole word. We resolved this dilemma by recording the hyphens as metadata, using a "hyphen" attribute in the page break tag.
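The two encoding solutions described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's actual markup or code. The element names `quote` and `pb`, the `who` speaker attribute, the `hyphen="true"` value, and the `part` values `I` (initial), `M` (medial), and `F` (final), which follow general TEI convention, are all assumed. The sample text is invented, and TEI namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Invented sample markup; element and attribute values are assumptions.
SAMPLE = """<div>
<p><quote who="martie" part="I">I will not,</quote> she said, <quote who="martie" part="F">go back.</quote></p>
<p>She was quite unconquer<pb n="12" hyphen="true"/>able.</p>
</div>"""

def render(elem, for_search):
    """Flatten an element to text. Page breaks vanish for searching,
    and reappear as a hyphen plus page marker for display."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "pb":
            if not for_search:
                hyphen = "-" if child.get("hyphen") == "true" else ""
                parts.append(f"{hyphen}[page {child.get('n')}]")
        else:
            parts.append(render(child, for_search))
        parts.append(child.tail or "")
    return "".join(parts)

def utterances(root):
    """Rejoin quote fragments marked part=I/M/F into whole utterances."""
    done, open_parts = [], {}
    for quote in root.iter("quote"):
        who, part = quote.get("who"), quote.get("part", "N")
        text = " ".join(render(quote, True).split())
        if part in ("I", "M"):
            # Hold the fragment until the final part for this speaker arrives.
            open_parts.setdefault(who, []).append(text)
        elif part == "F":
            pieces = open_parts.pop(who, []) + [text]
            done.append((who, " ".join(pieces)))
        else:
            done.append((who, text))
    return done

root = ET.fromstring(SAMPLE)
split_paragraph = root.findall("p")[1]
print(render(split_paragraph, for_search=True))
print(render(split_paragraph, for_search=False))
print(utterances(root))
```

Keeping the hyphen in an attribute and out of the text stream means the search text contains the whole word, while the display can still restore the hyphen where the first edition printed it.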
Part II: Theory and Possibilities
Our project provides text analysis tools and multiple digital formats, including a reader-friendly display, of the novel Martie the Unconquered by Kathleen Norris. The project requirements call for the development of a concordance tool and a keyword-in-context tool for the full text. The group decided to provide the additional option of limiting these tools to a single character's quotes in the text. This option raises an interesting question: how might it prove useful to a user's research? A concordance of a character's speech throughout a novel provides data that the user can manipulate to develop new ideas about the text. One could, for example, construct a theory of character development based on the data accumulated from such a tool. Applied to the entire body of a single author's work, the tool could reveal patterns and connections that give insight into the author's method, particularly her use of characters in her novels over time.
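To make the idea of a character-limited concordance concrete, here is a minimal Python sketch that counts the words spoken by one character. It is an illustration only, not the project's PHP tool. It assumes that speech is wrapped in a `quote` element carrying a `who` attribute, that speaker identifiers look like `martie`, and that namespaces are absent. The dialogue is invented for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample markup; the `who` attribute and its values are assumptions.
SAMPLE = """<div>
<p><quote who="martie">I will go. I will!</quote> she said.</p>
<p><quote who="other">Will you?</quote></p>
</div>"""

def speaker_concordance(root, speaker):
    """Count each word that appears inside the given speaker's quotes."""
    counts = Counter()
    for quote in root.iter("quote"):
        if quote.get("who") == speaker:
            # itertext() also picks up text that follows nested tags.
            text = "".join(quote.itertext()).lower()
            counts.update(re.findall(r"[a-z']+", text))
    return counts

root = ET.fromstring(SAMPLE)
for word, count in speaker_concordance(root, "martie").most_common():
    print(word, count)
```

Running the same function over each chapter separately would give the per-chapter counts that a study of character development might compare.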
We based our encoding of the text on the first edition printing of the novel by Doubleday. Our decision to provide a two-page display of the file required that we encode the page breaks. Rather than divide the entire file into equal page-length segments and encode those, which seemed too arbitrary to us, we decided to follow the choices of the publisher as presented in the first edition. Providing the page numbers as they appear in the text naturally followed from this decision. Because of the tools the group chose to provide, we were also obligated to encode every fragment of spoken word throughout the novel. A reading of the novel revealed that the narrator spends a good deal of time relating dialogue that the reader is not privy to firsthand. After considering the implications of encoding this narrated dialogue, we decided not to encode it within quote tags. We felt that doing so would require far more subjective interpretation than we cared to encode.
A reader-friendly display of the text is an important factor in the overall success of digital archives. Our website provides three formats in which the user can access the text: a plain text file, for users who may want to encode the document using a different standard or markup language; the XML-encoded file, for users who may wish to analyze the text using different tools; and the XML file transformed using XSLT and displayed using XHTML, JavaScript, and CSS, for users who want a visually pleasing, easily readable display that simulates the familiarity of a physical book.