Responses by Meredith Martin, project director; Mary Naydan, project manager; Gissoo Doroudian, user experience designer; and Rebecca Koeser, technical lead, Center for Digital Humanities.

Background: The Princeton Prosody Archive (PPA) is an open-source, full-text searchable database of more than 4,000 digitized works on prosody in English published between 1570 and 1923. Prosody today means the study of versification and pronunciation. The database searches metadata and full text at the same time. This database is the only resource of its kind. The PPA is an important pedagogic tool to help students make discoveries about how poetry was read in the 19th century and how it is read today. It is also an important research tool for scholars working in the field of historical poetics.

The PPA empowers students and scholars to understand poetics in all of its historical, linguistic and educational valences, and to recognize literary concepts such as meter and rhythm as historically contingent and fundamentally unstable.

Design core: “The colors used and the playfulness seen on the PPA logo and the web application reflect the colors and expressions of the ’80s,” says Gissoo Doroudian. “The original bibliography for the project is T.V.F. Brogan’s English Versification, 1570–1980: A Reference Guide with Global Appendix, published in 1981. Brogan, a literary scholar and computer programmer, created an online version of the project as a PDF in 1998 that is still housed at Oregon State University under the auspices of the journal Versification. Though the links are long broken, they bear the bright blue traces of Brogan’s hopes for an online version of the project that would generate a network of discourse prior to the turn of the century.

“Because of Brogan’s dream of the ’80s, Meredith wanted to employ an ’80s color theme for PPA to communicate to users that the project is inviting and playful. This choice led me to an initial focus on revising the color and visual theme of the web application. The bright colors now seen on the site were selected to convey a sense of playfulness, mutability, and ongoing discovery and conversation, reflecting the characteristics and processes that went into this project’s development. The cool tones add a modern twist to our interactions with the archive, prompting us to ask these questions: ‘What does it mean to interact with this archive today? What ideas can we generate from this archive for the future?’

“In order to add further meaning to the colors of the PPA logo, each was used as a theme for a pivotal section of the site. I made this decision to further communicate how each section of the site is significant on its own, and how each has played a role in the project’s development. Thus, I intended to invoke a sense of ethical parallelism of site content as a whole to be represented through its visual design, emphasizing the inclusiveness and equality of the site’s sections while minimizing hierarchical structures. You can read more on the design of the PPA in my essay, ‘Designing the PPA User Interface & Interactions.’”

Challenges: Teaching prosody using current large databases has made two facts clear. One, archives and databases make arguments and have points of view, but these are often not stated directly. For instance, whether an archive provides full-text materials or metadata only, book covers or tables of contents, and how the archive structures that information will influence the kinds of discoveries users can make.

Two, methods of searching within databases matter. For example, searching on metadata parameters alone often loses much of the context and many of the patterns in the material. While searching and filtering by metadata is an important way to navigate a large database, being able to search full texts and see page images is crucial for research on prosody. This is why the search results in the PPA include text snippets and page thumbnails.

The PPA has focused on creating an empowering experience through multiple levels of extensive data curation, in which duplicated materials were carefully eliminated. The eliminated duplicates made up about 40 percent of the original data transfer. This painstaking process has produced a significantly more accurate search experience, in which results are not skewed by duplicate copies of a work in the database.
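
As a rough illustration of one step such curation might involve, the sketch below flags candidate duplicates for human review by grouping records on a normalized title, author and year. It is not the project team's actual process, which the article describes only as painstaking curation; the record fields, identifiers and sample titles are all hypothetical.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def candidate_duplicates(records):
    """Group records that share a normalized (title, author, year) key.

    Returns only the groups with more than one record, so that a person
    can review each group and decide which copy to keep.
    """
    groups = defaultdict(list)
    for record in records:
        key = (normalize(record["title"]), normalize(record["author"]), record["year"])
        groups[key].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical records; the identifiers and fields are illustrative only.
records = [
    {"id": "vol-1", "title": "A Treatise on Metre", "author": "Smith, J.", "year": 1850},
    {"id": "vol-2", "title": "A treatise on metre.", "author": "Smith, J", "year": 1850},
    {"id": "vol-3", "title": "Notes on Rhythm", "author": "Jones, A.", "year": 1871},
]

duplicates = candidate_duplicates(records)
```

Here the first two records differ only in capitalization and punctuation, so they land in the same group, while the third record stands alone and is not reported.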

Time constraints: “We originally planned to include excerpted content—for example, when a book has one section that is relevant to prosody but the entire work is not relevant, like a collection of poetry with an important critical introduction,” says Rebecca Koeser. “This is challenging because the content we’re working with is all from the HathiTrust Digital Library. The library provides item-level metadata and bulk access to content, but it still doesn’t have more granular information—like for portions of a book. We still hope to come back to it, but solving this will require more data curation work from the project team to identify the content they want to include and customizing the logic that pulls in HathiTrust’s content. It would also require refinements to the interface to make it clear when you’re looking at different kinds of records.”

Navigation structure: A significant part of this curation produced the current editorial collections by which the data is grouped: Linguistic, Literary, Music, Original Bibliography, Typographically Unique, Dictionaries, and Word Lists. Editorial collections can be used to filter and refine searches, or to exclude unwanted results. When you first visit the archive search, the default presets turn the Dictionaries and Word Lists collections off; this improves search results, since almost any word search will match the terms listed in a dictionary.
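
The default preset described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's code: the collection names come from the list above, but the data model, the function names and the assumption that the dictionary and word-list collections are two separate collections are all hypothetical.

```python
# Collection names follow the article; everything else here is hypothetical.
COLLECTIONS = [
    "Linguistic", "Literary", "Music", "Original Bibliography",
    "Typographically Unique", "Dictionaries", "Word Lists",
]
# Assumed default: these two collections are switched off on a first visit.
EXCLUDED_BY_DEFAULT = {"Dictionaries", "Word Lists"}


def active_collections(selected=None):
    """Return the collections to search, using the defaults if none are chosen."""
    if selected is None:
        return [name for name in COLLECTIONS if name not in EXCLUDED_BY_DEFAULT]
    return [name for name in COLLECTIONS if name in selected]


def filter_by_collection(works, selected=None):
    """Keep works that belong to at least one active collection."""
    active = set(active_collections(selected))
    return [work for work in works if active & set(work["collections"])]


# Hypothetical works for illustration.
works = [
    {"id": "w1", "collections": ["Literary"]},
    {"id": "w2", "collections": ["Dictionaries"]},
    {"id": "w3", "collections": ["Music", "Word Lists"]},
]

default_ids = [w["id"] for w in filter_by_collection(works)]
dictionary_ids = [w["id"] for w in filter_by_collection(works, {"Dictionaries"})]
```

With no selection, the dictionary-only work `w2` is left out, while `w3` still appears because it also belongs to Music. Selecting Dictionaries explicitly brings `w2` back.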

Including text snippets and page thumbnails as part of the search results helps researchers efficiently identify the parts of the archive that contain the information for which they’re looking. For example, it’s quickly obvious if a page includes poetry or prosodic notation when you glance at an image. In addition, the site has the capability to search within volumes.

Technology: “The site is implemented in Python with the Django web framework,” says Koeser. “It uses a relational database to provide data-curation functionality, such as adding items to editorial collections and correcting inaccurate metadata, and for other administrative and content-management functionality. Search is powered by the open-source search platform Solr. The interface was implemented with Semantic UI. The archive search page has custom JavaScript to power a reactive search, which automatically submits the form and loads new results as you type new search terms or change filters.”

Special technical features: “The archive search is implemented so that you can search on individual pages as well as volume-level metadata at the same time,” says Koeser. “Other sites like this allow searching on metadata only, or possibly include a full-text search with the metadata but don’t distinguish where the text is from in the volume. It’s less common to support this kind of deep- and high-level searching at the same time.

“This means that you can find a book by a keyword that occurs on a single page, and you can see that page and the context of your search term while also seeing the volume-level information, not just the individual page. Or you can search based on keywords and metadata together, looking for pages with specific terms but filter based on bibliographic information, like author or publication year. This makes PPA one of the few scholarly projects designed this way, along with the Library of Congress World Digital Library.”
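
To make the idea of searching pages and volume-level metadata together more concrete, here is a small Python sketch. It uses plain in-memory data rather than Solr, so it says nothing about how the PPA's index is actually configured; the volume records, page texts and the `search` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory data; a real deployment would query a search index.
VOLUMES = {
    "v1": {"title": "Volume One", "author": "Author A", "year": 1850},
    "v2": {"title": "Volume Two", "author": "Author B", "year": 1901},
}
PAGES = [
    {"volume": "v1", "page": 12, "text": "the iambic foot in english verse"},
    {"volume": "v1", "page": 40, "text": "on the trochee and the dactyl"},
    {"volume": "v2", "page": 3, "text": "iambic pentameter and blank verse"},
]


def search(keyword, year_from=None, year_to=None):
    """Match a keyword against page text, filter by volume year, group by volume."""
    hits = defaultdict(list)
    for page in PAGES:
        volume = VOLUMES[page["volume"]]
        # Volume-level (bibliographic) filter.
        if year_from is not None and volume["year"] < year_from:
            continue
        if year_to is not None and volume["year"] > year_to:
            continue
        # Page-level (full-text) match.
        if keyword.lower() in page["text"].lower():
            hits[page["volume"]].append(page["page"])
    # Each result pairs volume-level metadata with the matching page numbers.
    return [{"volume": VOLUMES[vid], "pages": pages} for vid, pages in hits.items()]


all_hits = search("iambic")
early_hits = search("iambic", year_to=1900)
```

Each result keeps the bibliographic record alongside the specific pages that matched, which is the combination of deep and high-level searching that Koeser describes.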


