I’m very pleased to be working with Neal Audenaert on the Visual Page project, which has just received NEH Start-Up Grant funding for 2012-2013. A project website will be up soon, but here’s a project description adapted from our grant proposal.
Large Scale Analysis and Digitized Printed Texts
The digitization of printed materials published before 1900 has already transformed scholarship in humanities fields such as history and literary studies. Digitized resources improve access for scholars, teachers, students, and general readers. They encourage users of all types to begin asking new kinds of research questions about these texts and their cultural context. Increased access also alters the terms for researching the cultural status and historical significance of particular texts or groups of texts, leading to questions like these asked by Dan Cohen: “Should we be worrying that our scholarship might be anecdotally correct but comprehensively wrong? Is 1 or 10 or 100 or 1000 books an adequate sample to know the Victorians?” Large-scale analysis of digitized materials allows humanities scholars to explore just how unique or representative a particular text or group of texts might be.
How, for example, might the success of Tennyson’s 1850 book-length poetic sequence In Memoriam be newly understood, if it were compared with all the other books of poetry published in the same year? Most of those books are now available to researchers in digitized form. However, most tools for large-scale digital analysis focus solely on the linguistic content of texts. A researcher could compare the linguistic contents of Tennyson’s poem to those of other poems of the same year by tracking word frequency and syntactical clusters within the language of the texts. But if one wished to examine how Tennyson’s poem looked on the page in 1850 as compared with other poems, one would be limited to what the human eye can notice and to the constraints of human attention when collating or comparing a large number of texts. To expand beyond those limitations to large-scale analysis, the Visual Page project will be using NEH Start-Up Grant funding to develop an open source application to analyze the graphical aspects specific to digitized printed texts.
The Printed Page as Graphic Interface
All printed texts simultaneously convey meaning through both linguistic and graphic signs. As Jerome McGann suggests, “A page of printed or scripted text should thus be understood as a certain kind of graphic interface” (McGann 2001, 199). Words printed on a book’s title page, for example, communicate linguistic content (such as the book’s title and author’s name) that is made more meaningful through the graphic conventions of book publishing. These graphic conventions convey culturally encoded meaning about the importance, audience, and function of the linguistic content on the page or, by extension, of the entire book (Drucker 2009b, 145-64; McGann 1991; McKenzie). The spatial arrangement of text and white space, typeface size and attributes, and the sequencing of the page within the book all combine to signify a title page. It is these conventions, not the linguistic content alone, that distinguish the book title from the author’s name (think, for instance, of George Eliot’s Adam Bede and other novels whose titles could linguistically signify an author’s name). Experienced readers assess, categorize, and evaluate the graphical codes of printed texts quickly, often subconsciously: “we see before we read and that . . . predisposes us to reading according to specific graphic codes before we engage with the language of the text” (Drucker 2009b, 242). Graphical aspects of the printed page convey information about the book’s historical period, genre, form, cost, audience, function, organization, scope, and design.
The visual elements of printed material can also be deliberately manipulated by their creators for specific effects, as in decorated or illustrated books, or in multimedia works like those by William Blake and Dante Gabriel Rossetti. Such works cannot be adequately represented by their linguistic content alone: “Typographic transcriptions . . . abstract texts from the artifacts in which they are versioned and embodied” (Viscomi 29). Yet the same holds true for all printed texts, not just those that are highly decorated, because printed words themselves function as images: “looking at a set of graphic marks set off by the frame of white space involves the same cognitive processes as would looking at any image” (Mandell 762). Although a full material analysis of a book, including precise page measurements, bibliographic collation, paper watermark and provenance identification, and analysis of binding materials, is not available from the digitized file, digital images of the book’s pages offer researchers more information about “the interaction of its physical characteristics with its signifying strategies” than can textual description alone (Hayles 103). Accordingly, most scholarly digital archive projects today recognize the value of this graphical meaning and provide users access to both digitized page images and plain text versions of printed materials. Because the recent digitization projects conducted by research libraries and Google have taken photographic scans of book pages, the graphical meaning of books is available, to varying degrees of fidelity, for a very large corpus of digitized items.
The Visual Page Project
For the Start-Up Grant period, we will be working with a data set of 300 books of poetry (approximately 60,000 images) published between 1860 and 1880. In books of poetry printed after 1800, the visual appearance of the page often signals and reinforces linguistic features of the poem. The relative amount and distribution of white space on the page cues the reader to the presence of poetry and even to specific verse forms such as the sonnet, which was frequently surrounded by ample marginal space. The graphic conventions of line capitalization, punctuation, and indentation visually distinguish many kinds of poetry from prose. Extra leading, or white space, between poetic stanzas and the indentation of poetic lines reinforce rhyme patterns and formal structures of historically specific verse forms.
The Visual Page project will use NEH Start-Up Grant funding to develop a prototype application to identify and analyze visual features in digitized Victorian books of poetry, such as margin space, line indentation, ornamentation, and text density. The proposed application will integrate tools for machine learning (e.g., discovering which features help to visually distinguish books from two different publishers), pattern analysis and classification (identifying groups of visually similar works or finding other poems that look like Tennyson’s), and visualization of relationships between poems (juxtaposing sets of images or scatter plots based on computational measures of visual similarity or difference).
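As a rough illustration of what measuring features such as margin space, line indentation, and text density could involve, here is a minimal Python sketch. It is not the project's implementation. It assumes a page image has already been binarized into rows of pixels (1 for ink, 0 for blank paper), and the names `Page`, `page_features`, and the feature keys are hypothetical, chosen only for this example.

```python
from typing import Dict, List

# Hypothetical page format: a list of pixel rows, 1 = ink, 0 = blank paper.
Page = List[List[int]]


def page_features(page: Page) -> Dict[str, float]:
    """Compute simple layout measures for one binarized page image."""
    height, width = len(page), len(page[0])
    ink_rows = [y for y, row in enumerate(page) if any(row)]
    if not ink_rows:
        # Blank page: report it as all margin with no text.
        return {"top_margin": 1.0, "bottom_margin": 1.0, "left_margin": 1.0,
                "right_margin": 1.0, "text_density": 0.0, "mean_indent": 0.0}
    ink_cols = [x for x in range(width) if any(row[x] for row in page)]

    # Bounding box of everything printed on the page.
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    box_area = (bottom - top + 1) * (right - left + 1)
    ink = sum(1 for row in page for value in row if value)

    # Indentation of each inked row, measured from the left edge of the text block.
    indents = [next(x for x, value in enumerate(page[y]) if value) - left
               for y in ink_rows]

    return {
        # Margins are expressed as fractions of the page height or width.
        "top_margin": top / height,
        "bottom_margin": (height - 1 - bottom) / height,
        "left_margin": left / width,
        "right_margin": (width - 1 - right) / width,
        # Share of the text block that is covered by ink.
        "text_density": ink / box_area,
        # Average indentation as a fraction of the page width.
        "mean_indent": sum(indents) / len(indents) / width,
    }


# A tiny invented "page": two lines of verse, a stanza break, one more line.
sample_page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],  # indented line
    [0, 0, 0, 0, 0, 0, 0, 0],  # stanza break
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
features = page_features(sample_page)
```

Real page scans would need binarization, deskewing, and noise removal before measures like these mean anything, which is one reason checking computed values against physical measurements matters.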
We anticipate that computational analysis of these visual features will reveal new ways of thinking about both the printed book and its digitized forms. Because scholarship in the history of printing, publishing, and book design is grounded in empirical analysis of material artifacts, one of the important goals of the Start-Up Grant period will be to demonstrate the validity of this computational analysis. We will conduct bibliographical measurements of a sampling of the books contained in the digital data set to verify the measurements and comparisons generated by the Visual Page application.
Much of the previous scholarship on Victorian publishing practices and page design has focused on particular publishers, illustrated books, or particular authors, such as William Morris or Oscar Wilde, whose highly decorated books represented particular political or aesthetic goals and strategies. The Visual Page application will enable researchers to examine the graphical aspects of these decorated books as well as those of ordinary books of poetry, thereby contributing to a broader understanding of literature’s circulation, consumption, and function within Victorian culture.
The Visual Page and the History of the Book
The Visual Page project will be valuable to researchers in the fields of literature, history, cultural studies, and media studies. This application will enable researchers to expand the scope of current research questions about the material book to a larger scale. For example, this application will allow researchers to explore:
Similarities and differences between sets of printed materials. How do books published by Macmillan and those by Bell and Daldy during the second half of the nineteenth century differ in their visual appearance? How do the text pages of illustrated books of poems compare with those in books without illustration? How do these differences correlate with other features of these texts, such as price, distribution networks, poetic forms, or themes?
Historical changes in printed materials. When do ornamental initial capital letters become widely used in books of Victorian poetry? When and how do they arise, change, or disappear? How does their usage correlate with specific publishers, authors, or poetic forms?
Measuring and identifying distinctive features and/or distinctive books. What are the most unusual books of poetry published in the 1860s? What makes them different from other books in their visual appearance? Does that difference correlate with specific authors, publishers, poetic forms, or themes?
Measuring and identifying representative or typical books. What does an average or typical book of poetry look like in the 1860s? What might this suggest about Victorian reading practices?
Influence and imitation in the design of printed materials. How were distinctive book designs by Joseph Cundall or William Morris imitated by others? Do these artistic designs have any effect on mass book publishing?
Although these research questions are specific to the set of Victorian books of poetry we will be using for this Start-Up Grant, similar questions are of interest to scholars working on other kinds of printed materials and in other periods.
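The questions above about typical and unusual books can be made concrete with a small Python sketch. It is one possible approach rather than the project's method: it assumes each book has already been reduced to a short list of numeric visual measures, standardizes each measure, and ranks books by their distance from the average. The function name `rank_by_typicality`, the book labels, and all of the numbers are invented for illustration.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def rank_by_typicality(books: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (label, distance) pairs sorted from most typical to most unusual."""
    labels = list(books)
    # Regroup the data so that each column holds one feature across all books.
    columns = list(zip(*(books[label] for label in labels)))
    means = [mean(column) for column in columns]
    # A feature with zero spread gets a divisor of 1.0 and so contributes nothing.
    spreads = [pstdev(column) or 1.0 for column in columns]

    scored = []
    for label in labels:
        # Standardize each feature so that no single measure dominates.
        z_scores = [(value - m) / s
                    for value, m, s in zip(books[label], means, spreads)]
        # Euclidean distance from the "average book" in standardized units.
        scored.append((label, sqrt(sum(z * z for z in z_scores))))
    return sorted(scored, key=lambda pair: pair[1])


# Invented feature vectors: [left margin, text density] for four imaginary books.
books = {
    "book_a": [0.20, 0.60],
    "book_b": [0.22, 0.58],
    "book_c": [0.21, 0.62],
    "book_d": [0.40, 0.30],  # wide margins, sparse text
}
ranking = rank_by_typicality(books)
most_typical = ranking[0][0]
most_unusual = ranking[-1][0]
```

With these invented numbers, `book_d` ranks as the most unusual. Whether distance from an average is a meaningful definition of "typical" for Victorian books is itself a research question, so a ranking like this would be a starting point for closer reading, not a conclusion.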
The extent to which we can currently research such questions is constrained by our human capacity to view, compare, and understand only a limited number of texts at one time. Thus our understanding of what constitutes a significant or representative text is based on relatively limited historical information. Computational analysis can point to significant patterns and trends over a much larger set of texts, which will lead humanities researchers to study previously unknown texts as well as to understand canonical texts in new ways. Ultimately, such large-scale research will transform the boundaries and definitions of humanities research by changing our understanding of key ideas, developments, and conflicts within print culture.
Works Cited
Cohen, Dan. “Searching for the Victorians.”
Drucker, Johanna. “Not Sound.” The Sound of Poetry: The Poetry of Sound. Ed. Marjorie Perloff and Craig Dworkin. Chicago: U of Chicago P, 2009. 237-48.
---. SpecLab: Digital Aesthetics and Projects in Speculative Computing. Chicago: U of Chicago P, 2009.
Hayles, N. Katherine. My Mother was a Computer: Digital Subjects and Literary Texts. Chicago: U of Chicago P, 2005.
McGann, Jerome J. Radiant Textuality: Literature after the World Wide Web. New York: Palgrave Macmillan, 2001.
---. The Textual Condition. Princeton: Princeton UP, 1991.
McKenzie, Donald Francis. Bibliography and the Sociology of Texts. Cambridge: Cambridge UP, 1999.
Mandell, Laura. “What is the Matter? What Literary Theory Neither Hears nor Sees.” New Literary History 38.4 (Autumn 2007): 755-76.
Viscomi, Joseph. “Digital Facsimiles: Reading the William Blake Archive.” Computers and the Humanities 36.1 (February 2002): 27-48.