Features From Frequency: Authorship and Stylistic Analysis Using Repetitive Sound

Christopher Forstall, Walter Scheirer


A growing number of studies in the humanities now use the tools of authorship attribution to answer traditionally “subjective” questions of literary style. However, scientists still develop these tools, for the most part, with more conventional classification tasks in mind, and most scholars of literature still believe that quantified data cannot tell the whole story. We aim to adapt the tools of textual analysis to literary goals, to make the expression of digital analysis more flexible, and to strengthen the tenuous connection between feature set and literature upon which stylistics depends. In this paper, we introduce a new feature for stylistics called the “functional n-gram,” which captures the repetitive stylistic nature of sound-oriented texts. Using functional n-grams and Support Vector Machines, we present a variety of authorship attribution experiments on English-language novels, as well as Romantic, Renaissance, and Classical poetry. Extending our analysis further, we use functional n-grams as the feature basis for a series of Principal Components Analysis experiments examining stylistic consistency in Homer.




The Division of the Humanities
1115 East 58th Street, Chicago IL 60637 / Tel 773.702.8512 / Fax 773.702.6305
The University of Chicago