The statistical analysis of texts is an important area of work, widely used in information retrieval (e.g. web search). It is also a mature area of research in its own right, with past applications ranging from author attribution to the chronological ordering of works. For example, in a letter published in 1882 Augustus De Morgan speculated about using statistical techniques to explore authorship questions around St Paul's Epistles and the Epistle to the Hebrews [Lea76], while more recently Jockers, Witten, and Criddle used sophisticated statistical techniques to reassess the authorship of the Book of Mormon.
By contrast, the abstract visualization of book texts is neither a large nor a mature field of study, but there are notable and inspirational examples. The following sections list some of these.
Clarence Larkin's "Dispensational Charts"
Data visualizations fall into two overlapping camps: exploration and communication. Larkin's 1914-1918 Dispensational Charts are about communicating scripture and prophecy from The Bible. They diagram the structure of each topic (e.g. "The Heavens" or "The Second Coming") and use flow, representational images, and references back to Bible passages to illuminate each topic.
The seminal work of abstract exploratory visualization of book texts is Brad Paley's "TextArc". TextArc is a screen-based application, designed and implemented by Paley, that takes a text and displays it twice: first line by line, in a tiny font around the edge of a giant ellipse, and then word by word, with each word anchored by invisible springs to the sentences in which it occurs. Common words (so-called 'stop words') are removed, and the remaining words are rendered so that more frequent words use a larger font and are drawn on top of any less frequent words sharing the same screen area. TextArc can be used to explore any text, but Paley often demonstrates it using Alice in Wonderland, where the word Alice sits at the centre in big letters because it occurs throughout the book. TextArc has many other features, including an elegant dynamic path that sweeps through the work as the text is read through.
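The layout idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Paley's implementation: lines stand in for sentences, the spring equilibrium is approximated by the mean of the anchor positions, and the names textarc_layout and STOP_WORDS, the tiny stop-word list, and the ellipse radii are all hypothetical.

```python
import math
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was"}


def textarc_layout(lines, rx=1.0, ry=0.6):
    """Return {word: (x, y, count)} for a TextArc-like layout.

    Each line is anchored at an evenly spaced angle on an ellipse, and each
    word is placed at the mean of the anchors of the lines that contain it
    (a stand-in for equal-strength springs, which is an assumption).
    """
    n = len(lines)
    anchors = [
        (rx * math.cos(2 * math.pi * i / n), ry * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]
    occurrences = defaultdict(list)
    for i, line in enumerate(lines):
        for word in re.findall(r"[a-z']+", line.lower()):
            if word not in STOP_WORDS:
                occurrences[word].append(anchors[i])
    layout = {}
    for word, points in occurrences.items():
        x = sum(p[0] for p in points) / len(points)
        y = sum(p[1] for p in points) / len(points)
        layout[word] = (x, y, len(points))
    return layout


def draw_order(layout):
    """Least frequent words first, so frequent words end up drawn on top."""
    return sorted(layout, key=lambda word: layout[word][2])
```

A word that occurs evenly throughout the text is pulled towards the centre of the ellipse, which is consistent with Alice appearing in the middle of the Alice in Wonderland rendering. The count in each tuple could be mapped to a font size by whatever scale the renderer prefers.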
TextArc was conceived as a tool to help academics and other readers analyse texts. Another outlet proved to be selling high-quality printouts as beautiful mementos of one's favourite texts. The application of book visualization to academic literary studies has been continued in work like Plaisant et al.'s "Exploring Erotics in Emily Dickinson's Correspondence".
Partly because of the widespread availability of electronic versions of the text, partly because of its cultural significance, and partly because of the huge numbers of people who care about it, The Bible has proved an intriguing source of visualizations.
Anh Dang's "Gospel Spectrum"
While on NYU's Interactive Telecommunications Programme, Anh Dang built "Gospel Spectrum", an interactive visualization exploring the gospel accounts of Christ's life. Each episode in Christ's life is represented as a coloured bar, with the colours representing the different gospels and the bar's length representing the number of verses spent on that episode. The resulting visualization allows one to see how Christ's life unfolds through the gospels: which gospels concentrate on which parts of his life, and when the gospels come together to record an episode.
Linda Becker's "In Translation"
Started at Central Saint Martin's School of Art, Becker's "In Translation" shows visually the structural similarities and differences between different language translations of the Tower of Babel story, for example by showing the position allocated to each letter-combination. "In Translation" both reinforces the message of the Tower of Babel story, by highlighting the differences between human languages, and cuts across it, by showing structural similarities.
Chris Harrison's "Bible Visualizations"
Chris Harrison's visualizations of The Bible follow two paths. In the first, Harrison took a set of textual cross references found in The Bible, compiled by Lutheran Pastor Christoph Romhild, and displayed the links visually. The result is a beautiful picture that shows which chapters contain the most cross references and impresses the viewer with their sheer number. The second set looks at proper nouns through The Bible and overlays them as a tag cloud. Rather than abstracting the positions of the nouns from their occurrence in the text, the nouns are placed at their 'centre of mass'.
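One plausible reading of 'centre of mass' is the mean position of a noun's occurrences in the text. The sketch below assumes that reading; it is not Harrison's code. The function name centre_of_mass is hypothetical, and the set of proper nouns is supplied by hand rather than detected automatically.

```python
import re
from collections import defaultdict


def centre_of_mass(text, names):
    """Map each name to (mean position as a fraction of the text, count).

    Positions are word indices, scaled so 0.0 is the first word and 1.0 the
    last. `names` is a caller-supplied set of proper nouns (an assumption;
    no proper-noun detection is attempted here).
    """
    words = re.findall(r"[A-Za-z]+", text)
    positions = defaultdict(list)
    for index, word in enumerate(words):
        if word in names:
            positions[word].append(index)
    last = max(len(words) - 1, 1)  # avoid dividing by zero on tiny inputs
    return {
        name: (sum(indices) / len(indices) / last, len(indices))
        for name, indices in positions.items()
    }
```

In a tag cloud, the first value would fix where the noun sits along the text axis and the count could drive its size. A noun concentrated early in the text lands near 0.0, while one spread evenly throughout lands near 0.5.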
Steinweber and Koller's "Similar Diversity"
The last Bible visualization we'll touch on is Steinweber and Koller's "Similar Diversity". Like Harrison, Steinweber and Koller use arc diagrams and other visual features, but rather than using them to explore the structure within The Bible, Similar Diversity shows the similarities and differences between the holy books of different religions.
Before moving on to describe our own visual explorations of the text of Pullman's His Dark Materials trilogy, there are four other interesting book visualization projects worth drawing attention to because of other potential features they make use of.
Ebany Spencer's "Romancing Dimensions"
In her CSM MACD project "Romancing Dimensions", Ebany Spencer attempts to use purely visual notation systems to retell Edwin Abbott Abbott's "Flatland" story. Though entirely paper-based, Spencer's work uses three dimensions: paper cut-outs move some of her time-line representations of the work out from the background plane.
Tim Walter's "textour"
Tim Walter's textour (in German) uses time and animation to show the structural elements of the book accruing as data is added or filtered.
Stephanie Posavec's "Writing Without Words"
Stephanie Posavec's beautiful visualizations of Jack Kerouac's "On the Road" (and some other contrasting novels) are not the result of a computer analysis of the work but of careful, loving, and painstaking analysis of the text by hand. Posavec produces several visualizations, from the spider-like Posavec diagrams, which map the sentence lengths authors use (a line continues for the length of the first sentence, then turns ninety degrees and continues for the length of the second sentence, etc.), through to the elegant, flower-like 'literary organism' structures.
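Although Posavec worked by hand, the sentence-length path described above is simple to sketch in code. The following is a minimal illustration under stated assumptions: sentence length is measured in words, sentences are split naively on ., ! and ?, and the path always turns ninety degrees to the left (the direction of the turn is not specified in the description). The function name sentence_path is hypothetical.

```python
import re


def sentence_path(text):
    """Return the (x, y) vertices of a path with one segment per sentence.

    Each segment is as long as its sentence (in words), and the heading
    rotates ninety degrees to the left after every sentence (the turn
    direction is an assumption).
    """
    # Naive sentence split; abbreviations and quotations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    headings = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # right, up, left, down
    x, y = 0, 0
    points = [(x, y)]
    for i, sentence in enumerate(sentences):
        length = len(sentence.split())
        dx, dy = headings[i % 4]
        x, y = x + dx * length, y + dy * length
        points.append((x, y))
    return points
```

For the text "One two three. Four five. Six." this yields the vertices (0, 0), (3, 0), (3, 2), (2, 2). Joining the vertices with straight lines gives the path, so a writer who favours long sentences produces a wide, sprawling figure and one who favours short sentences produces a tight one.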
IBM Research's Visual Communications Lab's "Many Eyes"
Many Eyes is a social visualization site. It is social in many ways: users upload data sets that are immediately shared with all the other Many Eyes members; anyone can use any of the provided visualization tools to visualize the data sets; and these visualizations can be shared and discussed on the Many Eyes site, or embedded into blog posts to foster conversation and analysis beyond the site. Many Eyes was conceived, designed, and built by IBM Research's Visual Communications Lab. It was originally thought that most of the data sets and visualizations would be based on numeric data, and so the visualizations were tailored towards quantitative data. In fact the inventors were taken aback by the number of textual data sets uploaded, notably including The Bible and political speeches, and they have written about the text-based visualizations designed and added in response [WV08].