Visualizing the text of (children's) book series: Visualizations

Our work focuses on the abstract visualization of children's book series, and in particular the trilogy "His Dark Materials" by Philip Pullman. Pullman's trilogy is made up of the three novels "Northern Lights" (called "The Golden Compass" in the USA and in the movie adaptation), "The Subtle Knife", and "The Amber Spyglass". We chose this genre partly through personal passion and partly because of the range of potential enthusiastic readers. The best children's book series (especially before they are completed) are read and discussed by both child and adult readers, and many of these readers develop their own theories, which they share with their friends and with other readers online. Similarly, academic interest is piqued, leading to conferences and journals dedicated to the study of children's literature.


Design Ideas

In order to narrow the design space, Linda Becker and I decided to focus on two questions:

  1. How the language used about different characters contrasts and how it changes through the series;
  2. How linguistic themes (like religious language) are used through the series.

We started with Linda sketching out how some visualizations might look (without using actual data).

In the first of these, Linda looked at the distribution of words (e.g. characters' names) throughout the text, using connecting arcs (among other ideas) to give a sense of the rhythm of related characters through the text.


The second set of sketches looks at the character word plots: what form they might take, and what visual dimensions this would give us to plot differing data or to reinforce existing data.


Thirdly, Linda tackled the notion of themes, and the sketches she produced show how we might plot the progression of themes through the books. These are the sketches that we have had the least success in moving into functioning visualizations, since they rely on a more sophisticated notion of theme than looking at individual word positions may provide.


The last series of visualization sketches Linda produced looked at text. Instead of drawing structures based on the relationships between words, we looked at drawing the structures with the words themselves. This proved quite playful. I had wanted the visualizations to be legible themselves as text, but some of the sketches jump to the opposite pole, for example rendering only the words of interest and leaving the surrounding text as measured space.

The two ideas that we built up into working visualizations are the flower-like structures showing the words occurring near the characters' names (or other given words), and renderings of the whole text with the character names of interest highlighted with colours and arcs.

Character Flowers

The first of the visualization ideas that we implemented was the character flower. Figure 1 shows the character flower for the word "Lyra". Central to the flower is the word "lyra" itself, surrounded by a 'lifebelt' which shows, starting from the 12 o'clock position, the occurrences of the word "lyra" through the series, with each occurrence resulting in a thin red line.

Figure 1: Character Flower of the word "Lyra"
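The lifebelt layout described above can be sketched as follows. This is a minimal Python illustration, not the project's Processing or C# code. The function names `lifebelt_angles` and `to_xy`, the clockwise direction, and the linear mapping from word offset to angle are all assumptions made for the example.

```python
import math

def lifebelt_angles(positions, total_words):
    """Map 0-based word offsets in the series to angles in radians.

    Angle 0 is the 12 o'clock position and angles grow clockwise
    (an assumed convention), so the final word lands just before
    a full turn.
    """
    return [2 * math.pi * p / total_words for p in positions]

def to_xy(angle, radius, cx=0.0, cy=0.0):
    """Convert a clockwise-from-12-o'clock angle to screen coordinates.

    Screen y is assumed to grow downward, as in most drawing toolkits.
    """
    return (cx + radius * math.sin(angle), cy - radius * math.cos(angle))
```

Each occurrence of the word would then be drawn as a short radial line between an inner and an outer radius at its angle.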

We can see from the number of crowded red lines that "lyra" is a frequently occurring word, as we would expect, but that the second and third books contain episodes where she is not mentioned. Moving out from the lifebelt, each 'bud' represents a word. Here we are looking at all the words which immediately follow the word "lyra" in a sentence. Those words are arranged in order of the frequency with which they appear after "lyra", and the size of the bud reflects the frequency of the word overall (i.e. the number of chapters it occurs in, regardless of whether it occurs after the word "lyra"). The final measure is the distance from the centre at which the bud is drawn. This reflects the probability that, when the word occurs, it occurs after "lyra": the higher the probability, the closer the bud sits to the centre.

So the two buds placed near the centre at the start are two words that occur frequently after the word "lyra" and are unlikely to occur elsewhere. Indeed, the two words are Lyra's surnames: Silvertongue and Belacqua. Other words drawn towards the centre are evocative of Lyra's personality: "joyfully", "quelled", "exulted", "definitely", "judged", "raided", ... but two stand out as anomalous: "blushed" and "obediently". Clicking on the bud brings up the sentences in which the word follows the word "lyra". From these sentences we find that the terms are used when Lyra is in disguise. In some respects this shows that the visualization works: the anomalies are indeed anomalies, but they are ones consciously placed by Pullman, rather than subconscious ones.
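The three measures behind each bud (ordering, size, and distance from the centre) can be sketched in Python as below. This is an illustration under stated assumptions rather than the original implementation. The function name `flower_buds`, the input shape (chapters as lists of sentences, each a list of lowercase words), and the choice of `1 - probability` as the distance are all hypothetical.

```python
from collections import Counter

def flower_buds(chapters, target):
    """Compute bud data for every word that immediately follows `target`.

    chapters: list of chapters; each chapter is a list of sentences;
              each sentence is a list of lowercase words.
    Returns a list of dicts ordered by how often the word follows `target`.
    """
    follows = Counter()         # times the word comes right after target
    totals = Counter()          # total occurrences of the word
    chapter_counts = Counter()  # number of chapters containing the word

    for chapter in chapters:
        seen_in_chapter = set()
        for sentence in chapter:
            totals.update(sentence)
            seen_in_chapter.update(sentence)
            # Pair each word with its successor inside the same sentence.
            for prev, word in zip(sentence, sentence[1:]):
                if prev == target:
                    follows[word] += 1
        chapter_counts.update(seen_in_chapter)

    buds = []
    for word, n in follows.most_common():  # most frequent follower first
        p = n / totals[word]               # P(word follows target | word occurs)
        buds.append({
            "word": word,
            "follows": n,                      # drives the ordering
            "chapters": chapter_counts[word],  # drives the bud size
            "p_after_target": p,
            "distance": 1.0 - p,               # assumed: near centre when p is high
        })
    return buds
```

A word such as a surname, which almost only ever appears after the target, gets a probability near 1 and so a distance near 0, matching the two buds closest to the centre of the flower.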

Figure 2: Character Flower of the word "Lyra's"

Characters' names can also be used in their possessive sense, e.g. "lyra's", and the character flower in Figure 2 shows the diagram for the words after "lyra's". These are mostly body parts (legs, arms, hair, etc.), and this style is borne out in Pullman's writing about the other characters.

Whole Text

These visualizations show the entire text of the three volumes that make up the trilogy. We were interested to see the rhythm of the characters' occurrences in the whole text, especially for two related characters. Figure 3 shows a fragment of the entire trilogy, with linked coloured disks over occurrences of Lyra's and Will's names. We can quickly see simple facts like Will's absence from the first book, and more curious aspects like the periods of the second book where neither of them is mentioned (presumably the sections focussed on Mary Malone, Lord Asriel, or Mrs Coulter). Printed out, this diagram is many feet long, and the text itself is (just) readable. This combination of text-level detail and global pattern is particularly interesting.

I was hoping that this visualization would highlight a poetic choice across the trilogy. Tolstoy starts and ends "Anna Karenina" at a railway station, and Pullman purposefully opens the first book with the word "Lyra" and ends the last book with the word "Lyra". This should stand out, as the visualization should start and end with a coloured disk. But it does not. In fact, Pullman precedes the start of his book with a quote from Milton's "Paradise Lost", which stops the poetic symmetry coming out in the visualization.

Figure 3: Fragment of Whole Text Visualization Highlighting Lyra and Will's names
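The patterns read off the whole-text view (where the names fall, the stretches where neither appears, and whether the text begins and ends on a name) can also be checked directly on the word stream. The Python sketch below is illustrative only. The function names are hypothetical, the words are assumed to be already split and stripped of punctuation, and matching is case-sensitive so that the name "Will" is not confused with the verb "will" (a sentence-initial "Will" would still be ambiguous).

```python
def name_positions(words, names):
    """Return (index, word) for every word in the stream that is a tracked name.

    Matching is exact and case-sensitive (an assumption made here so that
    the verb "will" does not count as the character "Will").
    """
    wanted = set(names)
    return [(i, w) for i, w in enumerate(words) if w in wanted]

def longest_gap(words, names):
    """Return the half-open (start, end) index range of the longest run
    of words that contains none of the tracked names."""
    hits = [-1] + [i for i, _ in name_positions(words, names)] + [len(words)]
    before, after = max(zip(hits, hits[1:]), key=lambda pair: pair[1] - pair[0])
    return before + 1, after

def starts_and_ends_with(words, name):
    """Report whether the first and last words of the stream equal `name`."""
    return bool(words) and words[0] == name, bool(words) and words[-1] == name
```

Run over a stream that includes the opening epigraph, `starts_and_ends_with` would report the first flag as false, which is the same effect that hides the symmetry in the visualization.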

Implementation Detail

The initial sketches were built in Adobe Illustrator. Having chosen our two initial candidates for implementation, we prototyped them in Processing, a language aimed at designers new to programming. Later these prototypes were re-worked into C# and WPF. The texts themselves were drawn from the publisher's Quark documents, saved to plain text, broken down into chapters, sentences, and words in C#, and stored in a SQL Server 2008 database.
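The breakdown into chapters, sentences, and words can be sketched as below, with Python and an in-memory SQLite database standing in for the C# code and SQL Server 2008 database actually used. The table layout, the names `build_word_table` and `FOLLOWERS_SQL`, and the naive sentence-splitting rule are all assumptions. The real schema is not described here, and chapter texts are taken as already separated.

```python
import re
import sqlite3

def build_word_table(chapters):
    """Load chapter texts into an in-memory SQLite table, one row per word."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE word ("
        "chapter INTEGER, sentence INTEGER, position INTEGER, text TEXT)"
    )
    for c, chapter_text in enumerate(chapters):
        # Naive sentence split: break on whitespace that follows ., ! or ?
        sentences = re.split(r"(?<=[.!?])\s+", chapter_text.strip())
        for s, sentence in enumerate(sentences):
            # A word is a run of letters (any alphabet, so "dæmon" survives),
            # optionally followed by an apostrophe part such as "'s".
            words = re.findall(
                r"[^\W\d_]+(?:['’][^\W\d_]+)?", sentence.lower()
            )
            db.executemany(
                "INSERT INTO word VALUES (?, ?, ?, ?)",
                [(c, s, p, w) for p, w in enumerate(words)],
            )
    return db

# Words that immediately follow a given word within the same sentence,
# most frequent first (ties broken alphabetically).
FOLLOWERS_SQL = """
SELECT b.text, COUNT(*) AS n
FROM word AS a
JOIN word AS b
  ON b.chapter = a.chapter
 AND b.sentence = a.sentence
 AND b.position = a.position + 1
WHERE a.text = ?
GROUP BY b.text
ORDER BY n DESC, b.text
"""
```

With the words stored one per row, a query such as `db.execute(FOLLOWERS_SQL, ("lyra",))` returns the raw follower counts that a character flower needs.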