Valence
Benjamin Fry

How can word usage in a book be represented visually? The Innocents Abroad by Mark Twain is 200,000 words long, of which 15,000 words are unique. A bar chart containing this many elements would be virtually worthless: too large to take in at a glance or, if scaled down to fit one's field of view, too small to read. Enormous disparities in word usage make many visualization techniques useless. Of the 15,000 unique words, fully half are used only once; less than 25 per cent of the data would be worthwhile at all, and the most interesting features do not appear until the top 5 per cent. Even if issues such as these could be overcome with some statistics and a modified bar chart, the result is unlikely to be a useful description of the data, because it would carry no concept of relationships between words. For instance, how could one tell which words appeared near one another in the text? How could changes in word usage throughout the book be expressed? To understand large sets of information, one must ask: "What does the information *feel* like?"
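The kind of tally behind these figures can be sketched in a few lines of Python. This is only an illustration, not the method used for the numbers above: the function name `usage_summary` and the crude tokenizing rule (lowercase runs of letters and apostrophes) are assumptions.

```python
from collections import Counter
import re

def usage_summary(text):
    """Count total words, unique words, and words that occur exactly once.

    Assumption: a "word" is any lowercase run of letters or apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    used_once = sum(1 for c in counts.values() if c == 1)
    return {"total": len(words), "unique": len(counts), "used_once": used_once}

summary = usage_summary("The cat and the dog and the bird")
print(summary)
```

Run on a full book, the ratio of `used_once` to `unique` gives the share of words that appear a single time, which is the disparity that makes a plain bar chart so unhelpful.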

Valence is a project that uses the properties of organic systems (such as growth, atrophy, adaptation, and metabolism) as methods for building representations based on the interaction of many simple rules, in an attempt to achieve a more telling representation. These representations seek to make information expressive: organisms consume and metabolize data to provide a qualitative feel for the information they represent.

In Valence, every unique word in the book becomes a node. Branches connect words that are found adjacent to one another in the text. A set of rules based on these organic properties is applied to the system over time, as new words are added to the space. The resulting program reads the book in a linear fashion, dynamically placing each word into three-dimensional space. The most frequently used words make their way to the outer parts of the composition, where they can be seen more easily, leaving the less common words closer to the center. When two words are found adjacent to one another in the text, a line is drawn between them in the visualization. Each time the same two words are found adjacent again, the connecting line shortens, pulling them closer together in space. The strength of these interactions is weighted by the frequency of word use across the language in which the book is written.
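The bookkeeping described here (one node per unique word, one branch per adjacent pair, and a branch that shortens on each repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Valence implementation: the names `tokenize` and `build_graph`, the starting length of 1.0, and the 0.9 shrink factor are all hypothetical. The three-dimensional placement and the weighting by language-wide frequency are left out.

```python
from collections import Counter
import re

def tokenize(text):
    # Assumption: a "word" is any lowercase run of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def build_graph(words, initial_length=1.0, shrink=0.9):
    """Read words in order, returning (node usage counts, branch lengths).

    Each unique word is a node. Each pair of adjacent words gets a branch,
    stored under an order-independent key. The starting length and shrink
    factor are illustrative values, not taken from the original project.
    """
    counts = Counter()
    lengths = {}
    prev = None
    for word in words:
        counts[word] += 1
        if prev is not None and prev != word:
            edge = tuple(sorted((prev, word)))
            if edge in lengths:
                # Seen before: shorten the branch, pulling the words together.
                lengths[edge] *= shrink
            else:
                lengths[edge] = initial_length
        prev = word
    return counts, lengths

counts, lengths = build_graph(tokenize("the cat saw the cat and the dog"))
print(counts.most_common(2))
print(lengths[("cat", "the")])
```

A layout step could then treat each branch length as the rest length of a spring between two nodes, so that pairs that co-occur often settle closer together. That choice is one plausible reading of the description rather than something the text specifies.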

This Organic Information Visualization continues to change over time as it responds to the data being fed to it. Instead of focusing on numeric specifics (such as the exact number of times a word appears), the piece provides a qualitative feel for perturbations in the data, in this case the different types of words and language used throughout the book. The result is a qualitative slice into how the information is structured. On its own, the raw data might not be particularly useful. But when relationships between data points can be established, and these relationships are expressed through movement and structural changes in the on-screen visuals, a more useful perspective emerges.

The individual movements of words co-ordinate into a symphony of small parts. For the viewer, focus shifts between the overall shape of the piece and anomalies that call attention to themselves through rapid movement or a change in color. Related parts of the composition group together in clusters. This clustering is not explicitly stated in the movement rules; rather, it emerges from the way the nodes interact with one another as they execute the interaction rules. Groups of relations begin to form, which aids in the perception of the system.