The Visual Display of Quantitative Information Edward R. Tufte
From "The Visual Display of Quantitative Information"
"Each year, the world over, somewhere between 900 billion and 2 trillion images of statistical graphics are printed. The principles of this book apply to most of those graphs. Some of the suggested changes are small, but others are substantial, with consequences for hundreds of billions of printed pages."
- Preferred graphics to tables because graphics show the shape of the data in a comparative perspective
- However, "small, non-comparative, highly labeled data sets usually belong in tables"
Tufte believes that graphics are more useful than tables because they visually represent the data in a clear and comprehensible form.
"An especially effective device for enhancing the explanatory power of time-series displays is to add spatial dimensions to the design of the graphic, so that the data are moving over space as well as over time." (40)
Napoleon's march to Moscow was portrayed by Charles Joseph Minard in an image depicting the number of troops steadily declining over the disastrous campaign. The plot is effective because it represents several variables at once: time, location, the number of troops, the path they took, and even the temperature on various dates during the march.
Tufte's View on Relational Graphics
"This meant, quite simply but quite profoundly, that any variable quantity could be placed in relationship to any other variable quantity, measured for the same units of observation." (46)
What Tufte is referring to here is the ability to plot one set of data against another, and thus portray a series of observations whose variables depend on each other. The beauty of this strategy is that it allows the reader or viewer to propose a relationship between the variables, and to consider how one variable might change in response to another.
As depicted in the graphic below, a clear relationship exists between each data point and the projected line:
Report of the Advisory Committee to the Surgeon General, Smoking and Health (Washington, DC, 1964), 176; based on R. Doll, "Etiology of Lung Cancer," Advances in Cancer Research, 3 (1955), 1-50.
Note that the USA lies far from the line.
Another very detailed and illustrated graphic, according to Tufte, is the "Thermal Conductivity of Copper," which represents the convergence of data on a uniform dependence of thermal conductivity on temperature.
C. Y. Ho, R. W. Powell, and P. E. Liley, Thermal Conductivity of the Elements: A Comprehensive Review, supplement no. 1, Journal of Physical and Chemical Reference Data, 3 (1974), 1-244.
Principles of Graphical Excellence
Tufte emphasized several principles of excellence for presenting good data:
2. Clarity and precision
3. Greatest number of ideas in shortest time
"Graphical excellence begins with telling the truth about the data." (53)
2. Avoid Distortion
- Representation of numbers should be directly proportional to the numerical quantities represented
- Label important events
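The proportionality rule can be checked numerically. The sketch below assumes the measure that Tufte's book calls the Lie Factor (the size of the effect shown in the graphic divided by the size of the effect in the data), which these notes do not spell out. The function names and the sample numbers are hypothetical.

```python
def relative_change(first, second):
    """Relative change from first to second, e.g. 0.5 for a 50% increase."""
    return (second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect in the data.

    A value near 1.0 means the drawn sizes are proportional to the numbers.
    """
    data_effect = relative_change(data_first, data_second)
    if data_effect == 0:
        raise ValueError("the data show no change, so the ratio is undefined")
    return relative_change(drawn_first, drawn_second) / data_effect


# Hypothetical example: a value doubles (10 -> 20) and a bar doubles in height.
honest = lie_factor(10, 20, drawn_first=30.0, drawn_second=60.0)

# Same data, but an icon is scaled in width and height, so its area quadruples.
exaggerated = lie_factor(10, 20, drawn_first=900.0, drawn_second=3600.0)
```

Here `honest` comes out at 1.0 and `exaggerated` at 3.0: scaling an icon in two dimensions to show a one-dimensional quantity overstates the change.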
Design and Data Variation
"Deception results from the incorrect extrapolation of visual expectation generated at one place on the graphic to other places." (60)
- Do not confound data variation with design variation
- Show data variation, not design variation
The Case of Skyrocketing Government Spending
Principle #1: In time series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units
Principle #2: The number of information-carrying dimensions should not exceed the number of dimensions in the data (think value of the shrinking dollar)
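Principle #1 can be illustrated with a few lines of arithmetic. This is a minimal sketch, assuming a price index with the base year set to 100 and a population series for standardizing. All figures are invented for illustration and are not taken from the book.

```python
def deflate(nominal, price_index, base_index=100.0):
    """Convert nominal amounts into constant (base-year) monetary units."""
    if len(nominal) != len(price_index):
        raise ValueError("nominal and price_index must have the same length")
    return [amount * base_index / index
            for amount, index in zip(nominal, price_index)]


def per_capita(amounts, population):
    """Standardize amounts by population size."""
    if len(amounts) != len(population):
        raise ValueError("amounts and population must have the same length")
    return [amount / people for amount, people in zip(amounts, population)]


# Hypothetical spending series that appears to "skyrocket" in nominal terms.
nominal_spending = [100.0, 150.0, 220.0]
price_index = [100.0, 125.0, 200.0]   # assumed index, base year = 100
population = [50.0, 60.0, 55.0]       # assumed population, in millions

real_spending = deflate(nominal_spending, price_index)
real_per_capita = per_capita(real_spending, population)
```

With these made-up numbers the nominal series more than doubles, while the deflated per-capita series stays flat at 2.0, which is the kind of difference the skyrocketing-spending example turns on.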
Context is Essential for Graphical Integrity
- Emaciated, data-thin design should always provoke suspicion
- Graphics must never quote data out of context
See this graphical image which gives sample contexts for the information you need to display: Periodic Table of Visuals
Source of Graphic Integrity and Sophistication
- The trick is to present statistics as a visual idea rather than a tedious parade of numbers
- "If the statistics are boring, then you’ve got the wrong numbers. Finding the right number requires as much specialized skill – statistical skill – and hard work as creating a beautiful design or covering a complex news story" (80)
Consequences of thinking that graphics are only for the unsophisticated reader
- "No one can write decently who is distrustful of the reader's intelligence or whose attitude is patronizing." – E.B. White (81)
- A study has found that 80 percent of the 1.5 million readers of the Sunday New York Times attended college (84)
Conclusion
- "Substantive and quantitative expertise must also participate in the design of data graphics, at least if statistical integrity and graphical sophistication are to be achieved." (87)
Tufte's Theory of Data Graphics
Tufte's theory states that above all else "show the data" (92). This concept is best characterized by the "data-ink ratio", the share of a graphic's ink that is devoted to displaying data rather than decoration or redundancy. A good rule of thumb is to maximize data-ink by removing any portion of the graphic that does not carry meaning.
More specifically, the ratio is described as:
data-ink ratio = data-ink / total ink used to print the graphic
This quantity is also equal to 1 minus the proportion of a graphic that can be erased without loss of data information.
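The ratio (data-ink divided by the total ink used to print the graphic) is easy to compute once the ink is accounted for. The sketch below assumes that "ink" can be approximated by a number per chart element, for example a pixel count. The element names and amounts are hypothetical.

```python
def data_ink_ratio(ink_by_element, data_elements):
    """Share of the total ink that belongs to data-carrying elements."""
    total_ink = sum(ink_by_element.values())
    if total_ink <= 0:
        raise ValueError("the graphic must contain some ink")
    data_ink = sum(ink for name, ink in ink_by_element.items()
                   if name in data_elements)
    return data_ink / total_ink


# Hypothetical ink amounts (e.g. pixel counts) for a bar chart.
chart = {"bars": 600, "axis_lines": 100, "grid_lines": 200, "frame": 100}
data_elements = {"bars"}

before = data_ink_ratio(chart, data_elements)

# Erase elements that carry no data information, then measure again.
erased = {name: ink for name, ink in chart.items()
          if name not in {"grid_lines", "frame"}}
after = data_ink_ratio(erased, data_elements)
```

In this example the ratio rises from 0.6 to roughly 0.86 after the grid and frame are erased, and `1 - before` is the erasable proportion of the original chart.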
Tufte argues in this section that although computer programs are powerful tools for generating high-quality, visually pleasing images, it is better to include a table that is rich in content and easier to understand.
Data-Ink Maximization and Graphical Design
1. Redesign of the Bar Chart / Histogram
* A combination of erasing and data-ink maximizing
* Dot-dash plot combines the two fundamental graphical designs used in statistical analysis, the marginal frequency distribution and the bivariate distribution
2. In Summary
* More information per unit space (increases in efficiency)
* Maximize data-ink and erase non-data-ink to generate graphical alternatives
* Don’t underestimate the audience
* Further considerations: complexity, structure, density, and beauty
Multifunctioning Graphic Elements
1. "The same ink should often serve more than one graphical purpose" (139)
* Design must accompany multifunction graphics to facilitate understanding
2. Data-built Data Measures
* In some cases, the data itself is the measure of the data being conveyed:
* Think "engineering standards for painting lane stripes on road pavement." (144)
3. Data-Based Grids
* Grid reports directly on the data
4. Sometimes it’s better to make the coordinate labels into the actual data points themselves. (this strategy eliminates the need to perform eye-work)
an example of this is :
* Generally works better for smaller data sets
5. Puzzles and Hierarchy in Graphics
* "Seeing is forgetting the name of the thing one sees" – implies that there shouldn't be too much internal verbal effort going on in the mind of the viewer or reader (153)
* "central to maintaining clarity in the face of the complex are graphical methods that organize and order the flow of graphical information presented to the eye" (154)
6. Integrating and separating information is beneficial: it keeps related content together while giving the eye distinct, uncluttered paths through the data (159)
High-Resolution Data Graphics
Principle of data density:
data density of a graphic = number of entries in data matrix / area of data graphic
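As a worked example of the data density measure (entries in the data matrix divided by the area of the graphic), here is a minimal sketch. The matrix size and dimensions are hypothetical, and the unit of area is whatever unit the width and height are given in.

```python
def data_density(rows, columns, width, height):
    """Entries in the data matrix per unit area of the graphic."""
    area = width * height
    if area <= 0:
        raise ValueError("width and height must be positive")
    return rows * columns / area


# Hypothetical graphic: a 50 x 4 data matrix drawn at 5 x 4 inches.
full_size = data_density(rows=50, columns=4, width=5.0, height=4.0)

# The same graphic with both sides halved shows the same data in a quarter of the area.
shrunk = data_density(rows=50, columns=4, width=2.5, height=2.0)
```

The first graphic shows 10 numbers per square inch and the shrunken one 40, so halving each side quadruples the density.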
1. The advantage of high data density is that the data can be highly interactive (161)
* Aside: the "Super Graphic" principle
Ideally, you could have one graphic or place on your website that allows access to a huge database. Ex: on recovery.gov, you can type in a zip code to see where stimulus money has gone in your own area or anywhere else in the country.
Principle for high-information graphics
1. They should often be based on "large rather than small data matrices and have a high rather than low data density" (166)
2. The "shrink principle" (167) – states that graphics can be reduced by more than a factor of two and still be legible
3. "small multiples" – shown as frames of a movie, with a combination of variables, with one changing variable
Jorge J. Yunis and Om Prakash, "The Origin of Man: A Chromosomal Pictorial Legacy," Science, 215 (March 19, 1982), 1527.
The advantage of using small multiples is that a large amount of data-rich information can be presented, and the variations in the data are easy to understand in light of the variables being compared. In this case, the first two chromosomes on the left are being compared: one is from a human and one is from an ape. The point of the diagram is to expose the similarities.
Sparklines: Intense, Simple, Word-Sized Graphics
1. Small, high resolution graphics usually embedded in a full context of words, numbers, images. Sparklines are datawords: data-intense, design-simple, word-sized graphics.
2. provide a context for better decision making based on previous data
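A word-sized graphic can be approximated in plain text. The sketch below is not Tufte's own method: it assumes Unicode block characters as a stand-in for a drawn line, and the function name and sample series are hypothetical.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest


def sparkline(values):
    """Render a sequence of numbers as a word-sized string of block characters."""
    if not values:
        return ""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A flat series has no variation to show; draw it at the lowest level.
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    # Scale each value to an index between 0 and top, then pick its block.
    return "".join(BLOCKS[round((value - low) / span * top)] for value in values)


# Hypothetical series to embed next to a number in running text.
readings = [1, 2, 3, 4, 5, 6, 7, 8]
line = sparkline(readings)
```

Because the result is an ordinary string, it can sit inside a sentence or a table cell next to the latest value, which is the "full context of words, numbers, images" the notes describe.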
On being approximately right rather than exactly wrong, see John W. Tukey, "The Technical Tools of Statistics," American Statistician, 19 (1965), 23-28.
Aesthetics and Technique in Data Graphic Design
What is the Definition of Good Design?
1. Simplicity of design, complexity of data
2. properly chosen format
3. combine words, numbers, and drawing
4. reflect scale
5. make complexity accessible
6. narrative quality
8. avoid content-free decoration
* Noted by Albert Biderman: "Illustrations were once well integrated with text in scientific manuscripts, such as those of Newton and Leonardo da Vinci, but that statistical graphics became segregated from text and table as printing technology developed." (181)
Tufte also believed in the concept of "accessible complexity"; for example, he held that serif letters are preferable to sans serif because of the greater differentiation between letters. Another aesthetic technique emphasized by Tufte is proportion and scale, particularly the idea that graphics should usually tend toward the horizontal. He asserts that "our eyes are naturally practiced in detecting deviations from the horizon" (186). Tied in with this idea is the concept of the golden rectangle, which is said to be visually pleasing because of its length-to-width ratio of about 1.618 to 1.
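The golden rectangle's proportions can be computed directly. This is a small sketch, assuming the standard definition of the golden ratio, (1 + sqrt(5)) / 2. The helper name and the example width are hypothetical.

```python
import math

# Standard definition of the golden ratio, roughly 1.618.
GOLDEN_RATIO = (1 + math.sqrt(5)) / 2


def golden_height(width):
    """Height of a horizontal golden rectangle with the given width."""
    return width / GOLDEN_RATIO


# Hypothetical example: a graphic 6 units wide.
height = golden_height(6.0)
```

For a width of 6 the height comes out a little above 3.7, giving a graphic that is clearly wider than it is tall.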
Images and Quantities, Evidence and Data
The central theme behind this book is this: "Those who discover an explanation are often those who construct its representation" (9)
Visual and Statistical Thinking: Displays of Evidence for Making Decisions
The Cholera Epidemic (logic of data display and analysis)
1. Place the data in an appropriate context for assessing cause and effect
2. Make quantitative comparisons
3. Consider alternative explanations and contrary causes
"Standards of quality may slip when it comes to visual displays; imprecise and undocumented work that would be unacceptable for words or tables of data too often shows up in graphics. Since it is all evidence - regardless of the method of presentation - the highest standards of statistical integrity and statistical thinking should apply to every data representation, including visual displays." (35)
The Space Shuttle Challenger
1. Numbers themselves can act as evidence if they reveal significant relative values to other numbers
2. "There are right ways and wrong ways to show data; there are displays that reveal the truth and displays that do not." (45)
3. "Visual representations of evidence should be governed by principles of reasoning about quantitative evidence. For information displays, design reasoning must correspond to scientific reasoning. Clear and precise seeing becomes as one with clear and precise thinking." (52)
Explaining Magic: Pictorial Instructions and Disinformation Design
1. Magic helps us to understand what not to do when designing scientific presentations, precisely because it involves illusion, trickery, and deception
* Illusions are a form of disinformation design
* Dotted lines in images and graphs are useful for depicting motion and action
2. Particular, General, Particular
* This is the general process by which visual data in the form of tables should be explored and characterized
- Seek to give high resolution talks to your audience
- Maximize rate of information transfer (68)
- Give your audience paper containing all your information
* Some questions to consider in terms of content integrity:
- Is it truthful?
- Accurate representation?
- Carefully documented?
- Clear display?
- Appropriate comparisons?
The Smallest Effective Difference
1. Involves muting secondary elements to reduce visual clutter
2. (See map of basin on 76)
Parallelism: Repetition and Change, Comparison and Surprise
1. Using codes in your visual explanations disrupts the parallelism, but is sometimes necessary for "highly complex data" (98)
Multiples in Space and Time
1. In order to demonstrate motion:
* Need greater density of time sampling to build up a sequence of still images
2. For data-rich processes, use architecture:
* blending quantitative multiples, narrative text, and images
* increased rate of information transfer
3. Multiples "help make fine distinctions and close comparisons among similar nouns… showing various butterflyfishes all nicely lined up in parallel for identification."
* Methods of organization
* Narrative sequence
1. Definition of confection: "An assembly of many visual events, selected from various Streams of Story, then brought together and juxtaposed on the still flatland of paper" (121)
* The goal is to combine the real and the imagined into one
* Real details must come together to form a real picture. See the example of the "Ultimate Weed" (126)
* Show actions, initiative, the act of doing rather than passively letting something happen
Two methods of organization
1. Imagined scenes – e.g. Thomas Hobbes' Leviathan, in which there is a large scene of a man composed of many people, followed by a series of many compartments detailing individual themes
* Little content can be portrayed
* Not too much room for annotating text
* Clumsy arrangement of structure
2. Interfacial interfaces: effective techniques to provide interaction between the individual and the informational content. (example is the touch screen directory of the National Gallery in Washington, which allows the individual to touch the item that they wish to learn about)
Tufte poses the question here of how to represent three-dimensional data in two-dimensional space. As Paul Klee writes, "It is not easy to arrive at a conception of a whole which is constructed from parts belonging to different dimensions. And not only nature but also art, her transformed image, is such a whole."
Micro / Macro Design
1. This portion of the design component facilitates extreme detail but also coverage on a macroscopic scale at the same time. An example would be a panorama or vista or some other medium of visual communication that shows immense detail, but also the greater picture. Tufte praised this concept in graphic design by exclaiming: "Such designs can report immense detail, organizing complexity through multiple and often hierarchical layers of contextual reading". (38)
2. We thrive in information-thick worlds because of our marvelous and everyday capacities to select, edit, single out, structure, highlight, group, pair, merge, harmonize, synthesize, focus, organize, condense, reduce, boil down, choose, categorize, catalog, classify, list, abstract, scan, look into, idealize, isolate, discriminate, distinguish, screen, pigeonhole, pick over, sort, integrate, blend, inspect, filter, lump, skip, smooth, chunk, average, dip into, flip through, browse, glance into, leaf through, skim, refine, enumerate, glean, synopsize, winnow the wheat from the chaff, and separate the sheep from the goats. (50)
Layering and Separation
1. "Layering of data, often achieved by felicitous subtraction of weight, enhances representation of both data dimensionality and density on flatland." (60) An example of this is the triangle whose sides are cut out and whose points are placed outside it, giving the visual effect of an invisible triangle lying on top of the first one. This technique is also called "visual activation of white areas" (61). Normally such added elements are perceived as non-information, noise, or clutter, but occasionally they help by subtracting some of the visual weight.
1. A third visual activity results from the convergence of two other visual elements
2. A general rule of thumb is that the noise of 1+1=3 is directly proportional to the contrast in value between the figure and ground.
Role of empty space in the image
* "A vessel is useful only through its emptiness. It is the space opened in a wall that serves as a window" (65) – Lao Tse
* "The greater the variety and distinction among respective background units, the clearer becomes the comprehension of a character as an individual expression or sign" (65)
Color and Information
1. Principles to manage color damage:
* Pure and bright colors should only be used sparingly on or between dull background tones
* Small color spots against a light grey or muted field highlight and italicize data, and "weave an overall harmony"
* Don’t place light, bright colors mixed with white next to each other over large areas
* Each color has a hue, saturation, and value
* Colors can improve "information resolution" on the computer screen
* Try using colors found in nature, since they are familiar and coherent
* Muted colors provide the best background for the colored theme
* Be careful of using the colors of the rainbow as a template for indicating the gradient of some value; "the mind's eye does not readily give an order to ROYGBIV" (92)
"Comparisons must be enforced within the scope of the eye-span" (remember the example of the Chinese poets and the distribution of temples of Matsu)
Joseph Hutchins Colton, Johnson’s New Illustrated Family Atlas with Physical Geography (New York, 1864), 10-11.
Color and Information
According to Tufte, if you must use bright colored spots, include them against a light grey or muted field to highlight and italicize data, as well as weave an overall harmony (83)
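Several of the color guidelines above (hue, saturation, and value; muted backgrounds; avoiding the rainbow for ordered quantities) can be sketched with Python's standard `colorsys` module. The saturation threshold and the ramp parameters below are assumptions chosen for illustration, not values from the book.

```python
import colorsys


def to_hex(rgb):
    """Format an (r, g, b) tuple of floats in the range 0..1 as a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(channel * 255) for channel in rgb))


def is_muted(rgb, max_saturation=0.25):
    """True if the color's HSV saturation is low enough to act as a quiet background.

    The 0.25 threshold is an assumed cut-off for this sketch.
    """
    _, saturation, _ = colorsys.rgb_to_hsv(*rgb)
    return saturation <= max_saturation


def sequential_ramp(hue, steps, saturation=0.6, lightest=0.95, darkest=0.35):
    """Single-hue ramp ordered from light to dark by varying only the HSV value."""
    if steps < 2:
        raise ValueError("a ramp needs at least two steps")
    ramp = []
    for i in range(steps):
        value = lightest - (lightest - darkest) * i / (steps - 1)
        ramp.append(colorsys.hsv_to_rgb(hue, saturation, value))
    return ramp


# Hypothetical palette: a light grey field, one bright spot, and a blue ramp.
background = (0.9, 0.9, 0.9)
highlight = (1.0, 0.0, 0.0)
blue_ramp = [to_hex(color) for color in sequential_ramp(hue=0.6, steps=5)]
```

A ramp that changes only in value reads naturally as an order from low to high, which the rainbow does not, while `is_muted` gives a quick check that the field behind a bright spot is quiet enough.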
Narrative of Space and Time
Serpentined data formations occur when there is a set of data that conflicts and collides with a rigid grid
* Example of the atlas of the earth that is stretched out to avoid portraying an ethnocentric view and showing each land mass and ocean in full.
* Multiplied consecutive images are effective for portraying motion (see dancer figurines on page 116)