Data visualization - Design Principles and Perception
Understand core design principles, how perception guides effective visualizations, and which graph types best convey specific quantitative messages.
Summary
Principles of Effective Graphical Displays
Introduction
Effective data visualization is not about making graphics look pretty—it's about communicating data truthfully and clearly so that viewers can understand and act on the information. This chapter covers the fundamental principles that separate truly effective visualizations from ones that mislead or confuse. Understanding these principles will help you design, evaluate, and select the right graphics for any data communication task.
Core Design Principles
Edward Tufte, a pioneer in data visualization, established six foundational principles that guide all effective graphical displays:
1. Show the data without distortion. Your visualization should represent the data accurately. This means avoiding exaggerated scales, misleading axes, or visual tricks that make small differences appear large or vice versa. The visualization should be an honest reflection of reality.
2. Induce viewers to think about the substance, not the design. The viewer should be focused on understanding the data itself, not admiring (or being distracted by) the graphic design. If someone comments on how fancy your visualization looks rather than what they learned from the data, the design may be getting in the way.
3. Present many numbers in a small space and make large datasets coherent. One strength of visualization is that it can show complex, multidimensional data efficiently. A single well-designed graphic can communicate what would take paragraphs of text or pages of tables. The goal is to maximize information density without creating visual clutter.
4. Encourage the eye to compare different pieces of data. Effective visualizations make comparisons easy and immediate. If viewers have to work hard to compare two data points or categories, the visualization has failed. Design so that comparisons are natural and require minimal effort.
5. Reveal data at several levels of detail. Good visualizations work at multiple scales—the big picture should be immediately apparent, but viewers should also be able to drill down into specific details when needed. Think of this as telling a story with both a clear headline and supporting details.
6. Serve a clear purpose. Before creating a visualization, know what you want it to accomplish. Are you describing known facts? Exploring data to find patterns? Presenting results for decision-making? Your purpose shapes every design choice.
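Principle 1 can be checked with simple arithmetic. The sketch below compares the relative difference between two drawn bars with the relative difference in the underlying values, which is the idea behind the measure Tufte calls the lie factor. The function names and the sample values (50 and 55, drawn from a baseline of 0 or 45) are illustrative assumptions, not taken from the text above.

```python
def effect_size(first, second):
    """Relative change from `first` to `second`."""
    return (second - first) / first


def distortion(a, b, baseline):
    """Effect shown in the graphic divided by the effect in the data.

    Bars are assumed to be drawn from `baseline` up to each value, so the
    drawn height is (value - baseline). A result of 1.0 means the picture
    is honest; larger results exaggerate the difference.
    """
    shown = effect_size(a - baseline, b - baseline)
    actual = effect_size(a, b)
    return shown / actual


# Hypothetical values: a 10% increase from 50 to 55.
honest = distortion(50, 55, baseline=0)      # axis starts at zero
truncated = distortion(50, 55, baseline=45)  # axis starts at 45

print(f"zero baseline: {honest:.1f}x, truncated baseline: {truncated:.1f}x")
```

With the axis cut off at 45, the second bar is drawn twice as tall as the first even though the value grew by only a tenth, which is the kind of visual trick the first principle warns against.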
Data-Ink Ratio and Chartjunk
The data-ink ratio is the proportion of a graphic's total ink that is devoted to displaying actual data. Tufte advises maximizing this ratio by removing non-data ink, that is, elements that don't represent data values.
However, this doesn't mean stripping graphics to bare minimums. You need sufficient context (axes, labels, legends) for the data to make sense. Instead, focus on eliminating chartjunk: decorative elements that clutter without adding information.
Common examples of chartjunk include:
Unnecessary 3D effects: Three-dimensional bar charts or pie charts that distort how viewers perceive values
Decorative backgrounds or gradients: Visual noise that doesn't represent data
Excessive gridlines: While some gridlines help with reading values, too many become visual clutter
Ornamental illustrations: Cute graphics or clip art that don't enhance understanding
Floating legends: Legends placed far from the data, forcing viewers' eyes to move constantly back and forth between the legend and the graphic
The accompanying figure illustrates the problem by presenting the same outcome data in several ways: the top display uses a Sankey diagram and a stacked bar chart (showing percentages and counts), while the pie chart in the middle wastes space and makes comparisons harder. Different visualizations of the same data can carry very different levels of chartjunk and clarity.
The principle is: if removing an element makes the visualization harder to understand, keep it. If removing it doesn't matter, take it out.
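The data-ink ratio described in this section can be made concrete as data ink divided by total ink. The following is a minimal sketch under stated assumptions: the ink amounts are hypothetical (think of them as pixel counts), and the function name `data_ink_ratio` and the element names are invented for illustration.

```python
def data_ink_ratio(elements):
    """Share of total ink that represents data.

    `elements` is a list of (name, ink_amount, is_data) tuples.
    """
    total = sum(ink for _, ink, _ in elements)
    data = sum(ink for _, ink, is_data in elements if is_data)
    return data / total if total else 0.0


# Hypothetical ink amounts for one bar chart.
chart = [
    ("bars", 600, True),
    ("axis lines and labels", 150, False),  # non-data ink, but needed context
    ("heavy gridlines", 400, False),        # chartjunk candidate
    ("gradient background", 850, False),    # chartjunk candidate
]

chartjunk = {"heavy gridlines", "gradient background"}
cleaned = [element for element in chart if element[0] not in chartjunk]

before = data_ink_ratio(chart)
after = data_ink_ratio(cleaned)
print(f"before: {before:.2f}, after: {after:.2f}")
```

Note that the axes and labels stay in the cleaned chart even though they are not data ink: removing them would make the graphic harder to understand, so the rule of thumb above says to keep them.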
Audience-Centric Best Practices
Two critical practices from the Congressional Budget Office and other communication professionals ensure your visualization actually works:
Design graphics that stand alone. Your visualization should be understandable even if someone encounters it without reading the surrounding report or hearing your presentation. Include a descriptive title, clear labels, and any necessary context. Someone should be able to understand the key message from the graphic alone.
Communicate key messages clearly within the graphic. Don't rely on text before or after the graphic to explain what viewers should see. The visualization itself should make the main point obvious—through title, labels, highlighting, or visual emphasis.
Readability and Bijective Mapping
For a visualization to communicate effectively, viewers must be able to:
Read the data accurately: Decode values from the visualization with reasonable precision
Make accurate comparisons: Compare values across the visualization without ambiguity
Understand the encoding: Know what each visual element represents
This relies on bijective mapping: establishing a one-to-one correspondence between data variables and visual elements. Each data dimension gets mapped to exactly one visual property (like length, color, or position), and each visual property represents exactly one data variable.
Without bijective mapping, viewers become confused. For example, if a single color represents three different variables, or if the same variable is encoded in both the bar height and the bar color, readability breaks down.
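A quick way to audit an encoding for the two failure modes just described is to count how often each variable and each visual channel appears. This is a sketch only; the function name `mapping_problems` and the example variables are assumptions made for illustration.

```python
from collections import Counter


def mapping_problems(pairs):
    """List violations of a one-to-one variable/channel mapping.

    `pairs` is a list of distinct (data_variable, visual_channel) tuples.
    """
    variable_counts = Counter(variable for variable, _ in pairs)
    channel_counts = Counter(channel for _, channel in pairs)
    problems = []
    for variable, n in variable_counts.items():
        if n > 1:
            problems.append(f"variable '{variable}' uses {n} channels")
    for channel, n in channel_counts.items():
        if n > 1:
            problems.append(f"channel '{channel}' encodes {n} variables")
    return problems


# One variable per channel and one channel per variable: no problems.
good = [("revenue", "bar height"), ("region", "color")]

# Revenue is encoded twice, and color carries two variables.
bad = [("revenue", "bar height"), ("revenue", "color"), ("region", "color")]

print(mapping_problems(good))
print(mapping_problems(bad))
```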
Human Perception and Cognition in Visualization
Understanding Pre-Attentive Attributes
Human visual processing has remarkable capabilities. Some visual properties—called pre-attentive attributes—are processed so rapidly that we detect them almost instantaneously, without conscious effort or focused attention.
The most salient pre-attentive attributes include:
Length: Differences in how long a line, bar, or mark extends
Position: Differences in where marks are located on an axis
Orientation: Differences in angle or direction
Shape: Different forms or symbols
Color: Differences in hue, brightness, and saturation
Viewers typically detect these differences within roughly 250 milliseconds, before they have consciously focused on any part of the graphic.
In contrast, attributes like area (the size of a shape) and volume (the size of a 3D object) are not pre-attentive. Comparing areas or volumes requires conscious attention and interpretation, which is much slower and more error-prone.
Why This Matters: Choosing the Right Visualization Type
This is why certain visualization types work better than others:
Bar charts leverage the pre-attentive attribute of length, making comparisons immediate and accurate
Pie charts rely on comparing areas and angles, requiring more mental effort and producing more errors, even though they're commonly used
This also explains why a line chart is excellent for time-series data (position and length on axes are pre-attentive) and why a scatter plot is effective for correlation (position on both axes is pre-attentive).
The accompanying scatter plot illustrates why this chart type works so well: the position of each point along both the $x$ and $y$ axes is processed pre-attentively, so the correlation pattern is apparent at a glance.
When designing visualizations, prioritize pre-attentive attributes for encoding your most important data comparisons. This makes the visualization more intuitive and reduces viewer error.
How Humans Perceive Patterns
Beyond pre-attentive attributes, human visual processing is particularly good at detecting:
Changes: We quickly notice when something shifts, increases, or decreases
Differences in lightness: We easily distinguish between lighter and darker shades
Shape variations: We readily identify different forms or patterns
Boundaries and clusters: We naturally group similar items together
Effective visualizations exploit these strengths. Time-series charts work well because humans easily detect trends (changes over time). Color intensity variations help us spot geographic patterns on maps. Clusters in scatter plots are immediately visible because humans naturally group nearby points.
Matching Message Types to Visualizations
Different types of data stories require different visualization types. Choosing the right visualization for your message is critical to clarity.
Time-Series Messages
A time-series message shows how a single variable changes over time. Examples: unemployment rate over decades, daily stock prices, monthly sales figures.
Best choice: Line chart
Line charts are ideal because they encode both time (along the $x$-axis) and value (along the $y$-axis) as position, which is a pre-attentive attribute, and because readers accustomed to left-to-right text follow the temporal progression intuitively. The continuous line emphasizes that time flows continuously.
Ranking Messages
A ranking message orders categories from highest to lowest (or vice versa). Examples: countries by GDP, states by population, companies by revenue.
Best choice: Bar chart (horizontal orientation for long category names)
Bar charts show ranking clearly through length, which is pre-attentive. Arrange bars in order (longest to shortest or vice versa) to reinforce the ranking. Horizontal orientation works especially well when category names are lengthy.
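The ordering step is easy to sketch in plain Python. The example below sorts categories from largest to smallest and draws a horizontal text bar whose length is proportional to the value. The company names and figures are invented, and the helper `ranked_bars` is hypothetical; it assumes a non-empty input with positive values.

```python
def ranked_bars(data, width=30):
    """Return text lines with categories sorted from largest to smallest.

    Bar length is proportional to the value; the largest value gets
    `width` characters. Assumes `data` is non-empty with positive values.
    """
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    largest = ordered[0][1]
    label_width = max(len(name) for name in data)
    lines = []
    for name, value in ordered:
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<{label_width}} | {bar} {value}")
    return lines


# Hypothetical revenue figures, deliberately given out of order.
revenue = {"Gamma Ltd": 40, "Alpha Corp": 120, "Beta Industries": 80}

lines = ranked_bars(revenue)
for line in lines:
    print(line)
```

Putting the labels to the left of horizontal bars is what leaves room for long category names.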
Part-to-Whole Messages
A part-to-whole message shows how components combine to make a whole, typically expressed as percentages or ratios. Examples: market share breakdown, budget allocation, demographic composition.
Best choices: Stacked bar chart or treemap
While pie charts are popular for this message, research shows stacked bar charts are more accurate because viewers compare lengths rather than areas or angles. Treemaps work well for hierarchical part-to-whole messages where you have multiple levels of breakdown.
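A part-to-whole display starts from each component's share of the total. The sketch below computes those shares and lays them out as a single stacked text bar. The budget categories and amounts are made up, and the helper names `shares` and `stacked_bar` are assumptions.

```python
def shares(parts):
    """Percentage of the whole contributed by each part."""
    total = sum(parts.values())
    return {name: 100 * value / total for name, value in parts.items()}


def stacked_bar(parts, width=50):
    """One-line stacked bar: each part is a run of its first letter.

    Segment lengths are rounded, so for awkward proportions the bar can
    come out a character shorter or longer than `width`.
    """
    total = sum(parts.values())
    return "".join(
        name[0] * round(width * value / total) for name, value in parts.items()
    )


# Hypothetical monthly budget.
budget = {"Housing": 50, "Food": 30, "Transport": 20}

percentages = shares(budget)
bar = stacked_bar(budget)
print(percentages)
print(bar)
```

Because every segment sits on the same line, the reader compares lengths along a common scale rather than judging angles.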
Deviation Messages
A deviation message compares categories against a reference value or shows how much each category differs from an expected baseline. Examples: actual spending versus budgeted spending, performance versus industry average, temperature versus historical normal.
Best choice: Bar chart (with a reference line at zero or baseline)
Position the reference line clearly, then show actual values as deviations from that line. Some bars extend above the line (better than expected), others below (worse than expected). This makes the deviation immediately apparent.
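The underlying calculation is a signed difference from the baseline. The sketch below computes it for actual versus budgeted spending and draws a text chart around a shared reference line. The departments and figures are invented, the helper names are hypothetical, and the bars run horizontally (left for below baseline, right for above) only because that is simpler to print as text.

```python
def deviations(actual, baseline):
    """Signed difference from the baseline for each category."""
    return {name: actual[name] - baseline[name] for name in actual}


def deviation_chart(devs):
    """Text bars around a shared reference line '|'.

    Negative deviations extend left with '-', positive ones extend right
    with '+'. Assumes integer deviations (one character per unit).
    """
    width = max(abs(d) for d in devs.values())
    lines = []
    for name, d in devs.items():
        left = ("-" * -d).rjust(width) if d < 0 else " " * width
        right = "+" * d if d > 0 else ""
        lines.append(f"{left}|{right}  {name} ({d:+})")
    return lines


# Hypothetical budgeted and actual spending.
budgeted = {"Marketing": 100, "Engineering": 200, "Support": 80}
actual = {"Marketing": 108, "Engineering": 195, "Support": 80}

devs = deviations(actual, budgeted)
chart_lines = deviation_chart(devs)
for line in chart_lines:
    print(line)
```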
Frequency Distribution Messages
A frequency distribution message shows how many observations fall within different intervals or ranges. Examples: distribution of test scores, age distribution of a population, distribution of company sizes.
Best choices: Histogram or boxplot
Histograms show the full distribution shape with bins representing intervals. Boxplots are more compact and allow easy comparison across multiple distributions. The choice depends on your audience and whether you want to show detailed shape or focus on comparing summaries.
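Both displays rest on simple summaries of the data: a histogram needs a count per interval, and a boxplot is drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below computes both with the standard library. The test scores are invented, the helper names are assumptions, and quartiles can be defined in several ways; this example uses the `inclusive` method of `statistics.quantiles`.

```python
import statistics


def histogram_counts(values, bin_width, start):
    """Count observations per half-open interval [lo, lo + bin_width)."""
    counts = {}
    for value in values:
        lo = start + ((value - start) // bin_width) * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical test scores.
scores = [55, 62, 68, 71, 74, 75, 78, 81, 83, 85, 88, 92, 97]

counts = histogram_counts(scores, bin_width=10, start=50)
summary = five_number_summary(scores)

for lo, count in counts.items():
    print(f"{lo}-{lo + 9}: {'#' * count}")
print(summary)
```

The histogram keeps the shape of the distribution, while the summary compresses it to five numbers, which is why boxplots are easier to place side by side.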
Correlation Messages
A correlation message compares two continuous variables to reveal whether and how they relate. Examples: relationship between study time and test scores, height and weight, advertising spend and sales.
Best choice: Scatter plot
Scatter plots position each observation according to its $x$ and $y$ values, making the relationship pattern visible at a glance. Trends, outliers, and clusters all become apparent immediately.
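A scatter plot is often paired with a numerical summary of the relationship. The sketch below computes the Pearson correlation coefficient, which measures only the strength of a linear relationship, so it complements the plot rather than replacing it. The study-time data are invented and the function name `pearson_r` is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical study hours and test scores for five students.
hours = [1, 2, 3, 4, 5]
test_scores = [52, 60, 61, 70, 78]

r = pearson_r(hours, test_scores)
print(f"r = {r:.2f}")
```

Outliers and clusters, which the plot shows immediately, can strongly affect this single number, so it is worth looking at the points before trusting the coefficient.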
Nominal Comparison Messages
A nominal comparison message contrasts unordered categories (categories with no inherent ranking). Examples: revenue by product line, survey responses by demographic group, sales by geographic region.
Best choice: Bar chart (any order is acceptable, though grouping similar values can help)
Unlike ranking messages, nominal comparisons don't require a particular order, so you might arrange bars by size for visual interest or group them logically for narrative flow.
Geographic or Geospatial Messages
A geographic message compares a variable across locations, showing spatial patterns. Examples: COVID cases by state, income by county, temperature by region.
Best choices: Choropleth map or cartogram
Choropleth maps color regions according to data values, making geographic patterns obvious. Cartograms distort region sizes according to data values, which can be striking but requires careful explanation since it distorts geography itself.
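Before a choropleth can be drawn, each region's value has to be assigned to a color class. The sketch below uses equal-interval classes, which is one of several common schemes (quantile classes are another). The region names, values, and shade labels are hypothetical, and the function name `classify` is an assumption.

```python
def classify(values, shades):
    """Assign each region a shade using equal-interval classes.

    `values` maps region name to a number; `shades` is ordered from the
    lowest class to the highest.
    """
    lo = min(values.values())
    hi = max(values.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all values match
    k = len(shades)
    result = {}
    for region, value in values.items():
        index = min(int((value - lo) / span * k), k - 1)
        result[region] = shades[index]
    return result


# Hypothetical median income (in thousands) for four regions.
income = {"A": 30, "B": 45, "C": 60, "D": 90}
shading = classify(income, ["light", "medium", "dark"])
print(shading)
```

Running from light to dark for low to high values relies on lightness differences, which viewers distinguish easily.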
<extrainfo>
Historical Context
Data visualization has evolved significantly over the past 50+ years. Early visualizations were constrained by printing technology and hand-drawing, yet often exemplified the principles discussed here. Modern tools make creating visualizations easier, but this doesn't guarantee they'll be effective—principles of design and human perception remain constant regardless of technology.
</extrainfo>
Flashcards
According to Edward Tufte, what should graphical displays show without distortion?
The data
What should viewers be induced to think about when looking at a graphical display?
The data substance (rather than the graphic design)
What activity should graphical displays encourage the eye to do with different pieces of data?
Compare them
What levels of detail should a graphical display reveal?
From overview to fine structure
What are the four clear purposes that a graphical display should serve?
Description
Exploration
Tabulation
Decoration
How should the data‑ink ratio be maximized in a visualization?
By removing non‑data ink
In data visualization, what does the term "chartjunk" refer to?
Unnecessary decorative elements that do not enhance the message
What are two examples of chartjunk mentioned in the text?
Gratuitous three‑dimensional effects
Separate legends that force the eye to move back and forth
According to the Congressional Budget Office, how should graphics be designed in relation to their surrounding report?
They should be able to stand alone
What does it mean for a visualization to be "readable"?
Viewers can understand the underlying data by making accurate comparisons or decoding legends
What is the requirement for bijective mapping in a visualization?
Each visual element must correspond uniquely to a single data variable
Why are bar charts generally more effective than pie charts in terms of human perception?
They use length (a pre-attentive attribute) rather than area
What chart type is commonly used to show how a single variable changes over time?
Line chart
Which chart type is typically employed to order categorical subdivisions?
Bar chart
What is the standard graph type for determining directional relationships between two variables?
Scatter plot
Quiz
Question 1: According to Edward Tufte's core design principles, what is essential for a graphical display?
- It should show the data without distortion (correct)
- It should use bright colors to attract attention
- It should prioritize decorative elements over data clarity
- It should hide detailed data to simplify the view
Question 2: Which chart type is most appropriate for a time‑series message that shows how a single variable changes over time?
- Line chart (correct)
- Bar chart
- Pie chart
- Scatter plot
Question 3: What does maximizing the data‑ink ratio involve when designing a graphic?
- Removing any ink that does not represent data (correct)
- Adding decorative colors to attract attention
- Including detailed legends for every element
- Using three‑dimensional effects for depth
Question 4: Which chart type is generally more effective for comparing quantities because it relies on a pre‑attentive attribute?
- Bar chart (correct)
- Pie chart
- Line chart
- Scatter plot
Question 5: Which visualization is most appropriate for displaying the correlation between two quantitative variables?
- Scatter plot (correct)
- Bar chart
- Histogram
- Stacked bar chart
Key Concepts
Visualization Principles
Data‑ink ratio
Chartjunk
Pre‑attentive attributes
Bijective mapping (visualization)
Edward Tufte
Chart Types
Time‑series chart
Bar chart
Choropleth map
Scatter plot
Histogram
Definitions
Data‑ink ratio
The proportion of a graphic's ink that is used to represent data. Tufte's principle is to maximize this proportion by minimizing non‑data ink.
Chartjunk
Unnecessary decorative elements in a chart that do not enhance the communication of information.
Pre‑attentive attributes
Visual features such as color, shape, orientation, and length that the human visual system detects instantly without focused attention.
Bijective mapping (visualization)
A design approach where each visual element corresponds uniquely to a single data variable, ensuring clear interpretation.
Edward Tufte
Influential statistician and author known for defining core principles of effective graphical displays and data‑ink concepts.
Time‑series chart
A line‑based visualization that shows how a single variable changes over chronological intervals.
Bar chart
A graphical representation that uses rectangular bars to compare quantities across categories or rankings.
Choropleth map
A thematic map that shades geographic regions according to the magnitude of a variable.
Scatter plot
A diagram that plots paired numerical values as points to reveal relationships or correlations between two variables.
Histogram
A bar graph that displays the frequency distribution of a dataset by grouping observations into intervals.