Data visualization - Design Principles and Perception

Understand core design principles, how perception guides effective visualizations, and which graph types best convey specific quantitative messages.

Summary

Principles of Effective Graphical Displays

Introduction

Effective data visualization is not about making graphics look pretty; it is about communicating data truthfully and clearly so that viewers can understand and act on the information. This chapter covers the fundamental principles that separate effective visualizations from ones that mislead or confuse. Understanding these principles will help you design, evaluate, and select the right graphics for a data communication task.

Core Design Principles

Edward Tufte, a pioneer in data visualization, set out principles for effective graphical displays. Six of them are summarized here:

1. Show the data without distortion. Your visualization should represent the data accurately. This means avoiding exaggerated scales, misleading axes, or visual tricks that make small differences appear large or vice versa. The visualization should be an honest reflection of reality.

2. Induce viewers to think about the substance, not the design. The viewer should be focused on understanding the data itself, not admiring (or being distracted by) the graphic design. If someone comments on how fancy your visualization looks rather than what they learned from the data, the design may be getting in the way.

3. Present many numbers in a small space and make large datasets coherent. One strength of visualization is that it can show complex, multidimensional data efficiently. A single well-designed graphic can communicate what would take paragraphs of text or pages of tables. The goal is to maximize information density without creating visual clutter.

4. Encourage the eye to compare different pieces of data. Effective visualizations make comparisons easy and immediate. If viewers have to work hard to compare two data points or categories, the visualization has failed. Design so that comparisons are natural and require minimal effort.

5. Reveal data at several levels of detail. Good visualizations work at multiple scales: the big picture should be immediately apparent, but viewers should also be able to drill down into specific details when needed. Think of this as telling a story with both a clear headline and supporting details.

6. Serve a clear purpose. Before creating a visualization, know what you want it to accomplish. Are you describing known facts? Exploring data to find patterns? Presenting results for decision-making? Your purpose shapes every design choice.

Data-Ink Ratio and Chartjunk

The data-ink ratio is a specific measure tied to clarity: the proportion of a graphic's total ink devoted to displaying actual data rather than everything else. Tufte advises maximizing this ratio by removing non-data ink, meaning elements that don't represent data values.

However, this doesn't mean stripping graphics to bare minimums. You need sufficient context (axes, labels, legends) for the data to make sense. Instead, focus on eliminating chartjunk: decorative elements that clutter without adding information.

Common examples of chartjunk include:

- Unnecessary 3D effects: Three-dimensional bar charts or pie charts that distort how viewers perceive values
- Decorative backgrounds or gradients: Visual noise that doesn't represent data
- Excessive gridlines: While some gridlines help with reading values, too many become visual clutter
- Ornamental illustrations: Cute graphics or clip art that don't enhance understanding
- Floating legends: Legends placed far from the data, forcing viewers' eyes to move constantly back and forth between the legend and the graphic

The accompanying figure illustrates the problem by showing the same outcome data several ways: the top display combines a Sankey diagram with a stacked bar chart (showing percentages and counts), while the pie chart in the middle wastes space and makes comparisons harder. Different visualizations of the same data can carry very different amounts of chartjunk and clarity.
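As a rough illustration of the data-ink ratio, the sketch below tallies ink per chart element and divides data ink by total ink. The element names and ink amounts are hypothetical numbers chosen for the example, not measurements of any real chart, and `data_ink_ratio` is an illustrative helper rather than a standard function.

```python
# Hypothetical ink tallies for one chart, in arbitrary units (for example, pixels drawn).
# Each entry: (element name, ink used, whether the element encodes data values).
original_chart = [
    ("bars", 600, True),
    ("axes and labels", 150, False),
    ("heavy gridlines", 300, False),
    ("3D shading", 450, False),
]


def data_ink_ratio(elements):
    """Return data ink divided by total ink for (name, ink, is_data) tuples."""
    total_ink = sum(ink for _, ink, _ in elements)
    if total_ink <= 0:
        raise ValueError("chart has no ink to measure")
    data_ink = sum(ink for _, ink, is_data in elements if is_data)
    return data_ink / total_ink


# Remove chartjunk but keep the axes and labels that give the data context.
chartjunk = {"heavy gridlines", "3D shading"}
trimmed_chart = [e for e in original_chart if e[0] not in chartjunk]

print(f"original: {data_ink_ratio(original_chart):.2f}")
print(f"trimmed:  {data_ink_ratio(trimmed_chart):.2f}")
```

In this made-up tally, dropping the 3D shading and heavy gridlines while keeping the axes raises the ratio from 0.40 to 0.80, which mirrors the advice to remove chartjunk without stripping away necessary context.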
The principle is: if removing an element makes the visualization harder to understand, keep it. If removing it makes no difference, take it out.

Audience-Centric Best Practices

Two practices recommended by the Congressional Budget Office and other communication professionals help ensure your visualization actually works:

- Design graphics that stand alone. Your visualization should be understandable even if someone encounters it without reading the surrounding report or hearing your presentation. Include a descriptive title, clear labels, and any necessary context. Someone should be able to understand the key message from the graphic alone.
- Communicate key messages clearly within the graphic. Don't rely on text before or after the graphic to explain what viewers should see. The visualization itself should make the main point obvious through its title, labels, highlighting, or visual emphasis.

Readability and Bijective Mapping

For a visualization to communicate effectively, viewers must be able to:

- Read the data accurately: Decode values from the visualization with reasonable precision
- Make accurate comparisons: Compare values across the visualization without ambiguity
- Understand the encoding: Know what each visual element represents

This relies on bijective mapping: establishing a one-to-one correspondence between data variables and visual elements. Each data dimension gets mapped to exactly one visual property (like length, color, or position), and each visual property represents exactly one data variable.

Without bijective mapping, viewers become confused. For example, if a single color represents three different variables, or if the same variable is encoded in both the bar height and the bar color, the one-to-one correspondence is lost and readability breaks down.

Human Perception and Cognition in Visualization

Understanding Pre-Attentive Attributes

Human visual processing has remarkable capabilities. Some visual properties, called pre-attentive attributes, are processed so rapidly that we detect them almost instantaneously, without conscious effort or focused attention.

The most salient pre-attentive attributes include:

- Length: Differences in how long a line, bar, or mark extends
- Position: Differences in where marks are located on an axis
- Orientation: Differences in angle or direction
- Shape: Different forms or symbols
- Color: Differences in hue, brightness, and saturation

Humans can detect these differences in roughly 250 milliseconds, before focusing conscious attention on the graphic.

In contrast, attributes like area (the size of a shape) and volume (the size of a 3D object) are not pre-attentive. Comparing areas or volumes requires conscious attention and interpretation, which is much slower and more error-prone.

Why This Matters: Choosing the Right Visualization Type

This is why certain visualization types work better than others:

- Bar charts leverage the pre-attentive attribute of length, making comparisons immediate and accurate
- Pie charts rely on comparing areas and angles, requiring more mental effort and producing more errors, even though they're commonly used

This also explains why a line chart is excellent for time-series data (position and length along the axes are pre-attentive) and why a scatter plot is effective for correlation (position on both axes is pre-attentive).

The accompanying image shows why scatter plots work so well: the position of each point on both the $x$ and $y$ axes is processed pre-attentively, making the correlation pattern apparent at a glance.

When designing visualizations, prioritize pre-attentive attributes for encoding your most important data comparisons. This makes the visualization more intuitive and reduces viewer error.
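To see length encoding at work without any plotting library, here is a minimal sketch that renders a text bar chart. The function name `text_bar_chart`, the region labels, and the values are all hypothetical, and the sketch assumes non-negative values with at least one positive entry.

```python
def text_bar_chart(values, width=40, mark="#"):
    """Return one text line per category, longest bar first.

    Bar length is proportional to the value, so the comparison is carried
    by length rather than by area or angle.
    """
    if not values:
        return []
    largest = max(values.values())
    if largest <= 0:
        raise ValueError("need at least one positive value")
    label_width = max(len(label) for label in values)
    lines = []
    # Sort descending so the ranking is reinforced by the bar order.
    for label, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        bar = mark * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


# Hypothetical sales figures by region.
sales = {"East": 60, "North": 120, "West": 30, "South": 90}
for line in text_bar_chart(sales):
    print(line)
```

Even in plain text, the ordering of the four regions is readable from bar length alone before any of the numbers are read.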
How Humans Perceive Patterns

Beyond pre-attentive attributes, human visual processing is particularly good at detecting:

- Changes: We quickly notice when something shifts, increases, or decreases
- Differences in lightness: We easily distinguish between lighter and darker shades
- Shape variations: We readily identify different forms or patterns
- Boundaries and clusters: We naturally group similar items together

Effective visualizations exploit these strengths. Time-series charts work well because humans easily detect trends (changes over time). Color intensity variations help us spot geographic patterns on maps. Clusters in scatter plots are immediately visible because humans naturally group nearby points.

Matching Message Types to Visualizations

Different types of data stories require different visualization types. Choosing the right visualization for your message is critical to clarity.

Time-Series Messages

A time-series message shows how a single variable changes over time.

Examples: unemployment rate over decades, daily stock prices, monthly sales figures.

Best choice: Line chart

Line charts are ideal because position along both the $x$-axis (time) and the $y$-axis (value) is perceived pre-attentively, and readers accustomed to left-to-right text find temporal progression along the horizontal axis intuitive. The connected line emphasizes continuity from one time point to the next.

Ranking Messages

A ranking message orders categories from highest to lowest (or vice versa).

Examples: countries by GDP, states by population, companies by revenue.

Best choice: Bar chart (horizontal orientation for long category names)

Bar charts show ranking clearly through length, which is pre-attentive. Arrange bars in order (longest to shortest or vice versa) to reinforce the ranking. Horizontal orientation works especially well when category names are lengthy.

Part-to-Whole Messages

A part-to-whole message shows how components combine to make a whole, typically expressed as percentages or ratios.

Examples: market share breakdown, budget allocation, demographic composition.

Best choices: Stacked bar chart or treemap

While pie charts are popular for this message, research shows stacked bar charts are more accurate because viewers compare lengths rather than areas or angles. Treemaps (as shown in img10) work well for hierarchical part-to-whole messages where you have multiple levels of breakdown.

Deviation Messages

A deviation message compares categories against a reference value or shows how much each category differs from an expected baseline.

Examples: actual spending versus budgeted spending, performance versus industry average, temperature versus historical normal.

Best choice: Bar chart (with a reference line at zero or baseline)

Position the reference line clearly, then show actual values as deviations from that line. Some bars extend above the line and others below it; whether above means better or worse depends on the measure. This makes the deviation immediately apparent.

Frequency Distribution Messages

A frequency distribution message shows how many observations fall within different intervals or ranges.

Examples: distribution of test scores, age distribution of a population, distribution of company sizes.

Best choices: Histogram or boxplot

Histograms show the full distribution shape with bins representing intervals. Boxplots are more compact and allow easy comparison across multiple distributions. The choice depends on your audience and whether you want to show detailed shape or focus on comparing summaries.

Correlation Messages

A correlation message compares two continuous variables to reveal whether and how they relate.

Examples: relationship between study time and test scores, height and weight, advertising spend and sales.

Best choice: Scatter plot

Scatter plots position each observation according to its $x$ and $y$ values, making the relationship pattern visible at a glance. Trends, outliers, and clusters all become apparent immediately.
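A scatter plot shows a relationship visually. One common numeric companion, which this summary does not itself cover, is Pearson's correlation coefficient $r$, which ranges from -1 to 1. The sketch below is a minimal pure-Python version; the function name `pearson_r` and the study-time data are hypothetical and used only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(spread_x * spread_y)


# Hypothetical data: hours studied and test score for five students.
study_hours = [1, 2, 3, 4, 5]
test_scores = [52, 58, 65, 70, 79]
print(f"r = {pearson_r(study_hours, test_scores):.3f}")
```

A value near 1 or -1 corresponds to points that fall close to a straight line in the scatter plot. The coefficient only summarizes linear association, so the plot itself remains the better tool for spotting outliers, clusters, and curved patterns.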
Nominal Comparison Messages

A nominal comparison message contrasts unordered categories (categories with no inherent ranking).

Examples: revenue by product line, survey responses by demographic group, sales by geographic region.

Best choice: Bar chart (any order is acceptable, though grouping similar values can help)

Unlike ranking messages, nominal comparisons don't require a particular order, so you might arrange bars by size for visual interest or group them logically for narrative flow.

Geographic or Geospatial Messages

A geographic message compares a variable across locations, showing spatial patterns.

Examples: COVID cases by state, income by county, temperature by region.

Best choices: Choropleth map or cartogram

Choropleth maps color regions according to data values, making geographic patterns obvious. Cartograms distort region sizes according to data values, which can be striking but requires careful explanation because it distorts geography itself.

<extrainfo>
Historical Context

Data visualization has evolved significantly over the past 50+ years. Early visualizations were constrained by printing technology and hand-drawing, yet often exemplified the principles discussed here. Modern tools make creating visualizations easier, but this doesn't guarantee they'll be effective; principles of design and human perception remain constant regardless of technology.
</extrainfo>
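The message-to-chart pairings from "Matching Message Types to Visualizations" can be collected into a small lookup table. This is only a sketch of the summary's recommendations: the dictionary name `CHART_FOR_MESSAGE`, the key spellings, and the helper `recommend_charts` are assumptions made for the example.

```python
# Recommended chart types per message type, as listed in the summary.
CHART_FOR_MESSAGE = {
    "time-series": ["line chart"],
    "ranking": ["bar chart"],
    "part-to-whole": ["stacked bar chart", "treemap"],
    "deviation": ["bar chart with reference line"],
    "frequency distribution": ["histogram", "boxplot"],
    "correlation": ["scatter plot"],
    "nominal comparison": ["bar chart"],
    "geographic": ["choropleth map", "cartogram"],
}


def recommend_charts(message_type):
    """Return the recommended chart types for a message type (case-insensitive)."""
    key = message_type.strip().lower()
    if key not in CHART_FOR_MESSAGE:
        known = ", ".join(sorted(CHART_FOR_MESSAGE))
        raise KeyError(f"unknown message type {message_type!r}; expected one of: {known}")
    return CHART_FOR_MESSAGE[key]


print(recommend_charts("Part-to-Whole"))
print(recommend_charts("correlation"))
```

A table like this is a starting point rather than a rule: the audience and the purpose of the graphic still decide, for example, between a histogram and a boxplot.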

Flashcards

According to Edward Tufte, what should graphical displays show without distortion?

The data

What should viewers be induced to think about when looking at a graphical display?

The data substance (rather than the graphic design)

What activity should graphical displays encourage the eye to do with different pieces of data?

Compare them

What levels of detail should a graphical display reveal?

From overview to fine structure

What are the four clear purposes that a graphical display should serve?

Description, exploration, tabulation, and decoration

How should the data-ink ratio be maximized in a visualization?

By removing non‑data ink

In data visualization, what does the term "chartjunk" refer to?

Unnecessary decorative elements that do not enhance the message

What are two examples of chartjunk mentioned in the text?

Gratuitous three-dimensional effects, and separate legends that force the eye to move back and forth

According to the Congressional Budget Office, how should graphics be designed in relation to their surrounding report?

They should be able to stand alone

What does it mean for a visualization to be "readable"?

Viewers can understand the underlying data by making accurate comparisons or decoding legends

What is the requirement for bijective mapping in a visualization?

Each visual element must correspond uniquely to a single data variable

Why are bar charts generally more effective than pie charts in terms of human perception?

They use length (a pre-attentive attribute) rather than area

What chart type is commonly used to show how a single variable changes over time?

Line chart

Which chart type is typically employed to order categorical subdivisions?

Bar chart

What is the standard graph type for determining directional relationships between two variables?

Scatter plot

Key Concepts

Visualization Principles

Data‑ink ratio

Chartjunk

Pre‑attentive attributes

Bijective mapping (visualization)

Edward Tufte

Chart Types

Time‑series chart

Bar chart

Choropleth map

Scatter plot

Histogram

Definitions

Data‑ink ratio

The proportion of a graphic's ink that is used to represent data. The associated principle is to maximize this proportion by minimizing non-data ink in a visual display.

Chartjunk

Unnecessary decorative elements in a chart that do not enhance the communication of information.

Pre‑attentive attributes

Visual features such as color, shape, orientation, and length that the human visual system detects instantly without focused attention.

Bijective mapping (visualization)

A design approach where each visual element corresponds uniquely to a single data variable, ensuring clear interpretation.

Edward Tufte

Influential statistician and author known for defining core principles of effective graphical displays and data‑ink concepts.

Time‑series chart

A line‑based visualization that shows how a single variable changes over chronological intervals.

Bar chart

A graphical representation that uses rectangular bars to compare quantities across categories or rankings.

Choropleth map

A thematic map that shades geographic regions according to the magnitude of a variable.

Scatter plot

A diagram that plots paired numerical values as points to reveal relationships or correlations between two variables.

Histogram

A bar graph that displays the frequency distribution of a dataset by grouping observations into intervals.