visualization_lecture

Gestalt theory of data visualization

Examples

import matplotlib.pyplot as plt
import numpy as np
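# Similarity: points that share a colour are read as one group
# Sample data: 50 random points, each randomly assigned red or blue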
x = np.random.rand(50)
y = np.random.rand(50)
colors = np.random.choice(['red', 'blue'], size=50)
plt.scatter(x, y, c=colors)
plt.title('Scatter Plot with Similarity Principle')
plt.show()
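
import matplotlib.pyplot as plt
import numpy as np
# Proximity: bars placed side by side are read as belonging to one category
# Sample data: two groups measured across four categories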
categories = ['A', 'B', 'C', 'D']
values1 = [5, 7, 3, 4]
values2 = [6, 8, 4, 5]
x = np.arange(len(categories))
width = 0.35
plt.bar(x - width/2, values1, width, label='Group 1')
plt.bar(x + width/2, values2, width, label='Group 2')
plt.xticks(x, categories)
plt.title('Bar Chart with Proximity Principle')
plt.legend()
plt.show()
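
import matplotlib.pyplot as plt
import numpy as np
# Continuity: the eye follows a smooth line through the sampled points
# Sample data: 100 evenly spaced samples of a sine wave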
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.title('Line Graph with Continuity Principle')
plt.show()

import matplotlib.pyplot as plt
# Closure: the wedges are read as parts of one complete circle
# Sample data: four shares of a whole
sizes = [30, 20, 25, 15]
labels = ['A', 'B', 'C', 'D']
plt.pie(sizes, labels=labels, startangle=90)
plt.title('Pie Chart with Closure Principle')
plt.show()
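
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# Figure-ground: strongly coloured cells stand out against the paler ones
# Sample data: a 10 x 10 grid of random values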
data = np.random.rand(10, 10)
sns.heatmap(data, cmap='coolwarm')
plt.title('Heatmap with Figure-Ground Principle')
plt.show()

🤔❓ Applications to modern data science

Visual perception principles

Use of colour in data visualization

🎮 Colour activity

💡 Glyphs and visual variables

Change blindness

Optical illusions

🤔❓ Retinal variables

Associative vs. Dissociative visualizations

Vibratory effect in point representations

vibration

Orientation variation

See page 14 of the PDF of Semiology of Graphics by Jacques Bertin.

chaos in orientation

Ancient data science

Shape variation

🤔❓ Semiotics of data visualization

Semiotics is the study of signs 🪧 and symbols 🧩 and how they communicate meaning. When we apply this to data visualization, we are looking at how visual marks (like points, lines, and areas) serve as “signs” for data values.

At its core, visual semiotics involves three main components:

  1. The Signifier 🎨: The physical form of the sign (e.g., a red bar on a chart).
  2. The Signified 📊: The concept or data the sign represents (e.g., “Company Expenses”).
  3. The Code 📖: The set of rules that allow a viewer to connect the two (e.g., “longer bars represent larger amounts”).
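The three components can be sketched in a few lines of Python. This is only an illustration: the `BarSign` class, the `bar_length` and `render` functions, and the expense figures are hypothetical names and numbers invented for this example.

```python
from dataclasses import dataclass


@dataclass
class BarSign:
    """One sign on a bar chart."""
    signified: str       # the concept the sign stands for
    value: float         # the data value behind it (invented for this sketch)
    color: str = "red"   # part of the signifier: the bar's physical appearance


def bar_length(value, max_value, max_length=40):
    """The code: longer bars represent larger amounts (a linear rule)."""
    return round(max_length * value / max_value)


def render(sign, max_value):
    """Draw the signifier as a row of block characters next to its label."""
    return f"{sign.signified:<18} " + "█" * bar_length(sign.value, max_value)


signs = [BarSign("Company Expenses", 50.0), BarSign("Company Revenue", 100.0)]
for sign in signs:
    print(render(sign, max_value=100.0))
```

A reader who knows the code can recover the relative amounts from the bar lengths alone. Changing `bar_length` to a non-linear rule would keep the same signifiers but change what they mean.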

By understanding these relationships, we can design visualizations that are more intuitive and less likely to be misinterpreted.

Three topics build on this foundation:

  1. Bertin’s Visual Variables 📐 Discover the “alphabet” of data visualization (position, size, value, texture, color, orientation, and shape) and how these variables function as signs.
  2. Levels of Measurement 🔢 Learn how the nature of your data (nominal, ordinal, interval, or ratio) dictates which visual signs are the most effective.
  3. Visual Rhetoric & Context 🏛️ Explore how cultural associations and the way we frame “signs” can subtly influence or even mislead an audience.

Jacques Bertin, a French cartographer, changed the way we think about charts in 1967 when he published Sémiologie Graphique. He proposed that every data visualization is built from basic marks (points 📍, lines 📏, or areas 🟦) that we change using specific visual variables.

Think of these variables as the “alphabet” of your visualization. Just as letters form words, these variables form the “signs” that readers decode to understand your data.

The Original 7 Variables

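Bertin identified seven primary ways we can vary a mark to convey meaning:

  1. Position 📍: where the mark sits on the plane.
  2. Size 📏: how large the mark is.
  3. Shape 🔺: the outline of the mark.
  4. Value 🌓: how light or dark the mark is.
  5. Color Hue 🌈: which color the mark is (red, blue, green).
  6. Orientation: the angle at which the mark is drawn.
  7. Texture: the pattern or grain that fills the mark.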

How Our Brains Process These Signs

Concept: 🧩🚀 Not all variables are created equal. Some are better at showing differences (like Color Hue), while others are better at showing amounts (like Size).

If you use the “wrong” variable for your data—like using different shapes to represent “How much money was spent”—the viewer’s brain has to work much harder to decode the sign.
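The idea of a “wrong” variable can be sketched as a small lookup. The table below is a deliberately incomplete rule of thumb drawn from these notes, not Bertin's full classification; the names `SUITABLE_VARIABLES` and `is_good_fit` are hypothetical.

```python
# Simplified rule of thumb (an assumption for illustration, not a complete theory):
# which visual variables suit which kind of data.
SUITABLE_VARIABLES = {
    "nominal": {"color hue", "shape"},   # "what kind"
    "ordinal": {"value", "size"},        # "which comes first"
    "quantitative": {"size"},            # "how much"
}


def is_good_fit(variable, level):
    """Return True if the variable is listed as suitable for this kind of data."""
    return variable.lower() in SUITABLE_VARIABLES[level]


# Shapes for "how much money was spent" force the reader to memorize a code
print(is_good_fit("shape", "quantitative"))
# Size has a natural small-to-large order that matches amounts
print(is_good_fit("size", "quantitative"))
```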

🎮🛠️ Exercise

To see how this works in practice, let’s imagine you are designing a map of a city. You want to show two things:

  1. Where the parks are located.
  2. How many people visit each park per day.

Which visual variable would you choose to represent the number of visitors at each park, and why?

Solution:

Since the number of visitors is a quantitative value (meaning it represents a specific amount), we want a variable that our eyes naturally perceive as “more” or “less.”

On a map, readers compare magnitudes most easily when the magnitude is encoded as physical extent.

Of Size 📏 and Shape 🔺, only Size lets a reader see immediately that one park is much busier than another. So the best choice for the number of visitors is Size 📏 (specifically, the area of a circle or the length of a bar).

Here is why:

Magnitude Perception 🧠: Our brains are naturally wired to associate “larger physical space” with “larger quantity.” If you see a giant circle next to a tiny dot, you immediately know one has more visitors without even looking at a legend.

Ordered Nature 📈: Data like visitor counts is quantitative and ordered. Size has a natural order (small to large) that matches the data (few to many).

The Problem with Shape 🔺: Shapes are “nominal.” A triangle isn’t “more” than a square; it’s just different. If you used triangles for 100 people and squares for 1,000, the viewer would have to memorize a complex code to understand the map.
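A minimal sketch of the size encoding, assuming invented park names, coordinates, and visitor counts. The helper names `marker_areas` and `plot_visitors` are hypothetical. Matplotlib's `scatter` takes its `s` argument as marker area in points squared, so scaling `s` linearly with the count makes the circle's area, not its radius, proportional to the number of visitors.

```python
import matplotlib.pyplot as plt

# Hypothetical parks: name -> (x, y, visitors per day); all values are invented
parks = {
    "Riverside": (1.0, 2.0, 100),
    "Hilltop": (3.0, 1.0, 400),
    "Central": (2.0, 3.0, 1000),
}


def marker_areas(counts, max_area=1200.0):
    """Scale counts so that marker area is proportional to the count."""
    largest = max(counts)
    return [max_area * count / largest for count in counts]


def plot_visitors(parks):
    """Draw one circle per park, sized by its daily visitors."""
    names = list(parks)
    xs = [parks[name][0] for name in names]
    ys = [parks[name][1] for name in names]
    areas = marker_areas([parks[name][2] for name in names])

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=areas, alpha=0.6)
    for name, x, y in zip(names, xs, ys):
        ax.annotate(name, (x, y), ha="center")
    ax.margins(0.3)  # leave room so large circles are not clipped
    ax.set_title("Park visitors encoded as circle area")
    return ax
```

Calling `plot_visitors(parks)` followed by `plt.show()` displays the map. The busiest park stands out without a legend, which is the point of the exercise.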

Exercise: Testing Another Variable

Now, imagine the city wants to show the type of park (e.g., Playground 🧸, Nature Reserve 🌲, or Sports Field ⚽).

In this case, we aren’t showing “how much,” but “what kind.” Which of Bertin’s variables—Value (lightness/darkness) or Color Hue (red, blue, green)—would be most effective for distinguishing these categories?

Solution:

Color Hue (red, blue, green, etc.) is the better signifier for categories because our eyes see these colors as “different” but not necessarily “better” or “more” than one another. 🎨

In terms of levels of measurement, hue is a Nominal variable. It establishes identity without establishing a rank.
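A minimal sketch of the hue encoding, again with invented parks. The color assignment in `PARK_TYPE_COLORS` is an arbitrary choice for illustration, and `colors_for` and `plot_park_types` are hypothetical helper names.

```python
import matplotlib.pyplot as plt

# One distinct hue per category; which hue goes with which type is arbitrary
PARK_TYPE_COLORS = {
    "Playground": "tab:orange",
    "Nature Reserve": "tab:green",
    "Sports Field": "tab:blue",
}

# Hypothetical parks: (name, x, y, type); all values are invented
parks = [
    ("Riverside", 1.0, 2.0, "Playground"),
    ("Hilltop", 3.0, 1.0, "Nature Reserve"),
    ("Central", 2.0, 3.0, "Sports Field"),
    ("Lakeside", 4.0, 2.5, "Nature Reserve"),
]


def colors_for(parks):
    """Look up the hue for each park's type."""
    return [PARK_TYPE_COLORS[park_type] for *_, park_type in parks]


def plot_park_types(parks):
    """Draw equally sized markers, one hue per park type, with a legend."""
    fig, ax = plt.subplots()
    for park_type, color in PARK_TYPE_COLORS.items():
        points = [(x, y) for _, x, y, t in parks if t == park_type]
        if points:
            xs, ys = zip(*points)
            ax.scatter(xs, ys, color=color, s=120, label=park_type)
    ax.legend(title="Park type")
    ax.set_title("Park type encoded as color hue")
    return ax
```

All markers share one size so that only hue varies. Call `plot_park_types(parks)` followed by `plt.show()` to display the result.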


Hue vs. Value: The Hierarchy Trap

While Hue is great for “what kind,” Value (the lightness or darkness of a color) carries a hidden message. Our brains automatically interpret darker or more intense colors as “heavier” or “more important.”

| Variable | Best For… | Example |
|---|---|---|
| Color Hue 🌈 | Categories (Nominal) | Different colors for different subway lines. |
| Value 🌓 | Intensity (Ordinal/Ratio) | Darker blue for deeper water, lighter blue for shallow. |

If you use different Values of green for park types, a reader might accidentally think the “Dark Green” park is “better” or “larger” than the “Light Green” one, even if you didn’t mean that!
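The implied ranking can be made concrete by computing how light each color is. This sketch uses the standard relative-luminance formula for sRGB colors. The two hex codes are the CSS colors "lightgreen" and "darkgreen", chosen here only as stand-ins for the two park colors, and `relative_luminance` is a hypothetical helper name.

```python
def relative_luminance(hex_color):
    """Approximate lightness of an sRGB color (0 = black, 1 = white)."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve before weighting the channels
    linear = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


# Two values of the same hue (stand-ins for the two park colors)
greens = {"Light Green": "#90EE90", "Dark Green": "#006400"}

# Sorting by luminance puts the colors in a definite order, darkest first
ranked = sorted(greens, key=lambda name: relative_luminance(greens[name]))
print(ranked)
```

Because values of one hue always sort into a sequence like this, readers tend to read a sequence into the data as well.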


Putting the Alphabet Together

Now that we know the “letters” (Size, Hue, Value, etc.), the real magic happens when we combine them. This is where we create a Visual Language.

Imagine you are looking at a weather map. 🌡️ It uses a Color Gradient (a “Heatmap”) to show temperature across a country.

Here is the puzzle: Why do we almost always use Red for hot and Blue for cold? Is there a mathematical reason, or is this a different kind of “sign” altogether?
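The convention itself is easy to reproduce, which helps when thinking about the puzzle. This sketch assumes an invented temperature grid and centers the color scale on 0 °C; that midpoint is a design choice made for this example, not something the data dictates. `plot_temperature_map` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm

# Hypothetical temperature grid in degrees Celsius (invented data)
rng = np.random.default_rng(0)
temperatures = rng.uniform(-20, 40, size=(10, 10))

# Center the color scale on 0 °C: blues below freezing, reds above
norm = TwoSlopeNorm(vmin=-20, vcenter=0, vmax=40)
cmap = plt.get_cmap("coolwarm")


def plot_temperature_map(data):
    """Draw the grid as a heatmap with a labeled colorbar."""
    fig, ax = plt.subplots()
    image = ax.imshow(data, cmap=cmap, norm=norm)
    fig.colorbar(image, ax=ax, label="Temperature (°C)")
    ax.set_title("Temperature map with a blue-to-red gradient")
    return ax
```

Swapping `"coolwarm"` for its reversed version `"coolwarm_r"` keeps every number the same but flips the colors. Comparing how the two maps read is one way to explore whether the red-for-hot rule comes from the data or from the viewer. Call `plot_temperature_map(temperatures)` followed by `plt.show()` to display the map.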

Isarithms (Contours)

See page 18 of the PDF.

Value Variation

Resources