Zan Armstrong

How I think about making data visible (and useful)

These articles/talks show how I approach creating charts and how I think about data. If you like these, you'll like working with me.

Stop Aggregating Away the Signal

Stack Overflow's blog "The Overflow"

Published on Stack Overflow's blog, this article shows how "by aggregating our data in an effort to simplify it, we lose the signal and the context we need to make sense of what we’re seeing."

Links

Blog Post

Analyzing Time Series Data visualizations

Featured on Data Viz Today podcast

Make "the Important" Visible

Nightingale Magazine

Published in the "Guidelines" issue of the Data Visualization Society's Nightingale Magazine, this article shows readers how to make more effective data visualizations by ensuring that "whatever is most important about their data" is actually made literally visible in some way. It's not as obvious as it seems, but when applied it is extremely powerful.

Links

Make the Important Visible (PDF)

Nightingale Issue 3: Guidelines

Beyond Explore & Explain

Outlier Conference 2025

Beyond Explore and Explain: Diagnostic Charts and the Question Dimension

This talk expands the classic explore/explain paradigm of data viz by adding "diagnostic charts", the charts you use again and again each time you have new data come in. Discover how focusing on the questions you need answered can make your data visualizations more effective and impactful.

Presentation

Video

Slides

Selected work

OpenVisConf

Everything is Seasonal

In this talk you'll learn why your commute is worse in November than June, why so many babies are born around 8:30am, and why it's almost always a terrible idea to look at monthly data.

Presentation

Video

Slides

Information is Beautiful - Silver Award

Why are So Many Babies Born Around 8am

Scientific American

It is common to aggregate data over time when striving for simplicity, but sometimes the story is revealed in the details: minutes, hours, days, and weeks.

Publication

Scientific American

About the Viz

Kantar Awards

Featured on Data Viz Today podcast

Zan's Story Behind the Viz

Nadieh's Story Behind the Viz

Recreated in Play Doh by Amy Cesal

Distill.pub

Activation Atlas

The Activation Atlas helps us see through the eyes of a model: it shows not just how the model responds to a specific input, but also provides a map of what is "seeable" at each layer. This work was led by Shan Carter; I contributed to the interactives and analysis.

Published in the AI research journal Distill.pub, and exhibited in Ars Electronica's Understanding AI exhibit.

Links

Activation Atlas

Understanding AI Exhibit

SF Moma, Cooper Hewitt, Cube

Metagenomics

with Stamen Design, for UC Berkeley

Originally created for researchers at UC Berkeley to study the genetics of micro-ecosystems like the gut of an infant, the inside of a nuclear reactor, or between a dolphin's teeth, this visualization was also embraced by the art & design community.

My favorite part was the bold colors. They met a scientific need: despite being close numerically, the values 0 and 1 were shown in black and bright colors, respectively, because in the scientific context "none" and "at least 1" were conceptual opposites.

Stamen Design Blog

SF Moma: Designed in California

2019 Design Triennial Announcement

IEEE InfoVis

Research: Visualizing Mix

Comet chart for visualizing Simpson's Paradox and mix effects

Simpson's Paradox is an extreme version of a notoriously confusing effect in which aggregate numbers seem to contradict more detailed statistics. A "comet chart" is a special type of scatterplot to reveal "changes in mix". In addition to showing the metric of interest on the y-axis, it also shows the size of each subgroup on the x-axis. It's immediately obvious if comets are all "streaking together", going all different directions, or if there is some correlation between where the comets are spatially and how they are changing.
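
The mix effect described above can be illustrated with a small numerical sketch. The segment names and counts below are hypothetical, chosen only so that each subgroup's rate rises while the aggregate rate falls; the `comets` helper is an illustrative name, not part of the published research. Each comet is represented by a tail (subgroup size and rate before) and a head (size and rate after), matching the x-axis and y-axis roles described in the text.

```python
# Hypothetical data: (users, conversions) per subgroup, before and after.
before = {"desktop": (800, 80), "mobile": (200, 10)}
after = {"desktop": (400, 44), "mobile": (600, 36)}


def rate(users, conversions):
    """Metric of interest for one subgroup (plotted on the y-axis)."""
    return conversions / users


def overall_rate(segments):
    """Aggregate rate across all subgroups, weighted by subgroup size."""
    total_users = sum(users for users, _ in segments.values())
    total_conversions = sum(conv for _, conv in segments.values())
    return total_conversions / total_users


def comets(before, after):
    """One comet per subgroup: tail = (size, rate) before, head = (size, rate) after."""
    return {
        name: {
            "tail": (before[name][0], rate(*before[name])),
            "head": (after[name][0], rate(*after[name])),
        }
        for name in before
    }


if __name__ == "__main__":
    for name, comet in comets(before, after).items():
        print(name, "tail:", comet["tail"], "head:", comet["head"])
    print("overall before:", overall_rate(before))
    print("overall after:", overall_rate(after))
```

In this made-up example both comets move upward, yet the aggregate rate drops because the mix shifts toward the lower-rate subgroup. That horizontal movement is the part a chart of rates alone would hide.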

Stanford's d.school

Taught "Design of Data"

Design of Data course at Stanford d.school

Created by student Sarah Ann Li-Shen Teaw

In collaboration with Catherine Madden and Irene Alisjahbana, I taught a 10-week course at Stanford's d.school in fall 2024. Our students included both undergraduate and graduate students.

Links

Stanford d.school

Example of student work

Outlier, Data Visualization Society

Three Simple, Flexible "Tools"

This talk introduces the three simple, flexible "tools" of data visualization: making color meaningful, using small multiples, and making order matter. Furthermore, it shows *why* these tools are so powerful at revealing information that might have been visible, but not noticeable or attention-grabbing. Most importantly, it demonstrates how practitioners can use these principles to bring their own domain expertise or understanding of the question into their visualizations, and to make their implicit assumptions explicit.

Links

Presentation

Featured visualizations

Data Visualization for Science, Analysis, & Decision Making

While some data visualization specialists are focused on effective communication and storytelling with data, much of my work is aimed at helping domain experts better use data to inform decisions or make new discoveries. For example, as a member of the Google Applied Science team I contributed to research on drug discovery, materials science, automating insight from biological images, fusion, improving scientific computing with machine learning, and more. While I sometimes consulted on scientific communication, my focus was not on communicating results. Rather, it was on working with the scientists during the discovery and analysis process: advising, prototyping, performing viz-based analysis, and building analysis tools. Success was defined as visualizations changing the scientific decisions (which model is chosen, what experiment to perform, how to change experimental conditions, etc.), directly leading to a discovery, or using scientific data to inform a "real-world" decision.

Searching for New Malaria Drugs

Google Applied Science: Malaria

Malaria drug discovery data visualization

Over the course of three years, I worked closely with scientists and ML experts as they built models to create new methods in the effort to find new anti-malarial drugs.

In the scientific context, authorship is reserved for contributions to the scientific results. So, my inclusion as a contributing author indicates the critical role visualization played in this work.

Publication

Science Advances

Analyzing Time Series

Observable: illustrating techniques

At Observable, I led this project to illustrate how we can literally change how we look at our data to more effectively analyze data heavily influenced by hour-of-day, day-of-week, and season-of-year patterns. This work was done in collaboration with Ian Johnson and Mike Freeman. Through a series of 6 "stories" we showcase techniques that can be applied to any dataset with values measured each hour or day.
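
As a rough illustration of looking at hourly data through its hour-of-day and day-of-week structure, the sketch below groups timestamped values into a weekday-by-hour grid, which could then be drawn as a heatmap or as small multiples. This is only one possible approach under stated assumptions, not the method used in the Observable collection; the function name `weekday_hour_means` and the synthetic data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekday_hour_means(records):
    """Average values by (weekday, hour); weekday 0 is Monday.

    records: iterable of (datetime, value) pairs, e.g. hourly measurements.
    Returns a dict mapping (weekday, hour) to the mean value in that cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in records:
        key = (timestamp.weekday(), timestamp.hour)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def synthetic_hourly_data(start, hours):
    """Hypothetical data: value follows the hour, with a weekend offset."""
    for i in range(hours):
        ts = start + timedelta(hours=i)
        weekend_offset = 100.0 if ts.weekday() >= 5 else 0.0
        yield ts, ts.hour + weekend_offset


if __name__ == "__main__":
    # Two weeks of hourly values starting on a Monday.
    data = synthetic_hourly_data(datetime(2024, 1, 1), 24 * 14)
    grid = weekday_hour_means(data)
    print(len(grid), "cells in the weekday-by-hour grid")
```

A single daily or weekly average would blend the weekday and weekend patterns together; keeping the two cyclical dimensions separate is what lets the difference show up.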

Publication

Analyzing Time Series collection

SCAN wastewater to track disease

Stanford University

SARS-CoV-2 wastewater monitoring visualization

When infected with a disease, we poop fragments of the viral RNA. Sampling sewage and isolating/sequencing the extracted genes therefore provides unique insight into community-level transmission of infectious diseases, including Covid-19. Because "everybody contributes" their poop to the sewage system, tracking viral fragments in wastewater provides data that doesn't rely on individual people going to doctors to get official tests.

This LA Times article demonstrates the impact that this data has already had on the decisions of public health officials and of doctors choosing which drugs to use to treat patients based on which variant is predominant.

Changing How Scientists Create Viz

Google: Developing & Teaching Best Practices

Visualization best practices guide for scientists at Google

My work is not just about specific charts, but also about identifying new best practices and changing how researchers think about using viz more effectively in their own work.

Testimonial

"That was probably one of the most valuable hours that I spent in my entire time at [Google] X, completely shifted my thinking on how I do figures and what is important. Zan's skillset is unique in my experience."

- Sylvia Smullin - Physicist and ML Researcher

Gene Expression in the Brain

Yale University

The design emphasized the importance of the two features that made this dataset special: quantitative accuracy and gene expression assayed by cell type.

This visualization played a role in the analysis leading to "A multiregional proteomic survey of the postnatal human brain," published in Nature Neuroscience.

Link

Live Tool

Contributing to Top Tier Research

Google Accelerated Science: Materials

Materials science research visualization with 48 line charts

My first attempt was a beautiful, interactive 3D surface. Unfortunately, it was useless for analyzing the data. Instead, these 48 line charts show 10x more data and focus attention on the most important attributes of the data. Most importantly, the tool informed key research decisions and led to discovering an unexpected scientific phenomenon.

My contributions led to my being named as an author on research published in Proceedings of the National Academy of Sciences (PNAS), the world's second most cited scientific journal.

Published Research

Discovery of complex oxides via automated experiments and data science

Discussing the visualization techniques

Data Vis for Analysis

Informing ML feature space

Google: Viz + ML

ML feature space visualization for algae cultivation optimization

A common problem in ML is understanding the interconnected relationships between 3 to 20 potential features in a dataset to identify what combinations of features most affect a "score" variable. My key data visualization insight was identifying that if we discretized the continuous score function into "bad, meh, good, better, and best" categories we could much more effectively use color and small multiples to scan the space of pairwise interactions. This was especially important in this study where the ML algorithms informed growing conditions for algae, so measuring the effectiveness of a particular set of parameters was costly and time-consuming because it required actually growing the algae in the lab.
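
A minimal sketch of the discretization idea: map a continuous score onto the five ordered categories, and list every pair of features so that each pair can become one small-multiple panel colored by category. The cutoff values and feature names below are hypothetical placeholders, not the ones used in the study.

```python
from bisect import bisect_right
from itertools import combinations

# Ordered categories named in the text, from worst to best.
LABELS = ["bad", "meh", "good", "better", "best"]


def discretize(score, cutoffs):
    """Map a continuous score to one of five labels.

    cutoffs: four ascending thresholds (hypothetical values in this sketch).
    A score equal to a threshold falls into the higher category.
    """
    if len(cutoffs) != len(LABELS) - 1:
        raise ValueError("expected exactly four cutoffs")
    return LABELS[bisect_right(cutoffs, score)]


def panel_pairs(features):
    """Every unordered pair of features: one small-multiple panel per pair."""
    return list(combinations(features, 2))


if __name__ == "__main__":
    cutoffs = [0.2, 0.4, 0.6, 0.8]  # assumed thresholds, for illustration only
    features = ["light", "temperature", "ph", "nitrogen"]  # assumed names
    for score in (0.05, 0.45, 0.95):
        print(score, discretize(score, cutoffs))
    print(len(panel_pairs(features)), "pairwise panels")
```

With a handful of discrete categories, each one can get a clearly distinguishable color, which is easier to scan across many panels than a continuous color ramp. The number of panels grows as n(n-1)/2, so 20 features would already mean 190 pairs.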

Research

Machine Learning Optimization of Photosynthetic Microbe Cultivation and Recombinant Protein Production

Viz for Scientific Communication

Google: Advising

Scientific communication visualization example

While my primary role on the Applied Sciences team was creating visualizations for analysis & discovery, I also advised researchers across the team on how to more effectively communicate their results.

The "curse of knowledge" bias is especially challenging in transitioning from analysis to communication: once you know what's important about the data in the chart, it's hard to imagine that someone else looking at the same chart wouldn't see the same thing.

Featured Publications

Optimization of Molecules via Deep Reinforcement Learning

A Bayesian experimental autonomous researcher for mechanical design.

Machine learning on DNA-encoded libraries: A new paradigm for hit-finding

Quantum Optimization with a Novel Gibbs Objective Function and Ansatz Architecture Search

Investigating Quantum Approximate Optimization Algorithms under Bang-bang Protocols

Deep learning and automated Cell Painting reveal Parkinson's disease-specific signatures in primary patient fibroblasts

Covid Response

Google: Data Viz Hub

Google Community Mobility Reports visualization

In March of 2020, I founded and led the Data Viz Hub: a group of Googlers with data viz skills lending their expertise to projects across Google related to Covid-19. For example, our group contributed to the Google Search and News teams' visualizations of Covid statistics.

Additionally, Adam Pearce and I collaborated to create visualizations and thought questions to contextualize the data in Google's Community Mobility Reports.

For Fun & Curiosity

Weather Circles (and Lines)

Mark Twain was wrong.

Weather Circles: hourly weather patterns by city and month

I lost a bet, but made a fun viz. It is colder in January than June in San Francisco, every single hour of the day.

See it Live

Weather Circles

Weather Lines

Which is Bigger?

Created because Africa is really big

Everyone who's watched West Wing knows that Greenland is not as big as it seems in Mercator projections. But do you have a good intuition for the relative sizes of Saudi Arabia vs. Alaska or Europe vs. Antarctica?