google accelerated science

data viz for scientific discovery

Since July 2017 I’ve been working with the Google Accelerated Science team. The team is “taking advances in machine learning and artificial intelligence and applying them to accelerate progress in natural science: biomedical research, chemistry, and material science.”

Often data visualization specialists are sought out to explain or communicate results. Instead, my goal on this team is to use visualization to make scientific discoveries.

I work with the scientists, machine learning experts, and statisticians to understand what is most important to them about the data and what questions they have. Then I create visual representations designed around their datasets and the questions they are asking. A sign of success is when they look at the visualized data and say “huh… that’s strange” as they notice something they didn’t expect, leading them to ask new data-driven questions. This is, of course, an iterative process.

My core principle is to design each visualization around the goals, context, and constraints of the particular project.

My work includes four main approaches:

* Advising: talking with a scientist about a chart or graph that they’re working on, asking questions to understand what is most important to them about the data, and suggesting ways to improve the chart to better show them those features of the data. This usually works best when the core chart form is already sound, but there is a lot to be gained from paying close attention to the details of the visual representation of the data (color mapping, spacing, etc.).

* Prototyping: if it’s not yet clear what type of graph fits the data, I’ll prototype several chart forms to map out a space of promising possibilities.

* Interactive Tool Building: Often we’ll want to do the same type of analysis over different, but related, data. The scientists might have a set of questions they want to answer for each new batch of data, but what they learn from the data will vary based on what comes in. In this case, it might be worth investing in an interactive, visualization-based tool. A core challenge is ensuring that if there is something important in the data, they will see it.

* Synthesis & Teaching: As I discover generalizable principles through working on varied projects, I synthesize these insights into presentations so that scientists and analysts can apply these ideas in their own work. An example of this is my presentation on Data Visualization for Analysis and Discovery at the 2018 SciPy conference.