Idealized example of the comet chart. This contrasts an "expected" case, in which all subgroups increase in value similarly to the aggregate, against the pure "mix effects" case, in which increases or decreases in the relative size of subgroups cause increases in the aggregate value.

Example of the comet chart implemented in D3, using real data about the relationship between the number of babies born at a given death rate vs. the number of deaths per 500 babies born, broken down by birth weight category and location of birth.

Research

Visualizing Statistical Mix Effects and Simpson's Paradox was published in the Proceedings of IEEE InfoVis 2014 (23% acceptance rate) and presented at InfoVis in Paris in November 2014. From the abstract: "We discuss how “mix effects” can surprise users of visualizations and potentially lead them to incorrect conclusions." See the abstract and paper.

As a senior financial analyst at Google, I identified a critical need for better tools for understanding how "mix effects" and Simpson's paradox affect how we interpret data. I collaborated with Google’s Data Visualization research group, led by Martin Wattenberg and Fernanda Viégas, to tackle this from a visualization perspective.
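
To make the idea concrete, here is a minimal sketch of a pure mix effect. The subgroup names, sizes, and rates are made-up numbers chosen for illustration, not data from the paper. Every subgroup keeps the same rate in both periods, yet the aggregate rate rises because the larger-rate subgroup grows as a share of the total.

```python
# Hypothetical data: each subgroup maps to (size, rate).
# The rates are identical in both periods; only the sizes change.
before = {"A": (800, 0.10), "B": (200, 0.30)}
after = {"A": (500, 0.10), "B": (500, 0.30)}


def aggregate_rate(groups):
    """Return the size-weighted average rate across all subgroups."""
    total_size = sum(size for size, _ in groups.values())
    weighted_sum = sum(size * rate for size, rate in groups.values())
    return weighted_sum / total_size


rate_before = aggregate_rate(before)
rate_after = aggregate_rate(after)

# No subgroup's rate changed between the two periods.
rates_unchanged = all(before[name][1] == after[name][1] for name in before)

print(f"aggregate before: {rate_before:.2f}")
print(f"aggregate after:  {rate_after:.2f}")
print(f"subgroup rates unchanged: {rates_unchanged}")
```

With these numbers the aggregate moves from 0.14 to 0.20 even though neither subgroup changed, which is the kind of shift a comet chart is meant to expose by showing size and value per subgroup together.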

In 2016, at the invitation of Carlos E. Scheidegger (University of Arizona), I presented this research at the Joint Statistical Meetings in Chicago as one of five speakers in the Recent Advances in Information Visualization session.

Moritz Stefaner and Enrico Bertini’s Data Stories podcast commended the work as "really, really interesting." On his Eager Eyes blog, Robert Kosara recommended the research, noting that these issues are "an important consideration for aggregate visualization, which is common given today's data sizes."

Check out the slide deck, D3 comet charts example, or R example code and charts.