Introduction of Zan Armstrong
00:00:11
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. I am joined today by OpenVis Conf hero and designer of many other cool things that we're gonna talk about, Zan Armstrong. Zan, welcome to the show.
00:00:22
Speaker
Thanks. Thanks so much for having me. And we're in person in DC. Yes. That is a unique situation here for the show. So I'm glad we get to, like, actually hang out in person. Very cool. I am too. So I first saw you a few months ago now at OpenVis Conf, where you did a really interesting talk on seasonality, which I want to get to.
Zan Armstrong's Background & Entry into Data Visualization
00:00:41
Speaker
But why don't we start by having you maybe introduce yourself for listeners, your background and maybe what you're doing now and then we'll roll it all together.
00:00:48
Speaker
Sounds good. It's been an interesting journey that makes a lot more sense when I look back on it than running through it in life. But I studied pure math at Williams. I taught high school in rural Maine for a couple of years, a lot of precalculus, which has come in really handy in using D3. I lived abroad in Sweden, studied some bioinformatics, didn't know what I was doing with my life, and drove to California as
Zan's Role at Google in Data Visualization
00:01:09
Speaker
I found my way to Google and after a year I transferred to a team called the revenue team and spent five years as a data analyst trying to understand everything I could about Google's revenue and lots and lots of time series analysis, lots of forecasting. And that process brought me to visualization. There was a great internal
00:01:30
Speaker
class taught by a woman named Cole, who does the Storytelling with Data website. And she was actually at the time doing internal classes. She was at Google and teaching internal classes on data vis. And I went to one of her classes, and it blew my mind. I loved thinking about the psychology of perception and how that influences how we make visuals, and just kind of started building that into my own work
00:01:50
Speaker
as an analyst and kept getting more and more into it. And finally I was like, yes, this is the thing. And amazingly had the opportunity to spend six months in Boston working with Fernanda Viégas and Martin Wattenberg's team, their DataVis research team. And while I was there, my job was to do something with DataVis that I could bring back to my revenue team that would make a difference for the team.
Detecting Simpson's Paradox at Google
00:02:11
Speaker
And what I built was a tool to help us identify when Simpson's paradox and mix effects were at play in our data.
00:02:17
Speaker
to help us understand when we should trust the aggregate data and when we should dig in deeper and when there's more of a story. And at the end of that Martin was like, I think we can publish this. And so we did. And since then I took some time off. That process also showed me how valuable it was to know your tools and I was really inspired by their team. I just really wanted to
00:02:38
Speaker
dive in head first on JavaScript. I'd been mostly an R user before that. Lots of ggplot, which is amazing. But I wanted to do the interactive, wanted to make things like that. So I dove in head first on D3 and JavaScript, did a bunch of projects on my own, and started freelancing. And most recently I've been doing a bunch of work with Stamen, which has been in the spring. Great. Great. So let's come back to the Stamen project.
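The Simpson's paradox and mix-effect situation described in this section can be illustrated with a minimal Python sketch. Every number and both segment names ("desktop", "mobile") are hypothetical, invented for illustration only; they are not Google data and this is not the tool Zan built. The sketch shows revenue per click rising in every segment while the aggregate falls, because the mix of clicks shifts toward the lower-value segment.

```python
# Hypothetical data: segment -> (clicks, revenue) for two periods.
# Names and numbers are invented purely to illustrate a mix effect.
periods = {
    "before": {"desktop": (800, 8000), "mobile": (200, 400)},
    "after": {"desktop": (400, 4400), "mobile": (600, 1320)},
}


def segment_rates(period):
    """Revenue per click for each segment in the given period."""
    return {name: revenue / clicks
            for name, (clicks, revenue) in periods[period].items()}


def aggregate_rate(period):
    """Revenue per click over all segments combined."""
    total_clicks = sum(clicks for clicks, _ in periods[period].values())
    total_revenue = sum(revenue for _, revenue in periods[period].values())
    return total_revenue / total_clicks


def is_simpsons_reversal():
    """True when every segment improves but the aggregate gets worse."""
    before, after = segment_rates("before"), segment_rates("after")
    every_segment_up = all(after[name] > before[name] for name in before)
    aggregate_down = aggregate_rate("after") < aggregate_rate("before")
    return every_segment_up and aggregate_down


if __name__ == "__main__":
    print("segment rates before:", segment_rates("before"))
    print("segment rates after: ", segment_rates("after"))
    print("aggregate before:", aggregate_rate("before"))
    print("aggregate after: ", aggregate_rate("after"))
    print("reversal:", is_simpsons_reversal())
```

In a case like this the aggregate alone would suggest things got worse, which is the kind of signal that says to dig into the segments before trusting the top-line number.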
Recognizing Seasonality in Data
00:02:58
Speaker
Yes. I'm curious about that. That would be great. When you gave your talk at OpenVis Conf, it was about seasonality. It was about how people really need to be paying attention to
00:03:07
Speaker
changes over time. Can you maybe give us a quick rundown of that talk? And then maybe some concrete examples, because I sort of feel like, yeah, everybody kind of knows what seasonality is. Maybe they don't know what selection bias is or omitted variable bias is. But seasonality, maybe they can get their head around maybe a little bit easier. And that was part of your talk, I think, why it's kind of intuitive to us. Yeah. So seasonality is just all the rhythms of our hours, our days, our weeks, our years.
00:03:32
Speaker
Hour of day or minute of day, day of week, and week of year, and then kind of major category. So those patterns, things that happen, as well as, like, holidays and things that happen with that. What happens on a Monday, Tuesday, Wednesday, Thursday, Friday versus a Saturday, Sunday, for example. I mean, these are things that I think are so fundamental to us, and we feel them so strongly, that they're super easy to just accept
00:03:52
Speaker
in our lives, but overlook in our data. And to think, yeah, of course it's there, but does it really matter? And so the talk was all about looking at situations of why we should pay more attention, why we should look at our data before we aggregate it, and think about how seasonality matters in data analysis. So things like being careful of months, because you may have some months that have four weekends versus five weekends, and it makes for unfair comparisons if you have a strong day-of-week effect in your data. Or just looking at things at the minute level. I was looking at some birth data.
00:04:23
Speaker
I mean, you could see these amazing spikes in the number of babies born at different minutes of the day, based on when people were having C-sections and when those were scheduled. I think it's really interesting to expose these rhythms, and they often can actually have real-life consequences. There was a CDC case of looking at tractor accidents in the past. And they found that a lot of the accidents happened right before lunch and right before dinner.
00:04:47
Speaker
And actually looking at that and noticing that hourly data can change recommendations, can change people's behavior, can actually save lives. So it's not just a fun part, but can have real-life ramifications.
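The point above about months having four versus five weekends can be checked directly. The following is a minimal Python sketch using only the standard library; the function name weekend_days and the choice of 2015 as the example year are assumptions for illustration, not something from the talk.

```python
import calendar
from datetime import date


def weekend_days(year, month):
    """Count the Saturdays and Sundays that fall in the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return sum(
        1
        for day in range(1, days_in_month + 1)
        if date(year, month, day).weekday() >= 5  # 5 = Saturday, 6 = Sunday
    )


if __name__ == "__main__":
    # Print weekend-day counts for each month of an example year.
    for month in range(1, 13):
        print(calendar.month_name[month], weekend_days(2015, month))
```

If a series has a strong day-of-week effect, comparing monthly totals across months with different weekend counts mixes that effect into the comparison; comparing per-day averages by day type is one way to make it fairer.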
00:04:58
Speaker
As you know, it is also super fun because I think it's an amazing way to see these patterns of our lives and of other people's lives. And you talked a little bit about some of the tools, technology, models, I guess, that some agencies use to adjust for seasonality. So you talked about the X-12, X-13? X-13 now, I think. X-13 now.
00:05:18
Speaker
So, but for folks who may not be thinking about the sort of BLS X-13 model, like, what are the basic lowest-hanging fruit people should be thinking about when they're trying to adjust for seasonality? I think the biggest thing is just actually look at the data. Like, don't aggregate your data away. See if you have minute data, if you have hour data, take a look at that, put it on a chart. And it might feel like there's going to be a lot there because it's so much more data when you start multiplying times 24 minutes.
00:05:45
Speaker
But just look at it and see if there's anything there. And the birth data was actually really interesting for this because I hadn't gotten the right data set at first, and finally I got the data set and looked at the minutes and was like, whoa, I can't believe I hadn't looked at this yet. So number one is just look at it and go to that extra step and look at days. The second is year-on-year growth is a really great tool, or week on week or whatever. If you're looking at years,
00:06:15
Speaker
doing year-on-year growth, comparing to the previous period, because whatever was happening then is probably what's happening now. So it kind of is a way of taking away seasonality just by having a fair comparison. Right. So you mentioned the paper you wrote with Martin and Fernanda earlier, and in that paper you talked about other types of bias, omitted variable bias being one. Do you get a sense that people kind of grasp seasonality bias or seasonality issues? They can grasp it a little more easily because
00:06:40
Speaker
that makes sense to us, like it's warm during the summer and it's cold during the winter, like is that? Yeah, definitely. I think with seasonality, it's pretty intuitive. The places where I think it's more challenging is when it's not your own seasonality, when it's somebody else's seasonality.
00:06:52
Speaker
So I was a data analyst in California and explaining that weather really, really mattered if you were on the East Coast or in Dublin or someplace that had much more variable weather than California. So I think that's where, even when it's something graspable, it's just really putting in the empathy for the experience of somebody else's. No, I think that's really interesting to think about how another culture might...
00:07:14
Speaker
work or sleep or harvest crops or whatever. That's really interesting. And that's actually another part, even if it's in your own country, but it's a different demographic than yourself. Like these people that are older or...
00:07:25
Speaker
younger or have different circumstances, who might have a different seasonality than you do. Very cool. So I'll put the link to the OpenVis Conf talk on the site, but now I want to switch gears and talk a little bit more about some of the DataViz projects that you have. Two of particular interest that I think are great are the Which Is Bigger project, in which you let users sort of dynamically compare geographies, and then your Weather Circles project, which uses a spiral graph, basically, that now sort of everyone has seen. We've seen a different kind of one of those.
00:07:55
Speaker
But you had one of the earlier ones on changes in temperature.
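The year-on-year comparison mentioned earlier in this section, as a way of getting a fair comparison in seasonal data, can be sketched in a few lines of Python. The monthly figures below are hypothetical: a made-up seasonal shape repeated for two years, with the second year scaled up by 10%. The function name period_over_period_growth is likewise an assumption for illustration.

```python
def period_over_period_growth(values, lag):
    """Growth of each value relative to the value `lag` periods earlier."""
    return [values[i] / values[i - lag] - 1 for i in range(lag, len(values))]


# Hypothetical monthly totals: the same seasonal shape in both years,
# with year two exactly 10% above year one.
year_one = [80, 70, 90, 100, 110, 120, 130, 120, 100, 90, 100, 160]
year_two = [88, 77, 99, 110, 121, 132, 143, 132, 110, 99, 110, 176]
series = year_one + year_two

# Month-on-month growth swings with the seasonal pattern.
month_on_month = period_over_period_growth(series, lag=1)

# Year-on-year growth compares each month with the same month a year earlier.
year_on_year = period_over_period_growth(series, lag=12)

if __name__ == "__main__":
    print("month on month:", [round(g, 2) for g in month_on_month])
    print("year on year:  ", [round(g, 2) for g in year_on_year])
```

In this toy series the month-on-month numbers jump around with the season, while the year-on-year numbers all sit at roughly 10%, because each month is compared with a period that had the same seasonal conditions.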
Project 'Which is Bigger' and Map Projections
00:07:59
Speaker
So can we talk about the Which Is Bigger project? Maybe explain it first and then talk about why you created it.
00:08:05
Speaker
Sure. So Which Is Bigger is about comparing the sizes of countries in a map projection where countries are actually equal area. So a lot of the map projections we usually use distort the sizes of countries, because there's other gains and there's other good reasons for that. But it uses a projection where there's actually equal area. You can play with two little globes, moving them around, and then you can see on an overlay
00:08:28
Speaker
different countries compared in different parts of the world. Yeah, and you can see how Australia compares to the United States or how Saudi Arabia compares to Greenland. Yeah, and it just turned out to feel like a really lightweight, fun, and playful way of kind of exploring our world. And there's a number of presets, the things that I found that I thought were interesting, as kind of a starting point. Yeah, there's like 12, like, compare these two continents or these two countries to each other. It's really interesting. I'm curious, the D3 site, like Bostock's site, has a whole bunch of stuff on map projections.
00:08:58
Speaker
Yes. So are you one of those people who, like, has strong feelings on map projections and gets, like, deep into the weeds? I love map projections. I don't have strong feelings on, like, this is the right one. It's like all of data vis, where it's the right tool for the right question. Yeah. But I do love the concept that there are all these different ways, and I think actually cartography
00:09:19
Speaker
really exemplifies, and has long been struggling with, a lot of things that we struggle with in data vis more generally. There isn't one right answer. There's not one perfect map projection. It totally depends what you're doing and why
00:09:29
Speaker
and which projection you choose. For this project, I chose one that had equal area because that was fundamental to what the project was about. There's other downsides. If you zoom really far out, it's actually the same projection that the UN uses. For the UN, it makes sense that they use an equal area projection so that you don't overemphasize certain countries. Their logo actually uses that same projection all the way zoomed out. I was doing a more zoomed in version because I didn't want to have as much distortion for that country.
00:09:56
Speaker
because even though the areas are maintained, the shapes are actually distorted a little bit. Oh, interesting. Okay. All right. We'll think about that in trying to... So, yes. Projections are awesome. Okay. Tip of the day. Projections are awesome. Okay. So that's a super fun project. And then Weather Circles. So recently there was a similar sort of spiral graph that, you know, got all this play on Twitter, by Ed Hawkins, that was on global climate change, but yours is a little more drill-down,
Weather Circles Project
00:10:22
Speaker
a little more specific. So can you talk about, talk about that project a little bit?
00:10:25
Speaker
Sure. So this project actually came out of me losing a bet and being convinced that maybe I hadn't really lost the bet. So I mistakenly made the bet that it was hotter in January than it was in June in San Francisco. And I think my perception of this was actually in comparison because it's fairly warm compared to everywhere else in January and it's fairly cold in June. And I bet my fiancé this and he was like, no way you're totally wrong.
00:10:48
Speaker
Turns out I was totally wrong. So I looked at the data on WeatherSpark and was like, yeah, but I'm wrong. I was like, well, maybe it's about hour of day. Maybe at the hours when I'm perceiving the temperature, I'm right. Maybe at noon or 5 p.m. or something. So I got hourly data from NOAA and started graphing it. And I'm still wrong.
00:11:10
Speaker
The good news is it sparked a really interesting visualization because I started playing with this. This is really interesting. And it's not just weather. It's also cloud cover. So in the visualization, you can see you can choose different cities and you can look at one of five different weather metrics of temperature or cloud cover and it's radial.
00:11:26
Speaker
and you can actually animate and see it grow, and it's hour of day around the circle, and then day of year for the 365 days of the year, and it's the most typical weather over the last 30 years. So it kind of gave a different way of experiencing it, and you can both experience things that are familiar to you. People in San Diego love seeing the June gloom in the cloudy data, which I didn't even know was a thing. June gloom in San Diego, I mean. OK. Yeah. Yeah, OK. People from the East Coast really enjoy appreciating their cold winters, and showing them off.
00:11:56
Speaker
But it gives that way of both seeing what's familiar to you and seeing other cities that you aren't as familiar with.
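The radial layout described above, with hour of day around the circle and day of year across the 365 days, can be sketched as a simple polar mapping. This is only an illustration in Python: it assumes that day of year maps to the radius, and the function name, the default radii, and the choice of midnight pointing straight up are all hypothetical rather than taken from the Weather Circles code.

```python
import math


def radial_position(day_of_year, hour, inner_radius=20.0, outer_radius=200.0):
    """Map (day 1..365, hour 0..23) to x, y: hour is the angle, day is the radius."""
    # One full turn per 24 hours; hour 0 (midnight) sits at angle 0.
    angle = 2 * math.pi * hour / 24
    # Day 1 sits on the inner ring, day 365 on the outer ring (assumed encoding).
    radius = inner_radius + (outer_radius - inner_radius) * (day_of_year - 1) / 364
    # Screen-style coordinates: angle 0 points up, angles increase clockwise.
    x = radius * math.sin(angle)
    y = -radius * math.cos(angle)
    return x, y


if __name__ == "__main__":
    print(radial_position(1, 0))     # first day of the year, midnight
    print(radial_position(365, 6))   # last day of the year, 6 a.m.
```

Each hourly reading then gets one position, and a weather metric such as temperature or cloud cover could be encoded as the color drawn at that position.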
00:12:00
Speaker
So did the Weather Circles project inspire the seasonality analysis or vice versa, or were you doing them simultaneously? So both the seasonality and the Simpson's paradox research came out of things that I faced a lot doing forecasting analysis. Yeah. Okay. At Google. At Google. Okay. So it's basically, if you take those two talks together and you listen to both of them, you've learned everything I learned in five years. Cram course. Sorry. Cool. Cool. Cool. And I think since then I've had a fascination with weather and thinking about
00:12:28
Speaker
seasonality, but here's a different way of playing with those themes. Yeah, just a lot of fun. Yeah. So are you really into weather? I mean, there've been a couple of really cool visualizations over the last, say, a year. I don't really know. A lot of really nice ones about global climate change. Bloomberg did one. There's a little more on scrollytelling, and I know you have some thoughts on scrollytelling, so maybe I'll give you an opportunity to talk. And, you know, Jim Vallandingham, who's been on the show before, he can listen with interest to what you're about to say.
00:12:56
Speaker
Actually, so yeah, I love when we started talking about some of the climate change
Scrollytelling in Data Visualization
00:13:00
Speaker
stuff. One of my favorite visualizations of recent years is one that Eric and Blacki at Bloomberg did, this great visualization on what's causing climate change.
00:13:09
Speaker
And one of the things I actually love about that is they show the things that aren't, not just the things that are. So I think we get so used to seeing graphs of, like, okay, sure, sure, sure, like, that line matches that line, we're good to go. And I think what made that so compelling is the way they told the story and how they showed all the things that could be related, could be causing it. And when you look at the charts, you're like, yeah, that's not it. And I think it was a really interesting concept and amazingly well executed. Yeah. So for those who haven't seen it, let's describe it quickly. So we've got a
00:13:38
Speaker
timeline, horizontal, we're going from some year, like 1900, maybe before that? Yeah, I can't remember if it's 1900, 1950. Okay, so we're starting there. We're in the past and to today, and we have a global temperature, sort of like the fixed static line that sits there. And then as you scroll vertically, all these different elements sort of pop up on top. So you've got volcanoes and you've got
00:14:01
Speaker
other things that now I don't remember. Sun, I think. Sun was one of them. Right, right, yeah. And so there are a couple of things that are interesting about it. For me, one was that they show sort of the confidence interval, which is great. And two is the animation. I thought it was really interesting that it's not just a line chart with five different lines on it. It actually animates across. And so there's this sort of tracking as your eye looks across the screen.
00:14:23
Speaker
And then it has this vertical scroll as you go through and it sort of adds. So what are your feelings on it? So scrollytelling, like, everybody has strong feelings. So I'm curious how you feel about scrollytelling. So my feeling, I think, comes back to almost the same thing I said for projections. I just wrote a blog post about this, having been inspired by a bunch of the conversations people have been having recently. And I think for me it's not about scrollytelling versus steppers, or one being, like, the best. It's more about finding the right tool for what you're doing. I think in that case,
00:14:50
Speaker
They actually have a little bit of kind of a stepper feel on the right that you can kind of see where you are. And one of the things about steppers is they provide nice context potentially.
00:14:59
Speaker
But the scrolling makes for this very kind of seamless experience. So you don't have to make a decision of, do I go on? You just kind of keep going on, the way we do when we're reading. So I think that was a really excellent execution. And yeah, in general, you can read my blog post to find out more of my specifics, but it's a lot about, I think, just understanding the potential strengths and weaknesses of one version or another, what you're going for, what's the right story.
00:15:21
Speaker
What's the right tool for what you're doing, right? And also that there's a bunch of actual examples out there in the wild that combine elements, like this one does, actually, which has some of that stepper context while also being a scroller. Right. Okay. All right. We'll see how people are gonna fight you on that one. But okay. So you mentioned the execution is great too. That was, like, yeah, when they feel really smooth. Yeah, it felt really smooth, and it felt really natural to sort of see these things build. And the other thing that I really liked about it is that it's
00:15:48
Speaker
building what I would suppose is a fairly sophisticated model underlying it and sort of just adding these components together. So A plus B plus C plus D equals this thing in a very visual and natural, as you said, natural way to interact with it.
00:16:03
Speaker
And I love, I think also, I think one interesting thing with scrolling is that it means that the animation or the transition is actually part of the content. It's not just a way to get from here to there. I think as you point out, part of what makes that magical is the animation. It is content because it's showing you that build. That build, yeah. Yeah, absolutely.
00:16:20
Speaker
Okay, so you mentioned at the beginning that you're doing some work with Stamen, and I know you have a bunch of other projects going on. So can we talk a little bit about what you're working
Collaboration with Stamen on Genetics
00:16:28
Speaker
on right now? You left Google two years ago? Okay, so now you're on your own, doing all these very cool projects. So can you talk a little bit about what you're up to for the summer?
00:16:38
Speaker
I've been having a lot of fun working both with Stamen on a genetics-based project and actually with one on my own with some Yale researchers. And really interesting to do things that are tool-based for scientists. So kind of getting back to the analysis side of
00:16:52
Speaker
building tools. And it's a really different process to build a tool that they're going to use in different ways than to build a story or experiential experience. So it's fun getting to play with both of those. And we did one pretty recently: Atlas of Emotions was also a Stamen project, one that was commissioned by the Dalai Lama, and the client was working with Paul Ekman, who also was the consultant for Pixar's Inside Out. And it was interesting thinking about
00:17:22
Speaker
this completely different project. How do you get a sense of emotion? How do you think about emotions and a way to explore that from a web visualization context? So were you attracted to work with Stamen because of their work with maps? I mean, is it a mapping-based project? Maybe if you can describe the project, the tool for researchers. Is it a mapping-based project? This was not a mapping-based project. So Stamen's been getting more into data visualization
00:17:52
Speaker
over the years, coming out of that great cartography experience. And this project is really a matrix at the end of the day, comparing organisms to possible genes and metabolic pathways that they have. And it's about you take a sample of dirt or something, you sequence everything in it, and then you try to figure out what's there. And so it's helping, it's a tool for researchers to look at those organism categorizations and then the types of
00:18:22
Speaker
genes and enzymes that have been categorized, and try to understand: what do these organisms do? What can they do? Can they do everything they need to do? If they can't, what other organism might be doing it? Is there some kind of symbiotic relationship? A lot of this is happening in, like,
00:18:35
Speaker
the researcher we're working with loves to talk about dolphins' teeth. They have amazing ecosystems in dolphins' teeth. They don't brush their teeth, and they've got whole ecosystems in there. Okay. So how does that play with the Yale researchers? Is that all together? Oh, these are actually separate. So the Yale researchers were some friends of mine, and that's neuroscience-based. They're looking at
00:19:00
Speaker
genes that express in your brain. And one of the really exciting things is they actually have a lot of quantitative information in a field where they often have very qualitative information. This set of labs, this consortium has really good quantitative information about genes and what cell they're in. And that cell type is actually pretty unique too. So we're looking at giving a way of looking up and exploring this data that really takes advantage of the fact that it's quantitative and that they have cell-based, cell-type based information. So some pretty hardcore science
00:19:29
Speaker
type of visualization work, which is really interesting. Very good. Well, thanks for coming on the show. This has been really interesting, some really great projects you have worked on and are working on. It's great. I actually had one more thing for you. Yeah, okay. Let's do it. We were talking a little bit earlier about types of things that happen in data that might be like seasonality, where we can get it, but there might be things that are a little bit less intuitive. I just want to give a quick shout-out: Amelia McNamara did a great talk at OpenVis Conf on "Do you know nothing when you see it?"
00:19:57
Speaker
And then Jessica Hullman did "The Visual Uncertainty Experience." And I think both of those play with really interesting ideas around what you actually see in data and if it's really there, and how do we think about uncertainty. We know uncertainty matters, we know stats and all that, but it's different. When you see a bar chart, it feels so concrete. So I think we actually experience the uncertainty. And then last, I think when data is missing, it's super interesting.
00:20:24
Speaker
And the data that you don't have, because we're so focused on the data that we do have and the story that's in there, but often the story is, well, what data didn't we actually get to collect? And Seth Stephens-Davidowitz did an amazing article on how Googling unmasks child abuse, where there was actually a lot of data about child abuse that was missing during the recession, because the agencies that collected the child abuse reports got defunded, and so that wasn't an opportunity.
Visualizing Uncertainty in Data Analysis
00:20:50
Speaker
So just a couple extra things about it, I love thinking about
00:20:53
Speaker
when we might be misinterpreting something. And these are three examples that are super fascinating. Oh really good. And I don't want to close up yet because you just like open the box. But I want to ask one more question then because you mentioned obviously a lot of things about uncertainty. And I'm curious whether you think some of the debate about how we visually represent uncertainty is an issue with data visualization or if it's an issue with people's basic ability or inability to understand what uncertainty is. Yeah I think
00:21:22
Speaker
And I know that's a big question, but you hear all the time, like, oh, box-and-whisker plots aren't very good because people don't understand what they mean. And I always wonder, well, is that the chart type's fault or do people not understand what a percentile is? And I tend to lean towards that. Yeah. I actually think that it is a lot about that we don't fully understand the concepts, because they're not as intuitive. I think for a scientist or a statistician who's grounded in this all the time,
00:21:48
Speaker
I think it's a great representation because they know what it's representing. And I think the question is, who is your audience and who are you representing this for?
00:21:58
Speaker
What do they know? I mean, they're all abstractions, so do you know what the mark means? Like, you could debate in mathematics, like, is the multiplication sign really good, or is the square root sign good? Does the square root sign really tell you about square root? Right. Yeah, yeah. It means something different to somebody who knows what a square root is versus somebody that might be experiencing square root for the first time. And if you know what a square root is, like, maybe it's not the perfect signal, but it tells you exactly what you need to know. Yeah.
00:22:23
Speaker
Whereas if you're trying to get the experience of a square root, maybe it doesn't give that. Yeah. Oh, that's interesting. Yeah. Well, it always goes back to the audience, I guess. Yeah. Yeah. Very good. Well, great. There's a lot of stuff for people to go through, and I hope they will. Some great projects and the great talk, of course, at OpenVis Conf. So, Zan, thanks for coming on the show. Thank you so much. I really was happy to be here. Great to talk to you.
00:22:46
Speaker
And thanks to everyone for tuning in and listening to this week's episode. If you have any questions or comments, please let me know. So until next time, this has been the PolicyViz Podcast. Thanks for listening.