Episode #73: Xan Gregg

The PolicyViz Podcast
173 plays, 8 years ago

On this week’s episode of the show, I’m excited to chat with Xan Gregg, Data Visualization Development Director for JMP at SAS (see below). Xan regularly creates visualizations and makeovers for all sorts of interesting topics, as well as, of...

The post Episode #73: Xan Gregg appeared first on PolicyViz.

Transcript

Intro and Sponsorship

00:00:00
Speaker
This episode of the PolicyViz podcast is brought to you by JMP, Statistical Discovery Software from SAS. JMP, spelled J-M-P, is an easy to use tool that connects powerful analytics with interactive graphics. The drag and drop interface of JMP enables quick exploration of data to identify patterns, interactions, and outliers.
00:00:19
Speaker
JMP has a scripting language for reproducibility and interfacing with R. Click on this episode's sponsored link to receive a free info kit that includes an interview with data viz experts Kaiser Fung and Alberto Cairo. In the interview, they discuss information gathering, analysis, and communicating results.

Introducing Xan Gregg

00:00:49
Speaker
Welcome back to the PolicyViz podcast. I'm your host, Jon Schwabish. On this week's episode, I'm very excited to have with me as a guest Xan Gregg, who is Research and Development Director at JMP, which is a division of SAS. JMP, as you may know, is a long-time supporter of the show, so I'm really glad to have Xan on, not only because
00:01:08
Speaker
of the support from JMP, but also because he does a lot of very interesting work with lots of different things. And if you don't follow him on Twitter, you really should, because he's got a lot of great resources and insights on how to use JMP and a variety of other tools. So, Xan, thanks so much for coming on the show. How are you doing? I'm doing good. Glad to be here.

Overview of JMP and Its Focus

00:01:27
Speaker
I'm really excited to talk to you, especially about some of the work you've done making over different visualizations. And I want to get your take on, maybe you have some pet peeves in data visualization. Maybe you don't, we'll see. But before we dive into that, can you maybe introduce yourself for folks at home and maybe talk about JMP a little bit and what it can do with respect to data visualization? Okay. Glad to. I'm a software developer by trade. And I sometimes say I'm an amateur data scientist. I've been focusing on
00:01:56
Speaker
more sort of data analysis and data visualization software for the past 15 years or so. And so part of that is actually not just writing the code, but exercising it and trying out things. JMP is a division of SAS, which is a big stats company. But JMP itself, spelled J-M-P, is a desktop product for Mac and Windows. And it does statistics with a focus on interactivity and visualization. One of the founding principles of JMP is that
00:02:26
Speaker
every statistic has a graph. And so that makes it easier for people that aren't statisticians. Our main focus is more scientists and engineers. It makes it easier for them to understand what they're seeing, and everything is sort of personal and right in front of you as opposed to being somewhere in the cloud in a more enterprise kind of product. So that's a little bit about what JMP is. It's mostly UI driven. We have a scripting language for when you want to
00:02:53
Speaker
do something, make something reproducible so that you can run it again the next day. Basically, you can use the UI, discover something, save it to a script, and make it repeatable. So we like that combination. Some people never go into the scripting. Some people are just gung-ho scripting all the time.
00:03:15
Speaker
But that's, you know, a little bit of JMP in a nutshell.

Understanding Graph Perception

00:03:17
Speaker
Yeah. So you said that all statistics have a graph, and yet yesterday I watched a video of one of your talks where the title was "All graphs are wrong, but some are useful." I'll post that on the show notes page, but can you talk a little bit about that talk and your perspective on that? Sure. First of all, for people that, you know,
00:03:41
Speaker
don't follow statistics, the title is a reference to the famous aphorism from the statistician George Box, "All models are wrong, but some are useful." And I sort of took a play off of that. The idea being that graphs, like models, are just representations of the world. So they're wrong in the sense that they aren't exactly the world, they're some approximation. But more to the point, what I spoke about was how there's a lot of
00:04:10
Speaker
error in graphs, not just in the data. The data has error and variation, but so does our perception. We don't see exactly what the data is behind the graph. You've probably seen the perceptual studies, like the seminal one by Cleveland and McGill, where they ranked visual attributes. But even then, the highest-ranked attribute, position, still had a 3% error, and the lowest was not even that much worse. It was like 12% or 15% or something.
00:04:38
Speaker
So all graphs, even the best ones, have some perceptual error, and it's good to keep that in mind as we're making graphs. They all have some kind of message, and we have to realize that there's some loss of information just from the fact that our perception system is not perfect. Our eyes are not a camera and our brain is not a computer. Those are useful models, but they're not exactly right. Yeah.

Designing Effective Data Visualization

00:05:06
Speaker
When you're working on developing data visualization software,
00:05:14
Speaker
permitting or letting the user choose a graph or guiding them through a graph. Are you thinking about these sorts of perceptual issues? Are you trying to guide them into using one particular graph type over another? For example, if I go into the Excel dropdown menu, I have 3D cones and I have all this sort of things that I would argue we shouldn't be using. But are you thinking about these perceptual issues as you're designing the software to guide the user to certain things and maybe away from other things?
00:05:42
Speaker
Definitely. Probably William Cleveland's books on graphing data. Visualizing Data, I guess, it was called. It was very important for me in recognizing the importance of defaulting to basically dot plots as the default view for almost any data set. One of the parts of JMP that I'm responsible for is called Graph Builder, which is a general-purpose tool to make a graph from data. And it has to pick some default view, and it normally uses a dot plot unless you've got
00:06:12
Speaker
millions of points. In that case, it might summarize it with a box plot or something. So it definitely informs the defaults and it also informs the priorities. We don't even have 3D cones. Maybe if we had as many developers as Microsoft does, we might add it to the bottom of the list. But given that we have limited resources, we don't even get to the bottom of the list. It definitely plays into defaults and priorities.
00:06:37
Speaker
And how are you thinking about your user base as opposed to, say, Microsoft, right? So Microsoft's user base is basically everybody, whereas a user of JMP or a user of SAS is a different type of user, you know, is programming in a statistical language. How do you sort of segment the audience in that different way? Well, they're not all programmers for sure. A lot of our users are just using the menus and doing analyses or
00:07:05
Speaker
you know, just creating graphs, or usually both. But they're a little more technical, which, in a way, is a bit of a crutch for us, because we don't have to make things as generic, the language and things like that. Sometimes we can use technical language, which we wouldn't be able to do if we were doing it for, you know, everybody in the world. So that's the main difference: they're just a little more scientific. But there are plenty of things that are universal, you know, just
00:07:35
Speaker
UI design and basically trying to meet expectations is the important thing for us. Some things we do might be a little tricky, but once you learn the JMP way, we
00:07:50
Speaker
want them to always live up to that way. Right.

User Priorities and Software Versatility

00:07:54
Speaker
And do you find that there are certain things that your users prioritize over other things? I mean, are they prioritizing speed versus elegance of the final product? Are there things that you find that they really prioritize? It's pretty broad. I mean, different users have different priorities. Speed is definitely a priority for us. And being able to handle
00:08:18
Speaker
an unlimited amount of data. Well, we're limited by the amount of memory in a machine, which is not much of a limit these days. Some users want to have the most advanced stat techniques, you know, gradient boosting and things like that, which we have. Some of them want a very specialized part of the application, like control charts or survival analysis or things like that. They're a little more specialized. Right. But it's generally, you know,
00:08:47
Speaker
because of that diversity, a lot of companies just like the fact that JMP has a breadth of all those things, but still with one sort of core focus. Sure.

Remaking Graphs: Approach and Ethics

00:08:59
Speaker
One of the other things that you do, or at least that I see publicly, aside from work on JMP, are the makeovers you do. I'll post a number of different ones that we've been sort of talking about offline onto the show notes page. I think the one that I found interesting for a couple of different reasons is the Wall Street Journal measles heat map that you remade in June.
00:09:25
Speaker
Just because I think that's a really interesting example of an alternative way to show a fairly standard time series graph. Can you sort of walk through for us what your thinking is when you are remaking a graph? What are your goals? What are you looking for when you are going to remake something? What goes through your mind as you say, this is a graph I want to remake, and I want to write a blog post about it and share it with the world?
00:09:55
Speaker
Okay, yeah, that was a lot of fun. That was one that several folks wrote about and did a little bit of criticism and commentary on. So that one should be widely known. It was a heat map of measles rates in the US over time, with every state and every year getting a colored cell. Well, just to step back a little bit on the general question of why, there are several reasons. One is just
00:10:23
Speaker
to educate myself, practice the craft, basically, trying to see if I can create what they created or I can do something a little different or a little better, explore, and share that with other people if it does end up being something that might be better or might be more suitable if you had a different message with the same data, for instance. And I also want to basically sort of QA the graphic to make sure it came out right, which sometimes I've done
00:10:49
Speaker
you know, tried to redo things and ended up finding an error in the original. So that's gratifying. But the Wall Street Journal heat map was interesting because the data was different than I thought it would be when I got the data from their source, because they had a lot of missing values in it, which they didn't focus too much on in the original. I mean, they indicated when the whole year was missing, but not when half the year was missing. And that sort of, you know,
00:11:18
Speaker
brought up an interesting question of what you do when half the year is missing. Do you assume it was zero for that half of the year, or assume it was the same as the half of the year you have? So there are all different ways you might want to show that or indicate it, or throw out the whole year, or whatever. So I think that's one of the valuable things about trying to do the makeovers: you realize it's a little more complicated than it looks. I mean, the data is a little more complicated. It's not as clean as it looks or as you would expect.
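The three options mentioned here (treat the missing half as zero, assume it matched the observed half, or throw out the year) can be compared side by side. The following is only a minimal sketch: the function name `annual_total`, the strategy labels, and the monthly numbers are all hypothetical, and nothing here reflects how the Wall Street Journal or Xan Gregg actually processed the measles data.

```python
from typing import Optional, Sequence


def annual_total(monthly: Sequence[Optional[float]], strategy: str) -> Optional[float]:
    """Combine monthly values (None marks a missing month) into one annual figure.

    strategy is one of:
      "zero_fill" - count missing months as zero
      "scale_up"  - assume missing months matched the average observed month
      "drop"      - return None unless every month was observed
    """
    observed = [v for v in monthly if v is not None]
    if not observed:
        return None  # nothing was reported for this year
    if strategy == "zero_fill":
        return sum(observed)
    if strategy == "scale_up":
        return sum(observed) * len(monthly) / len(observed)
    if strategy == "drop":
        return sum(observed) if len(observed) == len(monthly) else None
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical year: six months reported, six months missing.
half_year = [10.0, 12.0, 8.0, 9.0, 11.0, 10.0] + [None] * 6

for name in ("zero_fill", "scale_up", "drop"):
    print(name, annual_total(half_year, name))
```

With the same half-reported year, the three strategies give three different answers, which is the point being made: the choice is a modeling decision, and a chart can hide it unless partially missing years are marked.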
00:11:46
Speaker
And you might discover why they did certain things. When I first saw that heat map, actually, I thought, wow, this is great. You know, the country had measles and then it didn't have measles. So it works in that sense. But then I sort of
00:12:03
Speaker
didn't like the color scheme a little bit, because it goes from blue to green to red, or maybe yellow to red. And it wasn't quite perceptually aligned. And I thought, I'll try something that's a little more of a sequential color scheme. And then you realize, well, they did it that way because it's skewed data: there are very high rates at the beginning of the century, and then low rates after that. And if you just did a plain sequential scheme, you wouldn't be able to see the variation in both.
00:12:33
Speaker
So I can understand a little bit why they did the multiple colors. Maybe not. I've never understood why they mixed in green in there. I can imagine just going, you know, light blue to dark blue and light red to dark red or something. But those are the kind of things that come up as you try these makeovers.
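The problem with a plain sequential scheme on skewed data can be seen with a little arithmetic. This is a rough sketch under stated assumptions: the rates below are made-up values spanning three orders of magnitude, the helper names are hypothetical, and a log scale is just one common workaround, not what either the Wall Street Journal or the makeover used.

```python
import math


def linear_position(value: float, vmin: float, vmax: float) -> float:
    """Fraction of the way along a plain sequential color ramp (0 = lightest, 1 = darkest)."""
    return (value - vmin) / (vmax - vmin)


def log_position(value: float, vmin: float, vmax: float) -> float:
    """Same idea on a log scale; requires strictly positive values,
    so true zeros would need an offset or a separate 'no cases' color."""
    return (math.log10(value) - math.log10(vmin)) / (math.log10(vmax) - math.log10(vmin))


# Hypothetical rates: a few very high early values and many low later ones.
rates = [1.0, 10.0, 100.0, 1000.0]

for r in rates:
    lin = linear_position(r, 1.0, 1000.0)
    log = log_position(r, 1.0, 1000.0)
    print(f"rate={r:7.1f}  linear={lin:.3f}  log={log:.3f}")
```

On the linear ramp, every rate from 1 to 100 lands in the lightest tenth of the scale, so the low-rate years all look the same. The log ramp spreads them evenly, at the cost of making the color steps harder to read as raw amounts.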
00:12:48
Speaker
There's a lot of these, well, not a lot, but there are a few of these projects out there with makeovers. I think right now the most popular one is probably the Makeover Monday project from a couple of folks at Tableau. There's also my HelpMeViz project. There's Cross Validated. And there are other sorts of things out there, people doing makeovers. And a couple of years ago, Fernanda Viégas and Martin Wattenberg wrote a post about the responsibility of the person doing the makeover.
00:13:15
Speaker
So when you are doing a makeover or you're seeing other people do makeovers, how do you feel about the person doing the makeover? What are the responsibilities, or even the ethics, of the person doing the makeover with respect to the person who created the original? I mean, with a lot of the things that we are making over, we're not exactly asking the original creator, hey, do you mind if I take a look at this? Do you mind if I edit it or do a makeover?
00:13:40
Speaker
We're just sort of taking it and saying, I think this could be improved. So do you think there are challenges there? Do you think there are ethical issues there? I mean, how do you sort of see this thing about making graphs over without having any communication with the original creator? Yeah, I haven't thought about it from that angle. When I do a makeover, I make sure to add attribution to the original. And usually when I show a picture of what I'm making over, I try to make it sort of a thumbnail picture, just so
00:14:10
Speaker
I don't want to make it look like I'm trying to get attention by posting their graph, essentially. So that's about as far as I go in that angle. I have contacted a few folks when I was doing makeovers, but it wasn't for that reason. It was usually because I wanted to figure out why they did things in a certain way or get their original data.
00:14:34
Speaker
But I haven't actually had much success with that, so I haven't been doing it lately.
00:14:50
Speaker
You don't see a lot of critics, I don't think, sort of taking a passage out of a book and saying, I didn't like this passage this way, and so I would rewrite it this way. But in the field of data viz, we tend to do that. We say, I don't like it this way. I'm going to try to remake it this other way. And in some senses, I don't know if it's better or worse. In some ways, we're trying to put our own spin on it or take a different view, or at least, I guess, put ourselves on the line a little bit, right? Because we're saying, this is what I would try to do.
00:15:20
Speaker
And I think it's better, but also maybe not. Yeah, that's right. It's not necessarily better, but you're just partly just showing some other ideas or other angles. But I think it's, I mean, I feel like you're sort of like a level above normal criticism because you're actually not just saying, you know, these colors are bad, these labels are bad or whatever, you're actually trying to redo it. So I think in a way that's
00:15:47
Speaker
a bit higher, sort of more ethical, than a regular criticism. But I can see your point about how it doesn't quite map well to prose or something like that.
00:16:00
Speaker
I think you have this thing where, you know, you remake a visualization and maybe the goal of the original author was to communicate to some particular audience, and I as the person doing the makeover say, oh, well, I would have done it this way, but the audience that I have in mind is a slightly different audience, and therefore in some cases it maybe doesn't make sense to do it my way, because I'm thinking about a different audience. So it's just, I think, an interesting
00:16:26
Speaker
responsibility for those of us who are making things over: to think about who the creator is and what constraints or goals they have that we may not be aware of, or that we may not share, either obviously or sort of implicitly. Yeah, I think the audience is a big factor, and the Wall Street Journal heat map is a good example, because that one, you know,
00:16:51
Speaker
the original viz is probably great for its audience, but if you're trying to do something more analytical, then you might want to either have your colors aligned more perceptually or maybe even use a
00:17:03
Speaker
normal time series line graph or something like that. Right. Right. I mean, you also make the good point that you can sometimes discover issues or even mistakes.

Story of a Washington Post Graphic Error

00:17:14
Speaker
So you mentioned how you deal with the missing data issue in the heat map one. You also did sort of a tile grid map thing. I think it was in the summertime, one that Chris Ingraham from the Post had done on mosquitoes. And if I recall correctly, you had found an error in the Washington, DC data. Is that right?
00:17:33
Speaker
Yeah, everything after Washington, DC was off by one, all the states mislabeled. I think he had two data sets he was merging, and one had DC and the other didn't, and it ended up getting messed up. Right. So that was, yeah, I mean, when I first looked at that one, it was actually because, you know, this doesn't look right. Why is, you know, Tennessee so much different from Arkansas or
00:17:58
Speaker
Why is Alabama so much different from Georgia? And it turns out it's because Alabama was right, but Georgia was shifted. It was another state. I mentioned that to him and he corrected it right away.
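The kind of off-by-one error described here is easy to reproduce when two lists are paired by position rather than by name. The sketch below is an illustration under assumptions: the state subset and the values are invented, and it is only a guess at the general mechanism, not a reconstruction of the actual Washington Post data pipeline.

```python
# One source lists DC among the states; the other does not (hypothetical values).
names_with_dc = ["Alabama", "Delaware", "District of Columbia", "Florida", "Georgia"]
values_without_dc = {"Alabama": 5.0, "Delaware": 1.0, "Florida": 3.0, "Georgia": 4.0}

# Risky: pair by position. zip() silently stops at the shorter list,
# so everything after DC is shifted by one and the last state is dropped.
positional = dict(zip(names_with_dc, values_without_dc.values()))

# Safer: join on the state name and leave the unmatched entry explicitly empty.
keyed = {name: values_without_dc.get(name) for name in names_with_dc}

for name in names_with_dc:
    print(f"{name:22s} positional={positional.get(name)}  keyed={keyed[name]}")
```

In the positional version, Alabama and Delaware are still correct because they sort before DC, while Florida picks up Georgia's value and Georgia disappears. That matches the symptom described above: states early in the alphabet look right and later ones look oddly different from their neighbors.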
00:18:10
Speaker
This is on his Washington Post Wonkblog. Right. And I think part of that has to do with how you as the person doing the makeover approach it, right? If you approach it as, you know, pointing fingers, with that sort of frowny-face, critical eye, as opposed to coming into it with the perspective of, I just want to try it in this tool and I want to explore these data. I think that's where you get sort of a positive response from the original creator, versus maybe the negative response you would get from someone else.

Annoying Practices in Data Visualization

00:18:41
Speaker
So I want to talk about, or ask you about, whether you have any data viz
00:18:49
Speaker
practices or chart types that you're sort of a real absolutist about. So we've got the pie chart debate, obviously. We have Stephen Few on one side saying no round things ever, and then we've got, I don't know, maybe David McCandless on the other side, using all different shapes and whatever. So are there things that sort of drive you up the wall, where you're like, no, no, no, you should never ever do X?
00:19:14
Speaker
I wouldn't say on an absolute level, but there are definitely a lot of things like that that'll make me cringe, and I'll avoid doing them personally. I used to be a little more absolute about things, but I've come to realize that sometimes a low-precision graph is actually not that bad, because your data is low precision or the message is not that critical. It's just a vaguely informative graph. Anyway, the data is, I mean,
00:19:42
Speaker
And like I mentioned with that Cleveland research about the ranking of the attributes, I mean, even though position was best at 3% error, color intensity or whatever the last one was had only like 15% error. So it's not like they're that much different, like one is 50% error or something. For instance, I used to really cringe when I would see pie charts in a financial report or something like that.
00:20:13
Speaker
Money's important, I would think. They should be using something more accurate. But then I realized people aren't making decisions based on that pie chart. That's just sort of a little, by the way, here's the vague idea of how our assets are divided or something like that. But if you really care, you're going to need to dive in more than that anyway. And the same goes for the likes of rainbow color scales or
00:20:36
Speaker
just round things in general, dual y-axes, or things like that. Right.

Interactive and Animated Graphics

00:20:43
Speaker
Do you have any love or hate feelings or sentiments about interactivity or animation? I mean, that's a big thing. Everybody wants to be able to make, you know, the big interactive thing that they see on the New York Times or the Washington Post website. Do you have folks that you try to talk back from the edge when they want to make big interactives or big animated things? Like, well, really you just need a bar chart. You know, let's make the bar chart.
00:21:06
Speaker
Yeah, I usually will prefer panels of bar charts or a small multiples kind of thing. I've seen these animations recently. You've seen some about the global temperature over time. They'll have each year sort of overlaid on top of the previous one in an animation. And I always want to stop it at the last frame so I can see all of them.
00:21:32
Speaker
They don't really support that, at least in an animated GIF. I think the animation... We have animated plots and we sort of got that when a lot of people did, after seeing Hans Rosling do it. And that's actually where I think it works best, is when it's narrated. When you've got someone guiding you along like he did, saying,
00:21:58
Speaker
you know, here's the beginning here. We've got these countries here and these other countries over there, and now watch them merge together as I go forward, kind of thing. So when you've got that, the animation really supports the message very well. But otherwise, I'm just not.
00:22:16
Speaker
I'd usually say, just show me the whole thing. Yeah. I think you hit on something there, because what is so important about the Rosling piece is his narration of the graphs. But I think for most people, we're creating graphs that are sort of like the one chart. Like, here's figure one. And so we don't have the ability to do that narration without maybe having the small multiples or having the panels.
00:22:45
Speaker
So I think it's a little tricky as it were to sort of do the narration when you're in a static world where you don't have someone who's sort of leading you. You don't see what the end point is because they're going to walk you through it to that end point as opposed to here's a page of the graphs and you can sort of see them all at one shot. Right. I mean, he's got the excitement in his voice and all that drama. And if you take that away and just look at the bubbles moving, you're not going to
00:23:12
Speaker
get his message. Yeah. Yeah, I think that's right. Which I think leads into a whole other discussion about stories and data and narration that is a topic close to my heart these days.

Conclusion and Farewell

00:23:24
Speaker
But we'll pause on that and pick that up at another time. Okay. Xan, I want to thank you for coming on the show, also because now, at least, I know how to pronounce your first name. So hopefully others are, you know,
00:23:37
Speaker
That's one of the great things about your podcast: I get to find out how to pronounce everyone's name. I never would have guessed Elaine Harrison, for instance. Yeah, that's all I'm trying to do, just bring people together so they understand how to pronounce each other's names. So that's great. Xan, thanks so much for coming on the show. I appreciate you taking the time. Thanks for having me. Pleasure.
00:24:01
Speaker
And thanks to everyone for tuning into this week's episode. I will, of course, post all of Xan's stuff, his makeovers and his videos, to the show notes. And so if you have questions or comments or suggestions about this episode or any other episode of the show, please do get in touch on the website or on Twitter. So until next time, this has been the PolicyViz Podcast. Thanks so much for listening.
00:24:33
Speaker
This episode of the PolicyViz podcast is brought to you by JMP, Statistical Discovery Software from SAS. JMP, spelled J-M-P, is an easy to use tool that connects powerful analytics with interactive graphics. The drag and drop interface of JMP enables quick exploration of data to identify patterns, interactions, and outliers.
00:24:52
Speaker
JMP has a scripting language for reproducibility and interfacing with R. Click on this episode's sponsored link to receive a free info kit that includes an interview with data viz experts Kaiser Fung and Alberto Cairo. In the interview, they discuss information gathering, analysis, and communicating results.