Episode #201: Leland Wilkinson

S8 E201 · The PolicyViz Podcast

Leland Wilkinson, Chief Scientist at H2O and Adjunct Professor of Computer Science at the University of Illinois Chicago, visits The PolicyViz Podcast to discuss The Grammar of Graphics, coding, and much, much more.

The post Episode #201: Leland Wilkinson appeared first on PolicyViz.

Transcript

Introduction to Season 8

00:00:12
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. Welcome to season eight and episode 201 of the podcast. That's right. 201 episodes of this podcast. Thanks so much for tuning in. I hope you're learning a lot about how to communicate your data and how to visualize your data and all the things that are required to be an effective data communicator.

New Video Content and Patreon Tiers

00:00:36
Speaker
So on this
00:00:38
Speaker
new season of the show. I'm really excited to bring you some great, fantastic guests. I have a whole lineup set up going through the fall of 2021 and into 2022. But before we get into this week's episode, just a few updates about the show. I'm still going to bring you great sound quality,
00:00:54
Speaker
great transcription of the show. I've also started to add a little bit of video content, so for at least the first few episodes of the season you'll be able to go over to YouTube if you want and watch the actual video recording of my interviews with the guests. So if you're interested in watching the interview in addition to just listening to it, head over to the YouTube channel. It'll be a little bit different because of some of the audio editing versus the video editing,
00:01:19
Speaker
But I think you will enjoy being able to see some of the faces of the folks that I talk with on the podcast. I've also set up some new tiers on the Patreon page. If you're interested in supporting the show, you can head over there. And if you do become a patron, you will have the opportunity to ask questions to my guests. So every month I'm going to not just give you a sneak peek into who's going to be on the show, but I'll give you the opportunity to send me questions that I will ask to those guests.
00:01:49
Speaker
If you're interested in providing a one-time payment to help support the show, I have a PayPal link set up on the show. I would love your support to help bring you the show every other week with great guests in the world of data, data visualization, presentation skills, and more.

Featuring Lee Wilkinson on Data Visualization

00:02:07
Speaker
Now, to kick off season eight, I can't think of anyone better to have on the show than Leland Wilkinson. If you are in the field of data visualization, you know that name:
00:02:17
Speaker
father of the grammar of graphics. If you're an R user, you of course know what the grammar of graphics is. It underlies the entire ggplot2 system. I sit down with Leland and we talk about the history of ggplot, his work in the field, and his perspective on data visualization tools. Of course, being the author of one of the early tools in the field, he has a lot to say and a lot of thoughts about that. I met Leland early on in 2021
00:02:44
Speaker
as part of a panel for a federal agency that was seeking to improve the way that they were visualizing their data. And so Leland and I got to know each other during that experience. And so I was so grateful that he would take time out of his schedule to come and chat with me about his work. So I'm gonna turn it over to that discussion. Thanks so much again for tuning back in to the PolicyViz Podcast. And here is my discussion with Lee Wilkinson.
00:03:11
Speaker
Hi, Lee. Good to see you. How are things? Hi. Good, John. Things are going well. Great. Great to see you again. We talked at length, I think, earlier in the year on this panel that we were on. So it's good to see you again. So I'm excited to have you on the show. I thought we would start with what I think is kind of the obvious question, which is on the grammar of graphics.
00:03:35
Speaker
Can you talk about the origin of it? How you developed it? Are people missing something about it? How does it work across these multiple fields? It's applied in statistics, in computer science, in mathematics. How do you think about all this and its evolution over the last decades?

Wilkinson's Journey with Grammar of Graphics

00:03:53
Speaker
Well, yeah, there are a lot of interesting questions associated with it. And by the way, I can't resist now seeing in the bookshelf up there your book, giving it a plug. I've been plugging it to friends because I do think it is the best book on use of visualization for business applications. It's chock full of
00:04:17
Speaker
research and examples and so on. So it's just a beautiful piece. And the publisher did a great job. Columbia University Press, I think. Yeah. Great reproduction job. So it's nice to see. And it is one of the things I'm proudest about for The Grammar of Graphics, that Springer handled the reproduction. It was the first four-color book published, period, using
00:04:45
Speaker
not period, but published using four-color PDFs, with software that would take PDFs and generate the actual print plates. Right. So I never let the printer or the editors touch the manuscript, the plates, and that I think accounted for why it's got so few errors. It's got two bugs that I know of, but that's about it.
00:05:12
Speaker
So yeah, it was funny. When I went to SPSS, after selling them SYSTAT, which was the stats software company that I created, I had already written a graphics package that was pretty powerful. It's not widely known now, but it was called SYGRAPH and it was part of SYSTAT.
00:05:37
Speaker
And my goal there was to create every possible scientific, at least statistical, graphic I had ever seen. Because I was teaching a course, a graduate course in visualization, statistical graphics.
00:05:55
Speaker
And I thought, well, gee, none of the software actually can do that. Right. Yeah, they do a great job of pie charts, bar charts, whatever, but they don't have things like, you know, biplots or, you know, even parallel coordinates were very rare back in the 1980s, which is when that came out.
00:06:17
Speaker
So I sold it to SPSS. I joined them. I met a lot of wonderful people, but basically, without going into details, certain managers assiduously opposed all my efforts. I reported to the president and nobody reported to me.
00:06:35
Speaker
And he was greatly encouraging. He was a huge fan of the SYSTAT graphics. We toured Europe, showing them off in SPSS offices and so on. But you know how it happens in corporations, even relatively smaller ones like SPSS, that when you get deeper than the C-suite, you suddenly find these entrenched bureaucrats. Yeah.
00:06:59
Speaker
who you got to deal with. Well, so anyway, I think I'm going a little bit too much into dirty laundry. I will simply say that I finally had sort of a blow-up in a meeting where, after I had again been explaining how I architected the graphics in SYSTAT and got more opposition, I said, very well,
00:07:23
Speaker
I'm going to write a book so the world can see what I'm trying to tell you here. And you don't have to use it. I don't care. But I'm just going to do it. And the president supported me through the whole thing. And it really became a wonderful subversive subgroup inside of SPSS. And as far as I was concerned, the right people supported me. And I don't want to name them, but there were plenty of right people, especially the president.
00:07:53
Speaker
So I sat down and I was authorized to build a team. I built a team of seven people who, you know, not surprisingly were considered privileged characters inside the company because we didn't have to follow all the regulations about, you know, pair programming and, you know, what language to use. We used Java because it was hot at that time. And off we went. So
00:08:20
Speaker
I then started to look at the design and we sort of tabulated all these possible charts and I suddenly realized, my god, there's an algebra under these charts. If you want to create a chart,
00:08:38
Speaker
Pay attention to the algebra. Don't pay attention to the chart type because that doesn't tell you anything about how to hook up an element, so to speak, or what Hadley Wickham calls a geom and other people. How do you hook that up to a column of the data or a subset of rows?
00:09:00
Speaker
We began to code this in Java and achieved some real success along these lines. At which point, I contacted Springer. I had a favorite editor, John Kimmel, who just retired. He was an editor for all the great Bell Labs people and later AT&T Labs, an extremely supportive gentleman, and he gave me free rein,
00:09:28
Speaker
even though ultimately he lined up reviewers for the manuscript. Now, the thing is, I didn't set out to write a computer program, even though that's exactly what we did in Java. Because as you know, a computer program that implements an algorithm is in effect formalizing an argument you would otherwise make in ordinary scientific discourse.
00:09:57
Speaker
And so I regarded the computer program as a way of checking my ideas. And as we proceeded, it began to occur to me that my motivation in doing all of this, despite the fact that it would be very, very nice in the end to have something to give SPSS that actually worked,
00:10:20
Speaker
My motivation was to go down as deep as possible to understand the meaning of graphics. And now we get into a slippery term because the meaning of visualization is a favorite term,
00:10:37
Speaker
especially among certain people who regard themselves as authorities on visualization without going into any quantitative or other aspects of the thing. And I believe, and I won't mention his name, but maybe the most
00:10:54
Speaker
famous person who's an example of that is someone both you and I actually believe has contributed virtually nothing to the understanding of graphics but by contrast has made magnificent

Philosophy and Technical Breakthroughs in Graphics

00:11:10
Speaker
contributions to the visual design of graphics and deserves credit for that. But a lot of the book
00:11:20
Speaker
over the years has been to try to argue against those, particularly, you know, in the extreme postmodernist ideas of the meaning of visualization, but other varieties like, oh, here's a taxonomy of charts. And that'll tell you something about the meaning of visualization. And my reply, and I made it pretty clear in the book, is that that is absolutely
00:11:48
Speaker
wrong. In other words, several people have done taxonomies of charts. I have pointed out examples where, in fact, they're not only incorrect, they're dangerous. Along the lines of that wonderful paper you may remember, was it Dijkstra, on how "go to" is harmful in computer languages.
00:12:13
Speaker
Well, from my point of view, chart taxonomies are truly harmful to the understanding of the meaning of graphics. So now let's take this idea that I evolved over writing that book
00:12:29
Speaker
to its extreme, because I believe there's a lot of work to be done today in exploiting the meaning of graphics from the point of view of a particular system to create software that does remarkable things. And I'll only point out now, and we can talk more about it later, but
00:12:50
Speaker
I've talked with, for example, Jeff Heer, one of my heroes, I'll say, in this field, because he's very adept at formalisms and languages and so on, and I've said,
00:13:02
Speaker
Suppose, as I outlined in the book, you want to develop a program that reads the newspaper, finds graphs, and then translates the graphs into a spreadsheet. And then does a statistical analysis, an alternative analysis. And yeah, that sounds like cheap thrills, right? A cheesy little program. Well, imagine doing that for the entire corpus of the New York Times going back to the early 19th century.
00:13:32
Speaker
There are many, many thousands of graphics that can be analyzed. And of course, that doesn't begin to cover scientific graphs from journals and so on. So at this point, we developed the algebra. And it turns out I got the ideas for the algebra or the structure.
00:13:53
Speaker
from work that was done mainly in statistics by people like John Nelder in England, people like John Chambers at Bell Labs, and also a project at the Bureau of Labor Statistics called TPL, Table Producing Language, and
00:14:16
Speaker
I was able to go deeper into that because at SPSS, I was able to hire for our little team, Dan Rope, who came straight from the Bureau of Labor Statistics and who had already written Java software to generate visualizations. It was a first effort
00:14:37
Speaker
but Dan was extremely creative, smart, and so on. And what happened was as the algebra evolved, it turned out I only needed three operators to do what I regard as the entire corpus of graphics. And by the way, after the book came out, I anticipated somebody somewhere adding two, three, four, five more operators to this language, and you don't need to.
00:15:07
Speaker
I had other ones myself, but after months of work, almost actually more than a year, I would throw out certain operators because I realized they were redundant, just didn't need them. I remember certain highlights that we just were jumping up and down when this happened, some of them.
00:15:32
Speaker
Namely, we developed all sorts of scatterplots, even pie charts, bar charts, et cetera, using the algebra and a renderer which incorporated a number of graphical elements and features, topological types of features. And then we started to get stuck with certain graphics. And one of them was a scatterplot matrix.
00:16:01
Speaker
Now, if you think about a scatterplot matrix, you could write one pretty trivially by making two iterators, embed one iterator inside another and just iterate all the way through all the possible subplots. Right.
00:16:18
Speaker
And then all you do is position each of the subplots in the right place. That's trivial. And you've got a scatter plot matrix. But that doesn't fit the algebra that I designed. It's not incompatible, but it's just irrelevant to that algebra. And I was struggling to find out a way. And Dan Rope and I sat there.
00:16:41
Speaker
I'd say we worked for maybe two months on that problem. Wow. And all of a sudden, I looked at a symmetric scatter plot matrix and I said, oh my God, this thing is a classic quadratic form.
00:16:57
Speaker
If you think about it in matrix terms, it's x transpose x. And therefore, it's a product term. And I bet if Dan codes the product properly,
00:17:13
Speaker
And I'll mention basically how we did that. But if he codes it properly, out should pop a scatter plot matrix. And by God, it did. I mean, we just were blown away without doing anything about positioning the plots or specifying how big or small they were.
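To make the contrast concrete, here is a minimal Python sketch of the two approaches described: the hand-rolled pair of nested iterators, and a product of two variable lists from which the panel layout falls out on its own. The `VarList` class and the tuple layout `(row, col, x, y)` are hypothetical stand-ins invented for illustration; they are not the notation of the book or the API of the SPSS software.

```python
from itertools import product


def nested_splom(variables):
    """Hand-rolled approach: two nested iterators enumerate every subplot."""
    panels = []
    for row, y in enumerate(variables):
        for col, x in enumerate(variables):
            panels.append((row, col, x, y))
    return panels


class VarList:
    """Hypothetical stand-in for a list of variables; not the book's notation."""

    def __init__(self, names):
        self.names = list(names)

    def __mul__(self, other):
        # Product of two variable lists: one frame per (x, y) pair.
        # Each panel's grid position falls out of the pair's indices,
        # so nothing is positioned or sized by hand.
        return [(row, col, x, y)
                for (row, y), (col, x)
                in product(enumerate(other.names), enumerate(self.names))]


X = VarList(["a", "b", "c"])
Y = VarList(["d", "e"])
symmetric = X * X      # 3 x 3 panels: a symmetric scatterplot matrix
rectangular = X * Y    # 3 columns by 2 rows: a rectangular one
```

Both routes list the same panels. The difference is that the product is an ordinary expression, so the same operator also covers the rectangular case without a second special-purpose loop.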
00:17:35
Speaker
We made a scatterplot matrix, and the example is in the book, which basically says: the list of variables is X, and you write X asterisk X, where the asterisk is the product term. And you type that into the program and out pops a scatterplot matrix. Now, the other benefit of that is what you then embed in that expression, because the algebra is only one seventh of the grammar of graphics.
00:18:05
Speaker
You embed in that expression what I called elements, but I kind of like geoms even better as a descriptive term, so that then you can say, good, I've got a product of frames. In this case, it's a symmetric product, although it's trivial to make it X asterisk Y and do a rectangular SPLOM. Now just go ahead and grab anything you want, line,
00:18:35
Speaker
area, interval, point, and you just put it in there and it plots in the proper place in the scatterplot matrix. So now you have a system where, if you go to the fairly state-of-the-art systems, at least at the time, things like SigmaPlot or SPSS graphics or SAS/GRAPH,
00:18:59
Speaker
you had a limited number of things you could put in the scatter plot matrix, in some cases mainly just a set of points. But I was able to show numerous examples of a splom with points and then
00:19:16
Speaker
joint confidence intervals on those points, either using ellipses from a normal assumption or using kernels from a non-parametric approach. So there were other breakthroughs like that where I thought, don't
00:19:35
Speaker
put, as some people have written, a bag on the side. There was a wonderful book on the development of the Wang computer system back in the days where it was the world's first word processor. And they used to joke, the people on that team at Wang, joke about how the IBM
00:19:56
Speaker
programmers, to do the same word processing type of function, would hang a bag of shit on the side of the program to make an exception. And I refused to make any special cases in the book. Yeah. So then that took care of the algebra. And of course, then there are six other major objects. Now, here was the trick,
00:20:22
Speaker
which is vastly unappreciated by readers of the book, including the people who actually read it. Boy, I tell you, just as an aside, I had people run up to me at VisWeek. When I first went there in 1999,
00:20:42
Speaker
they didn't know me, but they said, oh, my God, you wrote The Grammar of Graphics. I just loved it. It's so fantastic. And it's this and that. And then they said something which revealed to me that they hadn't read the book at all. They opened it up and looked at all the pretty pictures. Right. And one of the impacts of the book itself is quite conspicuous, namely:
00:21:06
Speaker
Because of the seminar I taught a few years earlier, I included examples of graphs nobody had ever heard of, at least in the viz community, like phase plane graphs from physics or, you know, biplots from statistics and so on. And lo and behold, I've noticed with some amusement that over the subsequent years, people would do papers on phase plane plots. Right. Yeah.
00:21:32
Speaker
Or when I did Scagnostics, which I can talk about later, but it was a particular type of structure to impose on a scatterplot to analyze shape. And then I started to see papers coming out, and by the way,
00:21:51
Speaker
I introduced that in The Grammar of Graphics, but only in a single page or two. There were people who started to do things like, oh, pixnostics, parnostics. In other words, they grabbed any kind of element they could that is used in visualization, and then applied the same sort of characteristics. But that sort of misses the point that Scagnostics, Tukey's idea,
00:22:21
Speaker
was about point sets in high-dimensional spaces. That is, actually, 2D subspaces of high-dimensional spaces. And you couldn't just apply the same ideas willy-nilly to anything. Well, all right. Here's what I'm leading to.

Structuring Graphics: Seven Fundamental Steps

00:22:41
Speaker
It was an equal amount of intense work
00:22:45
Speaker
to develop these seven steps that underlie every graphic. And when I say every graphic, I mean every graphic. You cannot draw what I call a well-formed
00:23:02
Speaker
statistical graphic without implementing code to do every one of those seven steps. And they're fairly self-evident. Many of them, like data source, which is an object that needs to input data that has an arbitrary organization, and we don't care what that organization is. But the output is what we now call a frame. But at that time I called it a table.
00:23:32
Speaker
And you have to have a table to do this class of graphics. Now, in the grammar of graphics, that immediately tells you there are some graphs that are not suited to the grammar of graphics. And those would be certain kinds that cannot be organized in a table.
00:23:53
Speaker
But that doesn't, by the way, rule out node edge graphs, for example, which is a huge class, because edge lists are easily organized as a table. So now we get through data, and we then go through the other things, which involve things like geometry, which is all those different forms or geoms.
00:24:22
Speaker
that are needed to draw these graphs. Now, there was another thing. Boy, I hope I'm not going into too much detail here. That's fine. But it will give the interested reader an understanding of how the thought process developed, because with things like geoms, you'd immediately think, oh, OK, that's easy. We're going to go draw
00:24:50
Speaker
bars, lines, pie slices, et cetera. Oh, wait a minute. Pie slices are nothing but rectangular elements that have been put through a polar transformation. Nobody ever said that for some incredible reason that I've missed. But so in fact, I made this big collection of geoms and then realized I could get rid of most of them.
00:25:21
Speaker
And all you need is about, you know, whatever, 10 of them to draw anything that appears in, you know, journals or newspapers or whatever. Similar things happened in other areas like aesthetics. Now, aesthetics, I drew heavily on Jacques Bertin's work. And by the way,
00:25:46
Speaker
Probably not surprising to you, but almost the only people I found of any intellectual use in this field, this is getting to be really arrogant here, are Jacques Bertin, who profoundly wrestled with these problems of geometric forms.
00:26:08
Speaker
Jock Mackinlay, who is now at Tableau. Jock did an absolutely brilliant dissertation on a program that would draw a graph based on some of Bertin's theory and other ideas. And in the first edition of the book, I didn't recognize or know about Jock's research, and I credited him in the second volume
00:26:35
Speaker
because, you know, the stuff was really good. Well, I won't mention in passing all the others I consider hugely significant. I mentioned them in the book, obviously the entire group at Bell Labs. And I would claim that almost everything I've ever seen at VisWeek
00:26:54
Speaker
involving interactive data analysis can be traced back to the Bell Labs group. There's nothing new under the sun there. Okay, well, let me get back to the point here. When you go through each of those other classes, those seven fundamental classes,
00:27:16
Speaker
A similar simplification takes place. Statistics, of course, was a piece of cake because I'm a statistician. And so I drew heavily on Bill Cleveland's work and John Chambers and others to understand how you create statistical functions that can inject statistical summaries into these graphs.
00:27:44
Speaker
Now, by the way, I'm sorry if I wander here, but I should mention that a major assumption of the book, I'd almost call it a breakthrough, except it's so trivial, is that every graph is a function.
00:28:00
Speaker
Now that's high school algebra, at least if you learned it after the new math and not back in the 1950s. But the fact is, when you understand that every single graph, and I'm talking about every one in the book
00:28:17
Speaker
and every one anybody draws at VisWeek, is actually a function, and so it can be expressed as a function, suddenly again you get this huge simplification, and that's what statistics does:
00:28:32
Speaker
it exploits just a tremendous number of statistical functions that do things like regression lines or kernel smooths and so on, and are able to inject themselves into what I call the frame that contains the actual graph. Now,
00:28:54
Speaker
You'd think we're done there with everything you need now to make the graph short of having a renderer, which is very important. You have to write one of those, but that's not hard in Java or C++ or even Python. There are plenty of renderers you can draw on.
00:29:12
Speaker
But I thought of coordinates. My God, pie charts require coordinates. Those aren't statistical functions. Those are simply, and that's chapter one in the book, you generate a bar. You run it through a polar transformation and now you've got a pie slice.
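A minimal Python sketch of that bar-to-pie step, under stated assumptions: a stacked bar is stored as rectangles `(x0, x1, y0, y1)`, and the polar transformation simply reads x as angle and y as radius. The function names and the tuple layouts are hypothetical, chosen for illustration rather than taken from the book or any library.

```python
import math


def stacked_bar(values):
    """Lay values end to end as rectangles (x0, x1, y0, y1) in a unit-height strip."""
    rects, x = [], 0.0
    for v in values:
        rects.append((x, x + v, 0.0, 1.0))
        x += v
    return rects


def to_polar(rect, total):
    """Reinterpret a rectangle as a wedge: x becomes angle, y becomes radius."""
    x0, x1, y0, y1 = rect
    scale = 2 * math.pi / total  # the full bar length maps to one full turn
    return (x0 * scale, x1 * scale, y0, y1)  # (theta0, theta1, r0, r1)


values = [1, 1, 2]
wedges = [to_polar(r, sum(values)) for r in stacked_bar(values)]
```

No pie-specific geometry is defined anywhere in the sketch. The wedges are the same rectangles viewed in a different coordinate system, which is why a separate "pie slice" element becomes redundant.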
00:29:32
Speaker
All right, so now we're done with seven elements and presumably you've written all seven. And here's what people missed who read the book, who think that it's all about charts.
00:29:44
Speaker
It's a total order. Some reviewers of the book completely ignored this as well. I said it was a total order, and there is no other way to execute those seven objects except in the order I printed in the book. And that's the order we used in the software. I'll give you a quick example. The people at Tableau, for example, but other companies as well,
00:30:13
Speaker
who implemented the grammar of graphics as the basis for their engine, and by and large did a really good job, didn't believe me when I said it's a total order. So when I got to Tableau,
00:30:29
Speaker
I did a document for them evaluating, since I was so-called VP of statistics, I evaluated the accuracy of the statistical routines, etc. One test I did surprised me a lot. I did a scatterplot and I fit a regression line in the scatterplot. Did a beautiful job.
00:30:53
Speaker
Now I went back because Tableau allows you to do this. And in the book, I described how to do this. I decided to log the x axis and log the y axis. Okay. Now, if you do that,
00:31:09
Speaker
the regression line that went through all the points needs to adjust to the transformation. And what happened instead was the points all worked perfectly because they executed those in the right order. But what they had done was they inserted the regression line at the end of everything
00:31:31
Speaker
And now the regression line didn't realize, wait a minute, I'm in log log space. And it flew right off to the top of the chart. And they were quite disturbed by that, justifiably. And I worked with them to help them realign the order things were executed in.
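The ordering problem can be reproduced in a few lines of Python. This is a sketch under assumptions, not a reconstruction of Tableau's engine: the data are a made-up exact power law, `fit_line` is an ordinary least-squares helper written for this example, and "drawing" is reduced to comparing the y-values that would land in the log-log frame.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


xs = [1.0, 10.0, 100.0, 1000.0]
ys = [2.0 * x ** 1.5 for x in xs]  # made-up data: an exact power law

# Correct order: apply the log scales first, then compute the statistic.
log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]
slope, intercept = fit_line(log_xs, log_ys)
fitted_in_order = [slope * lx + intercept for lx in log_xs]

# Wrong order: fit on the raw data, then drop the raw fitted values
# into the log-log frame at the very end without transforming them.
raw_slope, raw_intercept = fit_line(xs, ys)
fitted_out_of_order = [raw_slope * x + raw_intercept for x in xs]
```

In the first case the fitted values sit on the plotted points, because the line was computed in the same space the points are drawn in. In the second, the fitted values are still in raw units, thousands of times larger than anything on the log-scaled axis, which matches the line flying off the top of the chart.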
00:31:52
Speaker
So in wrapping up this architecture question, I have to say that the extreme... Well, let me put it this way. God is in the details. It's a saying that I put in the frontispiece of the book, and it's probably more true of this system than any other. Namely, if it took me 10 years, which it did, with a very talented committee,
00:32:21
Speaker
To generate software that implemented this, and you can see for each graph in the book, as I'm sure you know, there's an actual language there that needs to be executed by the interpreter that's built into the program in the proper order. If you don't get the details right, you're going to draw garbage.
00:32:44
Speaker
I distinguish garbage, which is actually ill-formed specifications. Literally like you wrote a calculator program and typed in two plus two and it came out as five. That's ill-formed and the graph is meaningless.
00:33:02
Speaker
versus weird shit that actually, and this was one of my goals, and I never quite got to it, weird things that came out of the algebra. I'll give you an example. Graham Wills, the extremely talented Brit, who actually, he got his PhD in Ireland, in Dublin, because he grew up there. But anyway,
00:33:30
Speaker
Graham was an expert in time series, and he wrote a beautiful book for Springer called, I think it was called Time Series Visualization. But one day he was playing, because he used the grammar of graphics program at SPSS to generate all the graphs. And one day he mistakenly put a doubly polar transformation into the specification and he got an outlandish pie chart. Yeah.
00:33:59
Speaker
Every slice of the pie was then put through another nonlinear transformation. But the point is, it was not meaningless. It was ugly. It was hard to read. But in fact, it was a faithful representation of the data. So in addition to my goal of making a program that could understand the meaning of graphs, which I still believe,
00:34:26
Speaker
I also was looking forward to an interactive program, which we developed to execute it, that would allow you to type in some arbitrary algebra expression. And you had no idea what was going to come out. And I thought, you know, if you played around with this thing long enough, you were going to invent a ton of new chart types. Sure. Right.
00:34:51
Speaker
So, I don't know if anyone has done that yet. The full language is embedded inside SPSS. And I've almost run out of time here. I'm sorry. No, it's... I'm fascinated by the whole process. And as you're talking, I'm curious...
00:35:11
Speaker
Do you think you would have developed it differently if, say, you didn't go to SPSS, working with this team in a statistical software company, and instead, like, you know, you went to a university? Like, the way you describe the iteration of the theory and then the implementation in Java, I'm just curious, like, how do you think it would have
00:35:39
Speaker
evolved differently if you had not gone to SPSS? Well, that's a really good question. No one's asked me before.
00:35:48
Speaker
It wouldn't have worked. You immediately reminded me of the thought that is probably an interaction with my own personality, which can be very oppositional at times. If you want me to do good work, go get somebody that I'm going to say, oh yeah, conventional wisdom crap, who is in power to stop me.
00:36:10
Speaker
And it goes back to an old theory in psychology, actually achievement motivation. You find high achievers who often devise methods to break all the rules, make an end run around the rules, and then produce something. And that's what happened at SPSS. So I don't want to slander SPSS. It had some tremendously smart people, well motivated, but there was a small group, as I said, that was working every day to shut us down.
00:36:40
Speaker
I had, for example, the head of marketing there, he eventually got fired, who just said, well, Lee, when do you finish? I said, never. He basically tried to rearrange the personnel, change job titles, and so on, and didn't succeed.
00:36:59
Speaker
So, yes, I also taught in a wonderful university environment where there wasn't a lot of conflict. And so, yeah, I did learn a lot of stuff there as I was teaching, but I don't know about you, but, you know, when it gets too comfortable, you stop having ideas that contradict conventional wisdom. Right, right.
00:37:24
Speaker
So yeah, that's in these different ways. Yeah. Yes, and actually Jack Noonan, the president of SPSS, shortly before the company got sold to IBM, did something brilliant. I went to him and I said, you know, they're just not going to implement our software. We're done. It's all working and everything. They won't do a single menu for it. They've refused. And he said, Lee,
00:37:52
Speaker
I got an idea. You embed the interpreter inside SPSS and you simply do a block in the code that says begin grammar of graphics, then the language, and end grammar of graphics. Right. And I said, oh my God. And we did that and they couldn't stop us. Oh.
00:38:11
Speaker
And it turns out there are a few select users inside the SPSS community out there internationally who are actually coding grammar of graphics despite the wishes of all the UI people and whatever. It's like an Easter egg. It's like a grammar of graphics Easter egg in SPSS. It was. That is amazing.
00:38:34
Speaker
Yeah, and I have to say, without going into names for obvious reasons, that when I left SPSS, there were similar reactions elsewhere, where people simply didn't want to take the time to understand
00:38:52
Speaker
the book. But fortunately I had people like Hadley Wickham come along. And Hadley was a grad student of Di Cook, again, one of my heroes, who was behind GGobi, XGobi, and some of those wonderful programs at Bellcore.
00:39:10
Speaker
Hadley read the damn book, you know, sat down and coded it properly and put it into R and it just took off. And I told him, you know, Hadley, thank you so much, because without you, this book would have gone back into the library. Right.
00:39:29
Speaker
And ironically, and I think in large measure due to Hadley, the damn sales of the book have increased every single year over the last 20 years. And I told my editor that when I first published the book. And he said, well, I've never seen that before with one of our books.
00:39:49
Speaker
And it really has because people, after they used ggplot2, they thought, well, Hadley keeps mentioning grammar of graphics, bless his heart. And to me, that is the highest standard an academic can acquire is
00:40:07
Speaker
to, you know, distribute the credit backwards in history and then talk about his innovation or her innovation. That's what academics is all about. Not, oh, look, I invented this new thing and I'm not going to tell you where I got the ideas. Right, right.
00:40:23
Speaker
So you've mentioned Hadley and you've mentioned, and you mentioned Jock and Tableau. Before we go, I want to get your sense of looking now and forward in the data visualization toolkit. You know, what are your thoughts on where things are and where things

Future Trends in Data Visualization

00:40:40
Speaker
are going? I mean, clearly you're a fan of Tableau, clearly you're a fan of R, but you know, where do you see the field of data viz tools now and going forward?
00:40:51
Speaker
Yeah. Well, the good, the bad, and the ugly. Yeah, right. Yeah. I mean, I was at a 20th anniversary session on grammar of graphics at one of the statistical conferences. Someone asked me the obvious, so when are you going to do a third edition? I said,
00:41:08
Speaker
Never. It's a math book. Well, if I made a bug, obviously I got to fix it, but you don't add to a theorem that is designed to cover, you know, X, Y, Z, and then sort of just change it willy nilly into a new quote theory. And, you know, one of my gripes with
00:41:33
Speaker
at least the vis community version of computer science is like using words like scientific theory. I mean, give me a break. Those aren't theories. They're interesting observations about the world. Very useful. But please don't try to dignify them by calling them a new theory of visualization. They're not. Yeah. That's why I was quite confident that when people finally read the book,
00:42:03
Speaker
you are going to see spinoffs. And I have to say, you know, the Ant Group in China, this bunch of like, I swear they're teenagers, but they've been going lickety-split, developing a JavaScript grammar of graphics program. And there, of course, is Hadley, and then the groups at Python and elsewhere who've done other stuff.
00:42:26
Speaker
Now, I've already alluded, so I won't say much more, about one of my big gripes about InfoViz or now VisWeek is that they get too easily impressed with high school math. You know, my daughter is a pure mathematician, my son-in-law is a pure mathematician, and I sort of feel like the VP
00:42:51
Speaker
candidate who said, I know John Kennedy, you're no, I knew John Kennedy, you're no John Kennedy. Well, I'm sorry. I have spent enough time asking mathematicians, what do they think? How do they, you know, and this and so on. And whenever I bring up some topic where it's, you know, some InfoViz paper that has three pages of symbols or algebra or whatever,
00:43:18
Speaker
The pure mathematicians look and say, that's high school math. That's not mathematics. And there's nothing wrong with applied math. But don't try to impress me by using so much math that you make it hard for me to understand what you're actually saying. So I've reviewed papers for InfoViz where I looked at a particular new, quote, graph.
00:43:42
Speaker
And I said, congratulations. You've designed the first graph I've ever seen. This is really snotty. I'm sorry. I apologize. But first graph I've ever seen that made it impossible to see the three clusters in the Fisher-Anderson iris data.
00:44:01
Speaker
And you looked at it, and by God, they put that in as an example of like, look how nifty this new graph is. Well, no, unless you can show a real...
00:44:14
Speaker
The second thing is the idea, and I want to get to the positives, but I'll just mention there's a whole drive inside the InfoViz movement to evaluate the graphs that people are inventing, and that's laudable. The only problem is
00:44:31
Speaker
Any psychologist who looks at those, quote, experiments knows right away: sorry, those aren't experiments. They're not even randomized. You don't even know the basic, you know, experimenter effects. You know, go read Robert Rosenthal to learn about this. And so inevitably I review some paper that says
00:44:54
Speaker
Well, our graph was preferred by 70% of the users, and the alternative graphs were preferred by 20% of the users. Really? What does preferred mean? Right, yeah. And did you do a double-blind experiment on this? Because they probably knew which one you invented, and then you asked which they preferred. Which they prefer, right.
00:45:18
Speaker
So now, having said a lot of these negative points, and I've said things that I do not appreciate that I said at some meetings, although they weren't nasty, but still, I want to say the amazing thing about the Viz community is its creativity.
00:45:38
Speaker
I mean, they just sometimes come out of left field and do remarkable stuff. And so when I consider things, you know, by, oh, Tamara Munzner and Jeff Heer and Martin Wattenberg and Miriah for another, though I didn't know her work as well until we had our committee session.
00:46:02
Speaker
I thought, damn, you're really on the right track with that stuff. And I never would have thought of it that way, so to speak, and whatever. That is the very best of the best. Yeah.
00:46:13
Speaker
But when you do impose requirements, like you have to use LaTeX, which I use for everything I ever write, and you may have only 10 pages, not including reference, and you may use this only double column for blah, blah, blah. By the time you're finished, it is possible to produce a production ready
00:46:36
Speaker
piece of crap. I mean, the tools are so, you know, so sophisticated that reviewers miss the essence of what someone's writing about. And I just submitted a paper
00:46:51
Speaker
to an ASA journal, Journal of Computational and Graphical Statistics, which has got a pretty tough reviewing schedule. And they seem to take about a year before they get you reviews back. But you know, you get pages and pages of single-spaced reviews. Whereas in InfoViz, I've had some reviews that consist of four sentences. And obviously, the person hadn't read the paper. Yeah.
00:47:20
Speaker
So, and some of this isn't always related to intelligence. I've had some truly brilliant people say incredibly stupid things and I don't know what it is. Well, I mean, Miriah and I actually talked about this for the interview I did with her a few months ago. Yeah.
00:47:37
Speaker
I think there's effort on some aspects of the field to open it up to more practitioners and data journalists who all have a lot to say, but then you get into this, like you said, there's this specific format and it's just so hard to use. It's like this gatekeeper situation. We need to make it this difficult.
00:48:00
Speaker
to get people in the door. And, you know, I completely agree. And that is, again, one of the logical aspects of InfoViz, and, you know, the program committee
00:48:14
Speaker
Every year works to refine and learn from their mistakes and so on. So they need to be praised for that. It's just that whenever you make rules and get things set up, it's like a tax law, right? Somebody smart enough sitting in an office in New York City is going to figure out how to get around what you tried. That's right.
00:48:36
Speaker
Well, Lee, thanks so much.

Wilkinson's Legacy and Reflections

00:48:38
Speaker
I mean, this history is really interesting and I appreciate you taking the time and telling me all about it and the origin of grammar of graphics. Really appreciate it. Well, thanks. Um, I probably have ended my career here from all the gossip I just gave you, but you know what? I'm 77 years old and I don't give a shit anymore. I'm having fun. I'm still writing and thinking and so on, but at a certain point I don't depend on getting tenure or whatever by getting
00:49:06
Speaker
the citation counts or h-index. I don't care. But I've met so many, you know, just brilliant people. And I was delighted to meet you on that committee. You know, who really are passionate about visualization, and it's going to be a great future. Yeah. On that note, we'll close up. Thanks so much, Lee. I appreciate it. Terrific. Yep. Thanks. Thanks. Bye bye.
00:49:34
Speaker
And thanks to everyone for tuning into this week's episode of the show. I hope you enjoyed that interview with Lee. I've put all the links to the things we've talked about in the show notes page, so go check them out. If you would like to be a financial supporter of the show, please head over to my Patreon page or, to make a one-time donation, head over to PayPal. So until next time, this has been the PolicyViz podcast. Thanks so much for listening.
00:49:58
Speaker
A number of people help bring you the PolicyViz podcast. Music is provided by the NRIs. Audio editing is provided by Ken Skaggs. Design and promotion is created with assistance from Sharon Satsuki-Ramirez. And each episode is transcribed by Jenny Transcription Services. If you'd like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify, YouTube, or wherever you get your podcasts.
00:50:20
Speaker
The PolicyViz podcast is ad-free and supported by listeners. If you'd like to help support the show financially, please visit our PayPal page or our Patreon page at patreon.com slash PolicyViz.