
Data Sketches: Nadieh Bremer & Shirley Wu

S7 E195 · The PolicyViz Podcast
223 plays · 4 years ago

Data Sketches authors Shirley Wu and Nadieh Bremer visit the PolicyViz Podcast to talk about their work, their process, and the future of dataviz.


Transcript

Introduction to 'Data Sketches' and Authors

00:00:13
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. On this week's episode of the show, I am very excited to welcome Shirley Wu and Nadieh Bremer to the program. Shirley and Nadieh, as you probably know, are the authors of the new book Data Sketches. It's a fantastic book about the process behind their year-long project of creating interactive visualizations. It is also one of the few data visualization books to come out recently that is larger than my book, which I really appreciate.

Unique Podcast Perspective

00:00:42
Speaker
So Nadieh, Shirley, and I talk about a variety of things in this week's episode. I should note that they've already done an interview with Cole Nussbaumer Knaflic on the Storytelling with Data podcast and one with Alli Torban on the Data Viz Today podcast, each dealing with slightly different issues,

Creating Data Visualizations: Process and Challenges

00:01:01
Speaker
different questions. And so I wanted to make sure that our conversation would give you a little bit of a different flavor for their work and their thinking around data and data visualization.
00:01:10
Speaker
So we do talk in depth about their process of creating data visualizations, but we also talk about their process for cleaning and extracting data, which is sort of step zero in the data visualization process. Nadieh and Shirley both describe their processes in Data Sketches, but we go into a little more depth in this week's episode.

Announcements: Excel Videos and Clubhouse Series

00:01:31
Speaker
We also talk about the tools they use and what they are hoping for in the future of data visualization. So I think this is a really great interview, a really great discussion, and I hope you'll enjoy it. Before we get there, let me give you a couple of other things to check out on PolicyViz. I am about to publish a few more Excel videos, if you're interested in learning how to expand your use of Excel to create data visualizations.
00:01:58
Speaker
I'm also starting a new weekly series on the new Clubhouse app, which is an audio-only app. It'll take place on Thursdays at 12 o'clock Eastern time, and it's called All Charts Considered. Yes, it's playing off of the NPR show All Things Considered. We're going to talk about all things going on with charts and data visualization, so check out that app. If you need an invite, just
00:02:25
Speaker
send me a note, send me a DM, or you can send anyone who's on the Clubhouse app a little note to get your invitation.

Professional Backgrounds of Shirley Wu and Nadi Bremmer

00:02:33
Speaker
So I hope I'll see you on the Clubhouse app. Before we get to the show, let me tell you a little bit about my guests. Shirley Wu
00:02:42
Speaker
is an award-winning creative focused on data-driven art and visualizations. She's worked with clients such as Google, The Guardian, Scientific American, and NBC Universal to develop custom, highly interactive data visualizations. And Nadieh Bremer is a graduated astronomer. That's right: if you've listened to the show, you know people come
00:03:01
Speaker
from all walks of life. Nadieh is an astronomer turned data scientist turned freelance data visualization designer, and she's worked for companies like Google, UNESCO, The New York Times, and Sony Music. Both Shirley and Nadieh create amazing visualizations, for both print and online, as you'll hear about in this week's episode. So I hope you will enjoy this week's episode of the PolicyViz Podcast. Here's my chat with Nadieh and Shirley.

Central Theme: Visualization Processes in 'Data Sketches'

00:03:32
Speaker
Hey Shirley, hey Nadieh, great to see you. How are you both doing? Welcome to the show. Ooh, hello. Thank you so much for having us. Yes, thank you.
00:03:42
Speaker
I am very excited to chat with you. Congrats on the new book. First off, it's a beautiful book. It's amazing. And also it's larger than my book, so I really appreciate the fact that it sort of dwarfs my book on the bookshelf. But congrats, it's fantastic.
00:04:06
Speaker
So we have a lot to talk about, and I'll just preface this whole conversation for folks who are listening: you've done a couple of other podcast interviews with folks in the dataviz field, with Cole Nussbaumer Knaflic over at Storytelling with Data and Alli Torban over at Data Viz Today, and I'll put links to those two shows. We'll try to avoid rehashing the same topics. So I have a couple of new questions for you, and we'll see what we can talk about and cover.

Data Collection and Cleaning: Unique Datasets

00:04:32
Speaker
I want to start with
00:04:33
Speaker
what I think is the central tenet or theme of the book, which is the process by which you both create your visualizations. And before we get to the actual visualization creation process, I want to start with the data part, sort of step zero of this whole process. Maybe I'll throw this over to Nadieh to start, because in some of the previous interviews I've listened to, you've talked about collecting some of your own data, and I know you come from this astronomy background, so you've worked with lots of data. So can you talk a little bit about
00:05:03
Speaker
how you think about cleaning data, whether you should be visualizing the data, whether it's objective or not, all these huge questions about data? If you could answer those in 30 seconds and give everybody the answer, that would be great. Right, right. Sorry, I don't have the magic answer to that. It's always "it depends." That would be my 30-second answer.
00:05:26
Speaker
So for Data Sketches, Shirley and I went through 12 different topics, and we each created our visualizations around a single topic word, like books or nostalgia or fearless.
00:05:42
Speaker
And because we wanted to create things that were fun, we often went in directions away from government-type data, toward things like Hamilton or Dragon Ball Z or SFMOMA. That meant that we had to collect our own data and do it manually. But also, throughout that process, there was, for example, a topic on the Olympics.
00:06:08
Speaker
And I had this dataset from the Guardian where they gathered all of the medal winners for every Olympic Games held since the very first one in 1896. While I was working through that, you feel that the Guardian is a very respectable source, but even there, in such a large dataset with thousands and thousands of rows, things can go wrong. So at some point,
00:06:33
Speaker
I noticed when I made my first visualization that some of the medals were missing, and I felt like, oh wait, of course I need to take a step back and actually check this dataset to see if things make sense. But I didn't want to manually check every single value; that would kind of defeat the purpose of relying on a dataset, and it was also a personal project, so there's only so much time.
00:06:57
Speaker
So in these cases, and in general, I like to find proxies. I like to think about totals: if I add up all the values from all of my separate observations, does the total make sense? Or if we are talking about percentages, should the total add up to 100%, and if so, does it actually get there?
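The totals-based proxy checks described here are easy to automate. The following is a minimal Python sketch, not Nadieh's actual code (she mentions doing her analysis in R); the function names, the idea of passing in an independently sourced total, and the tolerance values are all assumptions chosen for illustration.

```python
def total_matches(values, expected_total, tolerance=0.0):
    """Return True if the sum of the observations matches an independent total.

    `expected_total` should come from a different source than the dataset
    itself; `tolerance` (an assumed parameter) allows for rounding.
    """
    return abs(sum(values) - expected_total) <= tolerance


def percentages_sum_to_100(percentages, tolerance=0.5):
    """Return True if percentage shares add up to roughly 100%.

    A small tolerance is needed because rounded percentages rarely
    sum to exactly 100.
    """
    return abs(sum(percentages) - 100.0) <= tolerance


if __name__ == "__main__":
    # Hypothetical shares of a whole, each rounded to one decimal place.
    shares = [33.3, 33.3, 33.3]
    print(percentages_sum_to_100(shares))  # True: 99.9 is within 0.5 of 100
```

A failed check doesn't say which rows are wrong, only that something needs a closer look, which is exactly the role of a proxy.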
00:07:15
Speaker
And for the Olympics piece, one of the things that I thought about was, well, I have all of these separate medals, and if I look only at the gold medalists, if I add them up per Olympic edition, do they add up to the total number of events that occurred during each Olympic edition?

Verification and Trust in Data Analysis

00:07:30
Speaker
Because that should be a one-to-one thing.
00:07:32
Speaker
So on Wikipedia, I could find the number of events that occurred for each of these, and I compared that to the number of gold medals that I had. And then I found some really interesting reasons for differences in the gold medal counts: for example, a wrestling match that lasted for more than nine hours, after which they decided both wrestlers would get silver.
00:07:52
Speaker
Although I feel that would have been like a gold effort. Yeah. Another thing was that in that particular dataset, for a few of the editions, the horses were also included as having won gold, so it was kind of interesting to see, like, Princess being listed as a woman winning gold in the Olympics, and Lady Mirka, and these kinds of things. That was kind of funny.
00:08:18
Speaker
Maybe they had a different podium, a bigger one, so the horse could get up there. I don't actually know. Yeah, that would have been fun, though. Yeah.
00:08:34
Speaker
After fixing those kinds of issues and really understanding that, okay, now all of my gold medals do add up, I had so much more trust in the dataset. So even if you have datasets from wherever, it's always good to check whether they make sense, and don't assume from the very start that your dataset is correct, because there's always something weird and odd going on with datasets found online.
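The gold-medal check can be sketched the same way. This hypothetical Python version assumes the medal data has one row per gold-medal-winning entry (a dataset that lists every team member separately would first need de-duplicating per event) and that the number of events per edition has been looked up separately, for example on Wikipedia. The tuple layout and the toy data are made up for illustration.

```python
from collections import Counter


def editions_with_mismatched_golds(medal_rows, events_per_edition):
    """Compare the number of gold medals per Olympic edition with the number of events.

    medal_rows: iterable of (edition, medal) tuples, e.g. ("1912", "Gold").
    events_per_edition: dict mapping edition -> number of events held.
    Returns {edition: (golds_found, events_expected)} for each edition
    where the two counts differ, i.e. the editions worth inspecting by hand.
    """
    golds = Counter(edition for edition, medal in medal_rows if medal == "Gold")
    mismatches = {}
    for edition, n_events in events_per_edition.items():
        found = golds.get(edition, 0)
        if found != n_events:
            mismatches[edition] = (found, n_events)
    return mismatches


# Toy data (not real Olympic results): edition_B is missing a gold medal.
rows = [
    ("edition_A", "Gold"), ("edition_A", "Gold"), ("edition_A", "Silver"),
    ("edition_B", "Gold"),
]
print(editions_with_mismatched_golds(rows, {"edition_A": 2, "edition_B": 2}))
# {'edition_B': (1, 2)}
```

A mismatch isn't automatically an error: as the tied wrestling final and the horses listed as medalists show, each flagged edition still needs a human explanation.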
00:09:00
Speaker
Absolutely. Especially online. Yeah, especially online. So Shirley, I want to pivot the question a little bit for you. I'll sort of jam these together: when you're working with clients, or you're just looking at visualizations out in the world, are you
00:09:21
Speaker
For the client work, are you making sure that you're going through the data? Do you make them go through the data? Is there a quality control process that you do with them? And when you're looking at stuff out there, are you thinking about whether the data they're showing you makes sense? Does that enter your thought process? Like, I'm looking at this great visualization, but did the person do the checks that Nadieh did? How do you think about that in your day-to-day?
00:09:47
Speaker
Yeah, this is a really great question. And actually, I don't think it comes as naturally to me as it does for you and Nadieh. This is something that is so interesting, because my background is computer science and business, so I didn't come from a data background. I came into this from coding, and I didn't even know what dataviz was. And so for years, I would...
00:10:12
Speaker
And this is one of the big things I learned from Data Sketches and from Nadieh: for years I made dataviz not knowing how important it is to validate and verify the data, because for me it was like, oh, this is self-expression.
00:10:40
Speaker
I'm just putting pretty things onto the screen. When I first started, I didn't even care if other people could understand what I was trying to show, as long as I had fun coding it. And it wasn't until I started reading...
00:10:55
Speaker
I don't think I really fully grasped what it meant to make data visualizations until I started Data Sketches and started reading Nadieh's data sections. One of the first things I read was about how she validated her data.
00:11:13
Speaker
And I was like, whoa, this is important. And then, as I started to realize how important it was, I became more and more aware of it. I still don't think I'm very good at it, because I don't think it comes to me naturally. Even after all these years, I haven't quite developed the intuition. That's why, for me, it's so important,
00:11:41
Speaker
if the dataset and the topic are extremely serious, to work with clients who are domain experts. I won't touch a sensitive topic or dataset unless I can guarantee that I'm working with a domain expert to make sure that I'm presenting it correctly, because I know that I still don't have the best intuition for verifying data.
00:12:10
Speaker
And that's why, even when I see something, I'm only now starting to develop the Spidey sense of, wait, this dataset doesn't... I think it's only in the last two years that I've started to question data and be like, this data source doesn't make sense, or it feels misleading. And that's also why, in my personal projects, I try my best to do pop culture things that can't affect
00:12:38
Speaker
anyone if I accidentally don't verify all the data correctly. Right. If I have the script from Lord of the Rings and I say Gimli said something that Legolas was supposed to say, okay. But when you're showing COVID data, that's a serious thing, right? Oh, yeah. Yeah.
00:12:57
Speaker
So, without violating anything that you can't say, what are those conversations like? I think you can both speak to this. When you're talking to clients about their data, are you really asking them to dive in and demonstrate things to you? Like, when you work with the data and you see something weird, do you go back to them and say, hey, can you explain why this thing is

Collaborating with Clients for Data Accuracy

00:13:20
Speaker
over here? Just give us a flavor of what that's like. Maybe, Shirley, you can start, and then Nadieh. I'm sure you have similar conversations with the folks you work with.
00:13:27
Speaker
Yeah, I can't remember a specific incident off the top of my head, but I do try my best to go through the data myself and then ask them questions. But I do, guiltily, assume that if I'm working with a client, the data I have is a dataset I can trust. Right.
00:13:51
Speaker
There's only so far you can go, right? At some point, the client's giving you the data, and you sort of have to assume that they've collected it objectively, right? Yeah. And for me, it's very much about making sure that if there are any inconsistencies or biases, they get really laid out in the methodology. I think that's where I go the most. I think I don't,
00:14:21
Speaker
and I'm not saying that this is a good thing, I think it just tends to be because the data side is my weakest. I tend to be like, okay, if we can't fix this in any way on the data collection side, then I want to do my best to point it out either in the visualization itself or in the methodology. Cool. Nadieh, do you have any stories or experience or thoughts on how you handle this data issue?
00:14:49
Speaker
Yeah, so I like working with these big, diverse datasets, and so I think maybe more than half of my client work involves data that has quirks. Sometimes they're errors; sometimes they're things that the client thought were supposed to be interpreted in a certain way, but then it appears that it's more subtle than that.
00:15:11
Speaker
And I am always very open and blunt about that. It's like, hey, I'm finding this in the data. I thought it should be this, but now I'm seeing this; how should I interpret that? It's always very much "I don't understand this, please explain" kinds of questions. I think it's because I come from science, but I always write really long emails to my clients, especially the first one, after I've gone through the data. It's really long, and
00:15:40
Speaker
I always try to give lots of examples. Usually when one thing is wrong, I'll try to find more of those specific cases where I'm seeing the same thing go wrong, or at least give very specific screenshots and examples of why I don't understand it. And so far,
00:16:01
Speaker
every client has always responded in a very normal, human kind of way, where they either explain it, or go and dig deeper, or ask somebody else who is even closer to the data. Recently, I actually had a client who
00:16:17
Speaker
had always thought that a certain variable called I was the index that connected everything, and it wasn't. I was going through the data, and at some point things didn't make sense anymore, because these were basically stations around the world, so they had fixed positions. But when I started digging deeper, it appeared that these positions could

Documenting Methodology in Visualization Projects

00:16:40
Speaker
just suddenly swap. On, you know, October 23rd, a station just swapped to the other side of the world; it just moved.
00:16:46
Speaker
And it appeared that the index wasn't actually fixed. If they did a certain data update in their system, the indices all got reassigned, and there was something else that was actually fixed. And he was like, oh, I never knew that. I should have. You saved the day.
00:17:04
Speaker
We saved the day on that. Yeah. And some locations were in the dataset twice, for a very quirky reason that he also didn't realize. But this was a dataset of tens of millions of data points. It was so big that it's hard to understand every part of it exactly. So usually the clients are kind of okay with me finding quirks, so they can actually fix them. Yeah.
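Both quirks in this story, an index that gets reassigned and locations that appear twice, can be surfaced with simple checks. The sketch below is hypothetical: it assumes records of the form (snapshot, index, lat, lon), where "snapshot" labels a data update, and the rounding precision used for duplicate detection is an arbitrary choice. At tens of millions of rows you would more likely run the same logic in a dataframe or database, but the idea is the same.

```python
from collections import Counter, defaultdict


def indices_that_move(records):
    """Return index values that map to more than one position across snapshots.

    records: iterable of (snapshot, index, lat, lon) tuples. For fixed
    stations, a stable key should always point to the same position, so an
    index with several positions suggests it is reassigned between updates.
    """
    positions = defaultdict(set)
    for _snapshot, index, lat, lon in records:
        positions[index].add((lat, lon))
    return sorted(idx for idx, pos in positions.items() if len(pos) > 1)


def duplicated_locations(locations, precision=5):
    """Return {(lat, lon): count} for positions listed more than once.

    locations: iterable of (lat, lon) pairs from a single snapshot.
    Coordinates are rounded to `precision` decimal places so tiny
    floating-point differences don't hide duplicates.
    """
    counts = Counter(
        (round(lat, precision), round(lon, precision)) for lat, lon in locations
    )
    return {loc: n for loc, n in counts.items() if n > 1}


# Hypothetical records: index 1 points somewhere else after the update.
records = [
    ("before_update", 1, 52.37, 4.90),
    ("after_update", 1, -33.87, 151.21),
    ("before_update", 2, 40.71, -74.01),
    ("after_update", 2, 40.71, -74.01),
]
print(indices_that_move(records))  # [1]
print(duplicated_locations([(52.37, 4.90), (52.37, 4.90), (40.71, -74.01)]))
# {(52.37, 4.9): 2}
```

Neither check explains *why* the data looks this way; like the emails described above, the output is a list of concrete examples to take back to the client.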
00:17:33
Speaker
Shirley, you mentioned the methodology. So my question is: do you write up a methodology document or paragraph or something, either internal or external, in the vis or outside the vis? Is that something you try to do for all of your projects?
00:17:53
Speaker
Ooh, that's also... so, not all of them. I'm more likely to do them the more serious a topic is, and the more I want people to know about all of the considerations that we put into it and all of the places where maybe we
00:18:13
Speaker
didn't have the data or had to make different assumptions. I remember when I made the pandemic game last year, the person I was collaborating with, Stephen, wrote up this huge methodology document that explained every single thing that went into it. And I remember when we worked with the Guardian,
00:18:37
Speaker
Nadieh wrote up all of the methodology and assumptions. I write them when I want to make sure that I communicate all of the shortcomings, but I don't do them for, let's say, Hamilton. When I made Hamilton, it was like, this is how I, you know...
00:19:01
Speaker
So I don't do them all the time. The answer is, it depends. Yeah. Yeah. It depends. Yeah. Maybe I'll shift to Nadieh. Do you feel like having that methodology or sources section, whether it's a paragraph or a document, helps users or readers have more trust in the work that you're doing, because you're so transparent about it?
00:19:32
Speaker
Oh, yeah, at least I think so. If I were the reader and I could read a methodology, I think that would definitely increase my trust level, if I can sort of follow along. Maybe I wouldn't understand every step, though it's better if I understand more, but you get a feeling for the logic that went into certain steps, and then you understand how they came up with the final numbers. Yeah, definitely.
00:19:59
Speaker
Yeah. Okay. So we've talked about data. Let's talk about the actual visualization part, because I think there are probably a lot of listeners who are like, okay, let's get to the actual creation stuff. Now, of course, people could just buy the book and read all about it, but I think this process question is maybe the biggest question in dataviz, especially for people who are maybe

Beyond Standard Charts: Creating Unique Visualizations

00:20:20
Speaker
just starting out, and maybe they're accustomed to making bar charts and line charts and pie charts, and they want to... I don't know if the right word is evolve.
00:20:29
Speaker
Grow, maybe. They want to expand, and they want to get to the point where they can create some of the things that you've created in Data Sketches. Saying "tell us about your process" is a super broad question, so I'll try to narrow it in a little bit and ask:
00:20:51
Speaker
when you are going through a dataset and you're visualizing it, how do you move away or expand away from these standard graph types that everybody knows how to read but that don't really grab your eye when you see the 900th bar chart
00:21:08
Speaker
on a Wednesday afternoon? And you've both been doing this for years, so I know it's part of your DNA now. All right, so I'm going to make this a two-part question, and whoever wants to start can start. Do you start with these standard, traditional, Excel drop-down-menu-type graphs, like bar charts and line charts? And then how do you go from there to the sorts of things that you showcase in Data Sketches?
00:21:31
Speaker
I think I have a process, at least for part of it. When I'm trying to understand the data myself, which I generally do in R, I make lots and lots of bar charts and line charts and scatter plots to really build up a mental model of the stories that are in the data and what would be interesting to show. Then, once I feel like I know my direction and the story I want to tell, I think about how to
00:22:01
Speaker
tell that to an audience in one chart in an engaging way. If you're just starting out and want to make that step, at least this was my tactic: I look at what other people have done, and I find one that I think is just awesome, or actually many of them that are just awesome. I use Pinterest boards for that.
00:22:24
Speaker
I save them there, and then I look through my Pinterest board with my dataset and my goal in mind, and I try to project that dataset onto the way that person made that specific chart. I feel like, oh yeah, maybe if I use this variable for the size of the circles in that visualization, and this variable for the lines connecting them, I think that could work.
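One detail when putting a variable on the size of circles: readers judge circles by their area, so a common practice is to scale the radius by the square root of the value (the idea behind D3's `scaleSqrt`). This Python helper is a minimal sketch of that mapping; the parameter names and the assumption that the scale starts at zero are illustrative, not something described in the episode.

```python
import math


def radius_for(value, max_value, max_radius):
    """Map a non-negative value to a circle radius so that circle *area*,
    not radius, is proportional to the value (a square-root scale).
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    if value < 0:
        raise ValueError("value must be non-negative")
    return max_radius * math.sqrt(value / max_value)


# A value a quarter of the maximum gets half the radius, i.e. a quarter of the area.
print(radius_for(25, 100, 20))  # 10.0
```

Scaling the radius linearly instead would make a value twice as large look four times as big.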
00:22:49
Speaker
And then, if I really think there is something there and that visualization is awesome, I might try to recreate it in that sense. That's really how I started out, when I had very little experience, to be able to come up with my own things: steal like an artist, or, as I like to call it, remixing. You're inspired by a data visualization that somebody else made,
00:23:14
Speaker
but you're not copycatting it one-to-one; you're taking the things from it that you think are the reason you like it so much. And sometimes, when you actually project your data onto their kind of visualization, you might see that it doesn't work for your particular dataset. When that happens, you need to do the process again with another visualization.
00:23:36
Speaker
And what also helps is really just building up experience by doing it more and more often. You're broadening your view of the ways that data can be visualized. It also helps to look at things like the Data Viz Catalogue to see that there is more than just bar charts and line charts.
00:23:54
Speaker
And then it really comes with time. At first I really had to do it that way, and now, years later, I don't really do it that way anymore. I always start from the data and the goals again, but then I just start sketching, and with the backlog of experience that is now in my mind, I draw from all the things that I've seen and try to come up with something. So I guess I am still remixing and stealing like an artist, but it's now a little more internal, in my mind. Right.
00:24:24
Speaker
Shirley, do you have a well-defined process like that? I actually do have a process of my own, and I think that over the years Nadieh and I have probably converged in some ways, because we've been working together for so long, but I'll try to highlight the parts where we differ a little. For me, the process I've developed is really because, and I mentioned this before, and I guess I keep mentioning it because I'm just
00:24:51
Speaker
surrounded by the two of you, who know the data side, data collection, data analysis, data cleaning, so well, and it's just so intuitive for you. And for me, because I
00:25:06
Speaker
again didn't have that data background, I really struggled for years to figure out the style of analysis that makes sense for me. This is my ad hoc way, which I think I eventually want to replace by going through the books and teaching myself. But the process I've come up with for myself over the years is this.
00:25:33
Speaker
Once I've finished collecting a dataset, and we've already talked about all of the considerations that go into that, and assuming that it's been cleaned and verified, the first thing I do is look at the dataset and at all of the attributes, all of the columns I have, and I start listing them. I'll do something,
00:25:58
Speaker
I don't know if anybody else does this, but I'll list each attribute and what type of data it is. So I'll be like, this attribute is quantitative, or qualitative, or temporal. And then once I have that, I'll look at the columns I have available, and I'll be like, oh,
00:26:22
Speaker
this makes me think of this question, or it makes me curious about this, or now I have this hypothesis about the data. The process I have now is to open an Observable notebook and put the dataset into Vega-Lite, which is my version of Nadieh's R, and then I'll start plotting it. And that's why
00:26:47
Speaker
I like to list the type of data first, because it really helps me figure out what kind of quick charts each column lends itself to. I'll explore that way and try to answer all of my questions and hypotheses, and for some of them, I'll be completely wrong.
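Shirley's step of labeling each column's type before plotting can be approximated in code. The sketch below is hypothetical and written in Python rather than in her Observable/Vega-Lite setup; the inference rules, the type-to-chart lookup, and the example columns are illustrative assumptions. For reference, Vega-Lite's own encoding types include `quantitative`, `temporal`, `ordinal`, and `nominal`; the "qualitative" label here would correspond to the last two.

```python
from datetime import date


def infer_type(values):
    """Roughly classify a column as 'temporal', 'quantitative', or 'qualitative'.

    Heuristic: dates (and datetimes, a subclass of date) -> temporal;
    ints/floats -> quantitative; anything else, including booleans -> qualitative.
    Missing values (None) are ignored.
    """
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, date) for v in present):
        return "temporal"
    if present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    ):
        return "quantitative"
    return "qualitative"


# Hypothetical lookup from sorted column types to a quick exploratory chart.
QUICK_CHARTS = {
    ("quantitative",): "histogram",
    ("qualitative",): "bar chart of counts",
    ("temporal",): "line chart of counts over time",
    ("quantitative", "quantitative"): "scatter plot",
    ("qualitative", "quantitative"): "bar chart",
    ("quantitative", "temporal"): "line chart",
}


def suggest_chart(*column_types):
    """Suggest a quick chart for one or two column types (order doesn't matter)."""
    return QUICK_CHARTS.get(tuple(sorted(column_types)), "no quick default")


# Hypothetical columns from a pop-culture dataset.
columns = {
    "release_date": [date(2020, 1, 5), date(2020, 1, 12)],
    "lines_spoken": [120, 95],
    "character": ["Gimli", "Legolas"],
}
types = {name: infer_type(vals) for name, vals in columns.items()}
print(types)
print(suggest_chart(types["release_date"], types["lines_spoken"]))  # line chart
```

The lookup only covers the obvious pairings on purpose: the quick charts are for answering exploratory questions, not for the final design.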
00:27:07
Speaker
In that exploration, I'll find something interesting, and I'll jot that down as another thing to explore. I'll keep exploring until I find the set of things where I'm like, oh, these are really interesting, and I want to build the visualization around them to communicate this finding or set of findings.
00:27:28
Speaker
And then from there, I completely forget all of the quick charts that I used, and I go, okay, now that I have the central message, which is kind of like what Nadieh was saying about her goal, now that I have this message or goal, how do I want to communicate it?
00:27:45
Speaker
How do I communicate it most effectively, but also in a really interesting way, such that it grabs people's attention? I guess this depends on who my audience is, but all of my personal projects are for a general audience, so I want something that's fun and delightful. And from there on, I try to completely forget
00:28:08
Speaker
any charts and any visualizations by anybody that I've seen. Whereas Nadieh goes and finds other charts that she gets inspiration from, I try to forget all of them. I'll still use a Pinterest board, but I use it for mood and colors and shapes, general shapes that have nothing to do with dataviz, and I'll see if they spark my imagination.
00:28:34
Speaker
And then from there, I'm like, oh, actually, this set of inky dots makes me think of this, and that looks kind of like a network. That's how I start thinking of the visualization itself. And then from there, I'll start sketching. This is something I used to really dislike doing, but
00:28:57
Speaker
I started doing it because of Nadieh's insistence, and now I'm appreciative of it. I'll start sketching out my ideas and jotting them down, and then eventually I'll convert that into the visualization itself. That's my process, and a lot of it comes from realizing how important it is to understand the data, which is not something that I used to know.
00:29:21
Speaker
That's why I developed that whole process at the beginning. And the second part, about forgetting all of the charts: it gets into my head in a weird way when I feel like I've copied someone. I'm like, oh, I don't know where this comes from... I feel like this is something I'm working on getting over, but I'm like, if I copy anyone, I'm a fake. I don't know.
00:29:50
Speaker
And it really gets into my head. I don't think that way about anyone else, but if I do it, I just beat myself up over it. And that's

New Visualizations and Future Trends

00:30:00
Speaker
why I try to look for inspiration in tangential fields, in nature or in art museums. I think that's probably why sometimes I'll come up with things where people are like, huh,
00:30:17
Speaker
that's not what I would have expected for this dataset. And I'll be like, yup, me neither. Well, it's interesting, and a whole other discussion, I think, is
00:30:28
Speaker
at what point do you need to cite someone else's visualization, right? We don't cite William Playfair or Florence Nightingale every time we make those charts, but at what point? And that's a whole other discussion. I did want to ask: when you have that final visualization and it's up and you're looking at it, and it doesn't resemble
00:30:49
Speaker
really anything that anybody has ever created, it's this new visualization. Nadieh, I know I have one of your visualizations at the end of my book, because it's like, here's this whole library of graph types, but it's not finite; it's an infinite space. So when you're done and you're looking at this thing you've created that no one's ever done before, are you like, holy shit, I just created a whole new thing? Does that occur to you? Or are you just like, no, I just made a visualization, let's see if people like it?
00:31:18
Speaker
Yeah, just made a visualization. Hope people like it. If not, I like it. I think I generally have that, except for the Lord of the Rings visualization I created, which is heavily based on a chord diagram, a specific type of visualization, but then kind of mutated into a thing that is
00:31:41
Speaker
It evolved into something that's no longer compatible with the original, so it became its own species, kind of. Then somebody said I should make it into its own chart type, and we called it the loom, with the strings. So I was suddenly very aware that this had become a new possible chart type if people wanted to use it.
00:32:04
Speaker
So sometimes it still comes to mind. I also think of one of my later visualizations, about Cardcaptor Sakura, where I thought, oh, I haven't seen this yet, but it feels like it could be
00:32:20
Speaker
How do you say that? Standardized in a way that could be used for other datasets as well. That one pops into my mind because I've had several emails from people saying, hey, I want to use this kind of visual for different datasets. So then I feel like, oh, I guess maybe it is a new kind of chart type, even though it's a very niche thing that will only be useful for a handful of people.
00:32:45
Speaker
I do want to also say, on the topic of citation, I think of what Nadi says about remixing, for example. When she remixes, it's truly a remix: I don't think you can see hints of the original as much, or you can see bits and pieces of the original, but it's all kind of
00:33:07
Speaker
wrapped up together into something that's new and Nadi's. In those sorts of cases, it's like that Creative Commons idea: she's modified enough of the original that it's her own, so I don't think she needs any sort of "inspired by" or anything. But I also come across examples where it's almost one-to-one, and maybe they just changed the dataset or something.
00:33:37
Speaker
In that case, I would feel really unhappy if there were no "inspired by this piece" or something. And I think that's really the big difference. If somebody uses the exact same code and a very similar dataset, I believe that's just plagiarism. Yeah. Yeah. Yeah.
00:33:57
Speaker
That just should not happen. Now, let me put a twist on this. You both share a lot of your code, and a lot of it people can get just by inspecting the code. So is there a point where you feel like you own that code? Or, because it's often built on an open-source platform, if someone grabbed your code, they're going to grab the code, but they're going to
00:34:21
Speaker
change it. They're going to basically do, let's just say, what Nadi did, right? They're going to remix it, but the base is going to be off of code that you put out there. Do you feel like people need to cite the fact that they started with your code, or is that just open source, and you're freely providing it?
00:34:37
Speaker
If it's a true remix, so even if I might be able to see that it started out as this thing but it became its own unique thing, then no, that's totally fine if they use that sort of starting code. But there's a gray area where it goes from plain copycat to the remix part. So it depends on the specific case and how friendly the person making that new visual is, I guess.
00:35:06
Speaker
I've thought about this a lot. And the conclusion that I've come to is that while, yes, we've open sourced our code, I actually think that our work is
00:35:22
Speaker
less similar to open-source libraries, which are tools people have built with the intention for other people to use them. Ours is more like artwork, where we have our own style and our own intention with it.
00:35:41
Speaker
In general, if someone takes my code, gets inspiration from it, and modifies it enough (and yes, it's a gray area how much is enough), let's say they write a bunch of their own code and the output is their own artwork with their own style, I'm really happy when that happens. I'm like, oh, you liked my work so much that you went and did something in that style. That's super flattering.
00:36:06
Speaker
But when it's my exact code with a different dataset plopped in, that offends me to no end, because you don't know the number of hours I've spent thinking through this design and why I chose this design for this dataset. It's actually offensive
00:36:28
Speaker
for you to just do that. It's really interesting. I have this conversation with people at work all the time about to what extent we should release our data, right? Because if I go and grab six public datasets and merge them together, and I put all this work into cleaning them and making sure they're this and that,
00:36:51
Speaker
even if they're public datasets, I put a lot of time and effort into that. So at what point do you release something like that? As you said, Shirley, it's based on an open-source platform, but you've put all this time and effort into creating something from that platform. It's the same thing if you're merging all these datasets: you've created something out of all this publicly available stuff. So at what point is that an ownership thing versus open sourcing? It's all these weird gray areas.
00:37:20
Speaker
Yeah. So we're getting towards the end of our time, and I wanted to look forward a little bit. Data Sketches volume two? No, just kidding. If you're anything like me, you don't want to think about a book project for a long time.
00:37:36
Speaker
I know, based on the conversations we've had and the other things I've heard you talk about, that you both have a variety of interests. Nadi is currently making a robot in front of us as we talk. But I wanted to ask, and you can pick how you want to answer this question, about the future of data visualization. I wanted to ask,
00:37:59
Speaker
either what do you see as the future of dataviz (and the future can be whatever you want it to be: a year, five years, 10 years, 100 years, although we may all be under water in 100 years, so that might not be as interesting), or what do you want or wish to see in the future of data visualization, be it tools or technologies or what have you? Maybe Nadi, you can go first.
00:38:29
Speaker
Yeah, so on the one hand, I would really like it if data literacy increased, or in general if fewer people felt fear when they hear the word data or see a dataset, and felt curiosity or interest instead.
00:38:52
Speaker
So that would be the main thing, because I think that would really help. And then my next hope is that we can all take that next step beyond bar charts and line charts and maybe use slightly more complex, less straightforward charts.
00:39:11
Speaker
With the idea that 100 years ago, line charts were something that you had to properly explain to everybody, maybe in another 50 years, when data literacy has increased, who knows, we might not have to explain scatterplots at all because they're so obvious. Or even other things like Sankey diagrams, network charts, and then... Exactly.
00:39:36
Speaker
So in that sense, I hope we can expand what we can do with datasets, because sometimes a very specific chart can be just the right thing for a particular dataset. I think Sankey diagrams are amazing for certain kinds of datasets, and right now they feel like they're on the cusp of moving out of that too-exotic place toward the "actually, this is very useful" place. So that's what I really hope. And in terms of tools,
00:40:03
Speaker
I don't know. I kind of hope that browsers will become better so I can do more of the things I have in mind that browsers currently can't keep up with. But other than that, I don't know of any specific things that I would want from a tool. I kind of like that there's this wide variety of tools we can use, that there's not just one: we have Tableau or D3, or you can go crazy with WebGL or with, I don't know, Play-Doh.
00:40:33
Speaker
It creates this very large variety, and I think that's good. But I hope we don't go as crazy with tools as JavaScript has with frameworks on the web. I hope it stays a manageable number of tools, without blowing up, but also without everybody suddenly having to use this one tool because otherwise you cannot function anymore. Right. Yeah, that we narrow into a single tool. OK. Shirley, hopes, wishes, dreams for the future of dataviz.
00:41:03
Speaker
Yeah, I mean, Nadi took the good one, which is the data literacy part. She did. That was a good one. Yeah, she did. That was what I was going to say. She's been thinking about this the whole time, this whole discussion. She's like, I'm going to get it first.
00:41:18
Speaker
And I was just thinking, oh, I've got a good one: data literacy. And then Nadi just goes, and I'm like, okay, so... You can also have data literacy. Yeah, what? Yeah, we don't have to compete with each other. That's right. That's right. Yeah, you're a partnership. You've got this. Yeah, this is not a competition. It's not a scarcity mindset. No. Yeah, so
00:41:45
Speaker
That's something that I was thinking a lot about. Earlier in the podcast, we mentioned methodologies. And I just thought, we know to look at methodologies because, as data professionals, that's the first place we look to verify a chart and understand where it's coming from. But that's not common knowledge. And I feel like,
00:42:14
Speaker
on top of being able to see a chart and recognize what it's for, with all of the misinformation that's online, and all of the times that my mom has been like, look at this! And I'm like,
00:42:30
Speaker
No. I think it'd be really great if there were more conversation about how to suss out charts or information that's
00:42:45
Speaker
not legitimate versus those that are. That means not only understanding the chart and where it's coming from, but also knowing to look at the methodology. So that's one thing, from a general public perspective. Another thing that I'm excited about, and have been quite excited about, is
00:43:06
Speaker
how data visualization is becoming more prevalent in industry, from a corporate perspective. Even four or five years ago, when I first started freelancing, most companies were like, dataviz, what's that? They were just starting to understand the importance of data science and data analysis, and thinking about visualization as a separate practice
00:43:32
Speaker
for communication wasn't even a thought on most companies' minds. I think that's slowly starting to change, and I'm very excited for us to get to a place where companies understand the value of, maybe internally, understanding their data and the importance of communicating it, or even
00:43:54
Speaker
the importance of communicating it to their consumers. I'm enamored with something Susie Lu put out one time; she kind of just dropped it on us and then disappeared. It was that receipt project, where she took one of her receipts from a grocery store, and then
00:44:16
Speaker
she basically, I think, got a receipt printer and visualized the items she bought as very simple bar charts: I think it was how much she spent on produce, how much she spent on meat, or something like that. But
00:44:36
Speaker
this idea of seeing data displayed in an easy and engaging way in items around us is something I'm really excited about. I think it will take a while, maybe five, 10, 20 years, for that to be commonplace, but I think that's so cool, because it means that we're hopefully much more

Conclusion and Promotions

00:45:00
Speaker
informed. And then I guess the last thing is just a personal one:
00:45:04
Speaker
I'm also enamored with big installations and art. This is the direction I want to go in: art installations whose core is informed by data, where the data and story are told through an installation that is also interactive and brings the community together. That's just my personal dream to work on. Yeah.
00:45:34
Speaker
That's really cool. I'll just say, on the second one, I think we're closer to that than 10 or 20 years, because if I look at my Apple Watch, it's got the little rings on it. It's a visualization on my watch. I feel like we're closer to that, especially as everything moves to your phone, where you can just tap your phone on the credit card reader.
00:45:59
Speaker
You know, I can imagine you don't have to print the bar chart on a receipt; it could actually show up on your phone. Right. Yeah. So I think we're close to that, and it's a really cool idea. And, you know, the dataviz art, when we're back in museums.
00:46:14
Speaker
Yeah. Or even outside, in common spaces, because that's another conversation: putting things in museums is actually kind of inherently exclusionary. Yeah. That's a whole separate conversation. That's a whole separate conversation. I will not get into it.
00:46:31
Speaker
No, but it's true, right? I mean, look at how many people surround the Bean in Chicago on a nice summer day. Yeah. You could have something that's data-driven that people surround. Yeah. Whole other conversation. Yeah. We could talk forever. Thank you both so much for coming on the show. Congrats again on the book. Thank you so much. It is lovely, and I have been inspired just by peeking through it. I will link to the other interviews you've done so people can hear more of you ranting about color,
00:47:01
Speaker
which I think was maybe on Cole's interview, and more on process in Alli's interview. So there's a lot that people can learn from. Thanks to you both for coming on the show. It's been really great chatting with you. Thank you so much, John. Thank you for having us. It was a pleasure.
00:47:19
Speaker
And thanks so much for tuning in to this week's episode of the podcast. I hope you enjoyed it. I hope you'll check out more episodes of the podcast by going back through the archives. I hope you'll also join me for some of the new All Charts Considered sessions on the Clubhouse app. Again, if you need an invitation, just send me a note over at policyviz.com. So until next time, this has been the PolicyViz Podcast. Thanks so much for listening.
00:47:46
Speaker
A number of people helped bring you the PolicyViz podcast. Music is provided by the NRIs, audio editing is provided by Ken Skaggs, and each episode is transcribed by Jenny Transcription Services. If you would like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify, or wherever you get your podcasts. The PolicyViz podcast is ad-free and supported by listeners. If you'd like to help support the show financially, please visit our Patreon page at patreon.com/PolicyViz.