Introduction and Sponsor
00:00:00
Speaker
This episode of the PolicyViz podcast is brought to you by JMP, Statistical Discovery Software from SAS. JMP, spelled J-M-P, is an easy to use tool that connects powerful analytics with interactive graphics. The drag and drop interface of JMP enables quick exploration of data to identify patterns, interactions, and outliers.
00:00:19
Speaker
JMP has a scripting language for reproducibility and interfacing with R. Click on this episode's sponsored link to receive a free info kit that includes an interview with DataVis experts Kaiser Fung and Alberto Cairo. In the interview, they discuss information gathering, analysis, and communicating results.
Welcome and Episode Overview
00:00:49
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. Thanks for tuning in this week. Today's episode continues my month of story. We're going to be talking about data visualization and story again, of course, because I'm all in on story this month. But before we get to that, we are going to talk about lots of other cool projects with this week's guest, Jan Willem Tulp from Tulp Interactive. Jan Willem, welcome to the show. Thank you. How are you?
00:01:16
Speaker
I'm really good. Yeah. Very much. How are you? I'm good. Springtime in the Netherlands there. Today, yes. That's cool. You've been doing some very cool stuff, obviously, for a long time. Some interesting work with Scientific American that, I don't know, that may be some of the stuff that most people know you for.
Jan Willem Tulp's Background
00:01:39
Speaker
projects you've done with them. And you're also doing some cool stuff with Google this year. But before we dive in and talk about all the good work you're doing, can we start with maybe you sort of give folks an introduction to who you are, where you came from, and what you're doing now? Sure. So yeah, I live in the Netherlands, in The Hague, and I've been doing data visualization for six years now.
00:02:00
Speaker
Before that I worked as a software engineer. I studied interaction design because I wanted to do both design and software. And right now that's what I'm doing. Basically
From Print to Interactive Visualizations
00:02:09
Speaker
what I do is I create custom data visualizations and for me that means reading in some data set and writing custom software to visualize it to communicate insights.
00:02:20
Speaker
Great. And a lot of your work is both you do the interactive side and the static side as well, right? So I'm thinking of the flavors visualization you did for Scientific American. Do you want to talk a little bit about that visualization and then the process you go through when you're designing for both the static side and then the interactive side?
00:02:40
Speaker
Yeah, sure. But first I must say that it's primarily the magazines where this is the case, because usually they have a print version and a website as well. So usually I start with the print version and later I will turn that into an interactive version, because the technology I use, web-based technology, can easily be extended to turn a web-based, non-interactive visualization into an interactive visualization.
00:03:09
Speaker
So that's how I arrange the two parts, the two visualizations. And when I usually get started, especially for the Scientific American projects, usually they have some data sets. Usually they also know what they would like to
00:03:26
Speaker
see in a visualization, but other than that, they just leave me completely free. Most of the time, they leave me completely free to come up with a visualization that supports this goal and this idea they have in mind.
Iterative Process in Data Visualization
00:03:43
Speaker
And then I start experimenting, and usually this means that I try to visualize the data set as quickly as possible, because I need to get a sense of how the data set works visually: does it have lots of overlap, what's the spread of data points, things like that. And it usually
00:04:01
Speaker
generates some new ideas, and I can see what works and what doesn't work. So basically, coming up with the end result is actually creating a lot of visualizations, each an improvement on the previous one. I constantly look at what I have in front of me on the screen and then see what can be improved, what doesn't work, and then build on that. And then finally I end up with the print version, and then
00:04:30
Speaker
the next step is obviously the interactive version, and the process is actually the same. So this does mean that for most clients, this works, but it does require some kind of trust between a client and me in this case, because they are paying me to do something for which they don't know what the end result would look like. So that can be kind of tricky, but I think it's the right way to do it because you really have to find out what works best for a particular dataset.
00:05:00
Speaker
It's part of the process to figure this out.
00:05:03
Speaker
Right. So when you are moving from the static version to the interactive version, what are some of the decisions that you make in terms of how you guide a reader through the visualization? So there's obviously the user experience and the things that people can do that allows them more of the control than when they have the static version, when they have to trace it with their finger, they have to look. So
Static to Interactive Visualization Decisions
00:05:28
Speaker
what are some of the decisions you make with that sort of annotation layer when you move from a static
00:05:32
Speaker
visualization to the interactive visualization? Well, the thing is that with a static visualization, you simply have to put all the information that you want to put in into one single image. And this works. But you usually have to make some decisions: well, we have to exclude this because otherwise it becomes too complex, or well,
00:05:59
Speaker
this doesn't work because it requires a different view or something like that. And using interactivity, you can include different views of the same data set. You can remove or reduce the complexity by allowing users to filter data. You can zoom in or highlight certain parts of a visualization that may be interesting. And those are all kinds of things that you cannot do when you just have a static visualization because then everything has to be clear all at once.
00:06:29
Speaker
And so basically, when I discuss this with a client, or Scientific American in this case, then we usually come to the conclusion together, based on the static visualization: okay, this might be a good way to add the interactivity because this really adds something or
00:06:46
Speaker
I did one with exoplanets, for instance, and there were two views. One was two hemispheres, and that was also in the print version. But for the interactive version, you could also switch to a view where you saw the planets based on the distance from the sun. And so that's another dimension which we were not able to include in the print version.
00:07:05
Speaker
But we were able to include it in the interactive version. Interesting, interesting. Very cool. So let's talk about some of the recent projects you've been working on and some upcoming work.
Google Project: Inauguration Speeches
00:07:15
Speaker
So I know you've done some really cool work with Google and I know they're working with Alberto Cairo and Simon Rogers and the folks from Accurat. So can you talk a little bit about the work you've been doing with them and what you've been working on and how it is working with teams of folks all over the place?
00:07:33
Speaker
Well, in my case, my direct contact was Alberto Cairo and he was my sparring partner, so to say. Other than that, it's actually really great working for Google because basically what they say is, well, if you have a great idea, well, you can do it as long as it includes search data. I actually came up with my own
00:08:00
Speaker
idea. I collect all kinds of ideas; I have a very long list of my to-do projects. And this came from this list. And it was actually based on an idea I had for, well, the queen in the Netherlands, or the king, right, we have a king.
00:08:18
Speaker
He also has a speech every year about the state of affairs in the country, things like that. And I was just interested in how does it evolve over time, because you can find those speeches from 100 years, I think. And it's every year. So what do they talk about? How does it change over time? And after discussing this with Simon and Alberto, we came to the conclusion that this could also work for, well, first we had historical speeches, but then we narrowed it down to inauguration speeches.
00:08:48
Speaker
So that's what I focused on. But it was also a little bit tricky because an inauguration speech is really about all kinds of topics, from the economy to the military to foreign affairs, everything. So it was kind of difficult to really
00:09:04
Speaker
get a sense of how does it evolve over time. And also, I had to work with the Google Search API. And there were some things I was really interested in, like combinations of words that, well, when I put it in the API, it didn't get any results. So I really had to use the entities, the main keywords of a speech.
00:09:28
Speaker
Yeah, in the end, I think it worked out really nicely. So yeah, I was able to organize them by theme. And then you could see, for the different presidents, what they were talking about. And you can see some very interesting insights. So for instance, Ronald Reagan, he really uses many more words related to the Cold War, like nuclear weapons and things like that, which don't occur with the other presidents. So yeah, in the end, I really liked the end result.
00:09:55
Speaker
Can you talk a little bit about the challenges with analyzing and visualizing text? I think this is a common question a lot of people have. Either taking text and trying to make it quantitative or visualizing qualitative data.
Approach to Text Visualization
00:10:08
Speaker
I think a lot of people sort of like probably gravitate towards word clouds right away and like, you know, that's where they start. But what is your approach when you are analyzing and then trying to visualize text?
00:10:17
Speaker
Well, it is definitely a challenge because the thing is that when text is your data and you want to understand it directly, well, words are things that you have to read. So it also takes up a lot of space. So you have to try to come up with something that is a measurement of these words. So word length, or the number of words in a sentence, or something like that, or sentiment,
00:10:42
Speaker
or something like that. I think one of the things that you can do is have different information layers. I also did it in the Google project. So the overview allows you to see the speech, the total speech, and then you have, it's actually a column of rectangles, and each rectangle is a sentence, and the height of the rectangle is based on the length of the sentence, or the number of words in a sentence.
00:11:08
Speaker
But if you hover over it, you can actually read it. So that's actually two levels of information. The first one is the high level. So how long is it? How long are the words? And then if you want to see more details, then you can hover over with your mouse and then you can read the actual text. And I think this is in general
00:11:27
Speaker
one way, at least, that you can approach text-based visualization: come up with some kind of metric or an abstraction of the text so that you can summarize it and give an overview, and if you want to see more details, then you can zoom in or hover over and see the details.
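As a rough illustration of the overview-plus-detail idea described here, the sketch below turns a speech into one rectangle per sentence, with the height proportional to the number of words and the sentence text kept for a hover tooltip. It is only a sketch under stated assumptions: the function name `sentence_rectangles`, the `px_per_word` scale, and the naive punctuation-based sentence splitting are all hypothetical and are not taken from the actual Google project.

```python
import re


def sentence_rectangles(speech, px_per_word=2.0):
    """Describe one stacked rectangle per sentence (hypothetical helper).

    Height is proportional to the number of words in the sentence; the
    sentence text is kept so a viewer could show it on hover.
    """
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", speech) if s.strip()]
    rects = []
    y = 0.0  # running top edge of the next rectangle in the column
    for sentence in sentences:
        n_words = len(sentence.split())
        height = n_words * px_per_word
        rects.append({"y": y, "height": height, "words": n_words, "text": sentence})
        y += height
    return rects


# Made-up sample text, for illustration only.
demo = "This is a short sentence. This one is quite a bit longer than the first! Done?"
for rect in sentence_rectangles(demo):
    print(rect["y"], rect["height"], rect["words"])
```

The rectangles carry the high-level metric (sentence length), while the `text` field is the detail layer that a hover interaction would reveal.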
00:11:42
Speaker
Right. I think the qualitative data visualization is still a big challenge for lots of people. And with text, obviously, you have semantic issues as well. And it seems like you tried to parse all that out and dive into that. So that's an interesting challenge. And I'll,
Creating Flavor Maps for a Cookbook
00:12:00
Speaker
of course, link to that project on the show notes so people can take a look.
00:12:03
Speaker
I want to talk about maybe one or two projects that you have coming up. You have an interesting one on more flavor maps, right? Right. Yeah. I think it's still an untitled book, but it's a chef who's quite a successful chef. And he also teaches at a school, and he also wrote other books, and he's right now working on a book about flavor maps, which is supposed to be used by people actually in the kitchen.
00:12:33
Speaker
And so some of the publishers actually used my visualization in the kitchen as a test case, which luckily was successful. So that was nice. But yeah, the main idea is that different foods go well together or don't go well together. And this book is all about those connections between foods. And so what I've done is I've created 60 visualizations. Basically, they're all the same, each time with one central food.
00:13:01
Speaker
And for each central food, you can see how well it goes together with secondary foods, and they're categorized and things like that. And so, yeah, that's very interesting. And I've done, yeah, you mentioned it, a similar project for Scientific American, and the approach is actually quite interesting because the Scientific American project was
00:13:20
Speaker
really only based on chemical structure. So the chemical compounds that different foods share. But in this project, he's a chef. So he also knows from experience and his knowledge.
00:13:34
Speaker
things that go well together and how you should better categorize groups and foods and things like that. So it's really interesting to see that. Well, he sometimes comes back and says, no, you should put this over there because that fits much better. And so it's really interesting to see this process and the difference.
00:13:52
Speaker
Do you view a visualization that someone is going to use for, say, cooking differently than a visualization that someone will use for learning or information? So the Scientific American piece was the network of flavors, but it was for people to sort of
00:14:09
Speaker
view and look at and understand how these things match or don't match up. Whereas it sounds like this book is more for actual use in the kitchen. When you approach these two visualizations, do you think of them differently? Is it that the audience is going to use them differently?
00:14:26
Speaker
It does, yeah. For the Scientific American project, I did receive some emails from people who were also going to try out new recipes based on this visualization. I think because the Scientific American one is really an overview of all the foods all together and you can explore the connections between the foods.
00:14:47
Speaker
It's really more for understanding how these connections work and things like that. But this one is really focused on one central ingredient for each visualization. So in that sense,
00:14:58
Speaker
I would say that this one is really more for actual use in the kitchen. So I'm working with potatoes right now. What goes well with potatoes? So it really answers one very clear question. Yeah, your task of cooking this meal, right? Right. Very interesting. Well, wait, let's talk about one more project before I quiz you on stories and what that means. You were telling me before we started about a project you're doing for an architecture museum in the Netherlands. You want to talk about that a little bit?
Visualizing Dutch Architecture Museum Archives
00:15:26
Speaker
Sure. Yeah, it's actually one of the greatest. I actually have quite a few really nice projects at the moment, but this one is really nice, I think. Yeah, I haven't even told you everything yet. But this one is really nice because personally, I really like projects where a client kind of sets a higher-level goal, and not a very specific one, so that there's room for playing with the data.
00:15:53
Speaker
So this architecture museum in the Netherlands, they, well, they claim to have the largest architecture archive in the world. And their archive contains archives. They have different archives from architects or architecture institutes or whatever. So they have this collection of archives and
00:16:15
Speaker
they just want to know what does this archive look like. So the people who work with this archive, when they acquire a new archive, all they have to do is fill out a lot of big forms to enter metadata and to structure it and to put it into their system. And that's it. And on the website, you can search it and you have to be very specific on what you're searching for. But
00:16:39
Speaker
Yeah, one of the values, of course, of data visualization is to get an overview and see some patterns and get some understanding of what we are dealing with. So what I'm doing right now is looking at all these archives.
00:16:53
Speaker
I'm having two approaches. The first one is the structure of the archives because every archive is organized according to a tree structure, but some are nine levels deep, some are one level deep, some have 5,000 elements, some have three. So there's really a big variety in the structure of the archives themselves. And the other thing is the topics. What are they about?
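To make the structural comparison concrete, here is a minimal sketch of how the depth and the number of elements of each archive tree could be measured. It assumes each archive is available as a nested dictionary with a `title` and an optional `children` list; that data shape, the function names, and the sample archives are hypothetical and are not the museum's actual format.

```python
def tree_depth(node):
    """Number of levels in the tree rooted at node (a lone root counts as 1)."""
    children = node.get("children", [])
    return 1 + max((tree_depth(child) for child in children), default=0)


def tree_size(node):
    """Total number of elements (nodes) in the tree, including the root."""
    return 1 + sum(tree_size(child) for child in node.get("children", []))


# Two made-up archives with very different shapes.
flat_archive = {"title": "Archive A", "children": [{"title": "Item 1"}, {"title": "Item 2"}]}
deep_archive = {
    "title": "Archive B",
    "children": [{"title": "Series", "children": [{"title": "Folder", "children": [{"title": "Drawing"}]}]}],
}

for archive in (flat_archive, deep_archive):
    print(archive["title"], tree_depth(archive), tree_size(archive))
```

Plotting depth against size for every archive would be one simple way to see the variety described above, from shallow trees with a few elements to deep ones with thousands.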
00:17:18
Speaker
This is also quite interesting with regard to the effort that I have to do because there's a lot of manual data editing. And so it's not very much structured. Let's just say that most of the metadata I'm currently acquiring myself by extracting words from titles, which every node in the tree structure has. And I'm splitting up words that are a combination of words.
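A minimal sketch of the title-based word extraction described here, assuming the same hypothetical nested-dictionary shape (a `title` plus an optional `children` list) for each archive. The function name and the Dutch sample titles are made up for illustration, and the splitting of compound words is deliberately left out because it would need a vocabulary that is not described in the conversation.

```python
import re
from collections import Counter


def title_words(node):
    """Yield lowercase words from the title of node and of all its descendants."""
    # [^\W\d_]+ matches runs of letters only (including accented letters).
    yield from re.findall(r"[^\W\d_]+", node.get("title", "").lower())
    for child in node.get("children", []):
        yield from title_words(child)


# Made-up sample archive, for illustration only.
sample_archive = {
    "title": "Ontwerp woonhuis",
    "children": [{"title": "Tekeningen woonhuis 1923"}, {"title": "Foto's gevel"}],
}

counts = Counter(title_words(sample_archive))
print(counts.most_common(3))
```

Counting the extracted words per archive gives a first, rough indication of what each archive is about.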
00:17:47
Speaker
I'm almost there, and it's almost a million words that I'm going through. I can see you getting tired as you're talking about this. Yes and no. It's a lot of work, but at the same time, I'm really close to getting to a point where I'm really going to see something nobody has seen before, because they don't have this information. I'm still very motivated because now you're going to see that
00:18:11
Speaker
these archives are going to be about forms and these are going to be about these types of buildings, and that's what you can get from it. And yeah, that's really nice. And I also want it to be
00:18:21
Speaker
a cool-looking visualization, and they're even thinking about preparing a room in the museum where they can show this visualization. So yeah, I'm really motivated. Yeah, that's really cool. So it would be an actual exhibit in the museum. Right, right. That's very cool. Very good. Well, we'll look forward to seeing both of those projects coming up shortly. So before we close up, I want to quiz you on stories because
Storytelling with Data: A Valid Concept?
00:18:46
Speaker
I'm calling this my month of story because I've been doing a lot of reading about telling stories and stories with data. I have my own sort of view, but I don't want to tell you my view. I want to get your thoughts on this idea of telling stories with data. I mean, a lot of your work I would sort of argue is more exploratory pieces, either static or interactive.
00:19:06
Speaker
When you hear or think of telling stories with data, what comes to mind? What do you think about it? And sort of a follow-up question is, do you think that phrase, data stories or telling stories of data, do you think that's overused? Do you think we're actually doing that? I'm just kind of leaving this open, but no.
00:19:25
Speaker
Well, I really like that you wanted to talk about this because I've given it some thought in the past and you mentioned it yesterday and I gave it some more thought. Because before yesterday, actually, my idea was storytelling with data, you cannot really do it. It's actually just a collection of insights and that's it.
00:19:47
Speaker
I gave it some more thought since yesterday. So maybe my thoughts are different now. There are two things to storytelling. So the first one is story. And you often hear people say that we have to find a story in the data. But I don't think that you can find a story in the data. That's simply not possible because storytelling, well, obviously it comes from literary storytelling and it's from reading a book and in a book
00:20:14
Speaker
So the writer explicitly wrote down a story. So that story is there and you can read it over and over and it's the same story. And that's really a story. What you're doing with finding a story in data is actually you're trying to find insights. And if you have those insights, you try to find some kind of connection between these insights.
00:20:38
Speaker
And I think when you have that, that could be a story. Now, for literary stories, what connects them, what you have there is basically a collection of events connected by time. But with data, it can be a collection of insights connected by, well, whatever. And in that sense, I think you can have a story with data. Now, the other part of storytelling is telling. And that's also a very important part, I think, because
00:21:07
Speaker
without telling, it's not really storytelling, you're still just showing a collection of insights. And when you're telling a story, you are really thinking about how to structure these insights, how do I communicate those insights, maybe sequentially, maybe in parallel, maybe in other ways, but you are structuring these insights in such a way that it really becomes some kind of, well,
00:21:36
Speaker
a story or whatever. So now that I've given it some more thought, I think you do have storytelling with data. But yeah, it is kind of the characteristics of literary storytelling applied to data. Oh, and one other thing is that in literary stories, you clearly have main characters, and they go through some adventure and things like that.
00:22:03
Speaker
And I thought, well, in data, you don't really have characters. But if you take, for instance, well, let's say the gross national product of a country, and you look at it over time, well, then your main character is the gross national product. And then it goes through, well, in this case, time is the connector. Yeah. So I think you also do have characters. So it's kind of the topic or the thing that you're trying to visualize. Yeah, I think most of that is right on. I think
00:22:31
Speaker
the other thing that I've been sort of playing with in my head is whether, I think the gross national product is a good example because that is sort of an aggregate statistic. So it's sort of hard, if you wanted to couch those numbers in a story, in a traditional literary story, it'd be hard to do because then the protagonist, I guess, would be like the country and it's harder to connect with that. And so in that case, I wonder whether the protagonist becomes the creator of the visualization.
00:23:01
Speaker
And in some level, I wonder, as creators of visualizations, we're always a protagonist, because as you said, you're looking at data, you're pulling out insights, and then you're stringing those insights together. And so is it always the case that we as the creators are the protagonists, for better or for worse, because we're adding our own perspectives and biases? Yeah, well, at least it's not that the data itself actually doesn't
00:23:29
Speaker
It's not a story. It's you who decide what is the story. And so it's your interpretation of a collection of insights, which you also determined that that happens to be an insight, which is also well. Yeah. What's really interesting about it in some ways, just to sort of like, I guess, step back is, you know, we download or capture some survey data.
00:23:53
Speaker
And people have answered the survey. And then we sort of forget, I mean, at least in what I do, I just forget about that. Those people, like, really exist, right? Those people have their own stories, but we sort of just forget about it. And we look at these people and they do have their stories. But then we, as you mentioned earlier, you try to like get this top view. So you are kind of summarizing it. But I mean, I'm trying to ultimately make this argument that like, I think we probably agree, which is unfortunate because I'm agreeing with it. But you know,
00:24:21
Speaker
those are not stories when we're stringing those together; it requires something else that, I would hypothesize, most people who are sort of data analysts are not comfortable with or familiar with, for lots of different reasons that we don't need to talk about. But I guess the other question I would ask is, do you think then, given the three sort of pieces that you talked about, that it has to be a story and you have to tell it, do you think that the word story is then overused in the field?
00:24:50
Speaker
And that we should be careful with it? Or it's okay, because everyone's like, I know what a story is. It's okay. Don't, you know, don't get all worried about it. I don't know. But well, at least I think the story with data is a little bit different from a literary story. So if it gives you some kind of expectation, okay, we do storytelling. So you automatically think of literary stories, but that's not really what it is. It's
00:25:17
Speaker
concepts from literary storytelling applied to data. And the visualization itself is also not the story. You are telling the story and you're using the visualization to support your story with images or something like that. So the story is actually non-existent. It's only that you chose it and it's supported by visuals, and so it's an implicit story, actually.
00:25:44
Speaker
Yeah, that's interesting. I think there's a lot to be said for the type of visualization, how that relates to story, whether it be a literary definition or a different definition where you have animation versus static versus interactivity and how those always sort of relate to story. Let me sort of close up with this. Well, do you think about
00:26:07
Speaker
story and whatever, however you want to define story. Do you think about story when you are creating static versus interactive visualizations and allowing people to take their own path through the visualization? Yeah, but it's, I must say that I think in general that I don't really think of selling it as a story or something, but it's more
00:26:35
Speaker
Yeah, I think in general, I would say that I see a lot of trade-offs on many different aspects. So how effective should it be? Or should I compromise some effectiveness for the sake of aesthetics or something? Should I make it interactive or not? How much interactivity? Should I explain this extensively or just make a small label? And so, yeah, for me, it's all
00:27:03
Speaker
a combination of all kinds of trade-offs and decisions of how to find a balance between all kinds of things, actually. For me, it's not really, I have this set of insights and we're going to tell this story like this. It's really, we have this set of insights, how are we going to guide the user through these insights that we, well, explain it or communicate it in such a way that he understands it or likes it or whatever.
00:27:32
Speaker
So I think that's, in my mind, that's more the way that I approach projects. In your mind that's less of story and more of navigation or narration or a path through the data.
00:27:46
Speaker
Right. I think in the end, you can come to the conclusion that you built a storytelling visualization, but for me in the process of creating it, it was more thinking about how to navigate through it, how to annotate it and things like that. Right. Very good. Wow. Okay. So this is just the beginning. So next time we, next time we sit down with a big beer and hash this out some more. Jan Willem, you've got some very cool projects that you've obviously already done and are coming out. So we'll look forward to that. Thanks so much for coming on the show.
00:28:16
Speaker
Yeah, thank you too. And thanks to everyone for tuning into this week's episode. The month of story continues for another couple of weeks, so be sure to tune back in. So until next time, this has been the PolicyViz Podcast. Thanks so much for listening.