
Episode #117: Steve Haroz

The PolicyViz Podcast

Steve Haroz is a postdoc researcher at Pierre and Marie Curie University in Paris. His research explores how the brain perceives and understands visually displayed information like charts and infographics. We talk about data visualization research, uncertainty, connected scatterplots, and...

The post Episode #117: Steve Haroz appeared first on PolicyViz.

Transcript

Introduction to Data Visualization Research

00:00:11
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. I hope you're well. This week on the show, we're going to talk about data visualization research, and we're going to talk a little bit about how to communicate uncertainty through visualization. And to help me with these various tasks, I'm very pleased to introduce Steve Haroz, who is a postdoc researcher at the Sorbonne University in France. Steve, how are you? Welcome to the show. I'm great, thanks for having me.

How Does the Brain Perceive Data?

00:00:39
Speaker
Really good to chat with you, because I think we've only chatted in person once, maybe. Yeah, that sounds about right. Except we chat all the time on Twitter. But it's always fun to meet the Twitter friends in real life and have an actual conversation. Yes, exactly. I mean, we've spoken many, many times, but we've never met.
00:00:56
Speaker
Right. I tell lots of people that what I really want to do is have a roll of parchment and a quill and just bring that along with me, and anytime I meet someone from Twitter, I'm just going to scratch it out, you know. So I want to talk about some of the research you do and the methods behind some of that research. And I also want to talk about communicating uncertainty, because I know you have some thoughts and feelings on how to do so. So why don't we start by having you talk a little bit about yourself, and maybe just jump into some of the research that you're doing.
00:01:23
Speaker
Sure. So my research focuses on how our brain perceives and understands visual information and how it sort of takes that information and uses it to perform some sort of action or make some sort of decision or recall some information later on.
00:01:40
Speaker
And it looks at the perceptual aspects, the memory aspects, the selection of a subset from your visual information, as well as sort of cognitive processing and computation of that visual information.

Data as a Tool to Explore the Brain

00:01:54
Speaker
So yeah, a lot of that as a consequence tends to focus on data visualization, a prime example of visual information. And an interesting aspect there is that data visualization for me is more kind of
00:02:08
Speaker
an arbitrary medium. I'm often called a data visualization researcher, and I consider myself a data visualization researcher. But at the end of the day, the data for me is sort of a means to an end to understand what the brain, what our visual system, is doing, and how to take advantage of that as best as possible. So when you're thinking about conducting research
00:02:32
Speaker
to explain how we perceive information, what are some of the core techniques that you like to think about? I'm not sure a lot of people think carefully about visualization research, so this might be a good primer for people. What are some of the primary techniques that you might use to run some of these tests? Sure. A lot of what I try to look at is what happens, often in fairly small time frames.
00:02:55
Speaker
And then what are the sort of sequences of very small behavioral cognitive processes that our brain is doing in order to form some more sort of complex decision in order to be able to select some information or recall some information or understand information.

Isotypes and Cognitive Limits

00:03:12
Speaker
And in order to do that, oftentimes what we'll do is we'll present some information very quickly. Imagine you're reading through a paragraph of text and you see a figure in the corner of your eye; you take a quick glance at that figure and then come back to the text. What information can you grasp? What do you miss? What are the kinds of things where you're going to have to really carefully inspect the scene, otherwise you won't catch anything? And
00:03:40
Speaker
to do that, usually what we'll do is we'll either just very briefly present something on the screen and ask, what did you see or what did you not see? Or we will show something on the screen and specify, sometimes very vaguely and sometimes very precisely, what it is that you have to do, and see how long it takes people to accomplish it.
00:03:58
Speaker
And because we're trying to understand the how and why, not what the best improvement is for a specific application, we will very carefully and precisely manipulate the display, which may look nothing like a realistic data visualization, or even a natural scene that you might see in the world. But it lets us pull things apart: hey, when we add this little thing, we get this big change in behavior.
00:04:26
Speaker
When we add this other thing, we get no change in behavior. Or we can measure the behavior in terms of accuracy, in terms of how quickly people respond. Usually, I try to stick to more objective measures like that, the time and accuracy questions. But you can also look at it for more qualitative questions like it could be preference, it could be which thing does a person select even when there is no right answer.
00:04:55
Speaker
So trying to see sort of how we can change behavior often with very seemingly subtle or very sort of small changes to a display is the goal.
00:05:05
Speaker
Right. So can you give us an example of some of the research that you've done? I know you have an interesting paper on isotypes, and you have a paper that I really liked on connected scatterplots. Can you talk about one of those, or any others that you're working on, that implements the sorts of techniques you're talking about? Yeah, absolutely. So the isotype one was sort of an interesting case, because it started with seeing some isotypes being discussed on blogs, on Robert Kosara's blog and others,
00:05:34
Speaker
and in various data visualization research, as well as some negative discussions of isotype in the context of the quote-unquote chart junk debate, where the idea is that any superfluous aspect that is not the most minimal representation of the data that you have is going to be bad. Now, some of the original authors may or may not have intended it to be so strict when discussing chart junk.
00:06:04
Speaker
But oftentimes it's interpreted that way. So what I also wanted to know is, OK, well, first of all, how much of this is true, right? Is it helpful? Is it not helpful? What are people doing when they're looking at these?

Understanding and Communicating Uncertainty

00:06:16
Speaker
And what started the project was that we, myself and Steve Franconeri and Robert Kosara, saw an example of someone presenting some isotypes. And the first thing that came to mind was that some of these isotypes, which, for those who don't know, are these
00:06:32
Speaker
sort of little stacks of images. So if you can imagine a bar chart, instead of a long bar to represent some value, you would have a stack of little things. It could be, you know, car production, and so you'd have a car icon representing millions of cars produced by a company or by a country, versus, I don't know, plane production, where you'd maybe have a smaller stack of planes because fewer planes are produced.
00:06:59
Speaker
And what I noticed is that sometimes there were very few of these sort of icons in a graph and other times there were these huge stacks of them. And what I was wondering was, I wonder if there's a cognitive or attentional limit to how quickly and readily you can get a grasp of how many of these icons are in the display. And this is sort of based on some past research, some of it my own, much of it based on many, many, many other people.
00:07:29
Speaker
who have shown that if you want to very quickly count something, and there are four or fewer things, you can get that quantity very, very quickly. And as you start going higher, your accuracy starts degrading gradually as a function of the quantity, until you go from what they call this very precise and accurate counting, called subitizing, to a much less precise estimate
00:07:56
Speaker
generally called numerosity estimation. And so it shifts from this exact, precise, immediate understanding of the quantity to an estimate. Does that show itself in terms of the isotypes, in terms of these little icons? And also, again, going back to it, well, people are using images in a graph. Is that helpful? Is that harmful?
00:08:18
Speaker
And in what ways? What is it that people are potentially benefiting from when using the image? And in what ways might the image be hurting people? And likewise with the quantity. So what we do is we present some simple graphs on the screen for about a second or so, and then we take it away. And then we ask, how many, maybe it was
00:08:42
Speaker
numbers of fruit, you know, how many fruit were produced. And we'd ask how many apples were produced and how many bananas were produced, and we'd see how well, after a very brief display and then immediately asking them what they saw, we could measure some of these effects. And in one case we measured it, like I said, very instantly: on the screen for a second, off, and immediately asking the question. And in that case, what we found was that
00:09:09
Speaker
with about four or fewer items, five or fewer items, you do pretty well. You're pretty accurate even with this one second display, which is comparable to reading some text and then getting a glance at a figure. And then if you go back to the text, how well are you remembering the figure? Or likewise, just maybe seeing a brief display of a graph in your periphery. You take a look for a second, you don't think about it, and then what are you getting out of that?
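[Editor's note: the subitizing-versus-estimation pattern described here can be sketched as a toy model. The limit of four and the noise level below are illustrative assumptions, not parameters from the study.]

```python
import random

def simulate_count_estimate(n_items, subitizing_limit=4, weber_fraction=0.15):
    """Toy model of rapid enumeration: quantities at or below the
    subitizing limit are reported exactly; larger quantities get a
    noisy estimate whose error grows with the quantity (Weber's law)."""
    if n_items <= subitizing_limit:
        return n_items  # subitizing: fast and exact
    noise = random.gauss(0, weber_fraction * n_items)
    return max(0, round(n_items + noise))

random.seed(1)
# Small stacks come back exact; large stacks come back as rough estimates.
print([simulate_count_estimate(3) for _ in range(5)])
print([simulate_count_estimate(20) for _ in range(5)])
```

Running it shows the qualitative shift being described: reports of three items never vary, while reports of twenty scatter around the true value.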
00:09:36
Speaker
The other side of it was what happens over a longer period of time. So in a short period of time, okay, small numbers improve performance, which suggests that there is this capacity limit, some sort of resource limit, that's preventing you from being precise with larger numbers.
00:09:53
Speaker
But with the longer term, what we found was something different. We found the number of items, the size of the stack, didn't matter so much. What instead mattered was whether you have these images, these icons, in the figure, as opposed to when we replaced them with just a simple shape like a circle. When you had these icons, you were able to remember it, sometimes ten or so seconds later rather than right afterwards, and your
00:10:19
Speaker
performance improved substantially. So those icons were helping your memory have something to hook onto, whether it was the number of fruit, the number of pets, the number of objects that were manufactured; it just helped your memory hold on. What was interesting was that in both of those cases, we didn't find any harm in having the images there. We didn't
00:10:45
Speaker
see that it hurt anything. And the same was true when we compared the stacks of images with a bar graph, the stack versus the simple bar: the stack never hurt. However, if you started doing things like what some people will do in a data visualization, which is put background images or add images on the side, in those cases we found, just across the board, huge hits in performance, which suggests that basically they're just distracting, which is kind of straightforward.
00:11:15
Speaker
What the research seems to suggest is that icons or iconography that is rooted to the content
00:11:22
Speaker
helps our memory, but imagery or icons that are not rooted to the content in that specific, explicit way, that's more distracting. So a background: you might have a graph on agriculture, and having a picture of a cow in the background doesn't help you, or may actually hurt, but having the icons of cows be the data, that actually helped, because it's tied explicitly. Exactly.
00:11:49
Speaker
And we don't necessarily know if that's because something about it being part of the data is critical, or if it maybe directs your attention in a certain way. That could also be an explanation. But either way, it seems that, as you said, when the data and the imagery are in the same place, rather than pulling attention away from each other, there does seem to be a very big benefit.
00:12:19
Speaker
Yeah, it'll be interesting whether we see lots of graphs with cows and cars in the next few weeks and months. So that research, the isotype research in some ways, those are sort of simple graphs because you can use icons in these

Importance of Statistical Literacy

00:12:34
Speaker
sort of simple places. It's really sort of counting or the length of the
00:12:37
Speaker
little people standing next to each other. You also have this paper, with Steve and Robert as well, I think, on connected scatterplots, which is an even more complicated graph type. I wanted to also get your thoughts on what it takes to convey uncertainty.
00:12:56
Speaker
You know, the isotype paper as an example, those are, I would hypothesize, fairly simple graphs. Absolutely. What is your take on how we, as people who are trying to communicate data, can do a better job communicating uncertainty? Either the uncertainty behind using the data in general, that there's uncertainty about whether these numbers, these estimates, mean anything at all,
00:13:22
Speaker
or uncertainty around the point estimate, that there's some distribution around the estimate that we're showing. Yeah, I think a difficult part of that question is understanding how well people comprehend the notion of a distribution, right? Do people understand that you could say that in general, cats are smaller than dogs?
00:13:44
Speaker
And that if someone happens to report finding a very small dog and a very large cat, that doesn't change the original statement's truth: in general, one is larger than the other. So when we look at news and politics and finance and social policy, in general there seems to be this very common mistake, where people prioritize an anecdote, prioritize a single data point,
00:14:11
Speaker
and fail to consider the distribution, or the whole data in general. So that's one part of this question: if we're going to convey uncertainty, is the person it's being conveyed to capable of understanding it, and is it being conveyed in a way that they would understand? The other part of the question is,
00:14:35
Speaker
How does our visual system, our perceptual system, perceive and understand uncertain information? It actually would be very weird and unusual if we were in a visual environment where there was just one single item that we were looking at. We're usually looking at, whether it's
00:14:56
Speaker
trees, or people in a room, or cars. You know, imagine crossing the street: there are people all over the place, there are cars all over the place. You're selecting some of it, you're ignoring other parts of it. The visual environment is big and complex, and as a consequence, our brain must be finding some way to simplify and represent the information, and
00:15:20
Speaker
maybe it's a distribution, but it's some sort of uncertain representation. So the brain must have some way of doing it. It's not clear if people in general understand it in a way that they can speak and hear about via verbal or written communication. So there's a bit of a chasm there, because at the same time, the data that people want to present is usually going to be uncertain.
00:15:49
Speaker
If you're visualizing something, it's very rare that you have one or two data points or these sort of simple graphs that I was talking about with the isotype where it's just two or three bars in a bar graph and that's it. So as far as what the best way to do it is, right now the answer is kind of unclear.
00:16:07
Speaker
Do you present all of it? Do you just show all of the data, maybe as a histogram, maybe as every single data point? Do you represent it as a distribution in terms of maybe confidence intervals and a mean? In my experience,
00:16:23
Speaker
very much unfortunately, presenting confidence intervals is not necessarily that successful. Let me ask you, why do you think that is? Is that a lack of statistical literacy? Or is it
00:16:39
Speaker
that we, being people, just want to have an answer, right? Like, the unemployment rate is 4%, right? We just want there to be the answer. And even if we understand statistics, maybe the best example for lots of people is election polling. There's always a margin of error around it, and most people ignore it, even though it's really important. So do you think it's a statistical numeracy issue, or is it,
00:17:07
Speaker
I don't know, trying to sift through the weeds of everything? Yeah, it's a question of whether statistical training, maybe earlier on, or maybe more integrated as part of a curriculum,
00:17:23
Speaker
or maybe replacing other facets of middle school or high school education with statistics, can allow us to overcome our various biases, the bias of "I want an answer,"
00:17:40
Speaker
right? And if it's 50.1% plus or minus 10%, okay, that's good, that one wins. In order to be able to understand any sort of statistical information, let's just call it understanding a distribution, in order to get that,
00:17:57
Speaker
at the very least there must be some degree of statistical training. As far as the best way to get that statistical training across and allow people to overcome their biases, I don't know. But we have the end goal, and we have the current situation, which is that they don't, myself included.

The Role of Experience and Education in Data Interpretation

00:18:14
Speaker
As you exactly said, when you see a poll where there's a 1% difference with a pretty big margin of error,
00:18:20
Speaker
you start feeling confident if the person that you want to win is ahead, or you start freaking out if the person that you want to lose is ahead. So the end goal there is definitely to overcome your own bias.
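[Editor's note: for readers who want the arithmetic behind a poll's margin of error, here is a quick sketch. It uses the textbook simple-random-sample formula; real polls also adjust for weighting and design effects.]

```python
import math

def poll_margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion:
    z * sqrt(p(1-p)/n). Ignores weighting and design effects."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# A candidate at 50.1% in a poll of 1,000 respondents:
moe = poll_margin_of_error(0.501, 1000)
print(f"50.1% ± {moe * 100:.1f} points")  # prints "50.1% ± 3.1 points"
```

So a 1-point lead on a sample of 1,000 sits well inside a roughly ±3-point margin, which is exactly the situation where the confidence (or the freaking out) isn't statistically warranted.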
00:18:31
Speaker
I think we kind of agree there. You overcome the jumping to conclusions. And also, can we have people be aware of when they might be making a statistical fallacy? If you look at a single data point and you ignore the entire distribution because of that one data point, that's a fallacy that you're making. That's a mistake. Can people become aware of themselves doing that?
00:18:55
Speaker
And can people avoid doing that? What are the ways that you can go about it? I think integrating statistics into earlier education, and maybe potentially replacing other facets of education. Maybe it's higher-level math that you can replace with statistics, or maybe it's taking something like reading and writing at an early age and having people describe uncertain things rather than things that have a clear conclusion. That might help. I don't want to jump to say the answer, but I think it's definitely worth
00:19:25
Speaker
you know, having folks in education research, having folks in statistical research, sort of look into those questions of what is it that prevents statistical biases in adults or what reduces them?
00:19:39
Speaker
It's interesting in lots of ways, but one story I like to tell people is that when I went to graduate school for economics, I didn't really learn how to be an economist, right? You learn all the theory and all the math and all that stuff, but at least when I was going to school, they didn't teach you how to code, they didn't really teach you how to write, and certainly not about data visualization or anything that we're talking about.
00:20:00
Speaker
But what it does is it sort of attaches to your DNA how to think like an economist. And I wonder whether what needs to happen, in some ways, is to get people to think in this statistically minded way, where we get away from our biases, where we see that poll number, the approval rate is 52% or whatever it is, but there's a big margin of error, where we just
00:20:27
Speaker
come at the world with this different lens, this different view of being a little more skeptical of the data that we're seeing. It's funny, I found sort of the same thing amongst folks who studied vision science in school, is that even at early levels, you might
00:20:43
Speaker
discuss, in your early, maybe undergraduate, years, a lot of experiment design and statistics. But there's some process that you develop over time, of going through experiments and being able to spot the confound before it happens: to detect, hey, wait a minute, even though I'm measuring something very precisely, I'm measuring the wrong thing, or I'm not factoring something out.
00:21:07
Speaker
That comes with experience. And I think what you're asking about there is this ability to think in a way that's both very logical but also aware of the various missteps that you're about to make, that sort of cautiousness that you should have.
00:21:26
Speaker
It's sort of a weird combination of both education and experience. If you get, I don't want to say necessarily a wrong education, but maybe a different education. For example, and I'm sure you've encountered this too, people who have done years of math, all kinds of calculus and differential equations and all of these sorts of things, are just as likely to make a statistical fallacy as someone with barely a high school education.
00:21:54
Speaker
Your point about the straightforward education not being enough, that the experience maybe is the critical step. I don't know how to characterize that experience except as screwing up a lot, frankly, just making all the mistakes.
00:22:08
Speaker
Yeah, yeah. And, you know, one thing I noticed, at least when it comes to data visualization, is a place like the New York Times, where they publish, for example, scatterplots quite regularly. And that's a fairly recent phenomenon. I think part of the reason they are confident in doing so is that their annotation is just really good: they explain how to read the graph. But they've also been doing it now for at least a little bit of time, so they're sort of educating their readers
00:22:38
Speaker
so that, you know, their regular readers are now accustomed to that graph type. And I wonder if there's room to do something similar with educating our readers on how to think about uncertainty when they're looking at graphs specifically. Although I think this conversation is a little broader than just data visualization; it's about
00:22:57
Speaker
all consumption of data and numbers. But at least with data visualization, whether it's, oh, you know, I see these graphs with confidence intervals, and if I see enough of them, I at least get the sense of what they mean. Maybe not in a deep statistical mathematical sense, but I get an idea that there is a bound, there's not just one number, there's a bound to the number that I'm looking at.
00:23:24
Speaker
Yeah, I would say that at least in terms of learning other things, including learning how to read data visualizations, people do tend to take on certain routines, certain established practices. For example, if you're looking at a political map and you see blue and red,
00:23:38
Speaker
you immediately know that that means Democrat and Republican. That's fairly new, I think as recently as 2000, give or take. And it's totally arbitrary, right? Both parties are arguably the red, white, and blue party. So why did the colors become associated that way? Well, it was an arbitrary decision, people formed an established practice, and it stuck.
00:23:59
Speaker
I wonder if there are some straightforward tricks, and likely there are, that people can use with statistical graphics: whenever you see this, you should always think that. One example: maybe if you ever see a graph of anything involving finance or economics or social policy without error bars, immediately be skeptical.
00:24:25
Speaker
Another one that I have is whenever I hear a politician, a federal government politician, use the phrase "millions of dollars," you're talking about a fraction of a fraction of a fraction of a percent of the budget. So there might be a way to train people with certain very simple rules of thumb. If you don't

Using Visual Cues for Better Understanding

00:24:43
Speaker
see this or if you hear this but not that, then be skeptical, look in a little deeper.
00:24:49
Speaker
On the other hand, what if you have everything you need, they're presenting everything to you, but not everything matches? There was an example in the recent election where there were graphs being sent around of
00:25:03
Speaker
I forget if it was primary results or polling results or something, where one candidate's bar was some certain proportion higher than the other candidate's, but the numbers on the labels were completely misleading. It had a bar that was twice as big as the other, but in actuality they were off by like 1% or something. And it hasn't just happened in this election; it's happened plenty of times before. Is there a way to train people to say, hey, make sure all the information matches?
00:25:33
Speaker
right, or make sure that you have more than one data point being shown.
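[Editor's note: the "make sure the information matches" check can be made mechanical: on a zero-baseline bar chart, bar heights should be proportional to the labeled values. This is a hypothetical checker, not from the episode; the function name and the 5% tolerance are made up for illustration.]

```python
def bars_match_labels(values, pixel_heights, tolerance=0.05):
    """Return True if bar heights are proportional to labeled values,
    i.e. every bar has (roughly) the same pixels-per-unit ratio.
    A truncated axis makes the ratios diverge."""
    ratios = [h / v for v, h in zip(values, pixel_heights)]
    return max(ratios) / min(ratios) - 1 <= tolerance

# Labels say the candidates are ~1 point apart, but one bar is twice as tall:
print(bars_match_labels([50.5, 49.5], [300, 150]))  # False: misleading
# Heights proportional to the labeled values:
print(bars_match_labels([50.5, 49.5], [303, 297]))  # True
```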
00:25:37
Speaker
There might be a way, maybe it's through journalism, or maybe it's in other ways, to always say: what if, every time a journalist said this versus this, they always said, and here's the distribution? I don't know if confidence intervals are the perfect answer, but maybe showing a bunch of sample data points, anything along those lines. What if it always was there? What if we just didn't have these very simple
00:26:04
Speaker
just-showing-the-mean graphs, or just showing a single value? Getting people to understand the difference between the median and the mean, I think, would be a huge step forward and would fix a lot of confusion that can happen when reading these graphs. That's something that I think would be a little easier if the distribution is clear. You can see a skew looking at a histogram, whereas if you just get the mean, it's hard to know what's going on underneath.
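[Editor's note: the mean-versus-median point is easy to demonstrate with a few lines of Python and some made-up skewed data.]

```python
import statistics

# Hypothetical skewed data (say, incomes in thousands): one large
# value drags the mean far above the median, which a lone
# "average" headline would hide.
incomes = [32, 35, 38, 41, 44, 47, 52, 60, 75, 400]

print("mean:  ", statistics.mean(incomes))    # 82.4
print("median:", statistics.median(incomes))  # 45.5
```

A histogram of the same list would make the skew visible at a glance, which is the point about showing the distribution rather than a single number.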
00:26:34
Speaker
Yeah, I think you're right, and I think there's a bunch of people out there thinking, no, but that's going to be chart clutter, it's going to be hard, we're going to see all this extra stuff. But maybe it's worth that trade-off. There's actually a really good post that Lisa Charlotte Rost wrote recently on the mean versus the median, and I'll put that in the show notes. And maybe you're right, maybe the first step is just these very simple things, means and medians. Let's get people to understand what a percentile is, and that's the first step.
00:27:02
Speaker
And maybe for 95% of what we're showing people, or that we're seeing, that's enough. And then there's the 1% of the things we produce that takes up most of the effort. But for the most part, maybe we just need to relate these simple statistical concepts.

Conclusion and Call to Action

00:27:19
Speaker
Well, we'll see. It's a fun discussion, and I'm sure we'll continue going on and on. I'm looking forward to seeing what research you come out with this year. I think people should certainly check out the isotype paper and the connected scatterplot paper; both are great. I'm looking forward to seeing what you come up with in 2018. Thanks for coming on the show, Steve. It's been great. Thanks, yeah, it was great chatting with you.
00:27:40
Speaker
Yeah. And thanks to everyone for tuning in to this week's episode. If you have thoughts on how to communicate uncertainty, please do let me know. This is an ongoing discussion, obviously, and one of the more, I wouldn't say contentious, but difficult areas of data visualization: how to get people to understand uncertainty and distributions. So until next time, this has been the PolicyViz Podcast. Thanks so much for listening.