
Episode #44: Lane Harrison

The PolicyViz Podcast

Recorded along the lovely Boston Harbor (enjoy the outdoor sounds), in this week’s episode of The PolicyViz Podcast, I chat with Lane Harrison, Assistant Professor in the Computer Science Department at Worcester Polytechnic Institute. We not only talk about Lane’s interesting...


Transcript

Introduction and Sponsorship

00:00:00
Speaker
This episode of the PolicyViz podcast is brought to you by Juice Analytics. Juice is the company behind Juicebox, a new kind of platform for visualizing data. Juicebox is a platform designed to deliver easy to read interactive data applications and dashboards. Juicebox turns your valuable analyses into a story for everyday decision makers. For more information on Juicebox or to schedule a demo, visit juiceanalytics.com.

Guest Introduction: Lane Harrison

00:00:37
Speaker
Welcome back to the PolicyViz podcast. I'm your host, Jon Schwabish. I'm here this week with Lane Harrison, Assistant Professor at Worcester Polytechnic Institute, WPI, for fans of the school.

Lane's Journey into Data Visualization

00:00:49
Speaker
Lane, thanks for coming on the show. Thanks, Jon. I'm glad to have you here. Want to talk a little bit about what you...
00:00:54
Speaker
do for a living, which is research and data visualization. I think a lot of people are probably familiar, obviously, with reading and looking at data visualizations, and many people are familiar with creating visualizations, but then there's this whole other slice of the pie on doing research about data visualizations. What are the best techniques? What are the best tools?
00:01:12
Speaker
So I'm curious if you can sort of start, maybe talk a little bit about yourself, a little background, and then maybe tell folks what the field is like. What does it mean to do research on data visualizations? Sure thing. I'm Lane, Lane Harrison. Brand new assistant professor at WPI in the Department of Computer Science. I sort of fell into data visualization by accident. We can actually thank Robert Kosara for some of this, back at UNC Charlotte about 10 years ago.
00:01:38
Speaker
There was a big visualization group and one of the things they did there is a visualization in the world symposium. I remember being a freshman, a brand new freshman in college and going to this symposium because it was extra credit for a computer science course and it turned out I found that you could do graphics with a purpose.
00:01:57
Speaker
So it was the talks I saw there, from people at Utah and all kinds of places, that were really fascinating to me. And from then on, I was kind of hooked on data visualization and data visualization research. So after UNC Charlotte, I came up to Tufts University for a postdoc. I worked in a visual analytics lab with Remco Chang. And after a year or two of that,
00:02:18
Speaker
Actually, it was around this time last year that I accepted a job at WPI, moved in over there, started recruiting students, and now we have a tiny lab that we can start doing more research with.

Evolution of Data Visualization Research

00:02:29
Speaker
So, can you give us a sense of the field and its progression over the last 20 to 30 years? Sure. So, this research
00:02:38
Speaker
as an actual field, as an actual conference, started around 1990. It's either 1990, '89, or '91, I forget. It kind of grew out of the graphics community, and you can see that influence in early data visualization techniques. You ended up having a lot of 3D charts in early data visualization papers, which, it turns out, for abstract data is not a great way to look at your data. It's not the best way, okay.
00:03:03
Speaker
We all have things we're embarrassed about in our younger days. The field has grown a lot since then. So at the VIS conference, you'll see that's where techniques like the treemap and parallel coordinates plots were introduced in the early 90s. They show up a lot in early data visualization research.
00:03:22
Speaker
So years and years of techniques. So as data got longer, so we had more and more rows, larger and larger databases, more complex with more and more dimensions, we needed techniques that could handle that. A lot of early data visualization research kind of centered on this technique-driven research. But what we found in more recent years is that, with a plethora of techniques for any given dataset, you have so many different techniques that you could throw at it. How can we decide which one is best for a given user, for a given audience, for a given task?
00:03:52
Speaker
Some more recent research in data visualization has kind of turned its focus away from the techniques and away from the data and back to the user.

Human Perception in Data Visualization

00:04:00
Speaker
I find that really, really exciting. Honestly, the human component of data visualization is the thing that I like to research the most. It's gratifying to me to figure out how humans work in the context of these visualizations.
00:04:12
Speaker
So when it comes to the human experience, it's about how do we perceive a 3D graph versus a 2D graph? How do we work with animation and interactivity? So let's just take a made up example. Sure. Although I'm sure it's not made up. I'm sure someone's done this. But let's say how we perceive quantities from a bar chart. So you want to, as a researcher, you want to know whether people accurately discern the quantities from some bar chart. So how do you actually physically do that test? Do that research? Sure.
00:04:38
Speaker
Well, first we would define what it means to accurately discern quantities from a bar chart. You could read the values off of a bar chart. You could compare two bars in a bar chart. That's actually a classic task, taking bar A and a bar B and telling me what percentage is A of B. That's actually one of the first studies where we had evidence of bar charts being better than pie charts. You could characterize the distribution. You could tell me the average. There's so many different things that people do with a bar chart.
00:05:05
Speaker
So to study that sort of thing, we would draw on some research from statistics and research from psychology. So psychologists have this really great understanding of vision and how we perceive different shapes and how accurately and precisely we perceive different shapes.
00:05:22
Speaker
So things like our perception of line length is very, very accurate, whereas our perception of area, our perception of angle, is less accurate. So that's sort of a reason underlying why we find that bar charts are better than pie charts. But to actually run that study, a classic one, Cleveland and McGill 1984, they would randomly generate a bar chart of five bars and ask people again and again, they'll randomly select two of those bars, and what percentage is bar A of bar B?
00:05:50
Speaker
You ask that again and again, and you can actually build a nice model of error for different charts. So in that way, you can test the bar chart against the pie chart against the stacked bar chart, or whatever other variation you want to come up with. Now, for a lot of the research
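For listeners who want to see the shape of that kind of experiment, here is a minimal sketch in Python. It is my own illustration, not Cleveland and McGill's code: the bar values, the simulated judgment, and the helper name run_trial are all made up, and the log-style error score only follows the general form of the error metric they reported.

```python
import random
import math

def run_trial():
    """One hypothetical trial: five random bars, two marked, judge the ratio."""
    bars = [random.uniform(10, 100) for _ in range(5)]  # random bar heights
    a, b = random.sample(range(5), 2)                    # pick two bars to compare
    true_pct = 100.0 * min(bars[a], bars[b]) / max(bars[a], bars[b])

    # In a real study the participant views the chart and types an estimate;
    # here we just simulate a noisy judgment so the sketch runs end to end.
    judged_pct = true_pct + random.gauss(0, 5)

    # A log absolute error (in the spirit of Cleveland & McGill's metric)
    # dampens the influence of occasional huge misses.
    return math.log2(abs(judged_pct - true_pct) + 1 / 8)

# Repeat many trials per chart type (bar, pie, stacked bar, ...) and compare
# the resulting error distributions across chart types.
errors = [run_trial() for _ in range(1000)]
print(sum(errors) / len(errors))
```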
00:06:05
Speaker
that does that sort of thing. And we're going to turn to your specific research in a moment, because I know you are doing more sort of crowdsourcing and using Mechanical Turk.

Large-scale Studies Using Online Platforms

00:06:13
Speaker
But for sort of traditional forms of that sort of study, at least the studies that I have read, not a lot, but at least as I've read, the sample sizes seem to be pretty small. So how can I take a study that's 10 people or 12 people and say, oh yeah, I can apply this to the world at large?
00:06:31
Speaker
Yeah, that's a problem. I mean, you can apply an average to a sample size of 10 and make things look like they're stable, especially if you're reporting things like standard error. So how do we combat that? How do we look at that? One of the best things I've seen people do lately is publish their data. But taking a step back, the first thing to do is to change who you're getting your data from.
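As a rough illustration of that point (my own sketch, not anything from a published study), compare the standard deviation of a hypothetical 10-person sample with its standard error; because the standard error divides by the square root of n, the error bars look much tighter than the underlying spread of responses.

```python
import random
import statistics

random.seed(0)
sample = [random.gauss(50, 15) for _ in range(10)]  # a made-up 10-person study

sd = statistics.stdev(sample)   # spread of the individual responses
se = sd / len(sample) ** 0.5    # standard error of the mean

print(f"mean={statistics.mean(sample):.1f}  SD={sd:.1f}  SE={se:.1f}")
# Error bars drawn from the SE are about sqrt(10) ~ 3x tighter than the SD,
# which is one way a 10-person result can be made to look very stable.
```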
00:06:52
Speaker
We live in the internet age and it's very easy these days to put studies online. You can have volunteers take them. I've seen a lot of people run studies on Twitter and I will happily take studies if they come across my Twitter timeline. But another increasingly popular way is through crowdsourcing platforms like Amazon's Mechanical Turk.
00:07:12
Speaker
Right. So Amazon's Mechanical Turk lets you set up a task and pay people just a few cents to a dollar. It helps to be ethical because you get to set your own prices. So a few cents to a dollar for these people to take your experiment, to take your study.
00:07:28
Speaker
And in that way, I've done studies now with thousands of participants and you can get responses in, you know, minutes or even hours from the time you launch your study. You're able to quickly answer questions and refine your experiment methodology. So it's a really great platform for doing that. So obviously this is expanding what we can do with data visualization research.

Emotions and Data Perception

00:07:50
Speaker
So can you describe some of your more recent work you've done, this really interesting study on perception, you've done some other work, starting some work on virtual reality we were talking about earlier. But can you talk about some of the work you've done where you're using Mechanical Turk or using some of this crowdsourcing and what you're finding? Yeah, absolutely. I think that's one of the most fun types of study to run is where you can sort of get instant gratification.
00:08:10
Speaker
Yeah. So we started running these studies kind of to test our assumptions in data visualization. So there are a lot of things that we've, you know, hinted at or thought about, you know, from an intuitive sense for a long time, like, you know, maybe emotion, you know, does that play a role in the way that we interpret charts? And it turns out, you know, with some thinking you can run a study that tests the impact of emotion on data visualization online in a very scalable way and figure out, you know, if I'm sad, does that mean that I perceive a chart differently?
00:08:40
Speaker
So we had a paper in 2013 that tested this. We primed people with stories from the New York Times. It took a long time to figure out stories that were the appropriate length and that sort of thing. We had to test them. One was very sad, about hospice, and one was very happy. We tried to find something neutral, but it turns out if you give people Stephen Hawking, which should be neutral, people get really angry at you. So that didn't work. But what we found was very interesting is that
00:09:09
Speaker
The perception of these basic charts, like bar charts and pie charts, was indeed impacted by emotion. People who were negatively primed performed significantly worse than people who were positively primed, so people who were made to be happy. What's more is that the reaction times didn't change. No matter what we were testing, the reaction times were the same, but error was very different.
00:09:33
Speaker
So, as to what's going on underneath that, we need to turn to cognitive psychologists, and we did that. A colleague at Northwestern, Steve Franconeri, came up with some possible explanations for that. If you're negatively primed, the perceptual spotlight of attention is a little bit more narrow, and that can manifest in how we show charts to people.

Impact of Emotions on Critical Data Interpretation

00:09:54
Speaker
And where it becomes increasingly important is we're starting to use data visualizations for more than just news stories. People are using data visualizations to not only decide who to vote for, but maybe decide what treatment to get for cancer.
00:10:10
Speaker
And when you're presenting this sort of data to people, you need to be able to quantify the impact of things like the emotional state of your audience. So that was an interesting study a few years ago. And I can imagine, not just on readers of news articles, right? It's doctors who are dealing with life and death situations.
00:10:29
Speaker
So that's really interesting on the effect of emotion on visualization. So now what about research you've done on animation or interactivity? And we've talked about sort of your classic bar charts, pie charts, that sort of thing. But what about sort of the new way of data visualization, right? Interactivity, animation, those are the new ways that we're thinking about working with data. That's true. So we're starting to think about interaction and what it means,
00:10:56
Speaker
you know, which of a person's perceptual processes are in play with interaction, and how we can exploit that and help people to interact more deeply with data visualizations. I don't want to say too much since this is still under review at the InfoVis conference.
00:11:11
Speaker
But suffice to say that there's some very interesting things you could do that, in hindsight, kind of make sense. And so other studies I've seen recently with animation, something that people do, they transition between charts. And there's a huge design space that's unexplored there. One of the early techniques was just to stagger the animations. And there's some recent research, actually, not from me, but from Fanny Chevalier and colleagues at Inria, that came up with ways

Modeling Perception with Weber's Law

00:11:40
Speaker
to do that.
00:11:41
Speaker
One thing I can talk about, a little bit older, is the perception of correlation. So in a lot of traditional studies, we compare bars, we compare slices on a pie chart; correlation is a little bit more complex. We're not talking about cause and effect here, just linear relationships between variables.
00:11:59
Speaker
What can we say about the perception of correlation in data visualization? To illustrate that, I'd like to put you through an experiment. Actually, Jon, I hope you don't mind. I'm ready. Everyone else can follow along. Hold out your imaginary hand. In your imaginary hand, I'm going to place a paper clip. You can feel that. You felt the paper clip drop in your hand. You can feel it as it's still there.
00:12:23
Speaker
So now I'm going to give you a 35-pound weight. So maybe you're at the gym or maybe you have a big, you know, pile of cat food. So now you're holding this really heavy 35-pound weight. So now I've put the paper clip on top of that 35 pounds.
00:12:38
Speaker
If you think about it, if you've done this before, you don't register a difference. You can't tell that it's there. So this is an instantiation of a really old thing from psychology called Weber's Law, more than 200 years old. And a psychologist slash computer scientist, Ron Rensink at the University of British Columbia, in 2010 he found that the perception of correlation in scatter plots can be modeled using Weber's Law.
00:13:04
Speaker
It's really interesting. So Weber's law is something that applies to our perception of weight, our perception of brightness. Psychologists have applied Weber's law across the board. Things like taste. Weber's law kind of manifests there. And what's interesting is now it's applying to correlation. And correlation is something we think of as being a higher level.
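For reference, here is the law in symbols. This is a generic statement of Weber's Law plus a rough paraphrase of the correlation result, not the exact notation from Rensink's paper or the later study: the just-noticeable difference in correlation turns out to be approximately a linear function of the correlation itself, shrinking as r approaches 1.

```latex
% Weber's Law: the just-noticeable difference \Delta I in a stimulus I
% is a constant fraction k of the stimulus.
\frac{\Delta I}{I} = k \quad\Longleftrightarrow\quad \Delta I = k\,I

% Rough paraphrase for correlation in charts: treating the distance from
% perfect correlation as the stimulus, the just-noticeable difference in r
% is approximately linear in r.
\mathrm{JND}(r) \approx k\,(1 - r)
```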
00:13:24
Speaker
So following that study, I had a hypothesis, a hunch. We can represent two variables in many different ways. If you, you know, throw two variables in Excel, you get, you know, several different charts, bar charts, line charts, maybe scatter plots, maybe even parallel coordinates plots if they're getting fancy these days. So the hypothesis was, you know, if the perception of correlation can be modeled using Weber's law in a scatter plot, can we do it in other charts?
00:13:51
Speaker
So we ran a study recently, in 2014, I think it was 2014, that tested this. So very large study, around 1,700 participants on Mechanical Turk. Very fun to run because perception, you know, it sort of works or it doesn't. What we found is that the perception of correlation in all the charts we tested, we tested donut charts, we tested radar charts, we tested line charts, nine different chart types, parallel coordinates plots, all of them could be modeled in the same exact way. So now you can talk
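To sketch what "modeled in the same exact way" could look like in practice, here is a small illustration with made-up numbers (not the study's data or code): fit a line to just-noticeable differences as a function of correlation for each chart type, and the fitted slopes and intercepts become the common ground for comparing chart types.

```python
import numpy as np

# Hypothetical just-noticeable differences (JNDs) in correlation, measured
# at a few base correlations r, for two chart types. All numbers invented.
r = np.array([0.3, 0.5, 0.7, 0.9])
jnd_by_chart = {
    "scatterplot": np.array([0.22, 0.17, 0.11, 0.05]),
    "donut chart": np.array([0.35, 0.28, 0.19, 0.09]),
}

# Under a Weber-like law, JND(r) is roughly linear in r, so a slope and
# intercept summarize each chart type and let you compare them directly.
for chart, jnd in jnd_by_chart.items():
    slope, intercept = np.polyfit(r, jnd, 1)
    print(f"{chart:12s}  JND(r) ~ {slope:+.3f}*r {intercept:+.3f}")
```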
00:14:21
Speaker
about comparing the effectiveness of scatter plots and donut charts using a common ground. Oh, interesting. Now, when you're doing a test on correlations, you move, you know, maybe a step closer to statistics. So do you think people's understandings or perceptions of even correlations are off? Does that become somewhat more problematic because of people's basic understanding of what a correlation is? Like, they don't get that.
00:14:48
Speaker
That's a great observation. So we didn't test for people's prior knowledge of correlation before we ran this study. We just let everyone take it. And it turns out everyone was able to do relatively well. You can perceive, if I show you two scatter plots, and one is very highly correlated and one is not, we sort of have, I guess, a neural network in our heads that can sort of adapt and start to recognize this pattern.
00:15:13
Speaker
What's very interesting, anecdotally, I did run this. We talked about running tests on students. I did run this test on students early on in the testing phase and people who knew correlation very well.
00:15:27
Speaker
can do very well if they take a long time. So while there's sort of this population model that we can build, you can also think of building a model of the perception of correlation for you, Jon, and for me, Lane, and then using that to determine which chart is best for you. So I have no evidence to say that it changes between people, but it could.
00:15:46
Speaker
But it could. And if it does, then that says a lot about what people need to be doing when they're creating things, right? They need to be thinking. We've always said they need to be thinking about their audience, but this suggests even more. And when you tie it back to the work you did with emotion, it suggests even more. So when you think about applying your research to the real world,
00:16:06
Speaker
Are you out there talking to practitioners and saying, hey, look, you know, we have found, you know, this thing on emotion, so if you're writing a story about hospice care, you should be at least considering different kinds of graphs, and this big complicated thing? That's true. There's such a huge gap between them. And we are trying. So in some recent studies before I left Tufts, we were actually working with doctors at a medical center
00:16:30
Speaker
sort of along this line. Very, very interesting findings there. So the whole purpose of that study about the perception of correlation was to come up with a ranking. I just wanted to see which one was best overall and sort of what the overall ranking was. So we titled that paper Ranking Visualizations of Correlation Using Weber's Law.
00:16:50
Speaker
And we made an attempt at the end to sort of produce a chart that kind of showed these rankings in a simplified way. And it had the desired effect. People started to share it and use it in different ways. But one of the other things we did
00:17:03
Speaker
a practitioner might just want to get their hands on the data themselves. So one of the things that we started to do is to actually publish data from our experiments in a GitHub repository right alongside the papers. So when I went up to give the talk at the VIS conference a few years ago, the GitHub repo was already there.
00:17:24
Speaker
and that made it available for people to use. So there is some overhead to doing that. You have to spend extra time tweaking your experiment scripts and your analysis scripts and getting the data in a nice form. It turns out it was worth it. Some of you might know Hadley Wickham; before I had gotten off the stage, Hadley had actually submitted a pull request that fixed some of our nasty R analysis scripts. Thankfully, it didn't change the results.
00:17:48
Speaker
But that's, you know, one of the testaments to putting your data and your materials out there: people will use them, people will improve them. A few months later, we found that Jeff Heer and a student, Matthew Kay, actually took this data and ran a follow-up analysis of it and came up with some better modeling techniques that, you know, added some error bounds to the original ranking that I produced and took care of some of the problems, like outliers in the study data.
00:18:17
Speaker
And that was only possible because we put the data out there; otherwise, they would have had to run this study, this large study, again by themselves.

Data and Code Sharing in Research

00:18:24
Speaker
Right, right. So that's a really interesting point, and I'm curious, do you think there are any limits to that? Should people always be putting their code and their data out there? I mean, you are spending
00:18:35
Speaker
grant money or university money to run these experiments, and you're putting a lot of your time into these things, right? So, and this may just be personal, right, or whatever, like, do you feel personally like you own this thing because you put so much time and effort and money into it, or do you feel like
00:18:50
Speaker
I'm part of a community and I should be putting it out there? I'm not saying either one is right. I'm just curious what you would say. So do you want me personally, or do you want the answer? I can tell you. So let me answer it two ways. Yeah. I've heard it said that data is power, and data is power. If you hold on to that data and the experiment materials and everything, I mean, you could publish several more papers following that.
00:19:12
Speaker
I'm of the opinion that you have to put it out there. People improve on it. I mean, karma is a thing. When Jeff Heer and Matt Kay wrote this paper, they did a beautiful job towards the end talking about the value of open science and putting your data out there. That's something that should always be done, in my opinion.
00:19:29
Speaker
Put your data out there. I mean, how often do we actually go back and, you know, incrementally build on our results anyway, in search of the next new thing? I've already started doing things that are completely different from that, although I would like to revisit it. Yeah, yeah, the replication in science is probably not as...
00:19:49
Speaker
So before we wrap up, I want to ask you, where are you headed now?

Future Vision for Data Visualization

00:19:54
Speaker
You've done this work on Weber's law, you've done this work on emotion and lots of other things, but I'm curious where you see your research going over the next year. I'll give you the overall vision. Any potential students, feel free to buy into this vision and come work with me.
00:20:07
Speaker
So the vision is this. Those 25 years that I talked about earlier, or 30 years now, in this research, we made a lot of techniques. And one thing that we're good at in visualization is mapping from data to visuals. We're very good at that. We can do it very accurately. What we're not good at is connecting the visuals with humans. So all these things, cognitive states like emotion and perceptual limitations, that kind of mess up this connection between the human and the visual.
00:20:37
Speaker
My research goal is to quantify that as best we can, but to not just stop there. I mean, that can produce actionable results that can help practitioners. But what if we could take these models and build them into systems? Build them into, you know, make systems that are aware of perceptual limitations, systems that calibrate to you, systems that are aware of, you know, the impact of emotional state or cognitive state, or even cognitive traits, who you are and how you interact with and perceive data.
00:21:06
Speaker
If we can build those sorts of systems, we can help people in these critical situations, if you're communicating cancer statistics to someone, or if you're a cybersecurity analyst and the room's on fire and you've just had a breach. So for these critical situations, I love the space of modeling humans and how they interact with and perceive data, and using those models to build meaningful systems.
00:21:30
Speaker
Interesting. Great. Well, thanks for coming on the show, and good luck with that research. It's fascinating, and we'll all be following it. Thanks a lot. Thanks, Jon. And thanks to everyone for tuning in to this week's episode, this week coming to you recorded live along Boston Harbor on a very lovely spring day. But thanks to everybody for tuning in.
00:21:50
Speaker
Feel free to shoot me a line or a note about things that you'd like to hear about on the show, on the website at policyviz.com, or of course on Twitter. Feel free to reach out. So until next time, this has been the PolicyViz Podcast. Thanks so much for listening.
00:22:16
Speaker
Again, thanks to our sponsors, Juice Analytics. For 10 years, they've been helping clients like Aetna, the Virginia Chamber of Commerce, Notre Dame University, and U.S. News & World Report create beautiful, easy to understand visualizations. Be sure to learn more about Juicebox, a new kind of platform for visualizing data at juiceanalytics.com. Also, check out their book, Data Fluency, found on Amazon.