
PolicyViz Podcast Episode #4: Ben Casselman

The PolicyViz Podcast

We’ve seen some big changes in data driven journalism over the past year or so with new media companies like 538, Vox, and the Upshot launching. In this fourth episode of the PolicyViz Podcast, I speak with Ben Casselman, Chief...


Transcript

Introduction to the Episode

00:00:11
Speaker
Welcome to the PolicyViz Podcast. I'm your host, Jon Schwabish.

Innovative Storytelling in New Media

00:00:15
Speaker
As you are probably aware, there's a new sort of media out there, folks like FiveThirtyEight, Vox, the Upshot, all trying to do new types of storytelling with data, data visualization, all sorts of great work.

Meet Ben Casselman, Chief Economics Writer

00:00:29
Speaker
And so I'm really happy to have Ben Casselman, the chief economics writer from FiveThirtyEight, here today. Ben, how are you doing?
00:00:36
Speaker
Doing great. Thanks so much for having me on. Yeah, thanks so much for coming. I really appreciate it.

Ben Casselman's Role at FiveThirtyEight

00:00:41
Speaker
So I thought maybe you could just sort of talk about what your role is over there, and then we can talk about some fun stuff that you guys are working on.

The Evolution of FiveThirtyEight under ESPN

00:00:50
Speaker
If I can figure out my role, I'll let you know. Yeah, I mean, so, you know, FiveThirtyEight was obviously launched by Nate Silver, best known for his election prediction work. He'd done that on his own, then done it at the Times. About a year ago, almost exactly a year ago now,
00:01:06
Speaker
we relaunched, owned by ESPN, with a much broader coverage of kind of all things data journalism.

Journalism Meets Academic Research in Data Storytelling

00:01:16
Speaker
So my particular role is on the economics side. I'd been with the Wall Street Journal kind of covering economics there in a more traditional journalism approach and kind of trying now to figure out how we go about doing that in a more data-driven way, in a way that can appeal to the stats heads who have sort of long been a core part of
00:01:34
Speaker
of Nate Silver's audience, but then also kind of reach a broader audience of people who are kind of interested in understanding evidence, in understanding how we know what we know, but who aren't necessarily, you know, running regressions on the side. Right, right. And so, I mean, you know, you and I have talked a lot, and I've written some things on sort of this
00:01:54
Speaker
I guess merging of sort of like the journalists and the researchers.

Integrating Statistics and Narrative at FiveThirtyEight

00:01:59
Speaker
I mean, and I sort of have complained a few times about, you know, you guys. And not all those things have been friendly, John. Not all, I know, I know, you know, I'm trying to be constructive.
00:02:11
Speaker
What's your thought on that? You guys are really good at the graphics and the reporting and the telling of the stories and then the research-y side, which is the statistics and the regressions. Traditionally, it's been left to academics and full-time researchers. Do you have a thought on those sorts of things? Where's journalism going with that more generally?

Balancing Traditional and Data-Driven Journalism

00:02:36
Speaker
Yeah, so I think this is something that we think a lot about, and I don't think that any of us at FiveThirtyEight, certainly not I, would claim to have gotten this balance right all of the time. You know, I think that we think of ourselves as kind of operating a little bit in a space between some of the traditional journalism, you know, I come out of a very traditional journalism background, I'm not running away from that, I have huge admiration for the work that they do at the Times, at the Journal, every morning, but we sort of think of ourselves as operating somewhere between that space and then
00:03:07
Speaker
the sort of academic space. Right, we are operating on, probably more often than not, a slower timeline than, you know, what you see picking up, flipping over

Communicating Complex Data to a Lay Audience

00:03:18
Speaker
to, you know, whatever website. But we are, you know, on a much, much faster timeline than is realistic in the academic world, or even sort of the think tank world. And I think some of what we do is operating in a space that's not that dissimilar to what Urban might be doing, what you might be doing,
00:03:36
Speaker
but maybe a little bit faster than that, and trying to reach a more general audience. And I think that, you know, we're not trying to limit ourselves to just people who, you know, spend their lives paying attention to this stuff. And what do you view as sort of the big challenge of communicating that sort of work to

Challenges in Explaining Statistical Uncertainty

00:04:01
Speaker
the broader audience you're trying to communicate with? I mean, it's one thing if you write an academic paper and you talk about some regression model and you publish it for your academic peers. It's another thing to say, I'm going to run a regression and publish it on the 538 site for sort of a broader audience. How do you balance the sort of technical data work with the sort of lay audience you're trying to communicate with? Yeah, I mean, so I think we struggle with that. I think that the
00:04:29
Speaker
The sort of driving force is that we need to be communicating clearly to our broad audience. We will certainly try to include the details of the regressions that we run, of the formulas that we use. Wherever possible, in footnotes, we post data and code on GitHub. We want to be very transparent in that kind of way. But that's not realistically a way of communicating to a

Using Visuals to Convey Data Uncertainty

00:04:54
Speaker
broad audience. I think the thing that I certainly spend a lot of time thinking about
00:04:58
Speaker
is how do we communicate to a non-technical audience the degree of uncertainty that we have. I don't think we have a responsibility to have everybody who reads a story come away with a precise understanding of where the confidence intervals are, what the sort of rigorous standard errors might be, but there is a responsibility to explain to people what do we know and how do we know it, what do we think and why do we think it. What are the
00:05:28
Speaker
orders of magnitude, anyway, on those error

Communicating Point Estimates and Model Uncertainty

00:05:31
Speaker
bars? How do we think about the differences between things where there's very, very strong evidence in places where there's less strong evidence? And we try to communicate that in a way that may not look like what you would see in an academic paper or in a technical appendix of a white paper, but it still communicates those ideas.
00:05:53
Speaker
Do you think this sort of lay audience, do you think they understand this uncertainty, the standard errors, the confidence intervals? Not when presented that way, no. I mean, I should qualify that a little bit. When we put together our election models, we'll include a confidence interval on there. Some people are going to understand that, some people aren't. But I think that people have an understanding in that case where it's an explicitly
00:06:22
Speaker
statistical idea or we present it as a probabilistic model, I think that people can understand that. In a lot of stuff though, I think trying to communicate it in just the way of error bars is not the best way of communicating to a general audience, so then you need to come up with some other way of communicating the idea that there's uncertainty here. Yeah, I think you guys do a really nice job of the visualizations that have a lot of uncertainty.
00:06:50
Speaker
you know, correlations that you're showing maybe over time and you sort of have these sort of shaded error bars or shaded regions that sort of stand

Addressing Null Findings in Data Stories

00:06:57
Speaker
in the back. But my sense is that this idea of uncertainty around sort of a point estimate or a result is one kind of uncertainty that's hard to communicate to a broad audience. And the other kind is a sort of research or model uncertainty.
00:07:14
Speaker
And I don't know how well, I don't know if that's easy to communicate, and I don't know how well it's done sort of throughout. I don't think it's well done in the academic literature, for certain. Certainly there's a sort of publication bias where you only get things published if they have some big, statistically significant result. So, how do you guys balance all that? I mean, if you sort of
00:07:39
Speaker
working on a story and you say, I'm going to run this regression and I sort of have this zero, not statistically significant result. Do you feel like you have the same pull towards the publication bias that, I think, is documented in the academic literature? Do you have that same sort of feeling that you're being pulled like, well, this is zero or statistically zero, so we're not going to publish it? Yeah, look, we obviously face some of those same pressures, right? I mean, running a
00:08:08
Speaker
website that's trying to reach a broad audience and having too many of those articles sort of say, yeah, we kind of thought this might be interesting and we didn't find anything. You're going to run out of interested readers pretty quickly. I do think that we sometimes think that there are interesting stories in null findings. There are times where you set out thinking that there's going to be some kind of interesting relationship and it turns out it's not there and it's pretty clear it's not there and we can then explain why it
00:08:35
Speaker
might not be there. I mean, if you look at a lot of the pieces that Nate has done on elections, a lot of that is actually reporting null findings, essentially. You think that there's going to be a relationship here and there's not. We have certainly had occasions where we have set out to find something and felt a bit of an obligation then to report when we didn't. But look, we all the time are looking for possible relationships,
00:09:03
Speaker
discovering that one doesn't exist and tossing that story aside because it turns out not to be that interesting. And I think that that's a real concern, but it's also, there's a limit to what we can sort of do about that in a context where we're trying to reach an audience.
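(As a rough illustration of the kind of check being described here, the sketch below uses invented data and Python's statsmodels library; it is not FiveThirtyEight's actual workflow. The idea is simply to fit a regression, look at whether the slope's confidence interval comfortably straddles zero, and treat that as the starting point for deciding whether the null finding is itself the story.)

```python
# Hypothetical sketch (not FiveThirtyEight code): fit a simple regression on
# made-up data and check whether the slope is distinguishable from zero.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(538)
x = rng.normal(size=200)                    # invented predictor
y = 0.02 * x + rng.normal(size=200)         # outcome with essentially no signal

X = sm.add_constant(x)                      # add an intercept term
fit = sm.OLS(y, X).fit()

slope = fit.params[1]
low, high = fit.conf_int()[1]               # 95% interval for the slope
print(f"slope = {slope:.3f}, 95% CI = [{low:.3f}, {high:.3f}], p = {fit.pvalues[1]:.3f}")

# If the interval straddles zero, the honest write-up is the null finding itself,
# plus a plain-language note on how large an effect the data could still hide.
```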

Text vs. Visuals in Communicating Statistics

00:09:21
Speaker
Right. And what's the balance when you're publishing the results? What's the balance between the talking about it in the text and sort of showing it in the data visualization?
00:09:31
Speaker
Yeah, I think in a lot of cases, text ends up lending itself more clearly to that. So when we're talking about a probabilistic model, we can show that pretty clearly in a visualization. Where we're trying to show a conventional standard error, we can show that in a visualization. But when we're talking about some of the issues that we're talking about here, I think that visualizing that can be difficult. We have a lot of flexibility in our writing to say,
00:09:58
Speaker
that the evidence suggests something, to remind people that data is preliminary. We write about the jobs report every month, like everybody seems to these days. We spend a lot of time in that, dedicate a significant amount of that, to saying this is a preliminary estimate, to highlighting places where there's conflicting evidence, to
00:10:22
Speaker
pointing to trends but then making the point that it's a volatile series. There are a lot of opportunities in the way that we write those stories to highlight the uncertainty.
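(A minimal sketch of the kind of shaded-uncertainty chart mentioned earlier, using matplotlib and entirely invented monthly jobs numbers; the margin of error shown is illustrative, not an actual survey figure.)

```python
# Hypothetical sketch: a volatile monthly jobs series with a shaded band
# standing in for the survey's margin of error. All numbers are invented.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
months = np.arange(24)
jobs_added = 200 + rng.normal(0, 60, size=24)   # thousands of jobs, made up
margin = 90                                     # illustrative +/- error band

fig, ax = plt.subplots(figsize=(8, 4))
ax.fill_between(months, jobs_added - margin, jobs_added + margin,
                alpha=0.25, label="rough uncertainty band")
ax.plot(months, jobs_added, marker="o", label="reported monthly change")
ax.axhline(0, linewidth=0.8, color="gray")
ax.set_xlabel("Month")
ax.set_ylabel("Jobs added (thousands)")
ax.legend()
plt.show()
```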

Exploring Ideas with the Data Lab

00:10:34
Speaker
It's a lot harder in a visualization to do that, right? We're not going to every time we show the number of jobs added last month, show the error bars on that, let alone try to represent the different concepts of employment and unemployment that could yield very different.
00:10:51
Speaker
Right, right. Interpretations. Yeah. I will give one example, though. I mean, actually, to that point, you know, we, in the econ space, are thinking a lot about, you know, how tight the job market is right now. This is something that everybody's spending a lot of time thinking about. And we have had cases where we'll run charts of various different ways of thinking about these questions and show that they all kind of conform to the same basic trend line, right? And it's a way of showing
00:11:19
Speaker
Yes, these all look a little bit different from one another.
00:11:22
Speaker
But the trends all, you know, are broadly similar. And I think that that can give people a sense, both of the uncertainty, but also why we have a degree of confidence in that. Do you feel in the sort of space that you're in, which is, you know, I think different than sort of the traditional Washington Post, New York Times, or traditional journalism, do you feel like you have a broader opportunity to sort of play with these sorts of things and publish more in terms of like, we're going to play around with this and we've
00:11:51
Speaker
Maybe this is a little story that's interesting, but we can sort of publish this quickly, whereas in sort of traditional journalism, they may have to develop a more, you know, longer story or a fuller story.
00:12:02
Speaker
Yeah,

Maintaining Standards while Experimenting

00:12:03
Speaker
I think we do, although I think that we've actually probably found that harder in some cases than we thought. When we launched, we had this notion, and we still have this notion, about a distinction between features, longer, bigger features, and blog posts. We have a blog that we call Data Lab. And we named it that in part because it was an idea that it could be a place that was a little bit of a scratch pad. We'd show stuff that we were working on. We'd show ideas where we had an interesting idea, but we weren't sure that it would pan out yet.
00:12:32
Speaker
That's proved very difficult in practice to communicate the idea that we're intrigued by this, but we're not yet sold on it. I think we've had some cases where we've done that successfully, but we've also gotten backlash on cases where we either didn't do a good enough job explaining the notion that this was preliminary, or where readers just didn't understand that that's what we were trying to do, and that can become, that can get pretty tricky.
00:13:03
Speaker
Yeah, it seems a little bit difficult when sort of all of these different pieces of a website, that maybe in the producer's mind are separate, from the consumer side sort of look and feel the same. And it can feel like a cop-out, right? You don't want to be sitting here saying, oh, well, that was just on our Data Lab, you can't take that seriously. You know, we still want everything we do to rise to our standards. I guess the distinction that we would make is that our standards are not that we're always right; our standards are that we're clear about
00:13:32
Speaker
you know, our levels of uncertainty, and to say, you know, hey, we looked at this thing, and then, you know, later we come back and it turns out that it didn't pan out, that's fine, as long as we were transparent about our degree of confidence to start with, and as long as we're transparent when the evidence comes in and doesn't hold up what we originally thought. Yeah, yeah.

Informal Peer Review at FiveThirtyEight

00:13:50
Speaker
So can you talk a little bit about the process that you guys have when it comes to doing these more modeling type approaches? I think the political model is what Nate and you guys are most well known for. But when it comes to these shorter one-off things, I'm going to run a regression of y on x. Is there a peer review process that you guys go through? Is it a formal thing? Is it just an informal where you show it to a few people?
00:14:19
Speaker
Yeah, I would say it's probably more informal than formal, although it depends some on the topic, of course, and it depends on how we're presenting it. And certainly when we're talking about the sort of full models, right, those are things where we have a lot of people involved and are thinking very seriously about them. But yeah, for these sort of one-off things, we have a person on staff here. He used to be with the Atlanta Fed, Andrew Flowers, whose title is Quantitative Editor. We think we invented that. Everybody should feel free to copy it.
00:14:47
Speaker
But part of his job is to both be there as a resource to answer questions that some of us might have, but then also to serve as an editor, who will go back through and look at the work that we're doing. That's especially true when we're talking about freelancers, who we may not be as familiar with. But it's true for me as well. It's true for any of the people on staff. He'll have a look and see what we're doing.
00:15:15
Speaker
You know, Nate will get involved on things. I mean, I've had discussions with Nate where he'll ask to see our regression specifications and, you know, ask us to run some robustness checks, which is slightly terrifying occasionally, but in a constructive way, I think. But, you know, look, we also will run pieces where essentially what happens is, you know, I do the work that I do and we put it up there and it's been edited from a text standpoint,
00:15:45
Speaker
but where it hasn't been checked by a lot of people. And I think the goal there, again, is to make sure that we're transparent about what we're doing. And we post a lot of that stuff on GitHub. We put a lot of that stuff in footnotes. And I field a lot of questions from readers who will then ask about things. And we try to be transparent in answering those questions as much as possible as well.

The Future of Data-Driven Journalism

00:16:08
Speaker
Right. Yeah, interesting. So let me just, let's just finish up: where do you see both 538 and sort of this new, we can call it new media, the data-driven journalism side, where do you see things progressing over the next year or two? I guess, you know, I think that there's
00:16:28
Speaker
There's kind of where do things go for us and for the other sites that are kind of similar and then there's kind of what happens with journalism more broadly. For us, I think our goal continues to be to cover more areas.
00:16:46
Speaker
to be more ambitious in our stories, to bring this kind of journalism to a wider audience. We really do believe that there's an audience for this that goes beyond people who are probably listening to this podcast. We think that there are people who are skeptical of some of what they read in the media and that this kind of approach can resonate with them. But I also think that we do see part of the mission as to help steer broader journalism
00:17:16
Speaker
in a more statistically-minded direction. That doesn't mean that everyone at the Times or everyone at the Post or everyone at CNN, God help us, is going to be running statistical models. But it does mean that they can be thinking about evidence. They can be thinking about statistical evidence. They can understand ideas of uncertainty and probability. They can do that on a day-to-day basis.
00:17:44
Speaker
You don't have to be running an elections model to have recognized that by the time the election came around in 2012, there was not a lot of uncertainty. We pretty well knew who was going to win that. There's no reason for the talking heads on TV to pretend otherwise. Part of our mission there is to change the way
00:18:05
Speaker
the media writ large sort of thinks about these issues. Yeah, it's sort of interesting, because it seems, from my perspective as sort of a researcher, that the journalism field is getting a little bit closer to the statistics and regressions and data-driven side. And on the other hand, the academics and researchers need to get a little bit closer to the storytelling side. Yeah. And that's how you sort of engage with people.
00:18:27
Speaker
And you're certainly seeing that with more academics who are thinking about this, or think tanks that are speaking directly to the public, right, where before it would just be trying to reach the journalists who would then write it up. You know, hey, why can't Urban, why can't Pew be speaking directly to the public in a lot of these things? Right, exactly. Great. All right, man. Well, thanks for coming on the show. I appreciate it. This has been really interesting. And good luck over the next weeks and months and years.
00:18:53
Speaker
Thanks so much. I really appreciate being on. All right. Thanks, everyone. I will see you next week.