Introduction of Nick Diakopoulos
00:00:11
Speaker
Welcome back to the PolicyViz Podcast. I am your host, Jon Schwabish. I am very excited for my new guest, Nick Diakopoulos, professor at the University of Maryland's College of Journalism. Nick, welcome to the show.
00:00:25
Speaker
Hi John, thanks for having me on today.
Workshop on Algorithmic Transparency at Columbia
00:00:28
Speaker
Great to have you. I wanted to have you on so we could talk about this recent workshop you put on at Columbia University, which you were nice enough to invite me to, because it was a really interesting day. So we met up in New York. This was in, what, April now, I think we're talking?
00:00:46
Speaker
And we were talking about working on algorithmic transparency in the media. So maybe you could give us a little summary of the workshop. I think you'd probably do it better justice than I would.
Media's Use of Algorithms and the Need for Standards
00:01:00
Speaker
Yeah, so basically this was a plan that I hatched with the Tow Center for Digital Journalism at Columbia to kind of jumpstart the conversation around how the media needs to start thinking about
00:01:16
Speaker
their use of algorithms and, in particular, about how they could be more transparent about their use of algorithms, and sort of digging into specific case studies. So, you know, people are using what they call robot journalists now, so automatic content-generation algorithms. News organizations are using simulation and modeling in storytelling.
00:01:40
Speaker
They're also using algorithmically enhanced curation. We wanted to drill into these different case studies, get the participants talking specifically about what could be made transparent about algorithms in each of those cases, why you would make it transparent, what the benefits would be, what the
00:02:04
Speaker
cost would be and try to sort of come away with a little bit of a prototype standard or some framework for thinking about what are the kind of dimensions that you would want to be transparent about as you were publishing information using algorithms.
Broader Issues of Algorithmic Transparency
00:02:25
Speaker
Right, it was a fascinating day not only because from my perspective
00:02:30
Speaker
You know, when I went up there, I sort of thought, okay, so the real issue here is
00:02:35
Speaker
Some news organization is going to do some data analysis, some regression model, and they should be posting those data. They should be posting the methods by which they're analyzing the data. But we actually sort of worked our way into lots of other topics and ideas, right? Should the algorithms that are used to move comments around at the bottom of a page, should those algorithms be made more transparent?
00:03:01
Speaker
The ads that pop up, should those be made transparent? I mean, there's a pretty wide range of things to think about here.
00:03:10
Speaker
Yeah, there's a lot of stuff to think about here. I mean, news organizations are using personalization now, so how are they using your personal data to adapt content or serve ads? How are they making inferences, so that they're classifying you, or they're classifying things, in
00:03:32
Speaker
investigations? And, of course, classifiers have error rates, and you want to, you know, let the reader essentially know
00:03:41
Speaker
how sure you are of your analysis. So I think there are lots of different ways in which algorithms are being used now. One example that I like a lot, actually, is the New York Times Fourth Down Bot.
Transparency through Visualization: NYT's Fourth Down Bot
00:03:58
Speaker
I don't know if you saw this. So it's a sports bot running a model of American football where
00:04:09
Speaker
They're basically trying to predict what a given team should do on any given fourth-down play. Should they punt, should they go for it, and so on. It's all very data driven, and they have a well-reasoned model behind it. It's surfaced on Twitter as this bot. What I like about this, actually, is that
00:04:34
Speaker
they have a whole page online (the New York Times runs this bot) that actually visualizes kind of the biases in the model. And they actually show you, you know, they sort of visualize the field and show you what the bot would predict you should do at any given location on the field on a fourth down. And they kind of compare that to the actual data of what
00:05:01
Speaker
coaches actually do at that position on the field. It's just, I think, a nice example which highlights the potential for visualization to help with transparency. So when you do have models that are running these things in the media, we can visualize those models in ways that help
Balancing Transparency and Competitive Advantage
00:05:19
Speaker
the end user, the consumer, understand the biases in these models or the error rates in these models. Right. Now, how do you view... so, in that case, the NFL data are essentially public. You know, some of it you can buy, but essentially it's public. So how do you view news stories where the data is either
00:05:41
Speaker
something the news organizations are collecting themselves, or they're purchasing the data, or they're creating a model where the model itself has monetary value. So do you see a line between where they should and should not release the algorithms and release the data? Or is it sort of: whenever a media organization is doing some analysis with data, they should be publishing everything?
00:06:09
Speaker
Well, data is a tricky thing, right? I mean, because people buy data all the time, or there's potentially private information that might be in data sets that you wouldn't want to make public. So it's not all straightforward. And I don't think it's as easy as just saying, well, they should always publish the data behind their models. I mean, I think there are some caveats.
00:06:34
Speaker
In general, we want to know what is the quality of that data? Is it accurate? Is it complete? How is it collected? What's the methodology behind that collection? If you're training a model based on data, well, how much data did you train it on? Is it enough data that we can be confident in your model? Are there any other assumptions in the way that your data was collected?
00:07:00
Speaker
How was your data processed? Did you have to clean it? Did you edit it in some way? There's all kinds of decisions that get made along the way of how data is transformed that you might want to be transparent about. I think oftentimes the sort of counter argument for transparency is
00:07:23
Speaker
well, it ruins our competitive advantage, and if we're transparent about this then we leave ourselves open to manipulation. And at times that's legitimate, right? I mean, if you have a very valuable data set that you spent a lot of time collecting, you might choose to protect that and not publish it
00:07:52
Speaker
in a way that other people could pick it up and use it, so as to maintain that as a competitive advantage. But you might still disclose, like I was saying, certain dimensions of the data: the quality, its accuracy, how it's processed, whether or not it includes private information, stuff like that.
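To make the kind of disclosure Nick describes concrete, here is a minimal sketch, in Python, of what a machine-readable disclosure published alongside a data-driven story might look like. The field names and values are entirely hypothetical; this reflects no actual newsroom standard, only the dimensions mentioned in the conversation (collection method, training size, completeness, processing steps, privacy):

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class DataDisclosure:
    """Hypothetical machine-readable disclosure of the dimensions
    discussed above: how data was collected, cleaned, and used."""
    source: str                   # where the data came from
    collection_method: str        # how it was gathered
    n_records: int                # how much data the model saw
    known_gaps: list = field(default_factory=list)        # completeness caveats
    processing_steps: list = field(default_factory=list)  # cleaning/edits applied
    contains_private_info: bool = False

# Invented example values for illustration only.
disclosure = DataDisclosure(
    source="purchased play-by-play feed (raw data not republished)",
    collection_method="vendor API, 2009-2013 seasons",
    n_records=125_000,
    known_gaps=["preseason games excluded"],
    processing_steps=["deduplicated plays", "imputed missing field position"],
)

# Publish the caveats without publishing the proprietary data itself.
print(json.dumps(asdict(disclosure), indent=2))
```

The point of the sketch is that a newsroom can disclose the *shape* and provenance of a data set, exactly the compromise Nick suggests, while the data itself stays private.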
00:08:11
Speaker
And do you think, for the most part, that news organizations have the ability?
Where Should Transparency Information Be Stored?
00:08:15
Speaker
I mean, they certainly have the ability. But should these notes and caveats and descriptions, should those be, for the most part, housed
00:08:26
Speaker
not with the article itself, I would assume, but on, like, a separate platform or in a blog? I know the New York Times has some development blogs; so does the Washington Post. So is that the sort of place where things should be? Or should things all be housed together? FiveThirtyEight, for example, has a GitHub site where you can go use some of the data that they've used. Yeah, I'm a big fan of
00:08:51
Speaker
kind of providing some interface affordance or some hook in the interface that people will see that's kind of salient. So that if they are interested in seeing sort of the behind the scenes work of what fed into this model,
00:09:08
Speaker
the data and the code and so on, they can sort of click into it and dive into it. And I think there are some really fundamental human-computer interaction challenges here, right? I mean, you don't want to overload the reader, the end user, right? I mean, you don't want cognitive overload. You don't want people feeling like, oh my god, there's all this stuff going on, and I have to understand everything, and why are they showing me all this stuff? I just want to read this article.
00:09:36
Speaker
But I think having some kind of way of sort of presenting information in a multi-scale fashion where at the surface level you can just read the article or the content, but with some kind of salient hooks to be able to drill in and say, okay, now I can sort of see the layer behind this.
00:10:01
Speaker
I don't necessarily have a strong feeling of whether or not that needs to be on another platform like GitHub or if it should be hosted by the media organization itself. I mean, the way I see most organizations doing this
Essential Skills for Data Journalists
00:10:14
Speaker
now is they'll throw it up on GitHub as part of their
00:10:19
Speaker
workflow. And that seems to make sense. And there's other advantages to that as well, like just version control of your code and your data and so on. And of course, having the historical chain there for how a project evolved could also be an interesting thing to have access to.
00:10:42
Speaker
So I want to switch just a little bit, because I'm curious about your thoughts since you teach journalists. I'm curious about your thoughts on the staffing and the training that modern journalists need, because one thing that came up at the Columbia workshop was this concept of journalists as researchers, and I've written on this a little bit. You know, sometimes it makes me nervous when I hear
00:11:09
Speaker
journalists say, well, you know, or journalism programs say, for example, oh, you know, our students now take two statistics classes so they know how to run a regression. And that always makes me cringe, because it's always like: they may know how to run a regression, but do they know how to identify a good regression versus a bad approach? So I'm curious, when you have
00:11:32
Speaker
the sort of new media where there's more data-driven analytics and they're running regressions, what is your view of the skill set that new journalists need, both to be responsible when they're working with data and then to take the next step of making it more transparent, so people can evaluate it and analyze the quality of the work?
00:11:54
Speaker
Yeah, I mean, that's a great question. And it's not an easy one. I mean, I think we're basically looking at data journalism as a new form of public social science. And you're having data journalists do original analyses and publish those original analyses, oftentimes without the benefit of peer review.
00:12:20
Speaker
Although, you know, data journalists do lean on each other internally to a news organization, or they lean on an editor to evaluate analysis, or in some cases they'll call, you know, a statistician friend, or they'll call a, you know, a statistics professor and say, hey, you know, is this crazy? You know, these are my results.
00:12:44
Speaker
and just kind of do a sanity check on it. Probably not as rigorous as a peer review kind of thing. So there are kind of models, I think, that exist for making this kind of work.
00:12:59
Speaker
viable in public, you know, outside of the traditional peer review model. Now, in terms of what that means for skills, I mean, I think, you know, we are looking at, you know, more statistics, more programming, so being able to move data around
00:13:16
Speaker
being able to set up a machine learning classifier in Python, for instance, being able to churn through large sets of documents or use APIs to apply other kinds of analyses. I think these are all important skills. Combine that now with what journalists have traditionally done well, which is communicate information,
00:13:43
Speaker
And there's a whole new set of skills, I think, that need to be developed in terms of the effective communication of data-driven investigations. So what's a good way to present a news app? It doesn't always need to be a written story. Sometimes it makes a lot more sense as a written story plus some charts, or maybe just as a data utility, where there's really not much text at all.
Innovations in Data Transparency
00:14:10
Speaker
It's more about
00:14:11
Speaker
letting the end user explore a dataset interactively or explore a dataset through visualizations and so on. So, I mean, I think there's sort of, in addition to the core journalism skills of reporting, writing, and editing, that writing part is, I think, being generalized out to communicating with data, which I think, more often than not, kind of comes down to data visualization, right?
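As a toy illustration of the "classifiers have error rates" point from earlier in the conversation, the sketch below trains a trivial nearest-centroid classifier on invented two-dimensional data and reports its held-out accuracy, the kind of number a newsroom could surface to readers. A real pipeline would use a library such as scikit-learn; this pure-Python version is only a sketch under made-up data:

```python
import random

# Toy data: two hypothetical classes of 2-D points
# (stand-ins for document features in a real investigation).
random.seed(0)
data = [((random.gauss(0, 1), random.gauss(0, 1)), "A") for _ in range(100)] + \
       [((random.gauss(3, 1), random.gauss(3, 1)), "B") for _ in range(100)]
random.shuffle(data)
train, test = data[:150], data[150:]

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# "Train": compute one centroid per class from the training split.
centroids = {
    label: centroid([p for p, l in train if l == label])
    for label in {"A", "B"}
}

def predict(point):
    # Assign the point to the nearest class centroid (squared distance).
    return min(centroids, key=lambda l: (point[0] - centroids[l][0]) ** 2
                                        + (point[1] - centroids[l][1]) ** 2)

# Report the held-out error rate -- the number a reader should get to see.
correct = sum(predict(p) == l for p, l in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.0%}")
```

Publishing that held-out figure alongside a classifier-driven story is one concrete way of "letting the reader know how sure you are of your analysis."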
00:14:41
Speaker
What's the effective way to visualize this dataset for an end user? Really interesting, really interesting. So you hosted the workshop or co-hosted the workshop, you have your students, so where are you headed next with this line of discussion and I assume research?
00:14:59
Speaker
Well, yeah, so for me, you know, I have a background, going back to graduate school, in human-computer interaction. And so, in terms of the research agenda, I'm very interested in developing new interaction techniques, new visualization techniques, that help you be more transparent with the algorithms that you're using. So one example of this is a project I'm continuing to work on with the
00:15:25
Speaker
IEEE Spectrum magazine. We published last year an interactive ranking of top programming languages. It's all very data driven, and it's kind of an interesting new methodology for looking at top programming languages.
00:15:43
Speaker
And I'm sort of thinking about, well, how do you let people step into that ranking of top languages? How do you let them tweak and tune and re-weight and create their own ranking from those languages? So I'm sort of continuing to work with them on developing visualization techniques and interface techniques that allow people more transparency and flexibility in rankings that they're interacting with online.
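A minimal sketch of the re-weightable ranking idea Nick describes: the metrics, values, and weights below are hypothetical, not IEEE Spectrum's actual data or methodology. The point is only that exposing the weights lets the reader "tweak and tune" the ranking:

```python
# Hypothetical normalized metrics per language (invented numbers).
metrics = {
    "Python": {"search_hits": 0.90, "job_posts": 0.80, "repo_activity": 0.95},
    "Java":   {"search_hits": 0.85, "job_posts": 0.90, "repo_activity": 0.70},
    "R":      {"search_hits": 0.50, "job_posts": 0.40, "repo_activity": 0.60},
}

def rank(weights):
    """Score each language as a weighted average of its metrics.

    Exposing `weights` to the reader is what makes the ranking
    transparent and re-weightable."""
    total = sum(weights.values())
    scores = {
        name: sum(weights[m] * v for m, v in vals.items()) / total
        for name, vals in metrics.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Equal weighting vs. a reader who cares mostly about job postings:
# with these invented numbers, the job-focused weights move Java ahead.
print(rank({"search_hits": 1.0, "job_posts": 1.0, "repo_activity": 1.0}))
print(rank({"search_hits": 0.2, "job_posts": 1.0, "repo_activity": 0.2}))
```

Wiring sliders in an interface to the `weights` dictionary would give exactly the "step into the ranking" interaction described above.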
00:16:09
Speaker
I have some other theoretical work that I think will build very nicely on top of this workshop, the Columbia Workshop, where we're really trying to think underneath it all, what would a standard look like for algorithmic transparency?
Creating Standards for Algorithmic Transparency
00:16:30
Speaker
What are all the different elements that will factor into that? And if we can articulate that,
00:16:41
Speaker
and get the industry to sort of look at that proto-standard and comment on it and iterate on it. And over time, maybe we can agree and develop almost an industry understanding (I hesitate to say a consortium), but some kind of shared understanding that these are the standards, these are the expectations, ethically or professionally, of using algorithms in the media, and that it be
00:17:11
Speaker
integrated into other kinds of editorial policies. That is really interesting. And you had mentioned earlier the GitHub approach, and there's a workflow question there.
Impact of Data-driven Methods on Media Workflows
00:17:22
Speaker
I think that workflow question, I know lots of organizations are
00:17:26
Speaker
trying to figure out how this sort of new data-driven society in which we live affects their workflow. And I think that conversation, as part of this discussion on transparency, will be really interesting to see, especially as it plays out at different media organizations. Yeah, absolutely. I mean, workflow is essentially what everyone's trying to figure out: how to do this kind of work
Podcast Wrap-up and Farewell
00:17:57
Speaker
and do it efficiently and do it in teams, right? So collaboration workflows and so on. I'm not sure anyone's really solved that yet. Digital organizations have their workflow and they've kind of duct taped it together and it's sort of working to some extent, right? But I think there's lots of room for innovation in that.
00:18:24
Speaker
Well, Nick, this has been terrific. Thanks a lot for coming on. This has been really interesting. All right. Well, thanks for having me, John. And I hope to see you around town very soon. Absolutely. I look forward to the next workshop. Well, thanks, everyone, for listening. If you have questions or comments, please let me know. Hit me up on Twitter or visit the site, policyviz.com. I'm Jon Schwabish, and this has been the PolicyViz Podcast. Thanks so much for listening.