Episode #115: Data Ethics with Laura Noren & Hetan Shah

The PolicyViz Podcast

On last week’s episode, I sat outside Facebook and chatted with Andy Kirk about our experience at the Social Science Foo Camp, a two-and-a-half day conference at Facebook that brought together all sorts of social scientists. One of the first...

Transcript

Introduction and Guest Overview

00:00:11
Speaker
Welcome back to the PolicyViz podcast. I'm your host, John Schwabish. Thanks for tuning in. I'm coming to you from Northern California at the SoSci Foo Camp at Facebook. If you listened to last week's episode, you heard me and Andy Kirk give the recap of what we saw. So this week, I'm very excited to have two guests with me to talk about data ethics and a session that they held early on the first day. And so I'm excited to have Hetan Shah, who is the executive director of the Royal Statistical Society.
00:00:39
Speaker
And Laura Noren, postdoc at NYU. Thank you both for coming on the show. I'm excited to have you.

Backgrounds and Data Ethics

00:00:45
Speaker
So why don't we have, maybe you could talk just a little bit about your background and then we can dive into the topic. Laura, you want to start? Sure. So I got my PhD in sociology and I did almost all of my research on organizations and they were all tech organizations. And then I had the great opportunity to do a postdoc in the Center for Data Science where I continued to study
00:01:05
Speaker
how universities grapple with data science. And along that route, I realized that we weren't really addressing ethics as much as we could be. And you started a data ethics course there, right? And you're teaching that this semester. Teaching that right now. Yeah. That's good. Going well. Well, we've had two sessions, so. OK. OK. So far, so good. So far, everyone. Everyone's still enrolled. OK. All right.
00:01:30
Speaker
So I run the Royal Statistical Society. I'm a professional charity chief exec. So I've run a different charity before that. And before that I was a think tank kind of policy person. So for me, I'm sort of the interface for the RSS on policy matters and
00:01:45
Speaker
data ethics is obviously kind of becoming a bigger and bigger agenda. We convened a big summit a couple of years ago in Windsor Castle, and because of the location, whoever we asked to come came along. We discussed, kind of, you know, where is the ethics of big data going, and one of the big recommendations from that was that we need a new institution, a kind of council for data ethics, to be set up. And that's one of the things that's now moving forward in the UK.
00:02:09
Speaker
So, you two didn't know each other though before Facebook. No, we were sitting next to each other. Oh, okay. And you were like, we need a session. Yeah. Okay. All right.

Complexity and Principles of Data Ethics

00:02:17
Speaker
So, why don't we start with what is data ethics? Or what are data ethics? No, I think it's what is data ethics. The fact that there's this long pause, I think, means there's a lot of work to do.
00:02:32
Speaker
Well, I think that there's a whole series of questions, ethical questions which come out of the kind of new data landscape that we're sitting in. Actually, my starting point is, what is the purpose of the use of the data? And that's often overlooked. People jump to kind of privacy or bias or these sorts of things, but actually,
00:02:52
Speaker
If you ask the public, what is the thing that really drives your response to whether you want to share your data for a particular purpose? It's, are they doing this in some way which is going to help me or help the public good? For example, if you hear of Las Vegas casinos who are estimating your spend threshold as you walk in and then trying to disrupt that so you'll spend more, that is clearly not in the public interest.
00:03:16
Speaker
If you're using data for health research, etc., they're much more likely to be happy for you to do that. So it seems to me public interest is a really important starting point. Yeah, I would just say one simple place to start is to think about the Hippocratic Oath, which is often summarized as first, do no harm.
00:03:34
Speaker
but that's not actually the Hippocratic Oath if you go back and read the real thing. But anyways, that's what it's summarized as. So with data, you could think of not only first do no harm, but how do you make sure that, like, you know, if you're working to sort of 95% confidence intervals, you're very aware of what's happening to the other 5%. And, you know, you can see other kinds of issues of scale and issues of dealing with some of our previous
00:03:59
Speaker
kind of agreed upon practices for being ethical, not really working for data science. Like how do you obtain informed consent when you're talking about
00:04:08
Speaker
you know, a giant corpus of tweets? You can't scale it, and, you know, what if you store it indefinitely? How do you consent someone to that? Does that even make sense? Can they cognitively understand what they're agreeing to? Right. I mean, it seems to me like health data, security data, personal identifying information, Facebook data, for example, or your Facebook settings, I think sort of defining what data ethics means, or being ethical with those data, seems pretty obvious to me, clear cut.
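As a concrete gloss on the 95% point above: a minimal sketch in Python, with invented numbers and hypothetical group labels, of how a headline accuracy figure can hide which group absorbs the "other 5%."

```python
import pandas as pd

# Hypothetical scores: 100 records, overall accuracy of 95%.
# The errors are not spread evenly across groups.
df = pd.DataFrame({
    "group":   ["a"] * 80 + ["b"] * 20,
    "correct": [True] * 78 + [False] * 2 + [True] * 17 + [False] * 3,
})

# The headline number looks fine...
print(f"overall accuracy: {df['correct'].mean():.0%}")  # 95%

# ...but the per-group error rate shows who the "other 5%" are.
print(1 - df.groupby("group")["correct"].mean())  # a: 2.5%, b: 15%
```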
00:04:35
Speaker
But if I were to ask, what would data ethics or being ethical with data mean if I were to apply it to, you know, the census, for example, or sort of a standard, maybe a more standard data set that doesn't have people's names and specifically identifying information. So there's still basic issues around kind of data security, for example, and
00:04:55
Speaker
consent still comes into it. But I think that those are less tricky because we've built up those practices over many years and in a sense with new data sources and trying to combine all the new data sources, that's where we're getting into trickier spaces. The other thing to remember is that
00:05:10
Speaker
our notions of ethics have shifted as technology has shifted over time. It's not so easy as to say, what are the ethics, and let's just apply them. Our notions of privacy have changed over time and will continue to change.

Ethical Challenges in Machine Learning

00:05:23
Speaker
Our notions of, for example, attitudes to sex and morality around sex changed because of the pill. So technology and morality and ethics have always had a kind of to and fro relationship.
00:05:35
Speaker
But what would a person, your sort of standard data analyst, she downloads census data or whatever, when she thinks about ethics or being ethical with the data, it's not just privacy and security. It's thinking about how we treat and think about different groups in the data and do those tabulations and think about different groups and uncertainty around those estimates, right?
00:05:59
Speaker
So if you're using census data, it doesn't have names. Should there be an idea that maybe you shouldn't try to re-add names to that? Because you can do combinations of data sets.
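To make "combinations of data sets" concrete: a minimal, entirely hypothetical sketch of a linkage attack. Two tables that share quasi-identifiers (ZIP code, birth date, sex) are enough to re-attach names; every record below is invented.

```python
import pandas as pd

# A "de-identified" table: names removed, quasi-identifiers left in.
deidentified = pd.DataFrame({
    "zip":        ["20001", "20002"],
    "birth_date": ["1980-03-14", "1975-07-01"],
    "sex":        ["F", "M"],
    "diagnosis":  ["asthma", "diabetes"],
})

# A second, public table (think voter roll) that carries names.
public_roster = pd.DataFrame({
    "name":       ["Jane Doe", "John Roe"],
    "zip":        ["20001", "20002"],
    "birth_date": ["1980-03-14", "1975-07-01"],
    "sex":        ["F", "M"],
})

# One merge on the shared quasi-identifiers and the names are back,
# which is exactly what a "do not re-identify" rule would forbid.
relinked = deidentified.merge(public_roster, on=["zip", "birth_date", "sex"])
print(relinked[["name", "diagnosis"]])
```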
00:06:12
Speaker
So that's a fairly rule level question. Should you just decide that you're not going to try to re-identify people? Maybe that should be kind of an ethical rule, not a principle, but just a rule. That's the thing that we're not going to do. And there's things like that. Then there's things about transparency that come up a lot with data science. How interpretable should this activity be and how much responsibility
00:06:39
Speaker
does the sort of data science community have to make what they're doing interpretable? That's a kind of, you know, as we move into a situation where more and more people are doing jobs in which they are the expert and almost nobody else can understand what they're doing, this isn't just about data science, it's about divisions of labor and, you know, kind of the modern workforce.
00:06:58
Speaker
How transparent do we have to make our work, whether we're data scientists or anyone else, to the typical user? And I think this is, again, where there's something

Policy Work and Data Ethics Councils

00:07:07
Speaker
new. So again, I think the census example is one where we kind of know what we're doing. But with the introduction of machine learning algorithms, that's where you're starting to get a black box. And it's much more difficult to know how you're going to treat that from an ethical perspective. So as you're aware, lots of debates about algorithmic accountability. Is it publishing the code?
00:07:27
Speaker
You're not going to get very far with that. One of the things I've been saying, particularly within the UK context, is that the public sector holds a lot of data and is being now approached by private sector companies who hold these algorithms and they're almost like magic. In the eyes of the public sector, this is magic.
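One reason "publish the code" only gets you so far, and one thing a data-holding buyer can demand instead: black-box probes need only query access to a model. A minimal sketch using scikit-learn's permutation importance on a synthetic stand-in; the model and data here are invented for illustration, not anyone's actual system.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# A synthetic stand-in for a vendor's opaque model: we never look inside it.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance treats the model as a black box: shuffle one input
# at a time and measure how much held-out performance drops.
result = permutation_importance(black_box, X_test, y_test,
                                n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance drop {imp:.3f}")
```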
00:07:48
Speaker
I mean, there are instances of the public sector really giving away the data very quickly because of the magic. And what I've been saying is, actually, you've got it the wrong way around: you have the monopoly, you have the data, and actually these companies are ten a penny, and there is a marketplace of them.
00:08:04
Speaker
In this marketplace, you now must use your procurement power to enforce a whole series of transparencies and kind of governance oversight, etc., because you wield a lot more power than you think you do. And I think if we can get that message across, we will have a far fairer set of outcomes than otherwise.
00:08:24
Speaker
Because a hammer without a nail is just a hammer, right? So what is the RSS doing in the UK right now to grapple with some of these issues? So we're working quite hard at the policy level. So we made this recommendation for a council for data ethics. And we persuaded, well, there's an independent foundation called the Nuffield Foundation. Twenty years ago, it set up something called the Council on Bioethics, at a time when that was a really kind of
00:08:50
Speaker
big issue and that body did a great job of kind of doing research and reports in that area. So we persuaded them to set up a council for data ethics in a similar way. So that's just being set up right now. We've been giving evidence to our parliamentarians. There's an inquiry running on algorithms and decision making, so I gave evidence to that. There's also one in our House of Lords on artificial intelligence and similarly we've been telling them, you know, how we thought that
00:09:19
Speaker
from a kind of public interest perspective, all of this should play out. So how do you build up the guidelines or the principles? Are you pulling together existing guidelines that other places have created, or are you starting sort of from scratch? So I think the new council for data ethics is going to do that work, as they will be properly resourced to do that.
00:09:39
Speaker
And I mean, one of the things that's come out of the sessions today, which I found quite useful, Chris Wiggins was talking about, you know, thinking about this at the level of principles, at the level of standards and then at the level of rules. And I would hope that anything that comes out is at that sort of level of, on the one hand, what are the kind of really big level issues and principles that we can agree upon.
00:10:02
Speaker
And then how do you then take that out to standards and then further down to rules? But I think all that's to kind of play for. Right. He also talked in your session earlier about the enforcement side. So does the RSS then, once it's defined, is there an enforcement
00:10:17
Speaker
protocol that you'll be working on, and is it enforceable? So no, I mean, I think that's the whole point, that I think is different to law. At the European level, we're already about to see some new legal changes with the General Data Protection Regulation coming into force, and that is actually going to bring a lot of good practice in, but at the same time

Professional Guidelines and Education

00:10:36
Speaker
it leaves open a number of ethical questions, including, I mean, under GDPR, once you've got consent, you can still do whatever you like with the data. Well, there's still some things that we would say, you know, ethically you might not want to do with the data. But, you know, what are the enforcement mechanisms? It's going to be good practice guidelines. And, I mean, we're a professional body, and we've found that our codes of conduct are things that statisticians can turn to their clients or to their superiors and say,
00:11:05
Speaker
you know, this is against my professional judgment and I've got it backed up by my professional body. And I would hope over time, I mean, at the moment there is no professional body for data science, but, you know, we're helping to kind of create that space. And as the codes of conduct, etc., come out, we hope that that will empower the profession. Right. Do you see the Royal Statistical Society as being the right professional organization to address data science?
00:11:29
Speaker
So we see ourselves as being able to do some of it. We have a data science section and a machine learning network, but we don't claim to own the space, and others are kind of welcome to play in that space too. So, you know, who knows? It may well be that a new body is created in due course, but there's no sign right now in the UK of that happening. Laura, you have started your data ethics course at NYU. So can you talk a little bit about what that means? How do you teach people to be ethical with data?
00:11:58
Speaker
Well, so one of the things we do is we admit that ethics is not new. So we start with ethical philosophies, and it's really, really frustrating to data scientists and to a fair number of people who are expecting me to say, okay, here are the ethical principles we're going to establish, and then we're going to move on with the rest of the course, assuming that we agree on these ethical principles, because in ethical philosophy, you've got very different
00:12:23
Speaker
principles available to you that are still being deployed. It wasn't as if one of them won out some hundred years ago; we've continued on with that. So that's been really unsettling. And some of the class just dropped after I said, you know, I'm not going to just be able to teach you, here are the rules. We actually had a fair amount of drop-off from the first week to the second week. Well, they really wanted that concreteness. Yes, they wanted the concrete: I'm going to tell you how not to screw things up. So you can see there's a lot of anxiety
00:12:49
Speaker
But there's not necessarily the same appetite to really engage with these questions and be kind of on the front lines of what is happening. On the other hand, after the second class, and you know, I've been teaching for about seven years now, all sorts of different classes, not that this is the first time for this class, but the students were so excited about the class, they stayed after class in the classroom for an hour and 15 minutes extra, which is an entire extra class session.
00:13:16
Speaker
And then they still wouldn't, you know, on the sidewalk outside of the building, they still were talking. So the people who are into it, you know, this is the right time to be in these classes, to be on these councils, to be trying to put together communities of interested people to shape what these ethical principles are going to look like, what the rules are going to be.
00:13:35
Speaker
and how they're going to be deployed.

Global Initiatives and Sectoral Efforts

00:13:37
Speaker
I think it's great that they're coming out of a professional body. I think that's the right place for them to be because that's likely where they're going to be the most effective and the most flexible. If they were coming out of a legal body, they're not going to be that flexible. So I think that's really the right way to go.
00:13:51
Speaker
Having said that, I think what you're saying does really point to that need that people who are not necessarily going to immerse themselves in the topic still want to know, when I go into the office on Monday, what is it I need to do? And that's the job, I think, over the next couple of years for everyone working in this field to say, yes, it's complicated, but what is it that we can point to?
00:14:12
Speaker
Yeah. And I mean, we're not alone here. The ACM, the IEEE, there's a group with Gideon Mann at Bloomberg, and just group after group after group. You know, we're not geniuses. We're just your average data science person who also has some kind of moral compass, who is looking around and saying, wait a second, I think
00:14:34
Speaker
I think we can do better. As well as developing these frameworks, I think that there's a role for some of these new bodies to actually take real problems, have them referred to them, and to almost develop what you might call case law around them. These are new issues.
00:14:52
Speaker
There aren't always right answers, but if somebody thinks them through and can say this is why we came to this conclusion, then others could use that. I think if we could make this a kind of transparent space where people are saying this is how I'm thinking about it, we will get good debates and we'll be better off in the long run.
00:15:09
Speaker
Are there similar efforts in the US or other countries to the RSS's effort to build these rules or regulations, or this amorphous thing? Are there other organizations? So I think Laura's talked about some of the other initiatives that are going on. But I don't know of anything quite like the thing that the Nuffield Foundation is setting up. It does seem to me in the UK that there is quite considerable policy interest,
00:15:37
Speaker
which, I think, you know, we've helped to kind of gather the momentum behind, but so have many others. So, yeah, I hope that, you know, we'll take a leading edge and that others can kind of
00:15:47
Speaker
build on that and take what they want from it. New York City has become the first city to establish a task force to understand whether or not the city should make transparent all of the algorithms that it ever uses in any kind of decision making and do additional kinds of work around data sets themselves, which cities are
00:16:09
Speaker
one of the biggest sources of open data now. It's sort of about time that we had something like that. Do you ever worry that focusing on data science, computer science, the sort of technical fields where people are working with the newer and bigger sorts of data, means that maybe other areas of study may not be aware of these issues and are not paying attention?

Expanding Awareness of Data Ethics

00:16:31
Speaker
I'm, you know, coming from an economics background, and, you know, economists are starting to use social media data, starting to use these other types of data, but maybe this is not on their radar. And how do you ultimately get them to pay attention to this?
00:16:43
Speaker
That may well be true. I suppose it's going to be about cascading these things out. And it's noticeable that data science courses are now popping up all over the place and working with other disciplines, because data science in and of itself is empty. It's got some methodology, but beyond that, it's got to work within a disciplinary framework. So I would hope that it cascades out that way. That is one of the things that universities that are running data science not as a department but as sort of a center
00:17:13
Speaker
are doing: the centering work is pushing this kind of thing out. Economics in particular seems to be a difficult discipline to engage. There was an article in the New York Times that made the claim that economists are the least likely to be doing interdisciplinary work. That sounds about right.
00:17:32
Speaker
I mean, in the UK, there's a whole interesting movement around rethinking economics, where students rebelled in Manchester and asked for a completely different kind of series of courses. There's also the CORE economics course, which is a kind of new version of the curriculum. So there are efforts, I think, at bringing more heterodoxy into it. Right. Rethinking Economics, the Institute for New Economic Thinking.
00:17:55
Speaker
It's out there, but I think that's a small thing. Yeah, absolutely. So going forward then, along with the cascading, I assume that you would both view a data science educational framework, a school of data science, a degree in data science, that a data ethics course or seminar or something would be a core requirement as part of something like that.
00:18:17
Speaker
Yes, it's great to have a course that deals with ethics, but it would probably be even better if there was a course like that that has some technical component to it, but that all of the people who are teaching methods are aware of this and are reinforcing it as they go through and are making sure that where they have a chance to
00:18:39
Speaker
to, you know, teach a method using data that's got some personally identifiable stuff, so they say, okay, first we're going to do this, now we can learn this method, rather than, say, taking, I don't know, galaxy data, star data, which doesn't have any of those issues, or taking a data set that's already been cleaned. Like, let's make sure that we're
00:18:57
Speaker
reinforcing this all the way along, because I think the tendency sometimes is for these types of courses to seem like they're the soft courses or they're somehow an adjunct to the main show. You have to get out of the computer lab and think about philosophy. What I'd really like to see, I think, is not just a sense that ethics stops us from doing stuff; it's that ethics guides us towards some really useful things that would kind of drive social change.
00:19:25
Speaker
What are the examples of data being used for public good: to drive the sustainable development goals, to stop animal poaching, kind of using satellite data to see how we're dealing with poverty, et cetera, et cetera. So there's some really exciting things which I think could help students see data for social good rather than seeing it as a kind of barrier to change. Yeah, that's great. Let's end on a positive note. Thanks so much, both of you, for coming on the show. Thanks very much.
00:19:54
Speaker
And thanks to everybody for tuning in. If you have questions or comments or thoughts, there'll be lots of links on the show notes page, so please do let me know. So until next time, this has been the PolicyViz Podcast. Thanks so much for listening.