Episode #115: Data Ethics with Laura Noren & Hetan Shah

The PolicyViz Podcast

On last week’s episode, I sat outside Facebook and chatted with Andy Kirk about our experience at the Social Science Foo Camp, a two-and-a-half day conference at Facebook that brought together all sorts of social scientists. One of the first...

Transcript

Introduction and Guest Overview

00:00:11
Speaker
Welcome back to the PolicyViz podcast. I'm your host, John Schwabish. Thanks for tuning in. I'm coming to you from Northern California at the SoSci Foo Camp at Facebook. If you listened to last week's episode, you heard me and Andy Kirk give the recap of what we saw. So this week, I'm very excited to have two guests with me to talk about data ethics and a session that they held early on the first day. And so I'm excited to have Hetan Shah, who is the executive director of the Royal Statistical Society.
00:00:39
Speaker
And Laura Noren, postdoc at NYU. Thank you both for coming on the show. I'm excited to have you.

Backgrounds and Data Ethics

00:00:45
Speaker
So why don't we have, maybe you could talk just a little bit about your background and then we can dive into the topic. Laura, you want to start? Sure. So I got my PhD in sociology and I did almost all of my research on organizations and they were all tech organizations. And then I had the great opportunity to do a postdoc in the Center for Data Science where I continued to study
00:01:05
Speaker
how universities grapple with data science. And along that route, I realized that we weren't really addressing ethics as much as we could be. And you started a data ethics course there, right? And you're teaching that this semester. Teaching that right now. Yeah. That's good. Going well. Well, we've had two sessions, so. OK. OK. So far, so good. So far, everyone. Everyone's still enrolled. OK. All right.
00:01:30
Speaker
So I run the Royal Statistical Society. I'm a professional charity chief exec. So I've run a different charity before that. And before that I was a think tank kind of policy person. So for me, I'm sort of the interface for the RSS on policy matters and
00:01:45
Speaker
data ethics is obviously kind of becoming a bigger and bigger agenda. We convened a big summit a couple of years ago in Windsor Castle, and because of the location, whoever we asked to come came along. We discussed, kind of, you know, where is the ethics of big data going, and one of the big recommendations from that was that we need a new institution, a kind of council for data ethics, to be set up. And that's one of the things that's now moving forward in the UK.
00:02:09
Speaker
So, you two didn't know each other though before Facebook. No, we were sitting next to each other. Oh, okay. And you were like, we need a session. Yeah. Okay. All right.

Complexity and Principles of Data Ethics

00:02:17
Speaker
So, why don't we start with what is data ethics? Or what are data ethics? No, I think it's what is data ethics. The fact that there's this long pause, I think, means there's a lot of work to do.
00:02:32
Speaker
Well, I think that there's a whole series of questions, ethical questions which come out of the kind of new data landscape that we're sitting in. Actually, my starting point is, what is the purpose of the use of the data? And that's often overlooked. People jump to kind of privacy or bias or these sorts of things, but actually,
00:02:52
Speaker
If you ask the public, what is the thing that really drives your response to whether you want to share your data for a particular purpose? It's, are they doing this in some way which is going to help me or help the public good? For example, if you hear of Las Vegas casinos who are estimating your spend threshold as you walk in and then trying to disrupt that so you'll spend more, that is clearly not in the public interest.
00:03:16
Speaker
If you're using data for health research, etc., they're much more likely to be happy for you to do that. So it seems to me public interest is a really important starting point. Yeah, I would just say one simple place to start is to think about the Hippocratic Oath, which is often summarized as first, do no harm.
00:03:34
Speaker
but that's not actually the Hippocratic Oath if you go back and read the real thing. But anyways, that's what it's summarized as. So with data, you could think of not only first do no harm, but how do you make sure that, like, you know, if you're working to sort of 95% confidence intervals, you're very aware of what's happening to the other 5%. And, you know, you can see other kinds of issues of scale and issues of dealing with some of our previous
00:03:59
Speaker
kind of agreed upon practices for being ethical, not really working for data science. Like how do you obtain informed consent when you're talking about
00:04:08
Speaker
you know, a giant corpus of tweets? You can't scale it, and, you know, what if you store it indefinitely? How do you consent someone to that? Does that even make sense? Can they cognitively understand what they're agreeing to? Right. I mean, it seems to me like health data, security data, personal identifying information, Facebook data, for example, or your Facebook settings, I think sort of defining what data ethics means, or being ethical with those data, seems pretty obvious to me, clear cut.
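As a concrete gloss on the 95% point above: a minimal sketch in Python, with invented numbers and hypothetical group labels, of how a headline accuracy figure can hide which group absorbs the "other 5%."

```python
import pandas as pd

# Hypothetical scores: 100 records, overall accuracy of 95%.
# The errors are not spread evenly across groups.
df = pd.DataFrame({
    "group":   ["a"] * 80 + ["b"] * 20,
    "correct": [True] * 78 + [False] * 2 + [True] * 17 + [False] * 3,
})

# The headline number looks fine...
print(f"overall accuracy: {df['correct'].mean():.0%}")  # 95%

# ...but the per-group error rate shows who the "other 5%" are.
print(1 - df.groupby("group")["correct"].mean())  # a: 2.5%, b: 15%
```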
00:04:35
Speaker
But if I were to ask, what would data ethics or being ethical with data mean if I were to apply it to, you know, the census, for example, or sort of a standard, maybe a more standard data set that doesn't have people's names and specifically identifying information. So there's still basic issues around kind of data security, for example, and
00:04:55
Speaker
consent still comes into it. But I think that those are less tricky because we've built up those practices over many years and in a sense with new data sources and trying to combine all the new data sources, that's where we're getting into trickier spaces. The other thing to remember is that
00:05:10
Speaker
our notions of ethics have shifted as technology has shifted over time. It's not so easy as to say, what are the ethics, and let's just apply them. Our notions of privacy have changed over time and will continue to change.

Ethical Challenges in Machine Learning

00:05:23
Speaker
Our notions of, for example, attitudes to sex and morality around sex changed because of the pill. So technology and morality and ethics have always had a kind of to and fro relationship.
00:05:35
Speaker
But what would a person, your sort of standard data analyst, she downloads census data or whatever, when she thinks about ethics or being ethical with the data, it's not just privacy and security. It's thinking about how we treat and think about different groups in the data and do those tabulations and think about different groups and uncertainty around those estimates, right?
00:05:59
Speaker
So if you're using census data, it doesn't have names. Should there be an idea that maybe you shouldn't try to re-add names to that? Because you can do combinations of data sets.
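To make "combinations of data sets" concrete: a minimal, entirely hypothetical sketch of a linkage attack. Two tables that share quasi-identifiers (ZIP code, birth date, sex) are enough to re-attach names; every record below is invented.

```python
import pandas as pd

# A "de-identified" table: names removed, quasi-identifiers left in.
deidentified = pd.DataFrame({
    "zip":        ["20001", "20002"],
    "birth_date": ["1980-03-14", "1975-07-01"],
    "sex":        ["F", "M"],
    "diagnosis":  ["asthma", "diabetes"],
})

# A second, public table (think voter roll) that carries names.
public_roster = pd.DataFrame({
    "name":       ["Jane Doe", "John Roe"],
    "zip":        ["20001", "20002"],
    "birth_date": ["1980-03-14", "1975-07-01"],
    "sex":        ["F", "M"],
})

# One merge on the shared quasi-identifiers and the names are back,
# which is exactly what a "do not re-identify" rule would forbid.
relinked = deidentified.merge(public_roster, on=["zip", "birth_date", "sex"])
print(relinked[["name", "diagnosis"]])
```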
00:06:12
Speaker
So that's a fairly rule level question. Should you just decide that you're not going to try to re-identify people? Maybe that should be kind of an ethical rule, not a principle, but just a rule. That's the thing that we're not going to do. And there's things like that. Then there's things about transparency that come up a lot with data science. How interpretable should this activity be and how much responsibility
00:06:39
Speaker
does the sort of data science community have to make what they're doing interpretable? That's a kind of, you know, as we move into a situation where more and more people are doing jobs in which they are the expert and almost nobody else can understand what they're doing, this isn't just about data science, it's about divisions of labor and, you know, kind of the modern workforce.
00:06:58
Speaker
How transparent do we have to make our work, whether we're data scientists or anyone else, to the typical user? And I think this is, again, where there's something

Policy Work and Data Ethics Councils

00:07:07
Speaker
new. So again, I think the census example is one where we kind of know what we're doing. But with the introduction of machine learning algorithms, that's where you're starting to get a black box. And it's much more difficult to know how you're going to treat that from an ethical perspective. So as you're aware, lots of debates about algorithmic accountability. Is it publishing the code?
00:07:27
Speaker
You're not going to get very far with that. One of the things I've been saying, particularly within the UK context, is that the public sector holds a lot of data and is being now approached by private sector companies who hold these algorithms and they're almost like magic. In the eyes of the public sector, this is magic.
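One reason "publish the code" only gets you so far, and one thing a data-holding buyer can demand instead: black-box probes need only query access to a model. A minimal sketch using scikit-learn's permutation importance on a synthetic stand-in; the model and data here are invented for illustration, not anyone's actual system.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# A synthetic stand-in for a vendor's opaque model: we never look inside it.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance treats the model as a black box: shuffle one input
# at a time and measure how much held-out performance drops.
result = permutation_importance(black_box, X_test, y_test,
                                n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance drop {imp:.3f}")
```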
00:07:48
Speaker
I mean, there are instances of the public sector really giving away the data very quickly because of the magic. And what I've been saying is, actually, you've got it the wrong way around: you have the monopoly, you have the data, and actually these companies are ten a penny, and there is a marketplace of them.
00:08:04
Speaker
In this marketplace, you now must use your procurement power to enforce a whole series of transparencies and kind of governance oversight, etc., because you wield a lot more power than you think you do. And I think if we can get that message across, we will have a far fairer set of outcomes than otherwise.
00:08:24
Speaker
Because a hammer without a nail is just a hammer, right? So what is the RSS doing in the UK right now to grapple with some of these issues? So we're working quite hard at the policy level. So we made this recommendation for a council for data ethics. And we persuaded, well, there's an independent foundation called the Nuffield Foundation. Twenty years ago, it set up something called the Council on Bioethics, at a time when that was a really kind of
00:08:50
Speaker
big issue and that body did a great job of kind of doing research and reports in that area. So we persuaded them to set up a council for data ethics in a similar way. So that's just being set up right now. We've been giving evidence to our parliamentarians. There's an inquiry running on algorithms and decision making, so I gave evidence to that. There's also one in our House of Lords on artificial intelligence and similarly we've been telling them, you know, how we thought that
00:09:19
Speaker
from a kind of public interest perspective, all of this should play out. So how do you build up the guidelines or the principles? Are you pulling together existing guidelines that other places have created, or are you starting sort of from scratch? So I think the new council for data ethics is going to do that work, as they will be properly resourced to do that.
00:09:39
Speaker
And I mean, one of the things that's come out of the sessions today, which I found quite useful, Chris Wiggins was talking about, you know, thinking about this at the level of principles, at the level of standards and then at the level of rules. And I would hope that anything that comes out is at that sort of level of, on the one hand, what are the kind of really big level issues and principles that we can agree upon.
00:10:02
Speaker
And then how do you then take that out to standards and then further down to rules? But I think all that's to kind of play for. Right. He also talked in your session earlier about the enforcement side. So does the RSS then, once it's defined, is there an enforcement
00:10:17
Speaker
protocol that you'll be working on, and is it enforceable? So no, I mean, I think that's the whole point, that I think is different to law. At the European level, we're already about to see some new legal changes with the General Data Protection Regulation coming into force, and that is actually going to bring a lot of good practice in, but at the same time

Professional Guidelines and Education

00:10:36
Speaker
it leaves open a number of ethical questions, including, I mean, under GDPR, once you've got consent, you can still do whatever you like with the data. Well, there's still some things that we would say, you know, ethically you might not want to do with the data. But, you know, what are the enforcement mechanisms? It's going to be good practice guidelines. And, I mean, we're a professional body, and we've found that our codes of conduct are things that statisticians can turn to their clients or to their superiors and say,
00:11:05
Speaker
you know, this is against my professional judgment and I've got it backed up by my professional body. And I would hope over time, I mean, at the moment there is no professional body for data science, but, you know, we're helping to kind of create that space. And as the codes of conduct, etc., come out, we hope that that will empower the profession. Right. Do you see the Royal Statistical Society as being the right professional organization to address data science?
00:11:29
Speaker
So we see ourselves as being able to do some of it. We have a data science section and a machine learning network, but we don't claim to own the space, and others are kind of welcome to play in that space too. So, you know, who knows? It may well be that a new body is created in due course, but there's no sign right now in the UK of that happening. Laura, you have started your data ethics course at NYU. So can you talk a little bit about what that means? How do you teach people to be ethical with data?
00:11:58
Speaker
Well, so one of the things we do is we admit that ethics is not new. So we start with ethical philosophies, and it's really, really frustrating to data scientists and to a fair number of people who are expecting me to say, okay, here are the ethical principles we're going to establish, and then we're going to move on with the rest of the course, assuming that we agree on these ethical principles, because in ethical philosophy, you've got very different
00:12:23
Speaker
principles available to you that are still being deployed. It wasn't as if one of them won out some hundred years ago; we've continued on with that. So that's been really unsettling. And some of the class just dropped after I said, you know, I'm not going to just be able to teach you, here are the rules. We actually had a fair amount of drop-off from the first week to the second week. Well, they really wanted that concreteness. Yes, they wanted the concrete: I'm going to tell you how not to screw things up. So you can see there's a lot of anxiety
00:12:49
Speaker
But there's not necessarily the same appetite to really engage with these questions and be kind of on the front lines of what is happening. On the other hand, after the second class, and you know, I've been teaching for about seven years now, all sorts of different classes, not that this is the first time for this class, but the students were so excited about the class, they stayed after class in the classroom for an hour and 15 minutes extra, which is an entire extra class session.
00:13:16
Speaker
And then they still wouldn't, you know, on the sidewalk outside of the building, they still were talking. So the people who are into it, you know, this is the right time to be in these classes, to be on these councils, to be trying to put together communities of interested people to shape what these ethical principles are going to look like, what the rules are going to be.
00:13:35
Speaker
and how they're going to be deployed.

Global Initiatives and Sectoral Efforts

00:13:37
Speaker
I think it's great that they're coming out of a professional body. I think that's the right place for them to be because that's likely where they're going to be the most effective and the most flexible. If they were coming out of a legal body, they're not going to be that flexible. So I think that's really the right way to go.
00:13:51
Speaker
Having said that, I think what you're saying does really point to that need that people who are not necessarily going to immerse themselves in the topic still want to know, when I go into the office on Monday, what is it I need to do? And that's the job, I think, over the next couple of years for everyone working in this field to say, yes, it's complicated, but what is it that we can point to?
00:14:12
Speaker
Yeah. And I mean, we're not alone here. The ACM, the IEEE, there's a group with Gideon Mann at Bloomberg, and just group after group after group. You know, we're not geniuses. We're just your average data science person who also has some kind of moral compass, who is looking around and saying, wait a second, I think
00:14:34
Speaker
I think we can do better. As well as developing these frameworks, I think that there's a role for some of these new bodies to actually take real problems, have them referred to them, and to almost develop what you might call case law around them. These are new issues.
00:14:52
Speaker
There aren't always right answers, but if somebody thinks them through and can say this is why we came to this conclusion, then others could use that. I think if we could make this a kind of transparent space where people are saying this is how I'm thinking about it, we will get good debates and we'll be better off in the long run.
00:15:09
Speaker
Are there similar efforts in the US or other countries to the RSS's effort to build these rules or regulations, or this amorphous thing? Are there other organizations? So I think Laura's talked about some of the other initiatives that are going on. But I don't know of anything quite like the thing that the Nuffield Foundation is setting up. It does seem to me in the UK that there is quite considerable policy interest,
00:15:37
Speaker
which, I think, you know, we've helped to kind of gather the momentum behind, but so have many others. So, yeah, I hope that, you know, we'll take a leading edge and that others can kind of
00:15:47
Speaker
build on that and take what they want from it. New York City has become the first city to establish a task force to understand whether or not the city should make transparent all of the algorithms that it ever uses in any kind of decision making and do additional kinds of work around data sets themselves, which cities are
00:16:09
Speaker
one of the biggest sources of open data now. It's sort of about time that we had something like that. Do you ever worry that focusing on data science, computer science, the sort of technical fields where people are working with the newer and bigger sorts of data, means that maybe other areas of study may not be aware of these issues and are not paying attention?

Expanding Awareness of Data Ethics

00:16:31
Speaker
I'm, you know, coming from an economics background, and, you know, economists are starting to use social media data, starting to use these other types of data, but maybe this is not on their radar. And how do you ultimately get them to pay attention to this?
00:16:43
Speaker
That may well be true. I suppose it's going to be about cascading these things out. And it's noticeable that data science courses are now popping up all over the place and working with other disciplines, because data science in and of itself is empty. It's got some methodology, but beyond that, it's got to work within a disciplinary framework. So I would hope that it cascades out that way. That is one of the things that universities that are running data science not as a department but as sort of a center
00:17:13
Speaker
are doing: the centering work is pushing this kind of thing out. Economics in particular seems to be a difficult discipline to engage. There was an article in the New York Times that made the claim that economists are the least likely to be doing interdisciplinary work. That sounds about right.
00:17:32
Speaker
I mean, in the UK, there's a whole interesting movement around rethinking economics, where students rebelled in Manchester and asked for a completely different kind of series of courses. There's also the CORE economics course, which is a kind of new version of the curriculum. So there are efforts, I think, at bringing more heterodoxy into it. Right. Rethinking Economics, the Institute for New Economic Thinking.
00:17:55
Speaker
It's out there, but I think that's a small thing. Yeah, absolutely. So going forward then, along with the cascading, I assume that you would both view a data science educational framework, a school of data science, a degree in data science, that a data ethics course or seminar or something would be a core requirement as part of something like that.
00:18:17
Speaker
Yes, it's great to have a course that deals with ethics, but it would probably be even better if there was a course like that that has some technical component to it, but that all of the people who are teaching methods are aware of this and are reinforcing it as they go through and are making sure that where they have a chance to
00:18:39
Speaker
to, you know, teach a method using data that's got some personally identifiable stuff, so they say, okay, first we're going to do this, now we can learn this method, rather than, say, taking, I don't know, galaxy data, star data, which doesn't have any of those issues, or taking a data set that's already been cleaned. Like, let's make sure that we're
00:18:57
Speaker
reinforcing this all the way along, because I think the tendency sometimes is for these types of courses to seem like they're the soft courses or they're somehow an adjunct to the main show. You have to get out of the computer lab and think about philosophy. What I'd really like to see, I think, is not just a sense that ethics stops us from doing stuff; it's that ethics guides us towards some really useful things that would kind of drive social change.
00:19:25
Speaker
What are the examples of data being used for public good: to drive the sustainable development goals, to stop animal poaching, kind of using satellite data to see how we're dealing with poverty, et cetera, et cetera. So there's some really exciting things which I think could help students see data for social good rather than seeing it as a kind of barrier to change. Yeah, that's great. Let's end on a positive note. Thanks so much, both of you, for coming on the show. Thanks very much.
00:19:54
Speaker
And thanks to everybody for tuning in. If you have questions or comments or thoughts, there'll be lots of links on the show notes page, so please do let me know. So until next time, this has been the PolicyViz Podcast. Thanks so much for listening.