Introduction and Importance of Data
00:00:13
Speaker
Welcome back to the PolicyViz podcast. I'm your host, Jon Schwabish. Now, as you know, we spend a lot of time on this show talking about data visualization, but you can only visualize data when you have data. And maybe too many of us don't think carefully enough about the data that we have and how it's collected and ultimately how it's being used. And so on this week's episode of the podcast, I'm really excited to have Sarah Williams, who is the author of the new book, Data Action.
00:00:42
Speaker
And it's a really great book. I really, really enjoyed
Sarah Williams' Background and Focus
00:00:45
Speaker
it. I think it's one of those must read books to help all of us think more carefully about where our data come from, how data are collected, and how we can pull together new data sources that maybe don't exist. So Sarah and I talk about her book. We talk about her work. And you'll hear lots of interesting stories that are presented in that work. And so I hope you'll enjoy it. So here is my chat with Sarah Williams.
00:01:13
Speaker
Hi, Sarah. Welcome to the podcast. So great to chat with you. Really great to be here. Thanks so much for inviting me. I'm really excited to chat with you. Um, I have just finished your book, um, Data Action. It is, um, I don't want to oversell it, but it is incredible. Um, it is a great book. It takes folks through this entire process and I really want to, uh, dig into it. Um, so, before we start, maybe you could talk a little bit about yourself and your background, and then we can, uh, talk about your current work and the book itself.
Influencing Policy with Data Insights
00:01:42
Speaker
Great. So right now I am an Associate Professor of Technology and Urban Planning at MIT. And I also run the Civic Data Design Lab, which communicates the insights of data to broad publics to affect policy change. And one of the main ways I do this, as you probably know from my book, is through data visualizations.
00:02:10
Speaker
My background, I'm trained as a geographer, an architect, an urban planner, and a data scientist, and also am a computer scientist as well. And so the work of the lab really mashes up those skills to translate data into a tool.
00:02:34
Speaker
for action. And I really think, you know, my early training as a geographer, which really focuses on, you know, using data to understand how people relate to place and how place affects society has been a huge influence on
Overview of Data Action Methodology
00:02:56
Speaker
my work. But I've always felt like
00:02:59
Speaker
When we're using data or as, let's say, researchers, when we find insights of data, those messages aren't always communicated to the policy experts or the planners or the people who are making decisions. So one of the things that I really try to focus on is how can we bring those insights of data to people who really need to make
00:03:25
Speaker
important decisions around cities, but also other kinds of policies, whether it be criminal justice or things related to internet use, which are all part of my work.
00:03:42
Speaker
Right. First off, you have this amazing background. So if we had time, we could go into how you're able to manage all of those different skillsets, which is incredible. But you mentioned that you do a lot of your work through data visualization and there's a chapter in the book on data visualization, but
00:03:57
Speaker
What I love about this book is it takes a more thorough look at the entire sort of data infrastructure. So you have this model or method of data action. And so maybe you can just walk us quickly through what that process looks like for people because the data visualization, although it's really important and I think you make a great case of how important it is,
00:04:23
Speaker
especially in our current era of content streaming by, it is sort of at the end of the road after you collect the data and analyze the data and work
Training Researchers for Community Engagement
00:04:33
Speaker
with the data. So can you sort of lay out this model for us? Yeah, thanks so much for asking me. Well, I think one of the things that I think is really important in the model is that, you know, really to take action with data, we need multidisciplinary teams.
00:04:52
Speaker
Um, which allow us to ask the right questions of the data. So bringing together a data scientist, a policy expert, a visualizer, um, and also people from the community which we hope to affect really helps to create, let's say multidimensional projects, but also, um, answers to those projects. So I really start by, you know, bringing those teams together to identify the right question.
00:05:22
Speaker
Then I really think about how do we quantify it? Perhaps how do we create or capture new data that isn't available or missing?
00:05:33
Speaker
And then how do we create insights or build insights off of that data set, visualize that information through data visualizations or maps, and then really ask the people in the data whether what we are showing them rings true. So I call this kind of a ground truthing moment, both ground truthing to see
00:05:57
Speaker
if the insights of what we found on the ground are true, but also whether the insights we've developed make sense to the communities which we work with. So that's really why it's important to have these teams.
Community-based Project in Nairobi
00:06:11
Speaker
A lot of that, let's say ground-truthing process does come through data visualization and then modifying our results based on those kinds of ground truths
00:06:23
Speaker
assessments and then starting all over again, uh, to rebuild the model, to get it to be a bit more accurate. But I think like really at the heart of it, it's the teams that make the action really impactful. Um, because each one of them has their community of interest in which they bring the insights of our analysis to action oriented results.
00:06:53
Speaker
So I love this concept of the multidisciplinary approach. I think it's something that we all need to embrace more in our work. You mentioned just now, and it's throughout the examples in your book, this idea of talking to communities or people that are the
00:07:09
Speaker
focal point of the work or who the work will benefit. And I'm curious, coming from a more quantitative background, how do you get colleagues who are more quantitative to be, in some sense, more qualitative? How do you get them to go into the community? In the book, you have a number of examples, I think we should talk about a couple of them, about getting students to go out there and interview people. And I'm curious,
00:07:34
Speaker
do you try to get, you know, folks training, or is it really just, let's go out in the street and just talk to people? Um, well, I think that's what, again, like, you know, usually there's somebody on our team who really specializes in that work, and really I ask them to train the students or the researchers on the project to do that, um, talking with communities. Um, and so
00:07:59
Speaker
you know, bringing their knowledge of how to talk to communities to the team helps everybody kind of learn a new skill. In my case, I know how to do that in my background. So sometimes that's me doing the training. Sometimes it's our
00:08:18
Speaker
policy experts. Um, I'm just thinking about a project that we did in Nairobi. Um, and I met with the students about wanting to talk to people in the community and they actually were like, Sarah, you know what? I think we can do this on our own because you being there observing kind of looks like an outsider. And I think you've given us enough to know what to do. And, like, I was super excited about that. They went out into the community. This was a project where we were
00:08:48
Speaker
asking about internet use in some of the lowest income communities, wondering what was access like. We had information about access to the internet and download speeds, and we wanted to see if it really rang true to those community members. And I was just really excited to see the students at university in Nairobi take that on after several conversations.
00:09:11
Speaker
Right, yeah, that is, that is incredible. So why don't we talk about one of those, one of those projects, because there is, in a couple of the chapters, there's a discussion of the Matatu infrastructure, I guess, in Kenya, and you talk about the collaboration between the data folks, between the government, between
00:09:32
Speaker
Uh, the drivers and the public and, and just how that almost call it like a community based project really ended up in, in this successful final product. So can you talk about, about that
US Project Example and Collaboration
00:09:44
Speaker
process a little bit and then, what that product was? Yeah, great. So I think this is like one of my favorite projects in the book, probably why you see it in multiple chapters. And, um, I think it really illustrates the data action methodology quite well. So this project,
00:09:59
Speaker
you know, has a long life. It started, I started working in Nairobi in 2006. And with many rapidly developing cities, transportation is a huge issue because the infrastructure hasn't caught up with the development of the city itself. It experiences crazy traffic jams. And I've been working on transportation models in the city, but didn't have
00:10:27
Speaker
you know, essential information about Matatus, which are the main form of transit in the city. They are small minibuses, either 14- or 30-seaters, that really
00:10:41
Speaker
are how people get around in Nairobi and they're owned by private operators. We just didn't have data about where they went, but they represented close to 80% of the vehicles on the roadway. So you're imagining if you're putting a transportation model together, you really need to have that.
00:11:01
Speaker
data. So we set out on a project to collect that data set and really from the beginning included people in the transport sector that were interested, the government, the Matatu drivers, their owners in a process of how we were going to collect this data. And ultimately what we created was
00:11:25
Speaker
An app that you have on your cell phone and working with University of Nairobi students and my colleagues at the University of Nairobi computer science department created an app for the local context in which we collected data on all the routes and stops using the cell phone information. And I think what's like super interesting about the project is
00:11:51
Speaker
Everybody was involved to a greater or lesser extent. The government knew about it, but were largely disinterested. But then once we had the data and translated it into a map, something that you might see
00:12:09
Speaker
in London or New York or Washington, D.C., they got very excited about the project, the government did. Well, actually, the project went quite viral once we put the maps out themselves. And I think what's interesting is the government instantly made it the official map of the city. And I think that they were able to do that because they trusted our process, they trusted
00:12:39
Speaker
the way that we went about the data collection because we were very transparent and included them in it all along. And so they felt really like they had ownership of that data too.
Power Structures and Data Visualization
00:12:52
Speaker
Huh. So you've done a lot of work around the world. Have you found in projects in the United States the same necessary interaction between these various stakeholders? I mean, it seems like it would be universal. But that story in Kenya just seems like such the big success story that everyone would point to. I'm just curious, have you seen or have you been involved in other projects that have the same kind of success?
00:13:21
Speaker
and really focusing on the interaction of all these different groups. Oh, absolutely. I have a number of them that I write about in chapter four, as you alluded to. I think one project that I've worked on in particular was a project called the Million Dollar Blocks, where we took intake data from the prison system and looked block by block at how much it costs to incarcerate people from those blocks.
00:13:52
Speaker
And we found over a million dollars is spent to incarcerate people from many blocks in New York City, but the same level of investment is not given to schools, to job training programs, to the systematic things that might alleviate the reasons that these individuals might be involved in criminal activity to begin with. And so kind of looking at reinvesting the money into the community
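The block-by-block tally described here can be sketched in a few lines of Python. This is only a minimal illustration, not the project's actual method: the records, the block IDs, the annual cost figure, and the idea of estimating cost as sentence length times a flat annual cost are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical intake records: (home block ID, sentence length in years).
# These values are invented for illustration, not real data.
INTAKE_RECORDS = [
    ("block-A", 4.0),
    ("block-A", 6.5),
    ("block-A", 12.0),
    ("block-B", 1.5),
    ("block-C", 3.0),
    ("block-C", 2.0),
]

# Assumed flat annual cost of incarcerating one person, in dollars.
# A placeholder, not a sourced figure.
ANNUAL_COST_PER_PERSON = 60_000


def cost_by_block(records, annual_cost):
    """Sum the estimated incarceration cost for people admitted from each block."""
    totals = defaultdict(float)
    for block_id, years in records:
        totals[block_id] += years * annual_cost
    return dict(totals)


def million_dollar_blocks(records, annual_cost, threshold=1_000_000):
    """Return the blocks whose total estimated cost meets or exceeds the threshold."""
    totals = cost_by_block(records, annual_cost)
    return sorted(block for block, total in totals.items() if total >= threshold)


if __name__ == "__main__":
    for block, total in sorted(cost_by_block(INTAKE_RECORDS, ANNUAL_COST_PER_PERSON).items()):
        print(f"{block}: ${total:,.0f}")
    print("Over the threshold:", million_dollar_blocks(INTAKE_RECORDS, ANNUAL_COST_PER_PERSON))
```

In a real analysis the per-block totals would then be joined to block geometries and mapped, which is the step where the visualization comes in.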
00:14:19
Speaker
But that project, we had a deep partnership with policy experts, with the communities that we worked with, with architects and designers. And I really think it's a great example of, you know, how these kinds of collaborative projects can really have an impact. And in this case, our maps were seen
00:14:43
Speaker
by a congressman and our data visualizations were seen by a congressman who actually used them as evidence for the Criminal Justice Reinvestment Act, which allocates money for reentry
00:14:56
Speaker
programs, which are job training programs after people leave prison. So trying to reinvest in the community itself. But I think what's interesting about that project too is it had a very long life. And in fact, it's come up a lot recently in all of the conversations around defunding the police because part of the message was let's reallocate spending towards
00:15:22
Speaker
you know, the systematic effects in these communities, which is a lot of what that conversation around defunding the police is. So I see the maps get used all the time still, even though that is over, you know, over 15 years old. It all depends on how, you know, having this collaboration gives a project life that can have an impact much longer than its initial scope. Yeah. Yeah. I mean, it's really interesting to have,
00:15:51
Speaker
how that collaboration can give a project success and give it life.
00:15:58
Speaker
Another one of the things that you talk about throughout the book, I think, that comes through is data sources, data projects, data visualizations that reinforce power structures or reinforce racism. And in the chapter on data visualization, you talk about specifically spend some time talking about maps and how they're inherently political. And what you include and what you don't include illustrates these social contracts. They illustrate power structures. And I guess I'm curious on a couple
00:16:28
Speaker
perspectives on this. One is, I guess really in your work, how do you approach these data visualizations when you are trying to solve these important questions? So you mentioned the Matatu map in Nairobi.
00:16:44
Speaker
And I'm curious, you know, when you take a map like that and you model it after or someone models it after the London tube or the DC Metro, like does the team say we should have a different look and feel for this because we don't want it to, you know,
Ethical Considerations in Data Visualization
00:16:57
Speaker
look like, you know, Western, you know, Northern European structures and that sort of thing. And then how do you think about as you're as you're doing your work, how do you think about creating visualizations that are receptive and conscious about these these different power structures?
00:17:12
Speaker
So I know that's a big question, but it's really just trying to get at you spend a lot of time in the book talking about and I think it's such an important issue, especially today and I'm just curious if you could sort of just spin on it a little bit, I guess. No, it's a fantastic question. I'm so glad you asked it too. I think in particular, the Matatu project is a great example of that.
00:17:34
Speaker
you know, visualizing information isn't always a great thing in some contexts, right? I think in the case of Nairobi, you know, visualizing the routes and stops of these informal systems really had a benefit to the public. But you can imagine in some other contexts where this might not be good. And I guess, you know, in some cities,
00:17:59
Speaker
you know, visualizing that data would cause a crackdown on these vehicles who, you know, sometimes don't have proper licenses and so forth. But, you know, a crackdown wouldn't necessarily benefit the people who live there because they depend on these transportation systems to get around. Right. Right.
00:18:20
Speaker
You might not want to expose them. And we have had, you know, since we've done the Matatus project, we've had many cities come to us being interested in doing this work. And there are certain cities that exposing it would not be a benefit to those organizations. And we haven't done the visualizations because of them. So, you know, you always have to think, I guess, at the heart of it is
00:18:47
Speaker
what you're doing cannot do harm to anyone, right? And in the case of Nairobi, I think in fact, it provided a huge benefit in terms of, you know, providing an essential resource to the public, but also to the city, but also to, you know, transportation analysts who are trying to model and improve the transportation system.
00:19:10
Speaker
but in other contexts that might not be good. I would also say in the case of Nairobi, we worked a lot with the Matatu drivers, owners, and the community to think about a strategy for visualizing the
Challenges in Data Collection: Beijing Olympics
00:19:22
Speaker
data. And one of the things that they wanted to use the visualization for was to increase funding resources to this, you know, semi-formal system. And by having an association with
00:19:37
Speaker
let's say in London or Paris, and by making it really showing that it is a very similar system to those, really helped get that support from those NGOs and outside actors to help improve the system itself. So it was, in a way it was strategic. But I guess actually what was the side benefit is, I think,
00:20:05
Speaker
It became kind of iconic in Nairobi. We had a sweatshirt designer contact us, and it kind of went viral and became something that people in Nairobi are really proud of as well. Just like we are proud of the New York City map or many people are proud of their own city map too.
00:20:26
Speaker
Yeah, so I want to finish our chat by sort of going to the beginning, close to the beginning of the book, which is on the data collection side, because I think a lot of people who work in data visualization are focused on, you know, what's the end product? How do I visualize the data? But you spend time throughout the book in the various chapters talking about data collection.
00:20:47
Speaker
And there's a really interesting story about collecting pollution data in Beijing during the Olympics. So maybe you could tell folks about that project. But I think my real question of interest is,
Maintaining Accuracy in Crowdsourced Data
00:21:00
Speaker
When you are leading or participating, I guess, in any sort of crowdsourced data collection effort, how does one, or how do you, Sarah, as the project lead, manage the accuracy of the data? So it's in some ways easy for us to go out with our cell phones and do, you know, geo tracking, but how do you ensure that the data that are coming in are accurate,
00:21:23
Speaker
so that when you get to that last stage of the data visualization, that we can have faith in it, that it, you know, it is accurate even though it's been collected by hundreds or maybe thousands of people. Um, it's a great question. Um, you know, in many of the projects, and even in the Digital Matatus project, while that data was crowdsourced, we did have volunteers working on
00:21:46
Speaker
the data who understood how to collect it. And then we had a team that actually checked the data afterward to ensure its accuracy. Yeah, so in the case of the Beijing Olympics project, we were really interested in trying to get data on air quality levels in Beijing and what's really
00:22:12
Speaker
surprising is that just weeks before the Olympics, there was no data released by the government on air quality levels. And we know Beijing and we hear about the pollution. Obviously, there was quite a concern from the athlete community, but also just quite a bit of interest from the press and trying to identify
00:22:34
Speaker
what effect that pollution might have on the health of athletes. So we teamed up with the Associated Press to collect data on air quality. And I developed a sensor, a mobile sensor that could be used by the reporters during the Olympics.
00:22:53
Speaker
One thing to note is that mobile sensors are not as accurate as, let's say, kind of these very big systems that we might see from the US EPA, but they do provide relative accuracy. And in the case of Beijing, where the air quality was so significantly bad,
00:23:14
Speaker
that relative error level was fine. And just to give you an example, like the average air quality in New York City or London ranges from 10 micrograms per cubic meter to 15. 15 is considered like a bad day or 20 is a bad day. In Beijing, we were getting recordings of 800 micrograms per cubic meter. And this is particulate matter I'm talking about right now.
00:23:43
Speaker
Some days it's 200. It was so extreme in how bad it was. Having some error in the device was fine because we were showing or exposing an extreme condition.
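The reasoning here, that an imprecise sensor is good enough when the signal is extreme, can be checked with a little arithmetic. The sketch below is an illustration only: the 20, 200, and 800 figures come from the conversation, while the 50 percent error bound and the simple plus-or-minus error model are assumptions, not specifications of the sensor that was used.

```python
# Figures mentioned in the conversation, in micrograms per cubic meter.
BAD_DAY_LEVEL = 20
BEIJING_READINGS = [200, 800]

# Hypothetical worst-case relative error for a low-cost mobile sensor.
# An assumed value for illustration, not a measured specification.
ASSUMED_RELATIVE_ERROR = 0.5


def reading_range(reading, relative_error):
    """Return the (low, high) interval for a reading under a +/- relative error."""
    return reading * (1 - relative_error), reading * (1 + relative_error)


def clearly_exceeds(reading, reference, relative_error):
    """True if even the lowest plausible value is still above the reference level."""
    low, _high = reading_range(reading, relative_error)
    return low > reference


if __name__ == "__main__":
    for reading in BEIJING_READINGS:
        low, high = reading_range(reading, ASSUMED_RELATIVE_ERROR)
        verdict = clearly_exceeds(reading, BAD_DAY_LEVEL, ASSUMED_RELATIVE_ERROR)
        print(f"reading {reading}: plausible range {low:.0f} to {high:.0f}, "
              f"above bad-day level even at the low end: {verdict}")
```

Under these assumptions, a reading of 200 could be as low as 100 and still sit far above a bad-day level of 20, whereas a reading of 25 from the same sensor would not support any firm conclusion.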
00:24:00
Speaker
So is the view then that some data in this case, in this case, because there is no data and it's, I mean, in particular, it's, you know, it is health of people and athletes. Um, in this case, is it, you know, some data, even if it's not perfect, even if it's not, you know, government regulated, some data is better than no data at all.
Data Findings Leading to Policy Change
00:24:21
Speaker
And in fact, you know, in this case, having just a dataset that says just at what range we have information have a huge impact on taking action, but just making people aware. And I think that
00:24:40
Speaker
in a lot of the community data projects that I talk about in the book, you know, having some information about the air quality, even though the sensors might not be high quality, has then allowed people to come out and put high level air quality sensors into the community. So one example that we use in the book is
00:25:01
Speaker
a community that was convinced that fracking was causing poor air quality in their community due to exhaust and some other, let's say, mechanisms in which fracking occurs. And they put low quality mobile sensors and they were able to indicate
00:25:19
Speaker
that in fact there were higher levels or let's say poor conditions as opposed to other areas and they were able to use that data to get the EPA to come back and put in even higher grade sensors which did in fact prove the poor air quality in the community and
00:25:40
Speaker
and had an effect on regulating some of the devices that are used in fracking.
Seven Principles for Ethical Data Use
00:25:46
Speaker
So you can see here, you have this sensor that's maybe not the highest quality, but it provides data and ultimately was able to have a huge impact on policy and a huge impact in that community itself.
00:26:00
Speaker
Right. So I want to just wrap up and just, I guess just ask like, where do you hope this book will be used? How do you hope people will be able to use it in their own work and apply it?
00:26:15
Speaker
Yeah, I mean, I think that's a really good question. I mean, one of the reasons that I created this book is, you know, you know, we're all really excited about data and its potential use, but I really wanted to give people guidance on how to use it ethically and responsibly. And, you know, we've heard a lot of critique
00:26:38
Speaker
about the misuse of data from people like Virginia Eubanks talking about Automating Inequality or Cathy O'Neil, Weapons of Math Destruction. But here I wanted to, within that criticism, provide guidance of ways that you can use it for good.
00:26:58
Speaker
And so I hope people take this book and create their own projects that really start thinking about how we can use data for the public benefit. And at the end of the book, I create what I call the data action principles, which are seven principles that I think
00:27:19
Speaker
all data enthusiasts should be thinking about when they attempt to use data for good. And perhaps I can mention those now. One is to say, do no harm. We must interrogate the reasons we want to use data
00:27:37
Speaker
and determine the potential for our work to do more harm than good. That kind of gets back to what I was talking about in Nairobi: does visualizing that data have the potential to do harm? So we should always ask ourselves that. Two, we should build teams to create narratives around data for action. It's essential for communicating results effectively.
00:27:59
Speaker
The third principle is change power dynamics. Building data helps change the power dynamics inherent in controlling and using data. Those examples that I talked about in terms of air quality really did change those power dynamics.
00:28:18
Speaker
Four, expose hidden systems. Coming up with unique ways to acquire, quantify, and model data can expose messages previously hidden from the public eye. However, we must expose ideas ethically, going back to the first principle. So, you know, when I talk about the Million Dollar Blocks Project, we really exposed, you know, the costs of incarcerating people.
00:28:40
Speaker
Five, ground truth. We must validate the work we do with data by literally observing the phenomena on the ground and asking those in the data set how our results can be interpreted. Six, we should share data. And I talk about this a lot by sharing the insights. We can really create change in policy. And then seven, create your own ethical standards. Remember that data are people
00:29:08
Speaker
And we must do them no harm and we must seek to develop our own standards of practice. I think this really gets at the fact that, you know, technology moves much more quickly than we can create standards of use. And it's up to us to create those ethical practices and data scientists to really develop them along with the technological development itself.
Conclusion and Inspiration for Impactful Data Projects
00:29:34
Speaker
Yeah. Well, that is, um,
00:29:37
Speaker
That's an awesome list right there. And I think I'll just, you know, I'll say again, I mean, I'm a big fan of the book and just, you know, big fan of your work. And I think people could learn a lot by following those, those seven steps and checking out the book. So thanks so much for, for coming on the show and chatting with me. It's been great talking. Yeah. Thanks so much for having me. And I'm so glad you liked the book and
00:30:01
Speaker
Really, I just hope that the book inspires people to create their own data projects and use data for action. So thanks so much for letting me share that with you today. That's great. Okay. Thanks so much, Sarah. I appreciate it.
00:30:17
Speaker
And thanks everyone for tuning into this week's episode of the show. I hope you enjoyed that. I hope you'll check out Sarah's book, Data Action. It's a great read and really will help you think about all the ways in which data can be collected and the issues that we should consider when we are visualizing our data. So until next time, this has been the PolicyViz podcast. Thanks so much for listening.
00:30:38
Speaker
A number of people help bring you the PolicyViz podcast. Music is provided by the NRIs, audio editing is provided by Ken Skaggs, and each episode is transcribed by Jenny Transcription Services. If you would like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify, or wherever you get your podcasts. The PolicyViz podcast is ad-free and supported by listeners. If you'd like to help support the show financially, please visit our Patreon page at patreon.com slash PolicyViz.