Episode #29: Discussion on Open Data

The PolicyViz Podcast

Welcome back to the PolicyViz Podcast! A special episode this week–a panel discussion, as it were, about open data. This conversation was sparked by a presentation I gave at the 2015 Socrata Customer Summit about the need to make open...

The post Episode #29: Discussion on Open Data appeared first on PolicyViz.

Transcript

Intro & Overview of MICA's Graduate Program

00:00:00
Speaker
This episode of the PolicyViz Podcast is brought to you by the Maryland Institute College of Art. MICA's online graduate program in information visualization trains designers and analysts to translate data into compelling visual narratives. Join expert faculty such as Andy Kirk, Marissa Peacock, and Jon Schwabish to mine the data and design the story. For more information, go to mica.edu backslash MPS in Viz.

Introduction of Guests by Jon Schwabish

00:00:38
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. Thanks so much for tuning in. I have a very special episode this week with four guests to talk about open data and whether we can do a better job of opening data to more and more people. So what I want to do is have each
00:00:54
Speaker
one of my guests sort of introduce themselves quickly, and then we're going to jump right in, because this is going to be an experiment in how many people you can fit on a podcast at one time and whether listeners can identify each of those voices over the course of 30 minutes. So why don't we start with David.

Guest Backgrounds and Experiences

00:01:10
Speaker
David, do you want to introduce yourself?
00:01:11
Speaker
Hi, yeah, so I'm David Eads. I'm a data reporter and a web builder at NPR. And I think since it's relevant here, I've also been a federal contractor. I helped start a community technology program called Free Geek Chicago. And I'm also an original founder of an organization called the Invisible Institute, which actually just broke the story about Laquan McDonald, the police shooting in Chicago.
Great. Our second guest is Chris Whong. Chris?
00:01:40
Speaker
Hi, I'm a New York City-based civic hacker, open data enthusiast, and...
00:01:46
Speaker
Yeah, just basically a web developer, technologist, someone who hacks with a lot of open data. I have a background in urban planning and GIS and cartography, so I'm always dabbling with things, especially in location and place-based analysis. So that's me.
Great. And third, Sharon Paley from Johns Hopkins.
Yeah, hi. I guess I'm the one with the most distinguishable voice today. Sharon Paley, I'm the
00:02:16
Speaker
deputy director of the Johns Hopkins University Center for Government Excellence, which provides hands-on technical assistance, online curriculum, and resources for governments to leverage data
00:02:28
Speaker
to better improve the lives of their residents as well as improve internal decision-making and policy creation. I am the co-founder of Hack Baltimore, which is a non-profit platform where the public can assist Baltimore and the state of Maryland and other jurisdictions in creating solutions to civic problems. And I am a gubernatorial appointee to the Maryland State Council on Open Data.
Great. And last but not least, we have Andrew Nevis.
Hi.
00:02:57
Speaker
That's something to follow. I've been in government for about 20 years as a city CIO, a statewide agency, a technologist, a federal contractor, and a major city police department CIO. And I've focused on data, both internally and externally, making it accessible to both populations. So this topic is of huge interest to me.

Discussion on Making Open Data Human-Readable

00:03:22
Speaker
Great. Well, thanks everyone for coming on the show. So let's start with how all of us got together. This all sort of started with a comment I made on Twitter following a talk I gave at the Socrata Customer Summit. And specifically, I was talking about how open data in general needs to move from a world in which
00:03:39
Speaker
open data are not just machine readable, but they're human readable. And what I was talking about in that talk was showing examples of a lot of these different platforms where data are open, they're in platforms like Socrata, like OpenGov, even Tableau.
00:03:58
Speaker
But they're not really readable for sort of your average everyday person who may want to find out how long it's going to take for them to commute to work or how they can improve parking around their cities. And the five of us were sort of talking about this for a while, so I want to start there and move on. So I guess the first question really is, is my assessment correct that open data needs to move from
00:04:21
Speaker
needs to become more human readable for sort of the everyday average person who just wants to know more about their cities or their lives.
00:04:29
Speaker
So I'll jump in, because I think I might have been one of the first to respond to it. I think my perspective on all of this is that open data, by and large, does become human readable. It just takes a few extra steps. And we see that everywhere. We see the transit apps in our pocket. And we see open data making its way into
00:04:53
Speaker
you know, newspaper articles and things like that. But all these are byproducts of the open data source. So my argument here, and I'll make the analogy of housing, is that if you wanted everyone to have a house and you're giving away free lumber,
00:05:09
Speaker
There's a missing step there, right? You need the subject matter expert that is going to take that raw material and turn it into a finished product and then deliver it to, I guess, a less technical audience. That's my analogy, I guess, for Open Data, is that I think we are polluting the
00:05:31
Speaker
or polluting the subject of open data with charts and dashboards and things because all these I think get in the way of raw access to data. And it's not to say that they're bad things or that they shouldn't be there. I'm not saying it should not be the role of government to provide these things to a normal citizen. But that should be a very different objective and a very different policy and a very different business activity than publishing of raw data. That's my point.
00:05:59
Speaker
So I mean, I guess I don't disagree with Chris as strongly as I did when we were fighting it out on Twitter or whatever. But I think there is a really important role for government
00:06:15
Speaker
to not just publish quality raw data for people who know how to use that, but I think that there's an obligation for government to also make that data comprehensible and consumable by the people that they represent and who, you know, pay their salaries in effect and give them their jobs. And they're ultimately
00:06:40
Speaker
responsible to. And we can't just put out raw data because there are definitely people who are on this
00:06:48
Speaker
podcast who do great jobs of, you know, visualizing that data and making that data comprehensible through apps and other ways. But I think that those things tend to be very hard to discover. So it's better if government can at the front end do work kind of like we see coming out of LA now with their data portal, where there are visualizations, maps,
00:07:16
Speaker
things like open checkbook applications, so that average Joe citizen, which I would consider myself because I'm not really a technical person, can go in and see what my tax money is being spent on, what services are being delivered. The raw data needs to be there for journalists and for advocacy groups and other people to leverage and vet what the government is saying, but I think the government has a responsibility to
00:07:44
Speaker
in its effort to be transparent to sort of present their content in a way that's consumable for everybody.
At the risk of being boring and not disagreeing, I have to say, you know, I think the question, the framing of should open data be human readable?

Data Access Inequality and Government Responsibility

00:08:05
Speaker
Yes, absolutely. I really don't think there's
00:08:10
Speaker
a lot of question about that if you care about transparency and democracy and accountability. I think really fundamentally the question is about tactics and ecosystem and how we achieve that.
00:08:26
Speaker
We do live in a world that's profoundly unequal, where people who are barely literate or illiterate are sort of bumping up against the same government systems that people with college educations in affluent kind of economic situations are bumping up against. So I really think that we need to think about sort of
00:08:50
Speaker
a "build with" philosophy. Laurenellen McCann is somebody from DC who thinks really hard about civic technology. She has this framing that you need to build with. If you're a government agency and you're providing an online service, yeah, you need your data to be human readable. Absolutely.
00:09:11
Speaker
But similarly, building with, I think, if you're dealing with sort of bulk data does mean good documentation, well-named variables, stable URLs. So there's different kinds of building with around open data. And I think part of my jumping into the conversation was that government so often does fail at just kind of the basics of providing raw data.
00:09:39
Speaker
from a journalist's standpoint, that it can be easy to kind of see something like, oh, well, we need to make more data visualization, or we need to make more user interfaces around data the government's providing. As a journalist, it's sort of like, well, just give me good data. Right. Not getting that now in a lot of cases.
00:09:55
Speaker
But back to Chris's point, which I think is an interesting point, which is, is it important to allow anyone to be able to use the data as easily as possible? Or should there be some barrier so that people are forced to be more thoughtful and careful with it?
Sure. If I could actually jump in. So I somewhat disagree with the initial point. I don't think it's bad that we get to a point where we make it very easily accessible to everyone.
00:10:20
Speaker
But I think part of it is we need to rip off the band-aid and say, it's not easy and accessible inside of government as well. You know? Analysts inside don't have this great source. In fact, there are times when some of the best cases in open data are when internal government users start to say, oh, I have access to this now. I can use this either as an API or I can just get a list of accounts that I could never get before. Something ridiculous.
00:10:48
Speaker
So that conversation and that sharing, that, hey, we spent a lot of time making a great open data portal and visualizations for the public. No, no, we also did that internally. So, you know, a law enforcement officer or someone in the health department or in finance can see what the heck is going on better. And that I think is really powerful. And during the White House Police Data Initiative, Montgomery County police did training, not just externally, but internally to officers who can then use the portal themselves to find out more about what's going on in crime, which,
00:11:17
Speaker
Again, you think, hey, they have this already. Well, they don't always have this already. So showing it's raw, hey, this is a schema that looks weird. I don't know what this field is actually used for. That's useful in the conversation. Now, we should get to the point where it's human readable both in and outside of government.
00:11:36
Speaker
We're not there yet. I think that's okay.

Internal Government Data Challenges and Improvements

00:11:39
Speaker
So do you all feel that the discussion around open data focuses perhaps too much on government getting data out to the public, when at least more of the conversation should focus on sort of opening data within organizations? Sharon, let me start with you, since I have a feeling that, at Hack Baltimore, that might be one of the things you guys work on.
00:12:00
Speaker
Yeah, well, actually at the Center for Government Excellence, you know, we kind of have this maturity model that we talk to our client governments about where open data is the first step in it and exactly for the reason that Andrew was talking about, because then we want to be able to have cities look across their cities or states or whatever, look across their verticals and kind of de-silo data.
00:12:35
Speaker
you know, do data analysis and improve efficiency and service delivery and do things like, you know, performance management and predictive analytics and all this other stuff, which is really cool, that governments are trying to start to do. But you can't do it until you're at least liberating the data to yourself. So I think there's a distinction in what we mean by open. I actually think that
00:12:54
Speaker
so that we can then go through and look for
00:13:01
Speaker
from my experience in Hack Baltimore, I think the conversation tends to fall a little more toward why the public should be able to have access.
00:13:11
Speaker
to this information and transparency being important, like all the work that people like the Sunlight Foundation go out and do to gin up a lot of interest in open data for that reason. I think there's a lot less focus on it for, you know, the reasons that Andrew was talking about, which is kind of a shame, because I think that's really
00:13:36
Speaker
the holy grail of open data is this ability to actually just raise the tide for everybody because government just works better in the end.
00:13:49
Speaker
So Andrew, this is a question for you. So from an organizational perspective, what makes a good open data policy? And then once that data is out there, Chris, when you're working with these open data sets, is there a conversation that's useful to have with the folks who are producing the data? Or is it just sort of like, there's some structure, we have an open data portal, we're going to put the data out there, and now we'll just let everybody sort of go do their thing. So Andrew, let me start with you about whether you think there is a way to make a good open data policy.
00:14:19
Speaker
Yeah, well, so I guess I'm a little radical sometimes in government by thinking more of what should we not be able to share and participate either internally or externally. Obviously, there's some internal use cases where you're like, look, I'm not going to share this suspect information or such in law enforcement or some specific health risk or analysis. But in general, let's start with default open.
00:14:46
Speaker
and go backwards. It's sort of a shame that we've been talking about opening government for so long and systems are still not designed with that as default so that the public, other agencies, other research organizations can't participate collaboratively. But so to a degree, yeah, what can't we open? But it's important that we bring those people to the table, right? We can't just open up the health department's records and
00:15:13
Speaker
Oh, this field that says low income. Well, what does that even mean to anyone? And what it means to me might be very different than what it means to you. So it's sort of hard. I think if we started from scratch, it'd be great to be all open and even discussing our system development, but we're just not there. So I think start small and just engage the community and share, you know, as you go along, and open more and more up.
00:15:41
Speaker
So okay, so what about this engaging

Decentralized Open Data Management Approaches

00:15:43
Speaker
with the community? Chris, when you're doing your projects with Open Data in New York City, are you engaging with the data producers or is it they've given out the data and that's sort of the end of the conversation from their perspective?
00:15:56
Speaker
These days, I'm much more in the business of going out and scraping whatever I need from some better source than what ends up on the data portal, just because my experience here is that, especially in New York and any city that's of a certain size, you're going to lose something when you start centralizing the open data program.
00:16:17
Speaker
And not only centralizing the program, but actually centralizing the movement of data into a common place. I think that's a shortfall of the open data portal as we've come to know it in a lot of major cities, is that it now becomes this chore, right? That agencies must be in compliance. It's now a law. It's been legislated. It's a thing you have to do. It's just one more thing you have to do when you have 100 other things to do. So unless you really have a champion or somebody who really is behind it and is willing to stand up and say,
00:16:46
Speaker
I'm going to take responsibility for these data sets. It's already one step removed from the producer at least. Maybe it's more than one step removed if it's in a very deep agency. So my philosophy is that it should live as close to home as possible. And it really should be a celebration of that agency or even that subgroup within the agency to be publishing that data, not a mayoral initiative.
00:17:10
Speaker
Whoever's listed as the point of contact on the dataset on the portal is either not accessible or not responsive. Basically all complaints end up being routed through the central program, and they don't have a great track record for responding to requests and things like that. It is a lot to manage, and that's why I think a distributed approach would be better.
David, from your perspective as a journalist, do you have the same sort of relationship with the providers of the data?
00:17:37
Speaker
Pretty similar. I do want to talk actually about a scraper that I worked on, but yeah, it's pretty similar to what Chris described. I appreciate the centralization of data and data portals, but as a journalist, you know, a lot of the interesting stuff is sort of hanging out in the margins and needs to be scraped or otherwise kind of sought out and found. And it would be great if all that could go in, you know, centralized data portals, I think. It would also be great if there were, you know,
00:18:06
Speaker
if agencies were able to publish that data themselves more effectively. One thing, speaking of the interplay of kind of internal to government, external actors, other agencies, I worked on a project where we scraped data about Cook County Jail, the biggest jail in the country. And it was really interesting because our primary user was other agencies within the county because the sheriff's department didn't provide the raw data to them.
00:18:32
Speaker
to the other agencies. They only provided summary information and so these other agencies within the county actually really wanted to be able to analyze it themselves and just see what was going on beyond kind of what they were getting from the sheriff's department.
00:18:48
Speaker
What was interesting about that project was that that then in turn put pressure on the sheriff's department to open the data. It's not open yet, but there's now an initiative underway to actually do that. It was interesting because we all got together and we talked and it's tough to open data like that because there's records that are wrong and you could compromise somebody's privacy. It's all a matter
00:19:10
Speaker
of public record, technically. But there's a lot of really interesting questions about what it actually means then to open up that data. And the last part, which I thought was really interesting about that project, was we had some real hotshot coders on it. But we also had a guy who was bipolar, quasi-homeless, or at least sporadically homeless, trying to teach himself to code. But he had a lot of issues, a lot of barriers to doing that. He definitely,
00:19:37
Speaker
you know, is on the harder side of things in our society. But he was an awesome collaborator, because he had actually spent a lot of time in Cook County Jail, and he truly was a subject matter expert. He was like, oh, well, that person has that designation because they checked in, they went to the main room, they got an ankle bracelet, and they got sent back out onto the street.
00:20:03
Speaker
He knew that just looking at the data, and we were like, dude, we're understanding so much more about what we're seeing just because you have this direct experience. It was just an interesting experience in bringing together a bunch of those different perspectives. I thought it was going to be really antagonistic, and the sheriff's department was awesome. They were really into it.
00:20:28
Speaker
you're collecting stuff that we don't have because our back-end systems are kind of not the best.
00:20:35
Speaker
I think it's important to know that people who work inside of government are not just being cagey for the sake of being cagey about open data.

Importance of Governance and Collaboration in Open Data

00:20:44
Speaker
They're not hiding things because they don't want people to know things. They really are well-intentioned. They're really usually handcuffed by the technology that they work with. And I actually would sort of, maybe this is where I'll argue with Chris, where I think that
00:21:00
Speaker
strong policy is an opportunity to start to like unshackle government employees and things like having a policy that mandates that there is good governance of open data. So like I sit on the governance committee for the state of Maryland and I'm just
00:21:18
Speaker
a citizen that's interested in open data, but I sit in a room full of people who own data and they are scattered through every agency and department of the state, but they make us all get together every other month and have a conversation about like,
00:21:33
Speaker
what's happening, what needs to be changed as policy, what data has been released but not refreshed for a year when it's supposed to be refreshed every month. And by just throwing everyone in a room together, you get a lot of problems solved and you make a lot of, which is kind of what you're talking about also with the hackathon. If you think about internal governance as being an opportunity to flesh out those problems,
00:22:03
Speaker
not make it an experience of one person to a website, then I think that rules like that in policies or executive orders can do a lot to advance what open data looks like for everyone on the user side too.
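
The refresh problem described above (data released, then not refreshed on the schedule it was promised) is the kind of thing a governance group could check automatically. What follows is only a minimal sketch in Python, assuming a hypothetical catalog format: the dataset names, cadence labels, and interval thresholds are invented for illustration and do not come from any real portal or from the guests.

```python
from datetime import date

# Hypothetical refresh cadences and the number of days allowed between updates.
EXPECTED_INTERVAL_DAYS = {"monthly": 31, "quarterly": 92, "annually": 366}


def stale_datasets(catalog, today):
    """Return (name, days_overdue) for datasets past their refresh interval,
    sorted with the most overdue dataset first."""
    overdue = []
    for entry in catalog:
        allowed = EXPECTED_INTERVAL_DAYS[entry["cadence"]]
        age = (today - entry["last_updated"]).days
        if age > allowed:
            overdue.append((entry["name"], age - allowed))
    return sorted(overdue, key=lambda item: item[1], reverse=True)


# Made-up catalog entries standing in for a portal's dataset inventory.
catalog = [
    {"name": "road_closures", "cadence": "monthly", "last_updated": date(2015, 1, 5)},
    {"name": "budget_summary", "cadence": "annually", "last_updated": date(2015, 7, 1)},
    {"name": "inspections", "cadence": "monthly", "last_updated": date(2015, 11, 20)},
]

for name, days in stale_datasets(catalog, today=date(2015, 12, 15)):
    print(f"{name}: {days} days overdue")
```

A list like this could serve as the agenda for the kind of recurring meeting described here, so the conversation starts from which datasets are behind rather than from whether anything is behind.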
00:22:23
Speaker
I agree with that and say that it is putting the word in people's mouths. People are talking about it. People know what it is. Like I said, I think it's how it's implemented is going to dictate quality. What I said is that the fact that the policy exists and you must publish data is not a terrible thing. It's just that how do we actually go about publishing it?
00:22:45
Speaker
You know, is it a checklist item, or is this policy eventually going to lead to a real change in attitudes, where it's not like, oh damn, this is something I have to do to keep in compliance with the law, but, yeah, we actually believe in open by default and this is what we're going to enforce in our agency? I don't think we're there yet in a lot of ways. And just from my own dealings with
00:23:10
Speaker
government agencies, I know that they still have to sign giant long memorandums to share little bits of data. And when it does get shared, it gets shared via email, with CSVs flying around, things like that.
Mm-hmm. I'd agree with that. And I think the role for open data officers and such in government is to be that facilitator. I've been on initiatives where agencies will show up and say, I was ordered to release this. What do I need to do? Give me the schema. I'm like, oh, well, that's nice.
00:23:39
Speaker
I don't even know what you have or what you'd want to share or how you'd want to collaborate, but let's talk about why this is good for your business process. And hopefully by the end, they're on board with that as well with the idea that, oh, you know, I can get a lot of value out of this conversation with the public or with other agencies or with, you know, all these different groups. But you're right. If it's just a release this schema by this date in this format with these fields, you're good.
00:24:04
Speaker
you sort of get into that, that industrialized open data, which I think we're against. And then the schemas are almost meaningless because you don't know what someone might have or might want to share. There's agencies in Texas that share suspects information, which I was like, oh, I wouldn't think you'd want to share the suspect because you haven't arrested them. So that's a little weird to me. But it's interesting how different people take different takes on what they can open or what they're willing to talk about. And I think that's,
00:24:33
Speaker
sort of the good point about having a conversational and not so checkbox policy. So I think in the beginning, it's that weird space where you don't over-govern, but don't under-govern either.
So as with most things, there's no one sort of silver bullet to solve a problem or create a solution. But if you were to start, if you had your imaginary city and you were going to say, we're going to have an open data policy and open data platform, what would be the one
00:25:02
Speaker
thing that you would start with? Is it about funding? Is it about leadership? Is it about convincing the folks who are working with the data that there's value as opposed to just opening it? What is the one thing that you would start with in your imaginary data-rich city? Sharon, I'll start with you.

Continuous Improvement in Open Data Programs

00:25:18
Speaker
Okay, well, just like a shameless self-promotion.
We're all about seeing the self-promotion.
The Center for Government Excellence actually has a playbook on how to launch an open data program. And so people that are interested in this could just go read it. We really talk about structuring a program that is about team gathering, policy creation,
00:25:44
Speaker
community engagement, and continuous improvement. So that's not a silver bullet, that's like a list of four things, obviously. But they're all pretty critical, and policy creation usually does rely on some sort of executive-level buy-in.
00:26:01
Speaker
But I think that the most important thing, and it sounds like we kind of agree, is the continuous improvement. So that doesn't really help you get started because you have to start with something. There's lots of ways to figure out what something is. We would recommend data inventories and prioritization and things like that, which we also have playbooks on. But I think that the most important thing is continuous improvement. It's not a box to check, like Andrew said.
00:26:28
Speaker
Open data is not an end in and of itself. It's got to be a practice that's advanced all the time, like anything else that government does.
I would take that from the technologist's perspective. Again, I think open data is a technical problem. It requires technical people to figure out. All the byproducts of it are not technical and lead to better government and more informed citizenry.
00:26:52
Speaker
At the root of the problem is moving data from point A to point B in a timely manner with a certain standard. I think I would start with sort of the Amazon mandate where everything is an API. When you build every piece of government, whether it's creating laws or running the roads or building a transit system, whatever it is, every information system needs to have accessibility built in from the ground up.
00:27:15
Speaker
It doesn't mean it needs to be 100% open, but it needs to have the options of giving people certain permissions and a distributed but very well structured and very well organized technology platform for moving information around on demand at will for whoever needs it and wants it at a given time is foundational to all of this. I think the byproduct of turning a table into an API
00:27:38
Speaker
which is what many open data portals do. It's a great technological leap, but it's only as good as the table that came in. What I'm saying is we've got to get closer access to the source and then our quality will be much better and our timeliness will be much better.
I just want to know how you pay for that, Chris.
00:27:56
Speaker
You can't have, you know, everything is an API. That's really aspirational and great, but I think Andrew could speak even more accurately than me about how, for like 90% of governments, 90% of their data is like an Excel spreadsheet. So there is like a huge technical problem, right? And it needs money to solve it. And the truth of the matter is that, like,
00:28:20
Speaker
if we went out on the street and we polled 100 people, like, what do you want your tax dollars going toward? Unless they ran into you, Chris, no one's going to say they want it going into open data technology, I don't think.
They would all say APIs, I think.
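
The exchange above turns on two ideas: a portal "turning a table into an API," and systems that are not 100% open but can give different people different permissions. A minimal sketch of both, in Python using only the standard library, might look like the following. Everything in it is an assumption made for illustration: the sample table, the column names, and the two roles are hypothetical and do not describe any real portal or agency system.

```python
import csv
import io
import json

# Hypothetical sample table standing in for an agency spreadsheet export.
SAMPLE_CSV = """permit_id,district,status,applicant_phone
101,North,approved,555-0100
102,South,pending,555-0101
103,North,pending,555-0102
"""

# Hypothetical permission model: the columns each role is allowed to see.
VISIBLE_COLUMNS = {
    "public": ["permit_id", "district", "status"],
    "internal": ["permit_id", "district", "status", "applicant_phone"],
}


def load_table(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def query(rows, role, **filters):
    """Return rows matching the filters, limited to the columns the role may see."""
    columns = VISIBLE_COLUMNS[role]
    for name in filters:
        # Refuse filters on hidden columns so they cannot be probed indirectly.
        if name not in columns:
            raise PermissionError(f"role {role!r} may not filter on {name!r}")
    matches = [r for r in rows if all(r[k] == v for k, v in filters.items())]
    return [{c: r[c] for c in columns} for r in matches]


def to_json(rows, role, **filters):
    """Serialize a query result as the JSON body an HTTP endpoint might return."""
    return json.dumps(query(rows, role, **filters), indent=2)


rows = load_table(SAMPLE_CSV)
print(to_json(rows, "public", status="pending"))
```

The sketch also illustrates the caveat raised in the conversation: the query layer is only as good as the table behind it, so nothing here improves the quality or timeliness of the source data.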
00:28:35
Speaker
No, I think that's a great point. I mean, it's a story I always tell about the imaginary person at the Census Bureau who's asked to create an open data portal.

Challenges in Implementing Open Data Policies

00:28:46
Speaker
And it's not like they're given more training or more funding or more staff, they're just asked to do it. So to the points that many of you have made, it's not like all of a sudden you just say, do this thing and everybody sort of knows how to do it magically, right?
00:29:01
Speaker
Andrew, since you got to introduce yourself last, let me give you the last word here. What is your silver bullet? Although Sharon had four, so I'll give you a few.
Sure. Mine actually is leadership. I have been in government for quite a while, and often I'm the person to say, well, why isn't this open? I default to that. I was with a large metropolitan police department
00:29:28
Speaker
And the idea to share some charts based on use of force and officer-involved shootings came up. And I said, well, why don't we just publish the data openly? We have the system. We track X, Y, and Z. We could wire it up. We could do some visualizations and help with that. And we did. And actually, it was kind of bizarre because it almost went unnoticed until large events happened in that city. So having leaders like me, not to toot my own horn, but to just internally start changing that
00:29:58
Speaker
So by the time it gets to the public side, we're already doing that. We're going that way. We're thinking about that. We want that. So that in itself really pays dividends. Budget and funds are always important, but hopefully a good leader can balance and transition that and get us off these systems, which, boy, I would have killed for Excel in some cases, because at least I could do something with that. But there are departments with very, very, very old
00:30:26
Speaker
applications that you cannot get data out of unless you pay a vendor a lot of money. And that's actually far worse than just an old system. So my silver bullet is quality leadership.
Well, on that somewhat depressing note.
No, it's not a depressing note. It's a good note. There's lots of challenges. It gives folks like us things to do and talk about.
00:30:47
Speaker
I want to thank all of you for coming on. This was super interesting. We'll do it again, I think. I'll find something else for us to fight about, and we'll try to get you all arguing. So thanks, David, Chris, Sharon, and Andrew, for coming on the show.

Conclusion and Feedback Invitation

00:31:00
Speaker
And for those of you listening, thanks a lot for tuning into the show. Please rate the show on iTunes, and if you have comments about this or other episodes, please hit me up on Twitter or on the website. And thanks again. And until next time, this has been the PolicyViz Podcast.
00:31:26
Speaker
This episode of the PolicyViz Podcast is brought to you by the Maryland Institute College of Art. MICA's online graduate program in information visualization trains designers and analysts to translate data into compelling visual narratives. Join expert faculty such as Andy Kirk, Marissa Peacock, and Jon Schwabish to mine the data and design the story. For more information, go to mica.edu backslash MPS in Viz.