Introduction and Sponsor
00:00:00
Speaker
This episode of the PolicyViz podcast is brought to you by Socrata. Socrata is the global leader in software solutions that are designed exclusively for digital government. They deliver unprecedented data-driven innovation and cost savings for hundreds of public sector leaders and millions of their constituents around the world. Socrata's digital government solutions are being deployed for a wide array of strategic and mission-critical government services that empower citizens while enhancing their quality of life. To learn more about Socrata, visit www.socrata.com.
Guest Introduction: Rebecca Williams
00:00:45
Speaker
Welcome back to the PolicyViz Podcast. I am your host, Jon Schwabish. I hope everyone's having a lovely kickoff to fall. I'm here today with Rebecca Williams from the Office of Management and Budget. Rebecca, welcome to the show. Thanks, Jon.
00:01:02
Speaker
Thanks for coming on. This is really exciting.
Rebecca's Career Journey
00:01:04
Speaker
We have known each other for a while now when you were at data.gov and now you're at OMB, a Senior Open Government Analyst, which is a pretty awesome title, especially for someone in the federal government. I agree. Why don't we start by having you tell listeners a little bit about yourself and maybe a little bit about what you're working on right now at the White House.
00:01:25
Speaker
Sure, I'll give my background for more context. I started off being interested in data as a city planner. And when I lived in New York City doing city planning work, I came across a group called OpenPlans that did open source civic technology. And I got more and more invested in this space of
00:01:45
Speaker
trying to find new data sources that were free and open to use for research and other reasons. And from there I started working at the Sunlight Foundation, and then I worked at data.gov, and now I'm currently on loan to the White House to work on open data, this time, instead of crunching numbers, more on the policy side.
00:02:07
Speaker
Okay, so can you explain a little bit about sort of the difference between what you're doing now at the White House versus what you were sort of doing at data.gov?
Roles and Responsibilities: Data.gov vs. OMB
00:02:14
Speaker
Because my sort of, my thought was when you're at data.gov is you were sort of working with agencies trying to convince them to give you all their good data or at least the libraries and the meta links to all the good data.
00:02:25
Speaker
That is totally accurate. The briefest way I can explain it is I had more of a good cop facilitator role at Data.gov, and now I have more of a bad cop metadata cop police role, which actually I think suits my background from the Sunlight Foundation a little bit more.
00:02:49
Speaker
At data.gov, so there's overlap. I would say like about like 20, 25% of the work I do is still talking to agencies about their metadata in a similar way. Except for now, since I'm in the Office of Management and Budget in the White House, I can be more specific with my critique. And there are all these other like meetings I have to go to with higher up politicals. And I can say like, this agency isn't doing this with their data, or they really should be doing this. Can you tell leadership in that agency?
00:03:19
Speaker
that this is a priority.
Challenges in Open Data Initiatives
00:03:21
Speaker
Right, so what is it like or what are the challenges, I guess, is probably the better question, when you're talking to agencies and trying to get them to buy into this whole idea of opening up their data? Because I would suspect that for many agencies, that's a completely foreign idea that the data that they are working with, they're now going to sort of release out to the public. So what is that sort of interaction like?
00:03:43
Speaker
Yes, I think, I mean, we have 24 CFO Act agencies that are under the executive orders slash open data policy that we focus on specifically. All the other independent agencies, like, for example, the FCC or the FEC.
00:04:00
Speaker
These are agencies that it truly is just like a kind conversation where it's like we're doing this thing and this is how you can do it. But these other agencies are technically under a mandate to at least start doing steps forward in this direction. And there's definitely different strategies for different agencies. But I think part of my role being over at OMB now is shifting the conversation away from just, like, quote-unquote open data and having that be like a
00:04:31
Speaker
buzzword that's interesting and talking more about the life cycle of data management and then also how this plays into how agencies are actually managing things that are relevant to their mission.
Policy Integration Efforts: DATA Act and FITARA
00:04:43
Speaker
So some of the things I'm working on at OMB is sort of rebranding open data to just be about data management period and about the full life cycle of data management. There's a lot of
00:04:56
Speaker
privacy concerns going on in the federal government right now, and rightfully so.
00:05:02
Speaker
And open data is still very much a priority for this administration, knowing that the administration will change eventually. And we want to sort of institutionalize all the good work that's been done with open data. So a lot of the work is just sort of making sure that we have data and data management baked into all these other policies that are happening. And one of those policies is the DATA Act,
00:05:28
Speaker
which is a law that was passed last year to standardize federal spending data. And OMB also helps oversee that implementation. So that's like one example of something that I'm working on from the policy side that I wasn't doing at data.gov. Another example is
00:05:46
Speaker
There's a law passed recently called FITARA that empowers chief information officers in their agency in a way that they hadn't been empowered previously. Now they essentially will have a seat at the table for all agency decision making. And CIOs are going to need good data to work with about how their agency is being run and also how their mission is being accomplished. And they're going to need
00:06:15
Speaker
better data and data people to work on that data to help inform their decision making. But now the idea that like technology and mission are separate is sort of going away and definitely data is a huge chunk of what technology is.
Sharing Non-Public Data: Internal and External Challenges
00:06:32
Speaker
And how much of the discussion that you have with agencies is about when it comes to open data is it
00:06:39
Speaker
How much is the discussion about opening the data to the world versus sort of opening the data just within their organization sort of trying to knock down the silos between divisions? Yeah, so I think that's what we've actually struggled with the most. The opening the data to the world for the low-hanging fruit has been sort of
00:07:02
Speaker
It's going slower than I would like, but it's on course. But the internal stuff is more tricky. So the open data policy actually has three access-level categories. So we have non-public, restricted public, and public. And the restricted public bucket is supposed to be a free-for-all for all sorts, or a big bucket for things that either are for researchers, or for internal use, or require a login.
00:07:31
Speaker
But in terms of having agencies actually inventory the non-public and the restricted public stuff, they've been a lot more hesitant to even sort of list it somewhere. So there's been a major push to get agencies to share internally and then also, like, inter-agency better practices. Right now they're basically like one-off memos of understanding between agencies to share
00:08:00
Speaker
what might be sensitive information, and it's not the most efficient way to go about it. Right. And are the challenges for a lot of these agencies, is it technology? Is it culture? Is it the format of the existing data? Is it the privacy and security issues? Are there certain things you've identified as being the primary things that are the things you run into the most?
Governance and Cultural Issues in Data Management
00:08:25
Speaker
Yeah, so funny you should say that. So we just did like agency by agency sessions, like talking with people that are actually working on it on the ground and then did a survey of results that should be out any moment. We just closed it yesterday, but that will be public data.
00:08:43
Speaker
our survey results about what's going on with agency data management. But what I've seen is there's issues with data governance. Federal agencies are so large and it's just like organizing between components or within a component that is really difficult. And then what has been self-reported is that the cultural issues are actually more
00:09:08
Speaker
of a challenge than the technical. But I will say, in terms of all sorts of information management, talent has been a real challenge as well. Luckily, some of the digital services funding and 18F and USDS and
00:09:26
Speaker
the Presidential Innovation Fellowship Program has been really helpful, but again, this is still, I don't know how many people that makes up all together. It's like 200 people or something in a very, very large organization. We see time and time again, if we put somebody in that has these skills and they have authority or license to do what they need to do with the data, a lot of government data is in a position where it's just like it needs to
00:09:51
Speaker
be ETLed into a better system to work better. Or it needs to be collected better. It's very like step one before we even get into really interesting analysis or critiquing better ways to collect information. So yeah, it's getting the right people in government and also the right people in government to help procure the right tools so they don't
00:10:18
Speaker
pay too much for technology that isn't really going to do the job they need, or even so that they know how to include the proper clauses for, like, data rights or requirements if you're creating an API or an ETL, like, somebody that can speak policy or law and then also speak tech, so you're asking for what you need and you're not
00:10:41
Speaker
being swindled, essentially. Yeah, right, right. So I know you spend a lot of time here in DC, at least in the data community in DC and in the meetup groups. And I'm wondering what role sort of the public plays in this whole effort, not necessarily in terms of sort of the demand for data, but sort of in maybe helping to develop the tools or provide the feedback on
00:11:09
Speaker
You know, these data are not useful. We can't get them. You know, sort of providing, not just demanding, like, we want this data, but saying, you know, this data that is out there is not very good. Can you sort of help us make it better? So is there a role for the, I mean, we can call it the tech community, but it's really broader than that, I think. But is there a role for the public in being part of this process of opening more data from the government?
Public's Role in Government Data Enhancement
00:11:34
Speaker
Yeah, absolutely. The interesting position that the federal government is in right now is that since 2013, there has technically been a policy about open data and data management. We're slowly moving along, but part of that policy is there's also a requirement to engage with the public about prioritizing data improvements. And since there's so much work to be done, the more the public
00:12:02
Speaker
and, you know, any aspect of the public, so the private sector or nonprofits or academia or journalists, can identify, like, this data set has this issue and this is what I would need to fix it, or weigh in on, like, this growing
00:12:19
Speaker
resources of tools. There's a bunch of open data, open source tools that are being made available in the federal government. I was just talking the other day about how we should just round them all up in one spot. But at least govcode.org is one place to start. And especially, I would encourage people to check out the GSA GitHub repo or the 18F GitHub repo or CFPB's GitHub repo. A lot of the data tools are coming out of those shops.
00:12:49
Speaker
But just weighing in either on the tools or specific data. I know every talk I give, I'm just like, email the government what you need. Especially if you are a user of that data on a regular basis. A lot of the feedback that you end up getting inside of government is either
00:13:10
Speaker
There's a lot of research help desk type of requests that we've got at data.gov before that we can't really fulfill without doing more research or collecting new data that's not collected or
00:13:22
Speaker
using data that is in government data to answer the question. Or a lot of it is just YouTube comment type style feedback where it's just a random thing. So if you're really using it and you're really precise about it, I encourage you to reach out by email, by comment. Every agency has a slash data page where they're
00:13:41
Speaker
By policy, all 24 CFO Act agencies are required to have some sort of feedback mechanism. But also, if you write a blog post about it or you write a story about it or you write a paper about it, I send those to the points of contact on a regular basis, and I've seen things turn over. One example of that is
00:14:03
Speaker
Parker Higgins, who works at EFF, did a blog post last year about how there were watercolor paintings that the USDA had,
00:14:13
Speaker
they weren't releasing them. Oh, they were charging: you could get three for free, but then you had to pay to download these watercolor images. And he's, you know, an open-everything advocate in the public domain, and he was like, these are government-created paintings, why is this the case? He wrote a really specific blog post about it. I sent it around to the USDA people and asked what we could do, and then, like,
00:14:40
Speaker
in government time, it took like a month, maybe two months, but eventually they worked out the
00:14:47
Speaker
They rechecked the contract of the vendor that they had procured. They had to digitize those paintings, and they outsourced that task. And in that contract, they thought that they would have to charge for it under the contract language, but they revisited it, and they didn't have to. So now all of those paintings are online, and you don't have to pay for them anymore. But that's one example where it's just like, wow, there was a really specific ask with a specific thing that he was going to do with them that I could
00:15:16
Speaker
get to the right point of contact and it doesn't have to be me. There's also, if you go to Project Open Data at the very bottom of the page or slash points of contact on that page, there's the email addresses of anyone who's a point of contact at these agencies on open data and they are by mandate supposed to listen to you.
00:15:37
Speaker
Nice. I like when people are forced to pay attention to me. That is nice. Well, I'll put all these links on the podcast website so people can use them, which is great. I want to, in an imaginary way, I want to speed up government time.
00:15:52
Speaker
A few years, and ask, and posit the following. So we're now in a world some years in the future, and government data is out there. There are nice tools and there's more data that is available for people to get more easily. I know that may be a long way away, and that's fine, but let's just speed it up. What's the next evolution in open data?
Future of Open Data: Integration and Validation
00:16:24
Speaker
government open data. I mean, we can talk generally about open data, but for federal government open data, what's the next phase?
00:16:32
Speaker
So this is my theory. It does not reflect the administration or anybody else. But I think it's true for federal government data and local government data. I think the next step is, rather than just having it be broadcast authoritative data from the government that is either accurate or not as accurate as you want it to be, it's going to be more of a conversation
00:16:56
Speaker
between government data and then other data sources. I think one of the first examples where this might actually play out, and I look for examples like this all the time, but OpenStreetMap, barring any licensing issues, to be compatible with government data, it's often the case that OpenStreetMap is ahead of whatever government data collection we're doing for geospatial things.
00:17:26
Speaker
And that's an example where someone that's not the government can provide data that the government doesn't have. I also think it'll be a way to validate data to see if it's actually accurate. There's already some government data sets that are sort of like this now where you can sort of compare and contrast to sort of like get closer to what like the accurate count is on something. I think one example is
00:17:54
Speaker
Maybe it's in the UCR where they asked about hate crimes. But then CDC also asked victims what they were victim to, and if it was a hate crime. And you can compare the statistics against each other, like how many people think they were victim of a hate crime versus how many hate crimes are reported by police departments, and how far off is that number? Yeah. And that should sort of cross-verify the data.
00:18:23
Speaker
Yeah, so I think there's going to be a lot of verifying and validation stuff, and that's going to be between government and between outside sources. And I think one really, there's a million reasons why it's an interesting topic, but the Police Data Initiative work is really interesting to me because it has this dynamic in spades where it's very important to talk about state data versus citizen-collected data
00:18:52
Speaker
when what you're trying to do is make sure that the state is held accountable in a situation. So you can't always rely on state data. It has to be compared against something. That might be farther away than I would like, but that's where I see it going.
Rebecca's Future Plans with Data.gov
00:19:13
Speaker
Interesting. Well, so I want to thank you for coming on the show. This is a really fascinating discussion and I'm curious to sort of wrap up. So you were at data.gov, you're at the White House now, your detail there ends in a few months. So are you headed back to data.gov or is your role going to change there? Sort of where are you going to be in early 2016?
00:19:36
Speaker
So the plan now is back to data.gov, but I think hopefully my experience working across several different perspectives for this now and also working more closely with 18F and these other teams, like these digital service teams and agencies, might also be useful or grow into something more specific. But it'll still be data for a while. Right. Great.
00:20:00
Speaker
Great. Well, thanks again for coming on the show. It's been really interesting. And for those of you who are listening, hopefully there's many of you. If you have data needs or data critiques from the federal government, just email Rebecca. She'll fix all the problems for you. Email me, though. I don't know that I can always fix them. But she'll put it in the hands of the right people, which is sometimes that's really what you need.
00:20:22
Speaker
So thanks everyone for listening. Be sure to check out the show on iTunes and please rate the show on iTunes and if you have comments or suggestions or questions, hit me up on the site or on Twitter and I will see you in a couple of weeks. Thanks so much. Bye-bye.
00:20:48
Speaker
This episode of the PolicyViz podcast was brought to you by Socrata. Socrata is the global leader in software solutions that are designed exclusively for digital government and provide benefits for hundreds of public sector leaders and their constituents. The company's customers, among others, include the cities of New York, Chicago, San Francisco, and Los Angeles. To learn more about Socrata, visit them on the web at www.socrata.com.