
Protecting Your Privacy in a Data-Driven World with Dr. Claire McKay Bowen

S8 E216 · The PolicyViz Podcast

Dr. Claire McKay Bowen is a principal research associate in the Center on Labor, Human Services, and Population and leads the Statistical Methods Group at the Urban Institute. Her research focuses on developing and assessing the quality of differentially private data synthesis methods and science communication. She holds a BS in mathematics and physics from Idaho State University and an MS and PhD in statistics from the University of Notre Dame. After completing her PhD, she worked at Los Alamos National Laboratory, where she investigated cosmic ray effects on supercomputers.

In 2021, the Committee of Presidents of Statistical Societies identified her as an emerging leader in statistics for her “contributions to the development and broad dissemination of Statistics and Data Science methods and concepts, particularly in the emerging field of Data Privacy, and for leadership of technical initiatives, professional development activities, and educational programs.”

Episode Notes

Claire on Twitter

Claire at the Urban Institute

Claire’s personal website: https://clairemckaybowen.com/

Protecting Your Privacy in a Data-Driven World

Book page: https://clairemckaybowen.com/book/

Data4Kids

Overview of GDPR

One Nation, Tracked. Story from the New York Times

Netflix Cancels Recommendation Contest After Privacy Lawsuit

Transcript

Introduction to the PolicyViz Podcast and Episode Topic

00:00:13
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish.

Importance of Data Privacy and Security

00:00:17
Speaker
On this week's episode, I talked to my Urban Institute colleague, Claire Bowen, about data privacy and security. It's one of the most important issues when we think about how our data
00:00:29
Speaker
is being used and how it's being used for us and against us. Claire has a great book about this. I really highly recommend it. It's linked in the show notes.

Raising Awareness on Data Collection and Usage

00:00:38
Speaker
You should check it out. It gives you a great overview of these data privacy issues. And so Claire and I talk about her work, her background. She tells some incredible stories about things that might scare you a little bit when it comes to data.
00:00:50
Speaker
Um, but hopefully this will

Conversation with Claire Bowen on Data Privacy

00:00:52
Speaker
get you thinking about how we can be more careful with how we collect and use our own data. So here's my interview with Claire Bowen.
00:01:01
Speaker
Hi, Claire. Welcome to the podcast. Thanks, Jon, for having me. I'm really excited to be talking about data privacy. Data privacy on a podcast. On a podcast. On a podcast, right, because that's where data privacy should be discussed. So this is exciting. So just quickly for folks who don't know, Claire and I work together at the Urban Institute. We've done a number of projects together. Our most recent one is our Data for Kids project.
00:01:27
Speaker
on helping kids learn more about data science, data visualization, all the good things that kids should learn about these days. And today we're going to talk about Claire's book and her work on data privacy. So her book, which I'll hold up for those of you who are watching the video on
00:01:44
Speaker
YouTube, which you can't really see. But anyway, it's Protecting Your Privacy in a Data-Driven World. So we're going to talk about data privacy, so important for those of us working with data.

Claire's Journey into Data Privacy

00:01:53
Speaker
So let's start with the Claire Bowen origin story. So what got you interested in these issues of data privacy and security?
00:02:03
Speaker
That's a really great question. I will give you the really short answer and then I'll go a little bit longer, because the short answer was I was applying for funding as a graduate student. There it is. I was looking at different options and I thought, wow, there's a cool fellowship through Microsoft. I brought this up to my advisor. Again, this is my first semester starting
00:02:24
Speaker
my grad program and my advisor said, well, just as a piece of advice, you always want to pitch the research project based on what they're interested in. I bet Microsoft is really interested in privacy. I used to do this for my graduate work. I haven't done it since then. Maybe you want to look into data privacy. At the time, she also said, hey, there's a new thing called differential privacy you might want to check out.
00:02:47
Speaker
So that's how the origin story started, but to give some more context, so it's not so blunt as it was because of funding, even though that kind of started it. More context is I actually started in physics as an undergrad and I went into physics because I really wanted to know how the world worked. And I thought it was like, oh, there's these cool, challenging problems and physics answers a lot of these questions. And I got into math because I learned that math was the language of science.
00:03:13
Speaker
And that kind of evolved into getting to statistics and

Everyday Data Privacy Issues

00:03:16
Speaker
realizing I really liked the analytics part of doing scientific work. And also realizing that, to paraphrase a famous saying from a statistician, John Tukey, who said that basically you get to play in everybody's backyard if you're in statistics. So that's why I decided to pursue that. And then my advisor, who was very flexible about what topics we could cover, she's a Bayesian statistician. So for those of you who don't know, there's like two different
00:03:43
Speaker
ways of thinking about statistics. There's the frequentist, which is what we're normally taught. And there's the Bayesian, which is actually the foundation for machine learning, AI, so on and so forth. And so she says, as long as you're thinking about Bayesian statistics, then I don't care what topic you do for your dissertation.
00:04:01
Speaker
And I got into privacy because again, there was that funding thing. And then when I was digging into it, I realized like, wow, this is a really cool area. There's a lot of open problems and questions. And it has a very obvious application because that was a really big thing for me when I was in physics was that I knew I wasn't going to be a theoretician. I really wanted to solve practical problems. And
00:04:21
Speaker
privacy felt like a good fit on trying to look at that intersection. It did snowball from there because I did win the funding, then I won more funding. When that happens, then you're like, oh, well, I have money to actually research this and so I could actually deep dive into this

Impact of Data Privacy on Analysis and Research

00:04:38
Speaker
area. That's actually why my whole dissertation was in this field because I was able to
00:04:42
Speaker
just dedicate all that time to look into it. And luckily, I shouldn't say luckily, but sometimes in grad school, you go on a topic that you find out you don't actually like, but I really enjoyed it and it's become my full career. Nice, nice. That's a good origin story. All credit to your advisor, where credit is due, right? Right.
00:05:03
Speaker
So can you give folks some idea of where, for folks who may be not familiar with this, where data privacy comes up in their everyday lives? And I think generally people sort of have a general concept of this, like when you talk to Alexa, like we all know, like it's being stored somewhere, but like for those of us working in data all the time, like where does this pop up that you think is sort of most relevant to our work in our lives?
00:05:31
Speaker
That's a great question. So I'm going to back up a little bit and kind of clarify what I mean by data privacy because it's like a nice catch-all phrase of like what I do, but it's a very broad field. And so sometimes when I tell people, hey, I work in data privacy, they think I do encryption or cybersecurity, that I fight off hackers, and they're like, oh, that's so cool. Or I have a few family members who are thinking I'm an app developer after I told them to stop using certain apps because of security reasons. And so anyways, it kind of digresses from there.
00:05:59
Speaker
What I focus on is what I call expanding access to data, or making sure that very sensitive information that could be useful for making very impactful public policy decisions can be used, but in a way that the researchers who analyze it don't know who is in that data. One of the examples I give for thinking of everyday use: most of us have a smartphone now and it records your location and time. With that kind of information, you can figure out actually where
00:06:28
Speaker
you live if you're in a certain residential area during sleeping hours, where you work because you're in a certain location during working hours. Even if you release just that dataset of where a person is, time and place with no identifiers, no names, no gender, no race, ethnicity, so on and so forth, you could still figure out who they are. This was actually done by the New York Times back in 2019. They did a whole article about this where they got a dataset like that.
00:06:55
Speaker
where they were able to identify somebody because they were at Microsoft campus for certain times. And all of a sudden they switched over to Amazon campus and they were able to go on LinkedIn and figure out who this person was and they verified. So like, hey, is this you? And it was correct. And so then some people's response to that is like, okay, we should not have that data public. That should only be kept by whoever's collecting it for a very specific reason,
00:07:22
Speaker
or the cell phone companies, they shouldn't be sharing it with anybody else, but just use it for whatever purposes they need for maintenance of the cell communications. However, that information is what's used by FEMA or for other emergency responses because that tells FEMA what patterns they see going through the United States. This last year, we had a lot of natural disasters with hurricanes and forest fires
00:07:47
Speaker
and flooding, and so trying to figure out, based on people's patterns, where did they go, what are the best ways to close down certain roads, versus maybe we need to block out these other areas, or maybe we need to prioritize certain neighborhoods because they have limited access to get out of the cities. That's the example I like to give because it's something that we all have. I'm pretty sure we all have

Understanding Data Aggregation and Privacy

00:08:10
Speaker
a cell phone. Except for one person we work with who's standing by their flip phone.
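
To make the smartphone example concrete, here is a minimal sketch in Python, with made-up data and hypothetical column names, of the home/work inference Claire describes: the most common nighttime location is probably home, and the most common weekday-daytime location is probably work.

```python
# A minimal sketch (not from the episode or the New York Times analysis) of
# re-identification from "anonymous" location pings. Column names and the toy
# data are hypothetical.
import pandas as pd

pings = pd.DataFrame({
    "device_id": ["abc123"] * 6,
    "timestamp": pd.to_datetime([
        "2019-05-06 02:10", "2019-05-07 03:45", "2019-05-08 01:30",   # nighttime pings
        "2019-05-06 11:00", "2019-05-07 14:20", "2019-05-08 10:15",   # weekday daytime pings
    ]),
    "cell": ["res-block-17", "res-block-17", "res-block-17",
             "ms-campus", "ms-campus", "ms-campus"],                  # coarse location label
})

hour = pings["timestamp"].dt.hour
night = pings[(hour >= 0) & (hour < 6)]
workday = pings[(hour >= 9) & (hour < 17) & (pings["timestamp"].dt.weekday < 5)]

likely_home = night.groupby("device_id")["cell"].agg(lambda s: s.mode()[0])
likely_work = workday.groupby("device_id")["cell"].agg(lambda s: s.mode()[0])
print(likely_home, likely_work, sep="\n")
# Joining "likely_work" against a public source (say, job changes visible on
# LinkedIn) is what lets someone put a name on an "anonymous" device.
```
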
00:08:23
Speaker
We won't reveal names, but just to say that we do work with at least one person who's
00:08:29
Speaker
a dedicated flip phone user, even in 2022. So, okay, so now that we have a sense of how this works every day for anybody, what about people who are data analysts, who are researchers, what should they be thinking about when it comes to data privacy? And I know that's a broad question because
00:08:55
Speaker
You're doing a lot of stuff. So another way to think about this might be like, what do you tell our colleagues at Urban about data privacy or what are the projects that you work on? I know that's like super broad, but yeah. I'm actually going to target one aspect, which is knowing that a lot of the data that you collect, unless it's directly the raw data, it has been altered in some way because of privacy concerns. Cause certain

Federal Data Laws and Privacy Practices

00:09:16
Speaker
data sets, like especially census is a really popular one. Cause especially at Urban, right? A lot of people at Urban use the American Community Survey,
00:09:24
Speaker
or, like, right now we just had the 2020 census, and so they use that data for a lot of their research, figuring out, like, oh, what is the demographic breakdown for a state, and they try to figure out the survey weights for certain kinds of analyses that they do. A lot of people who access that data think that it is the raw data,
00:09:45
Speaker
but that hasn't been the case for decades. There's been many acts. But basically, we have not had access to raw data for a very long time. So there's that misconception. And so that's why one of the things I tell people is when you take a data set, what are the things that have been altered so that way you're not going to say the wrong,
00:10:11
Speaker
quote, unquote, data story, because the data has been, to use one of the terms, aggregated up to a higher level. So instead of getting down to what we call census blocks, which are really small units of geography, sometimes the data gets aggregated all the way up to a county level. And so if you just analyze data at that level, you could draw misleading conclusions because not all counties are
00:10:35
Speaker
created equal, or their boundaries drawn equal. The example I like to give is the town that I grew up in, in Idaho. So for those who are listening, I grew up actually in a rural area of Idaho and the county I was in is the size of Connecticut. It's a huge county with very few people. The town I was in was the biggest town in the county, with 3,000 people.
00:10:59
Speaker
So you're trying to make sometimes decisions for a whole county with people who are scattered throughout versus another county like Arlington, because that's really close to Urban Institute. And that one is only like 26 square miles or something like that with hundreds of thousands, right? It's a lot more people.

California's Data Privacy Laws and Effects

00:11:20
Speaker
I shouldn't say hundreds of thousands. I actually don't know the whole population. I mean, you've got to be careful with data privacy, how many numbers you're putting out there.
00:11:27
Speaker
Yeah, exactly. But definitely more, I think, in that county. We can say more. Definitely more than the county I was in. Right.
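
To illustrate the aggregation point, here is a small, made-up example (the numbers are invented, not census figures): two counties can post nearly the same county-level rate while hiding very different block-level stories.

```python
# A minimal, hypothetical illustration of how county-level aggregation can
# mask block-level differences.
import pandas as pd

blocks = pd.DataFrame({
    "county":     ["A", "A", "A", "B", "B", "B"],
    "block":      ["A1", "A2", "A3", "B1", "B2", "B3"],
    "population": [2900,   50,   50, 1000, 1000, 1000],
    "uninsured":  [ 290,   40,    5,  100,  115,  120],
})

blocks["block_rate"] = blocks["uninsured"] / blocks["population"]
county = blocks.groupby("county")[["population", "uninsured"]].sum()
county["county_rate"] = county["uninsured"] / county["population"]

print(blocks[["county", "block", "block_rate"]])
print(county)
# County A and County B look almost identical at the county level (about 11%),
# but County A contains a tiny block where 80% are uninsured -- exactly the
# kind of story that disappears when data are only released at higher
# geographies.
```
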
00:11:38
Speaker
Um, so do you, I mean, not so much for the folks that we work with, cause many of them are sort of experts on a lot of those data sets, but like when working with a new data set, like what do you tell people to do? Should they read the code book? Do they look at like only the specific variables that they're looking at or do they like do most major federal surveys? Like, is there always a notice about like what they've done? Like, how do you, how do you.
00:12:03
Speaker
How do you think about looking at a new data set and how to uncover what aggregation or changes have been made? That's a great question.

Differential Privacy and Its Challenges

00:12:11
Speaker
Hopefully somebody has done a data dictionary and they talk about how the data was collected and what aggregations have been done. Sometimes there's a contact person that you can go to. I'm picking on census again because they're such a classic example. They do have
00:12:31
Speaker
working papers or documentation on what methodologies they use. There's a general one that says, for instance, when they do certain aggregations, they have to have at least 100,000 people with a certain kind of characteristic combination across the United States to be considered part of the data set without what we call suppression, where suppression means you basically remove that information entirely from the data. And that's also
00:12:56
Speaker
One of the things, another example would be the Bureau of Labor Statistics. They also do some suppression techniques, and they also have a document of how they make the decisions to suppress the data. I know the federal rules, but what about private sector data sets? Do they often base their suppression or aggregation rules on the federal government, or is it just like the wild, wild west out there?
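
As an illustration of the suppression idea Claire just described, here is a minimal sketch with a hypothetical table and threshold; the actual Census Bureau and BLS rules are more involved than this.

```python
# A minimal sketch of threshold-based cell suppression: a published table
# blanks out any cell whose count falls below a minimum size. The threshold
# and the table are hypothetical, not any agency's actual rule.
import pandas as pd

table = pd.DataFrame({
    "county":     ["A", "A", "B", "B"],
    "occupation": ["nurse", "astronomer", "nurse", "astronomer"],
    "count":      [1250, 3, 980, 45],
})

THRESHOLD = 50  # hypothetical minimum cell size

published = table.copy()
# Cells below the threshold are suppressed (become missing) in the release.
published["count"] = published["count"].mask(published["count"] < THRESHOLD)
print(published)
# In practice agencies also do complementary suppression: if only one cell in
# a row or column is blanked, it could be recovered from the margins, so
# additional cells get suppressed too.
```
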
00:13:20
Speaker
So it's definitely the latter, the wild, wild West. There's no one law federally and there's none for consumers. So we'll get to that one a little bit, but the federal laws are like a hot pot here in the United States.
00:13:34
Speaker
So for example, Census is governed by a mix of Title 13 and CIPSEA, which, again, I'm always bad with acronyms, so the Confidential Information Protection Something Act, that has been updated in 2018. So there's that act there. It also governs Bureau of Labor Statistics data.
00:13:56
Speaker
There's also Title 26, which governs the Internal Revenue Service's data, but then it kind of overlaps a little bit with census too, because they have some joint data sets together, so some of those data sets are protected by both.
00:14:08
Speaker
And then we have our healthcare data, which is governed by HIPAA, and student data with FERPA. So there's all these pieces. And so there's not one place to go to to say, like, hey, this is how you should protect your data, because of how these laws are done. Now, for consumer data, we have no federal laws covering consumer data. So that's why there's no pressure for certain
00:14:29
Speaker
companies to think about how they should protect their data. There are a few states. Last time I checked, between 11 and 13 states have some laws, but that's it. They very much range on severity level. The closest would be California; they have the strictest set of laws to govern consumer data. But even then, they're just one state, so it's not a federal mandate. It's affecting, or it applies to, companies based in California.
00:14:59
Speaker
or to people based in California? So a mixture. So if the company is in California and it has to be a for-profit, so actually Urban would be excluded from this act because they're a nonprofit and then it protects all California residents. So if you are a California for-profit company doing a nationwide survey,
00:15:23
Speaker
and you include some California residents. Does the law apply just to those residents or to everybody in the survey? Because you're based in California. You're based in California. There's like, it's interesting. They actually have a clause in there of like, what is considered a California business. And like, there's also really, yeah.
00:15:39
Speaker
Uh, requirement too. And so there it's interesting because there's been also an update to the law, cause they released one version in 2018, I believe. And then they have another update for this year. So that's why you've kind of been spammed again by different companies saying, like, we updated our privacy policy. Oh, interesting. Cause that's California changing. And I remember there's a whole section of your book about the actual penalties and a lot of these are not very strong, right? Like penalties are kind of like nominal.
00:16:09
Speaker
Yeah, it is. And so California kind of updated. There's a little bit more where, like, for instance, one of the biggest changes for the penalty is, like, if it involves a child, which they consider anybody under the age of 16, or I think it's 16 and under, excuse me, then it's considered a severe case, no matter what. Like, even if it's a light infringement, because they have the classifications of, like, less severe versus severe.
00:16:34
Speaker
But if it involves kids, it is automatically severe. And I believe in that case, it's like $7,500 per child. Wow. And that's the penalty that's paid to the state, on top of whatever civil penalties could come up if someone outside brings a suit. Yeah, exactly. So one of the updates between the past California laws and then the current one, when they did updates, they actually had, like, a
00:17:04
Speaker
uh, designated a body that would actually pursue those lawsuits, because before it was kind of squishy, I guess. And then they also had, I believe, a 30-day window that the company could, like, well, as long as they correct it.
00:17:19
Speaker
Yeah. Right. It'd be okay. Right. It'd be okay.

Balancing Data Privacy and Utility

00:17:22
Speaker
But for like bigger companies, if there's a lot of money on the line, they'll be very motivated to fix it in 30 days. Sure. Sure. Right. It's 7, it's 7,500 a pop. You would think that you would, yeah, jump on that. Yeah. Um, you mentioned this phrase, differential privacy earlier, which, um, is a big deal. And I wanted to ask you to explain it for folks,
00:17:44
Speaker
because it's so important and it's often confusing to me at least. I'm kind of like, you know, getting a little, maybe I get in the weeds too much, but like, yeah. So if you maybe just talk about that and what it is, and there's especially this big debate at the Census Bureau, although I guess other places too, but that's where I'm most familiar with it, so.
00:18:01
Speaker
Right. So it is a very complex topic. There are a lot of very smart people who struggle with understanding it. So I'm just going to make that kind of caveat disclaimer as I try to explain it very high level and quickly for people who are listening. So before I even dive into what differential privacy is, I have to talk about
00:18:21
Speaker
the fact that, like, when you are looking at protecting a dataset, you have to define what you mean by what is privacy and what is a risk to the data or information you're trying to release. So for many, many years, several decades, the way that we define, I say we as like the federal government or other agencies, defined privacy risk is like being able to identify somebody or finding a group of people or saying like, hey, if these people are smoking, we can infer that they're likely to have cancer.
00:18:50
Speaker
So maybe we should increase their health insurance rates because they're more likely to have cancer. Like we don't want those kind of like disclosure risks or privacy to be disclosed. And so those are very intuitive definitions. But the problem about defining privacy that way is like whoever thinks that's the way we should define privacy, it's very subjective.
00:19:13
Speaker
But again, very intuitive, being able to, like, oh, can I match somebody in this data set versus another data set? So a very classic example that many people like to cite is the Netflix Prize data set. That was this million-dollar prize Netflix did back in, I think it's 2007, 2008 or so, saying, hey, if you can improve our recommendation system by 10%, you will win a million dollars from us. And so they released a data set that was anonymized. They removed people's identity,
00:19:42
Speaker
personally identifiable information.

Need for Privacy Education

00:19:45
Speaker
But one group, instead of trying to improve the recommendation system, was able to directly link the records in the Netflix data set with IMDb and be able to identify certain people. And so that caused a lawsuit. And so maybe some listeners here think
00:20:01
Speaker
Well, Claire, that's silly. I don't care if somebody knows I gave five stars for, like, the latest Avengers movie or something like that. But one of the things that came from the lawsuits is that you don't know what could be inferred from the data set. For instance, you could figure out people's sexual preference apparently based on what they were watching. And so that was one of the lawsuits, that you could identify if somebody was LGBT, or LGBTQ+.
00:20:26
Speaker
So that's sensitive, and that's something we hadn't thought of. So that's what we call a record linkage attack. That's something that people will say, hey, that is a disclosure risk. Let's protect against that. So that's my level setting there. That's how we've been doing it for many, many years.
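
For readers who want to see the mechanics, here is a minimal sketch of a record linkage attack with made-up data. It is not the actual Netflix/IMDb study, just the same idea of joining an "anonymized" table to a public one on quasi-identifiers.

```python
# A minimal, hypothetical record linkage attack: an "anonymized" ratings table
# is joined to a public table on quasi-identifiers (movie and rating date),
# re-attaching names. All data are made up.
import pandas as pd

anonymized = pd.DataFrame({
    "user_id":  [101, 101, 202],
    "movie":    ["Movie X", "Movie Y", "Movie X"],
    "rated_on": ["2007-03-02", "2007-03-05", "2007-04-11"],
})

public_reviews = pd.DataFrame({           # e.g., reviews posted under a real name
    "name":     ["J. Doe", "J. Doe"],
    "movie":    ["Movie X", "Movie Y"],
    "rated_on": ["2007-03-02", "2007-03-05"],
})

linked = anonymized.merge(public_reviews, on=["movie", "rated_on"])
print(linked)   # user_id 101 is now linked to "J. Doe"
# No name, gender, or address was ever in the "anonymized" file; the overlap
# in behavior was enough to identify the person.
```
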
00:20:41
Speaker
Now, differential privacy tried to tackle the ad hocness of that, right? Before, we tried to predict, like, how is somebody going to attack? Are they looking for one person, a group of people, the inference that I said earlier about smoking and cancer, or even knowing, like, oh, who knew that sexual orientation could be inferred

Proposing Unified Federal Data Privacy Laws

00:20:59
Speaker
from this Netflix dataset, right? Like, all those kinds of things. And so, basically, differential privacy says, hey, I'm a new privacy definition.
00:21:07
Speaker
I'm going to scrap all those things and say that we're going to assume the worst possible case scenario. I'm saying that you must make a method that assumes somebody has all the information on all other records but one person's, and that you have to think of all possible versions of that data set. And that means it accounts for future data sets. So that's another problem for past methods: you don't know what future data sets are going to be. So you have to protect against all future possible data sets,
00:21:36
Speaker
and that a person would have basically unlimited computing power, as if they were going to do brute force, like, try to figure out something. So that's what differential privacy basically says: this is how we should define privacy. So it's a very, very conservative, very high privacy guarantee. But the criticism you get from that is, one, trying to figure out what is the universe of possible versions of datasets and how to protect that is really hard, and really difficult for people to even wrap their heads around, like,
00:22:06
Speaker
all future data sets. What does that mean? Data sets, you're like, okay, well, if it's everybody but that one person, isn't that a bit too much? So then you get data sets that might be way noisier than before.
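
For readers who want the definition in symbols: a mechanism M satisfies epsilon-differential privacy if, for any two datasets D and D' differing in one person's record and any set of outputs S, Pr[M(D) in S] <= exp(epsilon) * Pr[M(D') in S]. The episode doesn't walk through a specific algorithm, so the sketch below uses the textbook Laplace mechanism for a single count (my example, not anything from the show) to show why satisfying the definition makes released data noisier.

```python
# Illustration only: the Laplace mechanism is a standard way to satisfy
# epsilon-differential privacy for a count query. Adding or removing one
# person changes a count by at most 1 (its sensitivity), so adding Laplace
# noise with scale 1/epsilon gives the guarantee. Smaller epsilon means more
# noise and stronger privacy -- the trade-off Claire describes.
import numpy as np

rng = np.random.default_rng(0)

def dp_count(true_count: int, epsilon: float) -> float:
    """Return an epsilon-differentially-private version of a count query."""
    sensitivity = 1.0   # one person changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

true_count = 3000       # e.g., the population of a small town
for eps in (10.0, 1.0, 0.1):
    print(f"epsilon={eps:>4}: noisy count = {dp_count(true_count, eps):8.1f}")
# Large epsilon -> little noise (weaker privacy); small epsilon -> heavy noise
# (stronger privacy), which is why the protected data can be "way noisier
# than before."
```
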
00:22:21
Speaker
So this actually goes to your point earlier.

Exploitable Features in Privacy Laws

00:22:24
Speaker
So, like, well, I keep hearing about differential privacy in the context of the 2020 census. So up until 2020, the methods have all been based on what I call more traditional privacy definitions. And so when they, when I say they, the Census Bureau, created their, what they call the disclosure avoidance system, is the phrase. When they made their system to protect the decennial census data,
00:22:47
Speaker
They based it on those traditional privacy definitions. And the biggest method they use, biggest or the main method, it was data swapping. So they were swapping records with similar characteristics based on like, oh, there's very few people with,

Conclusion and Book Recommendation

00:23:02
Speaker
who are let's say African-American with so many kids in this one area of the country, let's swap them with another
00:23:08
Speaker
family in another part of the country, and that's how we're going to protect them. So 2020 is the first time that we did away with all of that and decided to make a new method that satisfies differential privacy. Now, I'm being very careful with my words right here because often I hear people say differential privacy is the method. It is not. It is a privacy definition that a method must satisfy. So the method or the algorithm that was used for 2020 is called the top-down algorithm, that has components of it that
00:23:37
Speaker
has differential privacy in there. I gotcha. Okay. That actually makes, I just learned a lot there, but that was really interesting. So how, we can keep it specific to the 2020 census. So how does the census then determine
00:23:54
Speaker
I guess the resulting accuracy of the data, right? So once you've done either a swap or you've done all these other things, how do they say, yeah, these data are not just, we haven't just created a random dataset for you. Right. And that's a great point. I've been hinting at it, but I haven't explicitly said that there's this natural tension between protecting the privacy of the dataset and making sure that it's useful or accurate or for every, any kind of case you want. So you can't have all the information.
00:24:24
Speaker
and all the privacy, right? So like in those two extreme cases, it's either like, if you want all the information or usefulness that you're just gonna fully expose people and there's no privacy. To make it fully private, you could just lock the doors and just say, hey, you don't have access to the data, right? So there has to be some trade-off in that to your point, it's like, how do you determine that? Well, for any dataset,
00:24:46
Speaker
There's no one utility metric or measure that you should go by. It really is dependent on the data. What are people going to use it for?
00:24:54
Speaker
Census again is a great example because it's used for a lot of things. And so it's really hard for them to be like, hey, it needs to be useful for everything, right? Like, one, that's impossible. So they do a suite of analyses. One way is to look at summary statistics. So looking at, like, the bias, or, I don't think they call it bias, I think they look at absolute error of some of the measurements, like how variable it is. So they do those quick checks at different geographic levels.
00:25:23
Speaker
They have what we sometimes call outcome-specific measures, like there's a specific use case that somebody will use it for. So the 2020 census is used for redistricting. So they probably want to make sure that it's going to still be very useful for that redistricting file. There's other ones where you can look at distributional aspects of the data. So saying, like, maybe this variable looks really good, or let's say this categorical variable is very close to the original categorical variable for these counts or something,
00:25:52
Speaker
or looking at continuous distribution and comparing those two. And then some people are like, okay, what about relationships? So you can apply certain models or look at just like multivariate distributional features of the data. So it can vary quite a bit. But just to say that census has a really hard task, right? Because you can't make a data set that's completely useful for everything. Again, if you do that, then there's no privacy. So like how do they optimize on that? And there's certain things where I,
00:26:21
Speaker
they're just not going to be good at because it's not privacy issues. Like one of the examples I hear that people use the census data is like, I guess there's a county in Ohio that uses it for figuring out restaurant permits. Okay. So that's not something census is going to be thinking about. Yeah, right. Right. But it's a random use case that
00:26:40
Speaker
But to your point, someone could try to somehow combine data, like, to the privacy point, right? They could combine various data sets to pull out, like, this person goes to this restaurant, or I don't know, whatever that, whatever that magical future data set and data need is, yeah. Yeah, that's a heavy lift.
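
Here is a minimal sketch of the kind of utility check Claire mentions, comparing confidential and protected counts by absolute error at two geographic levels. The data, the "noise," and the numbers are all hypothetical.

```python
# A minimal, hypothetical utility check: compare counts from the confidential
# data and a protected (noised) release by absolute error, overall and at a
# higher geographic level.
import pandas as pd

original = pd.DataFrame({
    "county": ["A", "A", "B", "B"],
    "block":  ["A1", "A2", "B1", "B2"],
    "count":  [120, 45, 300, 10],
})
protected = original.assign(count=[118, 51, 296, 14])   # pretend noisy release

merged = original.merge(protected, on=["county", "block"], suffixes=("_orig", "_prot"))
merged["abs_error"] = (merged["count_orig"] - merged["count_prot"]).abs()
print("block-level mean absolute error:", merged["abs_error"].mean())

by_county = merged.groupby("county")[["count_orig", "count_prot"]].sum()
print("county-level absolute error:\n", (by_county["count_orig"] - by_county["count_prot"]).abs())
# Error often looks small at higher geographies and larger for small cells,
# which is why utility has to be judged against the specific uses of the data
# (redistricting files, distributional comparisons, model-based checks, etc.).
```
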
00:26:58
Speaker
Okay, so before we wrap up, and you've alluded to this a little bit, I wanted to ask you: if you were in charge of the US data privacy
00:27:12
Speaker
thing. And you get to control everything, not just federal data; you get the private sector, nonprofit sector. You're the data privacy czar of the whole country, but czar with, like, a star on it because it's everything. What, what would you like to see happen? Or what would your policies be?
00:27:32
Speaker
Oh, that's a great question. So I'll do, I keep saying this because like you just have a great question. So there's two, there's just two parts. So one part is like, I would hopefully have some control over our education system because I think we need to teach this. This is actually all this information I'm providing. You don't learn this unless you're in grad school. And even then it is in these like computer science departments. So like I talked to a colleague who came from a computer science
00:27:58
Speaker
program. And he said, well, like, we learn privacy, but it's not the expanding-access-to-data part of privacy that we've been talking about. Like, there's no formal classes on it. If there is, it's at the graduate level in computer science. Like I said,
00:28:11
Speaker
And that's not good because these data are used by so many people, right? There are the public policy people, social scientists, demographers, economists; I'm from statistics. So this should be standard in some sort of classroom where they do data analytics. And so there's already been a movement with professors trying to teach students that your data set isn't this beautiful, clean thing.
00:28:38
Speaker
Like, what is it? The classic example is the iris dataset. Right. Like, look at how everything's perfect. And it works out. And it's like clustering and things like, no, no, no, no. So like on top of it all, it should be discussed that there are these privacy implications. You're also seeing, I guess, another movement of like data ethics and data equity. So that should all wrap in. And so that's why I'm like, well, I want another star into my czar part, which is like the education. You're going to be busy, but okay. All right. Right. Okay.
00:29:05
Speaker
And then, I guess, for all of privacy, definitely getting more of a unified public policy, like a law, right? For how we use data, how it should be accessed. Because, one, we don't have any consumer privacy laws, right? And then, two, the issue with the federal laws we have is, because they're piecemeal, some agencies have to renegotiate every federal fiscal cycle.
00:29:30
Speaker
Wow. For different exchange of data. And so, like, for instance, there's a project that I'm working on with the Bureau of Economic Analysis and they have an agreement with the Internal Revenue Service because they, like, Bureau of Economic Analysis, they do a combination of census data, taxpayer data, and they look at some, like, labor data too. So,
00:29:48
Speaker
the way that they negotiate is they become a subcontractor to another agency. That's how they work around that whole renegotiating every year. But some agencies, they still do a formal process of, like, okay, it's that time of year again. Like, we're going to spend three months negotiating how we will exchange data under the different privacy acts. Wow. And that's three months out of the 12 months. So you get maybe nine months and then you have to restart over again. Over again. All over again.
00:30:16
Speaker
Yeah. So there is GDPR, which is the European law that you referenced earlier. Would that be your model, at least as, like, a starting point? Or is that, from your perspective, not sufficient? I don't know. What? I don't know. Like, yeah. Would that be your starting point at least? Or like a model to build off of?
00:30:35
Speaker
I think it's a good model to start with. I think we'd want a combination of that and the California laws because we can learn a lot from both of them. I actually talk about this in my book, that both of them have really great features, but some of them can be easily exploited. Right.
00:30:48
Speaker
So, like, one of the things that both of the laws have is, like, if you have full consent from people. But, like, what if you're pressured into giving full consent, right? So one of the things that we were actually seeing in school systems, I did not know this: now there's certain apps that students download on their smartphones and professors can know your attendance, whether you're physically there, like, within their sensors in the lecture. Yeah. That's creepy.
00:31:14
Speaker
Now you know if your students have been there, if they're late or not. I mean, like, I guess students could just send one person with a backpack with their cell phones. Yeah. Well, like, that's beside the point. But I mean, what's to stop a school from saying, like, hey, if you want to come to our school, you have to give full consent, right? And we're going to track you. And if you don't do X, Y, and Z, then we'll take away your scholarship, for example, something like that. Right.
00:31:36
Speaker
Right. Right. And then, like, for work, right? There's already those, like, keyboard detection tools, like, how active are you there? Like, are you actually working? And what's to stop your employer from being like, hey, if you want to work for us, you have to consent to us tracking this information on you.
00:31:52
Speaker
Okay. Well, on that scary note, I will point people again to your book, linked in the show notes. Everybody should check it out. It is a mixture of scary stories, but also things that we can do as data users and consumers. Highly recommended. You should all check it out. Claire, thanks a lot. Thanks for coming on the show. This was lovely. Always great to talk to you. Thanks for having me.
00:32:20
Speaker
Thanks everyone for tuning into this week's episode of the show. I hope you enjoyed it. I hope you will check out Claire's book and her other work. You can head over to her page at the Urban Institute, which I've linked to in the episode notes. Just one more mention of various ways that you can support the podcast. I of course have my Patreon page open where you can support the show for just about a cup of coffee every month. You can head over to my newsletter page where I have a
00:32:44
Speaker
free newsletter and a paid newsletter. The paid newsletter gives you some other advanced information, let's just say, some coupons for conferences, the ability or opportunity to meet with me on a Zoom call. And there's also the new window community that I've been building, where I'm sharing out little data visualization tricks and tips and techniques via text message, only two or three a week. And you can text me back and say, I didn't really like that, or I liked that a lot, or what else can you show me? So I have that
00:33:13
Speaker
opportunity as well. So check all those out. They're all listed on the show notes page and also on policyviz.com. So until next time, this has been the PolicyViz podcast. Thanks so much for listening.
00:33:24
Speaker
A number of people help bring you the PolicyViz podcast. Music is provided by the NRIs. Audio editing is provided by Ken Skaggs. Design and promotion is created with assistance from Sharon Satsuki-Ramirez. And each episode is transcribed by Jenny Transcription Services. If you'd like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify, YouTube, or wherever you get your podcasts.
00:33:45
Speaker
The PolicyViz podcast is ad-free and supported by listeners. If you'd like to help support the show financially, please visit our PayPal page or our Patreon page at patreon.com slash policyviz.