Introduction to Surveys and Data Collection
00:00:11
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. On this week's episode, we are going to talk about running surveys and collecting data and information.
David Johnson's Career Journey
00:00:21
Speaker
And to help me talk about that and to explain all the ins and outs and difficulties and challenges and I guess the rewards too, I'm joined in person actually by David Johnson, who is currently the director of the Panel Study of Income Dynamics at the University of Michigan at Ann Arbor.
00:00:39
Speaker
David, welcome to the show. Thanks, John. You mean I could have done this remotely? You have to fly all the way down here with us. Exactly. Who knew? How are you? Pretty good. Busy these days. Yeah, pretty busy. Yeah. Why
The Complexity of Survey Question Modifications
00:00:54
Speaker
don't we do this? Maybe you could talk a little bit about yourself for folks, because you've been doing this sort of business of conducting, running, improving surveys at lots of different places.
00:01:06
Speaker
for a while now. Yeah. And then we could talk about the PSID because now it's the 50th year of the PSID. Yeah. Okay, great. So currently, I'm the director of PSID. I've been doing that for about three years. But before that, I spent 25 years in the federal statistical system. So at the Bureau of Labor Statistics, the Census Bureau and the Bureau of Economic Analysis doing different surveys. At BLS, I helped with the Consumer Price Index and the Consumer Expenditure Survey. At the Census Bureau, I
00:01:34
Speaker
helped with the Current Population Survey, the American Community Survey and the Survey of Income and Program Participation, which we spent a whole lot of time reengineering. And now there's a new SIPP out there that's collecting similar data on monthly income and monthly program participation, but doing it every year instead of three times a year. So I think that was a good thing. And then at BEA, I worked on
00:02:02
Speaker
health insurance and getting a better way to come up with pricing of health care.
Testing Survey Questions: Understanding vs. Consistency
00:02:07
Speaker
So the hard part in surveys is when you want to change a question. So especially in the PSID, a longitudinal survey, you want to have consistency over time. And in the Current Population Survey that's used to measure employment and unemployment,
00:02:23
Speaker
changing those can cause breaks in series. So anybody who's ever looked at the new poverty estimates or the income estimates done by the Bureau, they'll see a break in series around 2013 or so, when they changed the questions, because we wanted to improve the way we collected income. That makes it complicated. So when you make a decision like that, you're balancing improvements to the question, but also then the consistency
00:02:53
Speaker
over time, and then how ultimately is that decision made? I mean, there's a lot of testing that goes into changing your question.
00:03:00
Speaker
Right, so we cognitively tested. So in fact, for the income changes in the Current Population Survey as well as the health insurance questions, we contracted with a number of different research organizations to look at this and to come up with suggested questions. You cognitively test the questions, like with focus groups or a separate survey if you think you need to, if you're not pulling from another survey.
00:03:25
Speaker
At PSID, we try to find questions that people have used before. But we'll still send them out to a random sample to try to see how well they can answer those questions.
00:03:36
Speaker
Right. So you test a random sample of a new question. Are you testing that consistency within that random sample? Are you comparing it to administrative data? No, no. Most of the time, yeah. So when you're doing the test, you're not usually comparing to see consistency. You're looking to see if people understand the question.
Challenges in Survey Answer Interpretation
00:03:53
Speaker
Do they understand the way you're asking the question? Right. Or do they say, I don't know what you're even asking here when you're asking about my income or the timing of the income or those types of things.
00:04:03
Speaker
So that's the main thing. It's really hard to check for consistency.
00:04:09
Speaker
because we don't know for sure how people are interpreting questions, right? So there's
Impact of Question Order and Survey Fatigue
00:04:16
Speaker
a lot of times in income people round, and we don't know if they're rounding up, rounding down, or whether they're rounding to the next 10,000 or something like that. So unless we have a baseline, a benchmark to say this question is actually accurate and right, the only way you could probably do that is to compare to administrative data. And when you do that, household surveys are okay, but there's extreme under-reporting.
00:04:37
Speaker
What about the order of questions within a survey? So how does that factor in? Yeah, that does. And I haven't spent much time looking at that. I know there's other places that have looked at order because they're concerned about fatigue. So in the Consumer Expenditure Survey, you know, they go through all your different spending on housing and clothing and transportation and trips and travel. And they thought maybe if they move things around, they might
Origin and Design of PSID
00:05:01
Speaker
get better results. I don't know what the conclusions are.
00:05:03
Speaker
Right. So what's going on at the PSID? So it's 50 years. I'm sure there are folks who aren't familiar with the PSID. So maybe you could talk a little bit about it and then what you are all doing now and looking into the future. Right. So the PSID came out of Johnson's War on Poverty. So he declared a war on poverty and they realized, hey, we need to figure out ways to measure this. And so what the PSID was meant to do was to find families who are in poverty and who move out of poverty.
00:05:33
Speaker
Now, hence they needed a longitudinal survey. So they contracted with the University of Michigan. Jim Morgan was there who had done a lot of this stuff. And so you can look at these reports of President Johnson to Congress saying, hey, Michigan's got this new survey. It's going to follow families over time. We started with 5,000 families and we followed
00:05:54
Speaker
them over time. We'll follow the family. If they have kids, we'll follow their kids when they move out. If those kids have kids, we'll follow those kids. And so we have families who started in 1968 who probably have over a hundred different family members in the survey. We have some families where the people we're interviewing
00:06:14
Speaker
are in the seventh generation of a family from 1968. So we have amazing numbers. And there's about 3,500 people who were in the survey in 1968 and are still in it. So we've been doing this for 50 years. We follow
Family Loyalty and High Response Rates
00:06:29
Speaker
the same families. We now have about 11,000 families.
00:06:32
Speaker
Um, because obviously you grow, and we tried to replenish it with new immigrants. But it makes for a very interesting way to look at, you know, sort of America's family tree. So you can look at all the families, and the big thing now, what the big research people are looking at, is intergenerational effects. So people know their parents are important, but now we can look at their grandparents. Um, how did their grandparents behave, or are they contributing to the
00:07:01
Speaker
child's education. I always like to think of it this way: I am over 50, I'm almost 60. So I can think about what I was doing in 1968. So 1968 is when the show Mister Rogers' Neighborhood started. And I can just imagine some of these kids watching Mister Rogers' Neighborhood, who happened to be in the PSID, who are now parents or, I'm not yet, but some of them might be grandparents. And so just to think that that's how these families are going, and we can find
00:07:30
Speaker
what their income is, what their health is, what their education is, general well-being, what kind of occupation they've had over this time period, how many different relationships they've had, or how many times they've been divorced and remarried, and how many families are now living together in the same household. A lot of those things
00:07:52
Speaker
are really important. So when you talk to some of the families who answer the survey, is there a sense of ownership among the people who are participating? They've been participating for a long time. It's pretty amazing. I feel like asking a survey in general and getting people to respond is hard enough, but you've had people in the survey for 50 years. Yes, for 50 years. So we were shocked. So this year, in the 2017 survey, we added a question that says,
00:08:16
Speaker
So you've been participating in the PSID. Why do you do that? So over half of the respondents answered either loyalty to their family or loyalty to the survey. And you read some of these responses and they say, well, I've been doing it for a long time. My mother always did it. My mother is now gone. So I feel I have to do it. Some of them were saying, well, you know, we have our family. We feel obligated that we should be doing this. There are some families
00:08:43
Speaker
that when we start interviewing in March, every other year we interview in March, they go, oh, I've been looking forward to your phone
Evolution of Survey Methods
00:08:50
Speaker
call. I've been wondering when you're gonna call. Obviously that's not all the families, but we have about a 94% response rate among the people we interviewed two years ago being interviewed again. So obviously people drop out, but there's a lot of commitment to the survey. That's pretty amazing. How has the actual survey, the actual physical survey, how has that changed over the last 50 years?
00:09:13
Speaker
So obviously, when we started, it was all on paper. It was probably about maybe 30 minutes. We would go interview people personally. So those 5,000 families would get a personal interview. We then moved to telephone and started interviewing on the telephone and in person, and then switched to computer-assisted telephone interviewing,
00:09:36
Speaker
CATI interviews, so most of our survey is now done on the phone. And it could be about 80 to 90 minutes. So it's a pretty long survey. We obviously interview some in person. We're trying now to move beyond that to do something on
Providing Data to Researchers
00:09:50
Speaker
the web. So we've had supplements where, in between the two years, we go out and ask them certain questions, they can do it on the internet, and we're trying to convert it to that. But in any event, that's how we do it. The content has also changed. So in the beginning, we mostly had income
00:10:05
Speaker
and employment, but we added a lot of questions on health. So health is a big deal. So have you had diabetes or cancer or those types of things? Have you had childhood experiences with different diseases?
00:10:19
Speaker
your education, your fertility behavior, and how many kids you have, a lot of those things we've added. And then we add other supplements, so that we followed these same families over 50 years, but then some of the families have kids and young kids, and we want to get more information about the young kids and their primary caregivers.
00:10:38
Speaker
So we have a special Child Development Supplement that goes out and interviews those families with kids. And then we have the young adults. So from 18 to 24, they're either living with their parents or moved out. There's a lot of information we want to have about them. Or the elderly. So we had Disability and Use of Time, a supplement to look at that. So we try to use the core PSID, as we call it, to follow the families over time and then other offshoots from that. Right.
00:11:05
Speaker
When you think about providing the data to researchers, what's the discussion like? How do you make something that people can get, download, and use relatively easily? Yeah, so providing data that people can use isn't that complicated if they want to use
00:11:24
Speaker
one survey and look at the cross section of families. The hard part of a longitudinal survey is doing the linkages year over year over year. So we've tried to develop our website where you can go and you can find a variable, let's say income or even religion
00:11:42
Speaker
or education, and you can go and find that variable and then click all the different years you want. And then the website will create a data file for you that puts all those variables together across all the years and you can use that. And we've tried to do the same with the other supplements. So we think it's pretty easy. We have a lot of videos on how to get access to the data and how to do these things, what weights to use, not really how to evaluate the data, but basically how to construct the data.
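The linking step described here, pulling one variable from several yearly files into one record per family, can be sketched roughly as follows. This is a minimal illustration in Python, not the PSID website's actual code; the function name `link_years`, the family IDs, and the income values are all made up for the example.

```python
def link_years(yearly_records, variable):
    """Combine one variable from several cross-sections into one entry per family.

    yearly_records: dict mapping year -> list of {"family_id": ..., variable: ...}
    Returns a dict mapping family_id -> {year: value}. A year is simply absent
    for a family that has no record in that wave.
    """
    linked = {}
    for year in sorted(yearly_records):
        for record in yearly_records[year]:
            family = linked.setdefault(record["family_id"], {})
            family[year] = record[variable]
    return linked


# Made-up example data: family 2 appears only in 2015, family 3 only in 2017.
waves = {
    2015: [{"family_id": 1, "income": 52000}, {"family_id": 2, "income": 31000}],
    2017: [{"family_id": 1, "income": 55000}, {"family_id": 3, "income": 40000}],
}
panel = link_years(waves, "income")
```

Keeping missing waves absent, rather than filling them with zeros, avoids confusing "not interviewed" with "reported zero income".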
00:12:11
Speaker
We have a whole new family mapping system where it would create these families across time and generations for you. So that makes it easier to do that type of analysis. So we have, you know, we get a lot of downloads. We have probably 30,000 data users.
00:12:28
Speaker
We have about 5,000 publications that use the PSID over time and almost 900 dissertations that have used the PSID. And I think that's sort of what makes it exciting. We have a lot of committed users and big name users. So there's 11 Nobel laureates who have used the PSID. Angus Deaton and Jim Heckman are two of the most well-known. Jim Heckman uses the data set a lot. Recently we've been finding
00:12:58
Speaker
because of this whole idea of a family tree and intergenerational, we're trying to find intergenerational researchers. So somebody who did their dissertation, whose advisor did their dissertation, whose advisor did their dissertation. So there's a lot of multi-generational papers out there. So it seems like, you know, this whole idea of the intergenerational goes not just from the survey and the families, but also from the researchers.
00:13:21
Speaker
How do you think about creating safeguards so that people can go and they can download the data, but they may not be the experts, and there may be consistency issues? Like, so how do you think about, you know, if I go in and download the health data and maybe the health question changed in 1993?
00:13:38
Speaker
And so I'm just the so and so and I see this big spike and I'm like, oh, look at this big spike that happened. But really, maybe it's an artifact of the data. So is it the responsibility of the PSID to have all these documents and videos to say this isn't just for anybody, like you need to know what you're doing? So to help with that, we have user help that people can ask. We have trainings. We have a week long training
00:14:01
Speaker
in Michigan to help people.
Adapting to New Research Trends
00:14:03
Speaker
We have shorter segments at professional conferences where we try to do trainings, but we don't try to control how people use the data. We try to make it consistent. So in your example of a health variable, if you go through our list, there'll be a variable, it'll say health status. Let's say we change it. We didn't change that one. Education changes a lot. How do you ask education? So there'll be two or three variables there.
00:14:26
Speaker
And you'll see, oh, we have one variable from 1968 to 1980, and another variable from 81 on. Well, then you'll have to know. We'll show you that we have two variables. In your analysis, you're going to have to figure out how to do that. But we will help you do that. We haven't spent the time like other places, like IPUMS does, in trying to make a consistent variable.
00:14:47
Speaker
On some we have, and on others we haven't, and that makes it difficult. And what makes that more difficult is we want to improve the questions, but when we think of improving the questions, that means changing the questions.
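The "two variables, one analysis" situation described above can be sketched as a small harmonization step. This is only an illustration under stated assumptions: the category codes and the crosswalk to years of schooling are invented for the example and are not actual PSID codes, and `harmonize_education` is a hypothetical helper name. The 1981 switch year follows the example given in the conversation.

```python
# Made-up crosswalk from an older categorical coding to years of schooling.
# These codes are hypothetical and are NOT actual PSID codes.
OLD_CATEGORY_TO_YEARS = {1: 8, 2: 12, 3: 16}

# First year of the newer variable, following the example in the conversation.
SWITCH_YEAR = 1981


def harmonize_education(year, old_value=None, new_value=None):
    """Return (years_of_schooling, source) for one person-year.

    Before SWITCH_YEAR the older categorical variable is mapped through the
    crosswalk; from SWITCH_YEAR on, the newer variable is used directly.
    The source flag records which definition produced the value, so an
    analysis can account for a possible break in series at the switch.
    """
    if year < SWITCH_YEAR:
        return OLD_CATEGORY_TO_YEARS.get(old_value), "old"
    return new_value, "new"
```

Returning the source flag alongside the value is a design choice: it keeps the question change visible in the combined series instead of hiding it.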
00:15:02
Speaker
So a recent example is in our data, we try to find out the assets people have. And one of the big things is checking and savings accounts. And we have a question where we ask, oh, do you have a checking or savings or a money market or stocks or all this other stuff? And we get a lot of people saying no. And so our number of people who have checking and savings is a lot lower than in other surveys, like the Survey of Consumer Finances. So this time, in 2017, we added a question that says,
00:15:31
Speaker
Are you sure you don't have a checking account? And, you know, two thirds of the people who said no started saying yes. So now we have a lot more people. But what that's going to mean is, if you're going to look at the frequency of people who have checking and savings, you're going to see a break in series, unless you know how we do this additional trick. But we thought it was much more important to capture the data
00:15:53
Speaker
accurately than to keep consistency. Right, right. So give us a view into the day-to-day working at the PSID. So first question, how big is the team that's working on the PSID sort of day-to-day? And then of course you have
00:16:10
Speaker
an additional group of people, right, that are going out and running the survey. Right, right.
Operational Challenges of PSID
00:16:14
Speaker
So yeah, so we have about 30, 35 staff who are processing the data on an ongoing basis, or they're developing the next survey. So at the end of December, we finished collecting
00:16:26
Speaker
the 2017 wave of data. And so now we're currently processing the data. So we have to go through and make sure all those families, when they said, oh, this person was living with us, we have to make sure that that person they said is living with them isn't living in another family because all the families overlap. So there's a lot of processing that's done. But then we have to start developing the instrument that will start being collected
00:16:50
Speaker
So we have to develop the instrument and then we have to help upload the data and get it processed. So day to day there's all these. There's the development, there's the data processing, and then there's the application development, which is getting the data out, right? So that's constant. Now, what I do day to day? I don't know. I come here and talk to you. I go to conferences. I try to plug the PSID. But then, you know, we'll have to review some of the changes in the questions,
00:17:18
Speaker
how people might answer. We had a big push this year to add new immigrants to the survey. One of the data problems of a longitudinal survey that starts in 1968 is the country's changed a lot since 1968. So we want to try to get a whole set of new immigrants in. Well, we have to look at those data and figure out how they respond and how they're going to be captured in the next wave of the survey.
00:17:48
Speaker
Some of these people who have been going on and on and on are pretty easy to follow over time. But the new ones are harder, or the split-offs. If you have a child that gets married and moves out of the house and moves in with a whole other family, you have to figure out how you're going to follow those. So all those questions are really what the staff are doing day to day. Then as you said, we have a whole other group of people who are basically collecting the data. So there's five or six people who oversee 50 to 100 data collectors
00:18:18
Speaker
that are either on the phone or going around the country interviewing people. So you have to train them and then you have to help them collect the data.
00:18:26
Speaker
So it's now mostly computer assisted on the phone. So the people who are collecting the data, when they travel, are they like hunting people down who they can't get ahold of? So well, so we try to first decide, we look at the people who respond. And we know there are people that are harder to get on the phone. Okay. So we try to choose those to go visit in person. Right. And
00:18:49
Speaker
what usually happens is you have a personal visit, and some of them you complete the interview, but some of them will go, oh, yeah, okay, I'll do that. And then they end up completing on the phone. Right. So the personal visit helps in two ways. It gets the interview in person, or it then gets an interview on the phone. Right. But other things like the Child Development Supplement,
00:19:06
Speaker
That study, we want to go and visit these families because we want to get more of a sense of the kids' cognitive ability. Or we've started collecting biomarkers. We've started collecting saliva, and that we have to be in person for to get that.
Future Directions for PSID
00:19:21
Speaker
So I think that leads to sort of the last question, which is what's the next 50 years of the PSID? I mean, clearly biomarkers seem to be cutting edge, sort of the next wave. What else are you planning and seeing for the next?
00:19:39
Speaker
Yeah, so it's hard to say what we're going to see for the next 50 years. So we try to see, you know, what are we doing for the next five or 10? And I think one of the things, obviously, is some biomarkers. People really want to look at that. I do think this intergenerational aspect is a big deal. So there's a lot of research out there that looks at mobility. So I'm a kid, you know, I grew up in a family with a lot of income. What's the probability that I'm going to have a lot of income when I grow up?
00:20:09
Speaker
And there are people that have done a lot of estimates on that, with the PSID most notably. But some people are suggesting that's changed over time. So it's less likely now that, if you're low-income as a kid, you're going to grow up and be high-income. So I think over time, that's going to matter. Where these children are growing up and what their income is when they become adults. So the longer the PSID can go, the better we can see if that mobility has gone up
00:20:38
Speaker
or gone down. So I think that's one of the big advantages. And the other advantage is the PSID has a lot of other demographics. So people have now started looking at mobility and health status, or there's mobility and occupation, there's mobility and education, there's a lot of other things you can look at.
00:20:55
Speaker
of how kids' outcomes as adults are related to their outcomes as kids. I think that's where the future is. So some of the best articles you see are of kids who grew up when they were in areas that had food stamps, right? What their outcomes are now that they're adults. But that's 68 compared to now, but you can move that up and look at, oh, maybe when the rollout of the EITC came out or the rollout of other programs, how that affects the long-term
00:21:25
Speaker
outcomes for these kids. And I think that's sort of the future of how these kids are growing up in an intergenerational way.
00:21:31
Speaker
Great. Cool. Well, thanks for coming all the way down. No problem. And thanks everyone for tuning into this week's episode. I will put links to all of the data sets and things that we talked about in the episode. I'll also highlight the part of the PSID website where you can go in and grab the data. So I encourage you to do so. If you have comments or questions, please do let me know. So until next time, this has been the PolicyViz Podcast. Thanks so much for listening.