
Episode #95: Jen Golbeck

The PolicyViz Podcast

Hi everyone, welcome back! I’m excited to have Jen Golbeck on this week’s episode. Jen is a world leader in social media research and science communication. She is a pioneer in the field of social data analytics, discovering people’s hidden...


Transcript

Introduction to Jennifer Golbeck's Work

00:00:11
Speaker
Welcome back to the PolicyViz podcast. I'm your host, Jon Schwabish. On this week's episode, we are going to talk about social media data and how data from social media streams can be used to help us better understand the world in which we live. So joining me this week to talk about social media data is Jennifer Golbeck, Associate Professor at the College of Information Studies at the University of Maryland, College Park, which is itself a mouthful.

From Computer Science to Social Media Research

00:00:38
Speaker
But I'm very excited to have Jen on the show because she's doing some really cool and exciting work with social media data. So Jen, I'm very excited to have you on the show. Thanks for joining me. Glad to be here. You've done a lot of interesting work using social media data, which is sort of like one of the forefronts of lots of different areas of research, I'd say. But before we get into that, maybe you could talk a little bit about your background and how you got interested in this particular field of research.
00:01:06
Speaker
Yeah, so I'm a computer scientist by training. And I got into social media before there kind of was social media. I was interested in networks of people online, and my advisor for my PhD was like, yeah, I guess that could be a thing. And so I started looking at social networks. There was social network research before social media, but it was just really small. And so I kind of came at it at a time when, all of a sudden, it started getting really big.
00:01:36
Speaker
And so it was this great space to like explore algorithms and science and computational things, but also keep a connection to the human side, which I always thought was really interesting.

Understanding People Through Algorithms

00:01:46
Speaker
Let me ask the sort of general question of what is your overarching goal when it comes to your research agenda?
00:01:53
Speaker
Do you have one? Yeah. That's a good question if I have one. If I had to make one up, I would say that what I really want to do is better understand people by creating algorithms that analyze the data that they leave behind. That's a pretty concise way to put it. Yeah, absolutely. Can you talk a little bit about some of the work you're doing now so folks can get a sense of what that means in practice?

Predicting Behaviors from Social Media

00:02:18
Speaker
Yeah, so a lot of the work I'm doing in that space now is related to building algorithms to either find out people's attributes or predict their future behavior from the data that they leave online. So we've done research in my lab on predicting things like people's personality traits. Are you an introvert or an extrovert? Are you easy to get along with? Are you kind of laid back or really nervous? Political preferences, like placing people on a liberal to conservative spectrum. But we've also looked at things like, as I said, future behavior.
00:02:48
Speaker
So we have a study that we're finishing up on looking at people who are alcoholics and announce that they're going to their first AA meeting, building models that can read their tweets from before they go to that meeting to predict if they're going to get sober or if they're going to start drinking again in the short term.
00:03:05
Speaker
And so it's this really wide range, but I tend to have a psychological bent to it. I'm interested in things like how people cope with stress, what kinds of behaviors they have, what sorts of psychological profiles they have, and how that signal comes out through what they do online that explicitly isn't about any of that.

Privacy Concerns and Ethical Algorithms

00:03:23
Speaker
Right. When you're doing this sort of work, do you ever worry that you are Big Brother? Oh my God, like all the time.
00:03:34
Speaker
Yeah, I have a lot of anxiety about the work because on one hand, this kind of research is used in all sorts of helpful ways. I mean, just like mathematically, the stuff I do is the same as what Netflix uses to recommend movies to you or Amazon uses to recommend products. Like the math underneath is the same. There's a lot of ways that this kind of stuff can be helpful to people.
00:03:55
Speaker
At the same time, yeah, it's terrifying in a way that this could be abused if it falls into the wrong hands. And so I spend a lot of time now kind of traveling around and talking about privacy and actually have a whole
00:04:07
Speaker
kind of section of my work focused on making it easier for people to control their privacy. Because ultimately, like in my ideal world, these algorithms would be out there, they'd be helpful, they get used in context where we want them used. And when we say no, they wouldn't be used. So I really want people to be able to control and consent to it. But unfortunately, that's not the world we're in right now. Right.

Anonymity and Online Behavior

00:04:29
Speaker
You mentioned taking a psychological bent to some of the research you're doing. I just finished watching Experimenter, a movie with Peter Sarsgaard about the Stanley Milgram experiments, where he looked at our tendency to punish people or to obey an authority.
00:04:50
Speaker
And that's sort of maybe one of the early types of experiments of that nature. How does social media interact with our basic behavior, our fundamental behavior and our humanity? Are you able to sort of think about that a little bit? Yeah, I mean, I think
00:05:07
Speaker
I don't think we behave differently online than we do in person. Online just creates different environments for us. So we see a lot more nasty stuff online than you tend to experience in person, if you're lucky. But I think that's because of anonymity. People do all kinds of nasty stuff in person when they're anonymous, too. And you can see that. And we can look at it if we want to be really immediate at the protests in Charlottesville over the weekend.
00:05:35
Speaker
You know, the guys who are carrying the torches and screaming the neo-Nazi things are now all upset that people are naming them online. And it's like, you know, you didn't have a hood on, you were out there doing this stuff, but they thought, well, I'm just kind of this anonymous person walking around. There aren't going to be any consequences.
00:05:54
Speaker
And if they had had to wear their name badge or if they knew that they were going to be identified, they might have behaved differently. So whenever we have this ability to behave anonymously, a lot of people are willing to do all kinds of stuff that they wouldn't do if they would be held accountable for it. The internet and social media just make it way easier to be anonymous. And it even feels more anonymous because, you know, your person isn't there. People aren't looking at you.
00:06:17
Speaker
And so I don't think it really changes the way we behave. At the same time, it creates so many unusual environments that you don't normally see in person that we see manifestations of all this kind of weird behavior. I don't know. I'm sure some people are kind of psychologically affected by it in more significant ways. But I think for the vast majority, it's just a different kind of space where they interact that they're going to do it differently than if they're sitting down with you face to face.
00:06:45
Speaker
Yeah, it's interesting you say it that way. I have a friend whose daughter's going into seventh grade and they have a little text messaging group. And she was being essentially cyber bullied by one of the kids in this text messaging group, which sort of falls in this cyber bullying, which we've all heard about and it's really scary. But what shocked me the most was that this kid was also in her actual physical class.
00:07:12
Speaker
So it's sort of like a person having two faces.

Cyberbullying: Online vs. In-Person

00:07:16
Speaker
I don't think, in this particular case, he was bullying her in the classroom or in the school in person, but online or in this text messaging group sort of had this different face, if you will. And you've also done some work on cyberbullying. So I wonder if you could talk a little bit about the work and maybe help me understand how someone can bully someone
00:07:37
Speaker
through the phone, but then be perfectly, or maybe not perfectly, but a little more normal, let's say, or regular around a person in person. Yeah, that's a great question. So Zahra Ashktorab, who is a former PhD student of mine, she just graduated and defended a month or two ago, her dissertation was on cyberbullying. And it's interesting all the ways it manifests, because some of it is name calling, like what you'd expect. But another way you see it, especially among girls, is that
00:08:05
Speaker
you know, there'll be a group picture, everybody went to the mall or to the beach, and they'll crop one person out of all the pictures, right? And that's a really common form of cyberbullying, but it's so different from what you would expect, you know, from in-person bullying manifesting online. And so, yeah, I mean, I think especially for kids that age, there's this issue where a lot of times, I think because it's online, it doesn't count.
00:08:30
Speaker
And you used to hear people of all ages say this, though adults have kind of wised up having spent more time on social media than when it was really new. But I think kids kind of feel that way. You're still figuring out a lot of social interactions at middle school age.
00:08:49
Speaker
And so, you know, a lot of them might feel like, oh, it's funny, or people are gonna laugh, or I'll get attention. And not just for bullying, but for all kinds of behaviors. And if I do it online, it doesn't count, right? Because you're not looking at people, you're not getting that instant kind of feedback from your actions, where if I had to walk up to you,
00:09:05
Speaker
and say something, you know, I might not do the same thing I would do online. And we see that in cyberbullying, but it happens to adults too, right? Like, I think all of us have written stuff in email that, if we sat down with the person we were mad at and talked to them, we would say in a different way, like when we're responding, right before bed, to this irritating email. Because you're not getting that instant feedback, it can be hard to adjust your interaction in the way that you naturally would in person.
00:09:32
Speaker
And so for kids, that age especially, it's all kind of exacerbated. So I think it's not at all unusual or unexpected that you get really different in-person behaviors and online behaviors with those things all coming together.
00:09:46
Speaker
Yeah, really interesting, really interesting.

Analyzing Large Datasets

00:09:49
Speaker
I'll put the link to the cyberbullying paper in the show notes, but I want to switch gears a little bit and ask what it's like to work with social media data. In particular, what does it take to analyze what are presumably large streams of data? What does it take to visualize those data? I mean, how do you get a sense of trends and comparisons when you're working with data of that size?
00:10:14
Speaker
Yeah, it's a great question. I mean, I think, for me, there's a couple main things I try to keep in mind. One is that you're never going to capture everything, and there's going to be all kinds of noise and incorrect stuff in social media data. So you can't treat it like, oh, it's going to be this perfect picture. You're going to get all kinds of extra stuff that doesn't have to do with what you're looking at. You're going to miss some stuff, and you just have to accept that. Once you've got your data like that, then the question is, how do you figure out what's going on in it?
00:10:40
Speaker
And I tend to take this approach that's like, let's look at a couple interesting subsets. So if we're looking at cyberbullying, let's pick a keyword and pull out all the tweets or messages that have that keyword. And that's not going to be everything, but at least then, as a human, you can get a sense of what's going on. That kind of step of as a human analyst,
00:11:03
Speaker
reading a bunch of stuff that's in your data set and just understanding what it looks like is so important, and so many people just want to skip that and go right to some statistics or, you know, visualize it. And I think all along the way it's super important to just take pieces of it and read them and get a good understanding, so you become something of a minor domain expert in terms of what you're analyzing. And then once you have it,
00:11:29
Speaker
Yeah, I mean, it can be really big, right? Which is one of the big challenges is, you know, we work with Twitter data, and sometimes we have, you know, 80 million tweets, like you can't do anything with that. Which becomes a little tricky. So again, I think, like the willingness to say, let's pick out a few interesting subsets. Let's find everybody from Canada, right? Like, that's not necessarily going to be representative, but you can kind of get a picture. And if you do that with a few different subsets, visualize those, look at some statistics,
00:11:57
Speaker
Then you get a more general picture of what's maybe going on and you can make more kind of intelligent decisions about how to filter or aggregate or otherwise manipulate your big data set to start finding some stuff with it.
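The subset-first workflow described above, pick a keyword or a field, pull out the matching records, and actually read a sample before computing anything, might look like this in Python. This is a sketch; the field names like "text" and "country" are illustrative, not from any particular API.

```python
import random

def subset_by_keyword(tweets, keyword):
    """Pull out the messages mentioning a keyword, case-insensitively."""
    kw = keyword.lower()
    return [t for t in tweets if kw in t["text"].lower()]

def subset_by_field(tweets, field, value):
    """e.g. everybody from Canada: subset_by_field(tweets, "country", "Canada")."""
    return [t for t in tweets if t.get(field) == value]

def read_sample(tweets, n=20, seed=None):
    """Grab a small random sample to actually read, before any statistics."""
    rng = random.Random(seed)
    return rng.sample(tweets, min(n, len(tweets)))
```

Running a few different subsets through `read_sample` is the "become a minor domain expert" step: cheap to do, and it guides how you later filter or aggregate the full data set.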

Technical Strategies for Twitter Data

00:12:11
Speaker
So I like to do visualizations, do kind of the standard measures of centrality and influence on the nodes in there, look for communities, and then see what those communities represent, like actually go look at whatever text or data you have about selected nodes in those communities because
00:12:27
Speaker
It gives me a good feel of what's going on. And once you have that initial sense, it almost always is something that you can then follow up more rigorously on the big data set to get those bigger insights.
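As a rough stand-in for what a tool like Gephi computes, here is a pure-Python sketch of degree centrality and connected components on an edge list. Real analyses would use Gephi or a graph library, and community detection is more involved than connected components; this only illustrates the shape of the computation.

```python
from collections import defaultdict, deque

def degree_centrality(edges):
    """Degree of each node, normalized by the number of other nodes."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)
    return {node: d / max(n - 1, 1) for node, d in deg.items()}

def connected_components(edges):
    """Group nodes into connected components via breadth-first search."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            queue.extend(adj[node] - seen)
        components.append(component)
    return components
```

Once you have groups like these, the step Jen describes is to go read the text attached to a few nodes in each group and see what the group actually represents.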
00:12:38
Speaker
I want to ask a technical question because I can imagine a bunch of people listening to the show saying, ooh, I'd like to look at Canadians on Twitter and see what, you know, and download the data. Can you talk, you know, briefly, because I know people can go in and get the details, but briefly, if you have a question and you're going to use Twitter data, how do you practically go about, you know, downloading a stream over a keyword or a topic? You want to learn something more about it. You know, what's the practical steps that you go through to get those data down?
00:13:07
Speaker
and start the analysis. Yeah, so I have my own little chunks of code, which makes them sound way fancier than they are. I've got these tiny little Perl scripts that will, you know, I've got one that'll grab the network for an individual person. I've got one that'll grab, you know, whatever data for a recent search term.
00:13:25
Speaker
It just dumps it into a plain text file for me. All of this is written with the Twitter API. I'm sure there's stuff, if you're not a big coder, that you could grab and modify on GitHub. But the Twitter API, I teach students who have never programmed before to use it in about three weeks. They're going from never having coded to interacting with the Twitter API, because it's really straightforward. So I have a couple scripts that will pull that data.
00:13:53
Speaker
I'm sure you can find other stuff online if you don't want to code it yourself. But there's tools out there that will help you pull that.
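A minimal sketch of the kind of script Jen describes, pulling recent tweets for a search term and dumping the text out, is below in Python rather than Perl. The endpoint URL, parameter names, and response shape follow Twitter's v2 recent-search API as I understand it, but treat them as assumptions to verify against the current API documentation, and you will need your own bearer token.

```python
import json
import urllib.parse
import urllib.request

# Assumed v2 recent-search endpoint; check the current API docs before use.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def build_search_request(query, bearer_token, max_results=100):
    """Build an authenticated GET request for a keyword search."""
    params = urllib.parse.urlencode({"query": query, "max_results": max_results})
    request = urllib.request.Request(SEARCH_URL + "?" + params)
    request.add_header("Authorization", "Bearer " + bearer_token)
    return request

def extract_texts(response_payload):
    """Pull just the tweet texts out of a search response payload."""
    return [tweet["text"] for tweet in response_payload.get("data", [])]

# Typical use, dumping results to plain text as described above:
# with urllib.request.urlopen(build_search_request("cyberbullying", TOKEN)) as resp:
#     for line in extract_texts(json.load(resp)):
#         print(line)
```

The point is less the specific endpoint than the pattern: a tiny script that authenticates, runs one query, and writes plain text you can then read and filter.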
00:14:01
Speaker
And then once I have it, one of the keys to using Gephi, which is the visualization and network analysis tool I use, is just understanding how to get it in the right format, which if you know how to use Excel and manipulate columns and rows, that's pretty easy to do. I think I have a tutorial. I've got a bunch of Gephi tutorials on YouTube, and I think I have one on just how to take your data and manipulate it and convert it into a form, basically a CSV format, that you can open there.
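Getting an edge list into a shape Gephi will open can be as simple as writing a two-column CSV; Gephi's spreadsheet importer recognizes "Source" and "Target" edge columns. A minimal sketch, where the "user"/"mentions" record fields are illustrative:

```python
import csv

def tweets_to_edges(tweets):
    """Turn mention-style records into (source, target) edges,
    e.g. who mentioned whom. Field names here are illustrative."""
    edges = []
    for tweet in tweets:
        for mentioned in tweet.get("mentions", []):
            edges.append((tweet["user"], mentioned))
    return edges

def write_gephi_edges(edges, path):
    """Write (source, target) pairs as a CSV edge list Gephi can import."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target"])
        writer.writerows(edges)
```

This is exactly the kind of hacky glue described below: a few lines to move columns around so the next tool can open the file.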
00:14:29
Speaker
So my attitude towards getting the data the right way is you have to know a few little technical things, whether it's how to move around columns or change it from tab to comma separated, maybe how to run somebody else's script and be comfortable messing around with it. But ultimately, one should not fear being hacky. I hack together all kinds of stuff that's ugly, but whatever. All that matters is getting the data and getting it in there,
00:14:56
Speaker
And I think so many people want this perfect solution where, oh, it's like this beautiful process you can easily replicate. If you're doing it a lot, that's fine. But I do a lot, and I still just hack stuff together to get what I want. And I think the willingness to do that, to find something that's mostly right and then beat on it a little bit, that's going to get you to the end that you want way faster than trying to find this perfect solution. Right. Right.

Social Media Text Analysis Methods

00:15:20
Speaker
I also wanted to ask, since social media data so often has so much text, how do you evaluate text? How do you visualize text? This is a big question that comes up a lot. In some ways, you're working with qualitative data. You've mentioned doing these measures of dispersion and some tabulations, but what are some of the other techniques that you use?
00:15:46
Speaker
At its core, you're working with text. What are your strategies for working with text? A lot of reading. Everyone should always read a bunch of text before they start doing other stuff. That's always what I do first, is just read a lot of it. I'll do topic modeling. I use Mallet for that mostly because it's really easy. You can download it. It's got a command line. There's a tutorial that says, here's the command to run. It will give you topics
00:16:11
Speaker
that emerged from the text. You can see the sorts of things that people are talking about. And I'll do word count frequency measures, basically word cloud stuff that's super straightforward. And then one of my favorite tools to use is one called LIWC. It's L-I-W-C, and it stands for Linguistic Inquiry and Word Count, which is out of UT Austin. And it's a psycholinguistic text analysis tool. And what it does is it takes the text you give it, so say all the tweets for a person.
00:16:41
Speaker
And it counts the number of words that fall into, I think, around 100 different psychological categories. So how many words are there about consuming things, like eating or drinking? And how many words are there about family, or happiness, or sadness? It has all these different categories. It has a graphical user interface, so you don't even have to program. You just put your text in a file, open it up, and it gives you this output. And once you've used it enough, you get a sense of how that compares to normal. They have a tool online now called Receptiviti,
00:17:11
Speaker
which ends with an I, where you can try this out on the web, and it gives you these really beautiful psychological profiles just from the text. I don't know if I'd rely on it as ultimate truth, but it's really insightful, and those text counts become useful features that then I will often feed into machine learning algorithms and other stuff.
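The kind of counting LIWC does can be illustrated with a toy version. The real tool ships large, validated category dictionaries; the tiny word sets below are purely for illustration, not actual LIWC categories.

```python
# Toy LIWC-style category counter. Real LIWC categories are large,
# validated word lists; these tiny sets are illustrative only.
TOY_CATEGORIES = {
    "ingest": {"eat", "eating", "drink", "drinking", "food"},
    "family": {"mom", "dad", "sister", "brother", "family"},
    "posemo": {"happy", "love", "great", "glad"},
}

def category_percentages(text, categories=TOY_CATEGORIES):
    """Percentage of words in the text falling into each category."""
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    total = max(len(words), 1)
    return {cat: 100.0 * sum(w in vocab for w in words) / total
            for cat, vocab in categories.items()}
```

Per-category percentages like these are the "text counts" mentioned above that can then be fed into machine learning models as features.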
00:17:32
Speaker
So do you recommend that people try to do their own psychological profiling of their own Twitter feeds using these tools? Oh, yeah. Don't worry about what I was like. You can try a lot of it. Yeah, no, it's interesting to try these on yourself. There's one called Analyze Words. I think it's like analyzewords.net or .org that'll analyze your Twitter feed and give you a profile. There's also a site called Apply Magic Sauce, I think .org.
00:17:59
Speaker
which is from Cambridge University. I don't know why it's called that, but it's out of Cambridge, and it does psychological profiling. It'll run it on your Facebook and/or your Twitter. You can put them together or do them separately, and it gives you this big, huge profile of everything from psychology, but also what it thinks your gender is. Are you intelligent or not? Would you be a good leader? I think it's fun to run those things on yourself and see what they find.
00:18:26
Speaker
So just before we end, I wanted to get a sense from you of, from your research agenda, where do you think your work sort of falls into the landscape of public policy and social policy research?

Impact on Public Policy

00:18:39
Speaker
Essentially what you're doing is predictive analysis, and I can think of lots of ways that businesses could use it, but I'm also curious about whether you see there's an application to public policy.
00:18:51
Speaker
Yeah, that's a good question. Most of the funded work that we do, the funding comes either from the Department of Defense or the intelligence community, so you can imagine there's a lot of national security issues that one could address with this. But I think it touches on public policy in a bunch of different places. Part of it is
00:19:14
Speaker
I think there's a good public policy debate on, do we want to be individually profiling people and making decisions about that? I hope not. And like I said, I spent a lot of my time going around talking about that. And I think, you know, part of the importance becomes, so people are going to build these algorithms, right? It might not be me if I stop, but this is where we're going. There's a whole field moving this way. There's a lot of money to be made from these algorithms. So it's going to keep happening.
00:19:44
Speaker
If we can talk about how powerful they are, that should influence some policy decisions regarding privacy, regarding consent. But then there are applications that we're seeing in public policy where these get used. Sometimes it works well, and sometimes it doesn't. Cathy O'Neil's book, Weapons of Math Destruction, gives this really interesting example of the Washington, DC public schools using algorithms like this to decide which teachers to hire and fire. The algorithms are wrong sometimes, which means
00:20:14
Speaker
you will make decisions that are wrong. And that can be really bad. And there hasn't been a great discussion of like, how do you make policy around these algorithms? We've seen it in things like sentencing guidelines. There are algorithms that will determine how long a person's sentence can be. They seem to have a lot of racial bias in them. And so there are those kinds of questions where, without doubt, the algorithms can be useful. But as researchers, we call them decision support systems. They're supposed to help you make decisions.
00:20:43
Speaker
And I think the real public policy question that faces us as the algorithms come forward isn't, are they helpful? But are you using them to just support a human making a well-reasoned, fair, just decision? Or are you using them to make the decision for you? And if it's the latter, you're going to screw stuff up, including people's lives. And so, yeah, I mean, so that's not quite like just apply it to make this policy, but I think is a really important place that we need to figure stuff out before we deploy these more widely.
00:21:12
Speaker
Yeah, absolutely. Absolutely. I think that's right. And I think some of the work I've browsed through your list of publications, things like cyberbullying and other things, I mean, those to me have direct applications, but I think you're right. Just on a world of more technology, of more social media, of more of our own data out there, being able to use some of this information carefully and individually, what we should do with our own data is really important. Jen, I want to thank you for coming on the show. It's really interesting work and I look forward to keeping tabs on it.
00:21:42
Speaker
It was my pleasure. Thank you so much. And thanks to everyone for tuning into this week's episode. I hope you enjoyed it. There's a long list of show notes for you to browse through all the tools that Jen mentioned. And of course, you should check out her websites to learn more. So until next time, this has been the PolicyViz Podcast. Thanks so much for listening.