Podcast Season Four Introduction
00:00:11
Speaker
Hi, everyone. Welcome back to the PolicyViz podcast. This is season four of the show. I hope you all had a great, safe summer and are looking forward to a nice cool fall. Just a few changes coming up in the show this next year.
Upcoming Guests and Topics
00:00:24
Speaker
I'm going to slow things down a little bit and I'm going to go to an every other week format.
00:00:28
Speaker
Still coming out on Tuesdays. You'll have something to look forward to. I also have a bunch of really interesting guests scheduled for this year. I'm gonna talk more about data and communicating data and also talk about just some new tools that are out there and coming out. So some exciting stuff coming up for the fall.
Guest Introduction: Hilary Mason
00:00:45
Speaker
And so I'm really excited to kick off this season with Hilary Mason from Cloudera. I'm a big fan of Hilary because she's done awesome work and I'm excited to talk about all the new things that she's doing. So Hilary, hi, how are you? Welcome to the show.
00:00:58
Speaker
Hi, thank you so much for having me. It's great to talk to you. How's your summer? We'll start with the summer. We'll look back. How's the summer? Summer's been lovely. Yeah, good weather, spending time outside, enjoying New York when everyone else goes abroad. It's really nice.
00:01:16
Speaker
Nice. Yeah, when it clears out, it's like the time you can actually ride the subway and not be boxed in. Exactly. No wait for brunch. It's lovely to be here. That's nice. That's great. And looking forward to fall, I assume. Yeah, fall is my favorite season. All right. All right.
Hilary's Machine Learning Journey
00:01:32
Speaker
Good. So why don't we start if you could maybe for folks who don't know about your background, maybe you could talk a little bit about your background and where you come from and what you're doing now.
00:01:43
Speaker
Yeah, I mean, I'll try and keep that brief and hopefully interesting. And I started in machine learning about 20 years ago and have been working in data science in startups and other contexts for a long time. Most recently, I'd founded a company called Fast Forward Labs where we do applied machine learning research and advising.
00:02:05
Speaker
And that means we have our own program of research that our customers subscribe to, and it gives them insight into machine learning and data science capabilities that are possible and are becoming useful. We aim to be about six months to two years ahead of the market, and then we also advise them on their practice, meaning about half technical practice.
Cloudera's Focus on Machine Learning
00:02:27
Speaker
These are things like what algorithm do I use? How do I architect these data pipelines? And then half everything around the technical practice that's not technical. So questions about people, processes, organizational structures, all of the things that are actually hard about doing data science work well. And then we were acquired by Cloudera a year and a day ago.
00:02:51
Speaker
All right. Congrats. That's great. Well, thank you. And so we've been continuing that work inside of Cloudera. And I've taken on a role as the general manager for the entire machine learning business, including our software platform as well, Cloudera Data Science Workbench. Nice. So you've been doing the machine learning stuff, as you said, for a long time. So can you maybe give us a quick
00:03:14
Speaker
rundown of what you've seen over the last couple of years and where you think things are headed. I mean, it seems like one of those areas that's changing rapidly right now, but you've had a nice long view of it. So where have you seen things come from, and where are they going? That's a really nice question.
Machine Learning in Real-World Applications
00:03:31
Speaker
And, you know, I've been a practitioner in this space for a long time. And in fact, we've all been using machine learning in our everyday lives effectively for
00:03:42
Speaker
going on 20 years at this point at least. And we take a lot of it for granted, but when you think about your email spam filter, like that is a fantastic example of sort of everyday machine learning. And we're starting to see a broadening of the applications for ML and data science, but the hype has outpaced a lot of the actual capabilities. And in many cases, the best approaches are still sometimes the simpler ones.
00:04:12
Speaker
And so I'd say that we are not at the end state for what the technology is capable of. It's still transforming. The way we architect systems around it is changing. But we're still at the beginning of understanding how to effectively use even the capabilities we have now in a business process or a product context.
00:04:35
Speaker
What I'm trying to say is that we still have a lot of growing up to do, and the technology is continuing to change while we're going through that process. Right.
Roles and Practices in Data Science
00:04:45
Speaker
You also mentioned that a lot of the work you do is around the people in these organizations. Can you tell us a little bit about what that entails? Do you find that you have to spend your time sort of convincing people of the capabilities of ML? Are you educating them on how to implement the models or what is it like with working with people when you're trying to do this sort of work? So it's a really fun place to be working with people because
00:05:12
Speaker
There are many companies that have great data science practice, but they are not the same. So if you were a software engineer and you take a job at one company or another, you're essentially going to use the same kind of process for your work. There are more or less the same expectations of how you're going to be managed, what you'll be delivering, what you're responsible for, what other teams are responsible for, how you relate to them.
00:05:37
Speaker
In data science and machine learning, we do not have that standard set of practices and best practices evolved yet. So a data science role in one company can be quite different from one in another company. The way teams are constructed is quite different. So does data engineering live on the data science team? Does the data science team live in the COO or CFO's office? Or does it live in the head of product office? Or is it in engineering? All of these things are valid approaches.
00:06:05
Speaker
They optimize for somewhat different things. And then we come back to, you know, what are your data scientists even doing? Where do they come from? Because you've only been able to get a degree in data science for the last five years or so.
00:06:18
Speaker
And those degrees tend to be pretty diverse in what they teach. So some are essentially statistics dressed up a bit. Some are computer science dressed up a bit. So yeah, there's not this kind of standard practice, which means that each company has to make an artful decision about how they want to build a practice.
00:06:38
Speaker
So we end up spending quite a bit of time on that as well as the really interesting technical and strategic questions around their data. So what are you going to do with the data and how are you going to accomplish it? A couple years ago, you had written a paper with DJ Patil about data science teams within organizations.
00:06:58
Speaker
Part of that paper was DJ talking about his experience at LinkedIn and having these data scientists and data science teams sort of strewn about the organization and sort of the evolution of that. Have you seen an evolution in how data scientists or computer scientists or data visualization experts or whatever, have you seen a change in how those skill sets and those people are being used in the organizations that you're working with?
Data Science Team Maturity
00:07:23
Speaker
So thank you for reading that. Like you said, this is sort of relatively new. So how do you bring these various groups together, right? You've got analysts and data scientists and, I don't know, public relations and marketing. How do you get them all on the same page?
00:07:46
Speaker
Well, that is the question. DJ and I actually wrote that because we were both giving this kind of advice quite often and didn't have something written down that we could refer to. We also thought it would be useful as a prop for a data scientist or a data science leader in an organization to have something they could wave in their boss's face and say, here's how people do this,
00:08:12
Speaker
to try to get to a perhaps more effective type of organizational structure. We certainly have seen a fair bit of maturity. And again, it's not evenly distributed. So some companies we work with are incredibly mature. They're inventing the approaches they need. They're optimizing for what they need to optimize for. And then I have others where they're just hiring their first data science leader.
00:08:42
Speaker
and they don't even know what that person really needs to be doing. And so I'd say that we're getting there, and I expect in another five years, you would think that that little mini book is a bizarre sort of useless artifact because everything in it is common knowledge, but we're not there yet.
00:09:00
Speaker
Okay, so we'll just root for that day when that book is no longer needed. But until then, it still should be, I think, required reading for any organization working with data, really. So let's switch gears a little bit, because a new thing that you're working on that I think was announced earlier this summer was a project with DJ Patil and a few others, I think, on data ethics.
Data Ethics with DJ Patil
00:09:24
Speaker
Can you talk a little bit about that work and where that is and where it's heading?
00:09:28
Speaker
Absolutely. And yes, this is work co-authored with DJ Patil and Mike Loukides, who has been our editor at O'Reilly for a long time. Mike edited that first piece of work as well, but he's a co-author on this one. So we were once again thinking about
00:09:46
Speaker
the issues of the practice of ethics and data science. And what I mean by that is that we've reached a point in the use of the technology where it's very clear that it has many unethical applications. And also that society broadly is still immature in our ability to have conversations about what we will accept and what we won't accept.
00:10:10
Speaker
And that bar for what is acceptable with data science has been moving over time, so it's not even a clear, bright line.
00:10:20
Speaker
And one of the things that I've been very concerned with is that we have many people leading work in fairness, accountability, and transparency. We have plenty of people doing broad critiques of the kinds of behavior we see from Facebook and how their policies and products are influencing our society. But what I haven't really seen
00:10:44
Speaker
is a set of practices for people who are actually doing the work, putting their hands on the keyboard, a set of tools for that group, myself among them, to even have the conversation about ethics. And so what we've tried to pull together in this second mini-book and series of essays is not the answer.
00:11:08
Speaker
I happen to believe that well-intentioned people can come to different conclusions around some of these questions, but rather a set of tools we can use to productively have the conversation. And once again, this was designed to be the kind of thing that a data scientist can bring to their manager and say, look, it's important that we have these conversations now. And hopefully this is working its way into the discussion of how we practice data science.
00:11:38
Speaker
Do you think there's a role, well I'm guessing you do, but is there a role for data scientists who have training in data ethics as part of their degree requirements?
Ethics in Data Science Education
00:11:49
Speaker
So of course, and DJ has been a huge supporter of and promoter of programs that do exactly this. But the reality is that most of us who are data scientists do not have degrees in data science. And so, you know, I have a team of what I think of as fairly accomplished and extremely talented data scientists, machine learning researchers here with me at Cloudera.
00:12:13
Speaker
Not one of us has a degree in data science. We have computer scientists, physicists, neuroscientists, cognitive scientists, electrical engineers.
00:12:23
Speaker
So going forward, it's important that ethics is part of that curriculum. You know, we can't say, oh, we're just building the technology, we're not responsible for how it's used. That's not responsible at all. But that doesn't solve the problem that most practitioners today have, which is that they don't have a formal "data science" (and I'm putting that in air quotes, which you can't see) education.
00:12:49
Speaker
Because you've only been able to have that education for the last five years, and I'm actually not a fan of restricting these job opportunities only to people who happen to have had the privilege of doing a master's in it. This is a great field for anyone who is quantitatively and also creatively inclined.
00:13:09
Speaker
But we need tools and expectations of the way we do the work that support ethical outcomes. And so we have to just make it normal and not worth thinking about that we have a conversation about what can go wrong when we start a project.
00:13:26
Speaker
Right, in one of the essays that you've written, there was a checklist, right? There's got to be principles, which I love that it's kind of loosely based off of Atul Gawande's work, which is like one of my favorite books about the medical field. But how do you think about developing a checklist like this for people who are working with data?
00:13:50
Speaker
So I'm going to start by telling you where that checklist came from, because one of the really fun things in the process of writing these essays has been sitting down with DJ and Mike and finding areas where we disagree.
00:14:06
Speaker
You know, we came into this conversation around oaths for data science, and I really, you know, said I don't really mind the discussion of oaths, and I think it's a fine thing to say I'm not going to do anything wrong, but I don't think it'll actually change anything. And I think it may in fact distract from the work we need to do to change practice.
00:14:28
Speaker
And we had one of my favorite conversations of the year really trying to work through our disagreement around this topic. And where we ended up was this notion that oaths are fine, but we need something more concrete, and checklists seem to be the best tool for that kind of thinking. So saying rather than making a grand declaration at one point in time,
00:14:51
Speaker
that I'm going to behave in a certain way, I'm going to take these little decisions and I'm just going to check myself, every time I make a decision or work on something where it might be relevant, against my own standards that I've committed to, on a regular basis.
00:15:11
Speaker
And so that's what the checklist is intended to be. And so if you're thinking of using one, it's really something you can add into the data science development process. So when you go from idea to error metrics for validating that you have a solution to potential product uses of your data science work, you can also add in that checklist that says, you know, am I respecting the data
00:15:37
Speaker
that has been given to me for this work. Am I using it in a way that is sustainable? All of these questions can fall out of that.
00:15:46
Speaker
Okay, great. So I will point people to all of the work that you guys are doing on the data ethics issue. And I want to turn to one last question because on your website, which I'll also link to, you say that you are inherently, internally an optimist. And I want to ask you about that in a particular view. So I don't want to talk about
Optimism vs. Science Backlash
00:16:09
Speaker
politics or anything. But I do want to ask you about what seems to be sort of a backlash against science and against research and, in some ways, against facts. And so I want to ask you how you sort of maintain your optimism being a data scientist, working with data and working with facts. How do you think about these sorts of things, and how
00:16:32
Speaker
do you maintain your optimism, and what are things that you think people can do to sort of fight against some of this pushback that we're seeing?
00:16:41
Speaker
I love this question. Yeah, I happen to be naturally an optimist and I tend to look generally for areas where I can do work that promotes that optimism. But to think specifically about your question, you know, how do we live in this environment where people are pushing back against rationality, against science, where people are using those tools in a fairly negative way?
00:17:11
Speaker
And it really comes down to, I believe optimism is the only rational philosophy because it is the mental attitude and toolset we need to build the future we actually want to live in.
00:17:27
Speaker
And I try and do work that supports creating that future world that I want to be part of. I think that is what it really comes down to in that if you're only negative, you're only critiquing, you're tearing things down.
00:17:43
Speaker
It is, I mean, I would find that intolerable. Also just from living in that way, it just seems so negative. But it really means that there's always hope and there always is that potential for a bright future ahead. And no matter, I know you said no politics, but no matter what challenges we are dealing with today, I really do believe that that is a future that is attainable. And it's optimism that gives me that belief. I love it.
00:18:12
Speaker
I love it. But I want to ask, I mean, maybe you don't run into people who sort of have these fundamental, I mean, there are questions that we ask about science, but this sort of, you know, not believing in facts. So when you're working with an organization, do you ever run into this basic disbelief in facts?
Challenges in Communicating Facts
00:18:32
Speaker
What would your thoughts be for someone whose job it is to try to convince people
00:18:36
Speaker
that the data provides evidence for this thing as a fact. I think journalists, for example, are facing this probably daily, where they are analyzing a story. They're talking to people. They're looking through data. They're presenting as facts. And there's just this belief that that's not true. That's the fake news or whatever it is. So how do you think about, inherently as an optimist, but how do you think about trying to communicate information to people who may not believe it just because
00:19:06
Speaker
they're just going to ignore it. It's not a fact the way they look at it. Well, I find that to be quite a challenge. And there is no easy answer, because if there was, we would all be doing that. You have to have a shared belief that there are a set of things that are true. However, I think that, you know, there's plenty of work that shows that the information landscape we live in guides what we believe to be true.
00:19:36
Speaker
And so again, just trying to create the information landscape as much as possible that supports this notion of there being facts and there being truth. And, you know, in many cases, these are specific facts and specific truth. The one thing I'll add to this, because,
00:19:53
Speaker
you know, there's a difference between being a journalist writing an article that's going to go out to the broad public, where that one article, as an artifact, has to suffice for everyone, and having a conversation with somebody individually, specifically around
00:20:09
Speaker
data, it is being able to understand that data itself is an imperfect representation of truth. And there often is a fair bit of context. And there are reasons why people bring these beliefs to these conversations. And so trying to be patient to not just, you know, try and reduce things to facts and interpretation,
00:20:33
Speaker
I have found is somewhat successful. But these are, you know, I think you've hit on the greatest challenge of our moment, is how do we create that information environment where we can even believe that there is truth, much less what that truth is?
Promoting Team Optimism
00:20:49
Speaker
Well, maybe we could just get everybody to be an optimist like you, Hilary, and it'll all just be happier and better.
00:20:55
Speaker
Well, I actually think we need a portfolio of attitudes. And on my team here, I certainly have people who range from a little bit more cynical to quite optimistic. I think we need both perspectives. Well, thanks so much for coming on the show. It's been really interesting. And I'm looking forward to seeing especially the data ethics work. I assume you're going to keep, is the plan to keep writing more pieces on that?
00:21:21
Speaker
Absolutely. So the set of essays we have right now actually is available on Kindle on Amazon.com as of this week, and we are going to be soliciting contributions from other people as well and hopefully building more of a corpus of work there. So if anyone listening is particularly passionate about an aspect of this, I'd love to hear from you.
00:21:45
Speaker
And yes, there is more. Terrific. Well, I look forward to reading it and I will post all these links on the show notes. And if you're listening to this and want to get involved, please do connect with Hilary or DJ or O'Reilly Media as well. Well, thanks for tuning into this week's episode. I hope you all again had a great summer and are ready for a great fall. So until next time, this has been the PolicyViz podcast. Thanks so much for listening.