Introduction to JMP and Episode Overview
00:00:00
Speaker
This episode of the PolicyViz podcast is brought to you by JMP, Statistical Discovery Software from SAS. JMP, spelled J-M-P, is an easy to use tool that connects powerful analytics with interactive graphics. The drag and drop interface of JMP enables quick exploration of data to identify patterns, interactions, and outliers.
00:00:19
Speaker
JMP has a scripting language for reproducibility and interfacing with R. Click on this episode's sponsored link to receive a free info kit that includes an interview with DataVis experts Kaiser Fung and Alberto Cairo. In the interview, they discuss information gathering, analysis, and communicating results.
Host Introduction and New Year Greetings
00:00:49
Speaker
Welcome back to the PolicyViz podcast. I'm your host, Jon Schwabish. Welcome back to the show. Happy New Year, everyone. I hope everyone's doing well. On
Conversation with Sarah Callison from Intel
00:00:57
Speaker
this week's episode, we're going to talk about how to get organizations to use their data well and effectively. And so I'm very pleased to have Sarah Callison from Intel. Sarah's a senior statistician there, and she's been doing a lot of work on trying to get everyone to be better at data analytics and data viz. Sarah, welcome to the show.
00:01:16
Speaker
Thank you, Jon, for the opportunity to come and talk. Yeah. Thanks so much for coming on. I'm excited to dive into this because I think it's a big question of how do we get our own organizations and other organizations to improve the way we use data. But before we dive into too much content, maybe you can talk a little bit about yourself, so people get a little flavor of the kind of work you do over there at Intel. Hi. Yes, thank you.
00:01:42
Speaker
Well, I'm a senior statistician at Intel. I've been at Intel for 10 years and I have an interesting background because before Intel, I worked in many different companies. I worked for the U.S. Department of Agriculture as a commodity statistician. When I was in grad school, I was working at Ford Motor Company. I went to the University of Michigan for applied statistics. Got to play around with cookies and
00:02:08
Speaker
crackers at Nabisco Kraft Foods, a medical device company, BD, and then I had my time at Intel where I worked both in manufacturing and in sales and marketing and a couple other organizations within Intel. And you're going to say, wow, that's really diverse. But a lot of my background is in both R&D and process improvement, working in manufacturing or business transactions, process development.
Common Data Processing Challenges
00:02:38
Speaker
One of the things that I keep on seeing within organizations is how well they're able to process data. And that is actually really a process problem, from the time where you collect the data to the time that you can get some insights out of it. It's always been fascinating to me to see how much organizations value the data and how much they are able to glean out of it based on how good their process is.
00:03:07
Speaker
And that's something that I'm doing at Intel and trying to understand how well we can really, you know, streamline that. So is there a common thread you've seen across these different organizations that you've been in and the ones that you've maybe worked with elsewhere? Is there a common theme of, you know, I guess pitfalls would be the way to start, but you know, and then what are the common things that folks are doing wrong and then what are the common things that folks are maybe doing right?
Importance of Top-Down Data Strategies
00:03:31
Speaker
So, you know, it really comes down to this: in the organizations I've seen be really successful at this, it's always been a top down approach, where management says, hey, everyone is going to follow this strategy, we're all going to get behind it, and then it becomes top down. In order for management to actually buy in, you know, there usually has to be some kind of grassroots effort to have that influence up to management.
00:04:00
Speaker
That can take a really long time. You really need to have people who are able to influence and showcase the ROI, why it's important for management to get behind it. Once that happens and you have management buy-in, I've seen it go really, really fast and be really impactful. That's when organizations really start
00:04:23
Speaker
coming together and understanding. And there's a lot of different pieces that need to be done. And this is the reason why management is so important. Because first you have to have the directive: we are going to value this, we're going to invest in this. And there's a huge investment. You need to make sure you have the right skill sets. People need to understand how to handle data. And ensuring that these right skill sets and the time being invested in it
00:04:52
Speaker
So you have to have the right thinking around it. Then you have to invest in the flow of the data or the processing of the data. Is the data correct? Is the data structured properly so that it could be easily handled going forward? Is the data complete? Is the data correct? Are some of the things that
00:05:13
Speaker
the management needs to understand and invest in. And then also you need to have the infrastructure in order to do it. So a lot of organizations say, well, I have data in Excel, everybody has Excel, everybody knows Excel, but Excel's not really the appropriate tool to be using, especially when you have a lot of data. Being able to merge the data, having integrity of the data, Excel's not necessarily the right tool.
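The questions raised here (is the data complete, is it correct, can it be merged) can be written down as explicit checks. What follows is only a minimal sketch in Python and is not anything described in the episode; the field names `lot_id` and `yield_pct`, the sample records, and the `check_records` helper are all hypothetical.

```python
# Hypothetical measurement records; the field names are illustrative only.
measurements = [
    {"lot_id": "A1", "yield_pct": 97.2},
    {"lot_id": "A2", "yield_pct": None},   # incomplete: value is missing
    {"lot_id": "A3", "yield_pct": 104.0},  # incorrect: a percentage above 100
]

# Hypothetical lookup table that the measurements should merge against.
lots = {"A1": "Site 1", "A2": "Site 2"}  # A3 has no matching record


def check_records(rows, lookup):
    """Return (lot_id, problem) pairs for incomplete, incorrect, or unmatched rows."""
    problems = []
    for row in rows:
        value = row["yield_pct"]
        if value is None:
            problems.append((row["lot_id"], "missing yield_pct"))
        elif not 0.0 <= value <= 100.0:
            problems.append((row["lot_id"], "yield_pct out of range"))
        if row["lot_id"] not in lookup:
            problems.append((row["lot_id"], "no matching lot record"))
    return problems


problems = check_records(measurements, lots)
for lot_id, problem in problems:
    print(lot_id, problem)
```

Checks like these are easy to run every time new data arrives, which is harder to guarantee when the same validation is done by eye in a spreadsheet.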
00:05:40
Speaker
So it's interesting how you start. The way I think of it, and I think the way a lot of people think of it, is that you need some sort of grassroots effort, some group of people who really want to improve or introduce better data analytics, be it data visualization, be it whatever it is, and that sort of filters up and gets the managers to buy in, and then it sort of moves through the organization.
00:06:03
Speaker
Are there specific things you've seen people do at the grassroots level that have successfully gotten managers to buy in?
Grassroots Efforts in Data Analytics
00:06:10
Speaker
I run into this a lot where people say, oh, I buy into this whole idea of doing better slides, for example. But if I were to come up there with a picture of a sunset or something, you know, people are going to yell at me because they want the dense table. So are there specific things you've seen people below the managers do so that it grabs a manager's attention and then it becomes, you know, part of the culture of the company?
00:06:32
Speaker
Yeah, it's really kind of showing the difference between, okay, this is what you're getting with the current mindset, and here's what you get when you invest. So it's kind of showing side by side comparisons on it. Also showing how much work it takes to get from this. So showing kind of all the stumbles, all the issues that could possibly be going wrong,
00:06:56
Speaker
the checks and what the process needs to look like so that management gets an idea. I think a lot of times, if you don't articulate that, they don't necessarily understand all the grunt work that happens. And anybody who's been working in data will understand that 90% of the problem, once you have a defined problem, is figuring out how you get the data
00:07:23
Speaker
in the right format, making sure that it's clean, making sure it's accurate, getting it structured properly so that you can essentially start being able to play around with it so that you answer the questions. And the great thing about data visualization is you can start playing around with the views and how the data feels and the distributions and so forth.
00:07:48
Speaker
but you need to make sure that you have the right tools and the data is structured in a way that allows you to do that. But that takes time.
00:07:57
Speaker
Yeah, yeah, it certainly takes time. I mean, I think that lines up with a lot of what I've seen and heard. You've mentioned tools, and I'm curious what sort of tools you use and what you've seen as successful. And also how you've seen, I mean, you've been at Intel now for about 10 years.
Tools and Evolution at Intel
00:08:12
Speaker
As you said, how you've seen the tool sets change over that period, especially as it relates to data visualization. I personally use JMP, but there's a whole bunch of other different tools out there that people can use.
00:08:26
Speaker
A lot of people within Intel use Excel. Like I said, there's a lot of limitations with it. The great thing I like about JMP is it really allows you to point and click, be able to pull data in from various sources, be able to combine it. And then the Graph Builder option in there is just absolutely amazing because it's just drag and drop and you're able to kind of look at data in a much more complex way
00:08:56
Speaker
than you could with just your typical graphs within Excel. I mean, you probably can do some stuff like that in Excel, but it could take you a really, really long time to do. And at Intel, are most people you work with SAS programmers? There are people at Intel that work in SAS, but they're usually the statisticians that are using SAS
00:09:19
Speaker
or JMP, yeah. Right, right. And most of this work that you're talking about, this is for internal deliberations and internal decision making, is that right? Pretty much, yes. So what happens when you start thinking about trying to put something, you know, out the door, trying to communicate to the public, or a layperson, or, you know, a decision maker at another firm? Or is that not really a consideration for the kind of work that you're doing right now? It's not really a consideration for the work that I do.
00:09:48
Speaker
So when you are working with folks and trying to get them to buy in, it seems clear to me that you get some group of people who are invested, and they are the ones trying to get managers to buy in, and maybe managers do, and maybe they don't. But let's say you get the managers to buy in. How do you get the rest of the organization to adopt these ideas and strategies?
Training and Data Governance Challenges
00:10:10
Speaker
So one of the things that I would do, and I've seen this be effective in other organizations, is really having training,
00:10:18
Speaker
being able to train engineers on how to use their data, not using toy examples that might be open data sets, but their actual data, and being able to showcase to them exactly, hey, you have this data, and here's the complexity of this data, and here's a way that you can actually be able to graph your data.
00:10:46
Speaker
And does this help you provide some insights into decision making for your work? Does it save you time? Does it provide more insights? I found that very effective. Now, I saw a lot of lights go on.
00:11:04
Speaker
The problem that I did see when I started doing this was our data wasn't necessarily in a form that you could easily bring into a statistical program. Statistical programs take the data and need a certain structure to it. For organizations that haven't necessarily been in that thinking of how to structure their data, let me give you some examples. There could be some very important strings
00:11:31
Speaker
that an engineer basically just writes out in their dataset. From a stats point of view, or from a program's, these strings should be parsed out. And if they're not necessarily able to be parsed out in a systematic way because there's so many of them and the information can't be categorized properly, it can take a really, really long time for an engineer who doesn't necessarily have the mental model
00:12:00
Speaker
of being able to pull their data into a big program and then essentially be able to say, here's the structure I need in order for me to start using JMP or another program. It can take a really long time and that can be a huge obstacle for them to be able to move to doing data visualizations in a statistical program or another
00:12:30
Speaker
data visualization program. This is one of the things to be able to articulate to management: the idea that the data needs to be structured and that there needs to be some kind of data governance around this in order to make this transition from being pretty unstructured to a point of being able to get some insights from it.
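The free-text strings mentioned above are a concrete case of that structuring problem. What follows is a minimal sketch in Python, assuming a made-up note format (`tool=...; temp=...C; status=...`); the format, the sample notes, and the `parse_note` helper are hypothetical and are not taken from the episode. It shows how strings that follow a convention can be split into columns, while strings that follow none fall out as items someone has to handle by hand.

```python
import re

# Hypothetical free-text notes of the kind typed into a single column.
notes = [
    "tool=ETCH-03; temp=350C; status=pass",
    "tool=LITHO-11; temp=295C; status=fail",
    "checked by hand, looks ok",  # follows no convention, cannot be parsed
]

# Assumed convention for illustration: three key=value pairs separated by semicolons.
PATTERN = re.compile(
    r"tool=(?P<tool>[\w-]+);\s*temp=(?P<temp_c>\d+)C;\s*status=(?P<status>\w+)"
)


def parse_note(note):
    """Return a dict of structured fields, or None if the note does not fit the pattern."""
    match = PATTERN.fullmatch(note)
    if match is None:
        return None
    fields = match.groupdict()
    fields["temp_c"] = int(fields["temp_c"])  # store the temperature as a number
    return fields


parsed = [parse_note(note) for note in notes]
unparsed = [note for note, fields in zip(notes, parsed) if fields is None]

print(parsed[0])
print("needs manual review:", unparsed)
```

The list of unparsed notes is the part that takes the time described above, and it is also the kind of evidence that can make a case for agreeing on a data entry convention up front.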
00:12:58
Speaker
Do you find that statisticians, engineers, folks you work with or have worked with, are reluctant to use visualization tools as an analytic tool? And they're more inclined to sort of stick with the good old, I'm gonna calculate distributions and means and percentiles and I'm gonna do sort of standard statistical tests, but not use the visualization tools as part of their analysis process or toolkit?
00:13:28
Speaker
Um, yes. And I think a lot of it isn't necessarily that they're reluctant to do it, as in, like, they don't see value. It's just too painful for them to do it. So because it's painful, they're not going to necessarily do it because it's going to cause them so much more work to do. Yeah. So painful, sorry, but painful in the sense of having to learn a new tool and go through a whole new process? Okay. Yes. Yeah. That seems to have been one of my
00:13:55
Speaker
biggest challenges is really, they see the value of it, but they're not willing to do the work. Right. Because they have to invest the time to learn the new tool or to think in that way. And while there may be a payoff, they may not think the cost is worth it, right? Right. And you also have to realize, you know, this isn't
00:14:17
Speaker
work that's really value added to them. You know, they can always go back to their old ways because hey, this is the way that we've always done it. I've been rewarded on it. By the way, I am so busy. You know, I mean, we're not talking about people who are working 40 hours a week. I mean, these, these guys are working, you know, many, many, many more hours than 40 hours. Um, and you know, they need to get things out the door. They need to make these decisions to be able to communicate kind of
00:14:47
Speaker
you know, whatever they need to be communicating as fast as they can. And so it's kind of more like, yeah, it'd be great to do it. But this is the way that I've been, you know, doing this before, it's been okay, I'm just going to go out. And until management kind of says, Hey, I'm just basically enabling the old paradigm
00:15:09
Speaker
things are not going to actually change. There needs to be an incentive to change. Right. What do you do with folks who are busy, but they're inclined to use the new tool or use the new process? Do you offer internal trainings? If so, how do you get them to say, okay, I need to take a few hours or a half day or a full day out of my schedule to
00:15:32
Speaker
dedicate myself to learning this tool? And then I always find the problem is, even if you sit with a tool for two or four or six hours, it's the ongoing use that really makes it useful, because a tool isn't something you're going to be good at after a couple of hours.
00:15:50
Speaker
Right. And I found that, you know, you have to do kind of the drip approach. So you teach them something and then, you know, maybe it's something like, they find out that it's easier to pivot their data in JMP than it is in Excel. So now they're bringing the data in, doing the pivots and then taking it back and kind of going on.
00:16:14
Speaker
And it's kind of that building. So the next time they'll do the pivots, they're like, well, I can be doing this. Maybe I can just take this into Graph Builder because hey, yeah, Sarah just showed me how to use this. It's kind of that drip thing where you keep on introducing this and reinforcing it and just allowing them to explore the tools and kind of learn it step by step.
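For readers who have not pivoted data outside a spreadsheet, the operation mentioned above turns long rows into a wide table, summing any duplicates. The sketch below is a minimal Python version using only the standard library; it is not how JMP does it, and the `site`, `quarter`, and `units` columns, the sample rows, and the `pivot_sum` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per site, quarter, and shipment.
rows = [
    {"site": "North", "quarter": "Q1", "units": 120},
    {"site": "North", "quarter": "Q2", "units": 135},
    {"site": "South", "quarter": "Q1", "units": 90},
    {"site": "South", "quarter": "Q1", "units": 10},  # second Q1 entry, gets summed
]


def pivot_sum(records, index, column, value):
    """Pivot long records into {index: {column: summed value}}."""
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        table[record[index]][record[column]] += record[value]
    # Convert the nested defaultdicts to plain dicts for easier printing and comparison.
    return {key: dict(cells) for key, cells in table.items()}


wide = pivot_sum(rows, "site", "quarter", "units")
for site, cells in wide.items():
    print(site, cells)
```

One small, repeatable step like this is the kind of thing the drip approach introduces first, before moving on to graphing the pivoted result.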
Data Science Center of Excellence at Intel
00:16:39
Speaker
So when we think about organizations that are large, Intel is a large organization, but even organizations that are not that large, a lot of organizations that I've worked with, for example, are pretty siloed off. You have the communications folks over here, you have the data analysts and analytics folks over there, you've got the managers over there. How do you break down those walls between the different silos and get
00:17:05
Speaker
I don't know, I guess a wider swath of an organization to buy into the change in data analytic methods and data visualization methods? You know, that's a really good question. I mean, Intel is a really large organization. We have so many different organizations, from those that have really great capabilities in analytics to those that are still struggling with, you know, what does analytics mean? How can I incorporate this? And so there's really a dichotomy
00:17:35
Speaker
within Intel and a lot of it comes back to what management in different parts of the organization have put value in.
00:17:45
Speaker
Which is an interesting question because one of the things that I've been leading is a data science center of excellence within Intel, where we're basically bringing together all the data scientists within the company and being able to share how various different organizations handle data. What are the best practices? What does it mean to be a data scientist at Intel? What is the career growth? How can we
00:18:14
Speaker
be influencers and be heard so that essentially the entire organization of the whole corporation can benefit from all these best known practices, and I probably said BKM, which is best known methods, which is an acronym at Intel.
00:18:32
Speaker
So, you know, what does that mean? And how can we make Intel a data science leader? And when I say data science, I'm not necessarily using kind of the, I'm using it generally as anybody who knows data. It could be a statistician. It could be a traditional data scientist, a data architect, a data engineer.
00:18:53
Speaker
and so forth, or even a data citizen, a data user. Yeah. Yeah. Very interesting. Very interesting work. And I think it's one of the places where there's a lot of discussion and challenges, because as you've mentioned, there's a lot of different tools and different needs. And I think people
Podcast Conclusion and Listener Engagement
00:19:08
Speaker
are becoming more and more aware of what they can do and what they need to do. So good luck. Sarah, thanks so much for coming on the show. It's been really interesting. Yeah, no problem. Thank you very much, Jon.
00:19:21
Speaker
And thanks everyone for tuning into this week's episode. If you have comments or questions, please let me know on the website or on Twitter. And please rate and review the show on iTunes or your favorite podcast provider. So until next week, this has been the PolicyViz Podcast. Thanks so much for listening.