Episode #38: Steven Drucker

The PolicyViz Podcast

I’m pleased to welcome Steven Drucker to the show, who is a principal researcher at Microsoft Research. Steven and I talk about MSR’s new project, SandDance, a browser-based information visualization system that scales to hundreds of thousands of items. You...

The post Episode #38: Steven Drucker appeared first on PolicyViz.

Introduction and Sponsorship

00:00:00
Speaker
This episode of the PolicyViz podcast is brought to you by Juice Analytics. Juice is the company behind Juicebox, a new kind of platform for presenting data. It's a platform designed to deliver easy-to-read, interactive data applications and dashboards. Juicebox turns your valuable analyses into a story for everyday decision makers. For more information on Juicebox or to schedule a demo, visit juiceanalytics.com.

Guest Introduction: Steven Drucker

00:00:35
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. I'm joined by a very special guest today from Microsoft Research, Steven Drucker, principal researcher at MSR. Steven, thanks for coming on the show. My pleasure. So we're talking live in Germany, which is interesting. We're at a visual storytelling conference and Steven has been showing us the SandDance tool,
00:00:55
Speaker
which, by the time this comes out, you will have seen is live for people to use. So Steven, before we talk about SandDance, which I think people are going to be really interested in hearing about, maybe you can tell folks a little bit about MSR and what it is you guys do and how it fits into the larger Microsoft ecosystem.

Research Freedom and Impact at MSR

00:01:12
Speaker
Sure. My pleasure. So MSR, Microsoft Research, is about a thousand people across a whole bunch of different labs: some in Cambridge, England; Beijing; India; and most in Redmond.
00:01:23
Speaker
It's fairly different than the rest of Microsoft in that we're very bottom-up. Our ideas come from individuals that might want to gather them into some bigger things, but there's a lot of freedom to pursue things that we think make a difference as opposed to, hey, there's been a project and people are assigned to work on that project. And the way we make impact is publishing, advancing the state of the art across everything.
00:01:48
Speaker
sometimes trying to make impact by transferring ideas to product groups, sometimes making intellectual property. But again, it's not any one of these things, it's a combination. So we go to all the academic conferences, we attend seminars like this one, where it's really just about sharing these ideas with others and partaking in the academic discussion.

Overview of SandDance

00:02:12
Speaker
And so the new project that you have coming out is called SandDance, a new sort of interactive data visualization that also has a storytelling component to it. So can you describe what SandDance is and sort of what was the motivation behind creating it?
00:02:27
Speaker
So there are a couple of things here. One, SandDance is kind of unique in that we have a real emphasis on showing both the individual items and how they're organized as a whole. So I kind of like to say the forest and the trees. A lot of visualizations tend to show you the aggregates, which is great for looking at overall, say, businesses and sectors. But if you want to find outliers, or if you want to find anomalies of any sort, or if you're kind of exploring your data,
00:02:55
Speaker
it often helps to kind of see where those individual points are, perhaps in the context of, oh, what's this point doing all the way out there with really high education, but really low income? Let me explore that further. So it's about that. And it also allows you to transition between many, many different views of the data, bar charts, scatter plots, other things, while it's still showing those individual particles.
00:03:21
Speaker
Another motivation for this is this whole continuum between presentation and exploration. It's kind of my belief that a really good way of presenting something is showing you how I came up with it. It might not be the exact path. I might have noticed an anomaly, I explored it, I sort of found an explanation for it, and then I'll take you to the best part. But revealing the process of why that is so
00:03:44
Speaker
in the tool, I've found, really helps a lot. And so that's another emphasis. So we can save insights that we come back with, play those insights back, share those insights with others.
00:03:56
Speaker
So having that all in a very, we hope, simple to use tool. And in fact, I've been working on this for over two years, and a lot of the difficulty in this is figuring out how to make this both simple and powerful. Because it's really easy to just eliminate features, but then it's a toy. But then you put in too many things and it's just too hard to use.
00:04:17
Speaker
And so right now it all works in the browser. That's right. So why don't we talk a little bit about some of the demos that are available. So the one we were just looking at has county level election data. Right. And so I just wanted to talk about some of the stories that you could tell toggling between all the different views that you have available. So actually what I did with this data set was I took
00:04:34
Speaker
the census from 2010 and mashed it up with some of the political election results from the Romney-Obama election. And I can tell lots of different stories in here. One certainly is a story about, you know, the divided country, and in some ways it might not be as divided in the way you

Applications of SandDance in Data Visualization

00:04:52
Speaker
thought it was. And it's the classic red state, blue state thing. Well, when we look at it at the county level, there's a lot more nuance. And when we look at how much Obama took each county or didn't take each county,
00:05:01
Speaker
it's even more nuanced. There are swing states all over the place. And again, you get to see that by seeing these representations. And you've seen maybe the purple United States kind of map. But now we can facet this and divide this up into many different versions, delve down into
00:05:18
Speaker
one county, five counties that are similar to it based upon their voting habits or their education levels or their income levels. So that's kind of one set of things that I think people really care about seeing and showing. You can even see sort of 3D views where you see the density of the population, see how that makes a difference.
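Editor's sketch: the mash-up described earlier in this answer (2010 census data combined with county-level election results) amounts to a join on a shared county key. The TypeScript below is a minimal illustration under stated assumptions, not the workflow Steven actually used. The field names (`fips`, `medianIncome`, `obamaVotes`, `romneyVotes`) and the `joinByCounty` helper are hypothetical, and the share is computed over the two major candidates only.

```typescript
// Hypothetical row shapes; real census and election files will differ.
interface CensusRow { fips: string; medianIncome: number; population: number; }
interface ElectionRow { fips: string; obamaVotes: number; romneyVotes: number; }
interface CountyRecord extends CensusRow { obamaShare: number; }

// Join the two tables on the county key and derive a two-party vote share.
function joinByCounty(census: CensusRow[], election: ElectionRow[]): CountyRecord[] {
  // Index election rows by county key for constant-time lookup.
  const byFips = new Map<string, ElectionRow>();
  election.forEach((e) => byFips.set(e.fips, e));

  const out: CountyRecord[] = [];
  census.forEach((c) => {
    const e = byFips.get(c.fips);
    if (!e) return; // no election row for this county: skip it
    const total = e.obamaVotes + e.romneyVotes;
    if (total === 0) return; // avoid dividing by zero
    out.push({ ...c, obamaShare: e.obamaVotes / total });
  });
  return out;
}
```

Each output record is still one county, which is the unit that every view in the conversation (map, bar chart, scatter plot) keeps on screen.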
00:05:34
Speaker
Another thing is just purely the demographics. I think there are a lot of fascinating stories about the demographics. We were looking at income and looking at the median income across the country. And you can see how many counties are living at or below the poverty level. And again, these are counties. Some of these counties have 500 people. Some of them have 3 million people. So it's not exactly that 99% kind of thing. But it's a fairly good proxy for that.
00:05:59
Speaker
And then you can delve into that further. You can see, okay, how do income and unemployment co-vary, and how does education vary with that? And then when you look at it that way, you start seeing some weird anomalies, as I mentioned before. The most educated county in the United States, 95% of the people have a bachelor's degree or more,
00:06:20
Speaker
they're not making much money. And when I looked at it, going, where is that? I found that it was Stanford, California. I was like, what's going on here? And it's like, oh, okay, I guess maybe they're on government stipends, working on their higher degrees, not making much money. Right. And then I found the next most educated place was also not making much, and I looked into it, and it was Tompkins County. What's Tompkins County? So I clicked on it and
00:06:43
Speaker
looked it up, and there's an automatic way of looking it up. Found out, oh, that's where Cornell is. So then I reorganized it and go, let's look at this whole bottom half that are really highly educated, not making much money. And I found out every single one, the top 20 of them.
00:06:59
Speaker
They're all college students. And again, this is something I didn't know going in. I think it's an important part about visualization. And a weird thing about the story that's hiding behind the numbers that you might have. Exactly. Now, what's interesting about SandDance is the transitions from one view to another. So you have this county-level map, a point for each county. And then if you want to look at it in, say, some sort of bar chart format, you toggle between the different views. But the points all transition from the map to the column chart. So in your view, what's the advantage of having
00:07:29
Speaker
that transition. And I know you can turn that transition off. But what's the advantage of having those transitions of all of those points moving? Here we have, you know, 3000 points, but you could obviously have smaller data sets. But what's the advantage of being able to transition or seeing the transition from one view to another? Yeah, I think it's all about maintaining context, kind of know that
00:07:48
Speaker
we're talking about the same thing. We're always talking about counties. Or if you're looking at the Titanic, you're always looking at passengers on the Titanic. And no matter what the representation is, that's what we're talking about. Well, it might just look like a series of bar charts. But is this average income, or is this total number of people, or is this the sum of sales? And in each of them, you might have labels, but you've got to have a little cognitive switch. What does this mean? And I think seeing that the particles are constant across this,
00:08:17
Speaker
I believe helps people understand what's going on a little bit better. So I think that's the really important thing. It also looks kind of compelling. People are just drawn to what's going on here. So that's another aspect. So how does SandDance interplay with the rest of the Microsoft ecosystem?
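Editor's sketch: the transitions discussed above, where each particle keeps its identity while moving from one chart layout to another, can be illustrated with a simple position interpolation. This is a minimal TypeScript sketch and not SandDance's actual implementation (which, per the interview, runs on WebGL). The `Point` and `Layout` types, the `ease` curve, and `interpolateLayout` are all hypothetical names introduced for illustration.

```typescript
// Hypothetical types: a layout maps each item id to its position in one view.
interface Point { x: number; y: number; }
type Layout = Map<string, Point>;

// Smoothstep easing so particles speed up and then settle; input is clamped to [0, 1].
function ease(t: number): number {
  const c = Math.min(1, Math.max(0, t));
  return c * c * (3 - 2 * c);
}

// Positions of all items at progress t (0 = source view, 1 = target view).
// The same id refers to the same item in both views, which is what keeps context.
function interpolateLayout(from: Layout, to: Layout, t: number): Layout {
  const k = ease(t);
  const out: Layout = new Map<string, Point>();
  from.forEach((a, id) => {
    const b = to.get(id) || a; // item absent from the target view: stay in place
    out.set(id, { x: a.x + (b.x - a.x) * k, y: a.y + (b.y - a.y) * k });
  });
  return out;
}
```

Calling `interpolateLayout(mapLayout, barLayout, t)` once per animation frame with `t` rising from 0 to 1 would move every county from its map position to its slot in the bar chart; turning transitions off corresponds to jumping straight to `t = 1`.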

Integration and Data Privacy Concerns

00:08:32
Speaker
I'm going to guess it works pretty well with Excel. Now, can you bring an Excel file directly in? Do you need to convert it to a CSV to bring it in? And how does it work with, say, even search engines? Can you go directly from SandDance, because it's on the web? Can you go right to searching for, you know, maybe your data have links in them or something like that? Well, to start with the way it works with a lot of different product groups, that's always a tricky question. Research, again,
00:08:58
Speaker
goes by its own drummer. We obviously want to impact the groups, but a lot of times the groups just have different priorities. They've got customer responses, so it's always hard to tell what will make it and what won't. We've actually made a version that was a plug-in for Excel in the past. It's kind of interesting that they showed it to some of their most valuable customers, and some of them loved it and said, oh, I really want it.
00:09:21
Speaker
Some of them were saying, I don't tend to think visually this way. I just want the numbers. Just give me the numbers. And others were saying things like, you know, everything I do with this, I can do with a pivot table. And it's true, but I was looking... But not visually. Exactly. Not visually. And the people that can use pivot tables, I think, are the experts. And therefore, I think this was not a tool for the experts. This is a tool a little bit for...
00:09:44
Speaker
I don't know, information enthusiasts, maybe knowledge workers that are totally comfortable with it. So does it make sense? Excel also already has several different ways of doing charts. There are other groups, Power BI, other things. We're obviously trying to work with all these groups, either inform them or get it in. And I love that. But again, the drives that are driving these things, I think that research is a hedge against the innovator's dilemma.
00:10:07
Speaker
Excel has customers, they need to please their customers, but a lot of times you get trapped into just doing what the customers want rather than something else. So I think we try to kind of shake things up. If customers like this or they're demanding things like this, then Excel has a reason to do this. Now the other thing is, because this is all in a browser, we do make it really easy to link outwards. So you can click on something and it'll do a search automatically with one click and find relevant things based upon the context. So I show this with the Titanic dataset.
00:10:35
Speaker
You can find a child who died of this. You can click on her name. There's a website that has a little story about her. And to me, that's great because it helps humanize the data. A lot of times, back to that beginning point, you get lost in these aggregates. These data are often made up of individuals. And how does it work with sharing and exporting? Can you export stills? Can you export sort of a dashboard? How does that work? We can't do dashboards right now, but that's been a kind of requested feature.
00:11:03
Speaker
Sort of depends how you define a dashboard. Like in some ways what we've just been looking at, we're toggling between a view that's a map versus a column chart and just being able to toggle between the two, I might define as a dashboard. So it sort of depends. But how would one share that analysis? So we've got a number of ways right now. One is you can always take an image of it and export it.
00:11:23
Speaker
Another way is you can actually share your entire, you can save the insights, which are kind of like slides in PowerPoint, but they're actually interactive points into a presentation. So if you click on this thing, now you've got the data set loaded up and you can start interacting from that point onwards. There are guided tours through this data set as well. So it's another way of sharing this thing.
00:11:42
Speaker
You can email these insights. You can actually collaborate online, although I think we've turned off that feature right now. But you could be working on a data set at the same time someone else is working on that same data set and make selections, and those selections and changes are reflected in the other person's view as well, for collaborative things. Finally, we're exploring other ways of, when you export something, it might make sense to export that as a description for anything from Vega-Lite to Power BI to others.
00:12:09
Speaker
So they can just start with, you know, hey, I've got the data, now I want to cover that view. It might not be that particle view that's so useful for this presentation, but at least it's, you know, starting off with the conclusions that you've got. Right, right. And so it's working in the browser. So at this stage, if I work for a firm, I have proprietary data, I have
00:12:27
Speaker
data I don't want out there. This is probably not the tool for me. Actually, no, it is because it doesn't ever load the data up to the server. Essentially, you go to the browser, download some JavaScript to run this, and then at that point, if you open up some data, it never leaves your site. Now, if you want to share this with someone else who's not at your company, then you've got to have your data in a place that they could get to. Either they have a local copy and this file that you're sharing, it only points to the data. So the easiest way might be to put it up in a URL somewhere,
00:12:55
Speaker
you share this thing and they just click and it opens and it says, okay, you know, here's the URL. I see. Okay. Okay. So again, the data never leave, but each person could get the data on a shared server and then just download those. Right. And the only thing at that point that's, quote, shared to keep things in synchrony is "items one through a thousand are selected." Sure. Okay. How customizable is it in terms of text and color, you know, all the sorts of good stuff we want to have? So one of the things that we spent a lot of time on was sort of managing that complexity. Yeah.
00:13:24
Speaker
Under the hood, you can do all these settings and we can do text by any column. You can do line by any column, shape by, size by. You have all these things that you can customize. That's one form of being able to create a lot of different compelling charts. There's another element of customizability that we've been exploring, which is that you should be able to embed
00:13:45
Speaker
this thing in any web page, so that it doesn't have the chrome outside, doesn't have essentially utility on that. Yeah, that's something we're actually exploring as well. Okay. And so right now you have in the tool scatter plots, which with geographic data turn out to be maps, right, longitude, latitude; column charts, bar charts, line charts, and treemaps. Do you have hopes or aspirations to do other chart types?
00:14:08
Speaker
Absolutely.

Innovations and Feedback in SandDance

00:14:09
Speaker
We've got 2D density maps, even some 3D views. 3D is very controversial, but sometimes, especially when you have 2D data and you have something on top of that, that makes sense. You've got views like that. It's interesting because back to pivot tables. Pivot tables work very well for sums.
00:14:26
Speaker
And for counts. And averages, as we've been exploring: how do you represent averages? And sometimes the best way to do that is you actually do violin plots and maybe annotate them. We don't have an annotation layer that shows the mean and median, but that's another thing that we've kind of looked at. You know, we don't do a lot of statistical things in here. We don't show correlation lines. We don't show confidence intervals, but you could do that as well.
00:14:48
Speaker
So do you view it at this stage, now that it's just come out, as more of an exploratory tool or as a presentation tool, or some combination of the two and it just depends on what the user wants? Yeah, I think it's the last. I go back and forth. If you know what you're looking for, there might be some better ways of doing this. If you don't know, and you're exploring,
00:15:07
Speaker
this is a great way of doing this. There might be other things that are really good for exploring, but I have the most fun doing the presentation, because I think part of this is making this into a compelling, coherent message to someone else. That's kind of why I'm here. I'm excited about this. I'll be really curious. One of the points of doing this as a beta is getting this out to people. A beta implies it's going to turn into a product and that's not necessarily the implication. What we want to do is find out what people are doing with it.
00:15:32
Speaker
Is the preponderance of people using it for presentations? Are they using it for exploring? What kind of data sets? Is it textual data? Because we can handle textual data. Is it numeric? Is it aggregates? For instance, I also have a college database from the data.gov in here that looks at discriminating what college might you pick based upon all sorts of different attributes. That's very, very different than sales data, which we also have in there. So what do people find compelling in this?
00:15:57
Speaker
So that's really interesting in terms of getting the feedback. So how do you go about getting the feedback? Is it going out and asking people, are you surveying and how do you find the people that are using this tool as opposed to something else? Yeah, again, a great question. We've got a feedback link in here and I fully expect to maybe get 0.1% responses actually filling that out.
00:16:16
Speaker
We are looking at what kinds of data we can log. We don't want to do any privacy data, but we might want to look at things that are clicked on more or less, or how many times the view changes, things that are completely anonymous. So hopefully we can get an inkling of behavior from that. We'll have blogs and discussions. There'll be a forum and probably some targeted experiments, meaning, hey, we've got this data set. Maybe we'll have a challenge and see what insights can you pull from it, or how did you go about finding something from that?
00:16:46
Speaker
And from MSR's line of things that you produce, how important are those communities in terms of getting things out and then getting the feedback? I mean, we know from all these different tools from D3 to Tableau to other Microsoft official products like Excel and PowerPoint that the communities and the forums and the working groups are all really important. When you put out a tool like SandDance or other projects, how important is it to have those communities engaging and discussing it with you or amongst themselves?
00:17:13
Speaker
Because the whole point of getting this out there is to get feedback about next directions, they're incredibly important. We don't necessarily have a fixed mechanism. As I say, I'm not even sure what the right community is. I mean, we can go to data enthusiasts. We can go to business intelligence guys. We can go to all sorts of things. Educators. I've demoed this on capital. I've demoed this to teachers. Teachers tend to love us.
00:17:35
Speaker
This will be great for teaching visualization to my students. So there's lots of communities that I want to see sort of where this resonates. And you know, part of the point is get it out there as much as possible and talked about and try to find out where people are falling down and not being able to use it. What it's not good for is as important as what it is good for.
00:17:53
Speaker
Is there something, for instance? It's not very good at data sets that have 10 items in them, because you can't mix and match and do all that stuff with 10 items. It's probably better to do something in Excel. The sweet spot is probably anywhere between 200 and 200,000 items. We can handle up to about 200,000, maybe a little bit more, in a browser before the browser starts running out of memory, right?

Technical Development and Team Acknowledgment

00:18:14
Speaker
And for those who are interested in sort of the back-end technology, it's using WebGL as the machine running the whole thing. That's right. That's how we get the performance for 200,000 points animated at 30 frames per second. Of course, it depends a little bit on the machine that you're running on, whether it has a decent graphics card or not. But it runs in anything from Chrome
00:18:32
Speaker
Internet Explorer, Edge, Firefox. I haven't tried it on Opera. Runs on the Mac in Chrome and Safari. It doesn't currently run on iOS because we just haven't had the time to look at why the WebGL implementation is different. Everything you can do, you can do via touch. And that's another important part of this. So it's really a kind of rich and easy exploration. I love demoing this on a large touchscreen where people can see what I'm doing
00:18:56
Speaker
to see how they get the answers. And it makes them, I hope, feel that they can get answers themselves. Great. Well, it's a really cool looking tool. I look forward to playing around with it. And thanks for coming on the show. Yeah. One other thing I want to say is that this is primarily
00:19:11
Speaker
two people, Roland Fernandez, who's done a lot of the implementation, and myself. And we've had various other people help along the way. But again, out of research, it's not this huge effort from a lot of people. Yeah. Good. Good. Great. Well, thanks again for coming

Conclusion and Sponsor Mentions

00:19:24
Speaker
on. Yes. Exciting. And thanks to everyone for listening, for tuning in. Please let me know what you think about the show or about SandDance. I'd be curious to know how you used it and what you think of the tool. And again, this has been the PolicyViz podcast. Thanks so much for listening.
00:19:51
Speaker
This episode of the PolicyViz podcast is brought to you by Juice Analytics. For 10 years, Juice has been helping clients like Aetna, the Virginia Chamber of Commerce, Notre Dame University, and US News and World Report create beautiful, easy to understand visualizations. Be sure to learn more about Juicebox, a new kind of platform for presenting data at juiceanalytics.com. And be sure to check out their book, Data Fluency, now available on Amazon.