Episode #23: Arvind Satyanarayan

The PolicyViz Podcast

Welcome back to The PolicyViz Podcast! In this week’s episode, I chat with Arvind Satyanarayan, Computer Science PhD candidate at Stanford University, working with Jeff Heer and the Interactive Data Lab. Arvind’s research seeks to lower the threshold for design, with a...

The post Episode #23: Arvind Satyanarayan appeared first on PolicyViz.

Transcript

Introduction and Sponsorship

00:00:00
Speaker
This episode of the PolicyViz podcast is brought to you by Socrata. Socrata is the global leader in software solutions that are designed exclusively for digital government. They deliver unprecedented data-driven innovation and cost savings for hundreds of public sector leaders and millions of their constituents around the world. Socrata's digital government solutions are being deployed for a wide array of strategic and mission-critical government services that empower citizens while enhancing their quality of life. To learn more about Socrata, visit www.socrata.com.

Guest Introduction: Arvind Satyanarayan

00:00:37
Speaker
Welcome back to the PolicyViz podcast. I'm your host, Jon Schwabish. I hope everyone's having a nice fall. I'm here with Arvind Satyanarayan. Look at that. Yeah, there you go. Yeah, exactly. Practice makes perfect. First take. Arvind, thanks for coming on the show. Of course. Thanks for having me. For those who don't know, Arvind works at the University of Washington in Jeff Heer's group,
00:01:01
Speaker
behind lots of very cool tools that people are excited about. Arvind, why don't I ask you to start, just introduce yourself real quickly, and then maybe we'll just jump in.

Introduction to Vega: Declarative Visualization

00:01:10
Speaker
I think we want to talk about the new tool, the newish release of Vega, and maybe just talk about that a little bit, sort of the basic philosophy behind it and what you're trying to build.
00:01:21
Speaker
Sure. So I am now starting my fifth year in the PhD. Wow. And I've been working with Jeff over the course of the last few years. And sort of the philosophy behind Vega
00:01:36
Speaker
just continues this thread of research that we've been doing in the group with declarative visualization toolkits. So it started with Protovis and of course led to D3. And the design decisions behind D3 really wanted to leverage the existing web design tool stack, right? So it wanted to tie data to DOM elements.
00:01:57
Speaker
And that yields a lot of advantages. So you're able to continue using CSS to do the styling. You're able to use the JavaScript console for debugging and things like that. But it also restricts the set of higher-level stuff that you can do around it, and in some cases, puts some caps on your performance.
00:02:21
Speaker
So we created Vega to address those points.

Exploring Vega's JSON Specification

00:02:27
Speaker
And so with Vega, visualizations are represented through a purely declarative JSON specification. So there is no traditional sort of imperative programming happening. No for loops, no while loops, no if statements, things like that.
00:02:40
Speaker
All you do is say, you know, I want these marks, I want these scales, I want this data, and we've even got a sort of data transformation pipeline integrated into the Vega language as well. So things like filters and grouping into facets and so forth.
00:02:58
Speaker
Because a visualization is just JSON, that means that it can serve as a platform for higher level tools. Because it's much easier for another piece of software to generate a JSON object, particularly in JavaScript, than it is to write out a bunch of D3 code.
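
To illustrate how another program can emit a visualization as plain JSON, here is a minimal TypeScript sketch. It is not from the episode. The function name `barChartSpec`, the `Datum` type, and the sample values are hypothetical; the top-level pieces (`data`, `scales`, `axes`, `marks`) are the ones Arvind lists, laid out loosely in the style of a Vega 2 bar chart. Exact property names should be checked against the Vega documentation before use.

```typescript
// Hypothetical row type for the example data.
interface Datum {
  category: string;
  amount: number;
}

// Build a Vega-2-style bar chart specification as a plain object.
// No loops or conditionals end up in the spec itself: it only declares
// data, scales, axes, and marks.
function barChartSpec(
  values: Datum[],
  width: number = 400,
  height: number = 200
): Record<string, unknown> {
  return {
    width: width,
    height: height,
    data: [{ name: "table", values: values }],
    scales: [
      { name: "x", type: "ordinal", range: "width", domain: { data: "table", field: "category" } },
      { name: "y", type: "linear", range: "height", nice: true, domain: { data: "table", field: "amount" } },
    ],
    axes: [
      { type: "x", scale: "x" },
      { type: "y", scale: "y" },
    ],
    marks: [
      {
        type: "rect",
        from: { data: "table" },
        properties: {
          enter: {
            x: { scale: "x", field: "category" },
            width: { scale: "x", band: true, offset: -1 },
            y: { scale: "y", field: "amount" },
            y2: { scale: "y", value: 0 },
          },
          update: { fill: { value: "steelblue" } },
        },
      },
    ],
  };
}

// A higher-level tool would hand this JSON string to a Vega runtime.
const spec = barChartSpec([
  { category: "A", amount: 28 },
  { category: "B", amount: 55 },
]);
const specJson: string = JSON.stringify(spec, null, 2);
```

The point of the sketch is the design choice Arvind describes: because the output is only data, a drag-and-drop editor or a recommendation engine can produce it by assembling objects, without generating program code.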
00:03:20
Speaker
And when users are working with Vega, are they working in a particular framework or sort of framework? Are they sort of building the code in the browser? What's the interface? Yeah, so there are a handful of different ways.

Tools for Creating Visualizations: Vega Live Editor and Lyra

00:03:38
Speaker
So we have a Vega Live Editor, which is basically just a text box with some syntax highlighting and so forth. So you can type out your JSON
00:03:50
Speaker
hand-coded. And that's definitely, you know, it's to some degree a challenge. You need to learn the Vega syntax and, you know, JSON can often trip you up if you forget a quotation mark or a comma in places. But that's definitely useful in its own right. But where we really see Vega differentiating itself from the others is, like I was saying, you know, a platform for higher-level tools.
00:04:13
Speaker
Last year we put out this tool called Lyra that allows you to do drag and drop design for visualizations, so kind of like Illustrator, but for data visualizations. And a bunch of my colleagues have been working on Vega related tools, but for exploratory data analysis.
00:04:32
Speaker
So recreating a Tableau style interface or even a gallery of visualizations to browse through your data. And so you can see that we're sort of moving visualization design up an abstraction level. And so we're removing the need for people to maybe manually specify these things and try and figure out is there a way to do it automatically? Can we recommend good designs? Or can users just do it in a sort of direct way?
00:05:01
Speaker
Right. And it's purely a visualization tool, right? Because I think one of the knocks, not knocks, criticisms of Tableau, I think is that it's a pure data visualization tool. So you need to have your data in a particular order to be able to make the visualization. Is that similar for Vega?

Vega's Ecosystem and Data Cleaning Bridge

00:05:17
Speaker
That's definitely similar. So like I was saying, we have an integrated data transformation pipeline, and it is extensible. So you can add your own transformations in there if you want. But we really do sort of focus on the visualization side of things. So we expect your data to be clean, structured, and so forth. So if you were to just dump, say, a CSV file from data.gov or somewhere like that, it wouldn't work right off the bat. Right. You have to clean it up.
00:05:47
Speaker
That's right. But again, one of the nice things about Vega being this JSON format is that it can also serve as this common language that all these different tools can talk. And so we're hoping that we don't need to be the one tool that handles everything.
00:06:07
Speaker
But rather, we start to build up an ecosystem of tools that all speak the same language, and so it becomes easy to start with your data in something like Data Wrangler, Tableau, or Trifacta, and then export it to another tool to do more customized visualization, but sort of underneath the hood, everyone's speaking Vega.
00:06:29
Speaker
I see. So in your mind, the total workflow is something like someone has some raw data, they're working in R or SAS or SPSS or Stata or whatever it is, they may export the core thing that they want to show to maybe Excel or to Tableau, they sort of draft something up, and then they take that same data set and they move it into Vega to create the online and interactive, or online graphics for their
00:06:55
Speaker
That's right, because I think each of those steps is complex unto itself and needs its own specialized tool to really do a good job. And so I think sort of making it easy to go from one tool to the next is really sort of the way we want to go.
00:07:13
Speaker
And the one that just came out is Vega 2.0.

Innovations in Vega 2.0: Reactive Programming

00:07:16
Speaker
That's right. So what were the differences between the first version and version 2.0? Yeah, so the first version was mainly for static visualization. So everything in the JSON would be familiar to people that have used D3. So you've got your data and your transforms, you've got scales, you've got axes and legends, and then finally your marks.
00:07:41
Speaker
What that leaves out is interactions. And Vega 1 supported interactions, but you had to register event listeners, and then you're left to your JavaScript programming, which is just the way that D3 does it. And that introduces a bunch of problems for the user. The user has to manually manage state. They have to figure out all these internal details with event handling, like preventing propagation and defaults and all this stuff.
00:08:11
Speaker
And particularly when you were in Vega land, it was very jarring to have this nice JSON specification that described your static visualization, and then for all the interactive behaviors suddenly have to do programming. And so over the last year, what we did that ultimately yielded Vega 2 was figure out how to do interaction design declaratively. So it was part of that JSON specification.
00:08:36
Speaker
And so what we did, what we ended up doing was actually leveraging reactive programming, which a lot of people have been looking at recently, and particularly in data visualization.
00:08:48
Speaker
What we did was take this notion of signals and event streams and figure out a way to tie that into the rest of the specification language. The idea being that event streams kind of abstract away the difficulty of event callbacks and all the sequencing that happens.
00:09:09
Speaker
And then signals give you basically a dynamic variable that you can use throughout the rest of the specification. And any time that variable changes, Vega automatically re-renders the visualization for you. So you don't have to worry about any of the managing state or propagation yourself.
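
The idea of a signal as a dynamic variable that triggers re-rendering can be sketched in a few lines of TypeScript. This is an illustration only, not from the episode and not Vega's runtime or API: the `Signal` class, the `hoveredBar` variable, and the `render` function are all hypothetical, and "rendering" here just appends to a log.

```typescript
// A signal holds a current value and notifies listeners when it changes.
class Signal<T> {
  private current: T;
  private listeners: Array<(value: T) => void> = [];

  constructor(initial: T) {
    this.current = initial;
  }

  get value(): T {
    return this.current;
  }

  // Update the value; listeners run only if the value actually changed.
  set(next: T): void {
    if (next === this.current) {
      return;
    }
    this.current = next;
    this.listeners.forEach((listener) => listener(next));
  }

  onChange(listener: (value: T) => void): void {
    this.listeners.push(listener);
  }
}

// The signal plays the role of a dynamic variable used by the "visualization".
const hoveredBar = new Signal<string | null>(null);

// Stand-in for re-rendering: record what would be drawn.
const renderLog: string[] = [];
function render(): void {
  renderLog.push(`render: highlight=${hoveredBar.value ?? "none"}`);
}
hoveredBar.onChange(render);

// Simulated event stream: mouseover bar A twice, then mouseout.
hoveredBar.set("A");
hoveredBar.set("A"); // same value, so no re-render
hoveredBar.set(null);
```

The code that feeds events in never calls `render` directly, and the rendering code never registers low-level event callbacks; the signal sits between them, which is the separation described above.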
00:09:29
Speaker
Do you have a particular audience in mind? I mean, I think there are sort of the hardcore developers who want to do their custom thing in D3. And then, I mean, if you looked at sort of a spectrum, you've got developers doing, you know, the sort of work that you do, right? Like, all the way at the root. And then you have people who, you know, use Excel and Tableau. So do you have sort of an audience, a user in mind?
00:09:53
Speaker
It's a good question. Like I was saying, a lot of people we've seen have been sort of coding Vega by hand. And from what we've heard, people do like it as a prototyping tool, a way to quickly generate something, quickly test out interactive behavior, see if it makes sense.
00:10:12
Speaker
And then maybe, you know, go to D3 for the fully fledged thing. And so that's definitely an option and might be particularly more so once, you know, we finish work on Vega-Lite, which is, you know, even further up the abstraction ladder. But where I sort of really see the power is in those higher-level tools, so sort of the Lyras and the Polestars and the Voyagers,
00:10:37
Speaker
to figure out new ways of doing design that haven't been possible so far. Yeah, interesting.

Exploratory Data Analysis with Voyager and Polestar

00:10:44
Speaker
Do you want to talk about some of those other tools? Sure. Like we'll start on some of the other ones, Jeff, and then maybe some of the other things that you're working on as well.
00:10:49
Speaker
Yeah, so my colleagues, Dominik Moritz and Kanit Wongsuphasawat (I hope I said that right; we're just at our first take today), have been looking at the exploratory side of the data visualization pipeline.
00:11:08
Speaker
And one of the problems there, including with tools today like Tableau, is that if you have a data set that you're not familiar with, it can be difficult to know what questions to ask ahead of time. And particularly in Tableau, for example, you're presented with a blank slate.
00:11:27
Speaker
You need to know which fields in your data are interesting and you need to figure out which way to map them to visual encodings like should I put price on the x-axis and something else on the y-axis and so forth.
00:11:42
Speaker
And so what they did with Voyager was basically provide you a search interface over a gallery of visualizations. So right off the bat, you load your data into this tool, and it provides you visualizations that show you the distribution of various variables.
00:11:58
Speaker
various data fields. And then you can start sort of drilling down into your data set once you notice something interesting using a bunch of check boxes on the left-hand side. So you can say, show me the price and the yield. And once it does that, behind the scenes it's got a recommendation engine called Compass that tries to calculate some statistics over those data sets and suggest interesting visualizations to you to uncover
00:12:26
Speaker
things of interest. And one of their sort of principles behind the tool was to maximize data variation rather than design variation. So they're trying to show you as part of the gallery many different slices of your data set rather than showing you the same slices in different ways. Gotcha. Okay. Interesting.
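
The principle of maximizing data variation rather than design variation can be sketched as follows. This is not Compass's actual algorithm, which per the description above also computes statistics to rank suggestions; the function name `suggestFieldSets` and the field names are hypothetical. The sketch only shows the shape of the idea: each suggestion adds a different unselected field to the user's selection, instead of redrawing the same fields with different chart types.

```typescript
// Given all fields in a data set and the fields the user has checked,
// propose one candidate view per remaining field. Every candidate shows
// a different slice of the data (data variation), not the same slice
// in a different chart design (design variation).
function suggestFieldSets(allFields: string[], selected: string[]): string[][] {
  return allFields
    .filter((field) => selected.indexOf(field) === -1)
    .map((field) => selected.concat([field]));
}

// The user has checked "price" and "yield" in the left-hand panel.
const fieldSets = suggestFieldSets(
  ["price", "yield", "year", "region"],
  ["price", "yield"]
);
// fieldSets is [["price", "yield", "year"], ["price", "yield", "region"]]
```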
00:12:47
Speaker
And then Polestar was just a quick or comparatively quick interface they put together to see if this gallery view was actually a useful way of showing data. And they found that, yeah, actually once they tested it with participants in a user study, they found that people do cover a lot more areas of their data set than with just a manual specification interface like Tableau or Polestar.
00:13:18
Speaker
So how do you feel about leading people down these roads? Basically, Excel leads people down roads into 3D exploding pie charts, right? Is that the way that just people are going to create visualizations as you lead them down a road or you just provide them with the library of possibilities?
00:13:38
Speaker
Yeah, I think Voyager or the goal with Voyager is really powerful in this way because it's something in HCI at least we call mixed initiative. So the idea is that it's neither purely sort of leading the users down a path and it's not, you know, the user purely
00:13:59
Speaker
deciding what it is they want to do, but rather it's sort of this cyclical process where you give the system a little bit of information. The system uses that information to make or present you with a list of possibilities and then you make a decision again. And so it's sort of this back and forth. So it's kind of guided exploration rather than
00:14:18
Speaker
And so the hope is, on the design side of things, we have all these perceptual principles that are known with color and things like that. But none of our design tools expose any of that knowledge to users. And so you can imagine similar sorts of things in a tool like Lyra where
00:14:39
Speaker
you know, maybe you create, you map some data to a color field, and then maybe Lyra starts to, you know, intuit the types of data you're trying to visualize and suggest colors for that. Because that's one of the studies that one of my colleagues in the lab did as well, which was, you know, if there's a semantic mapping between the data and the color, people are more likely to remember it and, you know, understand it.
00:15:06
Speaker
So there's things like that that I think can be operationalized in systems that we haven't explored yet. Interesting. I've asked the last couple of guests this question, so I'm going to turn it to you. Sure. Where do you think the field of data visualization is headed? And I think that's more not so much on the research side, but on the tools, and I hesitate to use the word storytelling, but on the
00:15:31
Speaker
storytelling, for lack of a better word, telling stories with data. Yeah, not so much on the research side. I think that's a whole other discussion that we can definitely have. But you're working a lot on these tools, so where do you see the field evolving next, over the next, you know, in 2016, I guess, right? For me, the big thing is trying to get
00:15:56
Speaker
away from requiring programming as a way to express this thing that is inherently visual and interactive. My focus over the next year is definitely going to switch back towards Lyra and figuring out how we can close the loop with interaction design.
00:16:13
Speaker
And then all the stuff with storytelling I think falls out as future work out of that.

Lowering Barriers to Data Storytelling

00:16:20
Speaker
Because Lyra right now is this fully featured, monolithic sort of application. And I think it's interesting to think about how do we break that down to make it even simpler, sort of what if,
00:16:35
Speaker
on data journalism sites like the Upshot or something like that, people not only had a comment box, but like a little toolbar that allowed them to make a very simple visualization from the data that's being shown. And so that way, if you really reduce the threshold or lower the barrier for creating visualizations, I think
00:16:56
Speaker
people can much more quickly engage with the data or offer sort of rebuttals or things like that. The same sets of conversations that we see sort of on the data visualization Twitter sphere where people throw visualizations back and forth and critique it and sort of unpack it, I think that only happens right now amongst sort of the professionals and people that are interested because the barrier remains so high to create these.
00:17:24
Speaker
So my hope is that if you make it easier, then that discussion sort of democratizes. Interesting. What have we missed? What else are you working on? I'm sort of just pushing this whole tool stack forward. So my current work is looking at, well, great, we have all this interaction stuff at the Vega level. How do we raise that even further up the abstraction ladder to the Vega-Lite level?
00:17:53
Speaker
And so that way you're creating specifications in Vega-Lite much more efficiently than in Vega. But again, we've not thought about any of the interaction stuff. So my hope is that once we figure it out at the Vega-Lite level, that might inform how we do interaction design in Lyra. So how do you interactively design interactions and things like that. So that's sort of my next year, year and a half's worth of work.

Future Directions: Simplifying Interaction Design

00:18:19
Speaker
And so all the work you've been talking about, like you said, tries to democratize the process of creating visualization. So you give to some person, you know, they're an Excel user, you give them Lyra, you give them Vega, you give one of these tools, and they're like, great, I can use a tool to make something. But now I have to figure out how to get it from this thing into the browser. So is that another? I mean, that's obviously part of the workflow. Is that another sort of thing that you're thinking about trying to help people do in an easier sort of way?
00:18:49
Speaker
Yeah, it's a good question. So we've been trying to figure out better ways to make Vega visualizations publishable online. So I think one of the things that made D3 as popular as it is and as useful is that huge gallery of example visualizations and bl.ocks.org and things like that.
00:19:09
Speaker
And so we've been trying to figure out how we can leverage GitHub Gists or other things like that. So we have a little project in the Vega GitHub organization called vega-embed, where we're sort of prototyping some of these ideas out. But it's definitely on our radar. Yeah. Cool. Yeah. Great. This is super interesting. Thanks for coming on the show and telling me all about these great tools here. Absolutely. Thanks for having me.
00:19:37
Speaker
And thanks to everyone for tuning in this week. Of course, if you have comments or suggestions, hit me up on Twitter or on the website. And please rate the show on iTunes. Moving it up the queue really does help. And until next time, thanks for listening.
00:20:03
Speaker
This episode of the PolicyViz podcast was brought to you by Socrata. Socrata is the global leader in software solutions that are designed exclusively for digital government and provide benefits for hundreds of public sector leaders and their constituents. The company's customers, among others, include the cities of New York, Chicago, San Francisco, and Los Angeles. To learn more about Socrata, visit them on the web at www.socrata.com.