Introduction and Guest Introduction
00:00:11
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. Summer is in full bloom here in DC. It's a great day today, and I'm happy to be here with Shomik Sarkar from the Democratic National Committee. Welcome to the show. Thank you for having me. I've got to say before we get started, I love the little jingle at the beginning of your podcast. It is very good. I have at times rewound it.
00:00:40
Speaker
It's a very good jingle. Well, thanks to Pottington Bear for the music. Oh, there you go. You just gotta get out there and find the right person. Yeah, it's very good.
Shomik's Background and Role
00:00:53
Speaker
So what was your title when you were working with the Clinton campaign? You were working on the Clinton campaign for a while, and now you're at the DNC. So what was your title there and here? So I'm a data scientist at the DNC, and after the primaries ended I was kind of on loan to the Clinton campaign to run the reporting team there. So my title was Director of Reporting. Oh, okay. And how long were you with the campaign? I started right after the primaries, so May, June. Okay. All the way through the end? All the way through the end, yeah.
Data Analysis in Campaigns
00:01:24
Speaker
Great. I'm personally very curious, and I'm sure many listeners are curious, to hear how data, and data visualization generally, was part of the campaign: what you were trying to process, how you handled data
00:01:36
Speaker
quality, made the visualizations, and passed them off to people who were presumably making important decisions about how to run the campaign. So I'm just going to throw it out there. Yeah. And we can spend a few minutes talking about the whole process that you went through. Sure. So I would say that overall, it was a pretty integral part of the campaign. Most of the analyses and reports that were published
00:02:03
Speaker
were based out of our Tableau server. So we have a Tableau server that's housed at the Democratic Party that serves the needs of all of our constituents, including the campaigns and committees that we work with. That includes the DCCC, the Democratic Congressional Campaign Committee, the DSCC, the Democratic Senatorial Campaign Committee, as well as the presidential campaigns. So when I was still at the DNC, before
00:02:28
Speaker
the primaries ended, I was working with both presidential campaigns, and then after the primaries ended I moved up to Brooklyn and started the team there. Good. And I would say that my team's goal was to better integrate the analytics functionality with what was going on on the ground, specifically around field data. Okay. So that
Integration of Analytics with Field Data
00:02:55
Speaker
was not only tracking
00:02:57
Speaker
what was happening on the ground. So who are we talking to? What are we saying to them? Are we allocating resources properly? But also figuring out how to layer analytics on top of that to see if we were meeting certain benchmarks, if those processes were optimized properly.
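The interview does not describe how benchmarks were computed. As a minimal sketch of one common way to "layer analytics on top of" field activity, assuming a hypothetical per-office door-knock goal and a linear ramp toward it, the snippet below flags offices that are behind pace. The class, field names, threshold, and numbers are all illustrative, not the campaign's actual metrics.

```python
from dataclasses import dataclass


@dataclass
class FieldProgress:
    """Hypothetical per-office totals; the field names are illustrative only."""
    office: str
    doors_knocked: int
    door_goal: int  # goal for the whole program period


def pace_to_goal(p: FieldProgress, days_elapsed: int, total_days: int) -> float:
    """Ratio of actual to expected progress, assuming a linear ramp to the goal.

    1.0 means exactly on pace; below 1.0 means behind.
    """
    if not 0 < days_elapsed <= total_days:
        raise ValueError("days_elapsed must be in (0, total_days]")
    expected = p.door_goal * days_elapsed / total_days
    return p.doors_knocked / expected


def behind_pace(offices, days_elapsed, total_days, threshold=0.9):
    """Return the names of offices whose pace is below the threshold."""
    return [p.office for p in offices
            if pace_to_goal(p, days_elapsed, total_days) < threshold]


# Made-up numbers for illustration.
offices = [
    FieldProgress("Office A", doors_knocked=4_500, door_goal=10_000),
    FieldProgress("Office B", doors_knocked=2_000, door_goal=10_000),
]
print(behind_pace(offices, days_elapsed=40, total_days=100))  # ['Office B']
```

A linear ramp is the simplest assumption; real field programs usually back-load activity, so a weighted schedule would be a natural refinement.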
00:03:17
Speaker
You know, it was both interactive reports that senior staff used and static reports that came out of the Tableau server and were sent around. So give us a sense of what data were being collected and what you were trying to get out of it. So I'm imagining field workers going out and talking to voters and collecting all their information about
00:03:38
Speaker
Well, what are they asking about? Yeah, so there is a CRM that sits on top of our database, based in NGP VAN, and it houses all kinds of field data.
00:03:53
Speaker
Every time an organizer goes out and knocks doors or makes calls or does volunteer building, it's all logged there. So part of my team's responsibility was organizing all of this data on the back end, setting up database infrastructure to easily digest all of this information and to optimize it for use in these data visualizations. In our case, that was Tableau.
00:04:21
Speaker
What was the process before you came in and started working to revamp that?
Data Pipelines and Visualization Standards
00:04:28
Speaker
Before my team was there, the Tableau server was being used. I was kind of working more in a consulting role and working with the various analysts that were building reports and disseminating best practices and going through and giving advice on how to build these visualizations. And then part of what I did when I got there was consolidate both the backend
00:04:52
Speaker
setting up these pipelines to make it less ad hoc when we were doing analyses, and to put in place a series of visualizations that would be the standard that state leadership and leadership at headquarters could use to make decisions. And most of that was around field data.
00:05:14
Speaker
What were our organizers doing on the ground? Okay, so automating it and making it quick. Yes, and automating it was definitely, you know,
00:05:24
Speaker
part of what I did when I got there: making sure that we weren't using Excel and PowerPoint, and moving over to Tableau for a lot of these reports, because it would save time down the road. You know, things move pretty quickly on a campaign. So we wanted to make sure that things were set up so that we could produce reports more quickly, taking out that human layer where there's room for error. If you're
00:05:52
Speaker
manually putting numbers into PowerPoint or Excel, that can cause some problems. Yeah. So it definitely required some legwork up front, but when we were producing 10 or 12 reports a day, it was important that that infrastructure was in place. So can you tell us a little bit about the makeup of the team?
Team Composition and Tools
00:06:14
Speaker
How many people were on there and sort of the backgrounds of those people? And then how did you get management to buy into
00:06:22
Speaker
how important visualization is to the campaign?
00:06:28
Speaker
The team that I was working with came from a variety of backgrounds. The way I look at it, there were back-end folks and more front-end folks. Part of the process was setting up the pipelines. In this case, we used HP Vertica as our database, so it's all SQL-based, with a little bit of Python for scripting, and setting up those
00:06:54
Speaker
pipelines to pull in the data from digital, from field, and financial data, which are housed in disparate systems, but for the purposes of reporting, you want them all in one place. So setting that up required SQL engineering and data engineering skills. And then on the other side of things is
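On the campaign this consolidation was done in SQL on HP Vertica. As a language-neutral illustration of the same idea, here is a small Python sketch of an outer join that brings metrics from several separate systems into one reporting table keyed by a shared identifier. The choice of state as the key, the source names, and the metric names are assumptions made for the example.

```python
from collections import defaultdict


def combine_sources(**sources):
    """Outer-join several {key: {metric: value}} dicts into one table.

    Each keyword argument is one source system. Metric names are prefixed
    with the source name so columns from different systems never collide.
    Keys missing from a source simply lack that source's columns.
    """
    combined = defaultdict(dict)
    for source_name, rows in sources.items():
        for key, metrics in rows.items():
            for metric, value in metrics.items():
                combined[key][f"{source_name}_{metric}"] = value
    return dict(combined)


# Illustrative, made-up extracts from three separate systems, keyed by state.
field = {"OH": {"doors": 1200, "calls": 5400}, "PA": {"doors": 900}}
digital = {"OH": {"texts_sent": 8000}}
finance = {"PA": {"dollars_raised": 15000}}

table = combine_sources(field=field, digital=digital, finance=finance)
print(table["OH"])  # {'field_doors': 1200, 'field_calls': 5400, 'digital_texts_sent': 8000}
```

Prefixing by source is one design choice among several; it keeps provenance visible in the final table, which helps when numbers from two systems disagree.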
00:07:21
Speaker
producing the visualizations based on that data, which for our purposes was all in Tableau. Partially because, you know, I have a little bit of background in D3 and JavaScript, Highcharts, things like that, but Tableau is, I would say,
00:07:41
Speaker
the easiest to iterate on quickly. And the producer of the report doesn't necessarily need to be able to code, which is important in this type of setting. By the end of the campaign, and I was just looking at these numbers today actually, we had 3,000 users on our Tableau server. So every organizer could log in?
00:08:10
Speaker
Yes, every single organizer, state staff, staff at headquarters. And I checked, and I think we had 2,000 visualizations on the server by the end of the campaign. Were you also providing training and support for the local offices to help people
00:08:29
Speaker
learn how to use some of the parts of Tableau, or how to make the dashboard that you made, but for their individual office? Absolutely. So my team was responsible for all of the trainings, both in understanding this data and in how to build a visualization. Yeah.
00:08:48
Speaker
So, you know, with an organization this large and that scales up so quickly, part of what we had to do was make sure that everybody was pulling numbers the same way.
00:09:02
Speaker
We had a series of base tables that my team built. They had all of the key metrics that I worked with leadership to identify, set up in a way that made it really easy to pull those numbers out and build visualizations on top of them. As you know, Tableau likes data longer rather than wider. So setting things up
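"Longer rather than wider" refers to long (tidy) data: one row per entity per metric instead of one column per metric. The base tables themselves were built in SQL; the Python sketch below just shows the reshaping step on hypothetical office-level rows. The column names are illustrative. In pandas, `DataFrame.melt` with `id_vars`, `var_name`, and `value_name` performs the same unpivot.

```python
def wide_to_long(rows, id_field, var_name="metric", value_name="value"):
    """Unpivot wide rows (one column per metric) into long rows.

    {"office": "A", "doors": 10, "calls": 20} becomes two rows:
    {"office": "A", "metric": "doors", "value": 10} and
    {"office": "A", "metric": "calls", "value": 20}.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the identifier is repeated on every output row
            long_rows.append({id_field: row[id_field], var_name: column, value_name: value})
    return long_rows


# Hypothetical wide table: one row per office, one column per metric.
wide = [
    {"office": "Columbus", "doors": 120, "calls": 300},
    {"office": "Cleveland", "doors": 95, "calls": 410},
]
long_table = wide_to_long(wide, "office")
for row in long_table:
    print(row)
```

In long form, adding a new metric means adding rows rather than changing the table's columns, so a single Tableau worksheet can filter or color by `metric` without being rebuilt.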
00:09:34
Speaker
So now can you talk a little bit about the management side? As we know, it takes management buy-in for something like this. How did you manage that?
Empowering Non-Technical Users with Data
00:09:43
Speaker
What sort of things were you giving them? When we met a few weeks ago, we talked about this tension, in some ways, between interactive and static.
00:09:52
Speaker
How did you think about that and work out that whole workflow? So one of the things that I realized pretty quickly was that data visualization allows non-technical end users to really feel like they own the data, to get into the weeds with data in a way they otherwise would not have been able to. I think that
00:10:14
Speaker
when I first set up some of these interactive reports, I got pretty strong positive feedback. I worked really closely with the National Field Director and the field teams at headquarters.
00:10:31
Speaker
For them, just being able to dig into these numbers without having to write code was pretty powerful, and, you know, they wanted more, which was good. But obviously there is that tension, as you mentioned, between interactive reports and static reports. We were sending slides to Hillary herself, and obviously she was not logging into our Tableau server, unfortunately.
00:11:03
Speaker
But part of what I wanted to do was set up a system where even a static report that looks like a traditional PowerPoint could be built in Tableau, and by the end of the campaign you could pretty much just push a button and produce the same report every day. Setting that up early on made things a lot easier down the road. So the way I picture it
00:11:28
Speaker
in my mind, you have Hillary at the top as the candidate, and then a couple of people just below her who are the senior staff. So are those senior staff the ones who go into the Tableau server, make something quickly, and push it out? Or does it need to go further down the chain? Yeah, so they weren't making reports, but members of the senior staff were logging into our server. I worked really closely with our
00:11:57
Speaker
National Field Director, and, you know, she was logging into our Tableau server regularly. So there were definitely members of the senior staff who were using the interactive features of our visualizations. They were designed with that in mind. But then there were
00:12:13
Speaker
the campaign manager and Hillary herself and some of her closest advisors, who were not necessarily logging in every day. So we made sure we were sending them decks regularly. Those were made in Tableau, but they were static. Okay, and so for the folks who are listening who are hardcore Tableau users, what sort of
00:12:36
Speaker
I just want to get a sense of, specifically, were you doing Story Points? Were you using single dashboards, separate worksheets? Like, there are probably, um...
00:12:47
Speaker
Well, maybe just a few. Like, how do I build this same sort of ecosystem, where I build this thing and save it out as a static PNG or PDF? So can you talk just a little bit about the nuts and bolts? Yeah. So usually it was a single dashboard with multiple worksheets, and, you know, there were
00:13:07
Speaker
oftentimes multiple data sources, with each of the worksheets having a separate data source that I'd link together. And, as I mentioned before, it often required a lot of legwork on the data engineering end to join things up in a way that made it easy to build the visualization, right? But usually I'd set it up so it would look like a landscape slide, and
00:13:32
Speaker
we'd save that out, and then they would automatically refresh. So we would set an extract that ran every day, and then at 9 a.m. we would email it out to the people who needed it. We talked a little bit about data quality. Well, first let me ask about qualitative data. Yeah. I've been getting a lot of questions about this lately, so I'm curious. I would expect that there's a lot of qualitative data coming in from the field reps talking to people about their
00:14:01
Speaker
positions and views. Were you getting a lot of qualitative data? And if so, how did you end up visualizing that data? Yeah, so from field, I would say it wasn't really qualitative data, but we did track our digital assets and our texting program, and a lot of that is purely qualitative. Yeah.
00:14:23
Speaker
So not my team, but there were folks on the campaign who did more natural language processing type of work and were able to identify the frequencies of certain topics that were being discussed, and we would report on the frequencies of the different topics. With our texting program, we were tracking what people were texting back. There, we wanted to see how people were responding. What was the sentiment around that? Okay.
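The guest notes that the language processing itself was done by another team, and the methods are not described. As a deliberately simple stand-in for that kind of work, the sketch below tags free-text replies with topics by keyword matching and counts how many replies mention each topic. The topic list and keywords are hypothetical; a real NLP pipeline would use something far richer than keyword sets.

```python
import re
from collections import Counter

# Hypothetical topics and keywords, for illustration only.
TOPIC_KEYWORDS = {
    "healthcare": {"healthcare", "insurance", "medicare"},
    "jobs": {"jobs", "wages", "work"},
    "voting": {"vote", "polling", "register"},
}


def topic_counts(replies):
    """Count how many replies mention each topic (at most once per reply)."""
    counts = Counter()
    for reply in replies:
        # Lowercase and split into simple word tokens.
        words = set(re.findall(r"[a-z']+", reply.lower()))
        for topic, keywords in TOPIC_KEYWORDS.items():
            if words & keywords:
                counts[topic] += 1
    return counts


replies = [
    "Where do I register to vote?",
    "I care about jobs and wages",
    "My insurance costs too much, and I need work",
]
print(topic_counts(replies))
```

Counting a topic once per reply, rather than once per keyword hit, keeps a single long message from dominating the frequencies that get charted.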
00:14:52
Speaker
Um, I also want to ask about data quality, because we look back at the polls and you see lots of coverage now. It's like, well, the polls were really bad, but also they weren't really bad. And we don't really need to hash out whether the polls were good or not, but when you got the data in, what was the work like to assess the quality,
Data Quality Challenges
00:15:17
Speaker
like, well, here's a poll, or here's some field work that someone did, but it's hard to know whether that's accurate or not, right?
00:15:25
Speaker
Yeah, I would say that in terms of digesting the polling data and the field data, it was pretty standardized. The way the polls were coming in, and the way that VAN, the tool that organizers use on the ground, digests data, is fairly standardized. So there wasn't a lot of room for data quality issues. I would say the data quality issues were more in the unstructured data that we were talking about,
00:15:54
Speaker
through the texting and things like that, where it's about figuring out what's noise and what's not, and whether it's worth tracking certain Twitter data, things like that. Yeah. So what about tracking
00:16:11
Speaker
margin of error, or uncertainty around some of these polling estimates, especially when it comes to talking to the small field office in rural Minnesota or wherever it is? When you are training people on how to use the Tableau server and how to do this workflow, are you also training them on, yeah, these polls look different, but they're really not, because of the margin of error? Were you working on that as well? Yeah, so that wasn't under my purview, but there were definitely people on staff in the analytics world who were working on it.
00:16:40
Speaker
We were a pretty large operation by the end of the campaign. We had, I think, 80-plus people on just the analytics team and then 80-plus on the tech team. I'm not sure those numbers are exactly right, but they're in that general ballpark. We were a pretty big operation. Part of why, I think,
00:16:59
Speaker
Tableau and data visualization were used so much more frequently than in the other campaigns and committees that we work with is just that there were so many staff members and analysts who needed to produce reports on the fly pretty regularly. The Tableau server was the main way to disseminate this information.
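The guest says margin-of-error training was handled by others on the analytics staff, so nothing here reflects the campaign's actual method. As a generic illustration of the point raised in the question, the sketch below computes the standard normal-approximation margin of error for a poll proportion, z * sqrt(p(1 - p)/n), and checks whether two polls' 95% intervals overlap. Treating interval overlap as "not really different" is a rough, conservative screen and not a formal significance test; the sample sizes and shares are made up.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n); z = 1.96
    corresponds to 95% confidence under simple random sampling.
    """
    if not 0 <= p <= 1 or n <= 0:
        raise ValueError("p must be in [0, 1] and n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


def intervals_overlap(p1: float, n1: int, p2: float, n2: int) -> bool:
    """Rough screen: do the two polls' 95% intervals overlap?"""
    m1, m2 = margin_of_error(p1, n1), margin_of_error(p2, n2)
    return (p1 - m1) <= (p2 + m2) and (p2 - m2) <= (p1 + m1)


# Made-up polls: 48% vs 51% support, 600 respondents each.
print(round(margin_of_error(0.5, 600), 3))    # 0.04, about +/- 4 points
print(intervals_overlap(0.48, 600, 0.51, 600))  # True
```

With 600 respondents each poll carries roughly a 4-point margin, so a 3-point gap between two such polls is well within the noise that the question describes.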
Role Expansion at the DNC
00:17:20
Speaker
So now the election's over, and you're back at the DNC. I'm back at the DNC. So can you talk a little bit about what you're doing here specifically? I could guess what you do, but I'll let you tell us. Yeah, my responsibilities at the DNC are
00:17:36
Speaker
different from the campaign, where I was focusing mostly on data visualization. Here I have a broader portfolio, but I'll touch on the data vis side of what the DNC does. We house the Tableau server that every campaign and committee we work with has access to. So part of our responsibility is to make sure all of that is set up properly,
00:18:01
Speaker
permissioned properly, and that they have the tools they need to build these reports. And then doing trainings on how to most efficiently pull the data, right? A lot of that is more on the SQL side. And then on the Tableau side, how to build these visualizations. So we are kind of a consulting shop for these campaigns and
00:18:25
Speaker
committees. And there's a suite of reports that we house here that is kind of universal that everybody can use. But then we also go out and work on, you know, specific issues or meet weekly with say,
00:18:41
Speaker
the Democratic Governors Association, where, you know, there's more going on, or the DCCC. So, for example, I have a counterpart at the DCCC, and he works to build reports for these congressional campaigns. Right. So more and more, the data visualization piece is becoming an integral part of any field program, just to be able to track what's going on on the ground, and to enable a campaign manager or senior
00:19:11
Speaker
staff to log in and understand, both geographically and visually, what's going on. And specifically when it comes to the data vis, I'm guessing that these dashboards are pretty simple in terms of their makeup, and you're not doing crazy things like tile grid maps.
00:19:28
Speaker
There were definitely some crazier maps on the campaign, just because we had so many people building reports. I remember at one point in Ohio, the data team there built a visualization of where all the PokéStops were. This was when Pokémon Go was big, and they were doing voter registration at these PokéStops. So they built this thing that had, like, Pikachu popping out the side, and there was all kinds of stuff happening. I remember I logged in and I was like, whoa.
00:19:53
Speaker
I was like, these guys have to teach me something. But yes, no Pokémon in reports here. They're pretty basic. And a lot of times I am training staff who are using Tableau for the first time, so we want to keep it pretty simple. Is there a lot of variation in what people want to create? Or, because you've created this set of template dashboards, are people satisfied with what you've already created,
00:20:23
Speaker
Or do you see people coming in and saying, well, I really want to do this other thing. I want this other type of data. I want to build this other thing. So I think that depends on the resources that the campaign has. So generally a congressional race, for example, won't have a dedicated data staffer on the campaign. Senate and governor's races are usually different. Usually they have one staffer there.
00:20:51
Speaker
So places where they don't have data staff rely more heavily on the DCCC, the Democratic Congressional Campaign Committee, and on us at the DNC to build out those visualizations
00:21:11
Speaker
and to help them iterate on that baseline. So now, looking forward, it's almost mid-2017 already, a year and a half before we vote again. No, that's not true. Lots and lots of elections coming in between.
00:21:33
Speaker
Looking forward, I don't want to talk about who's going to win. But in terms of the data and the data vis, both the ecosystem and what you see as the demands from campaigns and what you're going to supply, do you see big changes? Do you see things happening? Do you expect to make big changes, either on the back end or the front end? What are we looking at?
Future Campaign Innovations
00:21:58
Speaker
Yeah. So right now, we have a Tableau server.
00:22:01
Speaker
I anticipate that we're going to continue to use Tableau as the main source of visualization that's housed here at the party, but that could definitely change in the future. But in terms of how I think data visualization in campaigns is going to change, I see
00:22:18
Speaker
I see this cycle as one where interactivity took hold: as I said, senior staffers were logging on to the Tableau server and using the interactive components of our visualizations. I see that increasing. I think that there's a benefit to
00:22:37
Speaker
both static and interactive, but with that buy-in, more and more senior staffers on campaigns are encouraged to go in and dig into the data themselves. So I see that trend increasing over time. I also think that part of my job here, before the next big ramp-up for 2018, is to make sure that we have the data
00:23:04
Speaker
structured in a way, trying to pull together all of the different data sources that we have so that reporting is even easier and that campaigns can go in and really quickly pull numbers that ordinarily would take them much longer to do. And I think that also enables them to build visualizations more quickly.
00:23:25
Speaker
So that, and setting up more of these baseline visualizations that campaigns can use, is what we're working on now. You know, last cycle was the first cycle where we really leaned on data vis, or in this case Tableau, as the main source of reporting. And I think it was pretty successful, and we want to make sure we have the resources in place to continue doing that in the future.
00:23:52
Speaker
Interesting. I want to close by asking you, and you already told us the Pokémon story, do you have any other fun data vis stories from
Election Night Anxiety and Closing Remarks
00:24:01
Speaker
the trenches? I can say that we had a report on election night that tracked what was happening in different states. It was in Tableau, and all the principals, so Hillary herself and
00:24:13
Speaker
Tim Kaine and Bill and then a number of folks in that circle had iPads with these Tableau reports on them. And, you know, we had the reports ourselves as well. And at one point they were very red. And I can only imagine what things were like on their end as they were looking at that. So sometimes I think about that report. I have some PTSD about that election night report. Yeah, I bet.
00:24:41
Speaker
Thanks for coming on the show. Yeah, thank you for having me. Thanks to everyone for tuning in to this week's episode. If you have any comments or questions, please let me know on the website or on Twitter. Until next time, this has been the PolicyViz Podcast. Thanks so much for listening.