Episode #194: Charlie Smart

S7 E194 · The PolicyViz Podcast

New York Times graphics editor Charlie Smart visits the podcast to talk about his work creating the NYT Covid dashboard and more.

The post Episode #194: Charlie Smart appeared first on PolicyViz.

Transcript

Introduction to COVID-19 Visualizations

00:00:12
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. Now, I think all of us around the world have been checking our favorite newspaper or website or dashboard to get information on COVID-19 infections,
00:00:29
Speaker
deaths and now, fortunately, changes in vaccination rates. And I tend to check two main websites, the New York Times and also the Washington Post. So I'm located right outside Washington, D.C., so the Post is sort of like my local newspaper. So I generally check the Post every day to see what's going on in my county.
00:00:47
Speaker
here outside of Washington DC. But I also really enjoy primarily because of the data visualizations, the dashboard over at the New York Times. And of course, in tracking these day-to-day infections and deaths and vaccines, the team over at the Times and lots of other media organizations have had to make lots of different decisions about the visuals that they're going to create and how they are going to communicate this information on a day-to-day basis.

Interview with Charlie Smart from NYT Graphics

00:01:15
Speaker
So I'm really excited to have Charlie Smart on this week's episode of the podcast. Charlie works in the New York Times Graphics Department. He is one of the team members working on their COVID-19 dashboard. He had a little bit of a Twitter thread some months ago about some of the changes they made to the color palette in the maps about the COVID-19
00:01:36
Speaker
infection rates. And so that really spurred me to reach out to Charlie to see if he'd like to chat about the decisions that they've made in and around the dashboard. So I'm excited to have this conversation with Charlie. I think you're going to learn a lot about the insides of how Charlie and other members of the New York Times think about communicating COVID-19 data, which of course is sort of different than showing information about the unemployment rate or GDP because
00:02:04
Speaker
COVID-19 information is potentially life threatening. It's making life and death decisions about am I going to wear a mask? Am I going to go outside and am I going to, you know, be around other people? So these are really important decisions that are driven by data. So I think you'll enjoy this week's episode of the podcast. And so here is my interview with Charlie. Hey, Charlie, how are you? Thanks for coming on the show. Doing well. Thank you for having me.
00:02:29
Speaker
I am really excited to talk with you about all this great work you and your team at the Times have been doing specifically on the coronavirus tracking dashboard and all the other work that you've been doing. So we've got a lot to talk about. So maybe we can start by having you just talk a little bit about yourself and your background and how you got over to the Times. And then I will ping you with a series of questions about how you've been managing all this data and database.
00:02:51
Speaker
Sure.

Charlie's Journey from Radio to Data Journalism

00:02:52
Speaker
So I went to college for journalism and actually thought I wanted to do radio journalism. When I was in school, I was very into podcasts and NPR and did college radio and did my first journalism internship at an NPR affiliate in Connecticut. And while I was there, they found out that I knew a little bit of like HTML and knew how to code like a little bit, but not really well at all.
00:03:17
Speaker
But they found that out. And so they asked if I could build some tables for a story and then make some maps for a story. And I kind of at that time was sort of just learning that data journalism was a thing. I had gone to a conference with some radio stations and college radio folks earlier that year. And I'd seen actually someone from the Times Graphics Department speak there and was like, whoa, this is really cool. Yeah. And so kind of through doing this radio internship kind of started
00:03:46
Speaker
the data and graphics journalism stuff.
00:03:49
Speaker
I just thought that was really cool and kind of did that work through college. And then when I graduated, I worked for a little while at a design studio in Boston that sort of does data visualization work, but in more of a design studio environment, not so much news. And then, you know, did some freelance work and worked for a little while with the folks over at The Pudding and was just kind of like, you know, learning a lot of this stuff. Like, I think a lot of people kind of, there's no,
00:04:18
Speaker
I guess there are programs now that sort of focus on data visualization and this sort of journalism. But I think a lot of people are sort of self-taught and kind of figure it out as they go along. And that's very much what I was doing. And I started at the Times a little over a year ago now in late 2019. And I was hired to focus on elections to work on graphics for the primaries that were coming up at that time. And that lasted until I think the Florida primary was the last one that I like
00:04:48
Speaker
directly worked on, which was March, I think 17th, which was... So almost to the day of when everything shut down. Exactly. Yeah. I think we'd been out of the office at that point for about a week. And yeah, New York had kind of shut down right around then. And that was when I switched over to working mostly on the coronavirus graphics. And that just sort of was dominating the news cycle and the things that we were covering at the time.
00:05:14
Speaker
And yeah, so I've kind of been working on a mix of coronavirus and elections things since then. Right. Before we talk about the coronavirus stuff, do you miss the radio part of of your early interests? You know, I do like it's it's kind of fun to be on a podcast now. I do miss, you know, I always love audio and radio as a medium. And I think there's, you know, people doing really, really cool work in that area. And I do definitely miss it. What I like about
00:05:43
Speaker
data journalism and visual journalism is that it sort of combines these like different interests I have in journalism and in design and, you know, programming and kind of lets you do like a real mix of things and it's not, you're not like siloed into one area and you can kind of move between these different things. Um, so I really enjoy that, but you know, I do miss, uh, you know, college radio and, and that sort of thing.
00:06:05
Speaker
Right. Have you, um, and maybe this is premature, but have you, uh, thought about or talked about combining audio into some of the data visualizations at the Times? I know Amanda Cox had done some audio stuff some years ago, but it's not a very common form. I'm just curious.
00:06:24
Speaker
Yeah. You know, it's not something that, um, I can't speak for everyone in the department. I don't know like what else is going on. It's not something that, uh, I've been kind of focused on, um, there's like larger dashboard projects, but you know, it's definitely an interesting, um, there's been some like interesting work I've seen sort of in that area. Right. Yeah. Interesting. Um, okay. So let's talk about the Times coronavirus tracker, because there's a lot there.
00:06:49
Speaker
You're updating it every day and then you've made some larger changes at certain points. And I'm sure some smaller changes over the course of the last, I mean, almost, you know, we're in early March right now, so almost a year to the day, uh, since, since you've been doing it. So can you talk a little bit about the evolution of the dashboard and some of the, you know, both the smaller changes, maybe some of the tweaks that you had made. And then I know sort of later in 2020, in November or so, there were some bigger changes that had to be made.
00:07:18
Speaker
Yeah, absolutely. So just the first thing I want to say about this project before I kind of get into talking about it is that this has been just a huge team effort, this project. So a lot of the things that I'm going to be talking about are things that other folks have worked on.
00:07:34
Speaker
I don't know what the biggest byline ever on a New York Times story is, but this one takes up most of the vertical space of my browser window when you scroll down to it. So there's just so many people that are working on this, and I just want to make sure everyone on this team gets credit for that.
00:07:49
Speaker
Sorry, just before you dive in, I assume that's not just the people on the graphics team. Does that, does that include like other reporters and folks from the health department sort of weighing in and helping give you, give your team the perspective that you need to make sure you're representing the data in the right way?
00:08:05
Speaker
Yeah, absolutely. So it's really a cross department, multiple desks working on this. We sort of have people from all over the newsroom contributing to this project in collecting data, in figuring out what the data means, and reporting out stories based on the data. So yeah, it's a huge, it's like a massive sort of cross newsroom effort, sort of similar to what might happen for an election project or something like that, where it's such a big project that we have people from all over working on it.

NYT's Initial COVID-19 Data Tracking

00:08:35
Speaker
Yeah. So in terms of the history of this project, we started tracking coronavirus cases in the United States in, I believe, late January. And this started out as literally a Google spreadsheet where people would go in, and every time there was a new case, would add a row to the spreadsheet. And we're reporting out this data very, very manually.
00:09:02
Speaker
And that, you know, we sort of very quickly realized that that was just not tenable in the long term for this project as the virus started to spread. And so this is a part where, you know, we sort of worked with the team that does sort of more like database development at the Times, and they helped us build a system to, you know, sort of more robustly in a real database sort of track these cases as they're coming in.
00:09:29
Speaker
And this was just data that didn't really exist in this sort of unified form anywhere. We were tracking from every county and part of this was that we needed to make it so that
00:09:38
Speaker
the data from different places in the country were comparable. So because things were being reported at a state level, different states would report things in different ways. So for example, some states might not include cases from people incarcerated in that state in their totals. And so we thought that those should go into the state totals. And so we did that work of adding those numbers that we reported out to the state numbers and keeping those things consistent across states.
00:10:05
Speaker
And also just doing lots of reporting on the way that states were reporting cases, whether they were reporting only confirmed cases or suspected cases, and same with deaths. And so it was just a huge reporting effort to collect these. And we started mapping these around that time. I think in early March, we published the first US coronavirus map.

Evolution of COVID-19 Maps at NYT

00:10:28
Speaker
And it was a pretty simple map. It was just circles over counties to show how many cases there were there.
00:10:34
Speaker
I remember in the first map, actually, we highlighted states when they had had, you know, a case to show like where in the country the virus had been to. And that feature, you know, quickly became sort of obsolete as, you know, all 50 states had the virus. And so it's just sort of been this continuous shifting of responding to the changing nature of the pandemic and responding to that in how we're graphing it. Right.
00:11:02
Speaker
Yeah, so then later on in March, we started building out more maps. We made the maps fully interactive. We added lots more charts. We started with just this basic bar chart of cases per day. And we started building out different views of that, more details on showing the seven-day average of cases as this went on. And you need to know trends and not just daily figures.
00:11:31
Speaker
what we call our curve grid, which is the section that's where cases are going up and where cases are going down. And then we started thinking about other map views. It got to a point in April, May, where for a while the virus had really been
00:11:50
Speaker
the epicenter was New York City and New York. There got to be a point in the late spring of 2020 where New York had gotten its situation somewhat under control and the virus was spreading rapidly in other parts of the country.
00:12:08
Speaker
At that point, we realized that the focus needed to not be so much on case totals, but on what's happening right now in my area. It's not just about which place has had the most cases because in that map, New York always had the biggest bubble because this had been so bad, but that didn't mean that things weren't bad in other places then.
00:12:30
Speaker
And so we started playing with different ways of showing that. So we tried different versions of the map showing, we did a version of the map for a little while that color coded things by whether cases were going up or down, like daily case numbers, like sort of a change from the week before, and how steeply they were going up. And that was good for a little while, but could be kind of confusing, we found.
00:12:57
Speaker
And we worked on organizing this curve grid by showing where cases are increasing. And then also, we changed it to show not just where cases are increasing, but adding this other dimension of where cases are high and increasing, and where cases are still high, but going down. Just because something is going down doesn't mean things aren't bad there.
00:13:21
Speaker
So just trying to get as much information in as we could. And what we ended up doing for the map was we settled on sort of showing the number of cases per capita in the last seven days and doing sort of a choropleth map based on that. And we thought that was a pretty good way of showing sort of how bad are things near me right now compared to elsewhere in the country.
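Editor's note: the per capita measure described here can be sketched in a few lines of Python. This is only an illustration, not the Times's code. The `County` class, the county names, and the case counts are all made up, and expressing the figure as average daily cases per 100,000 residents is an assumption about the exact unit.

```python
from dataclasses import dataclass


@dataclass
class County:
    name: str
    population: int
    daily_cases: list[int]  # one entry per day, most recent day last


def cases_per_100k_last_7_days(county: County) -> float:
    """Average daily cases over the last seven days, per 100,000 residents."""
    last_week = county.daily_cases[-7:]
    avg_daily = sum(last_week) / len(last_week)
    return avg_daily * 100_000 / county.population


# Hypothetical counties with invented numbers, for illustration only.
big = County("Big County", 1_000_000, [100] * 7)
small = County("Small County", 10_000, [5] * 7)

for county in (big, small):
    print(county.name, cases_per_100k_last_7_days(county))
```

With these invented numbers the larger county reports far more cases in total, but the smaller county has the higher rate per resident, which is the "how bad are things near me" comparison that a map of raw totals hides.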
00:13:42
Speaker
And so that's sort of what the map has largely been since like, I think late May or early June was when we made that change. So when you were going through these iterations, you had said earlier that you had found that this quite didn't work and maybe this did work. When you

Adjustments for Rural Area Data Representation

00:14:02
Speaker
say that you found that, was that just a collaboration within the newsroom or were you asking people, asking New York Times readers to like,
00:14:10
Speaker
help you all understand what they want, what works well for them, or was it more this huge team of people in the Times Newsroom saying, yeah, the pandemic is sort of moving in this direction, we can see things spreading across the states in this way, and this is not really representing the data as clearly as we want.
00:14:28
Speaker
Yeah, it was sort of a combination of those things. We spend a lot of time thinking about this internally and sketching lots of different possible ways of mapping and charting these things and doing just lots of experimentation to try to find the best way of showing this and just discussing internally with people both on graphics and other reporters on other desks who are covering the story.
00:14:52
Speaker
what's the best way that we can be showing this and what's most important to get across to people right now. But at the same time, this is a page that lots of people look at and lots of people look at every day. And so we do get lots of reader feedback on these pages and are definitely responsive to that.
00:15:10
Speaker
If we see people are interested in seeing certain things or are confused about the way we're showing things, we definitely take that into consideration throughout this whole process. Can I ask, this is a little bit of a maybe too inside baseball, but I'm curious. The Times is read by millions of people around the world. Presumably, you're getting hundreds or thousands or hundreds of thousands of comments from people.
00:15:36
Speaker
how does that work? Is there a team that's going through those comments and are they feeding you the ones that seem relevant? And then like, if there's enough, presumably there's so many of them, are you actually like trying to quantify or visualize those comments? Like, so that you can help improve the tool? I mean, it's like, it's data, it's its own kind of data, right?
00:15:55
Speaker
Yeah. I mean, I, so I, I'm a little limited in like what I can say about like specifics of, of how those sorts of systems work. Um, but, um, yeah, you know, we're definitely like, it's definitely something that we're responsive to it. Like we're sort of, we're seeing what readers are asking about these things. And it's definitely something we like, we think about and talk about during meetings when we're, you know, planning out how to do these things. Um, another, another thing on the, on the mapping design front and sort of, uh, the way things have changed, um, that I wanted to talk about is.
00:16:25
Speaker
that sort of early on in the pandemic when the virus was largely hitting cities like New York and sort of large cities and especially on the East Coast, we made a decision on the maps to add what's called a dasymetric filter, basically so that we were
00:16:46
Speaker
only showing areas on the map, um, above a certain population. And so we were, we were showing counties, but we had this sort of filter layer on top based on, um, census block groups. And we were, we were not filling in areas that very few people lived in. And the idea behind that was basically that, you know, uh, counties in the West are geographically much larger, uh, generally than, than counties in the East,
00:17:10
Speaker
and that there were, um, in many of those counties, like sort of contained outbreaks in things like, um, prisons or in like meat packing, uh, uh, facilities. And we realized that by, you know, if we filled in the whole county in that color, it gave the impression that the virus was like very widespread in this place when that wasn't really the case. And so we wanted to, um, sort of have a visual filter to say like,
00:17:35
Speaker
Yes, the virus is here, but it's not like the entire state of Nevada is overrun by the coronavirus. And so we had this filter on the map. And I think for a while that was a really useful visual tool to indicate that. But then there was this point in November around the time when we made these other map changes.
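Editor's note: a rough Python sketch of the filter idea described above. The county keeps its color, but only the block groups where enough people live are filled in. Everything here is hypothetical: the `BlockGroup` fields, the sample values, and the threshold are invented, and using population density rather than a raw population count is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    geoid: str          # placeholder identifier, not a real census GEOID
    county: str
    population: int
    area_sq_km: float


def visible_block_groups(block_groups, min_density=10.0):
    """Return the block groups dense enough to be filled in on the map.

    min_density (people per square km) is an invented threshold.
    """
    return [
        bg for bg in block_groups
        if bg.area_sq_km > 0 and bg.population / bg.area_sq_km >= min_density
    ]


# One large, mostly empty county with a single town (invented numbers).
county_block_groups = [
    BlockGroup("town", "Example County", 5_000, 20.0),
    BlockGroup("desert", "Example County", 40, 4_000.0),
]

shown = visible_block_groups(county_block_groups)
print([bg.geoid for bg in shown])
```

Only the town is drawn in the county's color; the nearly empty remainder of the county stays blank, so a contained outbreak does not visually cover a huge area.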
00:17:58
Speaker
Basically what had happened was that for a long time, the virus, like I said, was concentrated in the East Coast cities and then moved to the South. There was a point in November where the Midwest started to get hit really, really hard by the virus. Michigan, for example, through much of the summer, had been seeing 700 cases a day and
00:18:22
Speaker
you know, in November, December, started seeing, you know, seven or eight thousand cases a day. And same with like, you know, South Dakota had gone from like a hundred to like, you know, fifteen hundred cases a day. So just this huge increase in cases in a lot of these Midwestern states. And so that kind of caused two problems for our graphics. The first was that the states had sort of maxed out the scale we were using on the map. So there was you just couldn't see any variation anymore. And everything was just solid red.
00:18:52
Speaker
And that's sort of, you know, on the one hand, like things were very bad there. And so having everything be red was not wrong, but it's also not especially useful if you live there to see like, you know, how is my county doing compared to other counties around it? And that's still information we wanted to get in there. Um, and it's always tough to make changes to these scales because this is something that people look at, you know, every day and people sort of become very used to these scales and sort of can kind of, um, I think identify like,
00:19:21
Speaker
you know, they see the red value and they know that things are bad. And we didn't want to just like shift the scale down so that everything that was red became like orange again, because people might see that and be like, oh, you know, uh, things are better. Um, so, which was not the case. Um, so what we did instead was we, uh, we added more values on top of the scale. We actually extended the color range into the sort of dark purple, um, and extended the values and sort of did that, not just linearly, but so that like,
00:19:50
Speaker
The, uh, the maximum values were like, you know, pretty high and that sort of allowed us to get a little more, uh, range in the scale.
00:19:58
Speaker
and allow for some of that, you can see the geographic variation in those places. And the other thing we did was we, at that point we decided to remove that dasymetric filter and just show all of every county. And the reason we did that was because the virus had moved from largely urban areas to rural areas. And so these areas were being hit really, really hard by the virus.
00:20:21
Speaker
But they were not showing up, you know, they were just showing up as a small little like the one city in that county was showing up on the map. And when we realized that that was sort of misleading in the opposite direction that we had initially intended this to go. And, you know, this was another point where we sort of had some reader feedback from people who lived in
00:20:40
Speaker
The Dakotas and places like that where they're saying the virus is really bad in my area, but it's not showing up clearly on your map. What can you do about that? That was when we decided to remove that layer, and we've kept the map like that since then.
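Editor's note: the scale change described in this answer, adding darker classes on top of the existing scale rather than shifting every break, can be illustrated with a small Python sketch. The break values and color names below are invented for illustration and are not the Times's actual scale.

```python
from bisect import bisect_right

# Hypothetical class breaks (cases per 100,000) and colors.
ORIGINAL_BREAKS = [1, 10, 25, 50]
ORIGINAL_COLORS = ["gray", "yellow", "orange", "dark orange", "red"]

# Extend the scale instead of shifting it: the old breaks and colors are
# untouched, and new, non-linearly spaced breaks are added above them.
EXTENDED_BREAKS = ORIGINAL_BREAKS + [100, 250]
EXTENDED_COLORS = ORIGINAL_COLORS + ["dark red", "dark purple"]


def color_for(value, breaks, colors):
    """Pick the color of the class that `value` falls into."""
    return colors[bisect_right(breaks, value)]


for rate in (30, 60, 180, 300):
    print(
        rate,
        color_for(rate, ORIGINAL_BREAKS, ORIGINAL_COLORS),
        color_for(rate, EXTENDED_BREAKS, EXTENDED_COLORS),
    )
```

In this sketch, any value below the first new break keeps the color it had before, so a reader who has learned that red means bad is not misled. Only the values that previously saturated the scale are split into new, darker classes.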
00:20:56
Speaker
So hypothetically speaking, if you were working on a project where you had a similar scaling issue, where the line or whatever, the value sort of punched its way through the maximum of your scale, but it wasn't the sort of dashboard where you thought people were checking it every day, right? Or it's not, which we can talk about a little bit, it's not life-threatening, right? Knowing how many goals the Caps scored last night isn't life-threatening, right? But knowing how many infections are in my county is potentially life-threatening.
00:21:26
Speaker
So if this wasn't that sort of dashboard, do you think you would have taken a different approach to that color challenge?
00:21:32
Speaker
Yeah, I think absolutely. I think this is a really unique project in the way that people interact with it. It's not a news story where it publishes once and a lot of people read it on the day that it publishes or in the week that it publishes and then it has this very quick drop off of readership. This is very consistent, very, very consistent readership of people coming back to this every day, every week and checking it.
00:21:58
Speaker
Like I said, when you check something this often, people begin to associate these colors with the specific situations. It's hard to make those changes and to require people to adjust their own mental model of how they understand these graphics.
00:22:15
Speaker
So yeah, if this was not a story that people were checking all the time, I think we probably would have just adjusted the breaks on the scale. And that would have been that.

Personalization and Hospital Data on Dashboards

00:22:22
Speaker
It sort of requires you to think a little differently of how you're making changes. And another really important part is just signaling very clearly that we are making changes. Even when we adjusted that scale, we included a big note in a box on the top of the thing that's like, we adjusted the scale and here's the reasons why we had to do that. And that was when I posted that tweet thread too, sort of explaining that. And that was part of just trying to get this message across of
00:22:45
Speaker
what we changed and why we felt like we needed to change things at this point. Does that notice still sit there or there was like, was there a period of time where it's like, okay, so we, this has been an existing first X number of days and that's, that's probably enough. Yeah, it was, it was there for a few weeks, I think. And then, and then we pulled it, but we, we still do things like that for other changes. Like, um, for example, uh, the counties in Iowa, uh, the state of Iowa recently stopped reporting data at the county level, um, in the way that we need it for these maps and.
00:23:15
Speaker
is only reporting that sort of data at the state level. And so we started showing Iowa on the maps as just a state where everything else is just counties, Iowa you hover over it, and it's just the state of Iowa. And so that's another thing where we really want to be clear about the messaging about why we're doing that. And that it's not just like we forgot about Iowa counties. It's like we're responding to these ongoing changes in the data collection and in the state of the pandemic.
00:23:42
Speaker
Yeah. And I seem to remember early on that there was, there was a day where there was like a big data dump. And so there are these big spikes in like multiple places. And I remember there being a big note that says, you know, February, you know, 25th is not really representative because it was like a big data dump that day. Yeah, totally. That's like, um, that's a really, you know, those like, uh, we call them like anomalies in data reporting. Like this is a, the one thing about this is that this is a very, uh, messy dataset. Um,
00:24:12
Speaker
Like it's just, it's reported from so many different places and you know, it's, it's inherently a hard thing to capture data for. Um, and it's hard to track this data and you know, we're, we're really trying our best to, you know, make this the most, as useful as it can be. Um, but it is just, it's a tough data source to work with. And, and a lot of the times, you know, states might.
00:24:34
Speaker
For example, some states just don't report data on weekends, and so Mondays will have a larger spike than other days. That's why we use the seven-day average in most places to smooth that out. There are times when states will have a backlog of tests that were never logged or reported, and will report those all at once, and they're from some indeterminate dates in the last month, and it'll just show up as a spike.
00:25:00
Speaker
When we know that that's happening, we have a team that has a lot of reporting around this data, not just collecting what states publish on their website, but also reporting on why is there a spike on this day and when we're able to identify the reason behind that. We have a system where we can flag that day as an anomalous day and include a note with that and have it highlighted a different color on our charts and have a little arrow pointing to it saying this day is an anomaly and that there were not actually four times as many cases on this day as the rest of the week.
00:25:29
Speaker
Um, we, we think doing that sort of stuff is, is really important because people, you know, we don't want people to get the wrong message about the real state of the pandemic from this sort of, um, uh, these like sort of artificial spikes and dips in the data.
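Editor's note: a small Python sketch of the two ideas in this answer, a trailing seven-day average to smooth out weekend reporting gaps and a lookup of flagged days that carry an explanatory note. The series, the note text, and the data structures are invented, and how the Times's own system stores or treats flagged days is not described in the interview.

```python
def seven_day_average(daily_cases):
    """Trailing seven-day average; uses fewer days at the start of the series."""
    averages = []
    for i in range(len(daily_cases)):
        window = daily_cases[max(0, i - 6): i + 1]
        averages.append(sum(window) / len(window))
    return averages


# Invented series: nothing reported on the weekend (days 5 and 6),
# then a catch-up spike on Monday (day 7).
daily = [100, 100, 100, 100, 100, 0, 0, 300, 100, 100]

# Hypothetical annotation table: day index -> note shown next to the chart.
anomalies = {7: "Includes cases from a weekend reporting backlog"}

avg = seven_day_average(daily)
for day, (count, smooth) in enumerate(zip(daily, avg)):
    note = anomalies.get(day, "")
    print(day, count, round(smooth, 1), note)
```

The raw series jumps from 0 to 300, while the averaged series moves much less, and the flagged day carries a note so the spike is not read as a real surge.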
00:25:43
Speaker
Yeah, right. So can you talk a little bit about the, uh, the overall user experience of the visuals? Because in some of the views, it's a little bit more about comparing my state to another state in my area, to another area and other views are here's just your area, right? Here's just, you know, Virginia or New York state. So how do you all think about.
00:26:08
Speaker
balancing those two types of users who some users probably want to make a comparison and some users just want to know, like, is it okay for me to go to the restaurant today? Yeah, it's like, I think there's a range of reasons that, you know, you look at these dashboards, I think.
00:26:24
Speaker
You know, the main sort of US dashboard page with the big map at the top is really useful for just getting a picture of like, how is the United States in general doing in the coronavirus? You know, we lead with like the sort of, um, the, the curve, like the, you know, the, the waves of the virus, like that, that main chart at the top, that's become sort of a, a symbol of like, you know, how well are we doing compared to the spring compared to, you know, the peak in the, in the winter. And we lead with this map. Um, that's just sort of the big map of.
00:26:53
Speaker
where is the virus worst in the country right now? And so those pages are useful for just getting an overall picture of the virus in this country. And also, we have the world page too, where you can see that on the global scale of where is it worst in the world right now. And we do also publish these pages for subnational geography for some other countries, too,
00:27:16
Speaker
when we're able to get that data. So those are good for just sort of giving an overview of how is the virus affecting this place right now. But we also know that people want information about where they're living. This is not just data that's interesting in the abstract. It's very useful specifically to people's lives and day-to-day decision making.
00:27:36
Speaker
And so this is something we've been, you know, we've always had, uh, these sort of, uh, a page for every state in the country. Um, and that sort of was like the first level of like personalization that we had that you can go to your state and see, you know, how are coronavirus cases in New York right now. And we actually have a page for New York city specifically, because we're able to get a zip code level geography for there. But, you know, more recently in the last, um, you know, five months or so, we've been focusing more on like very detailed personalization. So, um,
00:28:05
Speaker
in late November, we published a page that lets you search for counties that you're interested in and create sort of a personalized dashboard of counties, you know, interesting to you. And so you might, you know, look up, you know, the place where you live and the place where your parents live and, you know, where your siblings live. And, you know, I just have like these sort of locations that are that are relevant to you in there. And that just sort of gives you like the very basic information of like
00:28:32
Speaker
you know, how bad are things right now? What direction are things trending? How are things compared to, you know, the peak of the virus, just to sort of like sort of at a glance, you know, core information. And we also have that in a newsletter where you can get, you know, your personalized places. Right. Just that very simple data sent to you. And that's like, I think, very useful for people. And then also in, I think, December, the Department of Health and Human Services started releasing very detailed hospital data
00:29:02
Speaker
on sort of at the specific single hospital level of COVID data. So that includes things like what percent of ICU beds are available, how many COVID patients are there, what share of the patients there are COVID patients. And so we initially published a map showing that at the
00:29:23
Speaker
hospital service area level, which is just sort of a kind of small geographic unit that usually has between like, you know, one and 10 hospitals in it. And just showing like, you know, kind of a choropleth map of how bad things are. And then we wanted to get even more detailed and show, you know, at the individual hospital level, like how bad are those specific hospitals in my area. And so we made a map that lets you search for, you know, your address and shows you the hospitals closest to you.
00:29:47
Speaker
And we used a new visual tool in that where you can pan around the map. And there's a column on the left-hand side that updates some summary statistics of the hospitals you're looking at. If you can imagine panning around in Google Maps and seeing the places that you're looking at update in that little sidebar, similar to that idea.
00:30:09
Speaker
And then the most recent step we've taken in sort of the personalized view of this is that earlier this year we started publishing a page for every county in the United States. So we're now publishing over 3000 tracker pages multiple times every day. And that was a sort of large effort to make that technically feasible to make things fast and efficient enough to be able to update these and publish these.
00:30:36
Speaker
And those pages focus, you know, they give you sort of the overview data that the rest of the tracker pages do, but they also focus on, specifically on risk. We worked with, you know, health researchers at Johns Hopkins to sort of determine a way to calculate sort of risk levels based on a number of factors, you know, cases, testing and things like that. And then based on those risk levels, we give, you know, very specific advice on
00:31:03
Speaker
what activities are safe and not safe and how people can protect themselves based on the current situation in their area. We're really trying to give people information that's not only interesting and gives a picture of how the virus is doing nationally, but also very
00:31:20
Speaker
useful and actionable, that it's instead of saying there's been 50 cases in your county today, you might not know what that means or it's like, what do I do with this information? But if instead we say your county is at an extremely high risk level and here are the things that you should and shouldn't do right now, that's information that we think is very useful to readers and that's what we want to be focusing on.
00:31:43
Speaker
Yeah, I mean, when I go to the tracker page, and I don't know if this is because I'm logged into my account as a subscriber or if it's, or it's because I've done a search for my county in the past or it's the IP or whatever it is, but definitely when I get to that part, it says, you know, right now it's saying that, you know, my county is at a very high risk and it's like red and bold in that little, you know, in that little teaser. And when you click over there, you get this, you know, basically this big headline that's like an all red, you know, very high risk.
00:32:12
Speaker
So it's definitely like, you know, pointing my attention to my area. Yeah. That's, I mean, even, you know, uh, I think you talked about like on the, on the homepage, like, um, that's something that we've been, you know, there's a lot of design work has gone into that little homepage widget of showing like, you know, we have, you know, and it's not just, we don't just have like the case tracker pages. We also have.
00:32:33
Speaker
lots of other tracker pages for vaccines and for colleges and nursing homes and metro areas. We have many, many different trackers that we're publishing. We want to be able to use that homepage widget to highlight the things that are interesting or that are recently updated or that are specific to your location when we're able to get that location or have you search for it. We want this to just be like,
00:32:55
Speaker
a very quick and useful little bit of information. Those little sparkline arrows on the homepage dashboard just to show how are things now, how are they compared to last week. Those are just pieces of information that are every day useful and interesting for people. It's interesting in a lot of ways because it almost feels like
00:33:17
Speaker
the trackers are more of a public health resource than news reporting in sort of the traditional sense of here's the story of what's going on in this particular county. It feels like you have all taken this as your, almost your responsibility as experts in this field of data communication and being able to wrangle the data and the technology to provide this resource that
00:33:44
Speaker
you know, one might think that the CDC or the Department of Health and Human Services would have on their webpage, but it seems like you've all taken this responsibility of being this gatekeeper in some sense of this coronavirus information.
00:33:57
Speaker
Yeah, and we definitely are very aware that people use this as a resource and we want to make it as useful as possible. The other thing we do is that every day we publish this data on a GitHub page so that folks are able to use it for research or for their own analysis. We want this data to be out there and for people to use it and make use of it. And this is definitely a sort of
00:34:22
Speaker
resource both for the public and for us inside the newsroom. We report stories based on this data and based on these dashboards every day. We have people working on just looking at these numbers and seeing interesting trends and using both the visualizations we've created and also just the raw data. We
00:34:45
Speaker
We have teams both on graphics and on other desks that are using this data all the time for reporting and for doing the more traditional journalism with this information. It's great to be able to support both the public need for this information and the ability of the Times to publish work based on this.
00:35:05
Speaker
Yeah. I wanted to ask you a couple more quick questions. Well, I don't know if this first question is that quick, but I have one quick question. But when you and the folks you work with started creating these trackers, did you have a different feeling or a different approach?
00:35:21
Speaker
to creating this tracker and creating the visualizations in the sense that, like you just mentioned, people are using this as a resource and it's potentially life-threatening information, right?

NYT's COVID-19 Tracker as a Public Health Resource

00:35:32
Speaker
I mean, this virus has killed more than a half a million people in the United States alone.
00:35:36
Speaker
Um, it's not like, you know, sports scores or the stock market, which is interesting and it's useful for lots of people, but not going to affect my personal health or the health of my family in a direct sense. So do you feel that way? Do you approach it that way? Or do you try to put that aside and say, this is, you know, it's another news story that we're working on. No, I mean, I definitely think like it's, it's definitely hard to ignore the sort of implications of this data that we're working with. Like I, you know, um,
00:36:06
Speaker
There have been times where, you know, when we're approaching one of these big milestones of, you know, case numbers or death numbers that, you know, we're updating this and, you know, we want to, you know, be able to update it so that people are aware of these milestones, but it's also like.
00:36:20
Speaker
it's terrible to watch these numbers go up in real time as we're pulling in this data. It's sort of hard to ignore the fact that these are real people with a terrible virus. I think we just have to kind of focus on communicating this information as clearly and in as useful a way as we can. It's a news story and it's important information that we have to get across. One thing that we're doing in this project actually that I think is interesting is that
00:36:49
Speaker
We offer these pages in both English and Spanish, so we have Spanish translations for these pages, which I think is really important because this is such crucial information and we want as many people as possible to be able to access it.
00:37:04
Speaker
you know, sort of a challenge in building this in that, you know, none of the text on these pages is like hard coded in anywhere. Like we built a whole system to be able to swap out the translations for everything from the main copy of the story to, you know, the individual chart labels are all translated too. Right. And so, you know, we thought that was really important as part of the sort of service that we're doing on a sort of technical level. An interesting thing that sort of makes this different from a traditional news story is that
00:37:33
Speaker
you know, there is, there's copy on these pages, like there's, there's charts as well as sort of explanatory copy. Um, but a good chunk of that copy is actually like generated. Like we have, um, we have scripts that we run that generate sentences based on the data. And basically that lets us like keep this copy for all these pages fresh and, um, you know, up to date with the latest information without having, you know, an editor have to go through and update the copy on 3000 pages every day. Um,
00:38:01
Speaker
And so it lets us communicate this information, not just in charts, but give a little more explanation, but in a way that's sustainable long-term for us to keep these updated.
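The generated copy and swappable translations described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the Times' actual system: the template wording, the `TEMPLATES` table, the `trend_sentence` function and the five percent "flat" band are all hypothetical.

```python
from string import Template

# Hypothetical sentence templates keyed by language and trend.
# The wording is illustrative and not taken from the Times' pages.
TEMPLATES = {
    "en": {
        "rising": Template("Cases in $county have risen $pct percent over the past two weeks."),
        "falling": Template("Cases in $county have fallen $pct percent over the past two weeks."),
        "flat": Template("Cases in $county have stayed about the same over the past two weeks."),
    },
    "es": {
        "rising": Template("Los casos en $county han aumentado un $pct por ciento en las últimas dos semanas."),
        "falling": Template("Los casos en $county han disminuido un $pct por ciento en las últimas dos semanas."),
        "flat": Template("Los casos en $county se han mantenido casi iguales en las últimas dos semanas."),
    },
}


def trend_sentence(county, previous_avg, current_avg, lang="en", flat_band=5.0):
    """Build one sentence of copy from two averages (assumed inputs)."""
    if previous_avg <= 0:
        raise ValueError("previous_avg must be positive to compute a percent change")
    change = (current_avg - previous_avg) / previous_avg * 100
    if abs(change) < flat_band:
        key = "flat"
    elif change > 0:
        key = "rising"
    else:
        key = "falling"
    # No sentence text lives in the logic; it is looked up by language.
    return TEMPLATES[lang][key].substitute(county=county, pct=round(abs(change)))


print(trend_sentence("Example County", 100, 150))
print(trend_sentence("Example County", 100, 80, lang="es"))
```

Because every string is looked up by language rather than written into the logic, the same data can refresh thousands of pages in either language without an editor touching the copy.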

Technical Challenges in Data Maintenance

00:38:13
Speaker
I think that's been a big.
00:38:15
Speaker
sort of constant challenge in this project is just keeping this in a way that as this project grows and as the data grows and the number of pages we're publishing grows, just being able to keep on the cycle of updating four times a day and keeping the data as fresh as possible is something that we're constantly thinking about and thinking of ways we can
00:38:34
Speaker
improve and we're making changes right now to slim down the sizes of files that we're sending to users so that the page loads faster on your phone. We just want people to be able to get this data and this information as easily as possible in a way that's as useful to them as possible.
00:38:53
Speaker
Yeah, I think there's a whole technology side of this tracker that I think is fascinating for lots of different people for lots of different reasons. I think, you know, a few years ago, having copy where the numbers in the copy would update automatically was a very new thing. And people were very excited about how the graph and the data and the text actually interact with one another. And you all have sort of done that.
00:39:17
Speaker
you've sort of taken that all the way to the, to, I don't want to say the end point, because I don't know what the end point is, but you've taken that all the way where it's basically automating this whole thing. And it's just a really interesting technology that I think, I would suspect that many DataVis folks would be interested in doing in their own work, because it does allow you to more integrate those two things together.
00:39:37
Speaker
Yeah, I think actually in some way, because we at the Times have done sort of elections, live elections results pages for so long and have a lot of sort of institutional experience working on those sorts of things. I think obviously it's a very different set of data and a different set of needs that you're communicating.
00:39:56
Speaker
like those generated sentences were something that we had started using in the primaries this year. And that just seemed like a natural thing to bring into these coronavirus dashboards. And a lot of the mapping techniques from a technical perspective are things that we pulled in from these other experiences in making these large dashboards that
00:40:20
Speaker
are the sort of thing that people check regularly and have new data coming in regularly. And so I think that was sort of a useful background to be able to do this, especially on such a tight timeframe when we were first getting these pieces out. And it was like there was a very high sense of urgency, especially a lot of us being located in New York that things were so bad here that we really needed to get these pages out.
00:40:44
Speaker
Yeah. Um, okay. I've got one last question for you. You might not be able to answer this question, but we'll see. You might say all of it. So when I go to the main Times tracker page, um, there are more or less, just to boil it down, like three main graphics. There's the, the map, uh, county level map. And as you noted, not for Iowa right now, but county level map. Um, then there's a series of small multiple little line charts for
00:41:12
Speaker
cases going, you know, getting higher, uh, going lower for, for all the different states. And then the third sort of main visual is these little, uh, strike charts, um, for each state and sort of like a, this really, what I think is a really cool table. So from those three sort of main sections, do you have a favorite part of the, of the, from a dataviz perspective, not from the impact of the virus, but from a dataviz technology perspective, do you have a favorite part of the page?
00:41:39
Speaker
So I, you know, on a sort of personal level, I mostly work on the maps. And so I think the maps have been very useful, especially at different points of the virus in showing.
00:41:49
Speaker
uh, you know, just in highlighting like where in the country, uh, things are worse. And I think that's a very easily understandable visualization. I do think the, the small multiples, the sort of curve grid we call it, um, is maybe one of the most useful things on the page in that it shows, you know, at the level of like each, um, state, it shows both the curve, um, you know, of cases in that place, but also I think the, the way that we group them is extremely useful. And this was something that I talked about earlier where it's not just like,
00:42:19
Speaker
the direction, but it's grouped by both direction and how bad things are. There was a time when there was just no states where cases are low. Maybe the US Virgin Islands was in that, but there was hardly any places in the "cases are low" area. That was something that I saw lots of people tweeting about and responding to that it was just bad everywhere at that point and everything was going up.
00:42:46
Speaker
It was really sort of striking just to see sort of how these move around between those sections. And so I look at that section a lot.
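The idea of grouping places by both direction and how bad things are can be sketched roughly as follows. The thresholds, the group names and the `classify` and `group_places` functions are assumptions made for illustration; the transcript does not say how the Times actually draws these lines.

```python
def classify(cases_per_100k, pct_change, low_threshold=10.0, flat_band=5.0):
    """Return a (level, direction) pair for one place.

    low_threshold and flat_band are hypothetical cutoffs, not the Times' values.
    """
    level = "low" if cases_per_100k < low_threshold else "high"
    if abs(pct_change) < flat_band:
        direction = "flat"
    elif pct_change > 0:
        direction = "rising"
    else:
        direction = "falling"
    return level, direction


def group_places(rows):
    """Group (name, cases_per_100k, pct_change) rows by their classification."""
    groups = {}
    for name, cases_per_100k, pct_change in rows:
        groups.setdefault(classify(cases_per_100k, pct_change), []).append(name)
    return groups


# Made-up example rows, only to show the shape of the output.
example = [("A", 50.0, 20.0), ("B", 3.0, -1.0), ("C", 40.0, -30.0)]
for key, names in group_places(example).items():
    print(key, names)
```

An empty "low" group in output like this is the situation described above, where hardly any place sat in the "cases are low" area.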
00:42:54
Speaker
I think the table is useful as like a, and I know this is what you were saying, I'm just going to say all of them, but I do think they're all useful for sort of, for different reasons. I think the table is good as like a quick overview. Yeah, different reasons for different people doing different things, right? Yeah. That's like really the big part of this is that we know, you know, we have a very large audience that's looking at this and that everyone sort of has different needs and different interests, whether it's like, you know, you want to know, you know, is it safe to see friends outside or not, or,
00:43:24
Speaker
you know, just trying to figure out like your day-to-day decision making versus someone who's looking at this to get a sense of, you know, how things are doing in the whole country versus, you know, epidemiologists who might just want the data. And so we publish the data, you know, separately from the
00:43:39
Speaker
the whole visual presentation of it. I think we have spent a lot of time thinking about like how each piece functions and sort of what the different needs of different users are. The one other part that you didn't mention is the very top chart. That's just the sort of curve and those top figures. And I think like as a sort of
00:43:58
Speaker
I feel like this curve has just become a symbol in a lot of people's minds of the virus and just these different waves of it. I know, at least for me, I can remember memories of what things were like at the top of that second wave or at the first wave in April and have a lot of associations with this curve specifically.
00:44:22
Speaker
So, you know, I think all the parts of it kind of function together to make a tool that we think is pretty useful for people. And we hope that people, you know, are able to use to get information that they need. Yeah, it's great.
00:44:38
Speaker
Um, Charlie, thanks so much for coming on the show. I mean, we covered a lot of ground and, um, it is a, it is a remarkable project. And, um, congrats on working on that. And, uh, hopefully you'll get to maybe work on, you know, other things at some point. Yeah. I hope so. Yeah. Thanks for coming on the show. I appreciate it. Yeah. Thank you for having me. It was great talking to you.
00:45:03
Speaker
And thanks to everyone for tuning into this week's episode of the show. I hope you enjoyed that. I hope you'll take a look at all the links and resources that I put in the episode notes of this week's episode of the podcast and go check out the New York Times COVID-19 dashboard so you can see all the things that we talked about in this discussion. So until next time, this has been the PolicyViz podcast. Thanks so much for listening.
00:45:27
Speaker
A number of people help bring you the PolicyViz podcast. Music is provided by the NRIs, audio editing is provided by Ken Skaggs, and each episode is transcribed by Jenny Transcription Services. If you would like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify, or wherever you get your podcasts. The PolicyViz podcast is ad-free and supported by listeners. If you'd like to help support the show financially, please visit our Patreon page at patreon.com slash PolicyViz.