How to collect, analyze, & visualize hockey data with Micah McCurdy

S9 E239 · The PolicyViz Podcast

Micah is a mathematician who likes to use pictures to understand things. He runs a website, hockeyviz.com, where he stores pictures about hockey. He lives in Halifax, Nova Scotia with his wife and his two children.

Episode Notes

Micah | Twitter | Site

Bubble physics
Python
Beautiful Soup
svgwrite
Matplotlib
Line-width illusion

Related Episodes

Episode #238: Jeremy Ney
Episode #237: Tristan Gullevin
Episode #194: Charlie Smart

Transcript

Ad Read and Promo

00:00:00
Speaker
This is Jack Traubish and I wanted to do the ad read for this week's episode of the PolicyViz Podcast. After baseball practice, I am very hungry, but lucky for me, I use my BlendJet 2 to make a delicious shake. The BlendJet 2 is portable, so you can make a smoothie at home or a protein shake at the gym. It's small enough to fit in a cupholder, but powerful enough to blast through tough ingredients like ice and frozen fruit with ease. It lasts for 15-plus blends and recharges quickly via USB-C.
00:00:25
Speaker
It is whisper quiet so you can make your morning smoothie without waking up the whole house. Best of all, BlendJet 2 cleans itself. Just blend water with a drop of soap and you're good to go. What are you waiting for? Go to blendjet.com and grab yours today. Be sure to use the promo code policyvis12 to get 12% off your order and free 2 day shipping.
00:00:44
Speaker
No other portable blender on the market comes close to the quality, power, and innovation of the BlendJet 2. They'll guarantee it or your money back. Blend anytime, anywhere with the BlendJet 2 Portable Blender. Go to blendjet.com and use the code policyvis12 to get 12% off your order and free 2-day shipping. Shop today and get the best deal ever!

Introduction to Episode and Guest

00:01:16
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. On this week's episode of the show, we explore the interaction between data, data visualization, and hockey. I am joined by the creator behind the website hockeyviz.com, Micah McCurdy, who pulls in
00:01:33
Speaker
real-time NHL data to create a variety of exciting data visualizations and data tools around hockey data. We have a really exciting conversation about the tools that he uses to collect the data, to clean the data, and of course to visualize the data. We also talk a lot about the balance between static visualizations and interactive visualizations and why he focuses his attention on static visualizations. And then, of course, we talk about Connor McDavid,
00:02:00
Speaker
the current playoffs picture and what he thinks is going to happen to a couple of teams in the next couple of years. So I hope you'll enjoy this interesting episode on the intersection between Data and DataViz and hockey.

Micah McCurdy's Background

00:02:12
Speaker
And so here's my conversation with Micah McCurdy.
00:02:17
Speaker
Hey, Micah, welcome to the show. Great to have you on. Thanks for having me. Very excited. We are in early May when we're recording, so the second round of the playoffs is just getting started. Very exciting. I mean, less exciting for me, because the Caps didn't quite have the season that we had expected, even after a great December.
00:02:35
Speaker
It kind of fell apart after that. But I want to get to your predictions and some other things about the playoffs in a little bit, but I wanted to start with maybe having you talk a little bit about yourself and how you got into this intersection of hockey, data, and data viz, and have this pretty exciting site where you have this interesting Venn diagram going on.
00:02:59
Speaker
So I was from Halifax, which is where I live now, but I did not stay here my whole life by any means. I grew up here and I was sort of a casual hockey fan like a lot of Canadians are, you know, it's in the water, you don't need to seek it out or do anything, it's just there.
00:03:16
Speaker
And so I was a sort of casual Senators fan, but I watched a number of other teams too when I was a kid. And then I went to Australia to do my doctorate in mathematics. And at the time I was completely sure I was going to be a research mathematician. That was my career. And I discovered when I went to Australia that all of a sudden there was no more hockey. And they're sports mad, of course, but
00:03:39
Speaker
But ice hockey, as they insist on calling it to my incredible annoyance, is not, it was just starting to get on the like, we're going to fill a channel with some reruns in the middle of the night because it's live in Canada or the States kind of thing. And I discovered that I was really homesick on the other side of the world, trying to do my PhD. And simultaneous to that, I was
00:04:04
Speaker
struggling with doing purely pure mathematics without doing any kind of hands-on anything.

Creating Predictive Models

00:04:10
Speaker
And so I got more and more into hockey, especially because I could watch it while I worked. And when I wanted a distraction to do some different kind of work, I could run little simulations. Say, like, well, let's make a model of what the Senators are going to do on this California road trip. How many points are they going to take? Well, the Kings are really good this year and the Sharks are really bad. Well, okay, what does that mean? And you get into that, and if you're going to do it, you might as well do it for all 30 teams.
00:04:33
Speaker
And then you get into 32 now, of course. And so that scratched an itch both in terms of homesickness and in terms of wanting to do some quantitative work. People always laugh when I tell them that my mathematics PhD contained essentially no quantitative work because it's all, but it's true. It's almost entirely, well, and then of course the material of the PhD is something that by definition almost no one else cares about.
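The kind of "little simulation" described above can be sketched as a Monte Carlo estimate of points earned on a road trip. This is only an illustration, not Micah's model: the function name `simulate_road_trip`, the win probabilities, and the simplification that a win is worth 2 points and any loss 0 (ignoring overtime-loss points) are all assumptions made for the example.

```python
import random


def simulate_road_trip(win_probs, n_sims=10_000, seed=0):
    """Return the average standings points earned over a road trip.

    win_probs: one assumed win probability per game (hypothetical numbers).
    Simplification: a win is worth 2 points and a loss 0; overtime-loss
    points are ignored.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0
    for _ in range(n_sims):
        # Play every game of the trip once and add up the points.
        total += sum(2 for p in win_probs if rng.random() < p)
    return total / n_sims


# Hypothetical trip: a strong opponent, a weak one, and a coin flip.
trip = [0.35, 0.65, 0.50]
expected_points = simulate_road_trip(trip)
print(f"Average points on the trip: {expected_points:.2f}")
```

With these made-up probabilities the exact expectation is 2 × (0.35 + 0.65 + 0.50) = 3 points, so the simulated average should land close to 3. Running the same loop over every team's schedule is the "might as well do it for all 30 teams" step.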
00:05:01
Speaker
Right. It happened to have this kind of peculiar consonance. Namely, I was working on graphical languages, so doing calculations with pictures. So not illustrating the calculations, but actually being the calculations of two-dimensional string diagram calculus. You could actually work with the ribbons. And so it sounds more abstract than it was. But the essence of it was doing things visually instead of doing things with symbols. Right.
00:05:30
Speaker
So then later, years later, when I decided to turn the hobby that I picked up in Australia into a job, so almost a decade later, it was very easy to think of it not as an exercise in hockey or statistics, but as an exercise in data visualization. Because I find that looking at things, seeing pictures of things activates a totally different part of my brain than trying to process things symbolically. Right. And so that's where the two threads where the
00:05:59
Speaker
the math thread and the hockey thread kind of overlapped and became this entirely different thing centered around data viz. Right. It's interesting the way you tell that story because it sounds like you started not by just doing kind of in-depth, deeper dives of just tabulations and cross tabs, but actually doing predictions. Yeah, I started with predictions. And that was because I wanted to make simulations. And so it was kind of
00:06:28
Speaker
the wrong way around, if you like, you know, if you're doing science, you think about here's the thing I really want to understand. And then, and I had done a little bit of that in my undergrad, I did a little bit of simulation work, I published a paper on bubble physics. And where you're looking at soap froths squeezed between two glass plates, and I was simulating what they would look like with a computer. And that's a really interesting scientific technique.
00:06:49
Speaker
But there, of course, the interest is here's this thing, and I can attack it experimentally, I can attack it theoretically, or you can attack it in the middle. And I discovered I really enjoyed that approach, and so that became my focus. How can I find another project using those tools? I just enjoy using them. And hockey was sort of naturally close to hand. And then afterwards, I thought, I don't really know what I'm doing. I should look at the data better. Am I doing this right? And so in the very first instance,
00:07:17
Speaker
DataViz was a debugging tool for that. If I look at the data, then I'll be able to see if it's good because I process information very well that way relative to just looking at numerical output. And so I had this little gadget and I think it kind of works all right, but it became this need to make diagnostics to understand what it was that I was doing led to this focus on DataViz.

Data Tools and Challenges

00:07:43
Speaker
So let's talk about the DataViz piece of it. So I'm curious about your process. So you're pulling in a lot of data and for folks who haven't checked out the site HockeyViz, I'll leave links to it and you absolutely should. You've got a ton of data and a ton of visualizations. So what is your process? Where are you pulling data from? What is your toolkit to
00:08:06
Speaker
process the data, and then what are you doing on the visualization side? And I want to dig in a little bit deeper on the visualization side as well. But what does that whole data workflow for you look like?
00:08:17
Speaker
So for NHL data specifically, it's actually an enormous pain in the neck. I shouldn't be too upset about it. The league, after all, puts it out for free. And they know that I use it and rearrange it and sell it on my website. And they don't get too angry with me. But it's definitely not presented in a way that is straightforward in any way. And so the data engineering aspect of it is
00:08:43
Speaker
a real pain. A lot of it is just scraping HTML. So there are some pretty detailed HTML reports that are put out after every game by the league. In fact, during each game, they update every few minutes. And so you can even do some rudimentary live stuff. And then there are some machine-readable JSON endpoints that are not particularly sophisticated, and the two data sources don't quite line up, even though they both allegedly come from the league. And so you have to do quite a bit of chicanery to make them line up.
00:09:11
Speaker
In the old days, people used to know about even more data that used to leak out through League partners of one kind or another. There was a time when the ESPN website could be relied upon for some information that you couldn't get from the League, which is no longer true. But there's a lot of tricks along those lines where you have to do quite a bit more than just
00:09:33
Speaker
take in an endpoint and store it locally.
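The "chicanery" of lining up two feeds that describe the same game can be sketched as a tolerant matching step. This is a minimal illustration under stated assumptions: the event schema (`period`, `seconds`, `type`), the function name `reconcile`, and the two-second tolerance are all hypothetical, not the league's actual formats or Micah's actual rules.

```python
def reconcile(html_events, json_events, tolerance=2):
    """Pair events from two feeds that describe the same game.

    Each event is a dict with 'period', 'seconds' and 'type' keys (a
    hypothetical schema). Two events match when period and type agree and
    their clock times differ by at most `tolerance` seconds.
    Returns (matched pairs, HTML-only events, JSON-only events).
    """
    unmatched_json = list(json_events)
    matched, html_only = [], []
    for h in html_events:
        partner = next(
            (j for j in unmatched_json
             if j["period"] == h["period"]
             and j["type"] == h["type"]
             and abs(j["seconds"] - h["seconds"]) <= tolerance),
            None,
        )
        if partner is None:
            html_only.append(h)
        else:
            unmatched_json.remove(partner)  # each JSON event is used once
            matched.append((h, partner))
    return matched, html_only, unmatched_json


# Made-up events: the shot is recorded one second apart in the two feeds.
html_events = [
    {"period": 1, "seconds": 312, "type": "SHOT"},
    {"period": 1, "seconds": 640, "type": "GOAL"},
]
json_events = [
    {"period": 1, "seconds": 313, "type": "SHOT"},
    {"period": 2, "seconds": 45, "type": "HIT"},
]
matched, html_only, json_only = reconcile(html_events, json_events)
```

Keeping the two "only" lists, rather than silently dropping unmatched events, is what makes it possible to see afterwards where the sources disagree.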

Community and Visualization Style

00:09:37
Speaker
The scraper in particular is one of those things where that's a piece of software that's carefully honed with lots of different if-thens for silly circumstances. Right, for a whole bunch of end cases or edge cases that just pop up. Yeah, and there's not just the usual case where this data happens to be corrupt for reasons that are known possibly to no one. But then there's also all kinds of weird things like, you know, Rich Peverley had a heart attack in a game.
00:10:03
Speaker
And after he scored, and then they restarted that game. And so they credited him with the goal before the opening face-off. And he also didn't play. He got hurt. Right. And then, sorry, it's not Peverley who scored; some other guy scored, and he got hurt. Wow. And so then he didn't get to play in the replay. And so, you know, you make assumptions like, do you assume that all the goal scorers in the game dressed in that game, and you get it wrong. Yeah. Yeah. There's a lot of data engineering along those lines.
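The assumption mentioned above, that every goal scorer dressed for the game, is the kind of rule that can be written as a check with an explicit list of hand-vetted exceptions. The sketch below is hypothetical throughout: the function name `scorers_not_dressed`, the data layout, and the identifiers are invented for illustration.

```python
def scorers_not_dressed(goals, dressed, known_exceptions=frozenset()):
    """Return scorers missing from the dressed roster, minus known exceptions.

    goals: list of (game_id, player) tuples.
    dressed: dict mapping game_id to the set of players who dressed.
    known_exceptions: (game_id, player) pairs already checked by hand,
    such as a goal carried over into a replayed game.
    """
    return [
        (game_id, player)
        for game_id, player in goals
        if player not in dressed.get(game_id, set())
        and (game_id, player) not in known_exceptions
    ]


# Entirely made-up identifiers for illustration.
goals = [("G1", "Player A"), ("G1", "Player B")]
dressed = {"G1": {"Player A", "Player C"}}

problems = scorers_not_dressed(goals, dressed)
vetted = scorers_not_dressed(
    goals, dressed, known_exceptions={("G1", "Player B")}
)
```

Here `problems` flags Player B for review, and once that case has been confirmed as legitimate and added to `known_exceptions`, the check passes cleanly while still catching any new violation.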
00:10:31
Speaker
And what are you doing on that? What's the tool you use to do the actual scraping? Oh, Beautiful Soup. My stack is all Python. Oh, Python. Okay. Yeah. Okay, so you're pulling it in, and you're pulling it directly from the NHL API. And you're not using, like, Hockey Reference, because Hockey Reference seems a little more aggregated, as far as I can tell.
00:10:56
Speaker
That's right. I don't have any partnerships with any other hockey websites. And I'm also sort of neurotic on this level where if there are mistakes, I want to at least be able to say that's my mistake or that's the NHL's mistake. Because what I'm doing is not just for my own benefit, but it's public facing. I find it's
00:11:18
Speaker
easier to say, you know, this is exactly who made a mistake here. Yeah, yeah. So you control the entire workflow from A to Z. Yeah. Yeah. And I also don't step on anybody else's toes, except itself. Right, right. And so hobbyists, you know, there's a certain amount of competition among people, but also a certain amount of professional respect for anybody else who's working with NHL data, where you say, you know, like, now and again, you say, I have a corrupt
00:11:43
Speaker
hasn't fixed it, do you have a copy that wasn't corrupt, you know, you share things among practitioners. So it's interesting. So there is sort of a behind the scenes kind of community of, in this case, the hockey data folks. Yeah, there's, it's probably about like 10 or 12 people that I can think of that I consider in the community. In fact, it's large enough to have sort of its own little petty squabbles with people who don't like one another. And
00:12:06
Speaker
And of course, everybody privately thinks that their way is best and their models are best and their, you know, my data pipeline is definitely not the best. And so it can be occasionally a little bit testy, but mostly people, you know, who have actually done a modicum of work and know what it's like.
00:12:22
Speaker
are quite friendly with one another. Right. They all know the challenges in these weird edge cases. So you've pulled the data in Python. You've cleaned it in Python. So what about the data vis side?

Static vs. Interactive Visualizations

00:12:38
Speaker
And maybe you could talk a little bit about your
00:12:42
Speaker
I don't know, do you have like, I mean, I obviously follow your stuff and you've got a lot of different types of visuals, but do you have a few that you kind of market or share as like the kind of exemplar stuff of your work? There is one for sure. In fact, I reworked it just this year. It's a Sankey diagram which shows the probabilities of each team as they move through the playoff rounds.
00:13:05
Speaker
Yeah, yeah, I just saw the updated one yesterday, yeah. And so this one, in fact, I found a great blog post by a man whose name I forget, sadly, explaining how he had really admired Sankeys that were made with circle segments and straight lines. So with only horizontal and vertical lines relative to some axis, and then circle segments when you needed to turn. And so I reworked mine, which had been based on
00:13:33
Speaker
lines with very different angles, which of course have this optical illusion where thick lines can appear thin when they're very steep, which finally started to really annoy me, and so I decided to rework it this way. And speaking of text acts, that one graph that I show people all the time, it used to be called, in its old incarnation, fans of mine decided to call it the rainbow death crab. It had a sort of crab-like shape,
00:13:57
Speaker
And it was full of many multi-colors. And the death or skull emoji that people used has to do with how hockey fans, even the ones who love it, in fact, especially the ones who love it best of all, find the playoffs extremely stressful. In fact, I briefly sold some t-shirts. Technically, you can still buy them, but nobody has in a long time. They'd say, we may win, but I may die. And to that, I sort of leaned into that with the name of the thing. And that graph is the only one that I make
00:14:26
Speaker
with a Python library called svgwrite. It's just a fairly gentle layer on top of plain SVG. And so in fact, in a past life in a totally different application, I used to write a lot of XML and XSLT stuff. And so I find that quite natural because I really wanted pinpoint control of actually making every single element. But that's a little bit unusual. Almost all of the other stuff I do is all Matplotlib,
00:14:53
Speaker
which I've got reasonably good at bending to my will. And in fact, I've avoided, from time to time, people say, oh, Micah, you should do, you should try this other tool, or you should try this other library. And I nearly never do, because I'm persnickety about having a lot of pinpoint control, and I don't mind spending a great deal of time to get it to look exactly the way I want. So the process can be quite laborious in places.
00:15:21
Speaker
and quite intricate, but it generally does come out exactly the way I want it once it's done. And you're doing all that in code? Like, do you pull any of it into as an SVG and pull it into another tool to add annotations or do any manual stuff, or is it all automated? I mean, obviously not automated, but is it all done in code?
00:15:41
Speaker
It is entirely done in code, which I take as a kind of article of faith. Yeah. Yeah. Every now and again, you come up with something and you say, oh, I see, that's a case I didn't quite expect. And it doesn't look quite as elegant. You know, I could hand tweak that. And I don't know, if you commissioned me to do something for National Geographic or something, then maybe I would tweak it and make sure that it was absolutely ideal. But I consider that, as a rule, not as a virtue, but as a distraction from
00:16:11
Speaker
thinking at the right level of generality. This is sort of where the pure math background comes back out, where I think that solving the problem means solving the problem in general, which means it has to be done in code without hand tweaking after the fact, even if that means that sometimes you have to put out something which is a pixel off from what it might be.
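The Sankey rework discussed earlier rests on two small pieces of geometry that are easy to express in code. First, if a band keeps the same vertical extent while running at an angle θ to the horizontal, its thickness measured across the band shrinks to that extent times cos θ, which is one way to see why steep bands look thin. Second, a route built only from horizontal runs, vertical runs, and quarter-circle turns can be written directly as SVG path data. The sketch below assumes the old diagram kept vertical extent constant; the function names and the example coordinates are hypothetical, and plain strings are used here rather than any particular library's API.

```python
import math


def perpendicular_width(vertical_thickness, angle_deg):
    """Thickness measured across a band whose vertical extent is fixed."""
    return vertical_thickness * math.cos(math.radians(angle_deg))


def elbow_path(x0, y0, x1, y1, x2, r):
    """SVG path data: run right, quarter-turn down, run down,
    quarter-turn right, run right. SVG's y axis points downward."""
    if not (x0 <= x1 - r and x1 + r <= x2 and y0 + r <= y1 - r):
        raise ValueError("radius too large for this route")
    return (
        f"M {x0} {y0} "
        f"H {x1 - r} "
        f"A {r} {r} 0 0 1 {x1} {y0 + r} "   # clockwise quarter circle
        f"V {y1 - r} "
        f"A {r} {r} 0 0 0 {x1 + r} {y1} "   # counterclockwise quarter circle
        f"H {x2}"
    )


steep = perpendicular_width(10, 60)   # a 10-unit band at 60 degrees
d = elbow_path(0, 0, 50, 80, 120, 10)
print(d)
```

A band that is 10 units tall reads as only about 5 units thick at 60 degrees, whereas the elbow route stays either horizontal or vertical between turns, so a stroke drawn along it with a fixed stroke width keeps the same thickness throughout. The string in `d` can be used as the `d` attribute of an SVG `<path>` element.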

Updating Visualizations

00:16:32
Speaker
Right. So during an evening, we've got playoffs now, so there's only two or three games a night.
00:16:39
Speaker
During the regular season, multiple games going on, are you constantly updating or do you have it running at a particular cadence? How is the updating process working? And also, how does it update through to the website so that when I log on, I can go grab last night's or yesterday's shot diagram for the caps?
00:16:59
Speaker
That is all automated. Occasionally stuff goes wrong and I have to go fix it. I mean, there's always a certain amount of putting your finger in the hole in the dike. But almost all of that is automated. I have a web server running on a server that I rent from a company in Toronto. And so that's all stashed there. Right. Right.
00:17:21
Speaker
The other thing that's interesting about the site, so for folks who haven't logged in, they should and check it out because you get into the page and there are probably on the front main page, there's probably like 16, 20 graphs, something like that. And I'd say maybe two of them are animated and the rest are static. I don't think there's a lot of interactive stuff on the site. What is your thought process about interactive versus static?
00:17:48
Speaker
The few interactive vizzes that I've made, there are very few, I don't know, two or three, I think, compared to thousands of other types. And I consider the interactive ones to be failures on my part, where I think of data viz in a genre sense as being akin to photography. And so if you make photography that moves, in some sense, you have failed to understand what you're doing. Like I consider the
00:18:18
Speaker
the process where you take something and you say, this is what goes in the rectangle. This is what it is I want people to look at. Then if you make something interactive, in some sense, failure might not be the word, abdication might be the word. You've declined to solve the problem. You leave it to your readers to solve the problem. And there's an obvious virtue there, which is that agency brings something, but also an obvious vice where you haven't
00:18:47
Speaker
done the work, the editorial work that it requires, where one of the things that people are looking for, and this is something that I think is a real virtue of having something static, is that it gives a certain kind of finality where you can say that even if there's something else you want to see, there is something that you can see here, and I have shown all of it to you.
00:19:11
Speaker
where you say that implicitly just in the static choice well before you actually look at the data. And then of course every now and then people say, you know, they disagree with you. And sometimes they will yell at you on Twitter and say, you know, this viz shows this and it ought to show that. And I say, well, it doesn't because it's not for that. It's for this, the first thing I chose. And then you can get into an argument about choices. Why did you choose to show this? And those arguments are always interesting to me, especially because I do a lot of that work. In fact, I consider that to be
00:19:41
Speaker
in some sense, the most interesting work, much more interesting than the color choices, the composition choices. They're like, what is it do you want to show here? Right. And, and I find that restricting myself to static is, with, again, a few exceptions, crystallizes that process in my own mind. And by the time I'm, by the time I'm done, I feel like I've gained a lot by knowing exactly what it is I want to show.
00:20:08
Speaker
I agree with that, I think 90% of that, but in your particular case of this rich hockey data, for example, if I wanted to go in and I could see the shot map for the caps last game, I'm just going to focus on the caps, even though we're done. We don't have to talk about the servers.
00:20:30
Speaker
If I want to look at that shot map but then filter by Ovechkin versus Oshie versus Carlson. Now on your site I believe you can select the different maps for each of the different shooters, but do you think there is value in having an interactive version of that overall static where I could filter and click inside the visualization?
00:20:52
Speaker
No matter where you decide to draw a line, you say, this is what I'm going to call static. And of course, every interactive viz has that inside it. Unless the pieces move when you're not touching them, there's always a static render, where you take your finger off the slider and you look at whatever it is that you as a user, together with the author of the viz, have created. And so no matter where you sit on this spectrum,
00:21:18
Speaker
you're going to have a certain amount of staticness and you're going to have a certain amount of interactivity because no one is going to look at any particular thing for longer than they need to.

Time Compression in Visualizations

00:21:29
Speaker
And a good vis, as always, of any type, you look at it, you see something interesting, you think, oh, I wonder about, and now you're doing the interactivity in your head now.
00:21:38
Speaker
I'm thinking about a new question. And of course, I'm running a web server, which means that there's something inherently interactive at the level of the web server, where you're going to click on a link. You're going to click on something else. There's links all over the website, which is nothing if not interactivity. And so the question becomes, where do you put the interactivity? Even if you're being incredibly old school and you're going to have a book of maps, your map doesn't move when you look at it. But then you think, oh, I wonder if that's the same in this other country. And you turn the page to the other country.
00:22:07
Speaker
You interact with the gadget which contains the static viz. So I've become quite comfortable with letting the interactivity be at the level of the web server, especially because it builds on a lot of existing technologies. My interest technologically
00:22:34
Speaker
is quite minimal. I appreciate very much that other people have made any number of tools, but I don't have a particular interest in the technology as such, and so I prefer to use the simpler technologies whenever I can. And for me, there's another angle too, a more craven angle, if you will, which is that my
00:22:57
Speaker
in terms of getting customers, in terms of getting engagement, in terms of forming a community, to put it in a slightly nicer light, it's valuable to me that all my work be shareable. And so one of the things about having just PNG outputs for stuff, for almost everything, is that other people, or more to the point me, can click on something, paste it into a tweet, put some sort of comment linking it topically to whatever is going on,
00:23:26
Speaker
in whatever conversation, in whatever sphere. And then people can digest it right away. And you don't have to, I mean, of course you can take screenshots of stuff too. But there's something about, something that is going back to what we were talking about earlier, something about the presentation of the thing as static. If you take a screenshot of something that's interactive, even if it's your own thing, even if it's curated as carefully as you like, there's something not quite final about it that
00:23:50
Speaker
encourages or permits your readers to look at it as a little bit more transient. But if you've written something, even if you've written something in code, you know, it's generating this for a vegetable, it's generating that for present stuff, it's generating this for all the players, you know, even if you haven't gone over them one by one with a little hammer to say, you know, this is exactly the way I want it, it still has that finished quality that comes across in the grain of the material, if you like. It feels slightly abstract to say that, but, you know, about a
00:24:20
Speaker
what's just a set of pixels like any other pixels. But you still get that, like, oh, this feels like this, this feels like that, where if you made the table out of wood, it's different than if you made it out of them. Yeah. Yeah, I think that's right in a lot of levels. I appreciate your point about what is the level of interactivity. Is it two separate images that are layered? Is it a click? Is it a filter? But it is a movie at the end of the day, right? It's a set of static images stitched together.
00:24:47
Speaker
I wanted to get into some more data questions, but go ahead. I don't know if you had another thing to add. I did, just you using the word movie made me think about it. There's a third angle too about interactivity versus staticness that I really like specifically for data viz about sports, because the sport itself is never pictures.
00:25:07
Speaker
The sport itself is always moving. It's extremely interactive. It's unavoidably happening through time. The space is very deliberately constrained. Every sport has its court
00:25:21
Speaker
that you play in, not out of, and also fairly tight times. But the time element is precisely the one which is difficult to deal with. The space element you deal with in a pretty straightforward way, you take, it is just a matter of scale for every sport. You want to make a picture of the rink? You do, and the pictures that I put out are five inches high. You could write it down exactly how much smaller it is than a regular rink. It's just compression in space, and it's self-similar.
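The compression in space described above is a single scale factor, and it can be written down exactly. A regulation NHL rink is 200 feet long and 85 feet wide. The sketch below assumes, purely for illustration, that "five inches high" means the 85-foot width is drawn five inches tall; the names `scale` and `rink_to_image` are hypothetical.

```python
RINK_LENGTH_FT = 200   # regulation NHL rink length
RINK_WIDTH_FT = 85     # regulation NHL rink width
IMAGE_HEIGHT_IN = 5    # assumed: the rink's width is drawn 5 inches tall

# One image inch stands for `scale` real inches.
scale = RINK_WIDTH_FT * 12 / IMAGE_HEIGHT_IN

# The same factor applies along the length, so proportions are preserved.
image_length_in = RINK_LENGTH_FT * 12 / scale


def rink_to_image(x_ft, y_ft):
    """Map rink coordinates in feet to image coordinates in inches."""
    return (x_ft * 12 / scale, y_ft * 12 / scale)


print(f"Scale 1:{scale:.0f}, image is {image_length_in:.2f} in long")
```

Under that assumption the picture is a 1:204 reduction, and because the same factor is applied in both directions the drawing stays self-similar to the real ice surface.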
00:25:49
Speaker
You know, you try to keep the proportions the same, you mark them in your viz the same way, or mostly the same way, that you mark it on the ice, and for the same reason, so that people know where they are. But the compression in time is really fundamentally different, where a hockey game lasts an hour and a bit of game time, two or three hours of real-life time. But the viz does not last in time. It's compressed, and that compression, I think, is extremely important.
00:26:19
Speaker
That's part of what it is, I think, that the creator has to do, where in fact, frequently you're compressing not just a single game, you're compressing entire seasons, maybe entire careers, maybe multiple careers so that you can put them together in space. And that to me is the really fundamental aspect of what data viz is at all. And it really comes through strongly when you're doing sports stuff: to take variation in time and turn it into variation in space instead.
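Turning variation in time into variation in space can be illustrated with the simplest possible case: taking time-stamped shots and counting them per location, so that the clock disappears and only the spatial pattern remains. This is a sketch under stated assumptions, not how the shot maps on the site are computed: the tuple layout, the 10-foot cell size, the function name `bin_shots`, and the sample shots are all made up.

```python
from collections import Counter


def bin_shots(shots, cell_ft=10):
    """Count shots per square grid cell, discarding the time stamp.

    shots: iterable of (game_seconds, x_ft, y_ft) tuples (hypothetical layout).
    Returns a Counter keyed by (column, row) cell indices.
    """
    counts = Counter()
    for _seconds, x, y in shots:
        # The time of the shot is deliberately dropped here.
        counts[(int(x // cell_ft), int(y // cell_ft))] += 1
    return counts


# Made-up shots spread across a whole game.
shots = [
    (35, 172.0, 40.0),
    (610, 175.5, 44.0),
    (2900, 120.0, 12.0),
    (3550, 171.0, 47.5),
]
grid = bin_shots(shots)
```

Three shots taken almost an hour of game time apart end up in the same cell, which is exactly the compression being described: the same function works unchanged whether the input is one game, a season, or a career.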

Storytelling in Sports Data

00:26:45
Speaker
And so if you say, well, we'll just let the user click through this, you know, you're not doing all of that compression in time, which I think is, is the sort of first part of the job description. Right. Right. You are giving them a snapshot of two hours of their, of their life. And, and it's not just like, you don't want to, this is part of why people say, Oh, you know, you should just watch the games where you get all this pushback from, from typical people, because of course they are putting in that.
00:27:15
Speaker
Right. And a lot of the pushback you get about, you know, having control or having power over the sport is because of you're getting pushback from people who are investing their entire lives. Yeah. And right there, you have, and of course, in my own way, I'm investing my entire life. I mean, this is my, like, professional career in its own way. Right. But it's not the same level. And it's in particular, you know, you get that extra level when players, ex-players especially,
00:27:43
Speaker
you know, can be a little bit twitchy about this as well, because they've invested a lot, not just the playing time, in addition to, you know, the watching time that somebody might, but also the physical pain, the trouble, the exercise, the physical attributes, et cetera. Yeah. And so that, but I think like so far from trying to fight anybody who criticizes you on those terms, I think you have to accentuate the differences and say, we are not replaying this game.
00:28:08
Speaker
You know, if you want to watch the game, you can watch the game. We are absolutely, and you can hear that I'm using the sort of Royal mathematical we, right? The author and the reader together. We are compressing the game. And that's deliberate because I am showing you in a specific restricted context what matters and I am taking out what doesn't matter. And that requires a knowledge about the subject matter. And this is true for every, for every vis, right? It's not just for sports. It just happens to be more vivid for sports.
00:28:38
Speaker
because you have this dichotomy of who currently has a lot of power and who is gaining power at their expense. But that aspect of we are condensing this, these things are being removed and these things remain and the process is completed, I think is really important to not shy away from.
00:28:59
Speaker
Yeah, it's an interesting way to think about, I mean, it is in a lot of ways real-time data, right? I mean, it's not stock market data, but it is real-time data and try to collapse that down. And it is always something that bothers me when I am stressed out watching my team playing, someone who's not in the sport says, oh, don't worry about it. It's like, no, I've invested my time and my energy. You just don't understand.
00:29:20
Speaker
No, I intend to worry about it. Yeah, that's right. So, as we've been talking, I originally reached out to you because I was going through some of your shot visualizations. So for folks who haven't seen these these are more or less heat maps of where a team or a player shoots on the ice, and
00:29:41
Speaker
And I reached out because I was curious about having just data that's just where players are on the ice, because I have a hypothesis, which I'll get through quickly, but I do have a question coming. I have a hypothesis that there's a lot of play happening on the side of the ice that's opposite from the benches.
00:30:01
Speaker
And so there's a lot of play that we don't see because cameras tend to face the benches. And so that was why I reached out and you were telling me that the data kind of doesn't really exist. And so I'm curious, what data doesn't exist that you wish you had? Well, that data in particular that you mentioned is in this most maddening middle ground where it does exist. I just don't get to have it. Ah, okay.
00:30:26
Speaker
And even that, of course, is reasonably new. If you ask the same question a few years ago, then it simply doesn't exist at all. And so, but player and puck tracking data has been available internally to teams for a couple of years now. Okay, I see. Well, and I happen to know just because of some connections with the industry that some teams are dealing with it quite well, and some teams are struggling. Because the data, you know, in fact, every now and again, I certainly get
00:30:54
Speaker
my eyes full of stars when I think about what I could do with it, but it would be, you know, among other things, I would have to take out some sort of loans and get ahold of a team of people. And it's one thing to do everything yourself and already it's all I can do for a full-time job to actually make what I'm already making, even though it's essentially all automated. But that kind of granularity of that kind of data where every player and the puck is
00:31:18
Speaker
on the ice with however many times a second resolution you're looking at, all of a sudden you don't just need a rented computer and a half decent command of Python, you need a team of professionals. And not all hockey teams do. In fact, many even professional teams are run on what look like shoestring budgets once you take out player salaries. And so there'll be some movement from that as over the next bunch of years, but it's going to be quite slow because
00:31:49
Speaker
that kind of data hasn't been broadly used in the space.

Comparing Hockey Analytics

00:31:56
Speaker
Right. I mean, it's interesting. Do you, do you think hockey in general, in terms of the analytics is behind other major, we'll just stick to North American sports like baseball and football?
00:32:08
Speaker
It's behind all four. It's the fourth of the big four. And arguably also behind how things are going in soccer. Although soccer has its own history with a lot of important established people who don't like data-driven approaches. But now and again, I referee basketball papers.
00:32:28
Speaker
you know, statistical papers when they're submitted to journals, and it is evident that they are miles ahead. Interesting. And baseball, of course, is its own little curiosity because the sport is so different from the others. Right. Because it has a data history that goes back very, very far, you know, connects to cricket. And in some sense, baseball has always been this place where a particular kind of statistically inclined fan has always gravitated because they're so scrupulous about keeping so much data over such a long history.
00:32:57
Speaker
But it doesn't lend itself to the same kind of techniques because it's not a continuous open play kind of game. Whereas basketball is the one that I look to most closely because it has a lot of formal similarities to hockey. The frequent substitutions, the generally open play, the
00:33:18
Speaker
constant contest for possession. That has a lot of analogs, the restricted space compared to the number of people, where you don't have that territory aspect that you have in the NFL or the CFL. But even there, NFL data is, as a culture within the sport, is miles ahead. Yeah, I mean, I found it really interesting this past season, all of a sudden you were seeing
00:33:46
Speaker
real time replays of a play with the icons of each player moving. And that seemed to be, I mean, even within, I think that was new this past season. So just the growth has been kind of amazing. It would be interesting to see if we could, I mean, I just remember being a kid, like going through box scores in the physical newspaper, which maybe many readers or listeners now don't know what a physical newspaper is.
00:34:16
Speaker
parsing through like the box scores and just whether that's a predictor of people being interested in data and data vis today in your sort of interesting Venn diagram. Well, I was one of these kids too. I definitely pored over the standings too. It's also worth noticing that despite the fact that hockey is really far behind, as I said, the improvements in the broader culture in the last, just say, five years are enormous. Like
00:34:42
Speaker
One of the nice things about having Twitter around is that you can see like what were people arguing about five, six years ago, every now and again, people come up and every now and again, you know, even your own tweets, you think, boy, was that really where I was? Yeah, right. But also it makes you realize that as a culture, as a community, the improvement is enormous, even if we're still in fourth. I don't know if we're gaining on anybody, but like, I don't think of it as a competition. I try to think of it as, is the culture improving?
00:35:10
Speaker
Uh, and ideally quickly. And I think the answer is yes in both cases. Well, it is really interesting because at least here in Washington, obviously, um, the football team, the Commanders, being sold, um, and the leader of what looks to be the team that's buying the Commanders, uh, the guy who's leading that, he owns a part of the Devils, uh, the New Jersey Devils hockey team, and part of the Philadelphia Sixers, uh, basketball team. And so it will be really interesting, I think,
00:35:37
Speaker
I mean, obviously this is behind the scenes and whatever billionaires are able to do, but interesting to see is there sharing going on on the technical piece of these data analytics happening? I mean, he'll own parts of teams, or I don't know if he knows how much he owns of each of these teams, of three major sports in North America and how they're sharing analytics and techniques and processes and data and all that sort of thing.
00:36:03
Speaker
I think that's definitely going to take even more hold. I know that the Kroenke sports group has a handful of sports teams across different leagues. I know that there's also another couple of groups, like the Glazer group that owns a handful of teams, and some of them for sure are at least trying to build out cross-disciplinary
00:36:26
Speaker
technical. Sure. I mean, it's a lot easier said than done. Oh, absolutely. But but to your point, I mean, basketball and hockey are two great examples. I mean, they're, they're similar in, in sort of the underlying structure. And so if you have basketball data, that's real time of where everybody is on the floor at any one particular time, and you do whatever analysis and visualizations with that, and you have the same structural data for hockey, I would suspect that those teams can be talking to one another.
00:36:52
Speaker
This is one of those areas where there's a lot of ability to cross over. You can say something like, especially because you have to deal with a culture in all cases where you have an existing established culture that's not data-driven. And so I was talking earlier about how I turned to this because I'm not particularly
00:37:18
Speaker
moved or convinced by numbers, by which I mean physically, you know, when you write numbers on a page, that doesn't translate into information in my brain easily. And whereas people, you know, traditional owners, coaches, players of major sports are like that only 10 times. And so being able to put this in front of them, where you can make a point quickly and say, look, this is how this helps us win.
00:37:42
Speaker
that that's an enormous gain. And so if you have great, great technical skills, but not a lot of visualizations, specifically, and you have someone else in a team who can help you out with that, you know, that's, this is sort of a front facing version of what you can get out of, Oh, you know, I don't have any database administration abilities. And so I'm going to get a DB tech who actually works in healthcare uses, you know, I'm going to get her to come over and build out
00:38:05
Speaker
a system for me in sports. Right? There, the technology is sufficiently generic that everybody understands that if you can build a database for this, you can build a database for that. But in fact, the vis skills are just as transferable.

Playoff Predictions and Team Analysis

00:38:19
Speaker
And so that's, I think, where some teams are going to find some traction. There's some teams of teams, if you like. Right. Teams of teams. Yeah, absolutely. Okay. We've talked about the data and your process for grabbing it and cleaning it. And we talked about the vis. Let's talk about some hockey to wrap this up. So we are now, I think,
00:38:37
Speaker
we're one game into each of the second-round series. So where we stand today, so it's early May, what are your models saying right now? Who's the favorite to win in both conferences? What do you have?
00:38:52
Speaker
All up, I favor the Carolina Hurricanes. Okay. Um, they're one up in their current series, but I favored them even before the second round started. In fact, even before the first round started. Uh, and in the Western Conference, uh, I prefer Dallas over all the others, which is a slightly unusual choice. In fact, both of them are a little bit unusual. Yeah. The Hurricanes especially have a very unusual style.
00:39:11
Speaker
The most important thing is to shoot the puck well, and that's what they do not do. They do all of the other things extremely well. They just have the puck all the time and they're very good at turning it into shots. And the goals sort of come out eventually on the side. Like if you press enough olives, you'll get a certain amount of oil leaking out the side of the tank.
00:39:35
Speaker
What about Dallas? I'm curious about Dallas. So Dallas is a more conventional construction, actually. But there's something about being in the southern part of the United States and also being out West. West here is relative to Eastern time. And well, and of course, West is a question of when you play, and they do play late at night, to the chagrin of a lot of local fans, actually. That somehow manages to make you fly under some radar a little bit.
00:40:06
Speaker
And they have a fairly devoted, but quite small fan base. Also, you know, teams which have big fan bases, like the Leafs, you know, they can't do anything. You can't change the sheets on a bed in a hotel without people making a big deal about it. Right. Whereas, you know, however relevant it might be, whereas in Dallas, you can assemble an extremely strong roster and have only a handful of people notice. Yeah. Yeah. That's interesting. Well, so one of the greatest examples is, um, their best forward, Jason Robertson,
00:40:33
Speaker
is, I think, very nearly as good as somebody like McDavid, who has an incredibly high profile. Right. And I mean, very deservedly so; he's one of the best to ever play the game. But Robertson is only a little bit off, and yet people don't think of him as even remotely comparable. And, and the market, so it's just one player, you know, they have a whole stable of great players, but they're much more traditional, you know, they have a great goalie, they have great team defense, they have a handful of good shooters.
00:41:01
Speaker
That's sort of the standard blueprint, but somehow people didn't notice them doing it.
00:41:06
Speaker
Yeah. So, so I'm curious about McDavid. I mean, I, uh, my Caps are out. So, but I'm kind of rooting for Edmonton primarily because I just like watching him play. Um, so kind of a two-parter. So, so one, why doesn't Edmonton go further every year with Draisaitl and McDavid? I just feel like they would go further every year. And the other, is it just like a playoff problem? Like what, what, what, what is it?
00:41:33
Speaker
I don't think it's a playoff problem specifically. I do think it's a dysfunction in the team as a management structure problem where they've been unable to surround those two with the depth that is required because of their previous bad decisions. They here actually covers more people than just the current managers because
00:41:59
Speaker
you can have these decisions where you, you know, in a hard cap, you commit to a particular player for a really long time. And then even if you buy them out, the buyout itself leaves marks and the shadows can be cast for a really long time. And if you have poor management choices like that, you can hamstring even the very best players. Part of it too, of course, is that, you know, we were talking about one of the differences
00:42:23
Speaker
in analytics between hockey and the other major sports. You also have differences in the game structure itself, where individual brilliance does not count for as much in hockey as it does in all of the other sports, where the position structure in football is so incredibly disparate.
00:42:45
Speaker
where you need to have a quarterback of a particular quality in order to go a particular distance. And if you do, you probably will, even if your other players are only so-so. And there's no analog to that in hockey, even goalie, which is the closest you can get, just doesn't have that same amount of leverage. And then basketball, which of course has no
00:43:05
Speaker
It's much, much flatter positional structure. Even though there's technical positions, of course. But the difference between the five different positions on the court are so small. Minuscule compared to NFL. But there, the time pattern is so different, where the best players play such an incredibly large fraction of the game.

Individual Brilliance vs. Team Success

00:43:28
Speaker
But in hockey, you just can't go so fast. No humans, no matter how herculean, have the oxygen to supply their muscles to skate like that for much longer than they're already skating. And so that really limits how much you can do. McDavid is a physical freak. He can play for 30 minutes a night. That's still only half the game. And so you can't...
00:43:55
Speaker
cannot squeeze a single player, or in this case, two incredible players for that much. Yeah. Yeah. I had the, I was fortunate enough to be able to see Edmonton here in DC in November, one of the rare games where we won. And I mean, just watching McDavid play is just, is just a wonder. I mean, he is, he is incredible. He's confounding statistically. He's one of these players who's so good where, where you look at
00:44:26
Speaker
you know, at outputs of graphs and think, well, I did that wrong. And then you, you know, you go back and you look and you're saying, no, I just have to move those axes. Yeah. Do you think he is right now the, or one of the most, I mean, certainly one of, do you think he is the most dominant individual athlete in a team sport in, let's say, North America, of the big four we've been talking about?
00:44:49
Speaker
I know so little about the specific identities of anybody else who could be. If it's not him, who might it be? And I don't have the list on my fingertips of who it could be. I feel like structurally, the constraints we were talking about before mean that he can't be
00:45:08
Speaker
Just because he's a hockey player, that no hockey player, you know, sort of like Zeus himself on skates. Right. Could ever have that kind of role. Yeah. And I, you know, I get the, there's something very pure about just saying, you know, I love watching this guy play because it's great. And there's plenty of players who I have loved in the same sort of way. I think Karlsson has occupied a similar kind of position
00:45:29
Speaker
mentally and spiritually for a lot of Senators fans and then Sharks fans over the years. Yeah. This year in particular, the Sharks team were dreadful, and yeah, there's plenty of Sharks fans who say, well, you know, it was all still worth it because I got to watch Erik Karlsson have a historic season. Right. Right. And you know, but that's understood, that you can have a player
00:45:49
Speaker
doing titanic things and still lose. Yes. In fact, it's reasonably common. And certainly you watched Ovechkin take better... I mean, obviously he got that one Cup with that one team, but he took better teams, yeah, uh, considerably shorter distances in the past.

Future of NHL Teams

00:46:03
Speaker
The Presidents' Trophy curse, right? I mean, it's, we saw it this year. Um, well, you don't, you know, nothing is promised. Nothing is written, as they say. Right.
00:46:15
Speaker
So I want to ask two more. You already mentioned it; I was going to ask you who the most underrated offensive player is. I think you said it is Robertson. Robertson would be your guy.
00:46:25
Speaker
Yeah, I think so. I try not to talk too much about underrated and overrated because it gets a little snipey about, you know, who knows stuff and who doesn't. Oh, yeah, sure. Sure, sure, sure. But I try and think about it in the sense of like, you know, do I have like a bottle of wine that's really good and it's only 18 bucks, and I can only give you a glass, but I can say, oh, by the way, it's only 18 bucks, and then you will enjoy how great that is. I sort of try to take it in this positive spirit, if I can.
00:46:48
Speaker
Yeah, okay. So let me rephrase then my last question for you: which team that's not in the playoffs this year do you think has the brightest future in the next couple of years? I think Ottawa certainly tracks. Um, Buffalo, I think, also looks really bright. Yeah, they've acquired a number of
00:47:08
Speaker
Both of those teams have a number of extremely good players who are not just coming into their prime, but still a couple of years off it. You know, where you say, if you're this good now at 21, at 22, what are you going to be like when you're 23, 24? Those are the two that really stick out. And of course, it's part of the fun is that both of those teams are together and they both just missed the playoffs this year. And they're
00:47:34
Speaker
they're together in the same division. And their division is, at least this year, was the strongest one in the sport. So even if they do improve the way that I expect they both will, it's going to be a knife fight
00:47:48
Speaker
just to even make the playoffs. And in fact, probably one or both of them are going to lose in the first round of the playoffs just because they might have to play one another. Right. Right. That's right. It's part of part of the business fun. Yeah. Yeah. Also kind of nerve wracking. It's quite possible to be, you know, to have a dominant season and to simply have to play your best opponent right away. Right away. Yeah, absolutely.
00:48:11
Speaker
Micah, this is great. Thank you so much for coming on the show. I really appreciate it. And I'll look forward to seeing where we end up in the next few weeks over the playoffs. Absolutely. Thanks for having me, Jon.
00:48:23
Speaker
And thanks to everyone for tuning in to this week's episode of the show. I hope you'll check out Micah's website and learn a lot about hockey. And I hope you're enjoying the hockey playoffs, of course. And if you haven't yet checked out my new book, Data Visualization in Excel, it is available on Amazon, at the Routledge publisher's site, and wherever you get your books.
00:48:42
Speaker
It's a step-by-step guide of how to create more than 20 different non-standard graphs in Excel, from heat maps to mosaic charts to strip plots and rain cloud plots and dot plots and slope charts. So I hope you'll check it out. Let me know what you think. There are downloadable Excel files that go along with it that you can use in your own work. So I hope you enjoyed this episode of the show. I hope you're enjoying the NHL playoffs. And so until next time, this has been the PolicyViz Podcast. Thanks so much for listening.
00:49:12
Speaker
A number of people help bring you the PolicyViz Podcast. Music is provided by the NRIs, audio editing is provided by Ken Skaggs, design and promotion is created with assistance from Sharon Sotsky Remirez, and each episode is transcribed by Jenny Transcription Services. If you'd like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify, YouTube, or wherever you get your podcasts.
00:49:34
Speaker
The PolicyViz Podcast is ad-free and supported by listeners. If you'd like to help support the show financially, please visit our PayPal page or our Patreon page at patreon.com/policyviz.