Intro to Hockey Analytics Q&A

[text_output]**This post was written by guest contributor and Patreon supporter Tyler McGillick, who interviewed Blueshirts Breakaway site editor and analytics writer Drew Way. You can find Tyler on Twitter @TylerRichard93**[/text_output][text_output]Hey Blueshirts Breakaway readers. My name is Tyler McGillick, and I am a 24-year-old Rangers fan living in Frenchtown, New Jersey, right next to the Pennsylvania border. I have previously written here about the Rangers letting Alain Vigneault go and how the team should proceed. I’ve been a Rangers fan since I was 18, when a friend I was going to high school with at the time turned on a game in mid-March of 2012. That turned out to be the game where Stu Bickel, Brandon Prust, and Mike Rupp squared off at the start of the game against the New Jersey Devils’ Ryan Carter, Eric Boulton, and Cam Janssen. It was entertaining, exciting, and ended up being a pretty good game, and I’ve been a hockey fan ever since.

Fast-forward six years into my fandom, and my love for the game has entered new realms. I used to be very naïve about the analytical approach to the game, with stats like Corsi, Fenwick, and the like. Before that, my analysis of the game rivaled that of the likes of Jeremy Roenick, Keith Jones, and even, in some instances that I almost gag at the thought of now, Mike Milbury. My perception of hockey was rooted in intangibles, gut feelings, and just flat-out luck of the draw. I had no idea these analytics resources were out there until I started using Twitter this past year and following people who use them to dissect the game in ways I had no clue were possible without being part of a team. I started to ask myself why these people were not being taken seriously: they were making what I felt were bold predictions, yet they were often right about the outcomes. From that point on, I started paying closer attention to “old school” analysis again and realized that those analysts’ perceptions were not only outdated, but flat-out lazy. Some very popular analysts with very large platforms were drawing sweeping conclusions about players after only a few “eye tests,” with little insight into the full context of the player.

Of course, I still believe the eye test is very important, but for all the reasons I just described, I’ve begun trying to learn more about analytics. In this article, Blueshirts Breakaway site editor Drew Way joins me for a little Q&A banter about analytics, so I can keep learning about how they are used in the NHL.[/text_output][text_output]Tyler McGillick: So Drew, I guess my first question, above all, is: why analytics? Why do these analysts insist that intangibles are the end-all-be-all when this data is freely available at their disposal?

Drew Way: That’s a good question. To be honest, the answer to your exact question is that I don’t know, haha. I don’t really know what is going on in the minds of the Pierre McGuires and Mike Milburys of the world, who constantly preach about character and intangibles and then go out of their way to admonish analytics.

What I can say, however, is that there are a few key reasons why I personally value statistical analysis, and why I have spent countless hours of my life learning about advanced hockey stats, studying them, and looking for ways to communicate them to hockey fans in an easily digestible manner.

Before I begin, however, I want to get one thing out of the way: I DO NOT think that statistics are the end-all-be-all of hockey analysis. I DO NOT think that all you need is a spreadsheet of data to tell the full story of who a player is, how good a team is, or what is happening in a game/series/season. I believe that statistics are an extremely valuable tool for player/team analysis, but they are one of multiple tools, which also include “the eye test.” We currently have a ton of data at our fingertips, but you also need to watch the games to understand the full picture. In my opinion, analytics and the eye test go hand-in-hand and should be used together for evaluation purposes.

With that caveat out of the way, one major reason I value statistics is that numbers are objective, whereas humans are extremely subjective. Numbers don’t lie, but humans do. Humans can certainly manipulate data, using only certain statistics and ignoring others to paint the precise picture they would like you to see; but that’s not the numbers lying, that’s a human using numbers to lie, which is an important distinction. All of us are prone to confirmation bias in our eye tests; even the best scouts in the world have their biases. To have a very strong eye test, you need to be aware of these biases, which is MUCH easier said than done. Just go on Twitter during any game if you want a good example of what I am talking about: you can have 100 fans all see the exact same play and come away with 100 different interpretations of what happened.

Also, and admittedly I’ve been guilty of this, fans will often see what they want to see in a player, which influences their eye test. I remember that earlier this year, the Rangers were cycling the puck in the offensive zone, and Buchnevich peeled off the boards and went to an open area of the ice just above the slot. Shortly after this, the opponent (I believe it was Washington) won possession of the puck and cleanly exited the zone. I’m a Buchnevich fan, and when I watched that play I thought, “smart play by Buchnevich; it looked like the Rangers could win the board battle, and then Buch would be all alone for a prime scoring chance.” However, a large segment of the fanbase saw that play and interpreted it as Buchnevich being lazy or soft. Same play, two completely different interpretations.

Another reason I like using advanced stats is that they help you learn about other teams and players. I said before that you shouldn’t rely ONLY on statistics for evaluation, but the fact of the matter is that nobody has time to watch every game of every team. Using the right statistics, however, you can understand a myriad of factors about other teams and players, including how a coach is deploying his players, who has the largest impact on high-quality scoring chances, which players do the best job of limiting opponent scoring chances, and much more. Sure, I try to watch as much hockey as possible, but let’s be real: I have a wife, a kid on the way, a full-time job, and a life outside of work and hockey. There are only so many games I can watch. The data helps fill some of the gaps, especially for the West Coast teams I unfortunately don’t get to watch as much as I would like.

Further, when you use statistics, you are often working with a much larger sample size than your eye-test viewings. Obviously most of you reading this watch most Rangers games, but how often do you really see other teams? For most of you with a life, MAYBE you see one or two other teams 10 times in a season. The fact of the matter is that 10 games is an extremely small sample size; it is not nearly large enough to support the sweeping opinions many people form about players after a few viewings. You may have seen an opposing player three times, and he looked awful in all three. But these players have lives outside of hockey. Maybe they were sick, maybe their kid is sick, maybe they got into a fight with their wife, maybe they are playing through an injury. All of these things can greatly influence how a player performs in a game, and all of them are likely unknown to us viewers. If we are only using the eye test, however, those few games are all we will know of the player, and that could lead us to assume he is much worse than he actually is.
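To put a rough number on the sample-size point, here is a minimal sketch that uses the binomial standard error of an observed shooting percentage. The shooter’s “true” 10% rate and the shot counts (roughly 3 shots a game, so about 30 shots over 10 games versus about 246 over 82) are hypothetical assumptions chosen only to show how the uncertainty shrinks as the sample grows.

```python
import math


def shooting_pct_standard_error(true_pct: float, shots: int) -> float:
    """Binomial standard error of an observed shooting percentage.

    true_pct is the player's assumed underlying rate (0 to 1) and shots is
    the number of shots observed. Fewer shots means a larger error.
    """
    if shots <= 0:
        raise ValueError("shots must be positive")
    return math.sqrt(true_pct * (1.0 - true_pct) / shots)


if __name__ == "__main__":
    # Hypothetical shooter whose true shooting rate is 10%.
    for games, shots in [(10, 30), (82, 246)]:
        se = shooting_pct_standard_error(0.10, shots)
        # A rough 95% range is about +/- 1.96 standard errors.
        print(f"{games} games: 10% +/- {1.96 * se * 100:.1f} percentage points")
```

Over about 10 games, a true 10% shooter could plausibly post anything from near 0% to about 21%; over a full season the rough range narrows to about 6% to 14%. This is why a handful of viewings says so little on its own.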

Advanced statistics can also help account for factors that can greatly influence how a player looks in both the eye test and the box score, such as how a coach deploys him. It is very easy for a player’s point totals to be inflated by coaching deployment: a player who gets a lot of power play time and is often deployed in the offensive zone can rack up points more easily than a player who is more often used in the defensive zone and on the penalty kill and never gets power play time. Further, a player who is paired with other talented players will, more likely than not, look better himself than a guy who is stuck on a checking line with linemates who lack playmaking ability. A number of contextual stats help us understand player deployment, ranging from the quality of teammates a player usually plays with to the situations in which he is most often deployed.
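Zone starts are one of the simplest deployment stats. Below is a minimal sketch of one common definition, offensive zone start percentage. It counts only shifts that begin with an offensive- or defensive-zone faceoff and ignores neutral-zone and on-the-fly starts. Sites differ in the details, so treat this formula as an assumption rather than any particular site’s method. The shift counts in the example are hypothetical.

```python
def offensive_zone_start_pct(oz_starts: int, dz_starts: int) -> float:
    """Percentage of a player's offensive- plus defensive-zone starts that
    begin in the offensive zone (neutral-zone starts are excluded)."""
    total = oz_starts + dz_starts
    if total == 0:
        raise ValueError("player has no offensive or defensive zone starts")
    return 100.0 * oz_starts / total


if __name__ == "__main__":
    # Hypothetical sheltered scorer vs. hypothetical defensive specialist.
    print(offensive_zone_start_pct(300, 200))  # 60.0
    print(offensive_zone_start_pct(150, 350))  # 30.0
```

A player at 60% starts far more shifts near the opponent’s net than one at 30%. That is exactly the kind of context that can inflate or deflate raw point totals.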

Lastly, but most importantly, I just enjoy working with numbers. My whole life I’ve been a statistics nerd. As a kid, my father used to show me off to his friends, because I could remember all sorts of player stats across the sports I follow (basketball, football, hockey and baseball). I started helping my dad with his fantasy football team when I was six years old; at the first draft I attended with him, our first two picks were Thurman Thomas and a young Brett Favre. At the end of the day, we follow hockey for entertainment, because we enjoy it, and studying statistics has always greatly added to my enjoyment of sports. Everyone is entitled to enjoy the great sport of hockey however they please, and advanced stats are one of the things that add to my enjoyment of it.[/text_output][text_output]TM: You bring up sample size. That has to be the most important factor in this whole “eye test” vs. analytics debate. I have personally heard remarks on the air about how analytics can’t track certain intangibles like “leadership,” individual compete level, passion, or confidence. Don’t get me wrong: as in any workplace, you need leaders to help keep things trending upward and, for lack of a better term, to teach. Those intangibles can certainly affect a player and throw off his game here and there, like you said, but you nailed it: over the course of a season, a proper sample size shows who can sustain their play year after year. What you see on the ice, on top of what the tracked data shows, can only strengthen your analysis of a player. So, when studying the game from an analytical point of view, which data tends to carry more value among people who track metrics: defensive or offensive metrics?

DW: It’s a good question, and I don’t believe there is a single answer to it. A number of analytically inclined hockey writers and hockey statisticians specialize in a particular part of the game. For example, Corey Sznajder (whom I would recommend any hockey fan interested in analytics follow) has a fantastic tracking project focused on a few key microstats such as passing (pass types, success rates, etc.), zone entries and zone exits. Obviously, these stats don’t give you the full picture (nor do they intend to), but they serve as another great tool to round out your analysis of a player.

In more general terms, though, I would say that stats that have been proven to carry more “predictive value” tend to gain the most traction in the analytics community. Corsi (shot attempts), for example, was originally created by goalie coach Jim Corsi as a way to measure how much work his goalies were doing each game, on the reasoning that a goalie has to react to every shot attempt, regardless of whether it hits the net or is blocked. However, a number of studies by statisticians much smarter than myself have proven that Corsi carries more predictive value than goals. By this I mean that there is a stronger correlation between Corsi and future team success than there is between goals and future team success. This isn’t a subjective debate; it has been proven beyond any shadow of a doubt multiple times. Unfortunately, when people hear this, they misinterpret it to mean that Corsi is all that matters. That couldn’t be further from the truth. Corsi is a BETTER predictor of future success than goals, but it is obviously not the best, or the only, predictor of future success.
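To make these ideas concrete, here is a minimal sketch of three concepts from this answer and the next. Corsi counts every shot attempt equally. Expected goals, which Drew explains next, instead sums a per-shot goal probability over unblocked attempts. “Predictive value” is typically checked by correlating a metric from one set of games with results from a later set. The shot lists and probabilities below are invented for illustration. A real xG model would estimate each probability from shot attributes such as distance and angle. `pearson_r` is a plain Pearson correlation, not the exact methodology of the studies mentioned above.

```python
import math
from typing import List, Sequence, Tuple

# A shot attempt as (blocked, goal_probability). The probabilities would come
# from an expected goals model; the values used below are invented.
Shot = Tuple[bool, float]


def corsi(shots: Sequence[Shot]) -> int:
    """Corsi counts every shot attempt equally: on goal, missed, or blocked."""
    return len(shots)


def corsi_for_pct(cf: int, ca: int) -> float:
    """Share of all shot attempts taken by one side, as a percentage."""
    return 100.0 * cf / (cf + ca)


def expected_goals(shots: Sequence[Shot]) -> float:
    """Sum of goal probabilities over unblocked attempts only."""
    return sum(p for blocked, p in shots if not blocked)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. early-season CF% vs. later goal share."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented game: team A fires many low-danger attempts,
    # team B takes fewer but higher-quality shots. Blocked attempts get 0.0
    # because expected_goals ignores them anyway.
    team_a: List[Shot] = [(False, 0.03)] * 10 + [(True, 0.0)] * 4
    team_b: List[Shot] = [(False, 0.15)] * 4 + [(True, 0.0)] * 2
    cf, ca = corsi(team_a), corsi(team_b)
    print(f"Team A CF%: {corsi_for_pct(cf, ca):.1f}")
    print(f"xG: A {expected_goals(team_a):.2f}, B {expected_goals(team_b):.2f}")
```

In this made-up game, team A wins the shot-attempt battle 70% to 30% but trails 0.30 to 0.60 in expected goals. That is the quantity-versus-quality gap that xG is designed to capture. To estimate predictive value with real data, you would call `pearson_r` on two lists of team values. For example, each team’s first-half CF% and its second-half goal share (hypothetical inputs you would pull from a stats site).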

Personally, though, I prefer expected goals models to Corsi. If you want the long answer as to why, I urge you to check out the expected goals section of my Hockey Lexicon, which defines all major advanced hockey statistics in an easily digestible manner and provides usage examples for many of the more prominent stats. Long story short, expected goals is a statistic that considers both shot quantity and shot quality to estimate how many goals a team (or player) should have scored, given the quality of scoring chances generated, if the opposing goalie had played at a league-average level. It does this by weighting each unblocked shot attempt according to a variety of shot attributes, with heavier weights applied to characteristics that more often lead to a goal. Also, going back to the concept of predictive value, noted hockey statistician Dawson Sprigings (DTMAboutHeart on Twitter) showed that expected goals is even more predictive than Corsi. The chart below was included in Dawson’s Hockey-Graphs article on the predictive value of expected goals, and as you can see, in terms of predictive value over time, expected goals (xG, the blue line) > Corsi differential (CF%, the red line) > goal differential (GF%, the yellow line).[/text_output]
Chart: predictive value over time of xG, CF% and GF% (Photo Credit: DTMAboutHeart, Hockey-Graphs article, https://hockey-graphs.com/2015/10/01/expected-goals-are-a-better-predictor-of-future-scoring-than-corsi-goals/)
[text_output]TM: You bring up a great point in the Corsi analysis. It seems to be the go-to stat for many who analyze the game from a numbers-based perspective, and there is definitely a reason for that in a game where puck possession is huge and goals are rare. One final question I have regarding analytics is whether the data can eventually be consolidated into one number. There are so many different metrics available (expected goals, Corsi, etc.). To make them more digestible for a newer fan who might not be good at analyzing numbers, or who just doesn’t have the patience to read them (and I’ll be honest, I don’t always have time to read the charts people work tirelessly to produce on players), can these metrics ever be consolidated into one objective analytical statistic? While it is important for fans to remember that these advanced statistics exist, and that more often than not they can help predict outcomes, I do see where people are coming from when the data shown is just flat-out hard to read.

DW: I think there is a short answer and a long answer to this; but of course, knowing me, my short answer will still end up being pretty long, haha.

The short answer is yes, you can consolidate the data into one number that attempts to quantify the overall performance of a player. These are catch-all statistics, which were first popularized in baseball (Wins Above Replacement, or WAR) and then caught on in basketball (Player Efficiency Rating, or PER). Hockey has its own version of WAR, and there are also a number of additional catch-all statistics, like GAR (Goals Above Replacement) and Game Score. If you want a full breakdown of what these stats are and how they are calculated, you can check out my Hockey Lexicon, or you can read the individual articles I wrote on WAR and Game Score. All of these catch-all metrics consider a litany of factors that drive performance, such as shot share, shot quality, shooting performance/scoring, penalties, etc., in order to capture the overall value a player brings to the ice. Each is calculated in a very different manner, however, so while they generally agree on who the handful of best players are (with some outliers, of course), the exact ordering of the players is likely to differ.

And that last sentence gets me to the long answer: while catch-all statistics are available, and I believe they are valuable tools for player analysis, they are most definitely NOT the end-all-be-all of player analysis. In the situation you described, where you might not have the time to consider all the different common metrics, microstats, charts, graphs, etc., you can most certainly use any of these catch-all statistics in a pinch to get a quick sense of approximate player value. However, I will always, always caveat any statistical work I do with the fact that I don’t believe any metric, or set of metrics, exists that allows someone to get a 100% accurate read on a player, team or game.

Obviously, I think stats are extremely important, or else I would not have spent even a fraction of the time I have spent researching them and writing about them. But there is no perfect set of stats out there, and leading hockey statisticians are abundantly aware of this, which is why they are constantly working to refine their methodologies, improve their models and get better data (some data sources are better than others, and the NHL doesn’t have the player tracking data available that the NBA does).

Personally, if you are looking for a go-to catch-all stat, I’d recommend Game Score. Game Score is a catch-all statistic created by Dom Luszczyszyn of The Athletic that quantifies the total value of a player’s productivity in a single game. Dom’s Game Score incorporates the following stats in an attempt to quantify the overall performance of a player: goals, primary assists, secondary assists, shots on goal, blocked shots, penalty differential, faceoffs, 5v5 Corsi differential and 5v5 goal differential. Obviously not all stats carry the same importance, so Dom assigned a weight to each metric, producing the following formula for Game Score:

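Skater Game Score = (0.75 * G) + (0.7 * A1) + (0.55 * A2) + (0.075 * SOG) + (0.05 * BLK) + (0.15 * PD) - (0.15 * PT) + (0.01 * FOW) - (0.01 * FOL) + (0.05 * CF) - (0.05 * CA) + (0.15 * GF) - (0.15 * GA)

where G = goals, A1 = primary assists, A2 = secondary assists, SOG = shots on goal, BLK = blocked shots, PD = penalties drawn, PT = penalties taken, FOW/FOL = faceoffs won/lost, CF/CA = 5v5 shot attempts (Corsi) for/against while the player is on the ice, and GF/GA = 5v5 goals for/against while the player is on the ice.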

The reason I wanted to share the formula with you is not to make your eyes roll into the back of your head; it’s because one of the main reasons I like Game Score so much is its relative simplicity. WAR, GAR and other catch-all stats involve complex regressions and control variables that make them nearly impossible for the average person to compute on their own. However, anyone with a calculator and access to the data in the Game Score model (which is everyone on the planet with internet access; for the shot attempt data, I recommend Corsica or Natural Stat Trick) can calculate Game Score, as it involves roughly middle school-level arithmetic and some patience. Of course, you don’t need to calculate Game Score on your own; Corsica provides the stat for you. But my point is, the stat just makes sense. You can look at the equation, see how it is calculated, and understand why it is calculated that way.
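Because the Game Score formula is plain arithmetic, it translates directly into a few lines of code. The sketch below mirrors the weights in the formula above. The field names are hypothetical labels of my own, not Corsica’s or Dom’s. PD and PT are read as penalties drawn and taken, the on-ice counts are 5v5, and the example stat line is made up.

```python
from dataclasses import dataclass


@dataclass
class SkaterGame:
    """One skater's counts for a single game (hypothetical field names)."""
    goals: int = 0
    primary_assists: int = 0
    secondary_assists: int = 0
    shots_on_goal: int = 0
    blocked_shots: int = 0
    penalties_drawn: int = 0
    penalties_taken: int = 0
    faceoffs_won: int = 0
    faceoffs_lost: int = 0
    cf_5v5: int = 0  # 5v5 shot attempts for while on the ice
    ca_5v5: int = 0  # 5v5 shot attempts against while on the ice
    gf_5v5: int = 0  # 5v5 goals for while on the ice
    ga_5v5: int = 0  # 5v5 goals against while on the ice


def game_score(g: SkaterGame) -> float:
    """Weighted sum using the Skater Game Score weights quoted in the article."""
    return (0.75 * g.goals
            + 0.7 * g.primary_assists
            + 0.55 * g.secondary_assists
            + 0.075 * g.shots_on_goal
            + 0.05 * g.blocked_shots
            + 0.15 * g.penalties_drawn
            - 0.15 * g.penalties_taken
            + 0.01 * g.faceoffs_won
            - 0.01 * g.faceoffs_lost
            + 0.05 * g.cf_5v5
            - 0.05 * g.ca_5v5
            + 0.15 * g.gf_5v5
            - 0.15 * g.ga_5v5)


if __name__ == "__main__":
    # Made-up night: a goal, a primary assist, 4 shots, a block, a drawn penalty,
    # and a positive 5v5 shot and goal share while on the ice.
    example = SkaterGame(goals=1, primary_assists=1, shots_on_goal=4,
                         blocked_shots=1, penalties_drawn=1,
                         cf_5v5=15, ca_5v5=10, gf_5v5=2, ga_5v5=1)
    print(f"{game_score(example):.2f}")  # 2.35
```

Keeping every input as a named field makes it easy to see which part of a player’s night drove the score. For example, the goal alone is worth 0.75 here, while the +5 shot-attempt differential adds 0.25.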

Game Score has multiple applications, and Dom frequently uses it in his writing to assess player and team performance across single games as well as entire seasons (and everything in between). Like many stats, Game Score can be used as a raw counting stat, or it can be expressed as a rate per 60 minutes of ice time. Lastly, Game Score is an advanced metric that absolutely passes the smell test the vast majority of the time, which is important for getting buy-in from fans who are skeptical of these types of metrics. When you look at the list of Game Score league leaders, it usually aligns remarkably closely with who you’d think should be at the top based on your eye test and the box score stats everyone is familiar with. For example, this year’s league leaders in all-situations Game Score are Connor McDavid, Nikita Kucherov, Claude Giroux, Artemi Panarin and Sidney Crosby.
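Converting any counting stat, Game Score included, to a per-60 rate is a single division by ice time. Here is a minimal sketch with two hypothetical players who score the same number of points in different amounts of ice time.

```python
def per_60(total: float, toi_minutes: float) -> float:
    """Rate per 60 minutes of ice time for any counting stat."""
    if toi_minutes <= 0:
        raise ValueError("ice time must be positive")
    return total * 60.0 / toi_minutes


if __name__ == "__main__":
    # Hypothetical: both players have 30 points, but B played far fewer minutes.
    print(per_60(30, 1200))  # 1.5
    print(per_60(30, 900))   # 2.0
```

Player B produces at a higher rate, which the raw totals hide. The flip side is that rates built on small amounts of ice time are noisy, so per-60 numbers are most useful with a decent sample behind them.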

One last thing I’d like to mention: if you have a little more time and were hoping for a chart as your one-stop shop for quick player analysis, instead of a single metric, I’d recommend checking out Bill Comeau’s awesome SKATR player comparison tool. I rely on this resource a good amount myself when I don’t have the time to dig into all of the raw data. Below is a list of items you need to know in order to use and accurately interpret the chart:

- You can choose the players and the seasons to compare them over from the dropdowns at the top. Note: you cannot compare a forward to a defenseman; you can only compare forwards to forwards and defensemen to defensemen, and a dropdown in the middle of the top lets you select which position pool to choose from.
- All data provided in the chart is 5v5 only, score-adjusted (because teams play differently at various scores, a model adjusts the data to account for this), and per-60 minutes (to account for differences in ice time between two players).
- The image below, which we will use as an example to make this little tutorial easier, compares Chris Kreider (left) to Phil Kessel (right) over the prior two seasons.
- The size of each bar and the number inside it represent percentiles, not raw data. So, in the image below, the Game Score bar lists Kreider as an 87 and Kessel as a 75; that means Kreider is in the 87th percentile of forwards in Game Score, while Kessel is in the 75th percentile.
- The color of the bars also helps depict ranking: the darker the blue, the closer to the 100th percentile; the darker the red, the closer to the 0th percentile.
- The “individual” stats represent the numbers the skater himself accumulated, and include: Game Score, points, goals, primary assists, secondary assists, individual shot attempts, individual expected goals for, shooting percentage and penalty differential.
- The “on-ice” stats represent what a team accumulates while the player is on the ice, and include: Corsi for percentage, relative-to-teammate Corsi for percentage, shot attempts for, shot attempts against, expected goals for percentage, relative-to-teammate expected goals for percentage, expected goals for, and expected goals against. (In my opinion, relative-to-teammate stats portray a player’s ability much more accurately than relative-to-team stats, which are usually what people mean when they cite a “relative” statistic.)
- The “context” stats help you understand how a player is deployed, which provides valuable context for his stat profile, and include: time on ice percentage, quality of competition faced, quality of teammates, and defensive zone starts.
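The percentile bars boil down to ranking one player’s value against everyone in the same position pool. The exact percentile method SKATR uses isn’t described here. The sketch below therefore assumes one simple definition (the share of the pool with a strictly lower value) and a made-up pool of forwards’ Game Score per 60. Both are assumptions.

```python
from bisect import bisect_left
from typing import Sequence


def percentile_rank(value: float, pool: Sequence[float]) -> float:
    """Percentage of the pool whose value is strictly below `value`."""
    if not pool:
        raise ValueError("pool must not be empty")
    ordered = sorted(pool)
    below = bisect_left(ordered, value)  # number of entries < value
    return 100.0 * below / len(ordered)


if __name__ == "__main__":
    # Hypothetical Game Score per 60 for a tiny pool of ten forwards.
    forwards = [0.4, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.3, 1.5, 1.8]
    print(percentile_rank(1.3, forwards))  # 70.0
```

Other tools may use a different tie or endpoint convention, so the same player can land a point or two differently from one site to the next.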

So, using this chart, the biggest takeaway, for me at least, is that during 5v5 play Chris Kreider has been far more impactful than Phil Kessel over the prior two seasons. Kessel does a lot of his damage on the power play and is certainly a more effective power play forward than Kreider. But when you examine even-strength impact, they have similar point production per 60, yet Kreider has a much more significant positive impact on the Rangers in terms of driving possession and high-quality scoring chances than Kessel has on the Penguins. Kreider also typically faced stiffer competition and saw more defensive zone starts, while both played with a similar caliber of teammates, so this isn’t a case of deployment artificially inflating Kreider’s production relative to Kessel’s.[/text_output][image type=”thumbnail” float=”none” src=”2571″ alt=”” href=”” title=”” info_content=”” lightbox_caption=”” id=”” class=”aligncenter” style=””][text_output]TM: Those charts really help paint the full context of who has in fact been the better 5-v-5 player over the past two seasons. Game Score also appears to be a very contextual analytical tool, because in a single game you can see who was impactful versus who was just a passenger, and once you have enough of those games, you can see who can sustain their Game Score. As for the SKATR comparison tool, its charts are very digestible, especially once you learn what each category is. While it may not be the singular number many hope for, it does help paint the full picture of a player.

I’d like to thank Drew for doing this Q&A so that not only you, the reader, but we as well could hopefully learn something new about analytics. We are in a new age of hockey analysis, and it is exciting to see where it is heading. Intangibles will always be part of player analysis, because coaches and management will put value on leadership and other characteristics that just cannot be tracked via data, but it is also great to see hockey organizations hiring people like Kyle Dubas and others who rely primarily on analytics in their analysis. The hockey fan in me is excited about this, because hopefully it means hockey media will soon accept that analytics are not just one tool for analysis, but a vital one. This may be an apples-to-oranges comparison, but there are a lot of similarities between where analytics in hockey stand right now and where sabermetrics stood when they were first really introduced into baseball. The good news for baseball is that sabermetrics are now a must for analysis; more often than not, they are the majority view, whereas in hockey analysis it is still the other way around. As hockey shifts from a grind-it-out, dump-and-chase game to a speed-and-skill game, it will increasingly force not only GMs, but also those who write about hockey, to really look into new ways of analyzing what they are seeing in players.

Once again, a huge thanks to Drew for joining me in this and we hope you enjoyed this as much as we did. Let’s go Rangers![/text_output]

Author: Guest Writer

This contributor is a fan who wanted to contribute an article to blueshirtsbreakaway.com. If you’d like to submit an article to be posted on the site, please send it on over!