Introducing a New Hockey Analysis Resource - EvolvingWild's Website

[text_output]For those of you who may be unfamiliar, I created the Blueshirts Breakaway Hockey Lexicon, which launched in February and is intended to be a one-stop shop for learning about hockey analytics in an intuitive and easily digestible manner. Months of work and years of research went into building my knowledge base and creating the resource, with the primary goal of making the complex and often overwhelming world of hockey advanced stats easier for everyone to understand. My hope is that by providing an avenue for hockey fans to learn about the various concepts, research, statistics and resources available, a few fans will embrace advanced stats more and perhaps find another way to expand their enjoyment of the great sport of hockey.

Given its nature, this resource is not intended to be a one-time information dump that is then left to rot. I am continuously updating it as new research, models, concepts and resources are introduced, and I already have a lengthy list of items to add to the 20,000+ word resource.

With that said, I am happy to announce that I have just added a fantastic new website to the Resources section of the Hockey Lexicon: EvolvingWild’s Website.

The hockey statistician twins, Josh and Luke, who together operate under the Twitter handle EvolvingWild, have been putting out tremendous work for a few years now, most notably creating their own catch-all statistical model called wPAR, an expected goals model, a free agent contract prediction model, teammate relative statistics and more. In fact, I’ve written about some of their work myself for Blueshirts Breakaway: I’ve included information on their work in my Hockey Lexicon, leveraged their contract prediction model in my Rangers RFA contract predictions article and written a summary article about their relative to teammate statistical model.

Personally, Josh and Luke are easily among my favorite hockey statisticians out there, and I hate the nagging feeling in the back of my mind that they will likely be hired by an NHL team at some point soon, which would probably mean the end of their publicly available work. Don’t get me wrong, I would be absolutely thrilled for them if/when they get hired by an NHL team, but selfishly I would really miss getting to see their work.

It goes without saying that when they recently announced the launch of their own hockey statistics website, I was thrilled. I’ve pestered them via Twitter DM numerous times about some of their data, and now it is publicly available to all of us. In this write-up, I want to introduce their fantastic new website to all of you and give you a quick rundown of the data and functionality available within the site, as well as some tips for using the site and interpreting the data. One thing I want to make clear up front is that Josh and Luke are currently working on a number of cool additions to the site, and this is far from a finished product. So read up on what’s available here, check out the site and keep your eyes peeled for the enhancements they’ll be making in the near future.

The website organizes its capabilities across four main menu tabs: RAPM Tables, RAPM Charts, Expected Goals and More. The first three tabs house the data and charts, while the More tab includes information about where the data comes from, links to methodology write-ups about the models used on the site and other shift-by-shift regression models used in hockey, and explanations of some of the information contained in the site.[/text_output][custom_headline type=”left” level=”h5″ looks_like=”h5″ accent=”true” id=”” class=”” style=””]RAPM Tables[/custom_headline][text_output]RAPM stands for “regularized adjusted plus-minus,” and it uses a model very similar to the one Dominic Galamini Jr. uses for his HERO Charts Player Evaluation tool, just with different target variables. If you want to get into the nitty-gritty details of how the model was created and how it isolates various variables to accomplish its goal, the References page on the site (in the More tab) contains links to multiple papers that discuss both NBA and NHL adjusted plus-minus models. Long story short, however, RAPM attempts to isolate a player’s performance and impact, independent of both his teammates and opponents. In other words, RAPM measures a player’s actual contribution to his team in each statistic being discussed (e.g. shot attempts, expected goals and goal differential) by stripping out factors outside of the player’s control, such as the strength of his teammates and the strength of his opponents. This is far superior to the team relative statistics (usually simply referred to as “relative” statistics) that you see thrown around Twitter and other channels, because standard relative metrics simply illustrate how a team does while a player is on the ice compared to when he is off it, which of course can be greatly influenced by the other players on the ice with him.
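To make the idea of isolating a player from his teammates and opponents a bit more concrete, here is a minimal plain-Python sketch of the weighted ridge regression at the heart of RAPM-style models. It is a simplification under stated assumptions, not EvolvingWild’s actual pipeline: each row is one team’s offensive stint, every skater gets one offense column and one defense column, the target is a per-60 rate centered on the minutes-weighted average, rows are weighted by stint length, and the function names `solve` and `rapm`, the toy data and the penalty `lam` are all hypothetical.

```python
# Minimal RAPM-style ridge regression sketch (hypothetical data and names;
# a simplification, not EvolvingWild's actual model).

def solve(a, b):
    """Solve the linear system a @ x = b via Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rapm(stints, players, lam=1.0):
    """Return {player: (offense_coef, defense_coef)} from weighted ridge regression.

    stints: list of (offense_players, defense_players, rate_per_60, minutes).
    """
    cols = [(p, "off") for p in players] + [(p, "def") for p in players]
    idx = {c: i for i, c in enumerate(cols)}
    k = len(cols)
    # Center the target on the minutes-weighted average rate.
    total_min = sum(s[3] for s in stints)
    avg = sum(s[2] * s[3] for s in stints) / total_min
    xtwx = [[0.0] * k for _ in range(k)]  # X^T W X
    xtwy = [0.0] * k                      # X^T W y
    for offense, defense, rate, minutes in stints:
        active = [idx[(p, "off")] for p in offense] + [idx[(p, "def")] for p in defense]
        for i in active:
            xtwy[i] += minutes * (rate - avg)
            for j in active:
                xtwx[i][j] += minutes
    for i in range(k):
        xtwx[i][i] += lam  # ridge penalty shrinks every coefficient toward zero
    beta = solve(xtwx, xtwy)
    return {p: (beta[idx[(p, "off")]], beta[idx[(p, "def")]]) for p in players}


# Toy example: A is on offense for the high-rate stint, C for the low-rate one.
players = ["A", "C"]
stints = [(["A"], ["C"], 4.0, 10.0), (["C"], ["A"], 2.0, 10.0)]
coefs = rapm(stints, players, lam=1.0)
print(coefs)  # A: offense > 0, defense < 0; C is the mirror image
```

In this toy case player A’s defense coefficient comes out negative, meaning fewer events against than average while he is on the ice. The ridge penalty is what puts the “regularized” in RAPM: it pulls all coefficients toward zero, which keeps estimates for players who are almost always deployed together from blowing up.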

RAPM is a complicated statistical model, and I hope I was able to explain what it is and why it is better than standard relative statistics. If you have any questions, please don’t hesitate to reach out to me, and I will do my best to explain it better. But for now, let’s get back to discussing EvolvingWild’s site.

The RAPM Tables tab contains two pages, Skaters (which serves as the website’s landing page) and Teams, which display the player-level and team-level RAPM data within sortable data tables. Both pages provide RAPM data for three key metrics: Corsi (shot attempts), expected goals and goals. Both pages also allow users to view the data in terms of “impact” or “rates” by selecting the appropriate radio button at the top of the screen. The rates tables simply display the per-60 RAPM data for each statistic, either for the team as a whole (Teams page) or for the team while the specific player was on the ice (Skaters page). According to EvolvingWild, “the impact figure is the ‘implied’ total added if the rate is expanded based on a player’s TOI.” In other words, the rate stats are pure per-60 figures, while the impact stats factor in a player’s TOI and use it to show the overall impact the player had on the team across the given timeframe. This is an important addition: as much as we all like to isolate per-60 production, total impact statistics are also vitally important, because a player who produces at a high level while receiving heavy usage is obviously more valuable, and has a larger impact on his team, than a player who produces at a high level but receives minimal ice time.[/text_output][image type=”thumbnail” float=”none” src=”2782″ alt=”” href=”” title=”” info_content=”” lightbox_caption=”” id=”” class=”aligncenter” style=””][text_output]All of the RAPM Tables pages and views break the goals, expected goals and Corsi data down into three individual components: against/defense, for/offense and differential (combining offense and defense). This convenient feature helps users understand which aspects of the game each player or team is strongest in, as well as their overall performance in the metric. The nomenclature varies a bit depending on the view you are on. 
If you are viewing the Skaters or Teams Impact tables, the individual stats are goals for impact, goals against impact, goal plus/minus (differential) impact, expected goals for impact, expected goals against impact, expected goals plus/minus impact, Corsi for impact, Corsi against impact and Corsi plus/minus impact. If you are viewing the Rates tables, all of the data is expressed per 60 minutes instead of as impact, and labeled as such.
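Reading the EvolvingWild quote literally, the impact figure is the per-60 rate scaled up by the player’s ice time, and each plus/minus column is the for value minus the against value. The sketch below assumes exactly that (impact = rate × TOI / 60); the helper names and the numbers are hypothetical, and the site’s own calculation may include details not covered here.

```python
# Hypothetical helpers illustrating the rate -> impact expansion quoted above.

def impact(rate_per_60: float, toi_minutes: float) -> float:
    """Expand a per-60 rate into an implied total over the player's TOI."""
    return rate_per_60 * toi_minutes / 60.0


def plus_minus(for_value: float, against_value: float) -> float:
    """Differential = for - against, so a negative 'against' value raises it."""
    return for_value - against_value


# Two hypothetical players with identical per-60 rates but different usage.
rates = {"xGF": 0.30, "xGA": -0.10}
for name, toi in [("Heavy-usage player", 1200.0), ("Light-usage player", 300.0)]:
    xgf_imp = impact(rates["xGF"], toi)
    xga_imp = impact(rates["xGA"], toi)
    print(name, round(xgf_imp, 2), round(xga_imp, 2), round(plus_minus(xgf_imp, xga_imp), 2))
```

With identical rates, the heavy-usage player ends up with an expected goals plus/minus impact of 8.0 versus 2.0 for the light-usage player, which is exactly the usage effect the impact view is meant to capture.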

Now that you know what data is contained within the RAPM Tables page, let’s discuss how to use the page. Similar to Corsica, the page provides a variety of data filters at the top. Both the Teams and Skaters pages allow users to select the table type (Impact or Rates) from the top and filter the data by a specific team and game strength state. The Teams page also allows users to select a specific season or combination of seasons (dating back to the 2007-2008 season), while the Skaters page allows users to select a timeframe. It should be noted that due to the way the skater data is calculated within the model, the timeframes for skaters are in three-year chunks (e.g. 2015-2018).

The Skaters page also allows users to filter data by skater position and set a minimum time on ice requirement, an important feature given that rate data in small samples is fairly unreliable. By default, the page currently only provides data for players with at least 505 minutes of ice time. After selecting the desired filters, users must click the blue Submit button to generate the view. The Skaters page also includes a search field that dynamically updates the data as a user types in a skater’s name, so you do not have to click Submit to generate the results of a player search. Additional features include the ability to sort any column by clicking its header, pagination controls at the bottom of the page, a dropdown at the top of the page for choosing the number of rows per page, and the ability to click a row to highlight it. Users can also export the data to an Excel file via the Download button at the bottom of the screen.
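For anyone who downloads the Excel export and wants to reproduce the page’s filtering offline, here is a rough sketch of the same steps: a TOI minimum, a position filter, a case-insensitive name search, a column sort and pagination. The `SkaterRow` fields, position codes and player rows are hypothetical stand-ins rather than the export’s real column layout; only the 505-minute default comes from the site as described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SkaterRow:
    name: str
    position: str
    toi: float          # minutes of ice time
    xgpm_impact: float  # expected goals plus/minus impact


def filter_and_sort(rows: List[SkaterRow], min_toi: float = 505.0,
                    position: Optional[str] = None, search: str = "",
                    sort_key: str = "xgpm_impact", descending: bool = True) -> List[SkaterRow]:
    """Apply a TOI minimum, position filter and name search, then sort by one column."""
    kept = [
        r for r in rows
        if r.toi >= min_toi
        and (position is None or r.position == position)
        and search.lower() in r.name.lower()  # case-insensitive substring search
    ]
    return sorted(kept, key=lambda r: getattr(r, sort_key), reverse=descending)


def paginate(rows: list, page: int, per_page: int = 25) -> list:
    """Return one 1-indexed page of rows."""
    start = (page - 1) * per_page
    return rows[start:start + per_page]


rows = [
    SkaterRow("Player A", "F", 2400.0, 5.0),
    SkaterRow("Player B", "D", 300.0, 9.0),   # excluded by a 2,300-minute minimum
    SkaterRow("Player C", "F", 2600.0, 7.5),
]
top = filter_and_sort(rows, min_toi=2300.0)
print([r.name for r in top])
```

The TOI minimum is applied before sorting, so Player B’s eye-catching impact over only 300 minutes never reaches the top of the list. That is the small-sample protection the site’s minimum TOI filter is there to provide.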

The below screenshot depicts all players who played for the Rangers from 2015-2018 and logged at least 2,300 minutes of even-strength ice time over that timeframe. I have selected the Impact view and sorted the data by the xGPM_impact column (expected goals plus/minus impact). I’m sure the top result will frustrate the hell out of Rangers fans, seeing that Eric Staal had the greatest impact on expected goal differential for the teams he played for over the past three seasons. Beyond that, however, I think nobody will be too surprised by the results. Ryan McDonagh finished second by a comfortable margin with an xGPM impact of 17.95, followed by Mika Zibanejad (12.69), Chris Kreider (12.48) and Mats Zuccarello (8.06). One thing to note when you look at the against columns: negative numbers are good. The differential stats are calculated by subtracting the against numbers from the for numbers, so a negative number in the against column adds to the differential number (which you want to be positive). If a player has a negative goals against, Corsi against or expected goals against figure, it means that the player’s performance leads to the team allowing fewer goals, shot attempts or expected goals against.[/text_output][image type=”thumbnail” float=”none” src=”2783″ alt=”” href=”” title=”” info_content=”” lightbox_caption=”” id=”” class=”aligncenter” style=””][custom_headline type=”left” level=”h5″ looks_like=”h5″ accent=”true” id=”” class=”” style=””]RAPM Charts[/custom_headline][text_output]The RAPM Charts tab provides a convenient player comparison tool that allows users to quickly compare how any two players perform across five key even-strength RAPM stats (per-60): goals for, expected goals for, Corsi for, expected goals against and Corsi against. The stat types for each bar are presented across the bottom of the chart, and the bars represent how well each player does in terms of standard deviations from the average player. 
The darker the blue and the higher the bar reaches, the better the player is in the stat; the darker the red and the lower the bar reaches, the worse the player is in the stat. Similar to the data within the Skaters RAPM Tables page, the data is grouped into three-year increments. Users can select the timeframe and the players to compare from dropdowns on the left side of the screen. Currently, only even-strength charts are available, but the tab contains a Powerplay link, which is under construction at the time of writing.
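The bar heights are standard-deviation scores, so a quick way to see how they could be produced is to z-score each stat against a player pool. The sketch below assumes two things the site does not spell out: the comparison pool is every player in the selected timeframe, and for the two “against” stats the sign is flipped so that a taller bar always means better. The function name `sd_scores` and the values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict


def sd_scores(values: Dict[str, float], lower_is_better: bool = False) -> Dict[str, float]:
    """Express each player's value in standard deviations from the group mean.

    For 'against' stats (lower is better) the sign is flipped so that a
    taller, positive bar always means better.
    """
    vals = list(values.values())
    mu = mean(vals)
    sigma = pstdev(vals)  # population SD; must be non-zero
    sign = -1.0 if lower_is_better else 1.0
    return {name: sign * (v - mu) / sigma for name, v in values.items()}


# Hypothetical per-60 RAPM xGA values for a tiny "league" of three players.
xga60 = {"Player X": -0.20, "Player Y": 0.00, "Player Z": 0.20}
print(sd_scores(xga60, lower_is_better=True))
```

Player X allows the fewest expected goals against, so after the sign flip his bar is the tallest positive one, matching the “higher bar is better” reading of the chart.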

The chart below compares Chris Kreider to Artemi Panarin, and Kreider stacks up surprisingly well in this model. The two players perform roughly equally in terms of per-60 goals for RAPM, while Kreider has the upper hand in expected goals for and against, and Panarin has the advantage in Corsi for and against, with his edge in Corsi against being the largest gap across any of the stats.[/text_output][image type=”thumbnail” float=”none” src=”2784″ alt=”” href=”” title=”” info_content=”” lightbox_caption=”” id=”” class=”aligncenter” style=””][custom_headline type=”left” level=”h5″ looks_like=”h5″ accent=”true” id=”” class=”” style=””]Expected Goals[/custom_headline][text_output]The final section of EvolvingWild’s fantastic new site is the Expected Goals tab, which presents their own expected goals (xG) model (which is different from others out there, such as Corsica’s, Matthew Barlowe’s and Cole Anderson’s) at both the skater and team level. EvolvingWild published their methodology and code for their xG model here, and I encourage anyone interested in the details to check it out, not least because they provide excellent video examples at the end that demonstrate various scoring chances and their corresponding expected goal weights. To summarize, though, expected goals models incorporate shot quality information (e.g. type of shot, location of shot, angle the shot was taken from) to weight each unblocked shot attempt (Fenwick event), giving an idea of how many goals a team should have scored given the quality of its shots and league-average goaltending.
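The core mechanic of any xG model fits in a few lines: assign each unblocked shot a goal probability from its features, then add those probabilities up. The sketch below uses a logistic regression with made-up coefficients purely as a stand-in. EvolvingWild’s published model uses its own algorithm and many more features, so the `Shot` fields, coefficient values and function names here are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Shot:
    distance_ft: float  # distance from the net
    angle_deg: float    # angle from the center line
    shot_type: str


# Hypothetical coefficients for illustration only, NOT EvolvingWild's fitted model.
INTERCEPT = -1.0
COEF_DISTANCE = -0.05
COEF_ANGLE = -0.01
SHOT_TYPE_EFFECT = {"wrist": 0.0, "slap": -0.1, "tip": 0.3}


def goal_probability(shot: Shot) -> float:
    """Logistic model: map shot features to a goal probability in (0, 1)."""
    z = (INTERCEPT
         + COEF_DISTANCE * shot.distance_ft
         + COEF_ANGLE * abs(shot.angle_deg)
         + SHOT_TYPE_EFFECT.get(shot.shot_type, 0.0))
    return 1.0 / (1.0 + math.exp(-z))


def expected_goals(shots: Iterable[Shot]) -> float:
    """xG = sum of the goal probabilities of all unblocked (Fenwick) attempts."""
    return sum(goal_probability(s) for s in shots)


shots = [Shot(10, 5, "tip"), Shot(55, 30, "slap"), Shot(30, 15, "wrist")]
print(round(expected_goals(shots), 3))
```

Because xG is a sum of probabilities, a team’s total can be split by player, by period or by game just by choosing which shots go into the list.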

The primary difference in this xG model is that, “Currently, all public models use all situations Fenwick shots with strength state as a specific feature variable (a “categorical” variable). Given the significant differences in play styles and scoring rates between even-strength, powerplay offense, shorthanded offense, and empty net situations in hockey, it seemed like a good idea to build four separate models for each of these specific play states. In the initial stages of testing, we determined that there was a benefit to creating separate models for each of these four strength states.” Their model also differs from other public models in the algorithms used to generate the results, but again, I’d encourage you to read their methodology if you’d like to learn about the specifics.
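Here is a sketch of what “four separate models” means in practice: each shot is routed to a model fitted only on its own strength state, so every coefficient (not just a baseline) can differ between states. The one-feature logistic models and their coefficients are hypothetical placeholders, not EvolvingWild’s fitted values, and `make_model`, `MODELS` and `shot_xg` are illustrative names.

```python
import math
from typing import Callable, Dict


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_model(intercept: float, coef_distance: float) -> Callable[[float], float]:
    """Build a one-feature logistic model with its own intercept AND slope."""
    def predict(distance_ft: float) -> float:
        return logistic(intercept + coef_distance * distance_ft)
    return predict


# Hypothetical coefficients, NOT EvolvingWild's fitted values.
MODELS: Dict[str, Callable[[float], float]] = {
    "EV": make_model(-1.2, -0.05),  # even strength
    "PP": make_model(-0.9, -0.05),  # powerplay offense
    "SH": make_model(-1.0, -0.06),  # shorthanded offense
    "EN": make_model(1.5, -0.02),   # empty net
}


def shot_xg(strength_state: str, distance_ft: float) -> float:
    """Route each shot to the model trained for its strength state."""
    return MODELS[strength_state](distance_ft)


print(shot_xg("EV", 100.0), shot_xg("EN", 100.0))
```

The design point is this: in a single model where strength state is just a categorical variable, the state typically only shifts the baseline unless interaction terms are added. With separate models, distance can matter far less when the net is empty (as in the hypothetical `EN` slope above) without distorting the even-strength estimates.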

The Expected Goals tab features two pages: Skater Table and Team Charts. The Skater Table page presents data in sortable tables and offers filter and search capabilities similar to the RAPM Tables pages. Users can choose to view the table in Standard or Relative terms. The primary difference is that Standard shows the normal xG data for when the player is on the ice, while Relative provides teammate relative data (note: not standard team relative; as I discussed above and here, teammate relative data is objectively superior to team relative). Users can filter the data by team, position, game strength state and individual seasons, and can place a TOI minimum on the results.

The data table presents the player information (name, team and season) as well as how many games they played and the total time on ice for the selected season(s) for both the Standard and Relative views. The Standard view then presents the team expected goals for (xGF), expected goals against (xGA), expected goal differential (xG_diff), expected goals for per-60 (xGF60), expected goals against per-60 (xGA60) and expected goals differential per-60 (xG_diff60) the team generates while the player is on the ice. It also provides the individual expected goals (ixG) and individual expected goals per-60 (ixG_60) the player himself generated.
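The Standard view columns are simple arithmetic on on-ice totals: differentials are for minus against, and per-60 rates are totals × 60 / TOI. The sketch below derives the column names listed above from a hypothetical `OnIceTotals` record. The record type, function name and sample numbers are illustrative only; the site computes its own totals from its xG model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OnIceTotals:
    toi_minutes: float
    xgf: float  # team expected goals for while the player is on the ice
    xga: float  # team expected goals against while the player is on the ice
    ixg: float  # the player's own individual expected goals


def standard_view(t: OnIceTotals) -> Dict[str, float]:
    """Derive the Standard-view columns from on-ice totals."""
    per60 = 60.0 / t.toi_minutes
    return {
        "xGF": t.xgf,
        "xGA": t.xga,
        "xG_diff": t.xgf - t.xga,
        "xGF60": t.xgf * per60,
        "xGA60": t.xga * per60,
        "xG_diff60": (t.xgf - t.xga) * per60,
        "ixG": t.ixg,
        "ixG_60": t.ixg * per60,
    }


row = standard_view(OnIceTotals(toi_minutes=900.0, xgf=40.0, xga=34.0, ixg=9.0))
print({k: round(v, 2) for k, v in row.items()})
```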

The table below shows the Standard view for the Rangers last season (minimum 400 minutes TOI), sorted by the expected goal differential column. As should come as no surprise to any Rangers fan, only a handful of Rangers finished the season with a positive expected goal differential: Mika Zibanejad, Chris Kreider, Rick Nash, Nick Holden (OK, maybe this one is a surprise to some) and Vladislav Namestnikov.[/text_output][image type=”thumbnail” float=”none” src=”2815″ alt=”” href=”” title=”” info_content=”” lightbox_caption=”” id=”” class=”aligncenter” style=””][text_output]As I mentioned earlier, the Relative table view uses teammate relative data and includes the following columns (all teammate relative): expected goals for per-60, expected goals against per-60, expected goal differential per-60, expected goals for impact (using the same impact methodology as the RAPM section), expected goals against impact and expected goal differential impact.
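To illustrate the basic idea behind a relative to teammate number, here is a deliberately simplified sketch. It compares a player’s on-ice rate to a baseline built from his teammates’ rates when they are away from him, with each teammate weighted by shared ice time. It then expands the result into an impact figure by multiplying by TOI / 60. EvolvingWild’s actual method may differ in important ways (see their write-up), and the function names and numbers here are hypothetical.

```python
from typing import List, Tuple


def rel_teammate(player_rate: float, teammates: List[Tuple[float, float]]) -> float:
    """Simplified relative-to-teammate rate.

    teammates: (shared_toi_minutes, teammate_rate_without_player) pairs.
    The baseline weights each teammate's away-from-player rate by how much
    ice time he shared with the player.
    """
    total_shared = sum(toi for toi, _ in teammates)
    baseline = sum(toi * rate for toi, rate in teammates) / total_shared
    return player_rate - baseline


def rel_impact(rel_rate_per_60: float, toi_minutes: float) -> float:
    """Expand the relative per-60 rate into an implied total over the player's TOI."""
    return rel_rate_per_60 * toi_minutes / 60.0


# Hypothetical xGF/60 numbers.
rel = rel_teammate(2.8, [(600.0, 2.5), (200.0, 2.1)])
print(round(rel, 2), round(rel_impact(rel, 900.0), 2))
```

The key contrast with team relative stats is the baseline. It is built from the specific teammates the player actually skated with, not from the whole team’s off-ice results.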

Below is the same group of Rangers players from the 2017-2018 season with a minimum of 400 minutes TOI, this time sorted by the teammate relative expected goal differential impact column. With this model, which honestly provides one of my favorite hockey data points, we see similar results, but with some important nuances. Chris Kreider and Mika Zibanejad flip spots at 1 and 2, and Brady Skjei rockets up to fourth after previously not even cracking the top 10. Ryan McDonagh gets downgraded by a decent margin, and Kevin Hayes is one of the bigger risers as well. All data is important, and both of these views should be considered, but because EvolvingWild’s relative to teammate models adjust for all sorts of variables outside of a player’s control, if I had to choose just one of these two lists, I’d go with this one.[/text_output][image type=”thumbnail” float=”none” src=”2816″ alt=”” href=”” title=”” info_content=”” lightbox_caption=”” id=”” class=”aligncenter” style=””][text_output]Last but not least, we have the great Expected Goals Team Charts page. Users can select the team, season, standard deviation scale (which in essence serves as a zoom function) and minimum skater TOI for inclusion in the chart. The x-axis shows teammate relative xGF per-60, while the y-axis shows teammate relative xGA per-60. The x-axis is standard: the further to the right you go, the higher the expected goals for per-60. The y-axis is inverted, however, so the further up you go, the lower the xGA per-60 (which is better). It is a standard four-quadrant chart, with the top-right quadrant being good (lots of expected goals for, few against), the bottom-left being bad (few xGF, lots of xGA), the top-left being dull (no xG for anyone) and the bottom-right being fun (xG for everyone). The dots are color-coded to represent the amount of even-strength playing time each player received; the darker the blue, the more time they received.
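The quadrant logic, including the inverted y-axis, can be pinned down in a few lines. This sketch assumes the quadrants split at zero on both axes, which is the natural dividing line for teammate relative numbers; the function name `quadrant` and the sample values are hypothetical.

```python
def quadrant(rel_xgf60: float, rel_xga60: float) -> str:
    """Classify a skater on the Team Charts grid, splitting both axes at zero.

    The y-axis is inverted, so a negative relative xGA/60 plots toward the top.
    """
    right = rel_xgf60 > 0  # more expected goals for than teammates
    top = rel_xga60 < 0    # fewer expected goals against than teammates
    if right and top:
        return "good"
    if not right and not top:
        return "bad"
    if top:
        return "dull"  # top-left: few xG either way
    return "fun"       # bottom-right: xG for everyone


for name, xgf, xga in [("Player A", 0.4, -0.3), ("Player B", 0.3, 0.5), ("Player C", -0.2, -0.4)]:
    print(name, quadrant(xgf, xga))
```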

The chart below depicts the Rangers’ 2017-2018 season, and I think there should be few surprises in there. Mika Zibanejad and Chris Kreider were undoubtedly the best Rangers in terms of expected goal differential, with Zuccarello and Skjei both hovering near the good quadrant. John Gilmour, Tony DeAngelo and Brendan Smith are all firmly entrenched in the fun quadrant, while Paul Carey and David Desharnais epitomized the dull quadrant.[/text_output][image type=”thumbnail” float=”none” src=”2817″ alt=”” href=”” title=”” info_content=”” lightbox_caption=”” id=”” class=”aligncenter” style=””]

Author: Drew Way

Diehard New York Rangers fan since 1988! Always has been fascinated by sports statistics, and is a big proponent of supplementing analytics with the eye test. Also a big Yankees, Giants and Knicks fan.
