Shot Attempts Are Valuable Information

This week’s article in The Star looked at the predictive value of the information embodied in a team’s points and in its possession metrics. In particular, it showed that these two stats contain different kinds of information, so that predictive power increases when they are used together.

I should begin by mentioning that the first draft of the article was written with data at the 20-game mark. It ended up getting pushed back, and so it was rewritten to reflect data from the 25-game mark. There were some interesting observations from the original data, however, that I didn’t get a chance to update, so in this piece I will be referring mostly to results based on 20-game data.

The data on various measures of team performance were gathered from the last five non-lockout seasons (2008-09 to 2011-12 and 2013-14) and considered in isolation and in various combinations to see how well they explained which teams made the playoffs. As mentioned in The Star, www.puckon.net was a valuable resource; additional data used here but not in The Star piece were taken from war-on-ice.com.

Looking at the effect that the number of points after 20 (or 25) games has on making the playoffs is fairly straightforward. When looking at possession, however, there are various forms to consider. There’s Corsi (all shot attempts) and Fenwick (unblocked shot attempts only), both of which examine shot attempts in 5-on-5 situations. They can be modified to consider shot attempts only in “close” situations – when the game is tied, or when the lead is no more than one goal in the first or second period – or adjusted to reflect score effects (as mentioned in The Star article). These variables all differ in their predictive value.
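
To make those definitions concrete, here is a minimal sketch of how these possession percentages might be computed from 5-on-5 shot-attempt event data. The column names (the team-perspective flag, period, score differential, and blocked indicator) are hypothetical stand-ins, not the actual layout of the puckon.net or war-on-ice.com data.

```python
import pandas as pd

def possession_pct(events: pd.DataFrame, unblocked_only=False, close_only=False) -> float:
    """Corsi%/Fenwick% for the team of interest, from 5-on-5 shot-attempt events.

    Assumed (hypothetical) columns:
      is_for     - True if the attempt was taken by the team of interest
      blocked    - True if the attempt was blocked
      period     - game period (1, 2, 3, ...)
      score_diff - goal differential from the team's perspective at the time of the attempt
    """
    df = events
    if unblocked_only:
        # Fenwick excludes blocked attempts; Corsi keeps them
        df = df[~df["blocked"]]
    if close_only:
        # "Close": tied, or no more than a one-goal margin in the first or second period
        close = (df["score_diff"] == 0) | ((df["period"] <= 2) & (df["score_diff"].abs() <= 1))
        df = df[close]
    # Share of remaining attempts taken by the team of interest
    return df["is_for"].mean() if len(df) else float("nan")

# e.g. Fenwick Close percentage for a team's events:
# fenwick_close = possession_pct(events, unblocked_only=True, close_only=True)
```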

Note that, since we’re looking at whether teams made the playoffs (a binary variable), probit regressions were used. With these regressions, there is no R-squared measure to tell us how much of the variation is being explained. They do, however, generate a pseudo R-squared, which can be useful for comparing models, although it does not have the same interpretation as an R-squared. Other useful measures for comparing models are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). For both of these measures, smaller scores are better, while bigger is better for the pseudo R-squared. The following table shows these scores for a variety of models, each run with just a single explanatory variable.

Variable                       Pseudo R-Squared   AIC        BIC
Points after 20 games          0.2367             193.8639   200.2499
Points after 25 games          0.3191             145.1337   151.1550
Goal Differential (20 gms)     0.2532             189.7471   196.1331
Fenwick (20 gms)               0.1342             219.3607   225.7467
Corsi (20 gms)                 0.1254             221.5341   227.9201
Fenwick Close (20 gms)         0.1997             169.8740   175.8953
Corsi Close (20 gms)           0.1904             171.8077   177.8290
Score Adj. Fenwick (20 gms)    0.2437             160.7652   166.7865
Score Adj. Corsi (20 gms)      0.2428             160.9597   166.9809
PDO (20 gms)                   0.0770             233.5908   239.9768
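
For reference, here is a rough sketch of how one of these single-variable probit fits could be reproduced with statsmodels. The `teams` DataFrame and its column names are hypothetical: a table with one row per team-season, a binary made-the-playoffs indicator, and the candidate predictors measured at the 20-game mark.

```python
import statsmodels.api as sm

def fit_probit(teams, predictor):
    """Fit a probit of playoff qualification on one predictor and
    return (McFadden pseudo R-squared, AIC, BIC)."""
    X = sm.add_constant(teams[[predictor]])
    res = sm.Probit(teams["made_playoffs"], X).fit(disp=0)
    return res.prsquared, res.aic, res.bic

# e.g. fit_probit(teams, "score_adj_corsi_20")
```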

In terms of predictive value after 20 games, score-adjusted measures contain more information than the other measures. Interestingly, however, the results are somewhat in contrast to a recent study done by Micah McCurdy over at Hockey-Graphs.com. He also found that score-adjusted measures were best, but that close measures were worse than standard possession measures. Here, we are seeing that close measures are in fact better than standard measures. McCurdy also found that Corsi measures are better than Fenwick in all cases. Here, we are again seeing the opposite (although the difference is negligible when looking at score-adjusted measures).

It should be noted that McCurdy was looking at the correlation of these measures with different outcomes – goal percentage and winning percentage, specifically. Perhaps it is the coarseness of the outcome used here – making the playoffs or not – that is causing the differences. At any rate, it is worth further investigation.

Interestingly, when looking at just a single statistic after 20 games, goal differential seems to be the best. According to Elliotte Friedman, this is Mike Babcock’s preferred statistic, so chalk up another point for the man who might be the most sought-after free agent this offseason.

When used in combination, however, the best pairing appears to be points and Score Adjusted Corsi. The following table gives the above measures of goodness of fit for the various models, all using data from the 20-game mark.

Variables             Pseudo R-Squared   AIC        BIC
Pts + Goal Diff       0.2645             188.9340   198.5128
Pts + Fenwick         0.3120             177.1281   186.7070
Pts + Corsi           0.3243             174.0581   183.6370
Pts + Fenwick Close   0.3716             136.2432   145.2751
Pts + Corsi Close     0.3806             134.3879   143.4198
Pts + SAF             0.3954             131.3251   140.3570
Pts + SAC             0.4049             129.3522   138.3842
Pts + PDO             0.2434             194.1864   203.7653
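
The two-variable models are fit the same way, just with two columns in the design matrix. A sketch of the points plus Score Adjusted Corsi model, reusing the hypothetical `teams` DataFrame from the earlier example:

```python
import statsmodels.api as sm

# Probit of playoff qualification on points and score-adjusted Corsi at 20 games
X = sm.add_constant(teams[["points_20", "score_adj_corsi_20"]])
res = sm.Probit(teams["made_playoffs"], X).fit(disp=0)
print(res.prsquared, res.aic, res.bic)  # compare against the single-variable fits
```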

Note that, when using points alongside the various possession measures, we see something slightly different from above. First, Corsi measures are now better than Fenwick measures, as found in the McCurdy study discussed above. However, close measures are still better than standard measures. Why Corsi is better when controlling for points, and Fenwick is better when not, is definitely worth further investigation. Given that the difference between the two is blocked shots, the answer must have something to do with the predictive value of blocked shots versus their explanatory value (how blocked shots correlate with points after 20 games).

Goal differential actually adds very little information to points because the two are so highly correlated – they contain very similar information. Possession measures, however, don’t tell you very much about how a team has done so far, which is actually a good thing in this context. When looking at two variables, you would like to find variables that contain different kinds of information – information that is orthogonal. Variables that are good at explaining who has won in the past, such as goal differential, do not offer additional information over points. Ideally, you want information about things teams did that didn’t lead to points, but that are indicative of getting points in the future. Possession metrics contain such information.
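
A quick way to see this orthogonality point in the data would be to check how strongly each candidate predictor is correlated with points at the 20-game mark (again using the hypothetical `teams` DataFrame from the sketches above); a predictor that is nearly collinear with points, like goal differential, has little new to add.

```python
# Correlation of each predictor with points after 20 games (hypothetical column names)
for col in ["goal_diff_20", "score_adj_corsi_20", "pdo_20"]:
    print(col, round(teams["points_20"].corr(teams[col]), 3))
```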

It is worth mentioning that this is exactly why exercises such as the recent one by Tangotiger get it exactly wrong. It is the shots that a team took that didn’t go in that are useful to know about when making predictions – we already have information about the ones that did go in by looking at the standings.

