Shot Attempts Are Valuable Information

This week’s article in The Star looked at the predictive value of the information embodied in a team’s points and in its possession metrics. In particular, it showed that these two stats contain different kinds of information, so that predictive power increases when they are used together.

I should begin by mentioning that the first draft of the article was written with data at the 20-game mark. It ended up getting pushed back, and so it was rewritten to reflect data from the 25-game mark. There were some interesting observations from the original data, however, that I didn’t get a chance to update, so in this piece I will be referring mostly to results based on 20-game data.

The data on various measures of team performance were gathered from the last five non-lockout seasons (2008-09 to 2011-12 and 2013-14) and considered in isolation and in various combinations to see how well they explained which teams made the playoffs. As mentioned in The Star, www.puckon.net was a valuable resource; additional data used here but not in The Star piece were taken from war-on-ice.com.

Looking at the effect that the number of points after 20 (or 25) games has on making the playoffs is fairly straightforward. When looking at possession, however, there are various forms to consider. There’s Corsi (all shot attempts) and Fenwick (unblocked shot attempts only), both of which examine shot attempts in 5-on-5 situations. They can be modified to consider shot attempts only in “close” situations – when the game is tied, or when the lead is no more than one goal in the first or second period – or adjusted to reflect score effects (as mentioned in The Star article). These variables all differ in their predictive value.
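
To make those definitions concrete, here is a minimal sketch of how these possession percentages might be computed from 5-on-5 shot-attempt event data. The column names (the team-perspective flag, period, score differential, and blocked indicator) are hypothetical stand-ins, not the actual layout of the puckon.net or war-on-ice.com data.

```python
import pandas as pd

def possession_pct(events: pd.DataFrame, unblocked_only=False, close_only=False) -> float:
    """Corsi%/Fenwick% for the team of interest, from 5-on-5 shot-attempt events.

    Assumed (hypothetical) columns:
      is_for     - True if the attempt was taken by the team of interest
      blocked    - True if the attempt was blocked
      period     - game period (1, 2, 3, ...)
      score_diff - goal differential from the team's perspective at the time of the attempt
    """
    df = events
    if unblocked_only:
        # Fenwick excludes blocked attempts; Corsi keeps them
        df = df[~df["blocked"]]
    if close_only:
        # "Close": tied, or no more than a one-goal margin in the first or second period
        close = (df["score_diff"] == 0) | ((df["period"] <= 2) & (df["score_diff"].abs() <= 1))
        df = df[close]
    # Share of remaining attempts taken by the team of interest
    return df["is_for"].mean() if len(df) else float("nan")

# e.g. Fenwick Close percentage for a team's events:
# fenwick_close = possession_pct(events, unblocked_only=True, close_only=True)
```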

Note that, since we’re looking at whether teams made the playoffs (a binary variable), probit regressions were used. With these regressions, there is no R-squared measure to tell us how much of the variation is being explained. They do, however, generate a pseudo R-squared, which can be useful for comparing models, although it does not have the same interpretation as an R-squared. Other useful measures for comparing models are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). For both of these measures, smaller scores are better, while bigger is better for the pseudo R-squared. The following table shows these scores for a variety of models, each run with just a single explanatory variable.

Variable                       Pseudo R-Squared   AIC        BIC
Points after 20 games          0.2367             193.8639   200.2499
Points after 25 games          0.3191             145.1337   151.1550
Goal Differential (20 gms)     0.2532             189.7471   196.1331
Fenwick (20 gms)               0.1342             219.3607   225.7467
Corsi (20 gms)                 0.1254             221.5341   227.9201
Fenwick Close (20 gms)         0.1997             169.8740   175.8953
Corsi Close (20 gms)           0.1904             171.8077   177.8290
Score Adj. Fenwick (20 gms)    0.2437             160.7652   166.7865
Score Adj. Corsi (20 gms)      0.2428             160.9597   166.9809
PDO (20 gms)                   0.0770             233.5908   239.9768
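
For reference, here is a rough sketch of how one of these single-variable probit fits could be reproduced with statsmodels. The `teams` DataFrame and its column names are hypothetical: a table with one row per team-season, a binary made-the-playoffs indicator, and the candidate predictors measured at the 20-game mark.

```python
import statsmodels.api as sm

def fit_probit(teams, predictor):
    """Fit a probit of playoff qualification on one predictor and
    return (McFadden pseudo R-squared, AIC, BIC)."""
    X = sm.add_constant(teams[[predictor]])
    res = sm.Probit(teams["made_playoffs"], X).fit(disp=0)
    return res.prsquared, res.aic, res.bic

# e.g. fit_probit(teams, "score_adj_corsi_20")
```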

In terms of predictive value after 20 games, score-adjusted measures contain more information than the other measures. Interestingly, however, the results are somewhat in contrast to a recent study done by Micah McCurdy over at Hockey-Graphs.com. He also found that score-adjusted measures were best, but that close measures were worse than standard possession measures. Here, we are seeing that close measures are in fact better than standard measures. McCurdy also found that Corsi measures are better than Fenwick in all cases. Here, we are again seeing the opposite (although the difference is negligible when looking at score-adjusted measures).

It should be noted that McCurdy was looking at the correlation of these measures with different outcomes – goal percentage and winning percentage, specifically. Perhaps it is the coarseness of the outcome used here – making the playoffs or not – that is causing the differences. At any rate, it is worth further investigation.

Interestingly, when looking at just a single statistic after 20 games, goal differential seems to be the best. According to Elliotte Friedman, this is Mike Babcock’s preferred statistic, so chalk up another point for the man who might be the most sought-after free agent this offseason.

When used in combination, however, the best pairing appears to be points and Score Adjusted Corsi. The following table gives the above measures of goodness of fit for the various models, all using data from the 20-game mark.

Variables             Pseudo R-Squared   AIC        BIC
Pts + Goal Diff       0.2645             188.9340   198.5128
Pts + Fenwick         0.3120             177.1281   186.7070
Pts + Corsi           0.3243             174.0581   183.6370
Pts + Fenwick Close   0.3716             136.2432   145.2751
Pts + Corsi Close     0.3806             134.3879   143.4198
Pts + SAF             0.3954             131.3251   140.3570
Pts + SAC             0.4049             129.3522   138.3842
Pts + PDO             0.2434             194.1864   203.7653
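
The two-variable models are fit the same way, just with two columns in the design matrix. A sketch of the points plus Score Adjusted Corsi model, reusing the hypothetical `teams` DataFrame from the earlier example:

```python
import statsmodels.api as sm

# Probit of playoff qualification on points and score-adjusted Corsi at 20 games
X = sm.add_constant(teams[["points_20", "score_adj_corsi_20"]])
res = sm.Probit(teams["made_playoffs"], X).fit(disp=0)
print(res.prsquared, res.aic, res.bic)  # compare against the single-variable fits
```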

Note that, when using points alongside the various possession measures, we see something slightly different from above. First, Corsi measures are now better than Fenwick measures, as found in the McCurdy study discussed above. However, close measures are still better than standard measures. Why Corsi is better when controlling for points, and Fenwick is better when not, is definitely worth further investigation. Given that the difference between the two is blocked shots, the answer must have something to do with the predictive value of blocked shots versus their explanatory value (how blocked shots correlate with points after 20 games).

Goal differential actually adds very little information to points because the two are so highly correlated – they contain very similar information. Possession measures, however, don’t tell you very much about how a team has done so far, which is actually a good thing in this context. When looking at two variables, you would like to find variables that contain different kinds of information – information that is orthogonal. Variables that are good at explaining who has won in the past, such as goal differential, do not offer additional information over points. Ideally, you want information about things teams did that didn’t lead to points, but that are indicative of getting points in the future. Possession metrics contain such information.
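
A quick way to see this orthogonality point in the data would be to check how strongly each candidate predictor is correlated with points at the 20-game mark (again using the hypothetical `teams` DataFrame from the sketches above); a predictor that is nearly collinear with points, like goal differential, has little new to add.

```python
# Correlation of each predictor with points after 20 games (hypothetical column names)
for col in ["goal_diff_20", "score_adj_corsi_20", "pdo_20"]:
    print(col, round(teams["points_20"].corr(teams[col]), 3))
```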

It is worth mentioning that this is exactly why exercises such as the recent one by Tangotiger get it exactly wrong. It is the shots that a team took that didn’t go in that are useful to know about when making predictions – we already have information about the ones that did go in by looking at the standings.

