Forecasting the Second Half

This week’s article in The Star looked at what information was most useful for predicting how teams will perform in the second half of the season. In a previous article, I considered how well a team’s performance on several variables after 25 games predicted whether it would make the playoffs. Here, the prediction was not a simple yes or no about making the playoffs but rather how many points a team would have at season’s end, so it addressed seeding as well.

To begin, I collected data from puckon.net and waronice.com on several measures of team performance at the midseason point (i.e., after 41 games). Specifically, I looked at Points, Score Adjusted Corsi, Corsi Close, Score Adjusted Fenwick, Fenwick Close, Goal Differential, and PDO.

First, I considered how each of those variables correlated with the number of points a team amasses in the second half of the season (the last 41 games). Forty-one games is not a particularly small sample, so it seems reasonable to expect that how a team does in the first half would be a good predictor of how it does in the second. As it turns out, first-half points are not that good a predictor. In fact, of the variables listed above, only PDO was worse at predicting second-half points, and PDO was close to useless. The following table summarizes the R-squared (the share of the variation in one variable explained by a linear fit on another) between each of the variables and second-half points:

Variable                  R-squared
PDO                       0.0135
Points after 41 games     0.0876
Goal Differential         0.1037
Fenwick Close             0.1584
Corsi Close               0.1783
Score Adjusted Fenwick    0.1793
Score Adjusted Corsi      0.2031
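Each R-squared in the table presumably comes from a one-variable regression of second-half points on a first-half measure. A minimal sketch of that calculation follows, assuming an ordinary least-squares fit with an intercept (the post does not say which software was used). The helper r_squared and the team values are hypothetical placeholders, not the puckon.net or waronice.com data. For a single predictor, this R-squared equals the square of the Pearson correlation between the two variables.

```python
# Minimal sketch: R-squared of a one-variable ordinary least-squares fit.
# The team values below are invented placeholders, NOT real NHL data.


def r_squared(x, y):
    """Share of the variance in y explained by a straight-line fit on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical first-half values and second-half points for five imaginary teams.
first_half = {
    "Score Adjusted Corsi": [52.1, 48.3, 50.0, 46.9, 53.4],
    "PDO": [101.2, 99.0, 100.4, 98.7, 100.1],
}
second_half_points = [50, 41, 46, 38, 47]

for name, values in first_half.items():
    print(f"{name}: R-squared = {r_squared(values, second_half_points):.4f}")
```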


As is commonly noted in the analytics community, Score Adjusted measures of possession outperform the other measures in predictive value, and Corsi seems to be better than Fenwick. I should also note that there are different forms of these score adjusted measures. I used the data from puckon.net, which differs from the adjusted measures at waronice.com, which in turn differ from the adjusted measures proposed by Micah McCurdy at HockeyGraphs.com. Which of those measures is best is not the point of this exercise, although it would be worth looking at.

The main point here was to examine how predictions could be improved by looking at more than one variable. In that regard, this is similar to the earlier article looking at how best to predict which teams would make the playoffs. There, I found that Points and Score Adjusted Corsi were the best predictors of who would make the playoffs, and the improvement in predictive power was substantial. In light of those results, the findings here were somewhat surprising.

First off, looking at two variables offers very little improvement in predictive value over looking at just one. Perhaps more surprisingly, the best variables to pair together are Score Adjusted Corsi (no surprise there) and PDO. That’s right – PDO, which offers very little information on its own, offers the most incremental information over Score Adjusted Corsi. (To see the results of all the regressions run, go here.)
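The two-variable search described above can be sketched as fitting second-half points on every pair of first-half measures and ranking the pairs by R-squared. This is a minimal sketch assuming ordinary least squares with an intercept; the helper two_var_r2 and the six-team dataset are hypothetical, and the ranking it prints reflects only those invented numbers, not the results discussed in this post.

```python
from itertools import combinations


def _centered(v):
    """Subtract the mean so the intercept drops out of the fit."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def two_var_r2(x1, x2, y):
    """R-squared of an OLS fit y ~ intercept + b1*x1 + b2*x2."""
    c1, c2, cy = _centered(x1), _centered(x2), _centered(y)
    s11, s22, s12 = _dot(c1, c1), _dot(c2, c2), _dot(c1, c2)
    s1y, s2y, syy = _dot(c1, cy), _dot(c2, cy), _dot(cy, cy)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Solve the 2x2 normal equations by Cramer's rule.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    # Explained sum of squares divided by total sum of squares.
    return (b1 * s1y + b2 * s2y) / syy


# Made-up first-half values for six imaginary teams (NOT real data).
predictors = {
    "Score Adjusted Corsi": [52.1, 48.3, 50.0, 46.9, 53.4, 49.5],
    "PDO": [101.2, 99.0, 100.4, 98.7, 100.1, 99.6],
    "Goal Differential": [12, -5, 3, -14, 9, -2],
}
second_half_points = [50, 41, 46, 38, 47, 44]

results = [
    ((a, b), two_var_r2(predictors[a], predictors[b], second_half_points))
    for a, b in combinations(predictors, 2)
]
for (a, b), r2 in sorted(results, key=lambda item: item[1], reverse=True):
    print(f"{a} + {b}: R-squared = {r2:.4f}")
```

Centering the data first is a design choice that keeps the example dependency-free: the intercept is absorbed, so only a 2x2 system has to be solved.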

From an information standpoint, this means that the information contained in PDO has the least overlap with the information contained in Score Adjusted Corsi. Indeed, PDO offers almost entirely new information, even if it is very little. The fact that the other variables offer even less of an improvement over Score Adjusted Corsi tells us that the information they contain is almost entirely subsumed by what Score Adjusted Corsi already captures.
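The overlap argument can be made concrete with a standard identity for a regression on two predictors. Writing r_{y1} and r_{y2} for the correlations of second-half points with each predictor, and r_{12} for the correlation between the two predictors, the two-variable R-squared is given below; when the predictors are uncorrelated, it reduces to the sum of the individual R-squared values. As a purely illustrative calculation, not a reported result: if Score Adjusted Corsi and PDO were exactly uncorrelated, the pair would explain 0.2031 + 0.0135 = 0.2166 of the variance, so PDO's increment would equal its own small R-squared.

```latex
R^2_{y \cdot 12} = \frac{r_{y1}^2 + r_{y2}^2 - 2\, r_{y1}\, r_{y2}\, r_{12}}{1 - r_{12}^2},
\qquad
r_{12} = 0 \;\Rightarrow\; R^2_{y \cdot 12} = r_{y1}^2 + r_{y2}^2,
\qquad
\Delta R^2 = R^2_{y \cdot 12} - r_{y1}^2 = r_{y2}^2 .
```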

On the one hand, there is something intuitive about this: games are won by scoring more goals than the opponent, and goals come from generating shots and converting them. Score Adjusted Corsi is the best measure we have of a team’s ability to generate more shots than its opponents, and PDO (the sum of a team’s shooting percentage and save percentage) measures its ability to convert shots, and to stop them, at a higher rate than its opponents. It therefore makes sense that these variables would work well together to predict second-half performance. In particular, there doesn’t appear to be much correlation between a team’s ability to generate shots and its ability to finish, so the information contained in the two variables should be largely separate.
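The shots-and-conversion decomposition in the paragraph above can be written out directly. In this sketch, PDO is computed as shooting percentage plus save percentage (the usual definition), and goal differential is rebuilt from shot volume and those two percentages. It assumes shots on goal as the base: PDO is conventionally built from shots on goal, while Corsi counts all shot attempts, so pairing the two is a simplification. The ShotTotals class and the team totals are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShotTotals:
    """Hypothetical shot-on-goal totals for one team (illustrative only)."""
    shots_for: int
    goals_for: int
    shots_against: int
    goals_against: int


def shooting_pct(t: ShotTotals) -> float:
    """Percentage of the team's shots that became goals."""
    return 100 * t.goals_for / t.shots_for


def save_pct(t: ShotTotals) -> float:
    """Percentage of opponents' shots that were stopped."""
    return 100 * (1 - t.goals_against / t.shots_against)


def pdo(t: ShotTotals) -> float:
    """PDO = shooting percentage + save percentage."""
    return shooting_pct(t) + save_pct(t)


def goal_differential(t: ShotTotals) -> float:
    """Goal differential rebuilt from volume (shots) and conversion (percentages)."""
    goals_for = t.shots_for * shooting_pct(t) / 100
    goals_against = t.shots_against * (1 - save_pct(t) / 100)
    return goals_for - goals_against


# Hypothetical first-half totals for one imaginary team.
team = ShotTotals(shots_for=1200, goals_for=96, shots_against=1100, goals_against=99)
print(f"Sh% = {shooting_pct(team):.2f}, Sv% = {save_pct(team):.2f}, PDO = {pdo(team):.2f}")
print(f"Goal differential = {goal_differential(team):.1f}")
```

Because league-wide shooting percentage and save percentage are two sides of the same shots, league-average PDO sits at 100; a team above 100 is converting its own shots, and stopping its opponents', better than average.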

What is surprising, however, is that together these two variables hold more predictive value than a team’s goal differential from the first half. They are also more predictive than a team’s first-half points, whether in isolation or in conjunction with other variables.

There is no question, however, that PDO is a very noisy measure of a team’s ability to convert (or stop) shots. One implication of these findings is that better measures of a team’s true ability in this regard would be worth developing.
