In the East, although she's a fan of both the Penguins and Lightning and was reluctant to pick one over the other, when pressed she went with the Penguins – because really, who doesn't like Penguins?
So it looks like once again our contest is going to come down to whether Joe Thornton & Co. are able to finally shrug off the choker label and go where they have never gone before: the Stanley Cup Final.
It should be a good final four!
Here are her picks for Round 2...
1. Lightning beat Islanders
2. Penguins beat Capitals
3. Stars beat Blues
4. Predators beat Sharks
Since 3 of our 4 picks are identical, it looks like it all comes down to that Predators / Sharks series!
And for the first round, her picks – which took about 30 seconds to make and involved no math – are very close to what our model came up with!
So let's see who she likes in the first round...
1. Flyers beat Capitals
2. Penguins beat Rangers
3. Panthers beat Islanders
4. Lightning beat Red Wings
5. Stars beat Wild
6. Blackhawks beat Blues
7. Kings beat Sharks
8. Ducks beat Predators
To further complicate matters we decided to look for something after the fact – in effect indirect evidence that certain teams are generating higher quality shots – rather than identify the specific action or game situation that makes one team’s shots consistently better than another’s.
There’s a lot to be said on the topic, and we’re not going to pretend to have come anywhere close to addressing it fully. That will require years of effort with insights shared between lots of smart people and of course tracking technology to give us much more reliable data.
However, we do think the shot attempt has become so fetishized that at least some analysts are assuming all hockey dogma is pure nonsense and they wind up confusing skill with luck simply because it doesn’t fit their model.
When we looked at one bit of basic hockey dogma – get your shots on net – we were equally open to the possibility that getting a higher percentage of shots through to goal was an element of unrepeatable “luck” as we were to finding a repeatable skill that could be optimized.
As luck would have it (pun intended) we seem to have stumbled across a repeatable skill, one that functions at the team level no less.
We started by looking at every full NHL season since 2008-09 (six seasons, excluding the lockout-shortened season), took the percentage of each team’s shot attempts that made it to the net (On Net %) up to January 15, and compared it to that team’s On Net % after January 15.
What we found is that there’s a strong relationship between a team’s On Net % in that first part of the season and the remainder of the regular season.
Specifically, the r-squared we generated was 0.26. Of course, the other 74% we’re not explaining here may be caused by lots of different factors (including injuries, differences in strength of schedule, coaching changes, etc.), but it’s difficult to write off the 26% as simply “luck”.
For those who like charts, the scatter plot is below.
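For readers who want to replicate the idea, here's a minimal sketch of the split-half check in Python. The data below are simulated stand-ins for the real On Net % figures, and the function name is ours:

```python
import numpy as np

def split_half_r_squared(first_half, second_half):
    """r-squared between each team's On Net % before and after the split date.

    A value well above zero suggests a repeatable team skill rather
    than random variation."""
    r = np.corrcoef(np.asarray(first_half, dtype=float),
                    np.asarray(second_half, dtype=float))[0, 1]
    return r ** 2

# Toy data: a persistent team skill observed with noise in each half-season
# (180 rows standing in for 30 teams x 6 seasons).
rng = np.random.default_rng(0)
skill = rng.normal(0.70, 0.02, size=180)         # "true" On Net % per team-season
before = skill + rng.normal(0, 0.01, size=180)   # observed up to Jan 15
after = skill + rng.normal(0, 0.01, size=180)    # observed after Jan 15
print(round(split_half_r_squared(before, after), 2))
```

The same function could be pointed at True Sh% columns to reproduce the weaker relationships discussed below.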
The next thing we looked at was the relationship between On Net % and True Sh%. To do that, we simply paired each team’s On Net % during the relevant period with its True Sh% during the same period (giving us 360 observations in total). Here the relationship was less strong (r-squared of 0.13), but it was still meaningful.
When we considered True Sh% up to January 15 and after, the relationship was quite poor, giving us an r-squared of only 0.038 – only slightly better than team shooting percentage (0.036) and worse than team save percentage (0.052).
What’s clear from all of this is that while getting shot attempts on net is a repeatable skill at the team level and is correlated with scoring more goals, turning shot attempts into goals remains highly variable and subject to the vagaries of “puck luck”. Some aspects of this may be truly random; others may be skills nobody has managed to identify and quantify thus far. This holds whether you look at the NHL’s official shooting percentage or True Sh%, and it’s reason enough to be suspicious of a team that continues to score thanks to an unusually high team shooting percentage.
As noted in the SI.com piece, none of what we’ve found suggests that a team should start trying to optimize On Net % at the expense of other aspects of shot quality or total number of shot attempts (e.g. the advantage of getting a shot on net with traffic in front of the goalie will often outweigh the increased risk of that traffic preventing the shot from ever getting there).
But it does mean a team that’s consistently good at getting pucks on net (e.g. the Ducks) may be able to give up something in the possession battle, and one that’s generating a lot of shot attempts but continuing to lose (e.g. the Canadiens) might want to look at their long-term On Net % rather than just imagine their puck luck will change or blame their goalie.
Introducing shot quality into the analytics conversation makes things a whole lot messier. But doing so makes sense of a complex and dynamic game. Like any such game, hockey demands more than a single strategy.
They have Carey Price, who, unlike other false prophets with names like Jose Theodore and Cristobal Huet, seems poised to join the ranks of Jacques Plante, Ken Dryden and Patrick Roy in returning this storied franchise to its former glory.
They also made the Conference Finals in 2014 (which perhaps they would have won if Price hadn’t been injured) and were very competitive against a Tampa Bay team that advanced to the Finals last year.
Through this lens, Montreal’s roaring start this season seems like a continuation of a trend that’s been bubbling the past couple of years.
Habs fans could be forgiven for spinning this narrative. It just isn’t true.
You see, for the past two years the Canadiens were a mediocre team that managed to win on the back of a very good goalie and a fair bit of luck.
During 5-on-5 play last season, Montreal posted a shot attempt percentage (SAT%) of 48.5%, which was 23rd overall. To be fair, the Habs won a lot of games, so you might imagine “score effects” pushed their shot attempt numbers downward. After all, teams that are leading tend to get outshot.
Not in this case.
Despite all those wins, the team’s score-adjusted shot attempt percentage (SASAT%) was still 48.5% (22nd).
Of course, shot attempts only give us part of the story. After all, a few lethal scoring attempts are surely worth more than lots of 60-footers that go wide.
As it turns out, the Canadiens didn’t just fail to generate shot attempts, they also failed to generate quality shot attempts. They managed only 47.9% of the scoring chances (SCF%) vs. 52.1% for their opponents (24th) – only a shade better than the “McEichel lottery” bridesmaid Arizona Coyotes (who added to their misery by passing on Mitch Marner in favor of Dylan Strome, who we projected to be a major bust – but we digress...).
In any event, it was no accident that Montreal’s pop-gun offence yielded a top scorer with only 67 points.
Montreal may have posted an impressive 110 regular season points (2nd), but their goal differential of +32 (6th) tells us that in the absence of some nice bounces that could have gone either way, they would have finished in the middle or perhaps even the bottom half of the playoff pool.
As analysts have noted repeatedly, winning despite poor underlying performance is unsustainable. Eventually the numbers catch up, and your team starts losing (more on that later).
Contrast that performance with this season, and it’s like you’re watching a completely different team.
In the first 7 games the Canadiens have managed a SAT% of 52.8% (6th) and a SASAT% of 54.8% (4th).
When measured by scoring chances and high-danger scoring chances (HDSCF%), the numbers are equally solid at 52.1% (10th) and 51.5% (13th) respectively (all stats are courtesy of war-on-ice.com and puckon.net).
When changes like this happen, there are usually three possible explanations: (i) the lineup; (ii) the coach; and (iii) luck.
The Lineup
Montreal’s lineup has definitely improved, but how much is debatable.
First there was the acquisition of Jeff Petry at the trade deadline. Despite logging more than 17.5 minutes of 5-on-5 ice time per game on an Oilers squad that was rightly abused for its atrocious defence, Petry somehow managed to establish himself as a darling among the analytics crowd, as well as among Montreal fans who were impressed with his play down the stretch and in the playoffs.
And yet the Oilers, who continue to struggle to find NHL caliber defencemen, decided to let Petry go rather than negotiate a contract extension with him.
As the table below shows, Petry’s record in Edmonton was mixed: some years the team was better with him on the ice, others they were worse.
Nevertheless, Petry has become a beast this season, elevating the Habs’ impressive shot attempt percentage without him on the ice from 50.2% to 57.9% with him, their scoring chances from 48.9% to 58.3% and their highdanger scoring chances from 50.0% to 54.5%.
The Petry Effect
| Season | SAT% On Ice | SAT% Off Ice | SCF% On Ice | SCF% Off Ice | HDSCF% On Ice | HDSCF% Off Ice |
|---|---|---|---|---|---|---|
| 2015-16 Canadiens | 57.9 | 50.2 | 58.3 | 48.9 | 54.5 | 50.0 |
| 2014-15 Oilers / Canadiens | 47.4 | 48.1 | 45.5 | 45.9 | 45.5 | 44.5 |
| 2013-14 Oilers | 46.8 | 43.0 | 46.9 | 44.6 | 49.3 | 44.3 |
| 2012-13 Oilers | 44.4 | 44.7 | 41.1 | 46.4 | 40.2 | 45.8 |
| 2011-12 Oilers | 49.2 | 47.8 | 48.1 | 47.8 | 48.5 | 47.2 |
| 2010-11 Oilers | 50.2 | 48.8 | 50.0 | 45.3 | 48.7 | 45.3 |
Up front, the Habs are basically the same team, but again there are some modest improvements.
Alex Galchenyuk, expected by many to have a breakout season, appears to be doing just that. His ice time has been dialed back from 13.4 minutes of 5-on-5 play per game to 11.6 minutes, but his points per 60 minutes are way up, from 1.8 to 3.0.
Tomas Fleischmann, a formerly decent goal scorer whose low shooting percentage (4.7% in 2013-14 and 6.9% last season) prevented him from cracking double digits the past two seasons, has been contributing offensively.
Then there’s Alexander Semin, whose signing over the summer was greeted by a chorus of jeers. Despite his remarkable ability to inspire disdain in even (perhaps especially) his teams’ most diehard fans, nearly every team Semin has ever played for had a better SAT%, SCF% and HDSCF% with him on the ice than without – the lone exception being his 2012-13 campaign (when he still managed a point per game).
So while actually watching Semin’s appalling gaffes and apparent lack of interest might tempt us to pull a Brian Burke and chuck the analytics in favor of the eyeball test, given where he’s being used (bottom-six ice time with only 21.9% of his faceoffs in the defensive zone), he’s starting to look like a bargain at $1.1 million.
Add it all up – Petry, Fleischmann, Semin, and Galchenyuk taking one more step toward stardom – and you’ve got a better lineup. But 7-0 with a goal differential of +16 – half of what they generated over 82 games last year?
We’re not buying the roster as the full story here.
The Coach
Coaching the Canadiens is a strange job in that its first prerequisite excludes the overwhelming majority of qualified candidates.
While the other 29 teams try to scoop the best coach, the Habs play a complicated game of politics and media relations that requires the hiring of a French speaker, ideally from Quebec. This does nothing for the team’s top scorers, who speak English, Russian, and Czech, or their worldclass goalie, who’s also an Anglophone.
But hockey is the only big time sport in Quebec, and the Habs want a guy who can parler avec les médias. Which is what they get in Michel Therrien.
Unfortunately all objective data suggest Therrien’s a pretty bad coach – or at least was until about 3 weeks ago.
Price’s heroics allowed many fans to paint Therrien as some kind of “mad scientist” who understood just how far to push his All World Goalie without breaking him. In that view, the results speak for themselves and the rest is just meaningless nerd talk.
As it turns out, Therrien has been perfectly happy to get outshot and outchanced regardless of who’s between the pipes and regardless of how the guy’s doing. With the exception of his 2012-13 squad during the lockout-shortened season, every single team Therrien has coached since 2007 has had a SAT% below 50%.
And if you want to blame the players rather than the coach, you need only look at the 2008-09 Pittsburgh Penguins, who posted a SASAT% of 46.3% (26th) and an HDSCF% of 48.2% (21st) in the first 57 games with Therrien as coach and then magically improved to 52.5% (7th) and 53.8% (3rd) respectively with Dan Bylsma at the helm for the remaining 25, before winning the Stanley Cup.
The table below gives the full picture of how Therrien’s teams and his goalies have fared.
The Therrien Effect
| Season | SAT% | SASAT% | SCF% | HDSCF% | Team Sv% | Team Sv% Rank |
|---|---|---|---|---|---|---|
| 2015-16 Canadiens | 52.8 | 54.8 | 52.1 | 51.5 | .974 | 1 |
| 2014-15 Canadiens | 48.5 | 48.5 | 47.9 | 48.6 | .936 | 1 |
| 2013-14 Canadiens | 46.7 | 47.2 | 46.7 | 49.4 | .930 | 6 |
| 2012-13 Canadiens | 52.9 | 53.5 | 53.7 | 57.2 | .921 | 16 |
| 2008-09 Penguins (Therrien) | 46.2 | 46.3 | 48.8 | 48.2 | .920 | 15 |
| 2008-09 Penguins (Bylsma) | 52.1 | 52.5 | 53.0 | 53.8 | .927 | 9 |
| 2007-08 Penguins | 45.5 | 45.9 | 48.4 | 51.9 | .933 | 3 |
The fact that Therrien’s playbook with Carey Price in his prime is essentially the same one he employed with Marc-Andre Fleury between the pipes tells you pretty much everything you need to know about the Habs’ coach.
Now it’s possible Therrien learned how to coach hockey in the offseason. It’s also possible he’s ceded control of the team to more capable hands and is content to occupy more of a ceremonial role. Kind of like Art Howe to Marc Bergevin’s Billy Beane.
But since it’s not easy to teach old dogs new tricks and coaches tend to have egos, neither explanation seems entirely satisfactory.
Luck
Which brings us to the last possibility, namely that the Habs have been lucky.
Luck is tossed around pretty liberally in analytics circles without actually being defined. Often it’s used as a proxy for “unsustainable”.
So, for example, when the Leafs went into a tailspin at the end of last season, this came as no surprise to the analytics community, which repeatedly warned that a team couldn’t put up such horrendous possession numbers and expect to keep winning.
The same was said about the 2013-14 Colorado Avalanche, who finished 3rd overall – again with horrid puck possession and goaltending heroics from Semyon Varlamov, who had previously given no indication that he was that good. As we (and others) predicted, that story ended badly last season.
It’s still early days, but we also predicted a day of reckoning for the Calgary Flames, and despite all the “this team is different and can get outshot because…” screeds we received from southern Alberta hockey fans, they’ve started the season with a 1-5 record.
As noted above, Montreal is a different story. While last year’s performance portended failure, they’re currently winning in a way that does look sustainable.
The question is: has there been some underlying change in the Canadiens’ play, or is this simply a team that’s having a good run and will revert to last year’s performance as the season wears on?
On balance, we think there’s a little of both at work here, meaning the Habs are a better team and they will fall to earth.
All signs suggest Tampa Bay will remain a tough divisional opponent and likely the gold standard in the East, while Atlantic division rivals like Toronto and Buffalo shouldn’t be complete punching bags as the season wears on.
Meanwhile, in the Metropolitan, opponents like Pittsburgh, Washington, the Islanders and Rangers should all be dominant and Philadelphia will push for a Wild Card berth.
If Therrien reverts to form and Price stumbles or suffers an injury, fans can look forward to a yearend press conference in which the coach will be able to explain what went wrong in impeccable French while his top players nod in complete bewilderment.
More likely, Montreal finishes second in the Atlantic behind Tampa Bay and enters the playoffs with some hope but also some question marks.
In the end, she picked Tampa to win it all as well. No doubt she'll be gunning for us next season!
Now if the Caps had managed to knock off the Rangers, she would have gone Lightning over Caps, but as it stands, she's firmly committed to the Rangers winning that series.
Meanwhile, although she initially said both the Ducks and Blackhawks would win, when I explained that she had to pick one, she went with the Ducks, who we continue to think will fall to earth. We'd be a whole lot happier if Chicago, who were already basically rolling 5 D, didn't lose Michal Rozsival in the final game of their sweep over the Wild. But our model's giving the Hawks ridiculously good odds, so hopefully that doesn't knock them down too much.
Time will tell!
We're hoping to do better this go-round, as is she. Without further ado, here are her picks for Round 2...
1. Lightning beat Canadiens
2. Rangers beat Capitals
3. Blackhawks beat Wild
4. Ducks beat Flames
Since 3 of our 4 picks are identical, it looks like it's all going to come down to whether the Ducks can continue their lucky ways or the party finally comes to an end. We'll be watching that series keenly!
The first issue to sort out is what outcome you’re trying to explain. At some level, it’s very simple: who wins. Winning is a binary variable – either you win or you don’t. When trying to explain or predict a binary variable, probit regression analysis is incredibly useful. We’ve used it before, when looking at some interesting issues concerning playoff success last year. Some examples can be found here, here, here, and here. In these pieces, we used a probit regression. In a nutshell, a probit regression uses the data to assign a “score” to a matchup. This score is used in conjunction with the normal distribution to determine how much “luck” (the error term, which captures the effects of missing variables as well as pure random chance, and which is assumed to be drawn from the normal distribution) would be required for a given team to win that series.
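To make the mechanics concrete, here's a toy version of the probit link in Python. The score values are invented for illustration; a real model estimates the score from team variables:

```python
from scipy.stats import norm

def win_probability(score):
    """Probit link: the win happens when score + error > 0,
    with error ~ N(0, 1), so P(win) = Phi(score)."""
    return norm.cdf(score)

# An evenly matched series (score = 0) is a coin flip...
print(win_probability(0.0))  # 0.5
# ...while a heavily favored team needs a lot of bad luck to lose.
print(round(win_probability(1.5), 3))  # 0.933
```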
While we have found this to be a useful approach for some questions in the past, it’s actually not making as efficient use of the data as possible. Some series are very closely contested, and “puck luck” plays a large role, while others are quite lopsided, and the amount of “puck luck” would have to be extreme to change the outcome. The problem with a probit regression is that it lumps all series wins into the same category. We actually have some information on how lopsided a series is by looking at the number of games it went. It’s not perfect by any means, but it does contain information. As such, one can think of a team’s results from a series as having 8 possible outcomes: get swept, lose in 5, lose in 6, lose in 7, win in 7, win in 6, win in 5, and sweep. Note that these outcomes are listed from worst to best. In other words, we can order the outcomes. When we can do this, one possible method of analysis is an ordered probit regression. It is similar to a probit, but allows us to exploit additional information about the closeness of a series (and the role of luck) in order to get better predictions.
An ordered probit still constructs a “score” for a series, as the probit regression does, but then uses that score to establish 8 regions of “luck” that would correspond to the 8 possible outcomes. From this, probabilities can be constructed for each of the 8 possible outcomes. The probability that a team wins, then, is simply the sum of the probabilities associated with the four winning outcomes (sweep, win in 5, win in 6, and win in 7).
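Here's a sketch of how those 8 regions turn into probabilities. The cutpoint values below are made up for illustration; a fitted ordered probit would estimate them from the data:

```python
from scipy.stats import norm

OUTCOMES = ["swept", "lose in 5", "lose in 6", "lose in 7",
            "win in 7", "win in 6", "win in 5", "sweep"]

def outcome_probabilities(score, cutpoints):
    """Ordered probit: slice the normal 'luck' distribution into 8 regions.

    The latent result is score + error with error ~ N(0, 1); outcome k
    occurs when the latent result lands between cutpoints k-1 and k."""
    edges = [float("-inf")] + list(cutpoints) + [float("inf")]
    return [norm.cdf(edges[k + 1] - score) - norm.cdf(edges[k] - score)
            for k in range(8)]

def series_win_probability(score, cutpoints):
    # A series win is any of the last four outcomes (win in 7/6/5, sweep).
    return sum(outcome_probabilities(score, cutpoints)[4:])

# Illustrative cutpoints (7 thresholds for 8 outcomes).
cuts = [-1.5, -0.9, -0.4, 0.0, 0.4, 0.9, 1.5]
for name, p in zip(OUTCOMES, outcome_probabilities(0.3, cuts)):
    print(f"{name}: {p:.3f}")
print(f"win the series: {series_win_probability(0.3, cuts):.3f}")
```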
Once the strategy for determining the probabilities that a given team wins a playoff series against a specific opponent has been established, the next order of business is to figure out what variables should be used to create this “score” used in the ordered probit regression. When doing predictive analysis, you ideally want everything that contains information about how a team will perform in the playoffs. Clearly, how they did in the regular season (i.e. their points) has some value, but what else is there? We went and gathered as much information as we could on a whole mess of variables, with the idea that our estimation strategy would help us figure out what was important. We collected data on regular season points, points in the last half of the season, points in the last 10 games, Corsi, Score Adjusted Corsi, penalty kill, power play, save percentage, shooting percentage, and more. There was one interesting variable that was created by Ian. It has been wellrecognized in the analytics community that “puck luck” is a real thing, and that it can make the standings a poor representation of a team’s ability. One way that puck luck manifests itself is in the outcome of onegoal games – particularly overtime games and shootouts. These games are often determined by odd events, and occasionally a team gets the puck to bounce their way a disproportionate number of times. So, what Ian did was construct a variable that compared a team’s winning percentage in onegoal games to their winning percentage in other games. If they were doing much better in onegoal games, then it is possible that their regular season record is predicated not on ability but on puck luck. More on this variable later.
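Ian's exact formula isn't spelled out above, but one plausible sketch of the comparison looks like this (the function name and sample numbers are ours):

```python
def luck_variable(one_goal_wins, one_goal_games, other_wins, other_games):
    """Winning % in one-goal games minus winning % in all other games.

    A large positive gap flags a team whose record may rest on bounces
    in coin-flip games rather than on underlying ability."""
    return one_goal_wins / one_goal_games - other_wins / other_games

# A team that's .700 in one-goal games but only .450 otherwise
# looks suspiciously reliant on puck luck.
print(round(luck_variable(14, 20, 18, 40), 2))  # 0.25
```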
So, having collected all these historical data where we know the actual outcome of each series, now it’s time to plug them into our ordered probit regression and see what predictors are good at predicting the outcomes that actually happened, right? Unfortunately, it’s not quite that simple. Given that some of these variables are only available going back to the 2008 playoffs (in particular, we pulled the Score Adjusted Corsi from puckon.net, which only goes back to the 2007-08 season), we were left with 105 observations on playoff series. With so many variables, and the fact that these variables are actually quite correlated with each other, using everything doesn’t actually yield anything with any statistical power.
Things are further complicated by the fact that, since 2008, playoff teams don’t really look all that different from each other in terms of these variables. We’ve entered an age of parity, and this is not good for statistical analysis. Regression analysis is based on seeing how differences in certain variables (team characteristics) lead to differences in outcomes (winning versus losing a series). If there aren’t many differences in team characteristics, then it gets hard to explain or predict the difference between who wins and who loses a series.
One solution to this problem is factor analysis. Factor analysis takes the variables that you have, and combines them into a single number, called a factor, in a way designed to make teams look as different as possible according to that factor. You would then use that factor in your regression analysis. You can run regressions using a single factor, or you can create multiple factors. The key is that the number of factors you create and use in the regression is less than the number of variables you began with.
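As an illustration of the idea, here's how a factor analysis might be run on a simulated team-stat matrix using scikit-learn. The dimensions and column mix are invented, not our actual dataset:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical stat matrix: 16 playoff teams x 9 correlated inputs
# (points, second-half points, Corsi, PK%, PP%, the luck variable, ...).
rng = np.random.default_rng(42)
latent_quality = rng.normal(size=(16, 2))        # unobserved team traits
loadings = rng.normal(size=(2, 9))               # how traits show up in stats
X = latent_quality @ loadings + rng.normal(0, 0.5, size=(16, 9))

# Compress the 9 correlated columns into 2 factors that spread teams apart;
# these factor scores would then feed the (ordered) probit regression.
fa = FactorAnalysis(n_components=2, random_state=0)
factors = fa.fit_transform(X)
print(factors.shape)  # (16, 2)
```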
So, our choices were to use factor analysis, with the number of factors to be determined, or to use a smaller set of variables in our regressions, that smaller set also to be determined. We wanted to use the best model possible, but which one would that be? What should the criterion be to discriminate between what is a good model and what is a bad model?
In our case, the “best” model is the one that has the most predictive power. This is not (necessarily) the same as the model that fits the data the best. When you run probit regressions, you can see how many of the series you would have got right if you had used that model to make your picks. Unfortunately, this is rather backwards looking, as the model is created using the data on who won. In other words, the model that fits the data the best is the one that has the most explanatory power, which is quite different from predictive power.
In order to establish predictive power, you need to see how well the model does in predicting the outcomes of series that weren’t used in the generation of the model. The way to do this is to run the regression using all the data you have except for one year. Then, use the resulting model to predict that year that wasn’t used, and compare your predictions to the actual results. This is known as “leave-one-out cross-validation.” So, we tried this with the factor analysis, using several different numbers of factors, as well as several different sets of variables. What we found was that the factor analysis with 5 factors had the most predictive power, predicting on average 10.7 correct series per year (so, out of 15). This was a fair bit better than looking at any single variable by itself, although the one that came closest was Ian’s luck variable. As it turns out, the luck variable was heavily weighted in the construction of the factors, so it turned out to be an important innovation! At some point, we’ll have to look into this more closely.
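The leave-one-out procedure can be sketched generically. The `fit` and `predict` callables below are placeholders standing in for the actual ordered probit machinery, and the toy data are ours:

```python
def leave_one_year_out_accuracy(data_by_year, fit, predict):
    """Leave-one-out cross-validation at the year level.

    For each playoff year, fit the model on every other year, predict
    the held-out year's series, and count correct picks."""
    correct_per_year = {}
    for year in data_by_year:
        train = [row for y, rows in data_by_year.items() if y != year
                 for row in rows]
        model = fit(train)
        correct = sum(predict(model, row) == row["winner"]
                      for row in data_by_year[year])
        correct_per_year[year] = correct
    return correct_per_year

# Toy check with a 'model' that always picks the higher-seeded team.
data = {2014: [{"top_seed": "A", "winner": "A"},
               {"top_seed": "B", "winner": "C"}],
        2015: [{"top_seed": "D", "winner": "D"}]}
fit = lambda train: None
predict = lambda model, row: row["top_seed"]
print(leave_one_year_out_accuracy(data, fit, predict))  # {2014: 1, 2015: 1}
```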
Now that the model had been established, it was time to generate some results. First off, we used the model to generate probabilities of each of the 8 outcomes for each of the first round series. As mentioned before, the probability that a team wins is the sum of the probabilities associated with that team winning. The first round predicted outcomes are as follows:

Note that this model predicts that the Senators will beat the Canadiens, but if you look at the single most likely outcome, it’s that the Habs win in 7. This is also true for the Flames/Canucks series: the Canucks are predicted to win, but the most likely outcome (of the 8) is that the Flames win in 7. At some level, this is telling us how close these series will be.
From here, we then generated probabilities for all the second round matchups, and generated probabilities for who would win all of those. We then went on to generate probabilities for all the possible third round matchups, and used the model to generate probabilities for the outcomes of those series, and so on right to the Finals. Using this method, we generated probabilities for each team to win the Stanley Cup. Again, it is worth noting that the most likely Stanley Cup Final (as predicted by this model) is the Blackhawks versus the Lightning. If you were to fill out a bracket by taking the team most likely to win a series into the next round, this is what you would get as well.
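The round-by-round propagation can be sketched as follows. The 4-team bracket and the fixed 60% edge are toy values, and `p_win` stands in for the series probabilities the ordered probit would supply:

```python
def champion_probabilities(teams, p_win):
    """Probability each team wins a single-elimination bracket.

    teams: list whose order defines the bracket (adjacent entries meet first).
    p_win(a, b): probability a beats b in a series."""
    # Each dict maps team -> probability of reaching the current round.
    round_probs = [{t: 1.0} for t in teams]
    while len(round_probs) > 1:
        next_round = []
        for k in range(0, len(round_probs), 2):
            left, right = round_probs[k], round_probs[k + 1]
            merged = {}
            for a, pa in left.items():
                # a must reach this round, meet some opponent b, and beat them.
                merged[a] = pa * sum(pb * p_win(a, b) for b, pb in right.items())
            for b, pb in right.items():
                merged[b] = pb * sum(pa * p_win(b, a) for a, pa in left.items())
            next_round.append(merged)
        round_probs = next_round
    return round_probs[0]

# Toy bracket with a fixed 60% edge for the alphabetically first team.
p_win = lambda a, b: 0.6 if a < b else 0.4
probs = champion_probabilities(["A", "B", "C", "D"], p_win)
print({t: round(p, 2) for t, p in sorted(probs.items())})
# {'A': 0.36, 'B': 0.24, 'C': 0.24, 'D': 0.16}
```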
Finally, looking at the probabilities generated for the first round, we can see that there are going to be some tightly contested series. The model predicts that each series will go at least 6 games, and the predicted winners generally have less than a 60% chance of winning. We’ll check back after the first round to see how things are going.