Vic Ferrari put up an interesting post challenging the notion that Minnesota makes their goalies look good. Some of the evidence from the scoring chance charting done by an Edmonton Oilers blogger does seem to suggest that there is little team-to-team variance in shot quality against at even strength. Can that be correct?
I have even-strength save percentages for every goalie in the league since 1998-99. If teams didn't matter at all, then we should see no difference between the year-to-year variance of goalies who played on the same team and goalies who changed teams. I don't think that expectation is entirely realistic even if there really is no team effect, since a swing in save percentage is itself one of the things that can cause a goalie to change teams. Keeping that in mind, here are the results for a few minimum games played cutoffs (e.g. 20 GP means that a goalie must have played at least 20 games in both seasons being compared):
0 GP: .021 same team, .026 different team (N1=443, N2=204)
5 GP: .013 same team, .017 different team (N1=374, N2=159)
10 GP: .012 same team, .016 different team (N1=330, N2=157)
20 GP: .012 same team, .016 different team (N1=267, N2=95)
30 GP: .010 same team, .012 different team (N1=208, N2=54)
40 GP: .009 same team, .012 different team (N1=158, N2=35)
50 GP: .008 same team, .011 different team (N1=110, N2=15)
Not a huge difference, but I wasn't expecting one, as most teams in the league face similar shot quality against. There is a persistently higher variance for goalies who move to different teams than those who play on the same team, suggesting that team situations are not equal. On the other hand, it does suggest that most teams are pretty close in terms of difficulty of shots allowed. I don't think this rules out the possibility of a couple of teams being large outliers (a team like Minnesota, for example), but it does look fair to say that most teams face a similar range of shot quality at even-strength.
Another thing that we would expect, if shot quality is significant, is for shot quality measures to correlate well with actual save percentages. Behind the Net has recently posted two years' worth of shot quality data at 5 on 5. I decided to look for goalies who had significant playing time in both of the last two seasons, as well as a prior sample from 2005-06 and 2006-07 that we could use to compare. I ended up setting the cutoffs at a minimum of 30 games played in each of the last 2 seasons, and at least 700 shots against, which gives us 26 goalies.
Correlation between expected and actual save %: 0.34
Correlation between actual save % and career save %: 0.23
Correlation between actual save % and post-lockout save %: 0.38
These numbers bounce around a bit depending on the cutoffs. When I raised the cutoffs for the career numbers, the relationship between career save % and actual save % got a little stronger, which we would expect, and the relationship between expected and actual in 2008-09 got a little weaker. I think that prior save percentage results are likely better correlated with actuals than shot quality predictions are at evens, and that our current shot quality models are still affected by things like biases and score effects and could likely be further refined, but there does appear to be some significance to shot quality for 5 on 5 play. That's just a quick check, though; I'll leave it to somebody who has all the shot quality data and can look at a bigger sample size to better test the relationship.
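For anyone who wants to replicate the check, it's just a simple correlation. Here is a minimal Python sketch; the arrays are hypothetical placeholders, with the real inputs being one row per qualifying goalie:

```python
# Minimal sketch of the correlation check above. The arrays are
# hypothetical placeholders; the real inputs would have one entry per
# qualifying goalie (30+ GP in each of the last 2 seasons, 700+ shots).
import numpy as np

expected_sv = np.array([0.920, 0.915, 0.918])  # shot-quality-based expected save %
actual_sv   = np.array([0.924, 0.909, 0.921])  # actual 5-on-5 save %
career_sv   = np.array([0.918, 0.912, 0.917])  # prior career save %

r_expected = np.corrcoef(expected_sv, actual_sv)[0, 1]
r_career   = np.corrcoef(career_sv, actual_sv)[0, 1]
print(f"expected vs actual: {r_expected:.2f}, career vs actual: {r_career:.2f}")
```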
I really like this blog, and I don't mean to be awkward. But could you explain what those numbers in your table are, in a different way? I'm lost.
I'm talking about the first table, btw.
Sorry, I wrote this post pretty quickly and that really wasn't very well explained at all.
What I did was go through every goalie season since 1999-00, and compared the goalie's even-strength save percentage that season with their number from the previous season. I then sorted them by whether they were playing on the same team, or on a different team. This gives us a bunch of save percentage variances, which I averaged to get the typical year-to-year difference between a goalie playing on the same team as the year before, and a goalie playing on a different team the year before.
I also included the numbers of each just so we know what kind of sample was involved (N1 = same team, N2 = different team).
The first one included every goalie season, even the ones where a guy faced like 2 shots in 5 minutes of play, so I re-ran it with a few minimum games played cutoffs as well.
The table shows that even strength save percentages vary more when goalies change teams than when they stay on the same team, which is evidence for at least some kind of shot quality effect. Not a large one, granted, at least in the majority of cases, but it looks like there is something there.
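For anyone curious, the whole comparison boils down to something like this minimal Python sketch (the record fields are hypothetical stand-ins for my spreadsheet columns):

```python
# A minimal sketch of the comparison described above, assuming one record
# per goalie season. Field names ("goalie", "season", "team", "ev_sv_pct",
# "gp") are hypothetical stand-ins for the spreadsheet columns.
from collections import defaultdict

def mean_abs_diff(records, min_gp):
    """Average |year-to-year EV save % change|, split by same/different team."""
    by_goalie = defaultdict(list)
    for rec in sorted(records, key=lambda r: (r["goalie"], r["season"])):
        by_goalie[rec["goalie"]].append(rec)

    diffs = {"same": [], "different": []}
    for seasons in by_goalie.values():
        for prev, curr in zip(seasons, seasons[1:]):
            # Both seasons of the pair must meet the games played cutoff.
            if prev["gp"] < min_gp or curr["gp"] < min_gp:
                continue
            key = "same" if prev["team"] == curr["team"] else "different"
            diffs[key].append(abs(curr["ev_sv_pct"] - prev["ev_sv_pct"]))

    return {k: (sum(v) / len(v) if v else None) for k, v in diffs.items()}
```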
Okay, still not dead sure that I'm clear, but this is terrific stuff.
Wouldn't you need to sum up the saves and shots for each goalie?
N2 is always going to be smaller than N1 of course, so the group coming from the smaller sample will be expected to show more variance, no? If all goalies faced the same number of shots (clearly not the case), then n2/n1 squared would be the correction, the equalizer. But we have more math to do to level the playing field here I think.
Does that make sense?
squared should read 'square root'
I'm sure that you're right about having to level the playing field. I just subtracted the rates basically with a couple of Excel formulas, kind of as an acid test more than anything, but it does make sense that the smaller sample would have more variance.
What do you suggest to correct for the sample size issue?
"What I did was go through every goalie season since 1999-00, and compared the goalie's even-strength save percentage that season with their number from the previous season. I then sorted them by whether they were playing on the same team, or on a different team. This gives us a bunch of save percentage variances, which I averaged to get the typical year-to-year difference between a goalie playing on the same team as the year before, and a goalie playing on a different team the year before."

By "save percentage variances" do you mean the absolute value of the difference in save percentage between year 1 and year 2? That's how I read it, and if that's the case I don't believe you need to correct for sample size. It's just a simple average.
If you mean something different (which I think Vic is assuming) then I'm not sure what you did.
Overpass: That's right, the absolute value of the difference in save percentages between year 1 and year 2. Nothing particularly fancy.
Take David Aebischer, for example, because he's the first guy alphabetically.
2002: COL, .928
2003: COL, .928, variance of .000, same team
2004: COL, .937, variance of .009, same team
2006: MTL, .905, variance of .032, different team
2007: MTL, .914, variance of .009, same team
Sounds good. My guess is that there was a confusion of terms here, and Vic suggested the sample size correction because he thought you were working with statistical variance, where it's true that a smaller sample size will show more variance. However, if you've just taken a simple average of the y-t-y differences, I don't think that sample size will affect the results.
Is that right, Vic? Or am I missing something?
Confusion because of my careless use of terms, probably. I should have written "difference", not "variance". Sorry about that.
Again, if you have any more advanced calculations you would like to test out, I have the data, so let me know.
Ah, thanks for the example, CG.
Did the goalies who changed teams seem to be more likely to be coming off of subpar seasons, CG?
And do backups move teams more than starters? It seems that way to me, though I may well be wrong. Now seems like the right time to ask, as you have the data to hand.
overpass or others will correct me if I'm wrong, but I think you still have to correct for sample size (which is the number of shots each goalie faced, here).
So if Aebischer faced 500 shots in a season and saw a swing in EVsave% of -.030, that's a swing of 15 extra goals.
That swing of -.030 is, for certain, much more dramatic if it happened in a year that he faced 1500 shots. That would be an extra 45 goals!
If you level each season to 1,000 shots: if he faced 500 shots, then 15 extra goals * sqrt(1000/500) = 21.2 extra goals, corrected to 1,000 shots.
Put another way, if you take your spreadsheet and sort it by number of games played, then you'll see that the guys who faced more shots see substantially smaller swings in save%.
That's not because they're more consistent, it's a sample size effect. Use the method I've shown above and the swings in expected goals will show no material difference between guys who played a bunch and guys who played much less, because we've levelled the field for sample size (in this case sample size would be shots against per goalie).
Does that make sense? You could post the raw data in chart format here, or email me the spreadsheet if you like. My explanation may still not be clear, and may be flawed; I'm wide open to alternate suggestions for solving the sample size problem, and we still haven't allowed for the fact that the previous season may have had the small sample.
Still, I suspect that expressing the change from year to year as 'change in goals allowed, corrected to 1,000 shots' should remove much of the 'change in team' effect.
Really, what we are looking for is a model.
So, alternatively: suppose you weighted a coin to flip a head 92.1% of the time (the goalie's expected EVsave% from the three prior years, knowing that NHL.com calculated that wrong in their script some years). Flip it 300 times and count the tails, then flip it 500 times and count the tails again. How much did the 'flipping percentage' change from one sample to the other? Run a thousand trials and what was the average change?
This gives us a reasonable expectation, because the guy who flips more coins in a season (say 1300 in one trial and 1100 the next) will see a smaller shift in flipping% change, sample to sample.
Does that make sense?
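In code form, the experiment Vic describes would look something like this rough Python sketch (the default shot counts are just illustrative):

```python
# A quick Monte Carlo of the weighted-coin baseline described above: how
# big a change in "flipping percentage" do we expect from chance alone,
# given two sample sizes? The defaults are illustrative, not real data.
import random

def avg_sample_swing(p=0.921, n1=300, n2=500, trials=1000):
    """Average |difference| between two binomial samples of the same coin."""
    total = 0.0
    for _ in range(trials):
        s1 = sum(random.random() < p for _ in range(n1)) / n1
        s2 = sum(random.random() < p for _ in range(n2)) / n2
        total += abs(s1 - s2)
    return total / trials

print(avg_sample_swing())                   # small samples: bigger expected swing
print(avg_sample_swing(n1=1300, n2=1100))   # large samples: smaller expected swing
```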
OK, I see what you are getting at Vic. That makes sense to me.
I'm not certain it will make a difference, as CG's cutoffs should eliminate most of the sample size issues, but it would be a more correct way of looking at it.
Another thing I thought of that might be influencing the data is the league context. League averages are fairly steady in this period at even strength, but they do move a few thousandths here and there, and since that is likely to be the size of the difference we are talking about, I should be adjusting for that to remove that variable.
I'm going to add that correction in and then address some of your follow-up questions, Vic.
Isn't this also assuming that each team's level of shot quality is fairly even year to year?
Christopher: Actually, it is assuming that there are no shot quality effects at all at 5 on 5. That is the point of this exercise, to try to provide evidence for the existence of shot quality effects at even-strength.
Vic and others have suggested that shot quality has a very minimal effect at even-strength. If that is the case, we would expect that there should be little difference between goalies playing on the same team as the prior season compared to goalies who switch teams, since they all would be effectively in the same boat. On the other hand, if the goalies changing teams see much greater save percentage swings, then we know that something must be impacting the results, most likely shot quality effects.
OK, I normalized to league average, and used Vic's formula to express everything in terms of goals relative to average over 1,000 shots.
Let's use Aebischer as an example again to demonstrate and to make sure I'm doing it right.
Step 1: Convert yearly difference to goals
2003: (.926-.929) * 447 = -1.2
2004: (.932-.926) * 1319 = +7.5
2006: (.907-.932) * 1061 = -26.6
2007: (.914-.907) * 700 = +5.0
Step 2: Adjust for sample size
2003: -1.2 * sqrt(1000/447) = -1.7
2004: +7.5 * sqrt(1000/1319) = +6.5
2006: -26.6 * sqrt(1000/1061) = -25.8
2007: +5.0 * sqrt(1000/700) = +5.9
Step 3: Take absolute values of differences
Step 4: Compare averages for goalies who played on the same team and goalies who played on different teams.
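In code, steps 1 to 3 look something like this minimal sketch, assuming the save percentages have already been normalized to league average:

```python
# A minimal sketch of steps 1-3 above, assuming per-season save % values
# already normalized to league average. The sqrt scaling to 1,000 shots
# follows Vic's sample size correction.
from math import sqrt

def adjusted_goal_swing(prev_sv, curr_sv, curr_shots):
    """Year-to-year change expressed as goals per 1,000 shots."""
    raw_goals = (curr_sv - prev_sv) * curr_shots    # step 1: convert to goals
    adjusted = raw_goals * sqrt(1000 / curr_shots)  # step 2: adjust for sample size
    return abs(adjusted)                            # step 3: take absolute value

# Aebischer's 2006 season from the example above:
print(adjusted_goal_swing(0.932, 0.907, 1061))  # approximately 25.8
```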
I used the same minimum game cutoffs again, just to see what the results would say. I think results are too variable to be worth considering when goalies are playing 1 or 2 games, but with the sample size adjustment even results from 5 games seem to be on a similar scale to full season results.
0 GP: 12.0 same, 12.6 different
5 GP: 10.4 same, 11.5 different
10 GP: 10.3 same, 12.1 different
20 GP: 10.0 same, 11.9 different
30 GP: 10.0 same, 10.9 different
40 GP: 9.8 same, 11.6 different
Looks like the adjustment works. The goalies playing on the same team as the year before average a difference of 10 goals per 1,000 shots, and the guys who switched teams average a difference of 11-12. This suggests that team differences may account for only 1-2 goals per 1,000 shots at even strength, or about .001-.002 in save percentage.
One thing that might still need to be accounted for is if we have reason to expect more regression to the mean for either sample. I looked at averages of the actual goal differentials instead of taking the absolute value, to see if the net effect was positive or negative.
For goalies on the same team, the average was about -2, or 2 goals worse than the year before, and that was pretty consistent when I tried different minimums.
For goalies who switched teams, the averages fell as the cutoffs rose.
5 GP: -0.5
10 GP: -0.8
20 GP: -1.6
30 GP: -2.8
40 GP: -3.6
What I guess that means is that they were slightly more likely to be coming off of subpar seasons if they hadn't played many games, i.e. their team sold them low, but were also slightly more likely to be coming off good seasons if they had played lots of games, i.e. someone bought them high. Or could there be another reason, like backups playing weaker competition? I'm not really sure; does anyone have any thoughts?
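For reference, the signed version is the same calculation without the absolute value; a minimal sketch, reusing the same hypothetical inputs as the earlier sketches:

```python
# Signed version of the year-to-year comparison: average the raw
# differences rather than their absolute values, so net regression to
# the mean shows up as a negative number. Inputs are hypothetical
# (prev_sv, curr_sv, curr_shots) tuples for one group of goalies.
from math import sqrt

def mean_signed_goal_swing(pairs):
    """Average signed change in goals per 1,000 shots for a group."""
    swings = [(curr - prev) * shots * sqrt(1000 / shots)
              for prev, curr, shots in pairs]
    return sum(swings) / len(swings)
```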
Assuming these results are valid and there isn't anything we're missing, I have to admit it doesn't look like there is a whole lot there in terms of EV team shot quality differential. Certainly less than I was expecting. I don't think it rules out a few outlier teams, but it suggests that if there are some, there probably aren't very many.
It still seems odd that average shot distance against, or blocked shot frequency, or defensive zone coverage system might not have much of an effect. What exactly are shot quality metrics measuring then?
I have also run across results suggesting that certain coaches have higher save percentages when they are in charge (Pat Burns, for example). Is this entirely a special teams effect? Also, in my work on playing-to-the-score effects I looked at third period results in the playoffs, and it seemed like shooting percentages dropped slightly for the trailing team and rose slightly for the leading team. I highly doubt that had anything to do with special teams; most of that sample was from the Dead Puck Era, where the refs swallowed their whistles and you could probably get away with killing a guy late in the third of a tie game.
If EV SV% is the key stat in today's NHL, how far back can we make that same conclusion? These results make it look like today's NHL has a high degree of parity, but I find it hard to believe that the 1977 Montreal Canadiens had the same shot quality against as the 1977 Washington Capitals. We know from international results, junior results, playing against the top team in the local beer league, etc. that at some point a major skill gap does result in a real difference in shot quality.
It does appear that the differences between even-strength save percentage and shot-quality metrics mostly even out over a season. I don't have any insight into the reason or the implications.
Despite this result, I don't think there's any reason to throw the shot quality metrics out. It seems to me they probably still add value when looking at individual games or short stretches of play like a playoff series, where we can't necessarily expect things to even out as they do over a full season.
Umm... wasn't Aebischer-Theodore a deadline deal in 2006? If so, how should that be reflected in the same team / different team distinction?
I mean, if my memory is correct, he played two-thirds of the season in Colorado, so he's bound to have racked up some of that statistical drop-off there as well. Although you could make the case that Colorado before and after the lockout were de facto different teams to begin with.
ILR: You're right about that, Aebischer played most of the season in Colorado that year. The NHL data that I am using doesn't separate by team for goalies who were traded mid-season. I would expect that could have contributed to slightly reducing the observed spread.
It might be worth taking out any goalies who switched teams midway through the year and then re-running the numbers to check if that was a factor.
Outstanding stuff, CG.
As you seem to have it in spreadsheet form and to hand, what are the standard deviations of the two samples?
I suspect that they are even more similar than the year-to-year changes, implying that the difference in means between the two samples explains much of the .001 to .002 difference now remaining. That would be indicative of an imperfect assessment of the goalie market by GMs.
overpass: Ability persists, randomness doesn't. If the EV shot quality metrics relate well over short stretches, but "even out over a season" as you say, then that is highly suggestive that they are measuring luck.
To add, the opposite holds true for measurements of ability. Take trivia craps: the contestant's trivia ability isn't particularly visible over short stretches, but it grows in importance with more games played, it is the only predictor of future results (though future dice randomness still has the biggest say in those results), and it repeats well through any two random samples of trivia craps games.
Makes sense, no?
So the relationship of scoring chances to a player's results over 10 games has a correlation of only r=0.4 for the Oilers last season. If we pretended that there were no luck in the game, and that the distribution of talent was normal, then we would deduce that r^2 = 16% of a player's outscoring results stem from his outchancing numbers.
And that his corsi numbers were even less important, in fact trivial.
IMO the correct even strength team model should treat corsi as ability (until we have a better measure of territorial advantage), raised to the power of 1.2 to correct for the 'playing to the score' effect. Scoring chances are then begat from corsi in random fashion (surprising, but true for the Oilers last season at least), at a rate of 65 scoring chances per 100 shots at net, as averaged across the population.
And scoring chances turn into goals in random fashion at a 22 in 100 clip. Corrected up or down a bit depending on the historical ability of the team's players to finish and the team's goalie to stop pucks at even strength.
I'm sure that model will end up with a lot of randomness in results. But so does hockey at evens, even over 82 games. And they will be eerily similar.
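As a rough Monte Carlo of that model in Python: the conversion rates below come from the comment above, while the per-game shot total and everything else is a placeholder, not a calibrated figure.

```python
# A rough simulation of the model sketched above: scoring chances are
# begat from shots at net at random (65 per 100), and goals from chances
# at random (22 per 100). Both rates come from the comment; the per-game
# shot total is an illustrative placeholder.
import random

def simulate_game(shots_at_net=30, chance_rate=0.65, finish_rate=0.22):
    """Goals for one team in one game under pure random conversion."""
    chances = sum(random.random() < chance_rate for _ in range(shots_at_net))
    return sum(random.random() < finish_rate for _ in range(chances))

goals = [simulate_game() for _ in range(82)]  # one simulated 82-game season
print(sum(goals), min(goals), max(goals))     # note the spread from randomness alone
```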
"overpass: Ability persists, randomness doesn't. If the EV shot quality metrics relate well over short stretches, but "even out over a season" as you say, then that is highly suggestive that they are measuring luck."
That may be true, but whether or not you are measuring luck the goalie still has to deal with the resulting scoring chance.
If Carey Price faced more difficult shots in round one of the playoffs this year than Tim Thomas, for example, which my tracking suggests that he did, then he had the tougher job. That's true whether that was because of Boston's skill, Montreal's defensive style, or pure dumb luck. It would only be fair to take that into account when evaluating his performance.
I therefore see shot quality as useful in evaluation, even if it has little predictive ability.
overpass:
I agree on all counts.
I mean CG, I agree on all counts. :D