Wednesday, February 29, 2012

Why the Counting Trophies Method of Goalie Evaluation Is Flawed

One of the biggest problems with evaluating goalies by career value is that there aren't any good commonly-used counting statistics. A goalie's career line merely shows games played, wins, and shutouts, plus the rate stats of GAA and save percentage. Wins and shutouts are very heavily influenced by the rest of the team, with shutouts also varying widely depending on the level of league scoring. Games played is important in determining overall value, but does not take into account level of performance at all other than reflecting how long the netminder was able to convince an NHL team to keep giving them starts.

There has been a shift towards a greater focus on save percentage, but as long as save percentage remains a rate stat it is difficult to understand intuitively whether a goalie with a higher save percentage but a lower workload is contributing more value than one of his peers with the opposite. There are a number of good ways to turn save percentage from a rate stat into a cumulative stat (typically by comparing to league average or to replacement level and then multiplying by the shots against). Hockey Prospectus' Goals Versus Threshold, which is slightly more complex but based on a similar foundation, is a number that is recognized at least within the online hockey stats community, but there is not a widely accepted standard.

As a shortcut, therefore, many people look at a goalie's trophy case, and use that to determine which netminder had the better career. Intuitively that makes sense, as when we evaluate athletes we want to know things like how many times they were considered the best in the league. However, I do not believe this is the best approach for evaluating NHL goaltenders.

If somebody just sweeps the awards year after year, like Dominik Hasek did in the 1990s, then that is definitely showing something meaningful. Or when Glenn Hall kept getting voted ahead of other Hall of Famers while repeatedly overcoming the heavy bias towards the GAA leader, that's also probably revealing something important about how his play was viewed by his contemporaries. But goalies with a handful of awards are exceedingly rare. When you are comparing two veteran goalies where one has a Vezina or a Smythe while the other one doesn't, I don't think that trophy really adds much information at all. Some trophy-focused individuals might, for example, try to argue that Miikka Kiprusoff's Vezina Trophy means he has had a better career than Roberto Luongo, even though the performance gap between them is likely at least 100 goals in Luongo's favour (Lou's career GVT is more than double Kipper's number).

The problem with goalies is that one season is not a large enough sample to properly rate anybody, because results are not accurately tied to performance. Given that awards are handed out primarily based on results, this means that luck and team factors play a disproportionate role in winning awards for goalies.

For a .914 goalie who faces 1800 shots in a season, the 90% confidence interval of his save percentage based on binomial probability would put his performance anywhere between .903 and .925. To prove that this is not just a theoretical exercise, we can just look at two goalies with .914 career save percentages: Ilya Bryzgalov and Ryan Miller. Both have a low mark of .906 as a starting goalie, and despite their strong career track records both have struggled for the majority of this season. Bryzgalov looks likely to set a new career low mark this year, although Miller looks to have turned things around as of late.
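As a rough check on the interval quoted above, here is a minimal sketch using the normal approximation to the binomial. The function name is illustrative, and the approximation is an assumption rather than necessarily the exact calculation behind the numbers in this post. Under this approximation the .903 to .925 range corresponds to a 90% two-sided interval, while a 95% interval is slightly wider.

```python
from math import sqrt
from statistics import NormalDist


def save_pct_interval(true_sv_pct, shots, confidence):
    """Two-sided interval for observed save percentage.

    Assumes every shot is an independent trial with the same save
    probability, and uses the normal approximation to the binomial.
    """
    std_err = sqrt(true_sv_pct * (1 - true_sv_pct) / shots)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return true_sv_pct - z * std_err, true_sv_pct + z * std_err


for confidence in (0.90, 0.95):
    low, high = save_pct_interval(0.914, 1800, confidence)
    print(f"{confidence:.0%}: {low:.3f} to {high:.3f}")
# prints:
# 90%: 0.903 to 0.925
# 95%: 0.901 to 0.927
```

Either way, the spread from chance alone is more than twenty points of save percentage over a full starter's workload.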

Their peaks, on the other hand, are much higher, with Bryzgalov hitting .921 last season and Miller topping out at .929 during his '09-10 Vezina season. Those ranges are in fact almost exactly what would be expected if their numbers varied by random chance alone. Between the two of them, their average high is .925 and if Bryzgalov ends up at his current .899 while Miller stays above .906 then their average low would be .903, which again would exactly match the predicted range given above.

In addition to simple performance variation, a goalie's teammates could raise or lower his save percentage by up to about .005 or so depending on how many penalties they take and whether they are good at preventing shots against on the penalty kill. The official scorer in the goalie's home rink could also assist in cutting or boosting shot totals by a shot or so per game, which again could have an impact in the neighborhood of .005 compared to a goalie on the other end of the spectrum. And maybe a goalie plays for Ken Hitchcock or Jacques Lemaire and his team has a good year in front of him defensively in terms of reducing scoring chances, which could easily add another few thousandths to the final save percentage number.

Over a career these effects often mostly wash out, as a goalie will benefit from them in some seasons and suffer because of them in others. But when looking at a single season, as is the case when considering awards nominations, these factors can further accentuate the already heavy effect of random chance.

A goalie with at least a .920 save percentage over 1800 shots has about a 50% chance of being a Vezina nominee, based on seasonal results since the lockout. Bump that save percentage up to .925, and you probably have about a 50% chance of winning the trophy, given that half the goalies who met both cutoffs won the Vezina. Consider that binomial probabilities suggest that a goalie with league average talent will put up a .920 or better over 1800 shots 1 time out of every 5 just by chance, and you can see how a goalie who carves out a decent NHL career will probably luck into at least one good season somewhere along the line, even before considering the other factors that could help his statistics. Fortunately there are selection effects that limit the number of flukes, since most average goalies won't be given that many starts by their teams, but it does still happen with some regularity.
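The one-in-five figure can be sanity-checked with an exact binomial tail sum. This is only a sketch: the function name is made up for illustration, and the .914 used for league average talent is my assumption (borrowed from the example goalie earlier), not a figure stated for the league.

```python
from math import ceil, exp, lgamma, log


def prob_at_least(target_sv_pct, shots, true_sv_pct):
    """Chance that a goalie with a given true save percentage posts
    at least target_sv_pct over a fixed number of shots.

    Treats each shot as an independent trial and sums the binomial
    probabilities in log space to avoid overflow.
    """
    # Small epsilon guards against floating point noise in the product.
    min_saves = ceil(target_sv_pct * shots - 1e-9)
    log_p = log(true_sv_pct)
    log_q = log(1 - true_sv_pct)
    total = 0.0
    for saves in range(min_saves, shots + 1):
        log_comb = (lgamma(shots + 1) - lgamma(saves + 1)
                    - lgamma(shots - saves + 1))
        total += exp(log_comb + saves * log_p + (shots - saves) * log_q)
    return total


# ASSUMPTION: league average talent taken as .914
chance = prob_at_least(0.920, 1800, 0.914)
print(f"P(.920 or better over 1800 shots) = {chance:.2f}")
```

With those inputs the result lands in the neighbourhood of one in five; a lower assumed league average would shrink it, and a higher one would raise it.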

Whether a goaltender wins the Vezina or not depends not just on his own play, but also what other goalies are doing around the league. To return to the Luongo/Kiprusoff comparison above, it's easy to see the impact of external factors when you consider that if Martin Brodeur had his 2006-07 season in 2005-06 and vice versa, Luongo would be the guy with a trophy while Kiprusoff would have been shut out.

It's not hard to find examples of goalies doing more or less the same thing year after year even as their voting numbers vary widely. Take Patrick Roy over the last five seasons of his career, when he was the very picture of elite consistency, rattling off 61-63 starts per season with an overall save percentage typically in the .915-.920 range, an EV SV% in the .925-.930 range, and a GAA usually between 2.20 and 2.30. In 2001-02, his numbers all improved a bit, particularly his GAA and shutouts, although his EV SV% was just .005 better than his five-year average. His team's improved shot prevention and penalty discipline helped as well. There's certainly a chance that Roy played better than normal in 2001-02, but there is also a pretty strong possibility that he was more or less the same goalie all the way through and the breaks went his way in '01-02.

Over that period Roy typically got a few Vezina votes per season, until 2001-02 when he almost won the award. That year the high-minute goalies (Brodeur, Kolzig, et al.) had down seasons, while Hasek was playing at a lower, post-injury level. It turned out that only one starting goalie other than Roy posted a save percentage above .921.

Unfortunately for Roy's trophy case, that goalie happened to be Jose Theodore, who put up a .931 save percentage on his way to the Vezina/Hart combo. If you look at Theodore's 2001-02 numbers in context, they are major outliers. He also had a bunch of the indicators of one-year flukes, including very high special teams numbers and much better numbers at home than on the road.

Knowing what we know now, it is very likely that Patrick Roy was a better goalie in '01-02 than Jose Theodore. That just wasn't immediately apparent from that 82 game sample, and that's why Theodore won the award. Using awards as the primary evaluation criterion, Theodore's first five seasons as a starter rank ahead of Roy's last five seasons as a starter (unless you're one of those guys who thinks only the playoffs matter and you really love Roy's 2001 Cup/Smythe combo, and even in that case you'd probably give Roy just a slight edge), even though Roy's .929 EV SV% on 6165 SA is quite a bit better than Theodore's .921 on 6522 (about 50 goals better, or nearly two wins per season).
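The "about 50 goals" figure is just the save percentage gap multiplied by the shots faced. A minimal sketch of that arithmetic follows, with the difference evaluated at each goalie's shot total since the two workloads differ. The helper name and the goals-per-win conversion are assumptions for illustration; the post itself does not state the conversion rate it used.

```python
def goals_saved_difference(sv_pct_a, sv_pct_b, shots):
    """Extra goals prevented by goalie A relative to goalie B
    if both faced the same number of shots."""
    return (sv_pct_a - sv_pct_b) * shots


ROY_EV_SV, ROY_SHOTS = 0.929, 6165
THEODORE_EV_SV, THEODORE_SHOTS = 0.921, 6522

# Evaluate the gap at each goalie's workload to bracket the estimate.
gap_low = goals_saved_difference(ROY_EV_SV, THEODORE_EV_SV, ROY_SHOTS)
gap_high = goals_saved_difference(ROY_EV_SV, THEODORE_EV_SV, THEODORE_SHOTS)
print(f"Gap: {gap_low:.1f} to {gap_high:.1f} goals over five seasons")

# ASSUMPTION: roughly 6 goals per win, a common rule of thumb.
GOALS_PER_WIN = 6.0
SEASONS = 5
wins_per_season = (gap_low + gap_high) / 2 / SEASONS / GOALS_PER_WIN
print(f"About {wins_per_season:.1f} wins per season")
```

The same multiplication against league average or replacement level is what turns save percentage from a rate stat into a cumulative one.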

In contrast, you won't find a skater winning a scoring title primarily because of luck. Assume a typical first liner with 15 minutes per game at even strength and 3.5 minutes per game on the power play, who either doesn't play much on the PK or doesn't score any points when he does get an occasional shorthanded shift. Let's say his team takes shots at an average rate while he is on the ice (27 shots per 60 at 5 on 5, 45 shots per 60 on the power play). That player and his linemates would need to put up a ridiculous shooting percentage, and he would have to be involved in an unusually high number of plays to get on the scoresheet enough to be in Art Ross contention.

For example, if the player's team shot 13% at evens and 20% on the power play, both numbers above what any regular player managed last year, and got a point on 90% of his team's even strength goals and 75% of his team's power play goals, both extremely high and unusual participation percentages, the player would still end up with 97 points. That's a pretty high total and it would end up being near the very top of the league, especially this year, but it still isn't high enough for a top-3 finish in any of the post-lockout seasons. And again, this hypothetical player would need to get all the breaks just to get that close. If any of the luck factors drop off, then he's not in contention, and there are many ways that could happen (e.g. he misses a few games, his ice time decreases, his teammates don't score at a high rate, he gets unlucky with second assists or the power play runs more often through a teammate which reduces his scoring opportunities).
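The 97-point figure follows directly from the ice time, shot rate, shooting percentage and participation numbers above. Here is a short sketch of the arithmetic; the function name is illustrative, and the 82-game schedule with no missed games is my assumption.

```python
def projected_points(minutes_per_game, shots_per_60, shooting_pct,
                     point_share, games=82):
    """Season points from one game state (even strength or power play).

    on-ice shots -> on-ice goals -> points the player collects.
    """
    shots = minutes_per_game / 60 * shots_per_60 * games
    goals = shots * shooting_pct
    return goals * point_share


# Even strength: 15 min/game, 27 shots per 60, 13% shooting, 90% participation
es_points = projected_points(15, 27, 0.13, 0.90)
# Power play: 3.5 min/game, 45 shots per 60, 20% shooting, 75% participation
pp_points = projected_points(3.5, 45, 0.20, 0.75)

total_points = es_points + pp_points
print(f"ES: {es_points:.1f}, PP: {pp_points:.1f}, total: {total_points:.0f}")
# prints: ES: 64.8, PP: 32.3, total: 97
```

Dropping any one input back toward a normal level (say, 10% even strength shooting) pulls the total well out of Art Ross range, which is the point of the example.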

Jordan Eberle is pretty much that guy this year (his team is shooting 13% at even strength with him on the ice and he has points on 88% of his team's goals), and he's still tied for 10th in scoring. Joffrey Lupul is another guy that has flirted with the top of the scoring charts this season based on unsustainable percentages, and while he still sits just above Eberle in 9th it was always just a matter of time before he was going to be left behind by the likes of Malkin, Stamkos and Giroux.

Very high points finishes are meaningful because they are unlikely to be flukes. Pure trophy counting would still undervalue someone like Mats Sundin who was consistently productive although never near the very top of the league, but in general a player with a couple of high finishes can be considered to have had a better peak than a similar player who never climbed the table to the same degree (although of course context and team factors need to be taken into account).

That doesn't happen to goalies. Flashes in the pan have won the Vezina, and average goalies have found themselves in trophy contention simply because fortune ended up favouring them over a 50-60 game stretch. Of course observers can sometimes identify when other factors or luck are at play; they don't always just follow the numbers to the exclusion of anything else. However, it is particularly difficult when rating a young goalie without much of a track record who has a big season. Is he breaking out, or is he getting lucky? Is he Pekka Rinne or Steve Mason? Only time will tell.

It is my belief that goalies can only be properly evaluated in a multiple season context. With that in mind, single season awards should be considered relatively insignificant in terms of career evaluation.

5 comments:

PopsTwitTar said...

I can't imagine a worse way to determine actual quality than yearly awards. Ok, maybe a fan vote :) But the Vezina is voted on by GMs - so a group of 6-30 individuals, depending on the year. Second of all, we all know that all of those individuals are not watching every game by every goalie, so they will be heavily influenced by media and their own biases.

The Contrarian Goaltender said...

I can think of a worse way: counting Stanley Cup rings.

Anonymous said...

You overweight save percentage, and career save percentage is more a reflection of team play and eras than it is of a goalie. It is your go-to stat and I believe that you foolishly think it's the only thing that defines a goalie's abilities. Take a look at all time stats for goalies who played more than 100 games. It is no coincidence that 9 of the top 10 and more like 18 out of the top 20 all time save percentage leaders have played the bulk of their games after 2006. Either all the greatest goalies of all time are playing simultaneously right now, or save percentage is much more a factor of team and league style of play, rather than a goalie's individual talent. That's why I will trust a vote of 30 very hockey keen individuals rather than looking at some era and team based stat, because more often than not, the GMs will get it right.

Furthermore, when you say counting trophies, guys like Theodore only won these types of trophies once. If you can count the trophies, there would be multiple ones which means a goalie is consistently performing at a high level, which means he is clearly a solid goalie and not a single season fluke. And for guys like Theodore, or like Giguere's Conn Smythe in 2003, these players had very solid streaks and deserved these trophies but I don't think anyone is thinking they are in the all time elite class so I am not sure what your issue here really is.

Lastly, an average goalie with a few hot streaks that result in Stanley Cups, Vezinas, or Conn Smythes is better in my opinion than a consistently above average goalie who never won anything. I will take Giguere over Roman Cechmanek or Tomas Vokoun any day. Lundqvist and Luongo have similar numbers to guys like Vokoun but are Vezina finalists and gold medal winners, so it's much easier to argue these two are the better goalies. They also happen to be multiple time Vezina finalists. Sure, they didn't take home the hardware, but all-star and Vezina nominations should factor into the analysis too. How about Backstrom, Halak, Hiller, Howard, Quick, Price, Varlamov, Bryzgalov, etc.? All these players have equal or better career save percentages than Niemi, Ryan Miller, and Kiprusoff. Overall it's debatable who is better, but since stats and eras are so similar in these cases, resorting to Niemi's Cup and Miller or Kipper's Vezina as a tiebreaker doesn't seem too unreasonable to add into the argument.

I think using trophies to judge goalie skill is not the prime stat that should be used, but it doesn't hurt. Without trophies it's tough to make the argument that Belfour > Cujo, or Roy > Brodeur. In the case of Roy and Brodeur, there aren't many arguments you can make stat-wise that favor Roy (Brodeur's regular season and playoff save percentages are higher than Roy's) but the 3 Conn Smythes and 1 extra Cup clearly make the difference - and that is coming from a Brodeur fan.

The Contrarian Goaltender said...

"Either all the greatest goalies of all time are playing simultaneously right now, or save percentage is much more a factor of team and league style of play, rather than a goalie's individual talent."

Who is looking at unadjusted save percentages? I'm not. There are certainly some issues with save percentage (scoring environment, scorer bias, doesn't reflect non-save skills, shot quality effects for some outlier teams), but it's still by far the best goalie stat.

"That's why I will trust a vote of 30 very hockey keen individuals rather than looking at some era and team based stat, because more often than not, the GMs will get it right."

More often than not, the 30 very hockey keen individuals will vote for the save percentage leader.

"Overall it's debatable who is better, but since stats and eras are so similar in these cases, resorting to Niemi's Cup and Miller or Kipper's Vezina as a tiebreaker doesn't seem too unreasonable to add into the argument."

Ranking Niemi ahead of the likes of Hiller, Price or Bryzgalov because of his Cup is exactly the kind of illogical conclusion I'm targeting with this post. I do indeed think it is unreasonable to use team awards as a tiebreaker. Also, often a difference of a couple of thousandths in save percentage adds up to a significant gap over several seasons' worth of play, and that is not something that should be handwaved away because it is much easier to compare trophy collections than it is to figure out the actual impact in terms of goals saved.

"Without trophies it's tough to make the argument that Belfour > Cujo, or Roy > Brodeur. In the case of Roy and Brodeur, there aren't many arguments you can make stat-wise that favor Roy (Brodeur's regular season and playoff save percentages are higher than Roy's) but the 3 Conn Smythes and 1 extra Cup clearly make the difference - and that is coming from a Brodeur fan."

If you value save percentage first and foremost, it is in fact the opposite: There aren't many arguments you can make that favor Brodeur over Roy. Brodeur's save percentages are higher entirely because of era effects. Similarly, when you adjust for era, puckhandling and other relevant factors, it's not that hard to make the case for Belfour over Cujo either without discussing trophies.

Hostpph said...

I never thought about the possibility that counting trophies method of goalie evaluation could be flawed, but you said it right and you proved your point!