Friday, July 6, 2012

The Value of Subjective Evaluation

I've written many times about how I do not put a high degree of emphasis on subjective evaluations of goaltenders because there are extraneous and often subconscious factors that can impact how a performance is judged. Even on the individual game level, where the task should presumably be at its very easiest, luck and the performance of the goalie's teammates still have a big impact on how they are rated.

TSN analyst and former NHL goaltender Jamie McLennan scored every goaltending performance in the 2012 Stanley Cup playoffs based on a subjective scale from 1-5, which gives an interesting point of comparison between what the numbers show in terms of saves and goals against and what an informed expert concludes from their individual judgment. I do not question McLennan's scouting ability or his knowledge of goaltending, but the data set he has provided has some interesting properties that force me to question how much value his analysis actually adds.
  • In 86 playoff games, McLennan never gave the losing goalie a better score than the winning goalie.  He gave the two the same score only 6 times, meaning that 93% of the time the goalie on the winning team was judged to have played a better game.  Eleven times the two goalies had a save percentage within .005, suggesting there was likely very little difference in their play, yet the winning goalie routinely received a higher degree of recognition.  When you factor in score effects that typically end up inflating the winning goalie's numbers slightly, it is likely that the losing goalie often managed to match or outperform the winner, yet they never got a better score and only rarely were even graded on the same level as their counterpart.
  • The correlation between a goalie's average game score and his overall save percentage was 0.924, which is an extremely high degree of correlation.
  • Goalies received a 5 every single time they had a save percentage of .970 or better, regardless of how many shots they faced.
  • If you compare the save percentage rankings with the game score rankings, only 4 out of the 17 goalies with at least 3 games played in the 2012 playoffs have a ranking differential of more than two (Niemi, Howard, Brodeur and Holtby).
  • Using regression, each goalie's average game score can be predicted from their overall save percentage.  Every goalie's predicted average score was within 0.25 of their actual average game score, with the exception of Jon Quick, who was 0.37 higher, and Antti Niemi, who came in a whopping 0.67 lower.
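
For anyone who wants to run the same kind of check on their own data, here is a rough sketch of the correlation and regression steps described in the last few bullets. It assumes a simple one-variable least-squares fit, which may not be exactly what I used for the numbers above, and the goalie names and figures in it are made-up placeholders rather than McLennan's actual scores.

```python
from math import sqrt

# Placeholder data only: (name, overall save percentage, average game score).
# These are NOT the real 2012 playoff numbers.
goalies = [
    ("Goalie A", 0.946, 4.1),
    ("Goalie B", 0.930, 3.5),
    ("Goalie C", 0.914, 2.3),
    ("Goalie D", 0.905, 2.9),
    ("Goalie E", 0.890, 2.4),
]

def _sums(xs, ys):
    """Return the means and the centered sums of squares and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx

def score_residuals(rows):
    """Actual average score minus the score predicted from save percentage."""
    xs = [sv for _, sv, _ in rows]
    ys = [score for _, _, score in rows]
    slope, intercept = fit_line(xs, ys)
    return {name: score - (slope * sv + intercept) for name, sv, score in rows}

r = pearson([sv for _, sv, _ in goalies], [s for _, _, s in goalies])
print(f"correlation: {r:.3f}")
for name, resid in score_residuals(goalies).items():
    print(f"{name}: {resid:+.2f}")
```

A positive residual means the goalie was scored higher than his save percentage alone would predict (the Quick case), and a negative one means he was scored lower (the Niemi case).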
In summary, McLennan's rankings did not seem to add much information.  He really liked Jon Quick, but he was hardly unique in that viewpoint; the Conn Smythe Trophy is proof enough that most observers liked what they saw from the Kings' goalie. McLennan really wasn't a fan of Niemi's playoffs, giving him scores of 2 for games of .906 and .923, both very unusual for his ranking system (there were only two other games total where a goalie got a 2 for a .900+ save percentage outing).  Perhaps we can conclude from this that the numbers flattered the Sharks' netminder a bit relative to his actual performance.  However, it is also at least possible that McLennan has some sort of bias against Niemi, who had strong numbers at even strength (.940) but was ventilated on the penalty kill (.806), a unit that has been a point of serious weakness for San Jose over the past two seasons.

Beyond those two, comparing the save numbers to the game scores indicates that McLennan thought that Marc-Andre Fleury, Cory Schneider, Jimmy Howard and Martin Brodeur were all a bit better than their numbers suggest, while Tim Thomas, Jose Theodore and Corey Crawford were all a bit worse.  That makes sense for Fleury, since awful performances do tend to disproportionately drag down a goalie's overall averages, and for Brodeur, who is noted for his non-save skills (although McLennan also openly admitted to giving the veteran a few sympathy marks for his play in the Stanley Cup Finals).

I would subjectively agree that Thomas and Crawford may both have been a bit worse than their numbers suggest, particularly if you factor in situational leverage and their impact on win probability.  A previous post of mine looked at save percentage by game score, and Thomas and Crawford were certainly outplayed by their opposite numbers when the game was close.  How much of that was randomness and how much was each goalie's fault is an open question, but I'm not surprised that someone subjectively rating goalies would take their situational performance into account to some extent.  Still, if I turn the analysis around to predict save percentage based on game score, McLennan was subjectively downgrading both goalies by only about .006-.008, which is a very small adjustment for a 150-200 shot sample.
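
To put that .006-.008 in perspective, here is a quick back-of-the-envelope sketch of the random variation you would expect in save percentage over a sample that size. It treats every shot as an independent event with the same save probability (a simplification) and assumes a true save rate of .920, a figure I picked purely for illustration.

```python
from math import sqrt

def save_pct_standard_error(true_save_pct, shots):
    """Binomial standard error of an observed save percentage."""
    return sqrt(true_save_pct * (1.0 - true_save_pct) / shots)

ASSUMED_SAVE_PCT = 0.920  # illustrative assumption, not a measured value

for shots in (150, 200):
    se = save_pct_standard_error(ASSUMED_SAVE_PCT, shots)
    print(f"{shots} shots: standard error of about {se:.3f}")
```

Under those assumptions the standard error works out to roughly .020, so a subjective adjustment of .006-.008 sits well inside the noise for a sample of 150-200 shots.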

Imagine two individuals, one who didn't watch a second of playoff hockey but was armed with detailed stat sheets showing the save efficiency of every goalie, and one who also didn't watch a second of playoff hockey but still somehow faithfully tuned in every night to hear Jamie McLennan's Post2Post segment on TSN.  Which one would be in a better position to rate the goaltenders in the 2012 playoffs?  I think it probably would be the McLennan fan, but their advantage would be very slight as the two would still agree on almost everything.

Over and over, analysis tends to show that the subjective factors often hyped by broadcasters and hockey insiders (shot quality, clutch saves, etc.) really do not have that much of an impact at the end of the day.  That's not always the most intuitive finding for us narrative-obsessed sports fans, but it's hard to argue with the evidence.  I think we should be careful not to completely dismiss potential factors that over time could have a significant impact, particularly at the career level.  But given that nearly every observer seems to vastly overrate the value of goaltending contributions that are not encapsulated in save percentage, I think the evidence reinforces the idea that a goalie's save rate should remain the primary and most trusted method of evaluation.  Subjective opinions from people who know what they are talking about should not be disregarded, but they should at least be treated with some level of skepticism and compared with the statistical record to test their validity before being given much weight in the final evaluation.