CourtIntelligence powered by kenpom.com

    Win probability for grown-ups

    by Ken Pomeroy on Monday, September 3, 2012


    About three seasons ago I tried to develop some sort of algorithm to assess a team’s chance of winning at various points in the game. It was the middle of the NCAA tournament, and as favored teams were finding themselves in a deficit at some point during the game, it seemed like it would be a good thing to know exactly where their chance of winning stood. On a larger scale there would be other uses, like measuring the magnitude of a comeback in any situation, or more advanced analysis like measuring how a team performs when the game is truly on the line.

    The method I came up with to accomplish this was rather amateurish, but it worked well in most cases, so there wasn’t a big incentive to go changing it. It wasn’t until I was preparing a list of the most improbable wins from this past season that I noticed the system had a small glitch, mainly in cases where win chances would be small.

    So I’ve spent the past few days taking a more adult approach and applying regression to the problem. Every possession of D-I on D-I action from last season was included in the analysis, and the variables used in the regression were the initial win-probability estimate, which team had possession, and the current margin. Since the effect of time remaining is non-linear, separate equations were derived for each minute of play, plus the following times in the final minute: 0:30, 0:15, 0:05, and 0:03.
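    The post doesn't give the functional form of the regression, so here is a minimal sketch under stated assumptions: one logistic equation per time slice, with the initial win probability entered as log-odds, possession coded as +1/-1, and the margin in points. The slice list follows the post (every minute, plus 0:30, 0:15, 0:05 and 0:03 in the final minute). Snapping a game clock to the nearest slice, and the names `SLOTS`, `time_slot`, `features` and `win_prob`, are hypothetical. The fitted coefficients aren't published, so `coefs` has to be supplied by the caller.

```python
import math

# Hypothetical slice list, in seconds remaining: one per minute of play
# (40:00 down to 1:00), plus the four final-minute marks named in the post.
SLOTS = [60 * m for m in range(40, 0, -1)] + [30, 15, 5, 3]


def time_slot(seconds_remaining: float) -> int:
    """Snap the game clock to the nearest slice (ties go to more time left)."""
    return min(SLOTS, key=lambda s: (abs(s - seconds_remaining), -s))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def features(pregame_wp: float, has_ball: bool, margin: int) -> list[float]:
    """Regressors named in the post; the encodings here are assumptions.

    pregame_wp: initial win-probability estimate for the team of interest
    has_ball:   whether that team has possession
    margin:     that team's current lead in points (negative when trailing)
    """
    return [1.0, logit(pregame_wp), 1.0 if has_ball else -1.0, float(margin)]


def win_prob(coefs: dict[int, list[float]], seconds_remaining: float,
             pregame_wp: float, has_ball: bool, margin: int) -> float:
    """Logistic prediction from the equation fitted for this time slice."""
    beta = coefs[time_slot(seconds_remaining)]
    z = sum(b * x for b, x in zip(beta, features(pregame_wp, has_ball, margin)))
    return 1.0 / (1.0 + math.exp(-z))
```

    With real coefficients, `win_prob(coefs, 600, 0.80, False, 6)` would estimate the chance that an 80% favorite, up six without the ball, wins with ten minutes left. Keeping a separate equation per slice is what lets every coefficient, including the one on possession, change as the clock runs down.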

    All in all, the results aren’t going to be much different from the old system’s, but at least this one is grounded in reality and not some theory that was cobbled together in a couple of days. The main difference is that the new system is a little more sure of itself: there are more cases of high win probability. I was a little surprised by this, but a check of my work indicates that the data support it. Here’s how the model’s forecasts for possessions in various probability ranges played out last season…

    Range of
    win prob (%)   Wins   Cases   Win%
      78-82       33895   42829   79.1
      88-92       52212   58301   89.6
    94.5-95.5     18969   20008   94.8
    95.5-96.5     22012   22869   96.3
    96.5-97.5     25642   26343   97.3
    97.5-98.5     32766   33235   98.6
    98.5-99.5     48253   48490   99.5
    99.3-99.7     27246   27282   99.9

    (Of course, the model was created from last season’s data, so you’d expect it to work out. But I needed to make sure I didn’t screw something up, and this provides a sanity check.)
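    The table above is an in-sample calibration check: bucket every possession by its predicted win probability, then compare with how often that team actually won. A minimal sketch, assuming predictions in percent and half-open bins; `calibration_table` is a hypothetical helper, not code from the site.

```python
from typing import Iterable


def calibration_table(preds: Iterable[float], outcomes: Iterable[bool],
                      bins: list[tuple[float, float]]):
    """Return (lo, hi, wins, cases, win_pct) for each [lo, hi) bin of predicted WP (%).

    Each possession is one case, so a single game can contribute many cases.
    """
    pairs = list(zip(preds, outcomes))
    rows = []
    for lo, hi in bins:
        in_bin = [won for p, won in pairs if lo <= p < hi]
        cases = len(in_bin)
        wins = sum(in_bin)  # True counts as 1
        pct = 100.0 * wins / cases if cases else float("nan")
        rows.append((lo, hi, wins, cases, pct))
    return rows
```

    For a well-calibrated model, the win% in each row should land close to the middle of its bin.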

    Keep in mind there are many more cases than games, because a single game can have multiple possessions with a win probability in a given range. The system seems to be well-calibrated, although it tends to underestimate certainty in cases above 98%. This probably isn’t a bad thing; overfitting and all that. The old system would almost never identify comebacks that had less than a 1% chance of happening. The new system identified 17 such cases last season, which seems more realistic.
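    Finding the sub-1% comebacks is a matter of scanning each game for the lowest win probability the eventual winner ever had. A minimal sketch, assuming each game is stored as the winner’s in-game win-probability series; `improbable_wins` and that data layout are hypothetical.

```python
def improbable_wins(winner_wp: dict[str, list[float]], threshold: float = 0.01):
    """Return (game_id, lowest_wp) for games whose winner dipped below threshold.

    winner_wp maps a game id to the eventual winner's win probability at
    each possession. Results are sorted from most to least improbable.
    """
    hits = [(gid, min(series)) for gid, series in winner_wp.items()
            if series and min(series) < threshold]
    return sorted(hits, key=lambda hit: hit[1])
```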

    A note of caution when using probabilities: 90% doesn’t mean 100%. A team with a true 90% win probability is in a great position, but they will lose 10% of the time. In addition, these are win probabilities, not chance-of-game-getting-interesting probabilities. If this system is calibrated, the game will become interesting significantly more often than the 10% of the time the favorite loses. The 90% team will even trail in more than 10% of cases, because a team can fall behind and still win, and all this method cares about is the end result. (End of note of caution.)
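    To see why trailing happens more often than losing, here is a toy simulation, not the kenpom model: the favorite’s margin is a random walk with a small positive drift. The drift (0.2 points per step), the spread (1.3) and the 70 steps are assumptions, chosen so the favorite loses roughly 10% of the time. Any game the favorite loses ends with it behind, so the share of games in which it ever trails can never be smaller than its loss rate, and in this toy it is far larger.

```python
import random


def simulate(n_games: int, steps: int = 70, drift: float = 0.2,
             spread: float = 1.3, seed: int = 0) -> tuple[float, float]:
    """Return (share of games lost, share of games ever trailed) for the favorite.

    Toy assumptions: normally distributed margin changes, no overtime
    (an exact final tie is treated as a win, and is practically impossible).
    """
    rng = random.Random(seed)
    lost = trailed = 0
    for _ in range(n_games):
        margin = 0.0
        ever_behind = False
        for _ in range(steps):
            margin += rng.gauss(drift, spread)  # favorite's lead after this step
            if margin < 0:
                ever_behind = True
        lost += margin < 0        # a loss means finishing behind...
        trailed += ever_behind    # ...so every loss is also a game it trailed
    return lost / n_games, trailed / n_games


lose, trail = simulate(5000)
print(f"lost {lose:.1%} of games, trailed at some point in {trail:.1%}")
```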

    Another advantage of the new system is that it handles late-game win probabilities better. The old system was far too optimistic about a team facing a deficit in the final minute, so such a comeback could never make its list of most improbable wins. Under the new system, two of the ten most improbable comebacks occurred with less than a minute to play.

    A related advantage of the regression is that it figures out the value of a possession from the data it’s given. Under the old system, the value of a possession was assumed to be constant for the whole game, but the new system finds that the value of possession increases somewhat in the last 40 seconds or so.
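    In a per-slice regression, the value of the ball comes out of the fitted possession coefficient rather than being fixed in advance. A minimal sketch, assuming a logistic form (log-odds of the initial win probability, possession coded +1/-1, margin in points); the coefficient tuples below are hypothetical placeholders, not fitted values.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def possession_value(beta: tuple[float, float, float, float],
                     pregame_wp: float, margin: int) -> float:
    """Win probability with the ball minus win probability without it.

    beta = (intercept, pregame coef, possession coef, margin coef) for one
    time slice; the possession term enters as +1 (ball) or -1 (no ball).
    """
    def wp(ball: float) -> float:
        z = beta[0] + beta[1] * logit(pregame_wp) + beta[2] * ball + beta[3] * margin
        return 1.0 / (1.0 + math.exp(-z))

    return wp(1.0) - wp(-1.0)
```

    If a slice near the end of the game carried a larger possession coefficient than an early one, `possession_value` would come out larger for the same score and matchup, which is the kind of late increase described above. Note that the value also depends on the game state, since the same coefficient moves the probability less when the outcome is nearly settled.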

    Even with these improvements, there isn’t a huge difference between the two systems. The new one is just grounded in math, while the old one was gibberish that got to the right answer most of the time. One finding both have in common is that a big underdog that plays the first half to a draw improves its chances of winning surprisingly little. This fact caused me some panic during the 2010 title game.

    One loose end here is the value of home-court advantage. Since one of the ingredients in the regression is the initial win-probability estimate, I would have assumed home-court advantage was properly baked into the system. However, of the 17 comebacks from below 1%, 13 were accomplished by the road team and one came on a neutral court. It’s possible the model should account for relative team strength and home-court advantage separately.
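    One rough way to gauge how lopsided 13 road wins is: set aside the neutral-court game and ask how often at least 13 of the remaining 16 would go to the road team if, as a simplifying assumption, each comeback were equally likely to be by either side. This is a back-of-the-envelope sign test, not an analysis from the post, and 16 games is a small sample.

```python
from math import comb


def upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


total, road, neutral = 17, 13, 1      # figures from the post
n = total - neutral                   # 16 comebacks with a home team
home = n - road                       # 3 by the home team
p_value = upper_tail(road, n)
print(f"{road} road vs {home} home: P(>= {road} of {n} | 50/50) = {p_value:.4f}")
```

    Under that 50/50 assumption the tail probability is exactly 697/65536, about 0.011, which is why the road-team skew looks like more than noise.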

    So I threw home-court status into the regression. HCA turned out to be noisy and not significant until the last five minutes, and then it was actually a bonus to the home team: the opposite of what I expected based on the population of extreme comebacks.

    In saying HCA wasn’t significant, I’m not saying it doesn’t matter. It’s already baked into the initial win probability estimate, so it’s included in the in-game win probability estimate, and is obviously very real in the first 35 minutes. It’s just that it may be more important in the last five minutes than the Pythagorean estimate of team strength gives it credit for.

    I haven’t included this in the model yet. Adding a late-game bonus for the home team would make those road comebacks look even less likely, and given how many road comebacks already show up without it, I’m not comfortable including it without further examination. There’s some interesting reading produced by Brian Burke a while back on this idea for football (be sure to read the comments).

    While we’re on the subject of Brian Burke reads, he tried his hand at college hoops win probabilities in the past as well. I hadn’t read this until @GoRomano forwarded me the link Monday night, but it was also enlightening. Our methods are nearly the same, but as Brian notes, he didn’t account for the relative strength of the competing teams. (He did account for which team was at home.) The difference in the results is instructive. Brian notes…

    One thing I’ve already noticed that’s interesting about basketball is that the win probability equation is the same for nearly the entire game. In other words, a 6-point lead for the home team in the first 10 minutes of the game yields the same WP of 0.86 as a 6-point lead with 10 minutes to go in the 2nd half.

    In looking at my output, this makes sense if the population includes a lot of mismatches, as college hoops tends to. For instance, an 80% favorite with a six-point lead has an 89% chance of winning ten minutes into the game. That same lead with ten minutes to go gives them a 91% chance of winning (assuming the underdog has possession in both cases). But as the participants become more evenly matched, the difference increases. In an even matchup, a six-point lead results in a 67% chance of winning after 10 minutes of play and a 78% chance with ten minutes left.
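    The quoted figures can also be compared on the log-odds scale, which separates the pregame edge from the effect of the lead. A small worked check using only the numbers in the paragraph above; `lead_shift` is a hypothetical helper.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def lead_shift(pregame: float, early: float, late: float) -> tuple[float, float, int]:
    """Log-odds added by the six-point lead 10 minutes in and with 10 minutes left,
    plus the late-minus-early gap in percentage points."""
    return (logit(early) - logit(pregame),
            logit(late) - logit(pregame),
            round(100 * (late - early)))


for name, probs in {"80% favorite": (0.80, 0.89, 0.91),
                    "even matchup": (0.50, 0.67, 0.78)}.items():
    early, late, gap = lead_shift(*probs)
    print(f"{name}: {early:+.2f} logits early, {late:+.2f} late, {gap}-point gap")
```

    On these numbers, the lead adds about 0.7 in log-odds ten minutes in for both matchups; with ten minutes left it adds about 0.93 for the favorite and 1.27 in the even game, while the gap in plain win probability is 2 points versus 11.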

    OK, this was far too many words devoted to geekdom. If you made it here, you have my appreciation. We’ll take a look at last season’s craziest comebacks next week. The WP graphs on the site still reflect the old method, but I’ll get them updated in time for the next post.