    In-game win probabilities

    by Ken Pomeroy on Saturday, April 3, 2010


    Remember when Billy Packer declared the 2008 Final Four game between Kansas and North Carolina over? Billy got a bit of blowback for that, especially after UNC was able to pull within four points midway through the second half. I always felt like Billy was on safe ground with his statement. Granted, I suppose “over” taken literally means that there was no chance of the game becoming interesting. I took it to mean UNC had no chance of winning, although of course there was some small chance of winning. But just how safe was Billy’s statement?

    Previous attempts to quantify in-game win probabilities in college basketball are limited and have left me unsatisfied because none of them accounted for information known before the game starts. For instance, if Kansas and Alcorn State were tied five minutes into a game, we could come up with a better estimate than just saying each team has an equal chance of winning at that point. We can do better and this post documents my first attempt to do so.

    My first step was to estimate a team’s chances of winning, knowing the time and score, and assuming a game between teams of equal strength. To do this, I filtered play-by-play data using my ratings (while accounting for game location). This limits the sample to about 700 play-by-plays involving nearly equal teams, but that’s enough to make reasonable estimates of the probability. With each game, I recorded the lead at a given time and then whether that team won the game. As an example, there were 76 times that a team led by four with ten minutes to go in the first half. Those teams won 56.6% of the time.
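
The bookkeeping in that step can be sketched roughly as follows. This is only an illustration: the record layout (`margins`, `home_won`) and the function name are hypothetical, and the sample games are made-up placeholders, not real results.

```python
from collections import defaultdict

def win_rate_by_lead(games, minutes_left):
    """Tally how often the team leading at `minutes_left` went on to win.

    Each game is a dict with (hypothetical layout):
      "margins":  {minutes_left: home score minus away score at that time}
      "home_won": True if the home team won the game
    Returns {lead: (wins_by_leader, games_with_that_lead)}.
    """
    tally = defaultdict(lambda: [0, 0])
    for game in games:
        margin = game["margins"].get(minutes_left)
        if not margin:  # no snapshot at that time, or the game was tied
            continue
        leader_is_home = margin > 0
        tally[abs(margin)][1] += 1
        if leader_is_home == game["home_won"]:
            tally[abs(margin)][0] += 1
    return {lead: tuple(counts) for lead, counts in sorted(tally.items())}

# Placeholder records, purely to show the shape of the input.
sample_games = [
    {"margins": {30: 4}, "home_won": True},    # leader won
    {"margins": {30: -4}, "home_won": True},   # leader lost
    {"margins": {30: 4}, "home_won": False},   # leader lost
    {"margins": {30: -4}, "home_won": False},  # leader won
    {"margins": {30: 0}, "home_won": True},    # tied, skipped
]
counts = win_rate_by_lead(sample_games, 30)
```

With real play-by-play data, dividing wins by games for each lead gives raw percentages like the 56.6% above.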

    We can’t take that number literally because teams with a 5-point lead at that time had a winning percentage of 67.2, and a jump of more than ten percentage points for one extra point of lead is larger than is logical. So some smoothing of the data had to be applied, then some logistic regression, and finally I got a table of values that makes sense, as shown below.

                   Minutes left
    Lead  35   30   25   20   15   10   5
    0    500  500  500  500  500  500  500
    1    514  520  526  534  539  547  569
    2    529  541  553  568  578  593  636
    3    543  561  579  602  616  637  698
    4    557  581  604  634  653  679  753
    5    572  601  630  666  688  719  801
    6    586  620  654  696  721  755  842
    7    600  639  677  724  751  788  876
    8    613  658  700  751  780  818  903
    9    627  676  722  775  806  844  925
    10   640  694  743  798  829  867  942
    11   653  711  762  820  850  888  956
    12   666  728  781  839  869  905  966
    13   679  743  799  857  886  920  974
    14   692  759  815  873  901  933  980
    15   704  773  831  887  915  944  985
    16   716  787  845  901  926  953  989
    17   727  801  858  912  936  961  991
    18   738  814  871  923  945  967  993
    19   749  826  882  932  953  973  995
    20   760  837  893  940  959  977  996
    21   770  848  903  947  965  981  997
    22   780  858  912  954  970  984  998
    23   790  868  920  960  974  987  998
    24   800  877  927  965  978  989  999
    25   809  886  934  969  981  991  999
    

    You can read the values as percent times 10. So that team with a four-point lead with 10 minutes left in the first half (the 30-minutes-left column) has a 58.1% chance of winning. This table ignores a couple of important things, namely which team has possession of the ball and the pace of the game. I’m going to punt on the latter for now, since the effect of pace on winning probabilities is an issue requiring additional study. For the possession issue, it seems reasonable to add a point to the lead of whichever team has possession since that’s the expected value of a possession. (Update: My original logic was batty on this issue. It’s more correct to add a half-point for possession.)
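
Here is a minimal sketch of reading the table in code, including the half-point possession adjustment. The function name, the `possession` argument, and the linear interpolation between whole-point leads are my assumptions for illustration; the table only gives whole-point leads up to 25.

```python
MINUTES = (35, 30, 25, 20, 15, 10, 5)  # column headers: minutes left

# Rows copied from the table above: lead -> win probability x 1000.
TABLE = {
    0: (500, 500, 500, 500, 500, 500, 500),
    1: (514, 520, 526, 534, 539, 547, 569),
    2: (529, 541, 553, 568, 578, 593, 636),
    3: (543, 561, 579, 602, 616, 637, 698),
    4: (557, 581, 604, 634, 653, 679, 753),
    5: (572, 601, 630, 666, 688, 719, 801),
    6: (586, 620, 654, 696, 721, 755, 842),
    7: (600, 639, 677, 724, 751, 788, 876),
    8: (613, 658, 700, 751, 780, 818, 903),
    9: (627, 676, 722, 775, 806, 844, 925),
    10: (640, 694, 743, 798, 829, 867, 942),
    11: (653, 711, 762, 820, 850, 888, 956),
    12: (666, 728, 781, 839, 869, 905, 966),
    13: (679, 743, 799, 857, 886, 920, 974),
    14: (692, 759, 815, 873, 901, 933, 980),
    15: (704, 773, 831, 887, 915, 944, 985),
    16: (716, 787, 845, 901, 926, 953, 989),
    17: (727, 801, 858, 912, 936, 961, 991),
    18: (738, 814, 871, 923, 945, 967, 993),
    19: (749, 826, 882, 932, 953, 973, 995),
    20: (760, 837, 893, 940, 959, 977, 996),
    21: (770, 848, 903, 947, 965, 981, 997),
    22: (780, 858, 912, 954, 970, 984, 998),
    23: (790, 868, 920, 960, 974, 987, 998),
    24: (800, 877, 927, 965, 978, 989, 999),
    25: (809, 886, 934, 969, 981, 991, 999),
}

def even_strength_prob(lead, minutes_left, possession=0):
    """Win probability for a team in an even-strength game.

    lead:        team's lead in points (negative if trailing)
    possession:  +1 if the team has the ball, -1 if the opponent does,
                 0 if unknown (worth half a point, per the update above)
    """
    col = MINUTES.index(minutes_left)  # ValueError if not a table column
    effective = lead + 0.5 * possession
    if effective < 0:
        # A trailing team's chance is the complement of the leader's.
        return 1.0 - even_strength_prob(-effective, minutes_left)
    effective = min(effective, 25.0)   # table stops at a 25-point lead
    low = int(effective)
    high = min(low + 1, 25)
    frac = effective - low
    p_low, p_high = TABLE[low][col], TABLE[high][col]
    return (p_low + frac * (p_high - p_low)) / 1000.0

four_up = even_strength_prob(4, 30)                       # 0.581
four_up_with_ball = even_strength_prob(4, 30, possession=1)  # 0.591
```

Leads beyond 25 are simply clamped to the last row here, which understates nothing much late in games but is a shortcut rather than part of the method.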

    I feel that this table is very accurate for teams of even strength, but unfortunately such a matchup is rare in college basketball. Even the two games in the national semifinals, which are matchups of comparable teams, would not have made it through my filter for finding a battle of nearly equal teams. The difficult part is trying to account for team strength.

    I need to use an example to explain why. Let’s say we have a game where we assume one team has a 90% chance to win before the game starts. Now suppose that the game is tied at halftime. From our trusty chart, our favorite would have a 50% chance of winning were it an even match with its opponent. The simple thing to do would be to average our two values – our team has a 70% chance to win now. It seems to make sense to use this linear approach, but one can quickly poke holes in it.

    Suppose the favorite jumped out to a 15-point lead five minutes into the game. Our chart gives the even-strength team a 70.4% chance of winning in that case. Using the linear method, the favorite would now have an 87% chance of winning. But wait, our favorite just jumped all over their opponent, and their chance of winning dropped slightly? Think of it another way. With these two teams starting tied and 40 minutes of basketball ahead of them, the underdog had a 10% chance for victory. Now faced with a 15-point deficit and just 35 minutes remaining, the ‘dog has a better chance of winning? It doesn’t make sense.
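
To make the flaw concrete, here is the linear approach as a few lines of code. I'm assuming the weight on the pregame estimate is simply the fraction of the 40 minutes still to be played, which matches the 70% halftime figure; the function name is just for illustration.

```python
def linear_blend(pregame_prob, even_strength_prob, minutes_left, game_minutes=40.0):
    """Naive estimate: weight the pregame probability by the share of the
    game remaining and the even-strength table value by the share played."""
    weight = minutes_left / game_minutes
    return weight * pregame_prob + (1.0 - weight) * even_strength_prob

halftime_tie = linear_blend(0.90, 0.500, 20)  # 0.70
early_run = linear_blend(0.90, 0.704, 35)     # about 0.8755, below the pregame 0.90
```

The second value lands close to the 87% quoted above. Because the result is always between the two inputs, a favorite whose pregame number is higher than the table value can only ever move down, no matter how well they are playing.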

    (From this point on, I only recommend reading if you like awkwardly-structured sentences and math. Just know that I have a good formula to calculate win probabilities given the score, time remaining, team possession, and the relative strength of the teams involved. And also know that I’ll be tweeting the in-game probabilities at five-minute game-time intervals during the Final Four.)

    I’ve used two tricks to overcome this. First, I’m not going to treat time as linear. This doesn’t change much in the example provided at the 35-minute mark, but think about the halftime example. I don’t believe our favorite had a 70% chance to win at that point. I believe it was higher. I’m not going to bore you with theory on this point, and I haven’t looked at data to support the idea. For now, I’m accepting it. If need be, players are going to try harder as the game goes on. In order to account for this, I’m altering the time scale of the game by taking the square root of the fractional time remaining. That’s a mouthful, but at halftime, instead of assuming there is 50% of the game yet to be played, I’m going to pretend like there’s 70.7% of the game left to be played.
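
The time adjustment itself is a one-liner. A small sketch, assuming a 40-minute game and with a function name of my own choosing:

```python
import math

def pregame_weight(minutes_left, game_minutes=40.0):
    """Square root of the fraction of the game remaining."""
    return math.sqrt(minutes_left / game_minutes)

# Compare the plain fraction remaining with the adjusted one
# at five-minute intervals.
comparison = [
    (m, m / 40.0, round(pregame_weight(m), 3))
    for m in range(35, 0, -5)
]
```

At halftime this gives about 0.707 instead of 0.5, and with 35 minutes left about 0.935 instead of 0.875, so the adjustment matters most in the middle of the game.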

    However, at the 35-minute mark, no combination of our initial 90% and the predicted 70.4% will give us a number higher than 90%, which is what would make sense. For this, I’m using log5 to adjust our initial estimate of our favorite, using 90% for the favorite, and the 29.6% (100%-70.4%) that’s the even-strength estimate for the opposing team at this point. That returns a value of 95.5%. I can use that in the linear calculation of win probability. I actually convert the probability to odds before I do this. But putting 95.5% and 70.4% into this sausage machine returns a probability of 95.3% that our favored team will win once they have a 15-point lead five minutes into the game. That our favorite’s chances went from 90% to 95.3% with their early run sounds reasonable.
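
For anyone who wants to follow the arithmetic, here is one reading of the sausage machine that reproduces the numbers in this example. The log5 formula is the standard one; the blend (a weighted average of odds, with the square root of the fractional time remaining as the weight on the log5-adjusted value) is an inference from the description, not a confirmed statement of the exact formula, and the function names are made up.

```python
import math

def log5(p_a, p_b):
    """Standard log5: chance that A beats B given each side's probability."""
    return p_a * (1.0 - p_b) / (p_a + p_b - 2.0 * p_a * p_b)

def to_odds(p):
    return p / (1.0 - p)

def from_odds(odds):
    return odds / (1.0 + odds)

def win_probability(pregame_prob, even_strength_prob, minutes_left, game_minutes=40.0):
    """In-game win probability for the team whose pregame chance is given.

    even_strength_prob is the table value for that team's current lead.
    Assumed blend: average the two estimates in odds space, weighting the
    log5-adjusted estimate by sqrt(fraction of game remaining).
    """
    opponent_even = 1.0 - even_strength_prob
    adjusted = log5(pregame_prob, opponent_even)
    weight = math.sqrt(minutes_left / game_minutes)
    blended_odds = (weight * to_odds(adjusted)
                    + (1.0 - weight) * to_odds(even_strength_prob))
    return from_odds(blended_odds)

adjusted = log5(0.90, 0.296)                  # about 0.955
early_run = win_probability(0.90, 0.704, 35)  # about 0.953
```

Note that `to_odds` divides by zero if either input is exactly 1, so a real version would need to clamp probabilities slightly below certainty.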

    There’s lots more calibration to do with this system, but since I just thought about doing this a few days ago, it was necessary to get something done before the Final Four started. This will allow us to get a feel for how important events affect the outcome of each game this weekend. 

    By the way, according to the formula, UNC had about a 5% chance of coming back on Kansas when they were down 28 with 5 minutes to go in the first half. If that seems high, it may be. In my database of evenly-matched games, the largest deficit a team faced at that point in the game was 22. But amazingly, I have cases where a team overcame a 21- and a 19-point deficit. So perhaps Billy Packer was slightly crazy for jumping to conclusions when he did.