by Ken Pomeroy on Monday, September 3, 2012
About three seasons ago I tried to develop some sort of algorithm to assess a team’s chance of winning at various points in the game. It was the middle of the NCAA tournament, and as favored teams were finding themselves in a deficit at some point during the game, it seemed like it would be a good thing to know exactly where their chance of winning stood. On a larger scale there would be other uses, like measuring the magnitude of a comeback in any situation, or more advanced analysis like measuring how a team performs when the game is truly on the line.
The method I came up with to accomplish this was rather amateurish, but it worked well in most cases, so there wasn’t a big incentive to go changing it. It wasn’t until I was preparing a list of the most improbable wins from this past season that I noticed the system had a small glitch, mainly in cases where win chances would be small.
So I’ve spent the past few days taking a more adult approach to this and applying regression to the problem. Every possession of D-I on D-I action from last season was included in the analysis, and the variables used in the regression were the initial win-probability estimate, which team had possession, and the current scoring margin. Since the effect of time remaining is non-linear, separate equations were derived for each minute of play, and also for the following times in the final minute: 0:30, 0:15, 0:05, and 0:03.
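To make the setup concrete, here is a minimal sketch of that idea: one logistic regression per minute remaining, fit on possession-level rows with the margin, a possession indicator, and a prior win-probability estimate as inputs. The data here is synthetic and the model form is an assumption for illustration, not Pomeroy's actual regression.

```python
import math
import random

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Plain gradient-descent logistic regression; returns [intercept, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(a * b for a, b in zip(w[1:], xi))
            err = 1 / (1 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def win_prob(w, x):
    z = w[0] + sum(a * b for a, b in zip(w[1:], x))
    return 1 / (1 + math.exp(-z))

# Synthetic possessions: fit a separate model for each minute remaining.
random.seed(1)
models = {}
for minute in range(1, 41):
    X, y = [], []
    for _ in range(300):
        margin = random.randint(-15, 15)
        has_ball = random.randint(0, 1)
        prior = 1 / (1 + math.exp(-margin / 6))  # stand-in initial estimate
        # less time left -> the current margin decides the outcome more often
        won = 1 if margin / minute + random.gauss(0, 1) > 0 else 0
        X.append([margin, has_ball, prior])
        y.append(won)
    models[minute] = fit_logistic(X, y)

# Up 8 with the ball at the 2:00 mark should rate far better than down 8 without it.
print(win_prob(models[2], [8, 1, 0.79]) > win_prob(models[2], [-8, 0, 0.21]))
```

Fitting each time slice independently sidesteps having to specify a functional form for how time remaining interacts with margin, which is exactly why the approach above uses one equation per minute.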
All in all, the results aren’t going to be that much different from the old system, but at least this one is grounded in reality and not some theory that was cobbled together in a couple of days. The main difference is that the new system is a little more sure of itself - there are more cases of high win probability. I was a little surprised by this, but a check of my work indicates that there is support for this artifact. Here’s how the model forecast possessions for various ranges of probabilities last season…