    Golf win probability model description

    by Ken Pomeroy on Saturday, May 10, 2014


    Let’s face it, golf is not the most exciting spectator sport. Many find greater enjoyment in painting their ceiling or studying Latin. However, behind the slow pace of a golf tournament is a chaotic system, where about 150 players of various talent are simultaneously competing for a victory. But four rounds of golf is often not enough to separate the best golfers in the field from everyone else.

    In fact, no matter the skill of the golfer, the winner will have to play better than his long-term average. Thus, the typical professional golf tournament is a tribute to randomness. One with expert knowledge could go an entire year without successfully picking a tournament winner and not feel too bad about it. Who is going to play over his head this week? That is the question that must be answered to predict a winner and no one can know that answer with much certainty.

    In order to understand this messed-up world, I’ve been trying to develop a credible win probability model for golf tournaments for a while. This is not something that will solve one of sport’s great mysteries, but perhaps it will make more sense of the wacky world of professional golf, where each tournament contains 2.5 times the entries of the NCAA tournament with significantly more parity than college hoops.

    So if, like Shane Battier, you are subscribing to @kenpomgolf, you will get the output from this model tweeted at you during each PGA event. If you are not interested in professional golf, you should not subscribe to this account. Even if you are, the information may be too much to handle. Follow responsibly.

    A golf tournament is a complicated event to model. There are many things that could go into such a model. Mine is missing a lot of those things, but here is what the figures tweeted through @kenpomgolf take into account.

    1) Player skill. This is derived from the player’s Official World Golf Ranking at the start of the tournament. It is assumed that the player’s average score on the average course will be 67.555*(OWGR+8)^0.0088. Where does this formula come from? Well, a while back I tried to make my own world golf ratings. It turns out that it’s really difficult to maintain such a system, hence my dependence on the OWGR in the model. However, that effort did give me a decent starting point for converting a player’s world ranking to a theoretical single-round score. The equation used was fit to those golf ratings.

    Only players inside the top 1000 are accounted for here. If a player ranks outside the current top 1000, he is given a rank of 250. This works well for the majors, because a player contending in a major who ranks outside the top 1000 has probably been competing in events that don’t earn points in the OWGR (usually on the Champions Tour or in amateur events) and is truly a halfway-decent player. For regular PGA Tour events, this assumption isn’t as good, since unranked players tend to be Monday qualifiers instead of Bernhard Langer or a top amateur. Still, Monday qualifiers are not in contention very often, so this isn’t a big concern.
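
The rank-to-score conversion and the fallback for players outside the top 1000 can be sketched in a few lines of Python. This is a minimal illustration of the formula as written above, not the actual model code; the function name and the use of None for an unranked player are assumptions made for the example.

```python
def expected_score(owgr_rank=None):
    """Modeled single-round score on an average course for a given OWGR rank."""
    # Assumption for this sketch: an unranked player is passed as None.
    # Anyone unranked or outside the top 1000 is treated as rank 250.
    if owgr_rank is None or owgr_rank > 1000:
        owgr_rank = 250
    return 67.555 * (owgr_rank + 8) ** 0.0088


for rank in (1, 10, 100, 1000, None):
    print(rank, round(expected_score(rank), 2))
```

Because the exponent is so small, the curve is very flat: the modeled gap between the top-ranked player and a player treated as rank 250 is only about two strokes per round.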

    2) Player’s current score and position on the golf course. This is self-explanatory, but keep in mind that only hole-by-hole scores are taken into account. If a player just dumped his tee shot into the water, this isn’t factored into the win probability calculation until his score for the hole is posted. I’m also at the mercy of pgatour.com for the latest scores. This is a serious issue for three of the four majors, since the PGA Tour does not run the Masters, the U.S. Open, or the Open Championship. Unfortunately, scores will be delayed for these events. (Unless I manually update them, which is usually possible for the final rounds of the Masters and the Open Championship.)

    3) Difficulty of each hole. Live hole-by-hole statistics for the field along with the player’s ability are used to assess the likelihood of various scores on future holes. It’s a challenge to adjust the scoring distribution on each hole for a particular player because the distribution is not normal. For an example, take the famous 17th hole at TPC Sawgrass. It played to an average of 0.04 strokes over par during the first two days. However, there were 44 birdies to just 15 bogeys. The difference in over-par scores was made up by 11 double-bogeys and five triple-bogeys (or worse). Modeling the chances of the big numbers can be important to getting the win probabilities correct, especially late in the final round.

    Instead of getting into the details of how I do this - because it’s kind of clumsy - I’ll give an example of what comes out of the sausage machine.

    - For The Players Championship, the model rated Adam Scott as 1.208 strokes per round better than the field average.

    - I assume that Scott’s advantage is spread equally among the 18 holes. This means that he is 0.067 strokes per hole better than the field.

    - This advantage is used to skew the average distribution of scores on each hole. This is how Scott’s modeled probabilities differed from the field average on the 17th hole at Sawgrass:

    Score    Field  Scott
    Eagle     .000   .000
    Birdie    .153   .184
    Par       .739   .729
    Bogey     .052   .044
    DblBogey  .038   .031
    TplBogey  .017   .013
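
As a quick sanity check on the table, the expected score relative to par can be computed for each column and compared with the per-hole advantage of 1.208 / 18. The snippet below is only an illustrative calculation on the published numbers; the variable and function names are made up for the example.

```python
# Strokes relative to par for each outcome. "TplBogey" is the worst score
# the model allows on a hole.
STROKES = {"Eagle": -2, "Birdie": -1, "Par": 0, "Bogey": 1, "DblBogey": 2, "TplBogey": 3}

FIELD = {"Eagle": .000, "Birdie": .153, "Par": .739, "Bogey": .052, "DblBogey": .038, "TplBogey": .017}
SCOTT = {"Eagle": .000, "Birdie": .184, "Par": .729, "Bogey": .044, "DblBogey": .031, "TplBogey": .013}


def mean_vs_par(dist):
    """Expected strokes relative to par, normalizing away rounding in the table."""
    total = sum(dist.values())
    return sum(STROKES[k] * p for k, p in dist.items()) / total


field_mean = mean_vs_par(FIELD)
scott_mean = mean_vs_par(SCOTT)
print(f"field {field_mean:+.3f}, Scott {scott_mean:+.3f}, gap {field_mean - scott_mean:.3f}")
print(f"per-hole advantage from 1.208 strokes per round: {1.208 / 18:.3f}")
```

The gap between the two columns comes out to roughly 0.065 strokes, in line with the 0.067 strokes per hole quoted above once rounding in the table is allowed for.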
    
    

    Each player has his own set of probabilities for each hole based on his OWGR-converted score relative to the rest of the field. Using these, I run 10,000 simulations of the field to obtain win probabilities for each player. In the early rounds of a tournament, this can take quite a bit of time, so there can be 10 to 20 minutes between the snapshot of the leaderboard and when the win probabilities are broadcast.
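
For readers who want to see the shape of such a simulation, here is a minimal Python sketch. It is not the code behind @kenpomgolf: the leaderboard is invented, the hole-17 probabilities from the table are reused for every remaining hole purely for illustration, and the function name, seed, and data layout are all assumptions.

```python
import random

STROKES = [-2, -1, 0, 1, 2, 3]  # eagle through triple-bogey, relative to par

# Hole-17 probabilities from the table above. Reusing them for every hole is
# an assumption of this sketch; a real run needs one distribution per hole.
FIELD = [.000, .153, .739, .052, .038, .017]
SCOTT = [.000, .184, .729, .044, .031, .013]


def simulate(players, n_sims=10_000, seed=0):
    """Estimate win probabilities by simulating the rest of the tournament.

    players maps name -> (current score vs par, list of per-hole
    distributions for the holes that player still has to play).
    Every hole is sampled independently.
    """
    rng = random.Random(seed)
    wins = {name: 0.0 for name in players}
    for _ in range(n_sims):
        finals = {}
        for name, (score, holes) in players.items():
            for dist in holes:
                score += rng.choices(STROKES, weights=dist)[0]
            finals[name] = score
        best = min(finals.values())
        leaders = [n for n, s in finals.items() if s == best]
        # Ties go to a play-off that each participant is equally likely to win.
        for n in leaders:
            wins[n] += 1 / len(leaders)
    return {name: w / n_sims for name, w in wins.items()}


# Hypothetical leaderboard: names, scores, and holes remaining are made up.
players = {
    "Leader": (-10, [FIELD] * 3),
    "Chaser": (-9, [SCOTT] * 5),
    "Longshot": (-6, [FIELD] * 9),
}
probs = simulate(players)
for name, p in probs.items():
    print(f"{name}: {p:.1%}")
```

With a full field of about 150 players and up to 72 holes each, the inner loop runs millions of times per snapshot, which gives some feel for why the early-round simulations are slow.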

    Each hole is assumed to be an independent event, which is probably a decent enough assumption, but if you believe that you can predict how a player will perform based on his performance on the previous hole(s), then I’m missing that here. (And if you have evidence to support this, let me know.)
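
One simple way to look for that kind of evidence is to measure the correlation between a player's score on one hole and his score on the next. The sketch below does this on synthetic rounds that are independent by construction, so it only demonstrates the calculation; the function name and the data are hypothetical, and the score probabilities are borrowed from the field column of the table above.

```python
import random


def lag1_correlation(rounds):
    """Pearson correlation between scores on hole i and hole i+1.

    rounds is a list of rounds, each a list of hole scores relative to par.
    """
    pairs = [(r[i], r[i + 1]) for r in rounds for i in range(len(r) - 1)]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / (var_x * var_y) ** 0.5


# Synthetic data: every hole is drawn independently, so the correlation
# should land close to zero.
rng = random.Random(1)
strokes = [-1, 0, 1, 2, 3]
weights = [.153, .739, .052, .038, .017]
rounds = [rng.choices(strokes, weights=weights, k=18) for _ in range(2000)]
print(round(lag1_correlation(rounds), 3))
```

On real scorecards, each score would first need to be adjusted for the difficulty of the hole and the skill of the player. Otherwise good players and easy stretches of the course would produce a correlation that has nothing to do with momentum.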

    Other assumptions that oversimplify real golf, in order of importance:
    - Difficulty is assumed to be the same regardless of time of day. A player that posts a good first-round score with an afternoon tee time will be in a marginally better position than these probabilities indicate.
    - The chance of winning a hypothetical play-off is equal for all participants.
    - No home-course/area/country advantage is considered.
    - A player can’t do worse than triple-bogey on any hole.
    - Cuts are not considered. On Thursday and Friday, the tournament is modeled to be cut-less, so players that have a limited chance of making the cut could still be considered for winning the tournament.
    - There is no accounting for whether a player’s game is particularly (un)suited to a specific type of course.

    If it isn’t explicitly mentioned in this article, it isn’t taken into account. That’s a lot of things and I may patch some holes as time goes on, but if you are really interested in developing something better, it’s totally possible.

    To the extent that my percentages differ from odds offered by people that take wagers, I’d guess it’s most likely due to the failure of the OWGR calculation in assessing each golfer’s current skill. 

    Tweet schedule:
    Thursday and Friday: The first simulation is run beginning at 12:30pm ET on Thursday and tweeting continues every two hours while players are on the course.
    Saturday: Every hour at the bottom of the hour while at least one player in the top four is on the course.
    Sunday (and possibly Monday): Every half-hour, while at least one player in the top three is on the course.

    Legal disclaimers and other warnings:
    - Please consider that in cases where weekend tee times are moved to the morning to beat impending weather, this account will provide spoilers for people wishing to watch the tape-delayed broadcast on TV.
    - The account is automated so there are liable to be glitches from time to time. I will monitor it as best I can, but I can’t baby-sit it all the time. Also, I probably won’t notice tweets addressed to @kenpomgolf.
    - Percentages are rounded to the nearest whole number. Thus, you may find cases where the listed percentages add up to more than 100%.
    - If you are making financial decisions based on this information, you may not be very smart.
    - I am not associated with the PGA Tour.
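
To illustrate the rounding caveat in the list above, here is a tiny Python example with made-up win probabilities that add up to 1 before rounding. The player labels and values are hypothetical.

```python
# Hypothetical win probabilities for four players; unrounded, they sum to 1.
probs = {"A": 0.256, "B": 0.256, "C": 0.256, "D": 0.232}

rounded = {name: round(100 * p) for name, p in probs.items()}
print(rounded, "total:", sum(rounded.values()))
```

Three of the four values round up from 25.6 to 26, so the rounded percentages total 101.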