Golf win probability model description

Let’s face it, golf is not the most exciting spectator sport. Many find greater enjoyment in painting their ceiling or studying Latin. However, behind the slow pace of a golf tournament is a chaotic system, where about 150 players of varying talent are simultaneously competing for a victory. But four rounds of golf are often not enough to separate the best golfers in the field from everyone else.

In fact, no matter the skill of the golfer, the winner will have to play better than his long-term average. Thus, the typical professional golf tournament is a tribute to randomness. One with expert knowledge could go an entire year without successfully picking a tournament winner and not feel too bad about it. Who is going to play over his head this week? That is the question that must be answered to predict a winner and no one can know that answer with much certainty.

In order to understand this messed-up world, I’ve been trying to develop a credible win probability model for golf tournaments for a while. This is not something that will solve one of sport’s great mysteries, but perhaps it will make more sense of the wacky world of professional golf, where each tournament contains 2.5 times the entries of the NCAA tournament with significantly more parity than college hoops.

So if, like Shane Battier, you are subscribing to @kenpomgolf, you will get the output from this model tweeted at you during each PGA event. If you are not interested in professional golf, you should not subscribe to this account. Even if you are, the information may be too much to handle. Follow responsibly.

A golf tournament is a complicated event to model. There are many things that could go into such a model. Mine is missing a lot of those things, but here is what the figures tweeted through @kenpomgolf take into account.

1) Player skill. This is derived from the player’s Official World Golf Ranking at the start of the tournament. It is assumed that the player’s average score on the average course will be 67.555*(OWGR+8)^0.0088. Where does this formula come from? Well, a while back I tried to make my own world golf ratings. It turns out that it’s really difficult to maintain such a system, hence my dependence on the OWGR in the model. However, that effort did give me a decent starting point for converting a player’s world ranking to a theoretical single-round score. The equation used was fit to those golf ratings.

Only players inside the top 1000 are accounted for here. If a player ranks outside the current top 1000, he is given a rank of 250. This works well for the majors, because a player contending in a major who ranks outside the top 1000 has probably been competing in events that don’t earn points in the OWGR (usually on the Champions Tour or in amateur events) and is truly a halfway-decent player. For regular PGA Tour events, this assumption isn’t as good, since unranked players tend to be Monday qualifiers instead of Bernhard Langer or a top amateur. Still, Monday qualifiers are not in contention very often, so this isn’t a big concern.
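As a minimal sketch, the ranking-to-score conversion and the top-1000 rule can be written as a small Python function. The function name and the use of `None` for an unranked player are assumptions made for illustration; only the formula and the fallback rank of 250 come from the description above.

```python
def expected_round_score(owgr_rank=None):
    """Modeled average single-round score on an average course.

    owgr_rank is the player's Official World Golf Ranking position.
    None (an illustrative convention) stands for a player with no ranking.
    """
    # Players outside the top 1000, or unranked, are treated as rank 250.
    if owgr_rank is None or owgr_rank > 1000:
        owgr_rank = 250
    return 67.555 * (owgr_rank + 8) ** 0.0088


if __name__ == "__main__":
    for rank in (1, 50, 250, None):
        print(rank, round(expected_round_score(rank), 3))
```

Under this formula the world No. 1 comes out at roughly 68.9 strokes per round and a rank-250 player at roughly 70.9, a gap of about two strokes per round.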

2) Player’s current score and position on golf course. This is self-explanatory, but keep in mind that only hole-by-hole scores are taken into account. If a player just dumped his tee shot into the water, this isn’t factored into the win probability calculation until his score for the hole is posted. I’m also at the mercy of pgatour.com for the latest scores. This is a serious issue for three of the four majors, since the PGA Tour does not run the Masters, the U.S. Open, or the Open Championship. Unfortunately, scores will be delayed for these events. (Unless I manually update them, which is usually possible for the final rounds of the Masters and the Open Championship.)

3) Difficulty of each hole. Live hole-by-hole statistics for the field along with the player’s ability are used to assess the likelihood of various scores on future holes. It’s a challenge to adjust the scoring distribution on each hole for a particular player because the distribution is not normal. For example, take the famous 17th hole at TPC Sawgrass. It played to an average of 0.04 strokes over par during the first two days. However, there were 44 birdies to just 15 bogeys. The difference in over-par scores was made up by 11 double-bogeys and five triple-bogeys (or worse). Modeling the chances of the big numbers can be important to getting the win probabilities correct, especially late in the final round.

Instead of getting into the details of how I do this – because it’s kind of clumsy – I’ll give an example of what comes out of the sausage machine.

– For The Players Championship, the model rated Adam Scott as 1.208 strokes per round better than the field average.

– I assume that Scott’s advantage is spread equally among the 18 holes. This means that he is 0.067 strokes per hole better than the field.

– This advantage is used to skew the average distribution of scores on each hole. This is how Scott’s modeled probabilities differed from the field average on the 17th hole at Sawgrass:

```
Score    Field  Scott
Eagle     .000   .000
Birdie    .153   .184
Par       .739   .729
Bogey     .052   .044
DblBogey  .038   .031
TplBogey  .017   .013
```
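The actual skewing method isn’t spelled out here, so the following is only one plausible stand-in, not the model’s own procedure. It uses exponential tilting: each field probability is reweighted by exp(-theta * score), and theta is found by bisection so that the expected score drops by the player’s per-hole advantage. The function names are hypothetical; the field probabilities are the 17th-hole figures from the table.

```python
import math


def mean_score(dist, scores):
    """Expected score relative to par for a probability distribution."""
    return sum(p * s for p, s in zip(dist, scores))


def tilt_distribution(field_probs, scores, advantage, tol=1e-10):
    """Shift a hole's score distribution so its mean falls by `advantage`.

    Assumption: exponential tilting, chosen for illustration only.
    A negative advantage makes the player worse than the field.
    """
    total = sum(field_probs)
    base = [p / total for p in field_probs]  # rounding can leave the sum off 1

    def tilted(theta):
        weights = [p * math.exp(-theta * s) for p, s in zip(base, scores)]
        norm = sum(weights)
        return [w / norm for w in weights]

    target = mean_score(base, scores) - advantage
    lo, hi = -10.0, 10.0
    # The tilted mean decreases as theta grows, so bisection finds the target.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_score(tilted(mid), scores) > target:
            lo = mid
        else:
            hi = mid
    return tilted((lo + hi) / 2)


SCORES = [-2, -1, 0, 1, 2, 3]  # eagle through triple-bogey, relative to par
FIELD_17TH = [0.000, 0.153, 0.739, 0.052, 0.038, 0.017]
ADVANTAGE = 1.208 / 18  # Scott's per-round edge spread over 18 holes

scott_17th = tilt_distribution(FIELD_17TH, SCORES, ADVANTAGE)

if __name__ == "__main__":
    for score, p_field, p_scott in zip(SCORES, FIELD_17TH, scott_17th):
        print(score, p_field, round(p_scott, 3))
```

The tilted numbers will not match the Scott column exactly, since the real adjustment is a different (undisclosed) method, but the direction is the same: more birdies, fewer bogeys and big numbers, and a mean about 0.067 strokes lower.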

Each player has his own set of probabilities for each hole based on his OWGR-converted score relative to the rest of the field. Using these, I run 10,000 simulations of the field to obtain win probabilities for each player. In the early rounds of a tournament, this can take quite a bit of time, so there can be 10 to 20 minutes between the snapshot of the leaderboard and when the win probabilities are broadcast.

Each hole is assumed to be an independent event, which is probably a decent enough assumption, but if you believe that you can predict how a player will perform based on his performance on the previous hole(s), then I’m missing that here. (And if you have evidence to support this, let me know.)
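The simulation step can be sketched in a few lines of Python. This is a simplified illustration rather than the production code: the input layout, function name, and the two example players are hypothetical. It draws each remaining hole independently from the player’s own distribution and splits a win evenly among players tied for the lead, which matches the equal play-off chance this model assumes.

```python
import random


def simulate_win_probabilities(players, n_sims=10_000, seed=None):
    """Estimate win probabilities by simulating the remaining holes.

    players maps a name to (current_score_to_par, remaining_holes), where
    each remaining hole is a (scores_to_par, probabilities) pair.
    """
    rng = random.Random(seed)
    wins = {name: 0.0 for name in players}
    for _ in range(n_sims):
        finals = {}
        for name, (current, holes) in players.items():
            total = current
            for scores, probs in holes:
                # Each hole is an independent draw.
                total += rng.choices(scores, weights=probs)[0]
            finals[name] = total
        best = min(finals.values())
        tied = [name for name, total in finals.items() if total == best]
        for name in tied:
            # Equal play-off chance: split the win among those tied.
            wins[name] += 1.0 / len(tied)
    return {name: w / n_sims for name, w in wins.items()}


# Hypothetical finish: two players with only the 17th at Sawgrass left,
# using the field and Scott probabilities from the table.
SCORES = [-1, 0, 1, 2, 3]
FIELD = [0.153, 0.739, 0.052, 0.038, 0.017]
SCOTT = [0.184, 0.729, 0.044, 0.031, 0.013]

example_players = {
    "Leader": (-10, [(SCORES, FIELD)]),
    "Scott": (-9, [(SCORES, SCOTT)]),
}

if __name__ == "__main__":
    print(simulate_win_probabilities(example_players, seed=1))
```

A full field with many holes remaining means far more random draws per simulation, which is consistent with the long run times early in a tournament.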

Other assumptions that oversimplify real golf, in order of importance:
– Difficulty is assumed to be the same regardless of time of day. A player that posts a good first-round score with an afternoon tee time will be in a marginally better position than these probabilities indicate.
– The chance of winning a hypothetical play-off is equal for all participants.
– No home-course/area/country advantage is considered.
– A player can’t do worse than triple-bogey on any hole.
– Cuts are not considered. On Thursday and Friday, the tournament is modeled to be cut-less, so players that have a limited chance of making the cut could still be considered for winning the tournament.
– There is no accounting for whether a player’s game is particularly (un)suited to a specific type of course.

If it isn’t explicitly mentioned in this article, it isn’t taken into account. That’s a lot of things and I may patch some holes as time goes on, but if you are really interested in developing something better, it’s totally possible.

To the extent that my percentages differ from odds offered by people that take wagers, I’d guess it’s most likely due to the failure of the OWGR calculation in assessing each golfer’s current skill.

Tweet schedule:
Thursday and Friday: The first simulation is run beginning at 12:30pm ET on Thursday and tweeting continues every two hours while players are on the course.
Saturday: Every hour at the bottom of the hour while at least one player in the top four is on the course.
Sunday (and possibly Monday): Every half-hour, while at least one player in the top three is on the course.

Legal disclaimers and other warnings:
– Please consider that in cases where weekend tee times are moved to the morning to beat impending weather, this account will provide spoilers for people wishing to watch the tape-delayed broadcast on TV.
– The account is automated so there are liable to be glitches from time to time. I will monitor it as best I can, but I can’t baby-sit it all the time. Also, I probably won’t notice tweets addressed to @kenpomgolf.
– Percentages are rounded to the nearest whole number. Thus, you may find cases where the listed players’ percentages sum to more than 100%.
– If you are making financial decisions based on this information, you may not be very smart.
– I am not associated with the PGA Tour.