by Ken Pomeroy on Tuesday, October 30, 2012
I’ve rolled the site over to 2013, fresh with my stab at pre-season ratings and reasonably accurate schedules. (If you see something wrong with your team’s information, kindly drop me a line.)
These rankings might not match what you’ve seen in any other venue. The uniqueness is due to two general reasons. First, my computer doesn’t see everything humans see, and for the most part, humans have an advantage here. I generally think humans do a good job of assimilating data this time of year, with perhaps the exception of overvaluing a long tournament run fueled by close wins or a favorable draw. Even then, it’s just a hunch on my part that people overvalue that. I could be the one undervaluing postseason performance.
The other reason is that my computer doesn’t know what humans are thinking. This is mostly an advantage to the computer. I think the AP preseason poll is useful, but one criticism I have of it is that voters’ ballots are a bit too similar. Of this year’s ballots, in what should be a more difficult year to predict, just two of 65 voters had Indiana outside the top 3, and those two had the Hoosiers at number four.
Indiana may well be the proper pick as the best team in the land, but I think if you locked people in a room in late March and made each individual figure it out on his or her own, it wouldn’t have been nearly as obvious that a team with a suspect defense last season should be the best team in the land this season, and at least a few people would have struggled to put them in the top five.
For those who haven’t read the previous editions of College Basketball Prospectus, I’m going to take this post to describe the inputs to the model, and then I’ll discuss teams tomorrow. The logic goes like this: If you could have one thing to predict a team’s offense, what would it be? It turns out last season’s offensive efficiency would be that thing. It does a good job of predicting offense the following season. After that, offensive efficiency from the season before last is the next best predictor, and after that, last season’s defensive efficiency helps a bit. (Flip the script for the defensive predictors.) Those three things are the foundation of the system.
The model takes those basic stats from the past and adjusts them for returning players. It’s got a bit of intelligence built in to determine which players’ minutes were most valuable last season. On the offensive end, minutes are weighted by the following formula…
(Player ORtg/Team Raw OE)^2 x (Player %Poss/20)^2
As an example, for Weber State’s Damian Lillard, this works out to
(124.2/111.5)^2 x (32.2/20)^2 = 3.23
In other words, for last season, Lillard’s minutes were over three times as important as those of the average Weber State player on the offensive end, and Lillard played a lot of minutes. Weber returns about half of its minutes from last season, which is a pretty decent figure. But because Lillard’s heavily weighted minutes are not among them, its offense is forecast to suffer quite a bit more than its sheer volume of returning minutes would suggest.
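The weighting formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual code behind the ratings; the function and argument names are made up for the example. With the rounded Lillard inputs quoted above it comes out near 3.2, and any small gap from the published 3.23 is presumably down to rounding in the displayed inputs.

```python
def offensive_minutes_weight(player_ortg: float, team_raw_oe: float, pct_poss: float) -> float:
    """Weight applied to a player's minutes on the offensive end.

    Implements (Player ORtg / Team Raw OE)^2 x (Player %Poss / 20)^2.
    The function and parameter names are hypothetical, chosen for this sketch.
    """
    efficiency_ratio = player_ortg / team_raw_oe  # player efficiency relative to his team
    usage_ratio = pct_poss / 20.0                 # 20% is an even share among five players
    return efficiency_ratio ** 2 * usage_ratio ** 2


# Damian Lillard's inputs as quoted in the post (rounded values).
lillard_weight = offensive_minutes_weight(124.2, 111.5, 32.2)
print(round(lillard_weight, 2))

# A player exactly as efficient as his team who uses 20% of possessions gets weight 1.
average_weight = offensive_minutes_weight(111.5, 111.5, 20.0)
print(average_weight)
```

Squaring both ratios means a high-usage, high-efficiency player counts for far more than his raw minutes, which is why losing one such player can hurt more than the returning-minutes percentage suggests.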
New this season is that the returning component takes into account a player’s class. Past data has shown that returning freshman minutes are more valuable than sophomore minutes, which in turn are more valuable than junior minutes. If a player hasn’t been ruled out for the season by suspension or injury, he is assumed to be playing for purposes of this calculation.
The weakest part of the system is clearly accounting for new players. It ignores transfers and recruits outside the top 100. (Functionally, recruits outside the top 50-75 don’t have much impact in the formula.) Obviously, the majority of teams do not have a top 100 recruit coming on board.
Let’s look at the impact of ignoring transfers first. I went back and looked at the teams last season that had “impact transfers” and compared my preseason ranking (“Pred”), which had no knowledge of the transfer, with the team’s final ranking (“Act”). (Impact transfers are defined by this Jason King list.)
Transfer, Team          Pred   Act   Diff
Brown, Colorado          116    74    +32
Fuller, USC              116   241   -125
Heslip, Baylor            15    17     -2
Moser, UNLV               18    38    -20
Moultrie, Miss St.        82    89     -7
Rosario, Florida          12    12      0
Spurlock, UCF             93   107    -14
Thames, SD St.            55    69    -14
Turner, Texas A&M         31   114    -83
Wears, UCLA               46    43     +3
White, et al, Iowa St.   120    27    +93
There’s really no trend here. I suppose the “impact” part of the transfer list could be better measured. The guys above are basically from power conferences. However, even at lower levels the influence of transfers is difficult to predict. Rakim Sanders had a huge year at Fairfield as one might have anticipated coming from the ACC, and yet the Stags’ final ranking (100) didn’t differ much from the pre-season (94) either.
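One rough way to check the “no trend” reading of the transfer table is to summarize its Diff column. The sketch below is my own illustration, not part of the rating system: it copies the Diff values as printed in the table (it does not recompute them from Pred and Act), and the variable and function names are invented for the example.

```python
from statistics import mean, median

# Diff column copied as printed in the transfer table.
# Positive means the team finished better than its preseason ranking.
transfer_diffs = {
    "Brown, Colorado": 32,
    "Fuller, USC": -125,
    "Heslip, Baylor": -2,
    "Moser, UNLV": -20,
    "Moultrie, Miss St.": -7,
    "Rosario, Florida": 0,
    "Spurlock, UCF": -14,
    "Thames, SD St.": -14,
    "Turner, Texas A&M": -83,
    "Wears, UCLA": 3,
    "White, et al, Iowa St.": 93,
}


def summarize_diffs(diffs: dict) -> dict:
    """Basic summary of ranking differences (hypothetical helper for this sketch)."""
    values = list(diffs.values())
    return {
        "mean": mean(values),
        "median": median(values),
        "improved": sum(v > 0 for v in values),
        "declined": sum(v < 0 for v in values),
        "unchanged": sum(v == 0 for v in values),
    }


summary = summarize_diffs(transfer_diffs)
print(summary)
```

With only eleven teams and a few very large misses in both directions, a summary like this is suggestive at best; it is not evidence that transfers push a team’s ranking one way or the other.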
I suspect the reason that the system isn’t negatively impacted by a lack of transfer data is that it already has a decent grasp on what a team’s “replacement level” player looks like. In other words, its estimate of how to handle the minutes not accounted for by returning players or elite recruits isn’t so bad. Of course, the system whiffs in some cases. It’s not going to handle the bevy of transfers Iowa State brought in last season, led by an NBA first-rounder, but most transfer cases are not this extreme.
Finally, let’s look at how the system has fared against the AP preseason poll the past two seasons. I’m going to meet the AP at a neutral site and use a team’s tournament seed as the outcome to compare.
Tournament seed by pre-season rank
Rank      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
kp 2012   1  2  1  2  1  9 XX  4  5  4  2  7  5  2  3 10  7  6 10  8  3  4  7  1 XX
AP 2012   1  1  2  9  1  2  5  7  4 XX  8  3  2 10  4 XX XX  4  9 XX  6  3  7 12  2

Rank      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
kp 2011   1  1  1  1  3 10 XX  5  9 11  6  2  4  2  4 XX  4  7 12 10  5  3  9  3  9
AP 2011   1 10  5  1  1  9  1  2  2  3  4 11  9  3 11 XX  8  7 12  6 XX  7  9  3  2
I don’t think there’s a clear winner either way, and we’re ignoring about 325 teams here anyway (although that’s on the AP, not me). I do have to point out that I gave the AP poll a 26-year head start and I only trail it 2-1 in number of times the top four teams have all been one-seeds. However, if we had 20 more years of data, we’d probably find that there’s not much more predictive power in my system as compared to the AP poll. I’d even concede the AP might have a small edge. But I’d say the main benefit of the system is that it’s an independent data point that isn’t concerned if it gets called out for putting Wisconsin #5. (I personally care, though. Please spare my fragile feelings.)
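For anyone who wants to poke at the seed table, here is a small Python sketch that tallies a few simple measures for each row. It is an illustration only: the names are made up, and treating “XX” as “missed the tournament” is my assumption about the table’s notation.

```python
# None stands in for "XX" in the table (assumed here to mean the team
# did not make the tournament).
X = None

seeds_by_preseason_rank = {
    "kp 2012": [1, 2, 1, 2, 1, 9, X, 4, 5, 4, 2, 7, 5, 2, 3, 10, 7, 6, 10, 8, 3, 4, 7, 1, X],
    "AP 2012": [1, 1, 2, 9, 1, 2, 5, 7, 4, X, 8, 3, 2, 10, 4, X, X, 4, 9, X, 6, 3, 7, 12, 2],
    "kp 2011": [1, 1, 1, 1, 3, 10, X, 5, 9, 11, 6, 2, 4, 2, 4, X, 4, 7, 12, 10, 5, 3, 9, 3, 9],
    "AP 2011": [1, 10, 5, 1, 1, 9, 1, 2, 2, 3, 4, 11, 9, 3, 11, X, 8, 7, 12, 6, X, 7, 9, 3, 2],
}


def summarize_seeds(seeds: list) -> dict:
    """Summarize one row of the table (hypothetical helper for this sketch)."""
    made_field = [s for s in seeds if s is not None]
    return {
        "missed": len(seeds) - len(made_field),                # count of XX entries
        "top4_all_one_seeds": all(s == 1 for s in seeds[:4]),  # preseason top four all one-seeds?
        "mean_seed": sum(made_field) / len(made_field),        # average seed among teams that got in
    }


summaries = {label: summarize_seeds(seeds) for label, seeds in seeds_by_preseason_rank.items()}
for label, summary in summaries.items():
    print(label, summary)
```

The mean seed leaves out the XX teams entirely, so it flatters a ranking that puts more eventual non-tournament teams in its top 25; the missed count is there to keep that in view.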
So you get somewhat unorthodox rankings that, at least at the very top, are about as good as the AP poll. At the very, very top, it has nailed the two best teams (if you go by Vegas) heading into the tournament the last two seasons. In 2011 it was Duke/Ohio State, and last season Kentucky/Ohio State. One preseason AP voter had that top two in ’11 and no voter had it last season.
The point here is not that I’ll nail the top two teams every year or even that that should be your basis for determining whether the system is useful. The system was both lucky and good over the last two seasons, and anyway, this season is much more of a crapshoot. That streak will almost certainly end. (It should go without saying that the top two in the system this season also are not matched by any single voter.) The point is that the system gives you a reasonable prediction and it’s one that isn’t matched by very many people. Don’t ignore the AP poll or my system. They are both points of view worth considering as you gear up for the season. This is one case where subjective opinion and objective data can live in harmony.
Tomorrow, I’ll have some comments on particular teams and devote a few words to Dan Hanner’s awesome new (and more sophisticated) system. In the meantime, if you need to vent, you can flame away on Twitter to @kenpomeroy.