Tiers of joy

Home-court advantage is important in college basketball – even though it may be at an all-time low – but too often it gets ignored. I suspect we have the RPI to blame for this. The RPI doesn’t include venue in its strength of schedule calculation and, more profoundly, encourages users to look at a team’s record against, say, the top 50 teams without considering where those games were played. (Before I go any further, I have to say I don’t mind the RPI in general. It’s not a bad formula considering its origins, and the history behind it is kind of endearing.)

In the spirit of home-court advantage awareness, I’ve gone ahead and added information to the schedule page. In a fair world, when people talk about top 50 wins they should be accounting for where the game was played. And so I’ve added a notation to each game on a team’s schedule indicating whether it was Tier A or Tier B. A game in Tier A represents a top 50 opponent after adjusting for the location of the game, and Tier B is the same concept for a top 100 opponent. This is similar to what already exists on the player pages.

It turns out those adjustments are important. Beating the 90th-ranked team on the road is about as difficult as beating the 50th-best team on a neutral floor, which is roughly as difficult as beating the 20th-best team on one’s home floor. (The exact relationship can vary by season.) So it’s poor form to ignore this when some teams in college never go on the road in non-conference and others rarely get to play a non-conference game at home.
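To make the idea concrete, here is a rough sketch of how a Tier A check could work. It is not the actual code behind the schedule page. The cutoffs are simply the 20/50/90 equivalence mentioned above, which varies by season, and the names (Game, is_tier_a, record) and the five-game schedule are made up for illustration. Tier B would use the same logic with a wider cutoff table, which I haven’t spelled out here.

```python
from dataclasses import dataclass

# Assumed Tier A cutoffs, taken from the rough equivalence in the text:
# 20th at home ~ 50th on a neutral floor ~ 90th on the road.
# The exact relationship varies by season, so these are illustrative.
TIER_A_CUTOFF = {"home": 20, "neutral": 50, "road": 90}


@dataclass
class Game:
    opponent_rank: int  # opponent's rank in whatever system is in use
    venue: str          # "home", "neutral", or "road", from the team's view
    won: bool


def is_tier_a(game, cutoffs=TIER_A_CUTOFF):
    """True if the game is about as hard as a top 50 team on a neutral floor."""
    return game.opponent_rank <= cutoffs[game.venue]


def record(games, predicate):
    """Return (wins, losses) over the games that satisfy predicate."""
    picked = [g for g in games if predicate(g)]
    wins = sum(g.won for g in picked)
    return wins, len(picked) - wins


# Hypothetical schedule, invented for this example.
schedule = [
    Game(75, "road", True),    # outside the top 50, but Tier A
    Game(85, "road", True),    # outside the top 50, but Tier A
    Game(10, "road", False),   # top 50 and Tier A
    Game(35, "home", True),    # top 50, but not Tier A
    Game(120, "home", True),   # neither
]

raw_top50 = record(schedule, lambda g: g.opponent_rank <= 50)
tier_a = record(schedule, is_tier_a)
```

With this made-up schedule, the venue-blind top 50 record and the Tier A record count different games: the two road wins only show up in the second, and the home win over the 35th-ranked team only shows up in the first.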

It’s with some reluctance that I do this. Anybody’s ranking system should stand on its own. For instance, as I write this Evansville is ranked 41st in my system. That means that the system thinks Evansville is the 41st-best team in the land right now. It’s true the Aces don’t have a top 50 win whether you consider home court or not, and you might propose that Evansville is overrated because of that. But unless you have knowledge that the system tends to overrate teams that have played a weak schedule, then you don’t have a basis for this statement.

Furthermore, just ranking a team based on its performance in various tiers can also be problematic. Teams taking on opponents ranked in the top ten on the road are going to have a worse record than teams playing opponents ranked in the 40-50 range. That gets to why computer ratings exist – it’s practically impossible for a human to compare all of this information and make their own rating on the fly. So humans try to simplify things by looking at a team’s top 50 record and lose information in the process.

In this way, the RPI is unique as far as I know. The RPI devotee is instructed to ignore a team’s actual rank. There are 40+ ranking systems listed on Ken Massey’s composite page. Sure, they’re all flawed, and some are more flawed than others, but I seriously doubt any of the folks behind those systems would say you can’t use the ranking of the team in their system and that you must look at a host of other things to determine which teams are best.

But so it is in the RPI, where the user can’t put any trust in a team’s actual ranking but must look at the underlying data: who a team has played and who it has beaten. And thus, things like record vs. top 50 teams are deemed more important than a team’s own RPI rank. But in doing so, one puts trust in an opponent’s RPI ranking, the very thing the user is told to ignore for the team in question.

Through repetition over three decades, this construct has been ingrained in the at-large selection process so that few people question it. I suspect part of this is because it makes it more difficult for the casual fan to be an expert. Expert: “No, casual fan, I know Team X is ranked 20 spots ahead of Team Y, but it’s just not that simple. Team Y is 4-3 against the top 50 whereas Team X is only 1-2. So Team Y is better. Leave it to me to interpret the data.”

But what if Team X was 7-2 against the top 100 and Team Y was 4-4? Well, some expert is going to evaluate that and tell us. You just can’t look at a team’s ranking, silly. And what if all of Team X’s games against the top 100 were on the road, but only two of Team Y’s games were? Somebody will figure it out. Probably by spending hours and hours looking at the data. And this doesn’t even consider Teams A, B, C, and D who have similar data. The effort involved removes much of the convenience of having an automated ranking.

If you were asked to come up with a ranking of [some things], imagine reporting back, “Here’s my ranking, except don’t pay attention to the ranking, we have to look at what [these things] did when they interacted with [other things]. I know [this thing] is ranked as the best [thing] but it’s not, because of [this reason].” That would be an awkward moment.

But the worst aspect of the RPI is that it encourages the user to ignore the location of a game. Top 50 record has become a ubiquitous measure despite the fact that it’s going to screw over teams that have to play most or all of their games against top 50 competition away from home. To take one example, consider the Monmouth/UCLA game. I don’t know where UCLA is going to end up in the RPI, but in the real world a win on UCLA’s floor is the equivalent of beating a top 50 team on a neutral floor, whether UCLA is ranked in the top 50 or not (and according to rpiforecast.com they probably won’t be). So Monmouth will not get credit for a top 50 win, not because they don’t deserve it, but simply because they are unable to schedule a top 50 team at home.

The oddity is that Monmouth figures to end up in the RPI top 50 itself. So UCLA will have the appearance of playing a quality opponent, suffering an excusable loss if you will, while Monmouth will only get credit for a pretty ordinary win in the RPI’s view. This is a bad thing, and people who are interested in fairness shouldn’t accept it, especially since the fix is easy. I know the basketball committee is made up of experts who are smarter than to look at things that way, but it adds another layer of complexity to the process that isn’t necessary.

So that is the motivation behind the tier notation on the team schedules. My rankings are designed to stand on their own, but if you are interested in assessing a team’s quality wins and losses, the tier approach is a fairer way of thinking.
