Comps are back

I have a love/hate relationship with player comps. There are clearly some good things about them, but some of the utility in identifying a player’s group of comparables is lost to the human tendency to match players first on things like race, ethnicity, or even hairstyle. While I haven’t studied it, I expect that even comparables among African-Americans are influenced by skin color.

Whether it’s Jalen Rose or Jon Rothstein, the comp-by-looks approach can produce poor results.¹ The next time either of them compares a white player to a black player might be the first time. And this is not a disease that affects only those two guys. Almost everyone covering the game seems to accept this as the way comps are done. Now, obviously it isn’t true that all (or even most) within-race comps are bad. But the worst part of player comps is that they too often rely on stereotypes.

However, there are some positive uses for comps. At their best, they are shorthand for a description or projection of a player. One benefit of advanced stats is the ability to better isolate skills, whether for a team or a player. But it’s difficult to know whether a high-usage, low-efficiency player has any chance of getting much better without a study that specifically looks at those kinds of players. And those players come in all kinds of varieties. A six-footer who primarily takes 2’s and doesn’t get to the free throw line figures to have a different outlook than a six-footer who takes 3’s.

Statistical comps are a way to cut through the mess of considering all of these variables. Part of the reason most humans fall back on racial biases is their own desire for a shortcut. Finding a comparison for Frank Kaminsky among the entire population of past basketball players is impossible for most human brains, but racial stereotypes make the task manageable. That’s not to say that statistical comps are superior to human-based comps. But unfortunately, the humans I’ve come across who are the best at this are not in the media.

After a hiatus of about a year, my statistically based comps are back on the site. Let me describe what goes into them. First, I’ve standardized all stats by season. Then, for each stat used, I compare the player’s deviation from average to that of every other player, sum the weighted differences, and voilà, the player with the lowest total difference is the top comp.
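The procedure above can be sketched in a few lines of Python. This is a minimal illustration under assumptions the post doesn’t spell out: “standardized by season” is taken to mean a z-score against all players from the same season, “sum the differences” is taken as a sum of absolute differences, and the player dictionaries and function names (`standardize_by_season`, `comp_distance`, `top_comp`) are hypothetical. The per-stat weights are left out here to keep the core idea visible.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_season(players, stats):
    """Return copies of `players` with each stat replaced by a z-score
    computed against all players from the same season (assumed method)."""
    by_season = defaultdict(list)
    for p in players:
        by_season[p["season"]].append(p)

    # Per-season mean and population standard deviation for every stat.
    params = {}
    for season, group in by_season.items():
        for stat in stats:
            values = [p[stat] for p in group]
            sigma = pstdev(values) or 1.0  # avoid dividing by zero
            params[(season, stat)] = (mean(values), sigma)

    result = []
    for p in players:
        z = dict(p)
        for stat in stats:
            mu, sigma = params[(p["season"], stat)]
            z[stat] = (p[stat] - mu) / sigma
        result.append(z)
    return result


def comp_distance(a, b, stats):
    """Unweighted sum of absolute differences in standardized stats."""
    return sum(abs(a[stat] - b[stat]) for stat in stats)


def top_comp(target, pool, stats):
    """The player in `pool` with the smallest total difference from `target`."""
    return min(pool, key=lambda other: comp_distance(target, other, stats))
```

Standardizing within each season is what lets a 2015 player be compared with a 2010 player even if league-wide scoring or pace shifted between those years.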

Here’s the weight placed on each stat used:

Height (4), Weight (2). Size wasn’t included in the previous version of my comps, but including it seems to produce a more sensible list of comparables.

MPG (4). Playing time may not seem like something we should consider, but coaches usually know what they are doing. If a player has great stats and isn’t getting court time, there are usually basketball-related reasons for that beyond the coach being a dope. So this should capture some of the things that aren’t measured in a box score.

AdjOE (2), AdjDE (2), AdjTempo (1), SOSO (.5), SOSD (.5). It’s important to consider the environment a player is playing in. Here, I mostly rely on team strength, but there’s a slight consideration for a team’s pace and its opponents’ strength as well.

3PA% (3), 2P% (3), 3P% (1), FT Rate (1). For the statistical portion of the comp calculations, I tried to favor the numbers that are most consistent from season to season. Thus three-point percentage gets weighted less than 3PA% and 2P%. In fact, guys who take a lot of 3’s tend to be good three-point shooters anyway (and vice versa), so you won’t see many cases of good shooters matched up with bad shooters. And if you’re taking a lot of 2’s and making them at a high rate, you’re probably getting to the line more than the average player, so there’s less incentive to give extra weight to free throw rate.

Usage (3). Offensive role is really important. I don’t have anything more to say.

Block% (2), Steal% (2). Defense is the hard part to figure out. I’m not going to pretend this weighting scheme does a great job, but overweighting these factors isn’t the answer, either.

Assist% (3), Turnover% (1). If you haven’t figured it out, there’s a lot of subjectivity in the formulation. Much of my testing was done by tinkering with weights and inputs and seeing what looked good. And here it seemed that matching on assists was more important than matching on turnovers.

OR% (1.5), DR% (1). Likewise, rebounding doesn’t add as much to the comps as I would have liked. Perhaps height, weight, and shooting tendencies already overlap a lot with it.

Class (∞). I should have mentioned earlier that a player is only matched to players in his eligibility class. However, a player is never matched to someone in the same class from the same season.

Race (0), Ethnicity (0), Hair Style (0). Sorry.
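Put together, the weights above slot into a weighted distance. The sketch below is hypothetical: the key names and helper functions are invented for illustration, it assumes every stat has already been standardized by season, and it assumes weighted absolute differences are what get summed. It also applies the class rule (same eligibility class, different season) and returns the k closest players, with k defaulting to 5, the number of comps the site lists per player.

```python
import heapq

# Weights as listed in the post; the key names are hypothetical.
COMP_WEIGHTS = {
    "height": 4, "weight": 2,
    "mpg": 4,
    "adj_oe": 2, "adj_de": 2, "adj_tempo": 1, "sos_o": 0.5, "sos_d": 0.5,
    "three_pa_pct": 3, "two_p_pct": 3, "three_p_pct": 1, "ft_rate": 1,
    "usage": 3,
    "block_pct": 2, "steal_pct": 2,
    "assist_pct": 3, "turnover_pct": 1,
    "or_pct": 1.5, "dr_pct": 1,
    # Race, ethnicity, and hair style have zero weight, so they are omitted.
}


def eligible(target, candidate):
    """Same eligibility class, but never a player from the same season.
    (This also excludes the target himself.)"""
    return (candidate["class"] == target["class"]
            and candidate["season"] != target["season"])


def weighted_distance(a, b, weights=COMP_WEIGHTS):
    """Weighted sum of absolute differences in season-standardized stats."""
    return sum(w * abs(a[stat] - b[stat]) for stat, w in weights.items())


def top_comps(target, pool, k=5, weights=COMP_WEIGHTS):
    """The k eligible players closest to `target`, closest first."""
    candidates = [p for p in pool if eligible(target, p)]
    return heapq.nsmallest(
        k, candidates, key=lambda p: weighted_distance(target, p, weights)
    )
```

Treating class as a hard filter rather than one more weighted term is how the sketch mirrors its infinite weight: no combination of other stats can pull in a player from a different class.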

So that’s it. I reserve the right to tinker with this going forward. And there’s more data out there than just box-score and measurement data that can help us bridge the athleticism gap, so at some point I’d like to include that as well.

The comps for this season will be updated nightly from here on. The top five comps are provided for each player, so hopefully there’s one in that group that provides some insight. And if not, well, maybe the list will start a fun conversation. Coming up with good comps is extremely challenging, whether it’s automated or not.


References
¹ The Carlos Boozer-to-Trey Lyles comparison might be the best example of comparison-by-skin-color going wrong, and it was used by both guys.

ADVANCED ANALYSIS OF COLLEGE BASKETBALL