Sometimes when I’m looking over a team’s data, it’s not exactly clear what a team’s lineup looks like in reality. One could watch video of a team’s past games to figure this out, but in our modern fast-paced world, not everyone has the time to do that. So I’ve been working for a while on something that was fun for me to develop and that I hope will be useful for you. It’s an algorithm that, given a team’s lineup, figures out what position each player plays. I’ve applied it to the ten most frequently used lineups for each team and slapped the results on the bottom of each team’s page.
A word of warning up front: It is not perfect, and being an algorithm and all, it’s completely automated. If your job depends on this information, you should consult other sources to confirm my computer’s guess. Like video, probably. If you are a coach, you probably should be looking at video before you play an upcoming opponent. It’s the least you can do for getting paid to coach a basketball team.
For the most part, though, I like the results. The system’s probably 80% accurate. Maybe higher, but there’s no way to really put a number on it, so I’m not sure why I did. You’ll have to judge for yourself. I’m using the old-school concept of basketball positions here, which may be a disappointment to some. I’m totally down with redefining positions, but when I experimented with that concept, it added another layer of complexity to the project that didn’t work as well in practice as it did in theory. Also, I’m mainly focusing on offensive positions here. It’s actually not that much of a leap to estimate defensive positions (at which point we can start living in Drew Cannon’s world), but that will require a bit more work.
In order to create the algorithm, I watched a bunch of teams (roughly 100) and assigned a position, one through five, to each player who got decent minutes. Then I ran a regression on various stats to best predict those position assignments. I’m using an initial model to identify the player on the floor most likely to be playing point guard, and then a second model to assign the remaining four spots.
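The post doesn’t say what kind of regression was used. Purely as an illustration, here is plain ordinary least squares fit to hand-labeled positions (1 through 5), written in dependency-free Python. The choice of OLS, the feature set, the toy players, and the names `fit_ols`, `predict`, and `to_position` are all assumptions, not the actual model.

```python
from typing import List


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients (intercept first) via the normal equations.

    Assumption: a plain linear fit on the position number; the real model may differ.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def predict(coef: List[float], x: List[float]) -> float:
    """Raw (continuous) predicted position for one player."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def to_position(value: float) -> int:
    """Round a raw prediction and clamp it to the 1-5 range."""
    return min(5, max(1, round(value)))


# Hypothetical hand-labeled players: [height in inches, assist rate %] -> position
X = [[72, 30], [75, 18], [78, 12], [80, 8], [82, 5], [74, 25], [79, 10]]
y = [1, 2, 3, 4, 5, 1, 4]
coef = fit_ols(X, y)
print(to_position(predict(coef, [77, 15])))
```

Regressing directly on the position number treats one through five as evenly spaced, which is a simplification; an ordinal or multiclass model would be a reasonable alternative.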
To identify the point guard, height and assist rate are, of course, useful predictors. But the system tends not to like guys who take a lot of threes and have a low turnover percentage, because those players are normally playing shooting guard (or “shooting wing”), regardless of their size. Not surprisingly, a low offensive rebound percentage is normally a giveaway for a point guard, but the system relaxes this requirement for taller players.
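A minimal sketch of the point-guard step, assuming a simple linear score where the highest-scoring player in the lineup gets the point. The weights are invented to show the direction of each effect described above, not fitted values, and `Player`, `or_weight`, `pg_score`, and `pick_point_guard` are hypothetical names. The height-dependent penalty on offensive rebounding is one possible way to express “relaxes this requirement for taller players.”

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Player:
    name: str
    height_in: float     # height in inches
    ast_rate: float      # assist rate, percent
    to_pct: float        # turnover percentage
    three_share: float   # share of FGA that are threes, percent
    or_pct: float        # offensive rebound percentage


def or_weight(height_in: float) -> float:
    # Penalty per point of OR%; shrinks for taller players (hypothetical numbers).
    return max(0.05, 0.30 - 0.02 * (height_in - 72))


def pg_score(p: Player) -> float:
    # Hypothetical weights; only the signs follow the description in the post.
    return (
        0.10 * p.ast_rate                  # more assists -> more PG-like
        + 0.05 * p.to_pct                  # more turnovers -> handles the ball
        - 0.03 * p.three_share             # lots of threes -> shooting guard
        - 0.20 * (p.height_in - 72)        # taller -> less PG-like
        - or_weight(p.height_in) * p.or_pct
    )


def pick_point_guard(lineup: List[Player]) -> Player:
    if len(lineup) != 5:
        raise ValueError("expected five players")
    return max(lineup, key=pg_score)


# Hypothetical lineup
lineup = [
    Player("A", 73, 30, 20, 30, 2),
    Player("B", 76, 12, 12, 60, 3),
    Player("C", 78, 10, 14, 40, 5),
    Player("D", 81, 8, 15, 10, 9),
    Player("E", 83, 6, 16, 0, 12),
]
print(pick_point_guard(lineup).name)
```

A score like this also shows why a tall, pass-first player can win the point-guard slot: a high assist rate and a modest offensive rebound rate can outweigh the height penalty.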
Still, the system misses on some guys. Most notably, it can handle point guards up to about 6-6, but beyond that it has trouble. So as of this writing, Norman Powell or Bryce Alford shows up as UCLA’s point guard when Kyle Anderson is on the floor. (For kicks, compare Anderson’s numbers to those of Stanford’s Dwight Powell. There’s not a huge difference, and yet it would be laughable to slot the 6-10 Powell at point. This is an example of the challenge in developing such a model.)
And then there’s Vermont’s Brian Voelkel. Voelkel is the reason this project didn’t get off the ground months earlier. He’s 6-6, has easily the highest assist rate on his team, commits enough turnovers to suggest he handles the ball some, and isn’t a good offensive rebounder for his size. So Voelkel shows up as UVM’s point guard, even though he’s not. Eventually, I came to grips with the idea that I cannot create a system that will put Brian Voelkel in his proper place. I’d like to politely request that the Catamounts’ starting point guard, Sandro Carissimo, drop a few more dimes and commit a few more turnovers to resolve this. Until that happens, Carissimo will be pegged as a two-guard in Vermont’s lineups. Of course, in some cases the point guard distinction isn’t all that important. Some teams have more than one player on the floor capable of handling the ball and initiating the offense. Vermont is not one of those teams, unfortunately, so I can’t use that excuse here.
The formula for determining the other positions is more straightforward. The useful predictors are listed here with the direction that points toward a bigger position: height (taller), assist rate (lower), offensive and defensive rebounding (higher), weight (higher), block rate (higher), two-point percentage (higher), and three-point attempt percentage (lower). As with point guards, this part of the model isn’t foolproof, but to me it’s not a big deal if the three and the four are swapped by mistake. On a lot of teams, it may not be possible or useful to distinguish between those two spots even when watching the team play.
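Here is a sketch of the second step, assuming the four non-point guards are ranked by a single linear “bigness” score and slotted 2 through 5 from smallest to biggest. That ranking scheme is an assumption, as are the weights (which depend on each stat’s scale) and the names `Player`, `bigness`, and `assign_positions`. Only the sign of each weight comes from the post.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Player:
    name: str
    height_in: float     # inches
    weight_lb: float     # pounds
    ast_rate: float      # assist rate, percent
    or_pct: float        # offensive rebound percentage
    dr_pct: float        # defensive rebound percentage
    blk_pct: float       # block rate, percent
    two_pct: float       # two-point FG percentage
    three_share: float   # share of FGA that are threes, percent


# Hypothetical weights: positive pushes toward a bigger position, negative toward a smaller one.
WEIGHTS = {
    "height_in": 0.30,
    "weight_lb": 0.02,
    "ast_rate": -0.10,
    "or_pct": 0.15,
    "dr_pct": 0.05,
    "blk_pct": 0.20,
    "two_pct": 0.02,
    "three_share": -0.03,
}


def bigness(p: Player) -> float:
    return sum(w * getattr(p, field) for field, w in WEIGHTS.items())


def assign_positions(non_pgs: List[Player]) -> Dict[str, int]:
    """Map each of the four non-point guards to positions 2-5, smallest to biggest."""
    if len(non_pgs) != 4:
        raise ValueError("expected four players")
    ordered = sorted(non_pgs, key=bigness)
    return {p.name: pos for pos, p in enumerate(ordered, start=2)}


# Hypothetical players, deliberately passed in scrambled order
players = [
    Player("D", 81, 230, 8, 9, 18, 3, 52, 10),
    Player("B", 76, 195, 12, 3, 10, 0.5, 48, 60),
    Player("E", 83, 245, 6, 12, 22, 6, 55, 0),
    Player("C", 78, 210, 10, 5, 13, 1, 50, 40),
]
print(assign_positions(players))
```

Because only the order matters, two players with similar scores can easily flip, which is the three/four mix-up mentioned above.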
Finally, play-by-play data is still not in a state where substitutions are accurately recorded, so in some cases it is impossible to determine which lineup a team has on the floor. On the team page, after the ten most-used lineups, I’ve included the percentage of unknown lineups. (The percentages listed next to the other lineups are relative to the total number of known lineups.) The more unknown time there is, the less you should trust the listed frequencies as a reflection of real life. But even with a lot of unknown time, you can still get an idea of what a team’s lineup looks like when a particular starter is off the floor, or what it looks like when a team wants to play big or small.
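A sketch of the arithmetic behind that table. The post only says the listed lineup percentages use known lineups as the denominator. Measuring time in minutes, computing the unknown share against all time (known plus unknown), and the name `lineup_report` are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Lineup = FrozenSet[str]


def lineup_report(
    minutes: Dict[Optional[Lineup], float], top_n: int = 10
) -> Tuple[List[Tuple[Lineup, float]], float]:
    """Return (top lineups with % of known time, unknown % of all time).

    The key None holds minutes where the lineup could not be determined.
    """
    unknown = minutes.get(None, 0.0)
    known = {k: v for k, v in minutes.items() if k is not None}
    known_total = sum(known.values())
    total = known_total + unknown
    top = sorted(known.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    # Listed lineup percentages are relative to known time only.
    rows = [(lineup, 100.0 * m / known_total if known_total else 0.0) for lineup, m in top]
    # Assumption: the unknown share is measured against all time.
    unknown_pct = 100.0 * unknown / total if total else 0.0
    return rows, unknown_pct


# Hypothetical minute totals
starters = frozenset({"A", "B", "C", "D", "E"})
small = frozenset({"A", "B", "C", "F", "E"})
big = frozenset({"A", "G", "C", "D", "E"})
rows, unknown_pct = lineup_report({starters: 60, small: 30, big: 10, None: 100})
for lineup, pct in rows:
    print(sorted(lineup), f"{pct:.1f}%")
print(f"unknown: {unknown_pct:.1f}%")
```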