This is the second of a series of posts examining whether offense or defense has more control of various aspects of a typical college basketball game. The introduction is here.

What predicts a basketball statistic better: offensive data or defensive data? If we wanted to predict a team’s free throw percentage in the next game, we could use some combination of that team’s season-long free throw percentage and the opponent’s free throw percentage allowed.

You could use other things, too, of course. Adjusting the inputs for the quality of competition would figure to improve the prediction. For free throw percentage, such an adjustment doesn’t make much sense, but it might be useful for other statistics where there is a true interaction between the offense and defense. In this work, however, I am only using the raw season-long values of the statistic in question.

What follows is an explanation of the methodology I used to determine which unit has more control over the rate of various basketball statistics. This can be rather tedious, so if you prefer, skip ahead a few paragraphs. Or do what I often do and read this piece in reverse-paragraph order until you lose interest.

I created a very simple model to predict any statistic in a game. The general form of the model is this:

value = α(offense) + β(defense) + γ(site) + ε

This equation is applied to a game, where “value” is the stat being predicted (in this case, a team’s free throw percentage), and the predictors are the season-long offensive and defensive values of the stat for the participating teams along with a home-court advantage component.
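To make the fitting step concrete, here is a minimal sketch of estimating α, β, and γ by ordinary least squares in pure Python. It is an illustration under stated assumptions, not the code behind the results in this post: the synthetic data, the "true" coefficients (`C0`, `ALPHA`, `BETA`, `GAMMA`), the +1/0/-1 site coding, and the helper names `solve` and `fit_ols` are all hypothetical. Like R’s `lm`, the sketch also estimates an intercept, which the equation above leaves implicit.

```python
import random

def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares fit of y = c + alpha*offense + beta*defense + gamma*site.

    Each row is (offense, defense, site). Solves the normal equations (X'X) b = X'y
    and returns [c, alpha, beta, gamma]."""
    X = [[1.0, *row] for row in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic games: (season FT% on offense, opponent's season FT% allowed, site),
# with site coded +1 home / 0 neutral / -1 away (a hypothetical coding).
random.seed(1)
C0, ALPHA, BETA, GAMMA = 5.0, 0.8, 0.1, 0.5  # hypothetical "true" coefficients
rows, y = [], []
for _ in range(5000):
    off, dfn, site = random.uniform(60, 78), random.uniform(64, 74), random.choice((-1, 0, 1))
    rows.append((off, dfn, site))
    y.append(C0 + ALPHA * off + BETA * dfn + GAMMA * site + random.gauss(0, 2))

coefs = fit_ols(rows, y)
print([round(v, 3) for v in coefs])  # [c, alpha, beta, gamma]
```

The recovered α, β, and γ should land close to the made-up values; the intercept is estimated much less precisely because the predictors sit far from zero.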

For free throw percentage, the data for a few games looks like this.

                     ************ G A M E ************    ******* S E A S O N *******
Gm  Team1   Team2    FTM1 FTA1 FTPct1  FTM2 FTA2 FTPct2   FT%_O1 FT%_D1 FT%_O2 FT%_D2
1   UMBC    Akron      14   28   50.0    12   21   57.1     65.2   70.3   66.3   68.6
2   USC     Akron      14   22   63.6    10   13   76.9     63.4   70.7   65.8   68.0
3   Towson  Alabama    11   15   73.3    17   19   89.5     66.2   70.0   71.4   71.3

I’m trying to predict a team’s free throw percentage in a game (FTPct1) from its own season-long percentage (FT%_O1) and the opponent’s season-long defensive percentage (FT%_D2). Data from the game in question is removed when determining the season-long percentages.

There are two samples for each game, since we can also predict FTPct2 from FT%_O2 and FT%_D1. That gives us roughly 11,000 samples per season. I am actually doing this at the shot level, so there end up being about 220,000 samples, since that’s how many free throws were attempted last season. But if you do it at the game level, you get similar results.
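Building those two samples per game, with the game itself held out of both season-long figures, might look like the sketch below. It works at the game level rather than the shot level, and it leaves out the site term for brevity; the tuple layout of `games`, the team names, and the function names are hypothetical.

```python
from collections import defaultdict

def season_totals(games):
    """Sum free throws made/attempted by each team (offense) and by its opponents (defense)."""
    off = defaultdict(lambda: [0, 0])  # team -> [FTM, FTA] on offense
    dfn = defaultdict(lambda: [0, 0])  # team -> [FTM, FTA] allowed on defense
    for t1, t2, m1, a1, m2, a2 in games:
        off[t1][0] += m1; off[t1][1] += a1
        off[t2][0] += m2; off[t2][1] += a2
        dfn[t2][0] += m1; dfn[t2][1] += a1  # team2's defense allowed team1's free throws
        dfn[t1][0] += m2; dfn[t1][1] += a2
    return off, dfn

def pct(made, att):
    return 100.0 * made / att if att else float("nan")

def build_samples(games):
    """Two samples per game: each team's game FT%, its own season FT% on offense, and the
    opponent's season FT% allowed, with this game's free throws removed from both season figures."""
    off, dfn = season_totals(games)
    samples = []
    for t1, t2, m1, a1, m2, a2 in games:
        for shooter, opp, m, a in ((t1, t2, m1, a1), (t2, t1, m2, a2)):
            if a == 0:
                continue  # no attempts, so no game percentage to predict
            off_loo = pct(off[shooter][0] - m, off[shooter][1] - a)
            def_loo = pct(dfn[opp][0] - m, dfn[opp][1] - a)
            samples.append((pct(m, a), off_loo, def_loo))
    return samples

games = [  # (team1, team2, FTM1, FTA1, FTM2, FTA2) for three made-up games
    ("A", "B", 14, 28, 12, 21),
    ("A", "C", 10, 20, 15, 20),
    ("B", "C", 8, 10, 9, 12),
]
for sample in build_samples(games):
    print(sample)
```

Three games yield six samples. The first is (50.0, 50.0, 75.0): team A shot 14-of-28 in that game, 10-of-20 in its other game, and team B allowed 9-of-12 outside that game.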

Fitting the model to the observed data yields the coefficients that minimize the error of the predictions. For free throw percentage, the results look like this, where the weights for offense, defense, and site are denoted by off, def, and site, respectively:

Call:
lm(formula = makes ~ off + def + site)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7722 -0.6675  0.2913  0.3148  0.4091 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.1301928  0.0336423   3.870 0.000109 ***
off         0.0068040  0.0002778  24.492  < 2e-16 ***
def         0.0013127  0.0004240   3.096 0.001963 ** 
site        0.0052330  0.0010422   5.021 5.15e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4607 on 220194 degrees of freedom
Multiple R-squared:  0.002964,	Adjusted R-squared:  0.00295 
F-statistic: 218.2 on 3 and 220194 DF,  p-value: < 2.2e-16

The above is the output from a linear model. I ended up using a logit model for percentage stats, which is less straightforward to interpret, but you’ll get similar results with either approach.
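The post doesn’t say how its logit model was fit, so the following is only a minimal sketch of one standard approach: logistic regression by Newton-Raphson (the same update as iteratively reweighted least squares) on shot-level data, one row per free throw attempt. The synthetic data, the "true" log-odds coefficients in `TRUE_BETA`, the use of proportions rather than percentages as predictors, and the function names are all hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, max_iter=25, tol=1e-10):
    """Logistic regression by Newton-Raphson.

    X rows start with 1.0 for the intercept; y holds 1 for a make, 0 for a miss."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # gradient of the log-likelihood, X'(y - p)
        hess = [[0.0] * k for _ in range(k)]  # X'WX with W = p(1 - p)
        for x, yi in zip(X, y):
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)  # Newton step: (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Synthetic shot-level data: [1, season FT% on offense, FT% allowed by opponent, site].
random.seed(7)
TRUE_BETA = (-2.35, 4.0, 0.5, 0.03)  # hypothetical intercept, offense, defense, site (log-odds)
X, y = [], []
for _ in range(20000):
    x = [1.0, random.uniform(0.62, 0.78), random.uniform(0.64, 0.74), random.choice((-1, 0, 1))]
    X.append(x)
    y.append(1 if random.random() < sigmoid(sum(b * xi for b, xi in zip(TRUE_BETA, x))) else 0)

beta = fit_logit(X, y)
print([round(v, 2) for v in beta])
```

Logit coefficients are on the log-odds scale. Near a typical make rate p, a one-unit change in a predictor moves the make probability by roughly coefficient × p(1 - p), which is one reason this output is harder to read than the linear model’s.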

Not surprisingly, the coefficient for offensive free throw percentage is larger than the coefficient for defensive free throw percentage. If you want to predict a team’s free throw percentage in a game, it’s more important to know its season-long track record than the opponent’s defensive track record. In other words, good free throw shooting offense beats good free throw shooting defense.

The coefficients tell most of the story, but not all of it. There is more variance in season-long offensive free throw percentage than in defensive free throw percentage. Apply the coefficients to the season-long values and you’ll see that the offensive term, α(offense), contributes much more variance to the predicted value than the defensive term, β(defense), does. For the 2014-15 season, 98.5% of the variance in a team’s predicted free throw percentage was due to the variance in the offensive term, with 1.5% provided by the defense. The defensive term is nearly inconsequential when making predictions.
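The variance comparison can be sketched by multiplying each season-long value by its coefficient and comparing the variances of the two resulting terms. The post doesn’t spell out its exact formula, so treat this as one plausible reading; it assumes the offensive and defensive terms are uncorrelated. The coefficients below are the linear-model estimates from the output above, while the four teams’ values are made up.

```python
import statistics

def variance_shares(alpha, beta, off_vals, def_vals):
    """Share of predicted-value variance coming from alpha*offense vs. beta*defense.

    Assumes the two terms are uncorrelated, so their variances simply add."""
    var_off = statistics.pvariance([alpha * v for v in off_vals])
    var_def = statistics.pvariance([beta * v for v in def_vals])
    total = var_off + var_def
    return var_off / total, var_def / total

ALPHA, BETA = 0.0068040, 0.0013127  # 'off' and 'def' estimates from the lm() output
off_ft = [65.0, 68.0, 70.0, 73.0]   # hypothetical season FT% for four teams
def_ft = [67.0, 69.0, 70.0, 71.0]   # hypothetical FT% allowed by the same teams
off_share, def_share = variance_shares(ALPHA, BETA, off_ft, def_ft)
print(f"offense {off_share:.1%}, defense {def_share:.1%}")
```

Because Var(c·x) = c²·Var(x), the offensive term dominates for two reasons at once: its coefficient is larger, and offensive values are more spread out.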

In truth, you can get a feel for which side controls a stat without building a model, simply by comparing the variance of the offensive and defensive season-long stats across all 351 teams. The side with more variance is usually the one with more influence over the stat. But while there’s more variation in offensive free throw shooting, there’s still some on the defensive side. From the 10th to the 90th percentile, teams range from 64.7% to 73.9% in offensive free throw shooting and from 66.2% to 72.2% defensively.
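That quick check needs nothing more than each side’s spread across teams. Here is a minimal sketch using Python’s `statistics.quantiles` (Python 3.8+), which with `n=10` returns the nine decile cut points, so the first and last are the 10th and 90th percentiles. The team values are placeholders, not real data, and the function names are hypothetical.

```python
import statistics

def spread(values):
    """Return (10th percentile, 90th percentile, variance) of season-long team values."""
    deciles = statistics.quantiles(values, n=10)  # nine cut points: P10, P20, ..., P90
    return deciles[0], deciles[-1], statistics.pvariance(values)

def likely_controller(off_vals, def_vals):
    """Rule of thumb: the side whose season-long stat varies more usually has more influence."""
    return "offense" if statistics.pvariance(off_vals) > statistics.pvariance(def_vals) else "defense"

off_ft = [64.1, 66.0, 67.5, 69.2, 70.8, 72.4, 74.5]  # placeholder offensive FT% by team
def_ft = [66.0, 67.4, 68.6, 69.3, 70.1, 71.2, 72.5]  # placeholder FT% allowed by team
print(spread(off_ft), spread(def_ft))
print(likely_controller(off_ft, def_ft))
```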

In a just world, where the defense presumably has no control over its opponents’ ability to make free throws and there is no random variance, every team’s defensive free throw percentage would be its opponents’ season-long free throw percentage. But we know it is not a just world, because if it were, the Arctic Girls dance team of the Norwegian Basketball League’s Tromsø Storm would be international sensations. Alas.

The reality is that random variation is a real thing that plagues our nation by clouding statistical analysis. (There is also some variation in “free throw schedule strength” across Division I, which increases team-to-team variance.) When one sees UCF leading the country in opponents’ FT shooting at 61.5 percent, even the most tone-deaf analyst understands the Golden Knights didn’t have much control over that figure. The regression approach used here helps us see through the random variance and gets us closer to the true answer of which side influences a particular stat. And since this method can be applied to all statistics, we can use it to determine, in a relative sense, which things are most controlled by the offense.
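To get a feel for how much spread chance alone can create, here is a small simulation under loudly hypothetical assumptions: every shooter has the same true make rate of 69% (`P_TRUE`), and every team allows 600 free throw attempts in a season (`ATTEMPTS`). Neither number comes from this post. The sketch compares the binomial standard deviation, sqrt(p(1 - p)/n), with the simulated spread of FT% allowed across 351 teams whose defenses have no influence at all.

```python
import math
import random
import statistics

def chance_sd(p, attempts):
    """Binomial standard deviation of a FT proportion over `attempts` shots at true make rate p."""
    return math.sqrt(p * (1.0 - p) / attempts)

def simulate_ft_allowed(n_teams, attempts, p, seed=0):
    """FT% allowed by each team when every opponent shoots at the same true rate p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(attempts)) / attempts for _ in range(n_teams)]

P_TRUE = 0.69   # hypothetical true make rate for every shooter
ATTEMPTS = 600  # hypothetical free throws allowed per team per season
allowed = simulate_ft_allowed(351, ATTEMPTS, P_TRUE)
print(f"analytic SD:  {chance_sd(P_TRUE, ATTEMPTS):.4f}")
print(f"simulated SD: {statistics.pstdev(allowed):.4f}")
print(f"lowest / highest FT% allowed: {min(allowed):.3f} / {max(allowed):.3f}")
```

With these made-up inputs the analytic standard deviation is about 0.019, roughly 1.9 percentage points, even though no defense affects the outcome; differences in free throw schedule strength would add to that.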

News flash: Nothing is more controlled by the offense than free throw percentage.

I ran the regression on each of the past ten seasons, and the results are provided below: the share of the variance in the predicted value that comes from the offense (%Offense), along with the home-court advantage (HCA). I’ll do this for each stat I analyze going forward. For free throw percentage, a team can expect a 0.5% improvement when playing at home rather than on a neutral court. So yes, there is a difference between home and road free throw shooting, although I suppose whether it’s solely due to the comforts of being at home can be debated.

Year %Offense  HCA
2015    99%    0.5%
2014    97     0.4
2013   100     0.6
2012    98     0.5
2011    99     0.4
2010    97     0.3
2009   100     0.6
2008    98     0.4
2007    99     0.5
2006    97     0.5
 AVG    98%    0.5%