
Monday, August 01, 2005

College Basketball Data

The college basketball game file posted on my site is freely available for anyone to use with three requests on my part. (1) Give me some sort of acknowledgment on your site. (2) Let me know you are using the data and why you are using it. I am always curious. (3) While I do not expect you to actively quality control the data, if you do find errors or somebody reports an error to you, please pass it along to me.

Now a little bit about the format. The home team is listed last. Games not played on a home court are denoted by a letter after the last team’s score. A capital ‘N’ indicates a game played on a neutral court. For a game where the listed home team is not playing on its home court, yet still getting a home court advantage, a lower case ‘n’ is used. For my ratings system, I apply half of the stated home court advantage to the home team listed in ‘semi-neutral’ games. Because these distinctions are made solely at my discretion, I have included the site of the game in these cases, so you can use your own judgment if desired.
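As a rough sketch of how the site codes could feed a ratings calculation: the function below returns the points credited to the listed home team. The function name and the 4-point figure in the example are made up for illustration, and treating a capital 'N' as zero advantage is an assumption drawn from the description above.

```python
def site_adjustment(site_flag, home_advantage):
    """Points credited to the listed home team, given the game's site code.

    site_flag: "N" (neutral), "n" (semi-neutral), or "" (true home game).
    home_advantage: full home court advantage in points (caller-supplied).
    """
    if site_flag == "N":
        return 0.0                      # neutral court: assumed no advantage
    if site_flag == "n":
        return home_advantage / 2       # semi-neutral: half the advantage
    return float(home_advantage)        # no site code: full advantage


# Hypothetical 4-point home advantage, for illustration only.
for flag in ("", "n", "N"):
    print(repr(flag), site_adjustment(flag, 4.0))
```

If you disagree with a particular classification, the listed game site gives you what you need to pass a different flag.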

I do use some logic on how to classify a game. If a game is played at the home team’s home arena, then it is always classified as a home game. This seems obvious, but there are a few cases where these could be considered neutral games, mainly during post-season tournaments. Semi-neutral games are indicated where a game is not played at a team’s home arena, but is still close enough to the team’s home so that they will benefit from some home court advantage. Usually these cases are obvious, but in some cases there can be debate. Rare exceptions are made where a team plays away from its home arena, but they still are considered home games in the database. These cases occur when the home team is playing very near its home against a team traveling a considerable distance. Also, a few teams regularly play home games at more than one arena (Connecticut and DePaul are two examples).

There are some other letter codes that are used to classify a game.

T - Conference tournament game
P - Postseason game
S - A game between two teams from the same conference that is not a conference game

Additionally, I include how many overtime periods (if any) were played. This will be indicated by a number after the last team’s score.

All games involving at least one of the teams in my ratings are included.

Posted on 08/01 at 05:27 PM

Pomeroy Ratings FAQ

- Schedule Strength is computed by averaging the rating of each opponent, factoring in home court advantage as appropriate. For schedule strength purposes only, unrated opponents are given a rating of the worst rated team.

- Data in the ‘LAST 5 GAMES’ column reflects a team’s performance in its last 5 games against rated teams, based on its opponents’ current ratings, using the same weighting principles that are used to calculate the season ratings.
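A minimal sketch of the schedule strength calculation, under some assumptions: the function name, the "home"/"away"/"neutral" labels, and the direction of the home court adjustment (an opponent counts as stronger on its own floor and weaker on yours) are illustrative choices, not a statement of the exact method.

```python
def schedule_strength(opponents, ratings, home_advantage):
    """Average opponent rating, adjusted for where each game was played.

    opponents: list of (opponent_name, site), where site is "home", "away",
               or "neutral" from the point of view of the team being rated.
    ratings:   dict mapping rated team names to their ratings.
    """
    worst = min(ratings.values())           # stand-in rating for unrated opponents
    total = 0.0
    for name, site in opponents:
        rating = ratings.get(name, worst)
        if site == "away":
            rating += home_advantage        # assumption: opponent is tougher at its home
        elif site == "home":
            rating -= home_advantage        # assumption: opponent is weaker on the road
        total += rating
    return total / len(opponents)
```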

What is the purpose of your ratings system?
This system is designed to be predictive. One can get a prediction by simply taking the difference in the ratings of two teams and making appropriate adjustments for home site advantage. You can probably save some work by looking at individual team pages. There you can find predictions for future games, along with the chances of winning the game outright. Check out this site to monitor the accuracy of the major systems out there.
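The prediction arithmetic can be sketched in a couple of lines. The function and parameter names here are hypothetical, and the ratings in the example are invented.

```python
def predict_margin(rating_a, rating_b, site_adjustment=0.0):
    """Predicted margin for team A over team B.

    site_adjustment is positive when A has the home site advantage,
    negative when B has it, and zero on a neutral court.
    """
    return rating_a - rating_b + site_adjustment


# Invented ratings: A at 12.5, B at 8.0, A at home with a 4-point advantage.
print(predict_margin(12.5, 8.0, 4.0))
```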

What information goes into the ratings?
The only information I use from each game is the margin of victory/defeat and the site of the game. The result of the game (won/lost/tied) is ignored, other than it being incidental to the margin of victory/defeat. Because the system relies on only past data, it can’t anticipate personnel changes that might affect the relative strength of two teams competing in a future game.

How are the ratings calculated?
The ratings are calculated using a least squares algorithm which develops an equation based on each game. If Team A beats Team B by 15 points, then A = B+15. Of course, some adjustment is made for home site advantage where appropriate. All of the equations are then solved together to minimize the mean squared error across all games. Each game is given a weight based on two factors - its significance and when it was played. The significance increases for games involving teams of similar ratings. Significance also increases for games involving teams of disparate ratings where the result is much closer than expected.
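To make the idea concrete, here is a small sketch of a weighted least squares fit, solved by repeatedly setting each team's rating to the value that best fits its own games. This is one simple way to solve such a system, not necessarily the solver used for the published ratings. The function name and the game tuple layout are assumptions, and the per-game weight is taken as an input because the exact significance formula is not given here.

```python
def solve_ratings(games, home_advantage=0.0, sweeps=200):
    """Fit ratings minimizing sum of weight * (r_home - r_away + hca - margin)**2.

    games: list of (home, away, margin, weight), margin = home score - away score.
    Ratings are re-centered to average zero, since only differences matter.
    """
    teams = sorted({team for game in games for team in game[:2]})
    ratings = {team: 0.0 for team in teams}
    for _ in range(sweeps):
        for team in teams:
            num = den = 0.0
            for home, away, margin, weight in games:
                neutral_margin = margin - home_advantage  # strip the site effect
                if team == home:
                    num += weight * (ratings[away] + neutral_margin)
                    den += weight
                elif team == away:
                    num += weight * (ratings[home] - neutral_margin)
                    den += weight
            if den > 0:
                ratings[team] = num / den   # best rating given current opponents
        mean = sum(ratings.values()) / len(ratings)
        for team in teams:
            ratings[team] -= mean
    return ratings


# Invented results on neutral courts, equal weights.
games = [("A", "B", 10, 1.0), ("B", "C", 10, 1.0), ("A", "C", 14, 1.0)]
print(solve_ratings(games))
```

The three invented results disagree with each other (two 10-point wins would imply a 20-point gap, but the direct game says 14), so the fit lands on a compromise rather than satisfying every equation exactly.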

Increasing weight is also given to more recent games. In a 30 game schedule, Game 1 will weigh about 40% as much as Game 30, assuming equal significance. For about the first month of the season, some weight is given to the preseason ratings. This is done to prevent the massive amount of daily fluctuation that would otherwise occur with so little data.
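The shape of the recency curve is not spelled out above, only that the first game of 30 carries about 40% of the weight of the last. The sketch below assumes a straight-line ramp between those two endpoints. The linear shape and the function name are assumptions for illustration.

```python
def recency_weight(game_number, games_played, oldest_weight=0.4):
    """Relative weight of a game, rising linearly from oldest_weight to 1.0.

    game_number:  1 for the first game of the season, games_played for the latest.
    oldest_weight: weight of game 1 relative to the most recent game (assumed 0.4).
    """
    if games_played <= 1:
        return 1.0
    fraction = (game_number - 1) / (games_played - 1)   # 0.0 oldest, 1.0 newest
    return oldest_weight + (1.0 - oldest_weight) * fraction


for n in (1, 15, 30):
    print(n, round(recency_weight(n, 30), 3))
```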

Do you cap margin of victory/defeat at all?
Yes. The limit on margin of victory is based on the distribution of margin of victory for all games in a particular season. For college basketball, this works out to something around 16 points by the end of the year.
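In code, a cap like this amounts to clamping the margin before it goes into the calculation. The sketch below assumes a hard clamp and uses the rough end-of-year figure of 16 as a default. In practice the limit comes from that season's margin distribution, so treat the default as a placeholder.

```python
def cap_margin(margin, cap=16):
    """Clamp a game margin to the range [-cap, +cap]."""
    return max(-cap, min(cap, margin))


print(cap_margin(25), cap_margin(-30), cap_margin(7))
```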

How do you handle home site advantage?
Pretty much any system out there has shown that teams play better at home. This system applies a fixed home advantage for all teams. I don’t adjust this on a daily basis during the season, instead choosing to use a home site advantage that I have calculated from previous seasons.

Any more questions? Write to ratings@kenpom.com

Posted on 08/01 at 05:18 PM

Stats Explained

Let’s start with the most basic stats to measure the ability of a team’s offense and defense.

Offensive efficiency
Points scored per 100 offensive possessions.

Defensive efficiency
Points allowed per 100 defensive possessions.

In order to compute efficiency, we need to know how to compute possessions.

Possessions
We can estimate possessions very well from box score stats by using this formula.

FGA - OR + TO + 0.475*FTA

For each team, possessions are counted for the team and their opponents, and then averaged.
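Here is a short sketch of the possession estimate and the efficiency stats built on it. The function names and dictionary keys are illustrative, and any box score numbers you feed it are your own.

```python
def team_possessions(fga, orb, to, fta):
    """Estimate one team's possessions from its box score line."""
    return fga - orb + to + 0.475 * fta


def game_possessions(team, opponent):
    """Average the two teams' estimates, as described above.

    team, opponent: dicts with keys "fga", "orb", "to", "fta".
    """
    return (team_possessions(**team) + team_possessions(**opponent)) / 2


def efficiency(points, possessions):
    """Points per 100 possessions (scored for offense, allowed for defense)."""
    return 100 * points / possessions
```

Offensive efficiency uses points scored and defensive efficiency uses points allowed, both over the same averaged possession count.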

Efficiency gives us a broad view of how well the offense or defense functions, but we can break efficiency into what Dean Oliver dubbed the Four Factors. Shooting, rebounding, turnovers, and free throws provide the basic components of efficiency.

Effective field goal percentage (eFG%)
(FGM + 0.5*3PM) / FGA

Shooting is measured by effective field goal percentage, which differs from conventional field goal percentage by taking into account the extra value of a made 3-pointer.
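A quick sketch comparing the two shooting percentages (function names are illustrative):

```python
def field_goal_pct(fgm, fga):
    """Conventional field goal percentage, as a fraction."""
    return fgm / fga


def effective_fg_pct(fgm, three_pm, fga):
    """Effective field goal percentage: each made 3 counts as 1.5 makes."""
    return (fgm + 0.5 * three_pm) / fga
```

A team that makes no 3-pointers gets the same number from both functions. The more of its makes that are 3s, the further eFG% rises above the conventional figure.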

Offensive rebounding percentage
OR / (OR + Opp. DR)

Defensive rebounding percentage can also be computed, using defensive rebounds in the numerator.
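The sketch below takes the denominator to be all rebounds available at one end of the floor, so a team's offensive rebounds are paired with the opponent's defensive rebounds. That pairing is the standard reading of the formula, but the function and parameter names are illustrative.

```python
def offensive_rebound_pct(orb, opp_drb):
    """Share of a team's own missed shots that it rebounded."""
    return orb / (orb + opp_drb)


def defensive_rebound_pct(drb, opp_orb):
    """Share of the opponent's missed shots that the team rebounded."""
    return drb / (drb + opp_orb)
```

With this pairing, one team's offensive rebounding percentage and the other team's defensive rebounding percentage always add up to 1.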

Turnover percentage
TO / Possessions

Free throw rate
This can either be FTM/FGA or FTA/FGA. Typically, for team offense FTM/FGA is used, while on defense FTA/FGA is used.
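The last two factors are simple ratios. In this sketch the names are illustrative, and the free throw function takes whichever numerator you prefer (FTM for offense, FTA for defense, per the convention above).

```python
def turnover_pct(turnovers, possessions):
    """Fraction of possessions that ended in a turnover."""
    return turnovers / possessions


def free_throw_rate(free_throws, fga):
    """Free throws (made or attempted, caller's choice) per field goal attempt."""
    return free_throws / fga
```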

Other team stats are less important than the Four Factors. The common approach with these is to convert the standard per-game stats to per-opportunity rates.

Assist Rate
A / FGM

Block Rate
Blocked shots / Opp. 2PA

Steal Rate
Steals / Defensive possessions
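These three rates follow the same per-opportunity pattern, sketched here with illustrative function names:

```python
def assist_rate(assists, fgm):
    """Assists per made field goal."""
    return assists / fgm


def block_rate(blocks, opp_two_point_attempts):
    """Blocked shots per opponent 2-point attempt."""
    return blocks / opp_two_point_attempts


def steal_rate(steals, defensive_possessions):
    """Steals per defensive possession."""
    return steals / defensive_possessions
```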

All of the above stats can apply to individuals in some form, also. There are two other stats that are applied to individuals that aren’t applied to teams. These stats were developed by Dean Oliver, and the formulas are far too complicated to list here. His book, Basketball on Paper, is worth buying if you are interested in how the calculations are performed.

Offensive Rating
This is the personal version of team offensive efficiency.

Usage (% of possessions used)
This describes a player’s role in the offense, by explaining how many of his team’s possessions a player is personally responsible for ending while he is on the floor.

A simpler version of personal efficiency is this one:

Points per weighted shot (PPWS)

Points scored / (FGA + 0.475*FTA)
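As a sketch (the function name is illustrative), PPWS is a one-liner:

```python
def points_per_weighted_shot(points, fga, fta):
    """Points scored per shooting opportunity, counting a free throw as 0.475 of a shot."""
    return points / (fga + 0.475 * fta)
```

Free throws are weighted by the same 0.475 used in the possession estimate, so a trip to the line costs the player roughly as much as it costs the team in possessions.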

Posted on 08/01 at 03:02 AM