Statistically Speaking: June 2006

Friday, June 30, 2006

The Measure of a Man

Does height have any effect on how well college quarterbacks play? Of course it does. The average Division I quarterback is taller than the average man. Coaches (rightly or wrongly) are biased toward taller players. We know there is a difference between groups (quarterbacks and average joes), but is there a difference within groups? Are taller quarterbacks better passers? Are shorter quarterbacks better runners? To answer these questions, I sampled every Division IA quarterback who threw at least 100 passes last season. I set up an Excel file and included four facts about each quarterback: his height in inches, his completion percentage, his TD/INT ratio, and his cumulative rushing yards. I then made three separate graphs with height as the independent variable and each of the other three variables as the dependent variable. I also ran a regression analysis and determined the r squared value for height and each dependent variable. Unfortunately, I don't yet know how to post the graphs to this blog, so you'll have to take my word for it. Here are the r squared values for each set of variables.

Height and Completion %: .0028
Height and TD/INT Ratio: .0003
Height and Rushing Yards: .0596
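
For anyone who wants to try this kind of calculation at home, here is a rough Python sketch of how an r squared value can be computed from a simple least-squares line. The function name `r_squared` and the sample numbers are my own assumptions for illustration; they are not the actual quarterback data, which lives in the Excel file.

```python
def r_squared(xs, ys):
    """R squared of a least-squares line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the ordinary least-squares line.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variation in ys that the fitted line accounts for.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up numbers for illustration only, NOT the real quarterback sample.
heights = [70, 72, 73, 74, 75, 76]
completion_pct = [58.1, 61.4, 55.0, 60.2, 57.3, 59.9]
print(round(r_squared(heights, completion_pct), 4))
```

A value near 0 means height tells you almost nothing about the other variable; a value near 1 means the points sit almost exactly on a line.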

No r squared value is very high. Completion percentage and TD/INT ratio both have minute positive relationships with height. Rushing yardage actually has a negative relationship with height. Although that relationship is still very weak, its r squared is roughly 20 times that of completion percentage and roughly 200 times that of TD/INT ratio. All in all, this exercise was a lot like getting a degree from an online university: a lot of work and little to show for it. The reason there is no real discernible relationship with height is the lack of variation in height among college quarterbacks. Of the 140 quarterbacks who threw 100 passes last season, only 5 were under 6 feet and only 18 were over 6 foot 4. Maybe the ACLU can file suit over the lack of diversity.

Friday, June 23, 2006

How Accurate is the Pythagorean Theorem in College Football?

I doth believe UCLA is in for a decline.

In baseball, the Pythagorean Theorem is often a better indicator of team strength and usually a better predictor of future performance than a team's actual record. Is this also true in college football? Only one way to find out. I wanted to know which variable was a better predictor of each BCS school's 2005 winning percentage: its 2004 winning percentage or its 2004 Pythagorean winning percentage.

R squared for 2004 win %: .4070
R squared for 2004 Pythagorean win %: .5108

Both variables explain a substantial portion of the variability of the 2005 record. However, the Pythagorean winning percentage is the better predictor: its r squared is roughly 25% higher than that of the standard winning percentage (.5108 versus .4070).
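
For readers unfamiliar with the Pythagorean method, here is a minimal Python sketch of the general formula: points scored raised to an exponent, divided by the sum of points scored and points allowed each raised to that same exponent. The post does not say which exponent was used, so the default below is purely an assumption on my part (the classic baseball version uses 2, and 2.37 is a value often quoted for pro football). The function name and the sample team are also hypothetical.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected winning percentage from points scored and allowed.

    The exponent is an ASSUMPTION: the original post does not state
    which value it used.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


# Hypothetical team: 350 points scored, 250 allowed over a season.
print(round(pythagorean_win_pct(350, 250), 3))
```

Whatever exponent is chosen, a team that scores exactly as many points as it allows comes out at .500, and the expected percentage climbs as the scoring margin grows.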

It should be noted that most teams' winning percentages are close to their winning percentages as predicted by the Pythagorean Theorem. Now let's shift gears and focus on those teams that had a significant disparity between their winning percentage and their Pythagorean winning percentage. The cutoff point for 'significant disparity' is an arbitrary one, but I chose .100. That means if a team had a winning percentage of .750, but a Pythagorean winning percentage of only .640, it is included in this portion of the study. 22 teams from 2004 met this criterion. If you're curious, those teams are listed at the bottom of this article. Using the same methodology as the previous study, I looked to see how well the 2004 winning percentage of these teams predicted their 2005 winning percentage and then how well their 2004 Pythagorean winning percentage predicted their 2005 winning percentage. Here are the results.

R Squared for 2004 win %: .0428
R Squared for 2004 Pythagorean win %: .3097

When we examine only teams with a significant difference between actual and expected winning percentage, the predictive power of their actual record practically disappears. The predictive power of the Pythagorean method is much smaller as well, but a relationship can still be detected.
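
The step of picking out the 'significant disparity' teams can be sketched in a few lines of Python. This is only an illustration: I am assuming a gap of exactly .100 counts as significant, and the team names and the second record below are invented. Only the .750 versus .640 pairing comes from the example in the post.

```python
CUTOFF = 0.100  # the arbitrary cutoff chosen in the post


def has_significant_disparity(actual, pythagorean, cutoff=CUTOFF):
    # Round to three decimals first so a gap like .750 vs .650 is not
    # pushed just under the cutoff by floating-point error.
    # ASSUMPTION: a gap of exactly .100 is treated as significant.
    return round(abs(actual - pythagorean), 3) >= cutoff


# Hypothetical teams: (actual win %, Pythagorean win %).
teams = {
    "Team A": (0.750, 0.640),  # the example from the post, gap of .110
    "Team B": (0.700, 0.650),  # invented, gap of .050
}
flagged = [name for name, (act, pyth) in teams.items()
           if has_significant_disparity(act, pyth)]
print(flagged)  # ['Team A']
```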

The final study is the same as the first, but this time with the 22 teams with significant differences removed.

R Squared for 2004 win %: .5750
R Squared for 2004 Pythagorean win %: .5826

This result is pretty logical. When a team's actual record closely matches its predicted record, both do a pretty good job of predicting the team's record the next year.

With this data, we can conclude that the Pythagorean Theorem is applicable to college football, and when projecting forward it is better to look at a team's ratio of points scored to points allowed than at its actual record.

Saturday, June 17, 2006

Who Wins Close Games?

Previously on this blog, I've debunked the fallacy that teams have an 'ability' to win close games. Now I want to take another look. Two writers for Baseball Prospectus, Rany Jazayerli and Keith Woolner, discovered that bullpen strength influences which teams win close games in Major League Baseball. Does team defensive strength have a similar effect in college football? To answer this question, I selected the top 5 defensive teams in terms of scoring defense for each of the past six seasons (2000-2005), a sample of 30 teams. Then I selected the top 5 offensive teams in terms of scoring offense for the past six seasons, another sample of 30 teams. Then I determined each team's record in close games (games decided by 1 score, meaning 8 points or fewer). Here is the year-by-year examination of the top 5 scoring offenses and top 5 scoring defenses in terms of their records in close games.

2000

Defensive Teams               Offensive Teams

Texas Christian    0-2        Boise State      1-2
Florida State      1-1        Miami (Fla)      1-1
Toledo             1-1        Florida State    1-1
Western Michigan   2-2        Nebraska         2-1
Miami (Fla)        1-1        Virginia Tech    2-0

Total:             5-7 .417   Total:           7-5 .583

2001

Defensive Teams               Offensive Teams
Miami (Fla)        1-0        BYU              5-0
Virginia Tech      0-2        Florida          0-2
Texas              1-1        Miami (Fla)      1-0
Oklahoma           2-1        Fresno State     2-2
Florida            0-2        Hawaii           3-3

Total:             4-6 .400   Total:           11-7 .611

2002

Defensive Teams               Offensive Teams
Kansas State       2-2        Boise State      0-0
Ohio State         7-0        Kansas State     2-2
North Texas        2-3        Miami (Fla)      2-1
Georgia            5-1        Bowling Green    1-0
Alabama            2-1        Oklahoma         1-1

Total:             18-7 .720  Total:           6-4 .600

2003

Defensive Teams               Offensive Teams
LSU                3-0        Boise State      3-1
Nebraska           1-0        Miami (Ohio)     2-0
Georgia            2-2        Oklahoma         1-1
Miami (Fla)        5-1        Texas Tech       2-2
Oklahoma           1-1        Southern Cal     0-1

Total:             12-4 .750  Total:           8-5 .615

2004

Defensive Teams               Offensive Teams
Auburn             3-0        Louisville       2-1
Virginia Tech      4-2        Boise State      3-1
Southern Cal       4-0        Utah             0-0
Florida State      3-3        Bowling Green    0-1
Penn State         1-3        Fresno State     1-2

Total:             15-8 .652  Total:           6-5 .545

2005

Defensive Teams               Offensive Teams
Alabama            3-1        Texas            2-0
Miami (Fla)        2-2        Southern Cal     2-1
Virginia Tech      1-1        Louisville       1-1
Georgia            3-3        Texas Tech       2-2
Texas              2-0        Fresno State     0-4

Total:             11-7 .611  Total:           7-8 .467

The 30 strong defensive teams posted a combined record of 65-39 in close games (.625). The 30 strong offensive teams posted a combined record of 45-34 in close games (.570). So it appears strong defensive teams do win more than their fair share of close games. Furthermore, 13 of the 30 strong defensive teams posted winning records in close games, 12 posted .500 records, and only 5 posted losing records. 12 of the strong offensive teams posted winning records in close games, 10 posted .500 records, 6 posted losing records, and 2 had no record.
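
If you want to check my arithmetic, the combined records can be rebuilt from the yearly totals listed above. Here is a small Python sketch; the yearly win-loss pairs are taken straight from the tables, while the function and variable names are just ones I made up for the example.

```python
# Yearly (wins, losses) totals in close games, copied from the tables above.
defense_by_year = {2000: (5, 7), 2001: (4, 6), 2002: (18, 7),
                   2003: (12, 4), 2004: (15, 8), 2005: (11, 7)}
offense_by_year = {2000: (7, 5), 2001: (11, 7), 2002: (6, 4),
                   2003: (8, 5), 2004: (6, 5), 2005: (7, 8)}


def combine(records):
    """Sum a series of (wins, losses) pairs and return wins, losses, win %."""
    records = list(records)
    wins = sum(w for w, _ in records)
    losses = sum(l for _, l in records)
    return wins, losses, wins / (wins + losses)


for label, by_year in (("Defense", defense_by_year),
                       ("Offense", offense_by_year)):
    wins, losses, pct = combine(by_year.values())
    print(f"{label}: {wins}-{losses} {pct:.3f}")
# Defense: 65-39 0.625
# Offense: 45-34 0.570
```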

No statistical study is complete without a control group. To find a control group, I randomly selected 5 teams for each season (numbering each team alphabetically and using a random number generator) to be the basis for comparison. These teams ranged from great (Penn State 2005 and LSU 2003) to mediocre (South Carolina 2004) to awful (Duke 2000). Here are those teams and their respective records in close games.

2000

Minnesota 1-4
Duke 0-2
Kansas State 3-1
Southern Miss 3-3
Nevada 2-0

Total: 9-10 .474

2001

Miami (Ohio) 4-3
Mississippi State 2-5
Clemson 4-2
Oklahoma 2-1
Rice 5-1

Total: 17-12 .586

2002

Eastern Michigan 3-1
Louisiana Tech 2-1
Miami (Ohio) 3-3
Virginia 4-2
Colorado 2-2

Total: 14-9 .609

2003

Vanderbilt 0-2
Minnesota 3-2
UTEP 1-2
LSU 3-0
Memphis 1-2

Total: 8-8 .500

2004

Florida State 3-3
South Carolina 2-2
East Carolina 1-2
Hawaii 2-1
San Jose State 1-1

Total: 9-9 .500

2005

Kansas 1-0
Toledo 1-1
Mississippi 2-2
Penn State 3-1
Connecticut 1-1

Total: 8-5 .615

The 30 random teams had a cumulative winning percentage in close games of .551 (65-53). If your memory is short, that is less than the winning percentage of the strong defensive (.625) and offensive (.570) teams. Of those 30 random teams, 14 posted winning records, 9 posted .500 records, and 7 posted losing records. The control group actually has more teams with winning records than either of the other groups, but also more teams with losing records.
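
For the curious, the random draw behind the control group (number the teams alphabetically, then let a random number generator pick 5) might look something like this in Python. It is a sketch, not the actual procedure I ran: the function name, the `seed` parameter, and the short team list are assumptions, with ten names from this post standing in for the full Division IA list.

```python
import random


def draw_control_group(teams, k=5, seed=None):
    """Number teams alphabetically (1..N) and draw k distinct numbers."""
    ordered = sorted(teams)
    rng = random.Random(seed)  # pass a seed to make a draw repeatable
    numbers = rng.sample(range(1, len(ordered) + 1), k)
    return [ordered[n - 1] for n in numbers]


# Stand-in list only; the real draw would use every Division IA team.
teams = ["Duke", "Kansas", "Toledo", "Nevada", "Rice",
         "Clemson", "Virginia", "Colorado", "Memphis", "Hawaii"]
print(draw_control_group(teams, seed=2006))
```

Because the numbers are drawn without replacement, no team can show up twice in the same season's control group.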

Judging from this data, strong defensive teams do appear to win more than their fair share of close games. However, there are several important issues to discuss. Foremost, points allowed may not be the best method to rate defenses. Many factors affect how many points a team allows. A team with the best defense may not finish as the top-ranked scoring defense if its offense commits many turnovers that put the defense in bad spots or if its special teams do likewise. Perhaps yardage or even yards per play is a better indicator of a defense's true strength. A second problem is schedule strength. The teams in BCS conferences are usually the best defensive teams thanks to the talent they are able to recruit. However, their schedules are also more difficult because they play other BCS schools that are also able to recruit the best talent. For this reason, their points allowed may be higher than those of small-conference schools that enjoy easier schedules. For example, in 2002 North Texas had the 3rd-ranked scoring defense. They shut out 3 teams that season. One of those teams was non-Division IA Nicholls State, and the other two were Louisiana Lafayette (averaged 16.92 points per game) and Idaho (averaged 23.75 points per game). The Mean Green did have a stout defense in 2002 (they held Texas, TCU, and Arizona below their seasonal averages), but they were definitely not the third best in the nation. Even though this study is not perfect, something can still be gleaned. Winning close games, while still heavily determined by luck and pure randomness, seems to be a skill that strong defensive teams possess to some degree.

Thursday, June 15, 2006

Steady as She Goes: Addendum

Last week, I posted a small regression analysis of year to year correlation of points scored versus points allowed. In the past week, I have been conducting similar analyses for previous seasons. Here are the results:

Correlation of points per game:
2004-2005: .3375
2003-2004: .3740
2002-2003: .3602
2001-2002: .1415
2000-2001: .2625

Correlation of points allowed per game:
2004-2005: .4108
2003-2004: .2935
2002-2003: .3606
2001-2002: .3868
2000-2001: .4159

If I had stopped after the second regression analysis, I would have concluded that offensive and defensive correlation from year to year was essentially random. However, after looking at 5 seasons' worth of data, I am inclined to believe that year to year correlations for defense are more consistent than year to year correlations for offense. The r squared value for defense was higher than the one for offense in 3 of the 5 seasons, lower in only 1 season, and almost equal in another. Furthermore, the range of the r squared values for defense was much smaller (.1224) than the same range for offense (.2325). Thus I believe the conclusion I reached last week remains correct (somewhat). Your thoughts?
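
Here is a quick Python sketch that reproduces the ranges and the season-by-season comparison from the numbers listed above. The figures come straight from this post; the names and the .001 tolerance I use to decide what counts as 'almost equal' are assumptions made for the example.

```python
# Year to year values for points scored and points allowed, from the lists above.
offense = {"2004-2005": .3375, "2003-2004": .3740, "2002-2003": .3602,
           "2001-2002": .1415, "2000-2001": .2625}
defense = {"2004-2005": .4108, "2003-2004": .2935, "2002-2003": .3606,
           "2001-2002": .3868, "2000-2001": .4159}


def value_range(values):
    """Spread between the largest and smallest value."""
    values = list(values)
    return max(values) - min(values)


print(round(value_range(offense.values()), 4))  # 0.2325
print(round(value_range(defense.values()), 4))  # 0.1224

TOLERANCE = 0.001  # ASSUMPTION: gaps smaller than this count as "almost equal"
higher = sum(1 for s in offense if defense[s] - offense[s] > TOLERANCE)
lower = sum(1 for s in offense if offense[s] - defense[s] > TOLERANCE)
equal = len(offense) - higher - lower
print(higher, lower, equal)  # 3 1 1
```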

Friday, June 09, 2006

Steady as She Goes

In keeping with the spirit of my previous post about consistency, I decided to conduct a little study to see which aspects of a college football team's performance are more consistent over time. Executing the study was simple. I calculated how many points per game each Division IA team scored in 2004 and determined how well those numbers predicted each team's points per game in 2005 by using the r squared (the coefficient of determination, which is the square of the correlation coefficient). In layman's terms, the r squared is the percentage of variation in the 2005 numbers that is explained by the 2004 numbers. I then did the same thing with each Division IA team's defense. The results are below, and to me they are a bit surprising.

Correlation of points per game 2004-2005: .3375

Correlation of points allowed per game 2004-2005: .4108
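
Since r squared is simply the correlation coefficient r multiplied by itself, here is a short Python sketch showing the relationship. The function name `pearson_r` and the points-per-game numbers are invented for illustration and are not the real 2004 and 2005 data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented points-per-game figures for five hypothetical teams.
ppg_2004 = [21.0, 35.5, 28.2, 17.4, 30.1]
ppg_2005 = [24.3, 31.0, 22.8, 20.5, 33.6]
r = pearson_r(ppg_2004, ppg_2005)
print(round(r, 4), round(r ** 2, 4))
```

One thing to keep in mind: squaring throws away the sign, so r squared alone cannot tell you whether a relationship is positive or negative.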

Defense, at least from 2004-2005, is more consistent than offensive performance. This seems counter-intuitive because defense is a game of reactions. The offense dictates not only the pace, but also the personnel that the defense must have on the field. Later on this week, I will post the correlation for 2003-2004 and also 2002-2003 if I have time. If this holds true, it could be good news for teams like Georgia Tech and Alabama that have had good defenses for a few years running, but have been derailed by below-average offenses. Additionally, it could be bad news for teams like Notre Dame that had dramatic offensive improvements in 2005, but had similar defensive results. Of course, this data is at the macro level, and it would be prudent to consider each case individually (graduating players, change in coaching style, etc.) when attempting to project how each team will do in 2006 relative to its 2005 numbers. As always, your thoughts on this seemingly counter-intuitive phenomenon are welcome.