Friday, June 23, 2006

How Accurate is the Pythagorean Theorem in College Football?

I doth believe UCLA is in for a decline.

In baseball, the Pythagorean Theorem is often a better indicator of team strength and usually a better predictor of future performance than a team's actual record. Is this also true in college football? Only one way to find out. I wanted to know which variable was a better predictor of each BCS school's 2005 winning percentage: their 2004 winning percentage or their 2004 Pythagorean winning percentage.
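For reference, the formula estimates winning percentage as points scored raised to an exponent, divided by the sum of points scored and points allowed, each raised to that same exponent. Here is a minimal Python sketch. The function name and the sample point totals are made up for illustration, and the default exponent of 2.37 is the NFL value discussed in the comments below.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected winning percentage from season point totals.

    The 2.37 default is the NFL exponent mentioned in the comments;
    it is an assumption here, not a value fitted to college data.
    """
    if points_for < 0 or points_against < 0:
        raise ValueError("point totals must be non-negative")
    if points_for == 0 and points_against == 0:
        raise ValueError("at least one point total must be positive")
    scored = points_for ** exponent
    allowed = points_against ** exponent
    return scored / (scored + allowed)


# Hypothetical season: 400 points scored, 300 allowed.
print(f"{pythagorean_win_pct(400, 300):.3f}")
```

A team that scores exactly as many points as it allows comes out at .500 no matter which exponent is used; the exponent only controls how quickly the estimate moves away from .500 as the scoring ratio grows.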

R squared for 2004 win %: .4070
R squared for 2004 Pythagorean win %: .5108

Both variables explain a significant portion of the variability of the 2005 record. However, the Pythagorean winning percentage is the better predictor: in relative terms it explains roughly 25% more of the variance than the standard winning percentage (.5108 versus .4070).
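For anyone who wants to run the same kind of comparison, here is a rough sketch of how an R squared value can be computed for a single predictor. It assumes a simple linear regression, in which case R squared equals the squared Pearson correlation; the post does not say which tool produced the figures above, so treat this as one plausible approach. The function name is hypothetical, and the last lines only redo the arithmetic on the two reported values.

```python
def r_squared(xs, ys):
    """R squared of a one-variable linear regression of ys on xs.

    For a single predictor this is the squared Pearson correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    return sxy ** 2 / (sxx * syy)


# Relative gain implied by the two R squared values reported in the post.
relative_gain = (0.5108 - 0.4070) / 0.4070
print(f"{relative_gain:.1%}")  # prints 25.5%
```

Here xs would hold each team's 2004 figure (actual or Pythagorean) and ys the same teams' 2005 winning percentages, in the same order.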

It should be noted that most teams' winning percentages are close to their winning percentages as predicted by the Pythagorean Theorem. Now let's shift gears and focus on those teams that had a significant disparity between their winning percentage and their Pythagorean winning percentage. The cutoff point for 'significant disparity' is an arbitrary one, but I chose .100. That means if a team had a winning percentage of .750, but only a Pythagorean winning percentage of .640, they are included in this portion of the study. 22 teams from 2004 fit this criterion. If you're curious, those teams are listed at the bottom of this article. Using the same methodology as the previous study, I looked to see how well the 2004 winning percentage of these teams predicted their 2005 winning percentage and then how well their 2004 Pythagorean winning percentage predicted their 2005 winning percentage. Here are the results.
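The selection step described above can be sketched in a few lines of Python. The team names and percentages here are invented placeholders, not the actual 22 teams, and treating a gap of exactly .100 as "significant" is an assumption, since the post does not say whether the cutoff is inclusive.

```python
def has_significant_disparity(win_pct, pythagorean_pct, cutoff=0.100):
    """True when actual and Pythagorean winning percentage differ by the cutoff or more."""
    # Round to three decimals so floating-point noise does not push a
    # gap of exactly .100 just under the cutoff.
    return round(abs(win_pct - pythagorean_pct), 3) >= cutoff


# Placeholder data: (team, actual win %, Pythagorean win %).
sample_teams = [
    ("Team A", 0.750, 0.640),
    ("Team B", 0.500, 0.520),
    ("Team C", 0.364, 0.481),
]

outliers = [
    name
    for name, actual, expected in sample_teams
    if has_significant_disparity(actual, expected)
]
print(outliers)  # prints ['Team A', 'Team C']
```

Note that the check uses the absolute difference, so it catches teams that outperformed their point totals as well as teams that underperformed them.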

R Squared for 2004 win %: .0428
R Squared for 2004 Pythagorean win %: .3097

When we examine only teams with a significant difference between actual and expected winning percentage, the predictive power of their actual record practically disappears. The predictive power of the Pythagorean method is much smaller as well, but a relationship can still be discerned.

The final study is the same as the first, but this time with the 22 teams with significant differences removed.

R Squared for 2004 win %: .5750
R Squared for 2004 Pythagorean win %: .5826

This result is pretty logical. When a team's actual record closely matches its predicted record, both do a pretty good job of predicting the team's record the next year.

With this data, we can conclude that the Pythagorean Theorem is applicable to college football, and when projecting forward it is best to look at a team's ratio of points scored to points allowed rather than its actual record.

Hoya Suxa said...

Matt,

Nice job with the pythagoras stuff.

Just to make sure that your data is accurate, the exponent for the pythagorean formula is not 2. It's actually closer to 2.37.

(Note: The 2.37 exponent is actually the correct value for NFL football, but given the fact that I haven't sat down and gone through enough college football games to come up with an applicable exponent for the college version, the 2.37 value is sufficient.)

If you want to double check your numbers, I put together a pythagoras table for 2004 and 2005 on my blog a while back. The link is:

http://orange44.blogspot.com/2005/12/bonanza-of-numbers-part-i.html

The difference should be negligible, but it can never hurt to be sure.

matt said...

Thanks, Matt. Actually I did use 2.37. I should have mentioned that in the body of the post.

Hoya Suxa said...

Excellent.

Nothing breaks up the doldrums of late-June like some pythagoras.

Anonymous said...

How do you decide what the "correct" exponent for the pythagorean formula is? If you're fitting it to previous data, that seems like the statistical version of telling yourself what you want to hear. :-)

Matt Crawford said...

I don't know if you'll see this comment on such an old post, but I just took a look at the last 5 years, and the best exponent for college is much lower, at 1.8.

I'm guessing that the number of blowouts actually decreases the best exponent, because putting up 55 points doesn't really mean you have a good team. In the NFL, putting up a lot of points is much more indicative of quality.

I have a big spreadsheet if you want it.

Matt Crawford said...

I actually started by looking at 1986, which could provide a nice sanity check for the last five years. And the best exponent was 1.8 for that year too.
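For readers wondering how a "best" exponent like the ones discussed in this thread might be found, one common approach is to try a range of candidate exponents and keep the one whose predicted winning percentages sit closest to the actual ones. The sketch below does that with a simple grid search and a sum-of-squared-errors score. It is not necessarily the method any commenter used, and the function names, candidate grid, and input layout are all assumptions made for illustration.

```python
def pythagorean_win_pct(points_for, points_against, exponent):
    """Expected winning percentage for a given exponent."""
    scored = points_for ** exponent
    allowed = points_against ** exponent
    return scored / (scored + allowed)


def best_exponent(teams, candidates):
    """Return the candidate exponent with the smallest squared error.

    teams: iterable of (points_for, points_against, actual_win_pct).
    """
    teams = list(teams)

    def squared_error(exponent):
        return sum(
            (pythagorean_win_pct(pf, pa, exponent) - actual) ** 2
            for pf, pa, actual in teams
        )

    return min(candidates, key=squared_error)


# Candidate exponents from 1.0 to 3.0 in steps of 0.1 (an arbitrary grid).
candidates = [round(1.0 + 0.1 * step, 1) for step in range(21)]
```

On the concern raised earlier in the comments about fitting to past data: one way to guard against it is to fit the exponent on some seasons and check it against seasons that were held out, which is in the spirit of the 1986 sanity check mentioned above.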