Tuesday, September 26, 2006

Team Similarity Scores

As a Wake Forest alum and fan, I'm pretty excited by the start of this football season. A 4-0 start (soon to be 5-0 barring an unmitigated disaster against Liberty) has me already making plans to attend a bowl game come late December. What I want to know is, what is the likely record for this year's Wake Forest football team? To estimate it, I decided to borrow a concept from Bill James. James introduced similarity scores for individual baseball players in his groundbreaking Baseball Abstracts in the 1980s. These scores allowed James to predict how certain players would age and develop. What I want to do is create some rudimentary similarity scores for teams in order to better project how certain teams will develop. Developing similarity scores for professional teams is much easier: the leagues are small (30-32 teams, compared to 119 or so in college football), and the college game has massive discrepancies in schedules. But I'll give it the old college try. Here's the methodology.

1. Start with 1000 points

2. Through 'x' number of games, take the difference in winning percentage, multiply it by 1000, and subtract the result from 1000
example: if Team A is 4-0 and Team B is 3-1, the difference in winning percentage is 1 - .75 = .25; multiplying by 1000 gives 250, which is subtracted from the 1000

3. For every game of difference in the home/road split, subtract 50 points
example: Team A has played 2 road games and 2 home games, Team B has played 3 road games and 1 home game, subtract 50 points (neutral sites count as half games)

4. Subtract the difference in point differential through 'x' number of games

5. Subtract the difference in average opponents' Sagarin Rating (I think it's a pretty good measure of schedule strength)

6. Subtract the difference multiplied by 1000 in previous year's record (we need to know how good the teams were in the previous season)

7. Subtract the difference multiplied by 1000 in previous year's Pythagorean Winning Percentage (a better indicator of team strength than actual record)

8. The remaining points are the teams' similarity score (the higher the better)
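The steps above can be sketched in a few lines of Python. This is only one reading of the method: the `TeamSnapshot` fields and the function names are made up for illustration, every difference is taken as an absolute value, and step 3 is assumed to mean the gap in home games played (with a neutral-site game counted as half a home game).

```python
from dataclasses import dataclass


@dataclass
class TeamSnapshot:
    """One team's profile through the first x games (field names are illustrative)."""
    wins: int
    losses: int
    home_games: float     # neutral-site games count as half a home game
    point_diff: float     # point differential through x games
    opp_sagarin: float    # average Sagarin rating of the opponents played
    prev_win_pct: float   # previous season's winning percentage
    prev_pyth_pct: float  # previous season's Pythagorean winning percentage


def win_pct(team: TeamSnapshot) -> float:
    return team.wins / (team.wins + team.losses)


def similarity_score(a: TeamSnapshot, b: TeamSnapshot) -> float:
    score = 1000.0                                          # step 1
    score -= 1000 * abs(win_pct(a) - win_pct(b))            # step 2
    # Step 3, ASSUMPTION: "game difference" is read as the gap in home games
    # played, which equals the gap in road games when both teams have played x games.
    score -= 50 * abs(a.home_games - b.home_games)
    score -= abs(a.point_diff - b.point_diff)               # step 4
    score -= abs(a.opp_sagarin - b.opp_sagarin)             # step 5
    score -= 1000 * abs(a.prev_win_pct - b.prev_win_pct)    # step 6
    score -= 1000 * abs(a.prev_pyth_pct - b.prev_pyth_pct)  # step 7
    return score                                            # step 8


# Team A and Team B from the examples in steps 2 and 3; every other field is a
# made-up placeholder set equal for both teams so it contributes no deduction.
team_a = TeamSnapshot(4, 0, 2, 30, 80, 0.5, 0.5)
team_b = TeamSnapshot(3, 1, 1, 30, 80, 0.5, 0.5)
print(similarity_score(team_a, team_b))  # 1000 - 250 - 50 = 700.0
```

Taking absolute values makes the score symmetric, so it does not matter which of the two teams is listed first.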

Is this formula accurate? I have no idea. It's just a little tool I've been toying with. So anyway, which teams are most similar to the 2006 incarnation of Wake Forest? Before I address that question, I need to tackle a few problems. There are a ton of college football teams. Even if we just limit this study to BCS teams from the last 5 years, that's a sample size of roughly 350 teams. Without any type of database and computer program to filter out the teams, next season would have started by the time I finished. So I am limiting the sample to BCS teams from last year. Admittedly this limits the usefulness of these mathematical shenanigans by a great deal. Still, perhaps there is a little something we can glean from this exercise.

The 3 most similar teams to Wake Forest (2006) from last year-- similarity score in parentheses and final record following

1. Michigan State (748.5) 5-6
2. Vanderbilt (738) 5-6
3. West Virginia (366) 11-1

You can pretty much throw West Virginia out, as Michigan State and Vanderbilt are by far the two most similar teams. Both of these teams finished 1-6 after their 4-0 starts. After the Liberty game, it's likely Wake Forest will not be favored in any game the rest of the season. Of course, Wake has won as an underdog before (twice already this season), and several of the games are winnable (North Carolina and NC State on the road, BC at home, and perhaps even Maryland on the road). Wake fans should be excited about the team's hot start, but we should also pull a Larry David and curb our enthusiasm. Our schedule so far has not exactly been a who's who of top 10 teams (Syracuse, Duke, Ole Miss, and Connecticut), and our starting quarterback, running back, and one of our offensive tackles are either out for the year or out for an extended period of time. While a bowl game is probable, it certainly is not assured.

For those interested, here is the computation of Vanderbilt's similarity score.

1. Through 4 games, both Wake and Vandy were 4-0, so no points are subtracted

2. Through 4 games, Wake and Vandy have both played 2 home games and 2 road games, so no points are subtracted

3. Through 4 games, Wake had a point differential of 46, Vandy of 40-- subtract 6 points

4. Through 4 games, Wake's opponents have an average Sagarin Rating of 83, Vandy's had an average rating of 85-- subtract 2 points

5. In 2005 Wake went 4-7 (.364), in 2004 Vandy went 2-9 (.182), difference of .182, multiply by 1000-- subtract 182

6. In 2005 Wake had a Pythagorean winning percentage of .405, in 2004 Vandy had a Pythagorean winning percentage of .333, difference of .072, multiply by 1000-- subtract 72

Total points remaining: 738
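As a quick check on the arithmetic, the deductions in the walk-through can be tallied in Python. The dictionary keys are just labels chosen for this sketch, and the inputs are the rounded figures quoted above.

```python
# Deductions for Wake Forest vs. Vanderbilt, using the rounded figures
# quoted in the walk-through.
deductions = {
    "winning percentage": 1000 * abs(1.000 - 1.000),
    "home/road split": 50 * abs(2 - 2),
    "point differential": abs(46 - 40),
    "opponents' Sagarin rating": abs(83 - 85),
    "previous-year record": 1000 * abs(0.364 - 0.182),
    "previous-year Pythagorean": 1000 * abs(0.405 - 0.333),
}

score = 1000 - sum(deductions.values())

for name, points in deductions.items():
    print(f"{name:<28} -{points:.0f}")
print(f"similarity score: {score:.0f}")  # 738
```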


Anonymous said...

Wake Forest is number 1!

Sam said...

When you used the average Sagarin ratings for Vandy's first four opponents, did you use the final ratings or the Week 5 ratings?

matt said...

Good question. It's actually the end-of-year ratings, as I could not find the Week 5 ratings.