S_ATTws : Phase 1

After spending some time looking through the Baseball-Databank database, I found something interesting to dig into: how a team’s total payroll relates to its attendance.  The correlation between the two isn’t phenomenally strong, but it’s an interesting topic to examine.  Payroll is a component of a team’s expenses, and fans attending the games bring in revenue.  So this led to a thought: “Which teams are spending most efficiently?”  I’ve seen analyses that compare payroll directly with wins.  This is a logical thing to look at, but having a winning team doesn’t guarantee that the organization is earning a return on its spending.  Teams make money by putting people in the seats.  A pure payroll / attendance figure shows which teams pay the least per fan in the stands, but it doesn’t take the quality of the team into account.  So, I decided to look further into this and try to come up with a statistic that measures a team’s spending efficiency while taking team performance into account.  It can also be thought of as a measure of how efficiently a team spends to put a successful team on the field.
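To make the pure payroll / attendance figure concrete, here is a minimal Python sketch.  The team names and dollar figures are made up for illustration (they are not pulled from Baseball-Databank), and `payroll_per_fan` is just a hypothetical helper name.  This is only the naive ratio, not S_ATTws itself.

```python
# Hypothetical season totals, for illustration only (not real Baseball-Databank rows).
teams = {
    "Team A": {"payroll": 15_000_000, "attendance": 1_200_000},
    "Team B": {"payroll": 200_000_000, "attendance": 4_000_000},
    "Team C": {"payroll": 60_000_000, "attendance": 2_400_000},
}


def payroll_per_fan(payroll, attendance):
    """Dollars of payroll spent for each fan who came through the gate."""
    return payroll / attendance


# Sort from the cheapest cost per fan to the most expensive.
ranked = sorted(teams, key=lambda name: payroll_per_fan(**teams[name]))

for name in ranked:
    cost = payroll_per_fan(**teams[name])
    print(f"{name}: ${cost:.2f} of payroll per fan")
```

Notice that nothing in this ratio knows whether the team won 60 games or 100, which is exactly the gap that S_ATTws is meant to fill.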


The image to the right explains the context of the table.  It is organized by S_ATTws in ascending order, showing the top 25 teams from 1990-2009.  The 2006 Florida Marlins top the list, featuring young superstars such as Dontrelle Willis, Miguel Cabrera, Josh Johnson, Hanley Ramirez, and Dan Uggla.  After the 2003 World Series championship team was dismantled in the off-season, the ’06 Marlins had a payroll of just $15 million, less than 1% of the total MLB payroll.  The Marlins have historically struggled to bring fans out to the ballpark, but they show how a team’s success with limited financial commitments reflects well in S_ATTws.  A polar opposite would be the 1993 expansion Colorado Rockies, who come in at 20th on the list.  Their payroll and attendance numbers are astounding.  In the scatter plot at the top of the page, they’re the dot in the upper-left.  They averaged around 55,000 fans per game, and almost doubled the average total attendance for the year.  They led the league in attendance while having the lowest payroll in MLB.  Although the team wasn’t very competitive, as is the norm for first-year expansion teams, their payroll-to-attendance ratio skews their S_ATTws value downwards.  At 18th are the famous 2002 Moneyballing Oakland Athletics, who won 100+ games while making up only 1.72% of the total MLB payroll.  However, the 2001 pre-Michael Lewis team is ranked 6 spots higher.  The trend that jumps out to me the most is how successful the Montreal Expos of the early-to-mid ’90s were.  They appear in the top 25 six times over a span of 8 years.  The Expos are a prime example of the negative effects of the 1994 labor strike.  The season was canceled during the best year in team history.  Due to the lost revenue from attendance and media contracts, one of the best organizations in MLB history (with the best logo) was forced to the cellar of the NL East, and had to pack up for a move to Washington, D.C.

The table above shows the worst 25 teams according to S_ATTws, listed in descending order.  The 2008 New York Yankees top the list.  They went 89-73 while making up 7.74% of the MLB payroll, and led the league in attendance.  The Yankees have been notorious for spending big-time money on big-time stars ever since the late George Steinbrenner bought the team in 1973.  This table is relevant because it brings up an irregularity in S_ATTws that I will be fixing for Phase 2.  The attendance and salary numbers are run through the formula as fractions of the league totals.  This leads to a large difference between the rich and poor teams, because the overall payroll numbers are spread so far apart from each other.  Instead of using ratios, I’ll need to report the numbers as standard deviations away from the mean.  This way, the values will be standardized, and big-spending teams will not be penalized as harshly.  I’ve been struggling to calculate the correct standard deviations using SQL, but I’m making progress.  My code is a little bit of a mess right now, and I’m sure I’ll look back on it one day and laugh at how sloppy and confusing it all is.  Phase 2 will be released when I fix this problem.
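As a rough sketch of the difference between the two approaches, the Python below computes both a share-of-league-total and a standard score (standard deviations away from the mean) for the same payrolls.  The payroll figures are invented, `league_shares` and `z_scores` are hypothetical helper names, and I’m assuming the population standard deviation is the right one since every team in the league is included.  This is not the S_ATTws formula, only the standardization step it would feed on.

```python
from statistics import mean, pstdev

# Hypothetical payrolls in millions of dollars, for illustration only.
payrolls = {"Team A": 20.0, "Team B": 60.0, "Team C": 80.0, "Team D": 200.0}


def league_shares(values):
    """Each team's value as a fraction of the league total (the Phase 1 approach)."""
    total = sum(values.values())
    return {team: v / total for team, v in values.items()}


def z_scores(values):
    """Each team's value as standard deviations away from the league mean."""
    vals = list(values.values())
    mu = mean(vals)
    sigma = pstdev(vals)  # population standard deviation (assumption: whole league is present)
    return {team: (v - mu) / sigma for team, v in values.items()}


shares = league_shares(payrolls)
scores = z_scores(payrolls)

for team in payrolls:
    print(f"{team}: share={shares[team]:.3f}  z={scores[team]:+.2f}")
```

With shares, the biggest spender here carries ten times the weight of the smallest; with standard scores, every team is measured on the same scale relative to the league average, so the gap is expressed in a bounded, comparable unit.  Whether to use the population or sample standard deviation is a judgment call, and the same choice would have to be made in the SQL version.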

I exported the results of the SQL query into an Excel spreadsheet and have placed it in the downloads section.


P.S. In reviewing my post, I’ve realized that I need to standardize a team’s spending based on its market size.  You can’t fault the Yankees for spending all of this money when it’s there at their fingertips.  I could also adjust attendance based on stadium capacity, but I’m leaning against it.  You make money from the raw number of tickets sold, not the percentage of seats filled.
