Monday, July 4, 2011

Effects of Race on Baseball Players' Salaries


BACKGROUND AND SUMMARY OF RESULTS

Using data on more than 300 Major League Baseball players, we found that a player's salary is significantly associated with the interaction between his race and the racial composition of the city in which he plays. In cities heavily populated by minorities, whites earn less than blacks and Hispanics. When a city's population reaches 20% black, a black player earns about 5.2% more than a white player, even if the two perform equally well on the field. This may or may not be a sign of racial discrimination in players' salaries. According to economist and statistician Jeffrey M. Wooldridge:

"We cannot simply claim that discrimination exists against blacks and Hispanics, because the estimates imply that whites earn less than blacks and Hispanics in cities heavily populated by minorities.  The importance of city composition on salaries might be due to player preferences:  perhaps the best black players live disproportionately in cities with more blacks and maybe the best Hispanic players tend to be in cities with more Hispanics."

The econometric estimates that follow show a strong correlation among race, the racial composition of players' cities, and the earnings of Major League Baseball players, but they cannot distinguish between racial discrimination and player preference as the driving factor.

DATA AND VARIABLES

The effect of a city's racial composition on players' salaries is estimated with a multivariate regression. The idea behind regression analysis is that it lets us compare apples to apples. Many variables can affect a player's salary, but the most obvious is performance, so we control for a player's performance in the regression. All of the variables below are control variables except blckpb and hispph, which are the variables we are testing. These are interaction variables between a player's race and the racial composition of his city: blckpb is the black indicator multiplied by the city's percentage of black residents, and hispph is the Hispanic indicator multiplied by the city's percentage of Hispanic residents. The estimates and statistical significance of blckpb and hispph are the subject of this post. All variables included in the regression are described below:

Data Source:  Companion website for Introductory Econometrics: A Modern Approach by Wooldridge.
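To make the interaction variables concrete, here is a minimal sketch of how blckpb and hispph could be built from a player's race indicators and his city's racial percentages. The column names black, hispan, percblck and perchisp are assumed to match Wooldridge's MLB1 file, and the rows are made-up examples, not players from the data.

```python
# Sketch: build the race-by-city-composition interaction terms.
# Column names are assumed to follow Wooldridge's MLB1 file; rows are made up.

def add_interactions(rows):
    """Return copies of each player row with blckpb and hispph added.

    blckpb = black * percblck   (nonzero only for black players)
    hispph = hispan * perchisp  (nonzero only for Hispanic players)
    """
    out = []
    for row in rows:
        new_row = dict(row)
        new_row["blckpb"] = row["black"] * row["percblck"]
        new_row["hispph"] = row["hispan"] * row["perchisp"]
        out.append(new_row)
    return out

# Hypothetical players, all in a city that is 20% black and 5% Hispanic.
players = [
    {"black": 1, "hispan": 0, "percblck": 20.0, "perchisp": 5.0},  # black player
    {"black": 0, "hispan": 1, "percblck": 20.0, "perchisp": 5.0},  # Hispanic player
    {"black": 0, "hispan": 0, "percblck": 20.0, "perchisp": 5.0},  # white player
]
result = add_interactions(players)
for p in result:
    print(p["blckpb"], p["hispph"])
```

Only the player whose race matches the indicator picks up the city percentage; for everyone else the interaction is zero, which is what lets the regression estimate a separate city-composition slope for each group.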


The next table provides descriptive statistics for these variables.


 
The table above shows that the average player in our data set has 6.3 years of experience, with a minimum of 1 and a maximum of 20 years in the league. The average batting average is 258.98 hits per 1,000 at bats (about .259), and players hit about 7.1 home runs per year. Black players make up 30.5% of the sample, while Hispanic players account for 18.1%.

The chart below is a simple scatter plot of player salary against games played per year. There is an obvious positive correlation between games played per year and salary:


Not surprisingly, there is also a strong positive correlation between salary and the percentage of years a player has been an All-Star, as the graph below shows:



Although these scatter plots describe the relationship between performance and salary, they are far too simple for a comprehensive analysis. To isolate the effects of performance and race on a baseball player's salary, a multivariate regression has to account for all of the driving factors at once. That is what we do below.
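For readers who want to see what a multivariate regression computes, here is a minimal ordinary least squares sketch using NumPy. It assumes conventional (homoskedastic) standard errors and is not necessarily how the table below was produced; the toy numbers in the example are made up.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with conventional (homoskedastic) standard errors.

    X: (n,) or (n, k) array of regressors, without a constant column.
    y: (n,) dependent variable, e.g. log salary.
    Returns (coefficients, standard errors, t statistics), intercept first.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares coefficients
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)              # error variance, df-adjusted
    cov = sigma2 * np.linalg.inv(X.T @ X)         # Var(beta_hat) = sigma^2 (X'X)^-1
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Toy example with made-up numbers: one regressor, four observations.
beta, se, t = ols([0, 1, 2, 3], [1, 3, 7, 7])
print(beta)  # approximately [1.2, 2.2]
```

In this post's setting, y would be lsalary and the columns of X would be the performance controls plus the race indicators and the interaction terms.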

REGRESSION ANALYSIS

The following table is the output of a multivariate regression of log salary on our control variables and the variables we are testing (the interactions between race and city racial composition). The top section of the table reports statistics for the model as a whole; focus your attention on the columns labeled "Coef." and "t" in the bottom section.

 
Reading the Regression Results Above

The variables in the regression are listed under the "lsalary" column (lsalary, the natural log of salary, is the dependent variable). Because the dependent variable is in logs, each value in the "Coef." column, multiplied by 100, can be interpreted as the approximate percentage change in salary from a one-unit change in the corresponding explanatory variable. For example, the "years" row has a coefficient of about .067, which means that each additional year a player spends in the league raises his salary by about 6.7% on average, after controlling for all the other variables in the regression. The phrase "after controlling for all other variables in the regression" can be added to the interpretation of any variable in this model, and this is what makes regression analysis so powerful.
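Because the coefficients are in log points, "multiply by 100" is an approximation. This short sketch compares the approximate and exact percentage changes, using the .067 coefficient on years quoted above.

```python
import math

def pct_effect(coef, delta=1.0):
    """Percentage change in salary implied by a log-salary coefficient.

    Returns (approximate, exact):
      approximate = 100 * coef * delta
      exact       = 100 * (exp(coef * delta) - 1)
    """
    approx = 100 * coef * delta
    exact = 100 * (math.exp(coef * delta) - 1)
    return approx, exact

approx, exact = pct_effect(0.067)
print(f"approx {approx:.1f}%, exact {exact:.1f}%")  # approx 6.7%, exact 6.9%
```

The two agree closely for small coefficients. For larger ones, such as the -0.198 on black, the exact figure (about -18.0%) differs noticeably from the approximate -19.8%.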

Next, focus on the column labeled "t". This is the t statistic (the coefficient divided by its standard error), and when it is greater than two in absolute value we say the coefficient is statistically significant, roughly at the 5% level. In other words, if "t" is, say, 10 or -10, the effect of the corresponding variable on a player's salary is statistically significant. If instead the "t" column shows -0.5 or 1, for example, the effect of that variable is statistically insignificant in explaining MLB salaries. In the regression table above, only years, gamesyr, allstar, and the interactions between race and city composition are statistically significant.
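The rule of thumb can be written down directly. This sketch also returns a two-sided p-value from the standard normal distribution, an assumption that works as a large-sample approximation (the exact test uses the t distribution with n - k degrees of freedom). The coefficient and standard-error pairs below are hypothetical, not values from the table.

```python
import math

def t_test(coef, se, critical=2.0):
    """Return the t statistic, whether |t| exceeds the rule-of-thumb cutoff,
    and a two-sided p-value using the standard normal approximation."""
    t = coef / se
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > |t|) for Z ~ N(0, 1)
    return t, abs(t) > critical, p

# Hypothetical coefficient / standard-error pairs.
print(t_test(0.05, 0.01))   # t = 5: significant
print(t_test(0.01, 0.02))   # t = 0.5: not significant
```

Under this approximation, |t| = 1.96 corresponds to a p-value of 0.05, which is why "greater than two" works as a quick cutoff.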

Interpreting the Statistically Significant Race and City Composition Effects

The regression table above suggests that, after controlling for player characteristics and performance, Hispanic and black players are paid more as the percentage of their own race in the city where they play increases.

A black player in a city with no black residents earns about 19.8% less than a white player after controlling for player ability (this comes from the coefficient on black above). However, this gap shrinks quickly as the black share of the city's population rises. For a city that is 10% black, multiply the coefficient on blckpb (0.0125 in the regression table above) by 10 and add the result to the black coefficient of -0.198 to find the effect of this racial composition on a black player's salary:

-0.198 + 0.0125(10) = -0.073, or about -7.3%

The calculation above shows that in a city that is 10% black, black players are paid only about 7.3% less than white players with identical performance statistics (the regression controls for the performance variables). When a city reaches 20% black, the same calculation gives -0.198 + 0.0125(20) = 0.052, so a black player earns about 5.2% more than a white player who performs equally well on the field. The city with the largest percentage of black residents is Detroit, at about 74%.
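The same calculation works for any city. This sketch uses the black coefficient of -0.198 and a blckpb coefficient of 0.0125, which is the value implied by the post's own figures (-7.3% at 10% black, +5.2% at 20% black). Like the text, it uses the 100 × coefficient approximation, and it also solves for the break-even share of black residents at which the predicted gap is zero.

```python
BLACK = -0.198    # coefficient on the black indicator
BLCKPB = 0.0125   # coefficient on blckpb = black * percblck (percblck in percent)

def black_white_gap(percblck):
    """Approximate % salary difference between a black player and an otherwise
    identical white player, in a city that is `percblck` percent black."""
    return 100 * (BLACK + BLCKPB * percblck)

for pct in (0, 10, 20):
    print(f"{pct:>2}% black: {black_white_gap(pct):+.1f}%")

# Share of black residents at which the predicted gap is zero.
break_even = -BLACK / BLCKPB
print(f"break-even: {break_even:.2f}% black")
```

With these coefficients the gap reverses sign at roughly 15.8% black, which is why the 10% and 20% cities in the text land on opposite sides of zero.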