U.S. Returns to Schooling Data — SchoolingReturns • ivreg

Data from the 1976 wave of the U.S. National Longitudinal Survey of Young Men (NLSYM), with some variables dating back to earlier years.

data("SchoolingReturns", package = "ivreg")

Format

A data frame with 3010 rows and 22 columns.

wage: Raw wages in 1976 (in cents per hour).
education: Education in 1976 (in years).
experience: Years of labor market experience, computed as age - education - 6.
ethnicity: Factor indicating ethnicity. Is the individual African-American ("afam") or not ("other")?
smsa: Factor. Does the individual reside in an SMSA (standard metropolitan statistical area) in 1976?
south: Factor. Does the individual reside in the South in 1976?
age: Age in 1976 (in years).
nearcollege: Factor. Did the individual grow up near a 4-year college?
nearcollege2: Factor. Did the individual grow up near a 2-year college?
nearcollege4: Factor. Did the individual grow up near a 4-year public or private college?
enrolled: Factor. Is the individual enrolled in college in 1976?
married: Factor. Is the individual married in 1976?
education66: Education in 1966 (in years).
smsa66: Factor. Does the individual reside in an SMSA in 1966?
south66: Factor. Does the individual reside in the South in 1966?
feducation: Father's educational attainment (in years). Imputed with average if missing.
meducation: Mother's educational attainment (in years). Imputed with average if missing.
fameducation: Ordered factor coding family education class (from 1 to 9).
kww: Knowledge of the World of Work (KWW) score.
iq: Normed intelligence quotient (IQ) score.
parents14: Factor coding living with parents at age 14: both parents, single mother, step parent, other.
library14: Factor. Was there a library card in home at age 14?
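
The documented format can be checked directly against the data. The following is a minimal sketch, assuming the ivreg package is installed; the object names `dims`, `exp_ok`, and `col_classes` are illustrative and not part of the package.

```r
## Load only the data set; this assumes the ivreg package is installed.
data("SchoolingReturns", package = "ivreg")

## Dimensions: number of rows, then number of columns.
dims <- dim(SchoolingReturns)
print(dims)

## Does experience equal age - education - 6 for every individual?
exp_ok <- with(SchoolingReturns, all(experience == age - education - 6))
print(exp_ok)

## First class of each column, to see which variables are stored as factors.
col_classes <- sapply(SchoolingReturns, function(x) class(x)[1])
print(table(col_classes))
```
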

Source

Supplementary material for Verbeek (2004).

Details

Estimating the causal effect of schooling on earnings in a classical wage equation is problematic because schooling is arguably endogenous. One possible strategy is therefore to use an exogenous variable as an instrument for years of education. In his well-known study, Card (1995) uses geographical proximity to a college when growing up as such an instrument, showing that it significantly increases both the years of education and the wage level obtained on the labor market. Using instrumental variables regression, Card (1995) shows that the estimated returns to schooling are much higher than those obtained by ordinary least squares.
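
The argument above can be written as a two-equation sketch. The notation is an assumption made here for illustration and is not taken from Card (1995) or Verbeek (2004): $x_i$ collects the control variables, $\varepsilon_i$ and $v_i$ are error terms, and the controls are treated as exogenous for simplicity (in the examples, experience is itself instrumented by age because it is computed from education).

```latex
\begin{align}
\log(\text{wage}_i) &= \beta_0 + \beta_1\,\text{education}_i + x_i^\top \gamma + \varepsilon_i, \\
\text{education}_i  &= \pi_0 + \pi_1\,\text{nearcollege}_i + x_i^\top \delta + v_i .
\end{align}
% Endogeneity of schooling:  Cov(education_i, \varepsilon_i) \neq 0
% Instrument relevance:      \pi_1 \neq 0
% Instrument exogeneity:     Cov(nearcollege_i, \varepsilon_i) = 0
```

Under these conditions, ordinary least squares on the first equation is inconsistent for $\beta_1$, while instrumental variables regression uses only the variation in education that is induced by college proximity.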

The data are taken from the supplementary material for Verbeek (2004) and are based on the work of Card (1995). The U.S. National Longitudinal Survey of Young Men (NLSYM) began in 1966 and included 5525 men, then aged between 14 and 24. Card (1995) employs labor market information from the 1976 NLSYM interview, which also included information about educational attainment. Out of the 3694 men still included in that wave of NLSYM, 3010 provided information on both wages and education, yielding the subset of observations provided in SchoolingReturns.

The examples replicate the results from Verbeek (2004), who used the simplest specifications from Card (1995). Including further region or family background characteristics improves the model significantly but has little effect on the main coefficient of interest, namely that of years of education.
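
One way to explore the sensitivity to family background is sketched below. The choice of added controls (`feducation` and `meducation`) is an assumption made for illustration and does not reproduce a specific table in Card (1995) or Verbeek (2004); the names `base_fml` and `m_ols2` are likewise illustrative.

```r
## Load only the data set; this assumes the ivreg package is installed.
data("SchoolingReturns", package = "ivreg")

## Baseline OLS wage equation, as in the Examples section.
base_fml <- log(wage) ~ education + poly(experience, 2, raw = TRUE) +
  ethnicity + smsa + south
m_ols <- lm(base_fml, data = SchoolingReturns)

## Assumption: parental education stands in for "family background".
## Both variables are documented as imputed, so no rows should be dropped.
m_ols2 <- update(m_ols, . ~ . + feducation + meducation)

## Joint F test of the added controls.
print(anova(m_ols, m_ols2))

## Education coefficient with and without the added controls.
print(c(base = unname(coef(m_ols)["education"]),
        extended = unname(coef(m_ols2)["education"])))
```

The same `update()` pattern can be used with other candidate controls, for example `south66` or `parents14`.
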

References

Card, D. (1995). Using Geographical Variation in College Proximity to Estimate the Return to Schooling. In: Christofides, L.N., Grant, E.K., and Swidinsky, R. (eds.), Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, University of Toronto Press, Toronto, 201-222.

Verbeek, M. (2004). A Guide to Modern Econometrics, 2nd ed. John Wiley.

Examples

## load package and data
library("ivreg")
data("SchoolingReturns", package = "ivreg")

## Table 5.1 in Verbeek (2004) / Table 2(1) in Card (1995)
## Returns to education: 7.4%
m_ols <- lm(log(wage) ~ education + poly(experience, 2, raw = TRUE) + ethnicity + smsa + south,
  data = SchoolingReturns)
summary(m_ols)
#> 
#> Call:
#> lm(formula = log(wage) ~ education + poly(experience, 2, raw = TRUE) + 
#>     ethnicity + smsa + south, data = SchoolingReturns)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.59297 -0.22315  0.01893  0.24223  1.33190 
#> 
#> Coefficients:
#>                                    Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                       4.7336642  0.0676026  70.022  < 2e-16 ***
#> education                         0.0740090  0.0035054  21.113  < 2e-16 ***
#> poly(experience, 2, raw = TRUE)1  0.0835958  0.0066478  12.575  < 2e-16 ***
#> poly(experience, 2, raw = TRUE)2 -0.0022409  0.0003178  -7.050 2.21e-12 ***
#> ethnicityafam                    -0.1896315  0.0176266 -10.758  < 2e-16 ***
#> smsayes                           0.1614230  0.0155733  10.365  < 2e-16 ***
#> southyes                         -0.1248615  0.0151182  -8.259  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.3742 on 3003 degrees of freedom
#> Multiple R-squared:  0.2905,	Adjusted R-squared:  0.2891 
#> F-statistic: 204.9 on 6 and 3003 DF,  p-value: < 2.2e-16
#> 

## Table 5.2 in Verbeek (2004) / similar to Table 3(1) in Card (1995)
m_red <- lm(education ~ poly(age, 2, raw = TRUE) + ethnicity + smsa + south + nearcollege,
  data = SchoolingReturns)
summary(m_red)
#> 
#> Call:
#> lm(formula = education ~ poly(age, 2, raw = TRUE) + ethnicity + 
#>     smsa + south + nearcollege, data = SchoolingReturns)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -12.511  -1.722  -0.296   1.876   7.199 
#> 
#> Coefficients:
#>                            Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)               -1.869524   4.298357  -0.435 0.663638    
#> poly(age, 2, raw = TRUE)1  1.061441   0.301398   3.522 0.000435 ***
#> poly(age, 2, raw = TRUE)2 -0.018760   0.005231  -3.586 0.000341 ***
#> ethnicityafam             -1.468367   0.115443 -12.719  < 2e-16 ***
#> smsayes                    0.835403   0.109252   7.647 2.76e-14 ***
#> southyes                  -0.459700   0.102434  -4.488 7.47e-06 ***
#> nearcollegeyes             0.347105   0.106997   3.244 0.001191 ** 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 2.516 on 3003 degrees of freedom
#> Multiple R-squared:  0.1185,	Adjusted R-squared:  0.1168 
#> F-statistic: 67.29 on 6 and 3003 DF,  p-value: < 2.2e-16
#> 

## Table 5.3 in Verbeek (2004) / similar to Table 3(5) in Card (1995)
## Returns to education: 13.3%
m_iv <- ivreg(log(wage) ~ education + poly(experience, 2, raw = TRUE) + ethnicity + smsa + south |
  nearcollege + poly(age, 2, raw = TRUE) + ethnicity + smsa + south,
  data = SchoolingReturns)
summary(m_iv)
#> 
#> Call:
#> ivreg(formula = log(wage) ~ education + poly(experience, 2, raw = TRUE) + 
#>     ethnicity + smsa + south | nearcollege + poly(age, 2, raw = TRUE) + 
#>     ethnicity + smsa + south, data = SchoolingReturns)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.82400 -0.25248  0.02286  0.26349  1.31561 
#> 
#> Coefficients:
#>                                    Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                       4.0656675  0.6084961   6.682 2.81e-11 ***
#> education                         0.1329473  0.0513794   2.588 0.009712 ** 
#> poly(experience, 2, raw = TRUE)1  0.0559614  0.0259944   2.153 0.031412 *  
#> poly(experience, 2, raw = TRUE)2 -0.0007957  0.0013403  -0.594 0.552797    
#> ethnicityafam                    -0.1031403  0.0773729  -1.333 0.182624    
#> smsayes                           0.1079848  0.0497399   2.171 0.030010 *  
#> southyes                         -0.0981752  0.0287645  -3.413 0.000651 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.4032 on 3003 degrees of freedom
#> Multiple R-Squared: 0.1764,	Adjusted R-squared: 0.1747 
#> Wald test: 148.1 on 6 and 3003 DF,  p-value: < 2.2e-16 
#>
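
To make the mechanics of the instrumental variables fit more concrete, the coefficients can also be computed by hand as two-stage least squares. The following is a sketch in base R plus ivreg, assuming the package is installed; the names `X`, `Z`, `X_hat`, and `b_2sls` are illustrative. Only the point estimates are reproduced this way: standard errors taken from a naive second-stage regression would not be valid.

```r
library("ivreg")
data("SchoolingReturns", package = "ivreg")

## Regressor matrix X and instrument matrix Z, matching the two parts of
## the ivreg() formula in the example above.
X <- model.matrix(~ education + poly(experience, 2, raw = TRUE) +
  ethnicity + smsa + south, data = SchoolingReturns)
Z <- model.matrix(~ nearcollege + poly(age, 2, raw = TRUE) +
  ethnicity + smsa + south, data = SchoolingReturns)
y <- log(SchoolingReturns$wage)

## Stage 1: project every column of X onto the instruments.
X_hat <- lm.fit(Z, X)$fitted.values

## Stage 2: regress log-wage on the projected regressors.
b_2sls <- lm.fit(X_hat, y)$coefficients

## Compare with the coefficients from ivreg().
m_iv <- ivreg(log(wage) ~ education + poly(experience, 2, raw = TRUE) +
  ethnicity + smsa + south |
  nearcollege + poly(age, 2, raw = TRUE) + ethnicity + smsa + south,
  data = SchoolingReturns)
print(cbind(manual = b_2sls, ivreg = coef(m_iv)))
```
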