Data from the 1976 wave of the U.S. National Longitudinal Survey of Young Men (NLSYM), with some variables dating back to earlier years.

data("SchoolingReturns", package = "ivreg")

Format

A data frame with 3010 rows and 22 columns.

wage

Raw wages in 1976 (in cents per hour).

education

Education in 1976 (in years).

experience

Years of labor market experience, computed as age - education - 6.

ethnicity

Factor indicating ethnicity. Is the individual African-American ("afam") or not ("other")?

smsa

Factor. Does the individual reside in an SMSA (standard metropolitan statistical area) in 1976?

south

Factor. Does the individual reside in the South in 1976?

age

Age in 1976 (in years).

nearcollege

Factor. Did the individual grow up near a 4-year college?

nearcollege2

Factor. Did the individual grow up near a 2-year college?

nearcollege4

Factor. Did the individual grow up near a 4-year public or private college?

enrolled

Factor. Is the individual enrolled in college in 1976?

married

Factor. Is the individual married in 1976?

education66

Education in 1966 (in years).

smsa66

Factor. Does the individual reside in an SMSA in 1966?

south66

Factor. Does the individual reside in the South in 1966?

feducation

Father's educational attainment (in years). Imputed with average if missing.

meducation

Mother's educational attainment (in years). Imputed with average if missing.

fameducation

Ordered factor coding family education class (from 1 to 9).

kww

Knowledge of the World of Work (KWW) score.

iq

Normed intelligence quotient (IQ) score.

parents14

Factor coding living with parents at age 14: both parents, single mother, step parent, other.

library14

Factor. Was there a library card in home at age 14?

Source

Supplementary material for Verbeek (2004).

Details

Investigating the causal effect of schooling on earnings in a classical model for wage determinants is problematic because it can be argued that schooling is endogenous. Hence, one possible strategy is to use an exogenous variable as an instrument for the years of education. In his well-known study, Card (1995) uses geographical proximity to a college when growing up as such an instrument, showing that this significantly increases both the years of education and the wage level obtained on the labor market. Using instrumental variables regression, Card (1995) shows that the estimated returns to schooling are much higher than when simply using ordinary least squares.

The data are taken from the supplementary material for Verbeek (2004) and are based on the work of Card (1995). The U.S. National Longitudinal Survey of Young Men (NLSYM) began in 1966 and included 5525 men, then aged between 14 and 24. Card (1995) employs labor market information from the 1976 NLSYM interview, which also included information about educational attainment. Out of the 3694 men still included in that wave of NLSYM, 3010 provided information on both wages and education, yielding the subset of observations provided in SchoolingReturns.

The examples replicate the results from Verbeek (2004), who used the simplest specifications from Card (1995). Including further region or family background characteristics improves the model significantly but has little effect on the main coefficient of interest, namely that of years of education.

References

Card, D. (1995). Using Geographical Variation in College Proximity to Estimate the Return to Schooling. In: Christofides, L.N., Grant, E.K., and Swidinsky, R. (eds.), Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, University of Toronto Press, Toronto, 201-222.

Verbeek, M. (2004). A Guide to Modern Econometrics, 2nd ed. John Wiley.

Examples

## load package (needed for ivreg() below) and data
library("ivreg")
data("SchoolingReturns", package = "ivreg")

## Table 5.1 in Verbeek (2004) / Table 2(1) in Card (1995)
## Returns to education: 7.4%
m_ols <- lm(log(wage) ~ education + poly(experience, 2, raw = TRUE) + ethnicity + smsa + south,
  data = SchoolingReturns)
summary(m_ols)
#> 
#> Call:
#> lm(formula = log(wage) ~ education + poly(experience, 2, raw = TRUE) + 
#>     ethnicity + smsa + south, data = SchoolingReturns)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.59297 -0.22315  0.01893  0.24223  1.33190 
#> 
#> Coefficients:
#>                                    Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                       4.7336642  0.0676026  70.022  < 2e-16 ***
#> education                         0.0740090  0.0035054  21.113  < 2e-16 ***
#> poly(experience, 2, raw = TRUE)1  0.0835958  0.0066478  12.575  < 2e-16 ***
#> poly(experience, 2, raw = TRUE)2 -0.0022409  0.0003178  -7.050 2.21e-12 ***
#> ethnicityafam                    -0.1896315  0.0176266 -10.758  < 2e-16 ***
#> smsayes                           0.1614230  0.0155733  10.365  < 2e-16 ***
#> southyes                         -0.1248615  0.0151182  -8.259  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.3742 on 3003 degrees of freedom
#> Multiple R-squared:  0.2905,	Adjusted R-squared:  0.2891 
#> F-statistic: 204.9 on 6 and 3003 DF,  p-value: < 2.2e-16
#> 

## Table 5.2 in Verbeek (2004) / similar to Table 3(1) in Card (1995)
m_red <- lm(education ~ poly(age, 2, raw = TRUE) + ethnicity + smsa + south + nearcollege,
  data = SchoolingReturns)
summary(m_red)
#> 
#> Call:
#> lm(formula = education ~ poly(age, 2, raw = TRUE) + ethnicity + 
#>     smsa + south + nearcollege, data = SchoolingReturns)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -12.511  -1.722  -0.296   1.876   7.199 
#> 
#> Coefficients:
#>                            Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)               -1.869524   4.298357  -0.435 0.663638    
#> poly(age, 2, raw = TRUE)1  1.061441   0.301398   3.522 0.000435 ***
#> poly(age, 2, raw = TRUE)2 -0.018760   0.005231  -3.586 0.000341 ***
#> ethnicityafam             -1.468367   0.115443 -12.719  < 2e-16 ***
#> smsayes                    0.835403   0.109252   7.647 2.76e-14 ***
#> southyes                  -0.459700   0.102434  -4.488 7.47e-06 ***
#> nearcollegeyes             0.347105   0.106997   3.244 0.001191 ** 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 2.516 on 3003 degrees of freedom
#> Multiple R-squared:  0.1185,	Adjusted R-squared:  0.1168 
#> F-statistic: 67.29 on 6 and 3003 DF,  p-value: < 2.2e-16
#> 
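
## Instrument relevance, as a minimal sketch (not part of the original examples):
## compare the regression above with and without nearcollege. With a single
## excluded instrument, the F-statistic of this comparison equals the squared
## t-statistic of nearcollegeyes. The names m_red0 and m_red1 are introduced
## here for illustration only.
data("SchoolingReturns", package = "ivreg")
m_red0 <- lm(education ~ poly(age, 2, raw = TRUE) + ethnicity + smsa + south,
  data = SchoolingReturns)
m_red1 <- update(m_red0, . ~ . + nearcollege)
anova(m_red0, m_red1)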

## Table 5.3 in Verbeek (2004) / similar to Table 3(5) in Card (1995)
## Returns to education: 13.3%
m_iv <- ivreg(log(wage) ~ education + poly(experience, 2, raw = TRUE) + ethnicity + smsa + south |
  nearcollege + poly(age, 2, raw = TRUE) + ethnicity + smsa + south,
  data = SchoolingReturns)
summary(m_iv)
#> 
#> Call:
#> ivreg(formula = log(wage) ~ education + poly(experience, 2, raw = TRUE) + 
#>     ethnicity + smsa + south | nearcollege + poly(age, 2, raw = TRUE) + 
#>     ethnicity + smsa + south, data = SchoolingReturns)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.82400 -0.25248  0.02286  0.26349  1.31561 
#> 
#> Coefficients:
#>                                    Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                       4.0656675  0.6084961   6.682 2.81e-11 ***
#> education                         0.1329473  0.0513794   2.588 0.009712 ** 
#> poly(experience, 2, raw = TRUE)1  0.0559614  0.0259944   2.153 0.031412 *  
#> poly(experience, 2, raw = TRUE)2 -0.0007957  0.0013403  -0.594 0.552797    
#> ethnicityafam                    -0.1031403  0.0773729  -1.333 0.182624    
#> smsayes                           0.1079848  0.0497399   2.171 0.030010 *  
#> southyes                         -0.0981752  0.0287645  -3.413 0.000651 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.4032 on 3003 degrees of freedom
#> Multiple R-Squared: 0.1764,	Adjusted R-squared: 0.1747 
#> Wald test: 148.1 on 6 and 3003 DF,  p-value: < 2.2e-16 
#>
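
## Manual two-stage least squares (2SLS), as a hedged sketch of the idea behind
## the IV fit above (not part of the original examples). The helper
## first_stage() and the *_hat column names are hypothetical, introduced only
## for illustration. Because experience is computed as age - education - 6,
## experience and its square are treated as endogenous as well, so all three
## regressors get their own first-stage regression on the instruments.
data("SchoolingReturns", package = "ivreg")
d <- SchoolingReturns
d$experience2 <- d$experience^2
first_stage <- function(response) {
  ## regress one endogenous regressor on all instruments, return fitted values
  f <- reformulate(
    c("nearcollege", "age", "I(age^2)", "ethnicity", "smsa", "south"),
    response = response)
  fitted(lm(f, data = d))
}
d$education_hat <- first_stage("education")
d$experience_hat <- first_stage("experience")
d$experience2_hat <- first_stage("experience2")
## second stage: replace the endogenous regressors by their fitted values
m_2sls <- lm(log(wage) ~ education_hat + experience_hat + experience2_hat +
  ethnicity + smsa + south, data = d)
## The point estimates should agree with those of the IV fit above up to
## numerical error. The standard errors reported by this second-stage lm()
## are not valid IV standard errors; use summary() of the ivreg fit for inference.
coef(m_2sls)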