Fit instrumental-variable regression by two-stage least squares (2SLS). This is equivalent to direct instrumental-variables estimation when the number of instruments is equal to the number of regressors. Alternative robust-regression estimators are also provided, based on M-estimation (2SM) and MM-estimation (2SMM).
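The equivalence between 2SLS and direct IV estimation in the exactly identified case can be checked numerically. The following base-R sketch does not use ivreg. The simulated data, the names `x`, `z`, `u`, and the data-generating process are assumptions made for illustration. It computes 2SLS as two explicit least-squares stages and compares the result with the direct IV estimator (Z'X)^(-1) Z'y, which is only defined when `Z` and `X` have the same number of columns.

```r
# Base-R sketch (no ivreg): 2SLS by two explicit stages vs. the direct IV estimator.
set.seed(1)
n <- 500
z <- rnorm(n)                       # instrument
u <- rnorm(n)                       # structural error
x <- 0.8 * z + 0.5 * u + rnorm(n)   # endogenous regressor: correlated with u
y <- 1 + 2 * x + u                  # true coefficients: intercept 1, slope 2

X <- cbind(1, x)                    # regressors: intercept and x
Z <- cbind(1, z)                    # instruments: intercept and z (exactly identified)

# Stage 1: project each column of X onto the column space of Z.
X_hat <- qr.fitted(qr(Z), X)
# Stage 2: least-squares regression of y on the stage-1 fitted values.
b_2sls <- qr.coef(qr(X_hat), y)

# Direct IV estimator (Z'X)^(-1) Z'y; requires crossprod(Z, X) to be square.
b_iv <- drop(solve(crossprod(Z, X), crossprod(Z, y)))

cbind(two_stage = b_2sls, direct_iv = b_iv)
```

With more instruments than regressors, `crossprod(Z, X)` is no longer square, so only the two-stage form applies.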

```
ivreg(
  formula,
  instruments,
  data,
  subset,
  na.action,
  weights,
  offset,
  contrasts = NULL,
  model = TRUE,
  y = TRUE,
  x = FALSE,
  method = c("OLS", "M", "MM"),
  ...
)
```

- formula, instruments
formula specification(s) of the regression relationship and the instruments. Either `instruments` is missing and `formula` has three parts (response, regressors, instruments), as in `y ~ x1 + x2 | z1 + z2 + z3` (recommended), or `formula` is `y ~ x1 + x2` and `instruments` is a one-sided formula `~ z1 + z2 + z3` (only for backward compatibility).

- data
an optional data frame containing the variables in the model. By default the variables are taken from the environment of the `formula`.

- subset
an optional vector specifying a subset of observations to be used in fitting the model.

- na.action
a function that indicates what should happen when the data contain `NA`s. The default is set by the `na.action` option.

- weights
an optional vector of weights to be used in the fitting process.

- offset
an optional offset that can be used to specify an a priori known component to be included during fitting.

- contrasts
an optional list. See the `contrasts.arg` argument of `model.matrix.default`.

- model, x, y
logicals. If `TRUE`, the corresponding components of the fit (the model frame, the model matrices, the response) are returned. These components are necessary for computing regression diagnostics.

- method
the method used to fit the stage-1 and stage-2 regressions: `"OLS"` for traditional 2SLS regression (the default), `"M"` for M-estimation, or `"MM"` for MM-estimation, with the latter two robust-regression methods implemented via the `rlm` function in the MASS package.

- ...
further arguments passed to `ivreg.fit`.
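The `"M"` and `"MM"` methods use `rlm` in place of least squares for the stage-1 and stage-2 regressions. The sketch below illustrates that idea under simplifying assumptions: one endogenous regressor, simulated data with hypothetical names, and default `rlm` settings. It is not the code of `ivreg.fit`, which may differ in details such as scale estimation, case weights, and the handling of several endogenous regressors.

```r
# Conceptual sketch of robust two-stage estimation with MASS::rlm.
# Not the ivreg.fit implementation; the data and names are hypothetical.
library(MASS)

set.seed(2)
n <- 200
d <- data.frame(z1 = rnorm(n), z2 = rnorm(n), u = rnorm(n))
d$x <- d$z1 + 0.5 * d$z2 + 0.5 * d$u + rnorm(n)  # endogenous regressor
d$y <- 1 + 2 * d$x + d$u                         # true slope is 2
d$y[1] <- d$y[1] + 50                            # one gross outlier in the response

# Stage 1: robust regression of the endogenous regressor on the instruments.
s1 <- rlm(x ~ z1 + z2, data = d, method = "M")
d$x_hat <- fitted(s1)

# Stage 2: robust regression of the response on the stage-1 fitted values.
s2 <- rlm(y ~ x_hat, data = d, method = "M")

coef(s2)             # robust coefficient estimates
round(s2$w[1:5], 2)  # final IWLS weights; the first (outlying) case is downweighted
```

Here `s2$w` holds the final IWLS weights from `rlm`, analogous to the stage-2 column of the `rweights` component. As with least-squares 2SLS run by hand, the standard errors from `summary(s2)` are computed from residuals based on `x_hat` rather than `x`, so they should not be used as IV standard errors.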

`ivreg` returns an object of class `"ivreg"` that inherits from class `"lm"`, with the following components:

- coefficients
parameter estimates from the stage-2 regression.

- residuals
vector of model residuals.

- residuals1
matrix of residuals from the stage-1 regression.

- residuals2
vector of residuals from the stage-2 regression.

- fitted.values
vector of predicted means for the response.

- weights
either the vector of weights used (if any) or

`NULL`

(if none).- offset
either the offset used (if any) or

`NULL`

(if none).- estfun
a matrix containing the empirical estimating functions.

- n
number of observations.

- nobs
number of observations with non-zero weights.

- p
number of columns in the model matrix x of regressors.

- q
number of columns in the instrumental-variables model matrix z.

- rank
numeric rank of the model matrix for the stage-2 regression.

- df.residual
residual degrees of freedom for the fitted model.

- cov.unscaled
unscaled covariance matrix for the coefficients.

- sigma
residual standard deviation.

- qr
QR decomposition for the stage-2 regression.

- qr1
QR decomposition for the stage-1 regression.

- rank1
numeric rank of the model matrix for the stage-1 regression.

- coefficients1
matrix of coefficients from the stage-1 regression.

- df.residual1
residual degrees of freedom for the stage-1 regression.

- exogenous
columns of the `"regressors"` matrix that are exogenous.

- endogenous
columns of the `"regressors"` matrix that are endogenous.

- instruments
columns of the `"instruments"` matrix that are instruments for the endogenous variables.

- method
the method used for the stage-1 and stage-2 regressions, one of `"OLS"`, `"M"`, or `"MM"`.

- rweights
a matrix of robustness weights, with one column for each of the stage-1 regressions and one for the stage-2 regression (in the last column), if the fitting method is `"M"` or `"MM"`; `NULL` if the fitting method is `"OLS"`.

- hatvalues
a matrix of hatvalues. For `method = "OLS"`, the matrix has two columns, one for the stage-1 and one for the stage-2 regression; for `method = "M"` or `"MM"`, there is one column for *each* stage-1 regression and one for the stage-2 regression.

- call
the original function call.

- formula
the model formula.

- na.action
function applied to missing values in the model fit.

- terms
a list with elements `"regressors"` and `"instruments"` containing the terms objects for the respective components.

- levels
levels of the categorical regressors.

- contrasts
the contrasts used for categorical regressors.

- model
the full model frame (if `model = TRUE`).

- y
the response vector (if `y = TRUE`).

- x
a list with elements `"regressors"`, `"instruments"`, and `"projected"`, containing the model matrices from the respective components (if `x = TRUE`). `"projected"` is the matrix of regressors projected onto the column space of the instruments.
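Several of the components above have standard textbook counterparts for `method = "OLS"`. The base-R sketch below computes them by hand on simulated data, assuming those textbook definitions: the projected regressors, the model residuals `y - X b` versus the stage-2 residuals `y - X_proj b`, sigma, the unscaled covariance matrix, and the stage-1 and stage-2 hat values. All variable names are hypothetical, and the sketch is meant to clarify the definitions, not to reproduce `ivreg.fit` exactly.

```r
# Base-R sketch of several "ivreg" components for method = "OLS",
# using textbook 2SLS definitions on hypothetical simulated data.
set.seed(3)
n  <- 100
z1 <- rnorm(n); z2 <- rnorm(n); ex <- rnorm(n); u <- rnorm(n)
en <- z1 + z2 + 0.5 * u + rnorm(n)          # endogenous regressor
y  <- 1 + 2 * en - ex + u

X <- cbind("(Intercept)" = 1, en = en, ex = ex)           # regressors, p = 3
Z <- cbind("(Intercept)" = 1, z1 = z1, z2 = z2, ex = ex)  # instruments, q = 4

qr1    <- qr(Z)                 # stage-1 QR decomposition (cf. qr1)
X_proj <- qr.fitted(qr1, X)     # regressors projected onto the column space of Z (cf. x$projected)
qr2    <- qr(X_proj)            # stage-2 QR decomposition (cf. qr)
b      <- qr.coef(qr2, y)       # 2SLS coefficients (cf. coefficients)

res  <- drop(y - X %*% b)       # model residuals, original regressors (cf. residuals)
res2 <- drop(y - X_proj %*% b)  # stage-2 residuals, projected regressors (cf. residuals2)

p <- ncol(X)
sigma        <- sqrt(sum(res^2) / (n - p))  # residual standard deviation (cf. sigma)
cov_unscaled <- chol2inv(qr.R(qr2))         # (X_proj' X_proj)^(-1) (cf. cov.unscaled)
vc           <- sigma^2 * cov_unscaled      # classical covariance matrix of b

# Diagonals of the stage-1 and stage-2 hat matrices (cf. the two hatvalues columns).
hat1 <- rowSums(qr.Q(qr1)^2)
hat2 <- rowSums(qr.Q(qr2)^2)

sigma
```

In the sketch, sigma comes from the model residuals `res`. Using the stage-2 residuals `res2` instead, as happens when 2SLS is run by hand as two `lm` calls, gives a different and generally inconsistent estimate of the error standard deviation.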

`ivreg` is the high-level interface to the work-horse function `ivreg.fit`. A set of standard methods (including `print`, `summary`, `vcov`, `anova`, `predict`, `residuals`, `terms`, `model.matrix`, `bread`, `estfun`) is available and described in `ivregMethods`. For methods related to regression diagnostics, see `ivregDiagnostics`.

Regressors and instruments for `ivreg` are most easily specified in a formula with two parts on the right-hand side, e.g., `y ~ x1 + x2 | z1 + z2 + z3`, where `x1` and `x2` are the explanatory variables and `z1`, `z2`, and `z3` are the instrumental variables. Note that exogenous regressors have to be included as instruments for themselves.

For example, if there is one exogenous regressor `ex` and one endogenous regressor `en` with instrument `in`, the appropriate formula would be `y ~ en + ex | in + ex`. Alternatively, a formula with three parts on the right-hand side can be used, listing the exogenous regressors, the endogenous regressors, and the additional instruments, in that order: `y ~ ex | en | in`. The three-part form is typically more convenient when there is a large number of exogenous regressors, because they need not be repeated among the instruments.
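The requirement that exogenous regressors also appear among the instruments can be seen from stage 1: a regressor passes through the projection unchanged only if it lies in the column space of the instruments. In this base-R sketch (simulated, hypothetical data), `iv` plays the role of the instrument called `in` in the text, since `in` is a reserved word in R and cannot be used as an unquoted variable name.

```r
# Base-R sketch: stage 1 passes a regressor through unchanged only if it is
# among the instruments. Hypothetical simulated data.
set.seed(4)
n  <- 50
ex <- rnorm(n)   # exogenous regressor
iv <- rnorm(n)   # instrument for the endogenous regressor ("in" in the text)

Z_with    <- cbind(1, iv, ex)  # ex included as its own instrument
Z_without <- cbind(1, iv)      # ex omitted from the instruments

# Stage-1 fitted values for the column ex under each instrument set.
ex_hat_with    <- qr.fitted(qr(Z_with), ex)
ex_hat_without <- qr.fitted(qr(Z_without), ex)

max(abs(ex_hat_with - ex))     # numerically zero: ex is reproduced exactly
max(abs(ex_hat_without - ex))  # clearly positive: ex is replaced by its projection on (1, iv)
```

Without `ex` among the instruments, stage 2 would use a projected version of `ex` that has lost most of its variation, so the exogenous regressor would effectively be treated as endogenous.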

Moreover, two further equivalent specification strategies are possible, although they are typically less convenient than the strategies above. One option is to use an update formula with a `.` in the second part of the formula: `y ~ en + ex | . - en + in`. Another option is to use a separate formula for the instruments (only for backward compatibility with earlier versions): `formula = y ~ en + ex, instruments = ~ in + ex`.

Internally, all specifications are converted to the version with two parts on the right-hand side.
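The equivalence of the four specification styles can be checked on the `CigaretteDemand` model used in the examples, with `log(rprice)` as the endogenous regressor, `log(rincome)` as the exogenous regressor, and `salestax` as the instrument. This sketch assumes the ivreg package is installed; coefficients are compared by name because the conversion from the three-part form might order them differently.

```r
# Sketch: the four specification styles described above, fitted to the same model.
# Assumes the ivreg package (and its CigaretteDemand data) is installed.
library("ivreg")
data("CigaretteDemand", package = "ivreg")

# Two-part right-hand side: regressors | instruments (exogenous log(rincome) in both parts).
m_two <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome),
  data = CigaretteDemand)
# Three-part right-hand side: exogenous | endogenous | instruments.
m_three <- ivreg(log(packs) ~ log(rincome) | log(rprice) | salestax,
  data = CigaretteDemand)
# Update-style second part: the regressors, minus the endogenous one, plus the instrument.
m_update <- ivreg(log(packs) ~ log(rprice) + log(rincome) | . - log(rprice) + salestax,
  data = CigaretteDemand)
# Separate one-sided instruments formula (backward compatibility only).
m_sep <- ivreg(log(packs) ~ log(rprice) + log(rincome),
  instruments = ~ salestax + log(rincome), data = CigaretteDemand)

# Compare coefficients by name, in case the conversion reorders them.
nm <- names(coef(m_two))
same <- sapply(list(three = m_three, update = m_update, sep = m_sep),
  function(m) isTRUE(all.equal(coef(m)[nm], coef(m_two))))
same
```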


`ivreg.fit`, `ivregDiagnostics`, `ivregMethods`, `lm`, `lm.fit`

```
## data
data("CigaretteDemand", package = "ivreg")
## model
m <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome),
  data = CigaretteDemand)
summary(m)
#>
#> Call:
#> ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | salestax +
#>     log(rincome), data = CigaretteDemand)
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -0.611000 -0.086072  0.009423  0.106912  0.393159
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept)    9.4307     1.3584   6.943 1.24e-08 ***
#> log(rprice)   -1.1434     0.3595  -3.181  0.00266 **
#> log(rincome)   0.2145     0.2686   0.799  0.42867
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.1896 on 45 degrees of freedom
#> Multiple R-Squared: 0.4189, Adjusted R-squared: 0.3931
#> Wald test: 6.534 on 2 and 45 DF, p-value: 0.003227
#>
summary(m, vcov = sandwich::sandwich, df = Inf)
#>
#> Call:
#> ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | salestax +
#>     log(rincome), data = CigaretteDemand)
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -0.611000 -0.086072  0.009423  0.106912  0.393159
#>
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)
#> (Intercept)    9.4307     1.2194   7.734 1.04e-14 ***
#> log(rprice)   -1.1434     0.3605  -3.172  0.00151 **
#> log(rincome)   0.2145     0.3018   0.711  0.47729
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.1896 on Inf degrees of freedom
#> Multiple R-Squared: 0.4189, Adjusted R-squared: 0.3931
#> Wald test: 17.47 on 2 DF, p-value: 0.0001605
#>
## ANOVA
m2 <- update(m, . ~ . - log(rincome) | . - log(rincome))
anova(m, m2)
#> Analysis of Variance Table
#>
#> Model 1: log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome)
#> Model 2: log(packs) ~ log(rprice) | salestax
#>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
#> 1     45 1.6172
#> 2     46 1.6668 -1 -0.049558 0.6379 0.4287
car::Anova(m)
#> Analysis of Deviance Table (Type II tests)
#>
#> Response: log(packs)
#>              Df      F   Pr(>F)
#> log(rprice)   1 10.1161 0.002662 **
#> log(rincome)  1  0.6379 0.428667
#> Residuals    45
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## same model specified by formula with three-part right-hand side
ivreg(log(packs) ~ log(rincome) | log(rprice) | salestax, data = CigaretteDemand)
#>
#> Call:
#> ivreg(formula = log(packs) ~ log(rincome) | log(rprice) | salestax, data = CigaretteDemand)
#>
#> Coefficients:
#>  (Intercept)   log(rprice)  log(rincome)
#>       9.4307       -1.1434        0.2145
#>
## robust 2SLS regression
data("Kmenta", package = "ivreg")
Kmenta1 <- Kmenta
Kmenta1[20, "Q"] <- 95  # corrupted data
deq  <- ivreg(Q ~ P + D | D + F + A, data = Kmenta)  # demand equation, uncorrupted data
deq1 <- ivreg(Q ~ P + D | D + F + A, data = Kmenta1)  # standard 2SLS, corrupted data
deq2 <- ivreg(Q ~ P + D | D + F + A, data = Kmenta1, subset = -20)  # standard 2SLS, removing bad case
deq3 <- ivreg(Q ~ P + D | D + F + A, data = Kmenta1, method = "MM")  # robust 2SLS, MM estimation
car::compareCoefs(deq, deq1, deq2, deq3)
#> Calls:
#> 1: ivreg(formula = Q ~ P + D | D + F + A, data = Kmenta)
#> 2: ivreg(formula = Q ~ P + D | D + F + A, data = Kmenta1)
#> 3: ivreg(formula = Q ~ P + D | D + F + A, data = Kmenta1, subset = -20)
#> 4: ivreg(formula = Q ~ P + D | D + F + A, data = Kmenta1, method = "MM")
#>
#>             Model 1 Model 2 Model 3 Model 4
#> (Intercept)   94.63  117.96   92.42   91.09
#> SE             7.92   11.64    9.67   10.62
#>
#> P           -0.2436 -0.4054 -0.2300 -0.2374
#> SE           0.0965  0.1417  0.1047  0.1135
#>
#> D            0.3140  0.2351  0.3233  0.3468
#> SE           0.0469  0.0690  0.0527  0.0569
#>
round(deq3$rweights, 2) # robustness weights
#>         P stage_2
#> 1922 0.97    0.98
#> 1923 0.97    0.98
#> 1924 1.00    0.87
#> 1925 1.00    0.96
#> 1926 0.98    0.90
#> 1927 1.00    0.98
#> 1928 0.97    0.95
#> 1929 0.64    0.53
#> 1930 0.80    0.91
#> 1931 0.89    0.77
#> 1932 0.98    1.00
#> 1933 1.00    0.91
#> 1934 0.97    0.92
#> 1935 0.89    1.00
#> 1936 0.72    0.88
#> 1937 0.84    0.53
#> 1938 0.94    1.00
#> 1939 0.53    0.69
#> 1940 1.00    0.98
#> 1941 0.98    0.00