Contaminated Control Variables in 2SLS Models
Abstract
Two-stage least squares (2SLS) models are widely used in empirical research to account for endogeneity. Researchers typically include an assortment of control variables in addition to specific instrumental variables that are intended to isolate exogenous variation in the key variable of interest. In discussions of the relevance and exclusion conditions, these instruments are often well motivated insofar as they relate specifically to the key variable of interest and the error term. However, minimal consideration is typically given to the possibility that the other control variables might be correlated with both the instrument(s) and the error term. This ignored correlation biases the 2SLS estimate, is partially observable, and is not eliminated with “strong” instruments. We refer to models with this ignored correlation as having “contaminated controls”. This paper makes the following contributions. First, we derive an analytical expression for the contaminated control bias in the 2SLS estimate as the product of two terms: one that depends on the correlation between the control variables and the error term, and another that depends on the correlation between the control variables and the instrument. Second, we propose a new diagnostic test for whether contaminated controls might be affecting the 2SLS estimate, based on the observable correlation between the control variables and the instrument. We present simulation results to validate the distribution of the test statistic and investigate the power of the test. Third, using simulation studies, we explore practical suggestions for what to do if the control variables are contaminated. Fourth, we provide an empirical example. We first replicate the 2SLS results of a published paper suggesting that firms with multiple divisions experience a valuation premium rather than the diversification discount previously reported in this literature. We then show that this unexpected result could be driven by the contaminated controls in the model.
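The product structure of the bias can be illustrated with a small simulation. The sketch below is not the paper's design or its diagnostic test. It assumes a hypothetical data-generating process with one instrument `z`, one control `w`, and an error `u`, where the loadings `a_zw` and `b_wu` (names chosen here for illustration) govern how strongly the control is tied to the instrument and to the error. Under these assumptions, the 2SLS estimate should drift from the true coefficient only when both loadings are nonzero.

```python
import numpy as np

def tsls_beta(y, x, w, z):
    """2SLS coefficient on x, with w as an included control and z as the excluded instrument."""
    const = np.ones(len(y))
    X = np.column_stack([x, w, const])  # second-stage regressors
    Z = np.column_stack([z, w, const])  # instrument set (control and constant instrument themselves)
    # Just-identified case: the 2SLS estimator reduces to (Z'X)^{-1} Z'y
    return np.linalg.solve(Z.T @ X, Z.T @ y)[0]

def simulated_bias(a_zw, b_wu, n=200_000, beta=1.0, seed=0):
    """Return (2SLS estimate - true beta) under an assumed, illustrative DGP."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                         # instrument, independent of u
    u = rng.standard_normal(n)                         # structural error
    w = a_zw * z + b_wu * u + rng.standard_normal(n)   # control: tied to z via a_zw, to u via b_wu
    v = 0.5 * u + rng.standard_normal(n)               # first-stage error, correlated with u (endogeneity)
    x = z + 0.5 * w + v                                # endogenous regressor
    y = beta * x + 0.5 * w + u                         # outcome
    return tsls_beta(y, x, w, z) - beta

bias_contaminated = simulated_bias(a_zw=0.5, b_wu=0.5)  # control tied to both z and u
bias_no_zw = simulated_bias(a_zw=0.0, b_wu=0.5)         # control tied to u only
bias_no_wu = simulated_bias(a_zw=0.5, b_wu=0.0)         # control tied to z only

print(f"contaminated control:      bias = {bias_contaminated:+.3f}")
print(f"control unrelated to z:    bias = {bias_no_zw:+.3f}")
print(f"control unrelated to u:    bias = {bias_no_wu:+.3f}")
```

In this assumed setup, only the first scenario should show a bias that is clearly distinguishable from sampling noise, which matches the idea that either term being zero removes the contamination.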