
Econometric Methods

Lightning Round Session

Friday, Jan. 3, 2025 10:15 AM - 12:15 PM (PST)

Hilton San Francisco Union Square
Hosted By: American Economic Association
  • Chair: Oscar Jorda, Federal Reserve Bank of San Francisco

Contaminated Control Variables in 2SLS Models

Asad Dossani, Colorado State University
Rob Schonlau, Colorado State University
Jeffrey Dotson, Brigham Young University

Abstract

Two stage least squares (2SLS) models are widely used in empirical research to account for endogeneity. Researchers typically include an assortment of control variables in addition to specific instrumental variables that are intended to isolate exogenous variation in the key variable of interest. In discussions of the relevance and exclusion conditions, these instruments are often well motivated insofar as they relate specifically to the key variable of interest and the error term. However, minimal consideration is typically given to the possibility that the other control variables might be correlated with both the instrument(s) and the error term. This ignored correlation biases the 2SLS estimate, is partially observable, and is not eliminated with “strong” instruments. We refer to models with this ignored correlation as having “contaminated controls”.

This paper makes the following contributions: First, we derive an analytical expression for the contaminated control bias in the 2SLS estimate as the product of two terms: one that depends on the correlation between the control variables and the error term, and another that depends on the correlation between the control variables and the instrument. Second, we propose a new diagnostic test for whether contaminated controls might be affecting the 2SLS estimate based on the observable correlation between the control variables and the instrument. We present simulation results to validate the distribution of the test statistic and investigate the power of the test. Third, using simulation studies we explore practical suggestions for what to do if the control variables are contaminated. Fourth, we provide an empirical example. We first duplicate the 2SLS results of a published paper suggesting that firms with multiple divisions experience a valuation premium rather than the diversification discount previously reported in this literature. We then show that this unexpected result could be driven by the contaminated controls in the model.
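The mechanism described above can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not the authors' derivation or their diagnostic test: the data-generating process, the coefficient values, and the function name `simulate_2sls` are all hypothetical. The instrument `z` is independent of the error `u`, but the control `w` loads on both `z` (through `a`) and `u` (through `b`), so including `w` as a control induces a correlation between the residualized instrument and the error.

```python
import numpy as np

def simulate_2sls(a, b, n=200_000, beta=1.0, seed=0):
    """Return the 2SLS estimate of beta in y = beta*x + 0.5*w + u.

    Hypothetical design: a sets how strongly the control w loads on the
    instrument z, and b sets how strongly w loads on the structural error u.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                   # instrument, independent of u
    u = rng.standard_normal(n)                   # structural error
    w = a * z + b * u + rng.standard_normal(n)   # control variable
    v = 0.5 * u + rng.standard_normal(n)         # first-stage error (makes x endogenous)
    x = z + 0.5 * w + v                          # endogenous regressor, strong first stage
    y = beta * x + 0.5 * w + u                   # outcome

    const = np.ones(n)
    X = np.column_stack([const, x, w])           # regressors
    Z = np.column_stack([const, z, w])           # instruments (w instruments itself)
    # Just-identified case: the 2SLS estimator reduces to (Z'X)^{-1} Z'y.
    coef = np.linalg.solve(Z.T @ X, Z.T @ y)
    return coef[1]

clean = simulate_2sls(a=0.5, b=0.0)         # control correlated with z only
contaminated = simulate_2sls(a=0.5, b=0.5)  # control correlated with z and u
print(f"true beta = 1.0, clean = {clean:.3f}, contaminated = {contaminated:.3f}")
```

In this design the estimate stays close to the true value when either `a` or `b` is zero and moves away from it when both are nonzero, even though the first stage is strong. That matches the abstract's description of the bias as a product of a control-error term and a control-instrument term.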

Estimating Counterfactual Matrix Means with Short Panel Data

Lihua Lei, Stanford University
Brad Ross, Stanford University

Abstract

We develop a new, spectral approach for identifying and estimating average counterfactual outcomes under a low-rank factor model with short panel data and general outcome missingness patterns. Potential applications include event studies and studies of outcomes of “matches” between agents of two types, e.g. workers and firms or people and places. We show that our approach identifies all counterfactual outcome means, including those not estimable by existing methods, if a particular graph constructed based on overlaps in the sets of observed outcomes between subpopulations of units is connected. Our analogous, computationally efficient estimation procedure yields consistent, asymptotically normal estimates of counterfactual outcome means under fixed-T (number of outcomes), large-N (sample size) asymptotics. In a semi-synthetic simulation study based on matched employer-employee data, our method yields estimates of average wages with lower bias and only slightly higher variance than a Two-Way-Fixed-Effects-model-based estimator, suggesting complementarities between workers and firms do affect wages.

External Validity in an Instrumental Variable Setting

Alexander Kwon, CUNY-Graduate Center
Kyungtae Lee, CUNY-Graduate Center

Abstract

We study external validity in the context of instrumental variable estimation. The key assumption we impose for external validity is conditional external unconfoundedness among compliers, meaning that the treatment effect and target selection are independent among compliers conditional on covariates. Using a case study of the impact of solid fuel usage on women's average cooking time, we compare the local average treatment effect (LATE) of the country of interest with the predicted LATE estimated with data from other countries. While the sub-population is an important factor, it does not significantly undermine external validity in our case study. Among the six countries examined, four (Ethiopia, Honduras, Kenya, and Zambia) exhibit no statistically significant difference between predicted and actual LATE across various specifications. These results provide evidence that external validity is not severely harmed in our case study. Conversely, in Cambodia and Nepal, the two LATEs are statistically different, indicating distinct sub-populations compared to those in other countries. These findings provide evidence that sub-population is a non-trivial factor for external validity.

Fuzzy Regression Discontinuity Design without Monotonicity

Yi Cui, University of North Carolina-Chapel Hill

Abstract

In this paper, we present a novel approach to derive nonparametric sharp (i.e., the tightest possible) bounds for compliers and defiers separately under the fuzzy regression discontinuity (FRD) design, without relying on the monotonicity assumption. Our method builds on the seminal work by Imbens and Angrist (1994), Angrist et al. (1996) and Hahn et al. (2001). Unlike the existing literature that tests the validity of the FRD design assuming monotonicity or weaker forms of it, we demonstrate the invalidity and bias of the Wald estimand without this crucial assumption.
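To see why the Wald estimand can mislead without monotonicity, consider the standard decomposition from the instrumental variables literature (Angrist et al., 1996): with both compliers and defiers present, the reduced form is the complier share times the complier effect minus the defier share times the defier effect, and the first stage is the difference in shares. The sketch below is only an arithmetic illustration of that decomposition, read as local to the cutoff. It is not the paper's bounds, and the function name and the numbers are hypothetical.

```python
def wald_estimand(share_compliers, share_defiers, effect_compliers, effect_defiers):
    """Wald ratio implied by given complier/defier shares and average effects.

    Always-takers and never-takers do not change treatment status with the
    instrument, so they drop out of both the numerator and the denominator.
    """
    first_stage = share_compliers - share_defiers
    if first_stage == 0:
        raise ValueError("first stage is zero; the Wald ratio is undefined")
    reduced_form = (share_compliers * effect_compliers
                    - share_defiers * effect_defiers)
    return reduced_form / first_stage

# Hypothetical values: both groups benefit from treatment, defiers more so.
print(wald_estimand(0.5, 0.0, 1.0, 3.0))  # no defiers: recovers the complier effect
print(wald_estimand(0.5, 0.1, 1.0, 3.0))  # some defiers: falls below both effects
print(wald_estimand(0.5, 0.1, 1.0, 6.0))  # larger defier effect: sign flips
```

With no defiers the ratio equals the complier effect. Once defiers are present, the ratio need not lie between the two groups' effects and can even take the opposite sign, which is the sense in which the Wald estimand is biased without monotonicity.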

Identification and Estimation of Discrete Choice Models with Spillovers Using Partial Network Data

Shuo Qi, Southern Methodist University

Abstract

Understanding peer influences is essential in an increasingly interconnected world. However, to what extent can we use data on social connections if they are incomplete? This paper investigates peer effects in discrete choice models with incomplete data on social links. Following Graham (2017), we set up an undirected dyadic link formation model where connections are based on homophily (similarities in characteristics) and individual fixed effects. We identify homophily effects through available configurations among tetrads (groups of four agents). We then identify the distribution of fixed effects through available configurations among triads (groups of three agents). After recovering the network-generating process, we propose a simulated network approach to study the influence of peers on individual decision-making. Simulations illustrate that the finite sample performance of the estimator is close to that obtained when the true network is observed. We apply our estimator to examine household microfinance participation decisions in rural India (Banerjee et al., 2013), detecting positive peer effects even in cases of missing links and missing networks.

Quantile Local Projections: Identification, Smooth Estimation, and Inference

Josef Ruzicka, Nazarbayev University

Abstract

Standard impulse response functions measure the average effect of a shock on a response variable. However, different parts of the distribution of the response variable may react to the shock differently. A popular method for capturing this heterogeneity is quantile regression local projections. We identify them by short-run restrictions or external instruments, and we establish their asymptotics. To overcome their excessive volatility, we introduce two novel smoothing estimators. We propose information criteria for optimal smoothing and apply the estimators to shocks in financial conditions and monetary policy. We show that financial conditions affect the entire distribution of GDP growth and not just its lower part. Thus, financial conditions matter not only for recessions, but also during normal times and even in recovery periods. We also find that conventional monetary policy is more effective at curbing inflation than at generating it.

Reliable Wild Bootstrap Inference with Multiway Clustering

Jiahao Lin, SUNY-Albany
Ulrich Hounyo, SUNY-Albany

Abstract

This paper studies wild bootstrap-based inference for regression models with multiway clustering. Our proposed method is a multiway counterpart to the (one-way) wild cluster bootstrap approach introduced by Cameron et al. (2008). We establish the validity of our method for studentized statistics. Theoretical results are provided, accommodating arbitrary serial dependence in the common time effects – an aspect excluded by existing two-way bootstrap-based approaches. Simulation experiments document the potential for enhanced inference with our novel approach. We illustrate the effectiveness of the method by revisiting empirical studies involving multiway clustered and correlated data.
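For orientation, the one-way wild cluster bootstrap of Cameron et al. (2008), which the abstract takes as its starting point, can be sketched as follows. This is a simplified illustration of the one-way baseline only, not the multiway method proposed in the paper. It assumes Rademacher weights, imposes the null when generating bootstrap samples, and omits small-sample corrections to the cluster-robust variance. The function names are hypothetical.

```python
import numpy as np

def cluster_t(y, X, cluster_ids, n_clusters, col=1):
    """t-statistic for H0: beta[col] = 0 using a cluster-robust variance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Sum the score contributions x_i * u_i within each cluster.
    scores = np.zeros((n_clusters, X.shape[1]))
    np.add.at(scores, cluster_ids, X * resid[:, None])
    V = XtX_inv @ (scores.T @ scores) @ XtX_inv   # sandwich, no finite-sample factor
    return beta[col] / np.sqrt(V[col, col])

def wild_cluster_bootstrap_p(y, X, cluster_ids, n_boot=999, seed=0, col=1):
    """Bootstrap p-value for H0: beta[col] = 0 (cluster ids must be 0..G-1)."""
    rng = np.random.default_rng(seed)
    G = int(cluster_ids.max()) + 1
    t_obs = cluster_t(y, X, cluster_ids, G, col)

    # Restricted fit: drop the tested regressor so the null holds by construction.
    X_r = np.delete(X, col, axis=1)
    beta_r = np.linalg.lstsq(X_r, y, rcond=None)[0]
    fitted_r = X_r @ beta_r
    resid_r = y - fitted_r

    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        # One Rademacher draw per cluster keeps within-cluster dependence intact.
        signs = rng.choice([-1.0, 1.0], size=G)
        y_star = fitted_r + signs[cluster_ids] * resid_r
        t_boot[b] = cluster_t(y_star, X, cluster_ids, G, col)
    # Symmetric two-sided p-value based on the studentized statistic.
    return np.mean(np.abs(t_boot) >= np.abs(t_obs))
```

A typical call passes a design matrix whose first column is a constant and whose second column is the regressor of interest, together with integer cluster labels. Extending this to two-way clustering with serially dependent time effects is the setting the paper addresses, and it is not attempted here.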

Synthetic IV Estimation in Panels

Jaume Vives, Massachusetts Institute of Technology
Ahmet Gulek, Massachusetts Institute of Technology

Abstract

We propose a Synthetic Instrumental Variables (SIV) estimator for panel data that combines the strengths of instrumental variables and synthetic controls to address unmeasured confounding. We derive conditions under which SIV is consistent and asymptotically normal, even when the standard IV estimator is not. Motivated by the finite sample properties of our estimator, we introduce an ensemble estimator that simultaneously addresses multiple sources of bias and provide a permutation-based inference procedure. We demonstrate the effectiveness of our methods through a calibrated simulation exercise, two shift-share empirical applications, and an application in digital economics that includes both observational data and data from a randomized controlled trial. In our primary empirical application, we examine the impact of the Syrian refugee crisis on Turkish labor markets. Here, the SIV estimator reveals significant effects that the standard IV does not capture. Similarly, in our digital economics application, the SIV estimator successfully recovers the experimental estimates, whereas the standard IV does not.

Testable Identification of Finite Mixture Models

Bruno de Albuquerque Furtado, Royal Holloway University of London and Oxford University

Abstract

Finite mixture models, in which observations are drawn from a combination of latent component distributions with unknown weights, have a wide array of applications in economics. It is well known that the latent parameters of such models are not always identifiable under common identification assumptions. This paper establishes sufficient conditions for non-parametric identifiability of finite mixture models, assuming that an observable covariate shifts mixture weights while leaving component distributions unchanged. Such an assumption is naturally satisfied across various domains, including latent topic models of text data, addressing misclassified categorical regressors, and analyzing Markovian regime switching models. The proposed identification conditions depend solely on observable mixture distributions. Therefore, they can be verified without direct knowledge of the latent parameters and, in principle, can be checked prior to estimating the model. Building on this, I introduce a statistical test to assess whether these conditions are met and derive its asymptotic distribution. The test employs an extremum estimator with a strictly concave objective function, which makes simulating its asymptotic distribution computationally tractable. Additionally, a straightforward transformation of the same extremum estimator yields a consistent estimator of the latent parameters.

Testing Spatial Correlation for Spatial Models with Heterogeneous Coefficients When Both N and T Are Large

Shi Ryoung Chang, Ohio State University
Robert de Jong, Ohio State University

Abstract

The widely used approach to testing spatial correlation in spatial models is to formulate the hypothesis as a restriction on a homogeneous spatial autoregressive coefficient. This paper proposes a novel test statistic, denoted by S, for spatial correlation in spatial panel data models with fully heterogeneous spatial lag coefficients when both n and T are large. In the case of small reciprocal interactions, the proposed S test is shown to have a standard normal distribution as (n,T) → ∞ such that √n/T → 0. Moreover, it is shown that the traditional test is likely to draw erroneous conclusions on spatial correlation when spatial correlations are heterogeneous in nature. Monte Carlo experiments show that the proposed S test has satisfactory finite sample properties and considerably better power than the traditional one.
JEL Classifications
  • C0 - General