
Uses of Imputation in Economic Analysis

Paper Session

Friday, Jan. 5, 2024 2:30 PM - 4:30 PM (CST)

Grand Hyatt, Presidio B
Hosted By: American Economic Association
  • Chair: Serena Ng, Columbia University

Parameter Recovery with Remotely Sensed Variables

Jonathan Proctor, Harvard University
Tamma Carleton, University of California-Santa Barbara
Sandy Sum, University of California-Santa Barbara

Abstract

Remotely sensed measurements and other machine learning predictions are increasingly used in place of direct observations in empirical analyses. Errors in such measures may bias parameter estimation, but it remains unclear how large such biases are or how to correct for them. We show empirically that using remotely sensed variables without correction leads to substantial bias in point estimates and standard errors across a diversity of models. We demonstrate that multiple imputation, a standard and easily implementable statistical imputation technique that has yet to be tested in this setting, effectively reduces bias and improves statistical coverage in both cross-sectional and panel data designs. Paper can be found here: https://www.nber.org/papers/w30861
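
The abstract does not say how estimates from the imputed datasets are combined. A minimal sketch of the usual pooling step in multiple imputation, Rubin's rules, is given below. It assumes the regression has already been run once per imputed dataset. The function name `pool_rubin` and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two imputations")
    pooled = mean(estimates)       # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5    # pooled estimate and pooled standard error


# Hypothetical slope estimates and squared standard errors from M = 5 imputed datasets.
betas = [0.50, 0.46, 0.54, 0.48, 0.52]
ses_sq = [0.010, 0.010, 0.010, 0.010, 0.010]
beta_hat, se_hat = pool_rubin(betas, ses_sq)
print(f"pooled estimate = {beta_hat:.3f}, pooled s.e. = {se_hat:.3f}")
```

The between-imputation term is what carries the uncertainty about the imputed variable into the standard error. A single plug-in prediction leaves it out.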

Imputing Missing Values in the U.S. Census Bureau’s County Business Patterns

Fabian Eckert, University of California-San Diego
Teresa Fort, Dartmouth College
Peter Schott, Yale University
Natalie J. Yang, Columbia University

Abstract

The County Business Patterns data published by the US Census Bureau track employment by county and industry from 1946 to the present. Two features of the data limit their usefulness to researchers: (1) employment for the majority of county-industry cells is suppressed to protect confidentiality, and (2) industry classifications change over time. We address both issues. First, we develop a linear programming method that exploits the large set of adding-up constraints implicit in the hierarchical arrangement of the data to impute missing employment. Second, we provide concordances to map all data to a consistent set of industry codes. Finally, we construct a user-friendly, 1975 to 2018 county-level panel that classifies industries according to a consistent set of 2012 NAICS codes in all years. Paper can be found here: https://www.nber.org/papers/w26632
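
To illustrate how an adding-up constraint restricts suppressed cells, the sketch below tightens bounds on the suppressed children of a single parent total. This is not the authors' linear program, which uses many such constraints jointly. It is the closed-form answer to "what is the smallest and largest value each cell can take" under one constraint with interval bounds. The function name, cell names, and initial ranges are hypothetical.

```python
def tighten_bounds(parent_total, known, suppressed):
    """Tighten bounds on suppressed child cells that share one parent total.

    known: list of published child employment counts.
    suppressed: dict mapping cell name -> (lower, upper) initial bounds.
    Returns a dict mapping cell name -> tightened (lower, upper) bounds.
    """
    residual = parent_total - sum(known)  # what the suppressed cells must sum to
    lo_sum = sum(lo for lo, _ in suppressed.values())
    hi_sum = sum(hi for _, hi in suppressed.values())
    if not lo_sum <= residual <= hi_sum:
        raise ValueError("bounds are inconsistent with the adding-up constraint")
    tightened = {}
    for name, (lo, hi) in suppressed.items():
        others_lo = lo_sum - lo  # least the other suppressed cells can hold
        others_hi = hi_sum - hi  # most the other suppressed cells can hold
        tightened[name] = (max(lo, residual - others_hi),
                           min(hi, residual - others_lo))
    return tightened


# Hypothetical county: 500 employees in the parent industry, two children
# published (300 and 120), two suppressed with assumed initial ranges.
bounds = tighten_bounds(500, [300, 120], {"cell_a": (0, 19), "cell_b": (20, 99)})
print(bounds)
```

Here the two suppressed cells must sum to 80, so `cell_b` narrows from 20-99 to 61-80 while `cell_a` is unchanged. Constraints from other levels of the hierarchy would narrow the ranges further.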

Fixed-Effects PCA: Imputation and Inference for Large Non-stationary Panel Data with Missing Observations

Junting Duan, Stanford University
Markus Pelger, Stanford University
Ruoxuan Xiong, Emory University

Abstract

This paper studies imputation and inference for large-dimensional non-stationary panel data with missing observations. We propose a novel method, Fixed-Effects PCA (FE-PCA), for estimating a latent factor structure with non-stationary two-way fixed effects. FE-PCA is simple to use and applicable to general missing patterns, which can depend on both the latent factor structure and the two-way fixed effects. We show the consistency and asymptotic normality of the estimated fixed effects and factor model under general assumptions. The generality of our framework is particularly important for causal inference in panels, where the unobserved counterfactual outcomes can be modeled as missing values. For two well-known causal applications, we demonstrate that FE-PCA can lead to different and more credible economic conclusions compared to conventional difference-in-differences and PCA methods.
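
The idea of treating a counterfactual as a missing value can be illustrated with the conventional difference-in-differences baseline that the abstract compares against. The sketch below is not FE-PCA. It assumes a purely additive model with unit and time effects and no latent factors, and imputes the treated unit's missing untreated outcome in the post period. The function name and all numbers are hypothetical.

```python
from statistics import mean


def did_counterfactual(treated_pre, control_pre, control_post):
    """Impute the treated unit's untreated post-period outcome.

    Under y_it = a_i + b_t, the missing cell equals the treated unit's
    pre-period level plus the change observed in the control group.
    """
    return mean(treated_pre) + (mean(control_post) - mean(control_pre))


# Hypothetical balanced panel: two pre periods, one post period.
treated_pre = [10, 12]   # treated unit before treatment
control_pre = [8, 10]    # control unit before treatment
control_post = [13]      # control unit after treatment
observed_treated_post = 18

imputed = did_counterfactual(treated_pre, control_pre, control_post)
effect = observed_treated_post - imputed
print(f"imputed counterfactual = {imputed}, estimated effect = {effect}")
```

A method such as FE-PCA would replace the additive imputation with one that also allows for a latent factor structure, which is why it can lead to different conclusions.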

Missing Data in Asset Pricing Panels

Joachim Freyberger, University of Bonn
Bjorn Hoppner, University of Bonn
Andreas Neuhierl, Washington University-St. Louis
Michael Weber, University of Chicago

Abstract

Missing data for return predictors is a common problem in cross-sectional asset pricing. Most papers do not explicitly discuss how they deal with missing data, but conventional treatments either focus on the subset of firms with no missing data for any predictor or impute the unconditional mean. Both methods have undesirable properties: they are either inefficient or lead to biased estimators and incorrect inference. We propose a simple and computationally attractive alternative using conditional mean imputations and weighted least squares, cast in a generalized method of moments (GMM) framework. This method allows us to use all observations with observed returns, it results in valid inference, and it can be applied in non-linear and high-dimensional settings. In Monte Carlo simulations, we find that it performs almost as well as the efficient but computationally costly GMM estimator in many cases. We apply our procedure to a large panel of return predictors and find that it leads to improved out-of-sample predictability. Paper can be found here: https://www.nber.org/papers/w30761
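
The contrast between unconditional and conditional mean imputation can be shown with a small sketch. It covers only the imputation step, using a one-regressor least squares fit on complete cases. The weighted least squares and GMM parts of the authors' procedure are not shown. The function and variable names and the data are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope) from a least squares fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def impute_unconditional_mean(target):
    """Fill missing entries (None) with the mean of the observed entries."""
    fill = mean([t for t in target if t is not None])
    return [fill if t is None else t for t in target]


def impute_conditional_mean(observed, target):
    """Fill missing entries (None) with fitted values given an observed predictor."""
    pairs = [(o, t) for o, t in zip(observed, target) if t is not None]
    intercept, slope = fit_line([o for o, _ in pairs], [t for _, t in pairs])
    return [intercept + slope * o if t is None else t
            for o, t in zip(observed, target)]


# Hypothetical cross-section of five firms; the last firm lacks one predictor.
size = [1.0, 2.0, 3.0, 4.0, 5.0]
book_to_market = [3.0, 5.0, 7.0, 9.0, None]

uncond = impute_unconditional_mean(book_to_market)
cond = impute_conditional_mean(size, book_to_market)
print("unconditional fill:", uncond[-1])
print("conditional fill:  ", cond[-1])
```

In this constructed example the two predictors are exactly linearly related, so the conditional fill follows that relationship while the unconditional fill ignores it.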

JEL Classifications
  • C1 - Econometric and Statistical Methods and Methodology: General
  • C4 - Econometric and Statistical Methods: Special Topics