Uses of Imputation in Economic Analysis
Paper Session
Friday, Jan. 5, 2024 2:30 PM - 4:30 PM (CST)
- Chair: Serena Ng, Columbia University
Imputing Missing Values in the U.S. Census Bureau’s County Business Patterns
Abstract
The County Business Patterns data published by the U.S. Census Bureau track employment by county and industry from 1946 to the present. Two features of the data limit their usefulness to researchers: (1) employment for the majority of county-industry cells is suppressed to protect confidentiality, and (2) industry classifications change over time. We address both issues. First, we develop a linear programming method that exploits the large set of adding-up constraints implicit in the hierarchical arrangement of the data to impute missing employment. Second, we provide concordances to map all data to a consistent set of industry codes. Finally, we construct a user-friendly county-level panel for 1975 to 2018 that classifies industries according to a consistent set of 2012 NAICS codes in all years. The paper can be found here: https://www.nber.org/papers/w26632
Fixed-Effects PCA: Imputation and Inference for Large Non-Stationary Panel Data with Missing Observations
Abstract
This paper studies imputation and inference for large-dimensional non-stationary panel data with missing observations. We propose a novel method, Fixed-Effects PCA (FE-PCA), for estimating a latent factor structure with non-stationary two-way fixed effects. FE-PCA is simple to use and applicable to general missing patterns, which can depend on both the latent factor structure and the two-way fixed effects. We show the consistency and asymptotic normality of the estimated fixed effects and factor model under general assumptions. The generality of our framework is particularly important for causal inference in panels, where the unobserved counterfactual outcomes can be modeled as missing values. For two well-known causal applications, we demonstrate that FE-PCA can lead to different and more credible economic conclusions compared to conventional difference-in-differences and PCA methods.
Missing Data in Asset Pricing Panels
Abstract
Missing data for return predictors is a common problem in cross-sectional asset pricing. Most papers do not explicitly discuss how they deal with missing data, but conventional treatments either focus on the subset of firms with no missing data for any predictor or impute the unconditional mean. Both methods have undesirable properties: they are either inefficient or lead to biased estimators and incorrect inference. We propose a simple and computationally attractive alternative using conditional mean imputations and weighted least squares, cast in a generalized method of moments (GMM) framework. This method allows us to use all observations with observed returns, results in valid inference, and can be applied in non-linear and high-dimensional settings. In Monte Carlo simulations, we find that it performs almost as well as the efficient but computationally costly GMM estimator in many cases. We apply our procedure to a large panel of return predictors and find that it leads to improved out-of-sample predictability. The paper can be found here: https://www.nber.org/papers/w30761
JEL Classifications
- C1 - Econometric and Statistical Methods and Methodology: General
- C4 - Econometric and Statistical Methods: Special Topics