High-Dimensional Econometrics and Machine Learning

Paper Session

Sunday, Jan. 3, 2021 3:45 PM - 5:45 PM (EST)

Hosted By: Econometric Society
  • Chair: Mikkel Soelvsten, University of Wisconsin-Madison

Shapes as Product Differentiation: Neural Network Embedding in the Analysis of Markets for Fonts

Sukjin Han, University of Texas-Austin
Eric Schulman, University of Texas-Austin

Abstract

Many products have key attributes that are high dimensional (e.g., design, text). Quantifying these attributes is important for economic analysis. This paper considers one of the simplest design products, fonts, and quantifies their shape by constructing their embeddings using a modern convolutional neural network. The embedding maps fonts’ shapes onto a low dimensional vector. Importantly, we verify the resulting embedding is economically meaningful by showing that mutual information is high between the embedding and descriptions assigned to each font by font designers and consumers. We illustrate the usefulness of the embeddings by a simple trend analysis of font style.
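
As a rough illustration of the mutual-information check mentioned in the abstract, the sketch below computes a plug-in estimate between two discrete label sequences. It assumes that both the embedding and the descriptions have already been reduced to discrete labels (for example, cluster IDs and tags); that reduction, the function name `mutual_information`, and the plug-in estimator itself are assumptions for illustration, since the abstract does not state how the authors estimate mutual information.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two label sequences.

    Hypothetical helper: xs could be cluster IDs derived from an embedding,
    ys could be descriptive tags. Both are assumptions, not the paper's setup.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi
```

When the two labelings determine each other exactly, the estimate equals the entropy of either one; when they are empirically independent, it is zero.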

Sparse Quantile Regression

Le-Yu Chen, Academia Sinica
Sokbae (Simon) Lee, Columbia University

Abstract

We estimate a quantile regression model with a penalty on the number of selected covariates. We derive probability bounds on the estimated sparsity as well as probability and expectation bounds on the excess quantile prediction risk and the mean-square parameter estimation error of our proposed estimator. These theoretical results are non-asymptotic and established in a high-dimensional setting. In particular, we show that our method yields a sparse estimator whose L0-norm can be close to true sparsity with high probability and obtain the oracle rates of convergence for the excess prediction risk and the mean-square parameter estimation error. We implement the proposed procedure via the method of mixed integer linear programming and also a more scalable first-order approximation algorithm. The finite-sample numerical performance is illustrated in Monte Carlo experiments.
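
To make the objective concrete, the sketch below evaluates an average quantile (check) loss plus a penalty proportional to the number of non-zero coefficients. The function names, the penalty weight `lam`, and the averaging convention are assumptions for illustration; the sketch only evaluates such an objective and does not reproduce the mixed integer linear programming or first-order algorithms described in the abstract.

```python
import numpy as np


def check_loss(residuals, tau):
    """Average quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.mean(residuals * (tau - (residuals < 0))))


def l0_penalized_objective(beta, X, y, tau, lam):
    """Average check loss plus lam times the number of non-zero coefficients.

    lam is a hypothetical tuning parameter; the abstract does not give
    the exact form or scaling of the penalty.
    """
    beta = np.asarray(beta, dtype=float)
    return check_loss(y - X @ beta, tau) + lam * np.count_nonzero(beta)
```

Comparing the objective at a sparse and a dense candidate `beta` shows the trade-off: an extra non-zero coefficient is worth selecting only if it lowers the check loss by more than `lam`.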

Testing Many Restrictions Under Heteroskedasticity

Stanislav Anatolyev, CERGE-EI and New Economic School
Mikkel Soelvsten, University of Wisconsin-Madison

Abstract

We propose a hypothesis test that allows for many tested restrictions in a heteroskedastic linear regression model. The test compares the conventional F-statistic to a critical value that corrects for many restrictions and conditional heteroskedasticity. The correction utilizes leave-one-out estimation to recenter the conventional critical value and leave-three-out estimation to rescale it. Large sample properties of the test are established in an asymptotic framework where the number of tested restrictions may grow in proportion to the number of observations. We show that the test is asymptotically valid and has non-trivial asymptotic power against the same local alternatives as the exact F test when the latter is valid. Simulations corroborate the relevance of these theoretical findings and suggest excellent size control in moderately small samples also under strong heteroskedasticity.
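
For reference, the conventional F-statistic that the test starts from can be sketched as follows for linear restrictions R beta = q in the model y = X beta + e. This is the textbook homoskedastic statistic only; the corrected critical value based on leave-one-out and leave-three-out estimation is not reproduced here, and the function name `f_statistic` is an assumption for illustration.

```python
import numpy as np


def f_statistic(X, y, R, q):
    """Conventional F-statistic for H0: R @ beta = q in y = X @ beta + e.

    Textbook formula only; it does not apply the many-restrictions or
    heteroskedasticity corrections described in the abstract.
    """
    n, k = X.shape                          # observations, regressors
    r = R.shape[0]                          # number of tested restrictions
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y            # OLS estimate
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - k)    # degrees-of-freedom-adjusted variance
    gap = R @ beta_hat - q                  # estimated violation of the restrictions
    middle = np.linalg.inv(R @ XtX_inv @ R.T)
    return float(gap @ middle @ gap / (r * sigma2_hat))
```

With a single restriction that a slope is zero, this equals the squared t-statistic of that slope, which gives a quick sanity check on small data.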

Inference for High-Dimensional Exchangeable Arrays

Harold Chiang, Vanderbilt University
Kengo Kato, Cornell University
Yuya Sasaki, Vanderbilt University

Abstract

For multiway cluster sampled data and dyadic data, we develop novel bootstrap methods and theories for inference about multi-dimensional, increasing-dimensional and high-dimensional parameters. Based on non-asymptotic Gaussian approximation error bounds for the test-statistic on hyper-rectangles, we propose novel bootstrap methods and establish their finite sample validity. We illustrate applications of our proposed methods to robust inference in demand analysis, robust inference in extended gravity analysis, and construction of uniform confidence bands for densities of migration and trade.

Inference for Heterogeneous Treatment Effects for Observational Data with High-Dimensional Covariates

Jing Tao, University of Washington

Abstract

We consider heterogeneous treatment effects on a set of high-dimensional covariates for observational data without the strong ignorability assumption (Rosenbaum and Rubin, 1983). With a binary instrumental variable, the parameters of interest are identifiable on an unobservable subgroup (compliers) of the population through a two-stage regression model. The Lasso estimation under a non-convex objective function is developed for the two-stage regression. Its de-sparsifying estimator and the inference procedure are proposed. The confidence interval for the treatment effect given specific covariates is also constructed. The proposed approach works for both continuous and categorical response variables under the framework of generalized linear models. Theoretical properties of the proposed method are derived, and simulation studies are conducted to evaluate its performance. A real data analysis on the Oregon Health Insurance Experiment is performed to illustrate the utility of the proposed method in practice.
JEL Classifications
  • C1 - Econometric and Statistical Methods and Methodology: General
  • C5 - Econometric Modeling