Advances in Imputing Race and Ethnicity to Administrative Data
Paper Session
Friday, Jan. 5, 2024 2:30 PM - 4:30 PM (CST)
- Chair: Mark Mazur, U.S. Treasury Department
Measuring Marriage Penalties and Bonuses by Race and Ethnicity: An Application of Race and Ethnicity Re-Weighting to Tax Data
Abstract
Tax law can have different impacts on individuals in different racial and ethnic groups because individuals’ tax return characteristics vary across groups on average. As tax forms do not collect information about individual race or ethnicity, it has been challenging to use administrative tax data to analyze tax differentials by race and ethnicity. To facilitate better understanding, the U.S. Treasury Department's Office of Tax Analysis imputes race/ethnicity information for a stratified random sample of taxpayers on its microsimulation model. This paper uses this information to simulate the marriage-penalty and -bonus outcomes under the federal individual income tax system by race and ethnicity. Legal scholars such as Moran and Whitford (1996) and Brown (1997, 2021) have discussed probable disparate marriage penalty outcomes by race due to group differences in the spousal division of income. Comparing Black and White married couples shows sparse evidence of group disparities on average. However, for income levels above $75,000, Black couples have a higher penalty rate and a lower bonus rate relative to White couples with the same income, other things being equal. Hispanic couples on average have a higher penalty rate, a lower bonus rate, and a smaller bonus amount relative to White couples.
Tax Expenditures by (Improved Imputed) Race and Ethnicity
Abstract
U.S. tax forms do not collect information about race or ethnicity. While no tax rule is established based on the taxpayer’s race or ethnicity, not taking race and ethnicity into consideration in the policymaking process can result in the unintentional consequence of widening racial and ethnic disparities in after-tax income. Fisher (2023) imputed race to the Office of Tax Analysis’ Individual Tax Model by applying Bayesian inference to a set of explanatory variables available in tax data, including total income, filing status, age, number of dependents, taxable interest, presence of farm income, first name, last name, and the ZIP Code Tabulation Area (ZCTA). Cronin, DeFilippes and Fisher (2023) used this imputation to analyze the distribution of tax expenditures by race and ethnicity (RH). This paper extends Cronin et al. (2023) by using better data sources for the imputation of RH, including Asian families as one of the RH categories, and by considering the effects of filing status by RH on previous results. We find that by updating the geocode and tax data to more recent years and using the richer Census data for first names, we are able to better estimate RH, especially for Asian families. We also find that filing status varies significantly across RH groups and reduces the measured differences in tax expenditure benefits by RH for certain tax expenditures. The measured difference in the benefits of the tax expenditure for preferential rates on capital gains and dividends is unchanged from the earlier paper.
Using Multiple Data Sources to Learn about the Race and Ethnicity of Taxpayers
Abstract
A difficulty in using administrative tax data to study income distribution and other aspects of economic well-being is that tax data lacks information on race and ethnicity. The Congressional Budget Office (CBO) maintains an individual tax model that statistically merges administrative tax data with the Current Population Survey (CPS) to create a household distribution model that is used for CBO reports. The CBO tax model primarily uses information about income from tax returns, with supplemental information on non-filers, nontaxable sources of income, and household structure from the CPS. While the merged data contains race and ethnicity data from the CPS, CBO has not used it for analysis by race because it is not clear that the statistical match preserves the relationships between income, tax liability, and race and ethnicity. CBO is working with the Census Bureau to assess the validity of estimates of household income and other factors that affect tax liability by race and ethnicity in CBO’s statistically matched data, by comparing it to Census Bureau data that match CPS records to administrative tax data at the individual level. Understanding the quality of CBO's statistically matched CPS and tax data could expand the ability to use alternative data sources and decrease the need for access to highly sensitive, individually linked data. This paper will present preliminary comparisons of CBO's statistically matched data and Census’s linked data and discuss implications of the differences for future CBO analysis of income and taxes by race.
Discussant(s)
Robert McClelland, Tax Policy Center, Urban Institute and Brookings Institution
Rhonda Vonshay Sharpe, Women's Institute for Science, Equity and Race
Charles Hokayem, U.S. Census Bureau
Sheridan Fuller, Federal Reserve Board
JEL Classifications
- H2 - Taxation, Subsidies, and Revenue
- C5 - Econometric Modeling