Data for Tax-Calculator#

A Tax-Calculator Records object is created by passing a Pandas DataFrame or a string that provides the path to a CSV file with data you’d like to use with Tax-Calculator.

Using prepared data with Tax-Calculator#

To make Tax-Calculator more useful out of the box, it includes three data options for users, two of which (cps.csv and puf.csv) are created in the taxdata repository, and one of which (tmd.csv) is created in the tax-microdata repository. We refer users to those repositories for more specific documentation of these data, but we provide a brief overview of the three prepared data options that are compatible with Tax-Calculator.

Current Population Survey data (cps.csv)#

Using the Records.cps_constructor() method to create a Records class object, Tax-Calculator users will be loading the taxdata Current Population Survey (CPS) data file. This file is based on publicly available survey data, which is then weighted in taxdata to hit IRS/SOI targets. The data are then grown out to hit aggregate forecasts through the time horizon available in Tax-Calculator (approximately the next 10 years). All the files required to use this prepared data option are included in the Tax-Calculator package.

Tax-Calculator provides unit tests to ensure that certain totals are hit with the CPS-based file. However, users should note that these tests are simply to ensure the accuracy of Tax-Calculator’s tax logic and not the accuracy of the CPS-based data file produced by taxdata. Please see the taxdata documentation for any validation of those data.

2011 IRS public use data (puf.csv)#

The taxdata repository also produces a weights file and growth factors file for use with the 2011 IRS-SOI Public Use File (PUF). For users who have purchased their own version of the 2011 PUF, the puf_weights.csv.gz and growfactors.csv files that are included in Tax-Calculator can be used along with the taxdata generated puf.csv file when using Tax-Calculator.

We refer users of the PUF to the IRS limitations on the use of those data and their distribution. We also refer users of the PUF weights file and grow factors to the taxdata documentation for details on how to use these files with the PUF and to see how well the resulting tax calculations hit aggregate targets published by the IRS. However, we do note that analysis with a PUF-based data file tends to be more accurate than the cps.csv file and that the validation of Tax-Calculator with other microsimulation models uses the puf.csv file.

2015 IRS public use data (tmd.csv)#

The tax-microdata repository produces an input variables file (tmd.csv), a national weights file (tmd_weights.csv.gz), and a variable growth factors file (tmd_growfactors.csv) that can be used with the Tax-Calculator package beginning with the 3.6.0 release. The tmd.csv file is available only to Tax-Calculator users who have purchased their own version of the 2015 IRS-SOI PUF. For those users, those three files are avaiable from the tax-microdata repository. These three tmd files can be used with the Tax-Calculator Python API (using the Records.tmd_constructor() and Policy.tmd_constructor()static methods) or with the Tax-Calculator CLI tool, tc.

Using other data with Tax-Calculator#

Using other data sources with Tax-Calculator is possible. Users can pass any csv file to the Records class and, so long as it has the appropriate input variables, one may be able to obtain results. Using Tax-Calculator with custom data takes care and significant understanding of the model and data. Those interested in using their own data with Tax-Calculator might also look to the Tax-Cruncher project, which is built as an interface between Tax-Calculator and custom datasets.