Recipe 3: Creating a Custom Table

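This is an advanced recipe; follow it only after mastering the basic recipe. It shows how to build a custom table from Calculator results: the number of filing units receiving the EITC and the average positive EITC amount, by IRS-SOI AGI category.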

import taxcalc as tc
import numpy as np

# use publicly-available CPS input file
recs = tc.Records.cps_constructor()

# specify Calculator object for static analysis of current-law policy
pol = tc.Policy()
calc = tc.Calculator(policy=pol, records=recs)

CYR = 2020

# advance to CYR and calculate current-law tax liabilities for each filing unit
calc.advance_to_year(CYR)
calc.calc_all()

# tabulate custom table showing number of filing units receiving EITC
# and the average positive EITC amount by IRS-SOI AGI categories
vardf = calc.dataframe(['s006', 'c00100', 'eitc'])
vardf = tc.add_income_table_row_variable(vardf, 'c00100', tc.SOI_AGI_BINS)
# observed=False keeps AGI categories that contain no filing units in the table
gbydf = vardf.groupby('table_row', as_index=False, observed=False)
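The groupby call above groups on a categorical column of AGI intervals, so the `observed` argument decides whether empty categories appear in the result. The following is a minimal sketch on synthetic data, not Tax-Calculator output: the column names `x` and `w` and the bin edges are made up for illustration, and `pd.cut` is used here only as a stand-in for the binning step.

```python
import pandas as pd

# Synthetic data (hypothetical): three values binned into three
# left-closed intervals; the middle bin [10, 20) receives no rows.
df = pd.DataFrame({'x': [1.0, 5.0, 25.0], 'w': [2.0, 3.0, 4.0]})
df['table_row'] = pd.cut(df['x'], bins=[0, 10, 20, 30], right=False)

# observed=False keeps every category, including the empty one
all_bins = df.groupby('table_row', observed=False)['w'].sum()

# observed=True keeps only categories that actually occur in the data
seen_bins = df.groupby('table_row', observed=True)['w'].sum()

print(len(all_bins), len(seen_bins))
```

With `observed=False` the empty interval is retained with a sum of zero, which is what lets a table like the one in this recipe list every AGI category even when no filing unit falls into it.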

Filing Units Receiving EITC and Average Positive EITC by AGI Category

# print AGI table with ALL row at bottom
results = '{:23s}\t{:8.3f}\t{:8.3f}'
colhead = '{:23s}\t{:>8s}\t{:>8s}'
print(colhead.format('AGI Category', 'Num(#M)', 'Avg($K)'))
tot_recips = 0.
tot_amount = 0.
for grp_interval, grp in gbydf:
    # weighted number of filing units with positive EITC (millions)
    recips = grp[grp['eitc'] > 0]['s006'].sum() * 1e-6
    tot_recips += recips
    # weighted EITC amount ($ billions); $B divided by #M gives $K per unit
    amount = (grp['eitc'] * grp['s006']).sum() * 1e-9
    tot_amount += amount
    # the average is undefined for categories with no recipients
    avg = amount / recips if recips > 0 else np.nan
    glabel = '[{:.8g}, {:.8g})'.format(grp_interval.left, grp_interval.right)
    print(results.format(glabel, recips, avg))
avg = tot_amount / tot_recips
print(results.format('ALL', tot_recips, avg))
AGI Category           	 Num(#M)	 Avg($K)
[-9e+99, 1)            	   0.077	   0.306
[1, 5000)              	   3.177	   0.548
[5000, 10000)          	   4.808	   1.343
[10000, 15000)         	   5.338	   1.878
[15000, 20000)         	   3.159	   3.613
[20000, 25000)         	   2.578	   4.326
[25000, 30000)         	   2.303	   3.867
[30000, 40000)         	   4.457	   2.660
[40000, 50000)         	   2.644	   1.384
[50000, 75000)         	   0.536	   0.547
[75000, 100000)        	   0.000	     nan
[100000, 200000)       	   0.000	     nan
[200000, 500000)       	   0.000	     nan
[500000, 1000000)      	   0.000	     nan
[1000000, 1500000)     	   0.000	     nan
[1500000, 2000000)     	   0.000	     nan
[2000000, 5000000)     	   0.000	     nan
[5000000, 10000000)    	   0.000	     nan
[10000000, 9e+99)      	   0.000	     nan
ALL                    	  29.077	   2.254
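The same tabulation can also be written without an explicit loop. The sketch below is one possible vectorized variant, run on a small synthetic DataFrame rather than on Calculator output: the data values and the shortened bin list are hypothetical, and `pd.cut` with `right=False` is an assumed stand-in for `tc.add_income_table_row_variable`, not a statement of how that function is implemented.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for calc.dataframe(['s006', 'c00100', 'eitc'])
df = pd.DataFrame({
    's006': [100.0, 200.0, 300.0, 400.0],        # sampling weights
    'c00100': [500.0, 2000.0, 7000.0, 60000.0],  # AGI
    'eitc': [0.0, 500.0, 1000.0, 0.0],           # EITC amount
})

# Hypothetical, shortened set of left-closed AGI bins
bins = [-9e99, 1, 5000, 10000, 9e99]
df['table_row'] = pd.cut(df['c00100'], bins=bins, right=False)

# Weight counts only for units with positive EITC; weight all EITC dollars
df['recips'] = df['s006'].where(df['eitc'] > 0, 0.0)
df['amount'] = df['eitc'] * df['s006']

# Sum within each bin, keeping bins that contain no units
table = df.groupby('table_row', observed=False)[['recips', 'amount']].sum()

# Average positive EITC; NaN where a bin has no recipients
table['avg'] = table['amount'] / table['recips'].replace(0.0, np.nan)

# Overall average across all bins (the ALL row)
total_avg = table['amount'].sum() / table['recips'].sum()
print(table)
print('ALL', total_avg)
```

This sketch leaves the values unscaled; applying the recipe's `1e-6` and `1e-9` factors to `recips` and `amount` would reproduce the millions and thousands-of-dollars units used in the printed table.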