Recipe 3: Creating a Custom Table
This is an advanced recipe that should be followed only after mastering the basic recipe. It shows how to prepare a custom table, here the number of filing units receiving the EITC and their average EITC amount by AGI category.
import taxcalc as tc
import numpy as np
# use publicly-available CPS input file
recs = tc.Records.cps_constructor()
# specify Calculator object for static analysis of current-law policy
pol = tc.Policy()
calc = tc.Calculator(policy=pol, records=recs)
CYR = 2020
# calculate current-law tax liabilities for CYR
calc.advance_to_year(CYR)
calc.calc_all()
# tabulate custom table showing number of filing units receiving EITC
# and the average positive EITC amount by IRS-SOI AGI categories
vardf = calc.dataframe(['s006', 'c00100', 'eitc'])
vardf = tc.add_income_table_row_variable(vardf, 'c00100', tc.SOI_AGI_BINS)
# pass observed=False explicitly to keep the current pandas grouping behavior
gbydf = vardf.groupby('table_row', as_index=False, observed=False)
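The `table_row` column is a pandas categorical, so the `observed` argument of `groupby` decides whether bins that contain no rows still appear in aggregated results. The following is a minimal sketch of that behavior using pandas alone; the toy data, the bin edges, and the names `toy`, `all_bins`, and `seen_bins` are hypothetical and are not part of the recipe.

```python
import pandas as pd

# Hypothetical toy data: three AGI values, none falling in the top bin.
toy = pd.DataFrame({'agi': [500.0, 7000.0, 8000.0]})
bins = [0, 5000, 10000, 15000]
# Left-closed, right-open intervals, matching the [low, high) labels in the table.
toy['table_row'] = pd.cut(toy['agi'], bins, right=False)

# observed=False keeps every bin, including the empty one (count of 0).
all_bins = toy.groupby('table_row', observed=False).size()
# observed=True keeps only bins that contain at least one row.
seen_bins = toy.groupby('table_row', observed=True).size()

print(len(all_bins), len(seen_bins))
```

For a table that should always list every AGI category, `observed=False` is the safer choice under these assumptions.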
Filing Units Receiving EITC and Average Positive EITC by AGI Category
# print AGI table with ALL row at bottom
results = '{:23s}\t{:8.3f}\t{:8.3f}'
colhead = '{:23s}\t{:>8s}\t{:>8s}'
print(colhead.format('AGI Category', 'Num(#M)', 'Avg($K)'))
tot_recips = 0.
tot_amount = 0.
idx = 0
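for grp_interval, grp in gbydf:
    # weighted number of EITC recipients, in millions of filing units
    recips = grp[grp['eitc'] > 0]['s006'].sum() * 1e-6
    tot_recips += recips
    # weighted EITC amount, in billions of dollars
    amount = (grp['eitc'] * grp['s006']).sum() * 1e-9
    tot_amount += amount
    # billions of dollars per million recipients is thousands of dollars each
    if recips > 0:
        avg = amount / recips
    else:
        avg = np.nan
    glabel = '[{:.8g}, {:.8g})'.format(grp_interval.left, grp_interval.right)
    print(results.format(glabel, recips, avg))
    idx += 1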
avg = tot_amount / tot_recips
print(results.format('ALL', tot_recips, avg))
AGI Category              Num(#M)   Avg($K)
[-9e+99, 1)                 0.077     0.306
[1, 5000)                   3.177     0.548
[5000, 10000)               4.808     1.343
[10000, 15000)              5.338     1.878
[15000, 20000)              3.159     3.613
[20000, 25000)              2.578     4.326
[25000, 30000)              2.303     3.867
[30000, 40000)              4.457     2.660
[40000, 50000)              2.644     1.384
[50000, 75000)              0.536     0.547
[75000, 100000)             0.000       nan
[100000, 200000)            0.000       nan
[200000, 500000)            0.000       nan
[500000, 1000000)           0.000       nan
[1000000, 1500000)          0.000       nan
[1500000, 2000000)          0.000       nan
[2000000, 5000000)          0.000       nan
[5000000, 10000000)         0.000       nan
[10000000, 9e+99)           0.000       nan
ALL                        29.077     2.254
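The same kind of table can also be built without an explicit loop. The sketch below uses pandas alone on a few hypothetical filing units, so it runs without Tax-Calculator installed; the data values, the shortened bin list, and the helper columns `recip_wt` and `eitc_wt` are assumptions made for illustration, and `pd.cut` stands in for the binning that `tc.add_income_table_row_variable` performs in the recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical filing units: sampling weight, AGI, and EITC amount.
units = pd.DataFrame({
    's006':   [100.0, 200.0, 300.0, 400.0],
    'c00100': [2000.0, 4000.0, 12000.0, 90000.0],
    'eitc':   [500.0, 0.0, 1000.0, 0.0],
})
# Shortened, hypothetical bin edges with left-closed intervals.
bins = [-9e99, 1.0, 5000.0, 15000.0, 9e99]
units['table_row'] = pd.cut(units['c00100'], bins, right=False)

# Per-unit weighted recipient indicator and weighted EITC amount.
units['recip_wt'] = units['s006'] * (units['eitc'] > 0)
units['eitc_wt'] = units['s006'] * units['eitc']

# Sum both weighted columns within each AGI bin, keeping empty bins.
table = units.groupby('table_row', observed=False)[['recip_wt', 'eitc_wt']].sum()
# Average EITC among recipients; bins with no recipients become NaN.
table['avg_eitc'] = table['eitc_wt'] / table['recip_wt'].replace(0.0, np.nan)

print(table)
```

Replacing zero recipient counts with NaN before dividing plays the same role as the `if recips > 0` test in the loop above: bins with no recipients show `nan` instead of raising a division error.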