TaxCalculator Utilities
taxcalc.utils
PUBLIC low-level utility functions for TaxCalculator.

taxcalc.utils.add_income_table_row_variable(dframe, income_measure, bin_edges)
Add a variable called ‘table_row’ to the specified Pandas DataFrame, dframe. This variable specifies the table row. The rows are defined by the specified bin_edges function argument. Note that the bin groupings are LEFT INCLUSIVE, which means that bin_edges=[1,2,3,4] implies these three bin groupings: [1,2), [2,3), [3,4).
 Parameters
dframe (Pandas DataFrame) – the object to which we are adding bins
income_measure (String) – specifies income variable used to construct bins
bin_edges (list of scalar bin edges) – edges of the income bins that define the table rows
 Returns
dframe – the original input plus the added ‘table_row’ column
 Return type
Pandas DataFrame
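The LEFT INCLUSIVE rule matches pandas pd.cut with right=False, which closes each interval on the left. The following is a minimal sketch under that assumption and is not the taxcalc source. The helper name label_income_bins and the sample data are hypothetical.

```python
import pandas as pd

def label_income_bins(dframe, income_measure, bin_edges):
    """Return a copy of dframe with a left-inclusive 'table_row' column.

    Hypothetical helper illustrating the binning rule; not taxcalc's code.
    """
    out = dframe.copy()
    # right=False makes every interval closed on the left: [a, b)
    out['table_row'] = pd.cut(out[income_measure], bin_edges, right=False)
    return out

df = pd.DataFrame({'expanded_income': [1.0, 1.5, 2.0, 3.9, 4.0]})
binned = label_income_bins(df, 'expanded_income', [1, 2, 3, 4])
print(binned['table_row'].cat.codes.tolist())  # [0, 0, 1, 2, -1]
```

A value equal to the last edge (4.0 here) falls outside every left-inclusive bin, so the sketch leaves it missing (category code -1).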

taxcalc.utils.add_quantile_table_row_variable(dframe, income_measure, num_quantiles, pop_quantiles=False, decile_details=False, weight_by_income_measure=False)
Add a variable called ‘table_row’ to the specified Pandas DataFrame, dframe. This variable specifies the table row.
When weight_by_income_measure=False, the rows hold an equal number of people if pop_quantiles=True or an equal number of filing units if pop_quantiles=False.
When weight_by_income_measure=True, the rows hold an equal number of income dollars.
This function assumes that the specified dframe contains columns for the specified income_measure and for sample weights, s006. When pop_quantiles=True, it must also contain the number of exemptions, XTOT.
When num_quantiles is 10 and decile_details is True, the bottom decile is broken up into three subgroups (neg, zero, and pos income_measure). The top decile is also broken into three subgroups (90-95, 95-99, and top 1%).
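Equal-weight quantile rows can be built by sorting units on income_measure and then accumulating a weight. That weight counts one of three things:

- filing units (s006)
- people (s006 times XTOT)
- income dollars (s006 times income_measure)

The sketch below assumes non-negative weights and omits the decile_details subgroups. The function name is hypothetical. It illustrates the idea rather than reproducing taxcalc's code, which may handle ties and boundaries differently.

```python
import numpy as np
import pandas as pd

def weighted_quantile_rows(dframe, income_measure, num_quantiles,
                           pop_quantiles=False, weight_by_income_measure=False):
    """Return a copy of dframe with an integer 'table_row' in [0, num_quantiles).

    Hypothetical illustration of equal-weight quantile groups (no decile_details).
    """
    out = dframe.copy()
    if weight_by_income_measure:
        wght = out['s006'] * out[income_measure]  # income dollars
    elif pop_quantiles:
        wght = out['s006'] * out['XTOT']  # people
    else:
        wght = out['s006']  # filing units
    # sort units by income, then accumulate their weights
    order = np.argsort(out[income_measure].to_numpy(), kind='stable')
    w_sorted = wght.to_numpy(dtype=float)[order]
    cum = np.cumsum(w_sorted)
    # place each unit by the midpoint of its cumulative-weight interval
    mid = (cum - w_sorted / 2.0) / cum[-1]
    rows_sorted = np.clip(np.floor(mid * num_quantiles).astype(int),
                          0, num_quantiles - 1)
    rows = np.empty_like(rows_sorted)
    rows[order] = rows_sorted  # map back to the original row order
    out['table_row'] = rows
    return out

df = pd.DataFrame({'expanded_income': [10.0, 40.0, 20.0, 30.0],
                   's006': [1.0, 1.0, 1.0, 1.0],
                   'XTOT': [3, 1, 1, 1]})
print(weighted_quantile_rows(df, 'expanded_income', 2)['table_row'].tolist())
# [0, 1, 0, 1]
print(weighted_quantile_rows(df, 'expanded_income', 2,
                             pop_quantiles=True)['table_row'].tolist())
# [0, 1, 1, 1]
```

Counting people instead of filing units moves the 20.0 and 30.0 units into the upper half. The lowest-income unit alone accounts for three of the six people.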

taxcalc.utils.atr_graph_data(vdf, year, mars='ALL', atr_measure='combined', pop_quantiles=False)
Prepare average tax rate data needed by xtr_graph_plot utility function.
 Parameters
vdf (a Pandas DataFrame object containing variables and tax liabilities) – (See Calculator.atr_graph method for required elements of vdf.)
year (integer) – specifies calendar year of the data in vdf
mars (integer or string) –
specifies which filing status subgroup to show in the graph
‘ALL’: include all filing units in sample
1: include only single filing units
2: include only married-filing-jointly filing units
3: include only married-filing-separately filing units
4: include only head-of-household filing units
atr_measure (string) –
specifies which average tax rate to show on graph’s y axis
‘itax’: average individual income tax rate
‘ptax’: average payroll tax rate
‘combined’: sum of average income and payroll tax rates
pop_quantiles (boolean) – specifies whether or not quantiles contain an equal number of people (True) or an equal number of filing units (False)
 Returns
dictionary object suitable for passing to xtr_graph_plot utility function
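All three average tax rates above share the same income denominator, so the ‘combined’ rate equals the sum of the ‘itax’ and ‘ptax’ rates. The sketch below shows that calculation for a single group of filing units. It rests on several assumptions:

- The column names MARS, iitax and payrolltax are assumptions.
- So is the use of s006-weighted sums.
- The per-percentile grouping is omitted.

The function and mapping names are hypothetical. This is not a description of taxcalc's code.

```python
import math
import pandas as pd

# Assumed mapping from atr_measure to tax-liability columns in vdf
TAX_COLUMNS = {
    'itax': ['iitax'],
    'ptax': ['payrolltax'],
    'combined': ['iitax', 'payrolltax'],
}

def group_average_tax_rate(vdf, income_col, mars='ALL', atr_measure='combined'):
    """Weighted average tax rate for filing units matching mars (sketch)."""
    if mars != 'ALL':
        vdf = vdf[vdf['MARS'] == mars]  # assumed filing-status column
    tax = vdf[TAX_COLUMNS[atr_measure]].sum(axis=1)
    weighted_tax = (tax * vdf['s006']).sum()
    weighted_income = (vdf[income_col] * vdf['s006']).sum()
    return weighted_tax / weighted_income

vdf = pd.DataFrame({'MARS': [1, 2, 2],
                    'expanded_income': [100.0, 200.0, 300.0],
                    'iitax': [10.0, 20.0, 30.0],
                    'payrolltax': [5.0, 10.0, 15.0],
                    's006': [1.0, 1.0, 2.0]})
combined = group_average_tax_rate(vdf, 'expanded_income')
parts = (group_average_tax_rate(vdf, 'expanded_income', atr_measure='itax')
         + group_average_tax_rate(vdf, 'expanded_income', atr_measure='ptax'))
print(math.isclose(combined, parts))  # True
```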

taxcalc.utils.bootstrap_se_ci(data, seed, num_samples, statistic, alpha)
Return a dictionary containing the bootstrap estimate of the standard error of statistic and the bootstrap estimate of the 100*(1-2*alpha)% confidence interval for statistic. The dictionary also contains the specified seed, num_samples (B) and alpha.
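A percentile bootstrap works in four steps:

1. Resample data with replacement B times.
2. Evaluate statistic on each resample.
3. Report the standard deviation of those values as the standard error.
4. Report their alpha and 1-alpha quantiles as the interval bounds.

Cutting alpha from each tail is why the interval covers 100*(1-2*alpha)%. The sketch below is a minimal NumPy version. The function name and the 'se', 'cilo' and 'cihi' keys are hypothetical. taxcalc's own implementation may draw samples or select quantiles differently.

```python
import numpy as np

def bootstrap_se_ci_sketch(data, seed, num_samples, statistic, alpha):
    """Percentile-bootstrap SE and 100*(1-2*alpha)% CI of statistic (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # num_samples resamples of size n, drawn with replacement
    idx = rng.integers(0, n, size=(num_samples, n))
    stats = np.array([statistic(data[row]) for row in idx])
    return {
        'seed': seed,
        'B': num_samples,
        'alpha': alpha,
        'se': float(np.std(stats, ddof=1)),
        'cilo': float(np.quantile(stats, alpha)),
        'cihi': float(np.quantile(stats, 1.0 - alpha)),
    }

data = np.arange(100, dtype=float)
# alpha=0.025 in each tail gives a 95% interval
est = bootstrap_se_ci_sketch(data, seed=1, num_samples=500,
                             statistic=np.mean, alpha=0.025)
print(est['se'], est['cilo'], est['cihi'])
```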

taxcalc.utils.ce_aftertax_expanded_income(df1, df2, custom_params=None, require_no_agg_tax_change=True)
Return a dictionary that contains the certainty-equivalent of the expected utility of after-tax expanded income. The certainty-equivalent is computed for several constant-relative-risk-aversion parameter values, for each of two Pandas DataFrame objects:

- df1, which represents the pre-reform situation
- df2, which represents the post-reform situation

Both DataFrame objects must contain ‘s006’, ‘combined’ and ‘expanded_income’ columns.
IMPORTANT NOTES: These normative welfare calculations are very simple. Utility is assumed to be a function of consumption only, and consumption is assumed to equal after-tax income. As a result, any assumed responses that change work effort do not affect utility via the corresponding change in leisure. Likewise, any saving response to changes in after-tax income does not affect consumption.
The cmin value is the consumption level below which marginal utility is considered to be constant. This allows the handling of filing units with very low or even negative after-tax expanded income in the expected-utility and certainty-equivalent calculations.

taxcalc.utils.certainty_equivalent(exputil, crra, cmin)
Calculate and return the certainty-equivalent of exputil of consumption, assuming an isoelastic utility function with crra and cmin as parameters.
 Parameters
exputil (float) – expected utility value
crra (nonnegative float) – constant relative risk aversion parameter of isoelastic utility function
cmin (positive float) – consumption level below which marginal utility is assumed to be constant
 Returns
certainty-equivalent of specified expected utility, exputil
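The certainty-equivalent is the sure consumption level whose utility equals exputil, so computing it means inverting the utility function. This minimal sketch assumes the standard isoelastic form u(c) = c**(1-crra)/(1-crra), with log utility when crra == 1. Below cmin it is extended linearly with slope u'(cmin) = cmin**(-crra). The names utility and certainty_equivalent_sketch are hypothetical, and this is not taxcalc's source.

```python
import math

def utility(c, crra, cmin):
    """Isoelastic utility with constant marginal utility below cmin."""
    def u_iso(x):
        return math.log(x) if crra == 1.0 else x ** (1.0 - crra) / (1.0 - crra)
    if c >= cmin:
        return u_iso(c)
    # linear segment: slope equals marginal utility at cmin, cmin**(-crra)
    return u_iso(cmin) + cmin ** (-crra) * (c - cmin)

def certainty_equivalent_sketch(exputil, crra, cmin):
    """Invert utility(): return the consumption c with utility(c) == exputil."""
    if crra == 1.0:
        u_at_cmin = math.log(cmin)
    else:
        u_at_cmin = cmin ** (1.0 - crra) / (1.0 - crra)
    if exputil >= u_at_cmin:
        # invert the isoelastic branch
        if crra == 1.0:
            return math.exp(exputil)
        return (exputil * (1.0 - crra)) ** (1.0 / (1.0 - crra))
    # invert the linear branch below cmin
    return cmin + (exputil - u_at_cmin) / cmin ** (-crra)

# round trip: certainty-equivalent of a sure utility recovers the consumption
for crra in (0.0, 0.5, 1.0, 2.0):
    for c in (-10.0, 0.5, 5.0, 100.0):
        ce = certainty_equivalent_sketch(utility(c, crra, 1.0), crra, 1.0)
        assert math.isclose(ce, c, rel_tol=1e-9), (crra, c, ce)
```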

taxcalc.utils.create_diagnostic_table(dframe_list, year_list)
Extract diagnostic table from list of Pandas DataFrame objects returned from a Calculator dataframe(DIST_VARIABLES) call for each year in the specified list of years.
 Parameters
dframe_list (list of Pandas DataFrame objects) – DataFrame objects containing the variables
year_list (list of calendar years) – calendar years corresponding to the dframe_list
 Returns
Pandas DataFrame object containing the diagnostic table

taxcalc.utils.create_difference_table(vdf1, vdf2, groupby, tax_to_diff, pop_quantiles=False)
Get results from two different vdf objects, construct tax difference results, and return the difference statistics as a table.
 Parameters
vdf1 (Pandas DataFrame including columns named in DIFF_VARIABLES list) – for example, object returned from a dataframe(DIFF_VARIABLES) call on the baseline Calculator object
vdf2 (Pandas DataFrame including columns in the DIFF_VARIABLES list) – for example, object returned from a dataframe(DIFF_VARIABLES) call on the reform Calculator object
groupby (String object) – options for input: ‘weighted_deciles’, ‘standard_income_bins’ or ‘soi_agi_bins’; determines how the rows in the resulting Pandas DataFrame are sorted
tax_to_diff (String object) – options for input: ‘iitax’, ‘payrolltax’, ‘combined’; specifies which tax to difference
pop_quantiles (boolean) – specifies whether or not weighted_deciles contain an equal number of people (True) or an equal number of filing units (False)
 Returns
difference table as a Pandas DataFrame with DIFF_TABLE_COLUMNS and groupby rows.
NOTE: when groupby is ‘weighted_deciles’, the returned table has three extra rows containing top-decile detail. These rows hold statistics for:

- the 0.90-0.95 quantile range (bottom half of top decile)
- the 0.95-0.99 quantile range
- the 0.99-1.00 quantile range (top one percent)

The returned table also splits the bottom decile into filing units with negative, zero and positive values of the specified income_measure. These are denoted by the 0-10n, 0-10z and 0-10p row labels, respectively.
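At its core, a difference table is a weighted comparison of the two vdf objects, one row group at a time. The sketch below makes three assumptions:

- The two DataFrames describe the same filing units in the same order.
- vdf2 already carries a ‘table_row’ column.
- s006 holds the sampling weights.

The function name and the output columns tot_change and mean_change are hypothetical. They are not the DIFF_TABLE_COLUMNS names.

```python
import pandas as pd

def tax_diff_by_row(vdf1, vdf2, tax_to_diff):
    """Weighted total and mean tax change for each 'table_row' group (sketch)."""
    diff = pd.DataFrame({
        'table_row': vdf2['table_row'],
        's006': vdf2['s006'],
        # reform minus baseline liability, unit by unit
        'tax_diff': vdf2[tax_to_diff] - vdf1[tax_to_diff],
    })
    diff['wdiff'] = diff['tax_diff'] * diff['s006']
    grouped = diff.groupby('table_row')
    return pd.DataFrame({
        'tot_change': grouped['wdiff'].sum(),
        'mean_change': grouped['wdiff'].sum() / grouped['s006'].sum(),
    })

vdf1 = pd.DataFrame({'iitax': [100.0, 200.0, 300.0],
                     's006': [1.0, 3.0, 2.0],
                     'table_row': [0, 0, 1]})
vdf2 = vdf1.assign(iitax=[110.0, 200.0, 330.0])
print(tax_diff_by_row(vdf1, vdf2, 'iitax'))
```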

taxcalc.utils.create_distribution_table(vdf, groupby, income_measure, pop_quantiles=False, scaling=True)
Get results from vdf, sort them by expanded_income based on groupby, and return them as a table.
 Parameters
vdf (Pandas DataFrame including columns named in DIST_TABLE_COLUMNS list) – for example, an object returned from the distribution_table_dataframe function in the Calculator distribution_tables method
groupby (String object) – options for input: ‘weighted_deciles’, ‘standard_income_bins’ or ‘soi_agi_bins’; determines how the rows in the resulting Pandas DataFrame are sorted
income_measure (String object) – options for input: ‘expanded_income’ or ‘expanded_income_baseline’; determines which variable is used to sort rows
pop_quantiles (boolean) – specifies whether or not weighted_deciles contain an equal number of people (True) or an equal number of filing units (False)
scaling (boolean) – specifies whether or not table entry values are scaled
 Returns
distribution table as a Pandas DataFrame with DIST_TABLE_COLUMNS and groupby rows.
NOTE: when groupby is ‘weighted_deciles’, the returned table has three extra rows containing top-decile detail. These rows hold statistics for:

- the 0.90-0.95 quantile range (bottom half of top decile)
- the 0.95-0.99 quantile range
- the 0.99-1.00 quantile range (top one percent)

The returned table also splits the bottom decile into filing units with negative, zero and positive values of the specified income_measure. These are denoted by the 0-10n, 0-10z and 0-10p row labels, respectively.

taxcalc.utils.expected_utility(consumption, probability, crra, cmin)
Calculate and return expected utility of consumption.
 Parameters
consumption (numpy array) – consumption for each filing unit
probability (numpy array) – sampling probability of each filing unit
crra (nonnegative float) – constant relative risk aversion parameter of isoelastic utility function
cmin (positive float) – consumption level below which marginal utility is assumed to be constant
 Returns
expected utility of consumption array
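Expected utility here is the probability-weighted average of each filing unit's utility. This vectorized NumPy sketch makes two assumptions:

- It uses the isoelastic form with a linear segment below cmin.
- The probabilities sum to one. For example, a caller might build them as s006 divided by its total; that construction is itself an assumption.

The function name is hypothetical.

```python
import numpy as np

def expected_utility_sketch(consumption, probability, crra, cmin):
    """Probability-weighted mean of isoelastic utility with a linear tail below cmin."""
    c = np.asarray(consumption, dtype=float)
    p = np.asarray(probability, dtype=float)
    cc = np.maximum(c, cmin)  # keep the isoelastic branch away from c < cmin
    if crra == 1.0:
        u_iso = np.log(cc)
        u_at_cmin = np.log(cmin)
    else:
        u_iso = cc ** (1.0 - crra) / (1.0 - crra)
        u_at_cmin = cmin ** (1.0 - crra) / (1.0 - crra)
    # below cmin, marginal utility stays at its cmin value, cmin**(-crra)
    u_linear = u_at_cmin + cmin ** (-crra) * (c - cmin)
    u = np.where(c >= cmin, u_iso, u_linear)
    return float(np.sum(p * u))

print(expected_utility_sketch([-5.0, 10.0, 20.0], [0.2, 0.3, 0.5], 0.0, 1.0))
```

With crra = 0, utility is linear in consumption. The result is therefore the probability-weighted mean consumption, 12, up to floating-point rounding.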

taxcalc.utils.get_sums(dframe)
Compute unweighted sum of items in each column of Pandas DataFrame, dframe.
 Returns
Pandas Series object containing column sums indexed by dframe column names.

taxcalc.utils.isoelastic_utility_function(consumption, crra, cmin)
Calculate and return utility of consumption.
 Parameters
consumption (float) – consumption for a filing unit
crra (nonnegative float) – constant relative risk aversion parameter
cmin (positive float) – consumption level below which marginal utility is assumed to be constant
 Returns
utility of consumption
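Written out with ρ = crra, the standard isoelastic form with a linear extension below cmin is shown below. The linear piece is an assumption, consistent with the description of cmin as the level below which marginal utility is constant. Its slope c_min^{-ρ} is the marginal utility at cmin, so u is continuous and has a continuous first derivative there.

```latex
u(c) =
\begin{cases}
  \dfrac{c^{\,1-\rho}}{1-\rho}, & c \ge c_{\min},\ \rho \ne 1,\\[6pt]
  \ln c, & c \ge c_{\min},\ \rho = 1,\\[6pt]
  u(c_{\min}) + c_{\min}^{-\rho}\,(c - c_{\min}), & c < c_{\min},
\end{cases}
\qquad
u'(c) =
\begin{cases}
  c^{-\rho}, & c \ge c_{\min},\\
  c_{\min}^{-\rho}, & c < c_{\min}.
\end{cases}
```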

taxcalc.utils.json_to_dict(json_text)
Convert specified JSON text into an ordered Python dictionary.
 Parameters
json_text (string) – JSON text.
 Raises
ValueError – if json_text contains a JSON syntax error.
 Returns
dictionary – JSON data expressed as an ordered Python dictionary.
 Return type
collections.OrderedDict
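Python's standard json module can build an ordered dictionary directly through the object_pairs_hook argument. Its JSONDecodeError is already a subclass of ValueError. A minimal sketch built on those two facts follows. The function name is hypothetical, and taxcalc's own error message may differ.

```python
import json
from collections import OrderedDict

def json_to_ordered_dict(json_text):
    """Parse JSON text into an OrderedDict; raise ValueError on a syntax error."""
    try:
        return json.loads(json_text, object_pairs_hook=OrderedDict)
    except json.JSONDecodeError as err:
        # JSONDecodeError subclasses ValueError; re-raise with a short message
        raise ValueError(f'JSON syntax error at line {err.lineno}: {err.msg}') from err

params = json_to_ordered_dict('{"b": 1, "a": [1, 2]}')
print(list(params))  # ['b', 'a']
```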

taxcalc.utils.mtr_graph_data(vdf, year, mars='ALL', mtr_measure='combined', mtr_variable='e00200p', alt_e00200p_text='', mtr_wrt_full_compen=False, income_measure='expanded_income', pop_quantiles=False, dollar_weighting=False)
Prepare marginal tax rate data needed by xtr_graph_plot utility function.
 Parameters
vdf (a Pandas DataFrame object containing variables and marginal tax rates) – (See Calculator.mtr_graph method for required elements of vdf.)
year (integer) – specifies calendar year of the data in vdf
mars (integer or string) –
specifies which filing status subgroup to show in the graph
‘ALL’: include all filing units in sample
1: include only single filing units
2: include only married-filing-jointly filing units
3: include only married-filing-separately filing units
4: include only head-of-household filing units
mtr_measure (string) –
specifies which marginal tax rate to show on graph’s y axis
‘itax’: marginal individual income tax rate
‘ptax’: marginal payroll tax rate
‘combined’: sum of marginal income and payroll tax rates
mtr_variable (string) – any string in the Calculator.VALID_MTR_VARS set; specifies the variable to change in order to compute marginal tax rates
alt_e00200p_text (string) – text to use in place of mtr_variable when mtr_variable is ‘e00200p’; if empty string then use ‘e00200p’
mtr_wrt_full_compen (boolean) – see documentation of Calculator.mtr() argument wrt_full_compensation (value has an effect only if mtr_variable is ‘e00200p’)
income_measure (string) –
specifies which income variable to show on the graph’s x axis
‘wages’: wage and salary income (e00200)
‘agi’: adjusted gross income, AGI (c00100)
‘expanded_income’: sum of AGI, nontaxable interest income, nontaxable social security benefits, and employer share of FICA taxes.
pop_quantiles (boolean) – specifies whether or not quantiles contain an equal number of people (True) or an equal number of filing units (False)
dollar_weighting (boolean) – False implies both income_measure percentiles on x axis and mtr values for each percentile on the y axis are computed without using dollar income_measure weights (just sampling weights); True implies both income_measure percentiles on x axis and mtr values for each percentile on the y axis are computed using dollar income_measure weights (in addition to sampling weights). Specifying True produces a graph x axis that shows income_measure (not filing unit) percentiles.
 Returns
dictionary object suitable for passing to xtr_graph_plot utility function
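The effect of dollar_weighting on the y-axis values can be sketched as a weighted mean. With dollar_weighting False the weights are the sampling weights alone; with it True they are the sampling weights multiplied by income_measure. The function and argument names are hypothetical, and the per-percentile grouping is omitted.

```python
import numpy as np

def weighted_mean_mtr(mtr, s006, income, dollar_weighting=False):
    """Mean marginal tax rate using sampling (and optionally dollar) weights."""
    weights = np.asarray(s006, dtype=float)
    if dollar_weighting:
        # each unit counts in proportion to its income_measure dollars
        weights = weights * np.asarray(income, dtype=float)
    return float(np.sum(np.asarray(mtr, dtype=float) * weights) / np.sum(weights))

mtr = [0.1, 0.3]
print(weighted_mean_mtr(mtr, [1.0, 1.0], [100.0, 300.0]))
print(weighted_mean_mtr(mtr, [1.0, 1.0], [100.0, 300.0], dollar_weighting=True))
```

The two calls give about 0.2 and 0.25. Dollar weighting pulls the average toward the higher-income unit's rate.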

taxcalc.utils.pch_graph_data(vdf, year, pop_quantiles=False)
Prepare the percentage change in after-tax expanded income data needed by pch_graph_plot utility function.
 Parameters
vdf (a Pandas DataFrame object containing variables) – (See Calculator.pch_graph method for required elements of vdf.)
year (integer) – specifies calendar year of the data in vdf
pop_quantiles (boolean) – specifies whether or not quantiles contain an equal number of people (True) or an equal number of filing units (False)
 Returns
dictionary object suitable for passing to pch_graph_plot utility function
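One way to compute the percentage change for a group of filing units uses weighted totals of after-tax expanded income before and after reform. This sketch assumes s006 sampling weights and uses hypothetical names. It is not taxcalc's code and omits the percentile grouping.

```python
import numpy as np

def pct_change_aftertax(pre, post, s006):
    """Percentage change in weighted total after-tax income (sketch)."""
    w = np.asarray(s006, dtype=float)
    pre_total = np.sum(np.asarray(pre, dtype=float) * w)
    post_total = np.sum(np.asarray(post, dtype=float) * w)
    return 100.0 * (post_total - pre_total) / pre_total

print(pct_change_aftertax([100.0, 200.0], [110.0, 220.0], [1.0, 1.0]))  # 10.0
```

Working from totals gives higher-income units more influence than averaging unit-by-unit percentage changes would.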

taxcalc.utils.pch_graph_plot(data, width=850, height=500, xlabel='', ylabel='', title='')
Plot percentage change in after-tax expanded income using data returned from the pch_graph_data function.
 Parameters
data (dictionary object returned from pch_graph_data() utility function) –
width (integer) – width of plot expressed in pixels
height (integer) – height of plot expressed in pixels
xlabel (string) – x-axis label; if ‘’, then use label generated by pch_graph_data
ylabel (string) – y-axis label; if ‘’, then use label generated by pch_graph_data
title (string) – graph title; if ‘’, then use title generated by pch_graph_data
 Returns
bokeh.plotting figure object containing a raster graphics plot
Notes
See Notes to xtr_graph_plot function.

taxcalc.utils.read_egg_csv(fname, index_col=None)
Read from egg the file named fname that contains CSV data, and return a Pandas DataFrame containing the data.

taxcalc.utils.read_egg_json(fname)
Read from egg the file named fname that contains JSON data, and return a dictionary containing the data.

taxcalc.utils.unweighted_sum(dframe, col_name)
Return unweighted sum of Pandas DataFrame col_name items.

taxcalc.utils.weighted_sum(dframe, col_name)
Return weighted sum of Pandas DataFrame col_name items.
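Both unweighted_sum and weighted_sum reduce to one-line pandas expressions. This sketch assumes, as elsewhere in this module, that sampling weights are stored in the s006 column. The _sketch function names are hypothetical.

```python
import pandas as pd

def unweighted_sum_sketch(dframe, col_name):
    """Plain sum of the col_name column."""
    return float(dframe[col_name].sum())

def weighted_sum_sketch(dframe, col_name):
    """Sum of col_name items, each multiplied by its s006 sampling weight."""
    return float((dframe[col_name] * dframe['s006']).sum())

df = pd.DataFrame({'iitax': [1.0, 2.0, 3.0], 's006': [10.0, 20.0, 30.0]})
print(unweighted_sum_sketch(df, 'iitax'), weighted_sum_sketch(df, 'iitax'))  # 6.0 140.0
```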

taxcalc.utils.write_graph_file(figure, filename, title)
Write HTML file named filename containing figure. The title is the text displayed in the browser tab.
 Parameters
figure (bokeh.plotting figure object) – the figure to write to the HTML file
filename (string) – name of HTML file to which figure is written; should end in .html
title (string) – text displayed in browser tab when HTML file is displayed in browser
 Returns
Nothing

taxcalc.utils.xtr_graph_plot(data, width=850, height=500, xlabel='', ylabel='', title='', legendloc='bottom_right')
Plot marginal/average tax rate graph using data returned from either the mtr_graph_data function or the atr_graph_data function.
 Parameters
data (dictionary object returned from ?tr_graph_data() utility function) –
width (integer) – width of plot expressed in pixels
height (integer) – height of plot expressed in pixels
xlabel (string) – x-axis label; if ‘’, then use label generated by ?tr_graph_data
ylabel (string) – y-axis label; if ‘’, then use label generated by ?tr_graph_data
title (string) – graph title; if ‘’, then use title generated by ?tr_graph_data
legendloc (string) – options: ‘top_right’, ‘top_left’, ‘bottom_left’, ‘bottom_right’; specifies location of the legend in the plot
 Returns
bokeh.plotting figure object containing a raster graphics plot
Notes
USAGE EXAMPLE:
    import bokeh.io as bio
    import bokeh.plotting as bp
    gdata = mtr_graph_data(...)
    gplot = xtr_graph_plot(gdata)
THEN when working interactively in a Python notebook:
    bp.show(gplot)
OR when executing a script using the Python command-line interpreter:
    bio.output_file('graphname.html', title='?TR by Income Percentile')
    bio.show(gplot)  # OR bio.save(gplot), WHICH WILL JUST WRITE THE FILE TO DISK
WHICH WILL VISUALIZE THE GRAPH IN A BROWSER AND WRITE THE GRAPH TO THE SPECIFIED HTML FILE
To convert the visualized graph into a PNG-formatted file, click on the “Save” icon on the toolbar, located in the top-right corner of the visualized graph. A PNG-formatted file will be written to your Download directory.
The ONLY output option the bokeh.plotting figure has is HTML format, which (as described above) can be converted into a PNG-formatted raster graphics file. There is no option to make the bokeh.plotting figure generate a vector graphics file such as an EPS file.