Commit efb4f7a
committed: address merge conflict
2 parents df4e48a + f716ee6

19 files changed

Lines changed: 502 additions & 2986 deletions

Makefile

Lines changed: 19 additions & 9 deletions

@@ -121,16 +121,26 @@ puf_stage3/puf_ratios.csv: puf_stage3/stage3.py \
 	cd puf_stage3 ; python stage3.py
 
 .PHONY=cps-files
-cps-files: cps_data/cps.csv.gz \
+cps-files: cps_data/pycps/cps_raw.csv.gz \
 	cps_stage1/stage_2_targets.csv \
 	cps_stage2/cps_weights.csv.gz
 
-cps_data/cps.csv.gz: cps_data/finalprep.py \
-	cps_data/cps_raw.csv.gz \
-	cps_data/adjustment_targets.csv \
-	cps_data/benefitprograms.csv
-	cd cps_data ; python finalprep.py && \
-	gunzip cps.csv.gz && gzip -n cps.csv
+cps_data/pycps/cps_raw.csv.gz: cps_data/pycps/create.py \
+	cps_data/pycps/benefits.py \
+	cps_data/pycps/filing_rules.json \
+	cps_data/pycps/finalprep.py \
+	cps_data/pycps/helpers.py \
+	cps_data/pycps/impute.py \
+	cps_data/pycps/pycps.py \
+	cps_data/pycps/splitincome.py \
+	cps_data/pycps/targeting.py \
+	cps_data/pycps/taxunit.py \
+	cps_data/pycps/template.txt \
+	cps_data/pycps/transform_sas.py \
+	cps_data/pycps/adjustment_targets.csv \
+	cps_data/benefitprograms.csv
+	cd cps_data/pycps ; python create.py && \
+	gunzip cps.csv.gz && gzip -n cps.csv
 
 cps_stage1/stage_2_targets.csv: cps_stage1/stage1.py \
 	cps_stage1/SOI_estimates.csv \
@@ -140,11 +150,11 @@ cps_stage1/stage_2_targets.csv: cps_stage1/stage1.py \
 
 cps_stage2/cps_weights.csv.gz: cps_stage2/stage2.py \
 	cps_stage2/solve_lp_for_year.py \
-	cps_data/cps_raw.csv.gz \
+	cps_data/pycps/cps_raw.csv.gz \
 	puf_stage1/Stage_I_factors.csv \
 	cps_stage1/stage_2_targets.csv
 	cd cps_stage2 ; python stage2.py && \
-	gunzip cps_weights.csv.gz && gzip -n cps_weights.csv
+	gunzip cps_weights.csv.gz && gzip -n cps_weights.csv
 
 .PHONY=all
 all: puf-files cps-files
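The recipes above end with `gunzip … && gzip -n …`. The `-n` flag is what keeps regenerated `.gz` outputs diff-friendly in git; a minimal sketch (hypothetical stand-in data, not the real `cps.csv`) of the effect:

```shell
# By default gzip embeds the original file's name and mtime in the archive
# header, so recompressing identical content can still yield different bytes.
# With -n both are omitted, so the .gz output is byte-identical across runs.
set -e
workdir=$(mktemp -d)
cd "$workdir"
printf 'RECID,WT\n1,100\n' > cps.csv   # stand-in for the generated csv
gzip -n cps.csv
sum1=$(cksum cps.csv.gz)
gunzip cps.csv.gz
touch cps.csv                          # change the mtime between runs
gzip -n cps.csv
sum2=$(cksum cps.csv.gz)
[ "$sum1" = "$sum2" ] && echo "byte-identical"
```

Without `-n`, the second checksum would differ because of the embedded timestamp, producing a spurious diff every time make rebuilds the file.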

README.md

Lines changed: 36 additions & 7 deletions

@@ -35,6 +35,21 @@ sample data being used. But the weights, ratios, and benefits files
 do depend on the data file, so they are different in the two sets of
 data files.
 
+Installation
+------------
+
+Currently, the only way to install `taxdata` is to clone the git
+repo locally.
+```
+git clone https://github.com/PSLmodels/taxdata.git
+```
+Next, navigate to the directory and install the `taxdata-dev` conda environment:
+```
+cd taxdata
+conda env create -f environment.yml
+```
+To run the scripts that produce `puf.csv` and `cps.csv.gz`, activate the
+`taxdata-dev` conda environment and follow the workflow laid out below.
 
 Data-Preparation Documentation and Workflow
 -------------------------------------------
@@ -51,10 +66,8 @@ running the `make help` command. If you want more background on the
 make utility and makefiles, search for Internet links with the
 keywords `makefile` and `automate`.
 
-Please see the [Makefile Documentation](MAKEFILE_doc.md) for more information
-
-Note that the stage2 linear program that generates the weights file is
-very long-running, taking five or more hours depending on your
+Note that the stage2 linear program that generates the weights file for the PUF
+is very long-running, taking five or more hours depending on your
 computer's CPU speed. We are considering options for speeding up this
 stage2 work, but for the time being you can execute `make puf-files`
 and `make cps-files` in separate terminal windows to have the two
@@ -73,6 +86,14 @@ the corresponding files in the Tax-Calculator directory tree) without
 actually doing the file copies. At the terminal command-prompt in the
 top-level taxdata directory, execute `./csvcopy.sh` to get help.
 
+### Example
+
+To create `cps.csv.gz`, run
+```
+conda activate taxdata-dev
+make cps-files
+```
+
 
 Contributing to taxdata Repository
 ----------------------------------
@@ -83,9 +104,17 @@ to make sure your proposed code is consistent with the repository's
 coding style and then run `make pytest` to ensure that all the tests
 pass.
 
+Disclaimer
+----------
+
+`taxdata` is under continuous development. As such, results will change as the
+underlying data and logic improve.
+
 
 Contributors
 ------------
-- Anderson Frailey
-- John O'Hare
-- Amy Xu
+
+A full list of contributors on GitHub can be found
+[here](https://github.com/PSLmodels/taxdata/graphs/contributors). John O'Hare
+of Quantria Strategies has also made significant contributions to the
+development of `taxdata`.

cps_data/README.md

Lines changed: 83 additions & 8 deletions

@@ -1,17 +1,92 @@
 About cps_data
 ==============
 
-This directory contains the following script:
+This directory contains the python scripts used to create `cps.csv.gz`. You
+can run all of the scripts with the command `python create.py`. By default,
+you will get a CPS file composed of the 2013, 2014, and 2015 March CPS
+Supplemental files. If you would like to use another combination of the 2013,
+2014, 2015, 2016, 2017, and 2018 files, there are two ways to do so.
 
-* Python script **finalprep.py**, which reads/writes:
+1. You can modify `create.py` by adding the `cps_files` argument to the
+`create()` function call at the bottom of the file to specify which files you
+would like to use. For example, to use the 2016, 2017, and 2018 files, the
+function call would now be
+```python
+if __name__ == "__main__":
+    create(
+        exportcsv=False, exportpkl=True, exportraw=False, validate=False,
+        benefits=True, verbose=True, cps_files=[2016, 2017, 2018]
+    )
+```
 
-Input files:
-- cps_raw.csv.gz
-- adjustment_targets.csv
-- benefitprograms.csv
+2. You could write a separate python file that imports the `create()` function
+and calls it in the same way as above.
 
-Output files:
-- cps.csv
+## Input Files
+With the exception of the CPS March Supplements, all input files can be found
+in the `pycps/data` directory.
+
+### CPS March Supplements
+* asec2013_pubuse.dat
+* asec2014_pubuse_tax_fix_5x8_2017.dat
+* asec2015_pubuse.dat
+* asec2016_pubuse.dat
+* asec2017_pubuse.dat
+* asec2018_pubuse.dat
+
+### C-TAM Benefit Imputations
+
+Note that we only have C-TAM imputations for the 2013, 2014, and 2015 files.
+For other years, we just use the benefit program information in the CPS.
+* Housing_Imputation_logreg_2013.csv
+* Housing_Imputation_logreg_2014.csv
+* Housing_Imputation_logreg_2015.csv
+* medicaid2013.csv
+* medicaid2014.csv
+* medicaid2015.csv
+* medicare2013.csv
+* medicare2014.csv
+* medicare2015.csv
+* otherbenefitprograms.csv
+* SNAP_Imputation_2013.csv
+* SNAP_Imputation_2014.csv
+* SNAP_Imputation_2015.csv
+* SS_augmentation_2013.csv
+* SS_augmentation_2014.csv
+* SS_augmentation_2015.csv
+* SSI_Imputation2013.csv
+* SSI_Imputation2014.csv
+* SSI_Imputation2015.csv
+* TANF_Imputation_2013.csv
+* TANF_Imputation_2014.csv
+* TANF_Imputation_2015.csv
+* UI_imputation_logreg_2013.csv
+* UI_imputation_logreg_2014.csv
+* UI_imputation_logreg_2015.csv
+* VB_Imputation2013.csv
+* VB_Imputation2014.csv
+* VB_Imputation2015.csv
+* WIC_imputation_children_logreg_2013.csv
+* WIC_imputation_children_logreg_2014.csv
+* WIC_imputation_children_logreg_2015.csv
+* WIC_imputation_infants_logreg_2013.csv
+* WIC_imputation_infants_logreg_2014.csv
+* WIC_imputation_infants_logreg_2015.csv
+* WIC_imputation_women_logreg_2013.csv
+* WIC_imputation_women_logreg_2014.csv
+* WIC_imputation_women_logreg_2015.csv
+
+### Imputation Parameters
+
+These parameters are used in the imputations found in `pycps/impute.py`:
+* logit_beta.csv
+* ols_betas.csv
+
+## Output Files
+
+Only `cps.csv.gz` is included in the repository due to the size of `cps_raw.csv.gz`.
+* cps.csv.gz
+* cps_raw.csv.gz
 
 
 Documentation
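Option 2 in the cps_data README above (a separate python file that imports `create()`) could look like the sketch below. The file name `make_cps.py` and the guarded import are hypothetical conveniences so the snippet is self-contained; the keyword arguments mirror the example shown in the diff.

```python
# make_cps.py -- hypothetical driver script placed in cps_data/pycps.
# It imports create() from the repo's create.py and requests specific
# CPS March Supplement years via the cps_files argument.
try:
    from create import create  # provided by taxdata's cps_data/pycps
except ImportError:            # not running inside the repo checkout
    create = None

if __name__ == "__main__" and create is not None:
    create(
        exportcsv=False, exportpkl=True, exportraw=False, validate=False,
        benefits=True, verbose=True, cps_files=[2016, 2017, 2018],
    )
```

Run it from the `cps_data/pycps` directory (with the `taxdata-dev` environment active) so the relative imports and data paths resolve.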
