Commit 08261c0
Fix major underreporting bug in CA HCD data
This increases CA permits in recent years by quite a bit:
2018: 94,154 -> 97,963
2019: 101,411 -> 104,872
2020: 92,925 -> 99,806
2021: 115,382 -> 121,651
2022: 117,134 -> 127,964
2023: 111,421 -> 123,861
2024: 93,206 -> 104,306
The cause was very-low-income (non-deed-restricted) and low-income (deed-restricted)
projects being dropped. The technical reason was very dumb: those columns had a mix of
numbers and strings, which gives them the object dtype, and .sum(numeric_only=True)
silently drops any column that isn't a numeric dtype.
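A minimal reproduction of the failure mode on a toy DataFrame. The column names are assumptions modeled on BP_ABOVE_MOD_INCOME and may not match the real HCD schema. Because the first two columns mix ints and strings, pandas stores them as object dtype, and numeric_only=True skips them without any warning:

```python
import pandas as pd

# Toy stand-in for the APR table. Column names are assumed, not taken from the real data.
df = pd.DataFrame({
    "BP_VLOW_INCOME_NDR": [10, "5", 3],   # mixed int/str -> object dtype
    "BP_LOW_INCOME_DR": [2, 4, "n/a"],    # mixed int/str -> object dtype
    "BP_MOD_INCOME_NDR": [1, 1, 1],       # clean ints -> int64 dtype
})

# numeric_only=True keeps only numeric-dtype columns, so the two
# object-dtype columns vanish from the result instead of raising an error.
totals = df.sum(numeric_only=True)
print(totals)
```

Only BP_MOD_INCOME_NDR appears in the result, which is exactly the silent underreporting described above.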
I'm not sure how long this has been an issue. I think when I added the
pd.to_numeric(df["BP_ABOVE_MOD_INCOME"], errors="coerce") logic, above moderate income
was the only column that had strings, so I only cast that column to numeric.
But at some point these other two columns also started including string values.
If I wanted to, I could look through the history of housing-data-data to figure out
exactly when.
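One way to avoid a repeat is to coerce every income column rather than only the one currently known to contain strings, and to check afterwards that each one really is numeric. This is a sketch under assumptions: coerce_income_columns and INCOME_COLS are hypothetical names, and the column list is illustrative. It also counts how many present values failed to parse, since errors="coerce" turns those into NaN and sum() then skips them:

```python
import pandas as pd

# Hypothetical column list; the real APR dataset may use different names.
INCOME_COLS = ["BP_VLOW_INCOME_NDR", "BP_LOW_INCOME_DR", "BP_ABOVE_MOD_INCOME"]


def coerce_income_columns(
    df: pd.DataFrame, cols: list[str]
) -> tuple[pd.DataFrame, dict[str, int]]:
    """Cast every listed column to numeric and report how many values were lost."""
    out = df.copy()
    coerced = {}
    for col in cols:
        converted = pd.to_numeric(out[col], errors="coerce")
        # Entries that were present but unparseable are now NaN.
        coerced[col] = int((converted.isna() & out[col].notna()).sum())
        out[col] = converted
    return out, coerced


df = pd.DataFrame({
    "BP_VLOW_INCOME_NDR": [10, "5", 3],
    "BP_LOW_INCOME_DR": [2, 4, "n/a"],
    "BP_ABOVE_MOD_INCOME": ["7", 1, 1],
})

clean, n_coerced = coerce_income_columns(df, INCOME_COLS)
# Guard: no income column may remain object dtype before summing.
assert all(pd.api.types.is_numeric_dtype(clean[c]) for c in INCOME_COLS)
totals = clean[INCOME_COLS].sum()
print(totals)
print(n_coerced)
```

Reporting the coerced count matters because coercion trades one silent drop (a whole column) for a smaller one (individual unparseable cells).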
There is still a discrepancy between my numbers and the CA HCD dashboard's numbers
(https://www.hcd.ca.gov/housing-open-data-tools/apr-dashboard). Their numbers are still
significantly higher:
2018: 114,513
2019: 121,428
2020: 112,314
2021: 134,619
2022: 136,578
2023: 133,568
2024: 114,930
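To put the two sets of figures side by side, the short script below uses only the numbers quoted in this commit message. It computes how many permits the fix recovered each year and how large the remaining gap to the HCD dashboard is:

```python
# Yearly permit totals copied from this commit message.
before_fix = {2018: 94_154, 2019: 101_411, 2020: 92_925, 2021: 115_382,
              2022: 117_134, 2023: 111_421, 2024: 93_206}
after_fix = {2018: 97_963, 2019: 104_872, 2020: 99_806, 2021: 121_651,
             2022: 127_964, 2023: 123_861, 2024: 104_306}
hcd_dashboard = {2018: 114_513, 2019: 121_428, 2020: 112_314, 2021: 134_619,
                 2022: 136_578, 2023: 133_568, 2024: 114_930}

recovered = {y: after_fix[y] - before_fix[y] for y in after_fix}
gap = {y: hcd_dashboard[y] - after_fix[y] for y in after_fix}

for year in sorted(after_fix):
    share = gap[year] / hcd_dashboard[year]
    print(f"{year}: fix recovered {recovered[year]:,}; "
          f"remaining gap {gap[year]:,} ({share:.1%} of dashboard total)")
```

Every year still shows a gap of roughly 8,600 to 16,600 permits after the fix.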
I'll keep digging into the remaining discrepancy. It is most likely caused by my
duplicate-filtering logic (which filters out duplicate rows from projects that received
both a permit and a certificate of occupancy (COO) in the same year). Hopefully that logic
is still filtering out duplicates correctly and not unintentionally dropping real permit rows.
1 parent cf2a8eb
1 file changed
Lines changed: 9 additions & 6 deletions