# python-censusbatchgeocoder

A simple Python wrapper for the [U.S. Census Geocoding Services API batch service](https://www.documentcloud.org/documents/3894452-Census-Geocoding-Services-API.html).

## Links

* Docs: [palewi.re/docs/censusbatchgeocoder](http://palewi.re/docs/censusbatchgeocoder)
* Issues: [github.com/datadesk/python-censusbatchgeocoder/issues](https://github.com/datadesk/python-censusbatchgeocoder/issues)
* Packaging: [pypi.python.org/pypi/censusbatchgeocoder](https://pypi.python.org/pypi/censusbatchgeocoder)
* Testing: [travis-ci.org/datadesk/python-censusbatchgeocoder](https://travis-ci.org/datadesk/python-censusbatchgeocoder)
* Coverage: [coveralls.io/r/datadesk/python-censusbatchgeocoder](https://coveralls.io/r/datadesk/python-censusbatchgeocoder)

## Installation

```bash
$ pip install censusbatchgeocoder
```

## Basic usage

Import the library:

```python
import censusbatchgeocoder
```

According to the [official Census documentation](https://www.documentcloud.org/documents/3894452-Census-Geocoding-Services-API.html), the input is expected to contain the following fields:

* ``id``: Your unique identifier for the record
* ``address``: Structure number and street name (required)
* ``city``: City name (required)
* ``state``: State (optional)
* ``zipcode``: ZIP Code (optional)

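Before sending a file off, it can help to confirm that its header row has the columns listed above. A minimal sketch using only the standard library; `missing_columns` is a hypothetical helper (not part of censusbatchgeocoder), and treating `id` as required is an assumption, since the list above marks only `address` and `city` as required.

```python
import csv
import io

# `address` and `city` are marked required in the list above.
# Treating `id` as required too is an assumption of this sketch.
REQUIRED_COLUMNS = {"id", "address", "city"}


def missing_columns(csv_file):
    """Return the required column names absent from a CSV file's header row.

    Hypothetical helper for illustration; not part of censusbatchgeocoder.
    """
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])  # fieldnames is None for an empty file
    return sorted(REQUIRED_COLUMNS - header)


good = io.StringIO("id,address,city,state,zipcode\n1,1600 Pennsylvania Ave NW,Washington,DC,20006\n")
bad = io.StringIO("id,street,town\n1,1600 Pennsylvania Ave NW,Washington\n")
print(missing_columns(good))  # []
print(missing_columns(bad))   # ['address', 'city']
```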
You can geocode a comma-delimited file from the filesystem. Results are returned as a list of dictionaries.

An input file might look like this:

```text
id,address,city,state,zipcode
1,1600 Pennsylvania Ave NW,Washington,DC,20006
2,202 W. 1st Street,Los Angeles,CA,90012
```

Pass its path to ``geocode``:

```python
results = censusbatchgeocoder.geocode("./my_file.csv")
```

Alongside your input columns, the results include the following columns from the Census:

* ``id``: The unique id provided with the record.
* ``returned_address``: The address of the match returned by the geocoder.
* ``geocoded_address``: The address as it was submitted to the geocoder.
* ``is_match``: Whether or not the geocoder found a match (``"Match"`` in the examples below).
* ``is_exact``: The precision of the match (``"Exact"`` or ``"Non_Exact"`` in the examples below).
* ``coordinates``: The longitude and latitude of the match together in a string.
* ``longitude``: The longitude of the match as a float.
* ``latitude``: The latitude of the match as a float.
* ``tiger_line``: The Census TIGER line of the match.
* ``side``: The side of the Census TIGER line of the match.
* ``state_fips``: The FIPS state code identifying the state of the match.
* ``county_fips``: The FIPS county code identifying the county of the match.
* ``tract``: The Census tract of the match.
* ``block``: The Census block of the match.

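Because `is_match` and `is_exact` come back as strings, a typical next step is to separate usable matches from the rest. A minimal sketch, assuming rows shaped like the example output in this README; the records are trimmed copies of those examples plus a hypothetical unmatched third record, and `split_by_match` is a hypothetical helper, not a library function.

```python
def split_by_match(results):
    """Separate geocoded rows into matches and non-matches using `is_match`.

    Hypothetical helper; any value other than "Match" counts as unmatched.
    """
    matched = [row for row in results if row.get("is_match") == "Match"]
    unmatched = [row for row in results if row.get("is_match") != "Match"]
    return matched, unmatched


results = [
    {"id": "1", "is_match": "Match", "is_exact": "Non_Exact", "latitude": 38.898754, "longitude": -77.03535},
    {"id": "2", "is_match": "Match", "is_exact": "Exact", "latitude": 34.053005, "longitude": -118.24456},
    # Hypothetical third record; "No_Match" is assumed here as the non-match value.
    {"id": "3", "is_match": "No_Match"},
]

matched, unmatched = split_by_match(results)
# Keep only the matches the Census flags as exact.
exact = [row["id"] for row in matched if row.get("is_exact") == "Exact"]
print([row["id"] for row in matched])    # ['1', '2']
print([row["id"] for row in unmatched])  # ['3']
print(exact)                             # ['2']
```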
For the example file above, ``results`` looks like this:

```python
print(results)
[
    {
        "address": "1600 Pennsylvania Ave NW",
        "block": "1031",
        "city": "Washington",
        "coordinates": "-77.03535,38.898754",
        "county_fips": "001",
        "geocoded_address": "1600 Pennsylvania Ave NW, Washington, DC, 20006",
        "id": "1",
        "is_exact": "Non_Exact",
        "is_match": "Match",
        "latitude": 38.898754,
        "longitude": -77.03535,
        "returned_address": "1600 PENNSYLVANIA AVE NW, WASHINGTON, DC, 20502",
        "side": "L",
        "state": "DC",
        "state_fips": "11",
        "tiger_line": "76225813",
        "tract": "006202",
        "zipcode": "20006",
    },
    {
        "address": "202 W. 1st Street",
        "block": "1034",
        "city": "Los Angeles",
        "coordinates": "-118.24456,34.053005",
        "county_fips": "037",
        "geocoded_address": "202 W. 1st Street, Los Angeles, CA, 90012",
        "id": "2",
        "is_exact": "Exact",
        "is_match": "Match",
        "latitude": 34.053005,
        "longitude": -118.24456,
        "returned_address": "202 W 1ST ST, LOS ANGELES, CA, 90012",
        "side": "L",
        "state": "CA",
        "state_fips": "06",
        "tiger_line": "141618115",
        "tract": "207400",
        "zipcode": "90012",
    },
]
```

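The `state_fips`, `county_fips`, `tract` and `block` values are kept as strings, which preserves leading zeros such as county `001`. Concatenated in that order they form the Census block GEOID (2 + 3 + 6 + 4 = 15 digits), a common key for joining to other Census data. A minimal sketch using the first example record; `block_geoid` and `tract_geoid` are hypothetical helpers, not part of the library.

```python
def block_geoid(row):
    """Concatenate codes into a 15-digit block GEOID: state(2) + county(3) + tract(6) + block(4).

    Hypothetical helper; assumes the codes are zero-padded strings as in the example output.
    """
    return row["state_fips"] + row["county_fips"] + row["tract"] + row["block"]


def tract_geoid(row):
    """The first 11 digits (state + county + tract) identify the Census tract."""
    return row["state_fips"] + row["county_fips"] + row["tract"]


row = {"state_fips": "11", "county_fips": "001", "tract": "006202", "block": "1031"}
print(block_geoid(row))  # 110010062021031
print(tract_geoid(row))  # 11001006202
```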
Any extra metadata fields included in the file are still present in the returned data.

So the ``my_metadata`` column here...

```text
id,address,city,state,zipcode,my_metadata
1,1600 Pennsylvania Ave NW,Washington,DC,20006,foo
2,202 W. 1st Street,Los Angeles,CA,90012,bar
```

...is still there after you geocode.

```python
censusbatchgeocoder.geocode("./my_file.csv")
[
    {
        "address": "1600 Pennsylvania Ave NW",
        "block": "1031",
        "city": "Washington",
        "coordinates": "-77.03535,38.898754",
        "county_fips": "001",
        "geocoded_address": "1600 Pennsylvania Ave NW, Washington, DC, 20006",
        "id": "1",
        "is_exact": "Non_Exact",
        "is_match": "Match",
        "latitude": 38.898754,
        "longitude": -77.03535,
        "returned_address": "1600 PENNSYLVANIA AVE NW, WASHINGTON, DC, 20502",
        "my_metadata": "foo",
        "side": "L",
        "state": "DC",
        "state_fips": "11",
        "tiger_line": "76225813",
        "tract": "006202",
        "zipcode": "20006",
    },
    {
        "address": "202 W. 1st Street",
        "block": "1034",
        "city": "Los Angeles",
        "coordinates": "-118.24456,34.053005",
        "county_fips": "037",
        "geocoded_address": "202 W. 1st Street, Los Angeles, CA, 90012",
        "id": "2",
        "is_exact": "Exact",
        "is_match": "Match",
        "latitude": 34.053005,
        "longitude": -118.24456,
        "returned_address": "202 W 1ST ST, LOS ANGELES, CA, 90012",
        "my_metadata": "bar",
        "side": "L",
        "state": "CA",
        "state_fips": "06",
        "tiger_line": "141618115",
        "tract": "207400",
        "zipcode": "90012",
    },
]
```

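The library keeps these columns for you. If you ever need to reattach your own columns to geocoder output yourself, for instance after geocoding only a subset of rows, a join on `id` is enough. A minimal sketch of that idea; `merge_by_id` is hypothetical and is not taken from the censusbatchgeocoder source, and the geocoded rows are trimmed from the example output above.

```python
def merge_by_id(original_rows, geocoded_rows):
    """Attach geocoder fields to each input row, matching on `id`.

    Input columns (including extras like `my_metadata`) are kept and
    geocoder fields are layered on top. Hypothetical helper.
    """
    geocoded = {row["id"]: row for row in geocoded_rows}
    merged = []
    for row in original_rows:
        combined = dict(row)  # copy so the caller's dicts are left untouched
        combined.update(geocoded.get(row["id"], {}))
        merged.append(combined)
    return merged


original = [
    {"id": "1", "address": "1600 Pennsylvania Ave NW", "my_metadata": "foo"},
    {"id": "2", "address": "202 W. 1st Street", "my_metadata": "bar"},
]
geocoded = [
    {"id": "1", "is_match": "Match", "tract": "006202"},
    {"id": "2", "is_match": "Match", "tract": "207400"},
]
for row in merge_by_id(original, geocoded):
    print(row["id"], row["my_metadata"], row["tract"])
# 1 foo 006202
# 2 bar 207400
```

Rows whose `id` has no geocoded counterpart pass through with only their original columns.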
#### Custom column names

If your column headers do not exactly match those expected by the geocoder, map them with keyword arguments.

For example, a file like this:

```text
foo,bar,baz,bada,boom
1,521 SWARTHMORE AVENUE,PACIFIC PALISADES,CA,90272-4350
2,2015 W TEMPLE STREET,LOS ANGELES,CA,90026-4913
```

can be mapped like this:

```python
censusbatchgeocoder.geocode(
    "./my_file.csv", id="foo", address="bar", city="baz", state="bada", zipcode="boom"
)
```

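The keyword arguments above are the simplest route. If you would rather normalize the headers yourself, for example before using the list-of-dictionaries input described in this README, renaming dict keys is enough. A minimal sketch; `COLUMN_MAP` and `rename_columns` are hypothetical names, and the sample rows are the ones shown above.

```python
import csv
import io

# Hypothetical mapping from this file's odd headers to the default column names.
COLUMN_MAP = {"foo": "id", "bar": "address", "baz": "city", "bada": "state", "boom": "zipcode"}


def rename_columns(rows, column_map):
    """Return new dicts with keys renamed per `column_map`; unmapped keys are kept as-is."""
    return [{column_map.get(key, key): value for key, value in row.items()} for row in rows]


weird_csv = io.StringIO(
    "foo,bar,baz,bada,boom\n"
    "1,521 SWARTHMORE AVENUE,PACIFIC PALISADES,CA,90272-4350\n"
    "2,2015 W TEMPLE STREET,LOS ANGELES,CA,90026-4913\n"
)
rows = rename_columns(csv.DictReader(weird_csv), COLUMN_MAP)
print(rows[0]["address"])  # 521 SWARTHMORE AVENUE
```

The renamed `rows` use the default column names, so they should be accepted by `censusbatchgeocoder.geocode` without keyword overrides.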
#### Optional columns

The state and ZIP Code columns are optional. If your data doesn't have them, pass ``None`` for the ``state`` and ``zipcode`` keyword arguments.

```python
censusbatchgeocoder.geocode("./my_file.csv", state=None, zipcode=None)
```

#### Lists of dictionaries

A list of dictionaries, like those created by the csv module's ``DictReader``, can also be geocoded.

```python
my_list = [
    {
        "address": "521 SWARTHMORE AVENUE",
        "city": "PACIFIC PALISADES",
        "id": "1",
        "state": "CA",
        "zipcode": "90272-4350",
    },
    {
        "address": "2015 W TEMPLE STREET",
        "city": "LOS ANGELES",
        "id": "2",
        "state": "CA",
        "zipcode": "90026-4913",
    },
]
censusbatchgeocoder.geocode(my_list)
```

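Building that list with `csv.DictReader` yourself is handy when you want to filter rows before they are sent. A minimal sketch using an in-memory string in place of a file; with a real file you would use `open(path, newline="")` instead. The blank-address filter is an illustrative choice, not something the library requires.

```python
import csv
import io

csv_text = (
    "id,address,city,state,zipcode\n"
    "1,521 SWARTHMORE AVENUE,PACIFIC PALISADES,CA,90272-4350\n"
    "2,2015 W TEMPLE STREET,LOS ANGELES,CA,90026-4913\n"
)

# DictReader yields one dict per data row, keyed by the header row.
my_list = list(csv.DictReader(io.StringIO(csv_text)))

# Illustrative filter: skip rows with no street address.
to_geocode = [row for row in my_list if row["address"].strip()]
print(my_list[1]["city"])  # LOS ANGELES
```

`to_geocode` could then be passed to `censusbatchgeocoder.geocode`.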
#### pandas DataFrames

You can geocode a pandas DataFrame by converting it into a list of dictionaries.

```python
result = censusbatchgeocoder.geocode(df.to_dict("records"))
```

Then convert the result back into a DataFrame.

```python
import pandas as pd
result_df = pd.DataFrame(result)
```

That's it.

#### File objects

You can also geocode an in-memory file object of data in CSV format.

```python
import io
import censusbatchgeocoder

my_data = """id,address,city,state,zipcode
1,1600 Pennsylvania Ave NW,Washington,DC,20006
2,202 W. 1st Street,Los Angeles,CA,90012"""
censusbatchgeocoder.geocode(io.StringIO(my_data))
```

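If the CSV text is generated in code rather than typed out, `csv.DictWriter` can write it into the `io.StringIO` buffer. After writing, the buffer's position sits at the end, so this minimal sketch rewinds it with `seek(0)` to let a reader start at the header row. The `records` list reuses the example addresses from this README.

```python
import csv
import io

records = [
    {"id": "1", "address": "1600 Pennsylvania Ave NW", "city": "Washington", "state": "DC", "zipcode": "20006"},
    {"id": "2", "address": "202 W. 1st Street", "city": "Los Angeles", "state": "CA", "zipcode": "90012"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "address", "city", "state", "zipcode"])
writer.writeheader()
writer.writerows(records)
buffer.seek(0)  # rewind so the next reader starts at the header row
```

The rewound `buffer` could then be passed to `censusbatchgeocoder.geocode` in the same way as the `io.StringIO` object above.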
#### Different encodings

If you are using Python 2 and your CSV file has an unusual encoding that's causing problems, try passing the encoding name explicitly.

```python
censusbatchgeocoder.geocode("./my_file.csv", encoding="utf-8-sig")
```
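
Why `utf-8-sig` can help: files saved with a UTF-8 byte-order mark (BOM), as some spreadsheet programs write them, decode under plain `utf-8` with an invisible `\ufeff` prefix on the first header, so a lookup of the `id` column would miss. This Python 3 sketch uses only the standard library and no geocoding; it just shows the two decodings side by side.

```python
import csv
import io

# Encoding with "utf-8-sig" prepends the BOM bytes, mimicking such a file.
raw = "id,address,city\n1,1600 Pennsylvania Ave NW,Washington\n".encode("utf-8-sig")

plain = csv.DictReader(io.StringIO(raw.decode("utf-8")))
stripped = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))

print(plain.fieldnames[0] == "\ufeffid")  # True: the BOM sticks to the first header
print(stripped.fieldnames[0])             # id
```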