Creating a Makefile allows others to recreate necessary components of your work flow while managing unecessary steps.
###Why do this now?
The Makefile process can be useful when working with shapefiles. Instead of carrying large .shp files around in a repo, we'll give anyone the opportunity to create the files.
###Makefile terminology and symbols
Pseudo code for each section of a Makefile is as follows.
Target: dependency
commands to run
The following variables come in handy when writing make files.
$@ = target
$(dir $@) = target directory
$(notdir $@) = target filename
$< = dependency
###Example
For example, we could create a target named somePath/someFile.txt that depends on first building stuff.
stuff:
echo $@
echo $(notdir $@)
somePath/someFile.txt: stuff
echo $@
echo $(dir $@)
echo $(notdir $@)
echo $<
To see this example execute, run the following script from the command line within the gis_tools directory.
make somePath/someFile.txt
The terminology for shapefiles can be a little confusing, but the main ideas will be discussed below.
First, let's get some data. To download a zip file containing county data and then upzip it in a new data directory, please run the following script from the command line within the gis_tools directory.
make data/tl_2013_us_county.shp
When it finishes, you can view the files in the gis_tools/data directory. These files were downloaded from the US Census Bureau.
- Click for more information about the census data used in this example.
We'll use python's fiona and shapely packages to view the contents of the shapefile and determine geometric properties of the corresponding features.
#!/usr/bin/env python
import pprint
import fiona
with fiona.open("data/tl_2013_us_county.shp") as fc:
print "File type:",fc.driver
print "Schema:",pprint.pprint(fc.schema)
print "Number of records:", len(fc)
print "Bounds of all records:", fc.bounds #visit http://boundingbox.klokantech.com/ to view these coords
print "Additional info (coordinate reference system):", fc.crs
print "Field type map:"
pprint.pprint(fiona.FIELD_TYPES_MAP)
records=list(fc)
###Features
- The collection (fc) contains 3,234 records or features.
- Each feature is a Python dict structured exactly like a GeoJSON Feature.
- Each feature starts and ends with the same point (eg. complete polygon)
- Each feature (in this example) contains information about a specific county.
- Click for more information about
fiona. - Click for codes related to the features' 'properites'.
- Click for codes specific to county features (page A-80).
#!/usr/bin/env python
import fiona
import pprint
pprint.pprint(records[0])
print
print
print "###########################"
print "alternative method below"
print "###########################"
print
print
with fiona.open("data/tl_2013_us_county.shp") as fc:
first=True
for record in fc:
if first:
print pprint.pprint(record)
break
###Analyzing features
- use Python's 'shapely' package.
- Click for documentation on evaluating if shapes intersect.
- Click for documentation on evaluating if points are contained within shapes.
import fiona
from shapely.geometry import Point, shape, Polygon, box
#print records[0]
#print shape(records[0]["geometry"]) # one feature converted to shapley polygon object
print "{} Shapley object: {}".format(records[0]['properties']['NAMELSAD'],box(*shape(records[0]["geometry"]).bounds))
print
shp1 = box(*shape(records[0]["geometry"]).bounds)
print "{} Shapley object: {}".format(records[1]['properties']['NAMELSAD'],box(*shape(records[1]["geometry"]).bounds))
print
shp2 = box(*shape(records[1]["geometry"]).bounds)
print "Does {} intersect {}?".format(records[0]['properties']['NAMELSAD'],records[1]['properties']['NAMELSAD'])
print "{}".format(shp1.intersects(shp2))
print
print "Is (-123.4688,46.2674) in {}?".format(records[0]['properties']['NAMELSAD'])
p1=Point(-123.4688,46.2674)
print "{}".format(shape(records[0]["geometry"]).contains(p1))
print
print "Is (-123.4688,46.2674) in {}?".format(records[1]['properties']['NAMELSAD'])
print "{}".format(shape(records[1]["geometry"]).contains(p1))
print
print "Create a list of centroids for every county."
centroids=[]
with fiona.open("data/tl_2013_us_county.shp") as fc:
with open("data/centroids.txt","wb") as cntr:
for feature in fc:
pt=(float(feature["properties"]["INTPTLON"]),float(feature["properties"]["INTPTLAT"]))
centroids.append(pt)
cntr.write(str(pt)+'\n')
print "{} centroids created".format(len(centroids))
###Utilities Two tools that can be handy when working with shapefiles:
- ogr2ogr - part of GDAL that merges multiple .shp files into a single GeoJson file (among many other features).
- topojson - shrinks GeoJson files to a convenient format.
Example that uses both tools.
#Using rev_geo.py This utility takes coordinates and returns location information, but requires the user to point at the tl_2013_us_county.shp file.
- currently only available for US state/county info
###Standard options The options are rather limited at this point.
- points outside of the US are currently excluded.
- considerations:
1. general solution for an efficiency grid indpendent of shapefile used.
usage: rev_geo.py [-h] [-b GRID_BOUNDARIES] [-d DELTA] [-g]
[-s SHAPE_FILE_PATH] [-t]
[file_name]
Reverse geo coder returns location info given a set of lon,lat
positional arguments:
file_name Input file name (optional).
optional arguments:
-h, --help show this help message and exit
-b GRID_BOUNDARIES, --bounding-box GRID_BOUNDARIES
Set bounding box for region to include (default:
[-185,15,-65,70])
-d DELTA, --delta DELTA
Set the number of degrees between grid coords
(default: 5)
-g, --use-saved-grid Save grid or use previously saved version in
data/grid.json
-s SHAPE_FILE_PATH, --shape-file-path SHAPE_FILE_PATH
Set shapefile path (default:
data/tl_2013_us_county.shp)
-t, --tweet-input Set input as tweet payload instead of coordinates (in
progress)
###Standard output The general form of the output includes the following:
- county: str [county name]
- centroid: (longitude, latitude) [center of county]
- coords: (longitude, latitude) [specific coords passed to rev_geo.py]
- GEOID: int(5) [state_code + county_code]
{"county": "Wahkiakum",
"centroid": [-123.4244583, 46.2946377],
"coords": [-123.4244583, 46.2946377],
"GEOID": "53069"}
###Example script
$head data/centroids.txt | ./rev_geo.py -g > info.json
$cat info.json
{"county": "Cuming", "centroid": [-96.7885168, 41.9158651], "coords": [-96.7885168, 41.9158651], "GEOID": "31039"}
{"county": "Wahkiakum", "centroid": [-123.4244583, 46.2946377], "coords": [-123.4244583, 46.2946377], "GEOID": "53069"}
{"county": "De Baca", "centroid": [-104.3686961, 34.3592729], "coords": [-104.3686961, 34.3592729], "GEOID": "35011"}
{"county": "Lancaster", "centroid": [-96.6886584, 40.7835474], "coords": [-96.6886584, 40.7835474], "GEOID": "31109"}
{"county": "Nuckolls", "centroid": [-98.0468422, 40.1764918], "coords": [-98.0468422, 40.1764918], "GEOID": "31129"}
{"county": "Las Piedras", "centroid": [-65.871189, 18.1871483], "coords": [-65.871189, 18.1871483], "GEOID": "72085"}
{"county": "Minnehaha", "centroid": [-96.7957261, 43.6674723], "coords": [-96.7957261, 43.6674723], "GEOID": "46099"}
{"county": "Menard", "centroid": [-99.8539896, 30.8843655], "coords": [-99.8539896, 30.8843655], "GEOID": "48327"}
{"county": "Sierra", "centroid": [-120.5219926, 39.5769252], "coords": [-120.5219926, 39.5769252], "GEOID": "06091"}
{"county": "Clinton", "centroid": [-85.1534262, 36.7288647], "coords": [-85.1534262, 36.7288647], "GEOID": "21053"}