Approx median fixes#20
Conversation
|
Added a quick fix for #24 |
| .. code-block:: python | ||
|
|
||
| >>> moe_example = [ | ||
| dict(min=math.nan, max=9999, n=6, moe=1), |
There was a problem hiding this comment.
If we're going to advise leaving first min and the last max null, we need to be totally consistent and do that in every single example. I also think that we should the default built in None type variable rather than nan so that we're hewing as closely as possible to the standard Python library.
| Estimate a median and approximate the margin of error. Follows the U.S. Census Bureau's official guidelines for estimation. Useful for generating medians for measures like household income and age when aggregating census geographies. | ||
|
|
||
| Expects a list of dictionaries that divide the full range of data values into continuous categories. Each dictionary should have three keys: | ||
| Expects a list of dictionaries that divide the full range of data values into continuous categories. Each dictionary should have three keys with an optional fourth key for margin of error inputs: |
There was a problem hiding this comment.
When we first describe this input. We need to be emphatically clear that the first min and the last max must be None type objects and briefly explain why.
| dict(min=200000, max=math.nan, n=18, moe=10) | ||
| ] | ||
| >>> import numpy | ||
| >>> numpy.random.seed(711355) |
There was a problem hiding this comment.
I don't think we need the seed in example unless we are the point about reproducibility.
| Find the value for the dataset you are estimating by referring to `the bureau's reference material <https://www.census.gov/programs-surveys/acs/technical-documentation/pums/documentation.html>`_. | ||
|
|
||
| If you have an associated "jam values" for your dataset provided in the `American Community Survey's technical documentation <https://www.documentcloud.org/documents/6165752-2017-SummaryFile-Tech-Doc.html#document/p20/a508561>`_, input the pair as a list to the `jam_values` keyword argument. | ||
| Then if the median falls in the first or last bin, the jam value will be returned instead of `None`. |
There was a problem hiding this comment.
Add a jam values example here that does not use the simulation method.
| ] | ||
| >>> import numpy | ||
| >>> numpy.random.seed(711355) | ||
| >>> census_data_aggregator.approximate_median(moe_example, design_factor=1, sampling_percentage=5*2.5, simulations=50, jam_values=[2499, 200001]) |
There was a problem hiding this comment.
This example probably doesn't need the jam value inputs
| Jam values will not be used in the simulation approach. If the estimated median falls in the lower or upper bin, the estimate returned will be `None`. | ||
|
|
||
|
|
||
| .. code-block:: python |
There was a problem hiding this comment.
If we believe that the moe-based method is superior, we should make it the first and default example.
Deals with #17 and #18