Approx median fixes by sastoudt · Pull Request #20 · datadesk/census-data-aggregator

sastoudt · 2019-08-16T18:08:41Z

Deals with #17 and #18

allows moe inputs, triggers simulation of new moe
clears up use of jam values (returned instead of None when median falls in upper or lower bin)
plenty of warnings/errors to aid with jam value usage along with tests for these cases

sastoudt · 2019-08-27T17:23:27Z

Added a quick fix for #24
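
For context, issue #24 is titled "negative values from numpy.random.normal", and the fix commit is described as "take max of zero and simulated n". A minimal sketch of that clamp follows. It is not the code in this pull request: `simulate_count` is a hypothetical helper, the standard library's `random.Random.gauss` stands in for `numpy.random.normal`, and converting the margin of error to a standard error by dividing by 1.645 (the 90 percent confidence level used for ACS margins of error) is an assumption about how the draw is parameterized.

```python
import random


def simulate_count(n, moe, rng):
    """Draw one simulated count around n, clamped at zero (hypothetical helper)."""
    # Assumption: moe is a 90 percent margin of error, so moe / 1.645
    # approximates the standard error of the estimate.
    standard_error = moe / 1.645
    draw = rng.gauss(n, standard_error)
    # A small n with a large moe can yield a negative draw. A count cannot
    # be negative, so take the max of zero and the simulated value.
    return max(0, draw)


rng = random.Random(711355)
draws = [simulate_count(n=6, moe=10, rng=rng) for _ in range(1000)]
print(min(draws) >= 0)
```

Clamping keeps every simulated count usable, at the cost of piling a little probability mass at exactly zero for bins where the margin of error is large relative to the estimate.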

palewire · 2019-08-28T16:01:44Z

+.. code-block:: python
+
+     >>> moe_example = [
+            dict(min=math.nan, max=9999, n=6, moe=1),


If we're going to advise leaving the first min and the last max null, we need to be totally consistent and do that in every single example. I also think that we should use the built-in None type rather than nan so that we're hewing as closely as possible to the standard Python library.
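
A rough sketch of what a consistent example input could look like with `None` at both open ends. The first and last dictionaries reuse the values from the diff above. The middle bin and the `has_open_ends` helper are hypothetical and exist only for illustration. One practical side effect of `None` over `nan` is that it can be checked with a plain identity test, while `math.nan` never compares equal to itself.

```python
import math

moe_example = [
    dict(min=None, max=9999, n=6, moe=1),
    dict(min=10000, max=199999, n=50, moe=5),  # hypothetical middle bin
    dict(min=200000, max=None, n=18, moe=10),
]


def has_open_ends(ranges):
    """Return True if the first min and the last max are both None."""
    return ranges[0]["min"] is None and ranges[-1]["max"] is None


print(has_open_ends(moe_example))
# nan is awkward to test for because it is not equal to itself.
print(math.nan == math.nan)
```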

palewire · 2019-08-28T16:02:49Z

 Estimate a median and approximate the margin of error. Follows the U.S. Census Bureau's official guidelines for estimation. Useful for generating medians for measures like household income and age when aggregating census geographies.

-Expects a list of dictionaries that divide the full range of data values into continuous categories. Each dictionary should have three keys:
+Expects a list of dictionaries that divide the full range of data values into continuous categories. Each dictionary should have three keys with an optional fourth key for margin of error inputs:


When we first describe this input, we need to be emphatically clear that the first min and the last max must be None type objects and briefly explain why.

palewire · 2019-08-28T16:04:10Z

+            dict(min=200000, max=math.nan, n=18, moe=10)
+        ]
+     >>> import numpy
+     >>> numpy.random.seed(711355)


I don't think we need the seed in the example unless we are making a point about reproducibility.

palewire · 2019-08-28T16:04:47Z

+Find the value for the dataset you are estimating by referring to `the bureau's reference material <https://www.census.gov/programs-surveys/acs/technical-documentation/pums/documentation.html>`_.
+
+If you have an associated "jam values" for your dataset provided in the `American Community Survey's technical documentation <https://www.documentcloud.org/documents/6165752-2017-SummaryFile-Tech-Doc.html#document/p20/a508561>`_, input the pair as a list to the `jam_values` keyword argument. 
+Then if the median falls in the first or last bin, the jam value will be returned instead of `None`.


Add a jam values example here that does not use the simulation method.
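
As a starting point for that example, here is a self-contained sketch of the behavior the README text describes: when the median falls in the first or last bin, the matching jam value is returned instead of `None`. This is not the library's `approximate_median`; `median_with_jam` is a hypothetical stand-in that uses simple linear interpolation and skips the margin of error entirely, and the bin counts are made up. Only the jam pair `[2499, 200001]` is taken from the diff.

```python
def median_with_jam(ranges, jam_values=None):
    """Estimate a median from binned counts (hypothetical sketch, no moe)."""
    total = sum(r["n"] for r in ranges)
    midpoint = total / 2.0
    cumulative = 0
    # Walk the bins until the running count reaches the midpoint.
    for r in ranges:
        if cumulative + r["n"] >= midpoint:
            break
        cumulative += r["n"]
    # Open-ended bins have no finite bound to interpolate against,
    # so fall back to the jam value if one was supplied.
    if r["min"] is None:
        return jam_values[0] if jam_values else None
    if r["max"] is None:
        return jam_values[1] if jam_values else None
    # Otherwise interpolate linearly inside the bin.
    share = (midpoint - cumulative) / r["n"]
    return r["min"] + share * (r["max"] - r["min"])


# Hypothetical counts chosen so the median lands in the first bin.
low_heavy = [
    dict(min=None, max=2499, n=60),
    dict(min=2500, max=199999, n=20),
    dict(min=200000, max=None, n=10),
]
print(median_with_jam(low_heavy, jam_values=[2499, 200001]))
print(median_with_jam(low_heavy))
```

With the jam pair supplied the first call returns the lower jam value; without it the same input returns `None`, which is the distinction the README needs to show.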

palewire · 2019-08-28T16:05:15Z

+        ]
+     >>> import numpy
+     >>> numpy.random.seed(711355)
+     >>> census_data_aggregator.approximate_median(moe_example, design_factor=1, sampling_percentage=5*2.5, simulations=50, jam_values=[2499, 200001])


This example probably doesn't need the jam value inputs.

palewire · 2019-08-28T16:06:09Z

+Jam values will not be used in the simulation approach. If the estimated median falls in the lower or upper bin, the estimate returned will be `None`.
+

+.. code-block:: python


If we believe that the moe-based method is superior, we should make it the first and default example.

sastoudt added 9 commits August 14, 2019 16:34

moe and jam adds, need to work out edge cases

bea3b1e

nan to none and tests for cases that are working

5b0b35c

stop averaging jam values in moe, tests that reflect this change

0669165

add comments for simulation jam choices

83cb895

make different jam value warnings, tests for warnings

6128fe1

error if only one jam value given and two are needed, plus test

02af661

documentation

0a31f3c

add example to readme, spacing tests

742ad1e

update examples, consistent call of functions

438f8f8

This was referenced Aug 27, 2019

optional MOE input for approximate_median #18

Open

Correct handling of jam values in median approximation #17

Open

take max of zero and simulated n to avoid negative numbers

c910c0c

sastoudt mentioned this pull request Aug 27, 2019

negative values from numpy.random.normal #24

Open

palewire reviewed Aug 28, 2019


sastoudt added 3 commits August 28, 2019 17:16

fix none v. nan

7e211a6

documentation tweaks

c23809d

fix tests and bug in none handling

5cb87b9

Approx median fixes #20: sastoudt wants to merge 13 commits into datadesk:main from sastoudt:approxMedianFixes

2 participants
