Integrated the random allocation directory#414
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
arung54
left a comment
There was a problem hiding this comment.
Hi Moshe,
No comments on the logic / implementation. Most of these comments are about style (and stuff that we will probably have to address anyway to pass the style checks) which are hopefully easy to fix. Some of these could have been prevented if I shared our style guide in advance, apologies for only pointing you to it during the review! If you haven't already, I'd also suggest fixing any issues raised by:
$ flake8 random_allocation/*.py
$ pyink random_allocation/*.py
$ pydocstyle --convention=google --add-ignore=D101,D102,D103,D105,D202,D402 random_allocation/*.py
$ pylint random_allocation/*.py
$ pytype random_allocation/*.py -k
Some top-level comments:
- Since this library is already in a folder named 'random_allocation', consider renaming some of the libraries to be shorter (e.g. from 'random_allocation_convolution' to 'convolution') for brevity.
- See https://google.github.io/styleguide/pyguide.html#22-imports - a lot of these libraries are importing methods, whereas the style guide suggests importing modules and then calling the method from the package name. Renaming to shorter module names might help with doing this while maintaining brevity.
- For the test files, can you match the style of the other test files in dp_accounting here?:
- Use absltest instead of pytest
- Prefer absltest.TestCase or numpy.testing assertions over assert ..., since these will give more verbose failures.
- Put "Test" at the end of test class names rather than beginning
| """Represents the random-allocation mechanism. | ||
| In this event, each element in the dataset is independently allocated to k | ||
| uniformly sampled calls to the mechanism out of t total calls. |
There was a problem hiding this comment.
Reference to the PLD accounting paper could be useful here? (Will also get you more visibility on the paper :) )
| ) | ||
| self._pld = self._pld.compose(truncated_subsampled_gaussian_pld) | ||
| return None | ||
| elif isinstance(event, dp_event.RandomAllocationDpEvent): |
There was a problem hiding this comment.
Is it possible to add a simple test for this event? Similar to e.g "test_gaussian_basic" in pld_privacy_accountant_test.
| self._pld = self._pld.compose(truncated_subsampled_gaussian_pld) | ||
| return None | ||
| elif isinstance(event, dp_event.RandomAllocationDpEvent): | ||
| ra_event: dp_event.RandomAllocationDpEvent = event |
There was a problem hiding this comment.
Is the re-naming needed here? The isinstance() should let the interpreter know there is a subevent to access?
If it's just for clarity (e.g. to avoid calling it 'event.event') maybe better to rename the field in RandomAllocationDpEvent.
| assert_not_contains_named_tuples(reconstructed) | ||
| self.assertEqual(event, reconstructed) | ||
|
|
||
| def test_random_allocation_event_has_no_discretization_override(self): |
There was a problem hiding this comment.
Maybe a comment on why this test is necessary would be useful?
|
|
||
| import numpy as np | ||
| from dp_accounting.pld.common import compute_self_convolve_bounds | ||
| from numba import njit |
There was a problem hiding this comment.
We may want to keep numba an optional dependency in the overall library, while still allowing users who have it installed to benefit from the njit. Can you change to something like this?
try:
from numba import njit
_HAS_NUMBA = True
except ImportError:
_HAS_NUMBA = False
def optional_njit():
if _HAS_NUMBA:
return njit(cache=True)
else:
return lambda x: x
| if bound_type == BoundType.DOMINATES: | ||
| # Avoid inflating the grid when the target is finer than the original one. | ||
| effective_disc = max(realization.step, loss_discretization) | ||
| coarsened_base = rediscretize_dist( |
There was a problem hiding this comment.
Similar to another comment, here and elsewhere it seems like our type checker will guarantee a DenseDiscreteDist, in which case the additional type check might not be necessary (but maybe you want to check spacing anyway)
| ) -> DenseDiscreteDist: | ||
| """Prepare add-direction factors from a loss-space realization. | ||
|
|
||
| Algorithm 2 (`rand-alloc-add`) in Appendix C. |
| ) -> DenseDiscreteDist: | ||
| """Exponentiation by squaring-based self-convolution using a provided convolve function. | ||
|
|
||
| Algorithm 3 (`self-conv`) in Appendix C. |
|
|
||
|
|
||
| # ============================================================================= | ||
| # Internal Helper Functions |
There was a problem hiding this comment.
Move these to above their first usage, including reordering them (e.g. _expand_to_grid above _align_distributions_to_union_grid)
|
|
||
|
|
||
| def calc_pld_dual(realization: PLDRealization) -> PLDRealization: | ||
| """Compute the paper PLD dual ``D(L)`` (Definition 3.1). |
|
A few minor comments |
|
|
Added a new pld/random_allocation package which implements the accounting algorithm proposed in "Efficient privacy loss accounting for subsampling and random allocation".
It contains public builders for Gaussian and explicit-realization random allocation PLDs in random_allocation_api.py.
It also wires the feature into dp_accounting as RandomAllocationDpEvent in dp_event.py, and makes PLDAccountant handle Gaussian inner events in pld_privacy_accountant.py.