Add optional name parameter to PyMCStateSpace for multiple models#611
Add optional name parameter to PyMCStateSpace for multiple models#611opherdonchin wants to merge 5 commits intopymc-devs:mainfrom
Conversation
|
Friendly ping for review when someone has a moment — this PR implements the naming support requested in #602. Thanks! |
|
Thanks for the poke, this one slipped past me. Will review asap. Looks like pre-commit is failing though, start by clearing that. |
|
I’ve cleared the pre-commit.ci formatting issue (ruff-format is now clean). The remaining required check (all_tests) is currently awaiting maintainer approval, since this PR comes from a fork. Please let me know when you’d like the workflows to be approved/run. |
Sphinx/autodoc queries dunder attributes like __mro__ during documentation build, which was triggering UnsupportedDistributionError. Now raises AttributeError for dunder attributes to allow proper introspection.
|
Read the Docs failure is due to pymc_extras.prior.getattr attempting to resolve mro during Sphinx introspection. I pushed a small guard so dunder attributes raise AttributeError (no distribution lookup). RTD should now build cleanly. all_tests is still awaiting maintainer approval because this PR is from a fork. |
|
Bumping this. |
jessegrabowski
left a comment
There was a problem hiding this comment.
I like this idea a lot, but I think the implementation will be a lot cleaner after we merge #607 . I have two issues with this PR as it stands:
- I'm hesitant about the
register_variablemethod. It adds another bookkeeping dictionary, and I'm not 100% sure what the intended use-case is. I suspect we can accomplish what you want to without it, for example using deterministics, as in:
with pm.Model() as m:
x_for_model_1 = pm.Normal('x_for_model_1', ...)
x_for_model_2 = pm.Deterministic('x_for_model_2')
If you really think it's necessary, let's talk about it in another PR.
- It's a heavy PR -- every single usage of every single named object (matrix, parameter, data, prediction, etc) needs to be wrapped with
prefixed_name. What I would prefer is to intervene on the names themselves. I think this will be facilitated by #607. Maybe have a look at that PR and propose a way to work a model name into that framework, then we can revisit this in a few days once it's done?
| """ | ||
| raise NotImplementedError("The add_default_priors property has not been implemented!") | ||
|
|
||
| def register_variable(self, name: str, variable: pt.TensorVariable) -> None: |
There was a problem hiding this comment.
Can you give an example of a workflow where you want to use this? I'm hesitant because it breaks the pattern assumed by statespace, in which the statespace model and the pymc model are totally independent.
| # Protect against Sphinx/RTD introspection of dunder attributes | ||
| if name.startswith("__") and name.endswith("__"): | ||
| raise AttributeError(f"module '{__name__}' has no attribute '{name}'") |
There was a problem hiding this comment.
revert this, it's unrelated this to PR (cherry pick it if you want to fix the issue)
| with pymc_model: | ||
| for param_name in self.param_names: | ||
| param = getattr(pymc_model, param_name, None) | ||
| # Look for the parameter with the prefixed name in the PyMC model |
There was a problem hiding this comment.
Clear away all these comments everywhere in the PR, they don't add helpful context to future devs.
|
Thanks! I agree waiting for #607 and building on the refactor seems like the cleanest path. My current thinking is to keep all semantic metadata unscoped (parameters/states/data roles remain “base names”) and apply scoping only at the PyMC/PyTensor graph boundary, where collisions actually occur. Concretely I think that means adding an optional name/namespace to PyMCStateSpace plus a single graph_name(base: str) helper, and then using it in three places:
I don’t currently see a need for a reverse base_name(...) mapping in the statespace code (we don’t seem to parse arbitrary model vars back into semantics), but can add it later if a concrete use-case emerges. This should let me drop the extra register_variable bookkeeping from the current PR and re-implement with a much smaller diff after #607. Functionally it’s orthogonal, but #607 should make it easier to keep the change localized and reviewable. Do you think this could be a good way forward? Opher |
|
Closing this in favor of a rebased update. The named This version incorporates the upstream changes and resolves the conflicts introduced by #607. No additional conceptual changes were introduced beyond what was discussed here. |
Summary
As discussed in [#602](#602), this PR adds an optional
nameparameter toPyMCStateSpace. When provided, it is used to prefix all variables and registered data created by that state-space model. This makes it possible to use multiple state-space models inside one PyMC model without name collisions.What this changes
Add
name: str | None = NonetoPyMCStateSpace.__init__.Add
_prefix_name(self, var_name: str) -> str, which:var_nameunchanged ifself.name is None"{name}_{var_name}".Apply
_prefix_namewhen:pm.Deterministicvariables,SequenceMvNormal,H_jittered,x0_slice,P0_slice,forecast_*,irf,initial_shock,Update data helpers (
add_data_to_active_model,register_data_with_pymc) so that state-space models can optionally pass prefixed data names.Backwards compatibility
nameis not provided, everything behaves exactly as before.PyMCStateSpaceshould work without any changes.Example
This results in variables such as
"vision_obs","motor_obs","vision_data", etc., allowing both models to coexist cleanly in the same PyMC model.Testing
tests/statespace/test_register_variable.pyto check the basic naming and variable-registration behavior.pytest tests/statespacein a cleanpip install -e ".[dev]"environment.FutureWarningerrors frompytensor.graph.basic.*imports (these have moved topytensor.graph.traversal.*). This appears unrelated to this PR and originates from existing imports in the codebase and test suite.