Skip to content

Named StateSpaceModel v2 – rebase and updates after #607#654

Open
opherdonchin wants to merge 7 commits intopymc-devs:mainfrom
opherdonchin:named-statespace-v2
Open

Named StateSpaceModel v2 – rebase and updates after #607#654
opherdonchin wants to merge 7 commits intopymc-devs:mainfrom
opherdonchin:named-statespace-v2

Conversation

@opherdonchin
Copy link
Copy Markdown

Summary

This PR reintroduces the Named StateSpaceModel changes after rebasing onto main following the merge of #607.

The previous version was submitted as #611. This update incorporates the changes introduced in #607 and resolves the resulting conflicts while preserving the intended named state space behavior.

No new conceptual changes are introduced beyond what was discussed in #611.

Changes

Tests

Environment:

  • Linux (Fedora) and Windows 11
  • Python 3.14.2 (Linus) and 3.12 (Windows)
  • Commit: b173e6e

Results:

  • tests/statespace/core

    • 182 passed
    • 2 skipped (JAX tests require nutpie)
    • 0 failed
  • tests/statespace/filters

    • 59 passed
    • 1 failed (test_kalman_filter_jax[cholesky])

The single failing test occurs in the JAX Cholesky filter path and appears unrelated to the named model changes. All standard (non-JAX) filter paths and core state space tests pass.

Notes

@jessegrabowski
Copy link
Copy Markdown
Member

It looks like you checked in several files related to your local dev environment, could you clean it up?

@opherdonchin opherdonchin force-pushed the named-statespace-v2 branch 2 times, most recently from cbea84d to 1bf3670 Compare February 20, 2026 20:09
@opherdonchin
Copy link
Copy Markdown
Author

Sorry about that. The current PR includes only the changed files and a test:

statespace.py
data_tools.py
test_namespace.py

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 21, 2026

Codecov Report

❌ Patch coverage is 75.75758% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.24%. Comparing base (0bcccd5) to head (1bf3670).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pymc_extras/statespace/core/statespace.py 72.41% 8 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main     #654       +/-   ##
===========================================
+ Coverage   42.43%   69.24%   +26.81%     
===========================================
  Files          63       73       +10     
  Lines        7207     7738      +531     
===========================================
+ Hits         3058     5358     +2300     
+ Misses       4149     2380     -1769     
Files with missing lines Coverage Δ
pymc_extras/statespace/utils/data_tools.py 69.56% <100.00%> (+53.04%) ⬆️
pymc_extras/statespace/core/statespace.py 78.44% <72.41%> (+63.87%) ⬆️

... and 42 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@jessegrabowski jessegrabowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is looking really clean. Just a few requests.

Also be sure to install pre-commit (pip install pre-commit; pre-commit install; pre-commit run --all) so your PR gets formatted and linted correctly.

Comment thread pymc_extras/statespace/core/statespace.py Outdated
)

if name in self._tensor_variable_info:
gname = self.graph_name(name)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above. gname is too cryptic. Another option would be to change name to base_name and make name the prefixed version, since this will now be the "canonical" name that users see.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the new scheme (base_name for the unprefixed name and name for the name in the graph) it makes most sense to just call this variable name.

raise ValueError(
f"{name} is not a model parameter. All placeholder variables should correspond to model "
f"parameters."
f"{name} is not a model data-variable. All placeholder variables should correspond to model "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"{name} is not a model data-variable. All placeholder variables should correspond to model "
f"{name} is not a model data variable. All placeholder variables should correspond to model "

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected

f"{name} is not a model parameter. All placeholder variables should correspond to model "
f"parameters."
f"{name} is not a model data-variable. All placeholder variables should correspond to model "
f"data-variables."
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"data-variables."
f"data variables."

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

n_obs=self.ssm.k_endog,
obs_coords=obs_coords,
register_data=True,
data_name=self.graph_name("data"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's let users choose the data name in the PyMCStateSpace constructor in this PR as well, instead of hard coding it to "data". The default can still be data, but then this line becomes data_name = self.graph_name(self.data_name)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread tests/statespace/test_namespace.py Outdated
from pymc_extras.statespace.core.statespace import PyMCStateSpace


def test_two_statespace_models_can_coexist_with_names(monkeypatch):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this test in the test_model file, it doesn't need a new file.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@opherdonchin
Copy link
Copy Markdown
Author

The new commit includes all changes suggested and the pre-commit processing.

@jessegrabowski
Copy link
Copy Markdown
Member

Really like where this landed, it's looking very clean.

Could you double-check that we don't need to make any changes to Component or StructuralTimeSeries? Those are in the models/structural module, and are a sort-of parallel StatespaceModel that needs to be cleaned up some day.

@opherdonchin opherdonchin force-pushed the named-statespace-v2 branch 2 times, most recently from ec6a4bd to 27a28ea Compare February 25, 2026 12:31
@opherdonchin
Copy link
Copy Markdown
Author

I'm working on the fix for structural models and in working on the __init__ for StructuralTimeSeries, I ran into a problem.

Components create placeholders during make_symbolic_graph(), before we know the final StructuralTimeSeries(name=...). If a component is used in multiple StructuralTimeSeries models with different names, it could lead to silent aliasing which would be hard to track and fix.

I see a few options and I need help choosing:

  1. Defer placeholder creation until model build. That is, components only provide a build() function which will create the placeholders and then names are applied when the model is actually constructed by StructuralTimeSeries. This is probably the cleanest solution, but it might be a significant refactor.

  2. Allow component names independent of model name. On model build test that the component objects are not being used twice in the same model (this silently allows components to be used twice in different PyMC models, but that should be less dangerous). Extend name functionality separately to components and StructuralTimeSeries objects. This is consistent with current PyMC practice of forcing user to be responsible to avoid renaming. It would be possible today (but a little weird) to use the same PyTensor object in different PyMC graphs, so that is also consistent with current practice.

  3. Force canonical naming in structural models. Do not allow multiple structural models in the same PyMC graph. This is what we had before for all state space models. We could create an issue for this and fix it later.

Please let me know which direction you'd like to take it.

@jessegrabowski
Copy link
Copy Markdown
Member

We should be able to use graph_replace to swap out all the placeholder variables with new placeholders that have the prefixed name. So the components themselves don't need to know about this naming convention, the __init__ method in StructuralStateSpace can just handle it.

You can check PyMCStateSpace._insert_random_variables for a pattern on how to collect the dummies and do a replace, but it should look something like replacements = {var: var.type(name=self.prefix_name(var.name)) for var in explicit_graph_inputs(statespace_matrices)}

@AlexAndorra
Copy link
Copy Markdown
Contributor

Any help from me needed here @opherdonchin @jessegrabowski ?

@opherdonchin opherdonchin force-pushed the named-statespace-v2 branch from 27a28ea to e950db5 Compare April 3, 2026 22:56
@opherdonchin
Copy link
Copy Markdown
Author

@AlexAndorra: thanks for the nudge!

I think what I've done makes sense and follows @jessegrabowski's suggestion. The branch is now rebased onto current main (v0.10.0) with a clean 3-commit history.

What changed in the new commit:

StructuralTimeSeries.__init__ now uses pytensor.graph_replace to create fresh, prefixed placeholder variables in all SSM matrices when a model name is provided. A single _prefix_placeholder_variables() method collects all placeholders from tensor_variable_info and tensor_data_info, builds new variables via old_var.type(name=self.prefixed_name(old_var.name)), applies graph_replace to all 9 SSM matrices, and rebuilds the symbolic info objects with aligned names. Components remain unaware of the naming convention.

I added appropriate tests in TestGraphReplacePlaceholderNamespacing.

Let me know if this is a good fix and if anything else is needed before this is ready to merge?

Thanks!
Opher

Copy link
Copy Markdown
Member

@jessegrabowski jessegrabowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small quibbles, but this looks pretty much good to go!


return new_ssm, new_variable_info, new_data_info

def _validate_symbolic_info(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is never called outside of tests. If you don't want a check in __init__ we should make this a test utility.



class TestGraphReplacePlaceholderNamespacing:
"""Tests for the graph_replace-based placeholder namespacing in StructuralTimeSeries."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Tests for the graph_replace-based placeholder namespacing in StructuralTimeSeries."""

The docstring adds nothing the name doesn't already tell me

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the tests need an AI cleanup pass. Tell it to make the tests maximally self-documenting with minimal additional comments

)


def test_two_statespace_models_can_coexist_with_names(monkeypatch):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather see a real end-to-end test here (including sampling, but using the mock_sample pymc.testing fixture)

Comment on lines +246 to +248
Creates new placeholder variables whose names are prefixed with the model name,
applies graph_replace to swap them into all SSM matrices, and rebuilds the
symbolic info objects to reference the new placeholders.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Creates new placeholder variables whose names are prefixed with the model name,
applies graph_replace to swap them into all SSM matrices, and rebuilds the
symbolic info objects to reference the new placeholders.

matrices = [getattr(ssm, name) for name in LONG_MATRIX_NAMES]
replaced_matrices = graph_replace(matrices, replace=replacements, strict=False)

new_ssm = ssm.copy()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this copy necessary?

@opherdonchin
Copy link
Copy Markdown
Author

opherdonchin commented Apr 6, 2026

Hi Jesse, thanks for the review! I think I've addressed all your suggestions:

  • Trimmed the _prefix_placeholder_variables docstring to a one-liner
  • Removed ssm.copy() (unnecessary, as confirmed)
  • Moved the validation logic out of production code — _validate_symbolic_info is gone from structural/core.py and replaced with a @staticmethod _variable_names_match helper inside the test class
  • Removed the redundant class docstring from TestGraphReplacePlaceholderNamespacing
  • Did a cleanup pass on all tests in that class — stripped comments, made tests self-documenting by name
  • Replaced the shallow monkeypatch test in test_statespace.py with a real end-to-end test using mock_pymc_sample that builds two named models in the same pm.Model and samples from both

In the process of writing the end-to-end test I also found and fixed a bug: "obs" was hardcoded in build_statespace_graph in statespace.py, which would cause ValueError: Variable name obs already exists when two named models shared a pm.Model. Changed it to self.prefixed_name("obs").

Let me know if anything needs further adjustment!

@jessegrabowski
Copy link
Copy Markdown
Member

jessegrabowski commented Apr 6, 2026

looks like you need to run pre-commit.

The P0 special handling is a code smell. The name of variable is just syntactic sugar in pytensor. When we do graph_replace it's done by object, so it doesn't matter what the name is. I think it should be safe to just rename P0 as well. Could you try and see? Give it 5 minutes of effort, if it doesn't work it's fine we can leave it for later.

Can you also fire up a notebook and make sure that all post estimation methods (sample_conditional_posterior/prior, sample_unconditional_posterior/prior, impulse_response_function, forecast) work fine with multiple models in one context? You don't have to write tests for all of them, just check once and verify.

@opherdonchin
Copy link
Copy Markdown
Author

Thanks again for the detailed review, @jessegrabowski. I wish I had this kind of comfort with the codebase.

Here's what I've addressed since the last round:

  • Pre-commit: ran ruff-format; one long line in test_statespace.py was wrapped.
  • P0 special-case handling removed: dropped all the if not sv.name.endswith("_P0") / if sv.name != "P0" guards from test_core.py. As you suspected, graph_replace works by object identity, so the fact that __setitem__ overwrites the tensor's .name to "initial_state_cov" doesn't affect correctness. The _variable_names_match helper is now a cleaner prefix-only check.
  • Verification notebook: I ran through all six post-estimation methods (sample_conditional_posterior/prior, sample_unconditional_posterior/prior, impulse_response_function, forecast) with two named LevelTrend models sharing a single pm.Model. Everything comes back clean — no NaNs, no name collisions. I'm attaching the notebook here rather than committing it to the repo.

verify_named_models.ipynb

Let me know if there's anything else worth doing before this is merge-ready!
Opher

@jessegrabowski
Copy link
Copy Markdown
Member

jessegrabowski commented Apr 9, 2026

Some tests are failing because the test model was being passed a name (like "data"). This was a harmless no-op before, but now results in the variable names changing and the test failing. Solution is to just not name these models.

Other failures outside statespace are not your problem.

@jessegrabowski jessegrabowski mentioned this pull request Apr 10, 2026
@opherdonchin
Copy link
Copy Markdown
Author

Fixes in this update (commits 533bd76, db0527c)

1. Dead code in StructuralTimeSeries.__init__

self._name = name or "StructuralTimeSeries" was set but never read anywhere. The real attribute self.name is set by PyMCStateSpace.__init__ via super().__init__(name=name), and prefixed_name() uses self.name throughout. The docstring was also incorrectly claiming a default of name to "StructuralTimeSeries". Removed the dead line and corrected the docstring.

2. Stale docstring in Component.build()

The name parameter description read "Name of the exogenous data being modeled. Default is 'data'" — leftover legacy text for data_name, not the model prefix. Updated with a correct description. I also explicitly mentioned model.prefixed_name(p) as the way to get the expected prior name for each parameter — e.g. after ssm = (trend + seasonal).build(name="m1"), you'd write pm.Normal(ssm.prefixed_name("initial_trend"), ...) rather than hard-coding "m1_initial_trend".

3. getattrname in pymc_model in _insert_random_variables / _insert_data_variables

The old getattr(pymc_model, name, None) is not None would return truthy for any model attribute or property that happened to share a name with a parameter (e.g. logp, compile_fn). Changed to name in pymc_model, which uses PyMC's __contains__ and checks only named_vars.

4. Question: name= support in SARIMAX / VARMAX / ETS / DFM

Issue #602 explicitly called for adding name= to the __init__ of BayesianSARIMAX, VARMAX, ETS, and DFM. None of these currently accept or forward name= to super().__init__(), so users can't use multiple instances of those models in a single PyMC model. Should this be addressed here, or tracked as a follow-up issue?

@jessegrabowski
Copy link
Copy Markdown
Member

Thanks for the changes. I'm not an MCP server, please speak to me like a human.

To answer your question, you can do either. It's in scope for the PR, but I also understand if you want to wrap it up and open an issue for later.

@opherdonchin
Copy link
Copy Markdown
Author

Sorry.

It's not about the AI assist so much as this being my first PR and I'm really not sure about how things work.

If it's ok to put this out there, that would be great. I have colleagues who want to be able to use it from the official version and they don't need the other structures.

I'll open an issue and get the other structures working, though. It shouldn't take much time.

Opher

@jessegrabowski
Copy link
Copy Markdown
Member

AI's great and this PR is great too. Sorry I'm just on edge because I've been dealing with a slop avalanche elsewhere.

We can merge after tests pass (except the ones that arent your issue)

@opherdonchin
Copy link
Copy Markdown
Author

I appreciate it. Thanks!

This was really fun and educaitonal to work on. If you want to point me at something else, feel free. I'll start on it after I get the other models done.

Opher

@opherdonchin
Copy link
Copy Markdown
Author

Sorry. Previous push didn't actually fix the bugs. Try now.

@jessegrabowski
Copy link
Copy Markdown
Member

Sorry. Previous push didn't actually fix the bugs. Try now.

Did you push? I'll trigger the CI as soon as you do

@opherdonchin
Copy link
Copy Markdown
Author

Yes. I pushed.

@jessegrabowski
Copy link
Copy Markdown
Member

Are you sure? There's no new changes since your "try now" comment

@opherdonchin
Copy link
Copy Markdown
Author

I'm pretty sure I'm sure. When I said "try now" that was right after I pushed what I think is the last change. I just check the code in the commit that is in the pull request and the name parmeters that were causing the bugs have been removed, the dead-end self._name is gone from StructuralTimeSeries and the misleading doc strings have been updated.

We could run the tests again, but I'm at least 64% sure.

Opher

@jessegrabowski
Copy link
Copy Markdown
Member

Could you update your local main and rebase onto main? I just merged a PR that clears the test failures that we were seeing here (but that were unrelated to your changes).

I can help with it if you're not up for a git adventure.

…nique graph naming

Create a test to ensure multiple state space models can coexist with distinct names.
StructuralTimeSeries.__init__ now uses pytensor.graph_replace to create
fresh, prefixed placeholder variables in all SSM matrices when a model
name is provided. This replaces the previous metadata-only rename
approach, which left the actual graph placeholders unchanged and caused
silent aliasing when the same Component instance was reused across
multiple named StructuralTimeSeries models.

Key changes:
- Add _prefix_placeholder_variables() method that builds a replacement
  mapping from old to new (prefixed) placeholder variables, applies
  graph_replace across all SSM matrices, and rebuilds SymbolicVariableInfo
  and SymbolicDataInfo with aligned names.
- Add _validate_symbolic_info() diagnostic helper.
- Pass name=name through to PyMCStateSpace.__init__.
- Add 8 focused tests covering the aliasing bug, name alignment,
  graph correctness, and the validation helper.
- Drop fragile sv.name == sv.symbolic_variable.name assertions: the
  tensor .name attribute is overwritten by PytensorRepresentation
  __setitem__ ("initial_state_cov"), but graph_replace works by
  object identity, not name, so correctness is not affected.
- Replace the P0 if-guards in every test with cleaner prefix-only
  assertions; update test_corrupted_metadata_name_raises to match the
  revised helper error message.
- Rename test_unnamed_model_preserves_original_placeholders to
  test_unnamed_model_has_no_prefix_on_variable_names and give it a
  meaningful assertion (registered names == param_names).
- Format test_statespace.py with ruff-format (long line wrap).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants