Named StateSpaceModel v2 – rebase and updates after #607 by opherdonchin · Pull Request #654 · pymc-devs/pymc-extras

opherdonchin · 2026-02-20T01:03:44Z

Summary

This PR reintroduces the Named StateSpaceModel changes after rebasing onto main following the merge of #607.

The previous version was submitted as #611. This update incorporates the changes introduced in #607 and resolves the resulting conflicts while preserving the intended named state space behavior.

No new conceptual changes are introduced beyond what was discussed in #611.

Changes

Rebased named state space branch onto current main
Resolved conflicts introduced by Represent statespace metadata with dataclasses #607
Ensured compatibility with updated state space internals
Cleaned working tree (no unrelated changes included)

Tests

Environment:

Linux (Fedora) and Windows 11
Python 3.14.2 (Linus) and 3.12 (Windows)
Commit: b173e6e

Results:

tests/statespace/core
- 182 passed
- 2 skipped (JAX tests require nutpie)
- 0 failed
tests/statespace/filters
- 59 passed
- 1 failed (test_kalman_filter_jax[cholesky])

The single failing test occurs in the JAX Cholesky filter path and appears unrelated to the named model changes. All standard (non-JAX) filter paths and core state space tests pass.

Notes

This PR supersedes Add optional name parameter to PyMCStateSpace for multiple models #611.
The branch has been rebased onto current main.
No out-of-scope fixes are included.

jessegrabowski · 2026-02-20T10:19:21Z

It looks like you checked in several files related to your local dev environment, could you clean it up?

opherdonchin · 2026-02-20T20:12:50Z

Sorry about that. The current PR includes only the changed files and a test:

statespace.py
data_tools.py
test_namespace.py

codecov-commenter · 2026-02-21T06:51:55Z

Codecov Report

❌ Patch coverage is 75.75758% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.24%. Comparing base (0bcccd5) to head (1bf3670).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
pymc_extras/statespace/core/statespace.py	72.41%	8 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #654       +/-   ##
===========================================
+ Coverage   42.43%   69.24%   +26.81%     
===========================================
  Files          63       73       +10     
  Lines        7207     7738      +531     
===========================================
+ Hits         3058     5358     +2300     
+ Misses       4149     2380     -1769

Files with missing lines	Coverage Δ
pymc_extras/statespace/utils/data_tools.py	`69.56% <100.00%> (+53.04%)`	⬆️
pymc_extras/statespace/core/statespace.py	`78.44% <72.41%> (+63.87%)`	⬆️

... and 42 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

jessegrabowski

Nice, this is looking really clean. Just a few requests.

Also be sure to install pre-commit (pip install pre-commit; pre-commit install; pre-commit run --all) so your PR gets formatted and linted correctly.

jessegrabowski · 2026-02-21T06:48:11Z

            )

-        if name in self._tensor_variable_info:
+        gname = self.graph_name(name)


As above. gname is too cryptic. Another option would be to change name to base_name and make name the prefixed version, since this will now be the "canonical" name that users see.

Under the new scheme (base_name for the unprefixed name and name for the name in the graph) it makes most sense to just call this variable name.

jessegrabowski · 2026-02-21T06:48:49Z

            raise ValueError(
-                f"{name} is not a model parameter. All placeholder variables should correspond to model "
-                f"parameters."
+                f"{name} is not a model data-variable. All placeholder variables should correspond to model "


Suggested change

f"{name} is not a model data-variable. All placeholder variables should correspond to model "

f"{name} is not a model data variable. All placeholder variables should correspond to model "

jessegrabowski · 2026-02-21T06:48:57Z

-                f"{name} is not a model parameter. All placeholder variables should correspond to model "
-                f"parameters."
+                f"{name} is not a model data-variable. All placeholder variables should correspond to model "
+                f"data-variables."


Suggested change

f"data-variables."

f"data variables."

jessegrabowski · 2026-02-21T06:50:37Z

                n_obs=self.ssm.k_endog,
                obs_coords=obs_coords,
                register_data=True,
+                data_name=self.graph_name("data"),


Let's let users choose the data name in the PyMCStateSpace constructor in this PR as well, instead of hard coding it to "data". The default can still be data, but then this line becomes data_name = self.graph_name(self.data_name)

jessegrabowski · 2026-02-21T06:52:07Z

+from pymc_extras.statespace.core.statespace import PyMCStateSpace
+
+
+def test_two_statespace_models_can_coexist_with_names(monkeypatch):


put this test in the test_model file, it doesn't need a new file.

opherdonchin · 2026-02-21T11:39:54Z

The new commit includes all changes suggested and the pre-commit processing.

jessegrabowski · 2026-02-23T06:34:41Z

Really like where this landed, it's looking very clean.

Could you double-check that we don't need to make any changes to Component or StructuralTimeSeries? Those are in the models/structural module, and are a sort-of parallel StatespaceModel that needs to be cleaned up some day.

opherdonchin · 2026-02-25T13:02:46Z

I'm working on the fix for structural models and in working on the __init__ for StructuralTimeSeries, I ran into a problem.

Components create placeholders during make_symbolic_graph(), before we know the final StructuralTimeSeries(name=...). If a component is used in multiple StructuralTimeSeries models with different names, it could lead to silent aliasing which would be hard to track and fix.

I see a few options and I need help choosing:

Defer placeholder creation until model build. That is, components only provide a build() function which will create the placeholders and then names are applied when the model is actually constructed by StructuralTimeSeries. This is probably the cleanest solution, but it might be a significant refactor.
Allow component names independent of model name. On model build test that the component objects are not being used twice in the same model (this silently allows components to be used twice in different PyMC models, but that should be less dangerous). Extend name functionality separately to components and StructuralTimeSeries objects. This is consistent with current PyMC practice of forcing user to be responsible to avoid renaming. It would be possible today (but a little weird) to use the same PyTensor object in different PyMC graphs, so that is also consistent with current practice.
Force canonical naming in structural models. Do not allow multiple structural models in the same PyMC graph. This is what we had before for all state space models. We could create an issue for this and fix it later.

Please let me know which direction you'd like to take it.

jessegrabowski · 2026-02-26T00:41:06Z

We should be able to use graph_replace to swap out all the placeholder variables with new placeholders that have the prefixed name. So the components themselves don't need to know about this naming convention, the __init__ method in StructuralStateSpace can just handle it.

You can check PyMCStateSpace._insert_random_variables for a pattern on how to collect the dummies and do a replace, but it should look something like replacements = {var: var.type(name=self.prefix_name(var.name)) for var in explicit_graph_inputs(statespace_matrices)}

AlexAndorra · 2026-03-29T17:28:30Z

Any help from me needed here @opherdonchin @jessegrabowski ?

opherdonchin · 2026-04-04T05:42:17Z

@AlexAndorra: thanks for the nudge!

I think what I've done makes sense and follows @jessegrabowski's suggestion. The branch is now rebased onto current main (v0.10.0) with a clean 3-commit history.

What changed in the new commit:

StructuralTimeSeries.__init__ now uses pytensor.graph_replace to create fresh, prefixed placeholder variables in all SSM matrices when a model name is provided. A single _prefix_placeholder_variables() method collects all placeholders from tensor_variable_info and tensor_data_info, builds new variables via old_var.type(name=self.prefixed_name(old_var.name)), applies graph_replace to all 9 SSM matrices, and rebuilds the symbolic info objects with aligned names. Components remain unaware of the naming convention.

I added appropriate tests in TestGraphReplacePlaceholderNamespacing.

Let me know if this is a good fix and if anything else is needed before this is ready to merge?

Thanks!
Opher

jessegrabowski

some small quibbles, but this looks pretty much good to go!

jessegrabowski · 2026-04-06T13:44:58Z

+
+        return new_ssm, new_variable_info, new_data_info
+
+    def _validate_symbolic_info(self):


This function is never called outside of tests. If you don't want a check in __init__ we should make this a test utility.

jessegrabowski · 2026-04-06T13:48:22Z

+
+
+class TestGraphReplacePlaceholderNamespacing:
+    """Tests for the graph_replace-based placeholder namespacing in StructuralTimeSeries."""


Suggested change

"""Tests for the graph_replace-based placeholder namespacing in StructuralTimeSeries."""

The docstring adds nothing the name doesn't already tell me

All of the tests need an AI cleanup pass. Tell it to make the tests maximally self-documenting with minimal additional comments

jessegrabowski · 2026-04-06T13:54:29Z

        )


+def test_two_statespace_models_can_coexist_with_names(monkeypatch):


I'd rather see a real end-to-end test here (including sampling, but using the mock_sample pymc.testing fixture)

jessegrabowski · 2026-04-06T13:58:59Z

+        Creates new placeholder variables whose names are prefixed with the model name,
+        applies graph_replace to swap them into all SSM matrices, and rebuilds the
+        symbolic info objects to reference the new placeholders.


Suggested change

Creates new placeholder variables whose names are prefixed with the model name,

applies graph_replace to swap them into all SSM matrices, and rebuilds the

symbolic info objects to reference the new placeholders.

jessegrabowski · 2026-04-06T13:59:48Z

+        matrices = [getattr(ssm, name) for name in LONG_MATRIX_NAMES]
+        replaced_matrices = graph_replace(matrices, replace=replacements, strict=False)
+
+        new_ssm = ssm.copy()


is this copy necessary?

opherdonchin · 2026-04-06T17:05:25Z

Hi Jesse, thanks for the review! I think I've addressed all your suggestions:

Trimmed the _prefix_placeholder_variables docstring to a one-liner
Removed ssm.copy() (unnecessary, as confirmed)
Moved the validation logic out of production code — _validate_symbolic_info is gone from structural/core.py and replaced with a @staticmethod _variable_names_match helper inside the test class
Removed the redundant class docstring from TestGraphReplacePlaceholderNamespacing
Did a cleanup pass on all tests in that class — stripped comments, made tests self-documenting by name
Replaced the shallow monkeypatch test in test_statespace.py with a real end-to-end test using mock_pymc_sample that builds two named models in the same pm.Model and samples from both

In the process of writing the end-to-end test I also found and fixed a bug: "obs" was hardcoded in build_statespace_graph in statespace.py, which would cause ValueError: Variable name obs already exists when two named models shared a pm.Model. Changed it to self.prefixed_name("obs").

Let me know if anything needs further adjustment!

jessegrabowski · 2026-04-06T17:34:20Z

looks like you need to run pre-commit.

The P0 special handling is a code smell. The name of variable is just syntactic sugar in pytensor. When we do graph_replace it's done by object, so it doesn't matter what the name is. I think it should be safe to just rename P0 as well. Could you try and see? Give it 5 minutes of effort, if it doesn't work it's fine we can leave it for later.

Can you also fire up a notebook and make sure that all post estimation methods (sample_conditional_posterior/prior, sample_unconditional_posterior/prior, impulse_response_function, forecast) work fine with multiple models in one context? You don't have to write tests for all of them, just check once and verify.

opherdonchin · 2026-04-06T20:07:15Z

Thanks again for the detailed review, @jessegrabowski. I wish I had this kind of comfort with the codebase.

Here's what I've addressed since the last round:

Pre-commit: ran ruff-format; one long line in test_statespace.py was wrapped.
P0 special-case handling removed: dropped all the if not sv.name.endswith("_P0") / if sv.name != "P0" guards from test_core.py. As you suspected, graph_replace works by object identity, so the fact that __setitem__ overwrites the tensor's .name to "initial_state_cov" doesn't affect correctness. The _variable_names_match helper is now a cleaner prefix-only check.
Verification notebook: I ran through all six post-estimation methods (sample_conditional_posterior/prior, sample_unconditional_posterior/prior, impulse_response_function, forecast) with two named LevelTrend models sharing a single pm.Model. Everything comes back clean — no NaNs, no name collisions. I'm attaching the notebook here rather than committing it to the repo.

verify_named_models.ipynb

Let me know if there's anything else worth doing before this is merge-ready!
Opher

jessegrabowski · 2026-04-09T17:46:27Z

Some tests are failing because the test model was being passed a name (like "data"). This was a harmless no-op before, but now results in the variable names changing and the test failing. Solution is to just not name these models.

Other failures outside statespace are not your problem.

opherdonchin · 2026-04-10T23:08:15Z

Fixes in this update (commits `533bd76`, `db0527c`)

1. Dead code in `StructuralTimeSeries.init`

self._name = name or "StructuralTimeSeries" was set but never read anywhere. The real attribute self.name is set by PyMCStateSpace.__init__ via super().__init__(name=name), and prefixed_name() uses self.name throughout. The docstring was also incorrectly claiming a default of name to "StructuralTimeSeries". Removed the dead line and corrected the docstring.

2. Stale docstring in `Component.build()`

The name parameter description read "Name of the exogenous data being modeled. Default is 'data'" — leftover legacy text for data_name, not the model prefix. Updated with a correct description. I also explicitly mentioned model.prefixed_name(p) as the way to get the expected prior name for each parameter — e.g. after ssm = (trend + seasonal).build(name="m1"), you'd write pm.Normal(ssm.prefixed_name("initial_trend"), ...) rather than hard-coding "m1_initial_trend".

3. `getattr` → `name in pymc_model` in `_insert_random_variables` / `_insert_data_variables`

The old getattr(pymc_model, name, None) is not None would return truthy for any model attribute or property that happened to share a name with a parameter (e.g. logp, compile_fn). Changed to name in pymc_model, which uses PyMC's __contains__ and checks only named_vars.

4. Question: `name=` support in SARIMAX / VARMAX / ETS / DFM

Issue #602 explicitly called for adding name= to the __init__ of BayesianSARIMAX, VARMAX, ETS, and DFM. None of these currently accept or forward name= to super().__init__(), so users can't use multiple instances of those models in a single PyMC model. Should this be addressed here, or tracked as a follow-up issue?

jessegrabowski · 2026-04-10T23:29:54Z

Thanks for the changes. I'm not an MCP server, please speak to me like a human.

To answer your question, you can do either. It's in scope for the PR, but I also understand if you want to wrap it up and open an issue for later.

opherdonchin · 2026-04-10T23:37:34Z

Sorry.

It's not about the AI assist so much as this being my first PR and I'm really not sure about how things work.

If it's ok to put this out there, that would be great. I have colleagues who want to be able to use it from the official version and they don't need the other structures.

I'll open an issue and get the other structures working, though. It shouldn't take much time.

Opher

jessegrabowski · 2026-04-10T23:48:21Z

AI's great and this PR is great too. Sorry I'm just on edge because I've been dealing with a slop avalanche elsewhere.

We can merge after tests pass (except the ones that arent your issue)

opherdonchin · 2026-04-11T00:14:42Z

I appreciate it. Thanks!

This was really fun and educaitonal to work on. If you want to point me at something else, feel free. I'll start on it after I get the other models done.

Opher

opherdonchin · 2026-04-11T12:54:54Z

Sorry. Previous push didn't actually fix the bugs. Try now.

jessegrabowski · 2026-04-11T14:34:50Z

Sorry. Previous push didn't actually fix the bugs. Try now.

Did you push? I'll trigger the CI as soon as you do

opherdonchin · 2026-04-11T16:15:24Z

Yes. I pushed.

jessegrabowski · 2026-04-11T17:19:16Z

Are you sure? There's no new changes since your "try now" comment

opherdonchin · 2026-04-11T18:33:38Z

I'm pretty sure I'm sure. When I said "try now" that was right after I pushed what I think is the last change. I just check the code in the commit that is in the pull request and the name parmeters that were causing the bugs have been removed, the dead-end self._name is gone from StructuralTimeSeries and the misleading doc strings have been updated.

We could run the tests again, but I'm at least 64% sure.

Opher

jessegrabowski · 2026-04-11T20:48:49Z

Could you update your local main and rebase onto main? I just merged a PR that clears the test failures that we were seeing here (but that were unrelated to your changes).

I can help with it if you're not up for a git adventure.

…nique graph naming Create a test to ensure multiple state space models can coexist with distinct names.

…d move namespace test

StructuralTimeSeries.__init__ now uses pytensor.graph_replace to create fresh, prefixed placeholder variables in all SSM matrices when a model name is provided. This replaces the previous metadata-only rename approach, which left the actual graph placeholders unchanged and caused silent aliasing when the same Component instance was reused across multiple named StructuralTimeSeries models. Key changes: - Add _prefix_placeholder_variables() method that builds a replacement mapping from old to new (prefixed) placeholder variables, applies graph_replace across all SSM matrices, and rebuilds SymbolicVariableInfo and SymbolicDataInfo with aligned names. - Add _validate_symbolic_info() diagnostic helper. - Pass name=name through to PyMCStateSpace.__init__. - Add 8 focused tests covering the aliasing bug, name alignment, graph correctness, and the validation helper.

- Drop fragile sv.name == sv.symbolic_variable.name assertions: the tensor .name attribute is overwritten by PytensorRepresentation __setitem__ ("initial_state_cov"), but graph_replace works by object identity, not name, so correctness is not affected. - Replace the P0 if-guards in every test with cleaner prefix-only assertions; update test_corrupted_metadata_name_raises to match the revised helper error message. - Rename test_unnamed_model_preserves_original_placeholders to test_unnamed_model_has_no_prefix_on_variable_names and give it a meaningful assertion (registered names == param_names). - Format test_statespace.py with ruff-format (long line wrap).

… for param lookup

…eries and build()

opherdonchin mentioned this pull request Feb 20, 2026

Add optional name parameter to PyMCStateSpace for multiple models #611

Closed

opherdonchin force-pushed the named-statespace-v2 branch 2 times, most recently from cbea84d to 1bf3670 Compare February 20, 2026 20:09

jessegrabowski requested changes Feb 21, 2026

View reviewed changes

opherdonchin force-pushed the named-statespace-v2 branch 2 times, most recently from ec6a4bd to 27a28ea Compare February 25, 2026 12:31

opherdonchin force-pushed the named-statespace-v2 branch from 27a28ea to e950db5 Compare April 3, 2026 22:56

jessegrabowski requested changes Apr 6, 2026

View reviewed changes

jessegrabowski mentioned this pull request Apr 10, 2026

Fix failing tests #670

Merged

opherdonchin mentioned this pull request Apr 10, 2026

name= support in SARIMAX / VARMAX / ETS / DFM #671

Open

opherdonchin added 7 commits April 14, 2026 06:49

Add name parameter to PyMCStateSpace and update related methods for u…

038c28d

…nique graph naming Create a test to ensure multiple state space models can coexist with distinct names.

Address PR review: rename name helper, add configurable data_name, an…

2ec724a

…d move namespace test

Sync back to upstream main

1dd78b3

Fix failing tests: remove name from test builds; use model membership…

e8646b0

… for param lookup

Fix dead self._name, correct name param docstrings in StructuralTimeS…

2b4f02e

…eries and build()

jessegrabowski force-pushed the named-statespace-v2 branch from db0527c to 2b4f02e Compare April 14, 2026 11:53

	f"{name} is not a model data-variable. All placeholder variables should correspond to model "
	f"{name} is not a model data variable. All placeholder variables should correspond to model "

		from pymc_extras.statespace.core.statespace import PyMCStateSpace


		def test_two_statespace_models_can_coexist_with_names(monkeypatch):


		return new_ssm, new_variable_info, new_data_info

		def _validate_symbolic_info(self):



		class TestGraphReplacePlaceholderNamespacing:
		"""Tests for the graph_replace-based placeholder namespacing in StructuralTimeSeries."""

	Creates new placeholder variables whose names are prefixed with the model name,
	applies graph_replace to swap them into all SSM matrices, and rebuilds the
	symbolic info objects to reference the new placeholders.

Conversation

opherdonchin commented Feb 20, 2026

Summary

Changes

Tests

Notes

Uh oh!

jessegrabowski commented Feb 20, 2026

Uh oh!

opherdonchin commented Feb 20, 2026

Uh oh!

codecov-commenter commented Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jessegrabowski left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

opherdonchin commented Feb 21, 2026

Uh oh!

jessegrabowski commented Feb 23, 2026

Uh oh!

opherdonchin commented Feb 25, 2026

Uh oh!

jessegrabowski commented Feb 26, 2026

Uh oh!

AlexAndorra commented Mar 29, 2026

Uh oh!

opherdonchin commented Apr 4, 2026

Uh oh!

jessegrabowski left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

opherdonchin commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jessegrabowski commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

opherdonchin commented Apr 6, 2026

Uh oh!

jessegrabowski commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

opherdonchin commented Apr 10, 2026

Fixes in this update (commits 533bd76, db0527c)

1. Dead code in StructuralTimeSeries.__init__

codecov-commenter commented Feb 21, 2026 •

edited

Loading

opherdonchin commented Apr 6, 2026 •

edited

Loading

jessegrabowski commented Apr 6, 2026 •

edited

Loading

jessegrabowski commented Apr 9, 2026 •

edited

Loading

Fixes in this update (commits `533bd76`, `db0527c`)

1. Dead code in `StructuralTimeSeries.init`

2. Stale docstring in `Component.build()`

3. `getattr` → `name in pymc_model` in `_insert_random_variables` / `_insert_data_variables`

4. Question: `name=` support in SARIMAX / VARMAX / ETS / DFM