Skip to content

fix(pipeline): fail-loudly hygiene + operator override hatches#16

Open
GrigoryEvko wants to merge 2 commits into
FusionBrainLab:mainfrom
GrigoryEvko:fix/pipeline-builder-hygiene
Open

fix(pipeline): fail-loudly hygiene + operator override hatches#16
GrigoryEvko wants to merge 2 commits into
FusionBrainLab:mainfrom
GrigoryEvko:fix/pipeline-builder-hygiene

Conversation

@GrigoryEvko
Copy link
Copy Markdown

@GrigoryEvko GrigoryEvko commented May 15, 2026

Four small hygiene fixes around pipeline construction. Theme: configuration surfaces that silently accept ambiguous input are harder to debug than ones that fail loudly.

StageRegistry.register silently overwrote cls._stages[class_name] on duplicate registration. Two modules decorating distinct Stage subclasses with the same class name would shadow each other invisibly; downstream get_stage(name).import_path lookups would resolve to whichever decorator imported last (load-order dependent). Now raises ValueError listing both import paths. Tests that intentionally re-register can call StageRegistry.clear() between registrations — the existing escape hatch.

PipelineBuilder.add_stage had the same silent-overwrite pattern: self._nodes[name] = factory keyed by a caller-supplied string with no collision guard. The class already exposes a replace_stage method as the explicit overwrite path, so making add_stage raise on duplicate is consistent with intent. A subclass that copy-pastes a parent's add_stage call now fails at construction instead of producing a pipeline where downstream wiring references the original factory by name while the registry holds the shadow.

DefaultPipelineBuilder._contribute_default_deps reassigned the whole self._deps dict via self._deps = {...} literal. A subclass that overrides _contribute_default_deps, adds its own deps to self._deps, and then calls super()._contribute_default_deps() would silently lose everything it added. Switched to per-key self._deps.setdefault(stage, []).extend(deps) — subclass-added deps for unrelated stages survive, and deps for stages the parent also contributes are concatenated rather than overwritten.

select_pipeline_builder, CMAOptPipelineBuilder.__init__, and OptunaOptPipelineBuilder.__init__ all branched on problem_context.is_contextual to decide whether to add context wiring, with no operator override path. When the filesystem-inferred heuristic guesses wrong for a specific run, the only escape was switching to a different pipeline YAML. Added a force: str | None = None kwarg accepting "context" / "default" to all three call sites — overrideable from Hydra via +pipeline_builder.force=default. Invalid values raise so typos surface immediately rather than falling through to the heuristic. Factored validation into a shared _resolve_context_flag helper so the three call sites stay in sync.

…s / honor force= override

Three small fixes to the pipeline-builder seam, theme: configuration
that silently does the wrong thing is harder to debug than configuration
that fails at startup.

`StageRegistry.register` silently overwrote `cls._stages[class_name]` on
duplicate registration.  Two modules decorating distinct `Stage` subclasses
with the same class name would shadow each other invisibly; downstream
`get_stage(name).import_path` lookups would resolve to whichever decorator
ran last (load-order dependent).  Now raises `ValueError` listing both
import paths; tests that intentionally re-register can call `clear()`
between (existing pattern).

`DefaultPipelineBuilder._contribute_default_deps` reassigned the whole
`self._deps` dict via `self._deps = {...}` literal.  A subclass that
overrides `_contribute_default_deps`, adds its own deps, and calls
`super()._contribute_default_deps()` afterwards would silently lose
everything it added.  Switched to per-key
`self._deps.setdefault(stage, []).extend(deps)` — subclass-added deps
for unrelated stages survive, and deps for stages the parent also
contributes are concatenated rather than overwritten.

`select_pipeline_builder` had no way for an operator to bypass the
`problem_context.is_contextual` heuristic.  When the auto-detection
guesses wrong (or the operator wants to force a specific builder for a
run), the only escape was switching the whole pipeline YAML.  Added
`force: str | None = None` kwarg accepting `"context"` / `"default"` —
overrideable from Hydra via `+pipeline_builder.force=default`.  Invalid
values raise so typos surface immediately rather than falling back to
heuristic.

5 files changed (3 source + 2 test), +98 / -2.  All 4493 tests pass.
Hunter-resolver pass on PC2/PC3/PC4 surfaced three more instances of the
same anti-patterns in the same files this PR touches.

`PipelineBuilder.add_stage` had the exact `StageRegistry.register`
silent-overwrite pattern: `self._nodes[name] = factory` keyed by a
caller-supplied string with no collision guard. The class already exposes
`replace_stage` as the explicit overwrite escape hatch, so making
`add_stage` raise on duplicate is consistent with intent. A subclass
copy-pasting a parent `add_stage` call now fails at construction instead
of producing a working-but-wrong pipeline (the second factory shadows the
first, but downstream wiring still references the first by name).

`CMAOptPipelineBuilder` and `OptunaOptPipelineBuilder` both branched on
`ctx.problem_ctx.is_contextual` to decide whether to add context wiring,
with no operator override — same PC4 shape as `select_pipeline_builder`.
Added `force: str | None = None` kwarg (`"context"` / `"default"` / None)
to each constructor, overrideable via Hydra
`+pipeline_builder.force=default`. Factored the validation into a
`_resolve_context_flag` helper that `select_pipeline_builder` now
delegates to for consistency.

All 6 production builder constructions verified to construct cleanly under
the new `add_stage` raise (no latent duplicate add_stage in DefaultPipelineBuilder,
ContextPipelineBuilder, CMAOptPipelineBuilder, OptunaOptPipelineBuilder
for both contextual and non-contextual problem contexts).

Regression tests added (97 unit tests pass; full repo 4500 still green):
- `tests/entrypoint/test_default_pipelines.py::TestPipelineBuilder::test_add_stage_duplicate_raises`
- `TestCMAOptPipelineBuilder::test_force_default_skips_context_when_inferred_contextual`
- `TestCMAOptPipelineBuilder::test_force_context_adds_context_when_inferred_non_contextual`
- `TestCMAOptPipelineBuilder::test_force_invalid_raises`
- `TestOptunaOptPipelineBuilder::test_force_default_skips_context_when_inferred_contextual`
- `TestOptunaOptPipelineBuilder::test_force_context_adds_context_when_inferred_non_contextual`
- `TestOptunaOptPipelineBuilder::test_force_invalid_raises`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant