Skip to content

fix: preserve error context across stage / runner / validation paths#15

Open
GrigoryEvko wants to merge 802 commits into
FusionBrainLab:mainfrom
GrigoryEvko:fix/error-context-preservation
Open

fix: preserve error context across stage / runner / validation paths#15
GrigoryEvko wants to merge 802 commits into
FusionBrainLab:mainfrom
GrigoryEvko:fix/error-context-preservation

Conversation

@GrigoryEvko

@GrigoryEvko GrigoryEvko commented May 15, 2026

Copy link
Copy Markdown

Four small fixes for the same anti-pattern: error sites that discard useful debugging information before logging or re-raising. All independent, no cross-file coupling.

optimization/utils.py:evaluate_single and the Optuna per-trial path at optimization/optuna/stage.py both rendered ExecRunnerError failures via .stderr.rsplit("\n", 1)[-1] to grab the last line of the worker's captured stderr. That drops the entire traceback above the final RuntimeError: ... summary line — exactly the part a human reading the log needs. Both sites now keep the full stderr in the returned detail string.

optimization/optuna/stage.py:_apply_modifications re-raised the inner SyntaxError as raise ValueError(f"Parameterized code syntax error: {e}") without from e, orphaning __cause__. Logs showed only the wrapped message. Adding from e restores the chain.

ValidateCodeStage re-raised the compile-time SyntaxError as a plain message string — SyntaxError(f"SyntaxError at line {e.lineno}, offset {e.offset}: {e.msg}. Line: ...") — with no .text / .lineno / .offset / .filename set on the new instance. traceback.format_exception then can't render the caret pointing at the offending column. Reconstructing via SyntaxError(msg, (filename, lineno, offset, text)) keeps the structural attributes intact; the original error is still chained. Existing test_syntax_error updated to assert structural preservation (exc.lineno, exc.text) rather than the discarded prefix string.

DagScheduler's timeout and finished-task harvest paths used logger.error("...: {}", e) — only the exception's str() was emitted, no traceback and no __cause__ chain. Replaced with logger.exception(...) so loguru's exception sink renders the full stack.

ShiftingBorders and others added 30 commits April 1, 2026 16:00
short id will generate based on full id when required
KhrulkovV and others added 23 commits April 6, 2026 13:29
refactor(memory): deduplicate code and delete dead paths
Replace hand-written RetrievalWeights.from_mapping() and
CardUpdateDedupConfig.from_mapping() dict parsers (~87 lines) with
Pydantic v2 @model_validator(mode="before") — same behavior, idiomatic.

Also: add docstrings to all functions in card_update_dedup.py, fix
stale test references to deleted _apply_update_actions wrapper, add
test for flat config format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor(memory): Pydantic-idiomatic config parsing
- memory_write_example.py → write_pipeline.py (it's production, not an example)
- memory_write_config.py → write_pipeline_config.py
- selected_ideas_6.py → origin_analysis.py (remove versioned filename)
- Delete test_memory_write_example_extended.py (duplicate of test_write_pipeline.py)
- Update all 8 import sites + 1 dynamic importlib.import_module call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor(memory): rename write pipeline and analysis files
Extract from card_conversion.py (554 → 420 lines):
- base.py: GigaEvoMemoryBase abstract class (20 lines)
- card_search.py: format_search_results, search_cards_by_keyword,
  synthesize_search_results (115 lines)

Update 4 import sites directly (no re-exports).
card_conversion.py retains: normalization, conversion, GAM config,
constants, protocols.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor(memory): split card_conversion into focused modules
Define MemoryError, MemoryRetrieverError, MemorySearchError, and
MemoryStorageError in gigaevo/exceptions.py following the existing
GigaEvoError hierarchy.

Wire them into the memory subsystem:
- gam_search.build() wraps all failures in MemoryRetrieverError
- memory.py narrows two gam.build() catches from bare Exception
- card_store._load() narrows to (json.JSONDecodeError, OSError)
- card_dedup import block narrows to (ImportError, OSError)

Resilience-critical catches (search fallback, merge loop, __exit__)
remain broad by design.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor(memory): custom exception hierarchy and narrowed catches
…t base to ABC

- concept_api.py: all 5 RuntimeError raises → MemoryStorageError
  (matches gigaevo/database pattern of wrapping I/O errors)
- base.py: GigaEvoMemoryBase now uses ABC + @AbstractMethod
  (matches MutationOperator, Stage, LangGraphAgent pattern)
- card_dedup.py: narrow two broad catches:
  - JSONL read fallback: except Exception → (json.JSONDecodeError, OSError)
  - GAM store build: except Exception → (MemoryRetrieverError, OSError)
- Update 6 test assertions from RuntimeError to MemoryStorageError

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ormity

refactor(memory): exception conformity + ABC base class
When write_pipeline.py passes MemoryCard/ProgramCard Pydantic models to
memory_platform.save_card(), the dict() call on a Pydantic model doesn't
properly flatten nested Pydantic objects like ConnectedIdea. This caused
TypeError in _persist_index() when json.dumps() tried to serialize.

Root cause: write_pipeline returns list[AnyCard] (Pydantic models) and both
backends (memory_platform and memory/shared_memory) consume these cards via
save_card(). memory_platform's normalize_memory_card() must explicitly call
.model_dump() on Pydantic inputs to flatten nested objects.

Fix verified: all 788 memory + integration tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests the exact bug path: Pydantic MemoryCard/ProgramCard with nested
ConnectedIdea and MemoryCardExplanation objects must be properly
flattened to plain dicts before JSON serialization.

6 tests covering: ProgramCard with ConnectedIdea, MemoryCard with
MemoryCardExplanation, plain dict passthrough, JSON round-trips, None.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add gigaevo-memory Git dependency to pyproject.toml
- Remove sys.path manipulation from memory_platform/memory.py and
  remote_gam_retriever.py (no longer needed with proper install)
- Simplify test file to use direct imports instead of module mocking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expands from 6 to 11 tests covering the complete save_card → _persist_index
flow with Pydantic inputs. Tests verify:

- normalize_memory_card: ConnectedIdea/MemoryCardExplanation → dict
- save_card: Pydantic ProgramCard/MemoryCard → JSON-serializable index
- _card_to_backend_content: API payload is clean dict
- persist/reload roundtrip: index file survives write→read cycle

Uses _make_platform_memory() factory with mocked API client to test
memory_platform in isolation without network dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docstrings to 15 public methods across 5 files (memory.py,
  concept_api.py, card_dedup.py, openai_inference.py, write_pipeline.py)
- Add return type annotations to 4 functions in amem_gam_retriever.py
- Fix 2 mypy errors: annotate retrievers dict, rename variable in api_sync.py
- Extract magic numbers: _MAX_SUMMARY_CHARS, _MAX_DESCRIPTION_CHARS,
  _ENTITY_NAME_MAX_LENGTH, _MAX_CONNECTED_DESCRIPTIONS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor(memory): type annotations, docstrings, constants, platform bug fix
Four small fixes for the same anti-pattern: error sites that discard
useful debugging information before logging or re-raising.

ER2 / Optuna and shared optimization utils: ``ExecRunnerError`` rendering
used ``.stderr.rsplit("\n", 1)[-1]`` to grab the last line of the worker's
captured stderr.  That dropped the entire traceback above the final
``RuntimeError: ...`` line, which is exactly the part a human needs.
Both ``optimization/utils.py:evaluate_single`` and the optuna
``stage.py`` per-trial path now keep the full stderr in the returned
detail string.

ER4 / Optuna ``_apply_modifications``: ``raise ValueError(f"... {e}")``
without ``from e`` orphaned the original SyntaxError, so logs showed only
the wrapped message.  Added ``from e``.

ER7 / ``ValidateCodeStage``: re-raised SyntaxErrors as
``SyntaxError(f"... {e.msg}. Line: `{code}`")`` — a plain message string
with no ``.text``/``.lineno``/``.offset``/``.filename`` set.
``traceback.format_exception`` then couldn't render the caret pointing
at the offending column.  Reconstruct via
``SyntaxError(msg, (filename, lineno, offset, text))`` to keep the
structural attributes intact; the original error is still chained.
Test ``test_syntax_error`` updated to assert structural preservation
rather than the discarded prefix string.

ER8 / ``DagScheduler``: two ``logger.error("...: {}", e)`` sites in the
timeout/finished harvest paths logged only the exception's ``str()`` —
no traceback, no chain.  Replaced with ``logger.exception(...)`` so
loguru's exception sink renders the full stack.

All 1432 stages + dag tests pass.
Hunter-resolver pass on the original four fixes surfaced two more
instances of the same anti-patterns inside the files this PR already
touches.

`wrapper.py` cloudpickle-loads-failure raise: `raise ExecRunnerError(...)`
inside `except Exception as e` was missing `from e`, orphaning the
underlying UnpicklingError / ImportError / AttributeError that triggered
the failure.  Added `from e`.

`dag_runner.py` sibling `logger.error('...: {}', e)` sites — the
original PR fixed 2 of 11 instances in the scheduler loop; the other
9 followed the same anti-pattern in the same control flow (fetch-by-status
failure, orphan-fetch failure, mget-for-launch failure, DAG-build failure
incl. an open-coded `logger.error('Traceback:\n{}', traceback.format_exc())`
two-liner, state-update failure during build-error discard, batch
mark-started failure, DAG run failure, post-run state-transition failure,
and batch RUNNING→DONE flush failure).  All converted to
`logger.exception(...)`; redundant `import traceback` + format_exc
line removed.

Test strengthening for `test_syntax_error`: also asserts
`exc.__cause__ is SyntaxError` and that `traceback.format_exception`
output contains both the offending source line and a caret `^` —
which is the whole point of preserving SyntaxError's structural
attributes.

1432 stages + dag tests pass; ruff clean.
The 1-line modifier added in 9882855 is in the legacy ``WorkerPool``
path of ``wrapper.py``.  PR FusionBrainLab#14 (loky-executor) rewrites the entire
module — the ``run_exec_runner`` function this raise statement lives
in is replaced by ``LokyBackend.execute``, and the cloudpickle-loads
site moves to ``_load_spill``.  Keeping the modifier here forces a
merge conflict with FusionBrainLab#14 regardless of order; removing it lets either
PR merge independently against ``main``.

The equivalent ``from e`` is worth re-adding inside the post-FusionBrainLab#14
``LokyBackend`` cloudpickle path as a follow-up after either PR
merges.
@GrigoryEvko

Copy link
Copy Markdown
Author

Conflict map vs PR #10 (fix/llm-output-sanitization) and PR #14 (loky-executor)

After the pre-emptive revert on this branch (commit b52544c — dropped the 1-line from e modifier in wrapper.py, which would have been dead code post-PR #14), the conflict footprint is:

vs PR #14: none.

vs PR #10:

File Hunks Type
gigaevo/runner/dag_runner.py 11 This PR converts 11 sites from logger.error("...: {}", e) to logger.exception(...); PR #10 wraps the same sites with logger.error("...: {}", sanitize_for_log(str(e)))
gigaevo/programs/stages/optimization/optuna/stage.py 2 ER2 stderr-rsplit reverts + sanitize_for_log overlap
gigaevo/programs/stages/optimization/utils.py 1 Same pattern as optuna stage
gigaevo/programs/stages/validation.py 1 This PR reconstructs SyntaxError preserving structural attrs; PR #10 throws a flattened sanitised string

Resolution (whichever PR merges second):

For the 11 dag_runner.py hunks: logger.exception(...) is strictly more informative — loguru's auto-attached traceback already escapes control bytes — so taking this PR's version supersedes PR #10's sanitize_for_log wrapping there.

For the other 3 files: take the union — keep this PR's structural fixes (full stderr instead of rsplit("\n", 1)[-1]; structurally-rich SyntaxError reconstruction) and apply sanitize_for_log to the interpolated strings. Example for validation.py:

new_err = SyntaxError(
    sanitize_for_log(e.msg or "syntax error"),
    (e.filename, e.lineno, e.offset, e.text),
)
raise new_err from e

The conflict is intrinsic — same lines edited for orthogonal reasons (error-context preservation vs control-byte sanitisation). It is not eliminable at the branch level without gutting most of this PR's scope. Merge order is interchangeable; the second-merged PR needs ~15 line-level unions as documented above.

KhrulkovV added a commit that referenced this pull request May 26, 2026
… designs

flush.py:
- Add kill_run_writers() — kills parent run.py processes writing to target
  DBs before flushing. Previously only exec_runner children were killed,
  leaving the parent alive to immediately repopulate a flushed DB.
- Brief 2s sleep after kills to let processes release Redis connections.

preflight_check.py (check #9):
- When a DB is non-empty, scan for live run.py writers and report their
  PIDs in the failure message so the fix is obvious.

preflight_check.py (check #15):
- Add factorial_design: true support in experiment.yaml. When declared,
  the >2-cells check is skipped (passes with a note). Without the flag,
  the failure message now suggests adding it.
- Add helper _find_run_pids_for_db() used by check #9.

experiment.yaml (heilbron/adversarial-dynamic-updates):
- Add factorial_design: true to silence the false-positive MAJOR.

tests: 20 tests (12 existing + 5 check15 + 3 check9), all pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants