
fix(mutation): keep canonicalization output parseable #5

Open: GrigoryEvko wants to merge 4 commits into FusionBrainLab:main from GrigoryEvko:fix/mutation-canonicalization-syntactic-validity


`_DocstringRemover` pops a function's or class's only statement (its docstring) and leaves the body empty. `ast.unparse` then emits `def f():` with no suite, so `_canonicalize_code` silently returns invalid Python to its caller.

Reproducer against current main:

from gigaevo.evolution.mutation.mutation_operator import LLMMutationOperator
import ast
ast.parse(LLMMutationOperator._canonicalize_code(
    'def stub():\n    """just a stub."""\n'
))  # before: SyntaxError; after: parses
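
The same failure can be reproduced with the standard library alone, without importing the project. This is a minimal sketch that mimics the naive docstring removal by hand; it does not call `_DocstringRemover` itself, and the variable names are illustrative.

```python
import ast

# Parse a function whose only statement is its docstring.
tree = ast.parse('def stub():\n    """just a stub."""\n')
func = tree.body[0]

# Naive docstring removal: the body is now an empty list.
func.body.pop(0)

# ast.unparse does not validate its output, so this call succeeds...
broken = ast.unparse(tree)

# ...but the emitted source has a header with no suite and cannot be re-parsed.
try:
    ast.parse(broken)
    reparsed_ok = True
except SyntaxError:
    reparsed_ok = False
```

Here `broken` holds the `def stub():` header on its own, and `reparsed_ok` ends up `False`.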

Fix: insert `ast.Pass()` (with `copy_location`) when a function or class body becomes empty; module bodies can stay empty. In addition, `_canonicalize_code` now performs a round-trip check: it re-parses the `ast.unparse` result and falls back to the original code on `SyntaxError`, `ValueError`, or `RecursionError`.
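
The Pass insertion could look roughly like the sketch below. `DocstringRemoverSketch` is a hypothetical stand-in written for illustration; the real `_DocstringRemover` in the PR may be structured differently.

```python
import ast


class DocstringRemoverSketch(ast.NodeTransformer):
    """Hypothetical stand-in for _DocstringRemover (not the PR's code)."""

    def _strip(self, node):
        # Visit children first so nested functions and classes are handled.
        self.generic_visit(node)
        body = node.body
        is_docstring = (
            bool(body)
            and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)
        )
        if is_docstring:
            removed = body.pop(0)
            # Functions and classes need at least one statement; modules do not.
            if not body and not isinstance(node, ast.Module):
                body.append(ast.copy_location(ast.Pass(), removed))
        return node

    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip
    visit_ClassDef = _strip
    visit_Module = _strip


source = 'def stub():\n    """just a stub."""\n'
stripped = ast.unparse(DocstringRemoverSketch().visit(ast.parse(source)))
ast.parse(stripped)  # no longer raises

module_only = ast.unparse(DocstringRemoverSketch().visit(ast.parse('"""doc"""\n')))
```

With the placeholder in place, `stripped` contains a `pass` suite and re-parses cleanly, while a module that held only a docstring unparses to an empty string, which is still valid Python.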

When `strip_comments_and_docstrings=True` and a function/class body contains only a docstring, popping it leaves the body empty, and `ast.unparse` emits `def f():` with no indented suite, so re-parsing fails. Append `ast.Pass()` for function, async function, and class nodes; module bodies are allowed to be empty.

`ast.unparse` does not validate its output. If any future AST transformer emits unparseable code (the same class of bug the previous commit fixed in `_DocstringRemover`), `_canonicalize_code` would silently return invalid Python to its caller. Re-parse the result and fall back to the original on round-trip failure.
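
A round-trip guard of this kind might be sketched as follows. `canonicalize_sketch`, its `transform` parameter, and `empty_bodies` are hypothetical names for illustration; how the real `_canonicalize_code` handles input that does not parse in the first place is an assumption here.

```python
import ast


def canonicalize_sketch(code, transform=None):
    """Hypothetical round-trip guard; not the project's _canonicalize_code."""
    try:
        tree = ast.parse(code)
        if transform is not None:
            tree = transform(tree)
        result = ast.unparse(tree)
        # Round-trip check: the unparsed text must itself be valid Python.
        ast.parse(result)
    except (SyntaxError, ValueError, RecursionError):
        # Assumption: any failure falls back to the untouched input.
        return code
    return result


def empty_bodies(tree):
    """Deliberately buggy transform: leaves every function with no statements."""
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            node.body.clear()
    return tree


original = "def f():\n    return 1\n"
guarded = canonicalize_sketch(original, empty_bodies)
normalized = canonicalize_sketch("x  =  1")
```

The buggy transform produces a bare `def f():` header, the re-parse raises `SyntaxError`, and the caller gets `original` back unchanged. Well-formed input still comes back canonicalized.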
- _DocstringRemover: copy_location on the inserted Pass so callers that
  compile() the transformed tree directly (without an ast.unparse round-trip)
  don't crash on missing lineno/col_offset.
- _canonicalize_code: broaden the fallback to catch ValueError and
  RecursionError from ast.unparse, not just SyntaxError.
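
To illustrate why the location matters, the sketch below builds the same tree twice, once with and once without `ast.copy_location`, and passes each to `compile()`. The helper names are hypothetical, and the set of exceptions caught is deliberately broad because the exact error type for a missing `lineno` is treated as an assumption.

```python
import ast

SOURCE = 'def stub():\n    """just a stub."""\n'


def replace_docstring_with_pass(keep_location):
    """Swap the docstring for Pass, optionally copying its source location."""
    tree = ast.parse(SOURCE)
    func = tree.body[0]
    removed = func.body.pop(0)
    new_pass = ast.Pass()
    if keep_location:
        # Copies lineno/col_offset (and the end_* fields) from the docstring.
        ast.copy_location(new_pass, removed)
    func.body.append(new_pass)
    return tree


def compiles(tree):
    """Return True if the tree can be compiled directly, without unparsing."""
    try:
        compile(tree, "<sketch>", "exec")
        return True
    except (TypeError, ValueError):
        return False


with_location = compiles(replace_docstring_with_pass(keep_location=True))
without_location = compiles(replace_docstring_with_pass(keep_location=False))
```

An `ast.unparse` round-trip hides the problem because the text is re-parsed and every node gets fresh positions; only callers that compile the transformed tree directly are affected.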
Decorated functions and classes with bases keep their decorators and bases after the Pass insertion. A non-string leading Expr is not misidentified as a docstring. When `ast.unparse` raises `ValueError` or `RecursionError`, the result falls back to the original code.
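
The docstring check and the decorator/base preservation can be illustrated with a small sketch. `leading_docstring` is a hypothetical helper, not a function from the PR; it only accepts a leading `Expr` whose value is a string constant.

```python
import ast


def leading_docstring(node):
    """Return the docstring statement of node, or None (hypothetical helper)."""
    body = getattr(node, "body", [])
    if body and isinstance(body[0], ast.Expr):
        value = body[0].value
        # Only plain string constants count; numbers, bytes, f-strings do not.
        if isinstance(value, ast.Constant) and isinstance(value.value, str):
            return body[0]
    return None


def first_def(source):
    """Parse source and return its first top-level statement."""
    return ast.parse(source).body[0]


number_expr = leading_docstring(first_def("def f():\n    42\n"))
fstring_expr = leading_docstring(first_def('def f():\n    f"{1}"\n'))
real_doc = leading_docstring(first_def('def f():\n    "doc"\n'))

# Decorators and bases live on the node itself, so editing body leaves them alone.
tree = ast.parse('@decorator\nclass C(Base):\n    """doc"""\n')
cls = tree.body[0]
doc = leading_docstring(cls)
cls.body.remove(doc)
cls.body.append(ast.copy_location(ast.Pass(), doc))
class_source = ast.unparse(tree)
```

Only `real_doc` is non-`None`, and `class_source` still carries `@decorator` and `(Base)` alongside the inserted `pass`.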