Skip to content

feat: python tools requirement #1040

Open
akihikokuroda wants to merge 9 commits into generative-computing:main from akihikokuroda:pythonrequirement

akihikokuroda (Member) commented May 7, 2026

Requirement PR

Use this template when adding or modifying requirements in mellea/stdlib/requirements/.

Description

Add requirements for Python code generation.

Implementation Checklist

Base Class

  • Extends appropriate base class:
    • Requirement - standard requirement
    • ALoraRequirement - uses specialized Intrinsic/Adapter for generation-based validation

Validation Logic

  • validation_fn defined (if using Python-based validation)
    • re-usable functionality within the validation_fn should be separated out into mellea/stdlib/tools/
  • validate returns a ValidationResult with
    • a thunk and context if using a backend to generate
    • a specific reason and score when possible

Integration

  • Requirement exported in mellea/stdlib/requirements/__init__.py or, if you are adding a library of requirements, from your sub-module

Testing

  • Tests added to tests/requirements/
  • New code has 100% coverage
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used: claude

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

akihikokuroda requested a review from a team as a code owner May 7, 2026 21:22
github-actions bot added the enhancement (New feature or request) label May 7, 2026

github-actions bot (Contributor) commented May 7, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

result=False,
reason="Your code creates plots with pyplot but never calls `plt.savefig()` to save them.\n\n"
"Add this before your plotting code or at the end:\n"
" plt.savefig('{output_path}')\n"
Contributor

I think this should match the approach in _make_output_artifacts_validator in the way it handles path/output_path.

I assume this was intended to be an f-string (f"...") instead of a plain string ("...").
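To illustrate the point about the missing `f` prefix: without it, the `{output_path}` placeholder reaches the model verbatim instead of being replaced by the actual path. A minimal sketch; the `output_path` value here is a hypothetical example:

```python
# Hypothetical path, standing in for the validator's output_path argument.
output_path = "/tmp/plot.png"

# Plain string: the braces are emitted literally.
plain = " plt.savefig('{output_path}')\n"

# f-string: the variable's value is interpolated.
formatted = f" plt.savefig('{output_path}')\n"

print(plain)
print(formatted)
```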

"Fix this by adding to the top of your code:\n"
" import matplotlib\n"
" matplotlib.use('Agg')\n\n"
"Then replace `plt.show()` with `plt.savefig('{output_path}'); plt.close()`",
Contributor

Same as below, i.e.:

I think this should match the approach in _make_output_artifacts_validator in the way it handles path/output_path.

I assume this was intended to be an f-string (f"...") instead of a plain string ("...").

Contributor

I think this and the other f-string still need to be changed: the strings starting on lines 287 and 311.

My other review comments have been addressed 👍

Comment thread mellea/stdlib/requirements/python_tools.py Outdated
Comment thread mellea/stdlib/requirements/python_tools.py
Comment thread docs/examples/python_plotting_repair.py Outdated
Comment thread docs/examples/python_plotting_repair.py Outdated

async def main():
    """Run the canonical plotting repair example."""
    import tempfile
Contributor

move all the imports to the top

            module_name = node.module.split(".")[0]
            if module_name not in allowed_imports:
                unauthorized.append(module_name)
except (SyntaxError, ValueError):
Contributor

Why not raise these? This `pass` means that the function did not do its job.

Most likely this code is unusable elsewhere anyway, but even if that external assumption holds, the code still snuck past the unauthorized-import check.

Member Author

Comments have been added explaining this.

headless_backends = ("Agg", "Svg", "Cairo", "PDF", "PS", "WebAgg", "nbAgg")
for backend in headless_backends:
    if (
        f"matplotlib.use('{backend}')" in code
Contributor

Commented-out code would be a false positive.

You might consider using Python's `tokenize` module to strip comments. Probably strip docstrings too.

Member Author

Added a simpler way to strip the comments.
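For reference, the `tokenize`-based approach the reviewer suggested could look like the sketch below. It is not the PR's implementation, and `strip_comments` is a hypothetical name. Because `tokenize` reports `COMMENT` and `STRING` tokens separately, a `#` inside a string literal is preserved. Docstrings are ordinary `STRING` tokens and are kept, and malformed source can make `tokenize` raise, so a caller may still need a fallback:

```python
import io
import tokenize


def strip_comments(code: str) -> str:
    """Remove comment tokens from Python source, leaving strings and code intact."""
    tokens = tokenize.generate_tokens(io.StringIO(code).readline)
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    # Given full 5-tuples, untokenize uses token positions to restore the layout;
    # a removed comment leaves only trailing spaces on its line.
    return tokenize.untokenize(kept)


code = "import matplotlib\n# matplotlib.use('Agg')\nmsg = '# not a comment'\n"
cleaned = strip_comments(code)
```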

Comment thread docs/examples/python_plotting_repair.py Outdated
m = mellea.start_session()

# Create requirements bundle for plotting validation
# Allows matplotlib import (no output_path = skip file creation check)
Contributor

This example is too confusing. It always fails without output_path because there is a bunch of code to ensure that the output file exists. Then I added output_path and still consistently failed to get a file.

This might be intentional (fail to write file), but if so it needs to be clearer because right now it looks like a bad example that needs fixing/debugging.

Member Author

Yes, this is intentional. I added a comment explaining it a little better.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
akihikokuroda requested a review from markstur May 9, 2026 13:28

jakelorocco (Contributor) left a comment

I think these new requirements also feel like they aren't building on pre-existing things in our library: "Built on top of the existing uses_tool, tool_arg_validator, and PythonExecutionReq scaffolding." If they can't be, please detail why or improve the underlying implementations.

Contributor

We haven't been putting files directly into docs/examples. Please create a folder for this. I'm not sure what the folder should be; maybe it can go in the existing tools dir? I also see that docs/examples/as_generic_chat_history.py is in that same directory; can you please move it as well (either in this PR or a separate one)?

Comment thread docs/examples/python_plotting_repair.py Outdated
@@ -0,0 +1,174 @@
# pytest: ollama, e2e, qualitative
"""Granite 4.1 repairs the three canonical plotting failures with Python tool.
Contributor

Since this is an example that lives in our main repo, the underlying model might change (especially since you are using start_session). Can you instead refer to the model or find some other wording to keep maintenance cost lower?

Comment thread docs/examples/python_plotting_repair.py Outdated
Comment on lines +78 to +86
Requirements:
- Use the python tool to execute your code
- Import numpy and matplotlib
- Generate x values from 0 to 2π
- Plot sin(x) against x
- Save the plot to the specified file path

Use the python tool with your complete code."""
instruction = Instruction(description=description)
Contributor

Is there a reason that these requirements aren't appended as actual "requirements" to the instruction but instead embedded in the description? If possible, I think the preferred way of writing this code would be:

m.instruct(<description>, <requirements>, <sampling_strategy>)

We tend not to directly invoke the sampling strategy. Are you doing so just so that you can utilize the bundled python reqs from above?

Comment thread mellea/helpers/imports.py Outdated
def get_unauthorized_imports(
    code: str, allowed_imports: list[str] | None = None
) -> list[str]:
    r"""Extract unauthorized imports from Python code.
Contributor

Does this need to be a raw string? It's just a docstring.
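Some context for this comment: a raw (`r`) prefix changes a docstring only when the docstring contains backslashes, which is also when linters such as pydocstyle's rule D301 ask for it. A small illustration; all function names here are hypothetical:

```python
# The same docstring text, with and without the r prefix.
def escaped():
    """Joins lines with "\n"."""


def raw():
    r"""Joins lines with "\n"."""


def plain_text():
    """Extract unauthorized imports from Python code."""


def raw_text():
    r"""Extract unauthorized imports from Python code."""


# With a backslash, the prefix matters: without it, \n becomes a real newline.
backslash_matters = escaped.__doc__ != raw.__doc__

# Without any backslash, the prefix is a no-op.
prefix_is_noop = plain_text.__doc__ == raw_text.__doc__
```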

Comment on lines +17 to +51
FAILURE MATRIX — How each requirement catches the canonical plotting failures:

Scenario: Model generates plotting code with matplotlib

Attempt 1: No tool call
→ MustInvokePythonTool fails
→ Repair: "Call the `python` tool with your code"

Attempt 2: Tool called but no 'code' arg
→ PythonToolHasCodeArg fails
→ Repair: "The python tool requires a 'code' argument"

Attempt 3: Code has syntax error
→ PythonCodeParses fails
→ Repair: "Your code has a syntax error at line X: {error}"

Attempt 4: Code imports matplotlib (not in allowed_imports)
→ PythonImportsAllowed fails
→ Repair: "matplotlib is not allowed. Use only: {allowed_list}"

Attempt 5: Code uses plt.show() without headless backend
→ MatplotlibHeadless fails
→ Repair: "Add matplotlib.use('Agg') and replace plt.show() with plt.savefig(...)"

Attempt 6: Code has plt.plot() but no plt.savefig()
→ PlotsAreSaved fails
→ Repair: "Add plt.savefig('{output_path}') to save the plot"

Attempt 7: Code runs, but output file not created
→ OutputArtifactsExist fails
→ Repair: "File '{output_path}' was not created. Check plt.savefig() call"

Attempt 8: Success
→ All requirements pass
→ Result: plot file exists and is non-empty
Contributor

I think you can drop this part of the docstring. The sampling strategies typically determine what actually happens.

Comment on lines +137 to +147
def _sets_headless_backend(code: str) -> bool:
    """Check if code sets matplotlib to use a headless backend."""
    clean_code = _strip_comments(code)
    headless_backends = ("Agg", "Svg", "Cairo", "PDF", "PS", "WebAgg", "nbAgg")
    for backend in headless_backends:
        if (
            f"matplotlib.use('{backend}')" in clean_code
            or f'matplotlib.use("{backend}")' in clean_code
        ):
            return True
    return False
Contributor

Same as above. If we want some matplotlib specific requirements, then we should introduce structure into our requirements folder so that the specificity is easy to find and makes sense.

Comment on lines +150 to +170
def _uses_pyplot_plot(code: str) -> bool:
    """Check if code calls pyplot plotting functions."""
    plot_functions = (
        "plt.plot",
        "plt.bar",
        "plt.scatter",
        "plt.hist",
        "plt.imshow",
        "plt.figure",
        "plt.subplot",
        ".plot(",
        ".bar(",
        ".scatter(",
        ".hist(",
    )
    return any(func in code for func in plot_functions)


def _calls_savefig(code: str) -> bool:
    """Check if code calls plt.savefig() or fig.savefig()."""
    return "savefig" in code
Contributor

Same as above.

Comment on lines +381 to +517
class PythonToolRequirements:
    """Pre-composed bundle of requirements for Python code generation via the tool.

    This bundle validates the complete Python code generation flow: tool invocation,
    syntax, imports, execution, and output. It's designed to work with repair loops
    (SOFAI, MultiTurnStrategy) to iteratively fix common plotting failures.

    Markers:
    - **Deterministic** (unit-testable): tool invocation, syntax, imports, headless backend,
      savefig presence, file existence, output limits
    - **Qualitative** (needs model to evaluate): execution without error (captured via stderr)

    Args:
        output_path (str | None): Path where plots should be saved. If specified, enables
            output artifact validation. Defaults to None.
        allowed_imports (list[str] | None): Allowlist of importable top-level modules.
            None (default) allows any import. Set to list like ["numpy", "matplotlib"]
            to restrict imports.
        output_limit_bytes (int): Maximum bytes of stdout/stderr allowed. Defaults to 50000.
        check_output_artifacts (bool): If True, validate that output file exists and is
            non-empty after execution. Defaults to True if output_path is specified.

    Attributes:
        requirements (list[Requirement]): The composed list of requirements, suitable
            for use with sampling strategies.
    """

    def __init__(
        self,
        output_path: str | None = None,
        allowed_imports: list[str] | None = None,
        output_limit_bytes: int = 50_000,
        check_output_artifacts: bool | None = None,
    ):
        """Initialize the Python tool requirements bundle."""
        self.output_path = output_path
        self.allowed_imports = allowed_imports
        self.output_limit_bytes = output_limit_bytes

        # Auto-enable output artifact checking if output_path is specified
        if check_output_artifacts is None:
            check_output_artifacts = output_path is not None

        self._check_output_artifacts = check_output_artifacts

        self.requirements = self._build_requirements()

    def _build_requirements(self) -> list[Requirement]:
        """Build the list of requirements for this bundle."""
        reqs: list[Requirement] = []

        # Tool invocation requirements (deterministic)
        reqs.append(
            Requirement(
                description="Use the python tool to execute code.",
                validation_fn=_validate_python_tool_invoked,
                check_only=False,
            )
        )

        reqs.append(
            Requirement(
                description="The python tool call must include a code argument.",
                validation_fn=_validate_python_tool_has_code_arg,
                check_only=False,
            )
        )

        # Code quality requirements (deterministic)
        reqs.append(
            Requirement(
                description="The Python code must parse correctly.",
                validation_fn=_make_code_parses_validator(),
                check_only=False,
            )
        )

        # Import validation (deterministic)
        if self.allowed_imports is not None:
            reqs.append(
                Requirement(
                    description=f"Imports must be from allowed list: {', '.join(self.allowed_imports)}",
                    validation_fn=_make_imports_allowed_validator(self.allowed_imports),
                    check_only=False,
                )
            )

        # Matplotlib-specific requirements (deterministic)
        reqs.append(
            Requirement(
                description=(
                    "If using pyplot, must set headless backend and use savefig."
                ),
                validation_fn=_make_matplotlib_headless_validator(),
                check_only=False,
            )
        )

        reqs.append(
            Requirement(
                description="If creating plots, must call savefig to save them.",
                validation_fn=_make_plots_saved_validator(),
                check_only=False,
            )
        )

        # Output artifact validation (deterministic, post-execution)
        if self._check_output_artifacts and self.output_path:
            reqs.append(
                Requirement(
                    description=f"Output file must be created at {self.output_path}",
                    validation_fn=_make_output_artifacts_validator(self.output_path),
                    check_only=False,
                )
            )

        # Output limiting (deterministic)
        reqs.append(
            Requirement(
                description=f"Output must not exceed {self.output_limit_bytes} bytes.",
                validation_fn=_make_output_limit_validator(self.output_limit_bytes),
                check_only=False,
            )
        )

        return reqs

    def __repr__(self) -> str:
        """Return a developer-readable representation."""
        return (
            f"PythonToolRequirements("
            f"output_path={self.output_path!r}, "
            f"allowed_imports={self.allowed_imports!r}, "
            f"output_limit_bytes={self.output_limit_bytes}, "
            f"requirements={len(self.requirements)} items"
            f")"
        )
Contributor

This is a completely new class / notation. I'm not sure we want to introduce something bundled like this. If anything, it should likely just be exported as a list of requirements. I don't know if it makes sense to introduce this grouping though. A lot of these python requirements are specific to matplotlib, not python in general.

Contributor

In general, I feel like this file has a lot going on. I think additional structure (either through folders or splitting up this file) would be helpful. If we need to make subfolders with their own helper function files, that seems reasonable too.

Comment thread mellea/helpers/imports.py Outdated
Contributor

I'm not sure this is the correct place for this file. Let's move it closer to python_tools or into the stdlib.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
akihikokuroda requested a review from jakelorocco May 12, 2026 01:06

akihikokuroda (Member Author)

I believe I addressed all comments so far.

jakelorocco (Contributor) left a comment

A few more comments. I like how you handled the requirement bundling; I think that's a nice touch.

Handles various matplotlib import styles and fallback to string matching.
"""
if _find_function_calls(code, ["matplotlib.use"]):
    headless_backends = {"Agg", "Svg", "Cairo", "PDF", "PS", "WebAgg", "nbAgg"}
Contributor

Can this be extracted as a constant? I imagine that we would always want to use the same set of headless backends no matter the context?

Comment on lines +164 to +167
except (SyntaxError, ValueError):
    return _code_contains_strings(
        code, [f"matplotlib.use('{b}')" for b in headless_backends]
    )
Contributor

Can you change the control flow so that this error is caught elsewhere, and we only have the code_contains_strings call at the bottom of the function?

Comment on lines +136 to +150
def _uses_pyplot_show(code: str) -> bool:
    """Check if code calls plt.show() or similar show() methods.

    Uses AST analysis to robustly detect show() calls regardless of import
    aliases (e.g., `import matplotlib.pyplot as mpl`). AST approach detects
    actual method calls, avoiding false positives from string literals.
    Falls back to string matching only if code doesn't parse.
    """
    if _find_attribute_calls(code, ["show"]):
        return True
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        return _code_contains_strings(code, ["plt.show", ".show()"])
    return False
Contributor

Are we okay with this having false positives? Doesn't this check for any invocation of .show(), even if it's not on plt?
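One way to reduce these false positives, sketched under assumptions: first collect the names that import statements bind to `matplotlib.pyplot`, then only match calls on those names. This also ties the check to matplotlib actually being imported. `pyplot_aliases` and `calls_pyplot` are hypothetical helpers, not functions from the PR:

```python
import ast


def pyplot_aliases(tree: ast.AST) -> set[str]:
    """Collect names bound to matplotlib.pyplot by import statements."""
    aliases: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import matplotlib.pyplot as plt` binds `plt`. A plain
                # `import matplotlib.pyplot` binds `matplotlib` instead,
                # which this sketch does not track.
                if alias.name == "matplotlib.pyplot" and alias.asname:
                    aliases.add(alias.asname)
        elif isinstance(node, ast.ImportFrom) and node.module == "matplotlib":
            for alias in node.names:
                if alias.name == "pyplot":
                    aliases.add(alias.asname or "pyplot")
    return aliases


def calls_pyplot(code: str, method: str) -> bool:
    """True if code calls <pyplot alias>.<method>(...); raises SyntaxError if unparsable."""
    tree = ast.parse(code)
    aliases = pyplot_aliases(tree)
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == method
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id in aliases
        for node in ast.walk(tree)
    )
```

The trade-off is coverage: object-oriented calls such as `fig.savefig()` or `ax.plot()` are not matched, because recognizing them would require following assignments like `fig, ax = plt.subplots()`.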

Comment on lines +197 to +214
def _uses_pyplot_plot(code: str) -> bool:
    """Check if code calls pyplot plotting functions.

    Uses AST analysis to detect plot-related method calls. Handles import
    aliases and detects actual method calls, avoiding false positives from
    string literals or method references. Falls back to string matching
    only if code doesn't parse.
    """
    plot_methods = {"plot", "bar", "scatter", "hist", "imshow", "figure", "subplot"}
    if _find_attribute_calls(code, list(plot_methods)):
        return True
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        return _code_contains_strings(
            code, [f".{m}(" for m in plot_methods] + [f"plt.{m}" for m in plot_methods]
        )
    return False
Contributor

Same as above: there's a risk of detecting false positives. Can this be combined with one of your other checks that verifies matplotlib is imported as well?

Comment on lines +217 to +231
def _calls_savefig(code: str) -> bool:
    """Check if code calls plt.savefig() or fig.savefig().

    Uses AST analysis to robustly detect savefig() calls regardless of
    how matplotlib was imported. Detects actual method calls, avoiding
    false positives from string literals. Falls back to string matching
    only if code doesn't parse.
    """
    if _find_attribute_calls(code, ["savefig"]):
        return True
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        return _code_contains_strings(code, ["savefig"])
    return False
Contributor

Same as above.

Comment on lines +6 to +31
def get_unauthorized_imports(
    code: str, allowed_imports: list[str] | None = None
) -> list[str]:
    """Extract unauthorized top-level imports from Python code."""
    if allowed_imports is None:
        return []

    unauthorized: set[str] = set()
    try:
        tree = ast.parse(code)
    except (SyntaxError, ValueError):
        # Syntax errors are validated separately by dedicated validators.
        return []

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                module = alias.name.split(".")[0]
                if module not in allowed_imports:
                    unauthorized.add(module)
        elif isinstance(node, ast.ImportFrom) and node.module:
            module = node.module.split(".")[0]
            if module not in allowed_imports:
                unauthorized.add(module)

    return sorted(unauthorized)
Contributor

I believe I mentioned this previously, but is there a reason not to utilize our pre-existing allowed imports checker function?

    return score


def extract_python_code(ctx: Context) -> ValidationResult:
Contributor

This looks very similar to the function below it. Can you please describe why it's needed / why the two functions can't be combined?

Comment on lines +85 to +101
def _code_parses(code: str) -> tuple[bool, str | None]:
    """Check if code parses as valid Python.

    Returns:
        (True, None) if code parses
        (False, error_message) if syntax error
    """
    try:
        ast.parse(code)
        return True, None
    except SyntaxError as e:
        error_msg = f"Syntax error at line {e.lineno}: {e.msg}"
        if e.text:
            error_msg += f"\n {e.text.rstrip()}"
        if e.offset:
            error_msg += "\n " + " " * (e.offset - 1) + "^"
        return False, error_msg
Contributor

I think this is still unresolved.

Comment on lines +1 to +7
"""Requirement factories for Python tool invocation and code validation.

This module provides generic requirements for Python-tool usage and code
correctness. Plotting-specific checks are exposed separately through
``plotting.python_plotting_requirements(...)`` so they are not implied to be
universal Python-tool requirements.
"""
Contributor

Please modify this docstring. I don't think this encompasses the plotting tools anymore.

from ..tools.interpreter import StaticAnalysisEnvironment
from .imports import get_unauthorized_imports
from .plotting import python_plotting_requirements
from .python_reqs import extract_python_code
Contributor

I think it would be helpful to add more context to this extract_python_code function (especially in the functions that call it). Do we even have a python tool that is invoked by that name? Is the python tool invoked / computed at the point in time that requirements are being checked?

If we keep this function as is, please add notes to the requirements that utilize it that explain how it looks for the python code.

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Requirements library for the Python tool

4 participants