Skip to content

MetaboT v1.1 0 Update dependencies, replace code interpreter, some other improvements#170

Merged
lfnothias merged 28 commits into
mainfrom
dev
May 5, 2026
Merged

MetaboT v1.1 0 Update dependencies, replace code interpreter, some other improvements#170
lfnothias merged 28 commits into
mainfrom
dev

Conversation

@lfnothias

@lfnothias lfnothias commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

User description

MetaboT v1.1 0 Update dependencies, replace code interpreter, someother improvements


PR Type

enhancement, bug_fix, dependencies, tests


Description

  • Replace codeinterpreterapi with PythonREPL for local execution

  • Update dependencies including langchain, openai, pydantic, and numpy

  • Fix deprecated pydantic imports and update requirements

  • Add big_benchmark.csv for automated evaluation


Diagram Walkthrough

flowchart LR
  A["Replace codeinterpreterapi"] -- "with" --> B["PythonREPL"]
  C["Update dependencies"] --> D["langchain, openai, pydantic, numpy"]
  E["Fix imports"] --> F["pydantic"]
  G["Add benchmark"] --> H["big_benchmark.csv"]
Loading

File Walkthrough

Relevant files
Enhancement
14 files
tool_chemicals.py
Update `pydantic` import and type annotations                       
+2/-2     
tool_smiles.py
Update `pydantic` import and type annotations                       
+2/-2     
tool_target.py
Update `pydantic` import and type annotations                       
+2/-2     
tool_taxon.py
Update `pydantic` import and type annotations                       
+5/-7     
tool_filesparser.py
Add `FileAnalyzerInput` schema and update imports               
+6/-2     
agent.py
Pass `llm_instance` to tool constructor                                   
+1/-0     
tool_interpreter.py
Replace `codeinterpreterapi` with `PythonREPL`                     
+133/-103
tool_spectrum.py
Update `pydantic` import and type annotations                       
+2/-2     
tool_merge_result.py
Update `pydantic` import and type annotations                       
+2/-2     
tool_sparql.py
Update `pydantic` import and type annotations                       
+5/-5     
tool_wikidata_query.py
Update `pydantic` import and type annotations                       
+5/-5     
tool_validator.py
Update `pydantic` import and type annotations                       
+2/-2     
database_manager.py
Update memory database manager to use `MemorySaver`           
+40/-39 
langraph_workflow.py
Improve error logging with exception info                               
+1/-1     
Cleanup
1 files
custom_sqlite_file.py
Remove unused custom SQLite checkpoint file                           
+0/-97   
Tests
2 files
evaluation.py
Add automated evaluation setup and checks                               
+74/-59 
installation_test.py
Update installation test setup                                                     
+8/-6     
Configuration changes
1 files
params.ini
Update default OpenAI models to gpt-5.4                                   
+4/-4     
Dependencies
1 files
environment.yml
Update and add dependencies for new features                         
+14/-15 


Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • lfnothias and others added 11 commits April 23, 2026 14:16
    The interpreter tool previously relied on codeinterpreterapi (remote sandbox) which is no longer
    compatible with the current dependency stack. This replaces it with langchain_experimental
    PythonREPL (local execution) while preserving the same agent contract.
    
    Key changes:
    - tool_interpreter.py: rewrite using PythonREPL — injects file previews (columns, dtypes,
      3-row sample) into the LLM prompt so generated code uses exact column names; truncates
      REPL stdout at 2000 chars to prevent context overflow; wires shared llm_instance from
      the agent instead of instantiating a separate model
    - agent.py: passes llm_instance to tool constructor via import_tools introspection
    - params.ini: bump default OpenAI models from gpt-4o / gpt-4o-mini to gpt-5.4 / gpt-5.4-mini
    - requirements.txt / environment.yml: pin langchain_experimental==0.3.4 and remove
      codeinterpreterapi / codeboxapi dependencies
    - Remove custom_sqlite_file.py (unused after langgraph checkpoint refactor)
    
    Security note: PythonREPL executes LLM-generated code in-process without sandboxing.
    A warning comment has been added; restrict to trusted deployment environments.
    @dosubot dosubot Bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Apr 24, 2026
    @coderabbitai

    coderabbitai Bot commented Apr 24, 2026

    Copy link
    Copy Markdown

    Note

    Reviews paused

    It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

    Use the following commands to manage reviews:

    • @coderabbitai resume to resume automatic reviews.
    • @coderabbitai review to trigger a single review.

    Use the checkboxes below for quick actions:

    • ▶️ Resume reviews
    • 🔍 Trigger review

    Walkthrough

    Migrate many tools from LangChain pydantic shims to native pydantic, add explicit type annotations, replace in-process code-interpreter with LLM-driven script generation + sandboxed subprocess execution, add trusted-mode and session/file security, remove a custom SQLite checkpointer, update memory fallback, add tests, and bump dependencies.

    Changes

    Cohort / File(s) Summary
    Pydantic migration (multiple tools)
    app/core/agents/enpkg/tool_chemicals.py, app/core/agents/enpkg/tool_smiles.py, app/core/agents/enpkg/tool_target.py, app/core/agents/interpreter/tool_spectrum.py, app/core/agents/sparql/tool_merge_result.py, app/core/agents/validator/tool_validator.py
    Replace langchain.pydantic_v1 imports with pydantic and add explicit args_schema: type[BaseModel] = ... annotations.
    Typed attributes & constants
    app/core/agents/enpkg/tool_taxon.py, app/core/agents/sparql/tool_sparql.py, app/core/agents/sparql/tool_wikidata_query.py
    Add/strengthen type annotations (ClassVar[str], name: str, description: str, requires_params: bool) and annotate args_schema as type[BaseModel].
    File parser input schema
    app/core/agents/entry/tool_filesparser.py
    Introduce FileAnalyzerInput: BaseModel and set FileAnalyzer.args_schema: type[BaseModel] = FileAnalyzerInput; switch to native pydantic imports.
    Interpreter tool & sandbox (major)
    app/core/agents/interpreter/tool_interpreter.py, app/core/agents/interpreter/sandbox_runner.py, app/core/agents/interpreter/prompt.py, app/core/tests/test_interpreter_tool.py, app/core/tests/test_sandbox_runner.py
    Replace prior code-interpreter integration with LLM-driven single fenced-Python script generation, robust path extraction/allowlisting, hardened sandboxed subprocess execution (sandbox_runner.py) with AST/import/resource/network/filesystem restrictions, snapshot/changed-file detection, truncated outputs, and extensive tests.
    Tool/agent injection
    app/core/agents/interpreter/agent.py, app/core/agents/sparql/agent.py
    Pass selected llm_instance (and optional openai_key) into import_tools via tool_parameters so tools can access the LLM/OpenAI key.
    Memory / checkpoints
    app/core/memory/custom_sqlite_file.py, app/core/memory/database_manager.py
    Remove custom SqliteCheckpointerSaver; memory_database() falls back to LangGraph’s MemorySaver when DB config is absent (conditional in-function import).
    Security & trusted mode
    app/core/security.py, app/core/utils.py, streamlit_webapp/streamlit_app.py, app/core/agents/supervisor/prompt.py
    Add env-driven trusted-mode and interpreter resource limits, path-safety helpers, post-process config to inject Interpreter_agent when trusted, and gate interpreter/Streamlit features accordingly.
    CLI session staging & main changes
    app/core/main.py, app/core/tests/test_main.py
    Add -f/--file session file staging with validation and collision handling, create session_id and initialize session context early, reinitialize logger per session, and thread --api-key into LLM creation/workflow.
    Workflow & logging
    app/core/workflow/langraph_workflow.py
    On stream errors include full traceback in logs (exc_info=True).
    Tests, evaluation & installation
    app/core/tests/test_security.py, app/core/tests/evaluation.py, app/tests/installation_test.py
    Add/modify tests for security, interpreter, sandbox, main staging; require OPENAI_API_KEY for evaluation, dataset bootstrapping, and ensure repo root on sys.path in installation test.
    Dependencies & docs
    requirements.txt, README.md, streamlit_webapp/README.md
    Remove codeboxapi/codeinterpreterapi, bump LangChain stack to 0.3.x, add langchain_experimental, upgrade other deps, and update README/Streamlit notes and startup guidance.
    Streamlit UI tweaks
    streamlit_webapp/streamlit_utils.py, streamlit_webapp/streamlit_app.py, streamlit_webapp/test_streamlit_app.py
    Adjust agent output key names (Entry_Agent, Validator), expose uploader only in trusted mode, write uploads via resolve_session_path, update imports and UI sizing, and add a smoke test for app import.

    Sequence Diagram(s)

    sequenceDiagram
        actor User
        participant Agent as InterpreterTool
        participant DB as Database
        participant LLM as OpenAI Chat LLM
        participant Sandbox as SandboxRunner
        participant FS as FileSystem
    
        User->>Agent: invoke InterpreterTool(input, session_id)
        alt no explicit file paths
            Agent->>DB: query session files
            DB-->>Agent: file_paths (JSON)
        else explicit paths provided
            Agent-->>Agent: parse quoted/unquoted paths
        end
        Agent->>FS: read previews (truncated)
        FS-->>Agent: previews
        Agent->>LLM: request single fenced Python script
        LLM-->>Agent: returns ```python ... ```
        Agent->>Sandbox: run script via sandbox_runner (json config)
        Sandbox-->>Agent: result {success, stdout, stderr, error}
        Agent->>FS: snapshot session dir, detect new/changed files
        Agent->>DB: persist outputs and discovered file paths
        Agent-->>User: return truncated output + files_created
    
    Loading

    Estimated code review effort

    🎯 4 (Complex) | ⏱️ ~45 minutes

    Possibly related PRs

    • Dev #161 — overlaps on LLM API-key handling and llm_creation/api_key propagation.

    Suggested labels

    Possible security concern, Review effort 3/5

    Suggested reviewers

    • madina1203

    Poem

    🐰 I hop from shims to Pydantic bright,
    LLMs write scripts by soft lamplight.
    A sandbox keeps mischief out of sight,
    Sessions safe, checkpoints take flight.
    New tests and keys — the rabbit's delight.

    🚥 Pre-merge checks | ✅ 4 | ❌ 1

    ❌ Failed checks (1 warning)

    Check name Status Explanation Resolution
    Docstring Coverage ⚠️ Warning Docstring coverage is 10.16% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
    ✅ Passed checks (4 passed)
    Check name Status Explanation
    Title check ✅ Passed The title references the main v1.1.0 release with multiple focus areas: dependency updates, code interpreter replacement, and other improvements, which aligns with the substantial changeset covering 40+ files.
    Description check ✅ Passed The description relates to the changeset, covering dependency updates, codeinterpreterapi replacement with PythonREPL, pydantic import fixes, and benchmark additions.
    Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
    Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

    ✏️ Tip: You can configure your own custom pre-merge checks in the settings.

    ✨ Finishing Touches
    🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Commit unit tests in branch dev

    Tip

    💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

    Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

    • Generate code and open pull requests
    • Plan features and break down work
    • Investigate incidents and troubleshoot customer tickets together
    • Automate recurring tasks and respond to alerts with triggers
    • Summarize progress and report instantly

    Built for teams:

    • Shared memory across your entire org—no repeating context
    • Per-thread sandboxes to safely plan and execute work
    • Governance built-in—scoped access, auditability, and budget controls

    One agent for your entire SDLC. Right inside Slack.

    👉 Get started


    Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

    ❤️ Share

    Comment @coderabbitai help to get the list of available commands and usage tips.

    @qodo-code-review

    Copy link
    Copy Markdown

    Review Summary by Qodo

    Replace code interpreter with PythonREPL and fix deprecated pydantic imports

    ✨ Enhancement 🐞 Bug fix

    Grey Divider

    Walkthroughs

    Description
    • Replace codeinterpreterapi with langchain_experimental PythonREPL for local code execution
      - Inject file previews into LLM prompts for accurate column name usage
      - Truncate REPL output at 2000 chars to prevent context overflow
      - Wire shared llm_instance from agent instead of separate model instantiation
    • Fix deprecated pydantic imports across all tool files
      - Replace langchain.pydantic_v1 with native pydantic imports
      - Add type annotations to args_schema and class variables
    • Update dependencies to latest versions
      - langchain, langchain_core, langchain_openai, langgraph, litellm, openai, pydantic
      - Add langchain_experimental==0.3.4 for PythonREPL support
    • Update default OpenAI models from gpt-4o to gpt-5.4 in configuration
    • Remove custom_sqlite_file.py and use langgraph's built-in MemorySaver
    • Improve evaluation script with dataset creation and better error handling
    
    Diagram
    flowchart LR
      A["codeinterpreterapi<br/>remote sandbox"] -->|"replaced with"| B["PythonREPL<br/>local execution"]
      B --> C["File previews<br/>injected to LLM"]
      C --> D["Generated code<br/>with exact columns"]
      D --> E["Output truncated<br/>at 2000 chars"]
      F["Deprecated<br/>langchain.pydantic_v1"] -->|"replaced with"| G["Native pydantic<br/>imports"]
      H["gpt-4o"] -->|"updated to"| I["gpt-5.4"]
      J["custom_sqlite_file.py"] -->|"removed"| K["langgraph MemorySaver"]
    
    Loading

    Grey Divider

    File Changes

    1. app/core/agents/interpreter/tool_interpreter.py Enhancement, bug fix +133/-103

    Replace codeinterpreterapi with PythonREPL implementation

    app/core/agents/interpreter/tool_interpreter.py


    2. app/core/agents/interpreter/agent.py ✨ Enhancement +1/-0

    Pass llm_instance to tool constructor

    app/core/agents/interpreter/agent.py


    3. app/core/agents/enpkg/tool_chemicals.py 🐞 Bug fix +2/-2

    Fix pydantic imports and add type annotations

    app/core/agents/enpkg/tool_chemicals.py


    View more (16)
    4. app/core/agents/enpkg/tool_smiles.py 🐞 Bug fix +2/-2

    Fix pydantic imports and add type annotations

    app/core/agents/enpkg/tool_smiles.py


    5. app/core/agents/enpkg/tool_target.py 🐞 Bug fix +2/-2

    Fix pydantic imports and add type annotations

    app/core/agents/enpkg/tool_target.py


    6. app/core/agents/enpkg/tool_taxon.py 🐞 Bug fix +5/-7

    Fix pydantic imports and add ClassVar annotations

    app/core/agents/enpkg/tool_taxon.py


    7. app/core/agents/entry/tool_filesparser.py 🐞 Bug fix +6/-2

    Fix pydantic imports and create FileAnalyzerInput schema

    app/core/agents/entry/tool_filesparser.py


    8. app/core/agents/interpreter/tool_spectrum.py 🐞 Bug fix +2/-2

    Fix pydantic imports and add type annotations

    app/core/agents/interpreter/tool_spectrum.py


    9. app/core/agents/sparql/tool_merge_result.py 🐞 Bug fix +2/-2

    Fix pydantic imports and add type annotations

    app/core/agents/sparql/tool_merge_result.py


    10. app/core/agents/sparql/tool_sparql.py 🐞 Bug fix +5/-5

    Fix pydantic imports and add type annotations

    app/core/agents/sparql/tool_sparql.py


    11. app/core/agents/sparql/tool_wikidata_query.py 🐞 Bug fix +5/-5

    Fix pydantic imports and add ClassVar annotations

    app/core/agents/sparql/tool_wikidata_query.py


    12. app/core/agents/validator/tool_validator.py 🐞 Bug fix +2/-2

    Fix pydantic imports and add type annotations

    app/core/agents/validator/tool_validator.py


    13. app/core/memory/custom_sqlite_file.py Miscellaneous +0/-97

    Remove unused custom SQLite checkpoint implementation

    app/core/memory/custom_sqlite_file.py


    14. app/core/memory/database_manager.py ✨ Enhancement +40/-39

    Replace custom SQLite with langgraph MemorySaver

    app/core/memory/database_manager.py


    15. app/core/tests/evaluation.py ✨ Enhancement +74/-59

    Improve evaluation script with dataset creation logic

    app/core/tests/evaluation.py


    16. app/core/workflow/langraph_workflow.py ✨ Enhancement +1/-1

    Add exception info to error logging

    app/core/workflow/langraph_workflow.py


    17. app/tests/installation_test.py ✨ Enhancement +8/-6

    Improve test script with proper path handling

    app/tests/installation_test.py


    18. app/config/params.ini ⚙️ Configuration changes +4/-4

    Update default OpenAI models to gpt-5.4

    app/config/params.ini


    19. environment.yml Dependencies +14/-15

    Update all dependencies to latest compatible versions

    environment.yml


    Grey Divider

    Qodo Logo


    Need help?
  • Type /help how to ... in the comments thread for any questions about Qodo Merge usage.
  • Check out the documentation for more information.
  • @qodo-code-review

    qodo-code-review Bot commented Apr 24, 2026

    Copy link
    Copy Markdown

    Code Review by Qodo

    🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0)

    Grey Divider


    Action required

    1. LLM code execution RCE ✓ Resolved 🐞 Bug ⛨ Security
    Description
    Interpreter tool executes LLM-generated Python via PythonREPL in the server process without
    sandboxing or execution limits, enabling remote code execution and denial-of-service. This tool is
    reachable from end-user prompts through the Streamlit-driven LangGraph workflow.
    
    Code

    app/core/agents/interpreter/tool_interpreter.py[R147-148]

    +        repl = PythonREPL()
    +        execution_output = repl.run(code)
    Evidence
    The interpreter runs arbitrary Python returned by the model (PythonREPL.run). The app’s workflow
    includes the Interpreter agent, and Streamlit streams user prompts into the LangGraph app, making
    this path user-reachable in typical deployments.
    

    app/core/agents/interpreter/tool_interpreter.py[3-8]
    app/core/agents/interpreter/tool_interpreter.py[128-153]
    app/config/langgraph.json[2-8]
    streamlit_webapp/streamlit_app.py[468-472]

    Agent prompt
    The issue below was found during a code review. Follow the provided context and guidance below and implement a solution
    
    ## Issue description
    `Interpreter` executes LLM-generated Python directly via `PythonREPL().run(code)` in the server process with no sandbox, timeout, resource limits, or I/O/network restrictions. In the Streamlit/LangGraph workflow this is reachable from user prompts, creating an RCE + DoS vector.
    ### Issue Context
    The interpreter agent is part of the configured workflow and Streamlit streams user input into the graph. The tool currently asks the LLM to return Python code and then executes it verbatim.
    ### Fix Focus Areas
    - app/core/agents/interpreter/tool_interpreter.py[84-173]
    - app/config/langgraph.json[2-8]
    - streamlit_webapp/streamlit_app.py[468-472]
    ### Implementation guidance
    - Replace in-process `PythonREPL` execution with an isolated runner (separate OS process / container) that:
    - enforces a hard timeout
    - applies CPU/memory limits
    - runs as a restricted user
    - restricts filesystem access to the session directory
    - disables/limits outbound network by default
    - If isolation is not available, disable/remove this tool from public-facing deployments (e.g., do not register `Interpreter_agent` / do not expose it via the supervisor routing) and gate it behind an explicit trusted-mode flag.
    - Add explicit allowlist for readable input files (e.g., only under the session dir) and validate paths before preview/execution.
    

    ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



    Remediation recommended

    2. File path regex misses dots ✓ Resolved 🐞 Bug ≡ Correctness
    Description
    Interpreter.extract_file_paths() cannot match filenames containing dots in the stem (e.g.,
    "sample.v1.csv"), so user-provided file references may not be detected and the interpreter may
    analyze the wrong fallback files. This is likely in Streamlit because uploaded files are saved using
    the original filename.
    
    Code

    app/core/agents/interpreter/tool_interpreter.py[R175-177]

    def extract_file_paths(self, text: str):
    -        # Regex to find file paths or filenames with extensions, possibly surrounded by quotes
    regex = r"['\"]?([a-zA-Z0-9_/\\-]+(?:\.csv|\.tsv|\.mgf|\.txt|\.xlsx|\.xls))['\"]?"
    matches = re.finditer(regex, text)
    -        file_paths = [match.group(1).replace("'", "").replace('"', '') for match in matches]
    -        return file_paths
    Evidence
    The regex character class for the path stem excludes '.', so dotted filenames won’t match. Streamlit
    persists uploads using file.name, which commonly contains dots, making the mismatch user-visible.
    

    app/core/agents/interpreter/tool_interpreter.py[175-178]
    streamlit_webapp/streamlit_app.py[390-398]

    Agent prompt
    The issue below was found during a code review. Follow the provided context and guidance below and implement a solution
    
    ## Issue description
    `Interpreter.extract_file_paths()` uses a restrictive regex that does not allow `.` in the filename/path stem, so paths like `foo.v1.csv` are not captured.
    ### Issue Context
    Streamlit saves uploaded files using the original `file.name`, which can include dots. If extraction fails, the interpreter falls back to DB-derived paths which may be unrelated to the user’s intended files.
    ### Fix Focus Areas
    - app/core/agents/interpreter/tool_interpreter.py[175-178]
    ### Implementation guidance
    - Update the regex to allow dots in the stem, e.g. include `.` in the character class: `[a-zA-Z0-9_./\\-]+` (and consider spaces if you expect them).
    - Alternatively, avoid regex parsing for paths: pass structured file paths to the interpreter tool explicitly (preferred) and/or validate paths with `Path(fp).suffix` checks.
    

    ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


    3. Missing OpenAI key check 🐞 Bug ☼ Reliability
    Description
    Interpreter allows openai_key to be None and only fails later when the first LLM call is attempted,
    producing a late runtime failure instead of a clear configuration error. This is especially
    problematic when create_all_agents falls back to OPENAI_API_KEY but it is unset.
    
    Code

    app/core/agents/interpreter/tool_interpreter.py[R51-88]

    +    def __init__(self, openai_key: Optional[str] = None, session_id: Optional[str] = None, llm_instance: Optional[Any] = None):
    super().__init__()
    -        self.openai_key = openai_key
    +        self.openai_key = openai_key or os.getenv("OPENAI_API_KEY")
    self.session_id = session_id
    +        self.llm_instance = llm_instance
    def _run(
    self,
    input: str,
    run_manager: Optional[CallbackManagerForToolRun] = None,
    -    ) -> None:
    -
    +    ) -> str:
    logger.info(f"Input: {input}")
    file_paths = self.extract_file_paths(input)
    logger.info(f"File paths: {file_paths}")
    session_dir = create_user_session(self.session_id, user_session_dir=True)
    -
    -        settings.OPENAI_API_KEY = self.openai_key
    -        settings.MODEL = "gpt-3.5-turbo"
    -
    -        with CodeInterpreterSession() as session:
    -
    -            db_manager = tools_database()
    -
    -            user_request = (
    -                "You are an interpreter helping to analyze different questions, files and outputs generated from a series of LLMs."
    -                f"The details of the current request: {input}"
    -                "Please interpret the current request to generate a meaningful answer."
    -                "Here's some instructions that you have to follow for acheiving the task:"
    -                "1. For any file provided, analyse if and provide clear and brief information about it unless something else is asked."
    -                "2. Only if a specific visualization (e.g., bar chart, diagram) is requested in the question, use the provided information to generate a .json file containing the JSON code for a Plotly graph. This file should be named identically to the analyzed file."
    -                "3. After you finish your tasks, your answer should contain both the interpretation asked and the full visualization file name if visualization was requested."
    -            )
    -
    -            files = []
    -
    -            for file in file_paths:
    -                files.append(File.from_path(file))
    -                logger.info(f"File added to interpreter Agent: {file}")
    -
    -            if not files:
    -                
    -                logger.info("No files provided from the Interpreter Agent. Manually scrapping the database")
    -                output_merged = db_manager.get("tool_merge_result")  # Get all file paths from the database associated with the tool_interpreter
    -                if output_merged is not None:
    -                    output_merged_json = json.loads(output_merged)
    -                    merged_filepaths = output_merged_json['output']['paths']
    -
    -                    for merged_filepath in merged_filepaths:
    -                        logger.info(f"File added to interpreter Agent from the Output Merged tool: {merged_filepath}")
    -                        files.append(File.from_path(merged_filepath))
    -
    -            if not files:
    -
    -                sparql_output = db_manager.get("tool_sparql")  # Get all file paths from the database associated with the tool_interpreter
    -                if sparql_output is not None:
    -                    sparql_output_json = json.loads(sparql_output)
    -                    sparql_filepaths = sparql_output_json['output']['paths']
    -
    -                    for sparql_filepath in sparql_filepaths:
    -                        logger.info(f"File added to interpreter Agent from the SPARQL tool: {sparql_filepath}")
    -                        files.append(File.from_path(sparql_filepath))
    -                
    -            # generate the response
    -            logger.info(f"Files submitted: {files}")
    -            response = session.generate_response(user_request, files=files)
    -            logger.info(f"Interpreter Agent Response: {response}")
    -
    -            filepaths = []
    -
    -            # Handling and saving output files
    -            if response.files:
    -                for file in response.files:
    -                    logger.info(f"File: {file.name}")
    -
    -                    generated_file_path = session_dir / file.name
    -                    with open(generated_file_path, 'wb') as f:
    -                        f.write(file.content)
    -                    filepaths.append(str(generated_file_path))
    -
    -                    logger.info(f"File saved: {file.name}")
    -            else:
    -                logger.info("No files generated by Interpreter Tool.")
    -
    -            output_data = {
    -                "output": {
    -                    "paths": filepaths
    -                }
    -            }
    -
    -            
    +        db_manager = tools_database()
    +
    +        if not file_paths:
    +            output_merged = db_manager.get("tool_merge_result")
    +            if output_merged is not None:
    +                merged_json = json.loads(output_merged)
    +                file_paths = merged_json.get("output", {}).get("paths", [])
    +                logger.info(f"File paths from merge result: {file_paths}")
    +
    +        if not file_paths:
    +            sparql_output = db_manager.get("tool_sparql")
    +            if sparql_output is not None:
    +                sparql_json = json.loads(sparql_output)
    +                file_paths = sparql_json.get("output", {}).get("paths", [])
    +                logger.info(f"File paths from SPARQL tool: {file_paths}")
    +
    +        llm = self.llm_instance or ChatOpenAI(
    +            api_key=self.openai_key,
    +            model="gpt-5.4",
    +            temperature=0,
    +        )
    Evidence
    Interpreter sets self.openai_key from the parameter or env var, but does not validate it before
    creating/using an LLM. Agent factory similarly takes OPENAI_API_KEY from env without enforcing
    presence, so the interpreter can reach llm.invoke(...) with missing credentials.
    

    app/core/agents/interpreter/tool_interpreter.py[51-88]
    app/core/agents/interpreter/tool_interpreter.py[128-132]
    app/core/agents/agents_factory.py[33-38]

    Agent prompt
    The issue below was found during a code review. Follow the provided context and guidance below and implement a solution
    
    ## Issue description
    `Interpreter` can be constructed with `openai_key=None` (and env var missing), which leads to a late failure when `llm.invoke()` is called.
    ### Issue Context
    `create_all_agents` may pass `openai_key=None` if `OPENAI_API_KEY` is not set. The interpreter then builds/uses an LLM without explicit validation.
    ### Fix Focus Areas
    - app/core/agents/interpreter/tool_interpreter.py[51-88]
    - app/core/agents/agents_factory.py[33-38]
    ### Implementation guidance
    - In `Interpreter.__init__`, if `llm_instance is None` and `self.openai_key` is falsy, raise a clear `ValueError` (e.g., "OPENAI_API_KEY is required for Interpreter tool") before `_run`.
    - Optionally validate earlier in agent/workflow initialization to fail fast for the whole app when required keys are missing.
    

    ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



    Advisory comments

    4. Sys.path fix too late 🐞 Bug ≡ Correctness
    Description
    installation_test inserts the repo root into sys.path after importing app.core.main, so the sys.path
    adjustment cannot help when running the script from a working directory that doesn’t already include
    the repo root. This makes the intended “run from anywhere” behavior unreliable.
    
    Code

    app/tests/installation_test.py[R1-7]

    import sys
    +from pathlib import Path
    +from app.core.main import main
    +
    +REPO_ROOT = Path(__file__).resolve().parents[2]
    +if str(REPO_ROOT) not in sys.path:
    +    sys.path.insert(0, str(REPO_ROOT))
    Evidence
    Python executes top-level imports before executing the later sys.path mutation, so `from
    app.core.main import main` can fail before the repo root is inserted.
    

    app/tests/installation_test.py[1-8]

    Agent prompt
    The issue below was found during a code review. Follow the provided context and guidance below and implement a solution
    
    ## Issue description
    `installation_test.py` modifies `sys.path` after importing `app.core.main`, making the sys.path workaround ineffective if the import would otherwise fail.
    ### Issue Context
    This script appears intended to be runnable as a quick installation check.
    ### Fix Focus Areas
    - app/tests/installation_test.py[1-8]
    ### Implementation guidance
    - Move the repo-root sys.path insertion above `from app.core.main import main`, or defer the import into the `if __name__ == "__main__":` block after sys.path is adjusted.
    

    ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


    Grey Divider

    Qodo Logo

    @github-actions

    Copy link
    Copy Markdown
    Contributor

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Docstring Format

    Several key tool classes (for example, ChemicalResolver) still use a basic multi‐line description that does not fully comply with the Google docstring style guidelines. Consider updating them to use the recommended format, with parameter types in parentheses and clear sections such as Args, Returns, and Raises.

    class ChemicalResolver(BaseTool):
        name: str = "CHEMICAL_RESOLVER"
        description: str = """
        Resolves chemicals to NPC Class URIs using a specific CSV database.
    
        Args:
            chemical_name str: the chemical name string.
    
        Returns:
            Dict[str, str]: a dictionary that contains the output chemical name and corresponding NPC Class URI.
        """
    Duplicate Import

    The typing module is imported twice for the same symbols (e.g. Optional) in the file. Removing the redundant import will improve code clarity.

    from typing import List, Optional
    
    from typing import Any, Dict, Optional
    from urllib.error import HTTPError, URLError

    @github-actions

    github-actions Bot commented Apr 24, 2026

    Copy link
    Copy Markdown
    Contributor

    PR Code Suggestions ✨

    No code suggestions found for the PR.

    @lfnothias

    Copy link
    Copy Markdown
    Contributor Author

    @CodeRabbit

    @coderabbitai

    coderabbitai Bot commented Apr 24, 2026

    Copy link
    Copy Markdown

    Tip

    For best results, initiate chat on the files or code changes.

    @lfnothias, it looks like your message may have been sent incomplete! How can I help you with this PR? For example, I can:

    • 🔍 Review the changes — trigger a full review of the PR
    • 📝 Answer questions — explain specific changes or design decisions
    • 🛠️ Suggest improvements — analyze code quality, security, or performance
    • 📋 Summarize the PR — provide an overview of what changed and why

    Just let me know what you need!

    @lfnothias

    Copy link
    Copy Markdown
    Contributor Author

    @CodeRabbit review

    @coderabbitai

    coderabbitai Bot commented Apr 24, 2026

    Copy link
    Copy Markdown
    ✅ Actions performed

    Review triggered.

    Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

    Comment thread app/core/agents/interpreter/tool_interpreter.py Outdated

    @coderabbitai coderabbitai Bot left a comment

    Copy link
    Copy Markdown

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Actionable comments posted: 6

    Caution

    Some comments are outside the diff and can’t be posted inline due to platform limitations.

    ⚠️ Outside diff range comments (1)
    app/core/tests/evaluation.py (1)

    141-143: ⚠️ Potential issue | 🟡 Minor

    Update project_metadata.model to reflect the actual default model.

    The hardcoded "gpt-4o" in the project metadata is outdated. The workflow uses llm_o which is now configured as gpt-5.4 in params.ini. Either read the actual model name from the models dictionary (e.g., extract from models["llm_o"]) or update the literal to "gpt-5.4" to ensure LangSmith runs are tagged with the correct model.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/evaluation.py` around lines 141 - 143, The project_metadata
    currently hardcodes "gpt-4o"; update it to reflect the actual default model by
    using the configured models map or the new literal. Replace the literal with the
    value from the models dictionary (e.g., project_metadata={"model":
    models["llm_o"]} or models.get("llm_o", "gpt-5.4")) or directly set it to
    "gpt-5.4"; ensure the symbol models and key "llm_o" are in scope where
    project_metadata is constructed so LangSmith runs are tagged with the correct
    model.
    
    🧹 Nitpick comments (2)
    app/core/memory/database_manager.py (1)

    29-36: Minor: redundant if class_path: guard.

    class_path is guaranteed truthy here because the outer if already requires os.getenv("MEMORY_DATABASE_MANAGER_CLASS") (and the same applies in tools_database). You can drop the inner branch, which also removes the implicit None return if it ever fell through. Same nit applies to lines 13-18.

    ♻️ Suggested simplification
    -    if os.getenv('DATABASE_URL') and os.getenv("MEMORY_DATABASE_MANAGER_CLASS"):
    -        class_path = os.getenv('MEMORY_DATABASE_MANAGER_CLASS')
    -        if class_path:
    -            # Dynamically import the module and class based on the environment variable
    -            module_name, class_name = class_path.rsplit('.', 1)
    -            module = importlib.import_module(module_name)
    -            manager_class = getattr(module, class_name)
    -            return manager_class()
    -    else:
    -        # Use langgraph's built-in MemorySaver which implements the current checkpoint API
    -        from langgraph.checkpoint.memory import MemorySaver
    -        return MemorySaver()
    +    class_path = os.getenv("MEMORY_DATABASE_MANAGER_CLASS")
    +    if os.getenv("DATABASE_URL") and class_path:
    +        module_name, class_name = class_path.rsplit(".", 1)
    +        module = importlib.import_module(module_name)
    +        manager_class = getattr(module, class_name)
    +        return manager_class()
    +
    +    # Use langgraph's built-in MemorySaver which implements the current checkpoint API
    +    from langgraph.checkpoint.memory import MemorySaver
    +    return MemorySaver()
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/memory/database_manager.py` around lines 29 - 36, Remove the
    redundant inner guard around class_path in the environment-driven dynamic import
    logic: the outer condition already checks
    os.getenv("MEMORY_DATABASE_MANAGER_CLASS"), so delete the inner "if class_path:"
    branch and directly perform module_name, class_name = class_path.rsplit('.', 1),
    module = importlib.import_module(module_name), manager_class = getattr(module,
    class_name) and return manager_class(); apply the same simplification to the
    analogous tools_database block (the earlier block that checks the same env var)
    to avoid the unnecessary implicit None return path.
    
    app/core/agents/enpkg/tool_smiles.py (1)

    4-39: LGTM on the pydantic migration.

    BaseModel/Field directly from pydantic and the typed args_schema are correct for the 0.3.x stack.

    Nit (pre-existing): from langchain.tools import BaseTool is imported twice (lines 3 and 14). Can be cleaned up while touching this file.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/agents/enpkg/tool_smiles.py` around lines 4 - 39, The file imports
    BaseTool twice; remove the redundant import and keep a single import of BaseTool
    (referenced by the class SMILESResolver) so there is no duplicate import
    statement; ensure the remaining imports still include BaseModel and Field from
    pydantic, Optional if used elsewhere, CallbackManagerForToolRun if needed, and
    setup_logger, then run linters/import-sort to confirm clean import ordering.
    
    🤖 Prompt for all review comments with AI agents
    Verify each finding against the current code and only fix it if needed.
    
    Inline comments:
    In `@app/core/agents/interpreter/tool_interpreter.py`:
    - Around line 144-160: The current diff only records newly created files using
    existing_files and new_files, so files that were overwritten or modified in
    place (saved to session_dir with the same name) are missed; to fix, before
    calling PythonREPL.run(code) capture a mapping of existing files in session_dir
    to their diagnostics (e.g., stat().st_mtime and st_size or a content hash) using
    session_dir.iterdir() (existing_files -> existing_meta), then after execution
    enumerate session_dir again and compare each path's new mtime/size/hash to the
    saved meta to build a set of changed_or_new_files, replace the new_files
    calculation with that set, and populate filepaths from changed_or_new_files so
    modified files (not just newly created ones) are included in the DB entry and
    logs (referencing session_dir, existing_files/existing_meta, new_files,
    PythonREPL.run, and filepaths).
    - Around line 147-153: The PythonREPL.run call is missing a timeout, which can
    cause the agent to hang; update the call to use the timeout parameter (e.g.,
    repl.run(code, timeout=30) or a configurable value) and wrap the call in a
    try/except to catch the timeout exception (TimeoutError or the specific
    exception thrown by langchain_experimental) and any other execution errors, set
    execution_output to a clear timeout/error message, and log the incident via
    logger.info/logger.error so the agent handles timeouts gracefully (refer to
    PythonREPL.run, execution_output, and logger).
    
    In `@app/core/memory/database_manager.py`:
    - Around line 22-40: The current memory_database() falls back to
    langgraph.checkpoint.memory.MemorySaver (ephemeral) which makes state
    non-persistent for Streamlit and the compiled workflow; decide whether
    persistence is intended and either (A) change the default fallback to a
    persistent saver such as langgraph.checkpoint.sqlite.SqliteSaver (or SqliteSaver
    from langgraph-checkpoint-sqlite) so memory_database() returns a persistent
    checkpointer when MEMORY_DATABASE_MANAGER_CLASS is not set, or (B) make
    persistence explicit via a new env var (e.g., MEMORY_PERSISTENT=true) and, when
    set, instantiate the persistent class (SqliteSaver) otherwise keep MemorySaver;
    update memory_database(), reference to MEMORY_DATABASE_MANAGER_CLASS,
    MemorySaver and tools_database for consistency with tools default
    (SqliteToolsDatabaseManager) so behavior is consistent across environments.
    
    In `@app/core/tests/evaluation.py`:
    - Around line 118-128: In evaluate_result, avoid using `_input.get("question")
    or _input["messages"][0]["content"]` because an empty string will incorrectly
    fall through and can cause KeyError/IndexError; instead explicitly check for
    None/absence of the "question" key (e.g., `question is not None`) and only then
    use it, otherwise validate that `_input` contains a non-empty "messages" list
    and that `messages[0]["content"]` exists; if validation fails raise a clear
    ValueError describing the missing field so callers of evaluate_result get a
    meaningful error before constructing the HumanMessage and calling app.invoke.
    - Around line 44-77: Narrow the broad except around client.read_dataset to only
    catch the LangSmith "not found" error (e.g., LangSmithNotFoundError) instead of
    Exception, preserve original failure causes when re-raising, and change the
    FileNotFoundError raise to use "raise ... from None" to avoid chained context;
    after parsing rows with json.loads into inputs/outputs (used by
    client.create_examples and dataset.id), guard against calling
    client.create_examples if inputs is empty (log and abort dataset creation) and
    fix the double space in the dataset description ("MetaboT Benchmark") so the
    created dataset uses the corrected description.
    
    In `@app/tests/installation_test.py`:
    - Around line 2-7: The import of main occurs before the REPO_ROOT sys.path
    bootstrap, causing ModuleNotFoundError when running the test directly; move the
    sys.path mutation (REPO_ROOT calculation and insertion into sys.path) above the
    import and then import main (from app.core.main import main) after ensuring
    REPO_ROOT is inserted, so adjust the file to compute REPO_ROOT, insert it into
    sys.path if missing, and only then import main.
    
    ---
    
    Outside diff comments:
    In `@app/core/tests/evaluation.py`:
    - Around line 141-143: The project_metadata currently hardcodes "gpt-4o"; update
    it to reflect the actual default model by using the configured models map or the
    new literal. Replace the literal with the value from the models dictionary
    (e.g., project_metadata={"model": models["llm_o"]} or models.get("llm_o",
    "gpt-5.4")) or directly set it to "gpt-5.4"; ensure the symbol models and key
    "llm_o" are in scope where project_metadata is constructed so LangSmith runs are
    tagged with the correct model.
    
    ---
    
    Nitpick comments:
    In `@app/core/agents/enpkg/tool_smiles.py`:
    - Around line 4-39: The file imports BaseTool twice; remove the redundant import
    and keep a single import of BaseTool (referenced by the class SMILESResolver) so
    there is no duplicate import statement; ensure the remaining imports still
    include BaseModel and Field from pydantic, Optional if used elsewhere,
    CallbackManagerForToolRun if needed, and setup_logger, then run
    linters/import-sort to confirm clean import ordering.
    
    In `@app/core/memory/database_manager.py`:
    - Around line 29-36: Remove the redundant inner guard around class_path in the
    environment-driven dynamic import logic: the outer condition already checks
    os.getenv("MEMORY_DATABASE_MANAGER_CLASS"), so delete the inner "if class_path:"
    branch and directly perform module_name, class_name = class_path.rsplit('.', 1),
    module = importlib.import_module(module_name), manager_class = getattr(module,
    class_name) and return manager_class(); apply the same simplification to the
    analogous tools_database block (the earlier block that checks the same env var)
    to avoid the unnecessary implicit None return path.
    
    🪄 Autofix (Beta)

    Fix all unresolved CodeRabbit comments on this PR:

    • Push a commit to this branch (recommended)
    • Create a new PR with the fixes

    ℹ️ Review info
    ⚙️ Run configuration

    Configuration used: Repository UI

    Review profile: CHILL

    Plan: Pro

    Run ID: da086e6d-0f23-4e28-838b-a779bdfe6d75

    📥 Commits

    Reviewing files that changed from the base of the PR and between de6d438 and 8adb7a3.

    ⛔ Files ignored due to path filters (4)
    • app/config/params.ini is excluded by !**/*.ini
    • app/data/big_benchmark.csv is excluded by !**/*.csv, !**/*.csv
    • app/data/evaluation_dataset.csv is excluded by !**/*.csv, !**/*.csv
    • environment.yml is excluded by !**/*.yml
    📒 Files selected for processing (18)
    • app/core/agents/enpkg/tool_chemicals.py
    • app/core/agents/enpkg/tool_smiles.py
    • app/core/agents/enpkg/tool_target.py
    • app/core/agents/enpkg/tool_taxon.py
    • app/core/agents/entry/tool_filesparser.py
    • app/core/agents/interpreter/agent.py
    • app/core/agents/interpreter/tool_interpreter.py
    • app/core/agents/interpreter/tool_spectrum.py
    • app/core/agents/sparql/tool_merge_result.py
    • app/core/agents/sparql/tool_sparql.py
    • app/core/agents/sparql/tool_wikidata_query.py
    • app/core/agents/validator/tool_validator.py
    • app/core/memory/custom_sqlite_file.py
    • app/core/memory/database_manager.py
    • app/core/tests/evaluation.py
    • app/core/workflow/langraph_workflow.py
    • app/tests/installation_test.py
    • requirements.txt
    💤 Files with no reviewable changes (1)
    • app/core/memory/custom_sqlite_file.py

    Comment thread app/core/agents/interpreter/tool_interpreter.py Outdated
    Comment thread app/core/agents/interpreter/tool_interpreter.py Outdated
    Comment on lines +22 to +40
    def memory_database():
    """
    Get the database manager based on the environment variables

    :return: The database manager
    """

    if os.getenv('DATABASE_URL') and os.getenv("MEMORY_DATABASE_MANAGER_CLASS"):
    class_path = os.getenv('MEMORY_DATABASE_MANAGER_CLASS')
    if class_path:
    # Dynamically import the module and class based on the environment variable
    module_name, class_name = class_path.rsplit('.', 1)
    module = importlib.import_module(module_name)
    manager_class = getattr(module, class_name)
    return manager_class()
    else:
    # Use langgraph's built-in MemorySaver which implements the current checkpoint API
    from langgraph.checkpoint.memory import MemorySaver
    return MemorySaver() No newline at end of file

    Copy link
    Copy Markdown

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    ⚠️ Potential issue | 🟠 Major

    Default memory is now ephemeral — confirm this is intentional for Streamlit/production.

    Falling back to MemorySaver means LangGraph checkpoints live only in the current Python process. For the two call sites of memory_database():

    • streamlit_webapp/streamlit_app.py stores the saver in st.session_state.memory. Any server restart (or Streamlit worker recycle) will drop all conversation state silently — previously the SQLite checkpointer would persist across restarts.
    • app/core/workflow/langraph_workflow.py uses it as the compiled workflow's checkpointer, so thread state won't survive restarts either.

    This also creates an inconsistency with tools_database() above, which still defaults to the persistent SqliteToolsDatabaseManager. If ephemeral memory is desired for dev, great; if not, consider defaulting to a persistent LangGraph checkpointer (e.g. SqliteSaver from langgraph-checkpoint-sqlite) to preserve the previous persistence guarantee.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/memory/database_manager.py` around lines 22 - 40, The current
    memory_database() falls back to langgraph.checkpoint.memory.MemorySaver
    (ephemeral) which makes state non-persistent for Streamlit and the compiled
    workflow; decide whether persistence is intended and either (A) change the
    default fallback to a persistent saver such as
    langgraph.checkpoint.sqlite.SqliteSaver (or SqliteSaver from
    langgraph-checkpoint-sqlite) so memory_database() returns a persistent
    checkpointer when MEMORY_DATABASE_MANAGER_CLASS is not set, or (B) make
    persistence explicit via a new env var (e.g., MEMORY_PERSISTENT=true) and, when
    set, instantiate the persistent class (SqliteSaver) otherwise keep MemorySaver;
    update memory_database(), reference to MEMORY_DATABASE_MANAGER_CLASS,
    MemorySaver and tools_database for consistency with tools default
    (SqliteToolsDatabaseManager) so behavior is consistent across environments.
    

    Comment on lines +44 to +77
    try:
    client.read_dataset(dataset_name=dataset_name)
    logger.info(f"Dataset '{dataset_name}' found in LangSmith workspace.")
    except Exception:
    logger.info(f"Dataset '{dataset_name}' not found. Attempting to create it from local file...")


    if not os.path.exists(local_data_path):
    raise FileNotFoundError(
    f"Could not find '{local_data_path}'. Ensure the local dataset file is shared alongside this script.")


    # Load local data and create the dataset in the user's LangSmith account
    df = pd.read_csv(local_data_path)
    dataset = client.create_dataset(dataset_name=dataset_name, description="MetaboT Benchmark")

    inputs = []
    outputs = []

    for _, row in df.iterrows():
    try:
    # The CSV stores the JSON arrays as strings, so we need to parse them back into Python objects
    parsed_messages = json.loads(row["messages"])
    parsed_end_state = json.loads(row["__end__"])

    inputs.append({"messages": parsed_messages})
    outputs.append({"__end__": parsed_end_state})

    except json.JSONDecodeError as e:
    logger.warning(f"Skipping a row due to JSON parsing error: {e}")
    continue

    client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
    logger.info(f"Successfully created dataset '{dataset_name}' in LangSmith.")

    Copy link
    Copy Markdown

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    ⚠️ Potential issue | 🟡 Minor

    🧩 Analysis chain

    🌐 Web query:

    langsmith Python client exception class raised when read_dataset does not find a dataset

    💡 Result:

    The exception class raised by the LangSmith Python client when read_dataset does not find a dataset is LangSmithNotFoundError.

    Citations:


    🏁 Script executed:

    find . -name "evaluation.py" -path "*/core/tests/*" | head -5

    Repository: HolobiomicsLab/MetaboT

    Length of output: 97


    🏁 Script executed:

    wc -l $(find . -name "evaluation.py" -path "*/core/tests/*" | head -1)

    Repository: HolobiomicsLab/MetaboT

    Length of output: 101


    🏁 Script executed:

    sed -n '40,80p' $(find . -name "evaluation.py" -path "*/core/tests/*" | head -1) | cat -n

    Repository: HolobiomicsLab/MetaboT

    Length of output: 1972


    🏁 Script executed:

    sed -n '115,130p' $(find . -name "evaluation.py" -path "*/core/tests/*" | head -1) | cat -n

    Repository: HolobiomicsLab/MetaboT

    Length of output: 632


    🏁 Script executed:

    sed -n '135,150p' $(find . -name "evaluation.py" -path "*/core/tests/*" | head -1) | cat -n

    Repository: HolobiomicsLab/MetaboT

    Length of output: 508


    🏁 Script executed:

    grep -n "from langsmith" $(find . -name "evaluation.py" -path "*/core/tests/*" | head -1)

    Repository: HolobiomicsLab/MetaboT

    Length of output: 165


    🏁 Script executed:

    python3 << 'EOF'
    try:
        from langsmith.utils import LangSmithNotFoundError
        print("✓ LangSmithNotFoundError import from langsmith.utils: SUCCESS")
    except ImportError as e:
        print(f"✗ LangSmithNotFoundError import from langsmith.utils: FAILED - {e}")
    
    try:
        from langsmith import LangSmithNotFoundError
        print("✓ LangSmithNotFoundError import from langsmith: SUCCESS")
    except ImportError as e:
        print(f"✗ LangSmithNotFoundError import from langsmith: FAILED - {e}")
    EOF

    Repository: HolobiomicsLab/MetaboT

    Length of output: 242


    🏁 Script executed:

    rg "LangSmithNotFoundError|langsmith.*Error" --type py | head -20

    Repository: HolobiomicsLab/MetaboT

    Length of output: 48


    🏁 Script executed:

    rg "\.get\(.*\)\s+or\s+" --type py -A 2 | head -40

    Repository: HolobiomicsLab/MetaboT

    Length of output: 410


    🏁 Script executed:

    rg "read_dataset|create_dataset" --type py -B 2 -A 5 | head -50

    Repository: HolobiomicsLab/MetaboT

    Length of output: 1125


    Dataset-bootstrap block: narrow the exception, preserve the cause, and guard against an all-bad CSV.

    Several robustness concerns in this newly added bootstrap path:

    1. Line 47 — blind except Exception. Any error (network, auth, permissions) is silently treated as "dataset missing" and triggers a create attempt. Narrow this to the LangSmith "not found" case (e.g., LangSmithNotFoundError) so real failures surface. (Ruff BLE001)

    2. Lines 52–53 — re-raise inside except. Use raise ... from None to avoid the misleading "During handling of the above exception…" chain. (Ruff B904)

    3. Lines 60–76 — empty-examples edge case. If every row fails json.loads, inputs/outputs stay empty and create_examples is still invoked, leaving an empty dataset created in the user's workspace. Skip the call when inputs is empty.

    4. Line 58 — minor: double space in "MetaboT Benchmark".

    🛠️ Proposed fix
    +from langsmith import LangSmithNotFoundError
    +
     try:
         client.read_dataset(dataset_name=dataset_name)
         logger.info(f"Dataset '{dataset_name}' found in LangSmith workspace.")
    -except Exception:
    +except LangSmithNotFoundError:
         logger.info(f"Dataset '{dataset_name}' not found. Attempting to create it from local file...")
     
     
         if not os.path.exists(local_data_path):
             raise FileNotFoundError(
    -            f"Could not find '{local_data_path}'. Ensure the local dataset file is shared alongside this script.")
    +            f"Could not find '{local_data_path}'. Ensure the local dataset file is shared alongside this script."
    +        ) from None
     
     
         # Load local data and create the dataset in the user's LangSmith account
         df = pd.read_csv(local_data_path)
    -    dataset = client.create_dataset(dataset_name=dataset_name, description="MetaboT  Benchmark")
    +    dataset = client.create_dataset(dataset_name=dataset_name, description="MetaboT Benchmark")
     
         inputs = []
         outputs = []
     
         for _, row in df.iterrows():
             try:
                 # The CSV stores the JSON arrays as strings, so we need to parse them back into Python objects
                 parsed_messages = json.loads(row["messages"])
                 parsed_end_state = json.loads(row["__end__"])
     
                 inputs.append({"messages": parsed_messages})
                 outputs.append({"__end__": parsed_end_state})
     
             except json.JSONDecodeError as e:
                 logger.warning(f"Skipping a row due to JSON parsing error: {e}")
                 continue
     
    +    if not inputs:
    +        raise ValueError(
    +            f"No valid examples parsed from '{local_data_path}'; aborting dataset creation."
    +        )
    +
         client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
         logger.info(f"Successfully created dataset '{dataset_name}' in LangSmith.")
    🧰 Tools
    🪛 Ruff (0.15.11)

    [warning] 47-47: Do not catch blind exception: Exception

    (BLE001)


    [warning] 52-53: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/evaluation.py` around lines 44 - 77, Narrow the broad except
    around client.read_dataset to only catch the LangSmith "not found" error (e.g.,
    LangSmithNotFoundError) instead of Exception, preserve original failure causes
    when re-raising, and change the FileNotFoundError raise to use "raise ... from
    None" to avoid chained context; after parsing rows with json.loads into
    inputs/outputs (used by client.create_examples and dataset.id), guard against
    calling client.create_examples if inputs is empty (log and abort dataset
    creation) and fix the double space in the dataset description ("MetaboT
    Benchmark") so the created dataset uses the corrected description.
    

    Comment on lines 118 to 128
    def evaluate_result(_input, thread_id: int = 1):
    """
    Evaluate the result based on input.

    Args:
    _input (dict): Input containing messages to process

    Returns:
    dict: The evaluation result
    """

    # Prepare the message
    """Evaluate the result based on input."""
    # Note: Adjust the 'question' key below if your dataset uses a different input key
    input_text = _input.get("question") or _input["messages"][0]["content"]

    message = {
    "messages": [
    HumanMessage(content=_input["messages"][0]["content"])
    ]
    "messages": [HumanMessage(content=input_text)]
    }


    response = app.invoke(message,
    {"configurable": {"thread_id": thread_id}},)

    response = app.invoke(message, {"configurable": {"thread_id": thread_id}})
    return {"output": response}

    Copy link
    Copy Markdown

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    ⚠️ Potential issue | 🟡 Minor

    Falsy question falls through to messages[0]["content"].

    _input.get("question") or _input["messages"][0]["content"] also triggers the fallback when question is present but an empty string, which will then KeyError if messages isn't populated. Use an explicit None check, and guard the messages indexing so a malformed example raises a clear error rather than a cryptic KeyError/IndexError.

    🛠️ Proposed fix
    -def evaluate_result(_input, thread_id: int = 1):
    -    """Evaluate the result based on input."""
    -    # Note: Adjust the 'question' key below if your dataset uses a different input key
    -    input_text = _input.get("question") or _input["messages"][0]["content"]
    -
    -    message = {
    -        "messages": [HumanMessage(content=input_text)]
    -    }
    +def evaluate_result(_input, thread_id: int = 1):
    +    """Evaluate the result based on input."""
    +    question = _input.get("question")
    +    if question is not None:
    +        input_text = question
    +    else:
    +        messages = _input.get("messages") or []
    +        if not messages or "content" not in messages[0]:
    +            raise ValueError(f"Example is missing both 'question' and a usable 'messages' entry: {_input!r}")
    +        input_text = messages[0]["content"]
    +
    +    message = {"messages": [HumanMessage(content=input_text)]}
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/evaluation.py` around lines 118 - 128, In evaluate_result,
    avoid using `_input.get("question") or _input["messages"][0]["content"]` because
    an empty string will incorrectly fall through and can cause KeyError/IndexError;
    instead explicitly check for None/absence of the "question" key (e.g., `question
    is not None`) and only then use it, otherwise validate that `_input` contains a
    non-empty "messages" list and that `messages[0]["content"]` exists; if
    validation fails raise a clear ValueError describing the missing field so
    callers of evaluate_result get a meaningful error before constructing the
    HumanMessage and calling app.invoke.
    

    Comment on lines +2 to +7
    from pathlib import Path
    from app.core.main import main

    REPO_ROOT = Path(__file__).resolve().parents[2]
    if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

    Copy link
    Copy Markdown

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    ⚠️ Potential issue | 🟠 Major

    Import order defeats the sys.path bootstrap.

    from app.core.main import main on line 3 runs before the REPO_ROOT insertion on lines 5–7. If the repo root isn't already on sys.path (e.g., running python app/tests/installation_test.py directly, which is the scenario this bootstrap is meant to support), the import will fail with ModuleNotFoundError and the path fix below is never reached. Perform the sys.path mutation before importing from app.

    🛠️ Proposed fix
     import sys
     from pathlib import Path
    -from app.core.main import main
    -
    -REPO_ROOT = Path(__file__).resolve().parents[2]
    -if str(REPO_ROOT) not in sys.path:
    -    sys.path.insert(0, str(REPO_ROOT))
    +
    +REPO_ROOT = Path(__file__).resolve().parents[2]
    +if str(REPO_ROOT) not in sys.path:
    +    sys.path.insert(0, str(REPO_ROOT))
    +
    +from app.core.main import main  # noqa: E402  (import after sys.path bootstrap)
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/tests/installation_test.py` around lines 2 - 7, The import of main occurs
    before the REPO_ROOT sys.path bootstrap, causing ModuleNotFoundError when
    running the test directly; move the sys.path mutation (REPO_ROOT calculation and
    insertion into sys.path) above the import and then import main (from
    app.core.main import main) after ensuring REPO_ROOT is inserted, so adjust the
    file to compute REPO_ROOT, insert it into sys.path if missing, and only then
    import main.
    

    madina1203 and others added 4 commits April 24, 2026 16:13
    …mode
    
    - Added Interpreter Agent functionality to execute user-generated code in a controlled environment.
    - Introduced security measures to restrict imports and filesystem access within the interpreter.
    - Configured trusted mode to enable or disable Interpreter Agent features based on environment variables.
    - Updated load_config to conditionally include Interpreter Agent based on trusted mode.
    - Enhanced Streamlit app to reflect trusted mode status and manage file uploads accordingly.
    - Created sandbox_runner for executing code with resource limits and timeout handling.
    - Added security module for managing trusted mode and session path validation.
    - Implemented tests for security features and configuration loading behavior.
    - Documented changes and security implications in the agentic engineering log.
    @dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Apr 24, 2026
    Improve interpreter free-text filepath extraction to handle quoted paths, spaces, and dotted filenames, and track files updated in place as well as newly created outputs. Add focused tests for path parsing, DB fallback behavior, and changed-file detection.
    - Introduced SessionFilePreparationError for better error management during file staging.
    - Enhanced _prepare_session_files function to handle various file-related errors.
    - Added unit tests for session file preparation, covering valid file copying and error scenarios.
    - Documented changes in the agentic engineering log.

    @coderabbitai coderabbitai Bot left a comment

    Copy link
    Copy Markdown

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Actionable comments posted: 8

    🧹 Nitpick comments (5)
    streamlit_webapp/streamlit_app.py (2)

    29-29: Duplicate import os.

    os is already imported at line 4 inside the session-bootstrap block. The re-import at line 29 is harmless but redundant.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@streamlit_webapp/streamlit_app.py` at line 29, Remove the redundant "import
    os" statement (the duplicate import) so only the original import in the
    session-bootstrap block remains; locate the second occurrence of the import
    statement (the extra "import os") and delete it, then run linting/formatting to
    confirm no unused-import warnings remain.
    

    373-400: Style: drop == True / == False comparisons (Ruff E712).

    Static analysis flags lines 373, 374, 390, and 400 for explicit boolean equality. Direct truthiness checks read better and align with the rest of the file:

    ♻️ Suggested change
    -if st.session_state.openai_key_success == True and st.session_state.endpoint_url_success == True:
    -    if st.session_state.langgraph_app_created == False:
    +if st.session_state.openai_key_success and st.session_state.endpoint_url_success:
    +    if not st.session_state.langgraph_app_created:
    @@
    -        if st.session_state.trusted_mode_enabled == False:
    +        if not st.session_state.trusted_mode_enabled:
    @@
    -        if st.session_state.trusted_mode_enabled == True:
    +        if st.session_state.trusted_mode_enabled:
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@streamlit_webapp/streamlit_app.py` around lines 373 - 400, Replace explicit
    boolean equality checks like st.session_state.openai_key_success == True,
    st.session_state.endpoint_url_success == True,
    st.session_state.langgraph_app_created == False, and
    st.session_state.trusted_mode_enabled == False/== True with direct
    truthiness/negation checks (e.g., if st.session_state.openai_key_success:, if
    not st.session_state.langgraph_app_created:, if not
    st.session_state.trusted_mode_enabled:, if
    st.session_state.trusted_mode_enabled:) in the streamlit_app control flow around
    llm_creation and create_workflow so the logic is cleaner and Ruff E712 warnings
    are resolved.
    
    app/core/security.py (1)

    22-38: Optional: handle invalid env var values gracefully.

    If any of METABOT_INTERPRETER_TIMEOUT_SECONDS, METABOT_INTERPRETER_CPU_SECONDS, METABOT_INTERPRETER_MEMORY_MB, or METABOT_INTERPRETER_MAX_FILE_MB are set to a non-integer (e.g., "15s"), int(...) will raise ValueError at call-time, surfacing as a confusing error far from the configuration source. Consider wrapping the parse with a try/except that logs the offending var and falls back to the default.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/security.py` around lines 22 - 38, The parsing of environment
    variables in get_interpreter_timeout_seconds, get_interpreter_cpu_seconds,
    get_interpreter_memory_bytes, and get_interpreter_max_file_bytes can raise
    ValueError for non-integer values; update each function to catch parsing errors
    (ValueError) around int(...) calls, log the variable name and its invalid value
    (or use a provided logger), and fall back to the same default used now (e.g.,
    "15" for timeout, use timeout value for CPU, "512" MB for memory, "64" MB for
    max file) before applying the min/max constraints and unit conversions so
    callers still get a valid int instead of an exception.
    
    app/core/agents/supervisor/prompt.py (1)

    16-27: Note: Interpreter_agent is already absent from supervisor's enum when disabled.

    Defensive prompting is fine, but worth noting that when trusted mode is off, Interpreter_agent is excluded from members (and hence from the route function's enum in app/core/agents/supervisor/agent.py:32). The model is structurally prevented from selecting it, so these prompt gates are belt-and-suspenders. Consider whether the multiple repetitions ("only when Interpreter_agent is available", etc.) actually help the model or just inflate the system prompt — a single paragraph plus the original instructions may suffice.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/agents/supervisor/prompt.py` around lines 16 - 27, The prompt
    repeats availability checks for Interpreter_agent which are redundant because
    Interpreter_agent is already removed from the supervisor members enum when
    disabled; simplify the wording to a single clear paragraph that: states only
    call Interpreter_agent when it appears in members/route enum, otherwise use
    SPARQL answer or state visualization is disabled; preserves behaviors around
    delegation from ENPKG_agent to Sparql_query_runner and when to mark FINISH; and
    references the relevant symbols Interpreter_agent, Sparql_query_runner,
    ENPKG_agent and the supervisor route/members enum so readers can locate and
    update the prompt in app/core/agents/supervisor/prompt.py and the route logic in
    app/core/agents/supervisor/agent.py.
    
    app/core/utils.py (1)

    44-52: Fragile positional insertions — prefer semantic placement.

    Two positional assumptions baked in here:

    • Line 45: trusted_config["agents"].insert(3, interpreter_agent) assumes a specific ordering and at least 3 existing agents. If langgraph.json is reordered or shrinks, the interpreter agent ends up in an unexpected slot (or this raises unexpectedly when the list is shorter than 3 — actually insert is forgiving, but ordering meaning is lost).
    • Line 52: edge["targets"].insert(-1, interpreter_target) assumes the last target is always FINISH/__end__. If a future edit reorders targets, the interpreter target gets placed before some other live target instead.

    Since insertion order doesn't matter for correctness (the routing dict in langraph_workflow.py is built from a comprehension), append is safer:

    ♻️ Suggested refactor
    -    if interpreter_agent not in trusted_config["agents"]:
    -        trusted_config["agents"].insert(3, interpreter_agent)
    +    if interpreter_agent not in trusted_config["agents"]:
    +        trusted_config["agents"].append(interpreter_agent)
    @@
    -    for edge in trusted_config["conditional_edges"]:
    -        if edge["source"] == "supervisor" and interpreter_target not in edge["targets"]:
    -            edge["targets"].insert(-1, interpreter_target)
    +    for edge in trusted_config["conditional_edges"]:
    +        if edge["source"] == "supervisor" and interpreter_target not in edge["targets"]:
    +            edge["targets"].append(interpreter_target)
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/utils.py` around lines 44 - 52, The current code uses fragile
    positional inserts (trusted_config["agents"].insert(3, interpreter_agent) and
    edge["targets"].insert(-1, interpreter_target]) which rely on list ordering;
    change both to semantic appends so placement is not order-dependent: when adding
    the interpreter agent/edge/target in the block that checks trusted_config
    (symbols: trusted_config, interpreter_agent, interpreter_edge,
    interpreter_target, trusted_config["conditional_edges"], edge["targets"],
    trusted_config["agents"], trusted_config["edges"]) replace the positional
    inserts with .append(...) for agents/edges and append the interpreter_target to
    edge["targets"] instead of insert(-1) to ensure safe, order-independent
    addition.
    
    🤖 Prompt for all review comments with AI agents
    Verify each finding against the current code and only fix it if needed.
    
    Inline comments:
    In `@app/core/agents/interpreter/sandbox_runner.py`:
    - Around line 49-63: The AST validation in validate_code currently only blocks
    names but can be bypassed via attribute access; update validate_code to also
    walk and reject ast.Attribute nodes where node.attr exists and node.attr starts
    and ends with double underscores (node.attr.startswith("__") and
    node.attr.endswith("__")), raising ValueError with a clear message (e.g.,
    "Dunder attribute access is not allowed."); keep the existing
    Import/ImportFrom/Name checks and add a brief comment in the function noting
    this is defense-in-depth rather than a security boundary.
    - Around line 248-258: The current try/finally around exec in execute_user_code
    lets exceptions escape the stdout/stderr redirect so captured output is lost;
    wrap the exec(...) call in a try/except BaseException block (inside the existing
    redirect context) and on any exception return a structured result that includes
    success: False, the stdout_buffer.getvalue(), stderr_buffer.getvalue(), and the
    exception string (or traceback) so main() receives printed output even when user
    code raises; keep the existing signal.alarm(0) in the finally block to clear the
    timer.
    
    In `@app/core/agents/interpreter/tool_interpreter.py`:
    - Around line 203-214: The function extract_paths_from_db currently assumes
    payload_json and its "output" field are dicts and can call .get directly, which
    can raise AttributeError for malformed payloads; update extract_paths_from_db to
    validate types before accessing keys: after json.loads(payload) check that
    payload_json is a dict, retrieve output = payload_json.get("output") and if
    isinstance(output, dict) return output.get("paths", []); if output is None or
    not a dict, attempt to safely extract a top-level "paths" list (if
    payload_json.get("paths") is a list) otherwise log a warning via logger.warning
    about the unexpected payload shape (include tool_name and type info) and return
    [] to fail softly; keep the existing JSONDecodeError handling and ensure no
    AttributeErrors propagate.
    - Around line 303-341: The result tempfile result_path is only unlinked on
    timeout and after successful read, leaking it on other exceptions; modify the
    try/finally around the subprocess invocation (the block that calls
    subprocess.Popen and process.communicate) to ensure
    result_path.unlink(missing_ok=True) runs in the same finally that currently
    removes payload_path, and only skip deletion on the explicit success path after
    you have read and processed the result file (e.g., set a local success flag or
    delete conditionally after reading). Update references around subprocess.Popen,
    process.communicate, payload_path.unlink, and any early-return branches so
    result_path is always cleaned up on error paths.
    - Around line 124-128: Wrap the llm.invoke call (where
    SystemMessage/HumanMessage are passed) in a try/except to catch any exceptions
    and log them, then return a user-friendly error string consistent with other
    tool error paths; after a successful call, ensure response.content is normalized
    to a str (e.g., if it's a list/dict/other, serialize or join into a string)
    before passing it to extract_python_code so you avoid TypeError when
    response.content is not a string. Keep references to llm.invoke, SystemMessage,
    HumanMessage, response.content, and extract_python_code when implementing the
    fix.
    
    In `@app/core/agents/supervisor/prompt.py`:
    - Line 33: Fix the typo "aproppriate" → "appropriate" in the supervisor prompt
    text in app/core/agents/supervisor/prompt.py; locate the prompt string (e.g.,
    the PROMPT_TEMPLATE or function build_supervisor_prompt / variable holding the
    prompt) that contains the sentence about delegating interpretation to the
    interpreter agent and correct the misspelling so the sentence reads "appropriate
    agent".
    
    In `@app/core/tests/test_interpreter_tool.py`:
    - Around line 104-118: The test
    test_find_changed_or_new_files_includes_overwritten_file is flaky on filesystems
    with coarse mtime granularity because capture_session_file_metadata compares
    (stat.st_mtime_ns, stat.st_size); ensure the file metadata actually changes by
    modifying the test to either increase the sleep duration to >= 1 second between
    writes or write content with different byte lengths (so stat.st_size differs)
    before calling capture_session_file_metadata, referencing the test function name
    and the methods capture_session_file_metadata and find_changed_or_new_files to
    locate the code to update.
    
    In `@streamlit_webapp/streamlit_app.py`:
    - Around line 401-413: Sanitize client-supplied filenames before writing to
    session_state.user_input_dir: do not trust file.name directly (from
    st.file_uploader). Replace the direct join with a sanitized basename (e.g.,
    Path(file.name).name or os.path.basename(file.name)) or pass the candidate path
    through the existing resolve_session_path(..., must_exist=False) helper to
    ensure path traversal is rejected, then open and write to that safe path; update
    references around user_files loop where file_path is computed so uploads cannot
    escape user_input_dir.
    
    ---
    
    Nitpick comments:
    In `@app/core/agents/supervisor/prompt.py`:
    - Around line 16-27: The prompt repeats availability checks for
    Interpreter_agent which are redundant because Interpreter_agent is already
    removed from the supervisor members enum when disabled; simplify the wording to
    a single clear paragraph that: states only call Interpreter_agent when it
    appears in members/route enum, otherwise use SPARQL answer or state
    visualization is disabled; preserves behaviors around delegation from
    ENPKG_agent to Sparql_query_runner and when to mark FINISH; and references the
    relevant symbols Interpreter_agent, Sparql_query_runner, ENPKG_agent and the
    supervisor route/members enum so readers can locate and update the prompt in
    app/core/agents/supervisor/prompt.py and the route logic in
    app/core/agents/supervisor/agent.py.
    
    In `@app/core/security.py`:
    - Around line 22-38: The parsing of environment variables in
    get_interpreter_timeout_seconds, get_interpreter_cpu_seconds,
    get_interpreter_memory_bytes, and get_interpreter_max_file_bytes can raise
    ValueError for non-integer values; update each function to catch parsing errors
    (ValueError) around int(...) calls, log the variable name and its invalid value
    (or use a provided logger), and fall back to the same default used now (e.g.,
    "15" for timeout, use timeout value for CPU, "512" MB for memory, "64" MB for
    max file) before applying the min/max constraints and unit conversions so
    callers still get a valid int instead of an exception.
    
    In `@app/core/utils.py`:
    - Around line 44-52: The current code uses fragile positional inserts
    (trusted_config["agents"].insert(3, interpreter_agent) and
    edge["targets"].insert(-1, interpreter_target]) which rely on list ordering;
    change both to semantic appends so placement is not order-dependent: when adding
    the interpreter agent/edge/target in the block that checks trusted_config
    (symbols: trusted_config, interpreter_agent, interpreter_edge,
    interpreter_target, trusted_config["conditional_edges"], edge["targets"],
    trusted_config["agents"], trusted_config["edges"]) replace the positional
    inserts with .append(...) for agents/edges and append the interpreter_target to
    edge["targets"] instead of insert(-1) to ensure safe, order-independent
    addition.
    
    In `@streamlit_webapp/streamlit_app.py`:
    - Line 29: Remove the redundant "import os" statement (the duplicate import) so
    only the original import in the session-bootstrap block remains; locate the
    second occurrence of the import statement (the extra "import os") and delete it,
    then run linting/formatting to confirm no unused-import warnings remain.
    - Around line 373-400: Replace explicit boolean equality checks like
    st.session_state.openai_key_success == True,
    st.session_state.endpoint_url_success == True,
    st.session_state.langgraph_app_created == False, and
    st.session_state.trusted_mode_enabled == False/== True with direct
    truthiness/negation checks (e.g., if st.session_state.openai_key_success:, if
    not st.session_state.langgraph_app_created:, if not
    st.session_state.trusted_mode_enabled:, if
    st.session_state.trusted_mode_enabled:) in the streamlit_app control flow around
    llm_creation and create_workflow so the logic is cleaner and Ruff E712 warnings
    are resolved.
    
    🪄 Autofix (Beta)

    Fix all unresolved CodeRabbit comments on this PR:

    • Push a commit to this branch (recommended)
    • Create a new PR with the fixes

    ℹ️ Review info
    ⚙️ Run configuration

    Configuration used: Repository UI

    Review profile: CHILL

    Plan: Pro

    Run ID: 2f10a01f-a2ee-46b8-931a-28b4590ac276

    📥 Commits

    Reviewing files that changed from the base of the PR and between 09b836c and 115678a.

    ⛔ Files ignored due to path filters (1)
    • app/config/langgraph.json is excluded by !**/*.json
    📒 Files selected for processing (11)
    • app/core/agents/agents_factory.py
    • app/core/agents/interpreter/prompt.py
    • app/core/agents/interpreter/sandbox_runner.py
    • app/core/agents/interpreter/tool_interpreter.py
    • app/core/agents/supervisor/agent.py
    • app/core/agents/supervisor/prompt.py
    • app/core/security.py
    • app/core/tests/test_interpreter_tool.py
    • app/core/tests/test_security.py
    • app/core/utils.py
    • streamlit_webapp/streamlit_app.py
    ✅ Files skipped from review due to trivial changes (1)
    • app/core/agents/interpreter/prompt.py

    Comment thread app/core/agents/interpreter/sandbox_runner.py
    Comment thread app/core/agents/interpreter/sandbox_runner.py
    Comment thread app/core/agents/interpreter/tool_interpreter.py Outdated
    Comment thread app/core/agents/interpreter/tool_interpreter.py Outdated
    Comment thread app/core/agents/interpreter/tool_interpreter.py
    Comment thread app/core/agents/supervisor/prompt.py Outdated
    Comment thread app/core/tests/test_interpreter_tool.py
    Comment thread streamlit_webapp/streamlit_app.py
    lfnothias and others added 2 commits April 25, 2026 00:35
    Improve interpreter sandbox validation and failure handling by blocking dunder attribute access, preserving captured output on execution errors, softening malformed DB payload parsing, and cleaning up subprocess tempfiles on all error paths. Also sanitize uploaded filenames in Streamlit, fix the supervisor prompt typo, and add focused regression tests for the sandbox runner and interpreter tool.
    add CLI file input support for FILE_ANALYZER tool when running locally
    @lfnothias

    Copy link
    Copy Markdown
    Contributor Author

    @qodo-code-review

    @coderabbitai coderabbitai Bot left a comment

    Copy link
    Copy Markdown

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Actionable comments posted: 1

    🧹 Nitpick comments (7)
    app/core/tests/test_sparql_tool.py (2)

    5-8: Minor: construct fallback will likely never hit on supported pydantic versions.

    The project is on pydantic v2 (langchain/openai stack in this PR), where model_construct always exists. The construct fallback is effectively dead code unless you explicitly support pydantic v1 in tests. Not blocking — just worth simplifying if v1 support isn't required.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/test_sparql_tool.py` around lines 5 - 8, The helper
    _construct_tool contains a fallback to GraphSparqlQAChain.construct which is
    unnecessary on pydantic v2; simplify by removing the fallback and always using
    tool_sparql.GraphSparqlQAChain.model_construct(**kwargs) (update the
    _construct_tool function to call model_construct directly and remove the hasattr
    check and construct branch).
    

    55-84: Optional: add coverage for the OPENAI_API_KEY env fallback.

    _construct_tool uses model_construct/construct which bypasses GraphSparqlQAChain.__init__, so the openai_key or os.getenv("OPENAI_API_KEY") fallback at tool_sparql.py:261 is never exercised. Consider an additional test that instantiates via the real __init__ (with mocked LLMChain) or at least pre-sets self.openai_key = None and monkeypatches os.environ to assert the env fallback feeds OpenAIEmbeddings.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/test_sparql_tool.py` around lines 55 - 84, Add a new test that
    verifies the OPENAI_API_KEY env fallback by ensuring OpenAIEmbeddings receives
    the env value: instantiate the real GraphSparqlQAChain (or construct the same
    Tool instance but set its self.openai_key = None) instead of using
    _construct_tool so the __init__ path runs, monkeypatch
    os.environ["OPENAI_API_KEY"] to a test key, monkeypatch
    tool_sparql.OpenAIEmbeddings to capture the api_key, and call the search method
    (e.g., search_nodes) to assert captured["api_key"] equals the env value;
    reference GraphSparqlQAChain.__init__, _construct_tool, and
    tool_sparql.OpenAIEmbeddings to locate where to change the test.
    
    app/core/agents/sparql/tool_sparql.py (1)

    237-261: Optional: clarify api_key fallback handling

    The current code safely falls back to the OPENAI_API_KEY environment variable when openai_key=None is passed to OpenAIEmbeddings(). However, for better code clarity and to make the fallback intent explicit, consider only passing api_key when a value is present:

    Suggested refactor
    -        embeddings = OpenAIEmbeddings(api_key=self.openai_key)
    +        embeddings = (
    +            OpenAIEmbeddings(api_key=self.openai_key)
    +            if self.openai_key
    +            else OpenAIEmbeddings()
    +        )
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/agents/sparql/tool_sparql.py` around lines 237 - 261, The
    constructor currently sets self.openai_key = openai_key or
    os.getenv("OPENAI_API_KEY"); update usages so the OpenAIEmbeddings instantiation
    (and any other OpenAI client creation) only includes the api_key parameter when
    self.openai_key is truthy—i.e., pass api_key=self.openai_key conditionally
    instead of always passing api_key=None—so that the fallback to the environment
    variable remains explicit and avoids sending a None value; locate this in code
    paths that construct OpenAIEmbeddings or OpenAI clients referenced from this
    class (use the __init__ and self.openai_key symbols to find the relevant spots).
    
    app/core/tests/test_main.py (2)

    52-56: Replace assert False + try/except with pytest.raises (Ruff B011).

    assert False is stripped under python -O, which silently turns these negative-path tests into pass-throughs (Ruff B011). The idiomatic pytest pattern is pytest.raises, which also lets you assert the message via the match= regex and removes the manual try/except boilerplate at lines 52‑56, 76‑80, and 92‑96.

    ♻️ Proposed refactor (apply to all three negative-path tests)
    +import pytest
    @@
    -    try:
    -        main_module._prepare_session_files("session-id", [str(source_dir)])
    -        assert False, "Expected SessionFilePreparationError"
    -    except main_module.SessionFilePreparationError as exc:
    -        assert "not a file" in str(exc)
    +    with pytest.raises(main_module.SessionFilePreparationError, match="not a file"):
    +        main_module._prepare_session_files("session-id", [str(source_dir)])
    @@
    -    try:
    -        main_module._prepare_session_files("session-id", [str(first_file), str(second_file)])
    -        assert False, "Expected SessionFilePreparationError"
    -    except main_module.SessionFilePreparationError as exc:
    -        assert "would overwrite" in str(exc)
    +    with pytest.raises(main_module.SessionFilePreparationError, match="would overwrite"):
    +        main_module._prepare_session_files("session-id", [str(first_file), str(second_file)])
    @@
    -    try:
    -        main_module._prepare_session_files("session-id", [str(staged_file)])
    -        assert False, "Expected SessionFilePreparationError"
    -    except main_module.SessionFilePreparationError as exc:
    -        assert "already staged" in str(exc)
    +    with pytest.raises(main_module.SessionFilePreparationError, match="already staged"):
    +        main_module._prepare_session_files("session-id", [str(staged_file)])

    Also applies to: 76-80, 92-96

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/test_main.py` around lines 52 - 56, Replace the try/except +
    assert False negative-path tests with pytest.raises: for each occurrence testing
    main_module._prepare_session_files that expects
    main_module.SessionFilePreparationError (including the other two blocks at lines
    76-80 and 92-96), use pytest.raises(main_module.SessionFilePreparationError,
    match="not a file") to both assert the exception is raised and verify the
    message; remove the manual try/except and assert False boilerplate and import
    pytest if not already imported.
    

    100-100: Redundant sys.argv save/restore — monkeypatch already handles cleanup.

    monkeypatch.setattr(sys, "argv", ...) is automatically reverted at test teardown, so saving original_argv and re-monkeypatching it inside finally is unnecessary. Calling monkeypatch.setattr from a finally block also adds an undo entry that fires after the test exits — harmless here, but noisy. You can drop the original_argv bookkeeping and the try/finally wrapping main_module.main() in both tests.

    ♻️ Proposed cleanup
    -def test_main_passes_cli_api_key_and_reconfigures_logger(monkeypatch):
    -    original_argv = sys.argv[:]
    -    state = {"session_initialized": False, "setup_logger_states": []}
    +def test_main_passes_cli_api_key_and_reconfigures_logger(monkeypatch):
    +    state = {"session_initialized": False, "setup_logger_states": []}
    @@
    -    try:
    -        main_module.main()
    -    finally:
    -        monkeypatch.setattr(sys, "argv", original_argv)
    +    main_module.main()
    @@
    -def test_main_prints_user_friendly_error_for_bad_staged_file(monkeypatch, capsys):
    -    original_argv = sys.argv[:]
    -
    -    monkeypatch.setattr(sys, "argv", ["prog", "-c", "hello", "-f", "/missing/file.csv"])
    +def test_main_prints_user_friendly_error_for_bad_staged_file(monkeypatch, capsys):
    +    monkeypatch.setattr(sys, "argv", ["prog", "-c", "hello", "-f", "/missing/file.csv"])
    @@
    -    try:
    -        main_module.main()
    -    finally:
    -        monkeypatch.setattr(sys, "argv", original_argv)
    +    main_module.main()

    If you also drop import sys (no longer referenced), the module gets slightly tidier.

    Also applies to: 134-137, 148-148, 166-169

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/test_main.py` at line 100, Remove the redundant save/restore
    of sys.argv and the try/finally around main_module.main(): drop the
    original_argv = sys.argv[:] variable and the finally block that calls
    monkeypatch.setattr(sys, "argv", original_argv), since monkeypatch.setattr(sys,
    "argv", ...) already auto-reverts at teardown; update both tests that wrap
    main_module.main() (and the other occurrences referenced) to call
    main_module.main() directly under the monkeypatch, and remove the now-unused
    import sys from the test module.
    
    app/core/main.py (1)

    10-10: Move ChatOpenAI import to langchain_openai package.

    ChatOpenAI is deprecated in langchain_community.chat_models and has moved to langchain_openai (already in your dependencies at v0.3.35). Importing from the old path triggers a LangChainDeprecationWarning. Split the imports so ChatOpenAI comes from langchain_openai and ChatLiteLLM remains in langchain_community.

    Proposed import change
    -from langchain_community.chat_models import ChatOpenAI, ChatLiteLLM
    +from langchain_openai import ChatOpenAI
    +from langchain_community.chat_models import ChatLiteLLM
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/main.py` at line 10, Update the imports so ChatOpenAI is imported
    from langchain_openai while ChatLiteLLM remains imported from
    langchain_community.chat_models: locate the import line that currently
    references both ChatOpenAI and ChatLiteLLM and split it into two imports, using
    ChatOpenAI from the langchain_openai package and ChatLiteLLM from
    langchain_community.chat_models to eliminate the deprecation warning.
    
    app/core/tests/test_sandbox_runner.py (1)

    6-11: Prefer pytest.raises over assert False (Ruff B011).

    assert False is stripped under python -O, which would silently turn this test into a no-op. Using pytest.raises also gives clearer failure output.

    ♻️ Suggested change
    +import pytest
    +
     def test_validate_code_rejects_dunder_attribute_access():
    -    try:
    -        validate_code("x = object().__class__")
    -        assert False, "Expected ValueError"
    -    except ValueError as exc:
    -        assert "Dunder attribute access" in str(exc)
    +    with pytest.raises(ValueError, match="Dunder attribute access"):
    +        validate_code("x = object().__class__")
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/test_sandbox_runner.py` around lines 6 - 11, The test
    test_validate_code_rejects_dunder_attribute_access uses a try/except with assert
    False which is removed under python -O; replace that pattern with
    pytest.raises(ValueError) as excinfo and assert the error message contains
    "Dunder attribute access" using str(excinfo.value). Update the test to import
    pytest if needed and call validate_code("x = object().__class__") inside the
    pytest.raises context to ensure the ValueError is properly asserted.
    
    🤖 Prompt for all review comments with AI agents
    Verify each finding against the current code and only fix it if needed.
    
    Inline comments:
    In `@app/core/tests/test_sandbox_runner.py`:
    - Around line 14-35: The test calls execute_user_code directly which mutates
    global process state (os.chdir, apply_resource_limits setting RLIMITs,
    disable_network monkey-patching, and tempfile.tempdir) and therefore must
    instead spawn the real runner as a separate process; update the test to invoke
    the sandbox entrypoint (the same invocation path used by
    tool_interpreter.execute_in_subprocess that runs python -I sandbox_runner.py
    ...) with the same payload, capture stdout/stderr and exit status, and assert
    the captured output/error/traceback from the subprocess result rather than
    calling execute_user_code in-process, ensuring no global state (disable_network,
    tempfile.tempdir, RLIMITs, CWD) is modified in the pytest worker.
    
    ---
    
    Nitpick comments:
    In `@app/core/agents/sparql/tool_sparql.py`:
    - Around line 237-261: The constructor currently sets self.openai_key =
    openai_key or os.getenv("OPENAI_API_KEY"); update usages so the OpenAIEmbeddings
    instantiation (and any other OpenAI client creation) only includes the api_key
    parameter when self.openai_key is truthy—i.e., pass api_key=self.openai_key
    conditionally instead of always passing api_key=None—so that the fallback to the
    environment variable remains explicit and avoids sending a None value; locate
    this in code paths that construct OpenAIEmbeddings or OpenAI clients referenced
    from this class (use the __init__ and self.openai_key symbols to find the
    relevant spots).
    
    In `@app/core/main.py`:
    - Line 10: Update the imports so ChatOpenAI is imported from langchain_openai
    while ChatLiteLLM remains imported from langchain_community.chat_models: locate
    the import line that currently references both ChatOpenAI and ChatLiteLLM and
    split it into two imports, using ChatOpenAI from the langchain_openai package
    and ChatLiteLLM from langchain_community.chat_models to eliminate the
    deprecation warning.
    
    In `@app/core/tests/test_main.py`:
    - Around line 52-56: Replace the try/except + assert False negative-path tests
    with pytest.raises: for each occurrence testing
    main_module._prepare_session_files that expects
    main_module.SessionFilePreparationError (including the other two blocks at lines
    76-80 and 92-96), use pytest.raises(main_module.SessionFilePreparationError,
    match="not a file") to both assert the exception is raised and verify the
    message; remove the manual try/except and assert False boilerplate and import
    pytest if not already imported.
    - Line 100: Remove the redundant save/restore of sys.argv and the try/finally
    around main_module.main(): drop the original_argv = sys.argv[:] variable and the
    finally block that calls monkeypatch.setattr(sys, "argv", original_argv), since
    monkeypatch.setattr(sys, "argv", ...) already auto-reverts at teardown; update
    both tests that wrap main_module.main() (and the other occurrences referenced)
    to call main_module.main() directly under the monkeypatch, and remove the
    now-unused import sys from the test module.
    
    In `@app/core/tests/test_sandbox_runner.py`:
    - Around line 6-11: The test test_validate_code_rejects_dunder_attribute_access
    uses a try/except with assert False which is removed under python -O; replace
    that pattern with pytest.raises(ValueError) as excinfo and assert the error
    message contains "Dunder attribute access" using str(excinfo.value). Update the
    test to import pytest if needed and call validate_code("x = object().__class__")
    inside the pytest.raises context to ensure the ValueError is properly asserted.
    
    In `@app/core/tests/test_sparql_tool.py`:
    - Around line 5-8: The helper _construct_tool contains a fallback to
    GraphSparqlQAChain.construct which is unnecessary on pydantic v2; simplify by
    removing the fallback and always using
    tool_sparql.GraphSparqlQAChain.model_construct(**kwargs) (update the
    _construct_tool function to call model_construct directly and remove the hasattr
    check and construct branch).
    - Around line 55-84: Add a new test that verifies the OPENAI_API_KEY env
    fallback by ensuring OpenAIEmbeddings receives the env value: instantiate the
    real GraphSparqlQAChain (or construct the same Tool instance but set its
    self.openai_key = None) instead of using _construct_tool so the __init__ path
    runs, monkeypatch os.environ["OPENAI_API_KEY"] to a test key, monkeypatch
    tool_sparql.OpenAIEmbeddings to capture the api_key, and call the search method
    (e.g., search_nodes) to assert captured["api_key"] equals the env value;
    reference GraphSparqlQAChain.__init__, _construct_tool, and
    tool_sparql.OpenAIEmbeddings to locate where to change the test.
    
    🪄 Autofix (Beta)

    Fix all unresolved CodeRabbit comments on this PR:

    • Push a commit to this branch (recommended)
    • Create a new PR with the fixes

    ℹ️ Review info
    ⚙️ Run configuration

    Configuration used: Repository UI

    Review profile: CHILL

    Plan: Pro

    Run ID: 33cc3106-5fc7-4586-a12d-c4a9c3a2f897

    📥 Commits

    Reviewing files that changed from the base of the PR and between 115678a and 94c7840.

    📒 Files selected for processing (14)
    • README.md
    • app/core/agents/interpreter/sandbox_runner.py
    • app/core/agents/interpreter/tool_interpreter.py
    • app/core/agents/sparql/agent.py
    • app/core/agents/sparql/tool_sparql.py
    • app/core/agents/supervisor/prompt.py
    • app/core/main.py
    • app/core/tests/test_interpreter_tool.py
    • app/core/tests/test_main.py
    • app/core/tests/test_sandbox_runner.py
    • app/core/tests/test_sparql_tool.py
    • streamlit_webapp/README.md
    • streamlit_webapp/streamlit_app.py
    • streamlit_webapp/test_streamlit_app.py
    ✅ Files skipped from review due to trivial changes (2)
    • README.md
    • streamlit_webapp/README.md
    🚧 Files skipped from review as they are similar to previous changes (1)
    • app/core/agents/supervisor/prompt.py

    Comment thread app/core/tests/test_sandbox_runner.py Outdated
    lfnothias and others added 2 commits April 25, 2026 18:57
    Stabilize Streamlit startup/import handling, add a render smoke test, and update the README with v1.1.0 launch guidance. Also harden follow-up test coverage by moving sandbox-runner execution checks into a subprocess, cleaning up pytest assertions, and refining SPARQL/OpenAI key handling to avoid env-only regressions and deprecation warnings.

    @coderabbitai coderabbitai Bot left a comment

    Copy link
    Copy Markdown

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Actionable comments posted: 3

    🧹 Nitpick comments (7)
    app/core/tests/test_sparql_tool.py (1)

    42-47: Optional: relax dict-equality to a subset check.

    assert captured["tool_kwargs"] == {...} will break the moment create_agent forwards an extra kwarg (e.g., if llm_instance ever flows into tool_parameters). Consider asserting individual keys or using a subset comparison so this test stays focused on the openai_key plumbing it's actually verifying.

    ♻️ Suggested adjustment
    -    assert captured["tool_kwargs"] == {
    -        "graph": "graph",
    -        "llm": {sparql_agent.MODEL_CHOICE: "model"},
    -        "session_id": "session-123",
    -        "openai_key": "cli-key",
    -    }
    +    assert captured["tool_kwargs"]["graph"] == "graph"
    +    assert captured["tool_kwargs"]["llm"] == {sparql_agent.MODEL_CHOICE: "model"}
    +    assert captured["tool_kwargs"]["session_id"] == "session-123"
    +    assert captured["tool_kwargs"]["openai_key"] == "cli-key"
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/test_sparql_tool.py` around lines 42 - 47, The test currently
    asserts exact dict equality on captured["tool_kwargs"], which is brittle; change
    the assertion to only check the relevant keys (e.g., assert
    captured["tool_kwargs"]["openai_key"] == "cli-key" and assert
    captured["tool_kwargs"]["session_id"] == "session-123") or use a subset
    comparison (e.g., for each key in {"openai_key","session_id","graph"} assert key
    in captured["tool_kwargs"] and values match) so create_agent forwarding extra
    kwargs (like llm_instance) won't break the test.
    
    app/core/agents/sparql/tool_sparql.py (1)

    397-399: Optional: simplify by always forwarding api_key=self.openai_key.

    OpenAIEmbeddings in langchain_openai treats api_key=None as "use environment variable", so the conditional kwargs is functionally equivalent to passing the argument directly:

    ♻️ Optional simplification
    -        embedding_kwargs = {"api_key": self.openai_key} if self.openai_key else {}
    -        embeddings = OpenAIEmbeddings(**embedding_kwargs)
    +        embeddings = OpenAIEmbeddings(api_key=self.openai_key)

    Keep the current form if you prefer being explicit about the conditional logic.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/agents/sparql/tool_sparql.py` around lines 397 - 399, The code
    currently builds embedding_kwargs conditionally before instantiating
    OpenAIEmbeddings and loading FAISS; simplify by passing api_key=self.openai_key
    directly to OpenAIEmbeddings (replace the embedding_kwargs +
    OpenAIEmbeddings(**embedding_kwargs) pattern with
    OpenAIEmbeddings(api_key=self.openai_key)), leaving FAISS.load_local(db_path,
    embeddings, allow_dangerous_deserialization=True) unchanged; update references
    to embedding_kwargs/embeddings accordingly and remove the now-unneeded
    embedding_kwargs variable.
    
    README.md (1)

    109-116: Reconcile the two Streamlit launch recipes.

    Lines 109-116 advertise python -m streamlit run streamlit_webapp/streamlit_app.py from the repo root as the recommended/tested workflow, but lines 171-179 still document the legacy export PYTHONPATH="$(pwd):${PYTHONPATH}"; streamlit run ... recipe. Now that streamlit_app.py inserts the repo root into sys.path itself, the PYTHONPATH workaround is obsolete and risks confusing readers. Consider deleting the lines 175-178 block (or merging it into the v1.1.0-aligned one) so there is a single source of truth.

    Also applies to: 171-179

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@README.md` around lines 109 - 116, Remove the legacy PYTHONPATH launch recipe
    and keep a single, up-to-date Streamlit instruction: delete the export
    PYTHONPATH block (the legacy recipe that uses export
    PYTHONPATH="$(pwd):${PYTHONPATH}"; streamlit run ...) and ensure README only
    documents the repo-root invocation using python -m streamlit run
    streamlit_webapp/streamlit_app.py, since streamlit_app.py already injects the
    repo root into sys.path; alternatively, if you prefer consolidation, merge the
    legacy text into the v1.1.0-aligned paragraph and note that contributor/admin
    keys can still be set via environment variables.
    
    app/core/main.py (2)

    347-357: Stage/validate files before initializing LLMs to fail fast.

    llm_creation(...) constructs ChatOpenAI/ChatLiteLLM clients (and validates API key configuration) before _prepare_session_files runs. If the user passes a bad -f path, they currently pay the full LLM-init cost (and may get a confusing key-related error first if the env is misconfigured) before hitting the file-staging error. Reordering so that staging happens immediately after argument parsing — or at minimum before llm_creation — gives cleaner, faster CLI feedback.

    ♻️ Proposed reorder
    -    # Initialize language models
    -    models = llm_creation(api_key=args.api_key)
    -
    -    # Stage user-provided files into the session's input directory
         if args.file:
             try:
                 _prepare_session_files(session_id, args.file)
             except SessionFilePreparationError as exc:
                 logger.error(str(exc))
                 print(f"Error: {exc}")
                 return
    +
    +    # Initialize language models
    +    models = llm_creation(api_key=args.api_key)
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/main.py` around lines 347 - 357, The LLM clients are being
    initialized by llm_creation(...) before staging/validating user files, which can
    cause costly/ confusing failures; move the _prepare_session_files(session_id,
    args.file) call (and its SessionFilePreparationError handling) to run
    immediately after argument parsing and before calling llm_creation(...), so the
    CLI validates and fails fast on bad -f paths; ensure args.file is checked the
    same way and that models = llm_creation(api_key=args.api_key) only executes
    after successful _prepare_session_files.
    

    214-278: Solid staging logic; one optional hardening suggestion.

    The validation chain (exists → is_file → basename collision → already-staged → destination conflict → copy with error wrapping) is well-ordered and the test suite covers the main branches. Path.name plus resolve(strict=False) correctly neutralizes .. traversal attempts in user-supplied paths.

    Optional: shutil.copy2 follows symlinks on the source by default, which matches typical staging semantics, but if you want to refuse symlinked inputs (defense in depth, given the staged files are later read by the interpreter agent in trusted mode), you could add if src.is_symlink(): raise SessionFilePreparationError(...) before the copy. Not blocking.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/main.py` around lines 214 - 278, Add a defensive check to refuse
    symlinked inputs in _prepare_session_files: after resolving src and validating
    existence/is_file, if src.is_symlink() raise SessionFilePreparationError(src,
    f"Input file is a symlink and is not allowed: {src}") (place this before the
    shutil.copy2 call and before the src == dest.resolve(...) check so symlinks are
    rejected early); this prevents staging symlinked sources even though
    shutil.copy2 would follow them.
    
    app/core/tests/test_main.py (2)

    91-132: LGTM — verifies the new --api-key and session_id plumbing.

    Capturing both llm_api_key and workflow_api_key ensures the CLI flag flows into llm_creation and create_workflow (matching main.py:348 and main.py:362-365). The setup_logger_states == [True] assertion neatly enforces that logger reconfiguration happens after initialize_session_context, which is the whole point of the global rebinding at main.py:334-335.

    One small caveat to keep in mind: the create_user_session mock (line 119) omits the user_session_dir parameter that exists on the real signature — fine today, but if main() ever starts requesting user_session_dir=True, this lambda will raise TypeError. Consider **kwargs for resilience.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/test_main.py` around lines 91 - 132, The test's mock for
    create_user_session in test_main_passes_cli_api_key_and_reconfigures_logger does
    not match the real signature (it omits user_session_dir), risking TypeError if
    main() passes that kwarg; update the monkeypatched mock (the lambda passed to
    main_module.create_user_session) to accept the same optional parameters or use a
    catch-all like session_id=None, **kwargs and return "session-123" so it remains
    resilient if main() starts supplying user_session_dir or other kwargs.
    

    143-150: Optional: replace the (_ for _ in ()).throw(...) idiom with a small helper.

    The generator-throw trick works but is hard to read at a glance. A named helper makes intent obvious and lets future maintainers grep for it.

    ♻️ Suggested refactor
    +    def _raise_missing(session_id, file_paths):
    +        raise main_module.SessionFilePreparationError(
    +            Path(file_paths[0]), f"File not found: {file_paths[0]}"
    +        )
    +
    -    monkeypatch.setattr(
    -        main_module,
    -        "_prepare_session_files",
    -        lambda session_id, file_paths: (_ for _ in ()).throw(
    -            main_module.SessionFilePreparationError(Path(file_paths[0]), f"File not found: {file_paths[0]}")
    -        ),
    -    )
    -    monkeypatch.setattr(main_module, "create_workflow", lambda *args, **kwargs: (_ for _ in ()).throw(AssertionError("workflow should not run")))
    +    def _fail_create_workflow(*args, **kwargs):
    +        raise AssertionError("workflow should not run")
    +
    +    monkeypatch.setattr(main_module, "_prepare_session_files", _raise_missing)
    +    monkeypatch.setattr(main_module, "create_workflow", _fail_create_workflow)
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@app/core/tests/test_main.py` around lines 143 - 150, The test uses the
    hard-to-read "(_ for _ in ()).throw(...)" trick when monkeypatching
    _prepare_session_files and create_workflow; add a small named helper (e.g.,
    raise_when_called or raising) that takes an exception and returns a callable
    which raises that exception when invoked, then replace the generator-throw
    usages with calls to that helper for main_module._prepare_session_files (raising
    SessionFilePreparationError(...)) and main_module.create_workflow (raising
    AssertionError("workflow should not run")) so intent is clear and easy to grep.
    
    🤖 Prompt for all review comments with AI agents
    Verify each finding against the current code and only fix it if needed.
    
    Inline comments:
    In `@app/core/tests/test_sandbox_runner.py`:
    - Around line 43-61: The subprocess.run invocation that calls the sandbox runner
    (the call with sys.executable and str(Path(sandbox_runner.__file__).resolve())
    using payload_path and result_path, executed in cwd=session_dir) needs an outer
    timeout to avoid hanging the pytest worker if the sandbox's internal timeout
    fails; add a timeout parameter (e.g., timeout=8 or a few seconds greater than
    the sandbox's timeout_seconds: 5) to the subprocess.run call and ensure the test
    treats subprocess.TimeoutExpired as a failure so the test fails fast rather than
    hanging.
    
    In `@README.md`:
    - Line 32: Remove the stray Git merge-conflict marker by deleting the literal
    line "=======" from README.md (the unresolved conflict delimiter), then verify
    surrounding content for any leftover conflict markers like "<<<<<<<" or
    ">>>>>>>" and ensure the README renders correctly; commit the cleaned file with
    a clear message indicating the merge conflict was resolved.
    
    In `@streamlit_webapp/README.md`:
    - Line 121: The README currently contains a literal placeholder "[Include
    support contact information here]"; remove that placeholder or replace it with
    the real support channel (e.g., a GitHub Issues URL, maintainer email, or
    mailing list) so the published README doesn't show an unfilled template; update
    the README line accordingly or delete it if no public contact is intended.
    
    ---
    
    Nitpick comments:
    In `@app/core/agents/sparql/tool_sparql.py`:
    - Around line 397-399: The code currently builds embedding_kwargs conditionally
    before instantiating OpenAIEmbeddings and loading FAISS; simplify by passing
    api_key=self.openai_key directly to OpenAIEmbeddings (replace the
    embedding_kwargs + OpenAIEmbeddings(**embedding_kwargs) pattern with
    OpenAIEmbeddings(api_key=self.openai_key)), leaving FAISS.load_local(db_path,
    embeddings, allow_dangerous_deserialization=True) unchanged; update references
    to embedding_kwargs/embeddings accordingly and remove the now-unneeded
    embedding_kwargs variable.
    
    In `@app/core/main.py`:
    - Around line 347-357: The LLM clients are being initialized by
    llm_creation(...) before staging/validating user files, which can cause costly/
    confusing failures; move the _prepare_session_files(session_id, args.file) call
    (and its SessionFilePreparationError handling) to run immediately after argument
    parsing and before calling llm_creation(...), so the CLI validates and fails
    fast on bad -f paths; ensure args.file is checked the same way and that models =
    llm_creation(api_key=args.api_key) only executes after successful
    _prepare_session_files.
    - Around line 214-278: Add a defensive check to refuse symlinked inputs in
    _prepare_session_files: after resolving src and validating existence/is_file, if
    src.is_symlink() raise SessionFilePreparationError(src, f"Input file is a
    symlink and is not allowed: {src}") (place this before the shutil.copy2 call and
    before the src == dest.resolve(...) check so symlinks are rejected early); this
    prevents staging symlinked sources even though shutil.copy2 would follow them.
    
    In `@app/core/tests/test_main.py`:
    - Around line 91-132: The test's mock for create_user_session in
    test_main_passes_cli_api_key_and_reconfigures_logger does not match the real
    signature (it omits user_session_dir), risking TypeError if main() passes that
    kwarg; update the monkeypatched mock (the lambda passed to
    main_module.create_user_session) to accept the same optional parameters or use a
    catch-all like session_id=None, **kwargs and return "session-123" so it remains
    resilient if main() starts supplying user_session_dir or other kwargs.
    - Around line 143-150: The test uses the hard-to-read "(_ for _ in
    ()).throw(...)" trick when monkeypatching _prepare_session_files and
    create_workflow; add a small named helper (e.g., raise_when_called or raising)
    that takes an exception and returns a callable which raises that exception when
    invoked, then replace the generator-throw usages with calls to that helper for
    main_module._prepare_session_files (raising SessionFilePreparationError(...))
    and main_module.create_workflow (raising AssertionError("workflow should not
    run")) so intent is clear and easy to grep.
    
    In `@app/core/tests/test_sparql_tool.py`:
    - Around line 42-47: The test currently asserts exact dict equality on
    captured["tool_kwargs"], which is brittle; change the assertion to only check
    the relevant keys (e.g., assert captured["tool_kwargs"]["openai_key"] ==
    "cli-key" and assert captured["tool_kwargs"]["session_id"] == "session-123") or
    use a subset comparison (e.g., for each key in
    {"openai_key","session_id","graph"} assert key in captured["tool_kwargs"] and
    values match) so create_agent forwarding extra kwargs (like llm_instance) won't
    break the test.
    
    In `@README.md`:
    - Around line 109-116: Remove the legacy PYTHONPATH launch recipe and keep a
    single, up-to-date Streamlit instruction: delete the export PYTHONPATH block
    (the legacy recipe that uses export PYTHONPATH="$(pwd):${PYTHONPATH}"; streamlit
    run ...) and ensure README only documents the repo-root invocation using python
    -m streamlit run streamlit_webapp/streamlit_app.py, since streamlit_app.py
    already injects the repo root into sys.path; alternatively, if you prefer
    consolidation, merge the legacy text into the v1.1.0-aligned paragraph and note
    that contributor/admin keys can still be set via environment variables.
    
    🪄 Autofix (Beta)

    Fix all unresolved CodeRabbit comments on this PR:

    • Push a commit to this branch (recommended)
    • Create a new PR with the fixes

    ℹ️ Review info
    ⚙️ Run configuration

    Configuration used: Repository UI

    Review profile: CHILL

    Plan: Pro

    Run ID: e7ef6a79-580d-4f04-93d1-fd8a636ff49f

    📥 Commits

    Reviewing files that changed from the base of the PR and between 94c7840 and 84892c8.

    📒 Files selected for processing (7)
    • README.md
    • app/core/agents/sparql/tool_sparql.py
    • app/core/main.py
    • app/core/tests/test_main.py
    • app/core/tests/test_sandbox_runner.py
    • app/core/tests/test_sparql_tool.py
    • streamlit_webapp/README.md

    Comment thread app/core/tests/test_sandbox_runner.py Outdated
    Comment thread README.md
    Comment thread streamlit_webapp/README.md
    @lfnothias lfnothias merged commit 7b11693 into main May 5, 2026
    1 check was pending
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    Configuration changes Dependencies Review effort 4/5 size:XXL This PR changes 1000+ lines, ignoring generated files.

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    2 participants