MetaboT v1.1 0 Update dependencies, replace code interpreter, some other improvements by lfnothias · Pull Request #170 · HolobiomicsLab/MetaboT

lfnothias · 2026-04-24T13:25:44Z

User description

MetaboT v1.1 0 Update dependencies, replace code interpreter, someother improvements

PR Type

enhancement, bug_fix, dependencies, tests

Description

Replace codeinterpreterapi with PythonREPL for local execution
Update dependencies including langchain, openai, pydantic, and numpy
Fix deprecated pydantic imports and update requirements
Add big_benchmark.csv for automated evaluation

Diagram Walkthrough

flowchart LR
  A["Replace codeinterpreterapi"] -- "with" --> B["PythonREPL"]
  C["Update dependencies"] --> D["langchain, openai, pydantic, numpy"]
  E["Fix imports"] --> F["pydantic"]
  G["Add benchmark"] --> H["big_benchmark.csv"]

File Walkthrough

Relevant files

Enhancement

14 files

tool_chemicals.py Update `pydantic` import and type annotations	+2/-2
tool_smiles.py Update `pydantic` import and type annotations	+2/-2
tool_target.py Update `pydantic` import and type annotations	+2/-2
tool_taxon.py Update `pydantic` import and type annotations	+5/-7
tool_filesparser.py Add `FileAnalyzerInput` schema and update imports	+6/-2
agent.py Pass `llm_instance` to tool constructor	+1/-0
tool_interpreter.py Replace `codeinterpreterapi` with `PythonREPL`	+133/-103
tool_spectrum.py Update `pydantic` import and type annotations	+2/-2
tool_merge_result.py Update `pydantic` import and type annotations	+2/-2
tool_sparql.py Update `pydantic` import and type annotations	+5/-5
tool_wikidata_query.py Update `pydantic` import and type annotations	+5/-5
tool_validator.py Update `pydantic` import and type annotations	+2/-2
database_manager.py Update memory database manager to use `MemorySaver`	+40/-39
langraph_workflow.py `Improve error logging with exception info`	+1/-1

Cleanup

1 files

custom_sqlite_file.py `Remove unused custom SQLite checkpoint file`	+0/-97

Tests

2 files

evaluation.py `Add automated evaluation setup and checks`	+74/-59
installation_test.py `Update installation test setup`	+8/-6

Configuration changes

1 files

params.ini `Update default OpenAI models to gpt-5.4`	+4/-4

Dependencies

1 files

environment.yml `Update and add dependencies for new features`	+14/-15

Need help?
Type /help how to ... in the comments thread for any questions about PR-Agent usage.
Check out the documentation for more information.

Sync branches

…hain, openai, pydantic, and numpy

…onment.yml and requirements.txt

Dev madina

The interpreter tool previously relied on codeinterpreterapi (remote sandbox) which is no longer compatible with the current dependency stack. This replaces it with langchain_experimental PythonREPL (local execution) while preserving the same agent contract. Key changes: - tool_interpreter.py: rewrite using PythonREPL — injects file previews (columns, dtypes, 3-row sample) into the LLM prompt so generated code uses exact column names; truncates REPL stdout at 2000 chars to prevent context overflow; wires shared llm_instance from the agent instead of instantiating a separate model - agent.py: passes llm_instance to tool constructor via import_tools introspection - params.ini: bump default OpenAI models from gpt-4o / gpt-4o-mini to gpt-5.4 / gpt-5.4-mini - requirements.txt / environment.yml: pin langchain_experimental==0.3.4 and remove codeinterpreterapi / codeboxapi dependencies - Remove custom_sqlite_file.py (unused after langgraph checkpoint refactor) Security note: PythonREPL executes LLM-generated code in-process without sandboxing. A warning comment has been added; restrict to trusted deployment environments.

coderabbitai · 2026-04-24T13:25:51Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Migrate many tools from LangChain pydantic shims to native pydantic, add explicit type annotations, replace in-process code-interpreter with LLM-driven script generation + sandboxed subprocess execution, add trusted-mode and session/file security, remove a custom SQLite checkpointer, update memory fallback, add tests, and bump dependencies.

Changes

Cohort / File(s)	Summary
Pydantic migration (multiple tools) `app/core/agents/enpkg/tool_chemicals.py`, `app/core/agents/enpkg/tool_smiles.py`, `app/core/agents/enpkg/tool_target.py`, `app/core/agents/interpreter/tool_spectrum.py`, `app/core/agents/sparql/tool_merge_result.py`, `app/core/agents/validator/tool_validator.py`	Replace `langchain.pydantic_v1` imports with `pydantic` and add explicit `args_schema: type[BaseModel] = ...` annotations.
Typed attributes & constants `app/core/agents/enpkg/tool_taxon.py`, `app/core/agents/sparql/tool_sparql.py`, `app/core/agents/sparql/tool_wikidata_query.py`	Add/strengthen type annotations (`ClassVar[str]`, `name: str`, `description: str`, `requires_params: bool`) and annotate `args_schema` as `type[BaseModel]`.
File parser input schema `app/core/agents/entry/tool_filesparser.py`	Introduce `FileAnalyzerInput: BaseModel` and set `FileAnalyzer.args_schema: type[BaseModel] = FileAnalyzerInput`; switch to native pydantic imports.
Interpreter tool & sandbox (major) `app/core/agents/interpreter/tool_interpreter.py`, `app/core/agents/interpreter/sandbox_runner.py`, `app/core/agents/interpreter/prompt.py`, `app/core/tests/test_interpreter_tool.py`, `app/core/tests/test_sandbox_runner.py`	Replace prior code-interpreter integration with LLM-driven single fenced-Python script generation, robust path extraction/allowlisting, hardened sandboxed subprocess execution (`sandbox_runner.py`) with AST/import/resource/network/filesystem restrictions, snapshot/changed-file detection, truncated outputs, and extensive tests.
Tool/agent injection `app/core/agents/interpreter/agent.py`, `app/core/agents/sparql/agent.py`	Pass selected `llm_instance` (and optional `openai_key`) into `import_tools` via `tool_parameters` so tools can access the LLM/OpenAI key.
Memory / checkpoints `app/core/memory/custom_sqlite_file.py`, `app/core/memory/database_manager.py`	Remove custom `SqliteCheckpointerSaver`; `memory_database()` falls back to LangGraph’s `MemorySaver` when DB config is absent (conditional in-function import).
Security & trusted mode `app/core/security.py`, `app/core/utils.py`, `streamlit_webapp/streamlit_app.py`, `app/core/agents/supervisor/prompt.py`	Add env-driven trusted-mode and interpreter resource limits, path-safety helpers, post-process config to inject `Interpreter_agent` when trusted, and gate interpreter/Streamlit features accordingly.
CLI session staging & main changes `app/core/main.py`, `app/core/tests/test_main.py`	Add `-f/--file` session file staging with validation and collision handling, create `session_id` and initialize session context early, reinitialize logger per session, and thread `--api-key` into LLM creation/workflow.
Workflow & logging `app/core/workflow/langraph_workflow.py`	On stream errors include full traceback in logs (`exc_info=True`).
Tests, evaluation & installation `app/core/tests/test_security.py`, `app/core/tests/evaluation.py`, `app/tests/installation_test.py`	Add/modify tests for security, interpreter, sandbox, main staging; require `OPENAI_API_KEY` for evaluation, dataset bootstrapping, and ensure repo root on `sys.path` in installation test.
Dependencies & docs `requirements.txt`, `README.md`, `streamlit_webapp/README.md`	Remove `codeboxapi`/`codeinterpreterapi`, bump LangChain stack to 0.3.x, add `langchain_experimental`, upgrade other deps, and update README/Streamlit notes and startup guidance.
Streamlit UI tweaks `streamlit_webapp/streamlit_utils.py`, `streamlit_webapp/streamlit_app.py`, `streamlit_webapp/test_streamlit_app.py`	Adjust agent output key names (`Entry_Agent`, `Validator`), expose uploader only in trusted mode, write uploads via `resolve_session_path`, update imports and UI sizing, and add a smoke test for app import.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Agent as InterpreterTool
    participant DB as Database
    participant LLM as OpenAI Chat LLM
    participant Sandbox as SandboxRunner
    participant FS as FileSystem

    User->>Agent: invoke InterpreterTool(input, session_id)
    alt no explicit file paths
        Agent->>DB: query session files
        DB-->>Agent: file_paths (JSON)
    else explicit paths provided
        Agent-->>Agent: parse quoted/unquoted paths
    end
    Agent->>FS: read previews (truncated)
    FS-->>Agent: previews
    Agent->>LLM: request single fenced Python script
    LLM-->>Agent: returns ```python ... ```
    Agent->>Sandbox: run script via sandbox_runner (json config)
    Sandbox-->>Agent: result {success, stdout, stderr, error}
    Agent->>FS: snapshot session dir, detect new/changed files
    Agent->>DB: persist outputs and discovered file paths
    Agent-->>User: return truncated output + files_created

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Dev #161 — overlaps on LLM API-key handling and llm_creation/api_key propagation.

Suggested labels

Possible security concern, Review effort 3/5

Suggested reviewers

madina1203

Poem

🐰 I hop from shims to Pydantic bright,
LLMs write scripts by soft lamplight.
A sandbox keeps mischief out of sight,
Sessions safe, checkpoints take flight.
New tests and keys — the rabbit's delight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 10.16% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title references the main v1.1.0 release with multiple focus areas: dependency updates, code interpreter replacement, and other improvements, which aligns with the substantial changeset covering 40+ files.
Description check	✅ Passed	The description relates to the changeset, covering dependency updates, codeinterpreterapi replacement with PythonREPL, pydantic import fixes, and benchmark additions.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch dev

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

qodo-code-review · 2026-04-24T13:26:07Z

Review Summary by Qodo

Replace code interpreter with PythonREPL and fix deprecated pydantic imports

✨ Enhancement 🐞 Bug fix

Walkthroughs

Description

• Replace codeinterpreterapi with langchain_experimental PythonREPL for local code execution
  - Inject file previews into LLM prompts for accurate column name usage
  - Truncate REPL output at 2000 chars to prevent context overflow
  - Wire shared llm_instance from agent instead of separate model instantiation
• Fix deprecated pydantic imports across all tool files
  - Replace langchain.pydantic_v1 with native pydantic imports
  - Add type annotations to args_schema and class variables
• Update dependencies to latest versions
  - langchain, langchain_core, langchain_openai, langgraph, litellm, openai, pydantic
  - Add langchain_experimental==0.3.4 for PythonREPL support
• Update default OpenAI models from gpt-4o to gpt-5.4 in configuration
• Remove custom_sqlite_file.py and use langgraph's built-in MemorySaver
• Improve evaluation script with dataset creation and better error handling

Diagram

flowchart LR
  A["codeinterpreterapi<br/>remote sandbox"] -->|"replaced with"| B["PythonREPL<br/>local execution"]
  B --> C["File previews<br/>injected to LLM"]
  C --> D["Generated code<br/>with exact columns"]
  D --> E["Output truncated<br/>at 2000 chars"]
  F["Deprecated<br/>langchain.pydantic_v1"] -->|"replaced with"| G["Native pydantic<br/>imports"]
  H["gpt-4o"] -->|"updated to"| I["gpt-5.4"]
  J["custom_sqlite_file.py"] -->|"removed"| K["langgraph MemorySaver"]

File Changes

1. app/core/agents/interpreter/tool_interpreter.py Enhancement, bug fix +133/-103

Replace codeinterpreterapi with PythonREPL implementation

app/core/agents/interpreter/tool_interpreter.py

2. app/core/agents/interpreter/agent.py ✨ Enhancement +1/-0

Pass llm_instance to tool constructor

app/core/agents/interpreter/agent.py

3. app/core/agents/enpkg/tool_chemicals.py 🐞 Bug fix +2/-2

Fix pydantic imports and add type annotations

app/core/agents/enpkg/tool_chemicals.py

View more (16)

4. app/core/agents/enpkg/tool_smiles.py 🐞 Bug fix +2/-2

Fix pydantic imports and add type annotations

app/core/agents/enpkg/tool_smiles.py

5. app/core/agents/enpkg/tool_target.py 🐞 Bug fix +2/-2

Fix pydantic imports and add type annotations

app/core/agents/enpkg/tool_target.py

6. app/core/agents/enpkg/tool_taxon.py 🐞 Bug fix +5/-7

Fix pydantic imports and add ClassVar annotations

app/core/agents/enpkg/tool_taxon.py

7. app/core/agents/entry/tool_filesparser.py 🐞 Bug fix +6/-2

Fix pydantic imports and create FileAnalyzerInput schema

app/core/agents/entry/tool_filesparser.py

8. app/core/agents/interpreter/tool_spectrum.py 🐞 Bug fix +2/-2

Fix pydantic imports and add type annotations

app/core/agents/interpreter/tool_spectrum.py

9. app/core/agents/sparql/tool_merge_result.py 🐞 Bug fix +2/-2

Fix pydantic imports and add type annotations

app/core/agents/sparql/tool_merge_result.py

10. app/core/agents/sparql/tool_sparql.py 🐞 Bug fix +5/-5

Fix pydantic imports and add type annotations

app/core/agents/sparql/tool_sparql.py

11. app/core/agents/sparql/tool_wikidata_query.py 🐞 Bug fix +5/-5

Fix pydantic imports and add ClassVar annotations

app/core/agents/sparql/tool_wikidata_query.py

12. app/core/agents/validator/tool_validator.py 🐞 Bug fix +2/-2

Fix pydantic imports and add type annotations

app/core/agents/validator/tool_validator.py

13. app/core/memory/custom_sqlite_file.py Miscellaneous +0/-97

Remove unused custom SQLite checkpoint implementation

app/core/memory/custom_sqlite_file.py

14. app/core/memory/database_manager.py ✨ Enhancement +40/-39

Replace custom SQLite with langgraph MemorySaver

app/core/memory/database_manager.py

15. app/core/tests/evaluation.py ✨ Enhancement +74/-59

Improve evaluation script with dataset creation logic

app/core/tests/evaluation.py

16. app/core/workflow/langraph_workflow.py ✨ Enhancement +1/-1

Add exception info to error logging

app/core/workflow/langraph_workflow.py

17. app/tests/installation_test.py ✨ Enhancement +8/-6

Improve test script with proper path handling

app/tests/installation_test.py

18. app/config/params.ini ⚙️ Configuration changes +4/-4

Update default OpenAI models to gpt-5.4

app/config/params.ini

19. environment.yml Dependencies +14/-15

Update all dependencies to latest compatible versions

environment.yml

Need help?
Type /help how to ... in the comments thread for any questions about Qodo Merge usage.
Check out the documentation for more information.

qodo-code-review · 2026-04-24T13:26:09Z

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. ~~LLM code execution RCE~~ ✓ Resolved 🐞 Bug ⛨ Security

Description

Interpreter tool executes LLM-generated Python via PythonREPL in the server process without
sandboxing or execution limits, enabling remote code execution and denial-of-service. This tool is
reachable from end-user prompts through the Streamlit-driven LangGraph workflow.

Code

app/core/agents/interpreter/tool_interpreter.py[R147-148]
+        repl = PythonREPL()
+        execution_output = repl.run(code)

Evidence
The interpreter runs arbitrary Python returned by the model (PythonREPL.run). The app’s workflow
includes the Interpreter agent, and Streamlit streams user prompts into the LangGraph app, making
this path user-reachable in typical deployments.
app/core/agents/interpreter/tool_interpreter.py[3-8]
app/core/agents/interpreter/tool_interpreter.py[128-153]
app/config/langgraph.json[2-8]
streamlit_webapp/streamlit_app.py[468-472]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`Interpreter` executes LLM-generated Python directly via `PythonREPL().run(code)` in the server process with no sandbox, timeout, resource limits, or I/O/network restrictions. In the Streamlit/LangGraph workflow this is reachable from user prompts, creating an RCE + DoS vector.
### Issue Context
The interpreter agent is part of the configured workflow and Streamlit streams user input into the graph. The tool currently asks the LLM to return Python code and then executes it verbatim.
### Fix Focus Areas
- app/core/agents/interpreter/tool_interpreter.py[84-173]
- app/config/langgraph.json[2-8]
- streamlit_webapp/streamlit_app.py[468-472]
### Implementation guidance
- Replace in-process `PythonREPL` execution with an isolated runner (separate OS process / container) that:
- enforces a hard timeout
- applies CPU/memory limits
- runs as a restricted user
- restricts filesystem access to the session directory
- disables/limits outbound network by default
- If isolation is not available, disable/remove this tool from public-facing deployments (e.g., do not register `Interpreter_agent` / do not expose it via the supervisor routing) and gate it behind an explicit trusted-mode flag.
- Add explicit allowlist for readable input files (e.g., only under the session dir) and validate paths before preview/execution.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. ~~File path regex misses dots~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

Interpreter.extract_file_paths() cannot match filenames containing dots in the stem (e.g.,
"sample.v1.csv"), so user-provided file references may not be detected and the interpreter may
analyze the wrong fallback files. This is likely in Streamlit because uploaded files are saved using
the original filename.

Code

app/core/agents/interpreter/tool_interpreter.py[R175-177]

def extract_file_paths(self, text: str):
-        # Regex to find file paths or filenames with extensions, possibly surrounded by quotes
regex = r"['\"]?([a-zA-Z0-9_/\\-]+(?:\.csv|\.tsv|\.mgf|\.txt|\.xlsx|\.xls))['\"]?"
matches = re.finditer(regex, text)
-        file_paths = [match.group(1).replace("'", "").replace('"', '') for match in matches]
-        return file_paths

Evidence
The regex character class for the path stem excludes '.', so dotted filenames won’t match. Streamlit
persists uploads using file.name, which commonly contains dots, making the mismatch user-visible.
app/core/agents/interpreter/tool_interpreter.py[175-178]
streamlit_webapp/streamlit_app.py[390-398]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`Interpreter.extract_file_paths()` uses a restrictive regex that does not allow `.` in the filename/path stem, so paths like `foo.v1.csv` are not captured.
### Issue Context
Streamlit saves uploaded files using the original `file.name`, which can include dots. If extraction fails, the interpreter falls back to DB-derived paths which may be unrelated to the user’s intended files.
### Fix Focus Areas
- app/core/agents/interpreter/tool_interpreter.py[175-178]
### Implementation guidance
- Update the regex to allow dots in the stem, e.g. include `.` in the character class: `[a-zA-Z0-9_./\\-]+` (and consider spaces if you expect them).
- Alternatively, avoid regex parsing for paths: pass structured file paths to the interpreter tool explicitly (preferred) and/or validate paths with `Path(fp).suffix` checks.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. Missing OpenAI key check 🐞 Bug ☼ Reliability

Description

Interpreter allows openai_key to be None and only fails later when the first LLM call is attempted,
producing a late runtime failure instead of a clear configuration error. This is especially
problematic when create_all_agents falls back to OPENAI_API_KEY but it is unset.

Code

app/core/agents/interpreter/tool_interpreter.py[R51-88]

+    def __init__(self, openai_key: Optional[str] = None, session_id: Optional[str] = None, llm_instance: Optional[Any] = None):
super().__init__()
-        self.openai_key = openai_key
+        self.openai_key = openai_key or os.getenv("OPENAI_API_KEY")
self.session_id = session_id
+        self.llm_instance = llm_instance
def _run(
self,
input: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
-    ) -> None:
-
+    ) -> str:
logger.info(f"Input: {input}")
file_paths = self.extract_file_paths(input)
logger.info(f"File paths: {file_paths}")
session_dir = create_user_session(self.session_id, user_session_dir=True)
-
-        settings.OPENAI_API_KEY = self.openai_key
-        settings.MODEL = "gpt-3.5-turbo"
-
-        with CodeInterpreterSession() as session:
-
-            db_manager = tools_database()
-
-            user_request = (
-                "You are an interpreter helping to analyze different questions, files and outputs generated from a series of LLMs."
-                f"The details of the current request: {input}"
-                "Please interpret the current request to generate a meaningful answer."
-                "Here's some instructions that you have to follow for acheiving the task:"
-                "1. For any file provided, analyse if and provide clear and brief information about it unless something else is asked."
-                "2. Only if a specific visualization (e.g., bar chart, diagram) is requested in the question, use the provided information to generate a .json file containing the JSON code for a Plotly graph. This file should be named identically to the analyzed file."
-                "3. After you finish your tasks, your answer should contain both the interpretation asked and the full visualization file name if visualization was requested."
-            )
-
-            files = []
-
-            for file in file_paths:
-                files.append(File.from_path(file))
-                logger.info(f"File added to interpreter Agent: {file}")
-
-            if not files:
-                
-                logger.info("No files provided from the Interpreter Agent. Manually scrapping the database")
-                output_merged = db_manager.get("tool_merge_result")  # Get all file paths from the database associated with the tool_interpreter
-                if output_merged is not None:
-                    output_merged_json = json.loads(output_merged)
-                    merged_filepaths = output_merged_json['output']['paths']
-
-                    for merged_filepath in merged_filepaths:
-                        logger.info(f"File added to interpreter Agent from the Output Merged tool: {merged_filepath}")
-                        files.append(File.from_path(merged_filepath))
-
-            if not files:
-
-                sparql_output = db_manager.get("tool_sparql")  # Get all file paths from the database associated with the tool_interpreter
-                if sparql_output is not None:
-                    sparql_output_json = json.loads(sparql_output)
-                    sparql_filepaths = sparql_output_json['output']['paths']
-
-                    for sparql_filepath in sparql_filepaths:
-                        logger.info(f"File added to interpreter Agent from the SPARQL tool: {sparql_filepath}")
-                        files.append(File.from_path(sparql_filepath))
-                
-            # generate the response
-            logger.info(f"Files submitted: {files}")
-            response = session.generate_response(user_request, files=files)
-            logger.info(f"Interpreter Agent Response: {response}")
-
-            filepaths = []
-
-            # Handling and saving output files
-            if response.files:
-                for file in response.files:
-                    logger.info(f"File: {file.name}")
-
-                    generated_file_path = session_dir / file.name
-                    with open(generated_file_path, 'wb') as f:
-                        f.write(file.content)
-                    filepaths.append(str(generated_file_path))
-
-                    logger.info(f"File saved: {file.name}")
-            else:
-                logger.info("No files generated by Interpreter Tool.")
-
-            output_data = {
-                "output": {
-                    "paths": filepaths
-                }
-            }
-
-            
+        db_manager = tools_database()
+
+        if not file_paths:
+            output_merged = db_manager.get("tool_merge_result")
+            if output_merged is not None:
+                merged_json = json.loads(output_merged)
+                file_paths = merged_json.get("output", {}).get("paths", [])
+                logger.info(f"File paths from merge result: {file_paths}")
+
+        if not file_paths:
+            sparql_output = db_manager.get("tool_sparql")
+            if sparql_output is not None:
+                sparql_json = json.loads(sparql_output)
+                file_paths = sparql_json.get("output", {}).get("paths", [])
+                logger.info(f"File paths from SPARQL tool: {file_paths}")
+
+        llm = self.llm_instance or ChatOpenAI(
+            api_key=self.openai_key,
+            model="gpt-5.4",
+            temperature=0,
+        )

Evidence
Interpreter sets self.openai_key from the parameter or env var, but does not validate it before
creating/using an LLM. Agent factory similarly takes OPENAI_API_KEY from env without enforcing
presence, so the interpreter can reach llm.invoke(...) with missing credentials.
app/core/agents/interpreter/tool_interpreter.py[51-88]
app/core/agents/interpreter/tool_interpreter.py[128-132]
app/core/agents/agents_factory.py[33-38]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`Interpreter` can be constructed with `openai_key=None` (and env var missing), which leads to a late failure when `llm.invoke()` is called.
### Issue Context
`create_all_agents` may pass `openai_key=None` if `OPENAI_API_KEY` is not set. The interpreter then builds/uses an LLM without explicit validation.
### Fix Focus Areas
- app/core/agents/interpreter/tool_interpreter.py[51-88]
- app/core/agents/agents_factory.py[33-38]
### Implementation guidance
- In `Interpreter.__init__`, if `llm_instance is None` and `self.openai_key` is falsy, raise a clear `ValueError` (e.g., "OPENAI_API_KEY is required for Interpreter tool") before `_run`.
- Optionally validate earlier in agent/workflow initialization to fail fast for the whole app when required keys are missing.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

4. Sys.path fix too late 🐞 Bug ≡ Correctness

Description

installation_test inserts the repo root into sys.path after importing app.core.main, so the sys.path
adjustment cannot help when running the script from a working directory that doesn’t already include
the repo root. This makes the intended “run from anywhere” behavior unreliable.

Code

app/tests/installation_test.py[R1-7]

import sys
+from pathlib import Path
+from app.core.main import main
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+if str(REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(REPO_ROOT))

Evidence

Python executes top-level imports before executing the later sys.path mutation, so `from
app.core.main import main` can fail before the repo root is inserted.

app/tests/installation_test.py[1-8]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`installation_test.py` modifies `sys.path` after importing `app.core.main`, making the sys.path workaround ineffective if the import would otherwise fail.
### Issue Context
This script appears intended to be runnable as a quick installation check.
### Fix Focus Areas
- app/tests/installation_test.py[1-8]
### Implementation guidance
- Move the repo-root sys.path insertion above `from app.core.main import main`, or defer the import into the `if __name__ == "__main__":` block after sys.path is adjusted.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

github-actions · 2026-04-24T13:27:15Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🔒 No security concerns identified
⚡ Recommended focus areas for review Docstring Format Several key tool classes (for example, ChemicalResolver) still use a basic multi‐line description that does not fully comply with the Google docstring style guidelines. Consider updating them to use the recommended format, with parameter types in parentheses and clear sections such as Args, Returns, and Raises. class ChemicalResolver(BaseTool): name: str = "CHEMICAL_RESOLVER" description: str = """ Resolves chemicals to NPC Class URIs using a specific CSV database. Args: chemical_name str: the chemical name string. Returns: Dict[str, str]: a dictionary that contains the output chemical name and corresponding NPC Class URI. """ Duplicate Import The typing module is imported twice for the same symbols (e.g. Optional) in the file. Removing the redundant import will improve code clarity. from typing import List, Optional from typing import Any, Dict, Optional from urllib.error import HTTPError, URLError

github-actions · 2026-04-24T13:27:33Z

PR Code Suggestions ✨

No code suggestions found for the PR.

lfnothias · 2026-04-24T13:27:58Z

@CodeRabbit

coderabbitai · 2026-04-24T13:28:14Z

Tip

For best results, initiate chat on the files or code changes.

@lfnothias, it looks like your message may have been sent incomplete! How can I help you with this PR? For example, I can:

🔍 Review the changes — trigger a full review of the PR
📝 Answer questions — explain specific changes or design decisions
🛠️ Suggest improvements — analyze code quality, security, or performance
📋 Summarize the PR — provide an overview of what changed and why

Just let me know what you need!

lfnothias · 2026-04-24T13:29:20Z

@CodeRabbit review

coderabbitai · 2026-04-24T13:29:25Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

app/core/tests/evaluation.py (1)
141-143: ⚠️ Potential issue | 🟡 Minor

Update project_metadata.model to reflect the actual default model.

The hardcoded "gpt-4o" in the project metadata is outdated. The workflow uses llm_o which is now configured as gpt-5.4 in params.ini. Either read the actual model name from the models dictionary (e.g., extract from models["llm_o"]) or update the literal to "gpt-5.4" to ensure LangSmith runs are tagged with the correct model.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/tests/evaluation.py` around lines 141 - 143, The project_metadata
currently hardcodes "gpt-4o"; update it to reflect the actual default model by
using the configured models map or the new literal. Replace the literal with the
value from the models dictionary (e.g., project_metadata={"model":
models["llm_o"]} or models.get("llm_o", "gpt-5.4")) or directly set it to
"gpt-5.4"; ensure the symbol models and key "llm_o" are in scope where
project_metadata is constructed so LangSmith runs are tagged with the correct
model.

🧹 Nitpick comments (2)

app/core/memory/database_manager.py (1)

29-36: Minor: redundant if class_path: guard.

class_path is guaranteed truthy here because the outer if already requires os.getenv("MEMORY_DATABASE_MANAGER_CLASS") (and the same applies in tools_database). You can drop the inner branch, which also removes the implicit None return if it ever fell through. Same nit applies to lines 13-18.

♻️ Suggested simplification

-    if os.getenv('DATABASE_URL') and os.getenv("MEMORY_DATABASE_MANAGER_CLASS"):
-        class_path = os.getenv('MEMORY_DATABASE_MANAGER_CLASS')
-        if class_path:
-            # Dynamically import the module and class based on the environment variable
-            module_name, class_name = class_path.rsplit('.', 1)
-            module = importlib.import_module(module_name)
-            manager_class = getattr(module, class_name)
-            return manager_class()
-    else:
-        # Use langgraph's built-in MemorySaver which implements the current checkpoint API
-        from langgraph.checkpoint.memory import MemorySaver
-        return MemorySaver()
+    class_path = os.getenv("MEMORY_DATABASE_MANAGER_CLASS")
+    if os.getenv("DATABASE_URL") and class_path:
+        module_name, class_name = class_path.rsplit(".", 1)
+        module = importlib.import_module(module_name)
+        manager_class = getattr(module, class_name)
+        return manager_class()
+
+    # Use langgraph's built-in MemorySaver which implements the current checkpoint API
+    from langgraph.checkpoint.memory import MemorySaver
+    return MemorySaver()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@app/core/memory/database_manager.py` around lines 29 - 36, Remove the
redundant inner guard around class_path in the environment-driven dynamic import
logic: the outer condition already checks
os.getenv("MEMORY_DATABASE_MANAGER_CLASS"), so delete the inner "if class_path:"
branch and directly perform module_name, class_name = class_path.rsplit('.', 1),
module = importlib.import_module(module_name), manager_class = getattr(module,
class_name) and return manager_class(); apply the same simplification to the
analogous tools_database block (the earlier block that checks the same env var)
to avoid the unnecessary implicit None return path.

app/core/agents/enpkg/tool_smiles.py (1)

4-39: LGTM on the pydantic migration.

BaseModel/Field directly from pydantic and the typed args_schema are correct for the 0.3.x stack.

Nit (pre-existing): from langchain.tools import BaseTool is imported twice (lines 3 and 14). Can be cleaned up while touching this file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/agents/enpkg/tool_smiles.py` around lines 4 - 39, The file imports
BaseTool twice; remove the redundant import and keep a single import of BaseTool
(referenced by the class SMILESResolver) so there is no duplicate import
statement; ensure the remaining imports still include BaseModel and Field from
pydantic, Optional if used elsewhere, CallbackManagerForToolRun if needed, and
setup_logger, then run linters/import-sort to confirm clean import ordering.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/core/agents/interpreter/tool_interpreter.py`:
- Around line 144-160: The current diff only records newly created files using
existing_files and new_files, so files that were overwritten or modified in
place (saved to session_dir with the same name) are missed; to fix, before
calling PythonREPL.run(code) capture a mapping of existing files in session_dir
to their diagnostics (e.g., stat().st_mtime and st_size or a content hash) using
session_dir.iterdir() (existing_files -> existing_meta), then after execution
enumerate session_dir again and compare each path's new mtime/size/hash to the
saved meta to build a set of changed_or_new_files, replace the new_files
calculation with that set, and populate filepaths from changed_or_new_files so
modified files (not just newly created ones) are included in the DB entry and
logs (referencing session_dir, existing_files/existing_meta, new_files,
PythonREPL.run, and filepaths).
- Around line 147-153: The PythonREPL.run call is missing a timeout, which can
cause the agent to hang; update the call to use the timeout parameter (e.g.,
repl.run(code, timeout=30) or a configurable value) and wrap the call in a
try/except to catch the timeout exception (TimeoutError or the specific
exception thrown by langchain_experimental) and any other execution errors, set
execution_output to a clear timeout/error message, and log the incident via
logger.info/logger.error so the agent handles timeouts gracefully (refer to
PythonREPL.run, execution_output, and logger).

In `@app/core/memory/database_manager.py`:
- Around line 22-40: The current memory_database() falls back to
langgraph.checkpoint.memory.MemorySaver (ephemeral) which makes state
non-persistent for Streamlit and the compiled workflow; decide whether
persistence is intended and either (A) change the default fallback to a
persistent saver such as langgraph.checkpoint.sqlite.SqliteSaver (or SqliteSaver
from langgraph-checkpoint-sqlite) so memory_database() returns a persistent
checkpointer when MEMORY_DATABASE_MANAGER_CLASS is not set, or (B) make
persistence explicit via a new env var (e.g., MEMORY_PERSISTENT=true) and, when
set, instantiate the persistent class (SqliteSaver) otherwise keep MemorySaver;
update memory_database(), reference to MEMORY_DATABASE_MANAGER_CLASS,
MemorySaver and tools_database for consistency with tools default
(SqliteToolsDatabaseManager) so behavior is consistent across environments.

In `@app/core/tests/evaluation.py`:
- Around line 118-128: In evaluate_result, avoid using `_input.get("question")
or _input["messages"][0]["content"]` because an empty string will incorrectly
fall through and can cause KeyError/IndexError; instead explicitly check for
None/absence of the "question" key (e.g., `question is not None`) and only then
use it, otherwise validate that `_input` contains a non-empty "messages" list
and that `messages[0]["content"]` exists; if validation fails raise a clear
ValueError describing the missing field so callers of evaluate_result get a
meaningful error before constructing the HumanMessage and calling app.invoke.
- Around line 44-77: Narrow the broad except around client.read_dataset to only
catch the LangSmith "not found" error (e.g., LangSmithNotFoundError) instead of
Exception, preserve original failure causes when re-raising, and change the
FileNotFoundError raise to use "raise ... from None" to avoid chained context;
after parsing rows with json.loads into inputs/outputs (used by
client.create_examples and dataset.id), guard against calling
client.create_examples if inputs is empty (log and abort dataset creation) and
fix the double space in the dataset description ("MetaboT Benchmark") so the
created dataset uses the corrected description.

In `@app/tests/installation_test.py`:
- Around line 2-7: The import of main occurs before the REPO_ROOT sys.path
bootstrap, causing ModuleNotFoundError when running the test directly; move the
sys.path mutation (REPO_ROOT calculation and insertion into sys.path) above the
import and then import main (from app.core.main import main) after ensuring
REPO_ROOT is inserted, so adjust the file to compute REPO_ROOT, insert it into
sys.path if missing, and only then import main.

---

Outside diff comments:
In `@app/core/tests/evaluation.py`:
- Around line 141-143: The project_metadata currently hardcodes "gpt-4o"; update
it to reflect the actual default model by using the configured models map or the
new literal. Replace the literal with the value from the models dictionary
(e.g., project_metadata={"model": models["llm_o"]} or models.get("llm_o",
"gpt-5.4")) or directly set it to "gpt-5.4"; ensure the symbol models and key
"llm_o" are in scope where project_metadata is constructed so LangSmith runs are
tagged with the correct model.

---

Nitpick comments:
In `@app/core/agents/enpkg/tool_smiles.py`:
- Around line 4-39: The file imports BaseTool twice; remove the redundant import
and keep a single import of BaseTool (referenced by the class SMILESResolver) so
there is no duplicate import statement; ensure the remaining imports still
include BaseModel and Field from pydantic, Optional if used elsewhere,
CallbackManagerForToolRun if needed, and setup_logger, then run
linters/import-sort to confirm clean import ordering.

In `@app/core/memory/database_manager.py`:
- Around line 29-36: Remove the redundant inner guard around class_path in the
environment-driven dynamic import logic: the outer condition already checks
os.getenv("MEMORY_DATABASE_MANAGER_CLASS"), so delete the inner "if class_path:"
branch and directly perform module_name, class_name = class_path.rsplit('.', 1),
module = importlib.import_module(module_name), manager_class = getattr(module,
class_name) and return manager_class(); apply the same simplification to the
analogous tools_database block (the earlier block that checks the same env var)
to avoid the unnecessary implicit None return path.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: da086e6d-0f23-4e28-838b-a779bdfe6d75

📥 Commits

Reviewing files that changed from the base of the PR and between de6d438 and 8adb7a3.

⛔ Files ignored due to path filters (4)

app/config/params.ini is excluded by !**/*.ini
app/data/big_benchmark.csv is excluded by !**/*.csv, !**/*.csv
app/data/evaluation_dataset.csv is excluded by !**/*.csv, !**/*.csv
environment.yml is excluded by !**/*.yml

📒 Files selected for processing (18)

app/core/agents/enpkg/tool_chemicals.py
app/core/agents/enpkg/tool_smiles.py
app/core/agents/enpkg/tool_target.py
app/core/agents/enpkg/tool_taxon.py
app/core/agents/entry/tool_filesparser.py
app/core/agents/interpreter/agent.py
app/core/agents/interpreter/tool_interpreter.py
app/core/agents/interpreter/tool_spectrum.py
app/core/agents/sparql/tool_merge_result.py
app/core/agents/sparql/tool_sparql.py
app/core/agents/sparql/tool_wikidata_query.py
app/core/agents/validator/tool_validator.py
app/core/memory/custom_sqlite_file.py
app/core/memory/database_manager.py
app/core/tests/evaluation.py
app/core/workflow/langraph_workflow.py
app/tests/installation_test.py
requirements.txt

💤 Files with no reviewable changes (1)

app/core/memory/custom_sqlite_file.py

coderabbitai · 2026-04-24T13:35:08Z

+def memory_database():
+    """
+    Get the database manager based on the environment variables
+
+    :return: The database manager
+    """
+
+    if os.getenv('DATABASE_URL') and os.getenv("MEMORY_DATABASE_MANAGER_CLASS"):
+        class_path = os.getenv('MEMORY_DATABASE_MANAGER_CLASS')
+        if class_path:
+            # Dynamically import the module and class based on the environment variable
+            module_name, class_name = class_path.rsplit('.', 1)
+            module = importlib.import_module(module_name)
+            manager_class = getattr(module, class_name)
+            return manager_class()
+    else:
+        # Use langgraph's built-in MemorySaver which implements the current checkpoint API
+        from langgraph.checkpoint.memory import MemorySaver
+        return MemorySaver()


⚠️ Potential issue | 🟠 Major

Default memory is now ephemeral — confirm this is intentional for Streamlit/production.

Falling back to MemorySaver means LangGraph checkpoints live only in the current Python process. For the two call sites of memory_database():

streamlit_webapp/streamlit_app.py stores the saver in st.session_state.memory. Any server restart (or Streamlit worker recycle) will drop all conversation state silently — previously the SQLite checkpointer would persist across restarts.

app/core/workflow/langraph_workflow.py uses it as the compiled workflow's checkpointer, so thread state won't survive restarts either.

This also creates an inconsistency with tools_database() above, which still defaults to the persistent SqliteToolsDatabaseManager. If ephemeral memory is desired for dev, great; if not, consider defaulting to a persistent LangGraph checkpointer (e.g. SqliteSaver from langgraph-checkpoint-sqlite) to preserve the previous persistence guarantee.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@app/core/memory/database_manager.py` around lines 22 - 40, The current memory_database() falls back to langgraph.checkpoint.memory.MemorySaver (ephemeral) which makes state non-persistent for Streamlit and the compiled workflow; decide whether persistence is intended and either (A) change the default fallback to a persistent saver such as langgraph.checkpoint.sqlite.SqliteSaver (or SqliteSaver from langgraph-checkpoint-sqlite) so memory_database() returns a persistent checkpointer when MEMORY_DATABASE_MANAGER_CLASS is not set, or (B) make persistence explicit via a new env var (e.g., MEMORY_PERSISTENT=true) and, when set, instantiate the persistent class (SqliteSaver) otherwise keep MemorySaver; update memory_database(), reference to MEMORY_DATABASE_MANAGER_CLASS, MemorySaver and tools_database for consistency with tools default (SqliteToolsDatabaseManager) so behavior is consistent across environments.

coderabbitai · 2026-04-24T13:35:08Z

+try:
+    client.read_dataset(dataset_name=dataset_name)
+    logger.info(f"Dataset '{dataset_name}' found in LangSmith workspace.")
+except Exception:
+    logger.info(f"Dataset '{dataset_name}' not found. Attempting to create it from local file...")
+
+
+    if not os.path.exists(local_data_path):
+        raise FileNotFoundError(
+            f"Could not find '{local_data_path}'. Ensure the local dataset file is shared alongside this script.")
+
+
+    # Load local data and create the dataset in the user's LangSmith account
+    df = pd.read_csv(local_data_path)
+    dataset = client.create_dataset(dataset_name=dataset_name, description="MetaboT  Benchmark")
+
+    inputs = []
+    outputs = []
+
+    for _, row in df.iterrows():
+        try:
+            # The CSV stores the JSON arrays as strings, so we need to parse them back into Python objects
+            parsed_messages = json.loads(row["messages"])
+            parsed_end_state = json.loads(row["__end__"])

+            inputs.append({"messages": parsed_messages})
+            outputs.append({"__end__": parsed_end_state})
+
+        except json.JSONDecodeError as e:
+            logger.warning(f"Skipping a row due to JSON parsing error: {e}")
+            continue
+
+    client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
+    logger.info(f"Successfully created dataset '{dataset_name}' in LangSmith.")


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

langsmith Python client exception class raised when read_dataset does not find a dataset

💡 Result:

The exception class raised by the LangSmith Python client when read_dataset does not find a dataset is LangSmithNotFoundError.

Citations:

1: Issue: langsmith client is failing because of internal JSONDecodeError langchain-ai/langsmith-sdk#1501

🏁 Script executed:

find . -name "evaluation.py" -path "*/core/tests/*" | head -5

Repository: HolobiomicsLab/MetaboT

Length of output: 97

🏁 Script executed:

wc -l $(find . -name "evaluation.py" -path "*/core/tests/*" | head -1)

Repository: HolobiomicsLab/MetaboT

Length of output: 101

🏁 Script executed:

sed -n '40,80p' $(find . -name "evaluation.py" -path "*/core/tests/*" | head -1) | cat -n

Repository: HolobiomicsLab/MetaboT

Length of output: 1972

🏁 Script executed:

sed -n '115,130p' $(find . -name "evaluation.py" -path "*/core/tests/*" | head -1) | cat -n

Repository: HolobiomicsLab/MetaboT

Length of output: 632

🏁 Script executed:

sed -n '135,150p' $(find . -name "evaluation.py" -path "*/core/tests/*" | head -1) | cat -n

Repository: HolobiomicsLab/MetaboT

Length of output: 508

🏁 Script executed:

grep -n "from langsmith" $(find . -name "evaluation.py" -path "*/core/tests/*" | head -1)

Repository: HolobiomicsLab/MetaboT

Length of output: 165

🏁 Script executed:

python3 << 'EOF' try: from langsmith.utils import LangSmithNotFoundError print("✓ LangSmithNotFoundError import from langsmith.utils: SUCCESS") except ImportError as e: print(f"✗ LangSmithNotFoundError import from langsmith.utils: FAILED - {e}") try: from langsmith import LangSmithNotFoundError print("✓ LangSmithNotFoundError import from langsmith: SUCCESS") except ImportError as e: print(f"✗ LangSmithNotFoundError import from langsmith: FAILED - {e}") EOF

Repository: HolobiomicsLab/MetaboT

Length of output: 242

🏁 Script executed:

rg "LangSmithNotFoundError|langsmith.*Error" --type py | head -20

Repository: HolobiomicsLab/MetaboT

Length of output: 48

🏁 Script executed:

rg "\.get$.*$\s+or\s+" --type py -A 2 | head -40

Repository: HolobiomicsLab/MetaboT

Length of output: 410

🏁 Script executed:

rg "read_dataset|create_dataset" --type py -B 2 -A 5 | head -50

Repository: HolobiomicsLab/MetaboT

Length of output: 1125

Dataset-bootstrap block: narrow the exception, preserve the cause, and guard against an all-bad CSV.

Several robustness concerns in this newly added bootstrap path:

Line 47 — blind except Exception. Any error (network, auth, permissions) is silently treated as "dataset missing" and triggers a create attempt. Narrow this to the LangSmith "not found" case (e.g., LangSmithNotFoundError) so real failures surface. (Ruff BLE001)

Lines 52–53 — re-raise inside except. Use raise ... from None to avoid the misleading "During handling of the above exception…" chain. (Ruff B904)

Lines 60–76 — empty-examples edge case. If every row fails json.loads, inputs/outputs stay empty and create_examples is still invoked, leaving an empty dataset created in the user's workspace. Skip the call when inputs is empty.

Line 58 — minor: double space in "MetaboT Benchmark".

🛠️ Proposed fix

+from langsmith import LangSmithNotFoundError + try: client.read_dataset(dataset_name=dataset_name) logger.info(f"Dataset '{dataset_name}' found in LangSmith workspace.") -except Exception: +except LangSmithNotFoundError: logger.info(f"Dataset '{dataset_name}' not found. Attempting to create it from local file...") if not os.path.exists(local_data_path): raise FileNotFoundError( - f"Could not find '{local_data_path}'. Ensure the local dataset file is shared alongside this script.") + f"Could not find '{local_data_path}'. Ensure the local dataset file is shared alongside this script." + ) from None # Load local data and create the dataset in the user's LangSmith account df = pd.read_csv(local_data_path) - dataset = client.create_dataset(dataset_name=dataset_name, description="MetaboT Benchmark") + dataset = client.create_dataset(dataset_name=dataset_name, description="MetaboT Benchmark") inputs = [] outputs = [] for _, row in df.iterrows(): try: # The CSV stores the JSON arrays as strings, so we need to parse them back into Python objects parsed_messages = json.loads(row["messages"]) parsed_end_state = json.loads(row["__end__"]) inputs.append({"messages": parsed_messages}) outputs.append({"__end__": parsed_end_state}) except json.JSONDecodeError as e: logger.warning(f"Skipping a row due to JSON parsing error: {e}") continue + if not inputs: + raise ValueError( + f"No valid examples parsed from '{local_data_path}'; aborting dataset creation." + ) + client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id) logger.info(f"Successfully created dataset '{dataset_name}' in LangSmith.")

🧰 Tools

🪛 Ruff (0.15.11)

[warning] 47-47: Do not catch blind exception: Exception

(BLE001)

[warning] 52-53: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@app/core/tests/evaluation.py` around lines 44 - 77, Narrow the broad except around client.read_dataset to only catch the LangSmith "not found" error (e.g., LangSmithNotFoundError) instead of Exception, preserve original failure causes when re-raising, and change the FileNotFoundError raise to use "raise ... from None" to avoid chained context; after parsing rows with json.loads into inputs/outputs (used by client.create_examples and dataset.id), guard against calling client.create_examples if inputs is empty (log and abort dataset creation) and fix the double space in the dataset description ("MetaboT Benchmark") so the created dataset uses the corrected description.

coderabbitai · 2026-04-24T13:35:08Z

 def evaluate_result(_input, thread_id: int = 1):
-    """
-    Evaluate the result based on input.
-
-    Args:
-        _input (dict): Input containing messages to process
-
-    Returns:
-        dict: The evaluation result
-    """
-
-    # Prepare the message
+    """Evaluate the result based on input."""
+    # Note: Adjust the 'question' key below if your dataset uses a different input key
+    input_text = _input.get("question") or _input["messages"][0]["content"]
+
    message = {
-        "messages": [
-            HumanMessage(content=_input["messages"][0]["content"])
-        ]
+        "messages": [HumanMessage(content=input_text)]
    }
-

-    response = app.invoke(message,
-                          {"configurable": {"thread_id": thread_id}},)
-
+    response = app.invoke(message, {"configurable": {"thread_id": thread_id}})
    return {"output": response}


⚠️ Potential issue | 🟡 Minor

Falsy question falls through to messages[0]["content"].

_input.get("question") or _input["messages"][0]["content"] also triggers the fallback when question is present but an empty string, which will then KeyError if messages isn't populated. Use an explicit None check, and guard the messages indexing so a malformed example raises a clear error rather than a cryptic KeyError/IndexError.

🛠️ Proposed fix

-def evaluate_result(_input, thread_id: int = 1): - """Evaluate the result based on input.""" - # Note: Adjust the 'question' key below if your dataset uses a different input key - input_text = _input.get("question") or _input["messages"][0]["content"] - - message = { - "messages": [HumanMessage(content=input_text)] - } +def evaluate_result(_input, thread_id: int = 1): + """Evaluate the result based on input.""" + question = _input.get("question") + if question is not None: + input_text = question + else: + messages = _input.get("messages") or [] + if not messages or "content" not in messages[0]: + raise ValueError(f"Example is missing both 'question' and a usable 'messages' entry: {_input!r}") + input_text = messages[0]["content"] + + message = {"messages": [HumanMessage(content=input_text)]}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@app/core/tests/evaluation.py` around lines 118 - 128, In evaluate_result, avoid using `_input.get("question") or _input["messages"][0]["content"]` because an empty string will incorrectly fall through and can cause KeyError/IndexError; instead explicitly check for None/absence of the "question" key (e.g., `question is not None`) and only then use it, otherwise validate that `_input` contains a non-empty "messages" list and that `messages[0]["content"]` exists; if validation fails raise a clear ValueError describing the missing field so callers of evaluate_result get a meaningful error before constructing the HumanMessage and calling app.invoke.

coderabbitai · 2026-04-24T13:35:08Z

+from pathlib import Path
+from app.core.main import main
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+if str(REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(REPO_ROOT))


⚠️ Potential issue | 🟠 Major

Import order defeats the sys.path bootstrap.

from app.core.main import main on line 3 runs before the REPO_ROOT insertion on lines 5–7. If the repo root isn't already on sys.path (e.g., running python app/tests/installation_test.py directly, which is the scenario this bootstrap is meant to support), the import will fail with ModuleNotFoundError and the path fix below is never reached. Perform the sys.path mutation before importing from app.

🛠️ Proposed fix

import sys from pathlib import Path -from app.core.main import main - -REPO_ROOT = Path(__file__).resolve().parents[2] -if str(REPO_ROOT) not in sys.path: - sys.path.insert(0, str(REPO_ROOT)) + +REPO_ROOT = Path(__file__).resolve().parents[2] +if str(REPO_ROOT) not in sys.path: + sys.path.insert(0, str(REPO_ROOT)) + +from app.core.main import main # noqa: E402 (import after sys.path bootstrap)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@app/tests/installation_test.py` around lines 2 - 7, The import of main occurs before the REPO_ROOT sys.path bootstrap, causing ModuleNotFoundError when running the test directly; move the sys.path mutation (REPO_ROOT calculation and insertion into sys.path) above the import and then import main (from app.core.main import main) after ensuring REPO_ROOT is inserted, so adjust the file to compute REPO_ROOT, insert it into sys.path if missing, and only then import main.

Fix node name mismatch

…mode - Added Interpreter Agent functionality to execute user-generated code in a controlled environment. - Introduced security measures to restrict imports and filesystem access within the interpreter. - Configured trusted mode to enable or disable Interpreter Agent features based on environment variables. - Updated load_config to conditionally include Interpreter Agent based on trusted mode. - Enhanced Streamlit app to reflect trusted mode status and manage file uploads accordingly. - Created sandbox_runner for executing code with resource limits and timeout handling. - Added security module for managing trusted mode and session path validation. - Implemented tests for security features and configuration loading behavior. - Documented changes and security implications in the agentic engineering log.

Improve interpreter free-text filepath extraction to handle quoted paths, spaces, and dotted filenames, and track files updated in place as well as newly created outputs. Add focused tests for path parsing, DB fallback behavior, and changed-file detection.

- Introduced SessionFilePreparationError for better error management during file staging. - Enhanced _prepare_session_files function to handle various file-related errors. - Added unit tests for session file preparation, covering valid file copying and error scenarios. - Documented changes in the agentic engineering log.

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (5)

streamlit_webapp/streamlit_app.py (2)
29-29: Duplicate import os.

os is already imported at line 4 inside the session-bootstrap block. The re-import at line 29 is harmless but redundant.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@streamlit_webapp/streamlit_app.py` at line 29, Remove the redundant "import
os" statement (the duplicate import) so only the original import in the
session-bootstrap block remains; locate the second occurrence of the import
statement (the extra "import os") and delete it, then run linting/formatting to
confirm no unused-import warnings remain.
373-400: Style: drop == True / == False comparisons (Ruff E712).

Static analysis flags lines 373, 374, 390, and 400 for explicit boolean equality. Direct truthiness checks read better and align with the rest of the file:
♻️ Suggested change
-if st.session_state.openai_key_success == True and st.session_state.endpoint_url_success == True:
-    if st.session_state.langgraph_app_created == False:
+if st.session_state.openai_key_success and st.session_state.endpoint_url_success:
+    if not st.session_state.langgraph_app_created:
@@
-        if st.session_state.trusted_mode_enabled == False:
+        if not st.session_state.trusted_mode_enabled:
@@
-        if st.session_state.trusted_mode_enabled == True:
+        if st.session_state.trusted_mode_enabled:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@streamlit_webapp/streamlit_app.py` around lines 373 - 400, Replace explicit
boolean equality checks like st.session_state.openai_key_success == True,
st.session_state.endpoint_url_success == True,
st.session_state.langgraph_app_created == False, and
st.session_state.trusted_mode_enabled == False/== True with direct
truthiness/negation checks (e.g., if st.session_state.openai_key_success:, if
not st.session_state.langgraph_app_created:, if not
st.session_state.trusted_mode_enabled:, if
st.session_state.trusted_mode_enabled:) in the streamlit_app control flow around
llm_creation and create_workflow so the logic is cleaner and Ruff E712 warnings
are resolved.
app/core/security.py (1)
22-38: Optional: handle invalid env var values gracefully.

If any of METABOT_INTERPRETER_TIMEOUT_SECONDS, METABOT_INTERPRETER_CPU_SECONDS, METABOT_INTERPRETER_MEMORY_MB, or METABOT_INTERPRETER_MAX_FILE_MB are set to a non-integer (e.g., "15s"), int(...) will raise ValueError at call-time, surfacing as a confusing error far from the configuration source. Consider wrapping the parse with a try/except that logs the offending var and falls back to the default.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/security.py` around lines 22 - 38, The parsing of environment
variables in get_interpreter_timeout_seconds, get_interpreter_cpu_seconds,
get_interpreter_memory_bytes, and get_interpreter_max_file_bytes can raise
ValueError for non-integer values; update each function to catch parsing errors
(ValueError) around int(...) calls, log the variable name and its invalid value
(or use a provided logger), and fall back to the same default used now (e.g.,
"15" for timeout, use timeout value for CPU, "512" MB for memory, "64" MB for
max file) before applying the min/max constraints and unit conversions so
callers still get a valid int instead of an exception.
app/core/agents/supervisor/prompt.py (1)
16-27: Note: Interpreter_agent is already absent from supervisor's enum when disabled.

Defensive prompting is fine, but worth noting that when trusted mode is off, Interpreter_agent is excluded from members (and hence from the route function's enum in app/core/agents/supervisor/agent.py:32). The model is structurally prevented from selecting it, so these prompt gates are belt-and-suspenders. Consider whether the multiple repetitions ("only when Interpreter_agent is available", etc.) actually help the model or just inflate the system prompt — a single paragraph plus the original instructions may suffice.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/agents/supervisor/prompt.py` around lines 16 - 27, The prompt
repeats availability checks for Interpreter_agent which are redundant because
Interpreter_agent is already removed from the supervisor members enum when
disabled; simplify the wording to a single clear paragraph that: states only
call Interpreter_agent when it appears in members/route enum, otherwise use
SPARQL answer or state visualization is disabled; preserves behaviors around
delegation from ENPKG_agent to Sparql_query_runner and when to mark FINISH; and
references the relevant symbols Interpreter_agent, Sparql_query_runner,
ENPKG_agent and the supervisor route/members enum so readers can locate and
update the prompt in app/core/agents/supervisor/prompt.py and the route logic in
app/core/agents/supervisor/agent.py.
app/core/utils.py (1)
44-52: Fragile positional insertions — prefer semantic placement.

Two positional assumptions baked in here:

Line 45: trusted_config["agents"].insert(3, interpreter_agent) assumes a specific ordering and at least 3 existing agents. If langgraph.json is reordered or shrinks, the interpreter agent ends up in an unexpected slot (or this raises unexpectedly when the list is shorter than 3 — actually insert is forgiving, but ordering meaning is lost).

Line 52: edge["targets"].insert(-1, interpreter_target) assumes the last target is always FINISH/__end__. If a future edit reorders targets, the interpreter target gets placed before some other live target instead.

Since insertion order doesn't matter for correctness (the routing dict in langraph_workflow.py is built from a comprehension), append is safer:
♻️ Suggested refactor
-    if interpreter_agent not in trusted_config["agents"]:
-        trusted_config["agents"].insert(3, interpreter_agent)
+    if interpreter_agent not in trusted_config["agents"]:
+        trusted_config["agents"].append(interpreter_agent)
@@
-    for edge in trusted_config["conditional_edges"]:
-        if edge["source"] == "supervisor" and interpreter_target not in edge["targets"]:
-            edge["targets"].insert(-1, interpreter_target)
+    for edge in trusted_config["conditional_edges"]:
+        if edge["source"] == "supervisor" and interpreter_target not in edge["targets"]:
+            edge["targets"].append(interpreter_target)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/utils.py` around lines 44 - 52, The current code uses fragile
positional inserts (trusted_config["agents"].insert(3, interpreter_agent) and
edge["targets"].insert(-1, interpreter_target]) which rely on list ordering;
change both to semantic appends so placement is not order-dependent: when adding
the interpreter agent/edge/target in the block that checks trusted_config
(symbols: trusted_config, interpreter_agent, interpreter_edge,
interpreter_target, trusted_config["conditional_edges"], edge["targets"],
trusted_config["agents"], trusted_config["edges"]) replace the positional
inserts with .append(...) for agents/edges and append the interpreter_target to
edge["targets"] instead of insert(-1) to ensure safe, order-independent
addition.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/core/agents/interpreter/sandbox_runner.py`:
- Around line 49-63: The AST validation in validate_code currently only blocks
names but can be bypassed via attribute access; update validate_code to also
walk and reject ast.Attribute nodes where node.attr exists and node.attr starts
and ends with double underscores (node.attr.startswith("__") and
node.attr.endswith("__")), raising ValueError with a clear message (e.g.,
"Dunder attribute access is not allowed."); keep the existing
Import/ImportFrom/Name checks and add a brief comment in the function noting
this is defense-in-depth rather than a security boundary.
- Around line 248-258: The current try/finally around exec in execute_user_code
lets exceptions escape the stdout/stderr redirect so captured output is lost;
wrap the exec(...) call in a try/except BaseException block (inside the existing
redirect context) and on any exception return a structured result that includes
success: False, the stdout_buffer.getvalue(), stderr_buffer.getvalue(), and the
exception string (or traceback) so main() receives printed output even when user
code raises; keep the existing signal.alarm(0) in the finally block to clear the
timer.

In `@app/core/agents/interpreter/tool_interpreter.py`:
- Around line 203-214: The function extract_paths_from_db currently assumes
payload_json and its "output" field are dicts and can call .get directly, which
can raise AttributeError for malformed payloads; update extract_paths_from_db to
validate types before accessing keys: after json.loads(payload) check that
payload_json is a dict, retrieve output = payload_json.get("output") and if
isinstance(output, dict) return output.get("paths", []); if output is None or
not a dict, attempt to safely extract a top-level "paths" list (if
payload_json.get("paths") is a list) otherwise log a warning via logger.warning
about the unexpected payload shape (include tool_name and type info) and return
[] to fail softly; keep the existing JSONDecodeError handling and ensure no
AttributeErrors propagate.
- Around line 303-341: The result tempfile result_path is only unlinked on
timeout and after successful read, leaking it on other exceptions; modify the
try/finally around the subprocess invocation (the block that calls
subprocess.Popen and process.communicate) to ensure
result_path.unlink(missing_ok=True) runs in the same finally that currently
removes payload_path, and only skip deletion on the explicit success path after
you have read and processed the result file (e.g., set a local success flag or
delete conditionally after reading). Update references around subprocess.Popen,
process.communicate, payload_path.unlink, and any early-return branches so
result_path is always cleaned up on error paths.
- Around line 124-128: Wrap the llm.invoke call (where
SystemMessage/HumanMessage are passed) in a try/except to catch any exceptions
and log them, then return a user-friendly error string consistent with other
tool error paths; after a successful call, ensure response.content is normalized
to a str (e.g., if it's a list/dict/other, serialize or join into a string)
before passing it to extract_python_code so you avoid TypeError when
response.content is not a string. Keep references to llm.invoke, SystemMessage,
HumanMessage, response.content, and extract_python_code when implementing the
fix.

In `@app/core/agents/supervisor/prompt.py`:
- Line 33: Fix the typo "aproppriate" → "appropriate" in the supervisor prompt
text in app/core/agents/supervisor/prompt.py; locate the prompt string (e.g.,
the PROMPT_TEMPLATE or function build_supervisor_prompt / variable holding the
prompt) that contains the sentence about delegating interpretation to the
interpreter agent and correct the misspelling so the sentence reads "appropriate
agent".

In `@app/core/tests/test_interpreter_tool.py`:
- Around line 104-118: The test
test_find_changed_or_new_files_includes_overwritten_file is flaky on filesystems
with coarse mtime granularity because capture_session_file_metadata compares
(stat.st_mtime_ns, stat.st_size); ensure the file metadata actually changes by
modifying the test to either increase the sleep duration to >= 1 second between
writes or write content with different byte lengths (so stat.st_size differs)
before calling capture_session_file_metadata, referencing the test function name
and the methods capture_session_file_metadata and find_changed_or_new_files to
locate the code to update.

In `@streamlit_webapp/streamlit_app.py`:
- Around line 401-413: Sanitize client-supplied filenames before writing to
session_state.user_input_dir: do not trust file.name directly (from
st.file_uploader). Replace the direct join with a sanitized basename (e.g.,
Path(file.name).name or os.path.basename(file.name)) or pass the candidate path
through the existing resolve_session_path(..., must_exist=False) helper to
ensure path traversal is rejected, then open and write to that safe path; update
references around user_files loop where file_path is computed so uploads cannot
escape user_input_dir.

---

Nitpick comments:
In `@app/core/agents/supervisor/prompt.py`:
- Around line 16-27: The prompt repeats availability checks for
Interpreter_agent which are redundant because Interpreter_agent is already
removed from the supervisor members enum when disabled; simplify the wording to
a single clear paragraph that: states only call Interpreter_agent when it
appears in members/route enum, otherwise use SPARQL answer or state
visualization is disabled; preserves behaviors around delegation from
ENPKG_agent to Sparql_query_runner and when to mark FINISH; and references the
relevant symbols Interpreter_agent, Sparql_query_runner, ENPKG_agent and the
supervisor route/members enum so readers can locate and update the prompt in
app/core/agents/supervisor/prompt.py and the route logic in
app/core/agents/supervisor/agent.py.

In `@app/core/security.py`:
- Around line 22-38: The parsing of environment variables in
get_interpreter_timeout_seconds, get_interpreter_cpu_seconds,
get_interpreter_memory_bytes, and get_interpreter_max_file_bytes can raise
ValueError for non-integer values; update each function to catch parsing errors
(ValueError) around int(...) calls, log the variable name and its invalid value
(or use a provided logger), and fall back to the same default used now (e.g.,
"15" for timeout, use timeout value for CPU, "512" MB for memory, "64" MB for
max file) before applying the min/max constraints and unit conversions so
callers still get a valid int instead of an exception.

In `@app/core/utils.py`:
- Around line 44-52: The current code uses fragile positional inserts
(trusted_config["agents"].insert(3, interpreter_agent) and
edge["targets"].insert(-1, interpreter_target]) which rely on list ordering;
change both to semantic appends so placement is not order-dependent: when adding
the interpreter agent/edge/target in the block that checks trusted_config
(symbols: trusted_config, interpreter_agent, interpreter_edge,
interpreter_target, trusted_config["conditional_edges"], edge["targets"],
trusted_config["agents"], trusted_config["edges"]) replace the positional
inserts with .append(...) for agents/edges and append the interpreter_target to
edge["targets"] instead of insert(-1) to ensure safe, order-independent
addition.

In `@streamlit_webapp/streamlit_app.py`:
- Line 29: Remove the redundant "import os" statement (the duplicate import) so
only the original import in the session-bootstrap block remains; locate the
second occurrence of the import statement (the extra "import os") and delete it,
then run linting/formatting to confirm no unused-import warnings remain.
- Around line 373-400: Replace explicit boolean equality checks like
st.session_state.openai_key_success == True,
st.session_state.endpoint_url_success == True,
st.session_state.langgraph_app_created == False, and
st.session_state.trusted_mode_enabled == False/== True with direct
truthiness/negation checks (e.g., if st.session_state.openai_key_success:, if
not st.session_state.langgraph_app_created:, if not
st.session_state.trusted_mode_enabled:, if
st.session_state.trusted_mode_enabled:) in the streamlit_app control flow around
llm_creation and create_workflow so the logic is cleaner and Ruff E712 warnings
are resolved.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2f10a01f-a2ee-46b8-931a-28b4590ac276

📥 Commits

Reviewing files that changed from the base of the PR and between 09b836c and 115678a.

⛔ Files ignored due to path filters (1)

app/config/langgraph.json is excluded by !**/*.json

📒 Files selected for processing (11)

app/core/agents/agents_factory.py
app/core/agents/interpreter/prompt.py
app/core/agents/interpreter/sandbox_runner.py
app/core/agents/interpreter/tool_interpreter.py
app/core/agents/supervisor/agent.py
app/core/agents/supervisor/prompt.py
app/core/security.py
app/core/tests/test_interpreter_tool.py
app/core/tests/test_security.py
app/core/utils.py
streamlit_webapp/streamlit_app.py

✅ Files skipped from review due to trivial changes (1)

app/core/agents/interpreter/prompt.py

Improve interpreter sandbox validation and failure handling by blocking dunder attribute access, preserving captured output on execution errors, softening malformed DB payload parsing, and cleaning up subprocess tempfiles on all error paths. Also sanitize uploaded filenames in Streamlit, fix the supervisor prompt typo, and add focused regression tests for the sandbox runner and interpreter tool.

add CLI file input support for FILE_ANALYZER tool when running locally

lfnothias · 2026-04-24T22:47:26Z

@qodo-code-review

…ionality

…aunch instructions; add test for Streamlit app rendering

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (7)

app/core/tests/test_sparql_tool.py (2)

5-8: Minor: construct fallback will likely never hit on supported pydantic versions.

The project is on pydantic v2 (langchain/openai stack in this PR), where model_construct always exists. The construct fallback is effectively dead code unless you explicitly support pydantic v1 in tests. Not blocking — just worth simplifying if v1 support isn't required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/tests/test_sparql_tool.py` around lines 5 - 8, The helper
_construct_tool contains a fallback to GraphSparqlQAChain.construct which is
unnecessary on pydantic v2; simplify by removing the fallback and always using
tool_sparql.GraphSparqlQAChain.model_construct(**kwargs) (update the
_construct_tool function to call model_construct directly and remove the hasattr
check and construct branch).
55-84: Optional: add coverage for the OPENAI_API_KEY env fallback.

_construct_tool uses model_construct/construct which bypasses GraphSparqlQAChain.__init__, so the openai_key or os.getenv("OPENAI_API_KEY") fallback at tool_sparql.py:261 is never exercised. Consider an additional test that instantiates via the real __init__ (with mocked LLMChain) or at least pre-sets self.openai_key = None and monkeypatches os.environ to assert the env fallback feeds OpenAIEmbeddings.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/tests/test_sparql_tool.py` around lines 55 - 84, Add a new test that
verifies the OPENAI_API_KEY env fallback by ensuring OpenAIEmbeddings receives
the env value: instantiate the real GraphSparqlQAChain (or construct the same
Tool instance but set its self.openai_key = None) instead of using
_construct_tool so the __init__ path runs, monkeypatch
os.environ["OPENAI_API_KEY"] to a test key, monkeypatch
tool_sparql.OpenAIEmbeddings to capture the api_key, and call the search method
(e.g., search_nodes) to assert captured["api_key"] equals the env value;
reference GraphSparqlQAChain.__init__, _construct_tool, and
tool_sparql.OpenAIEmbeddings to locate where to change the test.

app/core/agents/sparql/tool_sparql.py (1)

237-261: Optional: clarify api_key fallback handling

The current code safely falls back to the OPENAI_API_KEY environment variable when openai_key=None is passed to OpenAIEmbeddings(). However, for better code clarity and to make the fallback intent explicit, consider only passing api_key when a value is present:

Suggested refactor

-        embeddings = OpenAIEmbeddings(api_key=self.openai_key)
+        embeddings = (
+            OpenAIEmbeddings(api_key=self.openai_key)
+            if self.openai_key
+            else OpenAIEmbeddings()
+        )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@app/core/agents/sparql/tool_sparql.py` around lines 237 - 261, The
constructor currently sets self.openai_key = openai_key or
os.getenv("OPENAI_API_KEY"); update usages so the OpenAIEmbeddings instantiation
(and any other OpenAI client creation) only includes the api_key parameter when
self.openai_key is truthy—i.e., pass api_key=self.openai_key conditionally
instead of always passing api_key=None—so that the fallback to the environment
variable remains explicit and avoids sending a None value; locate this in code
paths that construct OpenAIEmbeddings or OpenAI clients referenced from this
class (use the __init__ and self.openai_key symbols to find the relevant spots).

app/core/tests/test_main.py (2)

52-56: Replace assert False + try/except with pytest.raises (Ruff B011).

assert False is stripped under python -O, which silently turns these negative-path tests into pass-throughs (Ruff B011). The idiomatic pytest pattern is pytest.raises, which also lets you assert the message via the match= regex and removes the manual try/except boilerplate at lines 52‑56, 76‑80, and 92‑96.

♻️ Proposed refactor (apply to all three negative-path tests)

+import pytest
@@
-    try:
-        main_module._prepare_session_files("session-id", [str(source_dir)])
-        assert False, "Expected SessionFilePreparationError"
-    except main_module.SessionFilePreparationError as exc:
-        assert "not a file" in str(exc)
+    with pytest.raises(main_module.SessionFilePreparationError, match="not a file"):
+        main_module._prepare_session_files("session-id", [str(source_dir)])
@@
-    try:
-        main_module._prepare_session_files("session-id", [str(first_file), str(second_file)])
-        assert False, "Expected SessionFilePreparationError"
-    except main_module.SessionFilePreparationError as exc:
-        assert "would overwrite" in str(exc)
+    with pytest.raises(main_module.SessionFilePreparationError, match="would overwrite"):
+        main_module._prepare_session_files("session-id", [str(first_file), str(second_file)])
@@
-    try:
-        main_module._prepare_session_files("session-id", [str(staged_file)])
-        assert False, "Expected SessionFilePreparationError"
-    except main_module.SessionFilePreparationError as exc:
-        assert "already staged" in str(exc)
+    with pytest.raises(main_module.SessionFilePreparationError, match="already staged"):
+        main_module._prepare_session_files("session-id", [str(staged_file)])

Also applies to: 76-80, 92-96

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@app/core/tests/test_main.py` around lines 52 - 56, Replace the try/except +
assert False negative-path tests with pytest.raises: for each occurrence testing
main_module._prepare_session_files that expects
main_module.SessionFilePreparationError (including the other two blocks at lines
76-80 and 92-96), use pytest.raises(main_module.SessionFilePreparationError,
match="not a file") to both assert the exception is raised and verify the
message; remove the manual try/except and assert False boilerplate and import
pytest if not already imported.

100-100: Redundant sys.argv save/restore — monkeypatch already handles cleanup.

monkeypatch.setattr(sys, "argv", ...) is automatically reverted at test teardown, so saving original_argv and re-monkeypatching it inside finally is unnecessary. Calling monkeypatch.setattr from a finally block also adds an undo entry that fires after the test exits — harmless here, but noisy. You can drop the original_argv bookkeeping and the try/finally wrapping main_module.main() in both tests.

♻️ Proposed cleanup

-def test_main_passes_cli_api_key_and_reconfigures_logger(monkeypatch):
-    original_argv = sys.argv[:]
-    state = {"session_initialized": False, "setup_logger_states": []}
+def test_main_passes_cli_api_key_and_reconfigures_logger(monkeypatch):
+    state = {"session_initialized": False, "setup_logger_states": []}
@@
-    try:
-        main_module.main()
-    finally:
-        monkeypatch.setattr(sys, "argv", original_argv)
+    main_module.main()
@@
-def test_main_prints_user_friendly_error_for_bad_staged_file(monkeypatch, capsys):
-    original_argv = sys.argv[:]
-
-    monkeypatch.setattr(sys, "argv", ["prog", "-c", "hello", "-f", "/missing/file.csv"])
+def test_main_prints_user_friendly_error_for_bad_staged_file(monkeypatch, capsys):
+    monkeypatch.setattr(sys, "argv", ["prog", "-c", "hello", "-f", "/missing/file.csv"])
@@
-    try:
-        main_module.main()
-    finally:
-        monkeypatch.setattr(sys, "argv", original_argv)
+    main_module.main()

If you also drop import sys (no longer referenced), the module gets slightly tidier.

Also applies to: 134-137, 148-148, 166-169

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@app/core/tests/test_main.py` at line 100, Remove the redundant save/restore
of sys.argv and the try/finally around main_module.main(): drop the
original_argv = sys.argv[:] variable and the finally block that calls
monkeypatch.setattr(sys, "argv", original_argv), since monkeypatch.setattr(sys,
"argv", ...) already auto-reverts at teardown; update both tests that wrap
main_module.main() (and the other occurrences referenced) to call
main_module.main() directly under the monkeypatch, and remove the now-unused
import sys from the test module.

app/core/main.py (1)

10-10: Move ChatOpenAI import to langchain_openai package.

ChatOpenAI is deprecated in langchain_community.chat_models and has moved to langchain_openai (already in your dependencies at v0.3.35). Importing from the old path triggers a LangChainDeprecationWarning. Split the imports so ChatOpenAI comes from langchain_openai and ChatLiteLLM remains in langchain_community.
Proposed import change
-from langchain_community.chat_models import ChatOpenAI, ChatLiteLLM
+from langchain_openai import ChatOpenAI
+from langchain_community.chat_models import ChatLiteLLM
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/main.py` at line 10, Update the imports so ChatOpenAI is imported
from langchain_openai while ChatLiteLLM remains imported from
langchain_community.chat_models: locate the import line that currently
references both ChatOpenAI and ChatLiteLLM and split it into two imports, using
ChatOpenAI from the langchain_openai package and ChatLiteLLM from
langchain_community.chat_models to eliminate the deprecation warning.

app/core/tests/test_sandbox_runner.py (1)

6-11: Prefer pytest.raises over assert False (Ruff B011).

assert False is stripped under python -O, which would silently turn this test into a no-op. Using pytest.raises also gives clearer failure output.

♻️ Suggested change

+import pytest
+
 def test_validate_code_rejects_dunder_attribute_access():
-    try:
-        validate_code("x = object().__class__")
-        assert False, "Expected ValueError"
-    except ValueError as exc:
-        assert "Dunder attribute access" in str(exc)
+    with pytest.raises(ValueError, match="Dunder attribute access"):
+        validate_code("x = object().__class__")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@app/core/tests/test_sandbox_runner.py` around lines 6 - 11, The test
test_validate_code_rejects_dunder_attribute_access uses a try/except with assert
False which is removed under python -O; replace that pattern with
pytest.raises(ValueError) as excinfo and assert the error message contains
"Dunder attribute access" using str(excinfo.value). Update the test to import
pytest if needed and call validate_code("x = object().__class__") inside the
pytest.raises context to ensure the ValueError is properly asserted.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/core/tests/test_sandbox_runner.py`:
- Around line 14-35: The test calls execute_user_code directly which mutates
global process state (os.chdir, apply_resource_limits setting RLIMITs,
disable_network monkey-patching, and tempfile.tempdir) and therefore must
instead spawn the real runner as a separate process; update the test to invoke
the sandbox entrypoint (the same invocation path used by
tool_interpreter.execute_in_subprocess that runs python -I sandbox_runner.py
...) with the same payload, capture stdout/stderr and exit status, and assert
the captured output/error/traceback from the subprocess result rather than
calling execute_user_code in-process, ensuring no global state (disable_network,
tempfile.tempdir, RLIMITs, CWD) is modified in the pytest worker.

---

Nitpick comments:
In `@app/core/agents/sparql/tool_sparql.py`:
- Around line 237-261: The constructor currently sets self.openai_key =
openai_key or os.getenv("OPENAI_API_KEY"); update usages so the OpenAIEmbeddings
instantiation (and any other OpenAI client creation) only includes the api_key
parameter when self.openai_key is truthy—i.e., pass api_key=self.openai_key
conditionally instead of always passing api_key=None—so that the fallback to the
environment variable remains explicit and avoids sending a None value; locate
this in code paths that construct OpenAIEmbeddings or OpenAI clients referenced
from this class (use the __init__ and self.openai_key symbols to find the
relevant spots).

In `@app/core/main.py`:
- Line 10: Update the imports so ChatOpenAI is imported from langchain_openai
while ChatLiteLLM remains imported from langchain_community.chat_models: locate
the import line that currently references both ChatOpenAI and ChatLiteLLM and
split it into two imports, using ChatOpenAI from the langchain_openai package
and ChatLiteLLM from langchain_community.chat_models to eliminate the
deprecation warning.

In `@app/core/tests/test_main.py`:
- Around line 52-56: Replace the try/except + assert False negative-path tests
with pytest.raises: for each occurrence testing
main_module._prepare_session_files that expects
main_module.SessionFilePreparationError (including the other two blocks at lines
76-80 and 92-96), use pytest.raises(main_module.SessionFilePreparationError,
match="not a file") to both assert the exception is raised and verify the
message; remove the manual try/except and assert False boilerplate and import
pytest if not already imported.
- Line 100: Remove the redundant save/restore of sys.argv and the try/finally
around main_module.main(): drop the original_argv = sys.argv[:] variable and the
finally block that calls monkeypatch.setattr(sys, "argv", original_argv), since
monkeypatch.setattr(sys, "argv", ...) already auto-reverts at teardown; update
both tests that wrap main_module.main() (and the other occurrences referenced)
to call main_module.main() directly under the monkeypatch, and remove the
now-unused import sys from the test module.

In `@app/core/tests/test_sandbox_runner.py`:
- Around line 6-11: The test test_validate_code_rejects_dunder_attribute_access
uses a try/except with assert False which is removed under python -O; replace
that pattern with pytest.raises(ValueError) as excinfo and assert the error
message contains "Dunder attribute access" using str(excinfo.value). Update the
test to import pytest if needed and call validate_code("x = object().__class__")
inside the pytest.raises context to ensure the ValueError is properly asserted.

In `@app/core/tests/test_sparql_tool.py`:
- Around line 5-8: The helper _construct_tool contains a fallback to
GraphSparqlQAChain.construct which is unnecessary on pydantic v2; simplify by
removing the fallback and always using
tool_sparql.GraphSparqlQAChain.model_construct(**kwargs) (update the
_construct_tool function to call model_construct directly and remove the hasattr
check and construct branch).
- Around line 55-84: Add a new test that verifies the OPENAI_API_KEY env
fallback by ensuring OpenAIEmbeddings receives the env value: instantiate the
real GraphSparqlQAChain (or construct the same Tool instance but set its
self.openai_key = None) instead of using _construct_tool so the __init__ path
runs, monkeypatch os.environ["OPENAI_API_KEY"] to a test key, monkeypatch
tool_sparql.OpenAIEmbeddings to capture the api_key, and call the search method
(e.g., search_nodes) to assert captured["api_key"] equals the env value;
reference GraphSparqlQAChain.__init__, _construct_tool, and
tool_sparql.OpenAIEmbeddings to locate where to change the test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 33cc3106-5fc7-4586-a12d-c4a9c3a2f897

📥 Commits

Reviewing files that changed from the base of the PR and between 115678a and 94c7840.

📒 Files selected for processing (14)

README.md
app/core/agents/interpreter/sandbox_runner.py
app/core/agents/interpreter/tool_interpreter.py
app/core/agents/sparql/agent.py
app/core/agents/sparql/tool_sparql.py
app/core/agents/supervisor/prompt.py
app/core/main.py
app/core/tests/test_interpreter_tool.py
app/core/tests/test_main.py
app/core/tests/test_sandbox_runner.py
app/core/tests/test_sparql_tool.py
streamlit_webapp/README.md
streamlit_webapp/streamlit_app.py
streamlit_webapp/test_streamlit_app.py

✅ Files skipped from review due to trivial changes (2)

README.md
streamlit_webapp/README.md

🚧 Files skipped from review as they are similar to previous changes (1)

app/core/agents/supervisor/prompt.py

Stabilize Streamlit startup/import handling, add a render smoke test, and update the README with v1.1.0 launch guidance. Also harden follow-up test coverage by moving sandbox-runner execution checks into a subprocess, cleaning up pytest assertions, and refining SPARQL/OpenAI key handling to avoid env-only regressions and deprecation warnings.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (7)

app/core/tests/test_sparql_tool.py (1)
42-47: Optional: relax dict-equality to a subset check.

assert captured["tool_kwargs"] == {...} will break the moment create_agent forwards an extra kwarg (e.g., if llm_instance ever flows into tool_parameters). Consider asserting individual keys or using a subset comparison so this test stays focused on the openai_key plumbing it's actually verifying.
♻️ Suggested adjustment
-    assert captured["tool_kwargs"] == {
-        "graph": "graph",
-        "llm": {sparql_agent.MODEL_CHOICE: "model"},
-        "session_id": "session-123",
-        "openai_key": "cli-key",
-    }
+    assert captured["tool_kwargs"]["graph"] == "graph"
+    assert captured["tool_kwargs"]["llm"] == {sparql_agent.MODEL_CHOICE: "model"}
+    assert captured["tool_kwargs"]["session_id"] == "session-123"
+    assert captured["tool_kwargs"]["openai_key"] == "cli-key"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/tests/test_sparql_tool.py` around lines 42 - 47, The test currently
asserts exact dict equality on captured["tool_kwargs"], which is brittle; change
the assertion to only check the relevant keys (e.g., assert
captured["tool_kwargs"]["openai_key"] == "cli-key" and assert
captured["tool_kwargs"]["session_id"] == "session-123") or use a subset
comparison (e.g., for each key in {"openai_key","session_id","graph"} assert key
in captured["tool_kwargs"] and values match) so create_agent forwarding extra
kwargs (like llm_instance) won't break the test.
app/core/agents/sparql/tool_sparql.py (1)
397-399: Optional: simplify by always forwarding api_key=self.openai_key.

OpenAIEmbeddings in langchain_openai treats api_key=None as "use environment variable", so the conditional kwargs is functionally equivalent to passing the argument directly:
♻️ Optional simplification
-        embedding_kwargs = {"api_key": self.openai_key} if self.openai_key else {}
-        embeddings = OpenAIEmbeddings(**embedding_kwargs)
+        embeddings = OpenAIEmbeddings(api_key=self.openai_key)
Keep the current form if you prefer being explicit about the conditional logic.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/agents/sparql/tool_sparql.py` around lines 397 - 399, The code
currently builds embedding_kwargs conditionally before instantiating
OpenAIEmbeddings and loading FAISS; simplify by passing api_key=self.openai_key
directly to OpenAIEmbeddings (replace the embedding_kwargs +
OpenAIEmbeddings(**embedding_kwargs) pattern with
OpenAIEmbeddings(api_key=self.openai_key)), leaving FAISS.load_local(db_path,
embeddings, allow_dangerous_deserialization=True) unchanged; update references
to embedding_kwargs/embeddings accordingly and remove the now-unneeded
embedding_kwargs variable.
README.md (1)
109-116: Reconcile the two Streamlit launch recipes.

Lines 109-116 advertise python -m streamlit run streamlit_webapp/streamlit_app.py from the repo root as the recommended/tested workflow, but lines 171-179 still document the legacy export PYTHONPATH="$(pwd):${PYTHONPATH}"; streamlit run ... recipe. Now that streamlit_app.py inserts the repo root into sys.path itself, the PYTHONPATH workaround is obsolete and risks confusing readers. Consider deleting the lines 175-178 block (or merging it into the v1.1.0-aligned one) so there is a single source of truth.

Also applies to: 171-179
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 109 - 116, Remove the legacy PYTHONPATH launch recipe
and keep a single, up-to-date Streamlit instruction: delete the export
PYTHONPATH block (the legacy recipe that uses export
PYTHONPATH="$(pwd):${PYTHONPATH}"; streamlit run ...) and ensure README only
documents the repo-root invocation using python -m streamlit run
streamlit_webapp/streamlit_app.py, since streamlit_app.py already injects the
repo root into sys.path; alternatively, if you prefer consolidation, merge the
legacy text into the v1.1.0-aligned paragraph and note that contributor/admin
keys can still be set via environment variables.
app/core/main.py (2)
347-357: Stage/validate files before initializing LLMs to fail fast.

llm_creation(...) constructs ChatOpenAI/ChatLiteLLM clients (and validates API key configuration) before _prepare_session_files runs. If the user passes a bad -f path, they currently pay the full LLM-init cost (and may get a confusing key-related error first if the env is misconfigured) before hitting the file-staging error. Reordering so that staging happens immediately after argument parsing — or at minimum before llm_creation — gives cleaner, faster CLI feedback.
♻️ Proposed reorder
-    # Initialize language models
-    models = llm_creation(api_key=args.api_key)
-
-    # Stage user-provided files into the session's input directory
     if args.file:
         try:
             _prepare_session_files(session_id, args.file)
         except SessionFilePreparationError as exc:
             logger.error(str(exc))
             print(f"Error: {exc}")
             return
+
+    # Initialize language models
+    models = llm_creation(api_key=args.api_key)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/main.py` around lines 347 - 357, The LLM clients are being
initialized by llm_creation(...) before staging/validating user files, which can
cause costly/ confusing failures; move the _prepare_session_files(session_id,
args.file) call (and its SessionFilePreparationError handling) to run
immediately after argument parsing and before calling llm_creation(...), so the
CLI validates and fails fast on bad -f paths; ensure args.file is checked the
same way and that models = llm_creation(api_key=args.api_key) only executes
after successful _prepare_session_files.
214-278: Solid staging logic; one optional hardening suggestion.

The validation chain (exists → is_file → basename collision → already-staged → destination conflict → copy with error wrapping) is well-ordered and the test suite covers the main branches. Path.name plus resolve(strict=False) correctly neutralizes .. traversal attempts in user-supplied paths.

Optional: shutil.copy2 follows symlinks on the source by default, which matches typical staging semantics, but if you want to refuse symlinked inputs (defense in depth, given the staged files are later read by the interpreter agent in trusted mode), you could add if src.is_symlink(): raise SessionFilePreparationError(...) before the copy. Not blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/main.py` around lines 214 - 278, Add a defensive check to refuse
symlinked inputs in _prepare_session_files: after resolving src and validating
existence/is_file, if src.is_symlink() raise SessionFilePreparationError(src,
f"Input file is a symlink and is not allowed: {src}") (place this before the
shutil.copy2 call and before the src == dest.resolve(...) check so symlinks are
rejected early); this prevents staging symlinked sources even though
shutil.copy2 would follow them.
app/core/tests/test_main.py (2)
91-132: LGTM — verifies the new --api-key and session_id plumbing.

Capturing both llm_api_key and workflow_api_key ensures the CLI flag flows into llm_creation and create_workflow (matching main.py:348 and main.py:362-365). The setup_logger_states == [True] assertion neatly enforces that logger reconfiguration happens after initialize_session_context, which is the whole point of the global rebinding at main.py:334-335.

One small caveat to keep in mind: the create_user_session mock (line 119) omits the user_session_dir parameter that exists on the real signature — fine today, but if main() ever starts requesting user_session_dir=True, this lambda will raise TypeError. Consider **kwargs for resilience.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/tests/test_main.py` around lines 91 - 132, The test's mock for
create_user_session in test_main_passes_cli_api_key_and_reconfigures_logger does
not match the real signature (it omits user_session_dir), risking TypeError if
main() passes that kwarg; update the monkeypatched mock (the lambda passed to
main_module.create_user_session) to accept the same optional parameters or use a
catch-all like session_id=None, **kwargs and return "session-123" so it remains
resilient if main() starts supplying user_session_dir or other kwargs.
143-150: Optional: replace the (_ for _ in ()).throw(...) idiom with a small helper.

The generator-throw trick works but is hard to read at a glance. A named helper makes intent obvious and lets future maintainers grep for it.
♻️ Suggested refactor
+    def _raise_missing(session_id, file_paths):
+        raise main_module.SessionFilePreparationError(
+            Path(file_paths[0]), f"File not found: {file_paths[0]}"
+        )
+
-    monkeypatch.setattr(
-        main_module,
-        "_prepare_session_files",
-        lambda session_id, file_paths: (_ for _ in ()).throw(
-            main_module.SessionFilePreparationError(Path(file_paths[0]), f"File not found: {file_paths[0]}")
-        ),
-    )
-    monkeypatch.setattr(main_module, "create_workflow", lambda *args, **kwargs: (_ for _ in ()).throw(AssertionError("workflow should not run")))
+    def _fail_create_workflow(*args, **kwargs):
+        raise AssertionError("workflow should not run")
+
+    monkeypatch.setattr(main_module, "_prepare_session_files", _raise_missing)
+    monkeypatch.setattr(main_module, "create_workflow", _fail_create_workflow)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/tests/test_main.py` around lines 143 - 150, The test uses the
hard-to-read "(_ for _ in ()).throw(...)" trick when monkeypatching
_prepare_session_files and create_workflow; add a small named helper (e.g.,
raise_when_called or raising) that takes an exception and returns a callable
which raises that exception when invoked, then replace the generator-throw
usages with calls to that helper for main_module._prepare_session_files (raising
SessionFilePreparationError(...)) and main_module.create_workflow (raising
AssertionError("workflow should not run")) so intent is clear and easy to grep.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/core/tests/test_sandbox_runner.py`:
- Around line 43-61: The subprocess.run invocation that calls the sandbox runner
(the call with sys.executable and str(Path(sandbox_runner.__file__).resolve())
using payload_path and result_path, executed in cwd=session_dir) needs an outer
timeout to avoid hanging the pytest worker if the sandbox's internal timeout
fails; add a timeout parameter (e.g., timeout=8 or a few seconds greater than
the sandbox's timeout_seconds: 5) to the subprocess.run call and ensure the test
treats subprocess.TimeoutExpired as a failure so the test fails fast rather than
hanging.

In `@README.md`:
- Line 32: Remove the stray Git merge-conflict marker by deleting the literal
line "=======" from README.md (the unresolved conflict delimiter), then verify
surrounding content for any leftover conflict markers like "<<<<<<<" or
">>>>>>>" and ensure the README renders correctly; commit the cleaned file with
a clear message indicating the merge conflict was resolved.

In `@streamlit_webapp/README.md`:
- Line 121: The README currently contains a literal placeholder "[Include
support contact information here]"; remove that placeholder or replace it with
the real support channel (e.g., a GitHub Issues URL, maintainer email, or
mailing list) so the published README doesn't show an unfilled template; update
the README line accordingly or delete it if no public contact is intended.

---

Nitpick comments:
In `@app/core/agents/sparql/tool_sparql.py`:
- Around line 397-399: The code currently builds embedding_kwargs conditionally
before instantiating OpenAIEmbeddings and loading FAISS; simplify by passing
api_key=self.openai_key directly to OpenAIEmbeddings (replace the
embedding_kwargs + OpenAIEmbeddings(**embedding_kwargs) pattern with
OpenAIEmbeddings(api_key=self.openai_key)), leaving FAISS.load_local(db_path,
embeddings, allow_dangerous_deserialization=True) unchanged; update references
to embedding_kwargs/embeddings accordingly and remove the now-unneeded
embedding_kwargs variable.

In `@app/core/main.py`:
- Around line 347-357: The LLM clients are being initialized by
llm_creation(...) before staging/validating user files, which can cause costly/
confusing failures; move the _prepare_session_files(session_id, args.file) call
(and its SessionFilePreparationError handling) to run immediately after argument
parsing and before calling llm_creation(...), so the CLI validates and fails
fast on bad -f paths; ensure args.file is checked the same way and that models =
llm_creation(api_key=args.api_key) only executes after successful
_prepare_session_files.
- Around line 214-278: Add a defensive check to refuse symlinked inputs in
_prepare_session_files: after resolving src and validating existence/is_file, if
src.is_symlink() raise SessionFilePreparationError(src, f"Input file is a
symlink and is not allowed: {src}") (place this before the shutil.copy2 call and
before the src == dest.resolve(...) check so symlinks are rejected early); this
prevents staging symlinked sources even though shutil.copy2 would follow them.

In `@app/core/tests/test_main.py`:
- Around line 91-132: The test's mock for create_user_session in
test_main_passes_cli_api_key_and_reconfigures_logger does not match the real
signature (it omits user_session_dir), risking TypeError if main() passes that
kwarg; update the monkeypatched mock (the lambda passed to
main_module.create_user_session) to accept the same optional parameters or use a
catch-all like session_id=None, **kwargs and return "session-123" so it remains
resilient if main() starts supplying user_session_dir or other kwargs.
- Around line 143-150: The test uses the hard-to-read "(_ for _ in
()).throw(...)" trick when monkeypatching _prepare_session_files and
create_workflow; add a small named helper (e.g., raise_when_called or raising)
that takes an exception and returns a callable which raises that exception when
invoked, then replace the generator-throw usages with calls to that helper for
main_module._prepare_session_files (raising SessionFilePreparationError(...))
and main_module.create_workflow (raising AssertionError("workflow should not
run")) so intent is clear and easy to grep.

In `@app/core/tests/test_sparql_tool.py`:
- Around line 42-47: The test currently asserts exact dict equality on
captured["tool_kwargs"], which is brittle; change the assertion to only check
the relevant keys (e.g., assert captured["tool_kwargs"]["openai_key"] ==
"cli-key" and assert captured["tool_kwargs"]["session_id"] == "session-123") or
use a subset comparison (e.g., for each key in
{"openai_key","session_id","graph"} assert key in captured["tool_kwargs"] and
values match) so create_agent forwarding extra kwargs (like llm_instance) won't
break the test.

In `@README.md`:
- Around line 109-116: Remove the legacy PYTHONPATH launch recipe and keep a
single, up-to-date Streamlit instruction: delete the export PYTHONPATH block
(the legacy recipe that uses export PYTHONPATH="$(pwd):${PYTHONPATH}"; streamlit
run ...) and ensure README only documents the repo-root invocation using python
-m streamlit run streamlit_webapp/streamlit_app.py, since streamlit_app.py
already injects the repo root into sys.path; alternatively, if you prefer
consolidation, merge the legacy text into the v1.1.0-aligned paragraph and note
that contributor/admin keys can still be set via environment variables.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e7ef6a79-580d-4f04-93d1-fd8a636ff49f

📥 Commits

Reviewing files that changed from the base of the PR and between 94c7840 and 84892c8.

📒 Files selected for processing (7)

README.md
app/core/agents/sparql/tool_sparql.py
app/core/main.py
app/core/tests/test_main.py
app/core/tests/test_sandbox_runner.py
app/core/tests/test_sparql_tool.py
streamlit_webapp/README.md

Add how to run evaluation.py (LangSmith api key is needed)

lfnothias and others added 11 commits April 23, 2026 14:16

Merge pull request #166 from HolobiomicsLab/main

0574e5a

Sync branches

Update litellm version to 1.83.0 in environment.yml and requirements.txt

90d2a01

Update dependencies in environment.yml and requirements.txt for langc…

1a46419

…hain, openai, pydantic, and numpy

Update python-dotenv, requests, streamlit, and tqdm versions in envir…

10e9a55

…onment.yml and requirements.txt

Add big_benchmark.csv for automated evaluation

f9bf608

Fix deprecated pydantic imports and update requirements

69ec4a5

Update the order of questions to be consistent with automated evaluation

2dd5135

Update installation_test

3aa9458

Clean the evaluation dateset (remove redundant column)

5384796

Merge pull request #169 from HolobiomicsLab/dev_madina

085c973

Dev madina

dosubot Bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Apr 24, 2026

github-actions Bot added the Review effort 4/5 label Apr 24, 2026

dosubot Bot added Configuration changes Dependencies labels Apr 24, 2026

qodo-code-review Bot reviewed Apr 24, 2026

View reviewed changes

Comment thread app/core/agents/interpreter/tool_interpreter.py Outdated

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

madina1203 and others added 4 commits April 24, 2026 16:13

Fix node name mismatch

7227872

Merge pull request #171 from HolobiomicsLab/dev_madina

09b836c

Fix node name mismatch

add CLI file input support for FILE_ANALYZER tool

b41b611

dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Apr 24, 2026

lfnothias added 2 commits April 25, 2026 00:18

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

lfnothias and others added 2 commits April 25, 2026 00:35

Merge pull request #172 from HolobiomicsLab/dev_madina

6a32a28

add CLI file input support for FILE_ANALYZER tool when running locally

lfnothias and others added 3 commits April 25, 2026 00:49

feat: Add OpenAI key support to agent creation and enhance tool funct…

c72de39

…ionality

feat: Update README for v1.1.0 highlights and improve Streamlit app l…

94c7840

…aunch instructions; add test for Streamlit app rendering

Merge branch 'main' into dev

492636b

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

Comment thread app/core/tests/test_sandbox_runner.py Outdated

lfnothias and others added 2 commits April 25, 2026 18:57

Update pr_agent.yml

4bd7a17

coderabbitai Bot reviewed Apr 25, 2026

View reviewed changes

Comment thread app/core/tests/test_sandbox_runner.py Outdated

Comment thread README.md

Comment thread streamlit_webapp/README.md

lfnothias and others added 4 commits April 25, 2026 21:28

fix: Add timeout handling for subprocess in test_execute_user_code

10af7b6

Add how to run evaluation.py (LangSmith api key is needed)

b708280

Merge pull request #173 from HolobiomicsLab/dev_madina

a1feb1b

Add how to run evaluation.py (LangSmith api key is needed)

Merge branch 'main' into dev

b3e04dd

lfnothias merged commit 7b11693 into main May 5, 2026
1 check was pending

Uh oh!

Conversation

lfnothias commented Apr 24, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

coderabbitai Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

qodo-code-review Bot commented Apr 24, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

github-actions Bot commented Apr 24, 2026

PR Reviewer Guide 🔍

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited by qodo-code-review Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Uh oh!

lfnothias commented Apr 24, 2026

Uh oh!

coderabbitai Bot commented Apr 24, 2026

Uh oh!

lfnothias commented Apr 24, 2026

Uh oh!

coderabbitai Bot commented Apr 24, 2026

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lfnothias commented Apr 24, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

lfnothias commented Apr 24, 2026 •

edited by github-actions Bot

Loading

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading

qodo-code-review Bot commented Apr 24, 2026 •

edited

Loading

github-actions Bot commented Apr 24, 2026 •

edited by qodo-code-review Bot

Loading