feat: add agent driver abstraction layer for evaluation pipeline by rioloc · Pull Request #231 · lightspeed-core/lightspeed-evaluation

rioloc · 2026-05-07T09:46:36Z

Summary

Introduces an agent driver abstraction layer (AgentDriverRegistry → AgentDriver(ABC) → HttpApiDriver) that replaces direct APIDataAmender usage in the evaluation pipeline. This enables per-conversation agent resolution and lays the groundwork for supporting multiple agent types beyond HTTP API.

AgentDriver(ABC) abstract base with execute_turn() / validate_config() / close() interface, and HttpApiDriver implementation wrapping existing APIDataAmender
AgentDriverRegistry factory for driver creation by type
Pipeline integration: default driver at init (shared across threads), per-conversation driver resolution via EvaluationData.agent / agent_config fields
AgentsConfig.enabled kill switch passed at driver construction time (not stored per-agent, since HttpApiAgentConfig uses extra="forbid"). When disabled, the pipeline returns the default (disabled) driver for all conversations
Integration tests extended with native agents: format configs alongside legacy api: configs, verifying both backward compatibility and the new config path

Architecture

graph TD
    subgraph Pipeline
        EP[EvaluationPipeline]
        EP -->|creates| ADR[AgentDriverRegistry]
        EP -->|resolves config| DAC[_resolve_default_agent_config]
        EP -->|resolves per-conv| RDC[_resolve_driver_for_conversation]
    end

    subgraph "Driver Layer"
        ADR -->|create_driver| AD[AgentDriver ABC]
        AD -->|implements| HAD[HttpApiDriver]
        AD -.->|future| FD[Other Drivers...]
        HAD -->|wraps| ADA[APIDataAmender]
    end

    subgraph Processor
        CP[ConversationProcessor]
        CP -->|execute_turn| AD
    end

    EP -->|default driver shared| CP
    RDC -->|per-conv driver| CP

    style AD fill:#f9f,stroke:#333
    style HAD fill:#bbf,stroke:#333
    style FD fill:#ddd,stroke:#999,stroke-dasharray: 5 5

Key design decisions

Stateless drivers: execute_turn() receives conversation_id as parameter and returns it — no internal state, safe for thread sharing
Pipeline owns driver lifecycle: registry, config resolution, and driver creation/cleanup all live in EvaluationPipeline (SRP — processor has zero config responsibility)
Enabled state owned by AgentsConfig: enabled is resolved from AgentsConfig.enabled and passed as a constructor kwarg through create_driver → AgentDriver. The processor checks driver.enabled to decide whether to execute or skip

Changes

Production code

File	Change
`pipeline/evaluation/driver.py`	New: `AgentDriver(ABC)`, `HttpApiDriver`, `AgentDriverRegistry`
`pipeline/evaluation/pipeline.py`	Replace `APIClient`/`APIDataAmender` with driver architecture
`pipeline/evaluation/processor.py`	Accept `AgentDriver` per-call instead of stored `APIDataAmender`
`pipeline/evaluation/__init__.py`	Export driver classes

Tests

File	Change
`tests/unit/pipeline/evaluation/test_driver.py`	New: 13 tests for driver classes
`tests/unit/pipeline/evaluation/test_pipeline.py`	Updated for driver architecture
`tests/unit/pipeline/evaluation/test_processor.py`	Updated for driver-based processing
`tests/unit/pipeline/evaluation/conftest.py`	Updated fixtures
`tests/integration/test_full_evaluation.py`	Renamed `TestFullApiEvaluation`, extended parametrization for both config formats
`tests/integration/system-config-agents-query.yaml`	New: native `agents:` format config
`tests/integration/system-config-agents-streaming.yaml`	New: native `agents:` format config

Test plan

make black-format — all files formatted
make pre-commit — bandit, pyright, docstyle, ruff, pylint, black-check pass
make test — 877 unit tests pass, 0 failures
Integration tests verified with real stack (query endpoint passes for both config formats)
Per-conversation agent override verified (ask vs troubleshoot modes)

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added flexible agent configuration management supporting query and streaming endpoint types.
- Agents can now be selectively enabled or disabled per conversation or system-wide.
- New system configuration templates for agent-based evaluation workflows.
Tests
- Extended integration and unit test coverage for agent configuration and execution scenarios.

coderabbitai · 2026-05-07T09:46:44Z

Walkthrough

The pull request introduces an agent driver abstraction to decouple turn execution from direct API client usage, refactors the evaluation pipeline and processor to route conversations through driver instances, and adds comprehensive unit and integration tests with system configuration examples for agents-based evaluation.

Changes

Agent Driver Architecture and Pipeline Refactoring

Layer / File(s)	Summary
Agent Driver Interface and Factory `src/lightspeed_evaluation/pipeline/evaluation/driver.py`	Defines `AgentDriver` abstract base with `enabled` kwarg, config validation, and abstract `execute_turn` and `validate_config` methods. Implements `HttpApiDriver` with conditional `APIClient`/`APIDataAmender` creation and turn delegation. Adds `AgentDriverRegistry` factory for type-based driver instantiation.
Public Module Exports `src/lightspeed_evaluation/pipeline/evaluation/__init__.py`	Adds `AgentDriver` and `AgentDriverRegistry` to lazy-import mechanism via `_LAZY_IMPORTS` and `__getattr__` resolution.
Conversation Processor Driver Integration `src/lightspeed_evaluation/pipeline/evaluation/processor.py`	Updates `process_conversation` to accept `agent_driver` parameter, stores driver in `TurnProcessingContext`, gates turn execution on `driver.enabled`, routes turns through `driver.execute_turn()`, and adds `skip_setup`/`skip_cleanup` parameters based on driver enablement.
Evaluation Pipeline Driver Integration `src/lightspeed_evaluation/pipeline/evaluation/pipeline.py`	Refactors pipeline to create `AgentDriverRegistry` and `_default_driver`, adds helpers for default agent config resolution and per-conversation driver selection with fallback, routes conversation processing through `_process_conversation`, and gates amended-data persistence on driver enablement.
Unit Test Fixtures `tests/unit/pipeline/evaluation/conftest.py`	Adds `mock_agent_driver` fixture (disabled with stubbed `execute_turn`), removes `mock_api_amender` fixture, and updates `processor_components` fixtures to remove API amender dependency.
Agent Driver Unit Tests `tests/unit/pipeline/evaluation/test_driver.py`	Comprehensive tests for `AgentDriverRegistry` creation and error handling, `HttpApiDriver` turn execution delegation, enablement behavior, API client lifecycle, and base `AgentDriver` defaults.
Pipeline Unit Tests `tests/unit/pipeline/evaluation/test_pipeline.py`	Tests for pipeline initialization with `AgentDriverRegistry`, default driver creation, fallback behavior when agents config is absent, amended-data persistence gating, and driver/cache closure.
Processor Unit Tests `tests/unit/pipeline/evaluation/test_processor.py`	Tests for processor invocation with agent drivers, turn execution via drivers, setup/cleanup script execution with agent-disabled skipping, error cascading, and skip-on-failure behavior.
Integration Test System Configurations `tests/integration/system-config-agents-query.yaml`, `tests/integration/system-config-agents-streaming.yaml`	Two YAML configuration files for agents-based evaluation with HTTP query and streaming endpoints, including evaluation parameters, LLM judge/embedding settings, agent config, metrics metadata, storage output, visualization, and logging.
Integration Test Updates `tests/integration/test_full_evaluation.py`	Adds `get_agent_info()` helper to extract agent enablement and endpoint type from agents config, introduces agents-format config fixtures, renames test class to `TestFullApiEvaluation`, and updates end-to-end tests to validate endpoint types and driver enablement.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

lightspeed-core/lightspeed-evaluation#228: Introduces AgentsConfig and HttpApiAgentConfig models plus SystemConfig migration layer that provide the configuration contracts used by the driver architecture in this PR.

Suggested reviewers

asamal4
VladimirKadlec

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately summarizes the main change: adding an agent driver abstraction layer for the evaluation pipeline, which is the primary focus across all modified files.
Docstring Coverage	✅ Passed	Docstring coverage is 92.41% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (3)

tests/unit/pipeline/evaluation/test_driver.py (1)

171-207: 💤 Low value

StubDriver is defined identically in both base-class test methods — extractable as a shared class-level fixture.

Both test_enabled_default_is_true and test_close_is_noop define an identical StubDriver inner class. The duplication is minor but a class-level nested class (or a module-level class) would remove the repetition.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/pipeline/evaluation/test_driver.py` around lines 171 - 207, The
two tests test_enabled_default_is_true and test_close_is_noop both define an
identical inner StubDriver class; refactor by moving StubDriver out of the test
methods into a shared scope (e.g., as a class-level attribute on
TestAgentDriverBase or a module-level helper) so both tests instantiate the same
StubDriver without duplication; ensure the moved StubDriver implements
execute_turn and validate_config exactly as in the current inner definitions and
update the tests to construct driver = StubDriver({}) and call driver.enabled /
driver.close() accordingly.

tests/integration/test_full_evaluation.py (1)

43-54: 💤 Low value

Consider failing loudly when neither agents: nor a default agent is set.

The fallback return False, "query" silently treats a missing/empty agents config as "disabled query agent". Given that migrate_api_to_agents always populates agents when api: is present, hitting this branch in an integration config almost certainly indicates a misconfigured fixture YAML rather than legitimate behavior. Raising an explicit error (or asserting at fixture load time) would catch such mistakes earlier.

♻️ Possible tightening

 def get_agent_info(system_config: SystemConfig) -> tuple[bool, str]:
     """Resolve agent enabled status and endpoint_type from any config format.

     Works for both legacy api: and native agents: configs since
     migrate_api_to_agents() ensures agents is always populated when
     api: is present.
     """
     if system_config.agents and system_config.agents.default.agent:
         name = system_config.agents.default.agent
         agent_def = system_config.agents.agents[name]
         return agent_def.enabled, agent_def.endpoint_type
-    return False, "query"
+    raise AssertionError(
+        "Integration config has neither agents: nor api: — "
+        "expected migration to populate agents.default.agent"
+    )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_full_evaluation.py` around lines 43 - 54, The helper
get_agent_info currently falls back to returning (False, "query") when no
agents/default agent is present, which hides misconfigured fixtures; change it
to raise an explicit error (e.g., raise RuntimeError or ValueError) when
system_config.agents is missing or system_config.agents.default.agent is empty,
including the problematic system_config (or its name/identifier) in the message
so tests fail loudly and point to the bad fixture; preserve the existing branch
that reads name = system_config.agents.default.agent and returns
agent_def.enabled, agent_def.endpoint_type when present.

src/lightspeed_evaluation/core/models/system.py (1)

805-829: 💤 Low value

Use exclude_unset=True in model_dump() and return a new dict instead of mutating the input.

Two optional improvements to migrate_api_to_agents:

Forwarding unset fields (line 820): api_data.model_dump(exclude={"enabled"}) serializes every field including those with None defaults. This means the synthesized agent config will explicitly contain those None values, preventing the agent's own defaults from taking effect if they differ from APIConfig's defaults. Use exclude_unset=True to forward only user-provided values.
In-place mutation of the input dict (line 825): data["agents"] = {...} mutates the caller's dict when they invoke SystemConfig.model_validate(my_dict). Return a shallow copy instead to avoid surprising side effects on shared dict references.

Suggested adjustment

     `@model_validator`(mode="before")
     `@classmethod`
     def migrate_api_to_agents(cls, data: Any) -> Any:
         """Auto-migrate api: to agents: when agents: is absent."""
         if not isinstance(data, dict):
             return data
         if data.get("agents") is not None or data.get("api") is None:
             return data

         api_data = data["api"]
         if isinstance(api_data, dict):
             api_enabled = api_data.get("enabled", True)
             agent_fields = {k: v for k, v in api_data.items() if k != "enabled"}
         elif isinstance(api_data, BaseModel):
             api_enabled = getattr(api_data, "enabled", True)
-            agent_fields = api_data.model_dump(exclude={"enabled"})
+            agent_fields = api_data.model_dump(
+                exclude={"enabled"}, exclude_unset=True
+            )
         else:
             return data

         agent_fields["type"] = "http_api"
-        data["agents"] = {
+        return {
+            **data,
             "default": {"agent": "http_api" if api_enabled else None},
             "http_api": agent_fields,
-        }
-        return data
+        }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/core/models/system.py` around lines 805 - 829,
migrate_api_to_agents currently mutates the input dict and uses
api_data.model_dump(exclude={"enabled"}) which includes unset fields; change it
to call model_dump(exclude_unset=True, exclude={"enabled"}) to only forward
user-provided values, and return a shallow copy of the incoming mapping instead
of mutating it: make a new_data = dict(data) (or copy.copy(data)), populate
new_data["agents"] with the synthesized structure (ensure agent_fields is a new
dict in both dict and BaseModel branches), and return new_data; references:
migrate_api_to_agents, api_data.model_dump, data["agents"], agent_fields.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/core/models/data.py`:
- Around line 405-410: Update the Pydantic Field for the agent attribute in the
model defined in src/lightspeed_evaluation/core/models/data.py to use a fully
anchored regex so leading/trailing/embedded whitespace is rejected: replace the
current pattern r"\S" on the agent Field with r"^\S+$" (you can also remove the
now-redundant min_length=1) so lookups against agent names (e.g., in agents
config) won't silently fail due to surrounding spaces.

In `@src/lightspeed_evaluation/pipeline/evaluation/driver.py`:
- Around line 97-102: The constructor currently treats an explicit empty dict
the same as None and also adds a file-level pylint disable; change
AgentDriverRegistry.__init__ to preserve injected empty registries by using an
identity None check (if drivers is None: self._drivers = AGENT_DRIVERS else:
self._drivers = drivers) and remove the "# pylint:
disable=too-few-public-methods" comment from the class declaration so linting
isn't suppressed; keep the class small as-is (or convert to a tiny factory
elsewhere if preferred), referencing AgentDriverRegistry.__init__ and
AGENT_DRIVERS to locate the change.
- Around line 50-66: The execute_turn method must no-op when the driver is
disabled: add a guard at the start of execute_turn that checks self._enabled
and, if False, returns (None, conversation_id) without calling
self._amender.amend_single_turn; also avoid constructing side-effecting members
when disabled by early-returning or skipping creation of self._api_client and
self._amender in __init__ (check _enabled after parsing config) so disabled
drivers do nothing but still present the same API.

In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py`:
- Around line 185-188: The save branch currently checks only
self._default_driver.enabled so per-conversation drivers that are enabled can be
missed; update the conversation driver resolution/processing code (where you
determine the effective driver for each conversation—e.g., the method that
resolves or applies drivers) to set a boolean flag like
self._used_enabled_driver whenever the effective driver for any processed
conversation is enabled, ensure this flag is initialized/reset at the start of
the evaluation run, and change the final save check to use
self._used_enabled_driver before calling
self._save_amended_data(evaluation_data) instead of checking
self._default_driver.enabled.

In `@tests/unit/pipeline/evaluation/test_driver.py`:
- Around line 99-112: The test_enabled_reflects_config test misses patching
APIClient so HttpApiDriver.__init__ creates a real client; patch
"lightspeed_evaluation.pipeline.evaluation.driver.APIClient" (e.g., with
mocker.patch) before instantiating HttpApiDriver in test_enabled_reflects_config
so the driver uses a mocked APIClient instance (similar to
test_close_with_api_client), ensuring isolation when mock_config.enabled = True.

---

Nitpick comments:
In `@src/lightspeed_evaluation/core/models/system.py`:
- Around line 805-829: migrate_api_to_agents currently mutates the input dict
and uses api_data.model_dump(exclude={"enabled"}) which includes unset fields;
change it to call model_dump(exclude_unset=True, exclude={"enabled"}) to only
forward user-provided values, and return a shallow copy of the incoming mapping
instead of mutating it: make a new_data = dict(data) (or copy.copy(data)),
populate new_data["agents"] with the synthesized structure (ensure agent_fields
is a new dict in both dict and BaseModel branches), and return new_data;
references: migrate_api_to_agents, api_data.model_dump, data["agents"],
agent_fields.

In `@tests/integration/test_full_evaluation.py`:
- Around line 43-54: The helper get_agent_info currently falls back to returning
(False, "query") when no agents/default agent is present, which hides
misconfigured fixtures; change it to raise an explicit error (e.g., raise
RuntimeError or ValueError) when system_config.agents is missing or
system_config.agents.default.agent is empty, including the problematic
system_config (or its name/identifier) in the message so tests fail loudly and
point to the bad fixture; preserve the existing branch that reads name =
system_config.agents.default.agent and returns agent_def.enabled,
agent_def.endpoint_type when present.

In `@tests/unit/pipeline/evaluation/test_driver.py`:
- Around line 171-207: The two tests test_enabled_default_is_true and
test_close_is_noop both define an identical inner StubDriver class; refactor by
moving StubDriver out of the test methods into a shared scope (e.g., as a
class-level attribute on TestAgentDriverBase or a module-level helper) so both
tests instantiate the same StubDriver without duplication; ensure the moved
StubDriver implements execute_turn and validate_config exactly as in the current
inner definitions and update the tests to construct driver = StubDriver({}) and
call driver.enabled / driver.close() accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb7bfdd5-39f1-4be6-b236-a466ec5155a4

📥 Commits

Reviewing files that changed from the base of the PR and between 811e1ff and 9a55579.

📒 Files selected for processing (21)

src/lightspeed_evaluation/core/constants.py
src/lightspeed_evaluation/core/models/__init__.py
src/lightspeed_evaluation/core/models/agents.py
src/lightspeed_evaluation/core/models/data.py
src/lightspeed_evaluation/core/models/system.py
src/lightspeed_evaluation/core/system/loader.py
src/lightspeed_evaluation/pipeline/evaluation/__init__.py
src/lightspeed_evaluation/pipeline/evaluation/driver.py
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
src/lightspeed_evaluation/pipeline/evaluation/processor.py
tests/integration/system-config-agents-query.yaml
tests/integration/system-config-agents-streaming.yaml
tests/integration/test_full_evaluation.py
tests/unit/core/models/test_agents.py
tests/unit/core/models/test_data.py
tests/unit/core/models/test_system.py
tests/unit/core/system/test_loader.py
tests/unit/pipeline/evaluation/conftest.py
tests/unit/pipeline/evaluation/test_driver.py
tests/unit/pipeline/evaluation/test_pipeline.py
tests/unit/pipeline/evaluation/test_processor.py

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1)

173-190: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset _has_agent_amendments at the start of each run.

The flag is only initialized in __init__. If the same EvaluationPipeline instance is reused, one earlier run with an enabled driver leaves it True, and later runs will still try to persist amended data even when every effective driver is disabled.

Possible fix

     def run_evaluation(
         self,
         evaluation_data: list[EvaluationData],
         original_data_path: Optional[str] = None,
     ) -> list[EvaluationResult]:
@@
         """
         self.original_data_path = original_data_path
+        self._has_agent_amendments = False
         logger.info("Starting evaluation")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py` around lines 173 -
190, Reset the instance-level flag self._has_agent_amendments at the start of
each run to avoid carrying over state between runs; inside the method that
initializes the run (the block that sets self.original_data_path, calls
self.storage_backend.initialize(RunInfo(...)) and then calls
self._process_eval_data), add a statement to set self._has_agent_amendments =
False before processing so amended-data persistence only occurs for changes made
during the current invocation.

🧹 Nitpick comments (1)

tests/integration/test_full_evaluation.py (1)

43-54: ⚡ Quick win

Reuse resolve_agent_config() here to keep the test aligned with production resolution.

This helper duplicates the default-agent lookup by reaching into system_config.agents.agents[name] directly. Calling the same resolver the pipeline uses will make these tests follow future config-resolution changes automatically instead of drifting.

Possible refactor

 def get_agent_info(system_config: SystemConfig) -> tuple[bool, str]:
@@
-    if system_config.agents and system_config.agents.default.agent:
-        name = system_config.agents.default.agent
-        agent_def = system_config.agents.agents[name]
-        return agent_def.enabled, agent_def.endpoint_type
+    if system_config.agents and system_config.agents.default.agent:
+        _name, agent_config = system_config.agents.resolve_agent_config()
+        return (
+            bool(agent_config.get("enabled", False)),
+            str(agent_config.get("endpoint_type", "query")),
+        )
     return False, "query"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_full_evaluation.py` around lines 43 - 54, The helper
get_agent_info duplicates config resolution; replace its manual lookup with the
production resolver resolve_agent_config: if a default agent name exists
(system_config.agents.default.agent) call resolve_agent_config(system_config,
<default_name>) to obtain the resolved agent config and return its enabled and
endpoint_type, and if resolve_agent_config returns no config fall back to
returning (False, "query"); keep function signature
get_agent_info(system_config: SystemConfig) and only change the body to use
resolve_agent_config.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py`:
- Line 45: The class-level pylint suppression on EvaluationPipeline must be
removed and the attribute fan-out reduced: create a small data holder (e.g.,
EvaluationContext or PipelineContext) to encapsulate related runtime state and
collaborators currently stored as separate instance attributes on
EvaluationPipeline, move those attributes into that context object, update
EvaluationPipeline.__init__ to accept and store one context instance (and adjust
any references inside methods to use self.context.<name>), and delete the "#
pylint: disable=too-many-instance-attributes" comment; ensure tests and usages
construct the new context when instantiating EvaluationPipeline.
- Around line 147-157: The code currently raises ConfigurationError when
self.system_config.agents is None even if a per-conversation override exists;
update the logic in the block that checks self.system_config.agents so that if
conv_data.agent_config (or conv_data.agent) is provided and self._registry can
build a driver from that override, you bypass resolve_agent_config and call
self._registry.create_driver(agent_config_override) directly, returning (driver,
True); only raise ConfigurationError when no agents are configured and there is
no per-conversation agent_config to construct a driver. Ensure you reference
system_config.agents, conv_data.agent_config, resolve_agent_config,
ConfigurationError, and _registry.create_driver in the change.

---

Outside diff comments:
In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py`:
- Around line 173-190: Reset the instance-level flag self._has_agent_amendments
at the start of each run to avoid carrying over state between runs; inside the
method that initializes the run (the block that sets self.original_data_path,
calls self.storage_backend.initialize(RunInfo(...)) and then calls
self._process_eval_data), add a statement to set self._has_agent_amendments =
False before processing so amended-data persistence only occurs for changes made
during the current invocation.

---

Nitpick comments:
In `@tests/integration/test_full_evaluation.py`:
- Around line 43-54: The helper get_agent_info duplicates config resolution;
replace its manual lookup with the production resolver resolve_agent_config: if
a default agent name exists (system_config.agents.default.agent) call
resolve_agent_config(system_config, <default_name>) to obtain the resolved agent
config and return its enabled and endpoint_type, and if resolve_agent_config
returns no config fall back to returning (False, "query"); keep function
signature get_agent_info(system_config: SystemConfig) and only change the body
to use resolve_agent_config.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a840a7b3-29ba-43df-8487-0867d78897c0

📥 Commits

Reviewing files that changed from the base of the PR and between f8bd187 and d7d84e8.

📒 Files selected for processing (6)

src/lightspeed_evaluation/core/models/data.py
src/lightspeed_evaluation/pipeline/evaluation/driver.py
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
tests/integration/test_full_evaluation.py
tests/unit/pipeline/evaluation/test_driver.py
tests/unit/pipeline/evaluation/test_pipeline.py

✅ Files skipped from review due to trivial changes (1)

tests/unit/pipeline/evaluation/test_driver.py

🚧 Files skipped from review as they are similar to previous changes (3)

src/lightspeed_evaluation/core/models/data.py
src/lightspeed_evaluation/pipeline/evaluation/driver.py
tests/unit/pipeline/evaluation/test_pipeline.py

Introduce AgentDriver/HttpApiDriver/AgentDriverRegistry to decouple agent execution from the evaluation pipeline. The driver abstraction enables future agent types (subprocess, MCP) without modifying the pipeline core. Key changes: - AgentDriver ABC with execute_turn/validate_config/close interface - HttpApiDriver wraps existing API client logic behind the driver API - AgentDriverRegistry factory creates drivers by type from config - Pipeline resolves per-conversation agent overrides via EvaluationData - AgentsConfig.enabled top-level kill switch passed at driver construction (not stored per-agent, since HttpApiAgentConfig uses extra="forbid") - Processor uses driver.enabled instead of checking config directly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

asamal4

Thank you !!
Note that we need to make caching independent of individual agent driver, as it will be applicable for all.. Perhaps a different change before we add agentic ols flow.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/integration/test_full_evaluation.py (1)

110-114: ⚡ Quick win

Use xfail instead of commenting out streaming scenarios.

Commented tests hide streaming coverage in CI. Marking known failures as xfail keeps visibility and detects accidental fixes.

Suggested refactor

         [
             ("query_config_path", "query"),
             ("agents_query_config_path", "query"),
-            # TODO: investigate streaming parse error "No final response found
-            #  in streaming output" — fails for both legacy api: and agents:
-            #  config formats, not related to agent driver changes
-            # ("streaming_config_path", "streaming"),
-            # ("agents_streaming_config_path", "streaming"),
+            pytest.param(
+                "streaming_config_path",
+                "streaming",
+                marks=pytest.mark.xfail(
+                    reason='Known issue: "No final response found in streaming output"'
+                ),
+            ),
+            pytest.param(
+                "agents_streaming_config_path",
+                "streaming",
+                marks=pytest.mark.xfail(
+                    reason='Known issue: "No final response found in streaming output"'
+                ),
+            ),
         ],

Also applies to: 228-231

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_full_evaluation.py` around lines 110 - 114, Replace
the commented-out streaming parameter entries with pytest xfail markers so CI
still runs them but records expected failures: locate the parametrize block that
currently contains the commented ("streaming_config_path", "streaming") and
("agents_streaming_config_path", "streaming") entries (also the similar block
around 228-231), and instead add pytest.param(...) entries wrapped with
pytest.mark.xfail(reason="streaming parse error: No final response found in
streaming output", strict=False) referencing the exact parameter names
streaming_config_path and agents_streaming_config_path so the test harness runs
these scenarios but marks them as expected failures.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/test_full_evaluation.py`:
- Around line 50-53: get_agent_info currently assumes
system_config.agents.default.agent exists in system_config.agents.agents and
will raise a raw KeyError; add a guard in get_agent_info to check that
default.agent is present in the agents mapping before indexing, and if missing
either raise a clear ValueError (or log an explicit error) that includes the
missing agent name and context (e.g., use system_config.agents.default.agent and
system_config.agents.enabled in the message), and only return
(system_config.agents.enabled, agent_def.endpoint_type) after safely retrieving
agent_def from system_config.agents.agents[name].

---

Nitpick comments:
In `@tests/integration/test_full_evaluation.py`:
- Around line 110-114: Replace the commented-out streaming parameter entries
with pytest xfail markers so CI still runs them but records expected failures:
locate the parametrize block that currently contains the commented
("streaming_config_path", "streaming") and ("agents_streaming_config_path",
"streaming") entries (also the similar block around 228-231), and instead add
pytest.param(...) entries wrapped with pytest.mark.xfail(reason="streaming parse
error: No final response found in streaming output", strict=False) referencing
the exact parameter names streaming_config_path and agents_streaming_config_path
so the test harness runs these scenarios but marks them as expected failures.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6308208c-1673-460c-a2fb-651f30cc51a5

📥 Commits

Reviewing files that changed from the base of the PR and between d7d84e8 and 5c11437.

📒 Files selected for processing (11)

src/lightspeed_evaluation/pipeline/evaluation/__init__.py
src/lightspeed_evaluation/pipeline/evaluation/driver.py
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
src/lightspeed_evaluation/pipeline/evaluation/processor.py
tests/integration/system-config-agents-query.yaml
tests/integration/system-config-agents-streaming.yaml
tests/integration/test_full_evaluation.py
tests/unit/pipeline/evaluation/conftest.py
tests/unit/pipeline/evaluation/test_driver.py
tests/unit/pipeline/evaluation/test_pipeline.py
tests/unit/pipeline/evaluation/test_processor.py

🚧 Files skipped from review as they are similar to previous changes (8)

src/lightspeed_evaluation/pipeline/evaluation/init.py
tests/integration/system-config-agents-query.yaml
tests/integration/system-config-agents-streaming.yaml
tests/unit/pipeline/evaluation/test_pipeline.py
tests/unit/pipeline/evaluation/test_driver.py
src/lightspeed_evaluation/pipeline/evaluation/processor.py
src/lightspeed_evaluation/pipeline/evaluation/driver.py
tests/unit/pipeline/evaluation/test_processor.py

asamal4

Thank you.. LGTM..

coderabbitai Bot reviewed May 7, 2026

View reviewed changes

rioloc force-pushed the feat/agent-driver-layer branch 2 times, most recently from f8bd187 to d7d84e8 Compare May 8, 2026 09:41

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

Comment thread src/lightspeed_evaluation/pipeline/evaluation/pipeline.py

Comment thread src/lightspeed_evaluation/pipeline/evaluation/pipeline.py Outdated

rioloc force-pushed the feat/agent-driver-layer branch from d7d84e8 to 4c91ba8 Compare May 8, 2026 10:02

rioloc mentioned this pull request May 8, 2026

[WIP] feat: add ProposalDriver for CRD-based agent execution #232

Draft

7 tasks

rioloc force-pushed the feat/agent-driver-layer branch 4 times, most recently from c5bfc3f to 361d20f Compare May 13, 2026 14:25

rioloc force-pushed the feat/agent-driver-layer branch from 361d20f to de6e59f Compare May 13, 2026 14:41

asamal4 reviewed May 15, 2026

View reviewed changes

Comment thread src/lightspeed_evaluation/pipeline/evaluation/driver.py Outdated

Comment thread src/lightspeed_evaluation/pipeline/evaluation/processor.py Outdated

Comment thread src/lightspeed_evaluation/pipeline/evaluation/processor.py

feat: remove parse_config. rename skip_setup_cleanup param

5c11437

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

Comment thread tests/integration/test_full_evaluation.py

asamal4 approved these changes May 20, 2026

View reviewed changes

asamal4 merged commit 8bc9d6c into lightspeed-core:main May 20, 2026
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add agent driver abstraction layer for evaluation pipeline#231

feat: add agent driver abstraction layer for evaluation pipeline#231
asamal4 merged 2 commits into
lightspeed-core:mainfrom
rioloc:feat/agent-driver-layer

rioloc commented May 7, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 7, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

asamal4 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

asamal4 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rioloc commented May 7, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture

Key design decisions

Changes

Production code

Tests

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

asamal4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

asamal4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rioloc commented May 7, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 7, 2026 •

edited

Loading