
Channels Refactor Phase 3 | Refactor Evaluations Channel#3089

Draft
SmittieC wants to merge 2 commits into main from cs/evals_channel_refactor

SmittieC commented Mar 25, 2026

Product Description

No change

Technical Description

Phase 3 of the channels refactor: migrates EvaluationChannel from the old ChannelBase in apps/chat/channels.py to the new stage-based pipeline architecture in apps/channels/channels_v2/.

Key changes:

  • New EvaluationChannel (apps/channels/channels_v2/evaluation_channel.py): Implements the evaluation channel using the new pipeline architecture. Uses NoOpSender (no messages sent externally), TracingService.empty() (no OCS tracer), and passes participant_data via ctx.channel_context.

  • New EvalsBotInteractionStage (apps/channels/channels_v2/stages/core.py): Specialized bot interaction stage for evaluations that uses EvalsBot instead of get_bot(). Reads participant_data from ctx.channel_context rather than the DB-backed ParticipantData model.

  • Pipeline composition: The evaluation pipeline omits SessionResolutionStage (session is always pre-set), ConsentFlowStage, and all sending stages (ResponseSendingStage, SendingErrorHandlerStage). Uses EvalsBotInteractionStage instead of BotInteractionStage.

  • Old implementation removed: Deleted EvaluationChannel class from apps/chat/channels.py and its tests from apps/channels/tests/test_evaluation_channel.py.

  • Import updates: Updated apps/channels/tasks.py to import from the new location.
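A rough sketch of the evaluation pipeline composition described above, in plain Python with no Django. Every class here is a simplified, hypothetical stand-in (the real MessageProcessingContext, stage base class and EvalsBot interface are not shown in this description), so read it as an illustration of which stages run and where participant_data comes from, not as the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MessageProcessingContext:
    # Hypothetical, trimmed-down stand-in for the real context object.
    message_text: str
    experiment_session: Optional[str] = None  # always pre-set for evaluations
    channel_context: dict[str, Any] = field(default_factory=dict)
    bot_response: Optional[str] = None


class Stage:
    def __call__(self, ctx: MessageProcessingContext) -> None:
        raise NotImplementedError


class EvalsBotInteractionStage(Stage):
    """Hypothetical: builds a reply from participant_data held in
    ctx.channel_context instead of the DB-backed ParticipantData model.
    The real stage delegates to EvalsBot; a string template stands in here."""

    def __call__(self, ctx: MessageProcessingContext) -> None:
        participant_data = ctx.channel_context.get("participant_data", {})
        name = participant_data.get("name", "participant")
        ctx.bot_response = f"Hello {name}, you said: {ctx.message_text}"


def build_evaluation_pipeline() -> list[Stage]:
    # Deliberately absent: SessionResolutionStage (session is pre-set),
    # ConsentFlowStage, ResponseSendingStage and SendingErrorHandlerStage.
    # Other shared stages are left out of this sketch.
    return [EvalsBotInteractionStage()]


def run(pipeline: list[Stage], ctx: MessageProcessingContext) -> MessageProcessingContext:
    # Each stage reads and mutates the shared context in order.
    for stage in pipeline:
        stage(ctx)
    return ctx
```

Because no sending stage is present, the reply only ends up on ctx.bot_response, which is what the evaluation run reads back.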

Additional fixes from code review (commit 1):

  • Fixed /reset not ending the old session when ctx.experiment_session is None (i.e. channels that don't pre-set a session, such as API and Telegram)
  • Extracted shared RESET_COMMAND constant
  • Added null guard for ctx.message in ResponseFormattingStage
  • Fixed operator precedence in voice response behaviour check (RECIPROCAL and user_sent_voice now correctly grouped with parentheses)
  • Replaced a fragile string assertion in the web channel test
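For context on the precedence fix: in Python, and binds more tightly than or, so a boolean check written without parentheses can group differently from how it reads. The sketch below is hypothetical; VoiceResponseBehaviour, NEVER, ALWAYS and the supports_voice flag are assumed names for illustration and are not taken from the diff.

```python
from enum import Enum


class VoiceResponseBehaviour(Enum):
    # Hypothetical enum; only RECIPROCAL is named in the PR.
    NEVER = "never"
    ALWAYS = "always"
    RECIPROCAL = "reciprocal"


def reply_with_voice_buggy(behaviour, user_sent_voice, supports_voice):
    # Parsed as (supports_voice and behaviour == ALWAYS)
    #        or (behaviour == RECIPROCAL and user_sent_voice),
    # so RECIPROCAL can return True even without voice support.
    return (
        supports_voice and behaviour == VoiceResponseBehaviour.ALWAYS
        or behaviour == VoiceResponseBehaviour.RECIPROCAL and user_sent_voice
    )


def reply_with_voice_fixed(behaviour, user_sent_voice, supports_voice):
    # Explicit grouping: voice support gates both branches, and
    # RECIPROCAL is only honoured when the user actually sent voice.
    return supports_voice and (
        behaviour == VoiceResponseBehaviour.ALWAYS
        or (behaviour == VoiceResponseBehaviour.RECIPROCAL and user_sent_voice)
    )
```

With supports_voice=False and a RECIPROCAL voice message, the buggy form returns True and the fixed form returns False.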

Demo

N/A — internal refactor with no user-facing changes.

Docs and Changelog

  • This PR requires docs/changelog update

Add EvaluationChannel and EvalsBotInteractionStage to channels_v2,
update tasks.py imports, remove old implementation from apps/chat/channels.py,
and replace old tests with new pipeline-based tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@SmittieC (Contributor, Author)

@coderabbitai full review

coderabbitai Bot commented Mar 25, 2026

✅ Actions performed

Full review triggered.

coderabbitai Bot commented Mar 25, 2026

📝 Walkthrough

The PR migrates EvaluationChannel from apps/chat/channels.py to apps/channels/channels_v2/evaluation_channel.py, refactoring it to integrate with the new pipeline architecture. It adds a supports_multimedia class variable to ChannelBase, extends SessionResolutionStage to handle resets without preloaded sessions by searching and ending existing sessions, introduces EvalsBotInteractionStage for evaluation-specific bot interaction, tightens type annotations in MessageProcessingContext, consolidates the /reset command as a module constant, and updates imports in tasks.py. The old implementation is removed, and test coverage is reorganized with updated assertions across multiple test files.
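The walkthrough mentions a supports_multimedia class variable on ChannelBase. A minimal sketch of one common way to declare such a flag with typing.ClassVar and override it per channel; the default value, the subclass names and the attachment_mode helper are all hypothetical and not taken from the diff.

```python
from typing import ClassVar


class ChannelBase:
    # Class-level capability flag shared by every instance of a channel type.
    # Hypothetical default: channels are text-only unless they opt in.
    supports_multimedia: ClassVar[bool] = False

    def attachment_mode(self) -> str:
        # Hypothetical helper showing how a stage might branch on the flag.
        return "inline" if self.supports_multimedia else "links"


class MultimediaChannel(ChannelBase):
    supports_multimedia = True


class TextOnlyChannel(ChannelBase):
    pass  # inherits the default
```

Declaring it as a ClassVar tells type checkers the flag belongs to the channel type, not to individual instances.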

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.20%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Channels Refactor Phase 3 \| Refactor Evaluations Channel' directly summarizes the main change: migrating the EvaluationChannel implementation from the old architecture (apps/chat/channels.py) to the new pipeline-based architecture (apps/channels/channels_v2/evaluation_channel.py). |
| Description check | ✅ Passed | PR description covers all required template sections but Technical Description lacks key architectural rationale and design decisions for reviewer context. |

coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/channels/channels_v2/evaluation_channel.py`:
- Around lines 57-60: The current _create_context implementation replaces
ctx.channel_context with a new dict, which can drop keys added by the base
class; instead call super()._create_context(message) to get ctx and then update
or set the participant data into the existing ctx.channel_context (e.g., ensure
ctx.channel_context is a dict and use ctx.channel_context.update({...}) or
ctx.channel_context.setdefault("participant_data", ...)) so you mutate the
existing MessageProcessingContext.channel_context rather than overwriting it.

In `@apps/channels/channels_v2/stages/core.py`:
- Around lines 106-113: The fallback lookup uses .first() which can return an
older open session; change the query on ExperimentSession to pick the newest
non-complete session (e.g., replace .first() with
.order_by("-created_at").first() or .latest("created_at") to match the "normal
resolution" logic), and add a regression test that creates two open
ExperimentSession rows for the same participant and verifies /reset closes the
newest session only; update any test helper that constructs sessions to ensure
timestamps differ so the ordering is meaningful.
- Around lines 447-450: The call to ctx.bot.process_input currently omits the
persisted human message, breaking the linkage used by EvalsBot/PipelineBot;
update the call in core.py where ctx.bot_response is set to pass the saved human
message (e.g., human_message=ctx.human_message) into
EvalsBot.process_input()/PipelineBot.process_input(), and add a regression test
in test_evals_bot_interaction.py that reproduces the detached-history failure
(write a failing test that asserts the eval run has the same ChatMessage
linkage) then make sure the test passes after the fix.
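The ordering point in the session-lookup comment can be shown without Django. This is a minimal in-memory sketch under assumed names (Session, pk, participant_id, created_at and ended are hypothetical); in the real code the lookup is a query on ExperimentSession ordered by -created_at.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Session:
    # Hypothetical stand-in for an ExperimentSession row.
    pk: int
    participant_id: str
    created_at: datetime
    ended: bool = False


def newest_open_session(sessions: list[Session], participant_id: str) -> Optional[Session]:
    # In-memory analogue of ordering open sessions by -created_at and taking
    # the first; without the ordering, an older open session could be picked.
    candidates = [s for s in sessions if s.participant_id == participant_id and not s.ended]
    return max(candidates, key=lambda s: s.created_at, default=None)


def handle_reset(sessions: list[Session], participant_id: str) -> Optional[Session]:
    # Fallback for channels with no pre-set session: end the newest open one.
    session = newest_open_session(sessions, participant_id)
    if session is not None:
        session.ended = True
    return session
```

A regression test along the lines suggested would create two open sessions for one participant with different created_at values and check that only the newer one is ended.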
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5abe468a-14c2-42ec-a26e-ae13259c542c

📥 Commits

Reviewing files that changed from the base of the PR and between 4119068 and cda1c5a.

📒 Files selected for processing (13)
  • apps/channels/channels_v2/channel_base.py
  • apps/channels/channels_v2/evaluation_channel.py
  • apps/channels/channels_v2/pipeline.py
  • apps/channels/channels_v2/stages/core.py
  • apps/channels/channels_v2/stages/terminal.py
  • apps/channels/tasks.py
  • apps/channels/tests/channels/concrete/test_evaluation_channel.py
  • apps/channels/tests/channels/concrete/test_web_channel.py
  • apps/channels/tests/channels/stages/test_evals_bot_interaction.py
  • apps/channels/tests/channels/stages/test_session_resolution.py
  • apps/channels/tests/test_evaluation_channel.py
  • apps/chat/channels.py
  • docs/plans/channels_refactor.md
💤 Files with no reviewable changes (1)
  • apps/channels/tests/test_evaluation_channel.py

Comment thread apps/channels/channels_v2/evaluation_channel.py
Comment thread apps/channels/channels_v2/stages/core.py
Comment thread apps/channels/channels_v2/stages/core.py
codecov-commenter commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

- Mutate ctx.channel_context instead of replacing it wholesale in EvaluationChannel
- Add .order_by("-created_at") to reset fallback session lookup for consistency
- Forward ctx.human_message into EvalsBot.process_input() to preserve pipeline state linkage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
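The first change in this commit, illustrated in isolation: reassigning ctx.channel_context in an override throws away any keys the base class's _create_context put there, while updating the dict in place keeps them. Context, BaseChannel, the channel_type key and both subclasses are hypothetical stand-ins, not the real channel classes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Context:
    # Hypothetical stand-in for MessageProcessingContext.
    message: str
    channel_context: dict[str, Any] = field(default_factory=dict)


class BaseChannel:
    def __init__(self, participant_data: dict[str, Any]):
        self.participant_data = participant_data

    def _create_context(self, message: str) -> Context:
        # Hypothetical: the base class seeds a key of its own.
        return Context(message, channel_context={"channel_type": "evaluation"})


class ReplacingChannel(BaseChannel):
    def _create_context(self, message: str) -> Context:
        ctx = super()._create_context(message)
        # Reassigning drops "channel_type" set by the base class.
        ctx.channel_context = {"participant_data": self.participant_data}
        return ctx


class MutatingChannel(BaseChannel):
    def _create_context(self, message: str) -> Context:
        ctx = super()._create_context(message)
        # Updating in place keeps every key the base class added.
        ctx.channel_context.update({"participant_data": self.participant_data})
        return ctx
```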
Base automatically changed from cs/refactor_web_channel to main April 21, 2026 12:52