fix: reframe peer card prompts as stable identity markers#686
Conversation
WalkthroughThis PR adds specialist-level peer-card write permissions, enforces structural peer-card entry rules (allowed prefixes, per-entry length cap, deduplication), disables peer-card writes for induction specialists, updates update_peer_card tooling/handler behavior and tests, and changes LLM backends to discard Anthropic thinking_effort and forward OpenAI thinking_budget_tokens via extra_body.reasoning. ChangesPeer-Card Access Control and Structural Validation
LLM Backend Thinking Parameter Handling
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/llm/test_backends/test_openai.py (2)
156-175: ⚡ Quick winUse conftest fixtures for repeated mock setup in new tests.
Both new tests duplicate local client mock construction; please reuse shared fixtures from
tests/conftest.py.As per coding guidelines,
tests/**/*.py: Use pytest fixtures from tests/conftest.py for test setup and teardown.Also applies to: 194-213
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llm/test_backends/test_openai.py` around lines 156 - 175, Replace the duplicated local Mock/AsyncMock setup in the test (the `client` variable and its `chat.completions.create` AsyncMock) with the shared pytest fixture defined in tests/conftest.py (use the existing OpenAI/client mock fixture name from conftest), and update the other duplicated block (lines around the second occurrence) to also consume that fixture instead of constructing the mock locally; ensure the test function signature accepts the fixture and the test uses the fixture's mock to set return_value/side_effect as needed.
155-227: ⚡ Quick winAdd streaming coverage for
thinking_budget_tokens.
stream()was updated in parallel withcomplete(), but these new tests only assert complete-path behavior. Add stream tests for both positive and zero budgets.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llm/test_backends/test_openai.py` around lines 155 - 227, Add streaming-path tests mirroring the existing complete-path tests: create two new async tests (e.g., test_openai_backend_passes_thinking_budget_via_extra_body_stream and test_openai_backend_skips_extra_body_when_thinking_budget_zero_stream) that instantiate OpenAIBackend with a mocked client and call backend.stream(...) with thinking_budget_tokens set to a positive value and zero respectively; mock client.chat.completions.stream to return an async iterator or AsyncMock and after awaiting the stream initiation inspect client.chat.completions.stream.await_args.kwargs (or equivalent) to assert that when thinking_budget_tokens>0 the kwargs include extra_body {"reasoning": {"max_tokens": N}} and when zero the "extra_body" key is absent, following the same structure and assertions used in the complete tests.tests/llm/test_backends/test_anthropic.py (2)
126-154: ⚡ Quick winCover the
stream()path for ignoredthinking_effort.This new test validates
complete(), but the backend change also affectsstream(). Add a matching stream assertion to prevent path drift.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llm/test_backends/test_anthropic.py` around lines 126 - 154, The test only asserts that AnthropicBackend.complete() doesn't pass thinking_effort to the client but misses the stream path; update test_anthropic_backend_ignores_thinking_effort to also invoke backend.stream(...) (matching the same args as complete) and capture the mock call for the streaming client path (e.g., client.messages.stream or client.stream depending on the mock setup used by AnthropicBackend) then assert that the stream call's kwargs do not include "thinking" or "reasoning_effort"; ensure you reference AnthropicBackend.stream and reuse the same mock client used for complete so both paths are covered.
127-139: ⚡ Quick winUse shared pytest fixtures for backend/client setup.
The new test builds local mock setup inline; please switch to existing
tests/conftest.pyfixtures for consistency and teardown hygiene.As per coding guidelines,
tests/**/*.py: Use pytest fixtures from tests/conftest.py for test setup and teardown.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llm/test_backends/test_anthropic.py` around lines 127 - 139, Replace the inline Mock/AsyncMock setup in tests/llm/test_backends/test_anthropic.py (the local client and client.messages.create AsyncMock returning a SimpleNamespace with TextBlock/usage/stop_reason) with the shared pytest fixture from tests/conftest.py that provides a mocked Anthropic client; update the test function signature to accept that fixture, configure the fixture's messages.create return_value or side_effect as needed, and remove the local Mock/AsyncMock assignment and any manual teardown so the shared fixture handles setup/teardown.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/utils/agent_tools.py`:
- Around line 1483-1486: Replace the sensitive content logging in the
logger.info call that currently emits the peer-card snippet (logger.info(...,
line[:80])) so it no longer logs user content; instead log non-sensitive
attributes like the rejection reason, the line length (len(line)), and whether
the prefix matched/was allowed (e.g., prefix or prefix_allowed flag). Locate the
logger.info usage around variable line in agent_tools.py (the current call using
"%r" and line[:80]) and change the message to include only those structural
details (reason, length, prefix status) while removing any direct or sliced
content of line.
---
Nitpick comments:
In `@tests/llm/test_backends/test_anthropic.py`:
- Around line 126-154: The test only asserts that AnthropicBackend.complete()
doesn't pass thinking_effort to the client but misses the stream path; update
test_anthropic_backend_ignores_thinking_effort to also invoke
backend.stream(...) (matching the same args as complete) and capture the mock
call for the streaming client path (e.g., client.messages.stream or
client.stream depending on the mock setup used by AnthropicBackend) then assert
that the stream call's kwargs do not include "thinking" or "reasoning_effort";
ensure you reference AnthropicBackend.stream and reuse the same mock client used
for complete so both paths are covered.
- Around line 127-139: Replace the inline Mock/AsyncMock setup in
tests/llm/test_backends/test_anthropic.py (the local client and
client.messages.create AsyncMock returning a SimpleNamespace with
TextBlock/usage/stop_reason) with the shared pytest fixture from
tests/conftest.py that provides a mocked Anthropic client; update the test
function signature to accept that fixture, configure the fixture's
messages.create return_value or side_effect as needed, and remove the local
Mock/AsyncMock assignment and any manual teardown so the shared fixture handles
setup/teardown.
In `@tests/llm/test_backends/test_openai.py`:
- Around line 156-175: Replace the duplicated local Mock/AsyncMock setup in the
test (the `client` variable and its `chat.completions.create` AsyncMock) with
the shared pytest fixture defined in tests/conftest.py (use the existing
OpenAI/client mock fixture name from conftest), and update the other duplicated
block (lines around the second occurrence) to also consume that fixture instead
of constructing the mock locally; ensure the test function signature accepts the
fixture and the test uses the fixture's mock to set return_value/side_effect as
needed.
- Around line 155-227: Add streaming-path tests mirroring the existing
complete-path tests: create two new async tests (e.g.,
test_openai_backend_passes_thinking_budget_via_extra_body_stream and
test_openai_backend_skips_extra_body_when_thinking_budget_zero_stream) that
instantiate OpenAIBackend with a mocked client and call backend.stream(...) with
thinking_budget_tokens set to a positive value and zero respectively; mock
client.chat.completions.stream to return an async iterator or AsyncMock and
after awaiting the stream initiation inspect
client.chat.completions.stream.await_args.kwargs (or equivalent) to assert that
when thinking_budget_tokens>0 the kwargs include extra_body {"reasoning":
{"max_tokens": N}} and when zero the "extra_body" key is absent, following the
same structure and assertions used in the complete tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 818c7c72-91b7-41f6-96a9-3da08528acfa
📒 Files selected for processing (8)
src/dreamer/specialists.pysrc/llm/backends/anthropic.pysrc/llm/backends/openai.pysrc/utils/agent_tools.pytests/dreamer/test_model_config_usage.pytests/llm/test_backends/test_anthropic.pytests/llm/test_backends/test_openai.pytests/utils/test_agent_tools.py
| logger.info( | ||
| "Rejecting peer card entry (no allowed prefix, empty body, or over length cap): %r", | ||
| line[:80], | ||
| ) |
There was a problem hiding this comment.
Avoid logging peer-card content snippets.
Line 1483 logs rejected peer-card text content (%r, line[:80]), which can leak sensitive identity data into logs. Log structural reason/length/prefix instead of user content.
Proposed fix
- logger.info(
- "Rejecting peer card entry (no allowed prefix, empty body, or over length cap): %r",
- line[:80],
- )
+ logger.info(
+ "Rejecting peer card entry (invalid structure) for %s/%s/%s: len=%d",
+ ctx.workspace_name,
+ ctx.observer,
+ ctx.observed,
+ len(line),
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| logger.info( | |
| "Rejecting peer card entry (no allowed prefix, empty body, or over length cap): %r", | |
| line[:80], | |
| ) | |
| logger.info( | |
| "Rejecting peer card entry (invalid structure) for %s/%s/%s: len=%d", | |
| ctx.workspace_name, | |
| ctx.observer, | |
| ctx.observed, | |
| len(line), | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/utils/agent_tools.py` around lines 1483 - 1486, Replace the sensitive
content logging in the logger.info call that currently emits the peer-card
snippet (logger.info(..., line[:80])) so it no longer logs user content; instead
log non-sensitive attributes like the rejection reason, the line length
(len(line)), and whether the prefix matched/was allowed (e.g., prefix or
prefix_allowed flag). Locate the logger.info usage around variable line in
agent_tools.py (the current call using "%r" and line[:80]) and change the
message to include only those structural details (reason, length, prefix status)
while removing any direct or sliced content of line.
Rajat-Ahuja1997
left a comment
There was a problem hiding this comment.
generally looks good but i'm concerned about valid information in an existing peer card being dropped because it doesn't have one of the required prefixes. should we add a section to the system prompt or deduction user prompt that tells the model to re-emit existing entries with a correct prefix? like name: Rajat to IDENTITY: Name: Rajat?
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/utils/agent_tools.py`:
- Around line 1592-1594: The rejection-ratio total is computed using accepted
after max-facts truncation (normalized_peer_card), causing undercounting;
capture the pre-truncation accepted count (e.g., store
accepted_before_truncation = len(peer_card) or len(original_peer_card) before
you call normalization) and use that to compute total (total =
accepted_before_truncation + rejected_count) when building success_content and
calling _format_rejection_feedback so the "rejected X of Y" message reflects
submitted entries rather than the truncated set.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ef10b2b6-efa1-465f-8ad6-d83682b1fa98
📒 Files selected for processing (3)
src/dreamer/specialists.pysrc/utils/agent_tools.pytests/utils/test_agent_tools.py
| accepted = len(normalized_peer_card) | ||
| total = accepted + rejected_count | ||
| success_content = f"{success_content} {_format_rejection_feedback(f'{rejected_count} of {total}')}" |
There was a problem hiding this comment.
Use pre-truncation accepted count for rejection ratio.
total = accepted + rejected_count uses accepted after max-facts truncation, so the “rejected X of Y” message can undercount total submitted entries.
Suggested fix
- accepted = len(normalized_peer_card)
- total = accepted + rejected_count
- success_content = f"{success_content} {_format_rejection_feedback(f'{rejected_count} of {total}')}"
+ accepted_before_cap = len(seen)
+ total = accepted_before_cap + rejected_count
+ success_content = (
+ f"{success_content} "
+ f"{_format_rejection_feedback(f'{rejected_count} of {total}')}"
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/utils/agent_tools.py` around lines 1592 - 1594, The rejection-ratio total
is computed using accepted after max-facts truncation (normalized_peer_card),
causing undercounting; capture the pre-truncation accepted count (e.g., store
accepted_before_truncation = len(peer_card) or len(original_peer_card) before
you call normalization) and use that to compute total (total =
accepted_before_truncation + rejected_count) when building success_content and
calling _format_rejection_feedback so the "rejected X of Y" message reflects
submitted entries rather than the truncated set.
Summary by CodeRabbit
Improvements
Bug Fixes