Skip to content

fix: reframe peer card prompts as stable identity markers#686

Merged
VVoruganti merged 4 commits into
mainfrom
vineeth/dev-1737-peer-card-identity
May 21, 2026
Merged

fix: reframe peer card prompts as stable identity markers#686
VVoruganti merged 4 commits into
mainfrom
vineeth/dev-1737-peer-card-identity

Conversation

@VVoruganti
Copy link
Copy Markdown
Collaborator

@VVoruganti VVoruganti commented May 14, 2026

Summary by CodeRabbit

  • Improvements

    • Stricter peer-card rules and validation: allowed marker types, per-entry length cap, normalized deduplication, and clearer rejection feedback when entries are invalid.
    • Role-based peer-card behavior: one specialist type can no longer write to the peer card; another uses tightened identity-marker framing when writing.
    • OpenAI backend now accepts a reasoning token budget and forwards it to providers.
  • Bug Fixes

    • Anthropic backend no longer errors on an unsupported thinking parameter (it is ignored).

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Walkthrough

This PR adds specialist-level peer-card write permissions, enforces structural peer-card entry rules (allowed prefixes, per-entry length cap, deduplication), disables peer-card writes for induction specialists, updates update_peer_card tooling/handler behavior and tests, and changes LLM backends to discard Anthropic thinking_effort and forward OpenAI thinking_budget_tokens via extra_body.reasoning.

Changes

Peer-Card Access Control and Structural Validation

Layer / File(s) Summary
Permission gate infrastructure
src/dreamer/specialists.py
BaseSpecialist adds can_update_peer_card and updates peer-card instruction text; BaseSpecialist.run() gates peer-card fetching/injection on both the permission flag and peer_card.create config.
Deduction specialist peer-card rules
src/dreamer/specialists.py
DeductionSpecialist updates peer_card_update_instruction and rewrites the PEER CARD system prompt to enumerate allowed prefixes and strict stability/evidence/type rules.
Induction specialist peer-card disabling
src/dreamer/specialists.py
InductionSpecialist sets can_update_peer_card = False, removes peer-card interpolation from system/user prompts, and ensures induction tools never include update_peer_card.
Structural validation infrastructure
src/utils/agent_tools.py
Adds PEER_CARD_ALLOWED_PREFIXES, MAX_PEER_CARD_ENTRY_LENGTH = 200, and _validate_peer_card_entry to enforce prefix, non-empty body, and length cap.
Update handler and agent tool wiring
src/utils/agent_tools.py
Tightens update_peer_card description/schema, removes it from INDUCTION_SPECIALIST_TOOLS, validates and normalizes input lines, deduplicates case-insensitively, samples/logs rejections, returns explicit rejection when all entries fail, and includes rejected_count in metadata.
Peer-card behavior test coverage
tests/dreamer/test_model_config_usage.py, tests/utils/test_agent_tools.py
Tests added/updated to assert deduction prompt framing and conditional tool exposure, induction prompt/tool absence and can_update_peer_card = False, and extensive validation/deduplication/length-cap behaviour and rejection metadata.

LLM Backend Thinking Parameter Handling

Layer / File(s) Summary
Anthropic thinking_effort handling
src/llm/backends/anthropic.py, tests/llm/test_backends/test_anthropic.py
AnthropicBackend.complete and .stream now discard thinking_effort instead of raising; tests assert the client request omits thinking/reasoning_effort.
OpenAI thinking_budget_tokens support
src/llm/backends/openai.py, tests/llm/test_backends/test_openai.py
OpenAIBackend.complete and .stream accept thinking_budget_tokens; _build_params now accepts it and sets extra_body.reasoning.max_tokens when budget > 0. Tests verify presence/absence of extra_body based on budget.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit hops through peer-card gates,
It checks the prefixes, counts the weights.
Induction waits with quivering paws,
Deduction writes the durable laws.
Thinking budgets flow—discard or pass—🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.54% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: reframe peer card prompts as stable identity markers' directly and clearly describes the main change—refocusing peer card prompting around stable identity markers—which is the central theme across multiple files in this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch vineeth/dev-1737-peer-card-identity

Comment @coderabbitai help to get the list of available commands and usage tips.

@VVoruganti VVoruganti marked this pull request as ready for review May 21, 2026 14:47
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
tests/llm/test_backends/test_openai.py (2)

156-175: ⚡ Quick win

Use conftest fixtures for repeated mock setup in new tests.

Both new tests duplicate local client mock construction; please reuse shared fixtures from tests/conftest.py.

As per coding guidelines, tests/**/*.py: Use pytest fixtures from tests/conftest.py for test setup and teardown.

Also applies to: 194-213

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_openai.py` around lines 156 - 175, Replace the
duplicated local Mock/AsyncMock setup in the test (the `client` variable and its
`chat.completions.create` AsyncMock) with the shared pytest fixture defined in
tests/conftest.py (use the existing OpenAI/client mock fixture name from
conftest), and update the other duplicated block (lines around the second
occurrence) to also consume that fixture instead of constructing the mock
locally; ensure the test function signature accepts the fixture and the test
uses the fixture's mock to set return_value/side_effect as needed.

155-227: ⚡ Quick win

Add streaming coverage for thinking_budget_tokens.

stream() was updated in parallel with complete(), but these new tests only assert complete-path behavior. Add stream tests for both positive and zero budgets.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_openai.py` around lines 155 - 227, Add
streaming-path tests mirroring the existing complete-path tests: create two new
async tests (e.g.,
test_openai_backend_passes_thinking_budget_via_extra_body_stream and
test_openai_backend_skips_extra_body_when_thinking_budget_zero_stream) that
instantiate OpenAIBackend with a mocked client and call backend.stream(...) with
thinking_budget_tokens set to a positive value and zero respectively; mock
client.chat.completions.stream to return an async iterator or AsyncMock and
after awaiting the stream initiation inspect
client.chat.completions.stream.await_args.kwargs (or equivalent) to assert that
when thinking_budget_tokens>0 the kwargs include extra_body {"reasoning":
{"max_tokens": N}} and when zero the "extra_body" key is absent, following the
same structure and assertions used in the complete tests.
tests/llm/test_backends/test_anthropic.py (2)

126-154: ⚡ Quick win

Cover the stream() path for ignored thinking_effort.

This new test validates complete(), but the backend change also affects stream(). Add a matching stream assertion to prevent path drift.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_anthropic.py` around lines 126 - 154, The test
only asserts that AnthropicBackend.complete() doesn't pass thinking_effort to
the client but misses the stream path; update
test_anthropic_backend_ignores_thinking_effort to also invoke
backend.stream(...) (matching the same args as complete) and capture the mock
call for the streaming client path (e.g., client.messages.stream or
client.stream depending on the mock setup used by AnthropicBackend) then assert
that the stream call's kwargs do not include "thinking" or "reasoning_effort";
ensure you reference AnthropicBackend.stream and reuse the same mock client used
for complete so both paths are covered.

127-139: ⚡ Quick win

Use shared pytest fixtures for backend/client setup.

The new test builds local mock setup inline; please switch to existing tests/conftest.py fixtures for consistency and teardown hygiene.

As per coding guidelines, tests/**/*.py: Use pytest fixtures from tests/conftest.py for test setup and teardown.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_anthropic.py` around lines 127 - 139, Replace
the inline Mock/AsyncMock setup in tests/llm/test_backends/test_anthropic.py
(the local client and client.messages.create AsyncMock returning a
SimpleNamespace with TextBlock/usage/stop_reason) with the shared pytest fixture
from tests/conftest.py that provides a mocked Anthropic client; update the test
function signature to accept that fixture, configure the fixture's
messages.create return_value or side_effect as needed, and remove the local
Mock/AsyncMock assignment and any manual teardown so the shared fixture handles
setup/teardown.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/agent_tools.py`:
- Around line 1483-1486: Replace the sensitive content logging in the
logger.info call that currently emits the peer-card snippet (logger.info(...,
line[:80])) so it no longer logs user content; instead log non-sensitive
attributes like the rejection reason, the line length (len(line)), and whether
the prefix matched/was allowed (e.g., prefix or prefix_allowed flag). Locate the
logger.info usage around variable line in agent_tools.py (the current call using
"%r" and line[:80]) and change the message to include only those structural
details (reason, length, prefix status) while removing any direct or sliced
content of line.

---

Nitpick comments:
In `@tests/llm/test_backends/test_anthropic.py`:
- Around line 126-154: The test only asserts that AnthropicBackend.complete()
doesn't pass thinking_effort to the client but misses the stream path; update
test_anthropic_backend_ignores_thinking_effort to also invoke
backend.stream(...) (matching the same args as complete) and capture the mock
call for the streaming client path (e.g., client.messages.stream or
client.stream depending on the mock setup used by AnthropicBackend) then assert
that the stream call's kwargs do not include "thinking" or "reasoning_effort";
ensure you reference AnthropicBackend.stream and reuse the same mock client used
for complete so both paths are covered.
- Around line 127-139: Replace the inline Mock/AsyncMock setup in
tests/llm/test_backends/test_anthropic.py (the local client and
client.messages.create AsyncMock returning a SimpleNamespace with
TextBlock/usage/stop_reason) with the shared pytest fixture from
tests/conftest.py that provides a mocked Anthropic client; update the test
function signature to accept that fixture, configure the fixture's
messages.create return_value or side_effect as needed, and remove the local
Mock/AsyncMock assignment and any manual teardown so the shared fixture handles
setup/teardown.

In `@tests/llm/test_backends/test_openai.py`:
- Around line 156-175: Replace the duplicated local Mock/AsyncMock setup in the
test (the `client` variable and its `chat.completions.create` AsyncMock) with
the shared pytest fixture defined in tests/conftest.py (use the existing
OpenAI/client mock fixture name from conftest), and update the other duplicated
block (lines around the second occurrence) to also consume that fixture instead
of constructing the mock locally; ensure the test function signature accepts the
fixture and the test uses the fixture's mock to set return_value/side_effect as
needed.
- Around line 155-227: Add streaming-path tests mirroring the existing
complete-path tests: create two new async tests (e.g.,
test_openai_backend_passes_thinking_budget_via_extra_body_stream and
test_openai_backend_skips_extra_body_when_thinking_budget_zero_stream) that
instantiate OpenAIBackend with a mocked client and call backend.stream(...) with
thinking_budget_tokens set to a positive value and zero respectively; mock
client.chat.completions.stream to return an async iterator or AsyncMock and
after awaiting the stream initiation inspect
client.chat.completions.stream.await_args.kwargs (or equivalent) to assert that
when thinking_budget_tokens>0 the kwargs include extra_body {"reasoning":
{"max_tokens": N}} and when zero the "extra_body" key is absent, following the
same structure and assertions used in the complete tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 818c7c72-91b7-41f6-96a9-3da08528acfa

📥 Commits

Reviewing files that changed from the base of the PR and between b5f24a6 and 39951d8.

📒 Files selected for processing (8)
  • src/dreamer/specialists.py
  • src/llm/backends/anthropic.py
  • src/llm/backends/openai.py
  • src/utils/agent_tools.py
  • tests/dreamer/test_model_config_usage.py
  • tests/llm/test_backends/test_anthropic.py
  • tests/llm/test_backends/test_openai.py
  • tests/utils/test_agent_tools.py

Comment thread src/utils/agent_tools.py
Comment on lines +1483 to +1486
logger.info(
"Rejecting peer card entry (no allowed prefix, empty body, or over length cap): %r",
line[:80],
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid logging peer-card content snippets.

Line 1483 logs rejected peer-card text content (%r, line[:80]), which can leak sensitive identity data into logs. Log structural reason/length/prefix instead of user content.

Proposed fix
-            logger.info(
-                "Rejecting peer card entry (no allowed prefix, empty body, or over length cap): %r",
-                line[:80],
-            )
+            logger.info(
+                "Rejecting peer card entry (invalid structure) for %s/%s/%s: len=%d",
+                ctx.workspace_name,
+                ctx.observer,
+                ctx.observed,
+                len(line),
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logger.info(
"Rejecting peer card entry (no allowed prefix, empty body, or over length cap): %r",
line[:80],
)
logger.info(
"Rejecting peer card entry (invalid structure) for %s/%s/%s: len=%d",
ctx.workspace_name,
ctx.observer,
ctx.observed,
len(line),
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/agent_tools.py` around lines 1483 - 1486, Replace the sensitive
content logging in the logger.info call that currently emits the peer-card
snippet (logger.info(..., line[:80])) so it no longer logs user content; instead
log non-sensitive attributes like the rejection reason, the line length
(len(line)), and whether the prefix matched/was allowed (e.g., prefix or
prefix_allowed flag). Locate the logger.info usage around variable line in
agent_tools.py (the current call using "%r" and line[:80]) and change the
message to include only those structural details (reason, length, prefix status)
while removing any direct or sliced content of line.

Copy link
Copy Markdown
Collaborator

@Rajat-Ahuja1997 Rajat-Ahuja1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good but i'm concerned about valid information in an existing peer card being dropped because it doesn't have one of the required prefixes. should we add a section to the system prompt or deduction user prompt that tells the model to re-emit existing entries with a correct prefix? like name: Rajat to IDENTITY: Name: Rajat?

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/agent_tools.py`:
- Around line 1592-1594: The rejection-ratio total is computed using accepted
after max-facts truncation (normalized_peer_card), causing undercounting;
capture the pre-truncation accepted count (e.g., store
accepted_before_truncation = len(peer_card) or len(original_peer_card) before
you call normalization) and use that to compute total (total =
accepted_before_truncation + rejected_count) when building success_content and
calling _format_rejection_feedback so the "rejected X of Y" message reflects
submitted entries rather than the truncated set.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ef10b2b6-efa1-465f-8ad6-d83682b1fa98

📥 Commits

Reviewing files that changed from the base of the PR and between 39951d8 and 004f417.

📒 Files selected for processing (3)
  • src/dreamer/specialists.py
  • src/utils/agent_tools.py
  • tests/utils/test_agent_tools.py

Comment thread src/utils/agent_tools.py
Comment on lines +1592 to +1594
accepted = len(normalized_peer_card)
total = accepted + rejected_count
success_content = f"{success_content} {_format_rejection_feedback(f'{rejected_count} of {total}')}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use pre-truncation accepted count for rejection ratio.

total = accepted + rejected_count uses accepted after max-facts truncation, so the “rejected X of Y” message can undercount total submitted entries.

Suggested fix
-        accepted = len(normalized_peer_card)
-        total = accepted + rejected_count
-        success_content = f"{success_content} {_format_rejection_feedback(f'{rejected_count} of {total}')}"
+        accepted_before_cap = len(seen)
+        total = accepted_before_cap + rejected_count
+        success_content = (
+            f"{success_content} "
+            f"{_format_rejection_feedback(f'{rejected_count} of {total}')}"
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/agent_tools.py` around lines 1592 - 1594, The rejection-ratio total
is computed using accepted after max-facts truncation (normalized_peer_card),
causing undercounting; capture the pre-truncation accepted count (e.g., store
accepted_before_truncation = len(peer_card) or len(original_peer_card) before
you call normalization) and use that to compute total (total =
accepted_before_truncation + rejected_count) when building success_content and
calling _format_rejection_feedback so the "rejected X of Y" message reflects
submitted entries rather than the truncated set.

@VVoruganti VVoruganti merged commit 4f579d5 into main May 21, 2026
10 checks passed
@VVoruganti VVoruganti deleted the vineeth/dev-1737-peer-card-identity branch May 21, 2026 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants