fix: reframe peer card prompts as stable identity markers by VVoruganti · Pull Request #686 · plastic-labs/honcho

VVoruganti · 2026-05-14T22:15:42Z

Summary by CodeRabbit

Improvements
- Stricter peer-card rules and validation: allowed marker types, per-entry length cap, normalized deduplication, and clearer rejection feedback when entries are invalid.
- Role-based peer-card behavior: one specialist type can no longer write to the peer card; another uses tightened identity-marker framing when writing.
- OpenAI backend now accepts a reasoning token budget and forwards it to providers.
Bug Fixes
- Anthropic backend no longer errors on an unsupported thinking parameter (it is ignored).

coderabbitai · 2026-05-14T22:15:48Z

Walkthrough

This PR adds specialist-level peer-card write permissions, enforces structural peer-card entry rules (allowed prefixes, per-entry length cap, deduplication), disables peer-card writes for induction specialists, updates update_peer_card tooling/handler behavior and tests, and changes LLM backends to discard Anthropic thinking_effort and forward OpenAI thinking_budget_tokens via extra_body.reasoning.

Changes

Peer-Card Access Control and Structural Validation

Layer / File(s)	Summary
Permission gate infrastructure `src/dreamer/specialists.py`	`BaseSpecialist` adds `can_update_peer_card` and updates peer-card instruction text; `BaseSpecialist.run()` gates peer-card fetching/injection on both the permission flag and `peer_card.create` config.
Deduction specialist peer-card rules `src/dreamer/specialists.py`	`DeductionSpecialist` updates `peer_card_update_instruction` and rewrites the PEER CARD system prompt to enumerate allowed prefixes and strict stability/evidence/type rules.
Induction specialist peer-card disabling `src/dreamer/specialists.py`	`InductionSpecialist` sets `can_update_peer_card = False`, removes peer-card interpolation from system/user prompts, and ensures induction tools never include `update_peer_card`.
Structural validation infrastructure `src/utils/agent_tools.py`	Adds `PEER_CARD_ALLOWED_PREFIXES`, `MAX_PEER_CARD_ENTRY_LENGTH = 200`, and `_validate_peer_card_entry` to enforce prefix, non-empty body, and length cap.
Update handler and agent tool wiring `src/utils/agent_tools.py`	Tightens `update_peer_card` description/schema, removes it from `INDUCTION_SPECIALIST_TOOLS`, validates and normalizes input lines, deduplicates case-insensitively, samples/logs rejections, returns explicit rejection when all entries fail, and includes `rejected_count` in metadata.
Peer-card behavior test coverage `tests/dreamer/test_model_config_usage.py`, `tests/utils/test_agent_tools.py`	Tests added/updated to assert deduction prompt framing and conditional tool exposure, induction prompt/tool absence and `can_update_peer_card = False`, and extensive validation/deduplication/length-cap behaviour and rejection metadata.

LLM Backend Thinking Parameter Handling

Layer / File(s)	Summary
Anthropic thinking_effort handling `src/llm/backends/anthropic.py`, `tests/llm/test_backends/test_anthropic.py`	`AnthropicBackend.complete` and `.stream` now discard `thinking_effort` instead of raising; tests assert the client request omits `thinking`/`reasoning_effort`.
OpenAI thinking_budget_tokens support `src/llm/backends/openai.py`, `tests/llm/test_backends/test_openai.py`	`OpenAIBackend.complete` and `.stream` accept `thinking_budget_tokens`; `_build_params` now accepts it and sets `extra_body.reasoning.max_tokens` when budget > 0. Tests verify presence/absence of `extra_body` based on budget.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit hops through peer-card gates,
It checks the prefixes, counts the weights.
Induction waits with quivering paws,
Deduction writes the durable laws.
Thinking budgets flow—discard or pass—🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.54% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: reframe peer card prompts as stable identity markers' directly and clearly describes the main change—refocusing peer card prompting around stable identity markers—which is the central theme across multiple files in this PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch vineeth/dev-1737-peer-card-identity

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

… openai transports

…-identity

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

tests/llm/test_backends/test_openai.py (2)

156-175: ⚡ Quick win

Use conftest fixtures for repeated mock setup in new tests.

Both new tests duplicate local client mock construction; please reuse shared fixtures from tests/conftest.py.

As per coding guidelines, tests/**/*.py: Use pytest fixtures from tests/conftest.py for test setup and teardown.

Also applies to: 194-213

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_openai.py` around lines 156 - 175, Replace the
duplicated local Mock/AsyncMock setup in the test (the `client` variable and its
`chat.completions.create` AsyncMock) with the shared pytest fixture defined in
tests/conftest.py (use the existing OpenAI/client mock fixture name from
conftest), and update the other duplicated block (lines around the second
occurrence) to also consume that fixture instead of constructing the mock
locally; ensure the test function signature accepts the fixture and the test
uses the fixture's mock to set return_value/side_effect as needed.

155-227: ⚡ Quick win

Add streaming coverage for thinking_budget_tokens.

stream() was updated in parallel with complete(), but these new tests only assert complete-path behavior. Add stream tests for both positive and zero budgets.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_openai.py` around lines 155 - 227, Add
streaming-path tests mirroring the existing complete-path tests: create two new
async tests (e.g.,
test_openai_backend_passes_thinking_budget_via_extra_body_stream and
test_openai_backend_skips_extra_body_when_thinking_budget_zero_stream) that
instantiate OpenAIBackend with a mocked client and call backend.stream(...) with
thinking_budget_tokens set to a positive value and zero respectively; mock
client.chat.completions.stream to return an async iterator or AsyncMock and
after awaiting the stream initiation inspect
client.chat.completions.stream.await_args.kwargs (or equivalent) to assert that
when thinking_budget_tokens>0 the kwargs include extra_body {"reasoning":
{"max_tokens": N}} and when zero the "extra_body" key is absent, following the
same structure and assertions used in the complete tests.

tests/llm/test_backends/test_anthropic.py (2)

126-154: ⚡ Quick win

Cover the stream() path for ignored thinking_effort.

This new test validates complete(), but the backend change also affects stream(). Add a matching stream assertion to prevent path drift.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_anthropic.py` around lines 126 - 154, The test
only asserts that AnthropicBackend.complete() doesn't pass thinking_effort to
the client but misses the stream path; update
test_anthropic_backend_ignores_thinking_effort to also invoke
backend.stream(...) (matching the same args as complete) and capture the mock
call for the streaming client path (e.g., client.messages.stream or
client.stream depending on the mock setup used by AnthropicBackend) then assert
that the stream call's kwargs do not include "thinking" or "reasoning_effort";
ensure you reference AnthropicBackend.stream and reuse the same mock client used
for complete so both paths are covered.

127-139: ⚡ Quick win

Use shared pytest fixtures for backend/client setup.

The new test builds local mock setup inline; please switch to existing tests/conftest.py fixtures for consistency and teardown hygiene.

As per coding guidelines, tests/**/*.py: Use pytest fixtures from tests/conftest.py for test setup and teardown.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_anthropic.py` around lines 127 - 139, Replace
the inline Mock/AsyncMock setup in tests/llm/test_backends/test_anthropic.py
(the local client and client.messages.create AsyncMock returning a
SimpleNamespace with TextBlock/usage/stop_reason) with the shared pytest fixture
from tests/conftest.py that provides a mocked Anthropic client; update the test
function signature to accept that fixture, configure the fixture's
messages.create return_value or side_effect as needed, and remove the local
Mock/AsyncMock assignment and any manual teardown so the shared fixture handles
setup/teardown.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/agent_tools.py`:
- Around line 1483-1486: Replace the sensitive content logging in the
logger.info call that currently emits the peer-card snippet (logger.info(...,
line[:80])) so it no longer logs user content; instead log non-sensitive
attributes like the rejection reason, the line length (len(line)), and whether
the prefix matched/was allowed (e.g., prefix or prefix_allowed flag). Locate the
logger.info usage around variable line in agent_tools.py (the current call using
"%r" and line[:80]) and change the message to include only those structural
details (reason, length, prefix status) while removing any direct or sliced
content of line.

---

Nitpick comments:
In `@tests/llm/test_backends/test_anthropic.py`:
- Around line 126-154: The test only asserts that AnthropicBackend.complete()
doesn't pass thinking_effort to the client but misses the stream path; update
test_anthropic_backend_ignores_thinking_effort to also invoke
backend.stream(...) (matching the same args as complete) and capture the mock
call for the streaming client path (e.g., client.messages.stream or
client.stream depending on the mock setup used by AnthropicBackend) then assert
that the stream call's kwargs do not include "thinking" or "reasoning_effort";
ensure you reference AnthropicBackend.stream and reuse the same mock client used
for complete so both paths are covered.
- Around line 127-139: Replace the inline Mock/AsyncMock setup in
tests/llm/test_backends/test_anthropic.py (the local client and
client.messages.create AsyncMock returning a SimpleNamespace with
TextBlock/usage/stop_reason) with the shared pytest fixture from
tests/conftest.py that provides a mocked Anthropic client; update the test
function signature to accept that fixture, configure the fixture's
messages.create return_value or side_effect as needed, and remove the local
Mock/AsyncMock assignment and any manual teardown so the shared fixture handles
setup/teardown.

In `@tests/llm/test_backends/test_openai.py`:
- Around line 156-175: Replace the duplicated local Mock/AsyncMock setup in the
test (the `client` variable and its `chat.completions.create` AsyncMock) with
the shared pytest fixture defined in tests/conftest.py (use the existing
OpenAI/client mock fixture name from conftest), and update the other duplicated
block (lines around the second occurrence) to also consume that fixture instead
of constructing the mock locally; ensure the test function signature accepts the
fixture and the test uses the fixture's mock to set return_value/side_effect as
needed.
- Around line 155-227: Add streaming-path tests mirroring the existing
complete-path tests: create two new async tests (e.g.,
test_openai_backend_passes_thinking_budget_via_extra_body_stream and
test_openai_backend_skips_extra_body_when_thinking_budget_zero_stream) that
instantiate OpenAIBackend with a mocked client and call backend.stream(...) with
thinking_budget_tokens set to a positive value and zero respectively; mock
client.chat.completions.stream to return an async iterator or AsyncMock and
after awaiting the stream initiation inspect
client.chat.completions.stream.await_args.kwargs (or equivalent) to assert that
when thinking_budget_tokens>0 the kwargs include extra_body {"reasoning":
{"max_tokens": N}} and when zero the "extra_body" key is absent, following the
same structure and assertions used in the complete tests.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 818c7c72-91b7-41f6-96a9-3da08528acfa

📥 Commits

Reviewing files that changed from the base of the PR and between b5f24a6 and 39951d8.

📒 Files selected for processing (8)

src/dreamer/specialists.py
src/llm/backends/anthropic.py
src/llm/backends/openai.py
src/utils/agent_tools.py
tests/dreamer/test_model_config_usage.py
tests/llm/test_backends/test_anthropic.py
tests/llm/test_backends/test_openai.py
tests/utils/test_agent_tools.py

coderabbitai · 2026-05-21T14:53:05Z

+            logger.info(
+                "Rejecting peer card entry (no allowed prefix, empty body, or over length cap): %r",
+                line[:80],
+            )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid logging peer-card content snippets.

Line 1483 logs rejected peer-card text content (%r, line[:80]), which can leak sensitive identity data into logs. Log structural reason/length/prefix instead of user content.

Proposed fix

- logger.info( - "Rejecting peer card entry (no allowed prefix, empty body, or over length cap): %r", - line[:80], - ) + logger.info( + "Rejecting peer card entry (invalid structure) for %s/%s/%s: len=%d", + ctx.workspace_name, + ctx.observer, + ctx.observed, + len(line), + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

logger.info(

"Rejecting peer card entry (no allowed prefix, empty body, or over length cap): %r",

line[:80],

)

logger.info(

"Rejecting peer card entry (invalid structure) for %s/%s/%s: len=%d",

ctx.workspace_name,

ctx.observer,

ctx.observed,

len(line),

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agent_tools.py` around lines 1483 - 1486, Replace the sensitive content logging in the logger.info call that currently emits the peer-card snippet (logger.info(..., line[:80])) so it no longer logs user content; instead log non-sensitive attributes like the rejection reason, the line length (len(line)), and whether the prefix matched/was allowed (e.g., prefix or prefix_allowed flag). Locate the logger.info usage around variable line in agent_tools.py (the current call using "%r" and line[:80]) and change the message to include only those structural details (reason, length, prefix status) while removing any direct or sliced content of line.

Rajat-Ahuja1997

generally looks good but i'm concerned about valid information in an existing peer card being dropped because it doesn't have one of the required prefixes. should we add a section to the system prompt or deduction user prompt that tells the model to re-emit existing entries with a correct prefix? like name: Rajat to IDENTITY: Name: Rajat?

…prompt

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/agent_tools.py`:
- Around line 1592-1594: The rejection-ratio total is computed using accepted
after max-facts truncation (normalized_peer_card), causing undercounting;
capture the pre-truncation accepted count (e.g., store
accepted_before_truncation = len(peer_card) or len(original_peer_card) before
you call normalization) and use that to compute total (total =
accepted_before_truncation + rejected_count) when building success_content and
calling _format_rejection_feedback so the "rejected X of Y" message reflects
submitted entries rather than the truncated set.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ef10b2b6-efa1-465f-8ad6-d83682b1fa98

📥 Commits

Reviewing files that changed from the base of the PR and between 39951d8 and 004f417.

📒 Files selected for processing (3)

src/dreamer/specialists.py
src/utils/agent_tools.py
tests/utils/test_agent_tools.py

coderabbitai · 2026-05-21T17:22:34Z

+        accepted = len(normalized_peer_card)
+        total = accepted + rejected_count
+        success_content = f"{success_content} {_format_rejection_feedback(f'{rejected_count} of {total}')}"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use pre-truncation accepted count for rejection ratio.

total = accepted + rejected_count uses accepted after max-facts truncation, so the “rejected X of Y” message can undercount total submitted entries.

Suggested fix

- accepted = len(normalized_peer_card) - total = accepted + rejected_count - success_content = f"{success_content} {_format_rejection_feedback(f'{rejected_count} of {total}')}" + accepted_before_cap = len(seen) + total = accepted_before_cap + rejected_count + success_content = ( + f"{success_content} " + f"{_format_rejection_feedback(f'{rejected_count} of {total}')}" + )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agent_tools.py` around lines 1592 - 1594, The rejection-ratio total is computed using accepted after max-facts truncation (normalized_peer_card), causing undercounting; capture the pre-truncation accepted count (e.g., store accepted_before_truncation = len(peer_card) or len(original_peer_card) before you call normalization) and use that to compute total (total = accepted_before_truncation + rejected_count) when building success_content and calling _format_rejection_feedback so the "rejected X of Y" message reflects submitted entries rather than the truncated set.

fix: reframe peer card prompts as stable identity markers

e0e4f1c

VVoruganti added 2 commits May 20, 2026 12:54

fix: remove strict parameter validation for thinking on anthropic and…

6a1bf2f

… openai transports

Merge remote-tracking branch 'origin' into vineeth/dev-1737-peer-card…

39951d8

…-identity

VVoruganti marked this pull request as ready for review May 21, 2026 14:47

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Rajat-Ahuja1997 reviewed May 21, 2026

View reviewed changes

fix(dreamer): Add backwards compatability instructions for peer card …

004f417

…prompt

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Rajat-Ahuja1997 approved these changes May 21, 2026

View reviewed changes

VVoruganti merged commit 4f579d5 into main May 21, 2026
10 checks passed

VVoruganti deleted the vineeth/dev-1737-peer-card-identity branch May 21, 2026 17:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: reframe peer card prompts as stable identity markers#686

fix: reframe peer card prompts as stable identity markers#686
VVoruganti merged 4 commits into
mainfrom
vineeth/dev-1737-peer-card-identity

VVoruganti commented May 14, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 21, 2026

Uh oh!

Rajat-Ahuja1997 left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-            logger.info(
-                "Rejecting peer card entry (no allowed prefix, empty body, or over length cap): %r",
-                line[:80],
-            )
+            logger.info(
+                "Rejecting peer card entry (invalid structure) for %s/%s/%s: len=%d",
+                ctx.workspace_name,
+                ctx.observer,
+                ctx.observed,
+                len(line),
+            )

Conversation

VVoruganti commented May 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Rajat-Ahuja1997 left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

VVoruganti commented May 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading