Proj/channel protocol (POC)#686
Conversation
…rser Lands the M1 keystone task: element vocabulary, per-element validators, data-driven registry, and HTML parser wrapper. No behavior change yet — no consumer wires the registry until T3 (curate path) and T4 (query path). Element vocabulary (5 elements): - bv-topic — root container; required `path`, optional importance/maturity/ recency/updatedat (lowercase per HTML5 normalization) - bv-rule — rule statements (severity, id) - bv-decision — decision records (id) - bv-bug — bug runbook entries (id, severity) - bv-fix — fix runbook entries (id) Architecture: - Data-driven `ELEMENT_REGISTRY` is the production-track guardrail. M2 vocabulary expansion is purely additive — new elements register via one entry + one schema/validator pair, no consumer code changes. - `makeAttributeValidator` factory eliminates per-element boilerplate; reduces M2's 12 additional element validators to one-line bindings. - HTML parser is parser-library-agnostic at the API boundary (`DocumentNode` / `ElementNode` / `TextNode`). parse5 used internally; consumers don't see parse5 types. - Attribute-case normalization is documented in `ElementNode` JSDoc: HTML5 lowercases attribute names at parse time, so schema keys must match parser output (lowercase), not source HTML (camelCase). Tests (106 new): - 5 per-element validator suites (61 cases total) — each ≥10 cases - Registry shape + wiring + metadata (14 cases) - HTML parser: basic parsing, malformed-input handling, walkElements, getInnerText, serializeHtml round-trip (22 cases) - End-to-end round-trip on a sample fixture covering all 5 elements (9 cases) Verification: - typecheck clean - lint 0 errors (242 warnings, all pre-existing baseline) - 7715 tests passing, 0 failing - build clean Adds parse5@^8.0.1 as a runtime dep — the W3C-spec HTML parser used by jsdom; types ship in-tree (no @types/parse5 needed). Justified vs hand- rolled parsing because robust HTML tag-soup handling is non-trivial and this is the standard library for the job.
- bv-topic schema: tighten recency regex to ^\d+(\.\d+)?$ (rejects '.', '..1', '1..2' which previously slipped past the regex), drop redundant Number.isFinite for style consistency with importance, accept timezone offsets in updatedat via {offset: true}
- html-parser: replace `as unknown as Parse5DocumentFragment` cast with parse5 defaultTreeAdapter factories (createDocumentFragment, createElement, createTextNode, appendChild) — no more structural casts in the serialize path
- html-parser: getInnerText now inserts a separator between sibling element children so compact emit (<p>foo.</p><p>bar.</p>) does not merge tokens for BM25
- html-parser: document <template> content limitation (parse5 places template children in .content, not childNodes)
- schema docstrings + test names: replace 'warn-only behaviour' with 'parse-and-skip' (no warning is actually emitted in M1)
- sample-topic round-trip: add explicit case-normalization regression test for updatedAt → updatedat
feat: [ENG-2737] HTML render layer foundation — element registry + parser
Rewrite the canonical curate tool description (loaded by ToolDescriptionLoader and presented to the agent on every curate operation) from the structured JSON-args shape (rawConcept/narrative/facts/etc.) to the M1 HTML-emission contract using the closed `<bv-*>` vocabulary defined by T1. Coverage: - All 5 elements (bv-topic, bv-rule, bv-decision, bv-bug, bv-fix) with full per-element attribute schemas: required/optional, value enums, format constraints (importance integer 0-100, recency numeric 0-1, ISO-8601 updatedat, severity enums per element, etc.). - Allowed-children semantics (any/inline/block) + pairing convention (bug+fix as siblings). - Path format (slash-separated snake_case) + domain guidelines. - Output contract: HTML only, exactly one bv-topic root, no preamble/fences/ commentary, lowercase attribute names per HTML5 spec, closed vocabulary, no invented attributes, no clarifying questions. - Three worked examples (bug+fix runbook, rule+decision pair, general topic) to anchor the model's element-pairing and id-naming conventions. Add a sanity test (test/unit/server/infra/render/curate-prompt.test.ts) that loads curate.txt and asserts every registered ELEMENT_NAME, every per-element enum value, and every output-contract directive is present. Guards against silent drift when M2 expands the vocabulary or refactors attribute schemas. Authoring fluency check (M1 T2 spike, separate harness in local-auto-test/curate-fluency/, not in this repo): 20/20 generated outputs validated against the M1 element registry on Sonnet 4.5. Gate cleared on first full run after one prompt-tuning iteration. Decision: proceed to T3. Full report: research repo at features/html-memory-conversion/milestones/01-experiment/fluency-report.md
Field-by-field review of curate.txt against the actual rendered .md
context files (frontmatter + ## Reason + ## Raw Concept + ## Narrative
+ ## Facts) surfaced two issues with the initial T2 draft:
1. <bv-topic> carried runtime-signal attributes (importance, maturity,
recency, updatedat) that were explicitly migrated to a sidecar store
under research/features/runtime-signals/ (per-user, per-machine state
that caused noisy `vc status` diffs and team-share conflicts). Drop
them from the schema; replace with the actual frontmatter fields the
markdown writer renders: title (required), summary, tags, keywords,
related.
2. The 5-element vocabulary covered Rules + three M1 net-new elements
(Decision/Bug/Fix) but left every other rendered MD section as
free-form HTML — losing the structural mapping the writer needs.
Expand M1 vocabulary to 16 elements (added 11), each mapping to a
specific section in the rendered .md output:
- bv-reason → ## Reason
- bv-task, → ## Raw Concept (Task / Changes / Files / Flow)
bv-changes,
bv-files,
bv-flow
- bv-structure, → ## Narrative (Structure / Dependencies /
bv-dependencies, Highlights / Examples / Diagrams);
bv-highlights, bv-rule already covered Rules
bv-examples,
bv-diagram (with type + title attrs for verbatim diagram preservation)
- bv-fact (subject/category/value attrs) → ## Facts list
Each element follows the existing makeAttributeValidator pattern; the
data-driven registry stays the single source of truth.
Tests:
- Update bv-topic test to drop runtime-signal cases and assert the
new required-attributes (path + title); add a regression case
asserting passthrough still tolerates legacy importance/maturity/
recency without enforcing them.
- Add registry test asserting (a) bv-topic exposes the new
optionalAttributes, (b) runtime signals are NOT in the registry
metadata, (c) every registered validator accepts its own empty node
+ rejects mismatched-tag.
- Consolidate the 9 attribute-free new elements into one shared
text-only-elements.test.ts (same trivial schema; per-element
duplication wouldn't add coverage).
- Add dedicated bv-diagram.test.ts (type enum) and bv-fact.test.ts
(category enum, subject/value pass-through).
- Update sample-topic.html fixture to exercise every new element +
the new frontmatter attributes; assert no runtime signals leak in.
- Update curate-prompt.test.ts to assert the expanded vocabulary,
category/type enums for bv-fact/bv-diagram, and that the prompt
explicitly excludes runtime signals from bv-topic attributes.
Re-run the M1 T2 fluency check on the same 20 fixtures with the new
vocabulary: 20 / 20 valid (gate cleared); 100% cohort-appropriate
placement (bug+fix → bv-bug+bv-fix; rule+decision → bv-rule+bv-decision;
general → bv-reason); 122 bv-fact instances extracted across the run
(~6 per fixture). Fluency report updated:
research/features/html-memory-conversion/milestones/01-experiment/fluency-report.md
Open contract violation persists: 18/20 outputs (90%) wrap in code
fences despite explicit prompt instruction. Recommendation for T3
unchanged — strip in the response parser, don't iterate the prompt.
Second writer audit (markdown-writer.ts:270-354) found 3 rawConcept sub-fields without an HTML home: rawConcept.timestamp, rawConcept.author, rawConcept.patterns. Add the missing elements so the M1 vocabulary covers every section the writer renders. Vocabulary additions (16 → 19 elements): - <bv-timestamp> — text-only; renders as `**Timestamp:**` under `## Raw Concept`. Distinct from frontmatter createdAt/updatedAt (system-set) — this is the date the concept's data represents. - <bv-author> — text-only; renders as `**Author:**`. - <bv-pattern> — structured (flags + description as attrs, pattern as text content). Multiple siblings inside <bv-topic> collected into the `**Patterns:**` bullet list. Mirrors how <bv-fact> works. Sanitiser: - stripCodeFenceWrapper(html) added to html-parser.ts. Strips a single outer ``` ... ``` (any language tag) wrapper from the LLM response. Inner fences (e.g., <pre><code> blocks inside <bv-diagram>) survive. T3's curate executor calls this before parseHtml. Defensive sanitisation generalises better than chasing the model's quirk via prompt iteration — observed fence-wrap rate is 70% on Sonnet 4.5 even with explicit prompt instruction to the contrary. Tests: - bv-pattern dedicated test (bv-pattern.test.ts) for the structured attrs. - text-only-elements.test.ts extended for bv-timestamp + bv-author (same trivial schema as the other text-only elements). - html-parser.test.ts: 7 new cases for stripCodeFenceWrapper covering ```html, ```xml, no-language ```, leading/trailing whitespace, unwrapped passthrough, mismatched-fence passthrough, inner-fence preservation, and parse-after-strip end-to-end. - sample-topic-roundtrip.test.ts: renderable-MD coverage assertion extended to require all 15 dedicated body-section elements (was 12). - sample-topic.html fixture exercises the 3 new elements. Re-ran the M1 T2 fluency check on the same 20 fixtures with the 19-element vocabulary. Results: - 20 / 20 valid (gate cleared) - 100% cohort-appropriate placement (unchanged) - bv-timestamp 60% adoption (used where input has a clear date) - bv-author 10% (used only on inputs that name an owner) - bv-pattern 0% — model correctly did not over-emit; no fixture contained regex content. Restraint is good behaviour. - Cost: $0.61 (vs. $0.60 for the 16-element draft; negligible) Updated fluency-report.md in research repo with final coverage matrix and the 19-element → markdown-writer mapping for T3.
…Query/CurateLogEntry Add per-call token usage (input, output, cached, cache-creation), latency tiers (searchMs, llmMs, totalMs), and format mode (markdown vs html) to QueryLogEntry and CurateLogEntry. Token rollup uses a per-task TaskUsageAggregator subscribed to llmservice:usage events on the agent bus; LoggingContentGenerator emits the events on both streaming and non-streaming paths. AiSdkContentGenerator folds result.usage and providerMetadata.anthropic.cacheCreationInputTokens into a synthesized rawResponse so per-provider extraction lands consistently across Anthropic, OpenAI, and Google. Live-validated against 30 curates × 3 providers; all four token fields populate end-to-end with cacheCreationTokens correctly Anthropic-only.
[ENG-2738] HTML-emission contract in curate.txt + fluency check
- core/interfaces no longer reaches into infra: CurateUsageRecord moved to core/domain/entities/curate-log-entry, IUsageAggregator added in core/interfaces/telemetry/, TaskUsageAggregator now `implements` it. - Consolidate curate telemetry assembly inside CurateExecutor: runAgentBody invokes reportTelemetry on both happy and error paths; agent-process passes onTelemetry that forwards via task:curateResult. Removes the duplicated inline assembly in agent-process. - QueryExecutor.buildTiming prefers TaskUsageAggregator.getLlmMs() over the executeOnSession wall-clock so llmMs counts only LLM-call time, not tool-execution overhead. Falls back to wall-clock for tests. - CurateLogHandler error finalization merges telemetryFields(state.usage) so failed curates don't underreport cost when the executor's best-effort error-path reportTelemetry has populated state.usage. - OOD short-circuit in QueryExecutor routes through formatDetector.detect([]) instead of returning format: undefined, so future HTML-aware detectors don't get silently bypassed. - DRY: extract makeUsageListener helper in agent-process and reuse for both curate and query handlers. - Document that PROVIDER_TYPES iteration in LoggingContentGenerator only matches 'aiSdk' in production (all live traffic flows through AiSdkContentGenerator).
…ocation Re-review flagged the gap that error-path reportTelemetry being silent could mask a future regression where failed-curate telemetry disappears from disk. Pin both directions of the contract: - happy path: onTelemetry fires once before runAgentBody returns - error path: onTelemetry fires once before the throw propagates so the daemon can still emit task:curateResult ahead of task:error
feat: telemetry producer for QueryLogEntry + CurateLogEntry
…or wiring Wires the M1 HTML context-tree experiment into the curate flow. When the new BrvConfig.useHtmlContextTree flag is enabled, curate emits a <bv-topic> HTML document using T1's element registry (19 elements after T2's expansion) and atomically writes it to .brv/context-tree/<path>.html. When the flag is off (the default), the existing markdown curate path runs unchanged. Rolled-in T2 review fixes (PR #628 auto-review): - curate.txt: split the "Standard HTML inside <bv-*>" clause per allowed-children semantics. Inline-only subset (code/strong/em) for bv-rule, bv-task, bv-flow, bv-fact, bv-pattern, bv-timestamp, bv-author. Full block subset for bv-topic and the block-content elements. Removes the M1-passthrough/M2-strict drift the reviewer flagged. - curate.txt: prefix the examples block with a disclaimer that the fences are for prompt readability only and the agent's actual output must be bare HTML. Lower-cost mitigation for the model's stubborn fence-wrap habit (T2's stripCodeFenceWrapper still handles the remaining cases). - curate-prompt.test.ts: enum-value tests now anchor to the section after each element header instead of whole-file string match. Catches "severity moved from bv-bug to bv-decision" drift. - curate-prompt.test.ts: new test that parses every fenced example block from curate.txt with parseHtml and runs each <bv-*> element through its registered validator. Catches example typos (severity="hihg") AND vocabulary drift in one assertion. T3 proper: BrvConfig: - New optional boolean field useHtmlContextTree (default undefined == false). Plumbed through the JSON validator, withReviewDisabled, withVersion, and withSpace withers. withoutSpace already used spread, so it picks up the new field automatically. 8 new BrvConfig unit tests cover round-trip + wither preservation. format-detector (src/server/infra/render/format/format-detector.ts): - Two pure functions: getFormatForWrite(config) reads the flag, getFormatForRead(filePath) inspects the extension. Default markdown. 12 unit tests cover both branches plus edge cases (case-insensitive extensions, directory-named-foo.html with .md inside, etc.). html-writer (src/server/infra/render/writer/html-writer.ts): - Public surface: validateHtmlTopic(html) (pure validation, no I/O) and writeHtmlTopic({contextTreeRoot, rawHtml}) (validation + atomic write). Pipeline: 1. stripCodeFenceWrapper (T2-provided) on raw input. 2. parseHtml + walkElements; require exactly one <bv-topic> and a non-empty path attribute. 3. Run every typed bv-* element through its registered validator. Reject on any failure with structured errors. 4. Atomically write via DirectoryManager.writeFileAtomic to <contextTreeRoot>/<path>.html. - Path-traversal hardening: backslashes normalised, `..` segments rejected, leading slashes stripped, resolved path must stay under contextTreeRoot. 15 unit tests cover validity branches, fence variants, atomic-write success/failure, and traversal attempts. curate executor wiring: - CurateExecuteOptions gains useHtmlContextTree?: boolean. - runAgentBody branches the runtime prompt's final-step instruction: HTML mode tells the agent its FINAL RESPONSE is the bv-topic document and explicitly forbids tools.curate calls (which would write a sibling .md file). MD mode keeps the existing "verify via result.applied[].filePath" instruction unchanged. - After agent.executeOnSession returns, HTML mode routes through a new handleHtmlCurateResponse() that calls writeHtmlTopic and shapes a CurationStatus from the result (success/failed). MD mode continues to call parseCurationStatus. - 6 new unit tests cover: valid topic written, fence-wrapped response stripped, missing bv-topic → failed, invalid attribute → failed, flag=false uses MD path, flag=undefined uses MD path. agent-process plumbing: - The daemon's curate dispatcher reads cachedBrvConfig.useHtmlContextTree and forwards it on every CurateExecuteOptions call. Default false remains production-safe; the bench's Configuration B sets it to true via .brv/config.json. Tests: 7892 pass, 0 fail. Typecheck clean, lint clean, build clean. Out of scope (per the M1 plan): - Storage path migration to .brv/render/ — M2. - IR (ComponentIR) compilation — M2. - Multi-tier query / structural selectors — T4. - to-injection/to-evidence/to-summary/to-md adapters — M2. - Migration tool for existing .brv/context-tree/ — M2.
HTML is the curate format; there is no MD/HTML branching to control. Removing the flag simplifies the executor (one prompt, one response handler), the BrvConfig schema (no per-project toggle), and the format-detector (no write-side decision — only read-side dispatch remains for legacy MD files). Code changes: - BrvConfig: remove the optional `useHtmlContextTree` field, the JSON validator branch, all wither plumbing, and the dedicated test block. - ICurateExecutor: remove `useHtmlContextTree` from CurateExecuteOptions. - CurateExecutor: collapse the prompt branch — every curate runs the HTML-emission instruction. Always route the agent's response through `handleHtmlCurateResponse`. Remove the dead `parseCurationStatus` method (the JSON-status-block path is gone). - agent-process: drop the `useHtmlContextTree` plumbing on the curate dispatch site. - format-detector: drop `getFormatForWrite` (no decision to make); keep `getFormatForRead` so the query/search read path can still route legacy `.md` files transparently. Drop the BrvConfig import. Test changes: - BrvConfig: drop the 8-case `useHtmlContextTree` describe block. - format-detector: drop the 4 `getFormatForWrite` cases. - curate-executor-html-mode: drop the two MD-fallback cases (no flag to toggle); the file is now strictly HTML-emission tests. Comment cleanup (separate concern, same commit): Strip milestone scaffolding (M1, M2, T1–T7) from comments across the render layer. Comments now describe logic and decisions, not the project plan that scaffolded them. Specifically: - "M1 light validation; ADR-007 §13 strict validation is M2" → "Light validation; strict validation per ADR-007 §13 is future work" - "M1 element registry"/"M1 vocabulary" → "element registry"/"closed bv-* vocabulary" - "T3 wires the writer", "T4 owns the index", etc. → describe the consumer by role, not by milestone task - "M1 keeps the existing .brv/context-tree/ storage" → describe what the resolver encodes, not which milestone made the decision - Test descriptions: "(parse-and-skip — M1 light validation)" → "(parse-and-skip — light validation)" Pipeline: - 7878 tests pass (lost 14 with the dropped flag — 8 BrvConfig + 4 format-detector write + 2 executor MD-fallback). - Typecheck clean, lint clean, build clean.
End-to-end smoke testing surfaced a gap: the executor's runtime prompt
told the agent to emit HTML as its final response, but the base system
prompt (system-prompt.yml) had ~500 lines of detailed guidance for the
old structured-args curate flow (UPSERT/ADD/UPDATE/MERGE/DELETE with
rawConcept/narrative/facts content shape). The system prompt's
specifics overrode the runtime instruction; the agent kept building
JS data structures via code_exec and never emitted HTML, so the writer
saw nothing to validate and no .html file landed.
Rewrite the curate-related sections of system-prompt.yml to match the
HTML-emission contract:
- Update <TOOL_QUICK_REFERENCE>, <WORKFLOW_SUMMARY>,
<PATH_TITLE_CLARIFICATION>, <PROHIBITED_BEHAVIORS>, <DIRECTORIES>,
and the <SANDBOX_PROGRAMMING> curate example to instruct HTML
emission. The agent's final response IS the bv-topic document; do
NOT call tools.curate; do NOT wrap in code fences.
- Replace the entire ## Curation Workflow section (~500 lines, lines
275–786) with a tighter HTML-focused workflow (~217 lines):
- Keep the recon → mapExtract → dedup → groupBySubject preprocessing
helpers (still useful for chunked/large inputs).
- Document the bv-topic frontmatter (path/title required; summary,
tags, keywords, related optional) and explicitly call out that
importance/maturity/recency/updatedat are sidecar runtime signals,
not authored.
- Map MD sections to bv-* elements (Reason → bv-reason, Raw Concept
sub-fields → bv-task/bv-changes/bv-files/bv-flow/bv-timestamp/
bv-author/bv-pattern, Narrative subsections → bv-structure/
bv-dependencies/bv-highlights/bv-rule/bv-examples/bv-diagram,
Facts → bv-fact, runbooks → bv-decision/bv-bug/bv-fix).
- Document inline-content vs block-content elements and the standard
HTML allowed inside each.
- Keep facts extraction guidance, content quality requirements,
temporal preservation, and content preservation rules — translated
to the bv-* element vocabulary.
- Drop the operation-type table (UPSERT/ADD/UPDATE/MERGE/DELETE),
Two-Part Context Model section, domainContext/topicContext/
subtopicContext (these belonged to the structured-args flow that
no longer exists).
- Update the query-side temporal-reasoning hint to reference
bv-timestamp content + sidecar runtime signals instead of
rawConcept/narrative timestamps.
- Update the related-topics search hint to reference the `related`
attribute on bv-topic alongside the legacy `relations` frontmatter
on .md topics.
Net change: 196 insertions, 494 deletions. The rewritten prompt is
~217 lines shorter and aligned with the actual emission contract in
curate.txt.
Verified end-to-end via local-auto-test/curate-html-e2e/ harness:
- 3/3 fixtures (bug-fix, rule-decision, general-context) curate to
registry-valid HTML topic files (Sonnet 4.5).
- File paths land at <projectDir>/.brv/context-tree/<path>.html as
expected.
- Per-fixture latency: 19.3s, 30.5s, 27.3s.
Pipeline: 7878 unit tests pass, lint clean, typecheck clean, build
clean.
…e time
The previous markdown writer rendered `createdAt` and `updatedAt` in
the YAML frontmatter of every topic file. The HTML writer was dropping
both — the LLM doesn't author them and the writer wasn't injecting
them either, so HTML topics had no temporal metadata on disk.
Inject both as `<bv-topic>` attributes during the writer's atomic
write, after fence-stripping and registry validation:
- `updatedat` is always set to the current ISO-8601 instant.
- `createdat` is preserved from the existing file on disk if one
exists at the resolved path; otherwise it is set to the current ISO
instant. Re-writes leave `createdat` alone, advance `updatedat`.
- Any value the LLM authored for these attributes is overridden — the
agent is not allowed to choose its own timestamps.
Implementation: surgical regex edit on the bv-topic opening tag (no
parse → re-serialize round-trip) so the LLM's formatting (whitespace,
attribute order, quoting style) survives intact. Two helpers added to
html-writer.ts:
- setBvTopicAttributes(html, attrs) — replace-or-insert each attr.
- readExistingTopicAttribute(filePath, name) — surgical read of an
attribute value from an existing on-disk topic without parsing the
whole document.
Tests:
- Existing assertions about file content equaling VALID_TOPIC updated
to match-with-injected-timestamps (body preserved verbatim, bv-topic
opening tag now has createdat/updatedat).
- Three new cases under "system-managed timestamps":
1. First write injects both timestamps within the test window.
2. Re-writes preserve createdat, advance updatedat.
3. LLM-authored values for either attribute are overridden by the
system value.
Verified end-to-end via local-auto-test/curate-html-e2e/ — all 3
fixtures produce HTML topic files with system-injected createdat and
updatedat ISO-8601 timestamps on the bv-topic root, alongside the
LLM's frontmatter (path/title/summary/tags/keywords/related).
Pipeline: 18 html-writer unit tests pass (3 new), full suite green,
typecheck clean, lint clean.
CI test failure (curate-executor-html-mode.test.ts:79) — the file-content
assertion still expected the LLM's raw HTML; now matches the
timestamp-injected file by checking the body is preserved verbatim and
that createdat/updatedat are present. Same fix in the fence-stripping
test variant.
Review feedback addressed:
1. Telemetry format hardcoded to 'markdown' (curate-executor.ts:485).
The flag is gone; HTML is the only curate format. Updated to emit
'html' and reworded the doc comment. Updated the regression test in
curate-executor.test.ts:491 to assert 'html' accordingly.
2. validateHtmlTopic does not catch path-traversal (html-writer.ts).
The doc comment promised "safe to use the topic" but a payload like
`<bv-topic path="../../../etc/passwd">` returned `{ok: true}` —
path-traversal was only caught later when topicPathToFilePath threw.
That works end-to-end (executor catches and surfaces as failed
status), but is a foot-gun for any future caller using
validateHtmlTopic standalone (preview, dry-run).
Move the path-segment safety check into validateHtmlTopic and
surface as a new structured error kind 'unsafe-path'.
topicPathToFilePath keeps path.resolve as defence-in-depth. Added
two unit tests covering `..` segments and single-`.` segments;
updated the existing executor-side test to expect a validation
failure (with `unsafe-path` errors) instead of a thrown exception.
3. Attribute-validation errors swallowed the `field` discriminant
(curate-executor.ts:344). The writer captures which attribute
failed (`severity`, `id`, etc.) but the executor's
verification.missing string only included the tag. So a malformed
`severity="urgent"` was logged as `attribute-validation
(bv-rule): …` rather than `attribute-validation (bv-rule.severity):
…`. Updated the mapping to include the field for
attribute-validation errors. Non-blocking but improves bench-log
debuggability.
Reviewed but obsolete (not actioned):
- "format-detector.ts ships an unused getFormatForWrite" — removed in
the flag-drop commit f701cd2.
- "agent-process.ts double-coerces useHtmlContextTree" — also obsolete
(flag dropped).
Pipeline: 7883 unit tests pass, lint clean, typecheck clean, build
clean. E2e smoke (curate-html-e2e/) green on the rebuilt CLI.
[ENG-2739] curate path — HTML writer + format detector + executor wiring
…ch routing
Wires the search/query path to read HTML topic files transparently
alongside markdown ones. Curate emits HTML; the legacy MD path stays
in for `brv swarm` and existing trees that haven't migrated. The BM25
index is format-agnostic (HTML inner-text decodes to the same plain
text the markdown tokenizer expects); the indexer dispatches on file
extension at read time.
New modules:
- src/server/infra/render/reader/html-reader.ts — readHtmlTopicSync(html)
returns {bodyText, elements, topicAttributes}. bodyText is the
entity-decoded inner text, ready to feed BM25. elements is a flat,
document-order list of every typed `<bv-*>` node with its attribute
map. topicAttributes lifts the bv-topic frontmatter (path, title,
summary, tags, keywords, related). Forgiving on malformed input
(parse5 returns a best-effort tree).
- src/server/infra/render/reader/element-axis-index.ts — in-memory
ElementAxisIndex. Two query keys:
findByTag(tag) → paths
findByAttribute(tag, attr, value) → paths
Reverse path→keys map keeps invalidation O(keys-touched) instead of
full-scan. Built in lockstep with the BM25 index; same lifetime,
same invalidation triggers. Today's consumer is the optional
`elementHint` filter on SearchOptions; the structural-selector
grammar will hook in here without further reshaping.
Search service changes (search-knowledge-service.ts):
- File discovery: glob extended from `**/*.md` to `**/*.{html,md}`.
- Indexing: new readIndexableContent() helper dispatches on
getFormatForRead(filePath). HTML files route through readHtmlTopicSync
(the BM25 input is bodyText, not raw markup) and contribute their
element list to the structural-axis index. MD files index their raw
content as before.
- IndexedDocument gains a `format: 'html' | 'markdown'` field, set
from the file extension at index time and surfaced on every
SearchKnowledgeResult entry. Telemetry/query-log consumers
(T5-shipped QueryLogEntry.format) populate from this without
format-detection at the call site.
- CachedIndex gains `elementAxisIndex: ElementAxisIndex`, populated
during buildFreshIndex from the per-document elements walk. The
empty-tree fallback also constructs an empty index so callers can
treat the field as non-nullable.
- SearchOptions gains `elementHint?: ElementHint`. When present, the
search composes a MiniSearch filter that intersects scope +
element-axis matches before BM25 ranking. No caller passes the hint
today; M2 selector grammar is the consumer.
- INDEX_SCHEMA_VERSION bumped to 6 so previously-cached indexes built
against MD-only schema rebuild on next query.
- Title resolution: HTML files use the bv-topic title attribute; MD
files retain the existing extractTitle frontmatter walk.
- The structural-selector hint is also exposed on the public
SearchKnowledgeOptions as `SearchKnowledgeElementHint` — wired
through ToolsSDK so SDK consumers can pass it without going through
the internal SearchOptions surface.
Tests:
- html-reader (9 cases): bodyText extraction, entity decoding,
attribute lift, elements-list document order + de-duplication,
unknown-bv-* skipping, missing bv-topic graceful behaviour,
malformed input non-throw.
- element-axis-index (10 cases): findByTag/findByAttribute, empty-array
for unknown tags, case-sensitivity on values, single-file multiple-
contribution dedup, remove() drops all keys + cleans empty key sets,
remove() on unknown path no-op, clear().
- search-knowledge-service-html-routing (8 cases): HTML files indexed,
format field populated for both HTML and MD, raw markup is NOT
searchable (inner-text only), bv-topic title attribute lifts as
doc title, elementHint pre-filter (no-match returns empty,
tag-only restricts, tag.attr=value restricts further).
Pipeline: 7910 unit tests pass (27 new), typecheck clean, lint clean
(0 errors), build clean.
Out of scope (per the M1 plan):
- Structural selector grammar — M2 (this PR ships the substrate).
- findReferences reverse-graph — M2.
- Signal-precise cache invalidation — mtime-based today is sufficient.
- BM25 ranking parity test (MD vs HTML same content) — M2 bench
measures end-to-end; the routing tests above cover the
determinism boundary.
- HTML BM25 input now concatenates `<bv-topic>` summary/tags/keywords/related
into the indexable content. Without this, a term living only in those
attributes ranked HTML topics far below the markdown equivalent (where
YAML frontmatter is fed verbatim into BM25).
- HTML title fallback uses `?.trim() || fallbackTitle` so a `title=""` or
whitespace-only attribute doesn't leak into BM25's 3x-boosted title field.
- `ElementHint` (service + tools-sdk) becomes a discriminated union so
`attribute` and `value` must be supplied together at compile time.
- `ElementAxisIndex` storage swaps the stringly-keyed
`${tag}.${attr}=${value}` table for nested Maps — eliminates the
`.`/`=`-in-attribute collision class without a delimiter discipline.
- `html-reader` lifts attributes off the FIRST `<bv-topic>` encountered
unconditionally rather than the first non-empty one (matches the
contract on malformed input).
- Glob discovery does two parallel passes (`**/*.html`, `**/*.md`)
instead of relying on the underlying engine to expand `{html,md}`.
Tests: +7 (BM25 attribute-payload parity, title-fallback, FS-backed
reader round-trip, topic-attribute lift contract, attribute-name/value
collision robustness x3). All 7917 pass; lint + typecheck clean.
[ENG-2740] query path — HTML reader + element-axis index + search routing
…sponse
Tier 2 (`brv query`'s direct-search response — no LLM) reads the full
file at `query-executor.ts:701` and ships the bytes back to the user
through `formatDirectResponse`. For HTML topics that's raw
`<bv-topic ...><bv-rule severity="must">...</bv-rule>...` markup.
Two consequences observed during the M1 multi-tier audit:
1. The 5000-char content budget in `direct-search-responder.ts:11`
burns on tags and attribute syntax before reaching meaningful
text. Long topics get truncated mid-element.
2. Any LLM-shaped consumer downstream (Tier 4 agent re-reads, the
bench harness, third-party integrations) re-parses the HTML.
Wasted tokens; lossy semantics.
Stripping every tag (bodyText only) loses the typed-element signals
the format exists to carry — severity on rules, subject/value on
facts, id on decisions. New `renderHtmlTopicForLlm(html)` walks the
parsed tree and renders each `<bv-*>` element with a semantic prefix:
- `<bv-rule severity="must" id="r-1">x</bv-rule>` →
`- **Rule** [must] (r-1): x`
- `<bv-fact subject="alg" value="RS256">…</bv-fact>` →
`- **Fact** (subject=alg, value=RS256): …`
Wired into `tryDirectSearchResponse`: when `result.format === 'html'`
the full content goes through the renderer; markdown topics pass
through verbatim. Renderer is forgiving on malformed input (mirrors
the rest of the parse5-driven reader pipeline); on any throw we fall
back to raw bytes so a single bad topic can't blank the response.
Tests: +15 (12 renderer, 2 Tier 2 HTML branch, 1 Tier 2 markdown
regression guard). 7931 pass; lint + typecheck + build clean.
fix: Tier 2 direct response renders HTML topics into structured markdown
…annel auth Phase 1, Slice 1.0 of the channel-protocol rollout. Prepares the transport server for the daemon-token auth middleware that arrives in Slice 1.4 (CHANNEL_PROTOCOL.md §2; DESIGN.md §5.6). Three connected changes: - Widen TransportServerConfig.corsOrigin from string-only to the full Socket.IO origin union (string | string[] | RegExp | RegExp[] | OriginCallback). Channel auth needs callback and regex shapes for loopback-only origin enforcement; string-only blocked them. - Extend RequestHandler with an optional third ctx parameter exposing handshake auth.token and origin. Backward-compatible: existing handlers continue to match the (data, clientId) shape; new channel handlers consume ctx to enforce CHANNEL_UNAUTHORIZED before dispatching. - Add DaemonTokenStore (readOrCreateDaemonAuthToken) that persists a 256-bit token to <data-dir>/state/daemon-auth-token with mode 0600, regenerating on missing file, wrong perms, or empty contents. Atomic rename for crash safety; POSIX-only perm check documented for v1. 15 new tests across the three layers; 22 pre-existing socket-io-transport-server tests still green. See plan/channel-protocol/IMPLEMENTATION_PHASE_1.md Slice 1.0 for the full scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Post-HTML-migration, every QueryLogEntry on disk reports `format: "markdown"` regardless of whether the recall touched HTML or markdown topics. Telemetry consumers (query-log summaries, analytics dashboards, the brv-bench harness) cannot distinguish HTML vs markdown traffic — every chart flat-lines as "all markdown" even on a fully migrated tree. Root cause: T5 (ENG-2741) introduced `MarkdownOnlyFormatDetector` as a deliberate stub with the explicit comment "swap when the format-detector task lands the real extension-aware detector." T3 (ENG-2739) shipped `format-detector.ts` for read-routing inside `search-knowledge-service` but did not swap the executor's binding. The stub stayed wired as the default in `QueryExecutor`. This change: - Adds `ExtensionAwareFormatDetector` — inspects each `matchedDoc.path` extension and returns `'html'` if any HTML doc was retrieved, `'markdown'` for legacy-only recalls, `undefined` for empty recalls (cache hit, OOD short-circuit, tier 4 LLM-only). - Single-`'html'` policy is deliberate: any HTML doc is the load-bearing signal for telemetry rollups in a partially-migrated tree. Reporting `'markdown'` for a mixed result would hide HTML traffic from cost / coverage dashboards. - Strips shared-source `[alias]:` prefixes before extension parsing so shared-source paths classify the same as local. - Defaults to `'markdown'` for extensionless paths (defensive). - Wires as the default `formatDetector` binding in `QueryExecutor`. - Keeps `MarkdownOnlyFormatDetector` for tests that pin pre-migration semantics; updates its docstring + comment to reflect the new role. Tests: +9 (extension-aware detector full coverage). 7940 pass; lint runs through the pre-existing submodule config issue, otherwise clean.
ByteRover-provider runs produce QueryLogEntry records with no
`inputTokens` / `outputTokens` / `cachedInputTokens` /
`cacheCreationTokens` even though Tier 3 calls take seconds of llmMs.
Telemetry consumers cannot compute per-query cost.
Root cause: T5's telemetry pipeline depends on
`GenerateContentChunk.rawResponse` being set on the terminating
chunk so `LoggingContentGenerator.generateContentStream()` can
capture the last non-undefined `rawResponse` and feed it to
`pickRawUsage()` (`.usage ?? .usageMetadata`). The AI-SDK content
generator wires this on its `'finish-step'` case but
`ByteRoverLlmHttpService.extractContentFromResponse()` (the simulated
stream backing every `ByteRoverContentGenerator` call) yielded chunks
without ever attaching the backend response. Result: `lastRawResponse`
stays undefined, no `llmservice:usage` event fires, no tokens land on
the log entry.
Fix: attach the full `GenerateContentResponse` as `rawResponse` on the
terminating chunk in both branches of `extractContentFromResponse` —
empty-content and full-content paths. The `pickRawUsage` helper
already handles the Gemini-shaped (`usageMetadata`) and Claude-shaped
(`usage`) payloads the ByteRover backend returns through this code
path, and the existing `extractUsage('google'|'anthropic')` mappers
take it from there.
Tests: +3 (terminating-chunk rawResponse coverage across content,
empty-parts, and function-call termination cases). 7943 pass.
The companion prompt loaded into every curate-mode invocation
(`curate-detail-preservation.yml`, auto-discovered by
`companion-contributor.ts:discoverCompanionPrompts('curate')` against
the `curate-*.yml` glob) still instructed the agent to "store in
narrative.rules / narrative.highlights / narrative.diagrams /
narrative.structure / narrative.dependencies / narrative.examples /
rawConcept.flow / content.facts / snippets" — every one of which is a
JSON-schema field on the pre-T3 curate-tool API that no longer exists.
T3 (ENG-2739) dropped `useHtmlContextTree` and made HTML the sole
emission format, but T2 (ENG-2738, the HTML-emission contract) only
updated `curate.txt` and missed this companion file.
Empirically observable on the proj/html-mem-conversion branch: a
LoCoMo-bench curate against `conv_26/session_1` produced a `<bv-topic>`
with one giant `<bv-rule>` element wrapping a markdown narrative —
the agent saw "store in narrative.rules" and treated `<bv-rule>` as
the closest match. Other sessions in the same run emitted clean
typed-element shapes; the variance correlates with whether the
legacy companion prompt dominated that turn's context.
This change rewrites the file so every preservation rule maps to the
correct `<bv-*>` element from the closed vocabulary
(`bv-rule`, `bv-task`, `bv-fact`, `bv-decision`, `bv-bug`, `bv-fix`,
`bv-flow`, `bv-diagram`, `bv-highlights`, `bv-structure`,
`bv-dependencies`, `bv-examples`, `bv-pattern`). General preservation
principles (completeness over conciseness, exact preservation, never
summarize) are kept verbatim. Adds an explicit guard at the end of
the prompt telling the agent NOT to use `narrative.*` / `rawConcept.*` /
`content.*` JSON-schema field names so any residual model bias toward
the pre-migration API surfaces is suppressed.
Tests: no test fixture asserts on this file's contents (it's a runtime
agent prompt, exercised via end-to-end curate runs). YAML still parses;
full suite green at 7943.
… goalposts Phase 1, Slice 1.1 of the channel-protocol rollout. Ships only test files; no production code. The tests compile and fail at runtime against the ChannelTestHarness stub with a clear "Slice 1.1 stub" message pointing at the slices that will turn each test green. This is the red signal Outside-In TDD requires before any orchestrator / handler / oclif code lands. New files: - test/helpers/temp-dir.ts — tmpdir + recursive cleanup helper. - test/helpers/temp-context-tree.ts — scratch project dir with the .brv/context-tree/ root expected by the orchestrator. - test/helpers/channel-test-harness.ts — boot() + run() + shutdown() + simulateSnapshotLoss() API. All methods are stubs that throw with a pointer to the implementing slice. Documents both runtime options (in-process daemon vs subprocess) for Slice 1.4/1.5 to pick. - test/integration/channel-phase1-happy-path.test.ts — encodes the §6.4 demo: new → list → get → post → list-turns → show → archive. Includes the auth-rejection canary (Phase 1 DoD §5): orphan BRV_DATA_DIR with no daemon-auth-token MUST fail with CHANNEL_UNAUTHORIZED or ERR_BRV_DAEMON_NOT_INITIALISED. - test/integration/channel-phase1-crash-recovery.test.ts — encodes DoD §2: drop turn.json, replay from events.jsonl. - test/integration/channel-phase1-append-finalize-race.test.ts — encodes DoD §3: N concurrent posts to one channel must serialise through the per-turn write lock with no torn writes. Red signal verified: `npx mocha test/integration/channel-phase1-*.test.ts` reports 3 failing, 0 passing, all at the boot() stub. Slice 1.0 tests remain green (37 passing). See plan/channel-protocol/IMPLEMENTATION_PHASE_1.md Slice 1.1 for the full scope and Slices 1.2-1.5 for the green-path schedule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1, Slice 1.2 of the channel-protocol rollout. Defines the shared wire
types and event constants the daemon, oclif commands, and adapter SDKs will
consume. Two concerns landed together so Phase-1 request validation works
end-to-end:
- Full ChannelEvents constants object (all 19 events per CHANNEL_PROTOCOL.md
§3) is exported from day one with logical lifecycle/membership/turns/
broadcast grouping. Names are locked across phases so the wire never
churns; only handler registration and request schemas grow per phase.
- Phase-1 request/response zod schemas for the 7 client-to-host events
(create, list, get, archive, post, list-turns, get-turn) plus the two
Phase-1 broadcasts (turn-event, state-change). Phase-2 events are present
as constants but have no request schema until Phase 2 wires them up.
Supporting domain types in src/shared/types/channel.ts:
- ContentBlock (ACP-aligned discriminated union, passthrough() so unmodeled
ACP fields round-trip).
- TurnAuthor, ChannelMember, ChannelMemberSummary (discriminated unions
matching CHANNEL_PROTOCOL.md §4 + §5.1).
- Turn, TurnDelivery, TurnEvent (full §7.1 union — Phase 1 only exercises
`message` and `turn_state_change`, but the schema is forward-complete so
later slices don't re-touch the file).
- Channel + ChannelMeta (wire vs on-disk shapes per §10 + §4.2).
- TurnState / TurnDeliveryState enums.
ACP source resolution (Phase 1 plan Open Decision 4): installed
@agentclientprotocol/sdk@^0.21.0; package exports ContentBlock /
PromptRequest / RequestPermission* / PermissionOption / PlanEntry /
SessionUpdate as TypeScript types. Our local zod schemas mirror these for
runtime validation; orchestrator code that needs strict ACP type alignment
imports from the SDK directly. The promptBlocks field is typed properly (not
unknown) so §8.4 prompt-empty validation is accurate.
17 new tests; full prior suite (54 transport+auth tests) still green.
See plan/channel-protocol/IMPLEMENTATION_PHASE_1.md Slice 1.2 for scope and
plan/channel-protocol/CHANNEL_PROTOCOL.md §3/§4/§8 for the spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| export const resolveBrvBin = (opts = {}) => { | ||
| if (typeof opts.brvBin === 'string' && opts.brvBin !== '') return opts.brvBin | ||
| const envBin = process.env.BRV_BIN | ||
| if (typeof envBin === 'string' && envBin !== '') return envBin | ||
| const pathEnv = opts.pathEnv ?? process.env.PATH ?? '' | ||
| for (const dir of pathEnv.split(delimiter)) { | ||
| if (dir === '') continue | ||
| const candidate = join(dir, 'brv') | ||
| try { | ||
| accessSync(candidate, constants.X_OK) | ||
| return candidate | ||
| } catch { | ||
| // Not executable here — keep walking PATH. | ||
| } | ||
| } | ||
|
|
||
| return 'brv' | ||
| } |
There was a problem hiding this comment.
nit (robustness): accessSync(candidate, constants.X_OK) returns success for directories that happen to be executable (which most directories are on POSIX). If a user has a directory named brv on their PATH ahead of the real binary, resolveBrvBin will happily bake the directory path into SKILL.md, and every shell invocation from the LLM will then fail with EACCES / is a directory.
Cheap fix:
const stat = statSync(candidate, {throwIfNoEntry: false})
if (stat && stat.isFile() && (stat.mode & 0o111)) return candidateSame fileset already imports existsSync from node:fs, so adding statSync is a one-line change.
| export function createLocalAgentResponseGenerator(deps: LocalAgentResponseGeneratorDeps): ParleyResponseGenerator { | ||
| return async function* ({envelope}) { | ||
| const profile = await deps.profileStore.get(deps.profileName) | ||
| if (profile === undefined) { | ||
| throw new ParleyResponseError( | ||
| 'PARLEY_LOCAL_AGENT_PROFILE_MISSING', | ||
| `BRV_BRIDGE_PARLEY_PROFILE="${deps.profileName}" does not exist in the driver-profile registry`, | ||
| ) | ||
| } | ||
|
|
||
| const promptBlocks = envelope.prompt.map((b) => ({ | ||
| text: b.text, | ||
| type: 'text' as const, | ||
| })) | ||
|
|
||
| if (deps.pool !== undefined) { | ||
| let acquired | ||
| try { | ||
| acquired = await deps.pool.acquire(deps.profileName, () => deps.driverFactory(profile.invocation, LOCAL_HANDLE)) | ||
| } catch (error) { | ||
| if (error instanceof ParleyResponseError) throw error | ||
| const msg = error instanceof Error ? error.message : String(error) | ||
| throw new ParleyResponseError('PARLEY_LOCAL_AGENT_START_FAILED', msg) | ||
| } | ||
|
|
||
| try { | ||
| for await (const payload of acquired.driver.prompt({prompt: promptBlocks, turnId: envelope.turn_id})) { | ||
| const chunk = projectPayload(payload) | ||
| if (chunk !== undefined) yield chunk | ||
| } | ||
| } finally { | ||
| // Release the driver back to the pool so the next inbound | ||
| // parley reuses the warm subprocess. The pool calls stop() | ||
| // on closeAll at daemon shutdown. | ||
| acquired.release() | ||
| } | ||
|
|
||
| return | ||
| } | ||
|
|
||
| // Pool-less fallback: spawn + start + stop per query (legacy | ||
| // 9.4c path, retained for callers that haven't wired the pool). | ||
| const driver = deps.driverFactory(profile.invocation, LOCAL_HANDLE) | ||
| try { | ||
| await driver.start() | ||
| } catch (error) { | ||
| const msg = error instanceof Error ? error.message : String(error) | ||
| throw new ParleyResponseError('PARLEY_LOCAL_AGENT_START_FAILED', msg) | ||
| } | ||
|
|
||
| try { | ||
| for await (const payload of driver.prompt({prompt: promptBlocks, turnId: envelope.turn_id})) { | ||
| const chunk = projectPayload(payload) | ||
| if (chunk !== undefined) yield chunk | ||
| } | ||
| } finally { | ||
| await driver.stop().catch(() => {}) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
threat-model (security, important to document): This dispatcher hands Alice's envelope.prompt directly to Bob's local ACP subprocess (codex/kimi/opencode) as a normal turn. Whatever tools that agent has access to in Bob's environment, Alice can drive via the prompt. The agent's system prompt + tool gating is the only thing between Alice and Bob's local filesystem / shell.
This is by design (the PR description acknowledges read-only is the intent for query protocol), but two concrete gaps in the current code:
-
No
protocol: 'query' vs 'delegate'enforcement. The verifier atparley-verifier.ts:150only checksacceptModesfor the tree-cert kind; it does not gate onenvelope.protocol. A dialer who sendsprotocol: 'delegate'will reach this generator and Bob's agent will be prompted the same way — with full tool access.delegate-policy.ts'spolicyPermitsDelegationis exported butparley-server.tsdoesn't call it. -
Prompt-injection of Bob's agent. Even with intent of "read-only Q&A", a sufficiently capable agent given a prompt like "use brv curate to commit the following file" will happily do it. Unless Bob's profile passes restricted-tool flags to the ACP subprocess, the agent retains its normal capability set.
Neither is a blocker for an internal-test rollout (operators are setting BRV_BRIDGE_PARLEY_PROFILE themselves and consenting), but INTERNAL_TEST.md §5 should call out the trust model more loudly than "cross-bridge agents can answer queries but cannot perform write actions on the dialer's behalf" — the constraint is "Bob's agent runs Alice's prompts in Bob's environment with Bob's tools", which is a much stronger claim than "read-only Q&A".
Recommend (for a follow-up, not this PR): wire policyPermitsDelegation at parley-server.ts:114 between the verifier and the auto-provision check, so delegate_policy: 'deny' short-circuits today, and prompt becomes the wire-handler hook later.
| */ | ||
| function buildRawResponse(parts: { | ||
| providerMetadata: unknown | ||
| response: unknown | ||
| usage: {cachedInputTokens?: number | undefined; inputTokens: number | undefined; outputTokens: number | undefined; totalTokens: number | undefined} | ||
| }): Record<string, unknown> { | ||
| const cacheCreation = readAnthropicCacheCreation(parts.providerMetadata) | ||
| const usage: Record<string, unknown> = { | ||
| ...parts.usage, | ||
| ...(cacheCreation !== undefined && {cacheCreationTokens: cacheCreation}), | ||
| } | ||
| const responseObj = typeof parts.response === 'object' && parts.response !== null ? parts.response : {} | ||
| return { | ||
| ...responseObj, | ||
| providerMetadata: parts.providerMetadata, | ||
| usage, | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Read Anthropic's `cacheCreationInputTokens` out of the AI SDK's providerMetadata | ||
| * (typed as `ProviderMetadata = Record<string, Record<string, JSONValue>>` upstream). | ||
| * Returns `undefined` for non-Anthropic calls or when the provider didn't set the field. | ||
| */ | ||
| function readAnthropicCacheCreation(providerMetadata: unknown): number | undefined { | ||
| if (typeof providerMetadata !== 'object' || providerMetadata === null) return undefined | ||
| const {anthropic} = providerMetadata as {anthropic?: unknown} | ||
| if (typeof anthropic !== 'object' || anthropic === null) return undefined | ||
| const {cacheCreationInputTokens} = anthropic as {cacheCreationInputTokens?: unknown} | ||
| return typeof cacheCreationInputTokens === 'number' && Number.isFinite(cacheCreationInputTokens) | ||
| ? cacheCreationInputTokens | ||
| : undefined | ||
| } |
There was a problem hiding this comment.
nit (correctness): buildRawResponse overwrites the original result.response.usage (if present) with the synthetic usage block, because of object-spread ordering:
return {
...responseObj, // may contain `usage` from upstream
providerMetadata: ...,
usage, // wins
}That's the intended behaviour here — extractUsage is built around the AI-SDK-normalized field names — but if any direct-provider adapter ever passes rawResponse: result.response and the SDK upstream starts including its own usage block, the synthetic block will silently mask the upstream one. Adding a brief comment ("synthetic usage intentionally shadows any upstream usage") would make the intent explicit and survive a future refactor.
Also: Number.isFinite check in readAnthropicCacheCreation rejects negative numbers — desired — but it also rejects 0. A cacheCreationInputTokens: 0 would be silently dropped, leaving the synthetic block without the field. If telemetry consumers distinguish "0 cache-creation tokens" from "unknown", you'll lose that distinction. Minor.
…ply opencode review Closes two final gaps before the team-test ship, both from opencode's independent review of the bridge/heartbeat/config-persistence cut (turn `jVc7J9M0UpZsKCJoE2LSV` on `phase9-final-review` channel, verdict: `LGTM for internal-test cut. Ship it.`): (1) `brv channel uninvite` had existed since Phase 2 (full transport + orchestrator + handler) but its CLI surface had a one-line description that didn't surface the load-bearing operator-UX behaviour: cancelling in-flight deliveries + releasing the warm pool driver. The team will hit the stale-multiaddr / port-rotation problem on day one of cross-bridge dogfood; without knowing this command does a clean stop, they'd reach for `channel new`. Expanded the description, added an example for the multiaddr-rotation case, and replaced INTERNAL_TEST.md §5.1's "no `brv channel kick` yet" admonition with the actual workflow: brv channel uninvite <ch> @<stale-handle> brv channel invite <ch> @<handle> --peer ... --multiaddr ... 6 contract tests added in `test/commands/channel/uninvite.test.ts` covering args/flags/description/examples to prevent silent regression of the host-LLM-facing contract (the skill consumes this surface from bash). (2) Per opencode #1: deleted `test/integration/channel-phase3-origin-rejection.test.ts`. It referenced a `BRV_FORCE_ORIGIN` env-var consumer that has never existed in `src/` (`grep` returns zero hits). Keeping a skipped test that targets dead code is debt — opencode flagged it correctly, kimi's earlier skip-with-rationale was the wrong call for this one specifically. The other two skipped integration tests stay skipped with rationale (opencode #1 only applied to the BRV_FORCE_ORIGIN one — the Phase-2 cancel-ordering + multi-mention-rejection ones target real code paths that just need their assertions updated against the Phase-9/10 surface changes; the `describe.skip` comments serve as reminders for the future fix). Opencode also flagged but DEFERRED (post-internal-test follow-ups, not in this commit): - `brv bridge config unset <key>` CLI verb so operators don't have to `rm <dataDir>/state/bridge-config.json` to revert a setting. Low priority; documented escape hatch is acceptable for internal test. - A proper Mutex/AsyncQueue if a third producer ever joins the `sendChain` in parley-server.ts. Today's two-producer (timer + chunk-loop) case is fine with the single-reassigned promise chain. - 9.9 delegation wire ETA: realistic at ~1 wk IF stream-authenticated (payload-level signatures would add 2-3 days). - 9.8 NAT traversal: non-blocker if team uses Tailscale; defer. Verified locally: - typecheck clean - lint clean - 6 new uninvite contract tests pass in 3 ms - INTERNAL_TEST.md updated with the recovery flow Three-agent sign-off: - kimi: SHIP (round-2 review of heartbeat fix + bridge config) - codex: drove every live cross-bridge test in the session as the ACP backend on Bob/Alice's daemons; the live verifications are its testimony - opencode: LGTM for internal-test cut (this commit) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Claude finished @ngduyanhece's task in 2m 27s —— View job Re-review — Channel + Bridge protocol (synchronize, 2026-05-22 push)Overall: 4 / 5 (unchanged from 2026-05-20). Scope of this re-review is narrow: only one substantive commit landed since the previous pass.
Diff since the 2026-05-20 13:17 UTC reviewOnly one substantive commit landed after the prior review wall-clock:
( Total delta: 4 files, +57 / -54. What's good in this delta
New inline finding (1)
Prior-round findings — statusNone of the 9 files flagged in the 2026-05-20 review were modified by
This is consistent with the PR's "internal-test cut, follow-ups tracked" framing — the deferred slices (9.6 / 9.7 / 9.8 / 9.9) and the open robustness nits are explicitly out-of-scope for this ship. The earlier review's score (4 / 5) stands. Verdict for this pushLGTM-incremental. The |
| it('handle arg rejects non-@-prefixed values via run-time validation', () => { | ||
| // We can't construct the runtime context here, but the static | ||
| // shape commitment is that the handle must start with @ — | ||
| // mirrored from invite.ts:46-48. The command implementation | ||
| // performs the check at runtime. | ||
| // (Smoke check: the args definition is present.) | ||
| expect(ChannelUninvite.args.handle.description).to.match(/@/) | ||
| }) |
There was a problem hiding this comment.
nit (test/code drift): The test name promises runtime behaviour ("rejects non-@-prefixed values via run-time validation") but the assertion only checks that the arg's description string contains @. The actual rejection at uninvite.ts:30-32 (if (!args.handle.startsWith('@')) { this.error(...) }) has no coverage from this suite.
Either:
- Rename the test to match what it asserts — e.g.
'documents the @-prefix requirement in the handle arg description', or - Add a second case that actually exercises the rejection (oclif's
Command.runis awaitable; the existingchai-only setup makes this a few-line addition):
it('rejects non-@-prefixed handles at runtime with exit code 1', async () => {
try {
await ChannelUninvite.run(['some-channel', 'bob'])
expect.fail('expected ChannelUninvite to reject a bare handle')
} catch (error) {
expect((error as Error).message).to.match(/must start with @/i)
}
})Not blocking — the static contract tests are useful — but a test whose name doesn't match its assertion will mislead the next person who greps for "@-prefix" expecting to find the live coverage.
… var Critical gap surfaced during live two-machine setup (laptop ↔ GCP VM over Tailscale, 2026-05-22): the daemon-integrated bridge defaults to `/ip4/127.0.0.1/tcp/0` (loopback only). `brv bridge pin` from the laptop to the VM's bridge multiaddr returned ECONNREFUSED because nothing was actually listening on the VM's external interface. The internal-test ship was effectively single-machine-only without this env var, despite the cross-bridge protocol working over libp2p. Standalone `brv bridge listen --port <p>` was the only configurable listener path, but it spawns an isolated libp2p host that does NOT share state with the daemon's `BridgeTranscriptService` — inbound parley calls land in a separate transcript store invisible to `brv channel list-turns`. Useless for the cross-machine chat use case the team will exercise. Fix: thread `BRV_BRIDGE_LISTEN_ADDRS` (comma-separated multiaddrs) through the same `bridge-config-store.ts` resolver that already persists `BRV_BRIDGE_PARLEY_PROFILE` etc. Daemon-integrated bridge now reads the resolved value and overrides `DEFAULT_BRIDGE_CONFIG.listen_addrs` when set; falls back to the loopback default when unset. Operator usage for a Tailscale-bridged GCP VM: # On both machines, add to shell rc: export BRV_BRIDGE_LISTEN_ADDRS=/ip4/0.0.0.0/tcp/60001 # Or pin to a specific interface: # export BRV_BRIDGE_LISTEN_ADDRS=/ip4/100.x.x.x/tcp/60001 # First `brv bridge whoami` after this persists to # <dataDir>/state/bridge-config.json — subsequent daemon respawns # inherit it without re-exporting. Tests: - 5 new in bridge-config-store.test.ts covering env-supplied, comma-list parsing with whitespace, undefined fallback, file-only respawn recovery, env-overrides-file - All 17 bridge-config-store tests pass in 14 ms - Existing 9603 unit + integration suite unaffected (typecheck + lint clean) INTERNAL_TEST.md §3.1 updated with the new env var + Tailscale guidance. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Claude finished @ngduyanhece's task in 2m 57s —— View job Re-review — Channel + Bridge protocol (synchronize, 2026-05-22 15:17 UTC push)Overall: 4 / 5 (unchanged from prior reviews). Scope is narrow: one substantive commit since the previous wall-clock.
Diff since the 2026-05-22 14:08 UTC review
Total delta: 4 files, +140 / -12. What's good in this delta
New inline finding (1)
Minor / not inlined
Prior-round findings — status
Verdict for this pushLGTM-incremental. |
| // externally-routable address — operators set `BRV_BRIDGE_LISTEN_ADDRS` | ||
| // (comma-separated) to something like | ||
| // `/ip4/0.0.0.0/tcp/60001,/ip4/100.x.x.x/tcp/60001`. | ||
| listenAddrs: z.array(z.string().min(1)).min(1).optional(), |
There was a problem hiding this comment.
issue (validation gap, low-medium): This new listenAddrs field is validated as z.array(z.string().min(1)) — i.e. "non-empty strings" only. The canonical BridgeConfigSchema at src/server/infra/channel/bridge/bridge-config.ts:11-16,63 deliberately uses a multiaddrString refinement that calls multiaddr(s) so a malformed value fails at parse time, with the explicit rationale ("Libp2p rejects invalid multiaddrs at runtime with opaque errors; failing at config-parse time gives the user a useful pointer").
The new wiring at brv-server.ts:916-920 constructs the live config with a literal-object spread:
const bridgeListenConfig: typeof DEFAULT_BRIDGE_CONFIG =
bridgeRuntime.listenAddrs === undefined
? DEFAULT_BRIDGE_CONFIG
: {...DEFAULT_BRIDGE_CONFIG, listen_addrs: [...bridgeRuntime.listenAddrs]}so the canonical schema's refinement never runs on the env-supplied multiaddrs. End state: an operator who sets BRV_BRIDGE_LISTEN_ADDRS=:60001 (forgetting the /ip4/.../tcp/ prefix), or BRV_BRIDGE_LISTEN_ADDRS=/ip4/0.0.0.0/tcp/60001/extra-junk, gets the value persisted to bridge-config.json, and the first lazy ensureBridgeHost call dies inside createLibp2p({addresses: {listen: [...]}}) (libp2p-host.ts:304) with the exact opaque error the refinement was meant to prevent. The persisted bad value will keep tripping the next daemon respawn too.
Two reasonable fixes — either is fine:
-
Validate at the resolver — reuse the same refinement here:
import {multiaddr} from '@multiformats/multiaddr' const multiaddrString = z.string().refine((s) => { try { multiaddr(s); return true } catch { return false } }) // … listenAddrs: z.array(multiaddrString).min(1).optional(),
And in
readCommaListEnv(or a newreadMultiaddrListEnv), validate per-entry with anonInvalidcallback that logs the same[Daemon] invalid BRV_BRIDGE_LISTEN_ADDRS="..."shape the other env readers use, so a bad input is rejected at resolve time rather than persisted. -
Route through
parseBridgeConfigat construction —parseBridgeConfig({...DEFAULT_BRIDGE_CONFIG, listen_addrs: [...]})would trigger the canonical refinement and throw with a clear Zod error, whichensureBridgeHostalready has to catch anyway.
(1) is preferable because it ALSO prevents bad values from being persisted to disk in the first place, which closes the "stuck on a bad config across respawns" failure mode.
Non-blocking for the internal-test ship — operators of this PR are technical and will figure out the multiaddr shape. But the friendlier error path is cheap.
…dapter + bridge UX smoothness
Refactors the parley dispatcher into a typed ParleyAdapter registry so non-ACP
agents (Claude Code) can drive cross-machine channel turns alongside the
existing ACP path (codex/kimi/opencode/gemini). Adds the operator-facing
smoothness fixes surfaced during the 2026-05-22 live two-machine internal test.
Phase 9.5.2 — adapter refactor + delete legacy
- ParleyAdapter interface + registry with strict resolution (unset profile
→ mock-echo, bad explicit profile → ParleyAdapterNotFoundError with hint table)
- MockEchoAdapter + AcpAdapter wrap the existing dispatch paths
- Delete local-agent-response-generator.ts (legacy duplicate path)
Phase 9.5.3 — Claude Code headless adapter
- ClaudeCodeHeadlessAdapter spawns `claude -p --output-format stream-json
--dangerously-skip-permissions` per inbound parley turn
- Gated behind BRV_BRIDGE_CLAUDE_UNSAFE=1 (default-off; permission passthrough
is a follow-up before this becomes default)
- ProfileConcurrencyGate semaphore (BRV_BRIDGE_MAX_CONCURRENT_PER_PROFILE)
- ParleyAdapterSessionStore: 0600 perms, atomic writes, in-process mutex,
composite key projectRoot+channelId+senderPeerId+adapterProfile
- Real abortSignal plumbing: heartbeat/send failure fires controller.abort()
early; post-acquire + pre-spawn signal checks prevent queued-abort race
Phase 9.5.4 — channel-mirror auto-create upgrade
- Trust gate: auto-create requires user-confirmed or ca-bound (NOT auto-tofu).
Breaking change: BRV_BRIDGE_AUTO_PROVISION=auto no longer mirrors channels
from first-contact peers without an explicit `brv bridge verify` step.
- Sliding 1h quota (default 5/peer/hour, BRV_BRIDGE_AUTO_CREATE_QUOTA override)
- Shared channelId regex validator: ^[a-z0-9][a-z0-9-]{0,63}$
- addressability: 'bootstrap-only' flag on auto-created remote-peer members
- channel_auto_created event for reactive watchers
- BRIDGE_MULTIADDR_STALE recovery hint with copy-paste `brv bridge connect`
suggestion when reverse-dial fails (exported, full wire-up follow-up)
- channel uninvite now resets the auto-create quota for the removed peer
Phase 9.5.6 — `brv bridge connect` one-command setup
- Bundles pin + (optional) verify + (optional) channel create + invite
- Idempotent re-run; partial-progress failure mode with copy-paste retry hint
that omits already-done flags (codex round-2 verdict: no transactional state)
- Early multiaddr suffix validation before libp2p host startup
Phase 9.5.1 — daemon respawn rebinds bridge listener
- bridge-startup-rebind module ensures persisted listenAddrs are re-bound
unconditionally at daemon boot, fixing the silent-drop-after-respawn bug
hit twice during 2026-05-22 live test
Smaller smoothness fixes
- `brv bridge pin --verify` shortcut (single-step pin+promote)
- `brv bridge whoami` annotates multiaddrs with kind (loopback/lan/tailscale/wan)
and OS interface name via multiaddr-classify util
- `brv channel subscribe --all-kinds` diagnostics catch-all
- docs/channel-events.md canonical event-kinds reference
Tests: 8674 passing (+77 net new since phase start). Lint clean.
Codex reviewed across 7 rounds (PLAN.md + each impl pass). All blockers
resolved before commit. See plan/bridge-smoothness/PLAN.md for the full
review trail and migration notes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ds reference - plan/bridge-smoothness/PLAN.md: architecture for the ParleyAdapter abstraction, the seven friction-point fixes from the 2026-05-22 live test, migration notes for the 9.5.4 trust-gate behavior change, and the codex review trail (7 rounds: 2 on plan + 5 on implementation, all signed off). - docs/channel-events.md: canonical event-kinds reference for `brv channel subscribe` filtering. Documents the kinds emitted on each turn flavour (local outbound / inbound remote / ACP-driven / auto-create) plus the new channel_auto_created event from Phase 9.5.4. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Claude finished @ngduyanhece's task in 6m 17s —— View job Re-review — Channel + Bridge protocol (synchronize, 2026-05-23 10:34 UTC push)Overall: 4 / 5 (unchanged). The Phase 9.5 work is substantial (+5500 / -209 across 42 files, codex-reviewed across 7 rounds), well-tested, and structurally sound. New findings are robustness/threat-model documentation gaps, not blockers.
Diff since 2026-05-22 15:17 UTCTwo commits, only one substantive:
Total delta: +5542 / -209 across 42 files. What's good
New inline findings (5 buffered)
Prior-round findings — status
This is consistent with the PR's "internal-test cut, follow-ups tracked" framing. Verdict for this pushLGTM-incremental. Phase 9.5 closes the operator-UX gaps surfaced during the 2026-05-22 live test (silent-drop on respawn, mock-echo confusion on bad profile, 4-step setup ceremony, auto-create from unverified peers). The Claude Code adapter is the largest new surface and deserves its |
|
|
||
| const child = this.spawnFn(this.claudeBinary, spawnArgs, { | ||
| cwd: ctx.projectRoot, | ||
| env: process.env, |
There was a problem hiding this comment.
issue (security, medium — BRV_BRIDGE_CLAUDE_UNSAFE=1-gated path): the child Claude process inherits the entire daemon environment, including any BRV_API_KEY, BRV_BILLING_*, ANTHROPIC_*, or shell secrets that happen to be set. With --dangerously-skip-permissions the agent has full Bash/Read/Write access, so a verified peer can issue prompts like "print every env var with API or KEY in the name" and exfiltrate them.
The file-level comment ("Operators should run this only on a dedicated VM or sandbox") frames this as an op concern, but the threat model is sharper than that wording suggests: user-confirmed peers can read everything the daemon could read, not just whatever the operator put in projectRoot. That's the same gap the prior round flagged on local-agent-response-generator.ts — except deletion of the legacy generator means the surface is now --dangerously-skip-permissions rather than read-only-ish text streaming.
Concrete mitigation for follow-up: filter to an explicit allowlist (PATH, HOME, USER, LANG, TERM, ANTHROPIC_API_KEY if present, and a curated BRV_* subset) before spawn:
const childEnv: NodeJS.ProcessEnv = {
HOME: process.env.HOME,
PATH: process.env.PATH,
USER: process.env.USER,
...pickAnthropicEnv(process.env), // ANTHROPIC_API_KEY etc.
}Non-blocking for the internal-test cut (the entire adapter is opt-in), but INTERNAL_TEST.md §5 should call out the env-passthrough explicitly so operators don't enable BRV_BRIDGE_CLAUDE_UNSAFE=1 on a laptop with their personal credentials in ~/.zshrc.
| const maxPerHour: number = resolvedMax | ||
|
|
||
| // Map<peerId, sorted array of ISO timestamps (ascending)> | ||
| const windows = new Map<string, number[]>() |
There was a problem hiding this comment.
issue (resource, low — same shape as prior parley-nonce-lru.ts:31 / parley-rate-limit.ts:51 findings): windows: Map<string, number[]> has no global cap. A user-confirmed peer (or, post auto-trust changes, a malicious entity that has reached user-confirmed via the verify step) can attempt auto-creates from many cycled identities — the inner prunedWindow filter discards old timestamps on access, but the outer windows.set(peerId, current) call grows the map without bound. current.length === 0 arrays accumulate too — once a peer is past their 1h window, prunedWindow returns [] but the entry is still in the map; only reset() (called on channel uninvite) ever deletes.
Two cheap fixes:
- After
prunedWindow, ifcurrent.length === 0AND the new push would be the only entry, fine — but periodically iteratewindows, prune each entry, and drop empties. - Cap
windows.sizeand LRU-evict the oldest entry.
Same structural concern still applies to parley-nonce-lru.ts and parley-rate-limit.ts from the prior round — a shared cap/sweep policy across all three would close the family.
Non-blocking for internal-test (the actual memory cost per entry is small), but worth tracking.
|
|
||
| function writeToDisk(data: SessionFile): void { | ||
| mkdirSync(dirname(filePath), {recursive: true}) | ||
| const tmp = `${filePath}.tmp` |
There was a problem hiding this comment.
nit (consistency, same shape as prior alias-store.ts:198 finding): const tmp = \${filePath}.tmp`is a non-unique tmp path. The in-process write mutex serialises mutations from this daemon, but two daemon instances (a stale process from a crash + a freshly-spawned replacement during the briefrespawn-rebind` window) would race on the same path and silently overwrite each other's in-flight tmp.
install-identity-service.ts:427 already uses the pid+random pattern:
const tmp = `${target}.tmp.${process.pid}.${randomBytes(4).toString('hex')}`Replicating that pattern here (and extracting an atomicWrite() helper that both call) would close the prior round's finding and prevent the same pattern from spreading further. Non-blocking, polish.
| resultError = true | ||
| break | ||
| } else if (item.kind === 'exit') { | ||
| flushStdoutBuffer(stdoutBuffer, processEvent) | ||
| break |
There was a problem hiding this comment.
nit (robustness, low): flushStdoutBuffer(stdoutBuffer, processEvent) is called inside the exit branch immediately before break. Any items processEvent enqueues during the flush (a trailing result event without a trailing newline, e.g.) are added to the queue but the loop exits without yielding/handling them.
In practice claude's stream-json emits newlines after every event, so this is unlikely to bite. But if it does, the symptom is "subprocess exited without a result event" → ADAPTER_SUBPROCESS_FAILED thrown, and the session ID from the trailing system_init is lost. Specifically, the success-path resultSessionId assignment at line 354 never fires.
Two fixes — either is fine:
- Drain the queue after flush:
} else if (item.kind === 'exit') { flushStdoutBuffer(stdoutBuffer, processEvent) while (queue.length > 0) { const drained = queue.shift() if (drained?.kind === 'result_success') { resultReceived = true resultSessionId = drained.newSessionId } else if (drained?.kind === 'result_error') { resultError = true } } break }
- Run
flushStdoutBufferBEFORE the close handler enqueuesexit(move into close handler before enqueue), so the items go onto the queue in order.
Non-blocking — speculative against a claude version that ever omits a trailing newline.
| * Subprocess invocation per inbound turn: | ||
| * claude -p --output-format stream-json --dangerously-skip-permissions \ | ||
| * --cwd <projectRoot> [--resume <sessionId>] |
There was a problem hiding this comment.
nit (doc/code drift): The file-level invocation comment shows:
claude -p --output-format stream-json --dangerously-skip-permissions \
--cwd <projectRoot> [--resume <sessionId>]
…but buildSpawnArgs (line 429–441) does not emit --cwd. The actual cwd is set via the spawn options bag at line 206 (cwd: ctx.projectRoot). The two paths converge to the same result in practice, but the comment will mislead the next reviewer who tries to grep for --cwd in the args.
Suggest just dropping the --cwd snippet from the comment, or adding "(set via spawn options, not argv)" inline.
| if (env.BRV_BRIDGE_CLAUDE_UNSAFE === '1') { | ||
| if (args.sessionStore !== undefined && args.concurrencyGate !== undefined) { | ||
| const adapter = new ClaudeCodeHeadlessAdapter({ | ||
| concurrencyGate: args.concurrencyGate, | ||
| log: args.log, | ||
| sessionStore: args.sessionStore, | ||
| }) | ||
| registry.register(adapter) | ||
| args.log('[Daemon] Parley adapter registered: claude-code (kind=sdk-headless, UNSAFE — no permission gate)') | ||
| } else { | ||
| args.log( | ||
| '[Daemon] BRV_BRIDGE_CLAUDE_UNSAFE=1 but sessionStore or concurrencyGate not provided; ' + | ||
| 'claude-code adapter NOT registered', | ||
| ) | ||
| } | ||
| } |
There was a problem hiding this comment.
nit (operator UX, low): When BRV_BRIDGE_CLAUDE_UNSAFE=1 is set but sessionStore or concurrencyGate is missing, the code logs a warning and continues without registering the adapter. If BRV_BRIDGE_PARLEY_PROFILE=claude-code is ALSO set, the strict-resolution path at brv-server.ts:1025-1027 will then throw ParleyAdapterNotFoundError with Available: ["mock-echo", ...] — and the operator sees "the env flag I set didn't take effect" without an obvious clue that the cause was the upstream wiring rather than their flag.
The PROFILE_HINTS['claude-code'] table tells them to set BRV_BRIDGE_CLAUDE_UNSAFE=1, but if they ALREADY did and it still didn't register, the hint is misleading.
Cheap fix: in this branch, also throw (or surface a second hint line) so the operator learns the registration failed despite the env flag. The current "log + continue" pattern is fine when daemon-wide registration is a soft-fail concern, but for an explicitly-requested profile with an explicitly-set unsafe flag, the wiring being broken should fail loud.
Non-blocking. In practice the daemon's own DI wiring at brv-server.ts:980-994 always provides both deps when the env flag is set, so the branch is effectively unreachable today — but it would silently misbehave if future wiring drift drops one of them.
…json Live two-machine test (laptop → VM, 2026-05-23) surfaced: ADAPTER_SUBPROCESS_FAILED: claude subprocess exited without a result event. stderr: Error: When using --print, --output-format=stream-json requires --verbose Claude Code 2.x requires --verbose alongside --output-format stream-json. The adapter wasn't passing it, so every cross-bridge mention failed before yielding any chunks. --verbose only adds stderr noise; the stream-json payload on stdout is unchanged, so no other parsing changes needed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Claude finished @ngduyanhece's task in 2m 28s —— View job Re-review — Channel + Bridge protocol (synchronize, 2026-05-23 11:01 UTC push)Overall: 4 / 5 (unchanged from prior reviews). Scope of this re-review is narrow: one substantive commit since the previous pass.
Diff since 2026-05-23 10:34 UTC reviewOnly one substantive commit landed:
Total delta: 1 file, +6 / -0. What's good in this delta
New inline findings (3 buffered)
Prior-round findings — status
This is consistent with the PR's "internal-test cut, follow-ups tracked" framing. Verdict for this pushLGTM-incremental. |
| '-p', | ||
| '--output-format', | ||
| 'stream-json', | ||
| '--verbose', |
There was a problem hiding this comment.
nit (test coverage, low): the --verbose addition is the kind of one-line ordering tweak that silently breaks the live cross-bridge path if anyone refactors buildSpawnArgs later (e.g. an "argv reorder for readability" PR that drops a line). The live integration path tested it the hard way on 2026-05-23 (ADAPTER_SUBPROCESS_FAILED: ... requires --verbose); a one-line unit assertion would prevent the next round of "discovered via remote VM failure".
Existing claude-code-headless-adapter.test.ts already builds FakeChild + a spawn() factory — adding an args capture is trivial:
it('passes --verbose with --output-format stream-json (Claude Code 2.x compat)', () => {
let capturedArgs: string[] = []
// spawn factory: spawn: (_bin, args) => { capturedArgs = args; return child }
// … drive one turn …
expect(capturedArgs).to.include('--verbose')
expect(capturedArgs).to.include('-p')
expect(capturedArgs).to.include('stream-json')
})Non-blocking; the live-test catch is documented and the fix is correct.
| const stderrChunks: string[] = [] | ||
|
|
||
| child.stderr?.on('data', (chunk: Buffer) => { | ||
| stderrChunks.push(chunk.toString('utf8')) |
There was a problem hiding this comment.
nit (resource, low — surfaced by the --verbose addition): stderrChunks: string[] grows unbounded for the full lifetime of the turn, but only the last 500 chars (stderr.slice(-500)) are ever used (lines 397, 415). With --verbose now adding stderr noise on every turn (the PR-of-record reason for the prior commit), this becomes a small per-turn memory tax on every cross-bridge mention.
For a long-running daemon serving many parley turns, the cumulative effect is bounded (chunks are GC'd at turn end) but the peak per-turn allocation now scales with whatever Claude Code logs to stderr. Cheap fix — keep a rolling tail:
const STDERR_TAIL_BYTES = 4096
let stderrTail = ''
child.stderr?.on('data', (chunk: Buffer) => {
stderrTail = (stderrTail + chunk.toString('utf8')).slice(-STDERR_TAIL_BYTES)
})
// …
const stderr = stderrTail4KB is plenty headroom for the existing 500-char slice and the stale-session pattern matcher in isStaleSessionError. Non-blocking; the per-turn cost is small. Worth tracking with the prior round's unbounded-Map family of findings.
| * | ||
| * Subprocess invocation per inbound turn: | ||
| * claude -p --output-format stream-json --dangerously-skip-permissions \ | ||
| * --cwd <projectRoot> [--resume <sessionId>] |
There was a problem hiding this comment.
nit (doc/code drift, low): the file-level invocation comment is now drifting in two ways:
claude -p --output-format stream-json --dangerously-skip-permissions \
--cwd <projectRoot> [--resume <sessionId>]
--verbose(added in this commit) is not shown — the next reviewer who tries to grep for the canonical argv will be confused.--cwdis still shown butbuildSpawnArgsdoes not emit it (cwd is set via spawn options at line 206). This was flagged in the prior review round as well.
Both fixes are one-line; the comment is becoming the "stale doc that lies" hazard rather than load-bearing context. Either drop the invocation snippet entirely (the real argv is right there at buildSpawnArgs, with the --verbose justification comment now anchoring intent) or update it to match. Non-blocking.
…er + live evidence Refresh the internal-test guide to reflect the Phase 9.5 cut shipped today. Material changes: §1 Install — bumped HEAD to 1e23de6; `brv bridge --help` now lists `connect`. §3.1 Bridge listener — operators now export env vars in ~/.zshrc / ~/.bashrc so the daemon inherits them across respawns. Phase 9.5.1 makes the bridge start unconditional at daemon boot (no more silent-drop-after-respawn). `brv bridge whoami` output now shows interface-classified multiaddrs (loopback/lan/tailscale/wan) with a "← recommended for cross-machine" hint on Tailscale rows. §3.2 NEW: `brv bridge connect` one-command setup. Collapses the previous 4-step ceremony (pin → verify → channel new → channel invite) into a single idempotent command on each side. Documents the symmetric `--verify` requirement after the 9.5.4 trust-gate change, and the partial-failure retry-hint UX. §3.5 NEW: run Claude Code as the parley dispatcher. Full env-var setup (BRV_BRIDGE_CLAUDE_UNSAFE=1, BRV_BRIDGE_PARLEY_PROFILE=claude-code), required `--verbose` for stream-json output, security caveat about `--dangerously-skip-permissions`, and a working cross-bridge coding-task example with live evidence from 2026-05-23. §4 What works — adds three new rows: Claude Code adapter, one-command setup, channel-mirror auto-create. Updates the cancellation row to mention the Phase 9.5.3 early-abort plumbing. §5 What doesn't work yet — recasts the cross-bridge permission gap as the gating constraint for promoting `BRV_BRIDGE_CLAUDE_UNSAFE` from opt-in to default. Adds the BRIDGE_MULTIADDR_STALE recovery hint for peers that rebind on a new port. §5.1 Ops tips — adds the session-sidecar location, the auto-create-quota auto-reset on uninvite, and `brv channel subscribe --all-kinds` for diagnostics. §7 Feedback — refocuses on the new setup-pain UX (did `brv bridge connect` actually save commands?), the symmetric verify requirement, and what real cross-bridge use cases people reach for. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|
||
| **⚠️ Security: `BRV_BRIDGE_CLAUDE_UNSAFE=1` is required** | ||
|
|
||
| The adapter spawns `claude -p --output-format stream-json --verbose --dangerously-skip-permissions`. Until cross-bridge permission passthrough lands, a verified peer's prompt can drive Bob's local Claude Code with Bob's filesystem and process permissions. The env gate is the explicit "yes, I know what this means" opt-in. Run this **only** on a dedicated VM/sandbox you are willing to hand to a verified peer. Default-off prevents demos from accidentally shipping the security hole. |
There was a problem hiding this comment.
nit (threat-model documentation parity, low-medium): the security callout calls out "Bob's filesystem and process permissions" but doesn't explicitly flag the environment-variable exfiltration vector that prompted the prior round's claude-code-headless-adapter.ts:207 finding. The adapter spawns Claude with env: process.env (full daemon-env passthrough), and --dangerously-skip-permissions gives the peer Bash access — so a verified peer prompt like Run "env | grep -iE 'key|token|secret|api'" exfiltrates every API key the operator has in ~/.zshrc, including ANTHROPIC_API_KEY, BRV_API_KEY, OPENAI_API_KEY, cloud creds, etc. That's a credential-theft surface — distinct from (and sharper than) the filesystem-access one.
The "dedicated VM/sandbox" mitigation is correct but doesn't tell operators what specifically not to put on that VM. Suggest expanding the callout with one concrete bullet:
> **Concretely:** the responder VM's environment is fully exposed. Do NOT put your personal `ANTHROPIC_API_KEY`, `BRV_API_KEY`, cloud creds, or shell secrets in the responder's `~/.bashrc` / `~/.zshrc`. A verified peer can extract them with one Bash prompt (`env | grep -i key`). Either run the responder in a clean VM with only the env vars listed above, or wait for the env-allowlist work tracked in [the prior review's `claude-code-headless-adapter.ts:207` finding].Also worth surfacing in §7 (feedback) — operators are unlikely to spot this on their own, and the env-passthrough is the kind of footgun that scales badly with the first non-VM internal-test setup. Non-blocking; the gate keeps the surface opt-in, but the docs are the only mitigation between operators and credential theft right now.
| ```bash | ||
| # In any project directory | ||
| brv channel onboard codex -- codex-acp # one-time per agent | ||
| brv channel new smoke |
There was a problem hiding this comment.
nit (security/UX interaction, low): the recommended internal-test combination is BRV_BRIDGE_AUTO_PROVISION=auto (§3.1, line 37) plus BRV_BRIDGE_CLAUDE_UNSAFE=1 (§3.5, line 158). Phase 9.5.4 hardened the trust gate so auto-tofu peers can't seed new channels — but they CAN still dispatch into existing channels they've been invited to, which means an unverified first-contact peer can drive the headless Claude through any pre-existing channel-member relationship.
For internal-test where channels are short-lived and the team is small, the combination is fine. But the §3.5 security callout reads as if --verify alone is the trust boundary, when in practice the AUTO_PROVISION policy is the upstream gate. Worth one extra sentence in §3.5 along the lines of:
Tighten
BRV_BRIDGE_AUTO_PROVISION=pinned-only(the production default) once the team is set up —autois convenient for first-contact pinning but lets any peer-id-cycling attacker rotate identities cheaply through existing channels.
Non-blocking; this is doc-clarity polish, not a code change.
| # Restart the daemon so it inherits the env. | ||
| pkill -f brv-server || true | ||
| sleep 2 | ||
| brv bridge whoami --format json >/dev/null |
There was a problem hiding this comment.
nit (operator-doc race, low): the pkill -f brv-server || true; sleep 2; brv bridge whoami --format json >/dev/null pattern relies on the soft assumption that 2s is enough for the dying daemon to release the libp2p port + Socket.IO daemon-discovery file. On a loaded laptop (or a VM running other work), this race occasionally loses — the next brv bridge whoami either auto-spawns a new daemon while the old one's port-bind is still tearing down (causes EADDRINUSE deep in the log), or finds the dying daemon and hits a half-shutdown state.
Two reasonable fixes — either is fine:
- Loop until
pgrep -f brv-serverreturns empty (matches the existingdaemon-discovery.tspattern):pkill -f brv-server || true while pgrep -f brv-server >/dev/null 2>&1; do sleep 0.2; done brv bridge whoami --format json >/dev/null
- Use
brv restartif that command exists in this cut (it does — seeoclif/commands/restart.ts), which has the right shutdown-then-spawn ordering wired in.
Non-blocking — operators who hit the race will just re-run the command. But the doc snippet is going to be copy-pasted into many onboarding sessions, and a deterministic version is one extra line.
|
Claude finished @ngduyanhece's task in 1m 43s —— View job Re-review — Channel + Bridge protocol (synchronize, 2026-05-23 12:13 UTC push)Overall: 4 / 5 (unchanged). Latest push is a clean merge from
Diff since 2026-05-23 12:09 UTC
Total delta on the branch: 9 files, +262 / -59 — entirely from the What's good in this delta
Inline findings buffered on this branch (from the 12:12 UTC pass on
|

brv channel + bridge - multi-agent collaboration over a wire protocol
proj/channel-protocol→main. 204 commits. Lands the entire channel-protocol body of work for internal-test rollout.Summary
@mentionor share context across them. Cross-machine collaboration between teammates' agents is impossible.brv channelsubsystem (Phases 1–5), agent-driven skill connector (Phase 8), cross-machinebrv bridgeover libp2p with TOFU identity (Phase 9 demo target + post-merge heartbeat hardening), agent quorum Tier 1 + 2 (Phase 10), transcript-storage migration. Newbrv bridge,brv trust,brv aliasCLI surfaces. Newbridge-config.jsonpersistence. SDK packages@brv/agent-sdk(TS) +brv-agent(Py). Pi-extension SDK consumer./channel:*slash commands in CLIs other than Pi. Other CLIs use the byterover skill to drivebrv channel mentionfrom their shell tool.kad-dhtbackend (Slice 9.6 interface only) — peers manually pinned viabrv trust pin --multiaddr ..../brv/parley/delegate/v1wire handler (Slice 9.9 policy gate + startup warnings only) — cross-bridge agents can answer queries but cannot perform write actions on the dialer's behalf.Type of change
INTERNAL_TEST.md, design + PHASES update)Scope (select all touched areas)
ChannelEvents,BridgeEvents,TrustEventschannel,bridge,trust,aliasbrv connectors install <agent> --type skillflowLinked issues
plan/channel-protocol/DESIGN.md,plan/channel-protocol/PHASES.md,plan/channel-protocol/IMPLEMENTATION_PHASE_9_CLOUD_BRIDGE.mdRoot cause
N/A — primarily new-feature work. The handful of bug fixes (heartbeat keep-alive, env-var persistence, channel-events test fixture lag) have their root causes documented in their individual commit messages.
Test plan
test/unit/server/infra/channel/bridge/for: bridge-config-store, audit-parley-seal, parley-verifier, parley-server, parley-end-to-end, bridge-reachability, bridge-transcript-service, bridge-driver-pool, delegate-policy, peer-multiaddr-resolver, registry-client + nonce-lru, rate-limit)mario-team-v4,ctx-exchange-v2,bobs-side-v2channels, multiple kimi/codex agents, round-trip queries + context-tree exchange, heartbeat verified across multi-minute idle gaps that previously triggeredTRANSCRIPT_TERMINAL_MISSINGtest/{unit,integration}/; see commit-by-commit log for which slice each belongs toUser-visible changes
New CLI surfaces:
brv channel new|invite|mention|cancel|show|list|members|onboard|doctor|approve|deny|archive|leave(full Phases 1–5 surface)brv channel mention --mode sync --suppress-thoughts --json --timeout <ms>for agent-driven invocations (Phase 8)brv channel mention --quorum Nfor fan-out + agreement merging (Phase 10 Tier 1)brv bridge listen|whoami|pin|ping|verifyfor cross-machine peer setupbrv trust pin|list|verifywith--multiaddr+--aliasflagsbrv alias add|list|removefor short-name routingbrv connectors install <agent> --type skillwrites byterover skill (.agents/skills/byterover/SKILL.md) into a project so the host LLM can drivebrv channel mentionfrom its shell toolNew persisted state:
<dataDir>/identity/install.{key.enc,cert.json,master.key,peer-id}— L1 identity (auto-generated on firstbrv bridge whoami)<dataDir>/identity/tree-default.{key.enc,cert.json,master.key}— L2 peer-tree identity<dataDir>/identity/known-peers.jsonl— TOFU pin store (alias / peer-id / pin-state)<dataDir>/identity/aliases.json— local alias map<dataDir>/state/bridge-config.json— NEW persistent operator-prefs forBRV_BRIDGE_*env vars (survives daemon respawns)<projectRoot>/.brv/channel-history/<channelId>/turns/<turnId>.ndjson— channel transcript (Phase 9 storage migration; was under.brv/context-tree/channel/)New behavioural defaults:
BRV_BRIDGE_AUTO_PROVISIONdefaults topinned-only(spec §7.3) — first-contact peers must be pinned viabrv trust pinbefore they can dispatch into a channelBRV_BRIDGE_MAX_CONCURRENT_PER_PROFILE=1per agent; excess inbound parleys reject with signedPARLEY_LOCAL_AGENT_BUSYBRV_BRIDGE_DELEGATE_POLICY=promptdefault (mutating-tool wire is deferred so this is type-level only)heartbeat_pingevery 10s during idle bridge streams so libp2p Yamux substreams don't time out on multi-minute LLM callsDeprecated: none.
Breaking: none on
main's existing CLI surface — channels are net-new commands. Internal:WaitForTaskOptions.timeoutMswas removed in mainf14047cae; reconciled in the cross-merge againstproj/byterover-tool-mode.Evidence
75b6c58b5(heartbeat keep-alive) — added unit test that reliably fails on pre-fix code (cross-bridge tool-using turn dies withTRANSCRIPT_TERMINAL_MISSINGafter ~120s) and passes on post-fix code (heartbeats keep the substream alive indefinitely)mario-team-v4+ctx-exchange-v2+bobs-side-v2channels) on 2026-05-20BRV_BRIDGE_*env vars, daemon reads persisted<dataDir>/state/bridge-config.jsonand resumes withcodexprofile +autoprovision +cap=3instead of silently falling back tomock-echo+pinned-only. SeeINTERNAL_TEST.md§5.1 for the operational storydescribe.skipcomment), 0 failingChecklist
npm test): 9603 passing, 26 pending, 0 failingnpm run lint)npm run typecheck)npm run build)plan/channel-protocol/DESIGN.md,plan/channel-protocol/PHASES.md, newINTERNAL_TEST.mdat repo rootmain— lastmainsyncc897ee00f(Release 3.15.0 merged)Risks and mitigations
INTERNAL_TEST.md§3 recommends Tailscale upfront.brv channel doctorsurfacesreachability(public/wildcard-unconfirmed/behind-nat-with-relay/loopback-only/unreachable) honestly. The libp2p AutoNAT/DCUtR/Circuit-Relay-v2 wire is explicitly deferred.INTERNAL_TEST.md§5 lists this as a known limitation. Thedelegate_policystartup log makes the unwired state visible. The error path returns a clearCHANNEL_AUTO_PROVISION_DECLINEDorPARLEY_REJECTEDrather than hanging.brv trust pin <peer-id> --multiaddr <new-addr>re-pins idempotently.brv channel doctorsurfaces stale-cache state. DHT-driven multiaddr refresh ships in a follow-up slice (Phase 9.6 backend).bridge-config.jsonresolver logs[Daemon] Bridge config persisted to ...whenever env supplies a value that changes the file. Revert path documented inINTERNAL_TEST.md§5.1 (rm <dataDir>/state/bridge-config.json).describe.skipped — looks worse than it is on CI.BRV_FORCE_ORIGINconsumer never implemented; Phase 10 surface change made the multi-mention rejection assertion obsolete; channel-history storage migration made the cancel-ordering integration probe stale). The actual code paths these tested are covered by unit-level tests. None are Phase-9 regressions.How to use it
A. Local multi-agent channel (single machine)
The basic flow. No network setup needed. Two terminals.
Inside another agentic CLI (Claude Code, codex, kimi-cli, opencode, Pi) the host LLM calls these commands from its bash tool — install the byterover skill once per project and the LLM picks them up:
brv connectors install Codex --type skill # or Claude Code / Cursor / Gemini CLI / etc.After that the host agent reads
.agents/skills/byterover/SKILL.mdand learns to callbrv channel mention --mode sync --suppress-thoughts --jsonwhenever the user says "ask @Kimi to review this" or similar.B. Cross-machine bridge (two laptops, two agents)
Prerequisite: both peers need a routable path to each other. Internal-test recommendation is Tailscale — install on every team member's laptop, join the same tailnet, every peer gets a stable IP that punches through every NAT. Free for ≤3 users; trivial setup. Bare-internet over heterogeneous NAT will NOT work in this cut (libp2p AutoNAT/DCUtR/Circuit-Relay-v2 are deferred).
That's the full cross-bridge loop. Two agents on different laptops exchange context via signed parley envelopes; both daemons persist the transcript; the receiver's context tree gains the shared knowledge.
C. What's the difference between
brv channelandbrv bridge?brv channelis the user-facing surface: create a channel, invite members (local or remote), mention them, see transcripts.brv bridgeis the low-level libp2p surface: bring up your install's bridge identity, pin remote peers, ping them, verify TOFU promotions.You almost always interact via
brv channelandbrv trust.brv bridgeis for debugging (e.g.brv bridge whoamito share your multiaddr,brv bridge ping <peer-id>to test reachability).What's left for a full-fledged release
Below is the list of follow-up slices that are NOT in this PR but are designed for and the next obvious work. Internal-test feedback will inform priority.
Within Phase 9 (cross-machine bridge) — 4 follow-up slices
These have interfaces + policy gates in this PR but no live backends:
@libp2p/kad-dhtintoLibp2pHost, implementDhtPeerMultiaddrResolver, daemon publishes signed peer-record every announce interval. Replaces manualbrv trust pin --multiaddrre-pinning when IPs rotate. Effort: ~1 week. Worth doing only if internal-test feedback says manual re-pinning hurts.HttpRegistryClientagainstBRV_REGISTRY_BASE_URL, allows handle-based peer discovery. Requires backend deployment (registry endpoint not yet hosted by ByteRover). Effort: ~1 week of CLI + backend infra outside this repo.autoNAT()+dcutr()+circuitRelayClient/Server()libp2p services inLibp2pHost. Ship default ByteRover-hosted relay multiaddrs. Effort: ~3–5 days CLI + ops work to deploy relays. Likely unnecessary if teams adopt Tailscale; track internal-test feedback before committing./brv/parley/delegate/v1wire protocol handler. Today: policy gate (delegate_policy: auto / prompt / deny) is enforced at the type level + logged at daemon startup, but the actual cross-bridgepermission_request → permission_decisionwire is unimplemented. Implementing this enables Alice asking Bob's agent to write to Bob's repo with cross-bridge consent. Effort: ~1 week. High-value if team's flagship use case is cross-machine delegation.Within Phase 7 (native CLI integrations) — upstream-gated
/channel:*slash command). Effort: ~200–500 LOC; merge timeline depends on Moonshot maintainers.These work without us — the skill connector in this PR (
brv connectors install <agent> --type skill) makes every ACP-speaking CLI usable via shell-tool calls. Native slash commands are quality-of-life, not load-bearing.Within Phase 8 (agent-driven channel) — upstream-async
Pre-existing tech debt outside Phase 9 scope
channel-phase{2,3}-*reference orphan code paths likeBRV_FORCE_ORIGINthat were never implemented, or test against the pre-storage-migration layout). Implement the missing consumer side OR delete the tests in a small follow-up PR.brv channel kick @<handle>/ member-remove command. Today a multiaddr/port change forces creating a new channel. Effort: ~half day. Worth doing soon if teams cycle multiaddrs frequently.Out of scope but tracked for v1.1+
@everyone/@allsemantics in channel mentions@kimi-cliacross all channels")Out of scope indefinitely