Commit fd8ab7f

illeatmyhat and claude committed
feat(platform-integrations): automate provenance matching with native-transcript awareness
Provenance was a fully-manual procedure with no deterministic plumbing, so the recall->entity->trajectory loop couldn't be closed or tested.

Add provenance.py:
- candidates: read audit recall rows, skip already-influenced pairs, resolve each entity id <type>/<name> to its file, locate the session trajectory, and emit JSONL judgment candidates (entities/trajectories that can't be found are emitted with a missing:[...] field, never silently dropped)
- record: validate + persist an influence verdict via the existing log_influence writer (no duplicated write logic)
- trajectory locator now also reads the NATIVE Claude transcript at ~/.claude/projects/<slug>/<sid>.jsonl (slug logic shared with doctor), so provenance works in the hookless world where no .evolve/trajectories/ is written

The semantic verdict (followed/contradicted/not_applicable) stays agent-driven; provenance.py does only the deterministic matching/resolution + recording.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 75cde00 commit fd8ab7f

11 files changed

Lines changed: 2427 additions & 170 deletions

platform-integrations/bob/evolve-lite/skills/evolve-lite-provenance/SKILL.md

Lines changed: 83 additions & 34 deletions
@@ -7,58 +7,107 @@ description: Analyze saved trajectories and recall audit events offline to recor
77

88
## Overview
99

10-
This skill runs after one or more sessions have completed. It reads saved trajectories from `.evolve/trajectories/`, matches them to `recall` events in `.evolve/audit.log`, and records post-hoc `influence` events for recalled guidelines.
10+
This skill runs after one or more sessions have completed. It reads `recall`
11+
events from `.evolve/audit.log`, locates each session's trajectory, and records
12+
post-hoc `influence` events for the recalled guidelines.
1113

12-
Use this skill when you want to compute usage provenance without coupling the work to the live learn step.
14+
The mechanical work — reading recall rows, skipping already-assessed pairs,
15+
resolving entity files, and locating trajectories — is done deterministically by
16+
`provenance.py candidates`. Your job is the judgment: read each candidate and
17+
decide whether the recalled guideline was `followed`, `contradicted`, or
18+
`not_applicable`, then persist that verdict.
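
The skip-already-assessed step can be pictured with a minimal sketch like the one below. It is an illustration only, not the code in `provenance.py`: the function name `unresolved_pairs` is hypothetical, and the assumption that each `influence` row carries `event`, `session_id`, and `entity` fields is inferred from the payloads in this skill rather than confirmed by the source.

```python
import json


def unresolved_pairs(audit_lines):
    """Return (session_id, entity) recall pairs that have no influence row yet.

    Assumed row shapes (hypothetical, for illustration only):
      recall:    {"event": "recall", "session_id": ..., "entities": [...]}
      influence: {"event": "influence", "session_id": ..., "entity": ...}
    """
    rows = [json.loads(line) for line in audit_lines if line.strip()]

    # Pairs that already carry a verdict must not be judged again.
    assessed = {
        (row.get("session_id"), row.get("entity"))
        for row in rows
        if row.get("event") == "influence"
    }

    pending = []
    seen = set()
    for row in rows:
        if row.get("event") != "recall":
            continue
        for entity in row.get("entities") or []:
            pair = (row.get("session_id"), entity)
            # Skip pairs that are already assessed or already queued.
            if pair in assessed or pair in seen:
                continue
            seen.add(pair)
            pending.append(pair)
    return pending
```

Keeping the pair `(session_id, entity)` as the dedup key means the same guideline recalled in two different sessions is still judged once per session.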
1319

14-
## Workflow
15-
16-
### Step 1: Load Recall Events
17-
18-
Read `.evolve/audit.log` as JSONL. Find entries where `event == "recall"` and `entities` is a non-empty list.
19-
20-
Skip any recall event that already has `influence` entries for the same `session_id` and entity ids. Do not write duplicate influence records.
21-
22-
### Step 2: Locate Saved Trajectories
20+
Use this skill when you want to compute usage provenance without coupling the
21+
work to the live learn step.
2322

24-
List `.evolve/trajectories/` and match each recall event to a trajectory by `session_id`.
25-
26-
Matching strategy (in order):
27-
1. `claude-transcript_<session-id>.jsonl` - the stop-hook transcript dump; the session id is in the filename.
28-
2. `trajectory_<timestamp>_<session-id>.json` - written by the evolve-lite:save-trajectory skill when a session id is available. Match on the `<session-id>` slice of the filename.
29-
3. `trajectory_<timestamp>.json` - open the file and match its top-level `session_id` field against the recall event. Only fall back to this step when the filename alone does not identify the session.
30-
31-
If none of the above yields a confident match for a recall event, skip it. Do not guess.
32-
33-
### Step 3: Read Recalled Entities
23+
## Workflow
3424

35-
For each recalled entity id, open `.evolve/entities/<id>.md`. The id is a path relative to `.evolve/entities/` without the `.md` suffix, such as `guideline/foo` or `subscribed/alice/guideline/foo`.
25+
### Step 1: Get candidates
3626

37-
Read the entity content and trigger. Skip ids whose files are missing.
27+
Run the candidate builder. It emits one JSON object per line (JSONL), one per
28+
unresolved `(session_id, entity)` recall pair:
3829

39-
### Step 4: Assess Influence
30+
```bash
31+
python3 .bob/skills/evolve-lite-provenance/scripts/provenance.py candidates
32+
```
4033

41-
Compare each recalled entity with the matched trajectory. Pick exactly one verdict:
34+
Each candidate looks like:
4235

43-
- `followed` - the agent's actual actions are consistent with the guideline.
44-
- `contradicted` - the guideline applied, but the agent did the opposite or repeated the avoidable dead end.
45-
- `not_applicable` - the guideline was recalled but did not apply to this session.
36+
```json
37+
{
38+
"session_id": "<session-id>",
39+
"entity_id": "<type>/<name>",
40+
"entity_excerpt": "<frontmatter + content of the entity file>",
41+
"trajectory_path": "/path/to/transcript.jsonl",
42+
"trajectory_excerpt": "<head of the trajectory transcript>",
43+
"missing": ["trajectory"]
44+
}
45+
```
4646

47-
Keep `evidence` to one short sentence citing a concrete action, tool call, or absence in the trajectory.
47+
Notes:
48+
49+
- `entity_id` is the path relative to `.evolve/entities/` without the `.md`
50+
suffix, e.g. `feedback/foo`, `guideline/bar`, or
51+
`subscribed/alice/guideline/baz`.
52+
- Pairs that already have an `influence` row are skipped for you — the builder
53+
reuses the same dedup rule used when influence rows are written. You will
54+
never be handed a duplicate.
55+
- The trajectory locator checks `.evolve/trajectories/` first, then falls back
56+
to the native Claude transcript at
57+
`~/.claude/projects/<slug>/<session-id>.jsonl`. This means provenance works
58+
even when no `.evolve/trajectories/` file was written.
59+
- If an entity file or trajectory cannot be found, the candidate is still
60+
emitted with a `missing: [...]` field so the gap is visible. When the
61+
trajectory is missing you usually cannot judge the pair — skip it (do not
62+
guess), unless the entity content alone makes `not_applicable` certain.
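
If you want to triage the builder's output before judging, a small helper along these lines may be enough. It is a sketch under stated assumptions: `split_candidates` is a hypothetical name, the input is the JSONL text the builder prints, and a candidate with nothing missing is assumed to either omit `missing` or carry an empty list.

```python
import json


def split_candidates(jsonl_text):
    """Partition candidate lines into judgeable ones and ones to skip."""
    judgeable, skipped = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        candidate = json.loads(line)
        # A candidate without a trajectory usually cannot be judged.
        if "trajectory" in (candidate.get("missing") or []):
            skipped.append(candidate)
        else:
            judgeable.append(candidate)
    return judgeable, skipped
```

The skipped list is returned rather than discarded so the gap stays visible, in line with how the builder itself reports missing files.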
63+
64+
### Step 2: Judge each candidate
65+
66+
For each candidate, read `entity_excerpt` (and open `trajectory_path` for the
67+
full transcript if the excerpt is not enough). Compare the recalled guideline
68+
against the agent's actual actions in the trajectory and pick exactly one
69+
verdict:
70+
71+
- `followed` — the agent's actual actions are consistent with the guideline.
72+
- `contradicted` — the guideline applied, but the agent did the opposite or
73+
repeated the avoidable dead end.
74+
- `not_applicable` — the guideline was recalled but did not apply to this
75+
session.
76+
77+
Keep `evidence` to one short sentence citing a concrete action, tool call, or
78+
absence in the trajectory. This judgment is yours — there is no heuristic
79+
fallback.
80+
81+
### Step 3: Record verdicts
82+
83+
Persist each verdict. Either pipe one verdict per call to `provenance.py
84+
record`:
4885

49-
### Step 5: Write Influence Events
86+
```bash
87+
echo '{
88+
"session_id": "<session-id>",
89+
"entity": "<type>/<name>",
90+
"verdict": "followed",
91+
"evidence": "Agent used the saved parser before trying shell fallbacks."
92+
}' | python3 .bob/skills/evolve-lite-provenance/scripts/provenance.py record
93+
```
5094

51-
Pipe one JSON payload per assessed session to the helper:
95+
…or, to batch many assessments for one session in a single call, pipe to the
96+
underlying writer directly:
5297

5398
```bash
5499
echo '{
55100
"session_id": "<session-id>",
56101
"assessments": [
57-
{"entity": "guideline/<slug>", "verdict": "followed", "evidence": "Agent used the saved parser before trying shell fallbacks."}
102+
{"entity": "feedback/foo", "verdict": "followed", "evidence": "Agent followed it."},
103+
{"entity": "guideline/bar", "verdict": "not_applicable", "evidence": "Did not apply."}
58104
]
59105
}' | python3 .bob/skills/evolve-lite-provenance/scripts/log_influence.py
60106
```
61107

62-
The `entity` value must match exactly what appeared in the recall event, including any `subscribed/<source>/` prefix.
108+
Both paths write the identical `influence` audit row and skip duplicates. The
109+
`entity` value must match the candidate's `entity_id` exactly, including any
110+
`subscribed/<source>/` prefix.
63111

64-
It is valid to emit an empty `assessments` list when recall events exist but no recalled guideline can be assessed.
112+
It is valid to record nothing when recall events exist but no recalled guideline
113+
can be assessed (e.g. every candidate is missing its trajectory).
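
When you have many verdicts, grouping them into one batch payload per session can be sketched as follows. This is an illustrative helper, not part of the shipped scripts: `build_batches` and `VALID_VERDICTS` are hypothetical names, and the up-front verdict check is an assumption about what is worth validating before piping to the writer.

```python
import json
from collections import defaultdict

# The three verdicts this skill allows.
VALID_VERDICTS = {"followed", "contradicted", "not_applicable"}


def build_batches(verdicts):
    """Group per-pair verdicts into one JSON batch payload per session.

    Each input item is a dict with session_id, entity, verdict, and evidence,
    mirroring the single-verdict payload shape.
    """
    by_session = defaultdict(list)
    for item in verdicts:
        if item["verdict"] not in VALID_VERDICTS:
            raise ValueError(f"unknown verdict: {item['verdict']!r}")
        by_session[item["session_id"]].append(
            {
                "entity": item["entity"],
                "verdict": item["verdict"],
                "evidence": item["evidence"],
            }
        )
    # One serialized payload per session, in first-seen session order.
    return [
        json.dumps({"session_id": sid, "assessments": assessments})
        for sid, assessments in by_session.items()
    ]
```

Each returned string is meant to be piped to the batch writer on its own. An empty input yields an empty list, which matches the case where nothing can be assessed.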
