Commit 9952482

Merge pull request #10 from TransluceAI/transcript-slices
Transcript slices
2 parents b39df88 + c30512d commit 9952482

3 files changed

Lines changed: 72 additions & 4 deletions

.claude-plugin/marketplace.json

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 "name": "docent",
 "source": "./plugins/docent",
 "description": "Docent AI analysis tools for Claude Code",
-"version": "1.0.0",
+"version": "0.1.7",
 "author": {
   "name": "TransluceAI"
 },
Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 {
   "name": "docent",
-  "version": "0.1.6",
+  "version": "0.1.7",
   "description": "Docent AI analysis tools"
 }

plugins/docent/skills/docent/readings-reference.md

Lines changed: 70 additions & 2 deletions
@@ -43,11 +43,12 @@ Access attributes to get `ColumnRef` objects (e.g., `rows.transcript`).

 When you use a ColumnRef in a prompt template, you should make its type explicit with `.as_type()`. The type can be:
 * transcript
+* transcript_slice
 * agent_run
 * reading_result
 * text

-For `text`, the literal text from that column will be embedded in the prompt. For other types, the column will be interpreted as the UUID of an object in the database, and that object will be formatted as a string and embedded in the prompt.
+For `text`, the literal text from that column will be embedded in the prompt. For most other types, the column will be interpreted as the UUID of an object in the database, and that object will be formatted as a string and embedded in the prompt. The exception is `transcript_slice`, whose column value is a JSON object produced by the DQL `transcript_slice(transcript_id, start_idx, end_idx)` function (see the **Transcript slices** section below).

 When you specify a type, you are also specifying whether the prompt slot is scalar or list-valued:
 * `.as_type("transcript")` means scalar and defaults to `is_list=False`
@@ -89,7 +90,7 @@ reading = client.read(
 )
 ```

-Other ref types for scripted readings: `AgentRunRef(id, collection_id)`, `ReadingResultRef(id, collection_id)`.
+Other ref types for scripted readings: `AgentRunRef(id, collection_id)`, `TranscriptSliceRef(transcript_id, start_idx, end_idx, agent_run_id, collection_id)`, `ReadingResultRef(id, collection_id)`.

 Parameters:
 - `prompt_template` or `prompts_list` (mutually exclusive)
@@ -104,6 +105,73 @@ Parameters:
 - `"results"`: always create a new reading record, but reuse individual results to avoid redundant LLM calls
 - `"none"`: no caching — force full re-evaluation

+### Transcript slices
+
+A `transcript_slice` parameter renders a contiguous message range on a specific transcript instead of the whole transcript. The range is inclusive on both ends (`start_idx`, `end_idx`), and rendered block labels preserve the original transcript message indices so the LLM can still cite by absolute position. Negative indices are valid and interpreted like in Python, e.g. to get the last 5 transcript blocks you could set start_idx=-5 end_idx=-1.
+
+Transcript slices are a specialized feature, and should only be used if the user's request strongly implies that they're the right tool (e.g. "look at the last 5 messages of each transcript").
+
+**Template reading.** Produce slice references directly in DQL with the `transcript_slice(transcript_id, start_idx, end_idx)` function, then annotate the column with `.as_type("transcript_slice")`:
+
+```python
+slices = client.query(
+    collection_id,
+    """
+    WITH windows AS (
+        SELECT
+            t.id AS transcript_id,
+            GREATEST(0, CAST(t.metadata_json->>'first_error_idx' AS INTEGER) - 3) AS start_idx,
+            CAST(t.metadata_json->>'first_error_idx' AS INTEGER) + 3 AS end_idx
+        FROM transcripts t
+        WHERE t.metadata_json ? 'first_error_idx'
+    )
+    SELECT transcript_slice(transcript_id, start_idx, end_idx) AS window
+    FROM windows
+    """,
+    name="Error context windows",
+)
+
+reading = client.read(
+    prompt_template=[
+        "Explain what went wrong in this excerpt: ",
+        slices.window.as_type("transcript_slice"),
+    ],
+    model="openai/gpt-5.4-mini",
+    name="Explain error contexts",
+)
+```
+
+Notes on the DQL function:
+* `transcript_slice()` must be called with exactly three arguments and emits a JSON object value. It is allowed anywhere a scalar expression is valid (including inside `CASE`, `DISTINCT`, `ORDER BY`, or `ARRAY_AGG(...)` for list-valued slots).
+* Access control and collection scoping come from the underlying transcript; indices outside the transcript simply render fewer messages rather than erroring.
+* `start_idx` and `end_idx` may be equal to render a single message.
+
+**Scripted reading.** Construct a `TranscriptSliceRef` per prompt. Use this when the slice indices come from Python logic rather than SQL (e.g., derived from earlier reading results):
+
+```python
+from docent import TranscriptSliceRef
+
+reading = client.read(
+    prompts_list=[
+        [
+            "Summarize this excerpt: ",
+            TranscriptSliceRef(
+                transcript_id="<transcript-uuid>",
+                start_idx=10,
+                end_idx=25,
+                agent_run_id="<run-uuid>",
+                collection_id=collection_id,
+            ),
+        ],
+    ],
+    model="openai/gpt-5.4-mini",
+    name="Slice summaries",
+)
+```
+
+**Context config for slices.** `TranscriptSliceContextConfig` exposes the same filters as `TranscriptContextConfig` (`transcript_metadata`, `message_metadata`); defaults are listed under **Context configs** above. Attach it the same way as other context configs — via `context_configs={param_name: TranscriptSliceContextConfig(...)}` for template readings, or `TranscriptSliceRef(..., context_config=TranscriptSliceContextConfig(...))` for scripted readings. As with other context configs, changing it changes the reading's content hash and therefore its cache identity.
+
+
 ### Context configs
 Use context configs to control which metadata and transcript subtrees are rendered when a reading prompt includes an `agent_run`, `transcript`, or `transcript_slice` parameter. Context configs do not change which rows DQL selects; they only change how selected context items are formatted for the LLM. They are part of the reading config/cache identity, so changing them creates a different reading.

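The index rules stated in the **Transcript slices** section of `readings-reference.md` (inclusive bounds, Python-style negative indices, out-of-range bounds rendering fewer messages) can be illustrated with a small standalone sketch. `resolve_slice` is a hypothetical helper written for this note and is not part of the Docent SDK. Clamping out-of-range bounds is an assumption inferred from the phrase "simply render fewer messages"; the real implementation may handle edge cases differently.

```python
def resolve_slice(n_messages: int, start_idx: int, end_idx: int) -> list[int]:
    """Return the absolute message indices an inclusive slice would cover.

    Hypothetical illustration only; not Docent's implementation.
    """
    # Negative indices count from the end of the transcript, as in Python.
    start = start_idx + n_messages if start_idx < 0 else start_idx
    end = end_idx + n_messages if end_idx < 0 else end_idx

    # Assumption: bounds outside the transcript are clamped, so the slice
    # covers fewer messages instead of raising an error.
    start = max(start, 0)
    end = min(end, n_messages - 1)

    # Both ends are inclusive, hence end + 1. An inverted or fully
    # out-of-range window yields an empty list.
    return list(range(start, end + 1))


# Last five messages of a 20-message transcript.
last_five = resolve_slice(20, -5, -1)   # [15, 16, 17, 18, 19]

# Equal bounds select a single message.
single = resolve_slice(20, 7, 7)        # [7]

# An end index past the transcript covers fewer messages.
truncated = resolve_slice(20, 10, 25)   # [10, 11, ..., 19]
```

The returned values are absolute positions, which matches the documented behavior that rendered block labels keep the original transcript message indices.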
