Commit caf83e4
feat(telemetry): RFC-007 operational telemetry — TelemetryCollector, MemoryManager integration, aggregator, sampler, dashboard (#85)
* feat(telemetry): commit US-001 TelemetryCollector + tests (squash from ralph/rfc-007-operational-telemetry)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(telemetry): integrate TelemetryCollector into MemoryManager (RFC-007 / US-002)
Adds automatic per-query telemetry capture to MemoryManager.recall() and
.synthesize() with zero agent code changes — telemetry activates when
ZETTELFORGE_LOG_LEVEL=DEBUG.
Key changes:
- MemoryManager.__init__ wires in TelemetryCollector singleton + correlation
slots (_telemetry_query_id, _telemetry_retrieved_notes)
- recall() gains actor= kwarg, wraps retriever/graph_retriever with timing,
calls start_query + log_recall with vector/graph latency breakdown
- synthesize() gains actor= kwarg, reuses recall's query_id (or starts fresh),
calls log_synthesis + auto_feedback_from_synthesis
- OCSF events extended with telemetry_query_id and telemetry_actor fields
- 6 integration tests verify telemetry capture paths
Mypy: 3 new errors (float→int latency_ms) fixed, 23 pre-existing errors remain.
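The timing wrap described in the key changes above can be pictured roughly as follows. This is a minimal sketch, not the code in this commit: `timed`, `recall_with_latency`, and the two retriever callables are hypothetical stand-ins, and the real `start_query` / `log_recall` signatures are not shown in this message.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Call fn and return (result, elapsed milliseconds as an int)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    # Round to int so latency_ms has one type (cf. the float->int mypy fix).
    elapsed_ms = int(round((time.perf_counter() - start) * 1000))
    return result, elapsed_ms


def recall_with_latency(
    query: str,
    vector_retriever: Callable[[str], List[str]],
    graph_retriever: Callable[[str], List[str]],
) -> Dict[str, Any]:
    """Hypothetical helper: run both retrievers and keep a latency breakdown."""
    vector_hits, vector_ms = timed(vector_retriever, query)
    graph_hits, graph_ms = timed(graph_retriever, query)
    return {
        "notes": vector_hits + graph_hits,
        "vector_latency_ms": vector_ms,
        "graph_latency_ms": graph_ms,
        "latency_ms": vector_ms + graph_ms,
    }
```

Keeping the vector and graph timings separate lets a later report show which retriever dominates total latency.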
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(telemetry): add telemetry_aggregator daily report script (RFC-007 / US-003)
CLI tool that reads per-day telemetry JSONL and produces actionable operational
metrics: total queries, synthesis count, latency averages, confidence, tier
distribution, feedback stats, top utility notes, unused notes count.
Includes 5 unit tests covering: missing day handling, recall+synthesis aggregation,
tier distribution merging, feedback utility calculation, and unused notes detection.
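A rough sketch of the kind of per-day aggregation described above. The event field names (`event_type`, `latency_ms`, `tier_distribution`, `note_ids`, `note_id`, `utility`) are assumptions made for illustration; the actual JSONL schema is not given in this message.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, List


def read_events(path: Path) -> List[Dict[str, Any]]:
    """Read one JSON object per line; a missing day yields an empty list."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(ln) for ln in fh if ln.strip()]


def aggregate(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize one day of events. All field names are assumed."""
    recalls = [e for e in events if e.get("event_type") == "recall"]
    syntheses = [e for e in events if e.get("event_type") == "synthesis"]
    feedback = [e for e in events if e.get("event_type") == "feedback"]

    tiers: Counter = Counter()
    retrieved = set()
    for e in recalls:
        tiers.update(e.get("tier_distribution", {}))  # merge per-event counts
        retrieved.update(e.get("note_ids", []))

    # A note counts as used once any feedback rates it at utility >= 4.
    used = {f.get("note_id") for f in feedback if f.get("utility", 0) >= 4}
    latencies = [e["latency_ms"] for e in recalls if "latency_ms" in e]
    return {
        "total_queries": len(recalls),
        "synthesis_count": len(syntheses),
        "avg_recall_latency_ms": sum(latencies) / len(latencies) if latencies else None,
        "tier_distribution": dict(tiers),
        "unused_notes": len(retrieved - used),
    }
```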
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(telemetry): add human evaluation rubric and sampling script (RFC-007 / US-004)
6-question rubric covering recall relevance, synthesis value, critical gaps,
unsupported claims, latency perception, and overall trust (1-5 scale).
scripts/human_eval_sampler.py selects 20 random synthesis briefings from
telemetry JSONL files and formats them as a structured Markdown template
for Roland's monthly review. Includes scoring summary table and human_eval
event entry schema.
9 unit tests verify telemetry reading, briefing formatting, rubric template,
and main() behavior across edge cases.
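The sampling step above might look something like the sketch below. It is illustrative only: `sample_briefings`, `format_briefing`, and the `event_type` / `query_id` fields are hypothetical, and the rubric row labels paraphrase the six topics listed in this message rather than quoting the real template.

```python
import random
from typing import Any, Dict, List, Optional

# Paraphrased from the six rubric topics named in the commit message.
RUBRIC = [
    "Recall relevance",
    "Synthesis value",
    "Critical gaps",
    "Unsupported claims",
    "Latency perception",
    "Overall trust",
]


def sample_briefings(
    events: List[Dict[str, Any]], k: int = 20, seed: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Pick up to k synthesis events uniformly at random, without replacement."""
    syntheses = [e for e in events if e.get("event_type") == "synthesis"]
    rng = random.Random(seed)  # seedable so a review batch can be reproduced
    return rng.sample(syntheses, min(k, len(syntheses)))


def format_briefing(index: int, event: Dict[str, Any]) -> str:
    """Render one sampled event as a Markdown section with an empty score table."""
    lines = [
        f"## Briefing {index}",
        f"- query_id: {event.get('query_id', 'unknown')}",
        "",
        "| Question | Score |",
        "|---|---|",
    ]
    lines += [f"| {question} | |" for question in RUBRIC]
    return "\n".join(lines)
```

Clamping `k` to the number of available syntheses avoids the `ValueError` that `random.sample` raises when asked for more items than exist, which matters on low-traffic months.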
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(telemetry): Streamlit telemetry dashboard (RFC-007 / US-005)
Closes out RFC-007. Optional visualization layer over the telemetry
JSONL — query volume, latency p50/p95, tier distribution, utility
trend, unused notes warning. Runs locally at :8501.
Design notes:
- pandas is a module-load dep (needed to build DataFrames feeding
Streamlit charts). streamlit is imported LAZILY inside render()
so the pure compute helpers (daily_volume, latency_percentiles,
tier_distribution, utility_trend, unused_notes, load_events,
to_dataframe) stay testable without Streamlit installed.
- No database dependency — reads the same ~/.amem/telemetry/*.jsonl
files the aggregator and sampler use.
- unused_notes surfaces retrieval quality issues: a note that shows
up in recall results but never earns utility >= 4 in feedback is
a likely false-positive candidate. Threshold mirrors the
auto_feedback_from_synthesis cited=4/uncited=2 contract from US-001.
- Run: streamlit run src/zettelforge/scripts/telemetry_dashboard.py
(optionally ZF_TELEMETRY_DIR=/path/to/data).
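To illustrate the lazy-import design note above, here is a simplified sketch. It uses plain Python rather than pandas to stay dependency-free, and the event fields (`event_type`, `note_ids`, `note_id`, `utility`) are assumed names, so treat it as a picture of the pattern rather than the dashboard's actual code.

```python
from typing import Any, Dict, Iterable, List


def unused_notes(events: Iterable[Dict[str, Any]], threshold: int = 4) -> List[str]:
    """Note ids that appear in recall results but never reach `threshold` utility."""
    retrieved, useful = set(), set()
    for e in events:
        if e.get("event_type") == "recall":
            retrieved.update(e.get("note_ids", []))
        elif e.get("event_type") == "feedback" and e.get("utility", 0) >= threshold:
            useful.add(e.get("note_id"))
    return sorted(retrieved - useful)


def render(events: List[Dict[str, Any]]) -> None:
    """UI entry point; streamlit is only needed when this function runs."""
    import streamlit as st  # lazy import: importing this module never requires it

    stale = unused_notes(events)
    if stale:
        st.warning(f"{len(stale)} retrieved notes never reached utility >= 4")
    st.write(stale)
```

Because `streamlit` is imported inside `render()`, a test suite can import the module and exercise `unused_notes` in an environment where Streamlit is not installed.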
Tests: 16 unit tests over the pure compute layer — covers load_events
(missing dir, multi-day, corrupt-line tolerance), daily_volume (per-day
per-type counts, excludes feedback, empty-df handling), latency
percentiles (p50/p95/max, missing event type), tier_distribution
(summation across events, missing-column defense), utility_trend
(daily mean, empty-feedback case), unused_notes (retrieved-but-not-cited
detection, all-cited case, empty-df case). Skips gracefully if pandas
is missing.
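Two of the behaviours these tests cover, corrupt-line tolerance and latency percentiles, can be sketched in plain Python as below. The commit's helpers are pandas-based, so this is an approximation under assumptions: `parse_lines` and `percentile` are hypothetical names, and nearest-rank is only one of several percentile definitions (pandas interpolates by default).

```python
import json
import math
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional


def parse_lines(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL, silently skipping blank and corrupt lines."""
    events = []
    for ln in lines:
        ln = ln.strip()
        if not ln:
            continue
        try:
            record = json.loads(ln)
        except json.JSONDecodeError:
            continue  # tolerate a partially written or corrupt line
        if isinstance(record, dict):
            events.append(record)
    return events


def load_events(telemetry_dir: Path) -> List[Dict[str, Any]]:
    """Load every *.jsonl file in the directory; a missing dir gives []."""
    if not telemetry_dir.is_dir():
        return []
    events: List[Dict[str, Any]] = []
    for path in sorted(telemetry_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            events.extend(parse_lines(fh))
    return events


def percentile(values: List[float], pct: float) -> Optional[float]:
    """Nearest-rank percentile; returns None when there is no data."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```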
ruff clean. Dashboard is documented as optional; tests run under
CI environments without streamlit installed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: RFC-007 design doc + telemetry section in troubleshoot guide
- docs/rfcs/RFC-007-operational-telemetry.md — full design doc for the
operational telemetry system shipped in US-001 through US-005.
Includes the four DD-1..DD-4 design decisions (caller-opt-in query_id
correlation, narrow-scope latency instrumentation, OCSF unmapped
extension, hybrid __new__-bypass integration tests) resolved by
subagent review before implementation.
- docs/how-to/troubleshoot.md — adds a "Logs and diagnostics" telemetry
subsection so operators know:
1. telemetry JSONL lives at ~/.amem/telemetry/telemetry_YYYY-MM-DD.jsonl
(parallel to OCSF log at ~/.amem/zettelforge.log)
2. aggregator CLI: python -m zettelforge.scripts.telemetry_aggregator
3. sampler CLI: python -m zettelforge.scripts.human_eval_sampler
4. optional dashboard: streamlit run telemetry_dashboard.py
5. privacy contract: raw note content never persisted, query text
truncated at 200/500 chars by mode, local-only.
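The file-naming convention and the truncation rule above could be expressed along these lines. This is a sketch under assumptions: `telemetry_path` and `truncate_query` are hypothetical helpers, and because this message does not say which mode maps to 200 and which to 500 characters, the limit is left as a caller-supplied argument.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def telemetry_path(day: date, base: Optional[Path] = None) -> Path:
    """Path of one day's file, e.g. ~/.amem/telemetry/telemetry_2025-01-02.jsonl."""
    if base is None:
        base = Path.home() / ".amem" / "telemetry"
    return base / f"telemetry_{day.isoformat()}.jsonl"


def truncate_query(text: str, limit: int) -> str:
    """Keep at most `limit` characters of query text before it is persisted."""
    return text[:limit]
```

For example, `telemetry_path(date(2025, 1, 2), Path("/data"))` yields `/data/telemetry_2025-01-02.jsonl` on a POSIX system.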
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(lint): address ruff findings in US-003/US-004 telemetry scripts
Auto-fixes plus one manual rename (l → ln to avoid E741 ambiguous
single-letter variable). ruff --fix now runs clean across all 10 files
in the PR. The local ruff run passed earlier, but the repo's CI runs
a stricter ruleset on PRs; this reconciles them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(format): ruff format pass across RFC-007 telemetry files
CI's format step (ruff format --check) is stricter than the lint step
and was failing on 8 of the 10 PR files. Ran ruff format across all
modified sources; tests still pass (57/57). No logic changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 157ec5a commit caf83e4
13 files changed
Lines changed: 2610 additions & 3 deletions
File tree
- docs
- how-to
- rfcs
- src/zettelforge
- scripts
- tests