2 changes: 1 addition & 1 deletion .claude-plugin/plugin.json
@@ -1,7 +1,7 @@
{
"name": "posthog",
"description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from your AI coding tool. Optionally capture Claude Code sessions to PostHog LLM Analytics.",
-"version": "1.1.45",
+"version": "1.1.46",
"author": {
"name": "PostHog",
"email": "hey@posthog.com",
2 changes: 1 addition & 1 deletion .codex-plugin/plugin.json
@@ -1,6 +1,6 @@
{
"name": "posthog",
-"version": "1.0.43",
+"version": "1.0.44",
"description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from Codex",
"author": {
"name": "PostHog",
2 changes: 1 addition & 1 deletion .cursor-plugin/plugin.json
@@ -1,7 +1,7 @@
{
"name": "posthog",
"displayName": "PostHog",
-"version": "1.1.39",
+"version": "1.1.40",
"description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from Cursor",
"author": {
"name": "PostHog",
2 changes: 1 addition & 1 deletion gemini-extension.json
@@ -1,6 +1,6 @@
{
"name": "posthog",
-"version": "1.0.41",
+"version": "1.0.42",
"description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from Gemini CLI",
"mcpServers": {
"posthog": {
6 changes: 6 additions & 0 deletions skills/.sync-manifest
@@ -18,6 +18,7 @@ copying-flags-across-projects
creating-ai-subscription
creating-an-endpoint
creating-experiments
creating-online-evaluations
creating-replay-vision-scanners
debugging-local-replay
debugging-signals-pipeline
@@ -31,6 +32,7 @@ diagnosing-missing-recordings
diagnosing-sdk-health
diagnosing-stacktrace-symbolication
downloading-batch-export-files
exploring-ai-failures
exploring-apm-traces
exploring-autocapture-events
exploring-endpoint-execution-logs
@@ -53,6 +55,7 @@ finding-replay-for-issue
finding-sessions-to-watch
formatting-insight-axes
grouping-noisy-errors
improving-mcp-tools
inbox-exploration
instrument-error-tracking
instrument-feature-flags
@@ -89,13 +92,16 @@ signals-scout-health-checks
signals-scout-inbox-validation
signals-scout-insight-alerts
signals-scout-logs
signals-scout-mcp-tool-calls
signals-scout-observability-gaps
signals-scout-product-analytics
signals-scout-replay-vision
signals-scout-revenue-analytics
signals-scout-session-replay
signals-scout-skills-store
signals-scout-surveys
signals-scout-web-analytics
signals-scout-web-vitals
skills-store
suggesting-data-imports
suppressing-noisy-errors
255 changes: 104 additions & 151 deletions skills/authoring-scouts/SKILL.md

Large diffs are not rendered by default.

76 changes: 29 additions & 47 deletions skills/authoring-scouts/references/dedupe-and-memory.md
@@ -1,35 +1,26 @@
# Dedupe and memory conventions

How a scout decides what to do with a candidate observation, how it writes durable
scratchpad entries, and the noise patterns common across PostHog projects. Author your
scout's **Decide** and **Save-memory** sections around these — they're how the fleet avoids
re-emitting and gets smarter every run. This mirrors
`signals-scout-general/references/conventions.md`.
How a scout decides what to do with a candidate observation, how it writes durable scratchpad entries, and the noise patterns common across PostHog projects.
Author your scout's **Decide** and **Save-memory** sections around these — they're how the fleet avoids re-emitting and gets smarter every run.
This mirrors `signals-scout-general/references/conventions.md`.

## The four states

Every scout classifies each candidate finding against prior runs and the scratchpad before
emitting. Bake this classifier into the scout's Decide section:
Every scout classifies each candidate finding against prior runs and the scratchpad before emitting.
Bake this classifier into the scout's Decide section:

1. **Net new** — no prior run mentions the topic, no scratchpad entry covers it.
→ Emit if it clears the confidence bar (≥ 0.65).
2. **Material update on a prior run** — a prior run covered it, but there's new evidence (a
different corroborating source, a fresh deploy correlation, contradicting data, a
meaningful escalation in scope). → **Emit fresh, citing the prior `finding_id`** in the
description and the evidence list (`source_product: signals_scout`, `entity_id: <prior>`).
1. **Net new** — no prior run mentions the topic, no scratchpad entry covers it. → Emit if it clears the confidence bar (≥ 0.65).
2. **Material update on a prior run** — a prior run covered it, but there's new evidence (a different corroborating source, a fresh deploy correlation, contradicting data, a meaningful escalation in scope). → **Emit fresh, citing the prior `finding_id`** in the description and the evidence list (`source_product: signals_scout`, `entity_id: <prior>`).
The inbox groups by dedupe key.
3. **Same fact already covered** — a prior run emitted with the same evidence shape.
→ Skip. Optionally rewrite a scratchpad entry confirming the topic stayed quiet.
4. **Already-addressed or noise** — a scratchpad entry with an `addressed:` / `noise:` /
`dedupe:` prefix names the entity with a "team aware" note. → Skip; note it in the run
summary.
3. **Same fact already covered** — a prior run emitted with the same evidence shape. → Skip.
Optionally rewrite a scratchpad entry confirming the topic stayed quiet.
4. **Already-addressed or noise** — a scratchpad entry with an `addressed:` / `noise:` / `dedupe:` prefix names the entity with a "team aware" note. → Skip; note it in the run summary.
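
The four states above can be sketched as a small classifier. This is a minimal illustration, not the scout harness's code: `classify`, `should_emit`, the argument names, and the precedence (a skip-prefixed scratchpad entry wins over everything else) are all assumptions made for the example. `scratchpad_keys` is assumed to hold only the keys of entries that name the candidate's entity.

```python
from enum import Enum
from typing import Optional, Sequence

CONFIDENCE_BAR = 0.65  # emit gate stated in the text
SKIP_PREFIXES = ("addressed:", "noise:", "dedupe:")  # prefixes named in state 4


class State(Enum):
    NET_NEW = 1
    MATERIAL_UPDATE = 2
    ALREADY_COVERED = 3
    ADDRESSED_OR_NOISE = 4


def classify(
    prior_finding_id: Optional[str],
    has_new_evidence: bool,
    scratchpad_keys: Sequence[str],
) -> State:
    """Hypothetical classifier; names and precedence are assumptions."""
    # State 4: a skip-prefixed scratchpad entry already names this entity.
    if any(key.startswith(SKIP_PREFIXES) for key in scratchpad_keys):
        return State.ADDRESSED_OR_NOISE
    # State 1: no prior run mentions the topic.
    if prior_finding_id is None:
        return State.NET_NEW
    # State 2: a prior run exists and there is new evidence.
    if has_new_evidence:
        return State.MATERIAL_UPDATE
    # State 3: same fact, same evidence shape.
    return State.ALREADY_COVERED


def should_emit(state: State, confidence: float) -> bool:
    """Only states 1 and 2 can emit, and only above the confidence bar."""
    emitting = (State.NET_NEW, State.MATERIAL_UPDATE)
    return state in emitting and confidence >= CONFIDENCE_BAR
```

A real scout would make this decision in prose inside its Decide section; the sketch only makes the ordering of the checks explicit.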

## Scratchpad memory

The scratchpad is durable, per-team prose keyed by string. It has no tags or TTLs — **the
category is encoded in the key prefix** so a future run finds an entry with a single `text=`
search. Re-using a key rewrites the entry in place (the idempotent refresh — use it to
confirm a quiet observation without duplicating entries).
The scratchpad is durable, per-team prose keyed by string.
It has no tags or TTLs — **the category is encoded in the key prefix** so a future run finds an entry with a single `text=` search.
Re-using a key rewrites the entry in place (the idempotent refresh — use it to confirm a quiet observation without duplicating entries).

| Prefix | Use for |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
@@ -42,12 +33,9 @@ confirm a quiet observation without duplicating entries).
| `mcp-gap:` | Scout-noticed gap in the MCP surface worth raising later. |
| `report:` | A report this scout authored via the report channel — stores the `report_id` so the next run edits/dedups against it instead of re-filing. See [`report-contract.md`](report-contract.md). |

Format: `<prefix>:<domain>:<entity>` — e.g. `pattern:error_tracking:baseline`,
`noise:logs:rabbitmq-deploy-window`, `dedupe:csp_violations:a1b2c3d4`. Each canonical
specialist has its own `<domain>` label (`error_tracking`, `logs`, `llm_analytics`,
`experiments`, `feature-flags`, `session-replay`, `web-analytics`, `pipelines`, `health`,
…) — not a closed set. A new scout introduces its own domain label and reuses the
prefixes; match the label a surface's existing entries already use.
Format: `<prefix>:<domain>:<entity>` — e.g. `pattern:error_tracking:baseline`, `noise:logs:rabbitmq-deploy-window`, `dedupe:csp_violations:a1b2c3d4`.
Each canonical specialist has its own `<domain>` label (`error_tracking`, `logs`, `llm_analytics`, `experiments`, `feature-flags`, `session-replay`, `web-analytics`, `pipelines`, `health`, …) — not a closed set.
A new scout introduces its own domain label and reuses the prefixes; match the label a surface's existing entries already use.
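
As a rough illustration of the `<prefix>:<domain>:<entity>` format, the sketch below builds and parses keys. `make_key` and `parse_key` are hypothetical helpers, not part of any PostHog API. Allowing colons only in the entity segment is an assumption made here so that a key always splits back into exactly three parts.

```python
from typing import Tuple


def make_key(prefix: str, domain: str, entity: str) -> str:
    """Build a scratchpad key; hypothetical helper for illustration."""
    if not (prefix and domain and entity):
        raise ValueError("prefix, domain and entity must all be non-empty")
    # Assumption: only the entity segment may itself contain a colon.
    if ":" in prefix or ":" in domain:
        raise ValueError("prefix and domain must not contain ':'")
    return f"{prefix}:{domain}:{entity}"


def parse_key(key: str) -> Tuple[str, str, str]:
    """Split a key into (prefix, domain, entity), rejecting uncategorized keys."""
    parts = key.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a <prefix>:<domain>:<entity> key: {key!r}")
    return parts[0], parts[1], parts[2]
```

With this shape, an uncategorized key such as `note-1` fails to parse, which is the same reason a later run cannot find or act on it.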

## When to write memory vs. emit

@@ -72,29 +60,23 @@ content: "2026-05-01: surfaced UndefinedTable on access_control_propertyaccessco
already-surfaced."
```

Why it works: dated, names the entity id, gives a clear conditional ("still firing →
escalate; quiet → skip"), bounded by a precise time anchor, and the key prefix makes it
findable. Bad entry: key `note-1`, content "we have errors today, FYI" — no actionability,
no entity, no condition, uncategorized key the next run can't find or act on.
Why it works: dated, names the entity id, gives a clear conditional ("still firing → escalate; quiet → skip"), bounded by a precise time anchor, and the key prefix makes it findable.
Bad entry: key `note-1`, content "we have errors today, FYI" — no actionability, no entity, no condition, uncategorized key the next run can't find or act on.

Give your scout 2–3 worked example entries scoped to its surface so each run matches the
format instead of inventing its own.
Give your scout 2–3 worked example entries scoped to its surface so each run matches the format instead of inventing its own.

## Cross-project noise patterns

These are noise across essentially all PostHog projects — list the relevant ones in your
scout's **Disqualifiers** so it skips them unless there's a real escalation:
These are noise across essentially all PostHog projects — list the relevant ones in your scout's **Disqualifiers** so it skips them unless there's a real escalation:

- **Single-user, single-session events** — one user, one occurrence, no other signal.
Almost always a personal browser quirk.
- **Dev-environment bursts** — high counts whose `service` / `properties.env` is
`dev` / `local` / `test`. Filter before weighing.
- **Sandbox-internal errors** — Docker `TimeoutExpired`, sandbox sync failures, `agentsh`
errors. Internal harness operations, not user-facing.
- **Single-session frontend state quirks** — e.g. KEA store-path errors; not user-impacting
unless distinct-user counts climb.
- **Known upstream provider errors** — Anthropic / OpenAI rate limits, third-party outages
already covered by past memory. Don't re-emit unless volume or shape changes meaningfully.

The team's scratchpad extends this list per-project as the scout learns — which is exactly
why the save-memory discipline matters.
- **Dev-environment bursts** — high counts whose `service` / `properties.env` is `dev` / `local` / `test`.
Filter before weighing.
- **Sandbox-internal errors** — Docker `TimeoutExpired`, sandbox sync failures, `agentsh` errors.
Internal harness operations, not user-facing.
- **Single-session frontend state quirks** — e.g. KEA store-path errors; not user-impacting unless distinct-user counts climb.
- **Known upstream provider errors** — Anthropic / OpenAI rate limits, third-party outages already covered by past memory.
Don't re-emit unless volume or shape changes meaningfully.

The team's scratchpad extends this list per-project as the scout learns — which is exactly why the save-memory discipline matters.
78 changes: 34 additions & 44 deletions skills/authoring-scouts/references/emit-contract.md
@@ -1,11 +1,9 @@
# The emit contract

How a scout calls `signals-scout-emit-signal`, and how to write a scout's **Decide**
section so it emits well-calibrated findings. This is the contract the signal-emitting fleet
runs on — author your scout so its findings fit this shape. (The canonical generalist,
`signals-scout-general`, is report-only and authors `SignalReport`s directly instead, via the
report channel — whose contract rides in the harness prompt, not a bundled reference.) The
harness validates request shape but does **not** grade prose quality; that's on the scout.
How a scout calls `signals-scout-emit-signal`, and how to write a scout's **Decide** section so it emits well-calibrated findings.
This is the contract the signal-emitting fleet runs on — author your scout so its findings fit this shape.
(The canonical generalist, `signals-scout-general`, is report-only and authors `SignalReport`s directly instead, via the report channel — whose contract rides in the harness prompt, not a bundled reference.)
The harness validates request shape but does **not** grade prose quality; that's on the scout.

## Fields

@@ -23,9 +21,9 @@ harness validates request shape but does **not** grade prose quality; that's on

## Confidence — the emit gate

`confidence` = how sure the scout is the finding is real. It is the emit gate: a finding the
scout can't stand behind belongs in the scratchpad, not the inbox. The scout does not rank
findings itself — the inbox handles ordering once a finding is emitted.
`confidence` = how sure the scout is the finding is real.
It is the emit gate: a finding the scout can't stand behind belongs in the scratchpad, not the inbox.
The scout does not rank findings itself — the inbox handles ordering once a finding is emitted.

**Confidence rubric:**

@@ -36,58 +34,53 @@ findings itself — the inbox handles ordering once a finding is emitted.
| 0.40–0.64 | Suggestive pattern with material gaps a human should validate. |
| 0.00–0.39 | Don't emit — gather more evidence or skip. |

**The emit gate:** if a scout can't reach `confidence ≥ 0.65`, it should write a scratchpad
entry instead of emitting. Bake this threshold into the scout's Decide section.
**The emit gate:** if a scout can't reach `confidence ≥ 0.65`, it should write a scratchpad entry instead of emitting.
Bake this threshold into the scout's Decide section.

## Severity

`P0`–`P4`, informational only — use consistently. P0: active critical (data loss, outage,
security). P1: active material (errors hitting many users, billing). P2: confirmed,
contained. P3: suspected or minor confirmed. P4: curiosity / FYI. Recommendation-style
scouts (e.g. observability gaps) emit P3 by default rather than P0–P2 anomalies.
`P0`–`P4`, informational only — use consistently.
P0: active critical (data loss, outage, security).
P1: active material (errors hitting many users, billing).
P2: confirmed, contained.
P3: suspected or minor confirmed.
P4: curiosity / FYI.
Recommendation-style scouts (e.g. observability gaps) emit P3 by default rather than P0–P2 anomalies.

## Description prose contract

The description is what a busy human reads in a feed of 30 other findings. Aim for one tight
paragraph (3–6 sentences):
The description is what a busy human reads in a feed of 30 other findings.
Aim for one tight paragraph (3–6 sentences):

1. **Hook** — what's happening, **quantified** ("434 occurrences across 434 distinct users"
beats "many users").
2. **Pattern** — the shape that makes this signal, not noise ("one occurrence per user →
per-request server path").
1. **Hook** — what's happening, **quantified** ("434 occurrences across 434 distinct users" beats "many users").
2. **Pattern** — the shape that makes this signal, not noise ("one occurrence per user → per-request server path").
3. **Hypothesis** — the suspected cause.
4. **Lineage** — if a prior run touched a related topic, cite its `finding_id`.
5. **Recommendation** — the action that would resolve it.

Cite entity ids (issue ids, recording ids, dashboard short_ids) inline so a human pivots
straight from prose to source.
Cite entity ids (issue ids, recording ids, dashboard short_ids) inline so a human pivots straight from prose to source.

## Evidence

Each entry `{source_product, summary, entity_id?}`, capped at 20. Include a citation for
**every** concrete claim in the description. `source_product` is a short origin label —
common values: `error_tracking`, `session_replay`, `logs`, `feature_flag`, `experiment`,
`web_analytics`, `data_warehouse`, `query_runs`, `signals_scout` (cite a prior run/finding),
`inbox` (cite a report). `entity_id` pins the citable id.
Each entry `{source_product, summary, entity_id?}`, capped at 20.
Include a citation for **every** concrete claim in the description.
`source_product` is a short origin label — common values: `error_tracking`, `session_replay`, `logs`, `feature_flag`, `experiment`, `web_analytics`, `data_warehouse`, `query_runs`, `signals_scout` (cite a prior run/finding), `inbox` (cite a report).
`entity_id` pins the citable id.
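
The evidence shape described above could be modelled roughly as follows. This is a sketch under assumptions: the `Evidence` dataclass and `validate_evidence` are hypothetical names, and the only checks encoded are the ones the text states (the two required fields and the cap of 20).

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_EVIDENCE = 20  # cap stated in the text


@dataclass(frozen=True)
class Evidence:
    source_product: str  # short origin label, e.g. "error_tracking"
    summary: str
    entity_id: Optional[str] = None  # optional citable id


def validate_evidence(entries: List[Evidence]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks fine."""
    problems = []
    if len(entries) > MAX_EVIDENCE:
        problems.append(f"{len(entries)} entries exceeds the cap of {MAX_EVIDENCE}")
    for index, entry in enumerate(entries):
        if not entry.source_product.strip():
            problems.append(f"entry {index}: source_product is empty")
        if not entry.summary.strip():
            problems.append(f"entry {index}: summary is empty")
    return problems
```

Whether every concrete claim in the description actually has a citation is a prose judgement that a shape check like this cannot make.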

## Dedupe keys

Stable strings the inbox uses to group related findings across runs and sources. Format
`<kind>:<entity_id>` or `<kind>:<entity_id>:<qualifier>`. Common kinds:
`error_tracking_issue:<id>`, `experiment:<id>`, `feature_flag:<key>`, `dashboard:<id>`,
`insight:<short_id>`, `missing_migration:<table>`, `traffic_anomaly:<event>`. Include 1–2
per finding; more is fine when a finding spans entities. **This is the primary anti-duplicate
mechanism — design your scout's dedupe keys deliberately.**
Stable strings the inbox uses to group related findings across runs and sources.
Format `<kind>:<entity_id>` or `<kind>:<entity_id>:<qualifier>`.
Common kinds: `error_tracking_issue:<id>`, `experiment:<id>`, `feature_flag:<key>`, `dashboard:<id>`, `insight:<short_id>`, `missing_migration:<table>`, `traffic_anomaly:<event>`.
Include 1–2 per finding; more is fine when a finding spans entities.
**This is the primary anti-duplicate mechanism — design your scout's dedupe keys deliberately.**
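
A dedupe key in either of the two formats can be assembled with a small helper. `dedupe_key` is a hypothetical name used only for this sketch, and rejecting empty segments is an assumption rather than a documented rule.

```python
from typing import Optional


def dedupe_key(kind: str, entity_id: str, qualifier: Optional[str] = None) -> str:
    """Build '<kind>:<entity_id>' or '<kind>:<entity_id>:<qualifier>'."""
    parts = [kind, entity_id]
    if qualifier is not None:
        parts.append(qualifier)
    # Assumption: an empty segment would make the key ambiguous, so reject it.
    if any(not part for part in parts):
        raise ValueError("dedupe key segments must be non-empty")
    return ":".join(parts)
```

Generating keys through one function keeps them stable across runs, which is what the inbox needs in order to group related findings.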

## finding_id (not a dedupe key)

`finding_id` is a stable, human-readable trace id tying the emitted signal back to its run.
It is **not** used for idempotency: `emit_signal` dedupes on its own generated `document_id`
and your `dedupe_keys`, never on `finding_id`. **Re-calling emit with the same `finding_id`
writes a second signal — so a scout must never retry an emit that may already have
succeeded.** Format `<topic>-<entity>-<date>`, e.g.
`missing-migration-access-control-propertyaccesscontrol-2026-05-01`. A recurrence on a later
day is a new finding that cites the prior `finding_id` in its description.
It is **not** used for idempotency: `emit_signal` dedupes on its own generated `document_id` and your `dedupe_keys`, never on `finding_id`.
**Re-calling emit with the same `finding_id` writes a second signal — so a scout must never retry an emit that may already have succeeded.** Format `<topic>-<entity>-<date>`, e.g. `missing-migration-access-control-propertyaccesscontrol-2026-05-01`.
A recurrence on a later day is a new finding that cites the prior `finding_id` in its description.
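
The `<topic>-<entity>-<date>` format can be sketched as below. `make_finding_id` and the slug rule (lowercase, runs of non-alphanumeric characters collapsed to a single hyphen) are assumptions chosen so that the example id from the text is reproduced; the source does not specify a normalization.

```python
import re
from datetime import date


def _slug(text: str) -> str:
    """Lowercase and collapse non-alphanumeric runs to '-' (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_finding_id(topic: str, entity: str, day: date) -> str:
    """Build a human-readable trace id; it is not an idempotency key."""
    return f"{_slug(topic)}-{_slug(entity)}-{day.isoformat()}"
```

Because the date is part of the id, a recurrence on a later day naturally produces a new `finding_id`. The id still gives no protection against a repeated emit on the same day.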

## Worked example

@@ -120,7 +113,4 @@ description: |
migration is in the deployed set, running it, then verifying the issue stops firing.
```

Why it's good: quantified hook (434/434 in a precise window), pattern explained ("one hit
per user" rules out alternatives), lineage cited so the inbox groups it, actionable
recommendation, dual dedupe keys (issue-id + topic), P1 justified by blast radius, confidence
0.9 because the pattern is unambiguous.
Why it's good: quantified hook (434/434 in a precise window), pattern explained ("one hit per user" rules out alternatives), lineage cited so the inbox groups it, actionable recommendation, dual dedupe keys (issue-id + topic), P1 justified by blast radius, confidence 0.9 because the pattern is unambiguous.