PostHog · releaser-ai-plugin · Jul 3, 2026
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "posthog",
   "description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from your AI coding tool. Optionally capture Claude Code sessions to PostHog LLM Analytics.",
-  "version": "1.1.45",
+  "version": "1.1.46",
   "author": {
     "name": "PostHog",
     "email": "hey@posthog.com",

diff --git a/.codex-plugin/plugin.json b/.codex-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "posthog",
-  "version": "1.0.43",
+  "version": "1.0.44",
   "description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from Codex",
   "author": {
     "name": "PostHog",

diff --git a/.cursor-plugin/plugin.json b/.cursor-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "posthog",
   "displayName": "PostHog",
-  "version": "1.1.39",
+  "version": "1.1.40",
   "description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from Cursor",
   "author": {
     "name": "PostHog",

diff --git a/gemini-extension.json b/gemini-extension.json
@@ -1,6 +1,6 @@
 {
   "name": "posthog",
-  "version": "1.0.41",
+  "version": "1.0.42",
   "description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from Gemini CLI",
   "mcpServers": {
     "posthog": {

diff --git a/skills/.sync-manifest b/skills/.sync-manifest
@@ -18,6 +18,7 @@ copying-flags-across-projects
 creating-ai-subscription
 creating-an-endpoint
 creating-experiments
+creating-online-evaluations
 creating-replay-vision-scanners
 debugging-local-replay
 debugging-signals-pipeline
@@ -31,6 +32,7 @@ diagnosing-missing-recordings
 diagnosing-sdk-health
 diagnosing-stacktrace-symbolication
 downloading-batch-export-files
+exploring-ai-failures
 exploring-apm-traces
 exploring-autocapture-events
 exploring-endpoint-execution-logs
@@ -53,6 +55,7 @@ finding-replay-for-issue
 finding-sessions-to-watch
 formatting-insight-axes
 grouping-noisy-errors
+improving-mcp-tools
 inbox-exploration
 instrument-error-tracking
 instrument-feature-flags
@@ -89,13 +92,16 @@ signals-scout-health-checks
 signals-scout-inbox-validation
 signals-scout-insight-alerts
 signals-scout-logs
+signals-scout-mcp-tool-calls
 signals-scout-observability-gaps
 signals-scout-product-analytics
 signals-scout-replay-vision
 signals-scout-revenue-analytics
 signals-scout-session-replay
+signals-scout-skills-store
 signals-scout-surveys
 signals-scout-web-analytics
+signals-scout-web-vitals
 skills-store
 suggesting-data-imports
 suppressing-noisy-errors

diff --git a/skills/authoring-scouts/SKILL.md b/skills/authoring-scouts/SKILL.md
diff --git a/skills/authoring-scouts/references/dedupe-and-memory.md b/skills/authoring-scouts/references/dedupe-and-memory.md
@@ -1,35 +1,26 @@
 # Dedupe and memory conventions
 
-How a scout decides what to do with a candidate observation, how it writes durable
-scratchpad entries, and the noise patterns common across PostHog projects. Author your
-scout's **Decide** and **Save-memory** sections around these — they're how the fleet avoids
-re-emitting and gets smarter every run. This mirrors
-`signals-scout-general/references/conventions.md`.
+How a scout decides what to do with a candidate observation, how it writes durable scratchpad entries, and the noise patterns common across PostHog projects.
+Author your scout's **Decide** and **Save-memory** sections around these — they're how the fleet avoids re-emitting and gets smarter every run.
+This mirrors `signals-scout-general/references/conventions.md`.
 
 ## The four states
 
-Every scout classifies each candidate finding against prior runs and the scratchpad before
-emitting. Bake this classifier into the scout's Decide section:
+Every scout classifies each candidate finding against prior runs and the scratchpad before emitting.
+Bake this classifier into the scout's Decide section:
 
-1. **Net new** — no prior run mentions the topic, no scratchpad entry covers it.
-   → Emit if it clears the confidence bar (≥ 0.65).
-2. **Material update on a prior run** — a prior run covered it, but there's new evidence (a
-   different corroborating source, a fresh deploy correlation, contradicting data, a
-   meaningful escalation in scope). → **Emit fresh, citing the prior `finding_id`** in the
-   description and the evidence list (`source_product: signals_scout`, `entity_id: <prior>`).
+1. **Net new** — no prior run mentions the topic, no scratchpad entry covers it. → Emit if it clears the confidence bar (≥ 0.65).
+2. **Material update on a prior run** — a prior run covered it, but there's new evidence (a different corroborating source, a fresh deploy correlation, contradicting data, a meaningful escalation in scope). → **Emit fresh, citing the prior `finding_id`** in the description and the evidence list (`source_product: signals_scout`, `entity_id: <prior>`).
    The inbox groups by dedupe key.
-3. **Same fact already covered** — a prior run emitted with the same evidence shape.
-   → Skip. Optionally rewrite a scratchpad entry confirming the topic stayed quiet.
-4. **Already-addressed or noise** — a scratchpad entry with an `addressed:` / `noise:` /
-   `dedupe:` prefix names the entity with a "team aware" note. → Skip; note it in the run
-   summary.
+3. **Same fact already covered** — a prior run emitted with the same evidence shape. → Skip.
+   Optionally rewrite a scratchpad entry confirming the topic stayed quiet.
+4. **Already-addressed or noise** — a scratchpad entry with an `addressed:` / `noise:` / `dedupe:` prefix names the entity with a "team aware" note. → Skip; note it in the run summary.
 
 ## Scratchpad memory
 
-The scratchpad is durable, per-team prose keyed by string. It has no tags or TTLs — **the
-category is encoded in the key prefix** so a future run finds an entry with a single `text=`
-search. Re-using a key rewrites the entry in place (the idempotent refresh — use it to
-confirm a quiet observation without duplicating entries).
+The scratchpad is durable, per-team prose keyed by string.
+It has no tags or TTLs — **the category is encoded in the key prefix** so a future run finds an entry with a single `text=` search.
+Re-using a key rewrites the entry in place (the idempotent refresh — use it to confirm a quiet observation without duplicating entries).
 
 | Prefix        | Use for                                                                                                                                                                                    |
 | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
@@ -42,12 +33,9 @@ confirm a quiet observation without duplicating entries).
 | `mcp-gap:`    | Scout-noticed gap in the MCP surface worth raising later.                                                                                                                                  |
 | `report:`     | A report this scout authored via the report channel — stores the `report_id` so the next run edits/dedups against it instead of re-filing. See [`report-contract.md`](report-contract.md). |
 
-Format: `<prefix>:<domain>:<entity>` — e.g. `pattern:error_tracking:baseline`,
-`noise:logs:rabbitmq-deploy-window`, `dedupe:csp_violations:a1b2c3d4`. Each canonical
-specialist has its own `<domain>` label (`error_tracking`, `logs`, `llm_analytics`,
-`experiments`, `feature-flags`, `session-replay`, `web-analytics`, `pipelines`, `health`,
-…) — not a closed set. A new scout introduces its own domain label and reuses the
-prefixes; match the label a surface's existing entries already use.
+Format: `<prefix>:<domain>:<entity>` — e.g. `pattern:error_tracking:baseline`, `noise:logs:rabbitmq-deploy-window`, `dedupe:csp_violations:a1b2c3d4`.
+Each canonical specialist has its own `<domain>` label (`error_tracking`, `logs`, `llm_analytics`, `experiments`, `feature-flags`, `session-replay`, `web-analytics`, `pipelines`, `health`, …) — not a closed set.
+A new scout introduces its own domain label and reuses the prefixes; match the label a surface's existing entries already use.
 
 ## When to write memory vs. emit
 
@@ -72,29 +60,23 @@ content: "2026-05-01: surfaced UndefinedTable on access_control_propertyaccessco
          already-surfaced."
 ```
 
-Why it works: dated, names the entity id, gives a clear conditional ("still firing →
-escalate; quiet → skip"), bounded by a precise time anchor, and the key prefix makes it
-findable. Bad entry: key `note-1`, content "we have errors today, FYI" — no actionability,
-no entity, no condition, uncategorized key the next run can't find or act on.
+Why it works: dated, names the entity id, gives a clear conditional ("still firing → escalate; quiet → skip"), bounded by a precise time anchor, and the key prefix makes it findable.
+Bad entry: key `note-1`, content "we have errors today, FYI" — no actionability, no entity, no condition, uncategorized key the next run can't find or act on.
 
-Give your scout 2–3 worked example entries scoped to its surface so each run matches the
-format instead of inventing its own.
+Give your scout 2–3 worked example entries scoped to its surface so each run matches the format instead of inventing its own.
 
 ## Cross-project noise patterns
 
-These are noise across essentially all PostHog projects — list the relevant ones in your
-scout's **Disqualifiers** so it skips them unless there's a real escalation:
+These are noise across essentially all PostHog projects — list the relevant ones in your scout's **Disqualifiers** so it skips them unless there's a real escalation:
 
 - **Single-user, single-session events** — one user, one occurrence, no other signal.
   Almost always a personal browser quirk.
-- **Dev-environment bursts** — high counts whose `service` / `properties.env` is
-  `dev` / `local` / `test`. Filter before weighing.
-- **Sandbox-internal errors** — Docker `TimeoutExpired`, sandbox sync failures, `agentsh`
-  errors. Internal harness operations, not user-facing.
-- **Single-session frontend state quirks** — e.g. KEA store-path errors; not user-impacting
-  unless distinct-user counts climb.
-- **Known upstream provider errors** — Anthropic / OpenAI rate limits, third-party outages
-  already covered by past memory. Don't re-emit unless volume or shape changes meaningfully.
-
-The team's scratchpad extends this list per-project as the scout learns — which is exactly
-why the save-memory discipline matters.
+- **Dev-environment bursts** — high counts whose `service` / `properties.env` is `dev` / `local` / `test`.
+  Filter before weighing.
+- **Sandbox-internal errors** — Docker `TimeoutExpired`, sandbox sync failures, `agentsh` errors.
+  Internal harness operations, not user-facing.
+- **Single-session frontend state quirks** — e.g. KEA store-path errors; not user-impacting unless distinct-user counts climb.
+- **Known upstream provider errors** — Anthropic / OpenAI rate limits, third-party outages already covered by past memory.
+  Don't re-emit unless volume or shape changes meaningfully.
+
+The team's scratchpad extends this list per-project as the scout learns — which is exactly why the save-memory discipline matters.
diff --git a/skills/authoring-scouts/references/emit-contract.md b/skills/authoring-scouts/references/emit-contract.md
@@ -1,11 +1,9 @@
 # The emit contract
 
-How a scout calls `signals-scout-emit-signal`, and how to write a scout's **Decide**
-section so it emits well-calibrated findings. This is the contract the signal-emitting fleet
-runs on — author your scout so its findings fit this shape. (The canonical generalist,
-`signals-scout-general`, is report-only and authors `SignalReport`s directly instead, via the
-report channel — whose contract rides in the harness prompt, not a bundled reference.) The
-harness validates request shape but does **not** grade prose quality; that's on the scout.
+How a scout calls `signals-scout-emit-signal`, and how to write a scout's **Decide** section so it emits well-calibrated findings.
+This is the contract the signal-emitting fleet runs on — author your scout so its findings fit this shape.
+(The canonical generalist, `signals-scout-general`, is report-only and authors `SignalReport`s directly instead, via the report channel — whose contract rides in the harness prompt, not a bundled reference.)
+The harness validates request shape but does **not** grade prose quality; that's on the scout.
 
 ## Fields
 
@@ -23,9 +21,9 @@ harness validates request shape but does **not** grade prose quality; that's on
 
 ## Confidence — the emit gate
 
-`confidence` = how sure the scout is the finding is real. It is the emit gate: a finding the
-scout can't stand behind belongs in the scratchpad, not the inbox. The scout does not rank
-findings itself — the inbox handles ordering once a finding is emitted.
+`confidence` = how sure the scout is the finding is real.
+It is the emit gate: a finding the scout can't stand behind belongs in the scratchpad, not the inbox.
+The scout does not rank findings itself — the inbox handles ordering once a finding is emitted.
 
 **Confidence rubric:**
 
@@ -36,58 +34,53 @@ findings itself — the inbox handles ordering once a finding is emitted.
 | 0.40–0.64 | Suggestive pattern with material gaps a human should validate.                     |
 | 0.00–0.39 | Don't emit — gather more evidence or skip.                                         |
 
-**The emit gate:** if a scout can't reach `confidence ≥ 0.65`, it should write a scratchpad
-entry instead of emitting. Bake this threshold into the scout's Decide section.
+**The emit gate:** if a scout can't reach `confidence ≥ 0.65`, it should write a scratchpad entry instead of emitting.
+Bake this threshold into the scout's Decide section.
 
 ## Severity
 
-`P0`–`P4`, informational only — use consistently. P0: active critical (data loss, outage,
-security). P1: active material (errors hitting many users, billing). P2: confirmed,
-contained. P3: suspected or minor confirmed. P4: curiosity / FYI. Recommendation-style
-scouts (e.g. observability gaps) emit P3 by default rather than P0–P2 anomalies.
+`P0`–`P4`, informational only — use consistently.
+P0: active critical (data loss, outage, security).
+P1: active material (errors hitting many users, billing).
+P2: confirmed, contained.
+P3: suspected or minor confirmed.
+P4: curiosity / FYI.
+Recommendation-style scouts (e.g. observability gaps) emit P3 by default rather than P0–P2 anomalies.
 
 ## Description prose contract
 
-The description is what a busy human reads in a feed of 30 other findings. Aim for one tight
-paragraph (3–6 sentences):
+The description is what a busy human reads in a feed of 30 other findings.
+Aim for one tight paragraph (3–6 sentences):
 
-1. **Hook** — what's happening, **quantified** ("434 occurrences across 434 distinct users"
-   beats "many users").
-2. **Pattern** — the shape that makes this signal, not noise ("one occurrence per user →
-   per-request server path").
+1. **Hook** — what's happening, **quantified** ("434 occurrences across 434 distinct users" beats "many users").
+2. **Pattern** — the shape that makes this signal, not noise ("one occurrence per user → per-request server path").
 3. **Hypothesis** — the suspected cause.
 4. **Lineage** — if a prior run touched a related topic, cite its `finding_id`.
 5. **Recommendation** — the action that would resolve it.
 
-Cite entity ids (issue ids, recording ids, dashboard short_ids) inline so a human pivots
-straight from prose to source.
+Cite entity ids (issue ids, recording ids, dashboard short_ids) inline so a human pivots straight from prose to source.
 
 ## Evidence
 
-Each entry `{source_product, summary, entity_id?}`, capped at 20. Include a citation for
-**every** concrete claim in the description. `source_product` is a short origin label —
-common values: `error_tracking`, `session_replay`, `logs`, `feature_flag`, `experiment`,
-`web_analytics`, `data_warehouse`, `query_runs`, `signals_scout` (cite a prior run/finding),
-`inbox` (cite a report). `entity_id` pins the citable id.
+Each entry `{source_product, summary, entity_id?}`, capped at 20.
+Include a citation for **every** concrete claim in the description.
+`source_product` is a short origin label — common values: `error_tracking`, `session_replay`, `logs`, `feature_flag`, `experiment`, `web_analytics`, `data_warehouse`, `query_runs`, `signals_scout` (cite a prior run/finding), `inbox` (cite a report).
+`entity_id` pins the citable id.
 
 ## Dedupe keys
 
-Stable strings the inbox uses to group related findings across runs and sources. Format
-`<kind>:<entity_id>` or `<kind>:<entity_id>:<qualifier>`. Common kinds:
-`error_tracking_issue:<id>`, `experiment:<id>`, `feature_flag:<key>`, `dashboard:<id>`,
-`insight:<short_id>`, `missing_migration:<table>`, `traffic_anomaly:<event>`. Include 1–2
-per finding; more is fine when a finding spans entities. **This is the primary anti-duplicate
-mechanism — design your scout's dedupe keys deliberately.**
+Stable strings the inbox uses to group related findings across runs and sources.
+Format `<kind>:<entity_id>` or `<kind>:<entity_id>:<qualifier>`.
+Common kinds: `error_tracking_issue:<id>`, `experiment:<id>`, `feature_flag:<key>`, `dashboard:<id>`, `insight:<short_id>`, `missing_migration:<table>`, `traffic_anomaly:<event>`.
+Include 1–2 per finding; more is fine when a finding spans entities.
+**This is the primary anti-duplicate mechanism — design your scout's dedupe keys deliberately.**
 
 ## finding_id (not a dedupe key)
 
 `finding_id` is a stable, human-readable trace id tying the emitted signal back to its run.
-It is **not** used for idempotency: `emit_signal` dedupes on its own generated `document_id`
-and your `dedupe_keys`, never on `finding_id`. **Re-calling emit with the same `finding_id`
-writes a second signal — so a scout must never retry an emit that may already have
-succeeded.** Format `<topic>-<entity>-<date>`, e.g.
-`missing-migration-access-control-propertyaccesscontrol-2026-05-01`. A recurrence on a later
-day is a new finding that cites the prior `finding_id` in its description.
+It is **not** used for idempotency: `emit_signal` dedupes on its own generated `document_id` and your `dedupe_keys`, never on `finding_id`.
+**Re-calling emit with the same `finding_id` writes a second signal — so a scout must never retry an emit that may already have succeeded.** Format `<topic>-<entity>-<date>`, e.g. `missing-migration-access-control-propertyaccesscontrol-2026-05-01`.
+A recurrence on a later day is a new finding that cites the prior `finding_id` in its description.
 
 ## Worked example
 
@@ -120,7 +113,4 @@ description: |
   migration is in the deployed set, running it, then verifying the issue stops firing.
 ```
 
-Why it's good: quantified hook (434/434 in a precise window), pattern explained ("one hit
-per user" rules out alternatives), lineage cited so the inbox groups it, actionable
-recommendation, dual dedupe keys (issue-id + topic), P1 justified by blast radius, confidence
-0.9 because the pattern is unambiguous.
+Why it's good: quantified hook (434/434 in a precise window), pattern explained ("one hit per user" rules out alternatives), lineage cited so the inbox groups it, actionable recommendation, dual dedupe keys (issue-id + topic), P1 justified by blast radius, confidence 0.9 because the pattern is unambiguous.