Merge pull request #118 from PostHog/audit

gewenyu99 · web-flow · commit 5eb0851d340d · 2026-05-05T14:57:51.000-04:00
Audit skill
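
The generator change in `scripts/lib/skill-generator.js` can be read as a small pure function over an already-parsed reference file. The sketch below is illustrative only: `decorateReference` is a hypothetical name (the actual diff inlines this logic inside `generateSkill`), and it takes the frontmatter-stripped body, the `next_step` value, and the group `preamble` as plain arguments rather than calling `matter` itself.

```javascript
// Minimal sketch of the reference post-processing in this PR.
// Assumption: `decorateReference` is a made-up helper name for illustration.
function decorateReference(body, nextFile, preamble) {
    // Drop the blank lines left behind once frontmatter is stripped.
    let content = body.replace(/^\n+/, '');
    const headingMatch = content.match(/^#\s+(.+)$/m);

    if (nextFile) {
        // The shared preamble goes directly under the first H1, if there is one.
        if (preamble && headingMatch) {
            const headingEnd = content.indexOf(headingMatch[0]) + headingMatch[0].length;
            content = content.slice(0, headingEnd) + '\n\n' + preamble + content.slice(headingEnd);
        }
        // Every chained file ends with a pointer to the next step.
        content += `\n\n---\n\n**Upon completion, continue with:** [${nextFile}](${nextFile})`;
    }
    return content;
}

const out = decorateReference('\n# Step 1\n\nBody text.', '2-init.md', '**Read ONLY this file.**');
console.log(out);
```

Note that in this logic a reference file without `next_step` receives neither the preamble nor the continuation link; only its leading blank lines are trimmed.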
diff --git a/.gitignore b/.gitignore
@@ -73,6 +73,10 @@ routeTree.gen.ts
 # Build artifacts
 dist/
 
+# Audit skill runtime ledger
+.posthog-audit-checks.json
+posthog-audit-report.md
+
 # Misc
 *.pem
 *.key
diff --git a/scripts/lib/skill-generator.js b/scripts/lib/skill-generator.js
@@ -146,6 +146,7 @@ function expandSkillGroups(config, configDir) {
                 _template: template,
                 _sharedDocs: sharedDocs,
                 _examplePaths: [...baseExamplePaths, ...normalizeExamplePaths(variation.example_paths)],
+                _references: group.references || null,
                 _group: key,
             });
         }
@@ -485,16 +486,29 @@ async function generateSkill({
     }
 
     // Copy local markdown references from a source references/ directory, if present.
+    // Group config injects a shared `preamble`; per-file `next_step` frontmatter drives continuation links.
     const sourceReferencesDir = path.join(configDir, 'skills', ...skill._group.split('/'), 'references');
     if (fs.existsSync(sourceReferencesDir)) {
         const localReferences = fs.readdirSync(sourceReferencesDir, { withFileTypes: true })
             .filter(entry => entry.isFile() && entry.name.endsWith('.md'));
 
+        const refsConfig = skill._references || {};
+
         for (const reference of localReferences) {
             const sourcePath = path.join(sourceReferencesDir, reference.name);
-            const content = fs.readFileSync(sourcePath, 'utf8');
+            const parsed = matter(fs.readFileSync(sourcePath, 'utf8'));
+            const nextFile = parsed.data.next_step;
+            let content = parsed.content.replace(/^\n+/, '');
             const headingMatch = content.match(/^#\s+(.+)$/m);
 
+            if (nextFile) {
+                if (refsConfig.preamble && headingMatch) {
+                    const headingEnd = content.indexOf(headingMatch[0]) + headingMatch[0].length;
+                    content = content.slice(0, headingEnd) + '\n\n' + refsConfig.preamble + content.slice(headingEnd);
+                }
+                content += `\n\n---\n\n**Upon completion, continue with:** [${nextFile}](${nextFile})`;
+            }
+
             fs.writeFileSync(
                 path.join(referencesDir, reference.name),
                 content,
diff --git a/transformation-config/skills/audit/config.yaml b/transformation-config/skills/audit/config.yaml
@@ -0,0 +1,14 @@
+type: docs-only
+template: description.md
+description: Audit an existing PostHog integration for correctness and best practices
+tags: [best-practices]
+references:
+  preamble: "**Read ONLY this file.** Do not read any other reference file until this one tells you to."
+shared_docs:
+  - https://posthog.com/docs/getting-started/identify-users.md
+  - https://posthog.com/docs/product-analytics/best-practices.md
+variants:
+  - id: all
+    display_name: PostHog audit
+    tags: [best-practices]
+    docs_urls: []
diff --git a/transformation-config/skills/audit/description.md b/transformation-config/skills/audit/description.md
@@ -0,0 +1,67 @@
+# PostHog Audit
+
+This skill audits an existing PostHog integration for **data integrity** in event capture and identification. **Read-only** — the only file you create is the final audit report.
+
+Perform only the checks described in the referenced skills, and audit only the events those skills reference.
+
+## Workflow
+
+The audit runs as a 5-step chain: Installation (SDK + version) → init correctness → identification → event capture → report. Each step file ends with a pointer to the next. Follow them in the order they are written. You must resolve them in order before any source-tree exploration.
+
+The audit ledger is already seeded with the 10 pending checks. Use `mcp__wizard-tools__audit_resolve_checks` to patch each one as you finish it.
+
+**Start by reading `references/1-version.md` (path relative to this file).** Do not Glob, ls, or find the skill directory. Do not preload future steps. Do not re-read a step file once you've moved past it. Do not re-read SKILL.md.
+
+`ToolSearch` is only for loading a tool by exact name when the SDK has it deferred (e.g. `select:Grep`). Do **not** use it to browse for other tools — every tool the audit needs (`Glob`, `Grep`, `Read`, `Write`, `Bash`, and the named `mcp__wizard-tools__audit_*` tools) is already named in this skill.
+
+**Do not call `TodoWrite`.** The audit doesn't track its own task list — progress comes from the audit ledger plus `[STATUS]` lines.
+
+## Live activity — `[STATUS]`
+
+The "Working on …" banner reads from `[STATUS]` lines you emit in plain text. Whenever you start a new sub-step, write a line like:
+
+```
+[STATUS] Scanning manifests
+```
+
+The wizard intercepts these and updates the spinner. Use them freely — they are cheap. Each step file lists the exact `[STATUS]` strings to emit at each sub-step.
+
+## Audit checks ledger
+
+The ledger lives at `.posthog-audit-checks.json` and is rendered live in the "Audit plan" tab. It is owned by MCP tools — **never `Write` this file directly**:
+
+- `mcp__wizard-tools__audit_resolve_checks({ updates })` — patch one or more checks by `id`. Each `update` is `{ id, status, file?, details? }`. Batch updates from the same step into a single call.
+
+All audit ledger calls are atomic and serialize internally — **concurrent calls from parallel subagents cannot lose updates**, so feel free to fan out runtime checks across `Task` subagents when a step says so.
+
+### Check entry shape
+
+- `id` — stable kebab-case slug. Reuse the existing seeded ids exactly when calling `audit_resolve_checks`.
+- `area` — short group name. The current core workflow uses `Installation`, `Identification`, and `Event Capture`.
+- `label` — short human name.
+- `status` — `pending` | `pass` | `error` | `warning` | `suggestion`.
+- `file` — optional `path:line` for findings tied to a location.
+- `details` — optional one-line explanation.
+
+After the report is written (Step 5), delete `.posthog-audit-checks.json`.
+
+## Severity levels
+
+- `error`: Must fix. Broken functionality, data corruption, or security issue.
+- `warning`: Should fix. Pattern that causes subtle bugs or data-quality problems.
+- `suggestion`: Nice to have. Best-practice improvement.
+
+## Key principles
+
+- **Read-only**: Do not edit project source files. The only file you create is the audit report.
+- **Evidence-based**: Reference specific `file:line` for every non-pass finding.
+- **Actionable**: Every finding states what to fix and how.
+
+## Abort statuses
+
+Report abort states with `[ABORT]`-prefixed messages. The wizard catches these and terminates the run; do not halt yourself.
+- No PostHog SDK found
+
+## Framework guidelines
+
+{commandments}
diff --git a/transformation-config/skills/audit/references/1-version.md b/transformation-config/skills/audit/references/1-version.md
@@ -0,0 +1,88 @@
+---
+next_step: 2-init.md
+---
+
+# Step 1 — SDK installed + SDK up-to-date
+
+This step is intentionally narrow. It runs **before any other project work**. Resolve exactly two checks: `sdk-installed` and `sdk-up-to-date`. **Do not** read source code, locate init sites, look at `.env*` files, or scan for identify/capture call sites in this step — that all belongs to later steps.
+
+## Status
+
+Emit:
+
+```
+[STATUS] Scanning manifests
+[STATUS] Checking SDK version
+```
+
+## Action
+
+### a. Find the PostHog SDK
+
+`Glob` for the project's dependency manifests across every language PostHog ships an SDK for. The full list:
+
+- `package.json` — npm / pnpm / yarn (Node, web, React, Next.js, Nuxt, Vue, Svelte, Angular, React Native, Expo)
+- `requirements.txt`, `pyproject.toml`, `Pipfile`, `setup.py` — Python (Django, Flask, FastAPI, etc.)
+- `Gemfile` — Ruby / Ruby on Rails
+- `composer.json` — PHP / Laravel
+- `go.mod` — Go
+- `build.gradle`, `build.gradle.kts`, `pom.xml` — Java / Android
+- `Podfile`, `Package.swift` — iOS / Swift
+- `pubspec.yaml` — Flutter / Dart
+- `*.csproj` — .NET
+- `mix.exs` — Elixir
+
+Read enough of them to identify which PostHog SDK the project uses, what version, and what framework it sits on top of.
+
+If no PostHog SDK is anywhere in the project, emit `[ABORT] No PostHog SDK found` and stop. The wizard catches `[ABORT]` and terminates the run.
+
+### b. Install the matching integration skill
+
+Once you know the SDK + framework, install the matching integration skill so the rest of the audit has framework-specific install docs to reference instead of guessing:
+
+1. Call `mcp__wizard-tools__load_skill_menu({ category: "integration" })` once to list available integration skill IDs.
+2. Call `mcp__wizard-tools__install_skill({ skillId: "<id>" })` with the **single** ID that matches the framework you detected. Pick one — do not install multiple.
+
+If no integration skill matches the framework, skip this step. Step 2 will fall back to general framework knowledge.
+
+### c. Check latest published version
+
+For each detected SDK, run `Bash` once to look up the latest published version. Use the command that matches the SDK's registry:
+
+- **npm** (JS/TS, Node, React, Next.js, Nuxt, Vue, Svelte, Angular, React Native, Expo): `npm view <pkg> version`
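+- **PyPI** (Python): `pip index versions <pkg>` (or `curl -s https://pypi.org/pypi/<pkg>/json | jq -r .info.version` if `index` is unavailable; `pip show <pkg>` reports only the locally installed version)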
+- **RubyGems** (Ruby / Rails): `gem search '^<pkg>$' -r`
+- **Packagist** (PHP / Laravel): `composer show <pkg> --latest --available --format=json`
+- **Go modules** (Go): `curl -s https://proxy.golang.org/<module>/@latest` (returns JSON with the latest `Version`)
+- **Maven Central** (Java / Android): `curl -s "https://search.maven.org/solrsearch/select?q=g:<group>+AND+a:<artifact>&rows=1&wt=json"` and read `.response.docs[0].latestVersion`
+- **CocoaPods** (iOS / Swift): `pod search <pkg>` (or check `https://cdn.cocoapods.org/all_pods_versions_<x>_<y>_<z>.txt` for the spec mirror)
+- **Swift Package Manager** (Swift): `gh release list --repo posthog/posthog-ios --limit 1` (SwiftPM resolves from GitHub tags)
+- **pub.dev** (Flutter / Dart): `curl -s https://pub.dev/api/packages/<pkg> | jq -r .latest.version`
+- **NuGet** (.NET): `curl -s https://api.nuget.org/v3-flatcontainer/<pkg>/index.json | jq -r '.versions[-1]'`
+- **Hex** (Elixir): `mix hex.info <pkg>`
+
+## Resolution rules
+
+`sdk-installed`:
+- `pass`: at least one PostHog SDK in a manifest. Record SDK + version in `details`.
+
+`sdk-up-to-date`:
+- `pass`: at the latest minor.
+- `suggestion`: patch-only behind.
+- `warning`: more than one minor behind.
+- `error`: one or more major versions behind.
+
+## Resolve
+
+Single call to `mcp__wizard-tools__audit_resolve_checks` with two updates and **nothing else**:
+
+```
+{
+  "updates": [
+    { "id": "sdk-installed",  "status": "pass",                          "details": "<sdk>@<version>" },
+    { "id": "sdk-up-to-date", "status": "pass|suggestion|warning|error", "details": "installed <v>, latest <v>" }
+  ]
+}
+```
+
+Do not include `init-correct` in this call — it's resolved in Step 2.
diff --git a/transformation-config/skills/audit/references/2-init.md b/transformation-config/skills/audit/references/2-init.md
@@ -0,0 +1,42 @@
+---
+next_step: 3-identification.md
+---
+
+# Step 2 — Init correctness
+
+This step resolves exactly one check: `init-correct`. Manifests and SDK versions are already resolved (Step 1). Identification call sites belong to Step 3 and event-capture call sites to Step 4 — do not scan for them here.
+
+## Status
+
+Emit:
+
+```
+[STATUS] Locating PostHog initialization
+```
+
+## Action
+
+Locate the project's PostHog init by issuing whatever `Grep` and `Read` calls are needed in parallel. Confirm the init exists, runs in the right runtime for the detected SDK + framework, and sources its token from an env variable (not hardcoded). Also check `.env*` files to confirm the token env var is actually set. Reverse-proxy / `api_host` configuration belongs to Step 4 — don't evaluate it here.
+
+Use the detected SDK + framework from Step 1 to know what to look for: the canonical init filename, runtime, and shape vary by framework. If the host project already ships a PostHog integration skill, use that as the source of truth. Skills are typically under `.claude/skills/`; if that directory doesn't exist (some projects keep skills under `agents/skills/`, plain `skills/`, etc.), discover any candidates with one `Glob` pattern: `**/skills/**/SKILL.md`. Read the matching skill before judging.
+
+When no integration skill is available, rely on general framework knowledge — and stay conservative on `init-correct` (prefer `warning` over `error` when the convention is unclear).
+
+## Resolution rules
+
+`init-correct`:
+- `pass`: init present, env-sourced token, runtime-appropriate location.
+- `error`: init missing, hardcoded token, or wrong runtime (e.g. server-only init for a browser-side framework).
+- `warning`: init present but in a non-canonical location for the framework.
+
+## Resolve
+
+Single call to `mcp__wizard-tools__audit_resolve_checks` with one update:
+
+```
+{
+  "updates": [
+    { "id": "init-correct", "status": "pass|error|warning", "file": "<path:line>", "details": "..." }
+  ]
+}
+```
diff --git a/transformation-config/skills/audit/references/3-identification.md b/transformation-config/skills/audit/references/3-identification.md
@@ -0,0 +1,116 @@
+---
+next_step: 4-event-capture.md
+---
+
+# Step 3 — Identification
+
+This step resolves four identification checks **in parallel**, one subagent per check:
+
+- `identify-stable-distinct-id`
+- `identify-not-late`
+- `cross-runtime-distinct-id`
+- `identify-reset-on-logout`
+
+Each subagent runs its own grep and reads, evaluates its single rule, and emits one `audit_resolve_checks` call with one update. The ledger's mutex serializes concurrent writes, so there is no race.
+
+## Status
+
+Emit before dispatching:
+
+```
+[STATUS] Auditing identification
+```
+
+## Action — dispatch four subagents in one message
+
+Make **four `Task` tool calls in a single message** so they run concurrently. Wait for all four to return, then continue to `4-event-capture.md`. Do not run any other tools between dispatch and the next step.
+
+The bundled `identify-users.md` reference holds PostHog's authoritative guidance on `distinct_id`, `identify()` ordering, and cross-runtime identity. It's typically at `.claude/skills/audit/references/identify-users.md`; if that path doesn't exist, discover it with `Glob` `**/skills/audit/references/identify-users.md`. Each subagent reads it once before judging.
+
+### Task A — `identify-stable-distinct-id`
+
+`description`: `Audit identify-stable-distinct-id`
+
+`prompt`:
+```
+You are an audit subagent. Resolve exactly one rule and return: identify-stable-distinct-id.
+
+Read this skill's bundled `identify-users.md` reference once (typically `.claude/skills/audit/references/identify-users.md`; otherwise discover with `Glob` `**/skills/audit/references/identify-users.md`).
+
+Run **one** Grep: `posthog\.identify\(`. Read each file that contains a hit, once. Inspect the first argument passed to identify().
+
+Rule:
+- distinct_id must be a stable identifier (auth user id, account id), not a session UUID, ephemeral cookie, or device-only id.
+- pass: sources from authenticated user (session.user.id, auth.uid(), etc.)
+- error: sources from a session, request, or device id that resets
+- warning: source unclear — flag for human review
+
+Emit one `mcp__wizard-tools__audit_resolve_checks` call with a single update for id `identify-stable-distinct-id`, including `file` (path:line) and `details` (one-line explanation). Return when the call completes. Do not write the audit report.
+```
+
+### Task B — `identify-not-late`
+
+`description`: `Audit identify-not-late`
+
+`prompt`:
+```
+You are an audit subagent. Resolve exactly one rule and return: identify-not-late.
+
+Read this skill's bundled `identify-users.md` reference once (typically `.claude/skills/audit/references/identify-users.md`; otherwise discover with `Glob` `**/skills/audit/references/identify-users.md`).
+
+Run **two** Greps in parallel:
+- `posthog\.identify\(` — where identity is established
+- `posthog\.capture\(|getFeatureFlag\(|isFeatureEnabled\(` — where captures and flag evals happen
+
+Read each file that contains a hit, once. Compare the timing/ordering of identify() against the surrounding capture / flag-eval calls.
+
+Rule:
+- identify() must be called before any posthog.capture for that user, and before any feature-flag eval depending on user identity.
+- pass: identify runs at session start / right after login. Captures and flag evals come after.
+- warning: identify runs lazily (e.g. settings-page mount), so early captures and flag evals are anonymous.
+
+Emit one `mcp__wizard-tools__audit_resolve_checks` call with a single update for id `identify-not-late`, including `file` (path:line of the identify call) and `details` (one-line explanation). Return when the call completes. Do not write the audit report.
+```
+
+### Task C — `cross-runtime-distinct-id`
+
+`description`: `Audit cross-runtime-distinct-id`
+
+`prompt`:
+```
+You are an audit subagent. Resolve exactly one rule and return: cross-runtime-distinct-id.
+
+Read this skill's bundled `identify-users.md` reference once (typically `.claude/skills/audit/references/identify-users.md`; otherwise discover with `Glob` `**/skills/audit/references/identify-users.md`).
+
+Run **one** Grep: `posthog\.init\(|new PostHog\(|posthog\.Posthog\(|Posthog\(` — locate every PostHog initialization across runtimes. Read each file that contains a hit, once. Determine whether both client and server runtimes initialize PostHog, and if so, how distinct_id flows between them.
+
+Rule:
+- If both client and server runtimes call PostHog, the same distinct_id must be used on both sides for the same user.
+- pass: server-side captures source the client's distinct_id (cookie, session token, or explicit hand-off).
+- error: server-side captures use a different identifier scheme.
+- Skip (`pass` with details: "single runtime"): only one runtime initializes PostHog.
+
+Emit one `mcp__wizard-tools__audit_resolve_checks` call with a single update for id `cross-runtime-distinct-id`, including `file` (path:line of the most relevant init or capture site) and `details` (one-line explanation). Return when the call completes. Do not write the audit report.
+```
+
+### Task D — `identify-reset-on-logout`
+
+`description`: `Audit identify-reset-on-logout`
+
+`prompt`:
+```
+You are an audit subagent. Resolve exactly one rule and return: identify-reset-on-logout.
+
+Read this skill's bundled `identify-users.md` reference once (typically `.claude/skills/audit/references/identify-users.md`; otherwise discover with `Glob` `**/skills/audit/references/identify-users.md`).
+
+Locate logout, sign-out, and account-switching flows by issuing whatever `Grep` and `Read` calls are needed in parallel. Determine whether those flows clear PostHog state with `posthog.reset()`.
+
+Rule:
+- Logout or account-switching flows should call `posthog.reset()`. Without a reset, when user B logs in on the same device after user A, PostHog's anonymous ID is shared and the next `identify()` can merge both accounts into one person.
+- pass: every detected logout/account-switch flow calls `posthog.reset()`.
+- error: a logout/account-switch flow is missing `posthog.reset()`.
+- Skip (`pass` with details: "no logout/account-switch flow found"): no detectable logout/account-switch flow exists.
+- note: `posthog.reset(true)` is valid when a completely clean device ID reset is required.
+
+Emit one `mcp__wizard-tools__audit_resolve_checks` call with a single update for id `identify-reset-on-logout`, including `file` (path:line of the most relevant logout or reset site) and `details` (one-line explanation). Return when the call completes. Do not write the audit report.
+```
diff --git a/transformation-config/skills/audit/references/4-event-capture.md b/transformation-config/skills/audit/references/4-event-capture.md
diff --git a/transformation-config/skills/audit/references/5-report.md b/transformation-config/skills/audit/references/5-report.md