Skip to content

Telemetry/PostHog: post-#529 follow-ups — title-bar event regression, missing app_version on Node, install funnel re-check #547

@Kosinkadink

Description

@Kosinkadink

Context

Follow-up from #529, which validated the PostHog half of the v0.6.3+ telemetry recovery (PRs #519, #524, #538). #529 itself is being closed: most acceptance items passed, the v0.6.4 release proceeds. This issue tracks two regressions surfaced during validation and one deferred re-check that need follow-up before we can call the PostHog telemetry pipeline fully healthy.

✅ Validated against v0.6.4 in production

  • P2: desktop2.snapshot.created fires across all six trigger values (boot, restart, manual, pre-update, post-update, post-restore).
  • P3: desktop2.snapshot.imported lands with sensible batch_size / original_trigger.
  • P4: deduplicated_previous = true observed on desktop2.snapshot.created after back-to-back identical restarts — the captureSnapshotIfChanged collapse branch is exercised in the wild.
  • P5: No double captures observed (PostHog Node + Browser dedup via mainAlreadyCaptured works).
  • P6 (panel only): renderer_role: "panel" is correctly attached to v0.6.3+ browser events.

Sample-of-one v0.6.4 install funnel (consent → fork → first_use_done → install_opened → method → validation → standalone.start → standalone.end → post_install_end) is 1 → 1 → 1 → 1 → 1 → 1 → 1 → 1 → 1, so the architecture works end-to-end for at least one user.

🔴 Regression A — Title-bar PostHog Browser events missing in v0.6.3+ release builds

Symptom. In a 30-day pull from the comfyui-desktop-2 PostHog project, every v0.6.3 / v0.6.4 release-build event with properties.$lib = 'web' carries renderer_role: "panel". Zero rows carry renderer_role: "title-bar" for v0.6.3 / v0.6.4. Dev builds (0.6.2-13-g…, 0.6.2+a8c494e…) DO emit title-bar events, so the architecture works in dev, just not in shipped builds.

This defeats the entire steady-state coverage goal of #519 — the title-bar renderer was supposed to keep emitting telemetry while the user is in bodyMode === "comfy" and the panel is unmounted. In production we still get zero coverage for that slice.

Reproducer query (PostHog SQL).

SELECT properties.app_version AS v, properties.renderer_role AS role, count() AS n
FROM events
WHERE properties.$lib = 'web'
  AND event LIKE 'desktop2.%'
  AND properties.app_version IS NOT NULL
  AND timestamp > now() - INTERVAL 30 DAY
GROUP BY v, role
ORDER BY v DESC, n DESC

Expect: zero rows where v starts with 0.6.3 or 0.6.4 AND role = 'title-bar'.

Likely root causes (in order of probability).

  1. CSP blocks posthog-js from the title-bar webContents in packaged builds. csp.test.ts may not exercise the title-bar host context.
  2. Preload chunk loading silently fails for the title-bar webContents in release builds (related to preload: build-time chunk inlining so we can re-enable sandbox without source duplication #521 — sandbox-disable workaround may be incomplete in the production build path, or out/preload/chunks/*.js aren't packaged where the production loader expects).
  3. PostHog init() throws inside the try/catch in src/renderer/src/lib/posthogProvider.ts, swallowed by the catch and never logged. The renderer continues without PostHog being initialised.

Suggested investigation path.

  • Build a release artifact locally (pnpm run build, then a ToDesktop build), launch it, open the title-bar's renderer DevTools (right-click the title bar in a comfy window? — or temporarily widen the title-bar webContents webPreferences.devTools to true), check the console for PostHog init errors and Network for any blocked posthog-js requests.
  • Add an instrumentation console.warn (kept in production) inside the try/catch of initPostHog() so future regressions of this shape can't be silent.

Severity. P0 for the telemetry pipeline. Without this fix, every dashboard query that filters by renderer_role = 'title-bar' returns zero, and #515's underlying problem (no telemetry coverage in steady-state Comfy mode) is not actually solved in shipped builds.

🟡 Regression B — PostHog Node missing app_version super-property

Symptom. Every PostHog Node event other than desktop2.session.started lands with an empty properties.app_version. This includes:

  • desktop2.snapshot.created
  • desktop2.execution.{started,completed,first_completed,session_summary,error}
  • desktop2.install.{validation,standalone.start,standalone.end,post_install.start,post_install.end}
  • desktop2.app_update.checked
  • desktop2.session.ended
  • desktop2.title_menu.item_clicked
  • All desktop2.migrate.* events
  • All desktop2.snapshot.restore_* events

Reproducer query.

SELECT event, properties.app_version AS v, count() AS n, max(timestamp) AS last_seen
FROM events
WHERE properties.$lib = 'posthog-node'
  AND event LIKE 'desktop2.%'
  AND timestamp > now() - INTERVAL 30 DAY
GROUP BY event, v
ORDER BY last_seen DESC

session.started rows have v populated; nearly every other row has v empty.

Root cause. src/main/lib/telemetry.ts only includes app_version in the pendingSessionStart payload; subsequent capture() calls don't carry it, and there's no posthog.register() (or equivalent) call after PostHog Node init.

Impact. Per-version cohort filtering on PostHog dashboards is broken for the entire main-emitted half of the catalog. We can recover attribution by joining on distinct_id in HogQL (as we did for the install funnel above) but that's painful and isn't an option in PostHog Insights.

Suggested fix. After PostHog Node init in src/main/lib/telemetry.ts, register app_version, app_env, and is_packaged as defaults on every capture (PostHog Node supports this via groupProperties or $set on every capture; or simply spread these into the properties of every capture() call via a small wrapper). Mirror what the renderer's loaded callback does.

Severity. P1. Doesn't break ingestion, but breaks dashboards.

⏳ Deferred re-check — Install funnel population validation

The v0.6.4 install funnel passed for sample-of-one. The original #515 attrition shape (71 install_opened → 13 post_install_end on v0.6.2, ~18% conversion) needs to be re-checked after v0.6.4 has 2+ weeks of user rollout so we have a meaningful population.

Re-run query in ~2 weeks (the same one used during #529 validation):

WITH user_versions AS (
  SELECT distinct_id,
         multiIf(
           startsWith(properties.app_version, '0.6.4'), '0.6.4',
           startsWith(properties.app_version, '0.6.3'), '0.6.3',
           startsWith(properties.app_version, '0.6.2'), '0.6.2',
           NULL
         ) AS major_v
  FROM events
  WHERE event = 'desktop2.session.started'
    AND properties.app_version IS NOT NULL
    AND timestamp > now() - INTERVAL 30 DAY
  GROUP BY distinct_id, major_v
)
SELECT uv.major_v AS version,
       uniqIf(e.distinct_id, e.event = 'desktop2.install.flow.opened')      AS install_opened,
       uniqIf(e.distinct_id, e.event = 'desktop2.install.method.selected')  AS install_method,
       uniqIf(e.distinct_id, e.event = 'desktop2.install.validation')       AS install_validation,
       uniqIf(e.distinct_id, e.event = 'desktop2.install.standalone.start') AS standalone_start,
       uniqIf(e.distinct_id, e.event = 'desktop2.install.standalone.end')   AS standalone_end,
       uniqIf(e.distinct_id, e.event = 'desktop2.install.post_install.end') AS post_install_end
FROM events e
JOIN user_versions uv ON uv.distinct_id = e.distinct_id
WHERE uv.major_v IS NOT NULL
  AND e.timestamp > now() - INTERVAL 30 DAY
GROUP BY version
ORDER BY version DESC

Expected outcome (acceptance). v0.6.4 conversion rate from install_openedpost_install_end should be meaningfully higher than v0.6.2's ~18%. If it isn't, file follow-up — Regression A is the prime suspect (no title-bar means no relay target means main-emitted install events get lost in PostHog Browser pipeline if PostHog Node also fails for any reason).

Sub-finding: v0.6.3 standalone funnel anomaly

In the sample-of-five v0.6.3 cohort, 5 users reached desktop2.install.validation and ZERO produced desktop2.install.standalone.start. Likely attribution noise (multi-version users), but worth keeping an eye on when rerunning the funnel — if v0.6.4 also shows validation > 0 but standalone.start = 0 once population grows, it's a real regression in the trackedStep('desktop2.install.standalone', …) path in src/main/lib/installer.ts (or wherever the standalone install runs).

Out of scope

Acceptance criteria for closing this issue

  • Regression A diagnosed and fixed; v0.6.x release after the fix shows non-zero renderer_role: "title-bar" browser events in PostHog within 24h of rollout.
  • Regression B fixed; rerun the per-event app_version query above, all desktop2.* posthog-node rows have a populated app_version.
  • Install funnel re-checked once v0.6.4 has 2+ weeks of population data; v0.6.4 conversion is materially better than v0.6.2.
  • Sub-finding investigated: v0.6.3 / v0.6.4 standalone.start matches validation count once population grows.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions