feat(telemetry): add non-reversible machine_id to heartbeat for install dedup #796
Merged
…ll dedup

The telemetry anonymous_id is a UUID persisted in the config file, so ephemeral environments (layered Docker builds, throwaway HOMEs, CI) mint a fresh id per run: one CI block produced 1,231 ids from 3 IPv6 /64s. The dashboard currently dedups via a lossy normalized_ip|os|arch|edition heuristic; its dedup.ts header explicitly asks for "a stable hashed machine id in the payload".

This adds machine_id (schema v6): HMAC-SHA256 of the OS machine id scoped by an mcpproxy-specific app key (denisbrodbeck/machineid ProtectedID pattern). It is non-reversible, uncorrelatable with other apps' telemetry, and never the raw id. It resolves once per process (cached, stable across builds) and gracefully falls back to empty (omitted) when the OS machine id is unreadable, so the heartbeat is never blocked. It rides the existing opt-out gate.

schema_version bumped 5->6: the ingest worker stores payload_json wholesale and only validates fields for schema_version >= 3/>=4 (it already receives v5 while validating only v4), so a higher version with an additive field cannot break ingestion.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Deploying mcpproxy-docs with
| Latest commit: | a2d1f5c |
|:---|:---|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://c9ac6d8c.mcpproxy-docs.pages.dev |
| Branch Preview URL: | https://feat-telemetry-machine-id.mcpproxy-docs.pages.dev |
Codecov Report: ❌ Patch coverage is
📦 Build Artifacts
How to download:
- Option 1: GitHub Web UI (easiest)
- Option 2: GitHub CLI: `gh run download 28582395495 --repo smart-mcp-proxy/mcpproxy-go`
Dumbris added a commit that referenced this pull request on Jul 2, 2026
…ne, connect-trust/upgrade-nudge progress (#803)

check-github now passes with 0 errors: scanner-simplification epic complete (#786/#792/#793/#794 incl. deep-scan trust fixes + docs sweep); connect-trust US1 preview (#802) + backup visibility (#799) done; upgrade-nudge status/log slice (#798) split out as done with the banner+config remainder tracked separately; telemetry machine_id client (#796) and hygiene check-github (#800) done. Remaining warnings are the known windows-tray no-PR-evidence items.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Motivation
Telemetry `anonymous_id` is a UUIDv4 persisted in the config file. In ephemeral environments (layered Docker builds, throwaway `HOME`s, CI runners) the config, and therefore the UUID, is regenerated on every run, so a single physical machine masquerades as many distinct installs. One CI block alone produced 1,231 `anonymous_id`s from 3 IPv6 /64s, inflating install counts.

The dashboard (`mcpproxy-dash/src/lib/shared/dedup.ts`) currently dedups via a lossy `normalized_ip|os|arch|edition` heuristic, and its header comment explicitly asks for the fix this PR delivers: "a stable hashed machine id in the payload".

What this adds
A new heartbeat field `machine_id` (schema v6): a stable, non-reversible hash of the OS machine id.

- Non-reversible: `HMAC-SHA256(osMachineID, appKey)` via `denisbrodbeck/machineid`'s `ProtectedID` pattern, hex-encoded (64 chars). The mcpproxy-specific `appKey` scopes the value so it cannot be correlated with any other application that hashes the same OS machine id. The raw machine id is never returned or transmitted, only the keyed hash.
- Stable: resolved once per process and cached in `resolveMachineID`, so the value is identical across every heartbeat build on a machine.
- Graceful fallback: if the OS machine id is unreadable (missing `/etc/machine-id`, permission error, exotic platform), `protectedMachineID` returns `""`, the field is omitted (`omitempty`), and the heartbeat is never blocked. The backend treats empty as "unknown".
- Respects opt-out: `machine_id` is built inside `buildHeartbeat`, which is only reached through the existing opt-out-gated `Start`/`sendHeartbeat` path (`MCPPROXY_TELEMETRY=false`, `DO_NOT_TRACK`, config disable, and the mid-flight `optedOut` latch all cover it).

Design decisions
Hash scheme: `ProtectedID` keys the HMAC with the machine id and uses the app key as the message, yielding a per-app, non-reversible digest, which is exactly the "ProtectedID" pattern requested. Chosen over a bare `SHA256(machineID)` because the app-key scoping prevents cross-application correlation.

Dependency: added `github.com/denisbrodbeck/machineid` v1.0.1 (yes). Nothing in-tree or in existing deps reads the OS machine id (checked repo + `go.mod`). Rolling it by hand is non-trivial cross-platform (macOS needs `ioreg`, Windows the registry `MachineGuid`, Linux `/etc/machine-id` + dbus fallback). The library is tiny, stdlib-only (no transitive deps), and is the standard answer. Per CLAUDE.md ("avoid new deps without clear need"), the need is clear and the footprint minimal. It is now a direct dependency in `go.mod`.

Schema version: bumped 5 → 6 (safe, verified against the worker). I read the ingest worker (`mcpproxy-telemetry`, read-only) to confirm this cannot break ingestion:
- `src/index.ts` / `src/validation.ts` only validate fields conditionally on `schema_version >= 3` and `>= 4`; there is no upper-bound check and no unknown-field rejection.
- The worker already receives `schema_version: 5` while validating only up to v4: empirical proof it tolerates higher-than-known versions.
- `schema.sql` stores the whole payload in `payload_json`.

So a payload with a higher version carrying one additive field passes validation; the new field is ignored by older consumers and preserved in `payload_json`. Bumping (vs. keeping v5) matches this codebase's convention (every prior field set bumped the version) and keeps the version→fields contract honest for the dashboard.

Privacy
`docs/features/telemetry.md` updated: new field row, a dedicated "Machine ID (schema v6)" section, and a "what is NOT collected" bullet (raw machine id / reversible hardware ids). `anonymity_test.go` extended so the anonymity scanner catches a raw-id leak while the hashed value passes.

Tests (TDD, failing-first)
`internal/telemetry/machine_id_test.go` + extended `anonymity_test.go`:
- … `machineIDProvider` seam, matching the package's function-pointer test style);

Results:
- `go build ./...`: OK
- `go test -race ./internal/telemetry/...`: ok (raw-comparison tests ran, not skipped)
- `golangci-lint run --config .github/.golangci.yml ./...`: 0 issues
- `gofmt` / `goimports`: clean

Follow-ups (not in this PR)
- Ingest worker (`mcpproxy-telemetry`): add a `machine_id TEXT` column + extraction in `validateV*Payload`/`index.ts` and an index, so dedup can query it directly rather than digging into `payload_json`. Not required for ingestion today (the payload is stored wholesale).
- Dashboard (`mcpproxy-dash`): update `identityExpr` in `dedup.ts` to prefer `machine_id` when present, falling back to the current `normalized_ip|os|arch|edition` heuristic for pre-v6 rows. Resolves the TODO that motivated this PR.

🤖 Generated with Claude Code
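A rough sketch of the dashboard follow-up's keying rule. `identityExpr` in `dedup.ts` is TypeScript (and may well be a query expression), so this Go version only illustrates the precedence: prefer `machine_id`, otherwise fall back to the `normalized_ip|os|arch|edition` heuristic. The `row` type, the `m:`/`h:` prefixes, and collapsing IPv6 addresses to their /64 in `normalizeIP` are all assumptions, not the dashboard's actual code.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// row is a hypothetical view of one stored heartbeat.
type row struct {
	MachineID, IP, OS, Arch, Edition string
}

// normalizeIP is an assumed normalization: IPv6 collapses to its /64,
// IPv4 (including IPv4-mapped IPv6) is returned as plain IPv4.
func normalizeIP(s string) string {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return s // leave unparseable values untouched
	}
	if addr.Is6() && !addr.Is4In6() {
		if p, err := addr.Prefix(64); err == nil {
			return p.String()
		}
	}
	return addr.Unmap().String()
}

// identityKey prefers machine_id (v6+) and falls back to the legacy heuristic.
// Distinct prefixes keep hashed ids and heuristic keys from colliding.
func identityKey(r row) string {
	if r.MachineID != "" {
		return "m:" + r.MachineID
	}
	return "h:" + strings.Join([]string{normalizeIP(r.IP), r.OS, r.Arch, r.Edition}, "|")
}

func main() {
	rows := []row{
		{MachineID: "aa11", IP: "2001:db8:1:2::10", OS: "linux", Arch: "amd64", Edition: "example"},
		{MachineID: "aa11", IP: "2001:db8:9:9::20", OS: "linux", Arch: "amd64", Edition: "example"}, // same machine, new network
		{IP: "2001:db8:1:2::30", OS: "linux", Arch: "amd64", Edition: "example"},                    // pre-v6 row
		{IP: "2001:db8:1:2::40", OS: "linux", Arch: "amd64", Edition: "example"},                    // same /64 as above
	}
	seen := map[string]bool{}
	for _, r := range rows {
		seen[identityKey(r)] = true
	}
	fmt.Println(len(seen)) // 2
}
```

Rows from v6+ clients dedup on the hashed id even when their network changes, while pre-v6 rows keep going through the heuristic path.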