Memtrace Telemetry — Compliance Datasheet

A formal, field-level specification of every byte Memtrace transmits from a customer machine. Intended for security, privacy, and audit functions reviewing whether Memtrace can run with telemetry enabled inside a regulated environment.

This document is the authoritative reference. Companion docs (PRIVACY.md, TELEMETRY.md, docs/privacy-and-telemetry.md) provide narrative summaries. Where any of those conflict with this datasheet, this datasheet is correct.

Version reference: Memtrace v0.4.62 (current at time of writing). The pipeline has been stable since v0.3.17, with PR workflow telemetry added in v0.4.62. Material changes are announced in the release notes and reflected in this file.

1. Executive summary

Memtrace runs entirely on the customer's machine. Source code, file contents, embeddings, branch names, commit data, and search queries are never transmitted off the machine under any configuration. Repository paths are not collected either, with one caveat: path components below $HOME can survive in a sanitised error or crash payload (see §4.1). Symbol names are likewise never transmitted except in one explicit case: the customer opts into the Weekly Memtrace Receipt feature on the memtrace.io account dashboard (off by default; see §6.4).

The product makes four categories of network call:

License validation + usage heartbeat — required, no customer code or repository content (license token, device hash, machine hostname used as a dashboard label per §6.1, aggregate integer counts).
Product telemetry — on by default, can be disabled with one environment variable. Contains sanitised crash, error, and lightweight usage events. No customer content.
Weekly Memtrace Receipt — off by default, opt-in via memtrace.io account settings. When enabled, the heartbeat carries a small symbol-name surface used to render the weekly email. One environment variable kills this stream specifically while leaving other telemetry behaviour unchanged.
One-time model download — inbound only, first run, from HuggingFace.

For a regulated environment, the recommendation is to keep product telemetry enabled and leave the Weekly Memtrace Receipt off (its default). If the organisation's policy prohibits any outbound diagnostic data regardless of content, set MEMTRACE_TELEMETRY=off and MEMTRACE_NO_REMOTE_RECEIPT=1 instead.

2. Data flow

Customer machine                                 Memtrace infrastructure
┌──────────────────────────┐                     ┌──────────────────────────┐
│  memtrace start runtime  │                     │                          │
│  ┌────────────────────┐  │   HTTPS (TLS 1.3)   │  api.memtrace.io         │
│  │ AST parser         │  │ ──────────────────▶ │  /api/device/auth        │
│  │ MemDB (local)      │  │   License + heart-  │  /api/device/heartbeat   │
│  │ Embedding (ONNX)   │  │   beat              │  /api/telemetry/ingest   │
│  │ Reranker           │  │                     │                          │
│  │ Sanitiser          │  │                     │  4 Postgres tables       │
│  │ Telemetry queue    │  │                     │  Admin dashboard         │
│  └────────────────────┘  │                     │  (@syncable.dev only)    │
│         │                │                     │                          │
│         ▼                │                     │                          │
│  ~/.memtrace/            │                     └──────────────────────────┘
│    telemetry/queue.jsonl │
│    embed-cache/          │
│    credentials.json      │
│  <project>/.memdb/       │
└──────────────────────────┘

All AST parsing, embedding, ranking, and graph storage runs locally. The customer machine's only outbound calls are to *.memtrace.io for the three endpoints above (plus a one-time inbound model download from HuggingFace).

3. Data classification matrix

The following is the complete inventory of fields that leave a customer machine via telemetry. Every field is enumerated; nothing is collected outside this list.

3.1 Identity fields (present in every telemetry payload)

Field	Type	Source	Example	Customer-content?
`device_id`	string (UUID)	Generated locally at first run, stored in `~/.memtrace/credentials.json`	`a1b2c3d4-e5f6-...`	No
`version`	string	Compiled into the binary	`0.3.89`	No
`target`	string	Compile-time platform triple	`aarch64-apple-darwin`	No
`os`	string	Runtime detection	`macos-aarch64`	No
`tier`	string	Runtime resource detection	`standard` / `light` / `heavy`	No

device_id is not reversible to the machine's hardware identity, hostname, IP, or user account. It is a randomly generated UUID stored in the credentials file. Deleting ~/.memtrace/credentials.json and re-authenticating issues a new one.

3.2 Usage events (`telemetry_events` table)

One row per discrete signal the binary emits. The complete list of event types:

Event name	When it fires	Fields beyond §3.1 identity
`start`	Every `memtrace start` / `memtrace mcp` invocation	`subcommand` (string, e.g. `start`, `mcp`), `transport` (string, e.g. `stdio`, `streamable-http`)
`index_complete`	After Phase-1 indexing finishes	`duration_ms` (integer), `repo_count` (integer, number of repos indexed — not names)
`embed_complete`	After Phase-2 embedding finishes	`duration_ms` (integer), `embedding_count` (integer, number of embeddings produced — not content)
`pr_review_completed`	After `memtrace code-review` completes a GitHub PR review run	`posted` (boolean), `watch` (boolean), `comment_count` (integer), `finding_count` (integer), `graph_mode` (string, e.g. `strict`/`off`), `min_severity` (string), `severity_counts` (JSON object with `low`/`medium`/`high`/`critical` integer buckets), `source_counts` (JSON object of numeric source buckets only)
`pr_watch_registered`	When `memtrace code-review --post --watch` registers a local PR watch	`comment_count` (integer), `graph_mode` (string), `status` (string enum, initially `awaiting_response`)
`pr_watch_synced`	When watched PRs are polled by `memtrace start`, `memtrace mcp`, or `memtrace pr sync`	`watch_count`, `changed_count`, `awaiting_response_count`, `human_replied_count`, `approved_count`, `changes_requested_count`, `stale_after_push_count`, `merged_count`, `closed_count`, `poll_error_count` (all integers)
`pr_watch_poll_error`	When polling one watched PR fails	`error_kind` (string enum: `rate_limited`, `token`, `github`, `parse`, `unknown`)

No event payload contains file paths, symbol names, repository names, PR URLs, owner names, branch names, commit hashes, reviewer identities, comment bodies, discussion text, query content, or any other customer-derived data. Counts are integers only, except for low-cardinality mode/status/error enums.

PR watch state is persisted locally at ~/.memtrace/pr-watches.json so the local daemon can poll GitHub for the PRs Memtrace reviewed. That local file may contain PR coordinates and the local repo root. It is not uploaded through telemetry.

3.3 Error events (`telemetry_errors` table)

The binary uses the tracing crate for internal logging. WARN- and ERROR-level log lines emitted by Memtrace's own crates are mirrored to the telemetry queue after passing through a sanitiser (see §4).

Schema:

Field	Type	Notes
Identity fields (§3.1)	—	Same as every telemetry payload
`level`	string	`WARN` or `ERROR`
`target`	string	Tracing target — Memtrace crate name (e.g. `memtrace_mcp::search`)
`message`	string	Sanitised log message, max 8 KB
`fingerprint`	string	`sha256(version ‖ target ‖ level ‖ first 6 tokens of message)`
`occurrences`	integer	Count of identical fingerprints; one row per fingerprint, occurrences bumped
`first_seen_at`, `last_seen_at`	timestamp	When this fingerprint first / most recently appeared

The fingerprint mechanism means a recurring error becomes one row, not thousands. The maximum cardinality of error rows from one machine over the product's lifetime is bounded by the number of distinct error fingerprints in the binary — typically dozens, not millions.
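The collapse of recurring errors into one row can be illustrated with a minimal sketch. It is not the code in the Memtrace binary: the field separator, the whitespace tokenisation, the hex encoding, and the in-memory dict standing in for the `telemetry_errors` table are all assumptions made for illustration.

```python
import hashlib

# Assumption: the datasheet's "‖" does not name a delimiter, so this sketch
# joins the parts with a unit-separator character.
FIELD_SEP = "\x1f"


def fingerprint(version: str, target: str, level: str, message: str) -> str:
    """sha256 over version, target, level and the first six message tokens."""
    # Assumption: "tokens" means whitespace-separated words.
    head = " ".join(message.split()[:6])
    material = FIELD_SEP.join([version, target, level, head])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def record_error(rows: dict, version: str, target: str, level: str,
                 message: str, seen_at: str) -> str:
    """Keep one row per fingerprint and bump occurrences on repeats."""
    fp = fingerprint(version, target, level, message)
    row = rows.get(fp)
    if row is None:
        rows[fp] = {
            "level": level,
            "target": target,
            "message": message,
            "occurrences": 1,
            "first_seen_at": seen_at,
            "last_seen_at": seen_at,
        }
    else:
        row["occurrences"] += 1
        row["last_seen_at"] = seen_at
    return fp
```

Under these assumptions, two messages that share their first six tokens and differ only afterwards (for example in a trailing process ID) land in the same row, while a change of level or target produces a new one.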

3.4 Crash reports (`telemetry_crashes` table)

If the binary panics, the panic hook captures:

Field	Type	Notes
Identity fields (§3.1)	—	Same as every telemetry payload
`panic_message`	string	Sanitised panic message
`location`	string	`file:line` from the Rust crate (e.g. `crates/memtrace-mcp/src/server.rs:142`)
`backtrace`	string	Sanitised Rust backtrace, capped at 16 KB
`occurred_at`	timestamp	When the panic happened locally

The location field is the crate file path inside the Memtrace binary, not a customer file path. The backtrace is the Rust call stack of the binary's own code.

Crash reports are written synchronously to ~/.memtrace/telemetry/queue.jsonl inside the panic hook, so a hard crash that exits the process still leaves a breadcrumb. They flush to the ingestion endpoint on the next successful binary run.
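As a rough illustration of the synchronous breadcrumb write, the sketch below appends one crash record as a JSON line and forces it to disk before returning. The record layout (a "kind" field plus the table's field names as JSON keys) and both function names are assumptions; the inputs are expected to be sanitised already (§4), and the actual panic hook is Rust code inside the binary.

```python
import json
import os

BACKTRACE_CAP_BYTES = 16 * 1024  # "capped at 16 KB" in the table above


def cap_utf8(text: str, limit: int) -> str:
    """Truncate to at most `limit` UTF-8 bytes without splitting a character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def append_crash(queue_path: str, panic_message: str, location: str,
                 backtrace: str, occurred_at: str) -> None:
    """Append one crash record as a JSON line and flush it to disk."""
    record = {
        "kind": "crash",  # assumed key layout, see lead-in
        "panic_message": panic_message,
        "location": location,
        "backtrace": cap_utf8(backtrace, BACKTRACE_CAP_BYTES),
        "occurred_at": occurred_at,
    }
    parent = os.path.dirname(queue_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(queue_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # survive a process exit right after the write
```

Appending and syncing inside the crash path is what lets the record survive a process that exits immediately afterwards; shipping is left to a later run.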

3.5 Rail shadow records (`rail_shadow` table)

Memtrace Rail is an optional router that can intercept code-discovery searches (grep/ripgrep/find for source symbols) and answer them from the Memtrace graph. When Rail is active it records a content-free measurement of what it would have returned — the dataset used to decide whether Rail is reliable enough to enable by default. No part of the search query, the matched files, or any result content is captured.

Field	Type	Notes
Identity fields (§3.1)	—	`device_id`, `version`, `os` — same as other streams
`mode`	enum	`observe` / `nudge` / `rail` / `strict`
`surface`	enum	always `memtrace_owned` (a source-symbol search in an indexed repo)
`would_route`	bool	whether Rail would route this search to Memtrace
`shape`	enum	`identifier` / `alternation` / `phrase` / `regex` / `empty` — the shape of the search pattern, derived locally; never the pattern text
`retrieval`	enum	`hit` / `miss` / `unavailable` — did Memtrace return a confident result
`score_bucket`	enum	`lt10` / `b10` / `b25` / `gte50` — bucketed relevance score, never the raw float
`relevance_proxy`	bool	computed on-device: did the top result's name/path contain a token from the search? Only the boolean is transmitted; the strings are compared locally and discarded
`latency_bucket`	enum	`fast` (<100 ms) / `mid` / `slow`
`occurred_at`	timestamp	When the search happened locally

Conditions for emission. Produced by default in observe mode (every install), one row per Memtrace-owned code search. It is measured asynchronously, off the user's critical path: the search hook records a request to a local spool and returns immediately — it issues no query — and the long-running daemon performs the retrieval in the background. The search therefore incurs no added latency. Enforcing modes (memtrace rail enable nudge|rail|strict) additionally measure inline. Opt-out: MEMTRACE_TELEMETRY=off (all telemetry) or MEMTRACE_RAIL_SHADOW=off (Rail only); MEMTRACE_RAIL_SHADOW_SAMPLE (0–1) bounds the background sampling rate. Records do not pass through the §4 text sanitiser because they contain no free-text fields — only enums, buckets, and booleans.
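The relevance_proxy field in the table above can be pictured with a small sketch: the strings are compared on the device and only the resulting boolean is kept. The tokenisation (lower-casing and splitting on non-alphanumeric characters) and the substring match are assumptions; the datasheet does not specify how the binary derives tokens.

```python
import re


def tokens(text: str) -> set:
    """Lower-cased alphanumeric tokens of a search pattern (assumed rule)."""
    return {t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t}


def relevance_proxy(search_pattern: str, top_name: str, top_path: str) -> bool:
    """True if any search token occurs in the top result's name or path."""
    haystack = f"{top_name} {top_path}".lower()
    # Only this boolean would be recorded; the inputs are discarded.
    return any(tok in haystack for tok in tokens(search_pattern))
```

A caller would store the returned boolean in the record and drop the pattern, name, and path immediately after the comparison.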

4. Sanitisation pipeline

Before any error message, panic message, or backtrace is written to the local queue, it passes through a sanitiser implemented in the binary. The sanitiser performs three transformations:

Transformation	Pattern	Replacement
Home-directory paths	Any absolute path under the OS-detected `$HOME` (or its Windows equivalent)	`~`
Token-shaped strings	Regex `[A-Za-z0-9_+/=-]{40,}` (matches API tokens, session tokens, JWTs, GitHub PATs, base64-encoded secrets)	`<redacted-token>`
Email addresses	RFC 5322 simplified regex	`<redacted-email>`

Sanitisation is applied before the content fingerprint is computed, so two errors that differ only in their redacted content collapse to the same fingerprint.
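A minimal sketch of the three transformations follows, for readers who want to test sample log lines against the documented patterns. Only the token regex is taken from the table; the email pattern, the order of application, the path-boundary rule, and the function name are assumptions, and Windows home directories are not handled. The binary's own implementation may differ.

```python
import re

# Pattern quoted in the table above.
TOKEN_RE = re.compile(r"[A-Za-z0-9_+/=-]{40,}")
# Assumption: a simplified email pattern; the datasheet does not give the exact one.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def sanitise(text: str, home: str) -> str:
    """Collapse the home prefix, then redact emails and token-shaped strings."""
    home = home.rstrip("/")
    if home:
        # Replace the home directory only at a path boundary, so that
        # /home/alice does not also rewrite /home/alice2.
        text = re.sub(re.escape(home) + r"(?![A-Za-z0-9_.-])", "~", text)
    # Order chosen for this sketch: emails first, so a redacted token
    # cannot leave a dangling "@domain" behind.
    text = EMAIL_RE.sub("<redacted-email>", text)
    text = TOKEN_RE.sub("<redacted-token>", text)
    return text
```

For example, sanitise("failed to open /home/alice/proj/main.py", "/home/alice") yields "failed to open ~/proj/main.py": the home prefix collapses while the directory names below it remain.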

The sanitiser source of truth is the public repo at crates/memtrace-mcp/src/telemetry.rs — there are no closed-source telemetry paths.

4.1 What the sanitiser is not designed to do

Customers operating in regulated environments should understand the limits:

The sanitiser does not strip arbitrary file names below $HOME. A panic backtrace that includes ~/clients/acme-corp/audit-2026/main.py would emit ~/clients/acme-corp/audit-2026/main.py — the home-directory prefix collapses, but the directory structure below it does not.
The sanitiser does not classify content semantically. It uses regex patterns. A panic that happened to log a customer name, a project codename, or a non-token-shaped secret would not be redacted.
The sanitiser does not parse JSON / structured payloads. It runs against the log line as a string.

In practice, Memtrace's own crates do not log customer content at WARN/ERROR level — these levels are reserved for indexer / runtime / network errors. But the sanitiser is a defence-in-depth measure, not a guarantee that no customer-derived path or identifier could ever appear in a sanitised payload.

If your data classification policy treats any path component below $HOME as sensitive (for example, KPMG client codenames in directory names), set MEMTRACE_TELEMETRY=off and rely on local error inspection.

5. What is explicitly never collected by product telemetry (§3)

The telemetry pipeline schema on the receiving end has no column for the following — collecting them would require a new product release. None of these ever leave the customer machine via the product-telemetry endpoint:

Source code or file contents
Symbol names extracted from customer code (but see §6.4 — symbol names can cross the network via the heartbeat only when the customer opts into the Weekly Memtrace Receipt feature)
Embeddings, BM25 indices, or any derived data
Repository names, paths, or remote URLs
GitHub PR URLs, pull request discussion text, issue/review/comment bodies, or reviewer identities
Branch names, commit messages, commit hashes, or git history
Search queries (find_code query strings)
File paths pointing inside the indexed repository, except where they appear in a sanitised crash backtrace (§4.1)
Environment variable values (the sanitiser strips token-shaped strings; the binary does not read environment values directly into telemetry payloads)
IP addresses on the server side (standard request logs are retained for 7 days for abuse mitigation only and are not joined to telemetry tables)

6. Required network calls (non-telemetry)

For completeness, the following non-telemetry categories of network traffic exist. None carries customer code or content, with the single exception of the opt-in Weekly Memtrace Receipt (§6.4).

6.1 License authentication


Endpoint	`POST https://www.memtrace.io/api/device/auth`
Transport	HTTPS, TLS 1.3
Payload	License key (`MTC-COM-…`), machine hostname (used only as a human-readable label in the licensing dashboard)
Frequency	First run, then refresh near session-token expiry (typically every 24 hours)
Purpose	Validate the license, issue session token
Offline behaviour	24-hour grace period before re-validation required

6.2 Usage heartbeat


Endpoint	`POST https://www.memtrace.io/api/device/heartbeat`
Transport	HTTPS, TLS 1.3
Payload	Aggregate integer counts only: `{ "totalNodes": <int>, "totalEdges": <int>, "totalEpisodes": <int>, "totalRepositories": <int> }`. No symbol names, no paths, no code.
Frequency	Every 15 minutes while the daemon is running
Purpose	Usage metering, entitlement checks

6.3 Embedding model download (one-time, inbound)


Source	HuggingFace Hub via the `fastembed` library
Direction	Inbound only — Memtrace downloads model weights; nothing about the customer machine is uploaded
Payload	ONNX model weights (typically `jina-embeddings-v2-base-code` or `bge-small-en-v1.5`)
Frequency	Once on first run, cached at `~/.cache/fastembed/` thereafter
Customer content sent	None

6.4 Weekly Memtrace Receipt (opt-in, off by default)

A separate opt-in feature that turns the usage heartbeat into the source data for a weekly summary email sent to the customer's registered memtrace.io email address. This is the only configuration under which symbol names can leave the customer machine.


Endpoint	The existing heartbeat endpoint (`POST https://www.memtrace.io/api/device/heartbeat`); receipt payload is attached when this feature is enabled
How to opt in	Toggled on the memtrace.io account dashboard — off by default for every new account
Payload	A small symbol-name surface (the symbols the weekly email needs to render) in addition to the standard heartbeat counts
Frequency	Same as the standard heartbeat — every 15 minutes while the daemon is running, aggregated server-side into one weekly email
Per-machine kill switch	`MEMTRACE_NO_REMOTE_RECEIPT=1`. Set this on a specific machine and the heartbeat from that machine carries no symbol-name surface even if the account toggle is on — the server then has no concrete content to anchor the email and skips that week's send.

For a regulated environment, the recommended posture is:

Leave the Weekly Memtrace Receipt toggle off at the account level (its default).
As defence in depth, set MEMTRACE_NO_REMOTE_RECEIPT=1 in the developer-machine environment so a future account-level toggle change cannot silently start shipping symbol names from regulated machines.
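The interaction between the account toggle and the per-machine kill switch can be sketched as a simple gate. The four count keys come from §6.2; the receiptSymbols key, the function name, and the way the account toggle reaches the client (a plain boolean argument here) are hypothetical, since the datasheet does not document the wire format of the receipt payload.

```python
COUNT_KEYS = ("totalNodes", "totalEdges", "totalEpisodes", "totalRepositories")


def build_heartbeat(counts: dict, account_receipt_enabled: bool,
                    env: dict, receipt_symbols: list) -> dict:
    """Return the heartbeat body; symbols are attached only when both gates allow."""
    payload = {key: int(counts[key]) for key in COUNT_KEYS}
    # Documented kill switch: MEMTRACE_NO_REMOTE_RECEIPT=1 wins over the account toggle.
    kill_switch = env.get("MEMTRACE_NO_REMOTE_RECEIPT") == "1"
    if account_receipt_enabled and not kill_switch:
        payload["receiptSymbols"] = list(receipt_symbols)  # hypothetical field name
    return payload
```

With the toggle off, or with the kill switch set, the payload in this sketch contains exactly the four integer counts and nothing else.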

7. Storage, retention, and access on the receiving end

Property	Value
Operator	Syncable ApS (Denmark, EU)
Storage location	Memtrace-operated PostgreSQL on `*.memtrace.io` infrastructure
Tables	`telemetry_events`, `telemetry_errors`, `telemetry_crashes`, `rail_shadow` (content-free Rail routing-quality buckets; emitted only when Rail is enabled)
Schema	`memtrace-ui/drizzle/0002_telemetry.sql` (closed-source repo; available under NDA for compliance review)
Retention	No automatic purge is in place today. A 90-day retention policy is committed to take effect before the dataset exceeds 90 days of history. Material changes are announced in the release notes.
Access	Admin analytics dashboard at `https://memtrace.io/admin/analytics`, gated to `@syncable.dev` email accounts via authenticated session. No third-party access.
Third parties	None. The pipeline ships no data to third-party analytics SDKs (Segment, Mixpanel, Datadog, etc.). The binary contains no embedded third-party telemetry library.
Sale or sharing	Telemetry data is not sold, shared, or published in anonymised aggregate form without prior notice in the release notes.

7.1 Right of erasure

Customers may request erasure of all telemetry data associated with their device_id by emailing support@syncable.dev with the device ID (visible in ~/.memtrace/credentials.json). Erasure is processed within 30 days and confirmed by email.

7.2 Breach notification

In the event of a confirmed compromise of the telemetry storage layer, affected customers will be notified within 72 hours of confirmation, via the email associated with their license key, with details of the impact and recommended actions.

8. Network egress allowlist (for organisational firewalls)

If outbound traffic from developer machines is filtered, the following destinations must be allowed for Memtrace to function:

Destination	Required for	Direction
`*.memtrace.io` (HTTPS / TCP 443)	License validation + heartbeat + telemetry	Outbound
`huggingface.co`, `cdn-lfs*.huggingface.co` (HTTPS / TCP 443)	One-time model download	Outbound (inbound payload)
`registry.npmjs.org` (HTTPS / TCP 443)	Only required when running `memtrace install` to upgrade	Outbound

Blocking huggingface.co after first run is safe — the model is cached. Blocking registry.npmjs.org only prevents upgrades. Blocking *.memtrace.io puts the binary into offline-grace mode for 24 hours before license re-validation is required.

To disable telemetry traffic specifically while keeping license validation, set MEMTRACE_TELEMETRY=off — license calls continue, the telemetry queue stops flushing.
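When drafting or testing a firewall policy, it can help to check observed hostnames against the table above. The following sketch does that with shell-style wildcard matching; the function name and the choice of fnmatch semantics are assumptions, and a real firewall may interpret wildcards differently.

```python
from fnmatch import fnmatchcase

# Destinations and purposes copied from the allowlist table above.
ALLOWLIST = {
    "*.memtrace.io": "license validation, heartbeat, telemetry",
    "huggingface.co": "one-time model download",
    "cdn-lfs*.huggingface.co": "one-time model download",
    "registry.npmjs.org": "upgrades via memtrace install",
}


def egress_reason(host: str):
    """Return why a host is allowed, or None if it is not on the list."""
    host = host.lower().rstrip(".")
    for pattern, reason in ALLOWLIST.items():
        if fnmatchcase(host, pattern):
            return reason
    return None
```

Note that under these matching rules the bare domain memtrace.io is not covered by the `*.memtrace.io` pattern, while www.memtrace.io and api.memtrace.io are.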

9. Customer-side verification

The customer can verify locally exactly what is in the telemetry queue before it ships:

# Inspect the on-disk queue
cat ~/.memtrace/telemetry/queue.jsonl

# Each line is one record; the "kind" field is "event" | "error" | "crash"
# There is no separate raw buffer — the file shown above is the complete record.

Memtrace has no mode that accumulates a queue without shipping it. With MEMTRACE_TELEMETRY=off set in the environment, usage and error records are not queued at all; only a crash breadcrumb can still be written locally (§10.1), and it is never flushed.

To inspect what would be sent, run a normal session with telemetry on, then read the JSONL file before the flusher's 60-second batched flush.
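For a periodic review of the queue, a small script can summarise records by kind and surface any string that mentions a term the reviewer considers sensitive (a client codename, for instance). This is a sketch: beyond the documented "kind" field it assumes nothing about the record layout and simply walks every string value; the function names and the watch-term approach are illustrative.

```python
import json
from collections import Counter


def iter_strings(value):
    """Yield every string value nested anywhere inside a JSON record."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def review_queue(lines, watch_terms):
    """Count records per kind and flag strings containing any watch term."""
    kinds = Counter()
    flagged = []
    lowered = [term.lower() for term in watch_terms]
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        kinds[record.get("kind", "<missing>")] += 1
        for text in iter_strings(record):
            if any(term in text.lower() for term in lowered):
                flagged.append((lineno, text))
    return kinds, flagged
```

To use it, open ~/.memtrace/telemetry/queue.jsonl (expanding the home directory with os.path.expanduser), pass the file object as lines, and supply the organisation's own list of watch terms.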

10. Opt-out procedure

10.1 Per-process

MEMTRACE_TELEMETRY=off memtrace start

Accepted off-values: off, 0, false, disabled, no (case-insensitive). Anything else, including unset, keeps telemetry enabled.

When set, the binary's behaviour is:

The panic hook still installs locally (so a crash in a disabled session still leaves a ~/.memtrace/telemetry/queue.jsonl breadcrumb), but the flusher never ships it.
The tracing layer becomes a no-op for telemetry — WARN/ERROR lines are still printed to stderr but are not queued.
The flusher exits immediately on startup — no network calls are made to the telemetry endpoint.
Usage event callsites short-circuit before any data is constructed.
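The accepted off-values can be expressed as a short check, which is handy for wrapper scripts that want to assert the setting before launching Memtrace. This is a sketch of the documented rule only; the function name is hypothetical, the value is not trimmed of whitespace, and the hard override in §10.4 is not modelled.

```python
OFF_VALUES = {"off", "0", "false", "disabled", "no"}


def telemetry_enabled(env: dict) -> bool:
    """Apply the documented rule: only the listed off-values disable telemetry."""
    raw = env.get("MEMTRACE_TELEMETRY")
    if raw is None:
        return True  # unset keeps telemetry enabled
    return raw.lower() not in OFF_VALUES  # comparison is case-insensitive
```

A wrapper could call telemetry_enabled(dict(os.environ)) and refuse to start when it returns True on a machine that is meant to run with telemetry off.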

10.2 Permanent — shell profile

# ~/.zshrc / ~/.bashrc
export MEMTRACE_TELEMETRY=off

10.3 Permanent — MCP client configuration

{
  "command": "memtrace",
  "args": ["mcp"],
  "env": { "MEMTRACE_TELEMETRY": "off" }
}

Applies to Claude Code, Cursor, Codex, Windsurf, and any MCP client that honours the env block.

10.4 The hard-override variable

A second environment variable, MEMTRACE_TELEMETRY_DISABLED=1, is documented in docs/environment-variables.md as a hard override that blocks telemetry regardless of any other state. For most users MEMTRACE_TELEMETRY=off is sufficient; the hard override is recommended for CI and locked-down environments, where a single highest-precedence switch removes any ambiguity about whether telemetry is off.

10.5 Verification that opt-out is active

The on-disk queue is the verification surface. After running Memtrace with MEMTRACE_TELEMETRY=off:

ls -la ~/.memtrace/telemetry/   # directory empty or absent

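If the queue file is present and growing during normal use, telemetry is still on. If it is empty or missing, the kill switch took effect. The one exception is a crash breadcrumb written by the panic hook (§10.1): it can appear in the file after a crash but is never shipped while telemetry is off.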

11. Recommended configuration for regulated environments

For an audit, financial-services, healthcare, or other regulated context, the following configuration is appropriate with product telemetry enabled:

Leave the Weekly Memtrace Receipt toggle off at the memtrace.io account level (its default). As defence in depth, set MEMTRACE_NO_REMOTE_RECEIPT=1 in the developer-machine environment so symbol names cannot leave the machine even if the account toggle is later changed.
Keep MEMTRACE_TELEMETRY at its default (on) and use the queue inspection procedure (§9) periodically to confirm no client-identifying paths appear in errors.
If the engagement involves hostnames that encode client identity, override the licensing hostname label so it doesn't surface in your account dashboard.
Add the destinations listed in §8 (*.memtrace.io, huggingface.co, cdn-lfs*.huggingface.co) to the egress allowlist.
Retain a copy of this datasheet and the linked source files in the project's compliance record.

If the organisation's policy prohibits any outbound diagnostic data regardless of content classification, set both MEMTRACE_TELEMETRY=off and MEMTRACE_NO_REMOTE_RECEIPT=1 permanently. The product remains fully functional in that mode — only license validation and the heartbeat (aggregate integer counts, no content) continue to run.

12. Change management

Any change to:

The list of fields collected
The sanitisation pipeline
The storage location or operator
Retention or access policy

will be announced in:

The release notes of the version that introduces the change.
This datasheet (with an entry in §13 below).
The companion PRIVACY.md and TELEMETRY.md files.

Customers who require a notice period before a material telemetry change reaches their environment should pin to a specific Memtrace version (npm install -g memtrace@0.3.89) and review release notes before upgrading.

13. Changelog

Version	Date	Change
v0.3.17	2025-09	Telemetry pipeline introduced (events, errors, crashes). Sanitiser shipped with launch. Default on, env-var opt-out.
v0.3.89	2026-05	This datasheet published. No change to telemetry behaviour.

14. Contacts

Topic	Contact
Compliance / DPA / SOC 2 questionnaire	`support@syncable.dev`
Security disclosures	`support@syncable.dev` (PGP key on request)
General support	`support@syncable.dev`
Public issue tracker	github.com/syncable-dev/memtrace-public/issues

A formal Data Processing Agreement (DPA), GDPR Article 30 record-of-processing-activities entry, and SOC 2 readiness questionnaire are available on request via support@syncable.dev.
