A formal, field-level specification of every byte Memtrace transmits from a customer machine. Intended for security, privacy, and audit functions reviewing whether Memtrace can run with telemetry enabled inside a regulated environment.
This document is the authoritative reference. Companion docs (PRIVACY.md, TELEMETRY.md, docs/privacy-and-telemetry.md) provide narrative summaries. Where any of those conflict with this datasheet, this datasheet is correct.
Version reference: Memtrace v0.4.62 (current at time of writing). The pipeline has been stable since v0.3.17, with PR workflow telemetry added in v0.4.62. Material changes are announced in the release notes and reflected in this file.
Memtrace runs entirely on the customer's machine. Source code, file contents, embeddings, repository paths, branch names, commit data, and search queries are never transmitted off the machine under any configuration. Symbol names are likewise never transmitted, with one explicit exception: when the customer opts into the Weekly Memtrace Receipt feature on the memtrace.io account dashboard (off by default; see §6.4).
The product makes four categories of network call:
- License validation + usage heartbeat — required, no customer content (license token, device hash, aggregate integer counts).
- Product telemetry — on by default, can be disabled with one environment variable. Contains sanitised crash, error, and lightweight usage events. No customer content.
- Weekly Memtrace Receipt — off by default, opt-in via memtrace.io account settings. When enabled, the heartbeat carries a small symbol-name surface used to render the weekly email. One environment variable kills this stream specifically while leaving other telemetry behaviour unchanged.
- One-time model download — inbound only, first run, from HuggingFace.
For a regulated environment, the recommendation is to keep product telemetry enabled and leave the Weekly Memtrace Receipt off (its default). If the organisation's policy prohibits any outbound diagnostic data regardless of content, set both MEMTRACE_TELEMETRY=off and MEMTRACE_NO_REMOTE_RECEIPT=1.
```text
      Customer machine                                 Memtrace infrastructure
┌──────────────────────────┐                         ┌──────────────────────────┐
│  memtrace start runtime  │                         │                          │
│  ┌────────────────────┐  │   HTTPS (TLS 1.3)       │  api.memtrace.io         │
│  │ AST parser         │  │  ──────────────────▶    │  /api/device/auth        │
│  │ MemDB (local)      │  │   License + heartbeat   │  /api/device/heartbeat   │
│  │ Embedding (ONNX)   │  │                         │  /api/telemetry/ingest   │
│  │ Reranker           │  │                         │                          │
│  │ Sanitiser          │  │                         │  3 Postgres tables       │
│  │ Telemetry queue    │  │                         │  Admin dashboard         │
│  └────────────────────┘  │                         │  (@syncable.dev only)    │
│            │             │                         │                          │
│            ▼             │                         │                          │
│  ~/.memtrace/            │                         └──────────────────────────┘
│    telemetry/queue.jsonl │
│    embed-cache/          │
│    credentials.json      │
│  <project>/.memdb/       │
└──────────────────────────┘
```
All AST parsing, embedding, ranking, and graph storage runs locally. The customer machine's only outbound calls are to *.memtrace.io for the three endpoints above (plus a one-time inbound model download from HuggingFace).
The following is the complete inventory of fields that leave a customer machine via telemetry. Every field is enumerated; nothing is collected outside this list.
| Field | Type | Source | Example | Customer-content? |
|---|---|---|---|---|
| `device_id` | string (UUID) | Generated locally at first run, stored in `~/.memtrace/credentials.json` | `a1b2c3d4-e5f6-...` | No |
| `version` | string | Compiled into the binary | `0.3.89` | No |
| `target` | string | Compile-time platform triple | `aarch64-apple-darwin` | No |
| `os` | string | Runtime detection | `macos-aarch64` | No |
| `tier` | string | Runtime resource detection | `standard` / `light` / `heavy` | No |
device_id is not reversible to the machine's hardware identity, hostname, IP, or user account. It is a randomly generated UUID stored in the credentials file. Deleting ~/.memtrace/credentials.json and re-authenticating issues a new one.
One row per discrete signal the binary emits. The complete list of event types:
| Event name | When it fires | Fields beyond §3.1 identity |
|---|---|---|
| `start` | Every `memtrace start` / `memtrace mcp` invocation | `subcommand` (string, e.g. start, mcp), `transport` (string, e.g. stdio, streamable-http) |
| `index_complete` | After Phase-1 indexing finishes | `duration_ms` (integer), `repo_count` (integer; number of repos indexed, not names) |
| `embed_complete` | After Phase-2 embedding finishes | `duration_ms` (integer), `embedding_count` (integer; number of embeddings produced, not content) |
| `pr_review_completed` | After `memtrace code-review` completes a GitHub PR review run | `posted` (boolean), `watch` (boolean), `comment_count` (integer), `finding_count` (integer), `graph_mode` (string, e.g. strict/off), `min_severity` (string), `severity_counts` (JSON object with low/medium/high/critical integer buckets), `source_counts` (JSON object of numeric source buckets only) |
| `pr_watch_registered` | When `memtrace code-review --post --watch` registers a local PR watch | `comment_count` (integer), `graph_mode` (string), `status` (string enum, initially awaiting_response) |
| `pr_watch_synced` | When watched PRs are polled by `memtrace start`, `memtrace mcp`, or `memtrace pr sync` | `watch_count`, `changed_count`, `awaiting_response_count`, `human_replied_count`, `approved_count`, `changes_requested_count`, `stale_after_push_count`, `merged_count`, `closed_count`, `poll_error_count` (all integers) |
| `pr_watch_poll_error` | When polling one watched PR fails | `error_kind` (string enum: rate_limited, token, github, parse, unknown) |
No event payload contains file paths, symbol names, repository names, PR URLs, owner names, branch names, commit hashes, reviewer identities, comment bodies, discussion text, query content, or any other customer-derived data. Counts are integers only, except for low-cardinality mode/status/error enums.
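The "integers only" property of an event such as `pr_watch_synced` can be expressed as an allowlist check. The sketch below is illustrative and is not Memtrace's implementation: the field names come from the table above, while the helper `build_pr_watch_synced` and its default-to-zero behaviour are assumptions made for the example.

```python
# Field names taken from the pr_watch_synced row of the events table.
PR_WATCH_SYNCED_FIELDS = frozenset({
    "watch_count", "changed_count", "awaiting_response_count",
    "human_replied_count", "approved_count", "changes_requested_count",
    "stale_after_push_count", "merged_count", "closed_count",
    "poll_error_count",
})


def build_pr_watch_synced(counts: dict) -> dict:
    """Return a payload holding only allowlisted, non-negative integer counters.

    Hypothetical helper: anything outside the allowlist (a PR URL, a branch
    name) is rejected rather than silently dropped.
    """
    unexpected = set(counts) - PR_WATCH_SYNCED_FIELDS
    if unexpected:
        raise ValueError(f"fields not in the allowlist: {sorted(unexpected)}")
    payload = {}
    for name in sorted(PR_WATCH_SYNCED_FIELDS):
        value = counts.get(name, 0)  # assumption: missing counters default to 0
        # bool is a subclass of int in Python, so compare the exact type.
        if type(value) is not int or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
        payload[name] = value
    return payload
```

Rejecting unknown keys, instead of filtering them, makes an accidental attempt to attach customer-derived data fail loudly during development.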
PR watch state is persisted locally at ~/.memtrace/pr-watches.json so the local daemon can poll GitHub for the PRs Memtrace reviewed. That local file may contain PR coordinates and the local repo root. It is not uploaded through telemetry.
The binary uses the tracing crate for internal logging. WARN- and ERROR-level log lines emitted by Memtrace's own crates are mirrored to the telemetry queue after passing through a sanitiser (see §4).
Schema:
| Field | Type | Notes |
|---|---|---|
| Identity fields (§3.1) | — | |
| `level` | string | `WARN` or `ERROR` |
| `target` | string | Tracing target: the Memtrace crate name (e.g. `memtrace_mcp::search`) |
| `message` | string | Sanitised log message, max 8 KB |
| `fingerprint` | string | `sha256(version ‖ target ‖ level ‖ first 6 tokens of message)` |
| `occurrences` | integer | Count of identical fingerprints; one row per fingerprint, occurrences bumped |
| `first_seen_at`, `last_seen_at` | timestamp | When this fingerprint first / most recently appeared |
The fingerprint mechanism means a recurring error becomes one row, not thousands. The maximum cardinality of error rows from one machine over the product's lifetime is bounded by the number of distinct error fingerprints in the binary — typically dozens, not millions.
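The collapsing behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped code: the datasheet specifies only the concatenation order, so the NUL separator, the whitespace tokenisation, and the in-memory `rows` dictionary standing in for the errors table are all hypothetical.

```python
import hashlib


def fingerprint(version: str, target: str, level: str, message: str) -> str:
    """SHA-256 over version, target, level and the first six message tokens."""
    head = " ".join(message.split()[:6])  # assumption: tokens are whitespace-separated
    # Assumption: fields are joined with a NUL byte; the real delimiter is not documented.
    material = "\x00".join([version, target, level, head])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def record_error(rows: dict, version: str, target: str, level: str,
                 message: str, seen_at: str) -> str:
    """Upsert one row per fingerprint and bump occurrences on repeats."""
    fp = fingerprint(version, target, level, message)
    row = rows.get(fp)
    if row is None:
        rows[fp] = {
            "level": level, "target": target, "message": message,
            "occurrences": 1, "first_seen_at": seen_at, "last_seen_at": seen_at,
        }
    else:
        row["occurrences"] += 1
        row["last_seen_at"] = seen_at
    return fp
```

Because only the first six tokens feed the hash, two messages that differ later in the line (a shard number, a duration) map to the same row in this sketch.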
If the binary panics, the panic hook captures:
| Field | Type | Notes |
|---|---|---|
| Identity fields (§3.1) | — | |
| `panic_message` | string | Sanitised panic message |
| `location` | string | file:line from the Rust crate (e.g. `crates/memtrace-mcp/src/server.rs:142`) |
| `backtrace` | string | Sanitised Rust backtrace, capped at 16 KB |
| `occurred_at` | timestamp | When the panic happened locally |
The location field is the crate file path inside the Memtrace binary, not a customer file path. The backtrace is the Rust call stack of the binary's own code.
Crash reports are written synchronously to ~/.memtrace/telemetry/queue.jsonl inside the panic hook, so a hard crash that exits the process still leaves a breadcrumb. They flush to the ingestion endpoint on the next successful binary run.
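The "synchronous breadcrumb" idea is an append followed by a flush to disk before the process exits. The sketch below shows that pattern in Python for illustration only; the real panic hook is Rust, and the helper name `append_record` and the record fields other than `kind` are assumptions.

```python
import json
import os


def append_record(queue_path: str, record: dict) -> None:
    """Append one JSON line and force it to disk before returning."""
    directory = os.path.dirname(queue_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    line = json.dumps(record, separators=(",", ":")) + "\n"
    with open(queue_path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()             # push Python's buffer to the OS
        os.fsync(fh.fileno())  # ask the OS to persist it before we return
```

A later run can then read the file line by line and ship whatever the crashed session left behind.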
Memtrace Rail is an optional router that can intercept code-discovery searches (grep/ripgrep/find for source symbols) and answer them from the Memtrace graph. When Rail is active it records a content-free measurement of what it would have returned — the dataset used to decide whether Rail is reliable enough to enable by default. No part of the search query, the matched files, or any result content is captured.
| Field | Type | Notes |
|---|---|---|
| Identity fields (§3.1) | — | device_id, version, os — same as other streams |
| `mode` | enum | observe / nudge / rail / strict |
| `surface` | enum | always memtrace_owned (a source-symbol search in an indexed repo) |
| `would_route` | bool | whether Rail would route this search to Memtrace |
| `shape` | enum | identifier / alternation / phrase / regex / empty: the shape of the search pattern, derived locally; never the pattern text |
| `retrieval` | enum | hit / miss / unavailable: did Memtrace return a confident result |
| `score_bucket` | enum | lt10 / b10 / b25 / gte50: bucketed relevance score, never the raw float |
| `relevance_proxy` | bool | computed on-device: did the top result's name/path contain a token from the search? Only the boolean is transmitted; the strings are compared locally and discarded |
| `latency_bucket` | enum | fast (<100 ms) / mid / slow |
| `occurred_at` | timestamp | When the search happened locally |
Conditions for emission. Produced by default in observe mode (every install), one row per Memtrace-owned code search. It is measured asynchronously, off the user's critical path: the search hook records a request to a local spool and returns immediately — it issues no query — and the long-running daemon performs the retrieval in the background. The search therefore incurs no added latency. Enforcing modes (memtrace rail enable nudge|rail|strict) additionally measure inline. Opt-out: MEMTRACE_TELEMETRY=off (all telemetry) or MEMTRACE_RAIL_SHADOW=off (Rail only); MEMTRACE_RAIL_SHADOW_SAMPLE (0–1) bounds the background sampling rate. Records do not pass through the §4 text sanitiser because they contain no free-text fields — only enums, buckets, and booleans.
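To make the "only enums, buckets, and booleans" claim concrete, here is a hedged sketch of how `relevance_proxy` and `latency_bucket` could be derived on-device. It is not the shipped logic: the tokenisation rule and the 1000 ms boundary between mid and slow are assumptions (the datasheet fixes only the 100 ms boundary for fast), and `shadow_record` is a hypothetical helper covering just two of the fields.

```python
import re


def relevance_proxy(search: str, top_name: str, top_path: str) -> bool:
    """True if any token of the search appears in the top result's name or path."""
    # Assumption: tokens are lowercase runs of letters and digits.
    tokens = {t for t in re.split(r"[^A-Za-z0-9]+", search.lower()) if t}
    haystack = f"{top_name} {top_path}".lower()
    return any(token in haystack for token in tokens)


def latency_bucket(ms: float, slow_from_ms: float = 1000.0) -> str:
    """Bucket a latency; only the <100 ms boundary comes from the datasheet."""
    if ms < 100:
        return "fast"
    return "mid" if ms < slow_from_ms else "slow"


def shadow_record(search: str, top_name: str, top_path: str, ms: float) -> dict:
    """Build the content-free part of a record; the input strings are not kept."""
    return {
        "relevance_proxy": relevance_proxy(search, top_name, top_path),
        "latency_bucket": latency_bucket(ms),
    }
```

The returned dictionary carries a boolean and an enum string; the search text and the result name never enter it.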
Before any error message, panic message, or backtrace is written to the local queue, it passes through a sanitiser implemented in the binary. The sanitiser performs three transformations:
| Transformation | Pattern | Replacement |
|---|---|---|
| Home-directory paths | Any absolute path under the OS-detected `$HOME` (or its Windows equivalent) | `~` |
| Token-shaped strings | Regex `[A-Za-z0-9_+/=-]{40,}` (matches API tokens, session tokens, JWTs, GitHub PATs, base64-encoded secrets) | `<redacted-token>` |
| Email addresses | RFC 5322 simplified regex | `<redacted-email>` |
Sanitisation is applied before the content fingerprint is computed, so two errors that differ only in their redacted content collapse to the same fingerprint.
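The three transformations can be approximated in a few lines. This is a sketch for reviewers, not a copy of `crates/memtrace-mcp/src/telemetry.rs`: the token pattern is the one quoted in the table, while the email pattern, the home-prefix boundary rule, and the order of application are assumptions.

```python
import re

# Pattern quoted in the sanitiser table.
TOKEN_RE = re.compile(r"[A-Za-z0-9_+/=-]{40,}")
# Assumption: a deliberately simple email pattern; the real "RFC 5322 simplified" regex may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def sanitise(text: str, home: str) -> str:
    """Collapse the home prefix, then redact token-shaped strings and emails."""
    home = home.rstrip("/\\")
    if home:
        # Only match the prefix at a path boundary, so /home/alicex is left alone.
        home_re = re.compile(re.escape(home) + r"(?![A-Za-z0-9_.-])")
        text = home_re.sub("~", text)
    text = TOKEN_RE.sub("<redacted-token>", text)
    text = EMAIL_RE.sub("<redacted-email>", text)
    return text
```

With `home="/home/alice"`, the input `/home/alice/clients/acme-corp/audit-2026/main.py` becomes `~/clients/acme-corp/audit-2026/main.py`: the prefix collapses but the directory names below it survive. In this sketch the home collapse runs first; a long absolute path with no dots is itself 40 or more token-class characters, so running the token rule first would redact the path instead.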
The sanitiser source of truth is the public repo at crates/memtrace-mcp/src/telemetry.rs — there are no closed-source telemetry paths.
Customers operating in regulated environments should understand the limits:
- The sanitiser does not strip arbitrary file names below `$HOME`. A panic backtrace that includes `$HOME/clients/acme-corp/audit-2026/main.py` would emit `~/clients/acme-corp/audit-2026/main.py`: the home-directory prefix collapses, but the directory structure below it does not.
- The sanitiser does not classify content semantically. It uses regex patterns. A panic that happened to log a customer name, a project codename, or a non-token-shaped secret would not be redacted.
- The sanitiser does not parse JSON / structured payloads. It runs against the log line as a string.
In practice, Memtrace's own crates do not log customer content at WARN/ERROR level — these levels are reserved for indexer / runtime / network errors. But the sanitiser is a defence-in-depth measure, not a guarantee that no customer-derived path or identifier could ever appear in a sanitised payload.
If your data classification policy treats any path component below $HOME as sensitive (for example, KPMG client codenames in directory names), set MEMTRACE_TELEMETRY=off and rely on local error inspection.
The telemetry pipeline schema on the receiving end has no column for the following — collecting them would require a new product release. None of these ever leave the customer machine via the product-telemetry endpoint:
- Source code or file contents
- Symbol names extracted from customer code (but see §6.4 — symbol names can cross the network via the heartbeat only when the customer opts into the Weekly Memtrace Receipt feature)
- Embeddings, BM25 indices, or any derived data
- Repository names, paths, or remote URLs
- GitHub PR URLs, pull request discussion text, issue/review/comment bodies, or reviewer identities
- Branch names, commit messages, commit hashes, or git history
- Search queries (`find_code` query strings)
- File paths pointing inside the indexed repository, except where they appear in a sanitised crash backtrace (§4.1)
- Environment variable values (the sanitiser strips token-shaped strings; the binary does not read environment values directly into telemetry payloads)
- IP addresses on the server side (standard request logs are retained for 7 days for abuse mitigation only and are not joined to telemetry tables)
For completeness, two additional categories of network traffic exist. Neither contains customer content.
| Property | Value |
|---|---|
| Endpoint | POST https://www.memtrace.io/api/device/auth |
| Transport | HTTPS, TLS 1.3 |
| Payload | License key (MTC-COM-…), machine hostname (used only as a human-readable label in the licensing dashboard) |
| Frequency | First run, then refresh near session-token expiry (typically every 24 hours) |
| Purpose | Validate the license, issue session token |
| Offline behaviour | 24-hour grace period before re-validation required |

| Property | Value |
|---|---|
| Endpoint | POST https://www.memtrace.io/api/device/heartbeat |
| Transport | HTTPS, TLS 1.3 |
| Payload | Aggregate integer counts only: { "totalNodes": <int>, "totalEdges": <int>, "totalEpisodes": <int>, "totalRepositories": <int> }. No symbol names, no paths, no code in the standard heartbeat; the opt-in Weekly Memtrace Receipt (§6.4) is the only case in which symbol names are attached. |
| Frequency | Every 15 minutes while the daemon is running |
| Purpose | Usage metering, entitlement checks |

| Property | Value |
|---|---|
| Source | HuggingFace Hub via the fastembed library |
| Direction | Inbound only: Memtrace downloads model weights; nothing about the customer machine is uploaded |
| Payload | ONNX model weights (typically jina-embeddings-v2-base-code or bge-small-en-v1.5) |
| Frequency | Once on first run, cached at ~/.cache/fastembed/ thereafter |
| Customer content sent | None |
A separate opt-in feature that turns the usage heartbeat into the source data for a weekly summary email sent to the customer's registered memtrace.io email address. This is the only configuration under which symbol names can leave the customer machine.
| Property | Value |
|---|---|
| Endpoint | The existing heartbeat endpoint (POST https://www.memtrace.io/api/device/heartbeat); receipt payload is attached when this feature is enabled |
| How to opt in | Toggled on the memtrace.io account dashboard; off by default for every new account |
| Payload | A small symbol-name surface (the symbols the weekly email needs to render) in addition to the standard heartbeat counts |
| Frequency | Same as the standard heartbeat: every 15 minutes while the daemon is running, aggregated server-side into one weekly email |
| Per-machine kill switch | MEMTRACE_NO_REMOTE_RECEIPT=1. Set this on a specific machine and the heartbeat from that machine carries no symbol-name surface even if the account toggle is on. The server then has no concrete content to anchor the email and skips that week's send. |
For a regulated environment, the recommended posture is:
- Leave the Weekly Memtrace Receipt toggle off at the account level (its default).
- As defence in depth, set `MEMTRACE_NO_REMOTE_RECEIPT=1` in the developer-machine environment so a future account-level toggle change cannot silently start shipping symbol names from regulated machines.
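The interaction between the account toggle and the per-machine kill switch can be pictured as a single guard when the heartbeat is assembled. The sketch below is hypothetical throughout: the count keys are those documented for the heartbeat, but the `receiptSymbols` key, the `account_opt_in` parameter, and the rule that only the exact value `1` activates the kill switch are assumptions, since the datasheet does not specify them.

```python
COUNT_KEYS = ("totalNodes", "totalEdges", "totalEpisodes", "totalRepositories")


def build_heartbeat(counts: dict, symbols: list, account_opt_in: bool, env: dict) -> dict:
    """Assemble a heartbeat; symbols are attached only if opted in and not killed."""
    payload = {key: int(counts[key]) for key in COUNT_KEYS}
    # Assumption: the kill switch is active only for the documented value "1".
    kill_switch = env.get("MEMTRACE_NO_REMOTE_RECEIPT") == "1"
    if account_opt_in and not kill_switch:
        # "receiptSymbols" is a hypothetical key name used for illustration.
        payload["receiptSymbols"] = list(symbols)
    return payload
```

With the kill switch set, the payload reduces to the four integer counts regardless of the account-level setting, which is the defence-in-depth property described above.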
| Property | Value |
|---|---|
| Operator | Syncable ApS (Denmark, EU) |
| Storage location | Memtrace-operated PostgreSQL on *.memtrace.io infrastructure |
| Tables | telemetry_events, telemetry_errors, telemetry_crashes, rail_shadow (content-free Rail routing-quality buckets; emitted only when Rail is enabled) |
| Schema | memtrace-ui/drizzle/0002_telemetry.sql (closed-source repo; available under NDA for compliance review) |
| Retention | No automatic purge today. A 90-day retention policy is committed to take effect before the dataset exceeds 90 days of history. Material changes are announced in release notes. |
| Access | Admin analytics dashboard at https://memtrace.io/admin/analytics, gated to @syncable.dev email accounts via authenticated session. No third-party access. |
| Third parties | None. The pipeline ships no data to third-party analytics SDKs (Segment, Mixpanel, Datadog, etc.). The binary contains no embedded third-party telemetry library. |
| Sale or sharing | Telemetry data is not sold, shared, or published in anonymised aggregate form without prior notice in the release notes. |
Customers may request erasure of all telemetry data associated with their device_id by emailing support@syncable.dev with the device ID (visible in ~/.memtrace/credentials.json). Erasure is processed within 30 days and confirmed by email.
In the event of a confirmed compromise of the telemetry storage layer, affected customers will be notified within 72 hours of confirmation, via the email associated with their license key, with details of the impact and recommended actions.
If outbound traffic from developer machines is filtered, the following destinations must be allowed for Memtrace to function:
| Destination | Required for | Direction |
|---|---|---|
| `*.memtrace.io` (HTTPS / TCP 443) | License validation + heartbeat + telemetry | Outbound |
| `huggingface.co`, `cdn-lfs*.huggingface.co` (HTTPS / TCP 443) | One-time model download | Outbound (inbound payload) |
| `registry.npmjs.org` (HTTPS / TCP 443) | Only required when running `memtrace install` to upgrade | Outbound |
Blocking huggingface.co after first run is safe — the model is cached. Blocking registry.npmjs.org only prevents upgrades. Blocking *.memtrace.io puts the binary into offline-grace mode for 24 hours before license re-validation is required.
To disable telemetry traffic specifically while keeping license validation, set MEMTRACE_TELEMETRY=off — license calls continue, the telemetry queue stops flushing.
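When translating the table into proxy or firewall rules, it can help to test candidate hostnames against the documented patterns first. The sketch below uses shell-style wildcard matching from Python's standard library; real egress products interpret wildcards differently, so treat it as a sanity check rather than a rule generator. The helper name `egress_purpose` and the sample hostnames are illustrative.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns and purposes copied from the allowlist table.
EGRESS_ALLOWLIST = (
    ("*.memtrace.io", "License validation + heartbeat + telemetry"),
    ("huggingface.co", "One-time model download"),
    ("cdn-lfs*.huggingface.co", "One-time model download"),
    ("registry.npmjs.org", "Upgrades via memtrace install"),
)


def egress_purpose(host: str, port: int = 443) -> Optional[str]:
    """Return why a destination is allowed, or None if it is not on the list."""
    if port != 443:
        return None  # the table lists HTTPS / TCP 443 only
    name = host.lower().rstrip(".")
    for pattern, purpose in EGRESS_ALLOWLIST:
        if fnmatchcase(name, pattern):
            return purpose
    return None
```

Under this matching rule `api.memtrace.io` is allowed while a look-alike such as `evil-memtrace.io` is not. The bare apex `memtrace.io` does not match `*.memtrace.io` here, so check how your own firewall treats that case.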
The customer can verify locally exactly what is in the telemetry queue before it ships:
```bash
# Inspect the on-disk queue
cat ~/.memtrace/telemetry/queue.jsonl
# Each line is one record; the "kind" field is "event" | "error" | "crash"
# There is no separate raw buffer; the file shown above is the complete record.
```
To run Memtrace and accumulate a queue without shipping it (for compliance review):
- Set `MEMTRACE_TELEMETRY=off` in the environment.
- Run Memtrace normally. The queue is not written when telemetry is off.
If your goal is to inspect what would have been queued, run a normal session with telemetry on, then immediately read the JSONL file before the flusher's 60-second batched flush.
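For a periodic review, a short script can summarise a copy of the queue instead of reading it by eye. The sketch below is an assumption-laden helper, not a Memtrace tool: it relies only on the documented `kind` field, inspects top-level string values, and flags anything that looks like a path below the collapsed home directory.

```python
import json
import re
from collections import Counter

# Matches a collapsed home path such as ~/clients/acme/main.py (stops at space, quote or colon).
HOME_PATH_RE = re.compile(r'''~[/\\][^\s"':]+''')


def summarise_queue(lines):
    """Count records per kind and collect home-relative paths found in string fields."""
    kinds = Counter()
    paths = set()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        kinds[record.get("kind", "unknown")] += 1
        for value in record.values():
            if isinstance(value, str):  # nested objects are not inspected in this sketch
                paths.update(HOME_PATH_RE.findall(value))
    return dict(kinds), sorted(paths)
```

A typical use is `summarise_queue(open(path, encoding="utf-8"))` on a copy of `queue.jsonl`; any path in the second return value is a candidate for review against your data classification policy.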
```bash
MEMTRACE_TELEMETRY=off memtrace start
```
Accepted off-values: `off`, `0`, `false`, `disabled`, `no` (case-insensitive). Anything else, including unset, keeps telemetry enabled.
When set, the binary's behaviour is:
- The panic hook still installs locally (so a crash in a disabled session still leaves a `~/.memtrace/telemetry/queue.jsonl` breadcrumb), but the flusher never ships it.
- The `tracing` layer becomes a no-op for telemetry: `WARN`/`ERROR` lines are still printed to stderr but are not queued.
- The flusher exits immediately on startup, so no network calls are made to the telemetry endpoint.
- Usage event callsites short-circuit before any data is constructed.
```bash
# ~/.zshrc / ~/.bashrc
export MEMTRACE_TELEMETRY=off
```
```json
{
  "command": "memtrace",
  "args": ["mcp"],
  "env": { "MEMTRACE_TELEMETRY": "off" }
}
```
Applies to Claude Code, Cursor, Codex, Windsurf, and any MCP client that honours the env block.
A second environment variable, MEMTRACE_TELEMETRY_DISABLED=1, is documented in docs/environment-variables.md as a hard override that blocks telemetry regardless of any other state. For most users MEMTRACE_TELEMETRY=off is sufficient; the hard override is recommended for CI / locked-down environments where the higher-precedence variable should be unambiguous.
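The documented precedence between the two variables can be written down as a small decision function, which is useful when auditing CI images or wrapper scripts. This is a sketch of the rules as described in this section, not the binary's parser; in particular, treating only the exact value `1` as activating `MEMTRACE_TELEMETRY_DISABLED` is an assumption.

```python
# Off-values listed in this section; matching is case-insensitive.
OFF_VALUES = frozenset({"off", "0", "false", "disabled", "no"})


def telemetry_enabled(env: dict) -> bool:
    """Decide whether telemetry is on for a given environment mapping."""
    # Hard override wins regardless of MEMTRACE_TELEMETRY.
    # Assumption: only the documented value "1" triggers it.
    if env.get("MEMTRACE_TELEMETRY_DISABLED") == "1":
        return False
    value = env.get("MEMTRACE_TELEMETRY")
    if value is None:
        return True  # unset keeps telemetry enabled
    return value.lower() not in OFF_VALUES
```

Passing `os.environ` checks the current process. Note that under the documented rule an unrecognised value such as `none` or an empty string leaves telemetry on, so a typo fails open.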
The on-disk queue is the verification surface. After running Memtrace with MEMTRACE_TELEMETRY=off:
```bash
ls -la ~/.memtrace/telemetry/   # directory empty or absent
```
If the queue file is present and growing, telemetry is still on. If it's empty or missing, the kill switch took effect.
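The "present and growing" test can be automated by sampling the file size twice, some time apart, while Memtrace is running. The helper names below are hypothetical, and the "inconclusive" outcome for a non-empty file that is not growing is an interpretation added for the sketch (for example, a file holding only an earlier crash breadcrumb).

```python
import os


def queue_size(path: str) -> int:
    """Size of the queue file in bytes; 0 when the file is absent."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return 0


def kill_switch_verdict(size_before: int, size_after: int) -> str:
    """Compare two samples taken some time apart during a running session."""
    if size_after > size_before:
        return "telemetry still on"        # present and growing
    if size_after == 0:
        return "kill switch took effect"   # empty or missing
    return "inconclusive"                  # non-empty but not growing
```

Call `queue_size` on `~/.memtrace/telemetry/queue.jsonl` (expanded with `os.path.expanduser`) before and after a working session and pass both values to `kill_switch_verdict`.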
For an audit, financial-services, healthcare, or other regulated context, the following configuration is appropriate with product telemetry enabled:
- Leave the Weekly Memtrace Receipt toggle off at the memtrace.io account level (its default). As defence in depth, set `MEMTRACE_NO_REMOTE_RECEIPT=1` in the developer-machine environment so symbol names cannot leave the machine even if the account toggle is later changed.
- Keep `MEMTRACE_TELEMETRY` at its default (on) and use the queue inspection procedure (§9) periodically to confirm no client-identifying paths appear in errors.
- If the engagement involves hostnames that encode client identity, override the licensing hostname label so it doesn't surface in your account dashboard.
- Add `*.memtrace.io` and `huggingface.co` to the egress allowlist.
- Retain a copy of this datasheet and the linked source files in the project's compliance record.
If the organisation's policy prohibits any outbound diagnostic data regardless of content classification, set both MEMTRACE_TELEMETRY=off and MEMTRACE_NO_REMOTE_RECEIPT=1 permanently. The product remains fully functional in that mode — only license validation and the heartbeat (aggregate integer counts, no content) continue to run.
Any change to:
- The list of fields collected
- The sanitisation pipeline
- The storage location or operator
- Retention or access policy
will be announced in:
- The release notes of the version that introduces the change.
- This datasheet (with an entry in §13 below).
- The companion `PRIVACY.md` and `TELEMETRY.md` files.
Customers who require a notice period before a material telemetry change reaches their environment should pin to a specific Memtrace version (npm install -g memtrace@0.3.89) and review release notes before upgrading.
| Version | Date | Change |
|---|---|---|
| v0.3.17 | 2025-09 | Telemetry pipeline introduced (events, errors, crashes). Sanitiser shipped with launch. Default on, env-var opt-out. |
| v0.3.89 | 2026-05 | This datasheet published. No change to telemetry behaviour. |
| Topic | Contact |
|---|---|
| Compliance / DPA / SOC 2 questionnaire | support@syncable.dev |
| Security disclosures | support@syncable.dev (PGP key on request) |
| General support | support@syncable.dev |
| Public issue tracker | github.com/syncable-dev/memtrace-public/issues |
A formal Data Processing Agreement (DPA), GDPR Article 30 record-of-processing-activities entry, and SOC 2 readiness questionnaire are available on request via support@syncable.dev.