docs: ADR-0027 feature-usage bitmask in the User-Agent #6500
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,280 @@ | ||
| --- | ||
| status: proposed | ||
| contact: eavanvalkenburg | ||
| date: 2026-06-12 | ||
| deciders: eavanvalkenburg | ||
| consulted: | ||
| informed: | ||
| --- | ||
|
|
||
| # Feature-usage bitmask in the User-Agent | ||
|
|
||
| ## Context and Problem Statement | ||
|
|
||
| We can see which Agent Framework packages are installed and that *some* framework | ||
| call happened (via the existing `agent-framework-python/{version}` User-Agent), | ||
| but we have no usage-based signal about **which features are actually exercised** | ||
| at runtime, nor which are used *together* (e.g. workflows + MCP + Foundry). How | ||
| can we collect a lightweight, privacy-respecting signal of feature usage for the | ||
| traffic we can actually read, without standing up new event pipelines? | ||
|
|
||
| The detailed mechanism is in [SPEC-002](../specs/002-feature-usage-telemetry.md); | ||
| the per-language bit tables are in | ||
| [feature-usage-bit-registry.md](../specs/feature-usage-bit-registry.md). | ||
|
|
||
| ## Decision Drivers | ||
|
|
||
| - **Transparency** — openly documented, human-decodable, user-controllable. No | ||
| hidden or obfuscated telemetry. | ||
| - **First-party scope / no third-party leakage** — emit only to Azure/Foundry | ||
| endpoints (the telemetry we can ingest); never leak a feature fingerprint into | ||
| third-party logs we cannot read. | ||
| - **Live signal** — reflect features exercised *so far*, re-evaluated per request, | ||
| not frozen at client construction. | ||
| - **Low cost / few moving parts** — reuse telemetry already in the request path; | ||
| near-zero runtime overhead; as little machinery as the job needs. | ||
| - **Privacy** — encode only coarse boolean feature usage; no identifiers, | ||
| arguments, prompts, or payloads. | ||
|
|
||
| ## Considered Options | ||
|
|
||
| The options below are grouped by the decisions that matter: the **transport**, | ||
| the **granularity**, and the **registry sharing model**. | ||
|
|
||
| ### Transport | ||
|
|
||
| #### A. User-Agent token, first-party only, per request (chosen) | ||
|
|
||
| Stamp a `(feat=...)` comment onto the UA, but only on Azure/Foundry clients, and | ||
| re-evaluate it per request. | ||
|
|
||
| - Good, reuses telemetry already sent to the one backend we can read. | ||
| - Good, per-request stamping reflects the live mask (not frozen at construction). | ||
| - Good, first-party scoping means no fingerprint leaks to third-party providers. | ||
| - Good, maps onto .NET's existing per-request UA pipeline policies unchanged. | ||
| - Bad, no signal for traffic that never hits a first-party endpoint (accepted — | ||
| we couldn't read it anyway). | ||
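A minimal sketch of option A's per-request stamping, under stated assumptions: the host suffix allow-list, the `user_agent_for` and `is_first_party` helpers, the example hosts, and the `1.0.0` version are all hypothetical and are not taken from SPEC-002. The point it illustrates is that the mask is read when the request is sent, and only for first-party hosts.

```python
from typing import Callable
from urllib.parse import urlsplit

# Placeholder allow-list: the real Azure/Foundry host set is not specified here.
FIRST_PARTY_SUFFIXES = (".azure.com",)
BASE_UA = "agent-framework-python/1.0.0"  # version is illustrative


def is_first_party(url: str) -> bool:
    """True when the request host ends with an allow-listed suffix."""
    host = (urlsplit(url).hostname or "").lower()
    return host.endswith(FIRST_PARTY_SUFFIXES)


def user_agent_for(url: str, read_mask: Callable[[], int]) -> str:
    """Build the User-Agent for one request, reading the mask at send time."""
    if not is_first_party(url):
        return BASE_UA  # third-party endpoint: no feature fingerprint
    return f"{BASE_UA} (feat=v1.{read_mask():x})"


# The mask grows between requests, so a later request carries more bits.
state = {"mask": 0b101}
first = user_agent_for("https://res.example.azure.com/chat", lambda: state["mask"])
state["mask"] |= 1 << 27
later = user_agent_for("https://res.example.azure.com/chat", lambda: state["mask"])
other = user_agent_for("https://api.example.org/v1/chat", lambda: state["mask"])
```

Passing a callable rather than a precomputed string is what keeps the token live; a value baked into static headers would freeze at construction, which is the failure mode described under option B.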
|
|
||
| #### B. User-Agent token on all clients | ||
|
|
||
| - Good, simplest to wire (one static header). | ||
| - Bad, sends a deployment fingerprint to OpenAI/Anthropic/AWS/Google logs we | ||
| cannot read — privacy leak for zero benefit. | ||
| - Bad, baked into static `default_headers`, so it freezes at client construction | ||
| and reports a near-empty mask. | ||
|
|
||
| #### C. OpenTelemetry span/resource attribute | ||
|
|
||
| - Good, precise per-call usage; no UA change. | ||
| - Bad (**privacy — the main reason to hold it**), a span attribute broadcasts the | ||
| feature-combination fingerprint into the user's **general** telemetry pipeline, | ||
| which is typically exported to third-party APM vendors (Datadog, Honeycomb, …). | ||
| That re-introduces exactly the fingerprint leakage the first-party-only UA | ||
| scoping (A) was chosen to avoid — just into a different set of third parties. | ||
| - Bad (secondary), also a cardinality footgun (a growing, combinatorial value | ||
| must never become a metric dimension). | ||
| - Neutral, for the team's own goal it reaches us only if the user exports to | ||
| Azure Monitor and we query it. | ||
| - **Deferred, not rejected.** The version prefix lets us add it later **if** the | ||
| privacy review blesses a broadly-emitted mask (or a scoped/redacted variant) | ||
| and a concrete query needs the per-call precision. | ||
|
|
||
| #### D. Bespoke usage events | ||
|
|
||
| - Good, richest detail and flexibility. | ||
| - Bad, new data flow and cost; larger privacy surface; heavy to build and review; | ||
| overkill for a coarse "which features" signal. | ||
|
|
||
| #### E. Install/import-time signal only (status quo-ish) | ||
|
|
||
| - Good, zero new runtime work. | ||
| - Bad, measures installation, not usage; cannot capture feature combinations — | ||
| does not solve the problem. | ||
|
|
||
| ### Granularity | ||
|
|
||
| #### F. Per package, with core broken out per feature/provider (chosen) | ||
|
|
||
| - Good, ~50 bits (Python) / ~40 (.NET) fit a **64-bit** mask, which keeps .NET's | ||
| accumulator lock-free (`Interlocked.Or`) and the registry hand-maintainable. | ||
| - Good, matches the actual questions ("which orchestration / which built-in | ||
| provider / which package?") — each orchestration pattern and each built-in | ||
| context/history provider gets its own bit, since they serve different purposes. | ||
| - Neutral, cannot distinguish sub-features *within* a provider package (e.g. | ||
| openai chat vs embeddings) until a bit is promoted. | ||
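The accumulator behind option F could look roughly like the following in Python. This is a sketch, not the SDK's implementation: `mark_feature_used()` and `FeatureBit` are named in this ADR, but the member names, the bit-to-name pairing (taken from the representation example in this document, in listed order), the lock, and `current_mask()` are assumptions. Python integers are unbounded, so the 64-bit limit is enforced explicitly here.

```python
import threading
from enum import IntFlag


class FeatureBit(IntFlag):
    # Only the five bits used in this ADR's example; names and pairing are assumed.
    AGENT = 1 << 0
    WORKFLOW = 1 << 2
    SEQUENTIAL_ORCHESTRATION = 1 << 16
    FOUNDRY_CHAT_CLIENT = 1 << 22
    OPENAI = 1 << 27


_MASK_LIMIT = (1 << 64) - 1  # keep the value within a 64-bit mask
_lock = threading.Lock()
_mask = 0


def mark_feature_used(bit: FeatureBit) -> None:
    """OR one feature bit into the process-wide mask; repeated calls are no-ops."""
    global _mask
    with _lock:
        _mask = (_mask | int(bit)) & _MASK_LIMIT


def current_mask() -> int:
    """Return the mask accumulated so far."""
    return _mask
```

In .NET the same update would be a single `Interlocked.Or` on a 64-bit field, which is why staying within 64 bits avoids the lock used in this sketch.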
|
|
||
| #### G. Per construct (one bit per instantiable type) | ||
|
|
||
| - Good, finest detail. | ||
| - Bad, ~96 bits forces a 128-bit mask, which forfeits .NET's lock-free | ||
| `Interlocked.Or` (needs a lock / `UInt128`). | ||
| - Bad, ~96 call sites across two SDKs; the sheer count pushes toward code | ||
| generation and extra tests — machinery to manage machinery. | ||
| - Bad, precision nobody's decision actually needs. | ||
|
|
||
| ### Registry sharing model | ||
|
|
||
| #### H. Per-language bit lists (chosen) | ||
|
|
||
| Each SDK owns an independent list; the decoder picks the list using the language | ||
| already present in the UA product token. | ||
|
|
||
| - Good, **no cross-language coordination**: each SDK numbers and evolves its | ||
| features independently; adding a Python feature never touches .NET numbering. | ||
| - Good, no null placeholders for one-SDK features, no "same bit, same meaning" | ||
| rule, no SDK-aware decode caveats. | ||
| - Good, decoding is trivial: language (from UA) + version -> list -> AND. | ||
| - Neutral, two small lists to maintain instead of one (but they were going to | ||
| diverge anyway — the packages differ). | ||
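The "language + version -> list -> AND" decode can be sketched as below. The table excerpt holds only the five Python v1 bits from this ADR's example, the regular expression assumes every SDK uses an `agent-framework-<language>/<version>` product token (only the Python form appears in this document), and `decode_user_agent` is a hypothetical helper name.

```python
import re
from typing import Dict, List, Optional, Tuple

# Excerpt only: (language, version) -> {bit position: feature name}.
# A .NET list would sit under its own key with independent numbering.
TABLES: Dict[Tuple[str, str], Dict[int, str]] = {
    ("python", "v1"): {
        0: "agent",
        2: "workflow",
        16: "sequential-orchestration",
        22: "foundry.chat_client",
        27: "openai",
    },
}

# Assumed UA shape: "agent-framework-<lang>/<version> ... (feat=vN.<hex>)".
UA_PATTERN = re.compile(
    r"agent-framework-(?P<lang>[a-z]+)/\S+.*?\(feat=(?P<ver>v\d+)\.(?P<mask>[0-9a-f]+)\)"
)


def decode_user_agent(ua: str) -> Optional[List[str]]:
    """Return feature names set in the UA's mask, or None if it cannot be decoded."""
    match = UA_PATTERN.search(ua)
    if match is None:
        return None  # no feat token present
    table = TABLES.get((match["lang"], match["ver"]))
    if table is None:
        return None  # no list published for this language/version
    mask = int(match["mask"], 16)
    return [name for bit, name in sorted(table.items()) if mask & (1 << bit)]


features = decode_user_agent("agent-framework-python/1.2.3 (feat=v1.8410005)")
```

Because the language comes from the product token, the decoder never needs a shared number space across SDKs.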
|
|
||
| #### I. Single shared cross-language registry | ||
|
|
||
| - Good, one list, one number space. | ||
| - Bad, forces synchronized numbering and null placeholders for features that | ||
| exist in only one SDK, plus SDK-aware decode rules. | ||
| - Bad, the synchronization is pure accidental complexity — **the language is | ||
| already in the User-Agent**, so sharing the number space buys nothing. | ||
|
|
||
| ### Registry maintenance | ||
|
|
||
| #### J. Hand-written enum + parity test (chosen) | ||
|
|
||
| - Good, ~40 members that change a few times a year; a 10-line test (enum vs JSON | ||
| list) is enough. | ||
|
eavanvalkenburg marked this conversation as resolved.
Outdated
|
||
| - Good, no build step, no generator to own. | ||
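The parity test in option J might be as small as the sketch below. The JSON shape (bit position mapped to feature name), the name normalization rule, and the `parity_errors` helper are assumptions made for illustration; the registry's actual file format is defined in the registry doc, not here.

```python
import json
from enum import IntEnum
from typing import List, Type


class FeatureBit(IntEnum):
    # Bit positions (not mask values); excerpt based on this ADR's example.
    AGENT = 0
    WORKFLOW = 2
    SEQUENTIAL_ORCHESTRATION = 16
    FOUNDRY_CHAT_CLIENT = 22
    OPENAI = 27


# Hypothetical registry excerpt: bit position -> published feature name.
REGISTRY_JSON = """
{"0": "agent", "2": "workflow", "16": "sequential-orchestration",
 "22": "foundry.chat_client", "27": "openai"}
"""


def normalize(name: str) -> str:
    """Map a registry name onto the enum naming style (assumed convention)."""
    return name.replace(".", "_").replace("-", "_").upper()


def parity_errors(enum_cls: Type[IntEnum], registry_json: str) -> List[str]:
    """List every bit whose enum member and registry entry disagree."""
    listed = {int(bit): normalize(name) for bit, name in json.loads(registry_json).items()}
    coded = {member.value: member.name for member in enum_cls}
    return sorted(
        f"bit {bit}: enum={coded.get(bit)} registry={listed.get(bit)}"
        for bit in coded.keys() | listed.keys()
        if coded.get(bit) != listed.get(bit)
    )
```

A test would then assert that `parity_errors(FeatureBit, REGISTRY_JSON)` is empty, so any bit added on one side only fails the build.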
|
|
||
| #### K. Code-generate the enums from the registry | ||
|
|
||
| - Bad, a generator + drift test + schema test to maintain a short list of | ||
| integer constants; justified only by the per-construct bit count we rejected. | ||
|
|
||
| ### Representation (how the mask is rendered as text) | ||
|
|
||
| All examples below encode the same mask — bits 0, 2, 16, 22, 27 set | ||
| (agent + workflow + sequential-orchestration + foundry.chat_client + openai, in | ||
| the Python v1 list) = decimal `138477573`. | ||
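As a worked check of the example mask, the snippet below builds it from the five bit positions and renders it in each candidate representation. The `to_base` helper is illustrative, and the base62 alphabet order (digits, then lowercase, then uppercase) is an assumption chosen to match the value shown in option O.

```python
BITS = (0, 2, 16, 22, 27)

# Build the mask by OR-ing each bit position.
mask = 0
for bit in BITS:
    mask |= 1 << bit

CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyz" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(value: int, alphabet: str) -> str:
    """Render a non-negative integer using the given digit alphabet."""
    if value == 0:
        return alphabet[0]
    digits = []
    while value:
        value, remainder = divmod(value, len(alphabet))
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


print(mask)                            # 138477573
print(f"{mask:x}")                     # 8410005
print(f"{mask:b}")                     # 1000010000010000000000000101
print(",".join(str(b) for b in BITS))  # 0,2,16,22,27
print(to_base(mask, CROCKFORD32))      # 442005
print(to_base(mask, BASE62))           # 9n2lf
```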
|
Contributor
Have we gone over how the registry is versioned?
Member
Author
There is a section on that topic. The basics: bump the version only when we need to remove keys; adding can be done in place (within the limits of the number of bits).
||
|
|
||
| #### L. Decimal — `feat=v1.138477573` | ||
|
|
||
| - Good, human-familiar; trivial to parse. | ||
| - Neutral, no visual alignment to bit/nibble boundaries; slightly longer than hex | ||
| for large masks. No advantage over hex. | ||
|
|
||
| #### M. Hex (chosen) — `feat=v1.8410005` | ||
|
|
||
| - Good, compact (≤16 chars for a 64-bit mask). | ||
| - Good, decodes with one stdlib call in every language (`int(x, 16)` / | ||
| `Convert.ToUInt64(x, 16)`); nibble boundaries are eyeball-able. | ||
| - Good, lowercase, no `0x` prefix, no leading zeros — unambiguous and stable. | ||
|
|
||
| #### N. Binary / bit-list — `feat=v1.1000010000010000000000000101` or `feat=v1.0,2,16,22,27` | ||
|
|
||
| - Good, most directly human-readable ("which bits"). | ||
| - Bad, longest form in the UA; the bit-list needs delimiter handling and grows | ||
| with the number of set bits. | ||
|
|
||
| #### O. Alphabet / base-N (e.g. Crockford base32 `feat=v1.442005`, base62 `feat=v1.9n2lf`) | ||
|
|
||
| - Good, shortest representation. | ||
| - Bad, needs a custom alphabet + decode table on both ends; base62 is | ||
| case-sensitive (fragile through case-normalizing intermediaries); not | ||
| eyeball-able. Premature optimization for a value that is already ≤16 chars in | ||
| hex. | ||
|
|
||
| ## Decision Outcome | ||
|
|
||
| Chosen: **a per-request, first-party-only User-Agent `(feat=...)` token (A), | ||
| with per-package granularity (F), per-language bit lists (H), hand-written enums | ||
| kept honest by a parity test (J), rendered as lowercase hex (M).** | ||
|
|
||
| This is the smallest design that answers the question. A 64-bit mask accumulates | ||
| from universal `mark_feature_used()` calls; the token is stamped per request only | ||
| on Azure/Foundry clients (live, no third-party leak); each SDK owns an | ||
| independent bit list selected by the language already in the UA; the mask is | ||
| rendered as hex (`feat=v1.8410005`). OTel (C) is deferred — mainly because a | ||
| broadly-emitted span attribute would leak the fingerprint into the user's general | ||
| telemetry, against the first-party-only stance — but left open behind the version | ||
| prefix. Per-construct granularity (G), a shared registry (I), codegen (K), and the | ||
| decimal/binary/base-N representations (L, N, O) are rejected as complexity or | ||
| length the problem does not require. | ||
|
|
||
| ### Consequences | ||
|
|
||
| - Good, adds usage signal at near-zero cost, no new data flow, few moving parts. | ||
| - Good, transparent (public registry, human-decodable token) and disabled by the | ||
| existing User-Agent opt-out. | ||
| - Good, first-party-only + per-request emission gives a live mask and no | ||
| third-party fingerprint leak. | ||
| - Good, 64-bit keeps .NET lock-free; per-language lists remove all cross-language | ||
| sync; hand-written enums avoid a codegen toolchain. | ||
| - Neutral, the token's reach equals first-party traffic; broader per-call signal | ||
| (OTel) can be added later if needed. | ||
| - Bad, each feature must add a `mark_feature_used()` call, and first-party clients | ||
| need a per-request hook (small, mirrors existing patterns). | ||
|
|
||
| ## Registry versioning and migration (v1 → v2) | ||
|
|
||
| The token carries a **per-language** version (`feat=v1.<hex>`); a version bump is | ||
| independent for Python and .NET. | ||
|
|
||
| - **Additive growth stays on v1 — no bump.** Allocating a new feature to a | ||
| reserved/unused bit is backward-compatible: an older decoder simply sees an | ||
| unknown high bit and ignores it. Normal package growth never needs a new | ||
| version. | ||
| - **A bump (v2) is required only for breaking changes:** renumbering or | ||
| re-partitioning existing bits, changing the *meaning* of an already-assigned | ||
| bit, or widening beyond 64-bit. Within a version a bit is **never** reused or | ||
| reassigned — that invariant is what lets old decoders stay correct. | ||
| - **Mixed-version coexistence is the norm.** A fleet runs many SDK releases at | ||
| once, so `v1` and `v2` tokens appear simultaneously for a long time (old SDKs | ||
| keep emitting `v1`). The decoder keeps **every** published `(language, | ||
| version)` table and selects by the token's version; the `v1` table is retained | ||
| indefinitely for historical decode. | ||
| - **Unknown version → do not guess.** A decoder without the `vN` table must | ||
| record "unknown registry version" rather than decode against an older table — | ||
| bit meanings may differ across versions, so mis-attribution is worse than | ||
| no data. | ||
| - **Producing v2:** publish the v2 table alongside v1 in the registry doc, bump | ||
| that SDK's `FeatureBit` enum + version constant; the SDK emits `v2` from the | ||
| release it ships in. Prefer staying on v1 (additive) and reserving a clean v2 | ||
| for an eventual deliberate re-partition. | ||
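The decoder-side rules above can be sketched as follows. The table excerpt, the `Decoded` result type, and the `decode` function are hypothetical. The sketch keeps unknown bits in a separate field instead of silently dropping them, which is one possible reading of "ignores it": such bits are never attributed to a feature.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Every published (language, version) table is retained; excerpt only.
TABLES: Dict[Tuple[str, str], Dict[int, str]] = {
    ("python", "v1"): {
        0: "agent",
        2: "workflow",
        16: "sequential-orchestration",
        22: "foundry.chat_client",
        27: "openai",
    },
}


@dataclass
class Decoded:
    status: str  # "ok" or "unknown registry version"
    features: List[str] = field(default_factory=list)
    unknown_bits: List[int] = field(default_factory=list)


def decode(language: str, token: str) -> Decoded:
    """Decode a 'vN.<hex>' token against the table for exactly that version."""
    version, _, hex_mask = token.partition(".")
    table = TABLES.get((language, version))
    if table is None:
        # Never fall back to an older table: bit meanings may differ.
        return Decoded(status="unknown registry version")
    mask = int(hex_mask, 16)
    set_bits = [b for b in range(mask.bit_length()) if (mask >> b) & 1]
    return Decoded(
        status="ok",
        features=[table[b] for b in set_bits if b in table],
        unknown_bits=[b for b in set_bits if b not in table],
    )
```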
|
|
||
| ## Limitations | ||
|
|
||
| | Limitation | Caused by (choice) | Why we accepted it | | ||
| | --- | --- | --- | | ||
| | **No signal for self-hosted or third-party-only traffic.** If a process never calls Azure/Foundry, we see nothing. | First-party-only emission (A) | We can't read third-party logs anyway, and must not leak a fingerprint into them. Reach traded for privacy. | | ||
| | **No OTel / per-call signal in v1.** | OTel deferred (C), primarily on **privacy** grounds | A broadly-emitted span attribute would push the fingerprint into the user's general telemetry / third-party APM vendors, undoing the first-party-only scoping. It can be added later if a compelling reason emerges. | | ||
| | **Mask reflects "usage so far," not the whole session.** Early requests carry fewer bits than later ones. | Process-global accumulator + per-request stamping | Honest and still useful; the team aggregates across requests. The per-request design is what makes it *grow* rather than freeze. | | ||
| | **No per-agent / per-call attribution.** The mask is one process-wide value — "this process used X", not "this agent/call used X". | Single global accumulator (simplicity) | Per-call attribution is what the deferred OTel span path would add; not needed for portfolio-level questions. | | ||
| | **Coarse granularity.** Can't distinguish sub-features (e.g. openai chat vs embeddings, which shell tool). | Per-package granularity (F) + 64-bit (keeps .NET lock-free) | Matches the actual questions; finer bits can be promoted later behind the version prefix. | | ||
| | **Fingerprinting risk is reduced, not eliminated.** A feature-combination mask is still a deployment signature, and it transits intermediaries (proxies/CDNs) even when first-party-scoped. | Emitting any feature-combination value | Scope + opt-out + coarse granularity mitigate it; residual risk is the subject of the privacy review below. | | ||
|
|
||
| ## Open Questions (for decider discussion) | ||
|
|
||
| These are unresolved and should be decided before/at approval: | ||
|
|
||
| 1. **Privacy / telemetry-acceptance review (blocking).** Is a coarse, | ||
| first-party-only, opt-out-able feature-combination mask acceptable telemetry? | ||
| Even scoped, it transits intermediaries and is a deployment fingerprint. This | ||
| is a **release precondition**. Possible outcomes that would change the design: | ||
| require a dedicated opt-out flag (Q2), coarser granularity, hashing, or | ||
| explicit opt-in. | ||
| 2. **Dedicated opt-out flag?** v1 reuses `AGENT_FRAMEWORK_USER_AGENT_DISABLED` | ||
| (mask dies with the whole UA). Do we add a mask-only flag now (keep base UA, | ||
| drop the fingerprint), or wait until asked / until the privacy review requires | ||
| it? | ||
| 3. **When (if ever) to add the OTel path?** Held back mainly for **privacy**: a | ||
| span attribute broadcasts the fingerprint into the user's general telemetry | ||
| and onward to third-party APM vendors, contradicting the first-party-only | ||
| stance. It also carries a metric-cardinality hazard. Would the privacy review | ||
| allow a broadly-emitted mask, a scoped/redacted variant, or none? Decide if/when | ||
| to revisit. | ||
|
Comment on lines +367 to +372
Contributor
OTel requires much more setup on the user side. I am not sure how we can make this option work, given that the data may not be available to us even if the customer is using Foundry.
Member
Author
Agreed. I also doubt this can ever work; I left it in for completeness.
||
|
|
||
| ## More Information | ||
|
|
||
| - Mechanism & API: [SPEC-002](../specs/002-feature-usage-telemetry.md) | ||
| - Per-language bit tables, encoding, opt-out, governance: [feature-usage-bit-registry.md](../specs/feature-usage-bit-registry.md) | ||
| - Existing accumulator pattern: `python/packages/core/agent_framework/_telemetry.py` | ||
| - .NET emission policies: `dotnet/src/Microsoft.Agents.AI.Foundry/AgentFrameworkUserAgentPolicy.cs`, | ||
| `dotnet/src/Microsoft.Agents.AI.Foundry.Hosting/HostedAgentUserAgentPolicy.cs` | ||
The endpoints they hit are different, no?
Yeah, so the question for us is whether we need that level of detail, or whether we care about openai vs gemini usage rather than openai completions vs openai responses. This is one of the things we need to figure out when implementing.