Schema version: 0.2 | Client ID: 616561816355034 | Protocol: GXT Event Protocol v1.6
TRT-LLM collects anonymous, session-level deployment telemetry to understand how the library is used in production (GPU types, parallelism configs, model architectures). No PII, model weights, prompts, outputs, model paths, tokenizer paths, or raw free-form configuration strings are collected.
Opt-out (any one of these disables telemetry):
TRTLLM_NO_USAGE_STATS=1TELEMETRY_DISABLED=trueDO_NOT_TRACK=1- Create file
~/.config/trtllm/do_not_track TelemetryConfig(disabled=True)in code
Auto-disabled in CI/test environments (detects CI, GITHUB_ACTIONS,
JENKINS_URL, GITLAB_CI, PYTEST_CURRENT_TEST, etc.). Override with
TRTLLM_USAGE_FORCE_ENABLED=1 for staging deployments.
Every payload is wrapped in a GXT v1.6 envelope. Dashboard builders will see these top-level fields in Kibana alongside the event parameters.
| Field | Type | Description |
|---|---|---|
clientId |
string | Always "616561816355034". Identifies TRT-LLM in the GXT system. |
clientType |
string | Always "Native". |
clientVer |
string | TRT-LLM version, e.g. "1.3.0rc9". |
eventProtocol |
string | Always "1.6". |
eventSchemaVer |
string | Schema version, currently "0.2". |
eventSysVer |
string | Always "trtllm-telemetry/1.0". |
sessionId |
string | Unique hex UUID per server lifetime. Use this to correlate initial report with heartbeats. |
sentTs |
string | ISO 8601 UTC timestamp of when the payload was sent. |
Privacy/identity fields (osVersion, geoInfo, deviceGUID, etc.) are
hardcoded to "undefined" — TRT-LLM is a server-side SDK with no browser or
login context.
Sent once at server startup. Contains system info and serving configuration.
| Field | Type | Description | Example |
|---|---|---|---|
trtllmVersion |
ShortString | TRT-LLM package version. | "1.3.0rc9" |
platform |
LongString | OS platform string. | "Linux-5.15.0-88-generic-x86_64" |
pythonVersion |
ShortString | Python version. | "3.12.3" |
cpuArchitecture |
ShortString | CPU architecture. | "x86_64", "aarch64" |
cpuCount |
PositiveInt | Number of logical CPUs (from os.cpu_count()). |
128 |
| Field | Type | Description | Example |
|---|---|---|---|
gpuCount |
PositiveInt | Number of GPUs visible to the process (torch.cuda.device_count()). Reflects CUDA_VISIBLE_DEVICES, not total system GPUs. |
8 |
gpuName |
LongString | Name of GPU 0. | "NVIDIA H100 80GB HBM3" |
gpuMemoryMB |
PositiveInt | Total memory of GPU 0 in MB. | 81559 |
cudaVersion |
ShortString | CUDA toolkit version. | "12.4" |
| Field | Type | Description | Example |
|---|---|---|---|
tensorParallelSize |
PositiveInt | Tensor parallelism degree. | 8 |
pipelineParallelSize |
PositiveInt | Pipeline parallelism degree. | 1 |
contextParallelSize |
PositiveInt | Context parallelism degree. | 1 |
moeExpertParallelSize |
PositiveInt | MoE expert parallelism. 0 = auto/unset (runtime decides). Positive value = explicitly configured. |
0, 8 |
moeTensorParallelSize |
PositiveInt | MoE tensor parallelism. 0 = auto/unset (runtime decides). Positive value = explicitly configured. |
0, 2 |
| Field | Type | Description | Example |
|---|---|---|---|
architectureClassName |
LongString | HuggingFace model architecture class. | "MixtralForCausalLM", "LlamaForCausalLM" |
backend |
ShortString | Execution backend. | "pytorch", "tensorrt" |
dtype |
ShortString | Model data type. | "float16", "bfloat16", "auto" |
quantizationAlgo |
ShortString | Quantization algorithm. Empty string if none. | "", "fp8", "w4a16_awq" |
kvCacheDtype |
ShortString | KV cache data type. Empty string if default. | "", "fp8", "auto" |
| Field | Type | Description | Example |
|---|---|---|---|
ingressPoint |
ShortString | How TRT-LLM was invoked. See Ingress point values. | "cli_serve" |
featuresJson |
string | Legacy JSON-serialized summary of feature flags. See featuresJson keys. | '{"lora":false,...}' |
llmApiConfigJson |
string | JSON-serialized sanitized, type-driven effective LLM API configuration. See LLM API config capture. | '{"tensor_parallel_size":2,...}' |
llmApiConfigMetaJson |
string | JSON-serialized metadata for LLM API configuration capture. | '{"capture_succeeded":true,...}' |
disaggRole |
ShortString | Disaggregated serving role. Empty if not disaggregated. | "", "context", "generation" |
deploymentId |
ShortString | Shared ID across disaggregated workers. Empty if not disaggregated. | "", "dep-abc123" |
Sent periodically (default: every 600s) to track session duration. Up to 1000 heartbeats per session.
| Field | Type | Description | Example |
|---|---|---|---|
seq |
PositiveInt | Zero-based heartbeat sequence number. | 0, 1, 42 |
| Type | JSON type | Constraints |
|---|---|---|
| ShortString | string | 0–128 characters |
| LongString | string | 0–256 characters |
| PositiveInt | integer | 0–4,294,967,295 |
The ingressPoint field identifies which TRT-LLM entry point started the session.
| Value | Meaning |
|---|---|
"cli_serve" |
Started via trtllm-serve CLI |
"cli_bench" |
Started via trtllm-bench CLI |
"cli_eval" |
Started via evaluation CLI |
"llm_class" |
Started via LLM() Python API directly |
"unknown" |
Entry point not identified |
The featuresJson field is a JSON-serialized dict. All keys are always present
with safe defaults. This list may evolve as features are added.
TODO: Deduplicate featuresJson with llmApiConfigJson after derived-only
flags such as LoRA/speculative decoding have explicit safe config fields.
| Key | Type | Default | Description |
|---|---|---|---|
lora |
bool | false |
LoRA adapter enabled (enable_lora=True or lora_config provided). |
speculative_decoding |
bool | false |
Speculative decoding enabled (speculative_config is not None). Covers MTP, EAGLE, Medusa, etc. |
prefix_caching |
bool | false |
KV cache block reuse / prefix caching enabled. |
cuda_graphs |
bool | false |
CUDA graphs enabled for reduced launch overhead. |
chunked_context |
bool | false |
Chunked prefill enabled (enable_chunked_prefill=True). |
data_parallel_size |
int | 1 |
Data parallel degree. 1 = no data parallelism. Derived from tp_size when attention DP is enabled. |
The llmApiConfigJson field is a JSON-serialized dict containing a type-driven
subset of the validated, effective LLM API configuration. Capture is
type-driven: a field is captured automatically when its type is categorical
(Literal/Enum/bool) or numeric (int/float), or a safe collection of
those. Free-form str/Any/Path/dict/Callable are not captured unless the
field carries an explicit allowlist (TelemetryField.categorical(...)). Any field
can opt out with telemetry=False.
Captured values must be safe primitives. Raw strings are excluded unless the
field is a Literal[...] or uses an explicit allowlist converter. Paths,
tokenizer locations, dicts, objects, callables, raw Any values, non-finite
floats (nan/inf), and unsafe or heterogeneous sequences are excluded.
Captured sequences are capped at a fixed length and any clipping is reported in
llmApiConfigMetaJson. Exclusion is fail-closed: the value is omitted instead
of being serialized, and llmApiConfigMetaJson reports whether any resolved field
was excluded as unsafe.
The table below is a non-exhaustive set of examples for readers building
dashboards. The exhaustive source of truth is
tensorrt_llm/usage/llm_args_golden_manifest.json (regenerated from
build_capture_manifest), after the safety sanitizer has excluded unsafe values.
Use llmApiConfigMetaJson digests and field counts to track the exact capture
manifest for a given release. The rendered documentation generates the
exhaustive field table at docs build time under Developer Guide > Telemetry.
| Key | Description |
|---|---|
tensor_parallel_size |
Tensor parallelism degree from the effective LLM args. |
pipeline_parallel_size |
Pipeline parallelism degree from the effective LLM args. |
context_parallel_size |
Context parallelism degree from the effective LLM args. |
moe_expert_parallel_size |
MoE expert parallelism degree (None/unset when runtime decides). |
moe_tensor_parallel_size |
MoE tensor parallelism degree (None/unset when runtime decides). |
moe_cluster_parallel_size |
MoE cluster parallelism degree (None/unset when runtime decides). |
backend |
Execution backend. Captured as the Literal["pytorch"] value on the PyTorch args, and through an explicit allowlist (pytorch, tensorrt, _autodeploy) on the base/TRT args. |
dtype |
Model dtype, captured through an explicit allowlist. |
load_format |
Weight load format, captured as a low-cardinality enum/string value. |
quant_config.quant_algo |
Quantization algorithm, captured as a closed QuantAlgo enum value (TRT args only). Empty/absent when unquantized. |
kv_cache_config.dtype |
KV cache dtype, captured through an explicit allowlist. |
kv_cache_config.enable_block_reuse |
Whether KV cache block reuse/prefix caching is enabled. |
cuda_graph_config.batch_sizes |
CUDA graph batch sizes when configured. |
scheduler_config.capacity_scheduler_policy |
Scheduler capacity policy. |
scheduler_config.enable_prefix_aware_scheduling |
Whether scheduler admission and token budgeting use KV prefix-reuse estimates. |
torch_compile_config.enable_inductor |
Whether Torch Inductor compilation is enabled. |
moe_config.backend |
MoE backend selection (AUTO, CUTLASS, TRTLLM, ...), an annotation-derived categorical. |
speculative_config.decoding_type |
Speculative decoding mode discriminator (e.g. User_Provided); other arms expose their own numeric/boolean knobs under speculative_config.*. |
sparse_attention_config.algorithm |
Sparse attention algorithm discriminator; arm-specific knobs appear under sparse_attention_config.*. |
reasoning_parser |
Reasoning parser selection, captured through an allowlist mirroring the ReasoningParserFactory registry. |
sampler_type |
Sampler selection, captured through an allowlist mirroring the SamplerType enum. |
llmApiConfigMetaJson describes the capture process itself. It includes
contract/version fields, schema and manifest digests, source args class, field
counts (capturable_field_count, captured_field_count, excluded_field_count), capture
success, unsafe-exclusion status, a sequence_truncated flag set when any captured
sequence was clipped to the length cap, and a payload_truncated flag set when the
total serialized config exceeded the size budget and fields were dropped. The metadata
is intended to make dashboards robust when the safe capture manifest changes
over time.
| Variable | Default | Description |
|---|---|---|
TRTLLM_NO_USAGE_STATS |
unset | Set to 1 to disable telemetry. |
TELEMETRY_DISABLED |
unset | Set to true to disable telemetry. |
DO_NOT_TRACK |
unset | Set to 1 to disable telemetry. |
TRTLLM_USAGE_STATS_SERVER |
https://events.gfe.nvidia.com/v1.1/events/json |
Override the GXT endpoint URL. Use for staging. |
TRTLLM_USAGE_HEARTBEAT_INTERVAL |
600 |
Heartbeat interval in seconds. |
TRTLLM_USAGE_FORCE_ENABLED |
0 |
Set to 1 to force-enable telemetry in CI/test environments. |
TRTLLM_DISAGG_ROLE |
unset | Disaggregated serving role (context or generation). |
TRTLLM_DISAGG_DEPLOYMENT_ID |
unset | Shared deployment ID across disaggregated workers. |
Checklist for adding a telemetry field:
tensorrt_llm/usage/schema.py— Add field toTrtllmInitialReport(orTrtllmHeartbeat) Pydantic model with alias.tensorrt_llm/usage/schemas/trtllm_usage_event_schema.json— Add topropertiesandrequiredarray.tensorrt_llm/usage/usage_lib.py— Populate the field in_background_reporter()and add extraction logic in_extract_trtllm_config()or_collect_gpu_info()as appropriate.tests/unittest/usage/test_schema.py— Update test fixtures and expected field sets.tests/unittest/usage/test_collectors.py— Add extraction test.tests/unittest/usage/test_e2e_capture.py— Update e2e payload assertions if needed.- SMS schema upload — Upload the updated JSON schema to the NvTelemetry Schema Management Service and toggle "on stage" / "on prod".
- Update this README — Add the field to the appropriate table above.
Checklist for adding an LLM API config capture field inside llmApiConfigJson:
- Add the field with its natural type. If it is categorical
(
Literal/Enum/bool) or numeric (int/float) — or a safe collection of those — it is captured automatically; no marker is needed. - Bounded bare-string fields opt in via an allowlist. If a free-form
str/Anyfield should be captured, mark ittelemetry=TelemetryField.categorical(<allowed_values>), mirroring its real recognized domain. Prefer tightening the type (e.g.str->Literal) over an allowlist when the API contract allows it; the allowlist is the fallback when the annotation cannot be narrowed without a breaking validation change. - Type-safe but sensitive? Opt out with
telemetry=False. This honored exclusion sentinel keeps a categorical/numeric field out of capture. - Do not capture unsafe data. No model/tokenizer/file paths, prompts,
outputs, secrets/tokens/URLs/hostnames, free-form user strings, raw
dict/object payloads, or callables. The sanitizer fails closed regardless:
bare
str,Any,object,Path,dict, callables, permissive unions, and non-finite floats are dropped unless an approvedallowlistconverter applies. tests/unittest/usage/test_llmapi_config_capture.py— Add behavior coverage: assert the value is captured, and for a categorical bare-string field assert that an out-of-allowlist value is redacted (dropped) while an in-allowlist value is captured.- Regenerate the manifest golden from
build_capture_manifest:python -c "import json; from tensorrt_llm.usage.llmapi_config import golden_manifest; open('tensorrt_llm/usage/llm_args_golden_manifest.json','w').write(json.dumps(golden_manifest(), indent=2, sort_keys=True)+'\n')"Review the golden diff — it is the privacy review. A newly captured field requires sign-off from the GitHub telemetry/privacy CODEOWNER (.github/CODEOWNERS). docs/source/developer-guide/telemetry.mdis generated from the committed golden at docs-build time; do not hand-edit it.- Update this README — Add a common-key row above when the field is important enough for dashboard users to know by name.
Dashboard note: payloads carry capture_version and field_policy_version in
llmApiConfigMetaJson. During release adoption, v1 (opt-in) and v2 (type-driven)
payloads coexist in the same index — bucket by these before aggregating
captured_field_count or any llmApiConfigJson.<field>.
- Use camelCase aliases for JSON wire format (Pydantic
alias=). - Use snake_case for Python field names.
- String fields: use
ShortString(128 chars) orLongString(256 chars). - Integer fields: use
PositiveInt(0–4B). Use0for "auto/unset" semantics. - All fields must be required in the JSON schema (no optional fields).
- Empty string
""is the sentinel for "not applicable" string fields. - The telemetry code is fail-silent in two layers. The LLM API config
collector catches only the expected sanitizer/walk error family
(
AttributeError,TypeError,ValueError,KeyError) and emits an empty config pluscapture_succeeded=false; unexpected exceptions are left to propagate so genuine collector bugs are not masked. They are then caught by the outer daemon-thread reporter guard inusage_lib.py, which keeps the reporting thread from ever taking down the host process. - No PII. No model weights. No prompts. No outputs. No model/tokenizer paths.