
Commit ebffba0

Merge branch 'main' into dev-mixin-cleanups
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2 parents 1247439 + 4f46653 commit ebffba0

997 files changed

Lines changed: 21869 additions & 27415 deletions

.github/CODEOWNERS

Lines changed: 6 additions & 0 deletions
```diff
@@ -240,6 +240,7 @@ docs/source/performance/perf-benchmarking.md @NVIDIA/trtllm-bench-reviewers
 /cpp/tensorrt_llm/batch_manager/allocateKvCache.cpp @NVIDIA/trt-llm-kv-cache-manager-devs
 /cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp @NVIDIA/trt-llm-kv-cache-manager-devs
 /cpp/tests/unit_tests/batch_manager/kvCacheUtilsTest.cpp @NVIDIA/trt-llm-kv-cache-manager-devs
+/tensorrt_llm/_torch/pyexecutor/kv_cache_manager_v2.py @NVIDIA/trt-llm-kv-cache-manager-devs
 /tensorrt_llm/_torch/pyexecutor/resource_manager.py @NVIDIA/trt-llm-kv-cache-manager-devs
 /cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h @NVIDIA/trt-llm-kv-cache-manager-devs
 /cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp @NVIDIA/trt-llm-kv-cache-manager-devs
@@ -297,3 +298,8 @@ docs/source/performance/perf-benchmarking.md @NVIDIA/trtllm-bench-reviewers
 # of the NVIDIA/trt-llm-release-branch-approval team, regardless of who else approves the PR.
 # Without approval from a member of this team, PRs cannot be merged to release branches.
 # * @NVIDIA/trt-llm-release-branch-approval
+
+### Telemetry / privacy review
+# Golden manifest is the privacy-review artifact; route it and the usage package to the privacy owner.
+/tensorrt_llm/usage/llm_args_golden_manifest.json @NVIDIA/trt-llm-oss-compliance
+/tensorrt_llm/usage/ @NVIDIA/trt-llm-oss-compliance
```
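GitHub resolves CODEOWNERS by applying the last matching pattern in the file, so the two new entries at the end of the file take precedence over any earlier rule that also matches files under `tensorrt_llm/usage/`. The sketch below models that resolution order. It is a simplified illustration, not GitHub's matcher: it supports only root-anchored file paths and root-anchored directory patterns ending in `/`, whereas real CODEOWNERS patterns follow most `.gitignore` rules. The names `OwnerRule`, `ruleMatches`, and `resolveOwner` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// One CODEOWNERS line (hypothetical representation).
struct OwnerRule
{
    std::string pattern; // e.g. "/tensorrt_llm/usage/"
    std::string owner;   // e.g. "@NVIDIA/trt-llm-oss-compliance"
};

// Simplified matcher: supports only root-anchored patterns.
//   "/dir/"      matches every path under dir/ (recursively)
//   "/dir/file"  matches exactly that file
bool ruleMatches(std::string const& pattern, std::string const& path)
{
    assert(pattern.size() > 1 && pattern.front() == '/');
    std::string const rel = pattern.substr(1); // repository paths have no leading '/'
    if (rel.back() == '/')
    {
        return path.compare(0, rel.size(), rel) == 0; // directory prefix match
    }
    return path == rel; // exact file match
}

// Later matching rules override earlier ones, mirroring CODEOWNERS precedence.
std::optional<std::string> resolveOwner(std::vector<OwnerRule> const& rules, std::string const& path)
{
    std::optional<std::string> owner; // stays empty if no rule matches
    for (auto const& rule : rules)
    {
        if (ruleMatches(rule.pattern, path))
        {
            owner = rule.owner;
        }
    }
    return owner;
}
```

Under last-match-wins, the golden manifest matches both new lines and resolves through the later `/tensorrt_llm/usage/` rule. Both lines name the same team, so routing is unaffected; the explicit manifest line mainly documents intent, and it would only take precedence on its own if it were listed after the directory rule.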

README.md

Lines changed: 7 additions & 4 deletions
```diff
@@ -298,9 +298,10 @@ Deprecation is used to inform developers that some APIs and tools are no longer
 TensorRT-LLM collects anonymous telemetry data by default. This data is used
 in aggregate to understand usage patterns and prioritize engineering efforts.
 **This data cannot be traced back to any individual user.** No prompts,
-user-identifying information, or persistent identifiers are collected. Any
-deployment identifiers are ephemeral, randomly generated per deployment, and
-not linked to users. The data we collect includes:
+outputs, model weights, model paths, tokenizer paths, user-identifying
+information, raw free-form configuration strings, or persistent identifiers are
+collected. Any deployment identifiers are ephemeral, randomly generated per
+deployment, and not linked to users. The data we collect includes:
 
 - Ingress point (e.g., LLM API, CLI, serve command)
 - Deployment duration (via periodic heartbeats)
@@ -309,8 +310,10 @@ not linked to users. The data we collect includes:
 - Parallelism configuration (TP/PP/CP/MoE-EP/MoE-TP sizes), quantization algorithm, dtype, KV cache dtype
 - System information (OS platform, Python version, CPU architecture, CPU count)
 - TRT-LLM version and backend
-- Feature flags (LoRA, speculative decoding, prefix caching, CUDA graphs, chunked context, data parallelism)
+- Feature summary flags (LoRA, speculative decoding, prefix caching, CUDA graphs, chunked context, data parallelism)
 - Disaggregated serving metadata (role and deployment ID)
+- Selected LLM API configuration values: parallelism, dtype, KV cache, scheduler, CUDA graph, and compile settings
+- Capture diagnostics for that payload: a schema checksum (for provenance), the count of captured fields, and whether any free-form value was skipped
 
 Telemetry is automatically disabled in CI and test environments.
 
```
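The updated README describes what the configuration capture records (selected values plus a schema checksum, a captured-field count, and a skipped-free-form flag) and what it never records (paths and raw free-form strings). The capture code itself is not part of this diff, so the following is only a minimal sketch of an allowlist-based capture with those three diagnostics. Everything in it is an assumption: the types `EnumToken`, `FreeForm`, `ConfigValue`, and `CapturedPayload`, the `capture` function, the example field names, and the use of FNV-1a over field names as the schema checksum are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical value kinds. Bounded values may be captured; free-form text never is.
struct EnumToken { std::string value; }; // bounded choice, e.g. a dtype name
struct FreeForm { std::string value; };  // unbounded text, e.g. a path or user string
using ConfigValue = std::variant<bool, std::int64_t, EnumToken, FreeForm>;

struct CapturedPayload
{
    std::map<std::string, std::string> fields; // captured field name -> serialized value
    std::uint64_t schemaChecksum = 0;          // fingerprint of the allowlisted field names
    std::size_t capturedCount = 0;             // number of entries in `fields`
    bool skippedFreeForm = false;              // true if any allowlisted value was free-form
};

// FNV-1a (64-bit), used here as a stand-in checksum.
std::uint64_t fnv1a64(std::string const& data, std::uint64_t hash)
{
    for (unsigned char c : data)
    {
        hash ^= c;
        hash *= 1099511628211ULL; // FNV 64-bit prime
    }
    return hash;
}

CapturedPayload capture(std::vector<std::string> const& allowlist, std::map<std::string, ConfigValue> const& config)
{
    // The checksum is order-sensitive, so require a canonical (sorted) schema.
    assert(std::is_sorted(allowlist.begin(), allowlist.end()));

    CapturedPayload out;
    std::uint64_t checksum = 14695981039346656037ULL; // FNV 64-bit offset basis
    for (auto const& name : allowlist)
    {
        checksum = fnv1a64(name + '\n', checksum); // hash field names only, never values

        auto const it = config.find(name);
        if (it == config.end())
        {
            continue; // field not set: nothing to capture
        }
        ConfigValue const& value = it->second;
        if (std::holds_alternative<FreeForm>(value))
        {
            out.skippedFreeForm = true; // record that a value was dropped, not the value
        }
        else if (auto const* b = std::get_if<bool>(&value))
        {
            out.fields[name] = *b ? "true" : "false";
        }
        else if (auto const* i = std::get_if<std::int64_t>(&value))
        {
            out.fields[name] = std::to_string(*i);
        }
        else
        {
            out.fields[name] = std::get<EnumToken>(value).value;
        }
    }
    // Keys outside the allowlist (e.g. a model path) are never read.
    out.schemaChecksum = checksum;
    out.capturedCount = out.fields.size();
    return out;
}
```

Hashing only the allowlisted names makes the checksum identify which schema produced a payload without leaking any value, which is one way to read "for provenance" in the README.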

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h

Lines changed: 7 additions & 0 deletions
```diff
@@ -35,6 +35,7 @@
 #include <torch/custom_class.h>
 #include <torch/python.h>
 #include <type_traits>
+#include <unordered_map>
 #include <unordered_set>
 #include <vector>
 
@@ -287,6 +288,12 @@ class CacheTransceiver : public BaseCacheTransceiver
 // Dedup sets so observe-only timeout WARN logs fire at most once per stuck request.
 std::unordered_set<LlmRequest::RequestIdType> mTimedOutSenderIds;
 std::unordered_set<LlmRequest::RequestIdType> mTimedOutRequesterIds;
+std::unordered_set<LlmRequest::RequestIdType> mCompletedSenderRequestIds;
+std::unordered_set<LlmRequest::RequestIdType> mFailedSenderRequestIds;
+std::unordered_map<LlmRequest::RequestIdType, std::shared_ptr<LlmRequest>> mSenderRequestsAwaitingConsensus;
+std::unordered_set<LlmRequest::RequestIdType> mCompletedRequesterRequestIds;
+std::unordered_set<LlmRequest::RequestIdType> mFailedRequesterRequestIds;
+std::unordered_map<LlmRequest::RequestIdType, std::shared_ptr<LlmRequest>> mRequesterRequestsAwaitingConsensus;
 mpi::MpiComm const* mMpiWorldComm{nullptr};
 
 std::shared_ptr<CacheTransceiverComm> mGroupComm;
```
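The header change adds per-direction bookkeeping (completed IDs, failed IDs, and requests awaiting consensus) for both the sender and requester sides, but the code that populates these members is outside this excerpt. The member names suggest a request is parked until the ranks in the transfer group agree on its outcome. The following is a minimal single-process sketch of one such rule, assuming a request is released as failed if any rank reports it failed, and as completed only once every rank reports it completed. `Request`, `RankReport`, `ConsensusTracker`, `recordLocal`, `localReport`, and `resolve` are hypothetical names, and the exchange of reports between ranks (presumably over something like `mGroupComm`) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using RequestId = std::uint64_t;

// Hypothetical stand-in for LlmRequest; only the id matters here.
struct Request
{
    RequestId id;
};

// One rank's view of which parked requests finished or failed locally.
struct RankReport
{
    std::unordered_set<RequestId> completed;
    std::unordered_set<RequestId> failed;
};

class ConsensusTracker
{
public:
    // Record this rank's local outcome and park the request until all ranks agree.
    void recordLocal(std::shared_ptr<Request> const& req, bool ok)
    {
        assert(req != nullptr);
        (ok ? mCompleted : mFailed).insert(req->id);
        mAwaiting.emplace(req->id, req);
    }

    RankReport localReport() const { return {mCompleted, mFailed}; }

    // Release parked requests that the reports from all ranks settle:
    // failed if ANY rank failed it, completed only if EVERY rank completed it.
    // Reports are passed in directly; a real system would gather them across ranks.
    void resolve(std::vector<RankReport> const& allRanks, std::vector<std::shared_ptr<Request>>& completedOut,
        std::vector<std::shared_ptr<Request>>& failedOut)
    {
        assert(!allRanks.empty()); // with no reports, "every rank completed" would be vacuously true
        for (auto it = mAwaiting.begin(); it != mAwaiting.end();)
        {
            RequestId const id = it->first;
            bool anyFailed = false;
            bool allCompleted = true;
            for (auto const& report : allRanks)
            {
                anyFailed = anyFailed || report.failed.count(id) > 0;
                allCompleted = allCompleted && report.completed.count(id) > 0;
            }
            if (anyFailed || allCompleted)
            {
                (anyFailed ? failedOut : completedOut).push_back(it->second);
                mCompleted.erase(id);
                mFailed.erase(id);
                it = mAwaiting.erase(it);
            }
            else
            {
                ++it; // some rank has not reported yet: keep waiting
            }
        }
    }

    std::size_t awaitingCount() const { return mAwaiting.size(); }

private:
    std::unordered_set<RequestId> mCompleted;
    std::unordered_set<RequestId> mFailed;
    std::unordered_map<RequestId, std::shared_ptr<Request>> mAwaiting;
};
```

Holding the `shared_ptr` in the awaiting map keeps the request alive until every rank has weighed in, which matches the value type chosen for the `...AwaitingConsensus` members in the diff.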
