[Feat] Integrate ucm metrics into vLLM connector metrics#995
[Feat] Integrate ucm metrics into vLLM connector metrics#995dante159753 wants to merge 2 commits into
Conversation
| { | ||
| if (this->logger_) { return this->logger_; } | ||
| std::lock_guard<std::mutex> lg(this->mutex_); | ||
| if (this->logger_) { return this->logger_; } |
There was a problem hiding this comment.
💡 L4 Performance Consideration: The InternSourceString function uses a static mutex for protecting the string pool. In high-frequency logging scenarios (many log statements per millisecond), this mutex could become a contention bottleneck since every log call with a new source location needs to intern the string.
Consider alternatives:
- Use a concurrent hash map (e.g.,
folly::ConcurrentHashMapor similar) - Pre-intern common source locations at initialization
- Use lock-free string interning with atomic pointers
For typical UCM usage this may not be critical, but worth documenting if high-frequency logging is expected.
| if len(current_counts) < len(bucket_counts): | ||
| current_counts.extend([0] * (len(bucket_counts) - len(current_counts))) | ||
| for index, count in enumerate(bucket_counts): | ||
| current_counts[index] += int(count) |
There was a problem hiding this comment.
💡 Previous Concern Still Unaddressed: The int(count) conversion on very large float values could have precision issues. For example, if count is a float like 1e15, int(count) would work, but intermediate floating-point operations could introduce rounding errors.
Consider using int(round(count)) or validating that count is within reasonable bounds (e.g., abs(count) < 1e12) before conversion. This is especially important for histogram bucket counts which should always be integers.
| if isinstance(value, (tuple, list)) and len(value) == 2: | ||
| bucket_counts, sum_delta = value | ||
| return list(bucket_counts), float(sum_delta) | ||
| return list(getattr(value, "bucketCounts", [])), float( |
There was a problem hiding this comment.
💡 Previous Concern Still Unaddressed: The _histogram_tuple function returns empty list [] when the histogram value format is unrecognized (line 137-139). This silently drops histogram data without any warning.
Consider adding a warning log:
if not bucket_counts:
logger.warning(f"Unrecognized histogram format: {type(value)}, data dropped")This helps diagnose configuration issues or unexpected metric formats.
| auto cbStatus = stream.AppendCallback([eventReadyTp](bool) { | ||
| eventReadyTp->store(NowTime::Now(), std::memory_order_relaxed); | ||
| }); |
There was a problem hiding this comment.
memory_order_relaxed, which may not provide sufficient synchronization guarantees. The callback could be executed on a different thread (e.g., driver/stream callback thread) while the main thread reads the value. Consider using memory_order_release in the store and memory_order_acquire in the load:
eventReadyTp->store(NowTime::Now(), std::memory_order_release);
// ...
auto ready = eventReadyTp->load(std::memory_order_acquire);While relaxed is technically correct for a single atomic value, the timing semantics (splitting compute-wait vs D2H) rely on precise ordering which relaxed may not guarantee.
| } | ||
| const auto copiedShards = holder_.size() + 1; | ||
| s = stream.Synchronize(); | ||
| auto h2dSyncMs = (NowTime::Now() - tpH2dSubmitted) * 1e3; | ||
| UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_h2d_sync_ms"), h2dSyncMs); | ||
| if (copiedShards > 0 && h2dSyncMs > 0.0) { | ||
| auto copiedBytes = static_cast<double>(copiedShards) * static_cast<double>(shardBytes_); | ||
| UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_h2d_bandwidth_gbps"), | ||
| copiedBytes / (h2dSyncMs * 1e-3) / 1e9); | ||
| } |
There was a problem hiding this comment.
💡 L4 Observation: The bandwidth calculation copiedShards = holder_.size() + 1 assumes all tasks in holder_ are from the same batch and have identical shard sizes. However, holder_ accumulates tasks across multiple TransferOneTask calls before synchronization. If different tasks have different sizes, the total bytes calculation could be inaccurate.
Consider:
- Tracking actual bytes per task
- Adding a comment explaining that
shardBytes_is assumed constant for all tasks inholder_
Follow-up Review Summary (2026-06-13)This follow-up review focused on the most recent changes since the last review on 2026-06-05, particularly: Previous Concerns Status
New Issues Found
Inline comments have been posted for these issues. Overall AssessmentThe logger redesign addresses the self-reference and memory issues mentioned in the commit. The new bandwidth metrics provide useful visibility into H2D/D2H performance. The main concerns are around memory ordering in the callback-based timing and the assumptions in bandwidth calculations. |
ygwpz
left a comment
There was a problem hiding this comment.
New changes look good - comprehensive metrics integration with proper consumer architecture and updated Grafana dashboards. The metrics dispatcher pattern allows proper fanout to multiple consumers without clearing data.
0dc784e to
1dcdaf9
Compare
Summary
vllm:-prefixed Prometheus names and worker/engine labels.Why
The previous multiproc metrics path only covered the local multiproc exporter cleanly and did not fit vLLM's KV connector metrics snapshot path. This change lets UCM metrics be consumed through vLLM-native metric collection while preserving the existing multiproc path and making multi-worker/multi-engine deployments easier to inspect.
Impact
engineandworker_rankdimensions for disambiguation.ucm:multiproc metric output remains available through the existing configuration path.Verification
test using jenkins pipeline, no performance impact, metrics works fine.