
[TENT] Add cross-platform dynamic library adapter with GPU vendor abstraction #2090

Open

alogfans wants to merge 7 commits into kvcache-ai:main from alogfans:tent-cross-platform-clean

Conversation

@alogfans
Collaborator

Description

This PR introduces cross-platform support to TENT (Transfer Engine) through a GPU vendor abstraction layer and a dynamic library loading system, enabling TENT to run across multiple GPU vendors and hardware platforms.

Key Changes

  1. GPU Vendor Abstraction Layer (a minimal sketch follows this list)
  • Added unified GPU_* macros that map to vendor-specific APIs (CUDA, MUSA, HIP, MACA, Ascend)
  • Supports NVIDIA CUDA, Moore Threads MUSA, AMD HIP, Iluvatar MACA, Huawei Ascend, and CPU fallback
  • Provides compile-time vendor selection through CMake flags
  • Maintains API compatibility while enabling cross-platform compilation
  2. Transport Selector
  • Implemented a configuration-driven transport selection policy
  • Supports pattern-based rules for device and segment type matching
  • Enables priority-based transport selection with fallback support
  • Backward compatible with existing buffer-based transport ordering
  3. Dynamic Library Loading
  • Added transport_loader for runtime plugin loading
  • Enables dynamic loading of transport implementations
  • Supports a modular transport architecture
  • Reduces compilation dependencies and enables flexible deployment
  4. Platform Implementation
  • Refactored the platform backend for better multi-vendor support
  • Updated the CMake build system for conditional compilation
  • Added comprehensive documentation for the GPU abstraction layer
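
To make the vendor mapping in item 1 concrete, below is a minimal sketch of how compile-time selection through `GPU_*` macros can work. The guard and macro names (`USE_CUDA`, `USE_MUSA`, `USE_HIP`, `GPU_Malloc`, ...) are illustrative assumptions, not the identifiers this PR introduces; only `CPU_ONLY_MODE` appears in the diff itself.

```cpp
// Illustrative only: macro and guard names are hypothetical. Each vendor
// ships a CUDA-like runtime, so one GPU_* alias can fan out to whichever
// backend the CMake flags select at configure time.
#if defined(USE_CUDA)
#include <cuda_runtime.h>
#define GPU_Malloc      cudaMalloc
#define GPU_Free        cudaFree
#define GPU_MemcpyAsync cudaMemcpyAsync
#elif defined(USE_MUSA)
#include <musa_runtime.h>
#define GPU_Malloc      musaMalloc
#define GPU_Free        musaFree
#define GPU_MemcpyAsync musaMemcpyAsync
#elif defined(USE_HIP)
#include <hip/hip_runtime.h>
#define GPU_Malloc      hipMalloc
#define GPU_Free        hipFree
#define GPU_MemcpyAsync hipMemcpyAsync
#else
// CPU fallback: no device runtime; callers compile against stubs that
// report "no GPU" instead of failing at link time.
#define CPU_ONLY_MODE
#endif
```

Because the branch is chosen when CMake configures the build, call sites compile unchanged on every vendor.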

Module

  • Transfer Engine (mooncake-transfer-engine)
  • Mooncake Store (mooncake-store)
  • Mooncake EP (mooncake-ep)
  • Integration (mooncake-integration)
  • P2P Store (mooncake-p2p-store)
  • Python Wheel (mooncake-wheel)
  • PyTorch Backend (mooncake-pg)
  • Mooncake RL (mooncake-rl)
  • CI/CD
  • Docs
  • Other

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Breaking change
  • Documentation update
  • Other

How Has This Been Tested?

TBD

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

alogfans and others added 3 commits May 12, 2026 13:04
- Add GPU vendor abstraction layer supporting NVIDIA CUDA, Moore Threads MUSA,
  AMD HIP, Iluvatar MACA, Huawei Ascend, and CPU fallback
- Add plugin-based build mode with TENT_BUILD_PLUGIN_MODE option
- Implement platform backend plugins for dynamic loading
- Add transport loader for runtime plugin discovery and loading
- Maintain full compatibility with existing static build mode
- Add comprehensive documentation for cross-platform architecture

This implementation focuses on core cross-platform functionality without
including unrelated features like QoS scheduling or quota management.

Reference revise-paper-branch to fix platform abstraction and compilation issues:

- Update platform.h to add IPlatformBackend interface and complete MemoryType enum
- Replace platform.cpp with revised implementation supporting dynamic backend loading
- Update topology.h to add bw_gbps field to NicEntry structure
- Fix cpu.h constructor to properly initialize Platform base class
- Update transfer_engine_impl.cpp to use MTYPE_HIP instead of MTYPE_ROCM
- Add transport_selector implementation for cross-platform transport selection
- Add gpu_vendor.h for vendor-specific GPU abstraction
- Update CMakeLists.txt to remove transport_selector from runtime_impl

The changes enable successful compilation by aligning the platform
abstraction layer with the revised architecture that supports multiple
GPU/NPU vendors through dynamic backend plugins.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
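
The commit above adds an `IPlatformBackend` interface and completes the `MemoryType` enum in `platform.h`. As a rough sketch of what such a dynamically loadable backend interface can look like: apart from `IPlatformBackend` and `MTYPE_HIP`, which the commit message names, every identifier below is a hypothetical placeholder.

```cpp
#include <string>

// Sketch only: method names and most enum values are assumptions.
enum MemoryType {
    MTYPE_CPU,
    MTYPE_CUDA,
    MTYPE_HIP,     // replaces the former MTYPE_ROCM per the commit above
    MTYPE_ASCEND,
};

class IPlatformBackend {
   public:
    virtual ~IPlatformBackend() = default;
    // Classify a pointer so the engine can pick a suitable transport.
    virtual MemoryType queryMemoryType(const void* ptr) = 0;
    // Vendor tag used in device prefixes such as "ascend:" or "cpu:".
    virtual std::string vendorPrefix() const = 0;
};

// A plugin .so exports an unmangled factory the loader can dlsym().
extern "C" IPlatformBackend* CreatePlatformBackend();
```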

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a GPU Vendor Abstraction Layer and refactors the platform and transport systems to support dynamic plugin loading across multiple GPU vendors, including NVIDIA, AMD, Huawei, and others. Key additions include a configuration-driven transport selection policy and a thread-local location cache for performance optimization. Review feedback identifies a critical bug where a dynamic library is unloaded while a shared pointer still references its objects, potentially causing a crash. Additional recommendations focus on improving code quality through the use of static constexpr for constants, removing unnecessary const qualifiers on output parameters, replacing manual memory management with std::vector, and improving the robustness of JSON parsing.
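
The critical finding above concerns a plugin library being dlclose()d while a shared_ptr still points at objects it created. One common remedy, sketched below under assumed names (`Transport`, `CreateTransport`), is to capture the library handle in the shared_ptr's deleter so the unload is sequenced after the last reference dies; this is a generic pattern, not necessarily the fix the PR will adopt.

```cpp
#include <dlfcn.h>
#include <memory>

// "Transport" and "CreateTransport" are hypothetical names.
struct Transport {
    virtual ~Transport() = default;
};

std::shared_ptr<Transport> loadTransport(const char* path) {
    void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) return nullptr;

    using Factory = Transport* (*)();
    auto create = reinterpret_cast<Factory>(dlsym(handle, "CreateTransport"));
    if (!create) {
        dlclose(handle);
        return nullptr;
    }

    // The deleter destroys the plugin object first, then unloads the
    // library, so no code from the .so can run after dlclose().
    return std::shared_ptr<Transport>(create(), [handle](Transport* p) {
        delete p;
        dlclose(handle);
    });
}
```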

Comment thread: mooncake-transfer-engine/tent/src/runtime/platform.cpp
#include <acl/acl_rt.h>
#include <string>

const static std::string GPU_PREFIX = "ascend:";

Severity: medium

Defining a const static std::string in a header file included by multiple translation units can lead to multiple copies of the object and potential static initialization order issues. Since this is a constant string literal, it is more efficient and safer to use static constexpr const char*.

Suggested change
const static std::string GPU_PREFIX = "ascend:";
static constexpr const char* GPU_PREFIX = "ascend:";


// Ascend doesn't have cudaPointerGetAttributes, use a stub
inline int cudaPointerGetAttributes_ascend(
const struct cudaPointerAttributes* attr, const void* ptr) {

Severity: medium

The first parameter of cudaPointerGetAttributes should not be const as it is an output parameter intended to be modified by the function. Removing the const qualifier eliminates the need for the const_cast inside the implementation.

Suggested change
const struct cudaPointerAttributes* attr, const void* ptr) {
struct cudaPointerAttributes* attr, const void* ptr) {

#include <string>

#define CPU_ONLY_MODE
const static std::string GPU_PREFIX = "cpu:";

Severity: medium

Using const static std::string in a header can cause static initialization order issues. It is better to use static constexpr const char* for string constants.

Suggested change
const static std::string GPU_PREFIX = "cpu:";
static constexpr const char* GPU_PREFIX = "cpu:";

// Note: CUDA API is cudaPointerGetAttributes(const cudaPointerAttributes* attr,
// const void* ptr)
inline int cudaPointerGetAttributes_impl(
const struct cudaPointerAttributes* attr, const void* ptr) {

Severity: medium

The attr parameter should not be const since it is modified by the function. Removing const avoids the const_cast on line 101.

Suggested change
const struct cudaPointerAttributes* attr, const void* ptr) {
struct cudaPointerAttributes* attr, const void* ptr) {

Comment on lines +345 to +347
(uintptr_t(start) - aligned_start + len + kPageSize - 1) / kPageSize;
void** pages = (void**)malloc(sizeof(void*) * n);
int* status = (int*)malloc(sizeof(int) * n);

Severity: medium

If len is 0, n might still be calculated as 1 depending on the alignment of start, leading to unnecessary processing. Additionally, using manual malloc and free is error-prone and not exception-safe. It is recommended to use std::vector for automatic memory management and add a guard for len == 0.

    if (len == 0) return {};
    int n =
        (uintptr_t(start) - aligned_start + len + kPageSize - 1) / kPageSize;
    std::vector<void*> pages(n);
    std::vector<int> status(n);


// Capitalize first letter of base_name for symbol name
std::string base_cap = base_name;
if (!base_cap.empty()) base_cap[0] = toupper(base_cap[0]);

Severity: medium

The use of toupper requires the <cctype> header, which is not explicitly included in this file. While it might be included transitively, it is better to include it directly to ensure portability.
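
For reference, a self-contained version of that symbol-name derivation with the missing include and the usual unsigned-char cast; the "new<Base>Transport" naming scheme is an assumed example, not necessarily the convention transport_loader uses.

```cpp
#include <cctype>   // needed for std::toupper, per the comment above
#include <string>

std::string makeFactorySymbol(const std::string& base_name) {
    std::string base_cap = base_name;
    if (!base_cap.empty()) {
        // Cast to unsigned char first: passing a negative char value
        // to std::toupper is undefined behavior.
        base_cap[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(base_cap[0])));
    }
    return "new" + base_cap + "Transport";  // e.g. "rdma" -> "newRdmaTransport"
}
```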


// Parse size filters (optional)
if (policy_json.contains("min_size")) {
policy.min_size = policy_json["min_size"].get<uint64_t>();

Severity: medium

Calling .get<uint64_t>() on a JSON object without verifying its type can throw a json::type_error exception if the configuration contains an unexpected type (e.g., a string or boolean). It is safer to use is_number_unsigned() or the .value() method with a default.

            policy.min_size = policy_json["min_size"].is_number_unsigned() ? 
                std::optional<uint64_t>(policy_json["min_size"].get<uint64_t>()) : std::nullopt;
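
Expanding on that suggestion, a type-checked variant of the size-filter parsing could look like the sketch below (nlohmann::json, as implied by the json::type_error mention; the `Policy` struct and its field are hypothetical stand-ins for the real ones).

```cpp
#include <cstdint>
#include <optional>
#include <nlohmann/json.hpp>

// Hypothetical stand-in for the PR's policy structure.
struct Policy {
    std::optional<uint64_t> min_size;
};

void parseSizeFilters(const nlohmann::json& policy_json, Policy& policy) {
    if (policy_json.contains("min_size") &&
        policy_json["min_size"].is_number_unsigned()) {
        policy.min_size = policy_json["min_size"].get<uint64_t>();
    }
    // Otherwise leave std::nullopt instead of risking a json::type_error.
}
```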

@codecov-commenter

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.


