Skip to content

Add ucm-toolkit for posix aio test&nic monitor and copy bandwidth test#1012

Open
harrisonyhq wants to merge 9 commits into
developfrom
dev-toolkit
Open

Add ucm-toolkit for posix aio test&nic monitor and copy bandwidth test#1012
harrisonyhq wants to merge 9 commits into
developfrom
dev-toolkit

Conversation

@harrisonyhq

Copy link
Copy Markdown
Contributor

Purpose

Add ucm-toolkit as a unified CLI entry for POSIX AIO testing, NIC monitoring, and copy bandwidth testing.

Modifications

  • Added the ucm-toolkit package and CLI entry point.
  • Integrated posix-aio, nic-monitor, and dev-sandbox tooling.
  • Added documentation and install support for toolkit usage.
  • Updated formatting and codespell configuration for toolkit sources.

Test

Comment thread toolkit/src/dev-sandbox/module/aio/aio_impl.cc
Comment thread toolkit/src/dev-sandbox/module/aio/aio_impl.cc
std::vector<const CopyBuffer*> srcBuffers(ctx.nDevice);
std::vector<const CopyBuffer*> dstBuffers(ctx.nDevice);
for (size_t device = 0; device < ctx.nDevice; device++) {
srcBuffers[device] = new HostCopyBuffer{device, ctx.size, ctx.num};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: Memory allocation with new followed by instance.DoCopyBatch(...) without exception handling. If DoCopyBatch throws an exception, the HostCopyBuffer and DeviceCopyBuffer objects will not be deleted, causing memory leaks. Consider wrapping this in a try-catch block or using smart pointers (e.g., std::unique_ptr).

@ygwpz ygwpz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review: Two concerns addressed, but critical memory safety issue remains.

* Copyright (c) 2026 Mag1c.H
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing the first two concerns. However, I notice that the critical memory safety issue (Comment #3) remains unaddressed.

The concern was about exception safety in toolkit/src/dev-sandbox/module/copy/ascend/copy_case_ascend.cc around lines 73-82:

for (size_t device = 0; device < ctx.nDevice; device++) {
    srcBuffers[device] = new HostCopyBuffer{device, ctx.size, ctx.num};
    dstBuffers[device] = new DeviceCopyBuffer{device, ctx.size, ctx.num};
}
H2DCECopyInstance instance{ctx.iter, false};
CopyResult result;
result.Push(instance.DoCopyBatch(srcBuffers, dstBuffers));  // ← If this throws
for (size_t device = 0; device < ctx.nDevice; device++) {
    delete srcBuffers[device];  // ← These cleanup lines will never execute
    delete dstBuffers[device];
}

If DoCopyBatch() throws an exception, the memory allocated with new will leak because the cleanup loop won't execute.

Suggested fix: Wrap the cleanup in a try-catch block or use smart pointers (e.g., std::unique_ptr):

std::vector<std::unique_ptr<HostCopyBuffer>> srcBuffers(ctx.nDevice);
std::vector<std::unique_ptr<DeviceCopyBuffer>> dstBuffers(ctx.nDevice);
for (size_t device = 0; device < ctx.nDevice; device++) {
    srcBuffers[device] = std::make_unique<HostCopyBuffer>(device, ctx.size, ctx.num);
    dstBuffers[device] = std::make_unique<DeviceCopyBuffer>(device, ctx.size, ctx.num);
}
H2DCECopyInstance instance{ctx.iter, false};
CopyResult result;
result.Push(instance.DoCopyBatch(srcBuffers, dstBuffers));
// Smart pointers handle cleanup automatically

Or use a try-catch:

try {
    result.Push(instance.DoCopyBatch(srcBuffers, dstBuffers));
} catch (...) {
    for (size_t device = 0; device < ctx.nDevice; device++) {
        delete srcBuffers[device];
        delete dstBuffers[device];
    }
    throw;
}

Please address this issue before the PR can be approved.

- Replace raw pointer new/delete with std::unique_ptr for RAII
- Extract raw pointers for DoCopyBatch interface compatibility
- Ensure automatic memory cleanup even on exception

Fixes review comment about potential memory leak if DoCopyBatch throws.
result.Push(instance.DoCopy(&srcBuffer, &dstBuffer));
}
result.Show("[[ " + Key() + " ]] " + Brief());
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Using unique_ptr for memory management is good. Verify that the smart pointer ownership transfer is correct, especially in multi-threaded contexts.

result.Show("[[ " + Key() + " ]] " + Brief());
}

DEFINE_COPY_CASE(AllHost2AllDeviceCECase, "all_host_to_all_device_ce",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Using unique_ptr for memory management is good. Verify that the smart pointer ownership transfer is correct, especially in multi-threaded contexts.

result.Show("[[ " + Key() + " ]] " + Brief());
}

DEFINE_COPY_CASE(AllHost2AllDeviceCECase, "all_host_to_all_device_ce",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Using unique_ptr for memory management is good. Verify that the smart pointer ownership transfer is correct, especially in multi-threaded contexts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants