fix: resolve index out of range panic in fetchContainerInfo for Ascend devices#95
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: cygnushan The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @cygnushan! It looks like this is your first PR to Project-HAMi/HAMi-WebUI 🎉 |
7a7afe4 to
5e78963
Compare
5e78963 to
d1cd2c1
Compare
|
Hi @archlitchi , |
|
@coderabbitai Please review thie PR! |
|
✅ Action performedReview finished.
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughFixes an Ascend pod device decoding and merge path that could misalign container devices and trigger index-out-of-range panics. The Ascend decoder now preserves container positions, ChangesAscend device decode and merge
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
server/internal/provider/util/util_test.go (2)
356-357: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueTest mutates the package-global
SupportDevicesmap.Registering Ascend keys directly into the shared
SupportDevices(same asTest_DecodePodDevicesdoes for NVIDIA) leaks state across tests in this package and can cause order-dependent failures. Prefer at.Cleanupto delete the injected keys, or operate on a local copy.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/internal/provider/util/util_test.go` around lines 356 - 357, The test is mutating the shared package-global SupportDevices map by inserting Ascend entries, which can leak state into other tests. Update the relevant test in util_test.go to avoid permanent changes to SupportDevices by either using a local copy inside the test or registering a t.Cleanup that deletes the injected Ascend keys after the test finishes. Keep the fix near the test logic that adds the Ascend and Ascend310P entries so the state is restored consistently.
366-366: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueUnused struct field
wantErrMsg. It is declared but never asserted in the test loop. Drop it or wire it into an error-message assertion.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/internal/provider/util/util_test.go` at line 366, The test case struct in util_test.go has an unused wantErrMsg field that is never checked in the test loop. Either remove wantErrMsg from the table-driven test cases or update the test logic to assert the returned error message against it, using the existing test loop and error handling in the util_test.go test function.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/internal/data/pod.go`:
- Around line 119-127: The loop in Pod device aggregation ignores the result of
copier.Copy, so a failed copy leaves bizPds empty and silently drops that device
set. Update the logic in the pod device conversion path (the loop over pdevices
in the Pod data handling code) to check the copier.Copy return value, and either
log the failure with enough context or return it so the caller can handle it;
keep the rest of the biz.PodSingleDevice iteration unchanged.
In `@server/internal/provider/util/util_test.go`:
- Around line 467-481: Update the DecodePodDevices test case for the Ascend310P
malformed-device annotation so it matches the current parser behavior: the
bad-format string should not be treated as an error. In the existing
table-driven test near the Ascend310P case, change the expectation to wantErr
false and assert that PodDevices contains an empty "Ascend310P" entry rather
than an empty result.
---
Nitpick comments:
In `@server/internal/provider/util/util_test.go`:
- Around line 356-357: The test is mutating the shared package-global
SupportDevices map by inserting Ascend entries, which can leak state into other
tests. Update the relevant test in util_test.go to avoid permanent changes to
SupportDevices by either using a local copy inside the test or registering a
t.Cleanup that deletes the injected Ascend keys after the test finishes. Keep
the fix near the test logic that adds the Ascend and Ascend310P entries so the
state is restored consistently.
- Line 366: The test case struct in util_test.go has an unused wantErrMsg field
that is never checked in the test loop. Either remove wantErrMsg from the
table-driven test cases or update the test logic to assert the returned error
message against it, using the existing test loop and error handling in the
util_test.go test function.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 17674e22-330e-44c3-867a-0c3dd36263e9
📒 Files selected for processing (3)
server/internal/data/pod.goserver/internal/provider/util/util.goserver/internal/provider/util/util_test.go
…d devices - Pre-allocate bizContainerDevices based on actual container count to prevent out-of-bounds access when annotation device count differs from pod container count - Merge devices from all device types per container instead of overwriting, which previously caused device data loss for multi-device-type pods - Add container index boundary check in DecodePodDevices Ascend branch to align with NVIDIA/Hygon/Metax device handling Fixes Project-HAMi#94 Signed-off-by: handong <cygnushan@yunify.com>
d1cd2c1 to
9593759
Compare
Description
This PR fixes the
runtime error: index out of range [1] with length 1panic that occurs infetchContainerInfowhen scanning Pods using Ascend310P / AscendGPU devices.Previous PR #35 fixed the same symptom for NVIDIA/Hygon/Metax paths, but the Ascend branch in
DecodePodDeviceswas missed and still lacks the container-index boundary check.Root Cause
DecodePodDevices(Ascend path):strings.Split(str, OnePodMultiContainerSplitSymbol)may yield more segments than actual containers in the pod spec, causingPodSingleDevice(i.e.[]ContainerDevices) to be longer thanlen(pod.Spec.Containers).fetchContainerInfo: The old code blindly copied decoded devices into a slice and then indexed it with the container loop variable, triggering the panic when lengths mismatch.Changes
server/internal/data/pod.go: Pre-allocatebizContainerDevicesbased onlen(pod.Spec.Containers)and merge per-container devices with index guard (i < numContainers). Also fixes device data loss for multi-device-type pods by appending instead of overwriting.server/internal/provider/util/util.go: Addi >= len(pod.Spec.Containers) { break }guard and empty-segment handling (s == "") to theAscendGPUDevice / Ascend310PGPUDevicebranch, aligning it with NVIDIA/Hygon/Metax behavior.Related
Summary by CodeRabbit