Summary
InfiniLM on Ascend NPU throws during forward/compile when device tensors are non-contiguous, because TensorImpl::copy_from / contiguous() require an external InfiniOps rearrange wrapper that is not implemented today.
Downstream symptom: RuntimeError: RankWorker stopped during run (RankWorker sets should_exit_ after the real C++ exception).
Priority: P0 — inference worker exits permanently; inference_server restart required.
Not in scope: KV block recycle cliff (#186) and lm_eval slowdown (#89) are separate InfiniLM / capacity issues.
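The relationship between the real exception and the downstream symptom can be sketched as a latch. This is a single-threaded illustration under assumptions: `WorkerSketch`, `run`, `stopped`, and `root_cause` are hypothetical names, not the InfiniLM `RankWorker` API, and the real worker's threading is not modeled. Only the `should_exit_` flag and the two message strings come from this report.

```cpp
#include <atomic>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a rank worker; NOT the InfiniLM RankWorker class.
class WorkerSketch {
public:
    // Runs one step. The first failure is recorded and the worker latches into
    // the stopped state; the caller only ever sees the generic message.
    void run(const std::function<void()>& step) {
        if (!should_exit_.load()) {
            try {
                step();
                return;
            } catch (const std::exception& e) {
                root_cause_ = e.what();    // the real C++ exception text (logged in practice)
                should_exit_.store(true);  // permanent: no later step is attempted
            }
        }
        throw std::runtime_error("RankWorker stopped during run");
    }

    bool stopped() const { return should_exit_.load(); }
    const std::string& root_cause() const { return root_cause_; }

private:
    std::atomic<bool> should_exit_{false};
    std::string root_cause_;
};
```

Under this model, every request after the first failure reports the generic message, which is why the root cause has to be read from the earlier `exception during forward` log line.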
Primary errors (InfiniLM explicitly names InfiniOps)
From InfiniLM/csrc/core/src/infinicore/tensor/copy.cc:
| # | Message |
|---|---|
| A | Device-side non-contiguous copy requires an external InfiniOps rearrange wrapper |
| B | Device-side contiguous() for non-contiguous tensors requires an external InfiniOps rearrange wrapper |
| C | Device-side non-contiguous H2D copy requires an external InfiniOps rearrange wrapper |
Same-device D2D branch (non-contiguous → throw):

    } else if (this->is_contiguous() && src->is_contiguous()) {
        context::memcpyD2D(...);
    } else {
        throw std::runtime_error("Device-side non-contiguous copy requires an external InfiniOps rearrange wrapper");
    }
CPU path uses rearrange_cpu(); Ascend non-contiguous path has no fallback.
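For reference, the semantics that a device-side rearrange would need to reproduce can be written as a host-side loop. This is a sketch under assumptions, not the `rearrange_cpu()` implementation and not an Ascend kernel: `rearrange_to_contiguous` is a hypothetical name, strides are taken in elements, and elements are treated as opaque bytes so the same loop covers 2-byte BF16/FP16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, NOT the InfiniLM or InfiniOps API.
// Copies a strided view (strides in elements) into a dense row-major buffer.
inline void rearrange_to_contiguous(void* dst, const void* src,
                                    const std::vector<std::size_t>& shape,
                                    const std::vector<std::ptrdiff_t>& strides,
                                    std::size_t elem_size) {
    assert(shape.size() == strides.size());
    std::size_t total = 1;
    for (std::size_t d : shape) total *= d;
    if (total == 0) return;  // empty tensor: nothing to copy

    auto* out = static_cast<unsigned char*>(dst);
    const auto* in = static_cast<const unsigned char*>(src);
    const auto esz = static_cast<std::ptrdiff_t>(elem_size);
    std::vector<std::size_t> idx(shape.size(), 0);  // current multi-index
    std::ptrdiff_t src_off = 0;                     // source offset in elements

    for (std::size_t linear = 0; linear < total; ++linear) {
        std::memcpy(out + linear * elem_size, in + src_off * esz, elem_size);
        // Advance the multi-index like an odometer, innermost dimension first.
        for (std::size_t d = shape.size(); d-- > 0;) {
            src_off += strides[d];
            if (++idx[d] < shape[d]) break;
            // Dimension d wrapped around: undo its accumulated offset and carry.
            src_off -= strides[d] * static_cast<std::ptrdiff_t>(shape[d]);
            idx[d] = 0;
        }
    }
}
```

As an example, a 2x3 row-major buffer `{0,1,2,3,4,5}` viewed as its transpose (shape `{3,2}`, strides `{1,3}`) comes out as `{0,3,1,4,2,5}`. A device version would parallelize over the output index instead of walking it serially.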
Log evidence (2026-05-21, production)
- Container: infinilm-ascend-run, log: /tmp/infinilm-server.log
- Device: ASCEND:0, flags: --enable-paged-attn --enable-graph, model 9g_8b_thinking_llama
- ≥6 hits between 03:28–05:35 UTC, e.g.:
  exception during forward: Device-side non-contiguous copy requires an external InfiniOps rearrange wrapper
  → Error in step loop: RankWorker stopped during run
- Also seen:
  exception during compile (same message, graph compile)
  Device-side contiguous() for non-contiguous tensors requires an external InfiniOps rearrange wrapper
- Related: RotaryEmbedding: InfiniOps adapter requires contiguous Q/K tensors
Reproduction
- Ascend + InfiniLM + InfiniOps, inference_server.py --device=ascend --enable-paged-attn --enable-graph
- Sustained POST /v1/chat/completions (variable batch/seq len), or
- Minimal: non-contiguous Ascend tensor → copy_from() or contiguous() → immediate throw
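The minimal case depends on what counts as non-contiguous. The check below is a sketch of the usual row-major rule, assuming strides measured in elements; `is_row_major_contiguous` is a hypothetical helper, and whether InfiniLM's `is_contiguous()` applies exactly this rule (for example for size-1 dimensions) is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT the InfiniLM is_contiguous() implementation.
// Returns true when strides (in elements) describe a dense row-major layout.
inline bool is_row_major_contiguous(const std::vector<std::size_t>& shape,
                                    const std::vector<std::ptrdiff_t>& strides) {
    if (shape.size() != strides.size()) return false;
    for (std::size_t d : shape) {
        if (d == 0) return true;  // empty tensor: trivially dense
    }
    std::ptrdiff_t expected = 1;  // stride a dense layout would have at this dim
    for (std::size_t d = shape.size(); d-- > 0;) {
        // Size-1 dimensions never advance, so their stride is irrelevant.
        if (shape[d] != 1 && strides[d] != expected) return false;
        expected *= static_cast<std::ptrdiff_t>(shape[d]);
    }
    return true;
}
```

A dense 2x3 tensor has strides `{3,1}` and passes. Its transpose (shape `{3,2}`, strides `{1,3}`) and a column slice (shape `{2,2}`, strides `{3,1}`) both fail, and either view would be expected to take the throwing branch on Ascend.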
Expected vs actual
| Item | Expected | Actual |
|---|---|---|
| Non-contiguous D2D copy | InfiniOps rearrange / general copy | throw, worker exit |
| Device contiguous() | allocate + rearrange on device | throw, worker exit |
| Service | continuous forward | process restart |
Requested deliverables (P0)
- Ascend device-side general rearrange / copy: arbitrary shape/strides → contiguous (BF16/FP16 minimum)
- Integration hook for InfiniLM copy.cc throw sites (lines ~46, ~72, ~83)
- Tests: non-contiguous→contiguous on Ascend; compatible with graph capture (exception during compile also failed)
Acceptance: none of the throws listed above occur on the Ascend serving path; >500 forward steps without RankWorker stopped.
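One way the requested correctness test could be structured is sketched below: build every axis permutation of a dense base tensor as a strided view, then compare a candidate rearrange against direct index arithmetic. This is an assumption about test shape, not an existing InfiniOps test; `RearrangeFn` and `check_all_permutations` are hypothetical names, elements are modeled as 16-bit words, and graph-capture compatibility is not covered by a host check like this.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

using Shape = std::vector<std::size_t>;
using Strides = std::vector<std::ptrdiff_t>;
// Hypothetical signature a device wrapper would be adapted to for testing.
using RearrangeFn = std::function<void(std::uint16_t* dst, const std::uint16_t* src,
                                       const Shape& shape, const Strides& strides)>;

// Returns true if `candidate` produces the expected dense output for every
// axis permutation of a row-major tensor with shape `base_shape`.
inline bool check_all_permutations(const Shape& base_shape, const RearrangeFn& candidate) {
    const std::size_t rank = base_shape.size();
    Strides base_strides(rank, 1);
    for (std::size_t d = rank; d-- > 1;) {
        base_strides[d - 1] = base_strides[d] * static_cast<std::ptrdiff_t>(base_shape[d]);
    }
    std::size_t total = 1;
    for (std::size_t d : base_shape) total *= d;

    std::vector<std::uint16_t> src(total);
    std::iota(src.begin(), src.end(), std::uint16_t{0});  // distinct value per element

    std::vector<std::size_t> perm(rank);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    do {
        Shape shape(rank);
        Strides strides(rank);
        for (std::size_t i = 0; i < rank; ++i) {
            shape[i] = base_shape[perm[i]];
            strides[i] = base_strides[perm[i]];
        }
        // Reference result: decompose each output index and gather from src.
        std::vector<std::uint16_t> want(total), got(total, 0xFFFF);
        for (std::size_t linear = 0; linear < total; ++linear) {
            std::size_t rem = linear;
            std::ptrdiff_t off = 0;
            for (std::size_t d = rank; d-- > 0;) {
                off += static_cast<std::ptrdiff_t>(rem % shape[d]) * strides[d];
                rem /= shape[d];
            }
            want[linear] = src[static_cast<std::size_t>(off)];
        }
        candidate(got.data(), src.data(), shape, strides);
        if (got != want) return false;
    } while (std::next_permutation(perm.begin(), perm.end()));
    return true;
}
```

A candidate that ignores strides and copies the buffer verbatim passes for rank-1 shapes but fails for `{2, 3}`, since the transposed view requires a real gather.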
References (InfiniLM side)
- InfiniLM/csrc/core/src/infinicore/tensor/copy.cc: throw sites
- InfiniLM/csrc/core/utils/rearrange.h: CPU reference
- InfiniLM/csrc/engine/rank_worker.cpp: RankWorker stopped during run
- Full write-up: InfiniLM deploy repo rca/issue-infinops-rearrange-en.md (can attach log excerpt 03:28–05:35 UTC)
Environment
| Item | Value |
|---|---|
| NPU | Huawei Ascend davinci2 |
| CANN | 8.5.1 (from deploy logs) |
| InfiniLM | Paged KV + graph, OpenAI API server |
| Downstream | lm_eval @ http://<host>:8000/v1 |
infinilm-server.log