fix: export operator call instantiations #623
Open
voltjia wants to merge 2 commits into
@crapromer, please do the initial review; @Ziminli, please do the final review.
Summary
- [...] `infini::ops::functional` layer added by feat: add public C++ operator API #618.
- [...] `Operator<Op>::Call` template instantiations into `libinfiniops.so` and generated `extern template` declarations for C++ consumers.
- [...] `#include <infini/ops.h>` as the public C++ entrypoint for existing operator classes and adds an external C++ smoke test for `infini::ops::Add::Call`.

Motivation
Closes #593
Downstream C++ consumers should not need backend kernel headers or vendor compilers just to call existing `Operator<Op>::Call` APIs. Explicit template instantiation keeps the existing operator class API and moves backend-dependent instantiation into `libinfiniops.so`, avoiding the extra `functional` wrapper layer.

Rebase Status
- [...] `master`: `42738491b4306f247c59eba24aaffc1001885cdb` (ci: stabilize iluvatar runner and test images (#625)).
- [...] `e870e3e74b196226ec80cd8ff9f843f838a70f39`.

Type of Change
- `feat`: this does not add a new feature/operator/platform.
- `fix` - bug fix.
- `perf`: no performance-path change.
- `refactor`: not a behavior-neutral restructuring only.
- `test`: includes tests but is not test-only.
- `build` / `ci` - build system/codegen/linkage configuration.
- `docs`: not documentation-only.
- [...] `Operator<Op>::Call` surface rather than introducing an ABI break.

Platforms Affected
- `WITH_CPU`
- `WITH_NVIDIA`
- `WITH_ILUVATAR`
- `WITH_METAX`
- `WITH_CAMBRICON`
- `WITH_MOORE`
- `WITH_ASCEND`
- `WITH_TORCH`

Full Platform Test Results
All accelerator runs used card 6. Commands installed the package first, then ran bare `pytest` with no test-path or device arguments.

| `pytest` result | Runner / notes |
| --- | --- |
| 9207 passed, 8665 skipped, 81 warnings in 342.43s | `ssh nvidia`, Docker `--gpus "device=6"`, in-container `CUDA_VISIBLE_DEVICES=0` |
| 8699 passed, 7655 skipped, 81 warnings in 399.34s | `ssh metax`, `CUDA_VISIBLE_DEVICES=6` |
| 7705 passed, 8649 skipped, 81 warnings in 582.92s | `ssh iluvatar`, `CUDA_VISIBLE_DEVICES=6` |
| 8472 passed, 7900 skipped, 99 warnings in 618.44s | `ssh moore`, `MUSA_VISIBLE_DEVICES=6`; required `LD_PRELOAD=/usr/local/musa-4.3.1/lib/libomp.so` because `/usr/local/musa/lib/libomp.so` does not export `__kmpc_for_static_fini` |
| 5900 passed, 10070 skipped, 172 warnings in 978.34s | `ssh cambricon`, `MLU_VISIBLE_DEVICES=6`, command exit status `0` |
| 7398 passed, 8914 skipped, 71 warnings in 632.45s | `ssh ascend`, `ASCEND_RT_VISIBLE_DEVICES=6`; pytest reached a passing summary, but the outer Docker command wrote exit status `137` after completion, so this is recorded as an environment/teardown anomaly rather than a clean command pass |

Validation commands
Benchmark / Performance Impact
N/A — this PR changes build/codegen/linkage for C++ operator calls, not operator kernels or performance paths.
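To make the linkage change concrete, here is a minimal single-file sketch of the explicit-instantiation pattern that the Motivation section describes. It is not the project's code: `AddOp`, the simplified `Operator` class, and `CallAddFromConsumer` are hypothetical stand-ins, and the real `Call` presumably takes tensor-like arguments rather than `int`. In a real build, the three parts would live in a public header, a library source file compiled into the shared object, and a downstream consumer, respectively.

```cpp
// --- Part 1: what a public header might contain (hypothetical names) ---
struct AddOp {};  // Hypothetical operator tag, not a project type.

template <typename Op>
struct Operator {
  // Only declared here; a consumer including this header never sees the body.
  static int Call(const int& a, const int& b);
};

// Explicit instantiation declaration: suppresses implicit instantiation of
// Operator<AddOp> and promises that another translation unit provides it.
extern template struct Operator<AddOp>;

// --- Part 2: what the library's source file might contain ---
template <typename Op>
int Operator<Op>::Call(const int& a, const int& b) {
  return a + b;  // Stand-in for backend-dependent dispatch.
}

// Explicit instantiation definition: emits the symbols for Operator<AddOp>.
// It must follow the `extern template` declaration when both appear in the
// same translation unit, as they do in this single-file sketch.
template struct Operator<AddOp>;

// --- Part 3: what a downstream consumer might write ---
int CallAddFromConsumer() {
  return Operator<AddOp>::Call(2, 3);
}
```

Calling `CallAddFromConsumer()` returns 5. Because of the `extern template` declaration, a consumer's translation unit relies on the symbol emitted by the explicit instantiation definition instead of instantiating `Call` itself, which is why it would not need to see the function body or the backend headers behind it.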
Notes for Reviewers
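One of the notes in this section concerns `Operator::Call` taking `const Args&...`. The sketch below illustrates, under simplified assumptions, why that choice matters for explicit instantiation: with forwarding references the deduced `Args` differ between lvalue and rvalue call sites, while with `const Args&...` they are the same. `InstantiationId`, `CallForwarding`, and `CallConstRef` are hypothetical helpers invented for this illustration and are not project APIs.

```cpp
// Returns an address that is unique to each template specialization, because
// every specialization owns its own function-local static object.
template <typename... Args>
const void* InstantiationId() {
  static char tag;
  return &tag;
}

// Forwarding-reference style: an lvalue int deduces Args = int&, while an
// rvalue int deduces Args = int, so the two calls use different
// specializations.
template <typename... Args>
const void* CallForwarding(Args&&...) {
  return InstantiationId<Args...>();
}

// Const-reference style: both lvalue and rvalue int arguments deduce
// Args = int, so the two calls share one specialization.
template <typename... Args>
const void* CallConstRef(const Args&...) {
  return InstantiationId<Args...>();
}

// True when the lvalue and rvalue call sites hit different specializations.
bool ForwardingDiffers() {
  int x = 1;
  return CallForwarding(x) != CallForwarding(1);
}

// True when the lvalue and rvalue call sites hit the same specialization.
bool ConstRefMatches() {
  int x = 1;
  return CallConstRef(x) == CallConstRef(1);
}
```

Both `ForwardingDiffers()` and `ConstRefMatches()` return `true`. For an exported explicit instantiation this means the `const Args&...` form needs one instantiation per argument-type list, whereas a forwarding-reference form would need one per combination of value categories that callers might use.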
- [...] `<infini/ops.h>` to get the generated `extern template` declarations before calling `infini::ops::<Op>::Call`.
- [...] `Call` signatures. `Operator::Call` now takes `const Args&...` to make the instantiation signature stable across lvalue/rvalue call sites.
- [...] `137` after pytest completed. The test result is included for review transparency instead of being marked as a clean command pass.

Checklist
Title, Branch, and Commits
- [...] `<type>/xxx-yyyy-zzzz` where `<type>` matches the PR title's Conventional Commits type.
- [...] `master`.
- [...] `fixup!` / `squash!` / `wip` commits remain.

Scope and Design
- [...] `functional` plus exporting `Call` instantiations.

General Code Hygiene
C++ Specific
- `clang-format --dry-run --Werror include/infini/ops.h src/operator.h` passed on `ssh nvidia` in the PR creation pass.
- `clang-tidy` was not run in this pass; this PR does not add new kernel logic.
- [...] `clang-format`.
- [...] `new` / `delete` was introduced.

Python Specific
- `ruff check scripts/generate_wrappers.py tests/test_cpp_api.py` passed in the PR creation pass.
- `ruff format --check scripts/generate_wrappers.py tests/test_cpp_api.py` passed in the PR creation pass.
- `pytest.skip` conventions are preserved.

Testing
- `pytest` was run on all six supported accelerator platforms: NVIDIA, MetaX, Iluvatar, Moore, Cambricon, and Ascend.
- [...] `137` teardown anomaly.
- [...] `tests/`.
- `pytest.mark.auto_act_and_assert` is not applicable to this external compile/link smoke test.
- [...] `Add::Call` smoke fails on the previous behavior with an empty backend dispatch and passes with this PR.

Build, CI, and Tooling
- `pip install` / wheel build was run as part of every full-platform pytest command above.
- `compile_commands.json` regeneration was not separately checked.
- `clang-format.yml` and `ruff.yml` checks passed for touched files in the PR creation pass.

Documentation
- [...] `<infini/ops.h>` as the entrypoint.

Security and Safety