refactor: extract generator modules and add shared library factory functions#52
Draft
Move `OperatorExtractor`, `Operator`, `snake_to_pascal`, and `get_all_ops` out of `generate_wrappers.py` into a reusable module.
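As an illustration of the kind of helper being moved, here is a minimal sketch of what a function like `snake_to_pascal` could look like. The body is an assumption; the actual implementation in the PR is not shown here.

```python
def snake_to_pascal(name: str) -> str:
    """Convert a snake_case operator name to PascalCase (illustrative sketch)."""
    # Skip empty parts so leading or doubled underscores do not break the result.
    return "".join(part.capitalize() for part in name.split("_") if part)


if __name__ == "__main__":
    for op in ("gemm", "rms_norm", "causal_softmax"):
        print(op, "->", snake_to_pascal(op))
```

Note that a plain conversion like this yields `RmsNorm`, which is why a separate override is needed wherever InfiniCore spells a name differently.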
Also fix the generated output to be compatible with InfiniCore:
- Use `__INFINI_C` instead of `__C`.
- Use `InfiniopDescriptor *` as the descriptor base type.
- Include `../operator_descriptor.h` instead of InfiniOps base headers.
- Use PascalCase for C API names (e.g. `infiniopCreateGemmDescriptor`).
- Set stream/workspace via `Handle` setters instead of passing as arguments.
- Use `reinterpret_cast` for descriptor type conversions.
Without `inline`, this function defined in a header causes multiple definition errors when included from multiple translation units.
…enerator
The generated `operator.cc` now guards device implementation includes (e.g. CUDA headers) with `ENABLE_*_API` preprocessor checks, matching InfiniCore's conditional compilation pattern.
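A rough sketch of how a generator might emit such a guarded include. The function name `guarded_include`, the `#ifdef` form, and the header path are assumptions for illustration; only the `ENABLE_*_API` macro pattern comes from the description above.

```python
def guarded_include(device: str, header: str) -> str:
    """Wrap a device-specific #include in an ENABLE_<DEVICE>_API guard."""
    macro = f"ENABLE_{device.upper()}_API"
    return f'#ifdef {macro}\n#include "{header}"\n#endif  // {macro}'


if __name__ == "__main__":
    # "cuda/gemm.h" is a hypothetical path, not taken from the repository.
    print(guarded_include("cuda", "cuda/gemm.h"))
```

With this shape, a build that does not define the macro never sees the device header, so the file still compiles without the device toolchain installed.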
InfiniCore uses non-standard casing for some operators (e.g. `RMSNorm` instead of `RmsNorm`). Add override mapping to preserve compatibility.
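One way such an override mapping could work is sketched below. The dictionary name `_C_API_NAME_OVERRIDES`, its keys, and both helper functions are hypothetical; the `infiniopCreate{Name}Descriptor` pattern is inferred from the `infiniopCreateGemmDescriptor` example in the earlier commit.

```python
# Hypothetical mapping: generated PascalCase name -> casing InfiniCore expects.
_C_API_NAME_OVERRIDES = {"RmsNorm": "RMSNorm"}


def c_api_name(pascal_name: str) -> str:
    """Return the override if one exists, otherwise the name unchanged."""
    return _C_API_NAME_OVERRIDES.get(pascal_name, pascal_name)


def create_descriptor_symbol(pascal_name: str) -> str:
    """Build the C API symbol for the descriptor-creation function."""
    return f"infiniopCreate{c_api_name(pascal_name)}Descriptor"


if __name__ == "__main__":
    print(create_descriptor_symbol("Gemm"))
    print(create_descriptor_symbol("RmsNorm"))
```

Keeping the exceptions in a single table means operators with regular names need no extra configuration.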
…adata in `operator()`
The `operator()` overloads should only use data pointers from call-time `Tensor` arguments, not metadata like `dtype()`. This ensures correct behavior when the C API bridge passes data-only tensors.
- `Gemm`: use `a_type_`/`b_type_`/`c_type_` instead of `a.dtype()`/etc. in both cuBLAS and cuBLASLt implementations
- `CausalSoftmax`: use `dtype_` instead of `out.dtype()`
- `RmsNorm`: add `dtype_` member to base class, use it instead of `out.dtype()` in CUDA and CPU implementations
Replace `constructors[-1]`/`calls[-1]` with configurable index overrides (`_CONSTRUCTOR_INDEX_OVERRIDES`, `_CALL_INDEX_OVERRIDES`) so each operator can specify which constructor and `operator()` overload to use for the generated C API. For example, `RmsNorm` uses constructor index `0` to include the `eps` parameter.
Add `_generate_shared_lib.py` that generates non-template `Make{Op}`
factory functions wrapping `Operator<Key>::make()`. These are compiled
into `libinfiniops.so`, allowing consumers to construct operators
without needing device-specific headers (e.g., CUDA).
Generated files:
- `generated/src/{op}/make.cc`: factory function definitions
- `generated/include/make.h`: combined header with declarations
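A small sketch of how the output paths and factory names above might be derived from an operator name. The helper names are hypothetical, and it is an assumption that `{op}` in the path is the snake_case name and that `Make{Op}` uses the plain PascalCase form.

```python
from pathlib import PurePosixPath

# Combined header path, as listed in the commit message.
MAKE_HEADER_PATH = PurePosixPath("generated/include/make.h")


def make_cc_path(op: str) -> PurePosixPath:
    """Path of the per-operator factory source (assumes snake_case {op})."""
    return PurePosixPath("generated/src") / op / "make.cc"


def factory_name(op: str) -> str:
    """Name of the non-template factory function, following Make{Op}."""
    pascal = "".join(part.capitalize() for part in op.split("_") if part)
    return f"Make{pascal}"


if __name__ == "__main__":
    for op in ("gemm", "rms_norm"):
        print(factory_name(op), "->", make_cc_path(op))
```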
Update the legacy C generator to call `Make{Op}` factory functions
from `libinfiniops.so` instead of `Operator<Key>::make()`. The
generated wrappers no longer include device-specific headers and
can be compiled with g++ (no CUDA needed).
Summary
- Split `generate_wrappers.py` into separate modules: `_operator_utils.py`, `_generate_pybind11.py`, `_generate_legacy_c.py`, and `_generate_shared_lib.py`.
- Add a shared library generator (`_generate_shared_lib.py`) that produces non-template `Make{Op}()` functions compiled into `libinfiniops.so`, enabling InfiniCore to call InfiniOps operators without needing device-specific headers or `nvcc`.
- Update the legacy C generator to use `Make{Op}()` and `OperatorBase` casts instead of `Operator<T>::make()` and `Operator<T>*` casts.
- Use stored members (`dtype_`, `c_type_`) instead of call-time tensor metadata in `operator()` implementations.
- Fixes: `#include` for `RuntimeUtils`, `inline` on `IndexToOffset`, `#ifdef` guards for device-specific includes, C API name overrides for `RMSNorm`/`SwiGLU`.