refactor: extract generator modules and add shared library factory functions #52

Draft
voltjia wants to merge 15 commits into master from refactor/shared-lib


@voltjia commented Apr 13, 2026

Summary

  • Split generate_wrappers.py into separate modules: _operator_utils.py, _generate_pybind11.py, _generate_legacy_c.py, and _generate_shared_lib.py.
  • Add a factory-function generator (_generate_shared_lib.py) that produces non-template Make{Op}() functions compiled into libinfiniops.so, so InfiniCore can call InfiniOps operators without device-specific headers or nvcc.
  • Update the legacy C generator to use the factory functions and OperatorBase casts instead of Operator<T>::make() and Operator<T>* casts.
  • Use stored member variables (e.g. dtype_, c_type_) instead of call-time tensor metadata in operator() implementations.
  • Fix miscellaneous issues: a missing #include for RuntimeUtils, a missing inline on IndexToOffset, #ifdef guards for device-specific includes, and C API name overrides for RMSNorm/SwiGLU.

voltjia added 15 commits April 9, 2026 03:27
Move `OperatorExtractor`, `Operator`, `snake_to_pascal`, and `get_all_ops`
out of `generate_wrappers.py` into a reusable module.

Also fix the generated output to be compatible with InfiniCore:
- Use `__INFINI_C` instead of `__C`.
- Use `InfiniopDescriptor *` as the descriptor base type.
- Include `../operator_descriptor.h` instead of InfiniOps base headers.
- Use PascalCase for C API names (e.g. `infiniopCreateGemmDescriptor`).
- Set the stream and workspace via `Handle` setters instead of passing them as arguments.
- Use `reinterpret_cast` for descriptor type conversions.
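
A minimal C++ sketch of the PascalCase naming rule, for illustration only: the real `snake_to_pascal` lives in the Python generator, and the `infiniop` + verb + operator + suffix layout is generalized from the single `infiniopCreateGemmDescriptor` example above. Operator names other than `gemm` are hypothetical inputs.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Convert a snake_case operator name (e.g. "causal_softmax") to PascalCase
// ("CausalSoftmax"): drop underscores and upper-case the following letter.
std::string SnakeToPascal(const std::string& snake) {
  std::string pascal;
  bool upper_next = true;
  for (char ch : snake) {
    if (ch == '_') {
      upper_next = true;
      continue;
    }
    pascal += upper_next
                  ? static_cast<char>(std::toupper(static_cast<unsigned char>(ch)))
                  : ch;
    upper_next = false;
  }
  return pascal;
}

// Build a C API entry-point name such as "infiniopCreateGemmDescriptor".
// The prefix/verb/suffix layout is an assumption based on one example.
std::string CApiName(const std::string& verb, const std::string& op_snake,
                     const std::string& suffix) {
  return "infiniop" + verb + SnakeToPascal(op_snake) + suffix;
}
```

Note that plain PascalCase turns `rms_norm` into `RmsNorm`, which is exactly where InfiniCore's own casing diverges.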

Without `inline`, `IndexToOffset`, which is defined in a header, causes multiple-definition errors at link time when that header is included from multiple translation units.

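
To illustrate, here is a header-style definition marked `inline`. The signature (a flat element index mapped through `shape` and `strides` to a memory offset) is an assumption for the sketch, not necessarily the one in InfiniOps; the point is that `inline` lets every translation unit carry an identical copy that the linker merges instead of rejecting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical header contents. `inline` permits one identical definition per
// translation unit (the one-definition rule's exception for inline functions).
inline std::int64_t IndexToOffset(std::size_t flat_index,
                                  const std::vector<std::size_t>& shape,
                                  const std::vector<std::int64_t>& strides) {
  std::int64_t offset = 0;
  // Walk from the innermost dimension outward, peeling off one coordinate
  // at a time and weighting it by that dimension's stride.
  for (std::size_t dim = shape.size(); dim-- > 0;) {
    const std::size_t coord = flat_index % shape[dim];
    flat_index /= shape[dim];
    offset += static_cast<std::int64_t>(coord) * strides[dim];
  }
  return offset;
}
```

Without the `inline` keyword, linking two `.cc` files that both include this header would fail with a multiple-definition error for `IndexToOffset`.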
…enerator

The generated `operator.cc` now guards device implementation includes
(e.g. CUDA headers) with `ENABLE_*_API` preprocessor checks, matching
InfiniCore's conditional compilation pattern.
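
A standalone sketch of the guard pattern. `ENABLE_CPU_API` and `ENABLE_NVIDIA_API` are assumed instances of the `ENABLE_*_API` scheme, the header paths are hypothetical (left as comments so the file compiles anywhere), and the `constexpr` flags exist only to make the effect observable.

```cpp
#include <cassert>

// Backend macros normally come from the build (e.g. -DENABLE_CPU_API).
// One is defined here only so this sketch compiles on its own.
#ifndef ENABLE_CPU_API
#define ENABLE_CPU_API
#endif

#ifdef ENABLE_CPU_API
// #include "cpu/gemm/gemm.h"      // hypothetical path; plain C++
constexpr bool kCpuCompiledIn = true;
#else
constexpr bool kCpuCompiledIn = false;
#endif

#ifdef ENABLE_NVIDIA_API
// #include "cuda/gemm/gemm.cuh"   // hypothetical path; would need nvcc
constexpr bool kNvidiaCompiledIn = true;
#else
constexpr bool kNvidiaCompiledIn = false;
#endif
```

When a backend's macro is not defined, its header is never seen by the compiler, so a CPU-only build of `operator.cc` does not need the CUDA toolchain.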

InfiniCore uses non-standard casing for some operator names in its C API (e.g. `RMSNorm` instead of `RmsNorm`). Add an override mapping so the generated names stay compatible.

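
Sketched in C++ (the actual mapping lives in the Python generator), the override is a lookup consulted before falling back to the default PascalCase name. Only `RmsNorm` to `RMSNorm` is taken from the commit; the `Swiglu` key assumes that is how InfiniOps spells the SwiGLU operator.

```cpp
#include <cassert>
#include <map>
#include <string>

// PascalCase InfiniOps name -> casing InfiniCore expects in C API symbols.
// The "Swiglu" key is a hypothetical spelling.
const std::map<std::string, std::string>& CApiNameOverrides() {
  static const std::map<std::string, std::string> overrides = {
      {"RmsNorm", "RMSNorm"},
      {"Swiglu", "SwiGLU"},
  };
  return overrides;
}

// Return the override if one exists, otherwise the default name unchanged.
std::string CApiOpName(const std::string& pascal_name) {
  const auto& overrides = CApiNameOverrides();
  const auto it = overrides.find(pascal_name);
  return it != overrides.end() ? it->second : pascal_name;
}
```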
…adata in `operator()`

The `operator()` overloads should only use data pointers from call-time
`Tensor` arguments, not metadata like `dtype()`. This ensures correct
behavior when the C API bridge passes data-only tensors.

- `Gemm`: use `a_type_`/`b_type_`/`c_type_` instead of `a.dtype()`/etc.
  in both cuBLAS and cuBLASLt implementations
- `CausalSoftmax`: use `dtype_` instead of `out.dtype()`
- `RmsNorm`: add `dtype_` member to base class, use it instead of
  `out.dtype()` in CUDA and CPU implementations
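
The pattern in miniature, using hypothetical, simplified `Tensor`/`DataType` types and a toy 1-D `RmsNorm` without a weight: metadata is captured at construction, and `operator()` reads only data pointers, so a data-only tensor with a stale or default `dtype` still dispatches correctly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical, simplified stand-ins for InfiniOps' Tensor and DataType.
enum class DataType { kFloat16, kFloat32 };

struct Tensor {
  void* data = nullptr;
  DataType dtype = DataType::kFloat32;
  std::size_t numel = 0;
};

// Toy RmsNorm: y[i] = x[i] / sqrt(mean(x^2) + eps).
class RmsNorm {
 public:
  // Metadata (dtype, element count) is captured once, here.
  RmsNorm(const Tensor& input, const Tensor& out, float eps)
      : dtype_{out.dtype}, numel_{input.numel}, eps_{eps} {}

  // Only input.data and out.data are read from the call-time tensors.
  void operator()(const Tensor& input, Tensor& out) const {
    if (dtype_ != DataType::kFloat32) {  // dispatch on dtype_, not out.dtype
      throw std::runtime_error("toy sketch only handles float32");
    }
    const auto* x = static_cast<const float*>(input.data);
    auto* y = static_cast<float*>(out.data);
    float sum_sq = 0.0f;
    for (std::size_t i = 0; i < numel_; ++i) sum_sq += x[i] * x[i];
    const float inv_rms =
        1.0f / std::sqrt(sum_sq / static_cast<float>(numel_) + eps_);
    for (std::size_t i = 0; i < numel_; ++i) y[i] = x[i] * inv_rms;
  }

 private:
  DataType dtype_;
  std::size_t numel_;
  float eps_;
};
```

Passing call-time tensors whose `dtype` and `numel` fields are wrong does not change the result, which is the property a data-only C API bridge relies on.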

Replace `constructors[-1]`/`calls[-1]` with configurable index overrides
(`_CONSTRUCTOR_INDEX_OVERRIDES`, `_CALL_INDEX_OVERRIDES`) so each operator
can specify which constructor and `operator()` overload to use for the
generated C API. For example, `RmsNorm` uses constructor index `0` to
include the `eps` parameter.
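
The selection rule, rendered in C++ for illustration (the real tables are Python dicts in the generator). Falling back to the last overload when no override is present assumes that operators without an entry keep the previous `[-1]` behavior; only the `RmsNorm` entry comes from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Counterpart of `_CONSTRUCTOR_INDEX_OVERRIDES`; `_CALL_INDEX_OVERRIDES`
// would be a second table of the same shape for operator() overloads.
const std::map<std::string, std::size_t> kConstructorIndexOverrides = {
    {"RmsNorm", 0},  // index 0 is the constructor that takes `eps`
};

// Pick which overload the generated C API wraps.
std::size_t SelectOverload(const std::map<std::string, std::size_t>& overrides,
                           const std::string& op, std::size_t overload_count) {
  assert(overload_count > 0);
  const auto it = overrides.find(op);
  // Assumed default: the last overload, as `constructors[-1]` used to pick.
  return it != overrides.end() ? it->second : overload_count - 1;
}
```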

Add `_generate_shared_lib.py` that generates non-template `Make{Op}`
factory functions wrapping `Operator<Key>::make()`. These are compiled
into `libinfiniops.so`, allowing consumers to construct operators
without needing device-specific headers (e.g., CUDA).

Generated files:
- `generated/src/{op}/make.cc`: factory function definitions
- `generated/include/make.h`: combined header with declarations
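
A minimal sketch of the shape of a generated factory, assuming `Operator<Key>::make()` returns a `std::unique_ptr` to a class derived from `OperatorBase`. `GemmKey`, the `alpha`-only constructor, and the return types are hypothetical stand-ins for the real operator parameters.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; the real InfiniOps signatures may differ.
class OperatorBase {
 public:
  virtual ~OperatorBase() = default;
};

struct GemmKey {};  // hypothetical dispatch key

template <typename Key>
class Operator;  // primary template; specializations live in device headers

template <>
class Operator<GemmKey> : public OperatorBase {
 public:
  explicit Operator(float alpha) : alpha_{alpha} {}
  static std::unique_ptr<Operator> make(float alpha) {
    return std::make_unique<Operator>(alpha);
  }
  float alpha() const { return alpha_; }

 private:
  float alpha_;
};

// generated/include/make.h: a plain, non-template declaration.
std::unique_ptr<OperatorBase> MakeGemm(float alpha);

// generated/src/gemm/make.cc: the only file that needs the full
// Operator<Key> definition (and therefore the device headers).
std::unique_ptr<OperatorBase> MakeGemm(float alpha) {
  return Operator<GemmKey>::make(alpha);
}
```

Only `make.cc` needs to see the template and its device specializations; consumers include `make.h` and link against `libinfiniops.so`.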

Update the legacy C generator to call `Make{Op}` factory functions
from `libinfiniops.so` instead of `Operator<Key>::make()`. The
generated wrappers no longer include device-specific headers and
can be compiled with g++ (no CUDA needed).
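
A sketch of what such a generated wrapper could look like. `OperatorBase::Run`, the `ToyGemm` stand-in behind `MakeGemm`, the `int` status codes, the stand-in `InfiniopDescriptor`, and the simplified parameter lists are all hypothetical (the real InfiniCore entry points take more arguments, such as tensor descriptors). The point is that the wrapper needs only the factory declaration and the `OperatorBase` interface, not device headers.

```cpp
#include <cassert>
#include <memory>

// ---- What make.h would expose (hypothetical interface) ----
class OperatorBase {
 public:
  virtual ~OperatorBase() = default;
  virtual void Run(const float* a, const float* b, float* c) const = 0;
};
std::unique_ptr<OperatorBase> MakeGemm(float alpha);

// ---- Toy stand-in for the symbol exported by libinfiniops.so ----
namespace {
class ToyGemm : public OperatorBase {  // 1x1 "GEMM": c = alpha * a * b
 public:
  explicit ToyGemm(float alpha) : alpha_{alpha} {}
  void Run(const float* a, const float* b, float* c) const override {
    *c = alpha_ * (*a) * (*b);
  }

 private:
  float alpha_;
};
}  // namespace
std::unique_ptr<OperatorBase> MakeGemm(float alpha) {
  return std::make_unique<ToyGemm>(alpha);
}

// ---- Generated wrapper ----
struct InfiniopDescriptor;  // opaque C-facing type (stand-in)

namespace {
struct GemmDescriptor {
  std::unique_ptr<OperatorBase> op;  // held as OperatorBase, not Operator<Key>
};
}  // namespace

extern "C" int infiniopCreateGemmDescriptor(InfiniopDescriptor** desc_out,
                                            float alpha) {
  auto* desc = new GemmDescriptor{MakeGemm(alpha)};
  *desc_out = reinterpret_cast<InfiniopDescriptor*>(desc);
  return 0;  // hypothetical success code
}

extern "C" int infiniopGemm(InfiniopDescriptor* desc, const float* a,
                            const float* b, float* c) {
  reinterpret_cast<GemmDescriptor*>(desc)->op->Run(a, b, c);
  return 0;
}

extern "C" int infiniopDestroyGemmDescriptor(InfiniopDescriptor* desc) {
  delete reinterpret_cast<GemmDescriptor*>(desc);
  return 0;
}
```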