CuTe DSL is an embedded domain-specific language within Python. It utilizes a subset of Python's syntax to provide a streamlined programming experience. It is important to understand that CuTe DSL does NOT implement the complete Python language semantics in its JIT compilation process.
- Python Native Data Types
CuTe DSL supports Python data structures when used for "meta-programming," but these structures cannot be treated as dynamic values modifiable at runtime. For instance, lists and dictionaries can be used to configure kernel parameters during compilation or serve as containers for dynamic values, but their structure and organization cannot be altered during kernel execution.
- Static Values:
  - Evaluated during JIT compilation phase
  - Immutable after compilation completes
  - Most Python native types (lists, tuples, dictionaries) are processed as static values
  - Primarily utilized for "meta-programming" and configuration purposes
  - Example: Lists can contain dynamic values but their structure cannot be modified during kernel execution
- Dynamic Values:
  - Evaluated during runtime execution
  - Modifiable during execution of JIT-compiled functions
  - Only a specific subset of Python types is supported as dynamic values
  - Primitive types are automatically converted when passed as function arguments:
    - `int` → `Int32` (may be updated to `Int64` in future releases)
    - `bool` → `Bool`
    - `float` → `Float32` (may be updated to `Float64` in future releases)
The JIT compiler processes Python native types analogously to C++ template parameters. The compiled code cannot manipulate dynamic values of composite types such as lists, tuples, or dictionaries.
For example, the following code does not behave like an ordinary Python program inside a JIT function:
    @cute.jit
    def foo(a: Float32, b: Float32, i: Int32, res: cute.Tensor):
        xs = [a, b]
        # Indexing a list with a dynamic index is not supported in CuTe DSL:
        res[0] = xs[i]

        if i == 0:
            # This always appends Float32(3.0) to the list, regardless
            # of the runtime value of `i`
            xs.append(Float32(3.0))

        for i in range(10):
            # This appends only one element to the list at compile time,
            # because the loop is not unrolled at compile time
            xs.append(Float32(1.0))
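The behavior described in the comments above follows from how a tracing compiler interacts with Python containers. The sketch below is a toy model in plain Python, not CuTe DSL's actual implementation: `DynamicValue`, `trace_if`, and `trace_for` are hypothetical names invented for illustration. It assumes the compiler visits the body of each `if` and `for` exactly once to record its operations, so Python-level side effects such as `list.append` happen once at trace time, whatever the runtime values turn out to be.

```python
class DynamicValue:
    """Hypothetical placeholder for a value that is only known at run time."""

    def __init__(self, name):
        self.name = name


def trace_if(cond, then_body):
    # The tracer cannot evaluate `cond`, so it visits the body once to
    # record its operations. Python side effects run unconditionally.
    then_body()


def trace_for(trip_count, loop_body):
    # The loop stays a loop in the generated code (no unrolling), so the
    # body is visited once with a placeholder induction variable.
    loop_body(DynamicValue("i"))


xs = [DynamicValue("a"), DynamicValue("b")]
i = DynamicValue("i")

trace_if(i, lambda: xs.append(DynamicValue("3.0")))
trace_for(10, lambda _i: xs.append(DynamicValue("1.0")))

print(len(xs))  # 4: each traced body contributed exactly one append
```

In ordinary Python the same program would end with 12 or 13 elements depending on `i`; in this model the list always ends with four, which is why list structure is best treated as fixed once tracing begins.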
- Python Function
The DSL currently does not implement support for return values from Python functions, although this capability is planned for future releases.
Example:
    @cute.jit
    def foo():
        return 1  # Currently unsupported in CuTe DSL
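Until return values are supported, one way to get a result out of a JIT function is to write it into an output argument, as the `res: cute.Tensor` parameters elsewhere in this section do. The sketch below shows the pattern in plain Python, with a list standing in for the tensor. `foo_out` is a hypothetical name; the real version would presumably be decorated with `@cute.jit` and take a `cute.Tensor`.

```python
def foo_out(res):
    # Instead of `return 1`, store the result in the caller-provided buffer.
    res[0] = 1


res = [0]      # stand-in for a one-element output tensor
foo_out(res)
print(res[0])  # 1
```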
- Expression or Statement with Dependent Type
CuTe DSL implements static typing and does not support dependent types. The type of each expression must be determinable during compile time, in contrast to standard Python which implements dynamic typing.
Example illustrating functionality in Python that is not supported in the DSL:
    # Valid in standard Python, but unsupported in CuTe DSL
    max(int(1), float(2.0))  # => 2.0 : float
    max(int(3), float(2.0))  # => 3 : int
In CuTe DSL, operand types are promoted to a common type instead. For example:
    @cute.jit
    def foo(a: Int32, b: Float32, res: cute.Tensor):
        res[0] = max(a, b)  # Type is automatically promoted to Float32
The following code, which uses an inline if-else expression with dependent types, is not supported in CuTe DSL:
    @cute.jit
    def foo(cond: Boolean, a: Int32, b: Float32, res: cute.Tensor):
        res[0] = a if cond else b
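A common remedy is to convert both branches to one type, so the result type no longer depends on `cond`. The sketch below shows the idea in plain Python, with `int` and `float` standing in for `Int32` and `Float32`. `select` is a hypothetical helper, and the spelling of the equivalent conversion inside a JIT function is an assumption not shown here.

```python
def select(cond: bool, a: int, b: float) -> float:
    # Convert `a` up front so both branches yield a float.
    return float(a) if cond else b


print(type(select(True, 3, 2.0)).__name__)   # float
print(type(select(False, 3, 2.0)).__name__)  # float
```

Because the result type is the same on both paths, it can be determined without knowing the value of `cond`.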
- Control Flow
The DSL transforms Python control flow statements (`if`, `for`, `while`) during Abstract Syntax Tree (AST) processing into structured control flow in MLIR, which has the same constraints as dependent types. For instance, changing the type of a variable in a loop body is not allowed.
- Variables must be defined prior to the control flow statement
- Type consistency must be maintained throughout the control flow statement
- Early exit or return from if-else statements is not supported
Example illustrating functionality in Python that is not supported in the DSL:
    @cute.jit
    def foo():
        a = Int32(1)
        for i in range(10):
            a = Float32(2)  # Changing type inside loop-body is not allowed in the DSL
- Built-in Operators
The DSL transforms built-in operators such as `and`, `or`, `max`, `min`, etc. into MLIR operations. They are subject to the same constraints as dependent types. For instance, `a and b` requires `a` and `b` to be of the same type.
- Special Variables
The DSL treats `_` as a special variable whose value is meant to be ignored. Reading `_` is not allowed in the DSL.
Example illustrating functionality in Python that is not supported in the DSL:
    @cute.jit
    def foo():
        _ = 1
        print(_)  # This is not allowed in the DSL
- Object Oriented Programming
The DSL is implemented on top of Python and supports Python's object-oriented programming (OOP) features for meta-programming at compile-time.
However, as with other composite data types, the DSL provides limited support for OOP when objects contain dynamic values. It is strongly recommended to avoid passing dynamic values between member methods through class state in your code.
The following example illustrates functionality in Python that is not supported in the DSL without implementing the `DynamicExpression` protocol:
    class Foo:
        def __init__(self, a: Int32):
            self.a = a

        def set_a(self, i: Int32):
            self.a = i

        def get_a(self):
            return self.a

    @cute.jit
    def foo(a: Int32, res: cute.Tensor):
        foo = Foo(a)
        for i in range(10):
            foo.set_a(i)

        # This fails to compile because `a` is assigned a local value defined
        # within the for-loop body and is not visible outside of the loop body
        res[0] = foo.get_a()

The example above fails to compile because `Foo.a` is assigned a local value defined within the for-loop body, which is not visible outside the loop body.
The CuTe DSL implements an internal mechanism that provides limited support for OOP patterns via a protocol. Because the DSL continues to evolve to support additional features, this mechanism is subject to change; for better portability, using it directly in user code is not recommended.
- CuTe Layout algebra in native Python
All CuTe Layout algebra operations and APIs require JIT compilation. They are available only within JIT-compiled functions and cannot be used in standard Python execution environments.
Additionally, only a restricted set of data types can be passed as arguments to JIT-compiled functions, which further constrains their usage in native Python contexts. Only the following CuTe algebra types are supported as JIT function arguments: `Tensor`, `Pointer`, `Shape`, `Stride`, `Coord` and `IntTuple`. For `Stride`, `ScaledBasis` is not supported from a native Python context. In the first release, passing `Layout` from a native Python context is also not supported.
For reliable and predictable results:
- Avoid dependent types in your code
- Implement explicit type conversion for dynamic values
- Clearly distinguish between static (compile-time) and dynamic (runtime) values
- Use type annotations wherever possible so that the JIT compiler can identify types without ambiguity
    # Example demonstrating explicit typing
    alpha = 1.0  # Explicitly defined as float using `1.0` instead of `1`
                 # or `float(1)`
    beta = 2.0   # Explicitly defined as float
    result = max(alpha, beta)  # Will correctly perform float comparison