Commit 413d057

Update docs
1 parent 12eac82 commit 413d057

11 files changed

Lines changed: 112 additions & 73 deletions

cext/tile_kernel.cpp

Lines changed: 6 additions & 1 deletion
@@ -1728,7 +1728,12 @@ static PyMethodDef functions[] = {
     {"launch", reinterpret_cast<PyCFunction>(cuda_tile_launch), METH_FASTCALL,
      LAUNCH_SIGNATURE "\n"
      "--\n\n"
-     "Launch a cuTile kernel."
+     "Launch a cuTile kernel.\n\n"
+     "Args:\n"
+     " stream: The CUDA stream to execute the |kernel| on.\n"
+     " grid: Tuple of up to 3 grid dimensions to execute the |kernel| over.\n"
+     " kernel: The |kernel| to execute.\n"
+     " kernel_args: Positional arguments to pass to the kernel.\n"
     },
     NULL
 };

docs/source/data.rst

Lines changed: 8 additions & 8 deletions
@@ -139,16 +139,16 @@ to a common dtype using the following process:
 Scalars
 -------
 
-A *scalar* is a single immutable value of a specific |data type|.
+A *scalar* is a single immutable value of a specific |data type|. A *scalar* and a *0D-tile*
+can be used interchangeably in a tile |kernel|. They can also be |kernel| parameters.
 
-|Scalars| can be used in |host code| and |tile code|.
-They can be |kernel| parameters.
+Scalar typing follows these rules:
 
-.. toctree::
-   :maxdepth: 2
-   :hidden:
-
-   data/scalar
+- Constant scalars are |loosely typed| by default, for example, a literal ``2`` or
+  a constant property like ``Tile.ndim``, ``Tile.shape``, or ``Array.ndim``.
+- ``Array.shape`` and ``Array.strides`` are not constant by default and have the default integer type ``int32``.
+  Using the default ``int32`` makes kernels more performant at the cost of limiting the maximum representable shape.
+  This limitation will be lifted in the near future.
 
 Tuples
 ------
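
The data.rst change above states that ``Array.shape`` defaults to ``int32``, which caps the largest representable extent. As a rough illustration in plain Python (not cuTile code; `INT32_MAX` and `fits_int32` are hypothetical names introduced here, not part of `cuda.tile`), this sketch checks whether every dimension of a shape fits in a signed 32-bit integer:

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


def fits_int32(shape):
    """Return True if every extent in `shape` is representable as int32.

    Hypothetical helper for illustration; not part of cuda.tile.
    """
    return all(0 <= dim <= INT32_MAX for dim in shape)


print(fits_int32((32, 16)))   # a small 2D shape
print(fits_int32((2**31,)))   # one element past the int32 limit
```

A host-side check along these lines is one way to catch shapes that would exceed the current limit before launching a kernel.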

docs/source/data/scalar.rst

Lines changed: 0 additions & 14 deletions
This file was deleted.

docs/source/interoperability.rst

Lines changed: 0 additions & 6 deletions
@@ -28,9 +28,3 @@ This includes:
 - Passing the same kinds of arrays to both tile and SIMT kernels.
 
 Inter-kernel interoperability shall be supported.
-
-Hybrid Kernels
-~~~~~~~~~~~~~~
-
-Hybrid kernels refer to kernels that contain both |tile code| and |SIMT code|.
-Hybrid kernels are not yet supported.

docs/source/quickstart.rst

Lines changed: 5 additions & 5 deletions
@@ -16,11 +16,11 @@ Prerequisites
 
 cuTile Python requires the following:
 
+- Linux x86_64, Linux aarch64 or Windows x86_64
 - A GPU with compute capability 10.x or 12.x
 - NVIDIA Driver r580 or later
 - CUDA Toolkit 13.1 or later
-- The `PATH` environment variable must contain the path to the `bin/` directory of the CUDA Toolkit
-- Python version 3.10 or higher
+- Python version 3.10, 3.11, 3.12 or 3.13
 
 
 Installing cuTile Python
@@ -47,9 +47,9 @@ The quickstart sample on this page uses cupy, which can be installed with:
 
 The cuTile Python samples in the ``samples/`` directory also use pytest, torch, and numpy packages.
 
-For torch installation instructions, see `<https://pytorch.org/get-started/locally/>`__.
+For PyTorch installation instructions, see `<https://pytorch.org/get-started/locally/>`__.
 
-Pytest and numpy can be installed with:
+Pytest and NumPy can be installed with:
 
 .. code-block:: bash
 
@@ -68,7 +68,7 @@ This example shows a structure common to cuTile kernels:
 * Write the resulting tile(s) out to GPU memory
 
 In this case, the kernel loads tiles from two vectors, ``a`` and ``b``. These loads create tiles called ``a_tile`` and ``b_tile``. These tiles are added together to form a third tile, called ``result``. In the last step, the kernel stores the ``result`` tile to the output vector ``c``.
-This code can be found in the cuTile Python repository at ``samples/quickstart/VectorAdd_quickstart.py``.
+More samples can be found in the cuTile Python `repository <https://github.com/nvidia/cutile-python>`_.
 
 .. literalinclude:: ../../samples/quickstart/VectorAdd_quickstart.py
    :language: python

docs/source/references.rst

Lines changed: 5 additions & 4 deletions
@@ -47,6 +47,7 @@
 .. |constant embedded| replace:: :ref:`constant embedded <execution:Constant Embedding>`
 .. |Constant embedded| replace:: :ref:`Constant embedded <execution:Constant Embedding>`
 
+.. |loosely typed| replace:: :ref:`loosely typed <execution:Constant Expressions & Objects>`
 .. |loosely typed numeric constant| replace:: :ref:`loosely typed numeric constant <execution:Constant Expressions & Objects>`
 .. |loosely typed numeric constants| replace:: :ref:`loosely typed numeric constants <execution:Constant Expressions & Objects>`
 
@@ -101,10 +102,10 @@
 .. |tile spaces| replace:: :ref:`tile spaces <data:Element & Tile Space>`
 .. |Tile spaces| replace:: :ref:`Tile spaces <data:Element & Tile Space>`
 
-.. |scalar| replace:: :ref:`scalar <data/scalar:cuda.tile.Scalar>`
-.. |Scalar| replace:: :ref:`Scalar <data/scalar:cuda.tile.Scalar>`
-.. |scalars| replace:: :ref:`scalars <data/scalar:cuda.tile.Scalar>`
-.. |Scalars| replace:: :ref:`Scalars <data/scalar:cuda.tile.Scalar>`
+.. |scalar| replace:: :ref:`scalar <data:Scalars>`
+.. |Scalar| replace:: :ref:`Scalar <data:Scalars>`
+.. |scalars| replace:: :ref:`scalars <data:Scalars>`
+.. |Scalars| replace:: :ref:`Scalars <data:Scalars>`
 
 .. |Rounding Modes| replace:: :ref:`Rounding Modes <data:Rounding Modes>`
 .. |Padding Modes| replace:: :ref:`Padding Modes <data:Padding Modes>`

samples/quickstart/VectorAdd_quickstart.py

Lines changed: 1 addition & 3 deletions
@@ -7,8 +7,6 @@
 Shows how to perform elementwise operations on vectors.
 """
 
-from math import ceil
-
 import cupy as cp
 import numpy as np
 import cuda.tile as ct
@@ -34,7 +32,7 @@ def test():
     # Create input data
     vector_size = 2**12
     tile_size = 2**4
-    grid = (ceil(vector_size / tile_size), 1, 1)
+    grid = (ct.cdiv(vector_size, tile_size), 1, 1)
 
     a = cp.random.uniform(-1, 1, vector_size)
     b = cp.random.uniform(-1, 1, vector_size)
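
The hunk above replaces `ceil(vector_size / tile_size)` with `ct.cdiv(vector_size, tile_size)`. The commit does not show how `ct.cdiv` is implemented; assuming it performs ordinary ceiling division on positive integers, a pure-Python stand-in (the local `cdiv` below is hypothetical, not the library function) yields the same grid size for the quickstart values without going through floating-point division:

```python
from math import ceil


def cdiv(a: int, b: int) -> int:
    """Ceiling division using integer arithmetic only.

    Hypothetical stand-in for illustration; not the cuda.tile implementation.
    """
    if b <= 0:
        raise ValueError("divisor must be positive")
    return -(-a // b)


vector_size = 2**12
tile_size = 2**4

# One block per tile along the first grid axis, as in the quickstart sample.
grid = (cdiv(vector_size, tile_size), 1, 1)

# For these sizes the float-based form from the old code agrees.
assert grid[0] == ceil(vector_size / tile_size)
print(grid)
```

Integer-only arithmetic also stays exact for operands too large to be represented exactly as floats, where `ceil(a / b)` can round incorrectly.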

src/cuda/tile/_cext.pyi

Lines changed: 1 addition & 9 deletions
@@ -11,15 +11,7 @@ def launch(stream,
            kernel,
            kernel_args: tuple[Any, ...],
            /):
-    """
-    Queue a |kernel| for execution over |grid| on a particular stream.
-
-    Args:
-        stream: The CUDA stream to execute the |kernel| on.
-        grid: Tuple of up to 3 grid dimensions to execute the |kernel| over.
-        kernel: The |kernel| to execute.
-        kernel_args: Positional arguments to pass to the kernel.
-    """
+    ...
 
 
 class TileDispatcher:

src/cuda/tile/_ir/ops.py

Lines changed: 1 addition & 0 deletions
@@ -1564,6 +1564,7 @@ def getattr_impl(object: Var, name: Var) -> Var:
         case ArrayTy(), "dtype": return loosely_typed_const(ty.dtype)
         case ArrayTy(), "ndim": return loosely_typed_const(ty.ndim)
         case ArrayTy(), "shape": return get_array_shape(object)
+        case ArrayTy(), "strides": return get_array_strides(object)
 
         case TileTy(), "dtype": return loosely_typed_const(ty.dtype)
         case TileTy(), "shape": return loosely_typed_const(ty.shape_value)

src/cuda/tile/_stub.py

Lines changed: 39 additions & 23 deletions
@@ -148,27 +148,38 @@ class Array:
     @property
     @function
     def dtype(self) -> "DType":
-        """The |data type| of the |array|'s elements."""
+        """The |data type| of the |array|'s elements.
+
+        Returns:
+            DType (constant):
+        """
 
     @property
     @function
     def shape(self) -> tuple[int, ...]:
-        """The number of elements in each of the |array|'s dimensions."""
+        """The number of elements in each of the |array|'s dimensions.
+
+        Returns:
+            tuple[int32,...]:
+        """
 
     @property
     @function
     def strides(self) -> tuple[int, ...]:
-        """The number of elements to step in each dimension while traversing the |array|."""
+        """The number of elements to step in each dimension while traversing the |array|.
 
-    @property
-    @function
-    def size(self) -> int:
-        """The number of elements in the |array|."""
+        Returns:
+            tuple[int32,...]:
+        """
 
     @property
     @function
     def ndim(self) -> int:
-        """The number of dimensions in the |array|."""
+        """The number of dimensions in the |array|.
+
+        Returns:
+            int (constant):
+        """
 
 
 class Tile:
@@ -188,35 +199,37 @@ class Tile:
     @property
     @function
     def dtype(self) -> "DType":
-        """The |data type| of the |tile|'s elements."""
+        """The |data type| of the |tile|'s elements.
 
-    @property
-    @function
-    def shape(self) -> tuple[int, ...]:
-        """The number of elements in each of the |tile|'s dimensions."""
+        Returns:
+            DType (constant):
+        """
 
     @property
     @function
-    def strides(self) -> tuple[int, ...]:
-        """The number of elements to step in each dimension while traversing the |tile|."""
+    def shape(self) -> tuple[int, ...]:
+        """The number of elements in each of the |tile|'s dimensions.
 
-    @property
-    @function
-    def size(self) -> int:
-        """The number of elements in the |tile|."""
+        Returns:
+            tuple[const int,...]:
+        """
 
     @property
     @function
     def ndim(self) -> int:
-        """The number of dimensions in the |tile|."""
+        """The number of dimensions in the |tile|.
+
+        Returns:
+            int (constant):
+        """
 
     @function
-    def item(self) -> "Scalar":
-        """Extract scalar from a single element tile.
+    def item(self) -> "Tile":
+        """Extract scalar (0D Tile) from a single element tile.
         Tile must contain only 1 element.
 
         Returns:
-            Scalar:
+            Tile: A 0D Tile usable as a scalar.
 
         Examples:
 
@@ -441,6 +454,9 @@ def num_tiles(array: Array, /,
         shape (const int...): A sequence of const integers defining the shape of the tile.
         order ("C" or "F", or tuple[const int,...]): Order of axis mapping. See :py:func:`load`.
 
+    Returns:
+        int32
+
     Examples:
 
         Suppose array size is (32, 16), tile shape (4, 8),
