.. _dpnp_execution_model:

########################
oneAPI programming model
########################

oneAPI library and its Python interface
=======================================

Using oneAPI libraries, a user calls functions that take ``sycl::queue`` and a collection of
``sycl::event`` objects among other arguments. For example:

.. code-block:: cpp
    :caption: Prototypical call signature of a oneMKL function

    sycl::event
    compute(
        sycl::queue &exec_q,
        ...,
        const std::vector<sycl::event> &dependent_events
    );

The function ``compute`` inserts computational tasks into the queue ``exec_q`` for the DPC++
runtime to execute on the device the queue targets. The execution may begin only after other
tasks, whose execution status is represented by ``sycl::event`` objects in the provided
``dependent_events`` vector, complete. If the vector is empty, the runtime begins the execution
as soon as the device is ready. The function returns a ``sycl::event`` object representing the
completion of the set of computational tasks submitted by the ``compute`` function.

Hence, in the oneAPI programming model, the execution **queue** is used to specify which device
the function will execute on. To create a queue, one must specify a device to target.

In :mod:`dpctl`, the ``sycl::queue`` is represented by the :class:`dpctl.SyclQueue` Python type,
and a Python API to call such a function might look like

.. code-block:: python

    def call_compute(
        exec_q: dpctl.SyclQueue,
        ...,
        dependent_events: List[dpctl.SyclEvent] = []
    ) -> dpctl.SyclEvent:
        ...

When building a Python API for a SYCL offloading function, even if you choose to map the SYCL
API to a different API on the Python side, it must still translate to a similar call under the
hood.

The arguments to the function must be suitable for use in the offloading functions. Typically
these are Python scalars or objects representing USM allocations, such as
:class:`dpnp.tensor.usm_ndarray`, :class:`dpctl.memory.MemoryUSMDevice`, and friends.

.. note::
    The USM allocations these objects represent must not get deallocated before the offloaded
    tasks that access them complete.

    This is something authors of DPC++-based Python extensions must take care of, and something
    users of such extensions may assume is taken care of.

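The lifetime guarantee stated in the note can be illustrated with a plain-Python sketch. The
class and names below are hypothetical stand-ins, not dpctl internals; real extensions
typically rely on ``sycl::event`` status or host-task callbacks to decide when a buffer may be
released:

.. code-block:: python

    import threading

    class TaskKeepAlive:
        """Holds references to buffers until the task using them completes."""

        def __init__(self):
            self._held = {}

        def submit(self, task_id, buffers, run):
            # Keep the buffers referenced so they cannot be deallocated mid-task
            self._held[task_id] = buffers
            worker = threading.Thread(target=self._finish, args=(task_id, run))
            worker.start()
            return worker

        def _finish(self, task_id, run):
            run()                    # execute the offloaded work
            del self._held[task_id]  # release the references only after completion

    keeper = TaskKeepAlive()
    buf = bytearray(16)              # stand-in for a USM allocation
    worker = keeper.submit(0, [buf], lambda: None)
    worker.join()
    assert keeper._held == {}        # references released once the task completed

The point of the sketch is only the ordering: the extension, not the user, is responsible for
keeping ``buf`` alive until the asynchronous task has finished with it.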
USM allocations and compute-follows-data
========================================

To make a USM allocation on a device in SYCL, one needs to specify the ``sycl::device`` in the
memory of which the allocation is made, and the ``sycl::context`` to which the allocation is
bound.

A ``sycl::queue`` object is often used instead. In such cases, the ``sycl::context`` and
``sycl::device`` associated with the queue are used to make the allocation.

.. important::
    :mod:`dpnp.tensor` associates a queue object with every USM allocation.

    The associated queue may be queried using the ``.sycl_queue`` property of the Python type
    representing the USM allocation.

This design choice allows :mod:`dpnp.tensor` to have a preferred queue to use when operating on
any single USM allocation. For example:

.. code-block:: python

    def unary_func(x: dpnp.tensor.usm_ndarray):
        code1
        _ = _func_impl(x.sycl_queue, ...)
        code2

When combining several objects representing USM allocations, the
:ref:`programming model <dpnp_tensor_compute_follows_data>` adopted in :mod:`dpnp.tensor`
requires that the queues associated with each object be the same; that common queue is then
used as the execution queue. Otherwise, :exc:`dpctl.utils.ExecutionPlacementError` is raised.

.. code-block:: python

    def binary_func(
        x1: dpnp.tensor.usm_ndarray,
        x2: dpnp.tensor.usm_ndarray
    ):
        exec_q = dpctl.utils.get_execution_queue((x1.sycl_queue, x2.sycl_queue))
        if exec_q is None:
            raise dpctl.utils.ExecutionPlacementError
        ...

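The behavior of :func:`dpctl.utils.get_execution_queue` relied on above can be sketched in
plain Python. This is a simplification for illustration, not dpctl's implementation (the stand-in
objects below take the place of :class:`dpctl.SyclQueue` instances):

.. code-block:: python

    def get_execution_queue_sketch(queues):
        """Return the common queue if all queues compare equal, else None."""
        if not queues:
            return None
        first = queues[0]
        if all(q == first for q in queues[1:]):
            return first
        return None

    q1 = object()  # stand-in for one dpctl.SyclQueue instance
    q2 = object()  # a distinct queue
    assert get_execution_queue_sketch([q1, q1]) is q1  # common queue found
    assert get_execution_queue_sketch([q1, q2]) is None  # placement ambiguous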
In order to ensure that compute-follows-data works seamlessly out of the box,
:mod:`dpnp.tensor` maintains a cache, with context and device as keys and queues as values,
which is used by the :class:`dpnp.tensor.Device` class.

.. code-block:: python

    >>> import dpctl
    >>> from dpnp import tensor

    >>> sycl_dev = dpctl.SyclDevice("cpu")
    >>> d1 = tensor.Device.create_device(sycl_dev)
    >>> d2 = tensor.Device.create_device("cpu")
    >>> d3 = tensor.Device.create_device(dpctl.select_cpu_device())

    >>> d1.sycl_queue == d2.sycl_queue, d1.sycl_queue == d3.sycl_queue, d2.sycl_queue == d3.sycl_queue
    (True, True, True)

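The equal queues returned for the same underlying device can be explained by a cache keyed by
``(context, device)``. A minimal sketch of such a cache follows; the names are illustrative,
not the actual :mod:`dpnp.tensor` internals:

.. code-block:: python

    class QueueCache:
        """Maps (context, device) keys to queue instances, creating each once."""

        def __init__(self):
            self._cache = {}

        def get_or_create(self, context, device, queue_factory):
            key = (context, device)
            if key not in self._cache:
                self._cache[key] = queue_factory(context, device)
            return self._cache[key]

    created = []

    def make_queue(ctx, dev):  # stand-in for SyclQueue construction
        q = object()
        created.append(q)
        return q

    cache = QueueCache()
    q_a = cache.get_or_create("ctx0", "cpu", make_queue)
    q_b = cache.get_or_create("ctx0", "cpu", make_queue)
    assert q_a is q_b          # same key -> the same cached queue
    assert len(created) == 1   # the queue was constructed only once

Because every lookup for the same ``(context, device)`` pair returns the same queue object,
arrays created for the same device compare equal on ``.sycl_queue`` and can be combined under
compute-follows-data.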
Since the :class:`dpnp.tensor.Device` class is used by all
:ref:`array creation functions <dpnp_tensor_creation_functions>` in :mod:`dpnp.tensor`, using
the same value as the ``device`` keyword argument results in array instances that can be
combined together in accordance with the compute-follows-data programming model.

.. code-block:: python

    >>> from dpnp import tensor
    >>> import dpctl

    >>> # queue for default-constructed device is used
    >>> x1 = tensor.arange(100, dtype="int32")
    >>> x2 = tensor.zeros(100, dtype="int32")
    >>> x12 = tensor.concat((x1, x2))
    >>> x12.sycl_queue == x1.sycl_queue, x12.sycl_queue == x2.sycl_queue
    (True, True)
    >>> # each default construction of SyclQueue creates a distinct instance of the queue
    >>> q1 = dpctl.SyclQueue()
    >>> q2 = dpctl.SyclQueue()
    >>> q1 == q2
    False
    >>> y1 = tensor.arange(100, dtype="int32", sycl_queue=q1)
    >>> y2 = tensor.zeros(100, dtype="int32", sycl_queue=q2)
    >>> # this call raises ExecutionPlacementError since compute-follows-data
    >>> # rules are not met
    >>> tensor.concat((y1, y2))

Please refer to the :ref:`array migration <dpnp_tensor_array_migration>` section of the
introduction to :mod:`dpnp.tensor` for examples on how to resolve ``ExecutionPlacementError``
exceptions.