diff --git a/cuda_core/README.md b/cuda_core/README.md
index d7dfe83bfa9..7ea41966017 100644
--- a/cuda_core/README.md
+++ b/cuda_core/README.md
@@ -1,10 +1,10 @@
-# `cuda.core`: (experimental) Pythonic CUDA module
+# `cuda.core`: Pythonic CUDA module
 
 Currently under active development; see [the documentation](https://nvidia.github.io/cuda-python/cuda-core/latest/) for more details.
 
 ## Installing
 
-Please refer to the [Installation page](https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html) for instructions and required/optional dependencies.
+Please refer to the [Installation page](https://nvidia.github.io/cuda-python/cuda-core/latest/install.html) for instructions and required/optional dependencies.
 
 ## Developing
 
diff --git a/cuda_core/docs/nv-versions.json b/cuda_core/docs/nv-versions.json
index 80f9de3e69a..d55ec26f53f 100644
--- a/cuda_core/docs/nv-versions.json
+++ b/cuda_core/docs/nv-versions.json
@@ -3,6 +3,10 @@
         "version": "latest",
         "url": "https://nvidia.github.io/cuda-python/cuda-core/latest/"
     },
+    {
+        "version": "0.7.0",
+        "url": "https://nvidia.github.io/cuda-python/cuda-core/0.7.0/"
+    },
     {
         "version": "0.6.0",
         "url": "https://nvidia.github.io/cuda-python/cuda-core/0.6.0/"
diff --git a/cuda_core/docs/source/api.rst b/cuda_core/docs/source/api.rst
index 0c877dcc81e..005866ddb2d 100644
--- a/cuda_core/docs/source/api.rst
+++ b/cuda_core/docs/source/api.rst
@@ -129,12 +129,40 @@ Each subclass exposes attributes unique to its operation type.
    graph.SwitchNode
 
 
+Graphics interoperability
+-------------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   :template: autosummary/cyclass.rst
+
+   GraphicsResource
+
+
+Tensor Memory Accelerator (TMA)
+-------------------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   :template: autosummary/cyclass.rst
+
+   TensorMapDescriptor
+
+   :template: dataclass.rst
+
+   TensorMapDescriptorOptions
+
+
 CUDA compilation toolchain
 --------------------------
 
 .. autosummary::
    :toctree: generated/
 
+   :template: autosummary/cyclass.rst
+
    Program
    Linker
    ObjectCode
diff --git a/cuda_core/docs/source/release/0.6.0-notes.rst b/cuda_core/docs/source/release/0.6.0-notes.rst
index 654eb7641bf..b7d6188cc25 100644
--- a/cuda_core/docs/source/release/0.6.0-notes.rst
+++ b/cuda_core/docs/source/release/0.6.0-notes.rst
@@ -54,11 +54,6 @@ New features
 - Added CUDA version compatibility check at import time to detect mismatches between
   ``cuda.core`` and the installed ``cuda-bindings`` version.
 
-- ``Program.compile()`` now automatically resizes the NVRTC PCH heap and
-  retries when precompiled header creation fails due to heap exhaustion.
-  The ``pch_status`` property reports the PCH creation outcome
-  (``"created"``, ``"not_attempted"``, ``"failed"``, or ``None``).
-
 
 Fixes and enhancements
 ----------------------
diff --git a/cuda_core/docs/source/release/0.7.0-notes.rst b/cuda_core/docs/source/release/0.7.0-notes.rst
new file mode 100644
index 00000000000..3946c8804bf
--- /dev/null
+++ b/cuda_core/docs/source/release/0.7.0-notes.rst
@@ -0,0 +1,116 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: Apache-2.0
+
+.. currentmodule:: cuda.core
+
+``cuda.core`` 0.7.0 Release Notes
+=================================
+
+
+Highlights
+----------
+
+- Introduced support for explicit graph construction. CUDA graphs can now be
+  built programmatically by adding nodes and edges, and their topology can be
+  modified after construction.
+- Added CUDA-OpenGL interoperability support, enabling zero-copy sharing of
+  GPU memory between CUDA compute kernels and OpenGL renderers.
+- Added :class:`TensorMapDescriptor` for Hopper+ TMA (Tensor Memory Accelerator)
+  bulk data movement, with automatic kernel argument integration.
+- :class:`~utils.StridedMemoryView` now supports DLPack export, so consumers
+  can import it through the array API ``from_dlpack()`` protocol.
+
+
+New features
+------------
+
+- Added the :mod:`cuda.core.graph` public module containing
+  :class:`~graph.GraphDef` for explicit graph construction, typed node
+  subclasses, and supporting types. :class:`~graph.GraphBuilder` (stream
+  capture) also moves into this module.
+
+- Added :meth:`~graph.GraphBuilder.callback` for CPU callbacks during stream
+  capture, mirroring the existing :meth:`~graph.GraphDef.callback` API.
+
+- Added :class:`GraphicsResource` for CUDA-OpenGL interoperability.
+  Factory classmethods :meth:`~GraphicsResource.from_gl_buffer` and
+  :meth:`~GraphicsResource.from_gl_image` register OpenGL objects for CUDA
+  access, and mapping returns a :class:`Buffer` for zero-copy kernel use.
+
+- Added :class:`TensorMapDescriptor` wrapping the CUDA driver's ``CUtensorMap``
+  for Hopper+ TMA (Tensor Memory Accelerator) bulk data movement.
+  :class:`~utils.StridedMemoryView` gains an :meth:`~utils.StridedMemoryView.as_tensor_map`
+  method for convenient descriptor creation, with automatic dtype inference, stride
+  computation, and first-class kernel argument integration.
+
+- Added DLPack export support to :class:`~utils.StridedMemoryView` via
+  ``__dlpack__`` and ``__dlpack_device__``, complementing the existing import
+  path.
+
+- Added the DLPack C exchange API (``__dlpack_c_exchange_api__``) to
+  :class:`~utils.StridedMemoryView`.
+
+- Added NVRTC precompiled header (PCH) support (CUDA 12.8+).
+  :class:`ProgramOptions` gains ``pch``, ``create_pch``, ``use_pch``,
+  ``pch_dir``, and related options. :attr:`Program.pch_status` reports the
+  PCH creation outcome, and :meth:`~Program.compile` automatically resizes the NVRTC
+  PCH heap and retries when PCH creation fails due to heap exhaustion.
+
+- Added NUMA-aware managed memory pool placement.
+  :class:`ManagedMemoryResourceOptions` gains a ``preferred_location_type``
+  option (``"device"``, ``"host"``, or ``"host_numa"``), and
+  :attr:`ManagedMemoryResource.preferred_location` queries the resolved
+  location. The existing ``preferred_location`` parameter retains full
+  backwards compatibility.
+
+- Added NUMA-aware pinned memory pool placement.
+  :class:`PinnedMemoryResourceOptions` gains a ``numa_id`` option, and
+  :attr:`PinnedMemoryResource.numa_id` queries the host NUMA node ID used for
+  pool placement. When ``ipc_enabled=True`` and ``numa_id`` is not set, the
+  NUMA node is automatically derived from the current CUDA device.
+
+- Added support for CUDA 13.2.
+
+
+New examples
+------------
+
+- ``gl_interop_plasma.py``: Real-time plasma effect demonstrating CUDA-OpenGL
+  interoperability via :class:`GraphicsResource`.
+- ``tma_tensor_map.py``: TMA bulk data movement using
+  :class:`TensorMapDescriptor` on Hopper+ GPUs.
+
+
+Fixes and enhancements
+----------------------
+
+- Fixed managed memory buffers being misclassified as ``kDLCUDAHost`` in DLPack
+  device mapping. They are now correctly reported as ``kDLCUDAManaged``.
+  (`#1863 <https://github.com/NVIDIA/cuda-python/pull/1863>`__)
+- Fixed IPC-enabled pinned memory pools using a hardcoded NUMA node ID of ``0``
+  instead of the NUMA node closest to the active CUDA device. On multi-NUMA
+  systems where the device is attached to a non-zero host NUMA node, this could
+  cause pool creation or allocation failures. (`#1603 <https://github.com/NVIDIA/cuda-python/issues/1603>`__)
+- Fixed :attr:`DeviceMemoryResource.peer_accessible_by` returning stale results when wrapping
+  a non-owned (default) memory pool. The property now always queries the CUDA driver for
+  non-owned pools, so multiple wrappers around the same pool see consistent state. (`#1720 <https://github.com/NVIDIA/cuda-python/issues/1720>`__)
+- Fixed a bare ``except`` clause in stream acceptance that silently swallowed all exceptions,
+  including ``KeyboardInterrupt`` and ``SystemExit``. Only the expected "protocol not
+  supported" case is now caught. (`#1631 <https://github.com/NVIDIA/cuda-python/issues/1631>`__)
+- :class:`~utils.StridedMemoryView` now validates strides at construction time so unsupported
+  layouts fail immediately instead of on first metadata access. (`#1429 <https://github.com/NVIDIA/cuda-python/issues/1429>`__)
+- IPC file descriptor cleanup now uses a C++ ``shared_ptr`` with a POSIX deleter, avoiding
+  cryptic errors when a :class:`DeviceMemoryResource` is destroyed during Python shutdown.
+- Improved the error message raised when :class:`ManagedMemoryResource` is created without options on platforms
+  that lack a default managed memory pool (e.g. WSL2). (`#1617 <https://github.com/NVIDIA/cuda-python/issues/1617>`__)
+- Handle properties on core API objects now return ``None`` during Python shutdown instead of
+  crashing.
+- Reduced Python overhead in :class:`Program` and :class:`Linker` by moving compilation and
+  linking operations to the C level and releasing the GIL during backend calls. This benefits
+  workloads that create many programs or linkers, and enables concurrent compilation in
+  multithreaded applications.
+- Error enum explanations are now derived from ``cuda-bindings`` docstrings when available
+  (bindings 12.9.6+ or 13.2.0+), with frozen tables as fallback for older versions.
+- Improved optional dependency handling for NVVM and nvJitLink imports so that only genuinely
+  missing optional modules are treated as unavailable; unrelated import failures now surface
+  normally. ``cuda.core`` also now depends directly on ``cuda-pathfinder``.
diff --git a/cuda_core/docs/source/release/0.7.x-notes.rst b/cuda_core/docs/source/release/0.7.x-notes.rst
deleted file mode 100644
index 20e6738987a..00000000000
--- a/cuda_core/docs/source/release/0.7.x-notes.rst
+++ /dev/null
@@ -1,76 +0,0 @@
-.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-.. SPDX-License-Identifier: Apache-2.0
-
-.. currentmodule:: cuda.core
-
-``cuda.core`` 0.7.x Release Notes
-=================================
-
-
-Highlights
-----------
-
-- Introduced support for explicit graph construction. CUDA graphs can now be
-  built programmatically by adding nodes and edges, and their topology can be
-  modified after construction.
-
-
-Breaking Changes
-----------------
-
-- Building ``cuda.core`` from source now requires ``cuda-bindings`` >= 12.9.0, due to Cython-level
-  dependencies on the NVVM and nvJitLink bindings (``cynvvm``, ``cynvjitlink``). Pre-built wheels
-  are unaffected. The previous minimum was 12.8.0.
-
-
-New features
-------------
-
-- Added the :mod:`cuda.core.graph` public module containing
-  :class:`~graph.GraphDef` for explicit graph construction, typed node
-  subclasses, and supporting types. :class:`~graph.GraphBuilder` (stream
-  capture) also moves into this module.
-
-- Added ``preferred_location_type`` option to :class:`ManagedMemoryResourceOptions`
-  for explicit control over the preferred location kind (``"device"``,
-  ``"host"``, or ``"host_numa"``). This enables NUMA-aware managed memory
-  pool placement. The existing ``preferred_location`` parameter retains full
-  backwards compatibility when ``preferred_location_type`` is not set.
-
-- Added :attr:`ManagedMemoryResource.preferred_location` property to query the
-  resolved preferred location of a managed memory pool. Returns ``None`` for no
-  preference, or a tuple such as ``("device", 0)``, ``("host", None)``, or
-  ``("host_numa", 3)``.
-
-- Added ``numa_id`` option to :class:`PinnedMemoryResourceOptions` for explicit
-  control over host NUMA node placement. When ``ipc_enabled=True`` and
-  ``numa_id`` is not set, the NUMA node is automatically derived from the
-  current CUDA device.
-
-- Added :attr:`PinnedMemoryResource.numa_id` property to query the host NUMA
-  node ID used for pool placement. Returns ``-1`` for OS-managed placement.
-
-
-New examples
-------------
-
-None.
-
-
-Fixes and enhancements
-----------------------
-
-- Fixed IPC-enabled pinned memory pools using a hardcoded NUMA node ID of ``0``
-  instead of the NUMA node closest to the active CUDA device. On multi-NUMA
-  systems where the device is attached to a non-zero host NUMA node, this could
-  cause pool creation or allocation failures. (:issue:`1603`)
-- Fixed :attr:`DeviceMemoryResource.peer_accessible_by` returning stale results when wrapping
-  a non-owned (default) memory pool. The property now always queries the CUDA driver for
-  non-owned pools, so multiple wrappers around the same pool see consistent state. (:issue:`1720`)
-- Reduced Python overhead in :class:`Program` and :class:`Linker` by moving compilation and
-  linking operations to the C level and releasing the GIL during backend calls. This benefits
-  workloads that create many programs or linkers, and enables concurrent compilation in
-  multithreaded applications.
-- Improved optional dependency handling for NVVM and nvJitLink imports so that only genuinely
-  missing optional modules are treated as unavailable; unrelated import failures now surface
-  normally, and ``cuda.core`` now depends directly on ``cuda-pathfinder``.
diff --git a/cuda_core/pixi.toml b/cuda_core/pixi.toml
index 913472c07e1..1696a4a4c57 100644
--- a/cuda_core/pixi.toml
+++ b/cuda_core/pixi.toml
@@ -107,7 +107,7 @@ examples = { features = ["cu13", "examples", "local-deps"], solve-group = "examp
 # TODO: check if these can be extracted from pyproject.toml
 [package]
 name = "cuda-core"
-version = "0.6.0"
+version = "0.7.0"
 
 [package.build]
 backend = { name = "pixi-build-python", version = "*" }
diff --git a/cuda_core/pyproject.toml b/cuda_core/pyproject.toml
index aacbe4f4c59..80711f39ede 100644
--- a/cuda_core/pyproject.toml
+++ b/cuda_core/pyproject.toml
@@ -19,7 +19,7 @@ dynamic = [
     "readme",
 ]
 requires-python = '>=3.10'
-description = "cuda.core: (experimental) pythonic CUDA module"
+description = "cuda.core: pythonic CUDA module"
 authors = [
     { name = "NVIDIA Corporation" }
 ]