Skip to content

cuda_core: derive error enum explanations from bindings docstrings#1860

Draft
rwgk wants to merge 8 commits intoNVIDIA:mainfrom
rwgk:enum_explanations_from_docstrings
Draft

cuda_core: derive error enum explanations from bindings docstrings#1860
rwgk wants to merge 8 commits intoNVIDIA:mainfrom
rwgk:enum_explanations_from_docstrings

Conversation

@rwgk
Copy link
Copy Markdown
Collaborator

@rwgk rwgk commented Apr 5, 2026

Closes #1712

This PR grows out of the validation work under #1805. That work showed that driver/runtime error enums expose usable per-member __doc__ text once a small amount of normalization is applied. The enum docstrings are usable starting with cuda-bindings >= 13.2.0 on the main branch, and cuda-bindings 12.9.6 on the 12.9.x backport branch. See the comment below for a summary of a manual review of the cleaned enum docstrings.

Summary

  • use cleaned enum __doc__ strings for driver/runtime error explanations in cuda_core when 12.9.6 <= cuda-bindings < 13.0.0 or cuda-bindings >= 13.2.0
  • keep the CTK 13.1.1 explanation tables frozen as fallback for bindings versions that do not expose usable enum-member __doc__
  • centralize the version gate + cleanup logic in enum_explanations_helpers.py
  • update cuda_utils coverage to share the same version predicate and verify representative enums expose __doc__ and that explanation text is attached to raised CUDAErrors
  • remove the obsolete toolshed/reformat_cuda_enums_as_py.py

Use cleaned driver/runtime enum __doc__ text from cuda-bindings 13.2.0+ as the primary source for CUDA error explanations in cuda_core, while freezing the 13.1.1 explanation tables as fallback for older bindings. Centralize the version-gated selection and docstring cleanup helpers, update the driver/runtime explanation modules to use them, add tests that verify representative enums expose __doc__ and that cuda_utils attaches the explanation text, and remove the obsolete enum-reformat toolshed helper script.

Made-with: Cursor
@rwgk rwgk added this to the cuda.core v0.7.0 milestone Apr 5, 2026
@rwgk rwgk self-assigned this Apr 5, 2026
@rwgk rwgk added bug Something isn't working P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Apr 5, 2026
@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot bot commented Apr 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk
Copy link
Copy Markdown
Collaborator Author

rwgk commented Apr 5, 2026

/ok to test

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 5, 2026

Treat the 12.9.6 backport line as docstring-capable and reuse the same version predicate in tests so error explanations follow the bindings releases that already expose usable enum docs.

Made-with: Cursor
@rwgk
Copy link
Copy Markdown
Collaborator Author

rwgk commented Apr 5, 2026

/ok to test

@rwgk
Copy link
Copy Markdown
Collaborator Author

rwgk commented Apr 5, 2026

I manually reviewed the cleaned driver/runtime enum-doc output for both cuda-bindings 13.2.0 and cuda-bindings 12.9.6 using this local helper script:

from __future__ import annotations

import cuda.bindings
from cuda.bindings import driver, runtime

from cuda.core._utils.enum_explanations_helpers import clean_enum_member_docstring


def dump_enum_docs(label, enum_type):
    print(f"=== {label} ===")
    for member in enum_type:
        cleaned = clean_enum_member_docstring(member.__doc__)
        print(f"{member.name} ({int(member)})")
        print(cleaned if cleaned is not None else "<None>")
        print()


def main():
    print(f"cuda.bindings.__version__ = {cuda.bindings.__version__}")
    print()
    dump_enum_docs("driver.CUresult", driver.CUresult)
    dump_enum_docs("runtime.cudaError_t", runtime.cudaError_t)


if __name__ == "__main__":
    main()

High-level result from the manual review:

  • the cleaned output looks presentable in both versions
  • I did not see leftover ~, ::, or Sphinx-role markup in the cleaned strings
  • this gave me enough confidence to use the docstring-backed path for 12.9.6 <= cuda-bindings < 13.0.0 as well as cuda-bindings >= 13.2.0
  • the only remaining oddities I noticed look upstream/generated rather than cleanup-related: cudaErrorIncompatibleDriverContext still says this the CUDA Runtime, and cudaErrorApiFailureBase has no docstring (which still falls back to cudaGetErrorString() in cuda_core)

rwgk added 3 commits April 5, 2026 11:49
Remove the old Doxygen ``::`` normalization path now that error explanations no longer depend on dict-vs-docstring parity checks. This keeps the helper focused on the cleanup rules that still affect user-facing CUDAError messages.

Made-with: Cursor
Clarify that DocstringBackedExplanations is a compatibility shim for the existing ``.get(int(error))`` lookup shape, and trim a low-value implementation note from the docstring cleanup workaround comment.

Made-with: Cursor
Remove redundant helper coverage now that DocstringBackedExplanations.get() and clean_enum_member_docstring() are already exercised elsewhere, and simplify the remaining test module imports.

Made-with: Cursor
@rwgk
Copy link
Copy Markdown
Collaborator Author

rwgk commented Apr 5, 2026

/ok to test

@rwgk rwgk marked this pull request as ready for review April 5, 2026 22:21
999: "This indicates that an unknown internal error has occurred.",
}

DRIVER_CU_RESULT_EXPLANATIONS = get_best_available_explanations(driver.CUresult, _FALLBACK_EXPLANATIONS)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to move this one level up so this module full of pre-written results doesn't have to be imported at all if we have a new enough cuda-bindings.

s = re.sub(r"\*([^*]+)\*", r"\1", s)
s = re.sub(r"\s+", " ", s).strip()
s = _fix_hyphenation_wordwrap_spacing(s)
return s
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems really brittle, but I guess I don't see a way around it. I would prefer to just depend on rst2txt or something, that but seems unmaintained.

# in cuda-bindings code generation.
s = s.replace("\n:py:obj:`~.Interactions`", ' "Interactions ')
s = re.sub(
r":(?:py:)?(?:obj|func|meth|class|mod|data|const|exc):`([^`]+)`",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex seems too specific. Could we just use one that would capture any rst directive?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like (untested):

r":(?:[a-z]+:)?(?:[a-z]):`([^`]+)`"

while prev != s:
prev = s
s = re.sub(r"([a-z])- ([a-z])", r"\1-\2", s)
s = re.sub(r"([a-z]) -([a-z])", r"\1-\2", s)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems too narrow. Why only perform this on lower-case words?

rwgk added 3 commits April 6, 2026 10:49
Broaden the inline-role cleanup to accept generic RST roles and widen the word-wrap hyphen fix beyond lowercase-only cases. Keep the current 13.2.x output unchanged while expanding unit coverage for the newly supported forms.

Made-with: Cursor
Add terse comments for the remaining inline regex substitutions so the docstring cleanup steps are easier to follow without changing behavior.

Made-with: Cursor
@rwgk
Copy link
Copy Markdown
Collaborator Author

rwgk commented Apr 6, 2026

My Cursor prompt:

Regarding: importing _FALLBACK_EXPLANATIONS only when needed.

Brainstorming: I guess we could achieve this by leaving cuda_utils.pyx untouched, but maybe it's better (especially long-term) if we also consider changing cuda_utils.pyx? Could you please think through the alternatives, how this would shake out either way? What's your recommendation?

Cursor response:

I checked the current import path:

  • cuda_core/cuda/core/_utils/cuda_utils.pyx:29 imports DRIVER_CU_RESULT_EXPLANATIONS and RUNTIME_CUDA_ERROR_EXPLANATIONS at module import time.
  • cuda_core/cuda/core/_utils/driver_cu_result_explanations.py:355 and cuda_core/cuda/core/_utils/runtime_cuda_error_explanations.py:543 each define a huge _FALLBACK_EXPLANATIONS dict and then immediately call get_best_available_explanations(...).

So today, even on 13.2.x and 12.9.6, importing cuda_utils still imports both big fallback-table modules.

Option 1

  • Leave cuda_utils.pyx untouched.
  • Move the large dicts into new fallback-only modules, for example driver_cu_result_explanations_fallback.py and runtime_cuda_error_explanations_fallback.py.
  • Keep the current driver_cu_result_explanations.py and runtime_cuda_error_explanations.py as thin selector modules that either:
    • return DocstringBackedExplanations(...), or
    • import the fallback module only on old bindings.

How it shakes out:

  • cuda_utils.pyx stays exactly as-is.
  • No Cython rebuild.
  • Public/import shape stays stable.
  • Existing tests in cuda_core/tests/test_cuda_utils.py mostly continue to work unchanged.
  • If we want to avoid duplicating the version gate twice, enum_explanations_helpers.py can grow a small loader-based helper, e.g. “pick docstrings or call this fallback loader”.

Tradeoff:

  • adds two small new modules
  • a little bit of plumbing in pure Python

Option 2

  • Move the version-selection logic up into cuda_utils.pyx.
  • cuda_utils.pyx would either:
    • import fallback modules only when old bindings are detected, or
    • lazily initialize the explanation source on first use.

How it shakes out:

  • this is the most literal reading of “move it one level up”
  • could avoid importing even the selector modules until needed
  • but it pushes import orchestration into Cython

Tradeoff:

  • requires rebuilding cuda_core
  • more invasive diff
  • more logic in compiled code for something that is really import/selection policy
  • less nice to unit-test in isolation than the pure-Python helper path
  • you would likely need to touch cuda_core/tests/test_cuda_utils.py more, and maybe lose some of the clean separation you currently have in enum_explanations_helpers.py

Option 3

  • Hybrid: keep cuda_utils.pyx mostly unchanged, but have it import lightweight getter functions instead of precomputed objects.
  • That still means touching cuda_utils.pyx, but not much.

Tradeoff:

  • more churn than Option 1
  • not obviously better unless you specifically want first-use lazy initialization inside cuda_utils.pyx

Recommendation

  • I would keep cuda_utils.pyx untouched for this PR.
  • I’d implement the lazy fallback import entirely in Python by splitting the big tables into fallback-only modules and making the current driver/runtime modules thin selectors.
  • If we want the selector code to stay tidy, I’d slightly generalize get_best_available_explanations() so it can accept a fallback loader instead of an already-imported dict.

Why I think this is the best fit:

  • it satisfies the reviewer’s actual performance/import concern
  • it avoids a Cython rebuild for an import-policy change
  • it keeps the existing .get(int(error)) shape intact everywhere
  • it preserves the current separation of concerns: cuda_utils.pyx consumes explanations, while _utils decides where they come from

@rwgk rwgk marked this pull request as draft April 6, 2026 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cuda.core Everything related to the cuda.core module P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Move enum explanations and health checks from cuda_core to cuda_bindings

2 participants