Commit dbc6348
feat(core.utils): add persistent program caches (sqlite + filestream)

Convert cuda.core.utils to a package and add persistent, on-disk caches
for compiled ObjectCode produced by Program.compile.

Public API (cuda.core.utils):
* ProgramCacheResource -- abstract bytes|str -> ObjectCode mapping
  with context manager and pickle-safety warning. Path-backed
  ObjectCode is rejected at write time (would store only the path).
* SQLiteProgramCache -- single-file sqlite3 backend (WAL mode,
  autocommit) with LRU eviction against an optional size cap. A
  threading.RLock serialises connection use so one cache object is
  safe across threads. wal_checkpoint(TRUNCATE) + VACUUM run after
  evictions so the size cap bounds real on-disk usage. __contains__
  is read-only -- it does not bump LRU. Schema-version mismatch on
  open drops the tables and rebuilds; corrupt / non-SQLite files
  are detected and the cache reinitialises empty.
* FileStreamProgramCache -- directory of atomically-written entries
  (tmp + os.replace) safe across concurrent processes. On-disk
  filenames are blake2b(32) hashes of the key so arbitrary-length
  keys never overflow filesystem name limits. Reader pruning is
  stat-guarded: only delete a corrupt-looking file if its inode/
  size/mtime have not changed since the read, so a concurrent
  os.replace by a writer is preserved. Windows ERROR_SHARING_VIOLATION
  / ERROR_LOCK_VIOLATION on os.replace are retried with bounded
  backoff (~185ms) before being treated as a non-fatal cache miss;
  other PermissionErrors and all POSIX failures propagate.
* make_program_cache_key -- stable 32-byte blake2b key over code,
  code_type, ProgramOptions, target_type, name expressions, cuda
  core/driver/NVRTC versions, NVVM IR version, and linker
  backend+version for PTX inputs. Backend-specific gates mirror the
  compiler:
  * NVRTC side-effect options (create_pch, time, fdevice_time_trace)
    and external-content options (include_path, pre_include, pch,
    use_pch, pch_dir) require an extra_digest from the caller; an
    empty list/tuple means "no items", an empty string is "set"
    (NVRTC still emits the flag). The is_sequence check uses
    collections.abc.Sequence to match _prepare_nvrtc_options_impl.
  * NVVM use_libdevice=True requires extra_digest because
    libdevice bitcode comes from the active toolkit.
  * PTX (Linker) input options are normalised through per-field
    gates that match _prepare_nvjitlink_options /
    _prepare_driver_options. ftz/prec_div/prec_sqrt/fma collapse
    to a sentinel under the driver linker (it ignores them).
    ptxas_options canonicalises across str/list/tuple shapes.
    The driver linker's hard rejections (time, ptxas_options,
    split_compile) raise at key time.
  * code_type/target_type combinations are validated against
    Program.compile's SUPPORTED_TARGETS; mismatches raise.
  * Failed environment probes mix the exception class name into a
    *_probe_failed label so broken environments never collide with
    working ones, while staying stable across processes.
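
The FileStream write path and the stat-guarded prune described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: entry_path, atomic_write and prune_if_unchanged are hypothetical names, the Windows retry/backoff is omitted, and a small window between the stat and the unlink remains.

```python
import contextlib
import hashlib
import os
import tempfile
from pathlib import Path


def entry_path(cache_dir: Path, key: bytes) -> Path:
    # A 32-byte blake2b digest gives a fixed 64-character hex filename,
    # regardless of how long the key is.
    return cache_dir / hashlib.blake2b(key, digest_size=32).hexdigest()


def atomic_write(cache_dir: Path, key: bytes, payload: bytes) -> Path:
    # Write to a temp file in the same directory, then os.replace() it
    # into place so readers see either the old entry or the new one.
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = entry_path(cache_dir, key)
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        # Best-effort cleanup of the temp file, then re-raise.
        with contextlib.suppress(OSError):
            os.unlink(tmp_name)
        raise
    return target


def prune_if_unchanged(path: Path, seen: os.stat_result) -> bool:
    # Delete `path` only if it is still the same file the reader saw;
    # a writer's os.replace changes the inode, size and/or mtime.
    try:
        now = os.stat(path)
    except FileNotFoundError:
        return False
    same = (now.st_ino, now.st_size, now.st_mtime_ns) == (
        seen.st_ino, seen.st_size, seen.st_mtime_ns)
    if not same:
        return False
    os.unlink(path)
    return True
```

A reader would call os.stat before reading an entry and pass that result to prune_if_unchanged only when the bytes it read look corrupt.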

Lazy import: ``from cuda.core.utils import StridedMemoryView`` does
NOT pull in the cache backends. The cache classes are exposed via
module __getattr__. sqlite3 is imported lazily inside
SQLiteProgramCache.__init__ so the package is usable on interpreters
built without libsqlite3.
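
The module-level __getattr__ pattern (PEP 562) behind the lazy import can be sketched as below. This is an illustration under assumptions, not the package's actual __init__.py: make_lazy_module and the _loaded list are hypothetical, and the stdlib fractions module stands in for the cache backends so the example runs anywhere.

```python
import importlib
import types


def make_lazy_module(name, lazy_attrs):
    """Build a module whose listed attributes are imported on first access.

    lazy_attrs maps a public name to (module to import, attribute name).
    In a real package the __getattr__ function would simply be defined at
    the top level of __init__.py.
    """
    mod = types.ModuleType(name)
    loaded = []  # records which names triggered an import (demo only)

    def __getattr__(attr):
        try:
            target_module, target_attr = lazy_attrs[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(target_module), target_attr)
        setattr(mod, attr, value)  # cache: later lookups bypass __getattr__
        loaded.append(attr)
        return value

    mod.__getattr__ = __getattr__
    mod._loaded = loaded
    return mod


# Hypothetical stand-in for "cache class exposed lazily from the package".
demo = make_lazy_module("demo_utils", {"Fraction": ("fractions", "Fraction")})
```

Nothing is imported until demo.Fraction is first touched; names that are not in the mapping raise AttributeError as usual.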

Tests: single-process CRUD, LRU/size-cap (logical and on-disk),
corruption, schema-mismatch, threaded SQLite, cross-process FileStream
stress (including a writer/reader race that exercises the stat-guard
prune), Windows vs POSIX PermissionError narrowing, lazy-import
subprocess test, an end-to-end test that compiles a real CUDA C++
kernel, stores the ObjectCode, reopens the cache, and calls get_kernel
on the deserialised copy, and a test that parses _program.pyx via
tokenize + ast.literal_eval to assert the cache's
_SUPPORTED_TARGETS_BY_CODE_TYPE matches Program.compile's matrix.
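
The tokenize + ast.literal_eval approach in the last test can be sketched as follows. A .pyx file is not valid Python, so ast.parse cannot be used on it directly, but the tokenizer is purely lexical and can locate a literal assignment. This is an assumed reconstruction, not the test in this commit: extract_literal, the _TABLE name and the sample table values are all made up.

```python
import ast
import io
import tokenize

_OPEN = {"(", "[", "{"}
_CLOSE = {")", "]", "}"}


def _slice(lines, start, end):
    # tokenize rows are 1-based, columns are 0-based.
    (srow, scol), (erow, ecol) = start, end
    if srow == erow:
        return lines[srow - 1][scol:ecol]
    return "".join(
        [lines[srow - 1][scol:]] + lines[srow:erow - 1] + [lines[erow - 1][:ecol]]
    )


def extract_literal(source: str, name: str):
    """Evaluate the bracketed literal at the first `name = <literal>` site."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    lines = source.splitlines(keepends=True)
    for i, tok in enumerate(tokens[:-2]):
        if not (tok.type == tokenize.NAME and tok.string == name):
            continue
        eq, first = tokens[i + 1], tokens[i + 2]
        if not (eq.type == tokenize.OP and eq.string == "="):
            continue
        if first.string not in _OPEN:
            raise ValueError(f"{name} is not assigned a bracketed literal")
        depth = 0
        for t in tokens[i + 2:]:
            if t.type != tokenize.OP:
                continue
            if t.string in _OPEN:
                depth += 1
            elif t.string in _CLOSE:
                depth -= 1
                if depth == 0:  # matching close of the outer bracket
                    return ast.literal_eval(_slice(lines, first.start, t.end))
    raise LookupError(f"no literal assignment to {name} found")


# Cython-flavoured sample; the table contents are placeholders.
SAMPLE = '''\
cdef class Program:
    pass

_TABLE = {
    "a": ("x", "y"),
    "b": ("x",),
}
'''
```

A drift test could then assert that extract_literal(source, name) equals the table the cache module keeps for its own validation.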

Public API is documented in cuda_core/docs/source/api.rst.

1 parent a18022c commit dbc6348
File tree
6 files changed, +2979 -8 lines changed
- cuda_core
  - cuda/core
    - utils
  - docs/source
  - tests