Commit 9d2a1f1
Public offload knobs: partitioner kwarg, weight_offload_budget_mb, runner CLI flag
EXPERIMENTAL. Lands the user-facing API surface. After this
commit, the CUDA weight-offload runtime is reachable from outside
the stack's own tests:
* ``CudaPartitioner(..., weight_offload=True,
weight_offload_pin_fqns=[...])`` named kwargs translate to
the existing internal compile specs.
* ``weight_offload_budget_mb`` runtime spec (int megabytes)
accepted by ``cuda_backend.cpp::init`` alongside the existing
private byte spec.
* ``executor_runner --cuda_runtime_spec=k1=v1,k2=v2`` CLI flag
lets tests + manual repros drive the public spec end-to-end.
Part A — public partitioner kwarg:
``CudaPartitioner.__init__`` grows two named kwargs:
``weight_offload: bool`` and
``weight_offload_pin_fqns: Optional[List[str]]``. Translation
rules in order:
1. Reject pin-without-enable (ValueError).
2. Strict mixed-channel rejection: when ANY public kwarg is
non-default, reject ANY raw ``_weight_offload_internal_*``
compile spec entry — not just same-key conflicts. Raw
internal specs stay allowed only when both public kwargs
are at defaults (preserves the test stack).
3. Dedupe pin_fqns first-seen-order. The runtime parser also
hard-fails on duplicates as defense in depth; deduping at
the partitioner keeps harmless caller mistakes from
reaching that hard-fail.
4. Append the internal compile specs.
The four internal key strings are INLINED in
``cuda_partitioner.py`` (not imported from
``weight_offload_pass.py``) to avoid the ``@custom_op``
registration side-effect at import time that would defeat the
lazy-import pattern in
``CudaBackend.pre_aoti_transform_and_collect_named_data``.
Drift is bounded by ``test_partitioner_internal_keys_match_pass``.
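The four translation rules above can be sketched in Python as follows. This is a minimal illustration, not the code in ``cuda_partitioner.py``: the function name, the ``(key, value)`` tuple representation of compile specs, the two key names marked hypothetical, and the value encodings are all assumptions. It also assumes "non-default" means ``weight_offload=True`` or a non-``None`` pin list.

```python
from typing import List, Optional, Tuple

# Only the shared prefix appears in the commit message. The two full key
# names below are hypothetical stand-ins for the inlined constants.
INTERNAL_PREFIX = "_weight_offload_internal_"
KEY_ENABLE = INTERNAL_PREFIX + "enable"      # hypothetical
KEY_PIN_FQNS = INTERNAL_PREFIX + "pin_fqns"  # hypothetical

Spec = Tuple[str, str]  # assumed (key, value) shape for a compile spec


def translate_offload_kwargs(
    compile_specs: List[Spec],
    weight_offload: bool = False,
    weight_offload_pin_fqns: Optional[List[str]] = None,
) -> List[Spec]:
    # 1. Reject pin-without-enable.
    if weight_offload_pin_fqns is not None and not weight_offload:
        raise ValueError("weight_offload_pin_fqns requires weight_offload=True")

    # 2. Strict mixed-channel rejection: any non-default public kwarg
    #    forbids every raw internal spec, not only same-key conflicts.
    public_in_use = weight_offload or weight_offload_pin_fqns is not None
    if public_in_use:
        raw = [k for k, _ in compile_specs if k.startswith(INTERNAL_PREFIX)]
        if raw:
            raise ValueError(f"cannot mix public kwargs with raw specs: {raw}")

    # 3. Dedupe pinned FQNs, keeping first-seen order.
    pins = list(dict.fromkeys(weight_offload_pin_fqns or []))

    # 4. Append the internal compile specs.
    out = list(compile_specs)
    if weight_offload:
        out.append((KEY_ENABLE, "1"))
    if pins:
        out.append((KEY_PIN_FQNS, ",".join(pins)))
    return out
```

With both kwargs at their defaults the input specs pass through unchanged, which is the path the existing test stack relies on.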
Part B — public ``weight_offload_budget_mb`` runtime spec:
``cuda_backend.cpp::init`` tries the public int-MB spec first,
falls through to the existing private byte spec, defaults to
``floor_bytes + pinned_bytes_total`` (with checked addition
overflow guard) when neither is set. When both are set the
public wins so the test path can't accidentally bypass the
public route.
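As a rough model of the resolution order described above (the real logic is C++ in ``cuda_backend.cpp::init``), the Python sketch below tries the public MB spec, then the private byte spec, then the default. The function names and the returned source label are hypothetical, and the private spec is assumed to be ``_weight_offload_internal_budget_bytes``. Python integers do not overflow, so the uint64 guard is emulated explicitly.

```python
from typing import Optional, Tuple

UINT64_MAX = (1 << 64) - 1
MB = 1 << 20


def checked_add_u64(a: int, b: int) -> int:
    # Emulates a checked uint64 addition: fail instead of wrapping.
    total = a + b
    if total > UINT64_MAX:
        raise OverflowError("floor_bytes + pinned_bytes_total overflows uint64")
    return total


def resolve_budget_bytes(
    public_mb: Optional[int],
    private_bytes: Optional[int],
    floor_bytes: int,
    pinned_bytes_total: int,
) -> Tuple[int, str]:
    """Return (budget in bytes, label of the source that supplied it)."""
    if public_mb is not None:
        # Public spec wins even when the private one is also set.
        return public_mb * MB, "weight_offload_budget_mb"
    if private_bytes is not None:
        return private_bytes, "_weight_offload_internal_budget_bytes"
    # Neither spec set: default to the minimum workable budget.
    return checked_add_u64(floor_bytes, pinned_bytes_total), "default"
```

For example, ``resolve_budget_bytes(4, 123, 0, 0)`` yields a budget of ``4 << 20`` bytes attributed to the public spec, ignoring the private value.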
New ``BudgetSpec`` struct in ``session.h`` carries the spec
name + value + value_is_mb flag from the runtime-spec
resolution chain into Session::create. The below-floor UX
message now:
* Names the spec the user actually set (public name for the
public path, internal name for the test path; the
default-budget path defaults to hinting the public name).
* Echoes the user-supplied value (``set via
weight_offload_budget_mb=N`` or ``set via _weight_offload_..._bytes=N``).
* For the public path, includes an MB-rounded suggested fix
(``Set weight_offload_budget_mb >= N``) using
division/modulo rounding so the round-up itself can't
overflow at uint64 boundaries.
* Has a checked-addition guard on ``required_total = floor +
pinned`` to match the default-budget guard.
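The division/modulo rounding mentioned above can be illustrated in Python. The helper name is hypothetical; the point is that the usual ``(n + MB - 1) / MB`` idiom would wrap in uint64 arithmetic when ``n`` is within one MB of ``2**64``, while the form below never computes a value larger than ``n``.

```python
MB = 1 << 20


def suggested_budget_mb(required_total_bytes: int) -> int:
    # Round bytes up to whole megabytes without forming n + MB - 1,
    # which could exceed uint64 range in the C++ original.
    whole = required_total_bytes // MB
    remainder = required_total_bytes % MB
    return whole + (1 if remainder else 0)
```

A value of exactly one MB maps to 1, while one byte more maps to 2.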
Part C — ``executor_runner --cuda_runtime_spec`` CLI flag:
Single comma-separated string parsed in ``executor_runner.cpp``
(gflags doesn't natively support repeated flags; comma-splitting
internally is simpler). Key-aware parsing via ``kKnownCudaSpecs``
table:
* ``weight_offload_budget_mb`` → int
* ``_weight_offload_internal_budget_bytes`` → string
Unknown keys hard-fail at parse with "known keys: ..." message.
Duplicate keys hard-fail at parse. Builds
``std::vector<BackendOption>``, wires through
``LoadBackendOptionsMap::set_options("CudaBackend", Span)`` to
the existing-but-currently-nullptr ``backend_options`` arg of
``Program::load_method``.
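The key-aware parsing can be sketched in Python as below. The actual parser is C++ in ``executor_runner.cpp`` and builds ``BackendOption`` values; here a list of ``(key, value)`` tuples stands in for that vector. The function name, the error wording beyond "known keys", and the handling of an empty flag or an entry without ``=`` are assumptions.

```python
from typing import List, Tuple, Union

# Mirrors the kKnownCudaSpecs table: key -> value type.
KNOWN_CUDA_SPECS = {
    "weight_offload_budget_mb": int,
    "_weight_offload_internal_budget_bytes": str,
}

Option = Tuple[str, Union[int, str]]


def parse_cuda_runtime_spec(flag_value: str) -> List[Option]:
    options: List[Option] = []
    seen = set()
    if not flag_value:
        return options  # assumed: an unset flag yields no options
    for entry in flag_value.split(","):
        key, sep, raw = entry.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {entry!r}")
        if key not in KNOWN_CUDA_SPECS:
            known = ", ".join(sorted(KNOWN_CUDA_SPECS))
            raise ValueError(f"unknown key {key!r}; known keys: {known}")
        if key in seen:
            raise ValueError(f"duplicate key {key!r}")
        seen.add(key)
        # Convert using the type registered for this key; int() raises
        # ValueError on non-numeric input.
        options.append((key, KNOWN_CUDA_SPECS[key](raw)))
    return options
```

Parsing ``weight_offload_budget_mb=4`` gives a single integer-valued option, matching the flag used by ``test_runtime_accepts_public_budget_mb_via_runner_flag``.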
Flag is intentionally CUDA-scoped (``--cuda_runtime_spec`` not
``--backend_option``) because the route feeds load-time backend
options for CudaBackend specifically; other backends can add
their own ``--<backend>_runtime_spec`` flag if they want
similar test access.
Tests (8 new + 1 deferred-from-9a un-deferred):
Pool side:
* ``test_runtime_accepts_public_budget_mb_via_runner_flag``:
``--cuda_runtime_spec=weight_offload_budget_mb=4`` →
success summary's ``budget_bytes == 4 << 20``.
* ``test_pinning_below_floor_with_pinned_hard_fails``:
previously deferred from 9a. ``_LargePinnedModel`` (~1 MB
per weight) + ``weight_offload_budget_mb=1`` lands strictly
below ``floor + pinned``; init hard-fails with the new UX
message format.
* ``test_floor_message_names_public_spec_when_user_set``:
asserts the error message names the public spec and
includes the suggested ``Set weight_offload_budget_mb >= N``
fix line.
Partitioner side:
* ``test_partitioner_public_kwargs_round_trip``: kwargs
produce the expected internal compile specs.
* ``test_partitioner_dedupes_pin_fqns``: ``["w1","w2","w1"]``
→ ``["w1","w2"]`` first-seen-order.
* ``test_partitioner_rejects_pin_without_enable``: ValueError.
* ``test_partitioner_rejects_any_mixed_channel``: covers
same-key AND different-key conflicts; also covers the
raw-without-public-kwarg-still-allowed path.
* ``test_partitioner_internal_keys_match_pass``: asserts the
inlined key constants equal the canonical pass-side
exports. Catches drift at CI time.
Banner / docstring updates:
* ``weight_offload.h``: banner flips to "OFFLOAD COMPLETE;
PUBLIC KNOBS WIRED. MULTI-DEVICE PENDING". "What's NOT YET
WIRED" reduces to multi-device only; new "Resolved in commit
9b" section spells out the three public surfaces.
* ``session.h``: drops the "commit 9b public knobs" bullet
from the deferred list.
* ``weight_offload_pass.py``: docstring updated to describe
the public partitioner kwarg as the user-facing entry point;
the underscore-prefixed compile specs are still documented
as accessible from tests for exact-byte budget control.
1 parent 29a8be9 commit 9d2a1f1
9 files changed
Lines changed: 937 additions & 138 deletions
File tree
- backends/cuda
- passes
- runtime
- weight_offload
- tests
- examples/portable/executor_runner