Commit bf4da7a
Offload cleanup (round 5): init/Session boundary refactor
Init becomes a thin shim. Session owns the offload lifecycle.
Net -120 lines across the runtime; the bigger win is that the two
sides now match the boundary they claim to enforce. Before, init
did 500+ lines of "pre-digest the inputs for Session" (walk AOTI
catalog, build dummies, install via AOTI, compute fqn_offsets,
compute pinned_bytes_total, resolve budget, build session_catalog
by field-copying ConstantMetadata into a near-identical
ConstantInfo) and then handed all of that to Session::create,
which re-validated most of it. The two layers were the same layer.
New split:
* Init: parse payload (parser does all per-FQN validation),
fail-fast on cuda_graph / shared_stream, cudaSetDevice(0),
walk_aoti_catalog (helper), coverage check (catalog <-> schedule
set equality), AOTI <-> payload data_size cross-check (the one
genuinely cross-source check), fetch _weights_blob, call
Session::create.
* Session::create(payload, handle, catalog, weights_blob,
compute_stream, context): builds pinned_bytes_total, resolves
the budget from the runtime-spec chain, builds + installs
dummies via AOTI, computes fqn_offsets, builds the host mirror,
creates pool + copy stream, allocates pinned constants,
registers dummies with ProbeRegistry.
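
The two cross-source checks that stay in init (coverage and data_size) can be sketched roughly as below. This is a minimal illustration, not the code in this commit: ``cross_check`` and its flat vector/map parameters are hypothetical stand-ins for the real catalog, schedule, and payload types, and the error strings are invented.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns an error message on the first failed check,
// or std::nullopt when the catalog, schedule, and payload agree.
std::optional<std::string> cross_check(
    const std::vector<std::string>& catalog_fqns,
    const std::vector<std::uint64_t>& catalog_sizes,
    const std::set<std::string>& schedule_fqns,
    const std::unordered_map<std::string, std::uint64_t>& payload_sizes) {
  if (catalog_fqns.size() != catalog_sizes.size()) {
    return "catalog fqns and data_sizes differ in length";
  }

  // Coverage check: the catalog and the schedule must name the same FQN set.
  const std::set<std::string> catalog_set(catalog_fqns.begin(),
                                          catalog_fqns.end());
  if (catalog_set != schedule_fqns) {
    return "catalog and schedule do not cover the same constants";
  }

  // data_size cross-check: AOTI's size must match the payload's size per FQN.
  for (std::size_t i = 0; i < catalog_fqns.size(); ++i) {
    const auto it = payload_sizes.find(catalog_fqns[i]);
    if (it == payload_sizes.end()) {
      return "payload has no metadata for " + catalog_fqns[i];
    }
    if (it->second != catalog_sizes[i]) {
      return "data_size mismatch for " + catalog_fqns[i];
    }
  }
  return std::nullopt;
}
```

Both checks compare two independent sources, which is why they sit in init rather than in the parser or in Session.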
Helpers added to the offload runtime:
* ``walk_aoti_catalog(handle, method_name) -> Result<AOTICatalog>``
(session.h, defined in session.cpp). Walks the AOTI container's
constants, hard-failing on folded constants, empty original_fqns,
duplicate original_fqns, and any missing AOTI symbol. Returns
the (fqns, internal_names, data_sizes, fqn_to_index) tuple.
* ``resolve_budget(context, floor, pinned)`` (session.cpp
anonymous namespace). Public MB spec -> internal byte spec ->
default = floor + pinned. Returns the resolved bytes plus a
BudgetSpec describing what the user set (for the UX hint).
* AOTICatalog struct (session.h).
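
For orientation, here is a rough sketch of the shape of the two helpers. Everything in it is an assumption made for illustration: ``RawConstant``, ``build_catalog``, ``BudgetSource``, and ``resolve_budget_sketch`` are hypothetical names, the AOTI container walk is replaced by a plain vector, the missing-symbol failure is omitted, the ``->`` chain is read as a fallback order, and "MB" is taken to mean 1024 * 1024 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Mirrors the (fqns, internal_names, data_sizes, fqn_to_index) tuple.
struct AOTICatalogSketch {
  std::vector<std::string> fqns;
  std::vector<std::string> internal_names;
  std::vector<std::uint64_t> data_sizes;
  std::unordered_map<std::string, std::size_t> fqn_to_index;
};

// Hypothetical stand-in for one constant reported by the AOTI container.
struct RawConstant {
  std::string internal_name;
  std::string original_fqn;
  std::uint64_t data_size;
  bool folded;
};

// Stand-in for Result<AOTICatalog>: either the catalog or an error string.
using CatalogResult = std::variant<AOTICatalogSketch, std::string>;

CatalogResult build_catalog(const std::vector<RawConstant>& constants) {
  AOTICatalogSketch cat;
  for (const auto& c : constants) {
    if (c.folded) return "folded constant: " + c.internal_name;
    if (c.original_fqn.empty()) {
      return "empty original_fqn for " + c.internal_name;
    }
    // emplace reports whether the FQN was new; a repeat is a hard failure.
    const bool inserted =
        cat.fqn_to_index.emplace(c.original_fqn, cat.fqns.size()).second;
    if (!inserted) return "duplicate original_fqn: " + c.original_fqn;
    cat.fqns.push_back(c.original_fqn);
    cat.internal_names.push_back(c.internal_name);
    cat.data_sizes.push_back(c.data_size);
  }
  return cat;
}

// Which link of the chain produced the budget (feeds the UX hint).
enum class BudgetSource { PublicMb, InternalBytes, Default };

struct ResolvedBudget {
  std::uint64_t bytes;
  BudgetSource source;
};

// Assumed fallback order: public MB spec, internal byte spec, floor + pinned.
ResolvedBudget resolve_budget_sketch(std::optional<std::uint64_t> public_mb,
                                     std::optional<std::uint64_t> internal_bytes,
                                     std::uint64_t floor_bytes,
                                     std::uint64_t pinned_bytes) {
  if (public_mb) return {*public_mb * 1024 * 1024, BudgetSource::PublicMb};
  if (internal_bytes) return {*internal_bytes, BudgetSource::InternalBytes};
  return {floor_bytes + pinned_bytes, BudgetSource::Default};
}
```

Returning the source alongside the bytes lets the caller word the hint differently depending on whether the user set anything.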
Removed:
* ConstantInfo struct (session.h). It was a copy of
ConstantMetadata with an added data_ptr field; Session looks
dummies up internally from ``dummies_.dummy_data_ptrs`` now.
* session_catalog field-copy loop in init (~25 lines): Session
walks payload.constants_metadata directly.
* Pre-Session pinned_bytes_total + budget resolution in init:
Session does both internally.
* Dummy install + fqn_offsets computation in init: Session does
both internally.
* Session::create's pinned_bytes_total parameter (added in round
4): Session now computes it.
61 of 61 offload tests pass; lint clean.
Authored with Claude.
Parent: 6c815cd
3 files changed
Lines changed: 466 additions & 586 deletions