Status: Implemented (PR C2.a of issue #75 Phase C / Milestone C2)
Parent epic: issue #75
Builds on: extractor_compilation_runtime_target.md (the runtime-target RFC), extractor_compilation_scaffolding.md (compile harness + Manifest)
Working plan: issue #96, Milestone C2 / PR C2.a
The trust boundary between on-disk compiled bundles and the runtime that's about to import + execute them. Per the runtime-target RFC, compiled extractors run client-side as plain Python callables plugged into the existing run_structured_extractors() hook. This module is what verifies a bundle on disk matches the runtime's active inputs (fingerprint + event_types) and that the imported callable has a usable shape, before any callable is registered.
Two distinct concerns:
load_bundle— single-bundle loader. Verifies one bundle directory and returns eitherLoadedBundle(everything passed) orLoadFailure(one of the stable failure codes). Never raises.discover_bundles— directory walker. Loads every child bundle, applies an optionalevent_type_allowlist, detects duplicate-coverage collisions and fails closed on them, and returns aDiscoveryResultwith the populatedevent_type → callableregistry plus an audit trail.
C2.a is loader + discovery only. Validation/fallback semantics, BQ-table mirror, and the ontology-graph call-site swap are explicitly out of scope.
from bigquery_agent_analytics.extractor_compilation import (
load_bundle,
discover_bundles,
LoadedBundle,
LoadFailure,
DiscoveryResult,
)
# Single bundle
result = load_bundle(
bundle_dir,
expected_fingerprint=...,
expected_event_types=("bka_decision",), # subset check; None to skip
)
if isinstance(result, LoadedBundle):
extractor = result.extractor
else:
log.warning("bundle %s rejected: %s — %s", result.bundle_dir, result.code, result.detail)
# Many bundles
discovery = discover_bundles(
bundles_root,
expected_fingerprint=...,
event_type_allowlist=("bka_decision", "tool_call"), # None to register everything
)
extractors_dict = discovery.registry # event_type -> callable
loaded_audit = discovery.loaded # tuple[LoadedBundle, ...]
failures = discovery.failures # tuple[LoadFailure, ...]Callers can switch on failure.code:
| Code | When |
|---|---|
manifest_missing |
bundle_dir/manifest.json doesn't exist (or, for discover_bundles on a non-existent / unreadable parent, the parent itself is unavailable — a PermissionError from iterdir() lands here too rather than propagating) |
manifest_unreadable |
JSON parse error, schema mismatch (unknown / missing fields), or any field whose type / shape doesn't satisfy the manifest contract. The strict parse rejects: event_types: "xy" (would silently become ("x", "y") under lenient parsing); event_types empty / containing duplicates / non-string items; module_filename that's not <identifier>.py (rejects ../escape.py, /etc/passwd.py, foo.bar.py, class.py, non-string values); function_name that isn't a Python identifier or is a keyword. The loader's strict parser is what makes the trust boundary load-bearing — a permissive parse would let nonsense register at runtime. |
fingerprint_mismatch |
manifest fingerprint != caller's expected_fingerprint |
event_types_mismatch |
expected_event_types (when set) isn't a subset of the manifest's event_types |
module_not_found |
the module file referenced by the manifest is absent on disk (only reachable for shape-valid manifests; the strict parse already rejects path-traversal-shaped names) |
import_failed |
importing the module raised — covers Exception and BaseException (e.g., SystemExit), so a malicious or buggy bundle can't tear down the loading process |
function_not_found |
manifest's function_name isn't defined as a callable in the imported module |
function_signature_mismatch |
the imported callable can't be called as f(event, spec) (best-effort introspection via inspect.signature + sig.bind(None, None)) |
event_type_collision |
discovery only: two valid bundles declare coverage of the same event_type. Fail-closed: that event_type is dropped from the registry; both colliding bundles get a failure record. Other event_types from those bundles still register. |
Each gate short-circuits — the first failure wins:
manifest.jsonexists.manifest.jsonparses with strict shape validation (no unknown / missing fields, every field is the declared type,event_typesis a non-empty list of distinct non-empty strings,module_filenameis<identifier>.pywith no path components,function_nameis a Python identifier).- Manifest fingerprint equals
expected_fingerprint. expected_event_types(when set) is a subset ofmanifest.event_types.- Resolved module path is directly inside
bundle_dir(defense in depth — the shape check at step 2 already catches path traversal; this catches anything the shape check misses, including symlink shenanigans). - The module file exists on disk.
- Importing the module succeeds (no exception).
import_failedcatches bothExceptionandBaseExceptionso a bundle that callssys.exitat import time can't tear down the loading process. - The manifest's
function_nameis defined as a callable in the imported module. - The imported callable accepts
(event, spec).
After a successful load, the imported module is popped from sys.modules — the captured callable retains a reference to the module's globals, so the runtime keeps working without leaking a <stem>__loaded_<uuid> entry per call. Repeated load_bundle calls don't grow sys.modules.
The fingerprint check runs before module import, so an attacker can't side-effect via a broken module if their fingerprint doesn't match. A regression test (test_fingerprint_check_runs_before_module_load) pins this ordering. Path-traversal attempts are rejected at step 2 (manifest shape) or step 5 (resolved-path containment) — before any import attempt — so module_filename: "../escape.py" cannot import a sibling file outside the bundle.
A manifest declaring event_types=("a", "b") registers the same callable under both keys in the discovery registry. The bundle is loaded once; the registry has one entry per declared event_type.
event_type_allowlist:
None→ register every declared event_type.- A tuple → register only event_types that appear in the tuple (bundle still loads even if some event_types are filtered out).
- An empty tuple → register nothing (degenerate but valid).
A bundle whose entire declared coverage falls outside the allowlist still loads (it's in discovery.loaded); none of its event_types reach discovery.registry. The bundle isn't "broken" — it's just unwanted by this runtime.
Two bundles claiming the same event_type fail closed:
- The event_type is dropped from the registry.
- Each colliding bundle gets a separate
LoadFailurewith codeevent_type_collision. - Other event_types from those same bundles still register if they're unique.
The alternative — silently picking one bundle — would make runtime behavior depend on filesystem ordering, which is a debugging nightmare and a security smell.
TestLoadBundleHappyPath(3) — valid bundle loads; subset check accepts broader manifest; check is skipped whenexpected_event_types=None.TestLoadBundleFailureCodes(13) — every stable code, including: manifest missing, invalid JSON, missing required field, fingerprint mismatch, event-types mismatch, module not found, import-timeSyntaxError, import-timeRuntimeError, import-timeSystemExit, function not found, function not callable (same code), signature with too few args, signature kwargs-only rejected, signature with*argsaccepted.TestLoadBundleGateOrdering(1) — fingerprint check runs before module import; an attacker bundle with a wrong fingerprint and a broken module fails withfingerprint_mismatch, notimport_failed.TestDiscoverBundles(6) — empty parent, non-existent parent, single bundle, multi-event bundle (both keys point at the same callable), allowlist filters registry without unloading the bundle, empty allowlist registers nothing.TestDiscoverBundlesCollisions(2) — two bundles same event_type fail closed; partial collision preserves unique event_types from each bundle.TestDiscoverBundlesNonBundleEntries(2) — loose files (README, INDEX) at the parent are silently skipped; non-bundle subdirectories fail withmanifest_missing(every walked directory is accounted for, no silent skips of children).TestBundleLoaderEndToEnd(2) — runs the realcompile_extractorto produce a bundle, then loads it viaload_bundleANDdiscover_bundles, invokes the loaded callable, and asserts behavioral parity with the handwritten reference. Proves the loader's contract holds for bundles produced by the rest of Phase C, not just hand-built fixtures.
Strict-trust-boundary regression groups (added in review):
- Strict manifest validation (20) — an 18-case parametrized test (
test_malformed_manifest_rejected_with_manifest_unreadable) covers every shape the lenientManifest.from_jsonwould accept silently:event_types: "xy"(silent char-tuple coercion), empty / duplicate / non-string / empty-stringevent_typesitems, non-stringmodule_filename/function_name,module_filenamewithout.py, double-dot stems, Python-keyword stems, empty strings, dashed function names, integer / emptyfingerprint, unknown extra fields, missing required fields. Plus root-array rejection (test_malformed_manifest_root_array_rejected) and invalid-UTF-8 rejection (test_invalid_utf8_manifest_rejected). - Path-traversal defense (2) —
module_filename: "../escape.py"doesn't import a sibling file outside the bundle (test plants anescape.pythat raises if imported and confirms it's never touched);module_filename: "/etc/passwd.py"rejected. sys.modulescleanup (1) — five repeatedload_bundlecalls leave the__loaded_<uuid>count insys.modulesunchanged; the captured callable still works after cleanup.discover_bundlesiterdirfailure (1) —pathlib.Path.iterdirmonkeypatched to raisePermissionError; discovery returns a structuredmanifest_missingfailure naming the underlying error rather than propagating.
- Per-event / per-field / per-node / per-edge fallback. When a compiled extractor is rejected (or a #76 validator failure on its output is recoverable), what does the runtime do? That's C2.b.
- BigQuery-table bundle mirror. Cross-process bundle distribution and the in-repo / BQ-mirror choice. Loader stays filesystem-only for C2.a.
- Ontology-graph call-site swap. Where in the orchestrator does the discovered registry actually replace the existing extractors? The integration moves once C2.a (loader) and C2.b (fallback) are both in.
- Revalidation harness. Scheduled / on-demand agreement check between compiled and reference outputs.
extractor_compilation_runtime_target.md— the RFC that decided client-side Python is the Phase 1 runtime target. C2.a is the trust boundary that decision needs.extractor_compilation_scaffolding.md—Manifest+compile_extractorshape the loader consumes.extractor_compilation_bka_measurement.md— PR 4c's measurement utility loads bundles a different way (per-call, for parity comparison); the public loader here is what an orchestrator will use.