[Research / Enhancement] Compile-pipeline DX — converter-failure diagnostics + streaming tape API
Component: skainet-compile-hlo / skainet-lang tape · Version: SKaiNET 0.31.0 · Type: enhancement / research (NOT a correctness bug)
While lowering a Whisper model (skainet-whisper-kmp) through tape → ComputeGraph → StableHLO → IREE,
two developer-experience rough edges surfaced. The actual correctness fixes all belonged in our code
(see "Not bugs"); these two are DX/research asks.
1. Converter-failure diagnostics: one failing node cascades into many misleading "Unsupported arity" errors
When a single op has no registered converter, the failure is reported not once but ~20× — because the
failed node produces no SSA value, every downstream consumer then reports Unsupported <op> arity.
Concrete repro: StableHloConverterFactory.createBasic() does not register
NeuralNetOperationsConverter, so a conv1d front-end emits:
// Unsupported op 'conv1d' (type=trace) for node n1_conv1d. Known names: [..., no conv1d ...]
// Unsupported squeeze arity for node n2_squeeze
// Unsupported add arity for node n9_add
// Unsupported batch matmul arity for node n21_matmul
// Unsupported SDPA arity for node n34_scaledDotProductAttention
... (~20 lines)
Root cause = 1 missing converter registration (use createExtended()); the other ~19 are cascade
victims. This masked the real cause and cost real debugging time.
Ask (any of):
- Distinguish root failures (no converter for op X) from cascade failures (operand missing because a predecessor failed) in the emitted comments / a summary.
- Emit a one-line summary: N nodes failed; root causes: [conv1d]; M downstream skipped (missing operands).
- The Known names: [...] list is already helpful; pairing it with a hint such as "op 'conv1d' is registered by createExtended()/createFast(), not createBasic()" would shortcut diagnosis to seconds.
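One possible shape for the root/cascade split, as a minimal self-contained Kotlin sketch. It does not use SKaiNET's real converter API: Node, Failure, classifyFailures and summarize are hypothetical names, the knownOps set stands in for whatever the converter registry exposes, and the tiny graph in main is made up for illustration. The idea is a single pass in topological order: a node whose op has no converter is a root failure, a node that consumes a failed node's output is a cascade failure, and both are marked as failed so the taint propagates downstream.

```kotlin
// Hypothetical, simplified stand-ins; these are NOT SKaiNET types.
data class Node(val id: String, val op: String, val inputs: List<String> = emptyList())

sealed interface Failure {
    val node: Node

    // No converter is registered for this node's op.
    data class Root(override val node: Node) : Failure

    // A converter exists, but at least one operand was never produced.
    data class Cascade(override val node: Node, val missing: List<String>) : Failure
}

// Assumes `nodes` is in topological order (producers before consumers).
fun classifyFailures(nodes: List<Node>, knownOps: Set<String>): List<Failure> {
    val failedIds = mutableSetOf<String>()
    val failures = mutableListOf<Failure>()
    for (node in nodes) {
        val missing = node.inputs.filter { it in failedIds }
        val failure = when {
            // A missing converter is always reported as a root cause,
            // even if the node also has failed operands.
            node.op !in knownOps -> Failure.Root(node)
            missing.isNotEmpty() -> Failure.Cascade(node, missing)
            else -> null
        }
        if (failure != null) {
            failedIds += node.id
            failures += failure
        }
    }
    return failures
}

// Builds the one-line summary proposed in the list above.
fun summarize(failures: List<Failure>): String {
    val rootOps = failures.filterIsInstance<Failure.Root>().map { it.node.op }.distinct()
    val cascades = failures.count { it is Failure.Cascade }
    return "${failures.size} nodes failed; root causes: $rootOps; " +
        "$cascades downstream skipped (missing operands)"
}

fun main() {
    val nodes = listOf(
        Node("n1_conv1d", "conv1d"),
        Node("n2_squeeze", "squeeze", listOf("n1_conv1d")),
        Node("n3_const", "constant"),
        Node("n9_add", "add", listOf("n2_squeeze", "n3_const"))
    )
    val failures = classifyFailures(nodes, knownOps = setOf("squeeze", "add", "constant"))
    // Prints: 3 nodes failed; root causes: [conv1d]; 2 downstream skipped (missing operands)
    println(summarize(failures))
}
```

Checking the op before the operands is a deliberate choice in this sketch: if two unrelated ops lack converters, both show up as root causes in a single run instead of one hiding behind the other.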
2. Streaming / incremental tape API (open question)
DefaultGraphExecutionContext.tape(...).record { } materialises the whole tape, then toComputeGraph()
builds the full graph in memory. Is there interest in a streaming/incremental tape API — emit nodes as
recorded, or chunk by subgraph — so very large forwards can be lowered without buffering the entire tape?
Not a blocker for whisper (tiny.en records fine); relevant for multi-GB models.
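To make the question concrete, here is a rough sketch of what a chunked consumer could look like if recorded ops were exposed as a lazy Kotlin Sequence. Nothing here is existing SKaiNET API: TapeOp, recordStreaming and lowerChunk are hypothetical names, and the op list is a placeholder. A real design would also have to carry live SSA values across chunk boundaries, which this sketch ignores.

```kotlin
// Hypothetical stand-in for a recorded tape entry; NOT a SKaiNET type.
data class TapeOp(val index: Int, val name: String)

// Hypothetical streaming recorder: yields ops lazily instead of building a full list.
fun recordStreaming(opNames: List<String>): Sequence<TapeOp> = sequence {
    for ((i, name) in opNames.withIndex()) {
        yield(TapeOp(i, name))
    }
}

// Hypothetical lowering step: consumes one chunk and returns a textual placeholder.
fun lowerChunk(chunk: List<TapeOp>): String =
    chunk.joinToString(", ") { "${it.index}:${it.name}" }

fun main() {
    val ops = recordStreaming(listOf("conv1d", "gelu", "conv1d", "gelu", "add"))
    // chunked() on a Sequence is lazy, so only one chunk of ops is buffered at a time.
    ops.chunked(2).map(::lowerChunk).forEach { println(it) }
}
```

Fixed-size chunks are used only to keep the example short; chunking by subgraph, as suggested above, would need the recorder to know where subgraph boundaries fall.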
Not bugs (fixed in our code)
- conv1d "Unsupported" → use createExtended() (registers NeuralNetOperationsConverter); converter was always present.
- A build-time ctx.ops.narrow(posEmb,…) under VoidTensorOps baked zeros (its narrow returns zeros) → moved the slice inside forward so it's a traced op. Arguably VoidTensorOps silently returning zeros for value-producing ops is a footgun, but it's correct for a shape-only trace backend; noting only for awareness.