[Research/Enhancement] Compile-pipeline DX: converter-failure diagnostics + streaming tape API #740

@michalharakal

[Research / Enhancement] Compile-pipeline DX — converter-failure diagnostics + streaming tape API

Component: skainet-compile-hlo / skainet-lang tape · Version: SKaiNET 0.31.0 · Type: enhancement / research (NOT a correctness bug)

While lowering a Whisper model (skainet-whisper-kmp) through tape → ComputeGraph → StableHLO → IREE,
two developer-experience rough edges surfaced. The actual correctness fixes all belonged in our code
(see "Not bugs"); these two are DX/research asks.

1. Converter-failure diagnostics: one failing node cascades into many misleading "Unsupported arity" errors

When a single op has no registered converter, the failure is reported not once but ~20 times: the
failed node produces no SSA value, so every downstream consumer then reports Unsupported <op> arity.

Concrete repro: StableHloConverterFactory.createBasic() does not register
NeuralNetOperationsConverter, so a conv1d front-end emits:

// Unsupported op 'conv1d' (type=trace) for node n1_conv1d. Known names: [..., no conv1d ...]
// Unsupported squeeze arity for node n2_squeeze
// Unsupported add arity for node n9_add
// Unsupported batch matmul arity for node n21_matmul
// Unsupported SDPA arity for node n34_scaledDotProductAttention
... (~20 lines)

The root cause is a single missing converter registration (fix: use createExtended()); the other ~19
errors are cascade victims. They masked the real cause and cost real debugging time.

Ask (any of):

  • Distinguish root failures (no converter for op X) from cascade failures (operand missing because a predecessor failed) in the emitted comments / a summary.
  • Emit a one-line summary, e.g. "N nodes failed; root causes: [conv1d]; M downstream skipped (missing operands)".
  • The Known names: [...] list is already helpful. Pairing it with a hint such as "op 'conv1d' is registered by createExtended()/createFast(), not createBasic()" would cut diagnosis down to seconds.
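
To make the first two asks concrete, here is a minimal Kotlin sketch of how root and cascade failures could be told apart. Everything in it (Node, Failure, classify, summarize) is hypothetical and only illustrates the idea; it is not the skainet-compile-hlo API, and it assumes nodes arrive in topological order.

```kotlin
// Hypothetical, simplified model of a lowering pass; none of these types are SKaiNET APIs.
data class Node(val id: String, val op: String, val inputs: List<String> = emptyList())

sealed interface Failure {
    val node: Node

    // No converter is registered for this node's op.
    data class Root(override val node: Node) : Failure

    // The node could not be lowered because some operands come from failed predecessors.
    data class Cascade(override val node: Node, val missing: List<String>) : Failure
}

// Assumes `nodes` is topologically ordered, so producers are visited before consumers.
fun classify(nodes: List<Node>, supportedOps: Set<String>): List<Failure> {
    val failedIds = mutableSetOf<String>()
    val failures = mutableListOf<Failure>()
    for (node in nodes) {
        val missing = node.inputs.filter { it in failedIds }
        val failure = when {
            // Missing operands win: the node is reported as a cascade victim
            // even if its own op would also lack a converter.
            missing.isNotEmpty() -> Failure.Cascade(node, missing)
            node.op !in supportedOps -> Failure.Root(node)
            else -> null
        }
        if (failure != null) {
            failedIds += node.id
            failures += failure
        }
    }
    return failures
}

// Builds the one-line summary proposed in the list above.
fun summarize(failures: List<Failure>): String {
    val roots = failures.filterIsInstance<Failure.Root>().map { it.node.op }.distinct()
    val cascades = failures.count { it is Failure.Cascade }
    return "${failures.size} nodes failed; root causes: $roots; $cascades downstream skipped (missing operands)"
}
```

With a classification like this, the per-node comments could carry the distinction too: a Root failure would keep the existing "Unsupported op" text, while a Cascade failure would name the failed operands instead of reporting an arity problem.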

2. Streaming / incremental tape API (open question)

DefaultGraphExecutionContext.tape(...).record { } materialises the whole tape, then toComputeGraph()
builds the full graph in memory. Is there interest in a streaming/incremental tape API — emit nodes as
recorded, or chunk by subgraph — so very large forwards can be lowered without buffering the entire tape?
Not a blocker for whisper (tiny.en records fine); relevant for multi-GB models.
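
As a rough illustration of what "emit nodes as recorded, or chunk by subgraph" could look like, here is a Kotlin sketch built on lazy sequences. TapeEntry, recordStreaming and lowerInChunks are hypothetical names invented for this example and do not exist in skainet-lang; the fake "recording" only stands in for a real forward pass.

```kotlin
// Hypothetical sketch; TapeEntry and the functions below are not SKaiNET APIs.
data class TapeEntry(val id: String, val op: String, val inputs: List<String>)

// A lazily produced tape: entries are generated on demand
// instead of being collected into a list first.
fun recordStreaming(layers: Int): Sequence<TapeEntry> = sequence {
    var prev = "input"
    for (i in 0 until layers) {
        val id = "n${i}_matmul"
        yield(TapeEntry(id, "matmul", listOf(prev)))
        prev = id
    }
}

// Hands the tape to `emit` chunk by chunk; at most `chunkSize` entries
// are buffered at a time. Returns the total number of entries seen.
fun lowerInChunks(
    tape: Sequence<TapeEntry>,
    chunkSize: Int,
    emit: (List<TapeEntry>) -> Unit
): Int {
    var total = 0
    for (chunk in tape.chunked(chunkSize)) {
        emit(chunk)
        total += chunk.size
    }
    return total
}
```

The sketch sidesteps the hard part of the question: a chunk may reference values produced in an earlier chunk, so a real design would still need some way to resolve cross-chunk operands without holding the whole tape.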

Not bugs (fixed in our code)

  • conv1d "Unsupported" → use createExtended() (registers NeuralNetOperationsConverter); converter
    was always present.
  • A build-time ctx.ops.narrow(posEmb,…) under VoidTensorOps baked in zeros (its narrow returns
    zeros) → moved the slice inside forward so it's a traced op. Arguably, VoidTensorOps silently
    returning zeros for value-producing ops is a footgun, but that is correct behavior for a shape-only
    trace backend; noting it only for awareness.
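
The difference between the build-time slice and the traced slice can be shown with a toy Kotlin model. All types here (Ops, ShapeOnlyOps, EagerOps, BakedModel, TracedModel) are hypothetical stand-ins that only mimic the situation described above; they are not the real VoidTensorOps or ctx.ops.

```kotlin
// Hypothetical toy model; these types only mimic the situation and are not SKaiNET APIs.
interface Ops {
    fun narrow(src: FloatArray, start: Int, length: Int): FloatArray
}

// Shape-only backend: returns a correctly sized result but never real values.
object ShapeOnlyOps : Ops {
    override fun narrow(src: FloatArray, start: Int, length: Int) = FloatArray(length)
}

// Eager backend: actually copies the requested slice.
object EagerOps : Ops {
    override fun narrow(src: FloatArray, start: Int, length: Int) =
        src.copyOfRange(start, start + length)
}

// Slice taken once at build time: whatever the build-time backend returned is frozen in.
class BakedModel(buildOps: Ops, posEmb: FloatArray) {
    private val slice = buildOps.narrow(posEmb, 0, 2)
    fun forward(ops: Ops): FloatArray = slice
}

// Slice taken inside forward: the op is executed (or recorded) by the backend active at call time.
class TracedModel(private val posEmb: FloatArray) {
    fun forward(ops: Ops): FloatArray = ops.narrow(posEmb, 0, 2)
}
```

In this toy setup, a BakedModel built under ShapeOnlyOps keeps returning zeros no matter which backend later runs forward, while TracedModel picks up the real values as soon as a value-producing backend is used.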
