[CoreML EP] Support bool Cast in ML Program #28595
Two changes to the ML Program Cast builder:

1. Accept BOOL as a source and target dtype in HasSupportedInputsImpl. The ML Program `cast` op already handles bool, and AddToModelBuilderImpl already maps `to == BOOL`; only the input/output type gate omitted it. This lets int64<->bool<->float casts (transformer attention-mask graphs) stay on CoreML.
2. Move the "no preceding node" check after the ML Program early-return. It was legacy gating for the NeuralNetwork ArgMax-only path (which dereferences InputEdgesBegin()); on the ML Program path a Cast fed directly by a graph input is fine, and rejecting it forced needless CPU fallback.

Tests (coreml_basic_test.cc):

- CastBoolRoundTrip_MLProgram: an int64->bool->float cast chain runs fully on CoreML and matches the CPU reference. The bool tensor is internal (a CoreML partition cannot have bool I/O) and the first Cast is graph-input fed.
- CastNonArgMaxNeuralNetworkNotSupported: the same chain falls back to CPU on the NeuralNetwork format, guarding the IsOpSupportedImpl reordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CastBoolRoundTrip_MLProgram exercised int64 -> Cast(bool) -> Cast(float). CoreML's compiler fuses the two back-to-back `cast` ops and drops the bool clamp (cast(cast(x,bool),fp32) collapses to cast(x,fp32)), so the round-trip produces the raw input value instead of 0/1 -- the test can't be numerically verified standalone.

The bool-Cast support itself is correct: it is exercised end to end by the dependent PRs, where a non-Cast op sits between the int<->bool casts so no fusion occurs -- Cast->And->Cast (Where/And PR) and Cast->GatherND->Cast (GatherND PR), both numerically verified against the CPU EP.

CastNonArgMaxNeuralNetworkNotSupported (the NeuralNetwork-format negative test) is kept; it guards the IsOpSupportedImpl reordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
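The fusion problem described in this commit message can be illustrated with a minimal standalone sketch. The function names `CastViaBool` and `CastFused` are hypothetical and model only the arithmetic of the two cast chains; they are not CoreML or onnxruntime APIs.

```cpp
#include <cstdint>

// Reference semantics of int64 -> Cast(bool) -> Cast(float):
// any nonzero input is clamped to 1.0f, zero stays 0.0f.
constexpr float CastViaBool(std::int64_t x) {
  return (x != 0) ? 1.0f : 0.0f;
}

// What the chain computes if the two casts are fused into a single
// int64 -> float cast: the bool clamp is gone.
constexpr float CastFused(std::int64_t x) {
  return static_cast<float>(x);
}

// Inputs 0 and 1 cannot tell the two apart; any other value can.
static_assert(CastViaBool(1) == CastFused(1), "0/1 inputs hide the fusion");
static_assert(CastViaBool(5) != CastFused(5), "other inputs expose the dropped clamp");
```

A test fed only 0/1 data would pass either way, which is consistent with the decision to rely on graphs where a non-Cast op separates the two casts.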
@yuslepukhin Continuing the great work on making Mac ML on Onnxruntime amazing! Thank you :)
Pull request overview
This PR extends the CoreML EP’s ML Program Cast support to enable bool casts and avoid unnecessary CPU fallbacks when a Cast is fed directly by a graph input (no preceding node). This is positioned as a prerequisite step toward keeping transformer/diffusion attention-mask subgraphs fully within a single CoreML partition.
Changes:
- Allow `BOOL` as a supported input/output dtype for ML Program `Cast` in `HasSupportedInputsImpl`.
- Reorder `IsOpSupportedImpl` so the “no preceding node” rejection applies only to the NeuralNetwork (ArgMax-only) path, not ML Program.
- Add a regression test ensuring non-ArgMax `Cast` chains fall back on the NeuralNetwork format.
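The reordering in the second bullet can be sketched in isolation as follows. `CastNodeSketch` and `IsCastSupportedSketch` are hypothetical stand-ins, not the onnxruntime types, and the exact NeuralNetwork condition (a preceding ArgMax node) is an assumption based on the "ArgMax-only" wording in this PR.

```cpp
// Hypothetical stand-in for the facts the real check reads from the graph.
struct CastNodeSketch {
  bool has_preceding_node;  // false when the Cast is fed by a graph input
  bool preceded_by_argmax;  // only meaningful when has_preceding_node is true
};

constexpr bool IsCastSupportedSketch(const CastNodeSketch& node,
                                     bool create_mlprogram) {
  // ML Program: return before any "preceding node" inspection.
  if (create_mlprogram) {
    return true;
  }
  // NeuralNetwork: legacy gating, evaluated only on this path.
  if (!node.has_preceding_node) {
    return false;
  }
  return node.preceded_by_argmax;
}
```

With this ordering, a graph-input-fed Cast is accepted for ML Program and still rejected for NeuralNetwork, which is the behavior the negative test guards.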
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/core/providers/coreml/builders/impl/cast_op_builder.cc | Enables bool dtype gating for ML Program casts and relaxes the “must have preceding node” constraint for ML Program. |
| onnxruntime/test/providers/coreml/coreml_basic_test.cc | Adds a NeuralNetwork-format negative test covering the reordered support checks. |
    bool CastOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputParams& input_params,
                                          const logging::Logger& logger) const {
      if (input_params.create_mlprogram) {
        // The ML Program 'cast' op stands alone, so a Cast fed directly by a graph
        // input (no preceding node) is fine here.
        return true;
      }
Intentional — explained in the comment on MakeCastBoolModelData in coreml_basic_test.cc. A standalone int64 → Cast(bool) → Cast(float) in MLProgram cannot be numerically verified here because CoreML fuses back-to-back cast ops and drops the bool clamp, so the test would silently pass even if the bool dtype were ignored. The positive coverage is delivered in the dependent PRs (#28597 Where/And and #28598 GatherND), where a non-Cast op sits between the int↔bool casts and the bool quantization becomes observable; those PRs assert ExpectedEPNodeAssignment::All on graphs that exercise the relaxed "no preceding node" branch.
LGTM. Does this have any constraints on the CoreML version?
Thanks for the review! No additional version constraint beyond what the EP already requires for MLProgram. The MIL |
Could you resolve the conflicts?
Resolves conflict in coreml_basic_test.cc by keeping both the new bool-Cast NeuralNetwork-negative test and the upstream Gather test additions.
    // int64 -> Cast(bool) -> Cast(float); the first Cast is fed directly by a
    // graph input (no preceding node). Used by the NeuralNetwork negative test
    // below. Positive bool-Cast coverage lives in the dependent Where/And and
    // GatherND PRs, where a non-Cast op sits between the int<->bool casts -- a
No MLProgram positive test in this PR: While the explanation is sound (CoreML fusion prevents standalone verification), a test asserting ExpectedEPNodeAssignment::All for the ML Program path (even without numerical verification) would confirm the partitioner claims the nodes. The MakeCastBoolModelData() helper is already constructed — only the TestModelLoad call with MakeCoreMLExecutionProvider("MLProgram") and ExpectedEPNodeAssignment::All is missing.
Summary
Two changes to the ML Program `Cast` builder:
1. Accept `BOOL` as a source and target dtype in `HasSupportedInputsImpl`. The ML Program `cast` op already handles bool, and `AddToModelBuilderImpl` already maps `to == BOOL`; only the input/output type gate omitted it.
2. Move the "no preceding node" check after the ML Program early-return. The check is legacy gating for the NeuralNetwork ArgMax-only path (which dereferences `InputEdgesBegin()`); on the ML Program path a `Cast` fed directly by a graph input is fine, and rejecting it forced needless CPU fallback.
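As a rough illustration of the first change, the input/output type gate can be thought of as a membership check applied to both the source and target dtype. The enum and function names below are hypothetical, and the set of non-bool dtypes listed is an assumption for the sketch, not a statement of what `HasSupportedInputsImpl` accepts.

```cpp
// Hypothetical element-type enum; not the ONNX TensorProto data type enum.
enum class ElemTypeSketch { kFloat, kFloat16, kInt32, kInt64, kBool, kString };

constexpr bool IsCastDtypeAllowedSketch(ElemTypeSketch t) {
  switch (t) {
    case ElemTypeSketch::kFloat:
    case ElemTypeSketch::kFloat16:
    case ElemTypeSketch::kInt32:
    case ElemTypeSketch::kInt64:
      return true;  // assumed to be accepted already
    case ElemTypeSketch::kBool:
      return true;  // the dtype this PR adds to the gate
    default:
      return false;
  }
}

// Both ends of the Cast must pass the gate.
constexpr bool HasSupportedCastTypesSketch(ElemTypeSketch from,
                                           ElemTypeSketch to) {
  return IsCastDtypeAllowedSketch(from) && IsCastDtypeAllowedSketch(to);
}
```

Because the gate applies to both ends, adding one dtype to it enables both the int64 to bool and the bool to float directions at once.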
Why
This is the first of a 4-PR series giving the CoreML EP the op coverage to run
transformer and diffusion graphs as a single CoreML partition instead of
fragmenting across CPU.
Transformer attention-mask graphs are a `Cast → GatherND → And → Where` chain over bool tensors. A CoreML partition cannot have a bool input/output (CoreML `MLMultiArray` has no bool type), so bool must stay internal, which makes `Cast` (the int↔bool boundary) the prerequisite for the rest of the series.
Combined impact of the series
With all four PRs plus #28278 (scalar-`Gather`), every model below goes from 2 CoreML partitions to 1, with zero graph breaks: the whole graph runs on CoreML. Measured on an Apple M3 Max, ML Program format:
The op builders eliminate the graph breaks (deterministic); the speedups are what
CoreML already delivers once a model is no longer fragmented.
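To make the link between op coverage and partition count concrete, here is a simplified sketch that counts contiguous runs of supported nodes in a linear chain. Real partitioning works on a graph rather than a chain, so this is only an approximation; the function name and the per-node flags are hypothetical.

```cpp
#include <cstddef>

// Counts maximal runs of consecutive supported nodes in a linear chain.
// Each run corresponds to one partition in this simplified model.
template <std::size_t N>
constexpr int CountSupportedRuns(const bool (&supported)[N]) {
  int runs = 0;
  bool in_run = false;
  for (std::size_t i = 0; i < N; ++i) {
    if (supported[i] && !in_run) {
      ++runs;  // a new run starts at node i
    }
    in_run = supported[i];
  }
  return runs;
}

// Hypothetical chains: one unsupported node in the middle vs. none.
constexpr bool kOneNodeRejected[] = {true, true, false, true, true};
constexpr bool kAllNodesAccepted[] = {true, true, true, true, true};
```

In this model a single rejected node in the middle of the chain yields two runs, and accepting it yields one, mirroring the 2-to-1 partition change the series reports.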
Tests (`coreml_basic_test.cc`)
- `CastNonArgMaxNeuralNetworkNotSupported`: an `int64 → bool → float` cast chain falls back to CPU on the NeuralNetwork format, guarding the `IsOpSupportedImpl` reordering.

Positive `bool`-Cast coverage is in the dependent PRs: `Cast → GatherND → Cast` (#28598's `GatherNDBoolData_MLProgram`) and `Cast → And → Cast` (#28597's `And_MLProgram`). Both place a non-`Cast` op between the int↔bool casts and check the result against the CPU EP. A standalone `int64 → Cast(bool) → Cast(float)` round-trip can't be verified here, because CoreML's compiler fuses back-to-back `cast` ops and drops the bool clamp, so the pattern needs that intervening op, which only the dependent PRs provide.
Series — CoreML EP coverage for transformer / diffusion graphs
Together with #28278 (scalar-`Gather`), the series takes BERT / GPT-2 / ViT / diffusion-UNet graphs, tiny and full-size, from 2 CoreML partitions to 1, with zero graph breaks.