[CoreML EP] Support bool Cast in ML Program #28595

Open
maxwbuckley wants to merge 4 commits into microsoft:main from maxwbuckley:coreml-cast-bool

@maxwbuckley
Contributor

@maxwbuckley maxwbuckley commented May 20, 2026

Summary

Two changes to the ML Program Cast builder:

  1. Accept BOOL as a source and target dtype in HasSupportedInputsImpl. The
    ML Program `cast` op already handles bool, and AddToModelBuilderImpl already
    maps `to == BOOL`; only the input/output type gate omitted it.
  2. Move the "no preceding node" check after the ML Program early-return. That
    check is legacy gating for the NeuralNetwork ArgMax-only path (which
    dereferences InputEdgesBegin()); on the ML Program path a Cast fed directly
    by a graph input is fine, and rejecting it forced needless CPU fallback.
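
The ordering of the two checks can be sketched in isolation. This is a minimal model, not the EP's code: `DType`, `CastNodeInfo`, `IsMLProgramCastType`, and `IsCastSupported` are hypothetical names, the accepted type set is an assumption, and the real builder splits this logic between IsOpSupportedImpl and HasSupportedInputsImpl.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the EP's types; not the real onnxruntime API.
enum class DType { Float, Float16, Int32, Int64, Bool, String };

struct CastNodeInfo {
  DType input_type;
  DType output_type;
  bool has_preceding_node;   // false when the Cast is fed directly by a graph input
  std::string preceding_op;  // op type of the producer; empty when there is none
};

// Illustrative type gate for the ML Program path. Bool is the entry this PR
// adds; the rest of the accepted set is an assumption made for this sketch.
bool IsMLProgramCastType(DType t) {
  switch (t) {
    case DType::Float:
    case DType::Float16:
    case DType::Int32:
    case DType::Int64:
    case DType::Bool:
      return true;
    case DType::String:
      return false;
  }
  return false;
}

bool IsCastSupported(const CastNodeInfo& node, bool create_mlprogram) {
  if (create_mlprogram) {
    // Early return: the ML Program path never looks at the producer, so a
    // Cast fed by a graph input is accepted as long as the types pass.
    return IsMLProgramCastType(node.input_type) &&
           IsMLProgramCastType(node.output_type);
  }
  // NeuralNetwork path: the producer is inspected, so it must exist first.
  if (!node.has_preceding_node) {
    return false;
  }
  assert(!node.preceding_op.empty());
  return node.preceding_op == "ArgMax";
}
```

With this ordering, a graph-input-fed int64 → bool Cast is accepted when `create_mlprogram` is true and rejected on the NeuralNetwork path, which is the behavior the description above attributes to the change.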

Why

This is the first of a 4-PR series giving the CoreML EP the op coverage to run
transformer and diffusion graphs as a single CoreML partition instead of
fragmenting across CPU.

Transformer attention-mask graphs are a Cast → GatherND → And → Where chain over
bool tensors. A CoreML partition cannot have a bool input/output (CoreML
MLMultiArray has no bool type), so bool must stay internal — which makes Cast
(the int↔bool boundary) the prerequisite for the rest of the series.
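
As a rough illustration of why bool can stay internal, the following sketch mimics a simplified mask chain (Cast → And → Where, with GatherND omitted) on plain vectors. The function name `ApplyMasks` and its parameters are hypothetical; the point is only that the inputs are int64, the output is float, and bool values exist solely between the casts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for an attention-mask subgraph. Inputs and outputs use
// types a partition boundary can carry; bool appears only as an intermediate.
std::vector<float> ApplyMasks(const std::vector<std::int64_t>& attention_mask,
                              const std::vector<std::int64_t>& causal_mask,
                              const std::vector<float>& scores,
                              float masked_value) {
  assert(attention_mask.size() == scores.size());
  assert(causal_mask.size() == scores.size());

  std::vector<float> out(scores.size());
  for (std::size_t i = 0; i < scores.size(); ++i) {
    const bool a = attention_mask[i] != 0;  // Cast: int64 -> bool
    const bool c = causal_mask[i] != 0;     // Cast: int64 -> bool
    const bool keep = a && c;               // And
    out[i] = keep ? scores[i] : masked_value;  // Where
  }
  return out;
}
```

If the Cast at the start of such a chain is rejected, the bool tensor would have to cross a partition boundary, which is the case the paragraph above says CoreML cannot represent.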

Combined impact of the series

With all four PRs plus #28278 (scalar-Gather), every model below goes from 2
CoreML partitions to 1, with zero graph breaks — the whole graph runs on
CoreML. Measured on an Apple M3 Max, ML Program format:

| Model | Partitions (before → after) | CoreML vs CPU |
|---|---|---|
| BERT-large (340M) | 2 → 1 | 7.3× (fp32) / 11.0× (fp16) |
| ViT-large (304M) | 2 → 1 | 8.5× (fp32) / 10.3× (fp16) |
| GPT-2-large (774M) | 2 → 1 | 11.4× (fp16) |
| SD-1.5 UNet (860M) | 2 → 1 | 9.7× (fp16) |

The op builders eliminate the graph breaks (deterministic); the speedups are what
CoreML already delivers once a model is no longer fragmented.

Tests (coreml_basic_test.cc)

  • CastNonArgMaxNeuralNetworkNotSupported — an int64 → bool → float cast chain
    falls back to CPU on the NeuralNetwork format, guarding the IsOpSupportedImpl
    reordering.

Positive bool-Cast coverage is in the dependent PRs: Cast → GatherND → Cast
(#28598's GatherNDBoolData_MLProgram) and Cast → And → Cast (#28597's
And_MLProgram). Both place a non-Cast op between the int↔bool casts and check
the result against the CPU EP. A standalone int64 → Cast(bool) → Cast(float)
round-trip can't be verified here — CoreML's compiler fuses back-to-back cast
ops and drops the bool clamp — so the pattern needs that intervening op, which
only the dependent PRs provide.
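
The effect of the fusion can be shown with scalar arithmetic. This is a sketch of the semantics only, with hypothetical function names; it does not call CoreML and assumes the fused form behaves like a direct int-to-float cast, as described above.

```cpp
#include <cstdint>

// Cast to bool followed by Cast to float: any nonzero value becomes 1.0f.
float CastViaBool(std::int64_t x) {
  const bool b = (x != 0);
  return b ? 1.0f : 0.0f;
}

// What the chain computes if the two casts are fused and the bool step is
// dropped: the raw value is converted directly.
float CastFused(std::int64_t x) {
  return static_cast<float>(x);
}

// True when dropping the bool step changes the result for this input.
bool FusionIsObservable(std::int64_t x) {
  return CastViaBool(x) != CastFused(x);
}
```

For an input of 5 the two-step chain gives 1.0 while the fused form gives 5.0, so the results diverge; for inputs of 0 or 1 they agree, which is why the choice of test data matters for a standalone round-trip.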

Series — CoreML EP coverage for transformer / diffusion graphs

Together with #28278 (scalar-Gather), the series takes BERT / GPT-2 / ViT /
diffusion-UNet graphs — tiny and full-size — from 2 CoreML partitions to 1, with
zero graph breaks.

Two changes to the ML Program Cast builder:

1. Accept BOOL as a source and target dtype in HasSupportedInputsImpl. The
   ML Program `cast` op already handles bool, and AddToModelBuilderImpl
   already maps `to == BOOL`; only the input/output type gate omitted it.
   This lets int64<->bool<->float casts (transformer attention-mask graphs)
   stay on CoreML.

2. Move the "no preceding node" check after the ML Program early-return. It
   was legacy gating for the NeuralNetwork ArgMax-only path (which
   dereferences InputEdgesBegin()); on the ML Program path a Cast fed
   directly by a graph input is fine, and rejecting it forced needless CPU
   fallback.

Tests (coreml_basic_test.cc):
- CastBoolRoundTrip_MLProgram: an int64->bool->float cast chain runs fully
  on CoreML and matches the CPU reference. The bool tensor is internal (a
  CoreML partition cannot have bool I/O) and the first Cast is graph-input
  fed.
- CastNonArgMaxNeuralNetworkNotSupported: the same chain falls back to CPU
  on the NeuralNetwork format, guarding the IsOpSupportedImpl reordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
maxwbuckley and others added 2 commits May 21, 2026 09:34
CastBoolRoundTrip_MLProgram exercised int64 -> Cast(bool) -> Cast(float).
CoreML's compiler fuses the two back-to-back `cast` ops and drops the bool
clamp (cast(cast(x,bool),fp32) collapses to cast(x,fp32)), so the round-trip
produces the raw input value instead of 0/1 -- the test can't be numerically
verified standalone.

The bool-Cast support itself is correct: it is exercised end to end by the
dependent PRs, where a non-Cast op sits between the int<->bool casts so no
fusion occurs -- Cast->And->Cast (Where/And PR) and Cast->GatherND->Cast
(GatherND PR), both numerically verified against the CPU EP.

CastNonArgMaxNeuralNetworkNotSupported (the NeuralNetwork-format negative
test) is kept; it guards the IsOpSupportedImpl reordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@maxwbuckley maxwbuckley marked this pull request as ready for review May 21, 2026 13:24
@maxwbuckley
Contributor Author

maxwbuckley commented May 22, 2026

@yuslepukhin Continuing the great work on making Mac ML on ONNX Runtime amazing! Thank you :)

Contributor

Copilot AI left a comment


Pull request overview

This PR extends the CoreML EP’s ML Program Cast support to enable bool casts and avoid unnecessary CPU fallbacks when a Cast is fed directly by a graph input (no preceding node). This is positioned as a prerequisite step toward keeping transformer/diffusion attention-mask subgraphs fully within a single CoreML partition.

Changes:

  • Allow BOOL as a supported input/output dtype for ML Program Cast in HasSupportedInputsImpl.
  • Reorder IsOpSupportedImpl so the “no preceding node” rejection applies only to the NeuralNetwork (ArgMax-only) path, not ML Program.
  • Add a regression test ensuring non-ArgMax Cast chains fall back on the NeuralNetwork format.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| onnxruntime/core/providers/coreml/builders/impl/cast_op_builder.cc | Enables bool dtype gating for ML Program casts and relaxes the “must have preceding node” constraint for ML Program. |
| onnxruntime/test/providers/coreml/coreml_basic_test.cc | Adds a NeuralNetwork-format negative test covering the reordered support checks. |


Comment on lines 84 to +90
bool CastOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputParams& input_params,
                                      const logging::Logger& logger) const {
  if (input_params.create_mlprogram) {
    // The ML Program 'cast' op stands alone, so a Cast fed directly by a graph
    // input (no preceding node) is fine here.
    return true;
  }
Contributor Author


Intentional — explained in the comment on MakeCastBoolModelData in coreml_basic_test.cc. A standalone int64 → Cast(bool) → Cast(float) in MLProgram cannot be numerically verified here because CoreML fuses back-to-back cast ops and drops the bool clamp, so the test would silently pass even if the bool dtype were ignored. The positive coverage is delivered in the dependent PRs (#28597 Where/And and #28598 GatherND), where a non-Cast op sits between the int↔bool casts and the bool quantization becomes observable; those PRs assert ExpectedEPNodeAssignment::All on graphs that exercise the relaxed "no preceding node" branch.

@wejoncy
Contributor

wejoncy commented May 26, 2026

LGTM. Does this have any constraints on the CoreML version?

@maxwbuckley
Contributor Author

Thanks for the review! No additional version constraint beyond what the EP already requires for MLProgram. The MIL cast op (iOS 15 / Core ML 5) already accepts bool as both an input type and a dtype string — see coremltools/.../iOS15/elementwise_unary.py where T = (fp16, fp32, int32, bool) and the dtype docstring lists bool. Since this PR only touches the MLProgram path (gated to Core ML 5+ by model_builder.h), the bool-cast support inherits that same minimum: iOS 15 / macOS 12 / Core ML 5.

@wejoncy
Contributor

wejoncy commented May 26, 2026

Could you resolve the conflicts?

Resolves conflict in coreml_basic_test.cc by keeping both the new
bool-Cast NeuralNetwork-negative test and the upstream Gather test
additions.
// int64 -> Cast(bool) -> Cast(float); the first Cast is fed directly by a
// graph input (no preceding node). Used by the NeuralNetwork negative test
// below. Positive bool-Cast coverage lives in the dependent Where/And and
// GatherND PRs, where a non-Cast op sits between the int<->bool casts -- a
Member


No MLProgram positive test in this PR: While the explanation is sound (CoreML fusion prevents standalone verification), a test asserting ExpectedEPNodeAssignment::All for the ML Program path (even without numerical verification) would confirm the partitioner claims the nodes. The MakeCastBoolModelData() helper is already constructed — only the TestModelLoad call with MakeCoreMLExecutionProvider("MLProgram") and ExpectedEPNodeAssignment::All is missing.
