[jvm] dex compatibility issues by kLabz · Pull Request #12898 · HaxeFoundation/haxe

kLabz · 2026-05-12T12:54:53Z

By running d8 against our jvm tests, two more issues have been uncovered:

1. anon object fields can be generated with names not accepted by dex format

Anonymous structures whose field names aren't valid Java/DEX SimpleNames (i.e. contain anything outside [A-Za-z_$][A-Za-z0-9_$]*) are now stored in the map-backed _hx_fields slot rather than as typed JVM fields named after the raw Haxe identifier.

What used to happen

For a structure like { "weird name\n" : 1, x : 2 }, the JVM backend emitted a typed field literally named weird name\n on the generated anon class. JVM bytecode is fine with that — the constant pool stores arbitrary UTF-8 — but:

DEX rejects field names with whitespace, control chars, etc. (pre-DEX-040 SimpleName grammar).
The Signature attribute couldn't always round-trip these names either.

So any Haxe-on-JVM project run through d8/r8 (i.e. anything targeting Android) failed dexing as soon as user code constructed an anon with a non-identifier key — including some std-lib paths.

What happens now

is_dex_safe_simple_name gates the typed-field path at four codegen sites:

Anon class constructor — non-safe-named fields skip the typed-field assignment and fall through to the _hx_fields map (two-pass: typed first, then map).
_hx_getKnownFields — only safe-named fields are reported as "known"; the rest are reachable via the map.
generate_dynamic_access (the _hx_getField/_hx_setField switch) — switch cases are generated only for safe-named fields.
read_anon_field / write twin — direct typed-field load/store is only emitted for safe-named fields; otherwise the call falls back to map access.

Net effect for end users:

Safe-named anon fields (the overwhelming majority): unchanged. Same bytecode, same typed-field access, same perf.
Non-safe-named anon fields: now go through the _hx_fields HashMap instead of a typed field. A small access-cost increase (map lookup vs getfield), but they're already a slow, reflective-style path in practice, and they now actually dex.

2. an overload closure issue was detected, related to StringBuf.add overloads

Why the bug existed

At -D jvm.dynamic-level=2, generate_dynamic_access builds the _hx_getField dispatch switch by walking c.cl_ordered_fields and emitting one case per Haxe field. When several Haxe fields share a JVM name — typical with @:overload @:native("foo") (e.g. StringBuf.add and its specialised addOpt siblings) — they all reach the case-builder with the same name but different signatures.

Two consequences:

The switch ended up with multiple cases all testing the same string (the second one onward is dead code, since the first match always wins).
Each case called create_field_closure, which spawns a closure class named purely from (target_path, name) — no signature component. Different signatures → different class bytecode → same class path. Both classes were written to the jar under the identical entry name, and the later write silently overwrote the earlier one in the central directory.

The JVM tolerates duplicate jar entries (one variant wins at load time, the other's bytecode is dead), which is why this never crashed in practice — the lost variant is typically a primitive-specialised invoke(boolean) that most call sites box around anyway. DEX is stricter: d8 rejects the jar outright, which is what surfaced the bug.

Why the fix looks like that

The "right" representation of an overloaded method-closure is a single class with multiple invoke overloads — exactly mirroring how the target class itself holds multiple add(...) methods. The underlying machinery (typed_function.generate_invoke, JvmClass.builder mutability) already supports adding multiple methods to one closure class; nothing was missing structurally. The fix is just to use it that way:

Group generate_dynamic_access's input by name before building switch cases. One case per name kills the dead-code duplicates.
For a group with multiple method overloads, call a new create_field_closure_overloads helper that spawns one inner class, generates one constructor + one equals, and emits one invoke per overload signature. Each overload's jsig gets registered in the closure cache pointing at this merged class, so later user-code references (which still go through the existing single-signature create_field_closure) cache-hit and reuse it instead of spawning a separate copy.

This is the conservative scope: it fixes the in-class collision and removes the dead switch cases. A follow-up could also defer closure writes so that closures spawned from other host classes' codegen can share the merged class — eliminating the remaining cross-host duplication (e.g. Main$haxe_root_StringBuf_add alongside StringBuf$haxe_root_StringBuf_add). That's a perf/cleanliness win, not a correctness fix, so it's left for later.

Follow-up implemented in 52a98ff

generate_dynamic_access used to emit one switch case + one closure class per Haxe field, but @:overload @:native(X) makes several fields share a JVM name. The dispatch switch ended up with duplicate cases for the same string (the second one dead code), and each case spawned its own closure class with an identical path -- so the second class silently overwrote the first in the jar. d8/r8 reject the duplicate jar entries. Group fields by name before building cases, and add a multi-overload closure builder that emits one inner class with one invoke method per overload signature. All overload jsigs get registered in the closure cache so later user-code references resolve to the merged class. Repro at tests/misc/jvm/projects/Issue12897 (Main2/Check2).

Simn · 2026-05-12T15:58:11Z

If there's no obvious disadvantage I don't think we need -D jvm.dex-compatible, given that we don't specify anything related to this.

kLabz · 2026-05-12T20:50:28Z

If there's no obvious disadvantage I don't think we need -D jvm.dex-compatible, given that we don't specify anything related to this.

Nothing obvious, but there were some potential issues with weird reflection use cases. I initially wanted to add the define because I expected more dex specific issues, but in the end it doesn't make much sense indeed

kLabz · 2026-05-13T08:25:47Z

If there's no obvious disadvantage I don't think we need -D jvm.dex-compatible, given that we don't specify anything related to this.

Behavior change

Anonymous structures whose field names aren't valid Java/DEX SimpleNames (i.e. contain anything outside [A-Za-z_$][A-Za-z0-9_$]*) are now stored in the map-backed _hx_fields slot rather than as typed JVM fields named after the raw Haxe identifier.

What used to happen

For a structure like { "weird name\n" : 1, x : 2 }, the JVM backend emitted a typed field literally named weird name\n on the generated anon class. JVM bytecode is fine with that — the constant pool stores arbitrary UTF-8 — but:

DEX rejects field names with whitespace, control chars, etc. (pre-DEX-040 SimpleName grammar).
The Signature attribute couldn't always round-trip these names either.

So any Haxe-on-JVM project run through d8/r8 (i.e. anything targeting Android) failed dexing as soon as user code constructed an anon with a non-identifier key — including some std-lib paths.

What happens now

is_dex_safe_simple_name gates the typed-field path at four codegen sites:

Anon class constructor — non-safe-named fields skip the typed-field assignment and fall through to the _hx_fields map (two-pass: typed first, then map).
_hx_getKnownFields — only safe-named fields are reported as "known"; the rest are reachable via the map.
generate_dynamic_access (the _hx_getField/_hx_setField switch) — switch cases are generated only for safe-named fields.
read_anon_field / write twin — direct typed-field load/store is only emitted for safe-named fields; otherwise the call falls back to map access.

Net effect for end users:

Safe-named anon fields (the overwhelming majority): unchanged. Same bytecode, same typed-field access, same perf.
Non-safe-named anon fields: now go through the _hx_fields HashMap instead of a typed field. A small access-cost increase (map lookup vs getfield), but they're already a slow, reflective-style path in practice, and they now actually dex.

If that's acceptable to you, I'll drop the define.

Simn · 2026-05-13T09:39:58Z

Yeah that's fine, I don't care much about wonky field names and the DEX valid identifier regex looks very standard.

Three scenarios — overload merge via Reflect, direct member closure, single-method dynamic access via Reflect — each spans two callers and goes through clean / per-module invalidate / full invalidate cycles. Asserts the (target, method) closure is materialized exactly once as a top-level class under jvm/$Closure/ regardless of which caller's codegen triggered the registration.

Closure classes for member-method references used to live as inner classes of whichever caller's codegen happened to emit them first, producing a separate closure class per caller per (target, method). Move them to top-level shared classes under jvm/$Closure/ named Closure_<target_pkg_mangled>_<method>; codegen sites only register their needed invoke signatures (and any functional-interface bindings) and a finalize step materializes each closure class once with all accumulated invokes. The synthetic package segment `$Closure` is unreachable from Haxe source — Haxe identifiers reject `$` — so user code cannot define colliding classes there. Compilation-server safe: JVM codegen re-walks all reachable types on every build, so the closure cache repopulates from scratch each cycle and the finalize pass writes every needed class regardless of which modules were cache-hit at the typing layer. typed_function gains a typed_function_host variant (inner vs. standalone) selecting between caller-nested and top-level closure class spawning; local-function and static-method-reference closures keep their existing caller-nested layout.

JVM Sys.println uses the platform line separator, so the jar emits \r\n on Windows. Strip CR before comparing.

The dex-compatible anon-field handling (route fields whose names aren't valid DEX SimpleNames through DynamicObject._hx_fields instead of as typed JVM fields) is now unconditional. The user-visible difference for the cases this affects — anon structures with unsafe-named keys — is a small access-cost increase (map lookup vs. getfield); safe-named fields are unchanged. Producing jars that are dex-clean by default is the expected behavior, so the opt-in gate adds complexity without value. CI: drop the second per-target build pass. The single existing jar from each target is verified against d8 directly.

Simn · 2026-05-15T07:25:44Z

I've noticed that it creates projects/Issue12897 which is the wrong issue number.

The closure change makes sense to me conceptually but I'm starting to wonder if the complications that arise from the dynamic level >= 2 could suggest that we should drop this feature entirely. It already seems questionable to create static closures for every single method on the off-chance that it might be used as a closure, and now that it has been shown to cause problems with overloads I'm inclined to think that we should just remove this feature.

Could you check with Claude what he thinks about the costs and benefits here?

kLabz · 2026-05-15T11:32:37Z

Could you check with Claude what he thinks about the costs and benefits here?

On dropping the per-method closure feature

I'd push back on a full removal, but the maintainer's instinct is right that the eager policy is the questionable part. Worth separating two things:

What's actually expensive / complicated

Generating a closure class for every method on the off-chance — that's the dex method-count tax and the overload-merge complexity that just bit us.
The dynamic_level >= 2 path is the one that makes overload resolution at the closure-dispatch site nontrivial.

What's actually load-bearing

obj.method as a typed value (without reflection) and Reflect.field(o, "method") returning something invokable. Removing closure classes means falling back to readFieldClosure reflection for every method reference — measurable slowdown, and it changes observable semantics around closure identity/equals.
The shared closure class is also what cf_expr references survive into; macro/@:keep paths assume it.

Middle path (mirrors what jvm/SAM just did for interfaces): only materialize a closure class for (type, method) pairs the program actually forms a closure of. The infrastructure to do this is already half-built — register_field_closure is only called from the FClosure emission sites; the "every method" cost comes from dynamic_level > 0 generating dynamic access tables, not from closure classes themselves. A demand-driven pass over the AST (same shape as SAM's note_fi_in_expr) would keep the fast path and drop the dead weight. The overload-merge fix stays useful but stops being load-bearing for unused methods.

So my recommendation: keep the feature, drop the eagerness, rather than rip it out. If the maintainer wants minimal scope for this PR, then at least narrow dynamic_level >= 2 (the path that actually caused the overload problem) to opt-in, and leave levels 0–1 alone.

Impact on `jvm/SAM` branch (#12901)

Yes — non-trivial. jvm/SAM makes the shared closure class implement the matched functional interface (associate_functional_interfaces attaches SAM types to the typed_function that backs the closure class, and generate_closures emits the bridge invokes). If the per-method closure feature is removed wholesale, SAM loses its implementation vehicle: you'd need a per-call-site lambda class (à la invokedynamic/LambdaMetafactory, or a hand-rolled equivalent) instead. That's a meaningfully larger redesign than what's on jvm/SAM today.

If we go with the "demand-driven, don't remove" middle path, jvm/SAM is essentially unaffected — the closure exists exactly when a closure is formed, which is exactly when a SAM conversion can happen.

Simn · 2026-05-15T11:52:50Z

To be clear, I was specifically talking about the code paths affected by gctx.dynamic_level >= 2 which I think is that eager policy Claude is talking about. This would still keep generate_dynamic_access itself, but lose the calls to register_field_closure inside of it, which to my understanding is what prompted this adventure in part.

And since it was mentioned, I'm still interested in using invokedynamic but I have been unable to understand how to actually use it effectively. Everyone suggests that it must be the right tool for the job but nobody seems to know how to actually make it work in our context.

kLabz · 2026-05-15T12:00:42Z

On the proposed scope

Agreed, that's a clean cut. The two register_field_closure calls inside generate_dynamic_access (the dynamic_level >= 2 branches) exist purely to upgrade _hx_getField's reflection answer from "a readFieldClosure MethodHandle-ish lookup" to "a pre-baked typed closure object." Removing them and letting level 2 fall through to emit_dynamic_closure for these cases costs:

One extra reflective dispatch per Reflect.field(o, "method")() call. The direct obj.method path is unaffected — that goes through the call site at src/generators/genjvm.ml:897, not through _hx_getField.
Loss of closure identity/equality for closures obtained via reflection. Probably nobody depends on this; if they do, it's already brittle.

What it buys:

No closure class generated for any method that's only ever reached through _hx_getField. On a dex build that's most methods on most classes — the savings should be substantial.
The "merge overload closures into a single class" commit (3a9124868e) stops being load-bearing for this path. It's still correct for the direct-closure call site at line 897 (which can also hit overloads), so I'd keep that commit in but its motivation narrows.
The shared-closure-across-modules complication (52a98ff274 and the server tests) is also less stressed — fewer closure classes registered, fewer cross-module shares.

So: low-risk, narrow change. Worth doing.

On invokedynamic

Honest answer: it works well for the direct obj.method site, awkwardly for the reflective site, and is a wash on dex.

Where it fits cleanly. LambdaMetafactory.metafactory is built to turn MethodHandle → SAM-instance. For the line 897 path, when we know the target method statically and (post-jvm/SAM) the functional interface, an invokedynamic bootstrapped against LambdaMetafactory replaces the entire shared-closure-class machinery for that call site. One invokedynamic opcode per obj.method expression; the JVM materializes a hidden class lazily and caches it. This is exactly what javac emits for obj::method.

Where it doesn't fit. _hx_getField takes a string at runtime, so there's no static method handle to bind — you can't bootstrap an invokedynamic against an unknown method name. The best you could do is keep readFieldClosure reflection there (which is what level 1 already does) and let invokedynamic handle only the typed call sites. Which is fine, but it means invokedynamic and the proposed dynamic_level >= 2 simplification are complementary, not alternatives.

Why people say it's the right tool but can't make it work for us. Three real friction points, in order of how much they actually matter:

Haxe closures are values that flow. A Haxe obj.method typed as Void -> Void can be stored, passed, compared, called later. LambdaMetafactory produces an instance of a specific SAM type chosen at the bootstrap site. So we need to know the target SAM type at the closure-formation site — which is exactly the gap jvm/SAM's AST scan closes for the AbstractCast case, but still doesn't close for "closure stored as Dynamic/haxe.Function, later cast." For the unknown-SAM case you'd need a Haxe-native interface (haxe.jvm.Function0/1/...) as the bootstrap target, then bridge to the real SAM at the use site — at which point you're partway back to the current closure class.
Equality / identity. Haxe code (and Reflect.compareMethods) expects two closures of the same (obj, method) to compare equal. LambdaMetafactory makes no such promise; you get fresh instances. Solvable with altMetafactory(FLAG_SERIALIZABLE) + a writeReplace hack, or by wrapping, but it's not free.
Dex. D8 desugars invokedynamic+LambdaMetafactory back to generated anonymous classes for minSdk < 26. On dex builds you don't reduce method count; you change who generates the classes (D8 instead of us). Sometimes D8 does it better (sharing across the whole APK), sometimes worse. ~~This is the bit nobody's measured, and given the entire jvm/dex-compatible motivation is dex method count, it's the question that matters most before going there.~~

My suggested order: ship the narrow dynamic_level >= 2 simplification on this PR. Then, as a separate experiment, prototype invokedynamic for the line 897 path only (with a Haxe-native FunctionN interface as the bootstrap target so we don't depend on jvm/SAM first) and measure dex output sizes on a real Android app. If the numbers look good, that becomes the eventual replacement for the remaining shared-closure machinery; if D8's desugaring makes it neutral or worse, we keep what we have.

Simn · 2026-05-15T12:26:13Z

My suggested order: ship the narrow dynamic_level >= 2 simplification on this PR.

Agreed!

The dynamic_level >= 2 branches in generate_dynamic_access registered a shared closure class for every method reachable via reflection, just so Reflect.field could hand back a typed closure. Fall back to the same readFieldClosure dispatch level 1 uses; direct obj.method closures are unaffected.

testOverloadedClosureViaReflect and testSingleMethodViaReflect asserted the shared-closure invariant for Reflect.field at dynamic-level=2, which no longer holds after that path was reverted to readFieldClosure dispatch. testDirectMemberClosure still covers the unchanged obj.method path.

Simn · 2026-05-15T15:37:23Z

LGTM now, the only thing is that the JvmDynamicLevel doc in define.json should be adjusted now that we're removing level 2.

kLabz · 2026-05-15T15:51:20Z

LGTM now, the only thing is that the JvmDynamicLevel doc in define.json should be adjusted now that we're removing level 2.

Right, and update the test that uses it.

Should we error for -D jvm.dynamic-level=2 now or emit a warning and consider it as 1? (genjvm.ml:3327)

Simn · 2026-05-15T16:06:31Z

I guess we can update it to error because I doubt anyone has been using it. And I was about to mention that we should update the tests but I see you already did that.

kLabz · 2026-05-15T16:30:31Z

I did both (updated line 3327 to error for values > 1 instead of > 2)

kLabz added 9 commits May 12, 2026 14:28

[tests] add dex checker for all jvm tests

190f089

[jvm] Add jvm.dex-compatible define (noop for now)

a7efb81

[jvm] avoid invalid anon field names with -D jvm.dex-compatible

1b02b23

[tests] add test

04b1fc9

[tests,jvm] check for dex issues

2fb4c3d

[tests] add test for closure overload issue

ef2ccd5

[tests] fix runci regression

cd1fabd

[CI] reduce impact on CI times

abc5501

Simn reviewed May 12, 2026

View reviewed changes

Comment thread src/generators/genjvm.ml Outdated

kLabz added 5 commits May 13, 2026 13:12

[tests] normalize CRLF in Issue12898 stdout assertion

190cdc8

JVM Sys.println uses the platform line separator, so the jar emits \r\n on Windows. Strip CR before comparing.

cleanup diff

b4862a7

kLabz mentioned this pull request May 13, 2026

[jvm] auto-detect Java SAM interfaces; honor SAM in overload resolution #12901

Merged

Wrong 'issue' number

0de95e3

kLabz added 2 commits May 15, 2026 14:53

[jvm] cleanup dynamic-level=2 leftovers

ac5b2ef

Simn merged commit 6c739be into development May 15, 2026
70 checks passed

kLabz deleted the jvm/dex-compatible branch May 16, 2026 08:08

Uh oh!

Conversation

kLabz commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. anon object fields can be generated with names not accepted by dex format

What used to happen

What happens now

2. an overload closure issue was detected, related to StringBuf.add overloads

Uh oh!

Uh oh!

Simn commented May 12, 2026

Uh oh!

kLabz commented May 12, 2026

Uh oh!

kLabz commented May 13, 2026

Behavior change

What used to happen

What happens now

Uh oh!

Simn commented May 13, 2026

Uh oh!

Simn commented May 15, 2026

Uh oh!

kLabz commented May 15, 2026

On dropping the per-method closure feature

What's actually expensive / complicated

What's actually load-bearing

Impact on jvm/SAM branch (#12901)

Uh oh!

Simn commented May 15, 2026

Uh oh!

kLabz commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

On the proposed scope

On invokedynamic

Uh oh!

Simn commented May 15, 2026

Uh oh!

Simn commented May 15, 2026

Uh oh!

kLabz commented May 15, 2026

Uh oh!

Simn commented May 15, 2026

Uh oh!

kLabz commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kLabz commented May 12, 2026 •

edited

Loading

Impact on `jvm/SAM` branch (#12901)

kLabz commented May 15, 2026 •

edited

Loading