[jvm] dex compatibility issues#12898
Conversation
generate_dynamic_access used to emit one switch case + one closure class per Haxe field, but @:overload @:native(X) makes several fields share a JVM name. The dispatch switch ended up with duplicate cases for the same string (the second one dead code), and each case spawned its own closure class with an identical path -- so the second class silently overwrote the first in the jar. d8/r8 reject the duplicate jar entries. Group fields by name before building cases, and add a multi-overload closure builder that emits one inner class with one invoke method per overload signature. All overload jsigs get registered in the closure cache so later user-code references resolve to the merged class. Repro at tests/misc/jvm/projects/Issue12897 (Main2/Check2).
|
If there's no obvious disadvantage I don't think we need |
Nothing obvious, but there were some potential issues with weird reflection use cases. I initially wanted to add the define because I expected more dex specific issues, but in the end it doesn't make much sense indeed |
Behavior changeAnonymous structures whose field names aren't valid Java/DEX What used to happenFor a structure like
So any Haxe-on-JVM project run through d8/r8 (i.e. anything targeting Android) failed dexing as soon as user code constructed an anon with a non-identifier key — including some std-lib paths. What happens now
Net effect for end users:
If that's acceptable to you, I'll drop the define. |
|
Yeah that's fine, I don't care much about wonky field names and the DEX valid identifier regex looks very standard. |
Three scenarios — overload merge via Reflect, direct member closure, single-method dynamic access via Reflect — each spans two callers and goes through clean / per-module invalidate / full invalidate cycles. Asserts the (target, method) closure is materialized exactly once as a top-level class under jvm/$Closure/ regardless of which caller's codegen triggered the registration.
Closure classes for member-method references used to live as inner classes of whichever caller's codegen happened to emit them first, producing a separate closure class per caller per (target, method). Move them to top-level shared classes under jvm/$Closure/ named Closure_<target_pkg_mangled>_<method>; codegen sites only register their needed invoke signatures (and any functional-interface bindings) and a finalize step materializes each closure class once with all accumulated invokes. The synthetic package segment `$Closure` is unreachable from Haxe source — Haxe identifiers reject `$` — so user code cannot define colliding classes there. Compilation-server safe: JVM codegen re-walks all reachable types on every build, so the closure cache repopulates from scratch each cycle and the finalize pass writes every needed class regardless of which modules were cache-hit at the typing layer. typed_function gains a typed_function_host variant (inner vs. standalone) selecting between caller-nested and top-level closure class spawning; local-function and static-method-reference closures keep their existing caller-nested layout.
JVM Sys.println uses the platform line separator, so the jar emits \r\n on Windows. Strip CR before comparing.
The dex-compatible anon-field handling (route fields whose names aren't valid DEX SimpleNames through DynamicObject._hx_fields instead of as typed JVM fields) is now unconditional. The user-visible difference for the cases this affects — anon structures with unsafe-named keys — is a small access-cost increase (map lookup vs. getfield); safe-named fields are unchanged. Producing jars that are dex-clean by default is the expected behavior, so the opt-in gate adds complexity without value. CI: drop the second per-target build pass. The single existing jar from each target is verified against d8 directly.
|
I've noticed that it creates projects/Issue12897 which is the wrong issue number. The closure change makes sense to me conceptually but I'm starting to wonder if the complications that arise from the dynamic level >= 2 could suggest that we should drop this feature entirely. It already seems questionable to create static closures for every single method on the off-chance that it might be used as a closure, and now that it has been shown to cause problems with overloads I'm inclined to think that we should just remove this feature. Could you check with Claude what he thinks about the costs and benefits here? |
On dropping the per-method closure featureI'd push back on a full removal, but the maintainer's instinct is right that the eager policy is the questionable part. Worth separating two things: What's actually expensive / complicated
What's actually load-bearing
Middle path (mirrors what So my recommendation: keep the feature, drop the eagerness, rather than rip it out. If the maintainer wants minimal scope for this PR, then at least narrow Impact on
|
|
To be clear, I was specifically talking about the code paths affected by And since it was mentioned, I'm still interested in using |
On the proposed scopeAgreed, that's a clean cut. The two
What it buys:
So: low-risk, narrow change. Worth doing. On invokedynamicHonest answer: it works well for the direct Where it fits cleanly. Where it doesn't fit. Why people say it's the right tool but can't make it work for us. Three real friction points, in order of how much they actually matter:
My suggested order: ship the narrow |
Agreed! |
The dynamic_level >= 2 branches in generate_dynamic_access registered a shared closure class for every method reachable via reflection, just so Reflect.field could hand back a typed closure. Fall back to the same readFieldClosure dispatch level 1 uses; direct obj.method closures are unaffected.
testOverloadedClosureViaReflect and testSingleMethodViaReflect asserted the shared-closure invariant for Reflect.field at dynamic-level=2, which no longer holds after that path was reverted to readFieldClosure dispatch. testDirectMemberClosure still covers the unchanged obj.method path.
|
LGTM now, the only thing is that the |
Right, and update the test that uses it. Should we error for |
|
I guess we can update it to error because I doubt anyone has been using it. And I was about to mention that we should update the tests but I see you already did that. |
|
I did both (updated line 3327 to error for values > 1 instead of > 2) |
By running
d8against our jvm tests, two more issues have been uncovered:1. anon object fields can be generated with names not accepted by dex format
Anonymous structures whose field names aren't valid Java/DEX
SimpleNames (i.e. contain anything outside[A-Za-z_$][A-Za-z0-9_$]*) are now stored in the map-backed_hx_fieldsslot rather than as typed JVM fields named after the raw Haxe identifier.What used to happen
For a structure like
{ "weird name\n" : 1, x : 2 }, the JVM backend emitted a typed field literally namedweird name\non the generated anon class. JVM bytecode is fine with that — the constant pool stores arbitrary UTF-8 — but:SimpleNamegrammar).Signatureattribute couldn't always round-trip these names either.So any Haxe-on-JVM project run through d8/r8 (i.e. anything targeting Android) failed dexing as soon as user code constructed an anon with a non-identifier key — including some std-lib paths.
What happens now
is_dex_safe_simple_namegates the typed-field path at four codegen sites:_hx_fieldsmap (two-pass: typed first, then map)._hx_getKnownFields— only safe-named fields are reported as "known"; the rest are reachable via the map.generate_dynamic_access(the_hx_getField/_hx_setFieldswitch) — switch cases are generated only for safe-named fields.read_anon_field/ write twin — direct typed-field load/store is only emitted for safe-named fields; otherwise the call falls back to map access.Net effect for end users:
_hx_fieldsHashMapinstead of a typed field. A small access-cost increase (map lookup vsgetfield), but they're already a slow, reflective-style path in practice, and they now actually dex.2. an overload closure issue was detected, related to StringBuf.add overloads
Why the bug existed
At
-D jvm.dynamic-level=2,generate_dynamic_accessbuilds the_hx_getFielddispatch switch by walkingc.cl_ordered_fieldsand emitting one case per Haxe field. When several Haxe fields share a JVM name — typical with@:overload @:native("foo")(e.g.StringBuf.addand its specialisedaddOptsiblings) — they all reach the case-builder with the samenamebut different signatures.Two consequences:
create_field_closure, which spawns a closure class named purely from(target_path, name)— no signature component. Different signatures → different class bytecode → same class path. Both classes were written to the jar under the identical entry name, and the later write silently overwrote the earlier one in the central directory.The JVM tolerates duplicate jar entries (one variant wins at load time, the other's bytecode is dead), which is why this never crashed in practice — the lost variant is typically a primitive-specialised
invoke(boolean)that most call sites box around anyway. DEX is stricter:d8rejects the jar outright, which is what surfaced the bug.Why the fix looks like that
The "right" representation of an overloaded method-closure is a single class with multiple
invokeoverloads — exactly mirroring how the target class itself holds multipleadd(...)methods. The underlying machinery (typed_function.generate_invoke,JvmClass.buildermutability) already supports adding multiple methods to one closure class; nothing was missing structurally. The fix is just to use it that way:generate_dynamic_access's input by name before building switch cases. One case per name kills the dead-code duplicates.create_field_closure_overloadshelper that spawns one inner class, generates one constructor + oneequals, and emits oneinvokeper overload signature. Each overload's jsig gets registered in the closure cache pointing at this merged class, so later user-code references (which still go through the existing single-signaturecreate_field_closure) cache-hit and reuse it instead of spawning a separate copy.This is the conservative scope: it fixes the in-class collision and removes the dead switch cases. A follow-up could also defer closure writes so that closures spawned from other host classes' codegen can share the merged class — eliminating the remaining cross-host duplication (e.g.
Main$haxe_root_StringBuf_addalongsideStringBuf$haxe_root_StringBuf_add). That's a perf/cleanliness win, not a correctness fix, so it's left for later.Follow-up implemented in 52a98ff