Skip to content

Commit c6a228b

Browse files
stevenschlanskerClaude (on behalf of Steven Schlansker)
andauthored
refactor(format): inline custom-codec dispatch in row codecs (#3716)
## Why? Simplify codegen and runtime codepath, net -1 frame stack depth, set stage for future feature work ## What does this PR do? Replace CustomTypeEncoderRegistry$Gen<N>, a per-registration Janino-generated class whose static encode/decode/newCollection methods the row codec called into, with one static final CustomCodec / CustomCollectionFactory field per registered codec on the row codec class itself. Fields are initialized at class load from CustomTypeEncoderRegistry; encode/decode invoke the cached instance. Per-call cost is one indirect call; the Janino compile-and-define cycle on first registration is gone. CustomTypeEncoderRegistry collapses to two ConcurrentHashMap lookups plus a singleton handler view. findCodec / findCollectionFactory are public so a row codec generated under a child classloader can reach them. Key the collection-factory codegen cache on (container, element) rather than container alone, so two SortedSet-typed sibling fields with different element types dispatch to their own factories. ## AI Contribution Checklist - [x] Substantial AI assistance was used in this PR: `yes` / `no` AI Usage Disclosure - substantial_ai_assistance: yes - scope: all - affected_files_or_subsystems: Java row format - ai_review: 👍 - ai_review_artifacts: > Nothing blocking. Net is a clear win: deletes ~165 lines of Janino-generated dispatch infrastructure, removes a class-generation step from cold-start registration, and folds in a real bug fix for sibling collection fields with a focused test. > LGTM with no requested changes. Cleanly deletes the per-registration Janino dispatch class in favor of static-final field caches on the row codec itself, and fixes the sibling-SortedSet cache-keying bug with a targeted test. fory-format suite is green (108/108, 4 pre-existing skips). > — Claude > Overall > Sound refactor and a meaningful simplification — a Janino compile/define cycle per registration is removed in favor of one monomorphic, JIT-friendly call site per codec field. The (beanType, fieldType) key matches at both codegen-time discovery and runtime lookup, and concurrency moves from coarse synchronized to ConcurrentHashMap cleanly. Residual risks are minor: small generated-class bloat if duplicate TypeRefs ever reach customEncode. > I followed the Apache Fory AI review policy: review was performed by a read-only subagent (no edits, commits, or pushes), with findings ordered by severity, citing exact file/line locations, and including the validation commands the author should run before merging. > — Claude (Opus 4.7) - human_verification: ✔️ - provenance_license_confirmation: ✔️ ## Does this PR introduce any user-facing change? Internal API change only ## Benchmark > - JMH numbers across both suites, n=15 per benchmark on each side: every measurement is within error bars of equal. No regression. > - Inlining graphs confirm: both versions fully inline the user's encode/decode body into the generated row codec, both call sites fully > monomorphic (5120/5120 type profile). Baseline has one extra frame (Gen1::encode trampoline); HEAD goes straight from toRow to > WrappedCodec::encode. Both collapse to the same machine code in steady state. > Verdict: no performance regression, and the inlining shape is what the static-final + monomorphic-INVOKEINTERFACE model predicts. The "static → virtual" concern doesn't materialize because the old static call was itself a thin static trampoline around an identical interface call. Co-authored-by: Claude (on behalf of Steven Schlansker) <stevenschlansker+claude@apache.org>
1 parent 598e82f commit c6a228b

3 files changed

Lines changed: 144 additions & 235 deletions

File tree

java/fory-format/src/main/java/org/apache/fory/format/encoder/BaseBinaryEncoderBuilder.java

Lines changed: 69 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -69,6 +69,7 @@
6969
import org.apache.fory.format.row.binary.writer.BinaryWriter;
7070
import org.apache.fory.format.type.CustomTypeEncoderRegistry;
7171
import org.apache.fory.format.type.CustomTypeHandler;
72+
import org.apache.fory.format.type.CustomTypeRegistration;
7273
import org.apache.fory.format.type.DataTypes;
7374
import org.apache.fory.format.type.Field;
7475
import org.apache.fory.format.type.Schema;
@@ -768,15 +769,18 @@ protected Expression newCollection(Expression arrayData, TypeRef<?> typeRef) {
768769
CustomCollectionFactory<?, ?> customFactory =
769770
customTypeHandler.findCollectionFactory(clazz, componentClass.getRawType());
770771
if (customFactory != null) {
772+
Reference factoryRef = customCollectionFactoryFieldRef(clazz, componentClass.getRawType());
771773
collection =
772-
new StaticInvoke(
773-
customTypeHandler.getClass(),
774-
"newCollection",
774+
new Cast(
775+
new Invoke(
776+
factoryRef,
777+
"newCollection",
778+
"newCollectionObject",
779+
TypeRef.of(Object.class),
780+
false,
781+
size),
775782
typeRef,
776-
false,
777-
new Expression.Null(typeRef, true),
778-
new Expression.Null(componentClass, true),
779-
size);
783+
"newCollection");
780784
} else if (TypeRef.of(clazz).isSupertypeOf(TypeRef.of(ArrayList.class))) {
781785
collection = new NewInstance(TypeRef.of(ArrayList.class), size);
782786
} else if (TypeRef.of(clazz).isSupertypeOf(TypeRef.of(HashSet.class))) {
@@ -921,26 +925,69 @@ protected Expression deserializeForObject(Expression value, TypeRef<?> typeRef)
921925
return new Invoke(foryRef, "deserialize", typeRef, value);
922926
}
923927

928+
// Static fields the row codec calls into. Initialized once at class load from
929+
// CustomTypeEncoderRegistry; encode/decode call sites invoke the cached instance.
930+
private final Map<TypeRef<?>, Reference> customCodecFieldRefs = new HashMap<>();
931+
private final Map<CustomTypeRegistration, Reference> customCollectionFactoryFieldRefs =
932+
new HashMap<>();
933+
934+
private Reference customCodecFieldRef(TypeRef<?> fieldType) {
935+
return customCodecFieldRefs.computeIfAbsent(
936+
fieldType,
937+
ft -> {
938+
String name = ctx.newName("_customCodec");
939+
Expression init =
940+
new StaticInvoke(
941+
CustomTypeEncoderRegistry.class,
942+
"findCodec",
943+
"",
944+
TypeRef.of(CustomCodec.class),
945+
false,
946+
false,
947+
false,
948+
Literal.ofClass(beanType.getRawType()),
949+
Literal.ofClass(ft.getRawType()));
950+
ctx.addField(true, true, ctx.type(CustomCodec.class), name, init);
951+
return new Reference(name, TypeRef.of(CustomCodec.class));
952+
});
953+
}
954+
955+
private Reference customCollectionFactoryFieldRef(Class<?> collectionType, Class<?> elementType) {
956+
return customCollectionFactoryFieldRefs.computeIfAbsent(
957+
new CustomTypeRegistration(collectionType, elementType),
958+
reg -> {
959+
String name = ctx.newName("_customCollectionFactory");
960+
Expression init =
961+
new StaticInvoke(
962+
CustomTypeEncoderRegistry.class,
963+
"findCollectionFactory",
964+
"",
965+
TypeRef.of(CustomCollectionFactory.class),
966+
false,
967+
false,
968+
false,
969+
Literal.ofClass(reg.getBeanType()),
970+
Literal.ofClass(reg.getFieldType()));
971+
ctx.addField(true, true, ctx.type(CustomCollectionFactory.class), name, init);
972+
return new Reference(name, TypeRef.of(CustomCollectionFactory.class));
973+
});
974+
}
975+
924976
protected Expression customEncode(Expression inputObject, TypeRef<?> rewrittenType) {
925-
return new Expression.StaticInvoke(
926-
customTypeHandler.getClass(),
927-
"encode",
928-
"rewrittenValue",
977+
Reference codecRef = customCodecFieldRef(inputObject.type());
978+
return new Cast(
979+
new Invoke(
980+
codecRef, "encode", "rewrittenObject", TypeRef.of(Object.class), true, inputObject),
929981
rewrittenType,
930-
true,
931-
new Expression.Null(beanType, true),
932-
inputObject);
982+
"rewrittenValue");
933983
}
934984

935985
protected Expression customDecode(TypeRef<?> typeRef, final Expression deserializedValue) {
936-
return new Expression.StaticInvoke(
937-
customTypeHandler.getClass(),
938-
"decode",
939-
"decodedValue",
986+
Reference codecRef = customCodecFieldRef(typeRef);
987+
return new Cast(
988+
new Invoke(
989+
codecRef, "decode", "decodedObject", TypeRef.of(Object.class), true, deserializedValue),
940990
typeRef,
941-
true,
942-
new Expression.Null(beanType, true),
943-
new Expression.Null(typeRef, true),
944-
deserializedValue);
991+
"decodedValue");
945992
}
946993
}

0 commit comments

Comments
 (0)