Commit e19683e

cleanup round 1
1 parent 4d471e1 commit e19683e

22 files changed

Lines changed: 510 additions & 856 deletions

.github/workflows/pr_build_linux.yml

Lines changed: 2 additions & 2 deletions
@@ -307,7 +307,7 @@ jobs:
 org.apache.comet.CometFuzzAggregateSuite
 org.apache.comet.CometFuzzIcebergSuite
 org.apache.comet.CometFuzzMathSuite
-org.apache.comet.CometCodegenDispatchFuzzSuite
+org.apache.comet.CometCodegenFuzzSuite
 org.apache.comet.DataGeneratorSuite
 - name: "shuffle"
 value: |
@@ -386,7 +386,7 @@ jobs:
 org.apache.comet.expressions.conditional.CometIfSuite
 org.apache.comet.expressions.conditional.CometCoalesceSuite
 org.apache.comet.expressions.conditional.CometCaseWhenSuite
-org.apache.comet.CometCodegenDispatchSmokeSuite
+org.apache.comet.CometCodegenSuite
 org.apache.comet.CometCodegenSourceSuite
 org.apache.comet.CometCodegenHOFSuite
 - name: "sql"

.github/workflows/pr_build_macos.yml

Lines changed: 2 additions & 2 deletions
@@ -155,7 +155,7 @@ jobs:
 org.apache.comet.CometFuzzAggregateSuite
 org.apache.comet.CometFuzzIcebergSuite
 org.apache.comet.CometFuzzMathSuite
-org.apache.comet.CometCodegenDispatchFuzzSuite
+org.apache.comet.CometCodegenFuzzSuite
 org.apache.comet.DataGeneratorSuite
 - name: "shuffle"
 value: |
@@ -233,7 +233,7 @@ jobs:
 org.apache.comet.expressions.conditional.CometIfSuite
 org.apache.comet.expressions.conditional.CometCoalesceSuite
 org.apache.comet.expressions.conditional.CometCaseWhenSuite
-org.apache.comet.CometCodegenDispatchSmokeSuite
+org.apache.comet.CometCodegenSuite
 org.apache.comet.CometCodegenSourceSuite
 org.apache.comet.CometCodegenHOFSuite
 - name: "sql"

common/src/main/java/org/apache/comet/codegen/CometBatchKernel.java

Lines changed: 7 additions & 14 deletions
@@ -28,13 +28,6 @@
 * {@code BoundReference.genCode} can call {@code this.getUTF8String(ord)} directly) and carries
 * typed input fields baked at codegen time, one per input column. Expression evaluation plus Arrow
 * read/write fuse into one method per expression tree.
-*
-* <p>Input scope: any {@code ValueVector[]}; the generated subclass casts each slot to the concrete
-* Arrow type the compile-time schema specified. Output is a generic {@code FieldVector}; the
-* generated subclass casts to the concrete type matching the bound expression's {@code dataType}.
-* Widen input support by adding vector classes to the getter switch in {@code
-* CometBatchKernelCodegen.emitTypedGetters}; widen output support by adding cases in {@code
-* CometBatchKernelCodegen.allocateOutput} and {@code emitOutputWriter}.
 */
 public abstract class CometBatchKernel extends CometInternalRow {
 
@@ -47,22 +40,22 @@ protected CometBatchKernel(Object[] references) {
 /**
 * Process one batch.
 *
-* @param inputs Arrow input vectors; length and concrete classes must match the schema the kernel
-* was compiled against
+* @param inputs Arrow input vectors; length and concrete classes match the schema the kernel was
+* compiled against
 * @param output Arrow output vector; caller allocates to the expression's {@code dataType}
 * @param numRows number of rows in this batch
 */
 public abstract void process(ValueVector[] inputs, FieldVector output, int numRows);
 
 /**
 * Run partition-dependent initialization. The generated subclass overrides this to execute
-* statements collected via {@code CodegenContext.addPartitionInitializationStatement}, for
-* example reseeding {@code Rand}'s {@code XORShiftRandom} from {@code seed + partitionIndex}.
+* statements collected via {@code CodegenContext.addPartitionInitializationStatement}, e.g.
+* reseeding {@code Rand}'s {@code XORShiftRandom} from {@code seed + partitionIndex}.
 * Deterministic expressions leave this as a no-op.
 *
-* <p>The caller must invoke this before the first {@code process} call of each partition. The
-* generated subclass is not thread-safe across concurrent {@code process} calls, so kernels are
-* allocated per dispatcher invocation and init is run once on the fresh instance.
+* <p>The caller invokes this before the first {@code process} call of each partition. The
+* generated subclass is not thread-safe across concurrent {@code process} calls; the dispatcher
+* allocates one per partition and serializes calls.
 */
 public void init(int partitionIndex) {}
 }
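
Reviewer note on the `init` / `process` contract in the Javadoc above: the sketch below illustrates the call order (one kernel per partition, `init` once, then batches processed serially). Everything in it is hypothetical: `SketchKernel`, `RandKernel`, and `KernelLifecycleSketch` are illustration-only names, plain `double[]` arrays stand in for Arrow vectors, and `java.util.Random` stands in for Spark's `XORShiftRandom`. It is not Comet's dispatcher, only a minimal model of the documented contract.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical, simplified stand-in for the kernel contract; not Comet's real API. */
abstract class SketchKernel {
  /** Partition-dependent setup; deterministic kernels leave this as a no-op. */
  void init(int partitionIndex) {}

  /** Fills output[0..numRows) from input; not safe for concurrent calls. */
  abstract void process(double[] input, double[] output, int numRows);
}

/** Nondeterministic kernel: adds noise drawn from a generator seeded by seed + partitionIndex. */
final class RandKernel extends SketchKernel {
  private final long seed;
  private Random rng; // assumption: stand-in for Spark's XORShiftRandom

  RandKernel(long seed) {
    this.seed = seed;
  }

  @Override
  void init(int partitionIndex) {
    // Reseed per partition so each partition gets its own reproducible stream.
    rng = new Random(seed + partitionIndex);
  }

  @Override
  void process(double[] input, double[] output, int numRows) {
    if (rng == null) {
      throw new IllegalStateException("init(partitionIndex) must run before process");
    }
    for (int i = 0; i < numRows; i++) {
      output[i] = input[i] + rng.nextDouble();
    }
  }
}

final class KernelLifecycleSketch {
  /** One fresh kernel per partition, init once, then batches processed one at a time. */
  static double[][] runPartition(long seed, int partitionIndex, double[][] batches) {
    SketchKernel kernel = new RandKernel(seed);
    kernel.init(partitionIndex);
    double[][] out = new double[batches.length][];
    for (int b = 0; b < batches.length; b++) {
      out[b] = new double[batches[b].length];
      kernel.process(batches[b], out[b], batches[b].length);
    }
    return out;
  }

  public static void main(String[] args) {
    double[][] batches = {{1.0, 2.0}, {3.0}};
    double[][] first = runPartition(42L, 0, batches);
    double[][] second = runPartition(42L, 0, batches);
    // Same seed and partition index give the same stream, so the two runs match.
    System.out.println(Arrays.deepEquals(first, second)); // prints "true"
  }
}
```

In this model, skipping `init` fails fast with an `IllegalStateException`, which is one way a caller-side ordering bug could surface; whether the real generated kernels guard against it is not stated in the diff.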

common/src/main/scala/org/apache/comet/codegen/CometArrayData.scala

Lines changed: 8 additions & 15 deletions
@@ -27,23 +27,16 @@ import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}
 import org.apache.comet.shims.CometInternalRowShim
 
 /**
-* Throwing-default base for [[ArrayData]] in the Arrow-direct codegen kernel. Subclasses override
-* only the getters their element type needs (e.g. `numElements`, `isNullAt`, `getUTF8String` for
-* an `ArrayType(StringType)` input).
+* Throwing-default `ArrayData` base for the codegen kernel. Subclasses override only the getters
+* their element type needs.
 *
-* Consumer: `InputArray_${path}` nested classes the input emitter generates per `ArrayType` input
-* column. They back `getArray(ord)` plus the recursion for `Array<Array<...>>` and array-typed
-* map keys / struct fields.
+* Consumer: per-column `InputArray_${path}` nested classes that back `getArray(ord)` plus the
+* recursion for `Array<Array<...>>` and array-typed map keys / struct fields.
 *
-* `ArrayData` and [[CometInternalRow]]'s [[InternalRow]] are sibling abstract classes in Spark
-* (both extend `SpecializedGetters`, neither inherits the other), so a base aimed at one cannot
-* serve the other. The dispatch body that '''is''' shared between them lives in
-* [[CometSpecializedGettersDispatch]]. The third sibling, [[CometMapData]], backs `InputMap_*`
-* and routes `keyArray()` / `valueArray()` through `CometArrayData` instances.
-*
-* Mixes in [[CometInternalRowShim]] for the same reason `CometInternalRow` does: Spark 4.x adds
-* abstract `SpecializedGetters` methods (`getVariant`, `getGeography`, `getGeometry`) that both
-* `InternalRow` and `ArrayData` inherit; the per-profile shim provides throwing defaults.
+* `ArrayData` and `InternalRow` are sibling abstract classes, so a base aimed at one cannot serve
+* the other. The shared `get(ordinal, dataType)` dispatch lives in
+* [[CometSpecializedGettersDispatch]]. Mixes in [[CometInternalRowShim]] so Spark 4.x's
+* `getVariant` / `getGeography` / `getGeometry` get throwing defaults.
 */
 abstract class CometArrayData extends ArrayData with CometInternalRowShim {
 
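
Reviewer note on the "throwing-default base" pattern the Scaladoc above describes: the sketch below shows the general shape in plain Java. Every getter in the base throws, and a subclass overrides only what its element type needs. All names here (`ThrowingArrayBase`, `StringArrayView`, `ThrowingDefaultsSketch`) are hypothetical, and the getter surface is a small stand-in for Spark's `ArrayData`, not the real class or the code Comet generates.

```java
/** Hypothetical stand-in for an ArrayData-like getter surface; not Spark's real class. */
abstract class ThrowingArrayBase {
  abstract int numElements();

  // Throwing defaults: a subclass that never serves these types does not override them.
  boolean isNullAt(int ordinal) {
    throw unsupported("isNullAt");
  }

  int getInt(int ordinal) {
    throw unsupported("getInt");
  }

  long getLong(int ordinal) {
    throw unsupported("getLong");
  }

  String getString(int ordinal) {
    throw unsupported("getString");
  }

  private UnsupportedOperationException unsupported(String method) {
    return new UnsupportedOperationException(
        getClass().getSimpleName() + " does not support " + method);
  }
}

/** View over string elements: overrides only the getters a string array needs. */
final class StringArrayView extends ThrowingArrayBase {
  private final String[] values;

  StringArrayView(String[] values) {
    this.values = values;
  }

  @Override
  int numElements() {
    return values.length;
  }

  @Override
  boolean isNullAt(int ordinal) {
    return values[ordinal] == null;
  }

  @Override
  String getString(int ordinal) {
    return values[ordinal];
  }
}

final class ThrowingDefaultsSketch {
  public static void main(String[] args) {
    ThrowingArrayBase arr = new StringArrayView(new String[] {"a", null});
    System.out.println(arr.getString(0)); // prints "a"
    System.out.println(arr.isNullAt(1)); // prints "true"
    try {
      arr.getInt(0); // not overridden, so the base default throws
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage()); // prints "StringArrayView does not support getInt"
    }
  }
}
```

The trade-off in this shape is that a call to a getter the subclass did not override fails at run time rather than at compile time, in exchange for subclasses that stay small.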
