Commit 70469a2
[SPARK-55934][PYTHON][TEST][FOLLOWUP] Fix MAP_ARROW_ITER bench UDF return type
### What changes were proposed in this pull request?
Fix the default `ret_type` resolution in `_MapArrowIterBenchMixin._write_scenario` in `python/benchmarks/bench_eval_type.py`. The three benchmark UDFs (`identity_udf`, `sort_udf`, `filter_udf`) all return whole `pa.RecordBatch`es with the input row schema, so their declared return type should be the inner row `StructType`, not the first nested field's data type.
Before:
```python
ret_type = schema.fields[0].dataType.fields[0].dataType # first column's type
```
After:
```python
ret_type = schema.fields[0].dataType # the row's StructType
```
### Why are the changes needed?
`mapInArrow` UDFs are contractually `(Iterator[pa.RecordBatch]) -> Iterator[pa.RecordBatch]`, with the user-supplied `schema` describing the output rows as a whole. The benchmark UDFs return full batches but declared only the first column's type, which is semantically inconsistent with the API.
This did not surface as an error because, for `SQL_MAP_ARROW_ITER_UDF`, `read_single_udf` in `worker.py` discards `return_type` (it returns `None` in its place), and `verify_return_type` only checks structurally that the result is an `Iterator[pa.RecordBatch]`. Since there is currently no schema-level validation, the mismatched type was tolerated. If the worker ever adds schema validation for this eval type, the previous declaration would break the benchmark.
### Does this PR introduce _any_ user-facing change?
No. Test-only change in the benchmark module.
### How was this patch tested?
- Confirmed the new default resolves to a 5-field `StructType` matching the input row schema for `sm_batch_few_col`.
- Ran `MapArrowIterUDFTimeBench.setup` + `time_worker` for `(sm_batch_few_col, pure_ints) x (identity_udf, sort_udf, filter_udf)`.
- Ran `MapArrowIterUDFPeakmemBench.setup` + `peakmem_worker` for `sm_batch_few_col/identity_udf`.
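The first benchmark run covers the Cartesian product of two scenarios and three UDFs, six combinations in total. The sketch below drives a benchmark class roughly the way asv does: a fresh instance, `setup(*params)`, then the benchmark method with the same parameters. `run_asv_style` and `FakeBench` are hypothetical, and the assumption that `setup` and `time_worker` take the scenario and UDF name positionally is not confirmed by this commit.

```python
import itertools
from typing import Any, Sequence


def run_asv_style(bench_cls: type, method: str, params: Sequence[Sequence[Any]]) -> int:
    """Run `method` over the Cartesian product of `params`, asv-style.

    For each combination: create a fresh instance, call setup(*combo), then
    the benchmark method with the same arguments, then teardown if defined.
    Returns the number of combinations run.
    """
    runs = 0
    for combo in itertools.product(*params):
        bench = bench_cls()
        bench.setup(*combo)
        getattr(bench, method)(*combo)
        if hasattr(bench, "teardown"):
            bench.teardown(*combo)
        runs += 1
    return runs


class FakeBench:
    """Hypothetical stand-in for MapArrowIterUDFTimeBench; records each call."""

    calls: list = []

    def setup(self, scenario: str, udf: str) -> None:
        self.key = (scenario, udf)

    def time_worker(self, scenario: str, udf: str) -> None:
        FakeBench.calls.append(self.key)


n = run_asv_style(
    FakeBench,
    "time_worker",
    [["sm_batch_few_col", "pure_ints"], ["identity_udf", "sort_udf", "filter_udf"]],
)
print(n)  # 6
```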
### Was this patch authored or co-authored using generative AI tooling?
Yes. Generated-by: Claude Code (claude-opus-4-7)
Closes apache#56168 from viirya/SPARK-55934-followup.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>

1 parent 63f9c88
1 file changed
Lines changed: 4 additions & 1 deletion
Diff of `python/benchmarks/bench_eval_type.py` (line contents not preserved): original line 1460 is removed and new lines 1460-1463 are added, with unchanged context at original lines 1457-1459 and 1461-1463 (new lines 1464-1466).