[Python] to_pylist() on list-typed arrays is several times slower than converting via to_pandas() #50326

### Describe the enhancement requested

`pa.Array.to_pylist()` on list-typed arrays is 2.5–10x slower than converting the
same array to pandas and then turning the resulting numpy arrays back into Python
lists — even though `to_pylist` does strictly less work conceptually.

This matters in practice: Apache Spark switched regular Python UDFs to Arrow
serialization by default and hit a performance regression on array columns caused
by this (see apache/spark#56940, apache/spark#56943). Working around it in Spark
via the pandas detour was rejected because it introduces type-coercion bugs
(e.g. `list<int32>` with a null element comes back as numpy `float64`
`[1., nan, 3.]` instead of `[1, None, 3]`), so the right fix is making
`to_pylist()` itself fast.

## Reproduction (pyarrow 24.0.0, Python 3.11, macOS arm64; same numbers on current master)

```python
import pyarrow as pa

N = 2_000_000
arr = pa.array([[f"s{j}", f"t{j}"] for j in range(N)], type=pa.list_(pa.string()))

arr.to_pylist()                          # 1.97 s
arr.to_pandas()                          # 0.46 s  (4.3x faster, does MORE work)
[x.tolist() for x in arr.to_pandas()]    # 0.78 s  (2.5x faster incl. ndarray->list)
arr.values.to_pylist()                   # 0.82 s  (4M flat strings)

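# nested: 1M rows of [[j, j+1], [j+2]] as list<list<int32>>
nested = pa.array(
    [[[j, j + 1], [j + 2]] for j in range(1_000_000)],
    type=pa.list_(pa.list_(pa.int32())),
)
nested.to_pylist()                       # 2.00 s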
nested.to_pandas()                       # 0.20 s  (10x faster)
```

## Root cause

`Array.to_pylist` is implemented as a per-element scalar conversion
(`python/pyarrow/array.pxi`):

```python
return [x.as_py(maps_as_pydicts=maps_as_pydicts) for x in self]
```

For a `list<string>` array, every row pays for:

1. `Array.__iter__` → `getitem(i)` → C++ `arrow::Array::GetScalar(i)`, which
   allocates a `ListScalar` holding a sliced values array;
2. a Python `Scalar` wrapper (`Scalar.wrap`);
3. `ListScalar.as_py` → the `values` property wraps the slice in a *new Python
   `Array` object* (`pyarrow_wrap_array`), then recursively calls `.to_pylist()`
   on it. That call allocates a fresh generator and repeats steps 1–2 for every
   element; for a string array, C++ `GetScalar` additionally copies each value
   into a `std::string`, wraps it in a `Buffer` and allocates a `StringScalar`.

A `sample` profile of the repro shows where the time goes (~8365 samples):

- ~20% CPython GC (`gc_collect_main`): the per-row generator/Scalar/Array
  allocations are GC-tracked and repeatedly trigger collections that traverse
  the ever-growing result list;
- ~25% C++ `Array::GetScalar` (per-element scalar allocation + per-row values
  slicing);
- most of the rest is Python wrapper allocation and method dispatch
  (`Scalar.wrap`, `ListScalar.values` → `pyarrow_wrap_array`, `as_py` calls);
- the useful work — actually creating the 4M `str` objects (`unicode_new`) —
  is only ~7% of samples.
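
The GC share can be illustrated without pyarrow. The following is a minimal stdlib-only sketch, not the profiled code path: it builds a result of the same shape (one two-element list per row) and counts how many automatic collections CPython runs while the result list grows. The helper names (`build_rows`, `measure`) are hypothetical, and the real `to_pylist` path allocates additional GC-tracked objects per row (Scalars, Array wrappers, generators), so the effect there should be larger than what this shows.

```python
import gc
import time


def build_rows(n):
    """Build n two-element lists, mimicking the shape of the list<string> result."""
    return [[f"s{j}", f"t{j}"] for j in range(n)]


def total_collections():
    """Sum the number of collections run so far across all GC generations."""
    return sum(gen["collections"] for gen in gc.get_stats())


def measure(n, gc_enabled):
    """Run build_rows(n) with the cyclic GC on or off.

    Returns (elapsed seconds, collections run during the build, rows).
    """
    was_enabled = gc.isenabled()
    if gc_enabled:
        gc.enable()
    else:
        gc.disable()
    try:
        before = total_collections()
        start = time.perf_counter()
        rows = build_rows(n)
        elapsed = time.perf_counter() - start
        return elapsed, total_collections() - before, rows
    finally:
        # Restore whatever GC state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()


N_ROWS = 200_000  # assumption: small enough to run quickly, large enough to trigger GC
t_on, c_on, rows_on = measure(N_ROWS, gc_enabled=True)
t_off, c_off, rows_off = measure(N_ROWS, gc_enabled=False)
print(f"gc on : {t_on:.3f} s, {c_on} collections")
print(f"gc off: {t_off:.3f} s, {c_off} collections")
```

Timings vary by machine and Python version, so the interesting number is the collection count rather than the absolute time. Disabling the GC is only a diagnostic here, not a proposed fix.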

This was diagnosed back in 2021 in #28694 (ARROW-12976): maintainers agreed the
fix is to bypass Scalar creation entirely, but the issue was closed as stale in
Feb 2026 without a fix. #28689 is related.

## Prototype fix and results

A ~250-line Cython-level prototype on master (no C++ changes) gives:

| benchmark (2M / 1M rows) | master | patched | speedup |
|---|---|---|---|
| `list<string>` to_pylist | 1.93 s | **0.34 s** | 5.7x |
| `list<list<int32>>` to_pylist | 2.10 s | **0.65 s** | 3.2x |
| flat `string` to_pylist (4M) | 0.83 s | **0.05 s** | 16x |

For `list<string>`, `to_pylist` thus becomes ~2.2x faster than the pandas detour
(0.75 s) instead of 2.5x slower.

Two independent parts:

1. **Bulk list conversion** — `to_pylist` overrides on `ListArray`,
   `LargeListArray` and `FixedSizeListArray` that convert the referenced range
   of child values with a *single* recursive `to_pylist` call and then slice the
   resulting Python list per row using the raw C offsets and the validity
   bitmap. No per-row Scalar, no per-row Python Array wrapper, no per-row
   generator. `MapArray` explicitly keeps the generic path (association-tuple /
   `maps_as_pydicts` duplicate-key semantics).
2. **String leaf fast path** — `to_pylist` overrides on `StringArray` /
   `LargeStringArray` that decode values straight from the data buffer
   (`GetValue` + `PyUnicode_DecodeUTF8`), matching `StringScalar.as_py`
   (= `str(buf, 'utf8')`) exactly.
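
The two parts can be modelled in pure Python to show the idea. This is a sketch under simplifying assumptions, not the Cython prototype: buffers are plain `bytes` and Python lists, validity is a list of booleans rather than a bitmap, and the function names (`decode_strings`, `bulk_list_to_pylist`) are hypothetical.

```python
def decode_strings(data, offsets, validity, start, stop):
    """String leaf path: decode values start..stop straight from the data buffer.

    Value i occupies data[offsets[i]:offsets[i + 1]]; nulls become None.
    """
    return [
        data[offsets[i]:offsets[i + 1]].decode("utf8") if validity[i] else None
        for i in range(start, stop)
    ]


def bulk_list_to_pylist(list_offsets, list_validity, convert_child):
    """Bulk list path: convert the referenced child range once, then slice per row."""
    first, last = list_offsets[0], list_offsets[-1]
    flat = convert_child(first, last)  # the single recursive conversion
    rows = []
    for i, valid in enumerate(list_validity):
        if not valid:
            rows.append(None)
            continue
        # Rebase the offsets, since `flat` starts at child index `first`.
        lo = list_offsets[i] - first
        hi = list_offsets[i + 1] - first
        rows.append(flat[lo:hi])  # slicing creates a fresh list per row
    return rows


# Child string column: "s0", "t0", "é", null
data = "s0t0é".encode("utf8")
str_offsets = [0, 2, 4, 6, 6]
str_validity = [True, True, True, False]


def convert_child(start, stop):
    return decode_strings(data, str_offsets, str_validity, start, stop)


# list<string> column: ["s0", "t0"], null, [], ["é", null]
rows = bulk_list_to_pylist([0, 2, 2, 2, 4], [True, False, True, True], convert_child)
print(rows)  # [['s0', 't0'], None, [], ['é', None]]

# A sliced array whose offsets do not start at 0: only child values 2..4 are converted.
sliced = bulk_list_to_pylist([2, 2, 4], [True, True], convert_child)
print(sliced)  # [[], ['é', None]]
```

The per-row cost in this model is one list slice, which is the property the prototype relies on.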

Semantics are unchanged: a differential test comparing the patched `to_pylist`
against the reference `[x.as_py() for x in arr]` with exact-type equality passes
for list/large_list/fixed_size_list/map over 8 leaf types, nested lists,
list<struct>, list<map>, sliced arrays, all-null/empty arrays, and both
`maps_as_pydicts` modes; in particular `list<int32>` `[1, None, 3]` stays
`[1, None, 3]` (ints + None). `pytest pyarrow/tests/test_array.py
test_scalars.py test_convert_builtin.py test_table.py` passes (1208 passed).
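
Exact-type equality matters because plain `==` would not catch the pandas-style coercion: `[1, None, 3] == [1.0, None, 3.0]` is `True` in Python. A minimal sketch of such a comparison is below; `same_exact` is a hypothetical helper written for illustration, not the one used in the differential test.

```python
import math


def same_exact(a, b):
    """Return True if a and b have identical types and values, recursively."""
    if type(a) is not type(b):
        return False
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(same_exact(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        # Compare keys and values pairwise, in insertion order.
        return len(a) == len(b) and all(
            same_exact(ka, kb) and same_exact(va, vb)
            for (ka, va), (kb, vb) in zip(a.items(), b.items())
        )
    if isinstance(a, float) and math.isnan(a) and math.isnan(b):
        return True  # treat NaN as equal to NaN for comparison purposes
    return a == b


expected = [1, None, 3]
print(same_exact(expected, [1, None, 3]))                # True
print(same_exact(expected, [1.0, float("nan"), 3.0]))    # False
print(expected == [1.0, None, 3.0])                      # True (why == is not enough)
```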

Natural follow-ups (same pattern): leaf fast paths for primitive/binary types
(would speed up the `list<list<int32>>` case further), string/binary views,
struct arrays, a bulk path for maps, and list-view types (these need care:
overlapping views should not share mutable sublist objects). Longer-term, a
single C++ `ToPyList` visitor (like `MonthDayNanoIntervalArrayToPyList`) could
cover all types without per-class Cython code.
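
The list-view caveat can be shown with plain Python lists. This sketch reflects one reading of the concern and uses hypothetical helper names: when the child values are themselves lists and two views overlap, naive slicing of the converted child list hands the same inner list object to both rows.

```python
import copy


def listview_rows_shared(child, offsets, sizes):
    """Naive bulk path: slice the converted child list per view."""
    return [child[o:o + s] for o, s in zip(offsets, sizes)]


def listview_rows_independent(child, offsets, sizes):
    """One possible remedy (an assumption): deep-copy each slice so rows never alias."""
    return [copy.deepcopy(child[o:o + s]) for o, s in zip(offsets, sizes)]


# Two overlapping views over three inner lists: rows [[1], [2]] and [[2], [3]].
offsets, sizes = [0, 1], [2, 2]

shared = listview_rows_shared([[1], [2], [3]], offsets, sizes)
print(shared[0][1] is shared[1][0])  # True: both rows hold the same inner list
shared[0][1].append(99)
print(shared[1])                     # [[2, 99], [3]]

independent = listview_rows_independent([[1], [2], [3]], offsets, sizes)
print(independent[0][1] is independent[1][0])  # False
independent[0][1].append(99)
print(independent[1])                # [[2], [3]]
```

Regular list arrays do not hit this, since their offsets are non-overlapping and each child element belongs to at most one row.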

I can submit the prototype as a PR.

### Component(s)

Python

