Spark: Allow session-level split size override without DDL or source code changes

### Feature Request / Improvement

## Problem Statement

When different compute engines or hardware accelerators (e.g., GPU via RAPIDS Accelerator and CPU) read the same Iceberg table concurrently, they need different values for `read.split.target-size`. GPU readers benefit from significantly larger splits (e.g., 2GB) to saturate device memory bandwidth, while CPU readers perform better with the current default (128MB).

Today there are only two ways to control `read.split.target-size`:

1. **Table property** (`read.split.target-size`) -- requires DDL (`ALTER TABLE ... SET TBLPROPERTIES`), affects all readers globally, and is unsuitable when GPU and CPU workloads hit the same table simultaneously.
2. **Read option** (`split-size`) -- requires source code changes to every `DataFrameReader` call site, which is impractical for ad-hoc SQL queries and shared notebooks. Hardware accelerators like [RAPIDS Accelerator for Apache Spark](https://nvidia.github.io/spark-rapids/) are designed as drop-in replacements that require no application code changes, so requiring per-call-site read options defeats this benefit.

Neither approach allows a Spark session to declare "all my reads should use split size X" without modifying table metadata or application code.

## Proposed Solution

Add a Spark **session configuration** key that overrides the table property for split size:

```
spark.sql.iceberg.split-size
```

This follows the existing pattern used by other Iceberg session configs such as `spark.sql.iceberg.vectorization.enabled` and `spark.sql.iceberg.data-planning-mode`.

The resolution precedence would be:

1. Read option (`split-size`)
2. Table-scoped session configuration (`spark.sql.iceberg.split-size.<catalog>.<database>.<table>`)
3. Global session configuration (`spark.sql.iceberg.split-size`)
4. Table property (`read.split.target-size`)
5. Default (128MB)

### Usage

```sql
-- GPU session: use 2GB splits for all table reads
SET spark.sql.iceberg.split-size = 2147483648;

-- CPU session: keep the default or set a different value
SET spark.sql.iceberg.split-size = 134217728;

-- Override split size for a specific table in this session
SET spark.sql.iceberg.split-size.my_catalog.my_db.large_table = 1073741824;
```

No DDL or code changes needed -- each session gets its own split size, with optional per-table granularity.

### Alternative: Engine-scoped table property overrides

A further extension could add an engine or hardware type qualifier to the `read.split.target-size` table property itself (e.g., `read.split.target-size.gpu`), allowing a single table to declare optimal split sizes for different hardware profiles without session-level configuration. This would let table owners encode hardware-aware defaults directly in metadata.

## Scope

- Spark only (all supported shims: 3.4, 3.5, 4.0)
- Files affected: `SparkSQLProperties`, `SparkReadConf`, `SparkConfParser`
- Backward compatible: no behavior change unless the new session key is explicitly set


### Query engine

Spark

### Willingness to contribute

- [x] I can contribute this improvement/feature independently
- [x] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spark: Allow session-level split size override without DDL or source code changes #16153

Feature Request / Improvement

Problem Statement

Proposed Solution

Usage

Alternative: Engine-scoped table property overrides

Scope

Query engine

Willingness to contribute

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Spark: Allow session-level split size override without DDL or source code changes #16153

Description

Feature Request / Improvement

Problem Statement

Proposed Solution

Usage

Alternative: Engine-scoped table property overrides

Scope

Query engine

Willingness to contribute

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions