Missing documentation about transaction management in multi-threaded steps #5386

@pkernevez

Bug description
When a ChunkOrientedStep is configured with a TaskExecutor, the item processor is executed outside any transaction. This is a regression from Spring Batch 5.2.5 (where the processor ran inside the chunk transaction) and contradicts the documentation, which states:

chunk: The Java-specific name of the dependency that indicates that this is an item-based step and 
the number of items to be processed before the transaction is committed.

Environment
Migrating from Spring Boot 3.5.13 / Spring Batch 5.2.5 (working) to Spring Boot 4.0.5 / Spring Batch 6.0.3 (broken)
Java 25, PostgreSQL

Steps to reproduce
Clone https://github.com/pkernevez/pb-hibernate-proxy/tree/springbatch-transaction-issue (note the branch: springbatch-transaction-issue) and run IssueSpringBatchTest.
The test fails because the processor is annotated with @Transactional(TxType.MANDATORY) and no transaction is active when it runs.
Commenting out .taskExecutor(executor) in BatchConfiguration makes the test pass.
To reproduce from scratch: build a ChunkOrientedStep with a chunk size, a TransactionManager, and a TaskExecutor, then annotate the processor with @Transactional(TxType.MANDATORY).

        return new StepBuilder("currencyStep", jobRepository)
                .<Long, CurrencyEntity>chunk(10)
                .transactionManager(transactionManager)
//                .taskExecutor(executor)
                .reader(currencyReader)
                .processor(processor)
                .writer(writer)
                .build();

Expected behavior

  • In normal mode: one transaction per chunk, wrapping reader, processor, and writer for all items in that chunk.
  • In faultTolerant() mode after a rollback: one transaction per item (scan mode).

Investigation
I cannot reconcile the current implementation with the behavior of Spring Batch 5, the documentation, and my own experience.
Here is my understanding of the flow, leaving scan mode aside.

Without a TaskExecutor:

  1. doExecute iterates over the chunks and starts a transaction before each one.
  2. For each chunk, processNextChunk delegates to processChunkSequentially.
  3. The items are iterated and processed inside the transaction started in step 1.
    Everything works as expected.

With a TaskExecutor:

  1. doExecute iterates over the chunks and starts a transaction before each one, in the main thread.
  2. For each chunk, processNextChunk delegates to processChunkConcurrently.
  3. The items are iterated and each one is submitted to the task executor.
  4. They all run in other threads, without any transaction ❌
  5. The main thread waits for every item to finish before returning to step 1 for the next chunk.
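The two flows above can be illustrated with a minimal plain-Java sketch. It assumes the transaction context is bound to the calling thread, as Spring's transaction synchronization is. `ThreadBoundTxDemo`, `TX_ACTIVE`, and `processorSeesTransaction` are hypothetical names for illustration only, not Spring Batch classes, and the `ThreadLocal` flag is only a stand-in for a real transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; it only simulates a thread-bound transaction marker.
class ThreadBoundTxDemo {

    // Stand-in for a transaction bound to the current thread (assumption).
    static final ThreadLocal<Boolean> TX_ACTIVE = ThreadLocal.withInitial(() -> false);

    // Stand-in for the item processor: reports whether it sees a transaction.
    static boolean processorSeesTransaction(Integer item) {
        return TX_ACTIVE.get();
    }

    // Items are processed on the thread that "began" the transaction.
    static List<Boolean> processChunkSequentially(List<Integer> chunk) {
        TX_ACTIVE.set(true); // "begin" the chunk transaction
        try {
            List<Boolean> seen = new ArrayList<>();
            for (Integer item : chunk) {
                seen.add(processorSeesTransaction(item));
            }
            return seen;
        } finally {
            TX_ACTIVE.remove(); // "end" the chunk transaction
        }
    }

    // Each item is handed to a worker thread, which has its own ThreadLocal value.
    static List<Boolean> processChunkConcurrently(List<Integer> chunk) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        TX_ACTIVE.set(true); // "begin" the chunk transaction on the calling thread
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (Integer item : chunk) {
                futures.add(executor.submit(() -> processorSeesTransaction(item)));
            }
            List<Boolean> seen = new ArrayList<>();
            for (Future<Boolean> future : futures) {
                seen.add(future.get()); // the caller blocks until every item is done
            }
            return seen;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            TX_ACTIVE.remove();
            executor.shutdown();
        }
    }
}
```

In this sketch, every item processed sequentially sees the marker, while every item processed through the executor does not. That mirrors the failure of a processor that requires an existing transaction.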

Consequences

  • Correctness: the processor runs without a transaction and items in a chunk no longer share a Hibernate session, which breaks consistency guarantees.
  • Performance (caching): reference entities loaded into the Hibernate session by item N are no longer cached for items N+1..N+chunkSize.
  • Performance (parallelism): previously, each worker processed a full chunk sequentially in its own transaction; a slow item only stalled one worker.
    Now a slow item blocks the main thread (and the whole batch) until all futures of the current chunk finish, even when the executor pool has spare capacity.

Proposed implementation

  1. doExecute iterates over the chunks and starts a transaction before each one, in the main thread.
  2. For each chunk, processNextChunk delegates to processChunkConcurrently.
  3. The whole chunk is prepared for execution in another thread and submitted to the task executor.
  4. The main thread returns directly to step 1 to prepare the next chunk for the executor.
    4bis. In the task executor thread, a new transaction is started, then the items of the chunk are iterated.
  5. The main thread waits for the completion of all chunks (their Futures).
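The proposal could look roughly like the following plain-Java sketch, in which each worker receives a whole chunk and opens its own transaction before iterating over the items. This is only an illustration under assumptions: `ChunkPerWorkerDemo` and `CURRENT_TX` are hypothetical names, the `ThreadLocal` string stands in for a real transaction, and nothing here is actual Spring Batch code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class: one simulated transaction per chunk, started in the worker thread.
class ChunkPerWorkerDemo {

    // Stand-in for the transaction bound to the current worker thread (assumption).
    static final ThreadLocal<String> CURRENT_TX = new ThreadLocal<>();

    // Returns, per chunk, the transaction id seen while processing each item.
    static List<List<String>> run(List<List<Integer>> chunks) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < chunks.size(); i++) {
                final String txId = "tx-" + i;
                final List<Integer> chunk = chunks.get(i);
                // Steps 3 and 4: submit the whole chunk, then move on to the next one.
                futures.add(executor.submit(() -> {
                    CURRENT_TX.set(txId); // step 4bis: "begin" a transaction in the worker
                    try {
                        List<String> seen = new ArrayList<>();
                        for (Integer item : chunk) {
                            seen.add(CURRENT_TX.get()); // every item shares the chunk's transaction
                        }
                        return seen;
                    } finally {
                        CURRENT_TX.remove(); // "end" the transaction
                    }
                }));
            }
            // Step 5: the main thread waits for all chunks to complete.
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```

With this shape, a slow item only delays the worker that holds its chunk, and all items of a chunk see the same simulated transaction.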

Minimal Complete Reproducible example
Clone the repo https://github.com/pkernevez/pb-hibernate-proxy/tree/springbatch-transaction-issue
Note: the correct branch is springbatch-transaction-issue.
