Failed to replicate batch invariance using vLLM inference #6

Hi, @Chillee 

I am trying to reproduce batch-invariant inference on the AIME24 dataset using the provided `batch_invariant_ops`.
I compared two settings:
- Different batch sizes (16 vs. 12)
- Different sample shuffle orders

In each setting I use greedy decoding with `max_new_tokens=4096`, and for most samples the outputs are identical across runs.
However, I still observed two samples whose outputs diverge during generation.
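
For reference, the comparison can be sketched roughly as follows. This is a minimal sketch, assuming each run is stored as a mapping from sample index to the list of generated token IDs; the function name `diverging_samples` and the toy data are illustrative only and are not part of `batch_invariant_ops` or vLLM. Keying by sample index keeps the check independent of batch composition and shuffle order.

```python
from typing import Dict, List


def diverging_samples(
    run_a: Dict[int, List[int]], run_b: Dict[int, List[int]]
) -> List[int]:
    """Return the sorted sample IDs whose token sequences differ between two runs."""
    if run_a.keys() != run_b.keys():
        raise ValueError("both runs must cover the same sample IDs")
    return sorted(sid for sid in run_a if run_a[sid] != run_b[sid])


# Toy data (hypothetical): the second run lists samples in a different order,
# and sample 1 ends with a different token.
run_bs16 = {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8]}
run_bs12 = {2: [7, 8], 0: [1, 2, 3], 1: [4, 5, 9]}

print(diverging_samples(run_bs16, run_bs12))
```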

Here are the per-sample token counts under the two batch-size settings. The outputs for samples 4 and 22 differ.
<img width="250" height="400" alt="Image" src="https://github.com/user-attachments/assets/97652bc0-01a2-4648-b9d5-4e966b930112" />
Here is the divergence point, where the logit of the top-1 token differs.
<img width="1687" height="202" alt="Image" src="https://github.com/user-attachments/assets/1a65c435-ea97-49e9-876b-3b1d645028e9" />
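
The divergence point can be located with a helper along these lines. Again a minimal sketch: `first_divergence` is a hypothetical name, and it assumes the two generations are available as sequences of integer token IDs. A length mismatch counts as a divergence at the end of the shorter sequence.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token, or None if the sequences match."""
    # Token IDs are ints, so None is a safe padding value for the shorter sequence.
    for i, (x, y) in enumerate(zip_longest(a, b, fillvalue=None)):
        if x != y:
            return i
    return None


# Toy sequences (hypothetical token IDs) that differ at position 2.
print(first_divergence([11, 12, 13, 14], [11, 12, 99, 14]))
```

Once the index is known, the logits at that decoding step can be compared between the two runs to see how close the top-1 and top-2 candidates are.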

Environment / Setup:
- GPU: one L20 (48 GB)
- Model: Qwen3-1.7B
- Precision: fp16
- Sampling param seed: 114514
- Decoding: greedy
- Modifications: the only change I made was reducing `BLOCK_N` in `batch_invariant_ops` from 256 to 128 for fp16. No other modifications were made.

Do you have any suggestions on what might cause this non-determinism?
Is there anything else I should patch or verify in addition to batch_invariant_ops?

Thanks!
