Failed to replicate batch invariant using vLLM inference #6

@fopdoodle8

Hi, @Chillee

I am trying to reproduce batch-invariant inference on the AIME24 dataset using the provided batch_invariant_ops.
I ran two different settings:

  • Different batch sizes (16/12)
  • Different sample shuffle orders

In each setting I use greedy decoding with max_new_tokens=4096. For most samples the outputs are identical across runs.
However, two samples still diverge during generation.
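For reference, the per-sample comparison described above can be sketched as follows. This is a minimal illustration, not the script used for the runs: it assumes each run is stored as a mapping from sample id to generated token ids, and the name `diff_runs` and the toy data are hypothetical. Keying by sample id makes the check independent of batch size and shuffle order.

```python
from typing import Dict, List


def diff_runs(run_a: Dict[int, List[int]], run_b: Dict[int, List[int]]) -> List[int]:
    """Return the sorted sample ids whose generated token ids differ between two runs."""
    if run_a.keys() != run_b.keys():
        raise ValueError("the two runs cover different sample ids")
    return sorted(sid for sid in run_a if run_a[sid] != run_b[sid])


# Toy data standing in for real generations (not taken from the issue).
run_bs16 = {0: [5, 9, 2], 1: [7, 7, 1], 2: [3, 4, 8]}
# Same samples inserted in a different order, with one mismatch in sample 1.
run_bs12 = {2: [3, 4, 8], 0: [5, 9, 2], 1: [7, 6, 1]}

print(diff_runs(run_bs16, run_bs12))  # [1]
```

An empty list means the two runs are token-for-token identical for every sample.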

The image below shows the number of matching tokens per sample across the two batch-size settings. The outputs of samples 4 and 22 differ.
Image
Here is the divergence point, where the top-1 token's logit differs.
Image
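A divergence point like this can be located with a short helper. The sketch below is illustrative only: `first_divergence` and `top2_margin` are hypothetical names, and it assumes the token ids (and, optionally, the per-step logits) of both runs have been saved. A small top-1/top-2 margin at the divergence step would suggest a near-tie that tiny numerical differences can flip, though that is only one possible explanation.

```python
from typing import List, Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where two token sequences differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a strict prefix of the other: they diverge where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def top2_margin(logits: Sequence[float]) -> float:
    """Return the gap between the largest and second-largest logit."""
    if len(logits) < 2:
        raise ValueError("need at least two logits")
    top = sorted(logits, reverse=True)[:2]
    return top[0] - top[1]


# Toy example (values are made up, not from the issue).
idx = first_divergence([11, 42, 7, 9], [11, 42, 8, 9])
print(idx)  # 2
```

Running `top2_margin` on the logits of both runs at index `idx` shows how close the two candidate tokens were at that step.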

Environment / Setup:
GPU: one L20 48G
Model: Qwen3-1.7B
Precision: fp16
Sampling Param Seed: 114514
Decoding: Greedy Decoding
Modifications: the only change I made was adjusting BLOCK_N in batch_invariant_ops from 256 to 128 for fp16.

Do you have any suggestions on what might cause this non-determinism?
Is there anything else I should patch or verify in addition to batch_invariant_ops?

Thanks!
