fix: Align start, end, and probability lists in ExtractiveReader by bogdankostic · Pull Request #11347 · deepset-ai/haystack

bogdankostic · 2026-05-20T12:29:46Z

Related Issues

fixes https://github.com/deepset-ai/haystack-private/issues/339

ExtractiveReader.run was raising ValueError: zip() argument 3 is longer than arguments 1-2 whenever the number of valid answer spans for a sequence was smaller than answers_per_seq (e.g. short documents, or any case where the default answers_per_seq=20 exceeded the number of non-masked upper-triangular (start, end) token pairs).

Proposed Changes:

In _postprocess, filter each sequence's candidates_values row by the same valid mask and return a list[torch.Tensor] (one 1D tensor per sequence, variable length) instead of a 2D tensor. The return-type annotation and the _nest_answers probabilities parameter annotation were updated to match.

Root cause

_postprocess filtered the per-sequence start/end token-index lists by valid = candidates_values[i] > 0, producing variable-length lists, but returned candidates_values as the unfiltered 2D tensor of shape (batch_size, answers_per_seq).

How did you test it?

Added two unit tests in test/components/readers/test_extractive.py:

test_postprocess_filters_probs_when_answers_per_seq_exceeds_valid_spans:
constructs a batch where answers_per_seq=5 exceeds the 3 valid upper-triangular spans, asserts all three returned per-sequence structures have matching length 3 and that retained probabilities are all > 0.
test_nest_answers_accepts_variable_length_probability_rows:
calls _nest_answers with probabilities as a list[Tensor] containing rows of length 2 and 1 (the new contract) and
verifies scores flow through correctly.

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct.
I have updated the related issue with new insights and changes.
I have added unit tests and updated the docstrings.
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
I have documented my code.
I have added a release note file, following the contributors guidelines.
I have run pre-commit hooks and fixed any issue.

vercel · 2026-05-20T12:29:53Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
haystack-docs	Ignored	Preview	May 20, 2026 2:22pm

github-actions · 2026-05-20T12:32:48Z

Coverage report

Click to see where and how coverage changed

File	Statements	Missing	Coverage	Coverage (new stmts)	Lines missing
haystack/components/readers
extractive.py
Project Total

_{This report was generated by python-coverage-comment-action}

claude · 2026-05-20T12:38:58Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

anakin87 · 2026-05-20T14:08:24Z

haystack/haystack/components/readers/extractive.py

Line 392 in ec1777a

    
           while i < len(answers_without_query) and query_ids[i // answers_per_seq] == query_id:

(i // answers_per_seq) assumes that every sequence produced the same number of answers, which is no longer true in general. This still works because the reader uses a single query. I'd add a short comment here to explain this aspect.

bogdankostic · 2026-05-20T14:14:01Z

haystack/haystack/components/readers/extractive.py

Line 392 in ec1777a

while i < len(answers_without_query) and query_ids[i // answers_per_seq] == query_id:

(i // answers_per_seq) assumes that every sequence produced the same number of answers, which is no longer true in general. This still works because the reader uses a single query. I'd add a short comment here to explain this aspect.

Good catch! I added a comment.

anakin87

Seems like a good improvement.

My impression is that the implementation has always been wrong but then zip(..., strict=True) introduced in #10640 made the error surface.

I just left a minor comment to improve a test.

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

fix: Align start, end, and probability lists in ExtractiveReader

8a76082

bogdankostic requested a review from a team as a code owner May 20, 2026 12:29

bogdankostic requested review from anakin87 and removed request for a team May 20, 2026 12:29

github-actions Bot added the topic:tests label May 20, 2026

fix reno

e517a73

Add comment

7477b44

anakin87 reviewed May 20, 2026

View reviewed changes

Comment thread test/components/readers/test_extractive.py Outdated

anakin87 approved these changes May 20, 2026

View reviewed changes

Update test/components/readers/test_extractive.py

3ba1a9d

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

bogdankostic merged commit 435daf3 into main May 20, 2026
28 of 30 checks passed

bogdankostic deleted the extractive_reader_bug branch May 20, 2026 15:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Align start, end, and probability lists in ExtractiveReader#11347

fix: Align start, end, and probability lists in ExtractiveReader#11347
bogdankostic merged 4 commits into
mainfrom
extractive_reader_bug

bogdankostic commented May 20, 2026

Uh oh!

vercel Bot commented May 20, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 20, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 20, 2026

Uh oh!

anakin87 commented May 20, 2026

Uh oh!

bogdankostic commented May 20, 2026

Uh oh!

Uh oh!

anakin87 left a comment •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bogdankostic commented May 20, 2026

Related Issues

Proposed Changes:

Root cause

How did you test it?

Notes for the reviewer

Checklist

Uh oh!

vercel Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage report

Uh oh!

claude Bot commented May 20, 2026

Code review

Uh oh!

anakin87 commented May 20, 2026

Uh oh!

bogdankostic commented May 20, 2026

Uh oh!

Uh oh!

anakin87 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel Bot commented May 20, 2026 •

edited

Loading

github-actions Bot commented May 20, 2026 •

edited

Loading

anakin87 left a comment •

edited

Loading