Skip to content

fix: avoid mutating Document.from_dict input#11330

Merged
julian-risch merged 4 commits into
deepset-ai:mainfrom
pragnyanramtha:codex/document-from-dict-copy-input
May 19, 2026
Merged

fix: avoid mutating Document.from_dict input#11330
julian-risch merged 4 commits into
deepset-ai:mainfrom
pragnyanramtha:codex/document-from-dict-copy-input

Conversation

@pragnyanramtha
Copy link
Copy Markdown
Contributor

Related Issues

  • None found. This fixes a small deserialization side effect in Document.from_dict().

Proposed Changes:

Document.from_dict() previously mutated the dictionary passed by the caller while converting serialized blob and sparse_embedding values and while unflattening metadata. This PR copies the input mapping before those transformations so callers can safely reuse their serialized document data after deserialization.

It also renames an existing intended test from from_from_dict_with_parameters to test_from_dict_with_parameters so pytest collects it, and adds a regression test for preserving the input dictionary.

How did you test it?

  • hatch run test:unit test/dataclasses/test_document.py
  • hatch run fmt-check haystack/dataclasses/document.py test/dataclasses/test_document.py
  • hatch run reno lint
  • git diff --check

Notes for the reviewer

This PR was prepared with an AI assistant. I reviewed the changes and ran the relevant focused checks.

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 17, 2026

@pragnyanramtha is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@pragnyanramtha pragnyanramtha marked this pull request as ready for review May 17, 2026 01:43
@pragnyanramtha pragnyanramtha requested a review from a team as a code owner May 17, 2026 01:43
@pragnyanramtha pragnyanramtha requested review from Copilot and julian-risch and removed request for a team May 17, 2026 01:43
pragnyanramtha and others added 2 commits May 18, 2026 23:42
The existing test only covered flattened metadata keys. This adds a
second case using an explicit `meta=` key, which exercises the
`data.pop("meta", {})` mutation site.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
haystack-docs Ignored Ignored Preview May 19, 2026 8:58am

Request Review

Copy link
Copy Markdown
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening this PR @pragnyanramtha . The change looks good to me. I added another test case to explicitly test the behavior for the meta field in input data.

@julian-risch julian-risch enabled auto-merge (squash) May 19, 2026 08:59
@github-actions
Copy link
Copy Markdown
Contributor

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  haystack/dataclasses
  document.py
Project Total  

This report was generated by python-coverage-comment-action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants