bug: AnswerBuilder.run() mutates input documents' meta dict #11371

@sachinn854

Describe the bug

AnswerBuilder.run() permanently modifies the meta dict of the original input Document objects.
After calling run(), every input document gains a source_index key (and a referenced key when
reference_pattern is set) — even though the caller never asked for that.

Error message

No error is thrown — this is a silent correctness bug.

Expected behavior

doc.meta should remain unchanged after run() returns. The source_index key should only appear
on the copy of the document inside GeneratedAnswer.documents, not on the original input document.

To Reproduce

from haystack import Document
from haystack.components.builders import AnswerBuilder

doc = Document(content="Paris is the capital of France.", meta={"source": "wiki"})
builder = AnswerBuilder()
builder.run(query="Capital of France?", replies=["Paris."], documents=[doc])

print(doc.meta)
# {'source': 'wiki', 'source_index': 1}  ← original was mutated

Additional context

Root cause: answer_builder.py line 207:

doc_meta: dict[str, Any] = doc.meta or {}

doc.meta or {} returns the same dict object whenever meta is non-empty (truthy), so doc_meta
is an alias for doc.meta rather than a copy. The next line, doc_meta["source_index"] = idx + 1,
therefore mutates the original doc.meta directly.

dataclasses.replace(doc, meta=doc_meta) was clearly intended to avoid mutation, but it runs
after the write and receives the same dict, so the alias defeats it.
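
The aliasing can be reproduced without Haystack. The sketch below is only an illustration: Doc is a hypothetical stand-in dataclass, not the real Document class, and annotate_buggy is a made-up helper that mirrors the two lines described above.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass
class Doc:
    """Hypothetical stand-in for haystack's Document (illustration only)."""
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def annotate_buggy(doc: Doc, idx: int) -> Doc:
    # `or` returns its left operand when that operand is truthy, so for a
    # non-empty meta this binds doc_meta to the caller's own dict.
    doc_meta: dict[str, Any] = doc.meta or {}
    # This write lands in the caller's dict, before replace() ever runs.
    doc_meta["source_index"] = idx + 1
    # replace() builds a new Doc, but it is handed the same dict object.
    return replace(doc, meta=doc_meta)


original = Doc("Paris is the capital of France.", {"source": "wiki"})
annotated = annotate_buggy(original, 0)

print(annotated is original)            # False: a new Doc was created
print(annotated.meta is original.meta)  # True: both share one dict
print(original.meta)                    # {'source': 'wiki', 'source_index': 1}
```

Note that an empty meta does not trigger the problem in this sketch, because {} is falsy and the `or` then yields a fresh dict.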

Fix: copy the dict before writing to it, e.g. doc_meta: dict[str, Any] = dict(doc.meta)
(or dict(doc.meta or {}) if meta can ever be None).
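
A minimal sketch of the proposed fix, again using a hypothetical stand-in Doc dataclass rather than the real Document class. The helper name annotate_fixed is made up for illustration, and the `or {}` guard is an assumption that only matters if meta can be None.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass
class Doc:
    """Hypothetical stand-in for haystack's Document (illustration only)."""
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def annotate_fixed(doc: Doc, idx: int) -> Doc:
    # dict(...) makes a shallow copy, so the write below touches only the copy.
    doc_meta: dict[str, Any] = dict(doc.meta or {})
    doc_meta["source_index"] = idx + 1
    return replace(doc, meta=doc_meta)


original = Doc("Paris is the capital of France.", {"source": "wiki"})
annotated = annotate_fixed(original, 0)

print(original.meta)   # {'source': 'wiki'}
print(annotated.meta)  # {'source': 'wiki', 'source_index': 1}
```

The copy is shallow: top-level keys are independent, but nested mutable values (for example a list stored in meta) are still shared between the two dicts. That is enough here, since only a top-level key is added.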

Note: chat_prompt_builder.py even has an explicit comment "use dataclasses.replace to avoid
in-place mutation", but answer_builder.py was missed. A similar bug was fixed in
Document.from_dict via PR #11330.

System:

  • OS: Windows 11
  • Haystack version: main branch

Labels

  • Contributions wanted! (looking for external contributions)
  • P2 (medium priority, add to the next sprint if no P1 available)
