
fix: Update LLMMetadataExtractor to not modify documents in place#9553

Closed
sjrl wants to merge 7 commits into main from fix-llm-metadata-extractor


@sjrl
Contributor

@sjrl commented Jun 25, 2025

Related Issues

Proposed Changes:

Update LLMMetadataExtractor to not modify Document objects in place.

I've opted to use the replace function from dataclasses for this, since it creates a new dataclass object instead of modifying the original one in place.
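A minimal sketch of the approach follows. It assumes a simplified, hypothetical Document dataclass with only id, content, and meta fields (not Haystack's actual class) and a hypothetical helper named add_metadata. One detail matters here: dataclasses.replace copies field values shallowly, so the new object would share the original meta dict. The sketch therefore builds a fresh meta dict instead of calling doc.meta.update(...), which would still mutate the original document.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List


@dataclass
class Document:
    # Hypothetical stand-in for haystack.Document; only the fields needed here.
    id: str
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def add_metadata(documents: List[Document], extracted: List[Dict[str, Any]]) -> List[Document]:
    """Return new Document objects with extracted metadata merged in.

    The input documents are left untouched.
    """
    updated = []
    for doc, new_meta in zip(documents, extracted):
        # replace() copies fields shallowly, so pass a brand-new meta dict
        # rather than mutating doc.meta (which the copy would otherwise share).
        updated.append(replace(doc, meta={**doc.meta, **new_meta}))
    return updated


original = Document(id="doc-1", content="Berlin is the capital of Germany.", meta={"source": "wiki"})
result = add_metadata([original], [{"entities": ["Berlin", "Germany"]}])

print(original.meta)          # {'source': 'wiki'}
print(result[0].meta)         # {'source': 'wiki', 'entities': ['Berlin', 'Germany']}
print(result[0] is original)  # False
```

Because id is passed through unchanged, the new document keeps its parent's ID in this sketch. Whether the ID should change when only metadata is added is the question the reviewer raises.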

How did you test it?

Notes for the reviewer

Checklist

  • I have read the contributor guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

@sjrl requested a review from a team as a code owner June 25, 2025 12:25
@sjrl requested review from julian-risch and removed request for a team June 25, 2025 12:25
@github-actions bot added the topic:tests and type:documentation labels Jun 25, 2025
@sjrl self-assigned this Jun 25, 2025
@sjrl requested a review from a team as a code owner June 25, 2025 12:27
@sjrl requested review from dfokina and removed request for a team June 25, 2025 12:27
@coveralls
Collaborator

coveralls commented Jun 25, 2025

Pull Request Test Coverage Report for Build 15925816077

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 10 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.01%) to 90.291%

Files with Coverage Reduction | New Missed Lines | %
components/extractors/llm_metadata_extractor.py | 10 | 83.78%
Totals Coverage Status
Change from base Build 15875178481: 0.01%
Covered Lines: 11709
Relevant Lines: 12968

💛 - Coveralls

Member

@julian-risch left a comment


I'd like to better understand the issue, and where exactly it is currently a problem to distinguish the parent documents from the new ones. As an alternative, we could consider creating a new ID only when the content changes, which is not the case here.
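To illustrate the reviewer's alternative, here is a sketch under an assumed ID scheme in which the ID is a SHA-256 hash of the content alone. This scheme and the simplified Document class are hypothetical choices for illustration; they are not necessarily how Haystack computes Document IDs.

```python
import hashlib
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass
class Document:
    # Hypothetical stand-in for a document type; not haystack.Document.
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)

    @property
    def id(self) -> str:
        # Assumed ID scheme: hash only the content, so metadata-only edits keep the ID.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


parent = Document(content="Berlin is the capital of Germany.")
enriched = replace(parent, meta={**parent.meta, "entities": ["Berlin"]})
rewritten = replace(parent, content="Paris is the capital of France.")

print(enriched.id == parent.id)   # True: only the metadata changed
print(rewritten.id == parent.id)  # False: the content changed
```

Under this scheme a metadata-enriched document shares its parent's ID, so whether that is acceptable depends on where parent and enriched documents need to be told apart.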

Comment thread haystack/components/extractors/llm_metadata_extractor.py Outdated
@sjrl added the ignore-for-release-notes label Jun 27, 2025
@sjrl changed the title from "fix: Update LLMMetadataExtractor to return documents with new IDs based on newly added metadata" to "fix: Update LLMMetadataExtractor to not modify documents in place" Jun 27, 2025
@sjrl
Contributor Author

sjrl commented Jun 27, 2025

@julian-risch I don't think there is much utility in merging this PR anymore, but perhaps it can serve as a proposal for #9505.

So if there are no objections, I'll go ahead and close this PR and update the issue with this proposal.

@sjrl
Contributor Author

sjrl commented Jun 30, 2025

Closing, following up on this comment: #9553 (comment)

@sjrl closed this Jun 30, 2025
@sjrl deleted the fix-llm-metadata-extractor branch March 20, 2026 09:11

Labels

ignore-for-release-notes: PRs with this flag won't be included in the release notes.
topic:tests
type:documentation: Improvements on the docs

Projects

None yet


3 participants