Skip to content

fix: refactor code and update init_params in debug_state#317

Merged
Amnah199 merged 32 commits intomainfrom
fix-breakpoints
Jun 13, 2025
Merged

fix: refactor code and update init_params in debug_state#317
Amnah199 merged 32 commits intomainfrom
fix-breakpoints

Conversation

@Amnah199
Copy link
Copy Markdown
Contributor

@Amnah199 Amnah199 commented Jun 5, 2025

Related Issues

  • fixes #9464
  • remove the methods that are not needed any longer

Proposed Changes:

  • remove remove_unserializable_data method (not needed after this PR is merged)
  • Reintroduce init params in debug_state. This was removed for simplicity while converging with haystack/main.
  • make breakpoint helper methods private

How did you test it?

Ran the tests

Notes for the reviewer

I would prefer to merge this PR before continuing with serialization on a separate one here. A single PR would be too jumbled.

Checklist

@coveralls
Copy link
Copy Markdown

coveralls commented Jun 5, 2025

Pull Request Test Coverage Report for Build 15613105327

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 28 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-1.6%) to 87.208%

Files with Coverage Reduction New Missed Lines %
dataclasses/init.py 2 66.67%
core/pipeline/base.py 4 85.19%
core/pipeline/pipeline.py 22 63.29%
Totals Coverage Status
Change from base Build 15604645818: -1.6%
Covered Lines: 1268
Relevant Lines: 1454

💛 - Coveralls

Comment thread haystack_experimental/core/pipeline/pipeline.py Outdated
@Amnah199 Amnah199 marked this pull request as ready for review June 10, 2025 14:50
@Amnah199 Amnah199 requested a review from a team as a code owner June 10, 2025 14:50
@Amnah199 Amnah199 requested review from julian-risch and sjrl and removed request for a team and julian-risch June 10, 2025 14:50
@Amnah199
Copy link
Copy Markdown
Contributor Author

I would ignore the failing tests caused due to type mismatch of GeneratedAnswer in experimental and in core. This PR has been merged in haystack main which will resolves this.

Comment thread haystack_experimental/components/builders/answer_builder.py Outdated
@sjrl
Copy link
Copy Markdown
Contributor

sjrl commented Jun 11, 2025

Same for adding all the Answer dataclasses. Is it just for having the new to_dict method in GeneratedAnswer?

@Amnah199
Copy link
Copy Markdown
Contributor Author

Yes the new to_dict method in GeneratedAnswer serializes ChatMessage properly, and prevents JSON serializable error. I added the answer dataclasses and made AnswerBuilder here to prevent failing test cases but mismatch still exists.

@sjrl
Copy link
Copy Markdown
Contributor

sjrl commented Jun 11, 2025

Yes the new to_dict method in GeneratedAnswer serializes ChatMessage properly, and prevents JSON serializable error. I added the answer dataclasses and made AnswerBuilder here to prevent failing test cases but mismatch still exists.

Okay I gotcha. Would it be possible then to only bring over GeneratedAnswer and not all of the other answer data classes if they aren’t needed? Also it’d be great if you could inherit from the haystack GeneratedAnswer and only add the code you have to. You can see how we did this for ChatMessage in this repo as well.

@Amnah199
Copy link
Copy Markdown
Contributor Author

Yes the new to_dict method in GeneratedAnswer serializes ChatMessage properly, and prevents JSON serializable error. I added the answer dataclasses and made AnswerBuilder here to prevent failing test cases but mismatch still exists.

Okay I gotcha. Would it be possible then to only bring over GeneratedAnswer and not all of the other answer data classes if they aren’t needed? Also it’d be great if you could inherit from the haystack GeneratedAnswer and only add the code you have to. You can see how we did this for ChatMessage in this repo as well.

Right, I can do that. I initially added it as-is since I didn’t want to spend too much time patching GeneratedAnswer, given that it's going to be merged eventually. Just a heads-up — I might also need to add the AnswerJoiner to avoid the failing tests caused by mismatches.

@sjrl
Copy link
Copy Markdown
Contributor

sjrl commented Jun 11, 2025

@Amnah199 it looks like the errors are Connection errors from Pipeline. We have an option in pipeline’s init that lets your turn off the connection validation. So then these errors should go away.

@Amnah199 Amnah199 requested a review from sjrl June 12, 2025 09:58
Comment thread haystack_experimental/core/pipeline/breakpoint.py Outdated
Comment thread haystack_experimental/core/pipeline/breakpoint.py Outdated
Comment thread haystack_experimental/core/pipeline/breakpoint.py Outdated
Comment thread haystack_experimental/core/pipeline/breakpoint.py Outdated
Comment thread haystack_experimental/core/pipeline/breakpoint.py Outdated

Raises a PipelineRuntimeError if any component is missing or if the state structure is invalid.

:param resume_state: The saved state to validate.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the docstring could we mention what keys are expected in resume_state?

Comment thread haystack_experimental/core/pipeline/breakpoint.py Outdated
Comment thread haystack_experimental/core/pipeline/breakpoint.py Outdated
Comment thread haystack_experimental/core/pipeline/breakpoint.py
logger.info("Passed resume state validated successfully.")


def load_state(file_path: Union[str, Path]) -> Dict[str, Any]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially since we have State in Agent now I think we should think about a more specific name than state in the function name. I see in the docs we call it "pipeline state", but I believe we are really loading a dict with keys "input_data", "pipeline_breakpoint", "pipeline_state" so it feels a bit odd to call the whole thing pipeline_state if there is a key called pipeline_state inside.

Do you have an idea on some better namings we could use?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can stick to calling it debug_state or we consistently us resume_state internally so we can also keep that. WDYT?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something different altogether like pipeline_snapshot?

Copy link
Copy Markdown
Contributor Author

@Amnah199 Amnah199 Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm if we opt for this then we should consistently use pipeline_snapshot instead of state everywhere. Including the resume_state param for run method

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could make sense. I believe pipeline_snapshot is more informative than resume_state which from it's name sounds like it should be a boolean

Comment thread haystack_experimental/core/pipeline/breakpoint.py Outdated
Comment thread haystack_experimental/core/pipeline/breakpoint.py Outdated
Amnah199 and others added 7 commits June 12, 2025 15:35
Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
Validates the pipeline_breakpoint passed to the pipeline.

Make sure they are all valid components registered in the pipeline,
Makes sure the breakpoint contains a valid components registered in the pipeline.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Makes sure the breakpoint contains a valid components registered in the pipeline.
Makes sure the breakpoint contains a valid component registered in the pipeline.

Copy link
Copy Markdown
Contributor

@sjrl sjrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, otherwise looks good!

@Amnah199 Amnah199 merged commit 6be854d into main Jun 13, 2025
8 checks passed
@Amnah199 Amnah199 deleted the fix-breakpoints branch June 13, 2025 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix workaround we introduced for DocumentJoiner and BranchJoiner

3 participants