feat(dspy): add metadata output, fix kwargs merging, and Mistral prompts#3161

Open
ahlemtr wants to merge 4 commits into deepset-ai:main from ahlemtr:feat/dspy-metadata-kwargs

ahlemtr commented Apr 13, 2026

Related Issues

Proposed Changes:

1. Bug Fix: Robust Configuration Merging:
Change: Implemented a merge strategy for generation_kwargs that layers runtime values over the init-time defaults.
Technical Detail: Replaced direct assignment with {**self.generation_kwargs, **(generation_kwargs or {})}. This is a shallow merge: top-level keys are combined, and a nested dict passed at runtime replaces the init-time value for that key as a whole.
Impact: Resolves a bug where providing a single parameter at runtime (e.g., temperature) would drop all parameters defined during component initialization (e.g., max_tokens). The component now preserves defaults while allowing runtime overrides.

2. Feature: Enhanced Metadata & Observability:
Change: Updated the run and run_async return values to include a metadata dictionary.
Technical Detail: The component now extracts and returns the DSPy rationale (Chain-of-Thought reasoning), model_name, and signature type.
Impact: Users can now programmatically access the reasoning behind a DSPy prediction, which was previously inaccessible in the Haystack integration.

3. Improvement: Mistral Instruction Formatting:
Change: Added a conditional prompt-wrapping layer for Mistral-based models.
Technical Detail: Introduced a check for "mistral" in the model string. If detected, the user message is automatically wrapped in [INST] {prompt} [/INST] tokens.
Impact: Fixes a common integration issue where Mistral models produce malformed outputs or ignore instructions when the instruction-tuning syntax is missing from the input string.

4. Infrastructure: Native Async Support:
Change: Fully implemented the run_async method.
Technical Detail: Uses self._module.acall() so that the integration is non-blocking and compatible with Haystack AsyncPipeline workflows.
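
The Mistral wrapping described in item 3 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name wrap_for_mistral is hypothetical, and the case-insensitive match is an assumption (the description only says the model string is checked for "mistral").

```python
def wrap_for_mistral(model: str, prompt: str) -> str:
    """Wrap the prompt in [INST] tokens when the model name mentions Mistral.

    Hypothetical helper for illustration; lower-casing the model name
    before the check is an assumption, not confirmed by the PR.
    """
    if "mistral" in model.lower():
        return f"[INST] {prompt} [/INST]"
    # Any other model receives the prompt unchanged.
    return prompt


wrapped = wrap_for_mistral("mistral/mistral-small", "Summarize this text.")
untouched = wrap_for_mistral("openai/gpt-4o-mini", "Summarize this text.")
```

One thing worth checking in review: a substring test also matches model names that merely contain "mistral", and providers that already apply a chat template may end up with the tokens applied twice.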

How did you test it?

Automated Unit Tests: Verified all changes across 40 unit tests using hatch run test:unit.

Added new test cases in tests/test_chat_generator.py and tests/test_chat_generator_async.py specifically to validate the merging of generation_kwargs and the presence of metadata in the output.

Linting & Style: Executed hatch run default:ruff check src tests and achieved zero errors.

Fixed all E501 (Line too long) and W293 (Blank line whitespace) issues to comply with the project's strict PEP 8 standards.

Serialization Check: Confirmed that the component correctly serializes and deserializes via to_dict and from_dict, ensuring pipeline compatibility.

Notes for the reviewer

Dictionary Order: The config merge in the run methods prioritizes runtime generation_kwargs, which is the standard expected behavior for Haystack generators.
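
For reference, the precedence can be illustrated with plain dictionaries. This is a standalone sketch of the {**init, **runtime} pattern quoted in the description; the helper name merge_generation_kwargs and the sample values are made up for the example.

```python
from typing import Any, Dict, Optional


def merge_generation_kwargs(
    init_kwargs: Dict[str, Any],
    runtime_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Shallow-merge two kwargs dicts; runtime keys win on conflict."""
    # Later unpacking overrides earlier keys, so runtime values take priority.
    return {**init_kwargs, **(runtime_kwargs or {})}


init_kwargs = {"temperature": 0.2, "max_tokens": 256}

# Overriding one key keeps the other init-time default.
merged = merge_generation_kwargs(init_kwargs, {"temperature": 0.9})

# Passing None falls back to the init-time values.
defaults_only = merge_generation_kwargs(init_kwargs, None)
```

The merge builds a new dict, so the init-time defaults stored on the component are not mutated between calls.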

Metadata Consistency: The metadata returned is structured to be consistent across both synchronous and asynchronous execution paths.
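
One way to keep the two paths consistent is to route both through a single metadata builder. The sketch below uses a stand-in module rather than DSPy itself: FakeModule, build_metadata, the metadata key names, and the "replies" output key are all assumptions made for illustration and may differ from the actual implementation.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Dict


class FakeModule:
    """Stand-in for a DSPy module (not the real DSPy API surface)."""

    def __call__(self, question: str) -> SimpleNamespace:
        return SimpleNamespace(answer=f"echo: {question}", rationale="stub reasoning")

    async def acall(self, question: str) -> SimpleNamespace:
        # A real module would await the LM call here.
        return self(question)


def build_metadata(prediction: Any, model_name: str, signature_type: str) -> Dict[str, Any]:
    """Shared by both paths so the sync and async metadata cannot drift apart."""
    return {
        "rationale": getattr(prediction, "rationale", None),
        "model_name": model_name,
        "signature_type": signature_type,
    }


def run(module: FakeModule, question: str) -> Dict[str, Any]:
    prediction = module(question)
    metadata = build_metadata(prediction, "stub-model", "ChainOfThought")
    return {"replies": [prediction.answer], "metadata": metadata}


async def run_async(module: FakeModule, question: str) -> Dict[str, Any]:
    prediction = await module.acall(question)
    metadata = build_metadata(prediction, "stub-model", "ChainOfThought")
    return {"replies": [prediction.answer], "metadata": metadata}


sync_result = run(FakeModule(), "ping")
async_result = asyncio.run(run_async(FakeModule(), "ping"))
```

Using getattr with a None default keeps the metadata shape stable for predictions that carry no rationale.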

Mock Updates: Several mocks in the test suite were updated to accommodate the new metadata return structure.

Checklist

[x] My code follows the code style of this project (Ruff verified).

[x] I have updated the unit tests to reflect my changes.

[x] All new and existing tests passed locally.

[x] I have formatted the strings in test_serialization.py to stay under the 120-character limit.

ahlemtr requested a review from a team as a code owner April 13, 2026 17:46
ahlemtr requested review from anakin87 and removed request for a team April 13, 2026 17:46

CLAassistant commented Apr 13, 2026

CLA assistant check
All committers have signed the CLA.

github-actions bot added the type:documentation (Improvements or additions to documentation) label Apr 13, 2026

github-actions bot commented Apr 13, 2026

Coverage report (dspy)

Columns: File, Statements, Missing, Coverage, Coverage (new stmts), Lines missing
File: integrations/dspy/src/haystack_integrations/components/generators/dspy/chat/chat_generator.py
Project Total

This report was generated by python-coverage-comment-action

ahlemtr commented Apr 13, 2026

This is my first-ever pull request to the Haystack core-integrations repository (and my first contribution). Looking forward to the review and happy to make any adjustments needed!

Labels

type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DSPy Integration

2 participants