feat: add support for GitHubRepoForkerTool #1968
Merged: mpangrazzi merged 24 commits into deepset-ai:main from srini047:github_repo_forker_tool_integration on Aug 18, 2025
Changes from 4 commits
c3b4cfd feat: add support for GitHubRepoForkerTool (srini047)
549a548 Merge branch 'main' into github_repo_forker_tool_integration (srini047)
efc1d8e Merge branch 'main' into github_repo_forker_tool_integration (srini047)
a28727e Merge branch 'main' into github_repo_forker_tool_integration (mpangrazzi)
73859a2 fix: remove extra params (srini047)
46652f8 Merge branch 'main' into github_repo_forker_tool_integration (srini047)
f8d7e2d Merge branch 'main' into github_repo_forker_tool_integration (mpangrazzi)
716584a Merge branch 'main' into github_repo_forker_tool_integration (srini047)
d293f27 fix: revert token check (srini047)
cd5ff1d fix: typing issues (srini047)
0067644 Merge branch 'main' into github_repo_forker_tool_integration (srini047)
47a1c1f fix: test issue (srini047)
4cd1b86 Merge branch 'main' into github_repo_forker_tool_integration (srini047)
623f566 fix: revert as per comments (srini047)
9460d55 Merge branch 'main' into github_repo_forker_tool_integration (srini047)
e5cb0f6 Merge branch 'main' into github_repo_forker_tool_integration (srini047)
cbf001d Merge branch 'main' into github_repo_forker_tool_integration (srini047)
651c304 Update integrations/github/src/haystack_integrations/components/conne… (sjrl)
38ac22e Update integrations/github/src/haystack_integrations/tools/github/rep… (sjrl)
625dcb6 Merge branch 'main' into github_repo_forker_tool_integration (srini047)
f238483 Merge branch 'main' into github_repo_forker_tool_integration (srini047)
5b975c3 fix: test failures (srini047)
78cf9d2 fix: formatting issue (srini047)
df75847 Merge branch 'main' into github_repo_forker_tool_integration (mpangrazzi)
61 changes: 61 additions & 0 deletions
integrations/github/src/haystack_integrations/prompts/github/repo_forker_prompt.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| # SPDX-FileCopyrightText: 2023-present deepset GmbH <info@deepset.ai> | ||
| # | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| REPO_FORKER_PROMPT = """Haystack-Agent uses this tool to fork GitHub repositories in order to contribute to issues. | ||
| Haystack-Agent initiates a fork so it can freely make changes for contributions. | ||
| A fork is required to open a pull request to the upstream repository. | ||
| Haystack-Agent works by forking the repository associated with a given issue. | ||
|
|
||
| <usage> | ||
| Pass a `url` string for the GitHub issue you want to work on in a fork. | ||
| It is REQUIRED to pass `url` to use this tool. | ||
| The structure must be "https://github.com/<repo-owner>/<repo-name>/issues/<issue-number>". | ||
|
|
||
| Examples: | ||
|
|
||
| - {"url": "https://github.com/deepset-ai/haystack/issues/9343"} | ||
| - will fork the "deepset-ai/haystack" repository to work on issue 9343 | ||
| - {"url": "https://github.com/deepset-ai/haystack-core-integrations/issues/1685"} | ||
| - will fork the "deepset-ai/haystack-core-integrations" repository to work on issue 1685 | ||
| </usage> | ||
|
|
||
| Haystack-Agent uses the `repo_forker` tool to create a copy (fork) of the target repository into its own account. | ||
| Haystack-Agent ensures the issue URL is valid and points to a real GitHub issue. | ||
| It parses the URL to identify the correct repository. | ||
|
|
||
| <thinking> | ||
| - Does this issue belong to the repository I need to work on? | ||
| - Can I extract the owner and repository name from the URL? | ||
| - Why am I forking this repository? (e.g., to implement a fix, to add a feature) | ||
| - Is there anything special about the branch or base state I should be aware of? | ||
| </thinking> | ||
|
|
||
| Haystack-Agent reflects on the results after forking: | ||
| <thinking> | ||
| - Did the fork succeed? Is the fork visible in my account? | ||
| - Can I access, clone, and push to my fork? | ||
| - Are there any permissions or fork-specific settings to configure before proceeding? | ||
| - Which branch will I be working on in the fork? | ||
| </thinking> | ||
|
|
||
| IMPORTANT | ||
| Haystack-Agent ONLY forks the repository mentioned in the given issue URL. | ||
| Haystack-Agent does NOT attempt to fork organizations, user profiles, or non-issue URLs. | ||
| Haystack-Agent knows that forking is a prerequisite to contributing changes and creating pull requests. | ||
|
|
||
| Haystack-Agent takes notes after the fork: | ||
| <scratchpad> | ||
| - Record the URL of the forked repository | ||
| - Note the original issue being worked on | ||
| - Document any post-fork steps (e.g., git cloning, installing dependencies) | ||
| - Make note of any errors or special setup requirements | ||
| </scratchpad> | ||
| """ | ||
|
|
||
| REPO_FORKER_SCHEMA = { | ||
| "properties": { | ||
| "url": {"type": "string", "description": "URL of the GitHub issue to work on in the fork."}, | ||
| }, | ||
| "required": ["url"], | ||
| "type": "object", | ||
| } |
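
The prompt above requires `url` to follow the shape "https://github.com/<repo-owner>/<repo-name>/issues/<issue-number>" and says the URL is parsed to identify the repository. As a rough illustration of that step, here is a minimal sketch. The names `ISSUE_URL_PATTERN` and `parse_issue_url` are hypothetical and are not taken from this PR or from `GitHubRepoForker`; the component's real parsing logic may differ.

```python
import re
from typing import Tuple

# Hypothetical pattern (an assumption, not code from the PR): it mirrors the
# URL structure that the prompt's <usage> section requires.
ISSUE_URL_PATTERN = re.compile(
    r"^https://github\.com/(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+)/issues/(?P<number>\d+)/?$"
)


def parse_issue_url(url: str) -> Tuple[str, str, int]:
    """Split a GitHub issue URL into (owner, repo, issue_number).

    Raises ValueError for anything that is not an issue URL, such as
    organization pages, user profiles, or pull request URLs.
    """
    match = ISSUE_URL_PATTERN.match(url.strip())
    if match is None:
        raise ValueError(f"Not a GitHub issue URL: {url!r}")
    return match["owner"], match["repo"], int(match["number"])


if __name__ == "__main__":
    owner, repo, number = parse_issue_url(
        "https://github.com/deepset-ai/haystack/issues/9343"
    )
    print(f"Would fork {owner}/{repo} to work on issue {number}")
```

Rejecting non-issue URLs up front matches the IMPORTANT section of the prompt, which restricts the tool to the repository named in an issue URL.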
127 changes: 127 additions & 0 deletions
integrations/github/src/haystack_integrations/tools/github/repo_forker_tool.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,127 @@ | ||
| # SPDX-FileCopyrightText: 2023-present deepset GmbH <info@deepset.ai> | ||
| # | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| from typing import Any, Callable, Dict, Optional, Union | ||
|
|
||
| from haystack.core.serialization import generate_qualified_class_name | ||
| from haystack.tools import ComponentTool | ||
| from haystack.utils import Secret, deserialize_secrets_inplace | ||
|
|
||
| from haystack_integrations.components.connectors.github.repo_forker import GitHubRepoForker | ||
| from haystack_integrations.prompts.github.repo_forker_prompt import REPO_FORKER_PROMPT, REPO_FORKER_SCHEMA | ||
| from haystack_integrations.tools.github.utils import deserialize_handlers, serialize_handlers | ||
|
|
||
|
|
||
| class GitHubRepoForkerTool(ComponentTool): | ||
| """ | ||
| A tool for forking a GitHub repository. | ||
| """ | ||
|
|
||
| def __init__( | ||
| self, | ||
| *, | ||
| name: Optional[str] = "repo_forker", | ||
| description: Optional[str] = REPO_FORKER_PROMPT, | ||
| parameters: Optional[Dict[str, Any]] = REPO_FORKER_SCHEMA, | ||
| github_token: Optional[Secret] = None, | ||
|
|
||
| repo: Optional[str] = None, | ||
| branch: str = "main", | ||
|
|
||
| raise_on_failure: bool = True, | ||
| outputs_to_string: Optional[Dict[str, Union[str, Callable[[Any], str]]]] = None, | ||
| inputs_from_state: Optional[Dict[str, str]] = None, | ||
| outputs_to_state: Optional[Dict[str, Dict[str, Union[str, Callable]]]] = None, | ||
| ): | ||
| """ | ||
| Initialize the GitHub Repo Forker tool. | ||
|
|
||
| :param name: Optional name for the tool. | ||
| :param description: Optional description. | ||
| :param parameters: Optional JSON schema defining the parameters expected by the Tool. | ||
| :param github_token: GitHub personal access token for API authentication. | ||
| :param repo: Default repository in owner/repo format. | ||
| :param branch: Default branch to work with. | ||
| :param raise_on_failure: If True, raises exceptions on API errors. | ||
| :param outputs_to_string: | ||
| Optional dictionary defining how the tool's outputs should be converted into a string. | ||
| If the source is provided only the specified output key is sent to the handler. | ||
| If the source is omitted the whole tool result is sent to the handler. | ||
| Example: { | ||
| "source": "docs", "handler": format_documents | ||
| } | ||
| :param inputs_from_state: | ||
| Optional dictionary mapping state keys to tool parameter names. | ||
| Example: {"repository": "repo"} maps state's "repository" to tool's "repo" parameter. | ||
| :param outputs_to_state: | ||
| Optional dictionary defining how tool outputs map to keys within state as well as optional handlers. | ||
| If the source is provided only the specified output key is sent to the handler. | ||
| Example: { | ||
| "documents": {"source": "docs", "handler": custom_handler} | ||
| } | ||
| If the source is omitted the whole tool result is sent to the handler. | ||
| Example: { | ||
| "documents": {"handler": custom_handler} | ||
| } | ||
| """ | ||
| self.name = name | ||
| self.description = description | ||
| self.parameters = parameters | ||
| self.github_token = github_token | ||
| self.repo = repo | ||
| self.branch = branch | ||
| self.raise_on_failure = raise_on_failure | ||
| self.outputs_to_string = outputs_to_string | ||
| self.inputs_from_state = inputs_from_state | ||
| self.outputs_to_state = outputs_to_state | ||
|
|
||
| repo_forker = GitHubRepoForker( | ||
| github_token=github_token, | ||
| raise_on_failure=raise_on_failure, | ||
| ) | ||
|
|
||
| super().__init__( | ||
| component=repo_forker, | ||
| name=name, | ||
| description=description, | ||
| parameters=parameters, | ||
| outputs_to_string=self.outputs_to_string, | ||
| inputs_from_state=self.inputs_from_state, | ||
| outputs_to_state=self.outputs_to_state, | ||
| ) | ||
|
|
||
| def to_dict(self) -> Dict[str, Any]: | ||
| """ | ||
| Serializes the tool to a dictionary. | ||
|
|
||
| Returns: | ||
| Dictionary with serialized data. | ||
| """ | ||
| serialized = { | ||
| "name": self.name, | ||
| "description": self.description, | ||
| "parameters": self.parameters, | ||
| "github_token": self.github_token.to_dict() if self.github_token else None, | ||
| "repo": self.repo, | ||
| "branch": self.branch, | ||
| "raise_on_failure": self.raise_on_failure, | ||
| "outputs_to_string": self.outputs_to_string, | ||
| "inputs_from_state": self.inputs_from_state, | ||
| "outputs_to_state": self.outputs_to_state, | ||
| } | ||
|
|
||
| serialize_handlers(serialized, self.outputs_to_state, self.outputs_to_string) | ||
| return {"type": generate_qualified_class_name(type(self)), "data": serialized} | ||
|
|
||
| @classmethod | ||
| def from_dict(cls, data: Dict[str, Any]) -> "GitHubRepoForkerTool": | ||
| """ | ||
| Deserializes the tool from a dictionary. | ||
|
|
||
| :param data: | ||
| Dictionary to deserialize from. | ||
| :returns: | ||
| Deserialized tool. | ||
| """ | ||
| inner_data = data["data"] | ||
| deserialize_secrets_inplace(inner_data, keys=["github_token"]) | ||
| deserialize_handlers(inner_data) | ||
| return cls(**inner_data) | ||
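
`to_dict` and `from_dict` above delegate handler conversion to `serialize_handlers` and `deserialize_handlers`, and the tests in this PR expect a handler to appear in the serialized dictionary as a dotted import path. The sketch below illustrates that general idea using only the standard library. All function names in it are hypothetical, and it is not the implementation in `haystack_integrations.tools.github.utils`; it only shows one way a callable can round-trip through a string.

```python
import importlib
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical type alias for the outputs_to_state shape shown in the docstring.
StateConfig = Dict[str, Dict[str, Any]]


def callable_to_path(fn: Callable) -> str:
    """Return the dotted import path of a module-level callable."""
    return f"{fn.__module__}.{fn.__qualname__}"


def path_to_callable(path: str) -> Callable:
    """Import and return the callable named by a dotted path."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def serialize_outputs_to_state(config: Optional[StateConfig]) -> Optional[StateConfig]:
    """Copy the config, replacing callable handlers with their dotted paths."""
    if config is None:
        return None
    serialized: StateConfig = {}
    for key, spec in config.items():
        spec = dict(spec)  # shallow copy so the caller's config is not mutated
        if callable(spec.get("handler")):
            spec["handler"] = callable_to_path(spec["handler"])
        serialized[key] = spec
    return serialized


def deserialize_outputs_to_state(config: Optional[StateConfig]) -> Optional[StateConfig]:
    """Copy the config, replacing dotted-path handlers with the imported callables."""
    if config is None:
        return None
    restored: StateConfig = {}
    for key, spec in config.items():
        spec = dict(spec)
        if isinstance(spec.get("handler"), str):
            spec["handler"] = path_to_callable(spec["handler"])
        restored[key] = spec
    return restored


if __name__ == "__main__":
    # json.dumps stands in for a real handler such as message_handler.
    original = {"documents": {"source": "docs", "handler": json.dumps}}
    as_data = serialize_outputs_to_state(original)
    print(as_data)
    print(deserialize_outputs_to_state(as_data) == original)
```

This approach only works for callables that can be imported by name, so lambdas and nested functions would need different handling.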
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| # SPDX-FileCopyrightText: 2023-present deepset GmbH <info@deepset.ai> | ||
| # | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| from haystack_integrations.prompts.github.repo_forker_prompt import REPO_FORKER_PROMPT, REPO_FORKER_SCHEMA | ||
| from haystack_integrations.tools.github.repo_forker_tool import GitHubRepoForkerTool | ||
| from haystack_integrations.tools.github.utils import message_handler | ||
|
|
||
|
|
||
| class TestGitHubRepoForkerTool: | ||
| def test_init(self, monkeypatch): | ||
| monkeypatch.setenv("GITHUB_TOKEN", "test-token") | ||
| tool = GitHubRepoForkerTool() | ||
|
|
||
| assert tool.name == "repo_forker" | ||
| assert tool.description == REPO_FORKER_PROMPT | ||
| assert tool.parameters == REPO_FORKER_SCHEMA | ||
| assert tool.github_token is None | ||
| assert tool.repo is None | ||
| assert tool.branch == "main" | ||
| assert tool.raise_on_failure is True | ||
| assert tool.outputs_to_string is None | ||
| assert tool.inputs_from_state is None | ||
| assert tool.outputs_to_state is None | ||
|
|
||
| def test_from_dict(self, monkeypatch): | ||
| monkeypatch.setenv("GITHUB_TOKEN", "test-token") | ||
| tool_dict = { | ||
| "type": "haystack_integrations.tools.github.repo_forker_tool.GitHubRepoForkerTool", | ||
| "data": { | ||
| "name": "repo_forker", | ||
| "description": REPO_FORKER_PROMPT, | ||
| "parameters": REPO_FORKER_SCHEMA, | ||
| "github_token": None, | ||
| "repo": None, | ||
| "branch": "main", | ||
| "raise_on_failure": True, | ||
| "outputs_to_string": None, | ||
| "inputs_from_state": None, | ||
| "outputs_to_state": None, | ||
| }, | ||
| } | ||
| tool = GitHubRepoForkerTool.from_dict(tool_dict) | ||
| assert tool.name == "repo_forker" | ||
| assert tool.description == REPO_FORKER_PROMPT | ||
| assert tool.parameters == REPO_FORKER_SCHEMA | ||
| assert tool.github_token is None | ||
| assert tool.repo is None | ||
| assert tool.branch == "main" | ||
| assert tool.raise_on_failure is True | ||
| assert tool.outputs_to_string is None | ||
| assert tool.inputs_from_state is None | ||
| assert tool.outputs_to_state is None | ||
|
|
||
| def test_to_dict(self, monkeypatch): | ||
| monkeypatch.setenv("GITHUB_TOKEN", "test-token") | ||
| tool = GitHubRepoForkerTool() | ||
| tool_dict = tool.to_dict() | ||
| assert tool_dict["type"] == "haystack_integrations.tools.github.repo_forker_tool.GitHubRepoForkerTool" | ||
| assert tool_dict["data"]["name"] == "repo_forker" | ||
| assert tool_dict["data"]["description"] == REPO_FORKER_PROMPT | ||
| assert tool_dict["data"]["parameters"] == REPO_FORKER_SCHEMA | ||
| assert tool_dict["data"]["github_token"] is None | ||
| assert tool_dict["data"]["repo"] is None | ||
| assert tool_dict["data"]["branch"] == "main" | ||
| assert tool_dict["data"]["raise_on_failure"] is True | ||
| assert tool_dict["data"]["outputs_to_string"] is None | ||
| assert tool_dict["data"]["inputs_from_state"] is None | ||
| assert tool_dict["data"]["outputs_to_state"] is None | ||
|
|
||
| def test_to_dict_with_extra_params(self, monkeypatch): | ||
| monkeypatch.setenv("GITHUB_TOKEN", "test-token") | ||
| tool = GitHubRepoForkerTool( | ||
| github_token=None, | ||
| repo="owner/repo", | ||
| branch="dev", | ||
| raise_on_failure=False, | ||
| outputs_to_string={"source": "docs", "handler": message_handler}, | ||
| inputs_from_state={"repository": "repo"}, | ||
| outputs_to_state={"documents": {"source": "docs", "handler": message_handler}}, | ||
| ) | ||
| tool_dict = tool.to_dict() | ||
| assert tool_dict["type"] == "haystack_integrations.tools.github.repo_forker_tool.GitHubRepoForkerTool" | ||
| assert tool_dict["data"]["name"] == "repo_forker" | ||
| assert tool_dict["data"]["description"] == REPO_FORKER_PROMPT | ||
| assert tool_dict["data"]["parameters"] == REPO_FORKER_SCHEMA | ||
| assert tool_dict["data"]["github_token"] is None | ||
| assert tool_dict["data"]["repo"] == "owner/repo" | ||
| assert tool_dict["data"]["branch"] == "dev" | ||
| assert tool_dict["data"]["raise_on_failure"] is False | ||
| assert ( | ||
| tool_dict["data"]["outputs_to_string"]["handler"] | ||
| == "haystack_integrations.tools.github.utils.message_handler" | ||
| ) | ||
| assert tool_dict["data"]["inputs_from_state"] == {"repository": "repo"} | ||
| assert tool_dict["data"]["outputs_to_state"]["documents"]["source"] == "docs" | ||
| assert ( | ||
| tool_dict["data"]["outputs_to_state"]["documents"]["handler"] | ||
| == "haystack_integrations.tools.github.utils.message_handler" | ||
| ) | ||
|
|
||
| def test_from_dict_with_extra_params(self, monkeypatch): | ||
| monkeypatch.setenv("GITHUB_TOKEN", "test-token") | ||
| tool_dict = { | ||
| "type": "haystack_integrations.tools.github.repo_forker_tool.GitHubRepoForkerTool", | ||
| "data": { | ||
| "name": "repo_forker", | ||
| "description": REPO_FORKER_PROMPT, | ||
| "parameters": REPO_FORKER_SCHEMA, | ||
| "github_token": None, | ||
| "repo": "owner/repo", | ||
| "branch": "dev", | ||
| "raise_on_failure": False, | ||
| "outputs_to_string": {"handler": "haystack_integrations.tools.github.utils.message_handler"}, | ||
| "inputs_from_state": {"repository": "repo"}, | ||
| "outputs_to_state": { | ||
| "documents": { | ||
| "source": "docs", | ||
| "handler": "haystack_integrations.tools.github.utils.message_handler", | ||
| } | ||
| }, | ||
| }, | ||
| } | ||
| tool = GitHubRepoForkerTool.from_dict(tool_dict) | ||
| assert tool.name == "repo_forker" | ||
| assert tool.description == REPO_FORKER_PROMPT | ||
| assert tool.parameters == REPO_FORKER_SCHEMA | ||
| assert tool.github_token is None | ||
| assert tool.repo == "owner/repo" | ||
| assert tool.branch == "dev" | ||
| assert tool.raise_on_failure is False | ||
| assert tool.outputs_to_string["handler"] == message_handler | ||
| assert tool.inputs_from_state == {"repository": "repo"} | ||
| assert tool.outputs_to_state["documents"]["source"] == "docs" | ||
| assert tool.outputs_to_state["documents"]["handler"] == message_handler |