Skip to content

refactor: restrict OpenAPI $ref resolution to in-document pointers#11226

Open
etairl wants to merge 2 commits intodeepset-ai:mainfrom
etairl:harden-openapi-ref-resolution
Open

refactor: restrict OpenAPI $ref resolution to in-document pointers#11226
etairl wants to merge 2 commits intodeepset-ai:mainfrom
etairl:harden-openapi-ref-resolution

Conversation

@etairl
Copy link
Copy Markdown

@etairl etairl commented Apr 30, 2026

Summary

  • OpenAPIServiceToFunctions._parse_openapi_spec was calling jsonref.replace_refs with the library default loader, which dispatches arbitrary $ref URIs to the filesystem (file://) and the network (http(s)://).
  • Pass an explicit loader that refuses any non-in-document $ref and set proxies=False so references must resolve eagerly during parsing.
  • In-document JSON-pointer references (those starting with #) continue to work unchanged.

This is a defensive hardening change for OpenAPI specs whose contents are not fully trusted (e.g. fetched from third-party catalogs, generated by an LLM, or uploaded by end users). Specs that depend on external $ref resolution will now raise a RuntimeError instead of silently performing filesystem reads or outbound HTTP requests.

Test plan

  • CI runs OpenAPIServiceToFunctions unit tests with the new loader.
  • Manual: a spec containing an in-document $ref (e.g. #/components/schemas/Foo) still resolves correctly.
  • Manual: a spec containing $ref: file:///etc/passwd or $ref: http://... raises and does not perform the I/O.

🤖 Generated with Claude Code

OpenAPIServiceToFunctions._parse_openapi_spec previously called
jsonref.replace_refs with the library default loader, which dispatches
arbitrary $ref URIs to the filesystem and the network. Pass an explicit
loader that rejects any non-in-document reference and disable proxies so
references must be resolved eagerly. In-document JSON-pointer refs
(those starting with "#") are unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@etairl etairl requested a review from a team as a code owner April 30, 2026 10:20
@etairl etairl requested review from anakin87 and removed request for a team April 30, 2026 10:20
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 30, 2026

@etairl is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

Copy link
Copy Markdown
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, @etairl!

Could you better explain why this is needed?
Please also read our security policy before. If you have identified a vulnerability, please send us the details confidentially via email.

I also noticed that you opened multiple PRs. I'd suggest focusing on one at a time.

@etairl
Copy link
Copy Markdown
Author

etairl commented Apr 30, 2026

Hello, @etairl!

Could you better explain why this is needed? Please also read our security policy before. If you have identified a vulnerability, please send us the details confidentially via email.

I also noticed that you opened multiple PRs. I'd suggest focusing on one at a time.

Hi @anakin87,

Sorry about the spam (they've been sitting in my todos for a few days, created the PRs today in one batch). So on one hand this probably falls under out of scope category ("attacker-controlled input to Haystack is considered out of scope"). The reason it still should be a PR is that users expect "parse this spec" to just parse - not to read files or hit the network, so the default should be safer. This is security hardening rather than vulnerability patching.

@anakin87
Copy link
Copy Markdown
Member

Thank you! I'll give this a better look later...

Release-note linter rejects single backticks for inline code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  haystack/components/converters
  openapi_functions.py 241, 268
Project Total  

This report was generated by python-coverage-comment-action

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 30, 2026

CLA assistant check
All committers have signed the CLA.

@etairl
Copy link
Copy Markdown
Author

etairl commented Apr 30, 2026

Thank you! I'll give this a better look later...

Thanks. Let me know of any required changes / questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants