Skip to content

Use parse_url() from urllib3.util instead of urlparse#6927

Closed
ani-sinha wants to merge 2 commits into
psf:mainfrom
ani-sinha:fix-ipv6-parsing
Closed

Use parse_url() from urllib3.util instead of urlparse#6927
ani-sinha wants to merge 2 commits into
psf:mainfrom
ani-sinha:fix-ipv6-parsing

Conversation

@ani-sinha

@ani-sinha ani-sinha commented Apr 1, 2025

Copy link
Copy Markdown

urllib.parse.urlparse() does not handle link local ipv6 addresses with port numbers. Use parse_url() from urllib3 instead.

Fixes: c0813a2 ("Use TLS settings in selecting connection pool")
Fixes: #6735
Co-authored-by: Amy Chen xiachen@redhat.com
Signed-off-by: Ani Sinha anisinha@redhat.com

@ani-sinha

Copy link
Copy Markdown
Author

@nateprewitt please take a look.

@sigmavirus24

Copy link
Copy Markdown
Contributor

This has been open roughly a day. Please do not ping maintainers directly like this

@ani-sinha

Copy link
Copy Markdown
Author

This has been open roughly a day. Please do not ping maintainers directly like this

My apologies. The issue this PR tries to fix #6735 has been open for a while and we have pinged on that issue several times with no engagement for a fix. Hence, I wanted to escalate. Thanks.

@sigmavirus24

Copy link
Copy Markdown
Contributor

You're escalating as a salaried employee of a company who invest in this to two people who aren't paid to work on this project

@ani-sinha

Copy link
Copy Markdown
Author

You're escalating as a salaried employee of a company who invest in this to two people who aren't paid to work on this project

Please understand that we are not trying to add stress to anyone. Its not personal. We have tried to engage on the GH issue that has been open for almost one year. There was no resolution. We are being pushed by our customers for a fix. We investigated the issue on our end, proposed a few solutions on that issue. There was no response from maintainers. Therefore, I had to send this MR and draw their attention. If I had the merge rights I would have simply merged this myself since I see the change as a low risk item.

@frittentheke

frittentheke commented Apr 3, 2025

Copy link
Copy Markdown

You're escalating as a salaried employee of a company who invest in this to two people who aren't paid to work on this project

If I had the merge rights I would have simply merged this myself since I see the change as a low risk item.

@ani-sinha Are you calling peer reviews pointless because you know best and can anticipate all the implications this change might have?
Don't get me wrong ... I worked on the Cloudinit / OpenStack issue (canonical/cloud-init#5419) and would love for this to work with every distribution including IBM.

But requests is a library: ANY change there must receive the proper review - no drive-by fixes.

@ani-sinha

Copy link
Copy Markdown
Author

@ani-sinha Are you calling peer reviews pointless because you know best and can anticipate all the implications this change might have?

I never said that and please refrain from drawing such conclusions.

But requests is a library: ANY change there must receive the proper review - no drive-by fixes.

yes of course! I said, it was my opinion that the change is safe enough. Therefore, if I had the merge rights, I would merge them without bothering the maintainers provided there were no objections (seems so far I have seen no technical arguments against this change only concern that I am bothering unpaid maintainers too early). If people have technical objections against this change they can certainly raise it here. Which is the whole point of raising this PR. Also note that even when a change it merged, if someone raises serious objections, it can also be reverted later. BUT, we need to make progress which is what has been lacking on this issue so far. Our customers are frustrated and pushing us. We need to provide them with answers.

@frittentheke

frittentheke commented Apr 3, 2025

Copy link
Copy Markdown

@ani-sinha Are you calling peer reviews pointless because you know best and can anticipate all the implications this change might have?

I never said that and please refrain from drawing such conclusions.

But requests is a library: ANY change there must receive the proper review - no drive-by fixes.

yes of course! I said, it was my opinion that the change is safe enough. Therefore, if I had the merge rights, I would merge them without bothering the maintainers provided there were no objections (seems so far I have seen no technical arguments against this change only concern that I am bothering unpaid maintainers too early). If people have technical objections against this change they can certainly raise it here. Which is the whole point of raising this PR. Also note that even when a change it merged, if someone raises serious objections, it can also be reverted later. BUT, we need to make progress which is what has been lacking on this issue so far. Our customers are frustrated and pushing us. We need to provide them with answers.

We are all professionals here - no response does not imply there are not objections. "LGTM" is the VERY LEAST you'd have to see to even come to your conclusion that your proposed change looks safe to someone else.

While I totally agree that fixing an issue at the root (the library in this case) is the right, while slower, choice!
Maybe you might be able to provide / write up why and how requests broke along the way. If I understood you right this is somewhat of a regression with "just" >=2.32.3 ? Is the issue related to c0813a2 then?

@ani-sinha

ani-sinha commented Apr 3, 2025

Copy link
Copy Markdown
Author

We are all professionals here - no response does not imply there are not objections. "LGTM" is the VERY LEAST you'd have to see to even come to your conclusion that your proposed change looks safe to someone else.

It varies from community to community. Sometimes in the lack of such explicit approval it is implicitly assumed that there are no strong objection from anyone on the change and its safe to go in. Its a judgement call on the part of the maintainer (who btw, sometimes may not also know all the details or familiar with that particular area of the code).

why and how requests broke along the way

The discussion in the issue has the details. I have added the reference to the issue in the commit log.
TL; DR:
The change that broke requests is c0813a2 . It uses urlparse() which is not able to handle link local ipv6 addresses with port names. When you added canonical/cloud-init@af4ac7f to cloud-init, it started breaking because of this.
The suggestion in this change is to use parse_url() from utllib3 instead. This is already used elewhere in the code and the library is also already imported. So impact is minimal and the change has been tested to fix the issue and not cause any apparent regression.

@ani-sinha

ani-sinha commented Apr 3, 2025

Copy link
Copy Markdown
Author

no response does not imply there are not objections. "LGTM" is the VERY LEAST you'd have to see to even come to your conclusion that your proposed change looks safe to someone else.

Also from practical point of view, a critical fix that addresses a regression cannot wait in the queue indefinitely until someone says LGTM. When a fair amount of time has passed (which in my option it has for this issue), a call needs to be made.

@frittentheke

Copy link
Copy Markdown

no response does not imply there are not objections. "LGTM" is the VERY LEAST you'd have to see to even come to your conclusion that your proposed change looks safe to someone else.

Also from practical point of view, a critical fix that introduces regression cannot wait in the queue indefinitely until someone says LGTM. When a fair amount of time has passed (which in my option it has for this issue), a call needs to be made.

You meant to say "that fixes a regression" right - I totally agree. A bug marked as or containing a regression should be higher priority than other issues. But in the end ... no matter how well you organize your work, if there is too much it will overwhelm you anyways.

Enough of the off-topic talk - thanks for staying at this issue and proposing a fix.

@sigmavirus24

Copy link
Copy Markdown
Contributor

Also from practical point of view, a critical fix that addresses a regression cannot wait in the queue indefinitely until someone says LGTM.

The confidence you have in this is astounding. With the context that Nate and I have we know that every url parsing change is a minefield. Each parser, regardless of how standard compliant (or browser behavior compliant) breaks someone and is a regression for someone

@ani-sinha

ani-sinha commented Apr 4, 2025

Copy link
Copy Markdown
Author

The confidence you have in this is astounding

The fact that we are having endless useless debate without actually making any progress - that is quite astounding to me.
We have done basic testing. We have found no apparent regression. If we can wait 10 years and try all kinds of new exhaustive ways to test this, no problem wth me. I will ask our team to apply this change downstream and move on.

To be frank, I don't even care whether this change gets merged or not. I want the issue to be fixed in whichever way the standards of this community are met. No one was actually taking any interest in sending a PR for a fix for the issue. I did. Now I almost regret it.

Thanks for your inputs. I'm done. It will be the last time you'll hear from me on this issue.

@TheRealFalcon

Copy link
Copy Markdown

Any updates here? This does seem to be a fairly significant regression.

@sigmavirus24

Copy link
Copy Markdown
Contributor

There's significant to this particular fix. Anyone can search the history of this project and see the variety of bugs caused by using urllib3 's parse_url. That means there's risk for different and worse bugs merging this

urllib.parse.urlparse() does not handle link local ipv6 addresses with port
numbers. Use parse_url() from urllib3 instead.

Fixes: c0813a2 ("Use TLS settings in selecting connection pool")
Fixes: psf#6735

Co-authored-by: Amy Chen <xiachen@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
parse_url does not return Url with "hostname" attribute but "host" attribute.
This skipped our functional testing but caught by code analysis tools.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
@frittentheke

frittentheke commented Jul 15, 2025

Copy link
Copy Markdown

@sigmavirus24 with requests being a library I totally understand that you are rather conservative (in the most positive sense) about changes in the critical path of handling input data.
But that said, is there any way forward with changes like this one - marking it for the next major release that then might contain (potentially) breaking changes?

At its core this change here attempts to have requests accept an actually valid IPv6 address, no less. There might be other approaches to implement or fix this, but the core issue is valid.

@ani-sinha ani-sinha closed this Apr 2, 2026
tboy1337 added a commit to tboy1337/requests that referenced this pull request Apr 12, 2026
Adds support for IPv6 zone identifiers (scope IDs) in URLs, enabling
correct handling of link-local IPv6 addresses such as
http://[fe80::1%25eth0]:8080/.

Changes:
- adapters.py: Add _has_ipv6_zone_id() helper with precompiled regex
  (_IPV6_ZONE_ID_RE) to detect zone IDs in the URL authority section.
  Update _urllib3_request_context() to use urllib3's parse_url() for
  zone ID URLs and urlparse() for all others, preserving backward
  compatibility. The poolmanager parameter is kept as an optional
  default-None argument to avoid breaking subclass callers.
- models.py: Add zone ID re-encoding logic in prepare_url() with three
  precompiled patterns (_AUTHORITY_BRACKET_RE, _RFC6874_ZONE_ID_RE,
  _RAW_ZONE_ID_RE). Reconstructs the bracketed host from the original
  URL to prevent parse_url's %25->% decoding from creating ambiguous
  percent sequences downstream (e.g. %2550 -> %50 on Python 3.14).
- tests/test_adapters.py: Comprehensive test coverage including zone ID
  detection (38 parametrized cases), URL parsing (15 cases), and
  integration tests covering pool key generation, client certificates,
  full mocked request flow, encoding equivalence, URL preparation
  preservation, request_url() path extraction, and proxy code path.

Fixes psf#6927, psf#6735, psf#7088.

Made-with: Cursor
tboy1337 added a commit to tboy1337/requests that referenced this pull request Apr 12, 2026
Adds support for IPv6 zone identifiers (scope IDs) in URLs, enabling
correct handling of link-local IPv6 addresses such as
http://[fe80::1%25eth0]:8080/.

Changes:
- adapters.py: Add _has_ipv6_zone_id() helper with precompiled regex
  (_IPV6_ZONE_ID_RE) to detect zone IDs in the URL authority section.
  Update _urllib3_request_context() to use urllib3's parse_url() for
  zone ID URLs and urlparse() for all others, preserving backward
  compatibility. The poolmanager parameter is kept as an optional
  default-None argument to avoid breaking subclass callers.
- models.py: Add zone ID re-encoding logic in prepare_url() with three
  precompiled patterns (_AUTHORITY_BRACKET_RE, _RFC6874_ZONE_ID_RE,
  _RAW_ZONE_ID_RE). Reconstructs the bracketed host from the original
  URL to prevent parse_url's %25->% decoding from creating ambiguous
  percent sequences downstream (e.g. %2550 -> %50 on Python 3.14).
- tests/test_adapters.py: Comprehensive test coverage including zone ID
  detection (38 parametrized cases), URL parsing (15 cases), and
  integration tests covering pool key generation, client certificates,
  full mocked request flow, encoding equivalence, URL preparation
  preservation, request_url() path extraction, and proxy code path.

Fixes psf#6927, psf#6735, psf#7088.

Made-with: Cursor
tboy1337 added a commit to tboy1337/requests that referenced this pull request Apr 12, 2026
Adds support for IPv6 zone identifiers (scope IDs) in URLs, enabling
correct handling of link-local IPv6 addresses such as
http://[fe80::1%25eth0]:8080/.

Changes:
- adapters.py: Add _has_ipv6_zone_id() helper with precompiled regex
  (_IPV6_ZONE_ID_RE) to detect zone IDs in the URL authority section.
  Update _urllib3_request_context() to use urllib3's parse_url() for
  zone ID URLs and urlparse() for all others, preserving backward
  compatibility. The poolmanager parameter is kept as an optional
  default-None argument to avoid breaking subclass callers.
- models.py: Add zone ID re-encoding logic in prepare_url() with three
  precompiled patterns (_AUTHORITY_BRACKET_RE, _RFC6874_ZONE_ID_RE,
  _RAW_ZONE_ID_RE). Reconstructs the bracketed host from the original
  URL to prevent parse_url's %25->% decoding from creating ambiguous
  percent sequences downstream (e.g. %2550 -> %50 on Python 3.14).
- tests/test_adapters.py: Comprehensive test coverage including zone ID
  detection (38 parametrized cases), URL parsing (15 cases), and
  integration tests covering pool key generation, client certificates,
  full mocked request flow, encoding equivalence, URL preparation
  preservation, request_url() path extraction, and proxy code path.

Fixes psf#6927, psf#6735, psf#7088.

Made-with: Cursor
alexwwang added a commit to alexwwang/tdd-pipeline that referenced this pull request May 2, 2026
… interaction, and state lifecycle

Replace fictional Examples F/G with real open-source disputes:
- F: ky prefixUrl leading-slash rejection (sindresorhus/ky) — MODIFY path
- G: requests URL parser deadlock (psf/requests#6927) — escalation with dossier

Rule-level additions through 4 rounds of oracle review:
- Same-grounds vs different-grounds REJECT branches (Rule 3 scope)
- Rule 3/4 interaction: independent mechanisms, priority on same grounds
- Dispute round definition with contested-state transition
- Post-ADOPT/MODIFY lifecycle: contested → pending verification → dropped
- Escalated issues suspended from gate tally (pending external resolution)
- Structured dossier format mandated (5-section template)
- Key lesson blocks for all three examples (E/F/G)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

requests 2.32.3 with IPv6 link local address fails with error: [Errno -2] Name or service not known

4 participants