gh-111788: Don't treat path in robots.txt as URL in urllib.robotparser#113231

Closed
aisk wants to merge 3 commits into python:main from aisk:robotparser-weird-path

Conversation

aisk commented Dec 17, 2023

Member

@erlend-aasland

Contributor

Can you update the PR title to more accurately (and succinctly) describe the change?

aisk changed the title from "gh-111788: fix a bug that urllib.robotparser will raise exception whe…" to "gh-111788: Don't treat path in robots.txt as URL in urllib.robotparser" on Jan 10, 2024
aisk commented Jan 10, 2024

Member Author

Thanks for the review, updated!

aisk changed the title from "gh-111788: Don't treat path in robots.txt as URL in urllib.robotparser" to "gh-111788: Don't treat path in robots.txt as URL in urllib.robotparser" on Jan 10, 2024

@serhiy-storchaka serhiy-storchaka left a comment

Member

Nice PR, but the path can already be corrupted by applying unquote(), which can decode %3F in the path into a false "?". Fixing this requires larger changes. See #138502, which fixes several issues including this one.
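
To illustrate the concern, here is a minimal sketch using only `urllib.parse`. It does not reproduce the internals of `urllib.robotparser` or the changes in either PR; the rule path `/search%3Fq=1` is a hypothetical example. It only shows that once `%3F` has been unquoted, a literal question mark in the path can no longer be told apart from a query separator.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical robots.txt rule path: %3F encodes a literal "?" that is
# part of the path, not the start of a query string.
raw_path = "/search%3Fq=1"

# Splitting the still-quoted value keeps everything in the path.
quoted_parts = urlsplit(raw_path)

# Unquoting first turns %3F into a real "?" ...
decoded = unquote(raw_path)

# ... so a later split treats it as the path/query boundary.
parts = urlsplit(decoded)

print(quoted_parts.path, repr(quoted_parts.query))
print(parts.path, repr(parts.query))
```

In this sketch the quoted value splits into the full path with an empty query, while the unquoted value splits into `/search` plus the query `q=1`, which is the kind of false "?" described above.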

@serhiy-storchaka

Member

Fixed as a part of #138502.

3 participants