Fix URL decoding to respect server-specified character encoding#159

Draft
Copilot wants to merge 5 commits into master from copilot/fix-http-encoding-issue

Copilot AI commented Nov 15, 2025

URLs with non-ASCII characters were incorrectly decoded when servers specified non-UTF-8 encodings (e.g., ISO-8859-1). The code always used UTF-8 for URL decoding, causing %e9 (é in ISO-8859-1) to render as � (replacement character).

Changes

  • Added encoding-aware URL decoder (Library.UrlDecode)

    • Accepts optional Encoding parameter to decode percent-encoded URLs using the correct charset
    • Handles edge cases: plus signs as spaces, malformed sequences, partial percent escapes
  • Added charset extraction from HTTP responses (DirectoryParser.GetEncodingFromResponse)

    • Reads Content-Type header to determine server-specified encoding
    • Falls back to UTF-8 when charset is absent or invalid
  • Updated parsing methods to propagate encoding

    • ParseHtml, ParseJavaScriptDrawn, ParseTablesDirectoryListing now accept and use response encoding
    • Removed double-decoding (was calling both Uri.UnescapeDataString and WebUtility.UrlDecode)
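
As a rough illustration of the encoding-aware decoding described above, here is a minimal sketch. It is not the PR's actual Library.UrlDecode: the class name UrlDecodeSketch and its Decode method are hypothetical, and treating "+" as a space is an assumption that mirrors WebUtility.UrlDecode. The idea is to collect consecutive percent-escaped bytes and decode each run with the supplied encoding, while leaving malformed or partial escapes untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not the PR's Library.UrlDecode.
public static class UrlDecodeSketch
{
    public static string Decode(string value, Encoding encoding = null)
    {
        if (string.IsNullOrEmpty(value)) return value;
        encoding = encoding ?? Encoding.UTF8; // assumed default when no charset is known

        var result = new StringBuilder(value.Length);
        var pending = new List<byte>(); // run of consecutive percent-escaped bytes

        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            bool isEscape = c == '%' && i + 2 < value.Length
                && Uri.IsHexDigit(value[i + 1]) && Uri.IsHexDigit(value[i + 2]);

            if (isEscape)
            {
                // Buffer the raw byte; multi-byte characters need the whole run.
                pending.Add(Convert.ToByte(value.Substring(i + 1, 2), 16));
                i += 2;
                continue;
            }

            // Any other character ends the run, so decode what was buffered.
            Flush(result, pending, encoding);

            // Assumption: '+' means space. Malformed or partial escapes
            // such as "%zz" or a trailing "%" are copied through unchanged.
            result.Append(c == '+' ? ' ' : c);
        }

        Flush(result, pending, encoding);
        return result.ToString();
    }

    private static void Flush(StringBuilder result, List<byte> pending, Encoding encoding)
    {
        if (pending.Count == 0) return;
        result.Append(encoding.GetString(pending.ToArray()));
        pending.Clear();
    }
}
```

With this sketch, UrlDecodeSketch.Decode("pr%e9-montage.aiff", Encoding.GetEncoding("ISO-8859-1")) yields "pré-montage.aiff", whereas decoding the same input as UTF-8 produces the replacement character, because the lone byte 0xE9 is not a valid UTF-8 sequence.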

Example

// Server sends: Content-Type: text/html; charset=ISO-8859-1
// HTML contains: <a href="pr%e9-montage.aiff">

// Before: "pr�-montage.aiff" (� = Unicode replacement character)
// After:  "pré-montage.aiff" (correctly decoded using ISO-8859-1)
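
The charset lookup implied by the Content-Type header in this example could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's DirectoryParser.GetEncodingFromResponse: the class CharsetSketch and its FromContentType method are hypothetical names, and the sketch works on the raw header string instead of an HTTP response object. It relies on MediaTypeHeaderValue.TryParse and Encoding.GetEncoding from the .NET base class library.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper for illustration only; not the PR's GetEncodingFromResponse.
public static class CharsetSketch
{
    public static Encoding FromContentType(string contentType)
    {
        if (!string.IsNullOrWhiteSpace(contentType)
            && MediaTypeHeaderValue.TryParse(contentType, out var mediaType)
            && !string.IsNullOrWhiteSpace(mediaType.CharSet))
        {
            try
            {
                // The charset value may arrive quoted, e.g. charset="utf-8".
                return Encoding.GetEncoding(mediaType.CharSet.Trim('"'));
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported charset name: fall through to UTF-8.
            }
        }

        // Charset absent or invalid: fall back to UTF-8, as the PR describes.
        return Encoding.UTF8;
    }
}
```

For the header above, CharsetSketch.FromContentType("text/html; charset=ISO-8859-1") returns the ISO-8859-1 encoding, which can then be passed to the URL decoder. Note that on .NET Core and later, some legacy code pages (for example windows-1252) resolve only after registering CodePagesEncodingProvider, so a sketch like this would fall back to UTF-8 for those names unless that provider is registered.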

Fixes #153

Original prompt

This section details the original issue you should resolve

<issue_title>HTTP enconding</issue_title>
<issue_description>Describe the bug
It could be needed to take into account the specified encoding supplied by the web server.

See #150 (comment) for a possible issue.</issue_description>

<agent_instructions>Try to use the encoding specified by the webserver</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 4 commits November 15, 2025 22:53
Co-authored-by: KoalaBear84 <5886900+KoalaBear84@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix HTTP encoding to match server specifications" to "Fix URL decoding to respect server-specified character encoding" Nov 15, 2025
Copilot AI requested a review from KoalaBear84 November 15, 2025 23:09