Skip to content

Fix URI fragment parsing without query#6393

Merged
JanuszL merged 2 commits into
NVIDIA:mainfrom
fallintoplace:fix-uri-fragment-parsing
Jun 30, 2026
Merged

Fix URI fragment parsing without query#6393
JanuszL merged 2 commits into
NVIDIA:mainfrom
fallintoplace:fix-uri-fragment-parsing

Conversation

@fallintoplace

Copy link
Copy Markdown
Contributor

Summary

  • Stop URI authority and path parsing at query/fragment delimiters.
  • Only parse a query when ? is present, so path#fragment reaches the fragment parser.
  • Add URI parser coverage for fragment cases with and without a path/query.

Validation

  • git diff --check
  • Focused standalone C++17 parser compile/run against dali/util/uri.cc

Full gtest was not run locally because this checkout does not have a configured build directory.

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a bug in the URI parser where #-delimited fragment components were incorrectly absorbed into the authority, path, or query when no query string was present. The fix stops authority and path scanning at #, and makes the query section conditional on encountering ? first, so scheme://host/path#frag now correctly yields an empty query and a non-empty fragment.

  • uri.cc: Three targeted loop-condition changes — authority loop gains != '#', path loop gains != '#', and the unconditional query-advance block becomes an if (*p == '?') guard.
  • uri_test.cc: Five new test cases covering fragment-only, fragment-without-query, empty-query-with-fragment, empty-path-with-fragment, and empty-path-with-query-and-fragment shapes.

Confidence Score: 5/5

Safe to merge — the change is a minimal, well-scoped correctness fix with no regressions on the existing test suite.

The three loop-condition changes are logically exhaustive: after each modified loop, *p can only be '\0', '?', or '#', and all three are handled. The path_and_query() computation (max(path_end_, query_end_)) in uri.h continues to work correctly for all new cases, as verified by the five new tests. None of the downstream callers (s3_filesystem.cc, file_label_loader.cc, etc.) use .fragment(), so their behavior is unchanged for URIs without '#' and improved for URIs that accidentally contained '#'.

No files require special attention.

Important Files Changed

Filename Overview
dali/util/uri.cc Core parse fix: authority/path loops now stop at '#', and the query section is guarded by 'if (*p == ?')'. Logic is correct for all reachable states after each loop.
dali/util/uri_test.cc Five new tests cover the previously-untested fragment parsing shapes; expected values verified against the path_and_query() max(path_end_, query_end_) formula in uri.h.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[URI::Parse] --> B[Scheme loop\nstop at ':']
    B --> C{'//' next?}
    C -- yes --> D["Authority loop\nstop at '/', '?', '#' NEW"]
    C -- no --> E
    D --> E{*p == null?}
    E -- yes --> Z[return parsed]
    E -- no --> F["Path loop\nstop at '?', '#' NEW"]
    F --> G{*p == null?}
    G -- yes --> Z
    G -- no --> H{"*p == '?' NEW guard"}
    H -- yes --> I[Advance past '?'\nQuery loop\nstop at '#']
    H -- no --> J
    I --> K{*p == null?}
    K -- yes --> Z
    K -- no --> J["Advance past '#'\nFragment loop\nstop at null"]
    J --> Z
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[URI::Parse] --> B[Scheme loop\nstop at ':']
    B --> C{'//' next?}
    C -- yes --> D["Authority loop\nstop at '/', '?', '#' NEW"]
    C -- no --> E
    D --> E{*p == null?}
    E -- yes --> Z[return parsed]
    E -- no --> F["Path loop\nstop at '?', '#' NEW"]
    F --> G{*p == null?}
    G -- yes --> Z
    G -- no --> H{"*p == '?' NEW guard"}
    H -- yes --> I[Advance past '?'\nQuery loop\nstop at '#']
    H -- no --> J
    I --> K{*p == null?}
    K -- yes --> Z
    K -- no --> J["Advance past '#'\nFragment loop\nstop at null"]
    J --> Z
Loading

Reviews (3): Last reviewed commit: "Add authority-less URI fragment test" | Re-trigger Greptile

Comment thread dali/util/uri_test.cc
Comment thread dali/util/uri_test.cc
EXPECT_EQ("", uri.path());
EXPECT_EQ("query", uri.query());
EXPECT_EQ("?query", uri.path_and_query());
EXPECT_EQ("fragment", uri.fragment());

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All four new cases use s3:// URIs (with an authority). The path-loop # fix also fires on authority-less URIs such as urn:path#frag or mailto:user@example.com#section, but that code path is untested. Consider adding one case like URI::Parse("urn:path#frag") to cover the path-only branch.

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
@fallintoplace fallintoplace force-pushed the fix-uri-fragment-parsing branch from b9c4aa5 to f29412a Compare June 23, 2026 18:56
@JanuszL

JanuszL commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

@fallintoplace would you mind checking the review comment?

@fallintoplace

fallintoplace commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

@JanuszL Have I already addressed this here with this commit? f29412a

Sorry if I cause any confusion. Thank you for your review.

@JanuszL

JanuszL commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

!build

@dali-automaton

Copy link
Copy Markdown
Collaborator

CI MESSAGE: [56212093]: BUILD STARTED

@dali-automaton

Copy link
Copy Markdown
Collaborator

CI MESSAGE: [56212093]: BUILD PASSED

@JanuszL

JanuszL commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

@fallintoplace thank you for your contribution.

@JanuszL JanuszL merged commit 866d3ff into NVIDIA:main Jun 30, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants