Skip to content

Strip non printable characters from locators#1720

Merged
tjugdev merged 2 commits into
masterfrom
strip-non-printables
Jun 16, 2026
Merged

Strip non printable characters from locators#1720
tjugdev merged 2 commits into
masterfrom
strip-non-printables

Conversation

@tjugdev

@tjugdev tjugdev commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Overview

Remove non-printable characters from locators in the sourceUnits section of the CLI output. The graphs in the "projects" section still include non-printable characters, as arguably leaving them in is more correct and we don't currently have issues with them being there (mostly because we don't read that output).

Acceptance criteria

Locators in source units don't include non-printable characters.

Testing plan

In a new directory, create a pyproject.toml with the following contents:

[tool.poetry]
name = "nonprint"
version = "0.1.0"
description = "Reproduction for null bytes in Build.Imports"
authors = []

[tool.poetry.dependencies]
python = "^3.9"
"flask\u0000\u0000" = "^1.1.4"
"requests\u0000\u0000" = ">=2.0"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Run cabal run fossa -- analyze --output ./your-test-dir and confirm that the locators in the source units don't contain null characters.

Risks

Highlight any areas that you're unsure of, want feedback on, or want reviewers to pay particular attention to.

Example: I'm not sure I did X correctly, can reviewers please double-check that for me?

Metrics

Is this change something that can or should be tracked? If so, can we do it today? And how? If its easy, do it

References

CORE-7083

Checklist

  • I added tests for this PR's change (or explained in the PR description why tests don't make sense).
  • If this PR introduced a user-visible change, I added documentation into docs/.
  • If this PR added docs, I added links as appropriate to the user manual's ToC in docs/README.ms and gave consideration to how discoverable or not my documentation is.
  • If this change is externally visible, I updated Changelog.md. If this PR did not mark a release, I added my changes into an ## Unreleased section at the top.
  • If I made changes to .fossa.yml or fossa-deps.{json.yml}, I updated docs/references/files/*.schema.json AND I have updated example files used by fossa init command. You may also need to update these if you have added/removed new dependency type (e.g. pip) or analysis target type (e.g. poetry).
  • If I made changes to a subcommand's options, I updated docs/references/subcommands/<subcommand>.md.

@tjugdev tjugdev requested a review from a team as a code owner June 15, 2026 22:27
@tjugdev tjugdev requested a review from nficca June 15, 2026 22:27
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

renderLocator in Srclib.Types is updated to strip non-printable characters from the assembled locator string. A new Data.Char qualified import is added, and the function now passes the concatenated fetcher, project, and optional revision text through a local stripNonPrintable helper that uses Text.filter Char.isPrint. A new Hspec test in test/Srclib/TypesSpec.hs imports renderLocator and asserts that a Locator containing NUL bytes produces the expected normalized output string.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the primary change: removing non-printable characters from locators.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description includes all major template sections with sufficient detail: Overview, Acceptance criteria, Testing plan, References, and Checklist.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/Srclib/TypesSpec.hs`:
- Around line 19-22: The renderLocator test currently validates the main use
case with multiple control characters but lacks edge case coverage. Consider
adding two additional test cases to the renderLocator describe block: one test
verifying that renderLocator correctly handles a locator with no non-printable
characters present (regression safety), and another test covering the edge case
where fields become empty or nearly empty after stripping non-printable
characters. These additions will improve the robustness of the test suite.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 08790897-9ffe-4a52-8e8c-b585157de8fe

📥 Commits

Reviewing files that changed from the base of the PR and between 98130f3 and 01e2fc4.

📒 Files selected for processing (2)
  • src/Srclib/Types.hs
  • test/Srclib/TypesSpec.hs

Comment thread test/Srclib/TypesSpec.hs
Comment on lines +19 to +22
describe "renderLocator" $ do
it "should strip non-printable characters such as NUL bytes" $ do
let locator = Locator "go" "github.com/gin-gonic\NUL/gin" (Just "v1.\t9.1\DEL")
renderLocator locator `shouldBe` "go+github.com/gin-gonic/gin$v1.9.1"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Test correctly validates non-printable character stripping.

The test case covers the main use case from the PR objectives (NUL bytes in dependency names) and includes multiple control character types (\NUL, \t, \DEL). The implementation correctly verifies these are removed from the rendered locator string.

Optional enhancement: Consider adding test cases for edge cases such as a locator with no non-printable characters (regression safety) or fields that become empty after stripping, though the current coverage is sufficient for the acceptance criteria.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/Srclib/TypesSpec.hs` around lines 19 - 22, The renderLocator test
currently validates the main use case with multiple control characters but lacks
edge case coverage. Consider adding two additional test cases to the
renderLocator describe block: one test verifying that renderLocator correctly
handles a locator with no non-printable characters present (regression safety),
and another test covering the edge case where fields become empty or nearly
empty after stripping non-printable characters. These additions will improve the
robustness of the test suite.

@tjugdev tjugdev merged commit 9cab5c4 into master Jun 16, 2026
21 of 31 checks passed
@tjugdev tjugdev deleted the strip-non-printables branch June 16, 2026 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants