Strip C1 control characters from displayed gem text #9597
Merged
Match C1 controls (U+0080-U+009F) as codepoints and only for valid UTF-8 text, so multibyte characters are preserved and other encodings are left unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Route the post-install message through Gem::Text#clean_text before printing it so a crafted message cannot emit raw terminal control sequences. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Pull request overview
Extends RubyGems’ text sanitization to also replace C1 control characters (U+0080–U+009F) with `.` in valid UTF-8 strings, and ensures gem post-install messages are sanitized before being displayed to the user.
Changes:
- Update `Gem::Text#clean_text` to additionally scrub C1 control characters for valid UTF-8 text.
- Route `Gem::Installer` post-install messages through `clean_text` before printing.
- Add unit tests for C1 stripping behavior and an installer integration test to verify post-install output sanitization.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `lib/rubygems/text.rb` | Adds UTF-8/validity-gated scrubbing for U+0080–U+009F control codepoints. |
| `lib/rubygems/installer.rb` | Sanitizes `post_install_message` before printing it to the UI. |
| `test/rubygems/test_gem_text.rb` | Adds coverage for C1 stripping, multibyte preservation, and non-UTF-8 pass-through. |
| `test/rubygems/test_gem_installer.rb` | Verifies installer output does not emit raw terminal control sequences in post-install messages. |
Comment on lines +13 to +16

    # C1 control characters (U+0080-U+009F) only occur in UTF-8 text and must
    # be matched as codepoints so that multibyte characters are preserved.
    if text.encoding == Encoding::UTF_8 && text.valid_encoding?
      text = text.gsub(/[\u0080-\u009f]/, ".")
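The guard quoted above can be tried in isolation. The following is a minimal sketch, not the code from the PR: `scrub_c1` is a hypothetical helper name, and only the encoding check and the substitution are taken from the quoted lines.

```ruby
# Hypothetical helper (not part of RubyGems) wrapping the guarded substitution.
def scrub_c1(text)
  # Leave non-UTF-8 or invalid strings untouched, as the guard in the diff does.
  return text unless text.encoding == Encoding::UTF_8 && text.valid_encoding?

  # Match C1 controls as codepoints, replacing each with a dot.
  text.gsub(/[\u0080-\u009f]/, ".")
end

utf8   = [0xe9, 0x85].pack("U*")                     # U+00E9 followed by NEL (U+0085)
latin1 = "\x85".b.force_encoding(Encoding::ISO_8859_1) # same byte value, different encoding

scrubbed_utf8   = scrub_c1(utf8)   # NEL replaced, U+00E9 kept
scrubbed_latin1 = scrub_c1(latin1) # returned unchanged
```

Returning early for other encodings keeps the regex from ever being applied to a string it is not compatible with.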
Comment on lines +109 to +112

    def test_clean_text_preserves_multibyte_characters
      text = [0xe9, 0x85].pack("U*") # U+00E9 kept, NEL (U+0085) stripped
      assert_equal [0xe9, 0x2e].pack("U*"), clean_text(text)
    end
      File.chmod(dir_mode, gem_dir) if dir_mode

    - say spec.post_install_message if options[:post_install_message] && !spec.post_install_message.nil?
    + say clean_text(spec.post_install_message) if options[:post_install_message] && !spec.post_install_message.nil?
Reword the comment to explain that the UTF-8 guard avoids splitting multibyte sequences, and assert preservation with U+0400, whose continuation byte falls in the C1 byte range. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
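To illustrate why U+0400 makes a useful test case, here is a small sketch comparing a byte-wise replacement with the codepoint-based one. The byte-wise variant is a deliberately naive assumption for comparison and does not appear in the PR.

```ruby
cyrillic = "\u0400"       # encoded in UTF-8 as the bytes 0xD0 0x80
bytes    = cyrillic.bytes

# Naive byte-wise replacement: treats the continuation byte 0x80 as a C1 control.
naive = cyrillic.b.gsub(/[\x80-\x9f]/n, ".").force_encoding(Encoding::UTF_8)

# Codepoint-based replacement: U+0400 is outside U+0080-U+009F, so nothing matches.
by_codepoint = cyrillic.gsub(/[\u0080-\u009f]/, ".")

naive_is_valid     = naive.valid_encoding?     # the multibyte sequence was split
codepoint_is_equal = by_codepoint == cyrillic  # the character is preserved
```

Matching on bytes corrupts the character because its second byte happens to fall in the 0x80 to 0x9F range, whereas matching on codepoints leaves it intact.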
post_install_message may be a non-String such as an array, so call to_s before clean_text to avoid raising during install. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
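A rough sketch of the failure mode this commit describes, under the assumption that the cleaning step calls `gsub` on its argument. `clean_text_sketch` and `safe_message` are hypothetical stand-ins, not RubyGems methods, and the character class covers only the ASCII control range.

```ruby
# Hypothetical stand-in for a gsub-based cleaner; expects a String.
def clean_text_sketch(text)
  text.gsub(/[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]/, ".")
end

# Hypothetical wrapper: coerce to String first so non-String messages do not raise.
def safe_message(message)
  clean_text_sketch(message.to_s)
end

from_string = safe_message("done\e[31m")              # ESC replaced with a dot
from_array  = safe_message(["line one\e[31m", "two"]) # Array#to_s yields its inspect form
```

Without the `to_s` call, passing the array would raise `NoMethodError`, since `Array` does not respond to `gsub`.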
What is your fix for the problem, implemented in this PR?
`Gem::Text#clean_text` now also strips C1 control characters (U+0080-U+009F). They are matched as codepoints and only for valid UTF-8 text, so multibyte characters are preserved and other encodings are left unchanged. The post-install message is now routed through `clean_text` before it is printed.
Make sure the following tasks are checked