Strip C1 control characters from displayed gem text #9597

Merged: hsbt merged 4 commits into master from clean-text-c1-controls on Jun 5, 2026
Conversation

Member

@hsbt hsbt commented Jun 4, 2026

What is your fix for the problem, implemented in this PR?

Gem::Text#clean_text now also strips C1 control characters (U+0080-U+009F).

They are matched as codepoints and only for valid UTF-8 text, so multibyte characters are preserved and other encodings are left unchanged. The post-install message is now routed through clean_text before it is printed.
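
The behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the code merged in this PR: `sketch_clean_text` is a hypothetical name, and the C0 character class is an assumption about what the method already strips. Only the UTF-8 guard and the C1 range mirror the snippet quoted in the review below.

```ruby
# Hypothetical stand-in for Gem::Text#clean_text; the name and the C0
# character class below are assumptions made for illustration.
def sketch_clean_text(text)
  # Replace C0 controls (except tab, LF, CR) and DEL with ".".
  cleaned = text.gsub(/[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]/, ".")

  # Only touch C1 controls when the string is valid UTF-8, so the match
  # runs on codepoints and never on the bytes of a multibyte character.
  if cleaned.encoding == Encoding::UTF_8 && cleaned.valid_encoding?
    cleaned = cleaned.gsub(/[\u0080-\u009f]/, ".")
  end

  cleaned
end

puts sketch_clean_text("ok\u009b31m")  # CSI (U+009B) becomes "."
puts sketch_clean_text("caf\u00e9")    # U+00E9 is outside the range and is kept
```

A string in another encoding, such as ISO-8859-1, skips the C1 branch entirely, which is what "other encodings are left unchanged" refers to.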


hsbt and others added 2 commits June 4, 2026 17:36
Match C1 controls (U+0080-U+009F) as codepoints and only for valid UTF-8
text, so multibyte characters are preserved and other encodings are left
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Route the post-install message through Gem::Text#clean_text before
printing it so a crafted message cannot emit raw terminal control
sequences.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 4, 2026 08:41
Contributor

Copilot AI left a comment


Pull request overview

Extends RubyGems’ text sanitization to also replace C1 control characters (U+0080–U+009F) with "." in valid UTF-8 strings, and ensures gem post-install messages are sanitized before being displayed to the user.

Changes:

  • Update Gem::Text#clean_text to additionally scrub C1 control characters for valid UTF-8 text.
  • Route Gem::Installer post-install messages through clean_text before printing.
  • Add unit tests for C1 stripping behavior and an installer integration test to verify post-install output sanitization.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| lib/rubygems/text.rb | Adds UTF-8/validity-gated scrubbing for U+0080–U+009F control codepoints. |
| lib/rubygems/installer.rb | Sanitizes post_install_message before printing it to the UI. |
| test/rubygems/test_gem_text.rb | Adds coverage for C1 stripping, multibyte preservation, and non-UTF-8 pass-through. |
| test/rubygems/test_gem_installer.rb | Verifies installer output does not emit raw terminal control sequences in post-install messages. |

Comment thread lib/rubygems/text.rb Outdated
Comment on lines +13 to +16
# C1 control characters (U+0080-U+009F) only occur in UTF-8 text and must
# be matched as codepoints so that multibyte characters are preserved.
if text.encoding == Encoding::UTF_8 && text.valid_encoding?
  text = text.gsub(/[\u0080-\u009f]/, ".")
Comment on lines +109 to +112
def test_clean_text_preserves_multibyte_characters
  text = [0xe9, 0x85].pack("U*") # U+00E9 kept, NEL (U+0085) stripped
  assert_equal [0xe9, 0x2e].pack("U*"), clean_text(text)
end
Comment thread lib/rubygems/installer.rb Outdated
  File.chmod(dir_mode, gem_dir) if dir_mode

- say spec.post_install_message if options[:post_install_message] && !spec.post_install_message.nil?
+ say clean_text(spec.post_install_message) if options[:post_install_message] && !spec.post_install_message.nil?
hsbt and others added 2 commits June 4, 2026 18:39
Reword the comment to explain that the UTF-8 guard avoids splitting
multibyte sequences, and assert preservation with U+0400, whose
continuation byte falls in the C1 byte range.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
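
To illustrate why the commit above picks U+0400 for the preservation test, here is a small sketch comparing a byte-oriented replacement with a codepoint-oriented one. Both helper names are hypothetical and exist only for this example. The byte-wise variant is a deliberately naive approach, not anything RubyGems does.

```ruby
# Hypothetical helpers for illustration; neither is part of RubyGems.

# Naive approach: treat every byte in 0x80..0x9F as a control character.
def strip_c1_bytewise(text)
  text.bytes.map { |b| (0x80..0x9f).cover?(b) ? 0x2e : b }.pack("C*")
end

# Codepoint approach: match U+0080..U+009F as characters in a UTF-8 string.
def strip_c1_codepoints(text)
  text.gsub(/[\u0080-\u009f]/, ".")
end

cyrillic = "\u0400" # encoded in UTF-8 as the two bytes D0 80
bytewise = strip_c1_bytewise(cyrillic).force_encoding(Encoding::UTF_8)

p cyrillic.bytes.map { |b| format("%02X", b) } # ["D0", "80"]
p bytewise.valid_encoding?                     # false
p strip_c1_codepoints(cyrillic) == cyrillic    # true
```

The byte-wise version overwrites the continuation byte 0x80 and leaves an invalid sequence, while the codepoint version leaves U+0400 untouched.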
post_install_message may be a non-String such as an array, so call to_s
before clean_text to avoid raising during install.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
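
The `to_s` step from the commit above can be sketched as follows. `sanitize_for_display` is a hypothetical name, and only the C1 replacement is shown. The real installer calls `clean_text` on the converted string. The point is that an Array has no `gsub`, so converting first avoids a NoMethodError.

```ruby
# Hypothetical sketch; `sanitize_for_display` is not a RubyGems method.
def sanitize_for_display(message)
  # Arrays, Symbols and other non-String values become a String first,
  # so the gsub below is always called on a String.
  text = message.to_s

  if text.encoding == Encoding::UTF_8 && text.valid_encoding?
    text = text.gsub(/[\u0080-\u009f]/, ".")
  end

  text
end

puts sanitize_for_display("done\u0085")                    # done.
puts sanitize_for_display(["Thanks", "for installing"])    # ["Thanks", "for installing"]
```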
@hsbt hsbt merged commit d2055b4 into master Jun 5, 2026
105 checks passed
@hsbt hsbt deleted the clean-text-c1-controls branch June 5, 2026 01:23