fix: rune-aware truncation in stepName to prevent invalid UTF-8 in JSON-LD by greynewell · Pull Request #106 · supermodeltools/cli

greynewell · 2026-04-09T19:13:47Z

Summary

- `stepName` in `internal/archdocs/pssg/schema/jsonld.go` used byte-based slicing (`step[:77]`) to cap long recipe instruction steps.
- When a step contains multi-byte UTF-8 characters (e.g. "sauté", "jalapeño", "crème brûlée"), slicing at byte 77 can land mid-sequence, producing invalid UTF-8 in the generated JSON-LD `<script>` tag.
- The fix converts to `[]rune` before truncating, the same pattern already used in `ReadClaudeMD`, `dotEscape`, and other truncation points in the codebase.

Test plan

- `TestStepName_MultiByteUTF8`: 81 × "é" (162 bytes, 81 runes); verifies output is valid UTF-8 and is truncated
- `TestStepName_ShortStep`: short step returned unchanged
- `TestStepName_FirstSentence`: first-sentence extraction unaffected
- `TestStepName_TruncatesLongASCII`: ASCII truncation still works
- `go test ./...` passes

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Fixed string truncation logic to properly handle multi-byte UTF-8 characters without splitting them mid-character.
Tests
- Added comprehensive unit tests for instruction step string handling, covering ASCII truncation, first-sentence extraction, and UTF-8 character validation scenarios.

stepName used byte-based slicing (step[:77]) to truncate long recipe instruction steps. When a step contains multi-byte UTF-8 characters (e.g. "sauté", "jalapeño", "crème"), slicing at byte 77 can land in the middle of a multi-byte sequence, producing invalid UTF-8 in the generated JSON-LD structured data.

Fixes by converting to []rune before truncating, matching the same pattern used elsewhere in the codebase (e.g. ReadClaudeMD, dotEscape).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-04-09T19:14:03Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

The code updates UTF-8 string truncation logic in the stepName function. Instead of truncating based on byte position (which could break multi-byte UTF-8 characters), the code now converts strings to runes, truncates at the rune level, and converts back—ensuring complete characters are preserved. Comprehensive tests were added to verify this behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| UTF-8 Truncation Fix: `internal/archdocs/pssg/schema/jsonld.go` | Updated `stepName` truncation logic to use rune-based slicing instead of byte-based slicing, preventing corruption when truncating strings with multi-byte UTF-8 characters. |
| Test Coverage: `internal/archdocs/pssg/schema/jsonld_test.go` | Added comprehensive unit tests covering short inputs, first-sentence extraction, ASCII truncation, and UTF-8 edge cases to ensure truncation produces valid output. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

jonathanpopham

Poem

🔤 UTF-8 runes now whole, not torn in two,
Where multi-byte chars stay complete and true,
No more broken characters mid-slice,
Truncation done properly—oh, how nice! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change: fixing rune-aware truncation in stepName to prevent invalid UTF-8 in JSON-LD output. |
| Description check | ✅ Passed | The description covers all required template sections with clear problem statement, solution explanation, and comprehensive test plan with concrete test cases. |


greynewell · 2026-04-09T19:15:06Z

Duplicate of #104 which already fixed this in main.

greynewell requested a review from jonathanpopham as a code owner April 9, 2026 19:13

greynewell closed this Apr 9, 2026

greynewell wants to merge 1 commit into main from fix-stepname-utf8-truncation
