fix: rune-aware truncation in stepName to prevent invalid UTF-8 in JSON-LD #106
greynewell wants to merge 1 commit into main
stepName used byte-based slicing (step[:77]) to truncate long recipe instruction steps. When a step contains multi-byte UTF-8 characters (e.g. "sauté", "jalapeño", "crème"), slicing at byte 77 can land in the middle of a multi-byte sequence, producing invalid UTF-8 in the generated JSON-LD structured data. The fix converts the step to []rune before truncating, matching the pattern used elsewhere in the codebase (e.g. ReadClaudeMD, dotEscape).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Caution: Review failed. Pull request was closed or merged during review.

Walkthrough: The code updates UTF-8 string truncation logic.
Duplicate of #104 which already fixed this in main.
Summary
- stepName in internal/archdocs/pssg/schema/jsonld.go used byte-based slicing (step[:77]) to cap long recipe instruction steps
- With multi-byte UTF-8 characters, the cut can land in the middle of a sequence, producing invalid UTF-8 in the JSON-LD <script> tag
- Fix: convert to []rune before truncating, the same pattern already used in ReadClaudeMD, dotEscape, and other truncation points in the codebase

Test plan
- TestStepName_MultiByteUTF8: 81 × "é" (162 bytes, 81 runes), verifies output is valid UTF-8 and is truncated
- TestStepName_ShortStep: short step returned unchanged
- TestStepName_FirstSentence: first-sentence extraction unaffected
- TestStepName_TruncatesLongASCII: ASCII truncation still works
- go test ./... passes

🤖 Generated with Claude Code