Commit 161c66e

greynewell and claude authored
fix(graph): rune-based truncation in dotEscape prevents invalid UTF-8 in DOT output (#105)
* fix(graph2md): writeGraphData uses zero lineCount when startLine absent

  writeGraphData computed lineCount only when startLine > 0 && endLine > 0. For nodes without a startLine (API returns 0), the condition was false and lineCount remained 0, so the graph visualisation data showed lc=0 even though the same node's frontmatter correctly computed line_count=endLine (using effectiveStart=1).

  Fix: mirror the effectiveStart=1 defaulting used by all frontmatter writers (if endLine > 0 but startLine <= 0, treat startLine as 1).

  Adds TestGraphDataLineCountMissingStartLine to catch the regression.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lint): remove extra spaces in parseGraphData struct literal

  goimports rejects manually aligned spaces; use single space.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(graph): rune-based truncation in dotEscape to avoid invalid UTF-8

  dotEscape used byte indexing (s[len(s)-39:]) to take the last 39 characters of a node name. For names with multi-byte UTF-8 characters (e.g. accented letters, CJK paths), the byte offset could land in the middle of a multi-byte sequence, producing invalid UTF-8 in the DOT output and breaking downstream Graphviz tools.

  Fix: convert to []rune and slice by rune index.

  Adds TestWriteDOT_LongNameTruncated_MultiByteUTF8 as a regression test (41 × "é" → the old byte slice cut byte 43, which is the second byte of U+00E9, producing invalid UTF-8).

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 8fa3e47 commit 161c66e
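
The startLine defaulting described in the first commit message can be sketched as follows. This is only an illustration: the writeGraphData code is not part of this diff, so the helper name `effectiveLineCount` is hypothetical, and the inclusive formula `endLine - startLine + 1` is an assumption inferred from the statement that a missing startLine yields line_count=endLine.

```go
package main

import "fmt"

// effectiveLineCount is a hypothetical helper, not the project's code.
// It assumes line counts are inclusive (endLine - startLine + 1) and
// applies the effectiveStart=1 defaulting described in the commit message.
func effectiveLineCount(startLine, endLine int) int {
	if endLine <= 0 {
		// No usable end line: nothing to count.
		return 0
	}
	if startLine <= 0 {
		// API returned no start line; treat the node as starting at line 1.
		startLine = 1
	}
	return endLine - startLine + 1
}

func main() {
	fmt.Println(effectiveLineCount(0, 25))  // missing startLine: 25
	fmt.Println(effectiveLineCount(10, 25)) // both present: 16
	fmt.Println(effectiveLineCount(0, 0))   // neither present: 0
}
```

Under these assumptions, a node with only an endLine gets the same count in the graph data as in its frontmatter, which is the consistency the fix is after.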

2 files changed: +31 -2 lines changed

internal/graph/handler.go

Lines changed: 3 additions & 2 deletions
@@ -121,8 +121,9 @@ func writeDOT(w io.Writer, g *api.Graph, filter string) error {
 }
 
 func dotEscape(s string) string {
-	if len(s) > 40 {
-		s = "…" + s[len(s)-39:]
+	runes := []rune(s)
+	if len(runes) > 40 {
+		return "…" + string(runes[len(runes)-39:])
 	}
 	return s
 }

internal/graph/handler_test.go

Lines changed: 28 additions & 0 deletions
@@ -4,6 +4,7 @@ import (
 	"bytes"
 	"strings"
 	"testing"
+	"unicode/utf8"
 
 	"github.com/supermodeltools/cli/internal/api"
 )
@@ -307,3 +308,30 @@ func TestPrintGraph_HumanDefault(t *testing.T) {
 		t.Errorf("human output should contain table headers:\n%s", buf.String())
 	}
 }
+
+// TestWriteDOT_LongNameTruncated_MultiByteUTF8 verifies that dotEscape does
+// not split a multi-byte UTF-8 character when truncating long node names.
+// Before the fix, s[len(s)-39:] used byte indexing, which could land in the
+// middle of a multi-byte character and produce invalid UTF-8 in the DOT file.
+func TestWriteDOT_LongNameTruncated_MultiByteUTF8(t *testing.T) {
+	// 41 × "é" (2 bytes each) = 82 bytes, 41 runes.
+	// byte-based slice: s[82-39:] = s[43:] — byte 43 is the second byte of "é"
+	// (U+00E9 encodes as 0xC3 0xA9), producing invalid UTF-8 without the fix.
+	longName := strings.Repeat("é", 41)
+	g := &api.Graph{
+		Nodes: []api.Node{
+			{ID: "n1", Labels: []string{"Function"}, Properties: map[string]any{"name": longName}},
+		},
+	}
+	var buf bytes.Buffer
+	if err := writeDOT(&buf, g, ""); err != nil {
+		t.Fatalf("writeDOT: %v", err)
+	}
+	out := buf.String()
+	if !utf8.ValidString(out) {
+		t.Errorf("writeDOT output contains invalid UTF-8 (byte-based truncation of multi-byte name)")
+	}
+	if strings.Contains(out, longName) {
+		t.Errorf("long multi-byte name should be truncated in DOT output")
+	}
+}
