Commit 1a215eb

Fewer props (#8)
* Rename East_Asian_Full_Wide to Always_Wide
* _RI_PAIR is just _Always_Wide
* _VS16 is now _Always_Wide
* Rename _VS15 to _Always_Narrow
* Extended_Pictographic + Emoji_Presentation = Always_Wide
* Use serial iotas
* Make it a jump table
* isRIPrefix, isVSPrefix for clarity
1 parent e663c23 commit 1a215eb
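
The commit message mentions "serial iotas" and a "jump table". width.go is not part of this excerpt, so the following is only a hypothetical sketch of one plausible reading: once every rune carries a single small property value, width can be looked up in an array indexed by that value. The names `widthTable` and `width`, the `eastAsianWidth` parameter, and the widths chosen for each entry are assumptions for illustration, not the package's actual code.

```go
package main

import "fmt"

type property uint8

// Serial values starting at 1; 0 means "no stored property".
// These mirror the property names in the commit, in an assumed spelling.
const (
	zeroWidth property = iota + 1
	alwaysWide
	eastAsianAmbiguous
	alwaysNarrow
)

// widthTable is a hypothetical lookup table indexed directly by property.
var widthTable = [...]int{
	0:                  1, // default: narrow
	zeroWidth:          0,
	alwaysWide:         2,
	eastAsianAmbiguous: 1, // narrow unless the East Asian option is set
	alwaysNarrow:       1,
}

// width returns the display width for a property value.
// eastAsianWidth is an assumed option that widens ambiguous characters.
func width(p property, eastAsianWidth bool) int {
	if p == eastAsianAmbiguous && eastAsianWidth {
		return 2
	}
	if int(p) >= len(widthTable) {
		return 1 // unknown values fall back to the default
	}
	return widthTable[p]
}

func main() {
	fmt.Println(width(zeroWidth, false), width(alwaysWide, false), width(0, false))
	fmt.Println(width(eastAsianAmbiguous, false), width(eastAsianAmbiguous, true))
}
```

A table like this only works with dense, serial values. With one bit per property, a rune could carry several flags at once and the value could not serve as a small array index.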

7 files changed

Lines changed: 1346 additions & 1596 deletions

README.md

Lines changed: 25 additions & 20 deletions
@@ -77,36 +77,41 @@ goos: darwin
 goarch: arm64
 pkg: github.com/clipperhouse/displaywidth/comparison
 cpu: Apple M2
-BenchmarkString_Mixed/clipperhouse/displaywidth-8 11290 ns/op 149.42 MB/s 0 B/op 0 allocs/op
-BenchmarkString_Mixed/mattn/go-runewidth-8 14439 ns/op 116.84 MB/s 0 B/op 0 allocs/op
-BenchmarkString_Mixed/rivo/uniseg-8 20076 ns/op 84.03 MB/s 0 B/op 0 allocs/op
 
-BenchmarkString_EastAsian/clipperhouse/displaywidth-8 11248 ns/op 149.98 MB/s 0 B/op 0 allocs/op
-BenchmarkString_EastAsian/mattn/go-runewidth-8 24063 ns/op 70.11 MB/s 0 B/op 0 allocs/op
-BenchmarkString_EastAsian/rivo/uniseg-8 20051 ns/op 84.14 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Mixed/clipperhouse/displaywidth-8 10929 ns/op 154.36 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Mixed/mattn/go-runewidth-8 14540 ns/op 116.02 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Mixed/rivo/uniseg-8 19751 ns/op 85.41 MB/s 0 B/op 0 allocs/op
 
-BenchmarkString_ASCII/clipperhouse/displaywidth-8 1116 ns/op 114.71 MB/s 0 B/op 0 allocs/op
-BenchmarkString_ASCII/mattn/go-runewidth-8 1182 ns/op 108.27 MB/s 0 B/op 0 allocs/op
-BenchmarkString_ASCII/rivo/uniseg-8 1620 ns/op 79.04 MB/s 0 B/op 0 allocs/op
+BenchmarkString_EastAsian/clipperhouse/displaywidth-8 10885 ns/op 154.98 MB/s 0 B/op 0 allocs/op
+BenchmarkString_EastAsian/mattn/go-runewidth-8 23969 ns/op 70.38 MB/s 0 B/op 0 allocs/op
+BenchmarkString_EastAsian/rivo/uniseg-8 19852 ns/op 84.98 MB/s 0 B/op 0 allocs/op
 
-BenchmarkString_Emoji/clipperhouse/displaywidth-8 3264 ns/op 221.82 MB/s 0 B/op 0 allocs/op
-BenchmarkString_Emoji/mattn/go-runewidth-8 4804 ns/op 150.71 MB/s 0 B/op 0 allocs/op
-BenchmarkString_Emoji/rivo/uniseg-8 6783 ns/op 106.74 MB/s 0 B/op 0 allocs/op
+BenchmarkString_ASCII/clipperhouse/displaywidth-8 1103 ns/op 116.01 MB/s 0 B/op 0 allocs/op
+BenchmarkString_ASCII/mattn/go-runewidth-8 1166 ns/op 109.79 MB/s 0 B/op 0 allocs/op
+BenchmarkString_ASCII/rivo/uniseg-8 1584 ns/op 80.83 MB/s 0 B/op 0 allocs/op
 
-BenchmarkRune_Mixed/clipperhouse/displaywidth-8 3759 ns/op 448.83 MB/s 0 B/op 0 allocs/op
-BenchmarkRune_Mixed/mattn/go-runewidth-8 5417 ns/op 311.40 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Emoji/clipperhouse/displaywidth-8 3108 ns/op 232.93 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Emoji/mattn/go-runewidth-8 4802 ns/op 150.76 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Emoji/rivo/uniseg-8 6607 ns/op 109.58 MB/s 0 B/op 0 allocs/op
 
-BenchmarkRune_EastAsian/clipperhouse/displaywidth-8 3678 ns/op 458.69 MB/s 0 B/op 0 allocs/op
-BenchmarkRune_EastAsian/mattn/go-runewidth-8 15908 ns/op 106.05 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_Mixed/clipperhouse/displaywidth-8 3456 ns/op 488.20 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_Mixed/mattn/go-runewidth-8 5400 ns/op 312.39 MB/s 0 B/op 0 allocs/op
 
-BenchmarkRune_ASCII/clipperhouse/displaywidth-8 265.2 ns/op 482.70 MB/s 0 B/op 0 allocs/op
-BenchmarkRune_ASCII/mattn/go-runewidth-8 265.2 ns/op 482.67 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_EastAsian/clipperhouse/displaywidth-8 3475 ns/op 485.41 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_EastAsian/mattn/go-runewidth-8 15701 ns/op 107.44 MB/s 0 B/op 0 allocs/op
 
-BenchmarkRune_Emoji/clipperhouse/displaywidth-8 1522 ns/op 475.65 MB/s 0 B/op 0 allocs/op
-BenchmarkRune_Emoji/mattn/go-runewidth-8 2295 ns/op 315.53 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_ASCII/clipperhouse/displaywidth-8 257.0 ns/op 498.13 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_ASCII/mattn/go-runewidth-8 266.4 ns/op 480.50 MB/s 0 B/op 0 allocs/op
+
+BenchmarkRune_Emoji/clipperhouse/displaywidth-8 1384 ns/op 523.02 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_Emoji/mattn/go-runewidth-8 2273 ns/op 318.45 MB/s 0 B/op 0 allocs/op
 ```
 
 ## Compatibility
 
 `clipperhouse/displaywidth`, `mattn/go-runewidth`, and `rivo/uniseg` should give the
 same outputs for real-world text. See [comparison/README.md](comparison/README.md).
+
+If you wish to investigate the core logic, see the `lookupProperties` and `width`
+functions in [width.go](width.go#L112). The core of the trie generation logic is in
+`BuildPropertyBitmap` in [unicode.go](internal/gen/unicode.go#L309).

comparison/README.md

Lines changed: 21 additions & 20 deletions
@@ -19,31 +19,32 @@ goos: darwin
 goarch: arm64
 pkg: github.com/clipperhouse/displaywidth/comparison
 cpu: Apple M2
-BenchmarkString_Mixed/clipperhouse/displaywidth-8 11290 ns/op 149.42 MB/s 0 B/op 0 allocs/op
-BenchmarkString_Mixed/mattn/go-runewidth-8 14439 ns/op 116.84 MB/s 0 B/op 0 allocs/op
-BenchmarkString_Mixed/rivo/uniseg-8 20076 ns/op 84.03 MB/s 0 B/op 0 allocs/op
 
-BenchmarkString_EastAsian/clipperhouse/displaywidth-8 11248 ns/op 149.98 MB/s 0 B/op 0 allocs/op
-BenchmarkString_EastAsian/mattn/go-runewidth-8 24063 ns/op 70.11 MB/s 0 B/op 0 allocs/op
-BenchmarkString_EastAsian/rivo/uniseg-8 20051 ns/op 84.14 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Mixed/clipperhouse/displaywidth-8 10929 ns/op 154.36 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Mixed/mattn/go-runewidth-8 14540 ns/op 116.02 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Mixed/rivo/uniseg-8 19751 ns/op 85.41 MB/s 0 B/op 0 allocs/op
 
-BenchmarkString_ASCII/clipperhouse/displaywidth-8 1116 ns/op 114.71 MB/s 0 B/op 0 allocs/op
-BenchmarkString_ASCII/mattn/go-runewidth-8 1182 ns/op 108.27 MB/s 0 B/op 0 allocs/op
-BenchmarkString_ASCII/rivo/uniseg-8 1620 ns/op 79.04 MB/s 0 B/op 0 allocs/op
+BenchmarkString_EastAsian/clipperhouse/displaywidth-8 10885 ns/op 154.98 MB/s 0 B/op 0 allocs/op
+BenchmarkString_EastAsian/mattn/go-runewidth-8 23969 ns/op 70.38 MB/s 0 B/op 0 allocs/op
+BenchmarkString_EastAsian/rivo/uniseg-8 19852 ns/op 84.98 MB/s 0 B/op 0 allocs/op
 
-BenchmarkString_Emoji/clipperhouse/displaywidth-8 3264 ns/op 221.82 MB/s 0 B/op 0 allocs/op
-BenchmarkString_Emoji/mattn/go-runewidth-8 4804 ns/op 150.71 MB/s 0 B/op 0 allocs/op
-BenchmarkString_Emoji/rivo/uniseg-8 6783 ns/op 106.74 MB/s 0 B/op 0 allocs/op
+BenchmarkString_ASCII/clipperhouse/displaywidth-8 1103 ns/op 116.01 MB/s 0 B/op 0 allocs/op
+BenchmarkString_ASCII/mattn/go-runewidth-8 1166 ns/op 109.79 MB/s 0 B/op 0 allocs/op
+BenchmarkString_ASCII/rivo/uniseg-8 1584 ns/op 80.83 MB/s 0 B/op 0 allocs/op
 
-BenchmarkRune_Mixed/clipperhouse/displaywidth-8 3759 ns/op 448.83 MB/s 0 B/op 0 allocs/op
-BenchmarkRune_Mixed/mattn/go-runewidth-8 5417 ns/op 311.40 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Emoji/clipperhouse/displaywidth-8 3108 ns/op 232.93 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Emoji/mattn/go-runewidth-8 4802 ns/op 150.76 MB/s 0 B/op 0 allocs/op
+BenchmarkString_Emoji/rivo/uniseg-8 6607 ns/op 109.58 MB/s 0 B/op 0 allocs/op
 
-BenchmarkRune_EastAsian/clipperhouse/displaywidth-8 3678 ns/op 458.69 MB/s 0 B/op 0 allocs/op
-BenchmarkRune_EastAsian/mattn/go-runewidth-8 15908 ns/op 106.05 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_Mixed/clipperhouse/displaywidth-8 3456 ns/op 488.20 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_Mixed/mattn/go-runewidth-8 5400 ns/op 312.39 MB/s 0 B/op 0 allocs/op
 
-BenchmarkRune_ASCII/clipperhouse/displaywidth-8 265.2 ns/op 482.70 MB/s 0 B/op 0 allocs/op
-BenchmarkRune_ASCII/mattn/go-runewidth-8 265.2 ns/op 482.67 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_EastAsian/clipperhouse/displaywidth-8 3475 ns/op 485.41 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_EastAsian/mattn/go-runewidth-8 15701 ns/op 107.44 MB/s 0 B/op 0 allocs/op
 
-BenchmarkRune_Emoji/clipperhouse/displaywidth-8 1522 ns/op 475.65 MB/s 0 B/op 0 allocs/op
-BenchmarkRune_Emoji/mattn/go-runewidth-8 2295 ns/op 315.53 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_ASCII/clipperhouse/displaywidth-8 257.0 ns/op 498.13 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_ASCII/mattn/go-runewidth-8 266.4 ns/op 480.50 MB/s 0 B/op 0 allocs/op
+
+BenchmarkRune_Emoji/clipperhouse/displaywidth-8 1384 ns/op 523.02 MB/s 0 B/op 0 allocs/op
+BenchmarkRune_Emoji/mattn/go-runewidth-8 2273 ns/op 318.45 MB/s 0 B/op 0 allocs/op
 ```

internal/gen/trie.go

Lines changed: 2 additions & 3 deletions
@@ -102,8 +102,7 @@ func WriteTrieGo(trie *triegen.Trie, outputPath string) error {
 // writeProperties writes the character properties definitions to the buffer.
 // It uses PropertyDefinitions from unicode.go as the single source of truth.
 func writeProperties(w io.Writer) {
-	fmt.Fprintf(w, "// property represents the properties of a character as bit flags\n")
-	fmt.Fprintf(w, "// The underlying type is uint8 since we only use %d bits for flags.\n", len(PropertyDefinitions))
+	fmt.Fprintf(w, "// property is an enum representing the properties of a character\n")
 	fmt.Fprintf(w, "type property uint8\n\n")
 	fmt.Fprintf(w, "const (\n")
 
@@ -112,7 +111,7 @@ func writeProperties(w io.Writer) {
 
 		constName := "_" + prop.Name
 		if i == 0 {
-			fmt.Fprintf(w, "%s property = 1 << iota\n", constName)
+			fmt.Fprintf(w, "%s property = iota + 1\n", constName)
 		} else {
 			fmt.Fprintf(w, "%s\n", constName)
 		}
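
To make the effect of the `1 << iota` to `iota + 1` change concrete, here is a small sketch of what the generated constant block might look like, assuming the generator writes nothing else inside the elided part of the loop. The constant names follow from `"_" + prop.Name` and the `PropertyDefinitions` list in unicode.go. The second block, with the hypothetical names `bitA` to `bitD`, exists only to contrast the old bit-flag numbering.

```go
package main

import "fmt"

type property uint8

// Serial values, as "iota + 1" produces them. 0 stays free for "no property".
const (
	_Zero_Width property = iota + 1
	_Always_Wide
	_East_Asian_Ambiguous
	_Always_Narrow
)

// For contrast: the previous "1 << iota" pattern assigns one bit per constant.
// These names are hypothetical and used only for this comparison.
const (
	bitA property = 1 << iota
	bitB
	bitC
	bitD
)

func main() {
	fmt.Println(_Zero_Width, _Always_Wide, _East_Asian_Ambiguous, _Always_Narrow)
	fmt.Println(bitA, bitB, bitC, bitD)
}
```

Serial values are mutually exclusive: a rune has exactly one property (or none), whereas bit flags could be combined with `|`.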

internal/gen/unicode.go

Lines changed: 29 additions & 32 deletions
@@ -23,7 +23,7 @@ type UnicodeData struct {
 	ZeroWidthChars map[rune]bool // Special zero-width characters
 }
 
-// property represents the properties of a character as bit flags
+// property represents the properties of a character
 type property uint8
 
 // PropertyDefinition describes a single character property flag
@@ -35,22 +35,24 @@ type PropertyDefinition struct {
 // PropertyDefinitions is the single source of truth for all character properties.
 // The order matters - it defines the bit positions (via iota).
 var PropertyDefinitions = []PropertyDefinition{
-	{"East_Asian_Full_Wide", "Always 2 wide"},
+	{"Zero_Width", "Always 0 width, includes combining marks, control characters, non-printable, etc"},
+	{"Always_Wide", "Always 2 wide"},
 	{"East_Asian_Ambiguous", "Width depends on EastAsianWidth option"},
-	{"Extended_Pictographic", "Extended pictographic character (from emoji-data.txt)"},
-	{"Emoji_Presentation", "Has default emoji presentation (width 2 unless overridden by VS15)"},
-	{"ZeroWidth", "Always 0 width, includes combining marks, control characters, non-printable, etc"},
-	{"VS15", "VARIATION SELECTOR-15 (U+FE0E) requests text presentation (width 1); not in the trie, see [width]"},
-	{"VS16", "VARIATION SELECTOR-16 (U+FE0F) requests emoji presentation (width 2); not in the trie, see [width]"},
-	{"RI_PAIR", "Regional Indicator Pair (flag) grapheme cluster; not in the trie, see [width]"},
+	{"Always_Narrow", "VARIATION SELECTOR-15 (U+FE0E) requests text presentation (width 1); not in the trie, see [width]"},
 }
 
+// these constants are used to build the property bitmap, internally.
+// the external properties are above. Keep them in the same order!
 const (
-	East_Asian_Full_Wide property = 1 << iota // F, W
-	East_Asian_Ambiguous // A
-	Extended_Pictographic // Extended_Pictographic from emoji-data
-	Emoji_Presentation // Emoji_Presentation from emoji-data
-	ZeroWidth // ZWSP, ZWJ, ZWNJ, etc.
+	// ZWSP, ZWJ, ZWNJ, etc.
+	zero_Width property = iota + 1
+	// F, W
+	always_Wide
+	// A
+	east_Asian_Ambiguous
+	// VS15 requests text presentation (width 1)
+	// not used in the trie, but noted here for reference
+	// always_Narrow
 )
 
 // ParseUnicodeData downloads and parses all required Unicode data files
@@ -314,38 +316,33 @@ func extractRunesFromRangeTable(table *unicode.RangeTable, target map[rune]bool)
 
 // BuildPropertyBitmap creates a properties bitmap for a given rune
 func BuildPropertyBitmap(r rune, data *UnicodeData) property {
-	var props property
+	if data.CombiningMarks[r] {
+		return zero_Width
+	}
+	if data.ControlChars[r] {
+		return zero_Width
+	}
+	if data.ZeroWidthChars[r] {
+		return zero_Width
+	}
 
 	// East Asian Width
 	// Only store properties that affect width calculation
 	if eaw, exists := data.EastAsianWidth[r]; exists {
 		switch eaw {
 		case "F", "W":
-			props |= East_Asian_Full_Wide
+			return always_Wide
 		case "A":
-			props |= East_Asian_Ambiguous
+			return east_Asian_Ambiguous
 			// H (Halfwidth), Na (Narrow), and N (Neutral) are not stored
 			// as they all result in width 1 (default behavior)
 		}
 	}
 
-	if data.CombiningMarks[r] {
-		props |= ZeroWidth
-	}
-	if data.ControlChars[r] {
-		props |= ZeroWidth
-	}
-	if data.ZeroWidthChars[r] {
-		props |= ZeroWidth
-	}
-
 	// Emoji properties
-	if data.ExtendedPictographic[r] {
-		props |= Extended_Pictographic
-	}
-	if data.EmojiPresentation[r] {
-		props |= Emoji_Presentation
+	if data.ExtendedPictographic[r] && data.EmojiPresentation[r] {
+		return always_Wide
 	}
 
-	return props
+	return 0
 }
