docs/for-contributors/Generator/name-processing.md: 96 additions & 1 deletion
@@ -92,12 +92,107 @@ For specifics on how these processors and other steps work, it is best to refer

## Name Splitting

-(TODO: Explain how name splitting works, relate it to tokenization. Explain decisions like why "2D" is split as "2_D")
+Name splitting breaks an identifier into separate "tokens" (also called "words" by the code) and is handled
+by the `NameSplitter` class. These tokens can be literal words (as identified by underscore or Pascal-case
+boundaries), but can also be groups of digits or capitalized letters.
+
+The goal of name splitting is to have a consistent representation of a name where each part of the name can be examined
+individually. This is helpful when names differ by casing or by the type of separator used.
+
+For example, `VkAccessFlags`, `vkCreateBuffer`, and `VK_MAX_MEMORY_HEAPS` effectively share the same prefix (`Vk`, `vk`, and `VK`).
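
To make the shared-prefix point concrete, here is a minimal Python sketch. It is not the `NameSplitter` implementation: the `split_name` helper and its regular expression are assumptions made for illustration, and they only approximate the behavior described here.

```python
import re

# Hypothetical tokenizer, for illustration only. A token is a run of digits,
# a run of capitals not followed by a lowercase letter, or a (possibly
# capitalized) lowercase word. Underscores match nothing, so they simply
# act as separators.
_TOKEN = re.compile(r"[0-9]+|[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_name(name: str) -> list[str]:
    return _TOKEN.findall(name)


for name in ["VkAccessFlags", "vkCreateBuffer", "VK_MAX_MEMORY_HEAPS"]:
    tokens = split_name(name)
    # Comparing the first token case-insensitively shows the shared prefix.
    print(name, "->", "_".join(tokens), "| prefix:", tokens[0].lower())
# VkAccessFlags -> Vk_Access_Flags | prefix: vk
# vkCreateBuffer -> vk_Create_Buffer | prefix: vk
# VK_MAX_MEMORY_HEAPS -> VK_MAX_MEMORY_HEAPS | prefix: vk
```

Once every name is reduced to the same kind of token list, later steps can compare or strip the first token without caring how the original name was cased or separated.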
+
+For specifics on how this process works and the exact behaviors, it is best to refer to the `NameSplitter` source code
+and the `NameSplitterTests` test cases.
+
+### Name Splitting - Notable Decisions
+
+#### Handling of Numbers
+
+Numbers are always split out as their own individual tokens. This is easier to work with and more consistent
+than special-casing when numbers should "stick" to the preceding or following token.
+
+For example:
+- `2D` is split as `2_D`
+- `R32` is split as `R_32`
+
+In these two cases, each input can be considered a single English word, so it can be argued that the output should be the
+same as the input. However, that would require the name splitting code to have preferences for when numbers should "stick"
+one way or the other.
+
+This gets even messier with names like `Image_2D_RGB16` or `Image2D_RGB16`. Although these exact names have not shown
+up in native code, names like `SpvImageFormatR32ui` do in fact exist.
+
+Because the goal of name splitting is to have a consistent tokenized representation of the name, it can be argued
+that it is safer to go for a more naive approach that does not attempt to group numbers with letters at all.
+In this case, a more naive approach means simpler code. It also means fewer potential surprises, since the output is more
+resistant to subtle changes in the input.
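
The "naive" rule for numbers can be sketched in a few lines of Python. This is an illustrative assumption about the rule, not the `NameSplitter` code: the hypothetical `split_numbers` helper only isolates digit runs and ignores casing and underscores entirely.

```python
from itertools import groupby


def split_numbers(text: str) -> list[str]:
    # Group consecutive characters by whether they are digits, so every run
    # of digits becomes its own token regardless of what surrounds it.
    return ["".join(run) for _, run in groupby(text, key=str.isdigit)]


for text in ["2D", "R32", "RGB16"]:
    print(text, "->", "_".join(split_numbers(text)))
# 2D -> 2_D
# R32 -> R_32
# RGB16 -> RGB_16
```

Because the rule never looks at the neighbouring letters, there is no "stick left or stick right" decision to make, which is the simplicity argued for above.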

## Name Prettification

+As hinted at previously, name prettification is the process of transforming an identifier to follow the Framework
+Design Guidelines and is handled by the `NamePrettifier` class.
+
+This primarily involves Pascal casing and the removal of underscore separators. Acronyms are also handled. By default,
+acronyms of length 2 are preserved (matching the guidelines), while longer acronyms are Pascal-cased.
+
+For example, "UI" is prettified as "UI" while "GUI" is prettified as "Gui".
+Similarly, "GL" is prettified as "GL" while "EGL" is prettified as "Egl".
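
The default acronym rule can be illustrated with a small Python sketch. The `prettify_token` function and its `max_acronym_length` parameter are hypothetical names chosen for this example; the real `NamePrettifier` may structure this differently.

```python
def prettify_token(token: str, max_acronym_length: int = 2) -> str:
    # Assumed rule: an all-caps token no longer than max_acronym_length is
    # kept as-is; anything else is Pascal-cased (first letter upper, rest lower).
    if token.isupper() and len(token) <= max_acronym_length:
        return token
    return token[:1].upper() + token[1:].lower()


for token in ["UI", "GUI", "GL", "EGL"]:
    print(token, "->", prettify_token(token))
# UI -> UI
# GUI -> Gui
# GL -> GL
# EGL -> Egl
```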
+
+Name prettification takes in a name "fragment" and outputs another fragment representing the prettified version of the
+input. The input is first split using `NameSplitter` to get a tokenized representation of the name before being
+processed.
+
+For specifics on how this process works and the exact behaviors, it is best to refer to the `NamePrettifier` source code
+and the `NamePrettifierTests` test cases.
+
(TODO: Explain how prettification works alongside name splitting. Explain how acronyms are handled. Explain why number fragments are merged to preceding letter fragments and how this affects acronyms and pascal casing.)

+### Name Prettification - Notable Decisions
+
+#### Output of Fully Capitalized Names
+
+By default, the `NamePrettifier` disallows outputs that are all caps.
+For example, if `GL` would be the output and `allowAllCaps` is left at its default of `false`, then `Gl` is the actual output.
+
+This is to prevent fully capitalized member names. The codebase typically overrides this behavior when dealing with
+type names, which means the `GL` class remains as `GL`.
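
As a rough sketch of this final check, assuming it runs on the whole prettified output: `allow_all_caps` mirrors the `allowAllCaps` option mentioned above, while the `apply_all_caps_rule` function name and the exact way the output is re-cased are assumptions for illustration.

```python
def apply_all_caps_rule(output: str, allow_all_caps: bool = False) -> str:
    # str.isupper() is True when every cased character is uppercase
    # (digits are ignored), which is the "all caps" case to avoid.
    if not allow_all_caps and output.isupper():
        return output[:1] + output[1:].lower()
    return output


print(apply_all_caps_rule("GL"))                       # Gl
print(apply_all_caps_rule("GL", allow_all_caps=True))  # GL
```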
+
+#### Handling of Acronyms that contain Numbers
+
+An acronym includes the capital letters and the numbers immediately following those letters.
+
+For example:
+- `2D` is split as `2_D`. There are 2 acronyms of length 1 here.
+- `R32` is split as `R_32`. There is 1 acronym of length 3 here.
+
+Where this behavior matters is in the following case:
+- `RG` is split as `RG` and is prettified as `RG`. However, the `NamePrettifier` also disallows outputs that are fully
+  capitalized by default, so `RG` is actually output as `Rg`.
+- `RG32` is split as `RG_32`. Because this is an acronym of length 4, it is output as `Rg32`.
+
+Notably, this means that `RG` and `RG32` are consistently output with the same `Rg` casing (`Rg` and `Rg32`).
+
+In the code, this is implemented by merging number tokens with preceding letter tokens.
+
+For example:
+- `2_D` is merged as `2_D` (unchanged, since the `2` has no preceding letter token).
+- `RG_32` is merged as `RG32`.
+
+This can be argued to be a hack, but it simplifies acronym length calculations and continues to work with the code that
+handles Pascal casing.
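
The merge step described above can be sketched as follows. This is a minimal illustration under the stated rule only (a number token is appended to a directly preceding letter token); `merge_number_tokens` is a hypothetical name and not the actual `NamePrettifier` code.

```python
def merge_number_tokens(tokens: list[str]) -> list[str]:
    merged: list[str] = []
    previous_was_letters = False
    for token in tokens:
        if token.isdigit() and previous_was_letters:
            # Attach the number to the letter token right before it.
            merged[-1] += token
        else:
            merged.append(token)
        previous_was_letters = token.isalpha()
    return merged


for tokens in [["2", "D"], ["R", "32"], ["RG", "32"]]:
    print("_".join(tokens), "->", "_".join(merge_number_tokens(tokens)))
# 2_D -> 2_D
# R_32 -> R32
# RG_32 -> RG32
```

After merging, the acronym length is simply the length of the merged token (3 for `R32`, 4 for `RG32`), which is why the length calculation stays simple.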
+
+#### Acronym Indeterminate Inputs
+
+(TODO)
+
+#### Handling of Consecutive Acronyms
+
+(TODO)
+
+#### Lowercase "x" between Numbers
+
+(TODO)
+

## Name Affixes

(TODO: Explain the motivation behind this system. Explain that users configure how name affixes are processed while mods identify affixes (separation of concerns).)