Exclude symbols from auto-generated heading identifiers (#181)
The djot spec forms a heading's auto identifier from its plain text
content "excluding non-textual elements such as footnote references
and symbols". djot-php already dropped footnote references but emitted
symbols as `:name:`, so `# Release notes :tada:` produced the ID
`Release-notes-tada` instead of `Release-notes`.
extractPlainText() gains a forId mode that skips Symbol and
FootnoteRef nodes. generateId() still warms the plain-text cache so
display consumers (TOC labels, permalinks) keep the symbol text, but
builds the identifier from the symbol/footnote-excluded text. As a
result a heading whose only content is a symbol now correctly falls
back to a generated s-N identifier.
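Here is a minimal Python sketch of the two extraction modes. It is not djot-php's code. The node classes (`Text`, `Symbol`, `FootnoteRef`), `extract_plain_text()` and `HeadingIdGenerator` are hypothetical stand-ins. The ID normalization is deliberately simplified, the plain-text cache is not modeled, and starting the `s-N` counter at 1 is an assumption. The sketch shows the display text keeping `:tada:` while the identifier is built only from the symbol/footnote-excluded text.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-ins for djot-php's inline AST nodes; the real class
# names and fields may differ.
@dataclass
class Text:
    content: str


@dataclass
class Symbol:
    name: str  # "tada" for :tada:


@dataclass
class FootnoteRef:
    label: str


def extract_plain_text(nodes, for_id=False):
    """Flatten inline nodes into plain text.

    Display mode (for_id=False) renders symbols as ":name:" for TOC labels
    and similar consumers. ID mode (for_id=True) skips symbols entirely.
    Footnote references are skipped in both modes in this sketch.
    """
    parts = []
    for node in nodes:
        if isinstance(node, Text):
            parts.append(node.content)
        elif isinstance(node, Symbol) and not for_id:
            parts.append(f":{node.name}:")
        # FootnoteRef, and Symbol in ID mode, contribute nothing.
    return "".join(parts)


class HeadingIdGenerator:
    """Builds heading IDs from the symbol/footnote-excluded text."""

    def __init__(self):
        self._fallback_count = 0

    def generate_id(self, nodes):
        text = extract_plain_text(nodes, for_id=True)
        # Simplified normalization: collapse each run of characters outside
        # [A-Za-z0-9] to "-" and trim the ends. djot-php's real rules (CSS-safe
        # replacements, leading-digit prefix, uniqueness) are not modeled.
        base = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")
        if not base:
            # Nothing textual is left (e.g. a symbol-only heading), so fall
            # back to a generated s-N ID. Starting N at 1 is an assumption.
            self._fallback_count += 1
            base = f"s-{self._fallback_count}"
        return base


if __name__ == "__main__":
    gen = HeadingIdGenerator()
    heading = [Text("Release notes "), Symbol("tada")]
    print(extract_plain_text(heading))                                # Release notes :tada:
    print(gen.generate_id(heading))                                   # Release-notes
    print(gen.generate_id([Text("Introduction"), FootnoteRef("1")]))  # Introduction
    print(gen.generate_id([Symbol("tada")]))                          # s-1
```

The separate display and ID modes let one traversal serve both consumers. The TOC label can still show the symbol, and the identifier stays free of `tada`.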
The deliberate CSS-validity deviations (apostrophe/quote/colon
replacement, leading-digit prefix) are unchanged and now documented
against the settled spec wording.
The ID is `Introduction`, not `Introduction1` or `Introduction[^1]`.

Symbols are likewise dropped from the identifier (but kept in the
human-readable plain text used for things like TOC labels):

```djot
# Release notes :tada:
```

The ID is `Release-notes`, not `Release-notes-tada`. A heading whose only
content is a symbol falls back to a generated `s-N` ID.

---

## CSS-Safe Heading IDs

Explicit IDs are used as-is without normalization.

### Spec Alignment
The remove-vs-replace question raised in [jgm/djot#391](https://github.com/jgm/djot/issues/391) was settled by [jgm/djot#393](https://github.com/jgm/djot/pull/393), which reworded the spec to: *"replacing each maximal run of non-alphanumeric ASCII characters with `-`, removing any leading or trailing `-`"*. Note that #393 changes only the spec **prose**: the djot.js reference implementation is unchanged and, per djot's own changelog policy, remains the authoritative behavior. The new prose is also broader than djot.js itself: it would replace `_`, which djot.js keeps.

djot-php replaces (rather than removes) mid-word punctuation, which is the direction #393 settled on. Where the prose and the djot.js **implementation** disagree, djot-php tracks the implementation, deviating only where required to produce valid CSS identifiers for `querySelector()` consumers.
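The spec wording quoted above fits in a few lines of Python. `spec_replace()` follows the #393 prose under one assumption: it reads "non-alphanumeric ASCII characters" as ASCII characters other than letters and digits, so non-ASCII letters pass through unchanged. `remove_variant()` is a hypothetical "remove" reading, included only for contrast. Neither function is djot-php's or djot.js's code, and neither applies djot-php's CSS-safe deviations.

```python
import re

# Runs of ASCII characters that are not letters or digits:
# 0x00-0x2F (controls, space, !"#$%&'()*+,-./), 0x3A-0x40 (:;<=>?@),
# 0x5B-0x60 ([\]^_`) and 0x7B-0x7F ({|}~ and DEL).
_NON_ALNUM_ASCII_RUN = re.compile(r"[\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+")

# ASCII punctuation only (no whitespace), used by the contrast variant.
_ASCII_PUNCT = re.compile(r"[!-/:-@\[-`{-~]")


def spec_replace(text):
    """Replace each maximal run of non-alphanumeric ASCII with "-", then trim "-"."""
    return _NON_ALNUM_ASCII_RUN.sub("-", text).strip("-")


def remove_variant(text):
    """Hypothetical 'remove' reading: delete punctuation, hyphenate whitespace."""
    without_punct = _ASCII_PUNCT.sub("", text).strip()
    return re.sub(r"\s+", "-", without_punct)


if __name__ == "__main__":
    print(spec_replace("Don't panic!"))      # Don-t-panic
    print(remove_variant("Don't panic!"))    # Dont-panic
    print(spec_replace("snake_case title"))  # snake-case-title
```

The `snake_case` example shows the mismatch noted above: read literally, the prose turns `_` into `-`, while djot.js keeps it.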
The apostrophe / quote / semicolon / colon deviation is deliberate: these characters are not valid in unescaped CSS identifiers, so preserving them per djot.js would force every JS consumer to round-trip through `CSS.escape()` before doing a selector lookup. The leading-digit and empty-result behaviors fill in gaps that the spec and implementation handle inconsistently.
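The following Python sketch shows why these deviations matter for selector lookups. `is_css_ident()` is a simplified check for an unescaped CSS identifier that ignores escape sequences. `css_safe()` is a hypothetical post-processing step. It replaces the four problem characters, prefixes a leading digit, and falls back to a caller-supplied `s-N` ID when nothing is left. Replacing with `-` and the `h-` digit prefix are placeholder assumptions, not djot-php's documented values.

```python
import re

# Simplified unescaped CSS identifier check (CSS Syntax Level 3, ignoring
# escapes): "--" or an optional "-" followed by a letter, "_" or non-ASCII
# code point, then any mix of letters, digits, "_", "-" and non-ASCII.
_CSS_IDENT = re.compile(
    r"(?:--|-?[A-Za-z_\u0080-\U0010ffff])[A-Za-z0-9_\-\u0080-\U0010ffff]*"
)

# Characters that would need CSS.escape() if left in an ID.
_CSS_UNSAFE_RUN = re.compile(r"['\";:]+")


def is_css_ident(candidate):
    return _CSS_IDENT.fullmatch(candidate) is not None


def css_safe(raw_id, fallback, digit_prefix="h-"):
    """Hypothetical CSS-safe pass; "-" and "h-" are placeholder choices."""
    cleaned = _CSS_UNSAFE_RUN.sub("-", raw_id).strip("-")
    if not cleaned:
        return fallback  # empty result: use the generated s-N ID
    if cleaned[0] in "0123456789":
        cleaned = digit_prefix + cleaned  # identifiers cannot start with a digit
    return cleaned


if __name__ == "__main__":
    for raw in ["Don't", "1-intro", "':'"]:
        safe = css_safe(raw, fallback="s-1")
        print(raw, is_css_ident(raw), "->", safe, is_css_ident(safe))
```

In a browser, `document.querySelector('#1-intro')` throws a `SyntaxError`, and `#Don't` fails the same way. Emitting IDs that already pass a check like `is_css_ident()` spares consumers the `CSS.escape()` round trip.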