Add CustomHighlighter trait + InputState::set_custom_highlighter for third-party syntax engines #2328
glani wants to merge 3 commits into
Today the `Input` element's render pipeline composes two style sources: the built-in tree-sitter `SyntaxHighlighter` and `DiagnosticSet`. There is no public path for consumers to plug a different highlighter: for example, syntect for Sublime grammars that tree-sitter doesn't cover, language-server semantic tokens, or custom DSL tokenizers.

This change adds a third source. Consumers implement `CustomHighlighter::styles(&self, range, &App) -> Vec<(Range, HighlightStyle)>` and install it via `InputState::set_custom_highlighter(Some(arc), cx)`. The trait output is composed alongside the existing sources in `element.rs::highlight_lines` with a fixed precedence:

    tree-sitter (base) -> custom (overlay) -> diagnostics (top)

Diagnostics keep highest priority so error underlines remain visible regardless of language coloring.

When no custom highlighter is set (the default), the new combine call passes an empty `Vec`, so there is zero behavior change for existing consumers. The new public surface is one trait, one field on the existing `InputMode::CodeEditor` variant (enum-private), one setter, and the re-exports.

Includes:
- Unit test verifying the setter round-trips (install / clear).
- Integration test verifying tree-sitter and custom outputs compose through `gpui::combine_highlights` without panicking and produce well-formed (non-overlapping, ascending) ranges.
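As a rough illustration of the install / clear round trip described above, the sketch below models the trait and setter with stand-in types. `HighlightStyle`, `App`, and `InputState` here are simplified placeholders rather than the real gpui or gpui-component types, the setter omits the `cx` argument, and `custom_styles` and `FirstSixBytes` are hypothetical names introduced only for this example.

```rust
use std::ops::Range;
use std::sync::Arc;

// Stand-in for gpui's `HighlightStyle`; only a color is modeled here.
#[derive(Clone, Debug, PartialEq)]
pub struct HighlightStyle {
    pub color: u32,
}

// Stand-in for gpui's `App` context; the real type carries theme state.
pub struct App;

// Assumed shape of the trait, following the signature in the description.
pub trait CustomHighlighter: Send + Sync {
    fn styles(&self, range: Range<usize>, cx: &App) -> Vec<(Range<usize>, HighlightStyle)>;
}

// Hypothetical, reduced `InputState`: just the optional highlighter slot.
#[derive(Default)]
pub struct InputState {
    custom_highlighter: Option<Arc<dyn CustomHighlighter>>,
}

impl InputState {
    // Install (`Some`) or clear (`None`) the custom highlighter.
    pub fn set_custom_highlighter(&mut self, highlighter: Option<Arc<dyn CustomHighlighter>>) {
        self.custom_highlighter = highlighter;
    }

    // Hypothetical helper: with no highlighter installed this yields an empty `Vec`.
    pub fn custom_styles(&self, range: Range<usize>, cx: &App) -> Vec<(Range<usize>, HighlightStyle)> {
        match &self.custom_highlighter {
            Some(highlighter) => highlighter.styles(range, cx),
            None => Vec::new(),
        }
    }
}

// Toy implementor: paints the part of the requested range that falls in bytes 0..6.
pub struct FirstSixBytes;

impl CustomHighlighter for FirstSixBytes {
    fn styles(&self, range: Range<usize>, _cx: &App) -> Vec<(Range<usize>, HighlightStyle)> {
        let end = range.end.min(6);
        if range.start >= end {
            return Vec::new();
        }
        vec![(range.start..end, HighlightStyle { color: 0xff0000 })]
    }
}
```

With no highlighter installed, `custom_styles` returns an empty `Vec`, which corresponds to the default path the description says leaves existing consumers unaffected.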
Syntect is rather poor; I used it in the early stages of implementing CodeEditor, and it's not suitable for complex applications. Tree-sitter is currently the best choice, and it's easily extensible. I've already designed ways to extend it; you can see the example of adding tree-sitter-navi in the documentation.

gpui-component/crates/story/examples/editor.rs Lines 56 to 66 in 01a116a

It is also easy to add a new language to tree-sitter. Therefore, I don't really want to introduce this mechanism, as it would bring more problems. First,
I think I understand your requirements. Sorry, you wrote too much, and my English isn't very good. Please look at the implementation of MonacoEditor to see how its API is designed to support this requirement. We should refer to it. And then please add an example of this new feature to

When you have finished that, I will review and accept this feature.
Reshapes the trait per maintainer feedback on PR longbridge#2328:

    - fn styles(&self, range: Range<usize>, cx: &App)
    -     -> Vec<(Range<usize>, HighlightStyle)>;
    + fn tokens(&self, range: Range<usize>)
    +     -> Vec<(Range<usize>, SharedString)>;

The previous shape returned pre-resolved `HighlightStyle` and took `&App`, forcing implementors to resolve theme tokens on the render thread every frame. That bypassed upstream's own tree-sitter path, which emits scope-name strings resolved through `HighlightTheme::style()` at `highlighter.rs:719`.

Per Monaco Editor's design (referenced in review), tokenizers emit named scopes; a separate theme map resolves names to styles. The trait now constrains the hot path to a `SharedString` read so heavy parsing can be arranged off-thread (subscribe to `InputEvent::Change`, cache, hand back ranges on call), which addresses the 200k-line performance concern by construction.

Token names use upstream's existing scope vocabulary (`"keyword"`, `"string"`, `"comment"`, `"variable.special"`, ...); `.`-namespaced names fall back to their prefix via `SyntaxColors::style`. Unrecognized names render with the default style.

Adds a worked example at `crates/story/examples/editor.rs` (`MarkerHighlighter`): tags `TODO` / `FIXME` / `XXX` / `HACK` / `NOTE` markers with `keyword.special`, recomputed on `InputEvent::Change`. A syntect or language-server consumer would follow the same shape with a different parser inside `refresh()`.

Tests updated to the new method signature; both still verify install / clear round-trip and tree-sitter composition through `combine_highlights`.
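A minimal sketch of the cache-then-serve pattern this commit message describes, assuming a plain `String` in place of `SharedString` and a standalone struct rather than an implementation of the real trait. The `MarkerHighlighter` name and `refresh()` come from the message; the field layout, the `RwLock`, and the naive substring scan are assumptions for illustration and may not match the example in `editor.rs`.

```rust
use std::ops::Range;
use std::sync::RwLock;

// Marker words and scope name as listed in the commit message.
const MARKERS: [&str; 5] = ["TODO", "FIXME", "XXX", "HACK", "NOTE"];
const SCOPE: &str = "keyword.special";

// Hypothetical cache-backed highlighter; `String` stands in for `SharedString`.
#[derive(Default)]
pub struct MarkerHighlighter {
    cached: RwLock<Vec<(Range<usize>, String)>>,
}

impl MarkerHighlighter {
    // Re-scan the whole buffer. In the flow described above this would run in
    // response to `InputEvent::Change`, away from the per-frame path.
    // The scan is a naive substring match, so "FOOTNOTE" would also match "NOTE".
    pub fn refresh(&self, text: &str) {
        let mut found = Vec::new();
        for marker in MARKERS.iter() {
            for (start, _) in text.match_indices(*marker) {
                found.push((start..start + marker.len(), SCOPE.to_string()));
            }
        }
        // Keep ranges in ascending byte order so lookups return them sorted.
        found.sort_by_key(|(range, _)| range.start);
        *self.cached.write().unwrap() = found;
    }

    // Hot path: no parsing, only a filtered copy of the cached tokens that
    // overlap the requested byte range.
    pub fn tokens(&self, range: Range<usize>) -> Vec<(Range<usize>, String)> {
        self.cached
            .read()
            .unwrap()
            .iter()
            .filter(|(r, _)| r.start < range.end && r.end > range.start)
            .cloned()
            .collect()
    }
}
```

The point of the split is that `tokens` never touches the text: whatever parser sits inside `refresh` (a marker scan here, syntect or a language server elsewhere), the per-frame call only reads cached ranges.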
c1209de to
e679936
Compare
Hey. Revised per your review. Trait now returns Added example at Please have a look.
Summary
Adds a `CustomHighlighter` trait and `InputState::set_custom_highlighter` so consumers can plug a third syntax-style source into the `Input` element's render pipeline alongside the built-in tree-sitter `SyntaxHighlighter` and `DiagnosticSet`. Default `None`; zero behavior change for existing consumers.

The gap
In `crates/ui/src/input/element.rs::highlight_lines` (lines 1048–1133 at `main`), the per-frame highlight pass combines exactly two style sources:

- `SyntaxHighlighter` (line 1083). Concrete struct in `crates/ui/src/highlighter/highlighter.rs`, not extensible.
- `DiagnosticSet::styles_for_range` (line 1122; declared at `crates/ui/src/highlighter/diagnostics.rs:316`). `pub(crate)`, so not callable from outside the crate. Hardcoded to wavy-underline styling derived from `DiagnosticSeverity` (4 variants). Designed for LSP-style diagnostics, not general-purpose highlighting.

A consumer that wants a different highlighter (syntect for Sublime grammars, custom regex tokenizers, language-server semantic tokens, custom DSLs) has no public API path. `InputState::set_highlighter` (`state.rs:605`) accepts only a `SharedString` language name, and the resolution always ends in the same concrete `SyntaxHighlighter`.

This PR adds the missing third source.
Proposal
New trait in `crates/ui/src/highlighter/custom.rs`, with the method `styles(&self, range: Range<usize>, cx: &App) -> Vec<(Range<usize>, HighlightStyle)>`. `cx: &App` lets implementors resolve theme tokens at render time (mirrors the existing `DiagnosticSet::styles_for_range(_, cx)` precedent).

One field on `InputMode::CodeEditor`, `custom_highlighter`. Default `None` in both constructor paths.

One setter on `InputState`, `set_custom_highlighter`, paired with the existing `set_highlighter`.

Render-path delta in `element.rs::highlight_lines`: append a single pull from `state.mode` and one extra `combine_highlights` call.

Compose order: tree-sitter (base) → custom (overlay) → diagnostics (top, wavy underlines). Diagnostics keep highest priority because errors must be visible regardless of language coloring.
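The compose order can be pictured with a simplified layering function. This is not gpui's `combine_highlights`; `Style`, `overlay`, `Layer`, and `compose` are hypothetical stand-ins that only show how a later layer overrides color while an underline from the top layer survives.

```rust
use std::ops::Range;

// Simplified stand-in for a highlight style: an optional color plus an underline flag.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Style {
    pub color: Option<u32>,
    pub wavy_underline: bool,
}

impl Style {
    // A color set in `top` wins; an underline set by either layer is kept.
    fn overlay(self, top: Style) -> Style {
        Style {
            color: top.color.or(self.color),
            wavy_underline: self.wavy_underline || top.wavy_underline,
        }
    }
}

pub type Layer = Vec<(Range<usize>, Style)>;

// Compose layers given lowest priority first, e.g. [tree-sitter, custom, diagnostics].
pub fn compose(layers: &[Layer]) -> Vec<(Range<usize>, Style)> {
    // Every range boundary is a point where the merged style may change.
    let mut cuts: Vec<usize> = Vec::new();
    for layer in layers {
        for (range, _) in layer {
            cuts.push(range.start);
            cuts.push(range.end);
        }
    }
    cuts.sort_unstable();
    cuts.dedup();

    let mut out: Vec<(Range<usize>, Style)> = Vec::new();
    for pair in cuts.windows(2) {
        let (start, end) = (pair[0], pair[1]);
        // Merge the styles of every range covering this segment, in layer order.
        let mut merged: Option<Style> = None;
        for layer in layers {
            for (range, style) in layer {
                if range.start <= start && end <= range.end {
                    merged = Some(merged.unwrap_or_default().overlay(*style));
                }
            }
        }
        if let Some(style) = merged {
            // Coalesce with the previous segment when it is adjacent and identical.
            let extend = match out.last() {
                Some((prev, prev_style)) => prev.end == start && *prev_style == style,
                None => false,
            };
            if extend {
                out.last_mut().unwrap().0.end = end;
            } else {
                out.push((start..end, style));
            }
        }
    }
    out
}
```

Passing an empty custom layer leaves the other layers' output unchanged in this sketch, which mirrors the "default `None`" case in the proposal.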
The viewport-clamping pattern that the existing tree-sitter path uses for long-line skipping (`MAX_HIGHLIGHT_LINE_LENGTH`) does not apply to custom highlighter output; the implementor is responsible for their own performance characteristics. Documented in the trait's threading section.

Backward compatibility
Zero behavior change for existing consumers. Default `custom_highlighter` is `None`, the new combine call gets an empty `Vec`, and `combine_highlights` short-circuits. The added public surface is exactly:

- `pub trait CustomHighlighter`
- `pub fn InputState::set_custom_highlighter`
- re-exports (`crate::highlighter`)

The new field on `InputMode::CodeEditor` is enum-variant-private from outside the crate, so it is not a breaking change.

Concrete consumer
Heretic Merge is a desktop merge tool using `gpui-component`'s `Input` for its diff editor. It already ships syntect-backed highlighting infrastructure for ~16 additional Sublime-grammar languages that the upstream tree-sitter set doesn't cover (Perl, Haskell, OCaml, Erlang, Clojure, Lisp, R, LaTeX, BibTeX, reStructuredText, Sass, D, Pascal, Tcl, XML).

Today its `EditorBuffer::with_path` calls `syntect_dispatch::probe(name, content)` and stores `Vec<SyntectDecoration>` (byte-range + theme-token-name pairs) on the buffer. The decorations are computed but cannot reach pixels: there is no public hook into the `Input` render path. This PR provides that hook. The integration is a one-screen `impl CustomHighlighter` that maps the cached decorations to `HighlightStyle`s using its theme bridge.

The trait shape is general enough that other consumers (custom DSLs in domain-specific tools, language servers via semantic tokens, vim-grammar plugins) plug in identically.
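A hedged sketch of what mapping cached decorations to styles might look like. The field layout of `SyntectDecoration`, the `ThemeBridge` type, its `.`-prefix fallback, and `styles_for_range` are all assumptions made for illustration; none of this is Heretic Merge's actual code, and `HighlightStyle` is a stand-in for the gpui type.

```rust
use std::collections::HashMap;
use std::ops::Range;

// Stand-in for gpui's `HighlightStyle`; only a color is modeled.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct HighlightStyle {
    pub color: u32,
}

// Assumed layout of a cached decoration: byte range plus theme token name.
pub struct SyntectDecoration {
    pub range: Range<usize>,
    pub token: String,
}

// Hypothetical theme bridge: token name -> style.
pub struct ThemeBridge {
    styles: HashMap<String, HighlightStyle>,
}

impl ThemeBridge {
    pub fn new(entries: &[(&str, u32)]) -> Self {
        let styles = entries
            .iter()
            .map(|(name, color)| (name.to_string(), HighlightStyle { color: *color }))
            .collect();
        ThemeBridge { styles }
    }

    // Try the full name first, then progressively shorter `.`-separated prefixes
    // ("keyword.special" -> "keyword"). Returns `None` if nothing matches.
    pub fn resolve(&self, token: &str) -> Option<HighlightStyle> {
        let mut name = token;
        loop {
            if let Some(style) = self.styles.get(name) {
                return Some(*style);
            }
            match name.rfind('.') {
                Some(dot) => name = &name[..dot],
                None => return None,
            }
        }
    }
}

// Map the cached decorations that overlap `range` to styles, clamping each
// decoration to the requested range and skipping tokens the bridge cannot resolve.
pub fn styles_for_range(
    decorations: &[SyntectDecoration],
    bridge: &ThemeBridge,
    range: Range<usize>,
) -> Vec<(Range<usize>, HighlightStyle)> {
    decorations
        .iter()
        .filter(|d| d.range.start < range.end && d.range.end > range.start)
        .filter_map(|d| {
            let style = bridge.resolve(&d.token)?;
            let clamped = d.range.start.max(range.start)..d.range.end.min(range.end);
            Some((clamped, style))
        })
        .collect()
}
```

An `impl CustomHighlighter` along these lines would hold the decorations and the bridge, and forward its `styles` call to a function like `styles_for_range`.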
Alternatives considered
- `Input` owns text layout (per-glyph positions and font runs). External painting can't align with the actual rendered text without duplicating layout, which is most of `Input::element`.
- `gpui-component`. Tractable but every upstream bump becomes a re-rebase. The trait surface added here is small and orthogonal; clean to land upstream.
- `tree_sitter_*::LANGUAGE` constants are statically generated.
- `DiagnosticSet`. Wrong abstraction: diagnostics are severity-typed (Error/Warning/Info/Hint), styled as wavy underlines, expected to be sparse. Syntax highlighting is dense, color-driven, and scope-typed. Conflating them harms both.

Anticipated questions
"Why not extend `LanguageRegistry::register` to accept arbitrary `LanguageConfig`s?"
`LanguageConfig` is tree-sitter-specific (`tree_sitter::Language` + tree-sitter highlight queries). syntect output isn't a parse tree; it's already-tokenized regions. Adapting it would require either a synthetic tree-sitter grammar (impractical) or a parallel pipeline that ignores the tree-sitter machinery, which is what this proposal is, just as a separate trait instead of a shoehorned `LanguageConfig`.

"This adds another combine call to the render path."
One `combine_highlights` call. In the no-custom-highlighter case the cost is zero: the `Vec` is empty and `combine_highlights` short-circuits. In the with-custom case, cost is dominated by the implementor's `styles` body. `combine_highlights` is designed for exactly this layered composition.

"Can't this be done as a `Decoration` extension?"
`gpui-component` doesn't currently expose a public `Decoration` API. Inventing one is more architecture work than this proposal. If a future epic adds one, this trait can be deprecated in favor of it; until then the trait is the minimal surface that solves the problem.

"Why is `cx: &App` in `styles`'s signature? Coupling to render context."
Implementors need theme access (`cx.theme()`) to resolve color tokens at render time. Without `cx`, they'd cache a snapshot of the theme, which causes drift problems on theme switch. Mirrors the existing `DiagnosticSet::styles_for_range(_, cx)` precedent.

Naming alternatives
This PR uses `set_custom_highlighter` to mirror the existing `set_highlighter`. The trait `CustomHighlighter` is named to differentiate from the concrete `SyntaxHighlighter`. If you prefer different naming, happy to rebase to any of:

- `set_extra_highlighter` / `ExtraHighlighter`
- `set_secondary_highlighter` / `SecondaryHighlighter`
- `add_highlight_source` / `HighlightSource` (emphasizes multi-source composition)

Functional shape is what matters; bikeshed the names per maintainer taste.
Checklist
- `cargo build --release -p gpui-component` ✓
- `cargo clippy --workspace --all-targets -- -D warnings` ✓
- `cargo test -p gpui-component --lib` ✓ (145 tests pass, including 2 new tests for this change)
  - `test_set_custom_highlighter_round_trip` (install / clear)
  - `test_custom_highlighter_composes_with_tree_sitter` (verifies non-empty composition through `combine_highlights` with real SQL tree-sitter input + a custom highlighter painting bytes 0..6, asserts well-formed output)
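The "well-formed" property asserted by the second test (non-overlapping, ascending ranges, per the PR description) can be stated as a small predicate. This is a sketch only; `is_well_formed` is a hypothetical helper name and the actual test's assertions may differ.

```rust
use std::ops::Range;

// Returns true when no range is inverted and each range ends at or before
// the start of the next one (ascending and non-overlapping).
pub fn is_well_formed<T>(ranges: &[(Range<usize>, T)]) -> bool {
    ranges.iter().all(|(r, _)| r.start <= r.end)
        && ranges.windows(2).all(|w| w[0].0.end <= w[1].0.start)
}
```

Adjacent ranges that merely touch (one ends where the next starts) count as well-formed under this definition.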