
Add CustomHighlighter trait + InputState::set_custom_highlighter for third-party syntax engines#2328

Open
glani wants to merge 3 commits into longbridge:main from glani:feature/custom-highlighter


glani (Contributor) commented May 1, 2026

Summary

Adds a CustomHighlighter trait and InputState::set_custom_highlighter so consumers can plug a third syntax-style source into the Input element's render pipeline alongside the built-in tree-sitter SyntaxHighlighter and DiagnosticSet. Default None; zero behavior change for existing consumers.

The gap

In crates/ui/src/input/element.rs::highlight_lines (lines 1048–1133 at main), the per-frame highlight pass combines exactly three style sources:

  1. The built-in tree-sitter SyntaxHighlighter (line 1083). Concrete struct in crates/ui/src/highlighter/highlighter.rs, not extensible.
  2. DiagnosticSet::styles_for_range (line 1122; declared at crates/ui/src/highlighter/diagnostics.rs:316). pub(crate) — not callable from outside the crate. Hardcoded to wavy-underline styling derived from DiagnosticSeverity (4 variants). Designed for LSP-style diagnostics, not general-purpose highlighting.
  3. Hover-definition layout (line 1125). A single ephemeral style; not a general source.

A consumer that wants a different highlighter — syntect for Sublime grammars, custom regex tokenizers, language-server semantic tokens, custom DSLs — has no public API path. InputState::set_highlighter (state.rs:605) accepts only a SharedString language name and the resolution always ends in the same concrete SyntaxHighlighter.

This PR adds the missing fourth source, and the first one that consumers can supply themselves.

Proposal

New trait in crates/ui/src/highlighter/custom.rs:

pub trait CustomHighlighter: Send + Sync {
    fn styles(
        &self,
        range: Range<usize>,
        cx: &App,
    ) -> Vec<(Range<usize>, HighlightStyle)>;
}

cx: &App lets implementors resolve theme tokens at render time (mirrors the existing DiagnosticSet::styles_for_range(_, cx) precedent).
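
A minimal sketch of what an implementor might look like, assuming stand-in types: `App` and `HighlightStyle` below are hypothetical placeholders for the gpui types, and `CachedKeywords` is an illustrative name, not part of this PR. Clamping results to the requested range is also an assumption, not something the trait requires.

```rust
use std::ops::Range;

// Hypothetical stand-ins for gpui's `App` and `HighlightStyle`.
pub struct App {
    pub keyword_color: u32,
}

#[derive(Clone, Debug, PartialEq)]
pub struct HighlightStyle {
    pub color: u32,
}

// Same shape as the trait proposed above.
pub trait CustomHighlighter: Send + Sync {
    fn styles(&self, range: Range<usize>, cx: &App) -> Vec<(Range<usize>, HighlightStyle)>;
}

// Hypothetical implementor: byte ranges are computed ahead of time,
// so `styles` only filters the cache and resolves a color from `cx`.
pub struct CachedKeywords {
    pub ranges: Vec<Range<usize>>,
}

impl CustomHighlighter for CachedKeywords {
    fn styles(&self, range: Range<usize>, cx: &App) -> Vec<(Range<usize>, HighlightStyle)> {
        self.ranges
            .iter()
            // Keep only cached ranges that intersect the requested range.
            .filter(|r| r.start < range.end && r.end > range.start)
            // Clamp each hit to the requested range (an assumption of this sketch).
            .map(|r| {
                let clamped = r.start.max(range.start)..r.end.min(range.end);
                (clamped, HighlightStyle { color: cx.keyword_color })
            })
            .collect()
    }
}
```

Because the color is read from `cx` on every call, a theme switch would be picked up on the next frame without invalidating the cached ranges.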

One field on InputMode::CodeEditor:

custom_highlighter: Option<Arc<dyn CustomHighlighter>>,

Default None in both constructor paths.

One setter on InputState, paired with the existing set_highlighter:

pub fn set_custom_highlighter(
    &mut self,
    highlighter: Option<Arc<dyn CustomHighlighter>>,
    cx: &mut Context<Self>,
)

Render-path delta in element.rs::highlight_lines — append a single pull from state.mode and one extra combine_highlights call:

let custom_styles = match &custom_highlighter {
    Some(h) => h.styles(visible_byte_range.clone(), cx),
    None => Vec::new(),
};
// ...
styles = gpui::combine_highlights(custom_styles, styles).collect();
styles = gpui::combine_highlights(diagnostic_styles, styles).collect();

Compose order: tree-sitter (base) → custom (overlay) → diagnostics (top, wavy underlines). Diagnostics keep highest priority because errors must be visible regardless of language coloring.
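
To illustrate the compose order, here is a self-contained sketch of layering one list of styled ranges over another. It is not gpui's `combine_highlights` implementation; `Style`, `style_at`, and `overlay` are hypothetical names, and both inputs are assumed to be sorted and non-overlapping.

```rust
use std::ops::Range;

// Hypothetical, simplified style: only a color and an underline flag.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Style {
    pub color: Option<u32>,
    pub underline: bool,
}

impl Style {
    // Fields set on `top` win; unset fields fall through to `self`.
    fn overlaid_with(self, top: Style) -> Style {
        Style {
            color: top.color.or(self.color),
            underline: top.underline || self.underline,
        }
    }
}

// Style of the span covering `pos`, if any.
fn style_at(layer: &[(Range<usize>, Style)], pos: usize) -> Option<Style> {
    layer.iter().find(|(r, _)| r.contains(&pos)).map(|(_, s)| *s)
}

// Split both layers at every boundary, then merge the styles per segment.
// Adjacent segments with equal styles are left unmerged for simplicity.
pub fn overlay(
    base: &[(Range<usize>, Style)],
    top: &[(Range<usize>, Style)],
) -> Vec<(Range<usize>, Style)> {
    let mut cuts: Vec<usize> = base
        .iter()
        .chain(top.iter())
        .flat_map(|(r, _)| [r.start, r.end])
        .collect();
    cuts.sort_unstable();
    cuts.dedup();

    let mut out = Vec::new();
    for w in cuts.windows(2) {
        let (start, end) = (w[0], w[1]);
        let merged = match (style_at(base, start), style_at(top, start)) {
            (Some(b), Some(t)) => Some(b.overlaid_with(t)),
            (Some(s), None) | (None, Some(s)) => Some(s),
            (None, None) => None,
        };
        if let Some(style) = merged {
            out.push((start..end, style));
        }
    }
    out
}
```

Calling `overlay(&overlay(&tree_sitter, &custom), &diagnostics)` mirrors the order described above: a diagnostic that only sets `underline` keeps whatever color the lower layers chose.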

The viewport-clamping pattern that the existing tree-sitter path uses for long-line skipping (MAX_HIGHLIGHT_LINE_LENGTH) does not apply to custom highlighter output — the implementor is responsible for their own performance characteristics. Documented in the trait's threading section.

Backward compatibility

Zero behavior change for existing consumers. Default custom_highlighter is None, the new combine call gets an empty Vec, and combine_highlights short-circuits. The added public surface is exactly:

  • pub trait CustomHighlighter
  • pub fn InputState::set_custom_highlighter
  • a re-export of the trait from crate::highlighter

The new field on InputMode::CodeEditor is enum-variant-private from outside the crate — not a breaking change.

Concrete consumer

Heretic Merge is a desktop merge tool using gpui-component's Input for its diff editor. It already ships syntect-backed highlighting infrastructure for ~16 additional Sublime-grammar languages that the upstream tree-sitter set doesn't cover (Perl, Haskell, OCaml, Erlang, Clojure, Lisp, R, LaTeX, BibTeX, reStructuredText, Sass, D, Pascal, Tcl, XML).

Today its EditorBuffer::with_path calls syntect_dispatch::probe(name, content) and stores Vec<SyntectDecoration> (byte-range + theme-token-name pairs) on the buffer. The decorations are computed but cannot reach pixels — there is no public hook into the Input render path. This PR provides that hook. The integration is a one-screen impl CustomHighlighter that maps the cached decorations to HighlightStyles using its theme bridge.

The trait shape is general enough that other consumers (custom DSLs in domain-specific tools, language servers via semantic tokens, vim-grammar plugins) plug in identically.

Alternatives considered

  • Decorate from outside the crate. Blocked: Input owns text layout (per-glyph positions and font runs). External painting can't align with the actual rendered text without duplicating layout, which is most of Input::element.
  • Fork gpui-component. Tractable, but every upstream bump then requires another rebase. The trait surface added here is small and orthogonal, so it is clean to land upstream.
  • Synthetic tree-sitter grammar from syntect output. Likely impossible without modifying tree-sitter parsers — tree_sitter_*::LANGUAGE constants are statically generated.
  • Extend DiagnosticSet. Wrong abstraction: diagnostics are severity-typed (Error/Warning/Info/Hint), styled as wavy underlines, expected to be sparse. Syntax highlighting is dense, color-driven, and scope-typed. Conflating them harms both.

Anticipated questions

"Why not extend LanguageRegistry::register to accept arbitrary LanguageConfigs?"
LanguageConfig is tree-sitter-specific (tree_sitter::Language + tree-sitter highlight queries). syntect output isn't a parse tree; it's already-tokenized regions. Adapting it would require either a synthetic tree-sitter grammar (impractical) or a parallel pipeline that ignores the tree-sitter machinery — which is what this proposal is, just as a separate trait instead of a shoehorned LanguageConfig.

"This adds another combine call to the render path."
One combine_highlights call. In the no-custom-highlighter case the cost is zero — the Vec is empty, combine_highlights short-circuits. In the with-custom case, cost is dominated by the implementor's styles body. combine_highlights is designed for exactly this layered composition.

"Can't this be done as a Decoration extension?"
gpui-component doesn't currently expose a public Decoration API. Inventing one is more architecture work than this proposal. If a future epic adds one, this trait can be deprecated in favor of it; until then the trait is the minimal surface that solves the problem.

"Why is cx: &App in styles's signature? Coupling to render context."
Implementors need theme access (cx.theme()) to resolve color tokens at render time. Without cx, they'd cache a snapshot of the theme — drift problems on theme switch. Mirrors the existing DiagnosticSet::styles_for_range(_, cx) precedent.

Naming alternatives

This PR uses set_custom_highlighter to mirror the existing set_highlighter. The trait CustomHighlighter is named to differentiate from the concrete SyntaxHighlighter. If you prefer different naming, happy to rebase to any of:

  • set_extra_highlighter / ExtraHighlighter
  • set_secondary_highlighter / SecondaryHighlighter
  • add_highlight_source / HighlightSource (emphasizes multi-source composition)

Functional shape is what matters; bikeshed the names per maintainer taste.

Checklist

  • cargo build --release -p gpui-component
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test -p gpui-component --lib ✓ (145 tests pass, including 2 new tests for this change)
  • New unit test: test_set_custom_highlighter_round_trip (install / clear)
  • New integration test: test_custom_highlighter_composes_with_tree_sitter (verifies non-empty composition through combine_highlights with real SQL tree-sitter input + a custom highlighter painting bytes 0..6, asserts well-formed output)

Today the `Input` element's render pipeline composes two style sources:
the built-in tree-sitter `SyntaxHighlighter` and `DiagnosticSet`. There
is no public path for consumers to plug a different highlighter — for
example, syntect for Sublime grammars that tree-sitter doesn't cover,
language-server semantic tokens, or custom DSL tokenizers.

This change adds a third source. Consumers implement
`CustomHighlighter::styles(&self, range, &App) -> Vec<(Range, HighlightStyle)>`
and install it via `InputState::set_custom_highlighter(Some(arc), cx)`.
The trait output is composed alongside the existing sources in
`element.rs::highlight_lines` with a fixed precedence:

  tree-sitter (base) -> custom (overlay) -> diagnostics (top)

Diagnostics keep highest priority so error underlines remain visible
regardless of language coloring. When no custom highlighter is set
(the default), the new combine call passes an empty `Vec` — zero
behavior change for existing consumers.

The new public surface is one trait, one field on the existing
`InputMode::CodeEditor` variant (enum-private), one setter, and the
re-exports.

Includes:
- Unit test verifying the setter round-trips (install / clear).
- Integration test verifying tree-sitter and custom outputs compose
  through `gpui::combine_highlights` without panicking and produce
  well-formed (non-overlapping, ascending) ranges.
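
The "well-formed" property can be sketched as a small check, shown here for illustration. `is_well_formed` is a hypothetical helper, not the assertion used in the PR's test, and rejecting empty ranges is an assumption of this sketch.

```rust
use std::ops::Range;

// Returns true when every range is non-empty and the ranges are
// ascending and non-overlapping (each one ends before the next starts).
pub fn is_well_formed<T>(spans: &[(Range<usize>, T)]) -> bool {
    let each_valid = spans.iter().all(|(r, _)| r.start < r.end);
    let ordered = spans.windows(2).all(|w| w[0].0.end <= w[1].0.start);
    each_valid && ordered
}
```

Touching ranges such as `0..6` followed by `6..9` pass the check; `0..6` followed by `4..9` does not.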

huacnlee (Member) commented May 3, 2026

Syntect is rather poor.

I used it in the early stages of implementing CodeEditor. It's not suitable for complex applications.

TreeSitter is currently the best choice, and it's easily extensible. I've already designed ways to extend it; you can see the example of adding tree-sitter-navi in the documentation.

LanguageRegistry::singleton().register(
    "navi",
    &LanguageConfig::new(
        "navi",
        tree_sitter_navi::LANGUAGE.into(),
        vec![],
        tree_sitter_navi::HIGHLIGHTS_QUERY,
        "",
        "",
    ),
);

And TreeSitter is easy to extend with a new language.

Therefore, I don't really want to introduce this mechanism, as it would bring more problems.

First, the styles method is designed for high-performance scenarios involving highlighting massive amounts of code (200,000 lines or more). Enabling this customization, especially for users with insufficient understanding of editor development, will likely result in poor performance. It requires a high level of knowledge of the code editor domain from the user.

huacnlee (Member) commented May 3, 2026

I think I understand your requirements. Sorry, you wrote too much, but my English isn't very good.

Please look at the implementation of MonacoEditor to see how its API is designed to support this requirement. We should refer to it.

And then please add an example of this new feature to examples/editor.rs so I can test this case when I make changes later.

When you have finished that, I will review and accept this feature.

glani added a commit to glani/gpui-component that referenced this pull request May 5, 2026
Reshapes the trait per maintainer feedback on PR longbridge#2328:

    - fn styles(&self, range: Range<usize>, cx: &App)
    -     -> Vec<(Range<usize>, HighlightStyle)>;
    + fn tokens(&self, range: Range<usize>)
    +     -> Vec<(Range<usize>, SharedString)>;

The previous shape returned pre-resolved `HighlightStyle` and took
`&App`, forcing implementors to resolve theme tokens on the render
thread every frame. That bypassed upstream's own tree-sitter path,
which emits scope-name strings resolved through `HighlightTheme::style()`
at `highlighter.rs:719`.

Per Monaco Editor's design (referenced in review), tokenizers emit
named scopes; a separate theme map resolves names to styles. The trait
now constrains the hot path to a `SharedString` read so heavy parsing
can be arranged off-thread (subscribe to `InputEvent::Change`, cache,
hand back ranges on call) -- addressing the 200k-line performance
concern by construction.

Token names use upstream's existing scope vocabulary
(`"keyword"`, `"string"`, `"comment"`, `"variable.special"`, ...);
`.`-namespaced names fall back to their prefix via
`SyntaxColors::style`. Unrecognized names render with the default
style.
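
A sketch of the prefix fallback, for illustration only. `resolve` is a hypothetical helper and the `HashMap` stands in for the theme; whether `SyntaxColors::style` strips one segment at a time, as assumed here, is not confirmed by this commit.

```rust
use std::collections::HashMap;

// Look up `name`; on a miss, drop the last `.`-separated segment and retry.
// Returns None when no prefix matches, which a caller would treat as the
// default style.
pub fn resolve<'a, S>(theme: &'a HashMap<&str, S>, name: &str) -> Option<&'a S> {
    let mut candidate = name;
    loop {
        if let Some(style) = theme.get(candidate) {
            return Some(style);
        }
        match candidate.rfind('.') {
            Some(dot) => candidate = &candidate[..dot],
            None => return None,
        }
    }
}
```

With a theme that defines `keyword` and `variable.special`, the name `keyword.special` resolves to the `keyword` entry, while plain `variable` finds nothing and falls through to the default.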

Adds a worked example at `crates/story/examples/editor.rs`
(`MarkerHighlighter`): tags `TODO` / `FIXME` / `XXX` / `HACK` /
`NOTE` markers with `keyword.special`, recomputed on
`InputEvent::Change`. A syntect or language-server consumer would
follow the same shape with a different parser inside `refresh()`.

Tests updated to the new method signature; both still verify
install / clear round-trip and tree-sitter composition through
`combine_highlights`.
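
A self-contained sketch of the cache-then-read shape described in this commit, under stated assumptions: `String` stands in for `SharedString`, the struct layout and `RwLock` cache are illustrative rather than the code in `examples/editor.rs`, and `refresh` runs synchronously here where a real consumer would call it from an `InputEvent::Change` handler or a background task.

```rust
use std::ops::Range;
use std::sync::RwLock;

// Revised trait shape; `String` is a stand-in for gpui's `SharedString`.
pub trait CustomHighlighter: Send + Sync {
    fn tokens(&self, range: Range<usize>) -> Vec<(Range<usize>, String)>;
}

const MARKERS: [&str; 5] = ["TODO", "FIXME", "XXX", "HACK", "NOTE"];

// Hypothetical layout: marker positions are cached behind a lock so that
// `tokens` never scans the text itself.
#[derive(Default)]
pub struct MarkerHighlighter {
    cache: RwLock<Vec<Range<usize>>>,
}

impl MarkerHighlighter {
    // Re-scan the whole text and replace the cache (the expensive step).
    pub fn refresh(&self, text: &str) {
        let mut found: Vec<Range<usize>> = MARKERS
            .iter()
            .flat_map(|m| text.match_indices(*m).map(|(i, s)| i..i + s.len()))
            .collect();
        found.sort_by_key(|r| r.start);
        *self.cache.write().unwrap() = found;
    }
}

impl CustomHighlighter for MarkerHighlighter {
    // Hot path: filter cached ranges and attach a scope name.
    fn tokens(&self, range: Range<usize>) -> Vec<(Range<usize>, String)> {
        let cache = self.cache.read().unwrap();
        cache
            .iter()
            .filter(|r| r.start < range.end && r.end > range.start)
            .map(|r| (r.clone(), "keyword.special".to_string()))
            .collect()
    }
}
```

A syntect or semantic-token consumer would keep `tokens` as is and swap the body of `refresh` for its own parser, storing a scope name per range instead of a fixed one.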
@glani glani force-pushed the feature/custom-highlighter branch from c1209de to e679936 Compare May 5, 2026 14:42

glani (Contributor, Author) commented May 5, 2026

Hey.

Revised per your review.

The trait now returns (Range, SharedString): implementors emit scope
names ("keyword", "comment", …), and the render path resolves them
through the existing HighlightTheme::style(). This is the same
vocabulary as the tree-sitter path. There is no &App on the hot path,
so heavy parsing can run off-thread.

Added example at crates/story/examples/editor.rs — highlights
TODO/FIXME/XXX/HACK/NOTE.

Please have a look.
