
Herb: Implement Syntax Tree Diff Engine #1518

Merged: marcoroth merged 11 commits into main from diff-engine on Apr 19, 2026

@marcoroth marcoroth commented Mar 28, 2026

This pull request introduces a syntax tree diff engine that compares two parsed Herb syntax trees and produces the minimal set of HTML-semantic differences. The engine is implemented in C so it's available across all bindings (Ruby, Node.js/WASM, Rust, Java).

Motivation

The diff engine is designed to power reactivity and smart repopulation of affected nodes in HTML+ERB templates. By computing the minimal set of semantic changes between two ASTs, consumers can determine exactly which elements, attributes, or text content changed and apply targeted updates instead of reprocessing the entire document.

This enables hot-module reloading (HMR) for HTML+ERB templates, where a dev server can patch only what changed in the DOM, thereby preserving element state, focus, scroll position, and event listeners. It also opens the door for incremental re-linting, incremental re-formatting, and language server diagnostics that only recompute affected regions.

How it works

The diff uses a multi-stage approach:

  1. Merkle hashing: a bottom-up pass computes FNV-1a hashes for every node, incorporating all children. Identical subtrees share the same hash and are skipped in O(1). This is the same concept as Merkle trees, used in Git and other content-addressable systems.

  2. LCS-based children diffing: child arrays are compared using the Longest Common Subsequence algorithm to find the minimal edit sequence of insertions, deletions, and keeps. This is the same algorithm behind diff and git diff, applied to AST node arrays instead of text lines.

  3. Move detection: after LCS, unmatched remove+insert pairs are checked for matching identity (same tag name + same attributes). Matches become node_moved operations instead of separate remove+insert. Elements are matched by tag name for the LCS pass, and by tag name + attributes (order-independent, using XOR of attribute hashes) for move detection. This approach is similar to React's reconciliation.

  4. Wrap/unwrap detection: detects when a node is wrapped in a new parent (e.g., <div></div> becoming <% if true? %><div></div><% end %>) or unwrapped from one. This uses Merkle hash matching in both directions: a wrap is a removed node that reappears as a child of an inserted node, and an unwrap is a child of a removed node that matches an inserted node.
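
The first two stages can be sketched in a few lines of Ruby. This is only an illustration of the technique, not the C implementation from this pull request: the `Node` struct, the way type and value are fed into the hash, and the names `fnv1a`, `merkle_hash`, and `lcs_diff` are all assumptions made for the example. The LCS function takes plain comparison keys, which could be tag names (as described above for elements) or subtree hashes.

```ruby
# Hypothetical node shape for illustration; Herb's real AST nodes are C structs.
Node = Struct.new(:type, :value, :children)

FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME  = 0x100000001b3
MASK64     = 0xffffffffffffffff

# 64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime.
def fnv1a(bytes, hash = FNV_OFFSET)
  bytes.each_byte { |b| hash = ((hash ^ b) * FNV_PRIME) & MASK64 }
  hash
end

# Bottom-up Merkle hash: a node's hash covers its own type and value plus
# the hashes of all of its children, in order.
def merkle_hash(node, cache = {}.compare_by_identity)
  cache[node] ||= begin
    hash = fnv1a("#{node.type}\0#{node.value}\0")
    node.children.each do |child|
      hash = fnv1a([merkle_hash(child, cache)].pack("Q>"), hash)
    end
    hash
  end
end

# LCS over two arrays of comparison keys.
# Returns an edit script of [:keep, old_index], [:remove, old_index]
# and [:insert, new_index] entries.
def lcs_diff(old_keys, new_keys)
  n = old_keys.length
  m = new_keys.length
  # table[i][j] = LCS length of old_keys[i..] and new_keys[j..]
  table = Array.new(n + 1) { Array.new(m + 1, 0) }
  (n - 1).downto(0) do |i|
    (m - 1).downto(0) do |j|
      table[i][j] =
        if old_keys[i] == new_keys[j]
          table[i + 1][j + 1] + 1
        else
          [table[i + 1][j], table[i][j + 1]].max
        end
    end
  end

  ops = []
  i = 0
  j = 0
  while i < n && j < m
    if old_keys[i] == new_keys[j]
      ops << [:keep, i]
      i += 1
      j += 1
    elsif table[i + 1][j] >= table[i][j + 1]
      ops << [:remove, i]
      i += 1
    else
      ops << [:insert, j]
      j += 1
    end
  end
  ops.concat((i...n).map { |k| [:remove, k] })
  ops.concat((j...m).map { |k| [:insert, k] })
end

lcs_diff(%w[a b c], %w[a c d])
# => [[:keep, 0], [:remove, 1], [:keep, 2], [:insert, 2]]
```

Two trees with equal root hashes can be skipped after a single integer comparison, which is where the O(1) shortcut comes from; the LCS pass only runs on child arrays whose parents differ.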

Operation types

node_inserted           — new child node added
node_removed            — child node removed
node_replaced           — node type changed entirely
node_moved              — node reordered within parent
node_wrapped            — node wrapped in a new parent element or ERB block
node_unwrapped          — node unwrapped from its parent
text_changed            — HTML text content changed
erb_content_changed     — ERB expression/code changed
attribute_added         — new attribute on element
attribute_removed       — attribute removed from element
attribute_value_changed — attribute value changed
tag_name_changed        — element tag name changed (div → span)

Usage

result = Herb.diff(old_source, new_source)

result.identical?  # => false
result.operations  # => [#<Herb::DiffOperation type=attribute_value_changed path=[0, 0]>, ...]
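
A consumer will typically want to branch on the kind of change. The sketch below shows one way to do that. It uses a stand-in `Operation` struct rather than the real `Herb::DiffOperation`, because only `type` and `path` are visible in the inspect output above; whether `type` is a Symbol or a String, and the helper names `summarize` and `structural?`, are assumptions.

```ruby
# Stand-in for Herb::DiffOperation (assumption: it exposes `type` and `path`).
Operation = Struct.new(:type, :path)

# Operation types from this pull request that change the tree shape.
STRUCTURAL_TYPES = %i[
  node_inserted node_removed node_replaced
  node_moved node_wrapped node_unwrapped
].freeze

# Count operations per type.
def summarize(operations)
  operations.map(&:type).tally
end

# True if any operation changes the tree shape rather than only
# text, ERB content, or attributes.
def structural?(operations)
  operations.any? { |op| STRUCTURAL_TYPES.include?(op.type) }
end

ops = [
  Operation.new(:attribute_value_changed, [0, 0]),
  Operation.new(:node_inserted, [0, 1])
]
summarize(ops)   # => {:attribute_value_changed=>1, :node_inserted=>1}
structural?(ops) # => true
```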

CLI:

bin/herb diff old.html.erb new.html.erb
4 differences found:

  1. node_removed at path [0]
     old: AST_HTML_ELEMENT_NODE

  2. node_inserted at path [0]
     new: AST_HTML_ELEMENT_NODE

  3. node_inserted at path [1]
     new: AST_HTML_TEXT_NODE

  4. node_inserted at path [2]
     new: AST_HTML_ELEMENT_NODE

Examples

Attribute and content changes
<div class="container">
  <% if current_user %>
    <p>Hello, <%= current_user.name %></p>
  <% end %>
</div>

changed to:

<div class="wrapper" id="main">
  <% if current_user %>
    <p>Hello, <%= current_user.email %></p>
    <span class="badge">Admin</span>
  <% end %>
</div>

produces:

1. attribute_value_changed  (class="container" → class="wrapper")
2. attribute_added          (id="main")
3. erb_content_changed      (current_user.name → current_user.email)
4. node_inserted            (<span>)
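
The first two operations above come from comparing the attribute sets of the matched <div> elements. A minimal sketch of that comparison, assuming attributes are available as a plain name-to-value Hash (the engine itself works on AST attribute nodes, and `diff_attributes` is a hypothetical helper):

```ruby
# Compare two attribute hashes (name => value) and report the three
# attribute operation types listed in this pull request.
def diff_attributes(old_attrs, new_attrs)
  ops = []
  old_attrs.each do |name, value|
    if !new_attrs.key?(name)
      ops << [:attribute_removed, name]
    elsif new_attrs[name] != value
      ops << [:attribute_value_changed, name, value, new_attrs[name]]
    end
  end
  new_attrs.each do |name, value|
    ops << [:attribute_added, name, value] unless old_attrs.key?(name)
  end
  ops
end

diff_attributes(
  { "class" => "container" },
  { "class" => "wrapper", "id" => "main" }
)
# => [[:attribute_value_changed, "class", "container", "wrapper"],
#     [:attribute_added, "id", "main"]]
```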

Move detection

<ul><li class="a">A</li><li class="b">B</li></ul>

changed to:

<ul><li class="b">B</li><li class="a">A</li></ul>

produces:

1. node_moved  (old_index=1, new_index=0)
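
A rough sketch of the pairing step, assuming the LCS pass has already left the second <li> as one unmatched remove (index 1) and one unmatched insert (index 0). The `Element` struct and the `identity` and `detect_moves` helpers are hypothetical, and `Zlib.crc32` is used as a stand-in for the FNV-1a attribute hashes so the example stays short.

```ruby
require "zlib"

# Hypothetical element shape: tag name plus a Hash of attribute name => value.
Element = Struct.new(:tag, :attributes)

# Identity = tag name + XOR of per-attribute hashes. XOR is commutative,
# so the result does not depend on attribute order.
# CRC32 is a stand-in here; the engine is described as using FNV-1a.
def identity(element)
  attrs_hash = element.attributes.reduce(0) do |acc, (name, value)|
    acc ^ Zlib.crc32("#{name}=#{value}")
  end
  [element.tag, attrs_hash]
end

# removed / inserted: arrays of [index, element] left unmatched by LCS.
# Returns [moves, remaining_removes, remaining_inserts].
def detect_moves(removed, inserted)
  moves = []
  remaining_inserts = inserted.dup
  remaining_removes = removed.reject do |old_index, element|
    match = remaining_inserts.find do |_, candidate|
      identity(candidate) == identity(element)
    end
    next false unless match

    remaining_inserts.delete(match)
    moves << { type: :node_moved, old_index: old_index, new_index: match[0] }
    true
  end
  [moves, remaining_removes, remaining_inserts]
end

li_b = Element.new("li", { "class" => "b" })
moves, = detect_moves([[1, li_b]], [[0, li_b.dup]])
moves # => [{:type=>:node_moved, :old_index=>1, :new_index=>0}]
```

Anything that stays in the remaining lists is still reported as a plain remove or insert.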

Wrap/unwrap detection

<div>Content</div>

changed to:

<% if condition? %><div>Content</div><% end %>

produces:

1. node_wrapped  (old: HTMLElementNode, new: ERBIfNode)
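
The matching rule can be sketched as follows, assuming each node already carries its Merkle hash. The `WrapNode` struct, its `subtree_hash` field, and the helper names are hypothetical, and the hash values are arbitrary placeholders rather than real FNV-1a output.

```ruby
# Hypothetical node shape: type name, precomputed Merkle hash, children.
WrapNode = Struct.new(:type, :subtree_hash, :children)

# Wrap: a removed node whose hash reappears among the children of an
# inserted node.
def detect_wraps(removed, inserted)
  wraps = []
  inserted.each do |wrapper|
    child_hashes = wrapper.children.map(&:subtree_hash)
    removed.each do |node|
      next unless child_hashes.include?(node.subtree_hash)

      wraps << { type: :node_wrapped, old: node.type, new: wrapper.type }
    end
  end
  wraps
end

# Unwrap is the mirror image: a child of a removed node matches an
# inserted node.
def detect_unwraps(removed, inserted)
  detect_wraps(inserted, removed).map do |op|
    { type: :node_unwrapped, old: op[:new], new: op[:old] }
  end
end

div    = WrapNode.new("HTMLElementNode", 0xabc, [])
erb_if = WrapNode.new("ERBIfNode", 0xdef, [WrapNode.new("HTMLElementNode", 0xabc, [])])

detect_wraps([div], [erb_if])
# => [{:type=>:node_wrapped, :old=>"HTMLElementNode", :new=>"ERBIfNode"}]
```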

Playground

A new "Diff" tab is added to the playground with two modes. Live mode diffs on every keystroke and shows a scrollable feed of changes with undo/rollback. In checkpoint mode, you take a snapshot, edit, and then explicitly diff. Diffing is paused while the parse result has errors.

CleanShot.2026-03-24.at.02.13.58.mp4

pkg-pr-new Bot commented Mar 28, 2026

npx https://pkg.pr.new/@herb-tools/formatter@1518
npx https://pkg.pr.new/@herb-tools/language-server@1518
npx https://pkg.pr.new/@herb-tools/linter@1518

commit: ab18c8c

github-actions Bot commented Mar 28, 2026

🌿 Interactive Playground and Documentation Preview

A preview deployment has been built for this pull request. Try out the changes live in the interactive playground:


🌱 Grown from commit ab18c8c


✅ Preview deployment has been cleaned up.

@marcoroth marcoroth changed the title from "Herb: Implement Diff Engine" to "Herb: Implement Syntax Tree Diff Engine" Mar 30, 2026
marcoroth added a commit that referenced this pull request Apr 19, 2026
The Ruby 3.0 series has been EOL for over 2 years, and Ruby 3.2 was also
just EOL'd. Let's bump the minimum Ruby version for Herb v0.10 to Ruby
3.2 so we can use `Data` in the gem itself; #1518 already makes use of
`Data`.
@marcoroth marcoroth marked this pull request as ready for review April 19, 2026 05:38
@marcoroth marcoroth merged commit 2a9d977 into main Apr 19, 2026
32 checks passed
@marcoroth marcoroth deleted the diff-engine branch April 19, 2026 07:44