Optimize preceding-sibling, following-sibling, following, and descendant axes with a unified single mechanism #341

Open
tompng wants to merge 2 commits into ruby:master from tompng:xpath_step_optimize_more

Conversation

@tompng tompng commented Jun 27, 2026

Member

Builds on top of #339
Fixes #331

Unify the positional-predicate handling of the descendant, descendant-or-self,
following, following-sibling, and preceding-sibling axes into a single
event-stream mechanism (sequence_positional_scan).

Detail

Consider an XPath of the form anchor/axis::*[test-predicate][simple-positional-predicate].
Some XPath axes (descendant, descendant-or-self, following, preceding-sibling, following-sibling) share a common structure:

  • Anchor start points, anchor end points, and the nodes that pass the test predicate line up in a single sequence
  • Anchor start/end pairs are properly nested and never cross over

Example:

anchor1-start
  node1
  node2
  anchor2-start
    node3
    node4
  anchor2-end
  node5
  anchor3-start
    node6
  anchor3-end
  node7
anchor1-end

# Nodesets: [
#   [node1, node2, node3, node4, node5, node6, node7], # from anchor1
#   [node3, node4], # from anchor2
#   [node6] # from anchor3
# ]

The above sequence and anchor ranges can be represented as an event stream like this:

[:push, node1, node2, :push, node3, node4, :pop, node5, :push, node6, :pop, node7, :pop]

Events are:

  • :push: Add a new anchor point
  • :pop: Remove last anchor point
  • node: Add a node that passed test predicate
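
As a rough illustration of how the event stream encodes the nodesets listed above, the stream can be replayed naively with a stack of open anchors. This is only a sketch: `nodesets_from_events` is a hypothetical helper, not part of REXML, and symbols such as `:node1` stand in for real nodes.

```ruby
# Hypothetical helper (not REXML code): rebuild one nodeset per anchor
# by replaying the event stream without any optimization.
def nodesets_from_events(events)
  nodesets = []      # one array per anchor, in :push order
  open_anchors = []  # stack of nodesets whose anchor is still open
  events.each do |event|
    case event
    when :push
      nodeset = []
      nodesets << nodeset
      open_anchors.push(nodeset)
    when :pop
      open_anchors.pop
    else
      # A node belongs to every anchor that is currently open.
      open_anchors.each { |nodeset| nodeset << event }
    end
  end
  nodesets
end

events = [:push, :node1, :node2, :push, :node3, :node4, :pop,
          :node5, :push, :node6, :pop, :node7, :pop]
p nodesets_from_events(events)
```

This naive replay appends each node to every open anchor, which is the quadratic behavior that a positional-predicate optimization is meant to avoid.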

The axis scanners for following, descendant, descendant-or-self, preceding-sibling, and following-sibling construct an event stream and pass it to sequence_positional_scan, which implements the positional-predicate optimization.
The optimization logic in sequence_positional_scan is basically the same as the one previously implemented for the preceding-sibling and following-sibling axes.
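
To give a feel for why a single pass over the event stream can be enough, here is a minimal sketch for the special case of keeping only the first `limit` nodes per anchor (roughly `[position() <= limit]`). It is not the actual sequence_positional_scan; `take_first_per_anchor` and its parameters are hypothetical names. It relies on the nesting property: an outer anchor has seen every node an inner anchor has seen, so once an anchor is full, every anchor below it on the stack is full too.

```ruby
# Hypothetical sketch (not REXML's sequence_positional_scan):
# keep at most `limit` nodes per anchor in one pass over the events.
def take_first_per_anchor(events, limit)
  results = []  # one result array per anchor, in :push order
  open = []     # stack of result arrays for anchors still open
  events.each do |event|
    case event
    when :push
      result = []
      results << result
      open.push(result)
    when :pop
      open.pop
    else
      # Walk from the innermost anchor outwards. Outer anchors are never
      # less filled than inner ones, so stop at the first full anchor.
      open.reverse_each do |result|
        break if result.size >= limit
        result << event
      end
    end
  end
  results
end

events = [:push, :node1, :node2, :push, :node3, :node4, :pop,
          :node5, :push, :node6, :pop, :node7, :pop]
p take_first_per_anchor(events, 2)
# => [[:node1, :node2], [:node3, :node4], [:node6]]
```

With this shape, the work per node is bounded by the number of anchors that actually accept it, so the total cost tracks the size of the output rather than the product of anchors and nodes.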

Unchanged axes (preceding, ancestor)

The preceding and ancestor axes are intentionally left on the existing path. Unlike the following axis, their anchor-exclusion semantics don't fit the nested push/pop model.

Benchmark

The XPath expressions in this benchmark are crafted to measure the axis scan with as little noise as possible.
Using * and count() avoids the O(n^2) sort and namespace lookup, which may be fixed in the near future.

require 'rexml/document'
require 'nokogiri'

# 200 nested <a> elements; the innermost one contains 200 empty <a/> siblings
xml = '<a>'*200+'<a/>'*200+'</a>'*200
xpaths = [
  "count(//*/descendant::*[position()<10])",
  "count(//*/preceding-sibling::*[position()<10])",
  "count(//*/following-sibling::*[position()<10])",
  "count(//*/following::*[position()<10])",
]
xpaths.each do |xpath|
  puts xpath
  # Print [result, elapsed seconds] for REXML, then for Nokogiri
  doc = REXML::Document.new(xml)
  t=Time.now; p [REXML::XPath.match(doc, xpath), Time.now-t]
  doc = Nokogiri::XML.parse(xml)
  t=Time.now; p [doc.xpath(xpath), Time.now-t]
end

| scenario | REXML (master) | REXML (this PR) | Nokogiri |
| --- | --- | --- | --- |
| descendant::*[position()<10] | 0.032287 sec | 0.002089 sec | 0.010074 sec |
| preceding-sibling::*[position()<10] | 0.001075 sec | 0.001239 sec | 0.00265 sec |
| following-sibling::*[position()<10] | 0.000892 sec | 0.001041 sec | 0.002627 sec |
| following::*[position()<10] | 0.110464 sec | 0.000942 sec | 0.002706 sec |

In this example, REXML's XPath match with this PR is faster than Nokogiri, mainly because Nokogiri doesn't optimize [position()<N].

tompng added 2 commits June 24, 2026 00:11
Expand optimizable simple positional queries to support `[last()]` `[last()-N]` and `[position() <=> last()-N]`.
This will open the door for future optimization
Copilot AI review requested due to automatic review settings June 27, 2026 16:25

Copilot AI left a comment

Pull request overview

This PR refactors XPath axis scanning to unify positional-predicate optimization across descendant, descendant-or-self, following, and sibling axes via a shared event-stream mechanism (sequence_positional_scan), extending support to simple last()/last()-N forms and adding targeted regression tests.

Changes:

  • Extend simple positional predicate detection to cover last() and last()-N, producing 0-based forward/reverse index operators.
  • Replace per-axis positional handling with a shared event-stream scan (sequence_positional_scan) and integrate it into descendant/following/sibling axis scanners.
  • Add tests covering out-of-range last()-N, and positional behaviors across descendant/descendant-or-self/following and sibling axes.
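
As a small illustration of the `last()-N` semantics mentioned above, including the out-of-range case, the predicate can be thought of as a reverse index into the nodeset. This is a sketch under assumptions: `select_last_minus` is a hypothetical name, not a function from lib/rexml/xpath_parser.rb, and the actual operator representation in the PR is not shown here.

```ruby
# Hypothetical illustration (not REXML code): what [last()-n] selects
# from an already-built nodeset, using a 0-based reverse index.
def select_last_minus(nodes, n)
  # last()-n lies outside 1..last() when n is negative or n >= size,
  # in which case the predicate matches nothing.
  return [] if n < 0 || n >= nodes.size
  [nodes[-(n + 1)]]
end

nodes = [:a, :b, :c]
p select_last_minus(nodes, 0)  # [last()]
p select_last_minus(nodes, 1)  # [last()-1]
p select_last_minus(nodes, 3)  # out of range
```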

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test/xpath/test_predicate.rb | Adds regression tests for out-of-range last()-N predicates. |
| test/xpath/test_base.rb | Adds positional tests for descendant/descendant-or-self/following across multiple anchor shapes. |
| test/xpath/test_axis_preceding_sibling.rb | Expands sibling-axis tests to cover additional position expressions and last()-N variants. |
| lib/rexml/xpath_parser.rb | Implements unified positional predicate parsing and the shared event-stream scanning mechanism; updates axes to use it. |

Comment thread lib/rexml/xpath_parser.rb
Comment on lines 980 to 989
 def following(nodeset, tester, selector)
-  nodesets = nodeset.select {|node| node.respond_to?(:parent) }.map do |node|
-    following_nodes(node)
+  anchors = Set.new.compare_by_identity.replace(nodeset)
+  events = []
+  descendant_traverse_event(nodeset.first.document || nodeset.first.root) do |type, node|
+    events << :push if type == :leave && anchors.include?(node)
+    events << node if !events.empty? && type == :enter && tester.call(node)
   end
-  non_optimized_nodesets_select(nodesets, tester, selector)
-end
-
-def following_nodes(node)
-  followings = []
-  following_node = next_sibling_node(node)
-  while following_node
-    followings << following_node
-    following_node = following_node_of(following_node)
-  end
-  followings
-end
-
-def following_node_of( node )
-  return node.children[0] if node.kind_of?(Element) and node.children.size > 0
-
-  next_sibling_node(node)
-end
-
-def next_sibling_node(node)
-  psn = node.next_sibling_node
-  while psn.nil?
-    return nil if node.parent.nil? or node.parent.class == Document
-    node = node.parent
-    psn = node.next_sibling_node
-  end
-  psn
+  anchors.size.times { events << :pop }
+  sequence_positional_scan(events, selector)
 end
Development

Successfully merging this pull request may close these issues.

XMLDecl mixed in xpath descendant::node() match result

2 participants