Optimize preceding-sibling, following-sibling, following, and descendant axes with a unified single mechanism#341
Open
tompng wants to merge 2 commits into
Open
Conversation
Expand optimizable simple positional queries to support `[last()]` `[last()-N]` and `[position() <=> last()-N]`. This will open the door for future optimization
…ant axes with a unified single mechanism
There was a problem hiding this comment.
Pull request overview
This PR refactors XPath axis scanning to unify positional-predicate optimization across descendant, descendant-or-self, following, and sibling axes via a shared event-stream mechanism (sequence_positional_scan), extending support to simple last()/last()-N forms and adding targeted regression tests.
Changes:
- Extend simple positional predicate detection to cover
last()andlast()-N, producing 0-based forward/reverse index operators. - Replace per-axis positional handling with a shared event-stream scan (
sequence_positional_scan) and integrate it into descendant/following/sibling axis scanners. - Add tests covering out-of-range
last()-N, and positional behaviors across descendant/descendant-or-self/following and sibling axes.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test/xpath/test_predicate.rb | Adds regression tests for out-of-range last()-N predicates. |
| test/xpath/test_base.rb | Adds positional tests for descendant/descendant-or-self/following across multiple anchor shapes. |
| test/xpath/test_axis_preceding_sibling.rb | Expands sibling-axis tests to cover additional position expressions and last()-N variants. |
| lib/rexml/xpath_parser.rb | Implements unified positional predicate parsing and the shared event-stream scanning mechanism; updates axes to use it. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
980
to
989
| def following(nodeset, tester, selector) | ||
| nodesets = nodeset.select {|node| node.respond_to?(:parent) }.map do |node| | ||
| following_nodes(node) | ||
| anchors = Set.new.compare_by_identity.replace(nodeset) | ||
| events = [] | ||
| descendant_traverse_event(nodeset.first.document || nodeset.first.root) do |type, node| | ||
| events << :push if type == :leave && anchors.include?(node) | ||
| events << node if !events.empty? && type == :enter && tester.call(node) | ||
| end | ||
| non_optimized_nodesets_select(nodesets, tester, selector) | ||
| end | ||
|
|
||
| def following_nodes(node) | ||
| followings = [] | ||
| following_node = next_sibling_node(node) | ||
| while following_node | ||
| followings << following_node | ||
| following_node = following_node_of(following_node) | ||
| end | ||
| followings | ||
| end | ||
|
|
||
| def following_node_of( node ) | ||
| return node.children[0] if node.kind_of?(Element) and node.children.size > 0 | ||
|
|
||
| next_sibling_node(node) | ||
| end | ||
|
|
||
| def next_sibling_node(node) | ||
| psn = node.next_sibling_node | ||
| while psn.nil? | ||
| return nil if node.parent.nil? or node.parent.class == Document | ||
| node = node.parent | ||
| psn = node.next_sibling_node | ||
| end | ||
| psn | ||
| anchors.size.times { events << :pop } | ||
| sequence_positional_scan(events, selector) | ||
| end |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Builds on top of #339
Fixes #331
Unify the positional-predicate handling of the
descendant,descendant-or-self,following,following-sibling, andpreceding-siblingaxes into a singleevent-stream mechanism (
sequence_positional_scan).Detail
Consider xpath like this:
anchor/axis::*[test-predicate][simple-positional-predicate].Some xpath axes (descendant, descendant-or-self, following, preceding-sibling, following-sibling) have same common structure:
Example:
The above sequence and anchor ranges can be represented as event stream like this:
Events are:
:push: Add a new anchor point:pop: Remove last anchor pointnode: Add a node that passed test predicateAxis scanner of
following,descendant,descendant-or-self,preceding-siblingandfollowing-siblingwill construct an event stream and passes it tosequence_positional_scanwhich implements positional predicate optimization.Optimization logic in
sequence_positional_scanis basically the same as the one implemented inpreceding/following-siblingbefore.Unchanged axes (preceding, ancestor)
precedingandancestoraxes are intentionally left on the existing path. Unlikefollowingaxis, its anchor-exclusion semantics don't fit the nested push/pop model.Benchmark
XPath in this benchmark is specially crafted to precisely measure axis scan without noise.
Using
*andcount()will avoidO(n^2)sort and namespace lookup which may be fixed in a near future.descendant::*[position()<10]preceding-sibling::*[position()<10]following-sibling::*[position()<10]following::*[position()<10]In this example, XPath match is faster than Nokogiri, mainly because nokogiri doesn't optimize
[position()<N].