Skip to content

Commit a6aa43c

Browse files
authored
Optimize XPath step (#315)
Refactor `step` so that nodeset materialization is deferred. Instead of building the full nodeset up front and filtering through predicates, each axis returns a *scan descriptor* (`[generator_name, generator_argument]`), and `step` picks a scan strategy based on the predicates' shape. Predicates are classified into three groups: | kind | examples | strategy passed to the generator | |---|---|---| | position-independent | `[@A="1"]`, `[name()="foo"]`, `[@A=@b]` | `:uniq` — emit deduplicated matching nodes | | simple positional | `[N]`, `[position()=N]`, `[position()>N]`, `[position()<N]` | `[op, value]` — positional scan with one comparison | | complex / position-dependent | `[position()*@A]`, `[last()-1]`, ... | `:nodesets` — fall back to per-anchor nodesets + the previous `evaluate_predicate` pipeline | Mixed predicate lists are split: position-independent predicates *before* the first positional predicate are folded into the node test; predicates *after* it are applied per-node on the result. Each axis can implement zero, one, or all of the three strategies. If a strategy is not implemented, the generator falls back to producing `:nodesets` and the common slow path (`non_optimized_nodesets_select`) handles dedup / positional filtering on flattened nodesets — i.e. the same behavior as before this PR. This pull request adds fast paths for: - `descendant` / `descendant-or-self`: `:uniq` (single DFS with a seen-set; this is what speeds up `//a//a//a//a`) - `ancestor` / `ancestor-or-self`: `:uniq` (parent-chain walk with a seen-set) - `preceding-sibling` / `following-sibling`: `:uniq` and `[op, value]` (sibling scan with anchor-index tracking) Other axes (`child`, `parent`, `self`, `attribute`, `preceding`, `following`, etc) keeps the previous behavior via the fallback path; they can be optimized in follow-ups without changing call sites. ## Detail For `//a//a//a` style queries, the previous code built nodesets keyed by each anchor, including the same descendant once per anchor. The new `:uniq` path scans every node at most once per step. For `[position() > N]` style predicates on wide trees (e.g. `//a/preceding-sibling::*[position()>2]`), we previously built the full preceding-sibling nodeset for each anchor and then ran `evaluate_predicate`. The new `[op, value]` path scans children once per parent and uses anchor-index bookkeeping to recover per-anchor positions. Note: general XPath cannot be linear — e.g. `*[position() * number(@A) % number(@b) = 1]` is genuinely O(n²) — so the goal is only to add a fast-path for specific case: position-independent predicates and simple-positional predicates. ## Benchmark of best case ```ruby DEPTH = 500 xml = '<a>' * DEPTH + '</a>' * DEPTH doc = REXML::Document.new(xml) WIDTH = 1000 xml_wide = '<root>' + '<child/>' * WIDTH + '</root>' doc_wide = REXML::Document.new(xml_wide) REXML::XPath.match(doc, "//a//a"); # processing time: 30.756939s → 0.126807s REXML::XPath.match(doc_wide, "//*/preceding-sibling::*[position()=10]"); # processing time: 2.446333s → 0.083954s ``` ## Benchmark of various case ### Scenario ```yaml prelude: | require "rexml" xml_wide = "<root>" + (1..1000).map { |i| "<item id='#{i}'/>" }.join + "</root>" wide = REXML::Document.new(xml_wide) xml_deep = "<root>" + (1..1000).map { |i| "<item id='#{i}'>" }.join + '</item>'*1000 + "</root>" deep = REXML::Document.new(xml_deep) benchmark: child: REXML::XPath.match(wide, "root/item") descendant: REXML::XPath.match(deep, "//item") descendant-descendant: REXML::XPath.match(deep, "//item//item") descendant-descendant-wildcard: REXML::XPath.match(deep, "//*//*") ancestor-descendant: REXML::XPath.match(deep, "descendant::*/ancestor::*/descendant::*") preceding-following-sibling: REXML::XPath.match(wide, "//*/preceding-sibling::*/following-sibling::*") preceding-following-sibling-positional: REXML::XPath.match(wide, "//*/preceding-sibling::*[10]/following-sibling::*[10]") ``` ### Compares master, xpath_step_optimize (this pull), sort_on_demand(#330), sort_improve(Emulate ideal sort computation time), and its combinations. There's no implementation of `sort_improve` yet, so I used the code below to emulate the computational cost of ideal sort. ```ruby def sort(array_of_nodes) # Just spend time to emulate the ideal computational cost of sorting nodes parents = Set.new.compare_by_identity array_of_nodes.each { parents << it.parent if it.parent } 4.times do # find the common ancestor nodes = array_of_nodes seen = Set.new.compare_by_identity while nodes.size >= 2 new_nodes = Set.new.compare_by_identity nodes.map(&:parent).each do |parent| if parent && !seen.include?(parent) seen << parent new_nodes << parent end end nodes = new_nodes end # iterate each node's siblings parents.each{it.children.each{}} end array_of_nodes # not sorted end ``` ### Result ``` Comparison: child master: 1288.1 i/s master_sort_improve: 1190.4 i/s - 1.08x slower xpath_step_optimize_sort_on_demand_sort_improve: 875.3 i/s - 1.47x slower xpath_step_optimize_sort_improve: 861.3 i/s - 1.50x slower xpath_step_optimize_sort_on_demand: 92.3 i/s - 13.96x slower xpath_step_optimize: 91.7 i/s - 14.05x slower sort_on_demand: 90.6 i/s - 14.21x slower descendant master_sort_improve: 75.5 i/s xpath_step_optimize_sort_on_demand_sort_improve: 75.1 i/s - 1.01x slower xpath_step_optimize_sort_improve: 68.8 i/s - 1.10x slower sort_on_demand: 21.4 i/s - 3.52x slower xpath_step_optimize_sort_on_demand: 21.4 i/s - 3.52x slower master: 20.9 i/s - 3.61x slower xpath_step_optimize: 11.7 i/s - 6.45x slower descendant-descendant xpath_step_optimize_sort_on_demand_sort_improve: 47.5 i/s xpath_step_optimize_sort_improve: 41.9 i/s - 1.13x slower xpath_step_optimize_sort_on_demand: 17.9 i/s - 2.65x slower master_sort_improve: 8.6 i/s - 5.54x slower sort_on_demand: 6.7 i/s - 7.07x slower xpath_step_optimize: 6.1 i/s - 7.84x slower master: 4.6 i/s - 10.24x slower descendant-descendant-wildcard xpath_step_optimize_sort_on_demand_sort_improve: 339.5 i/s xpath_step_optimize_sort_improve: 155.9 i/s - 2.18x slower xpath_step_optimize_sort_on_demand: 26.4 i/s - 12.86x slower master_sort_improve: 10.0 i/s - 33.96x slower sort_on_demand: 7.7 i/s - 44.30x slower xpath_step_optimize: 6.8 i/s - 50.06x slower master: 4.9 i/s - 68.58x slower ancestor-descendant xpath_step_optimize_sort_on_demand_sort_improve: 377.9 i/s xpath_step_optimize_sort_improve: 203.7 i/s - 1.85x slower xpath_step_optimize_sort_on_demand: 26.3 i/s - 14.39x slower xpath_step_optimize: 8.7 i/s - 43.24x slower master_sort_improve: 7.8 i/s - 48.55x slower sort_on_demand: 6.3 i/s - 59.93x slower master: 5.0 i/s - 75.46x slower preceding-following-sibling xpath_step_optimize_sort_on_demand_sort_improve: 684.1 i/s xpath_step_optimize_sort_improve: 424.5 i/s - 1.61x slower xpath_step_optimize_sort_on_demand: 85.8 i/s - 7.98x slower xpath_step_optimize: 23.7 i/s - 28.91x slower master_sort_improve: 20.9 i/s - 32.72x slower sort_on_demand: 19.3 i/s - 35.39x slower master: 13.9 i/s - 49.11x slower preceding-following-sibling-positional xpath_step_optimize_sort_on_demand_sort_improve: 425.4 i/s xpath_step_optimize_sort_improve: 315.0 i/s - 1.35x slower xpath_step_optimize_sort_on_demand: 84.3 i/s - 5.05x slower xpath_step_optimize: 23.3 i/s - 18.22x slower master_sort_improve: 2.1 i/s - 201.38x slower sort_on_demand: 2.1 i/s - 204.75x slower master: 1.9 i/s - 222.08x slower ``` In scenario "child" and "descendant", this PR is slower than master because it adds one additional `sort` call. The difference will be small when `sort` is improved. In most case, this PR itself does not unleash its full potential because sort is the next bottleneck. Combining with `sort` improvement is important. The difference of "descendant-descendant" and "descendant-descendant-wildcard" shows that after optimizing sort, the bottleneck will be namespace lookup in qname check for deeply nested xml.
1 parent 7d9e7c2 commit a6aa43c

5 files changed

Lines changed: 573 additions & 169 deletions

File tree

0 commit comments

Comments
 (0)