Commit a6aa43c
Optimize XPath step (#315)
Refactor `step` so that nodeset materialization is deferred. Instead of
building the full nodeset up front and filtering through predicates,
each axis returns a *scan descriptor* (`[generator_name,
generator_argument]`), and `step` picks a scan strategy based on the
predicates' shape.
Predicates are classified into three groups:

| kind | examples | strategy passed to the generator |
|---|---|---|
| position-independent | `[@A="1"]`, `[name()="foo"]`, `[@A=@b]` | `:uniq`: emit deduplicated matching nodes |
| simple positional | `[N]`, `[position()=N]`, `[position()>N]`, `[position()<N]` | `[op, value]`: positional scan with one comparison |
| complex / position-dependent | `[position()*@A]`, `[last()-1]`, ... | `:nodesets`: fall back to per-anchor nodesets + the previous `evaluate_predicate` pipeline |

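
The classification in the table can be sketched roughly as follows. This is only an illustration: `classify_predicate` and `SIMPLE_POSITIONAL` are hypothetical names, and the sketch matches on predicate source strings, whereas the actual change presumably inspects REXML's parsed predicate expressions.

```ruby
# Hypothetical classifier working on predicate source text (illustration only).
# Matches either a bare integer (`[N]`) or `position() <op> N`.
SIMPLE_POSITIONAL = /\A(?:(?<n>\d+)|position\(\)\s*(?<op>=|<|>)\s*(?<m>\d+))\z/

def classify_predicate(source)
  text = source.strip
  if (m = SIMPLE_POSITIONAL.match(text))
    # `[N]` is shorthand for `[position()=N]`
    [m[:op] || "=", Integer(m[:n] || m[:m])]
  elsif text.match?(/\b(?:position|last)\(\)/)
    # Anything else that mentions position() or last() needs per-anchor nodesets
    :nodesets
  else
    # Caveat: a real classifier must also treat number-valued predicates
    # such as `[@n + 1]` as positional; this sketch does not detect them.
    :uniq
  end
end

p classify_predicate('@A="1"')        # position-independent
p classify_predicate("position()>2")  # simple positional
p classify_predicate("last()-1")      # complex / position-dependent
```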
Mixed predicate lists are split: position-independent predicates
*before* the first positional predicate are folded into the node test;
predicates *after* it are applied per-node on the result.
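
A minimal sketch of that split, assuming each predicate has already been classified as `:uniq`, `[op, value]`, or `:nodesets`. The helper name `split_predicates` and the three-way return value are assumptions made for illustration, not the names used in the patch.

```ruby
# Hypothetical helper: split a list of classified predicates around the
# first predicate that is not position-independent.
def split_predicates(kinds)
  first_positional = kinds.index { |kind| kind != :uniq }
  # No positional predicate: everything can be folded into the node test
  return [kinds, nil, []] if first_positional.nil?

  folded    = kinds[0...first_positional]        # folded into the node test
  scan_kind = kinds[first_positional]            # decides the scan strategy
  trailing  = kinds[(first_positional + 1)..]    # applied per node on the result
  [folded, scan_kind, trailing]
end

p split_predicates([:uniq, [">", 2], :uniq])
```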
Each axis can implement zero, one, or all of the three strategies. If a
strategy is not implemented, the generator falls back to producing
`:nodesets` and the common slow path (`non_optimized_nodesets_select`)
handles dedup / positional filtering on flattened nodesets — i.e. the
same behavior as before this PR.
This pull request adds fast paths for:
- `descendant` / `descendant-or-self`: `:uniq` (single DFS with a
seen-set; this is what speeds up `//a//a//a//a`)
- `ancestor` / `ancestor-or-self`: `:uniq` (parent-chain walk with a
seen-set)
- `preceding-sibling` / `following-sibling`: `:uniq` and `[op, value]`
(sibling scan with anchor-index tracking)
Other axes (`child`, `parent`, `self`, `attribute`, `preceding`,
`following`, etc.) keep the previous behavior via the fallback path;
they can be optimized in follow-ups without changing call sites.
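
The fallback rule can be pictured as a lookup table of supported strategies per axis. The sketch below is an assumption about the shape of that decision, not the code in this commit; `SUPPORTED_STRATEGIES` and `choose_strategy` are hypothetical names, and `:positional` stands in for any `[op, value]` request.

```ruby
# Hypothetical table of fast paths, taken from the list above.
SUPPORTED_STRATEGIES = {
  "descendant"         => [:uniq],
  "descendant-or-self" => [:uniq],
  "ancestor"           => [:uniq],
  "ancestor-or-self"   => [:uniq],
  "preceding-sibling"  => [:uniq, :positional],
  "following-sibling"  => [:uniq, :positional],
}.freeze

# Returns the requested strategy if the axis implements it,
# otherwise :nodesets so the common slow path takes over.
def choose_strategy(axis, requested)
  kind = requested.is_a?(Array) ? :positional : requested
  SUPPORTED_STRATEGIES.fetch(axis, []).include?(kind) ? requested : :nodesets
end

p choose_strategy("descendant", :uniq)
p choose_strategy("descendant", [">", 2])
p choose_strategy("child", :uniq)
```

Because unsupported combinations resolve to `:nodesets`, adding a fast path for another axis only means extending the table, which matches the "without changing call sites" note.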
## Detail
For `//a//a//a` style queries, the previous code built nodesets keyed by
each anchor, including the same descendant once per anchor. The new
`:uniq` path scans every node at most once per step.
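
To illustrate the idea, here is a minimal sketch of a descendant scan with a seen-set. It uses a toy `Node` struct instead of REXML elements, and `descendants_uniq` is a hypothetical name; the real generator also has to handle node tests, namespaces, and ordering.

```ruby
# Toy tree node (assumption: stands in for a REXML element).
Node = Struct.new(:name, :children)

# Collect descendants of all anchors named `name`, visiting each node once.
def descendants_uniq(anchors, name)
  seen = {}.compare_by_identity   # identity-based, so equal-looking nodes stay distinct
  result = []
  anchors.each do |anchor|
    stack = anchor.children.reverse
    until stack.empty?
      node = stack.pop
      # A seen node's whole subtree was already scanned from an earlier anchor
      next if seen.key?(node)
      seen[node] = true
      result << node if node.name == name
      stack.concat(node.children.reverse)
    end
  end
  result
end

# <a><a><a><a/></a></a></a>: the three outer elements are the anchors
a4 = Node.new("a", [])
a3 = Node.new("a", [a4])
a2 = Node.new("a", [a3])
a1 = Node.new("a", [a2])
p descendants_uniq([a1, a2, a3], "a").size
```

With per-anchor nodesets the same chain would yield 3 + 2 + 1 entries before deduplication; the seen-set stops the second and third anchors at their first child.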
For `[position() > N]` style predicates on wide trees (e.g.
`//a/preceding-sibling::*[position()>2]`), we previously built the full
preceding-sibling nodeset for each anchor and then ran
`evaluate_predicate`. The new `[op, value]` path scans children once per
parent and uses anchor-index bookkeeping to recover per-anchor positions.
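
One possible way to do this bookkeeping for `preceding-sibling` is sketched below. It is an assumption about the technique, not the committed code: `preceding_sibling_scan` is a hypothetical name, children are plain array entries, and anchors are given as indexes into the parent's child list. A sibling at index `j` has position `i - j` relative to an anchor at index `i > j`, so one right-to-left pass that tracks the nearest and the farthest anchor index is enough.

```ruby
require "set"

# Hypothetical single-pass scan: which children match
# preceding-sibling::*[position() <op> value] for at least one anchor?
def preceding_sibling_scan(children, anchor_indexes, op, value)
  anchors = anchor_indexes.to_set
  max_anchor = anchor_indexes.max
  nearest_anchor = nil   # smallest anchor index to the right of the current child
  matched = []
  (children.size - 1).downto(0) do |j|
    if nearest_anchor
      hit = case op
            when "=" then value >= 1 && anchors.include?(j + value)
            when ">" then max_anchor - j > value      # farthest anchor gives the largest position
            when "<" then nearest_anchor - j < value  # nearest anchor gives the smallest position
            end
      matched << children[j] if hit
    end
    nearest_anchor = j if anchors.include?(j)
  end
  matched.reverse   # back to document order
end

children = %w[c0 c1 c2 c3 c4]
p preceding_sibling_scan(children, [2, 4], ">", 2)
p preceding_sibling_scan(children, [2, 4], "=", 2)
```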
Note: general XPath evaluation cannot be linear. For example,
`*[position() * number(@A) % number(@b) = 1]` is genuinely O(n²). The
goal is therefore only to add fast paths for specific cases:
position-independent predicates and simple positional predicates.
## Benchmark of best case
```ruby
require "rexml/document"

# Deep document: 500 nested <a> elements
DEPTH = 500
xml = '<a>' * DEPTH + '</a>' * DEPTH
doc = REXML::Document.new(xml)

# Wide document: 1000 sibling <child/> elements under one root
WIDTH = 1000
xml_wide = '<root>' + '<child/>' * WIDTH + '</root>'
doc_wide = REXML::Document.new(xml_wide)

REXML::XPath.match(doc, "//a//a")
# processing time: 30.756939s → 0.126807s

REXML::XPath.match(doc_wide, "//*/preceding-sibling::*[position()=10]")
# processing time: 2.446333s → 0.083954s
```
## Benchmark of various cases
### Scenario
```yaml
prelude: |
  require "rexml"
  xml_wide = "<root>" + (1..1000).map { |i| "<item id='#{i}'/>" }.join + "</root>"
  wide = REXML::Document.new(xml_wide)
  xml_deep = "<root>" + (1..1000).map { |i| "<item id='#{i}'>" }.join + '</item>'*1000 + "</root>"
  deep = REXML::Document.new(xml_deep)
benchmark:
  child: REXML::XPath.match(wide, "root/item")
  descendant: REXML::XPath.match(deep, "//item")
  descendant-descendant: REXML::XPath.match(deep, "//item//item")
  descendant-descendant-wildcard: REXML::XPath.match(deep, "//*//*")
  ancestor-descendant: REXML::XPath.match(deep, "descendant::*/ancestor::*/descendant::*")
  preceding-following-sibling: REXML::XPath.match(wide, "//*/preceding-sibling::*/following-sibling::*")
  preceding-following-sibling-positional: REXML::XPath.match(wide, "//*/preceding-sibling::*[10]/following-sibling::*[10]")
```
### Compared variants
`master`, `xpath_step_optimize` (this pull request), `sort_on_demand` (#330),
`sort_improve` (emulates the computation time of an ideal sort), and their
combinations.
There is no implementation of `sort_improve` yet, so I used the code
below to emulate the computational cost of an ideal sort.
```ruby
def sort(array_of_nodes)
  # Just spend time to emulate the ideal computational cost of sorting nodes
  parents = Set.new.compare_by_identity
  array_of_nodes.each { parents << it.parent if it.parent }
  4.times do
    # find the common ancestor
    nodes = array_of_nodes
    seen = Set.new.compare_by_identity
    while nodes.size >= 2
      new_nodes = Set.new.compare_by_identity
      nodes.map(&:parent).each do |parent|
        if parent && !seen.include?(parent)
          seen << parent
          new_nodes << parent
        end
      end
      nodes = new_nodes
    end
    # iterate each node's siblings
    parents.each { it.children.each {} }
  end
  array_of_nodes # not sorted
end
```
### Result
```
Comparison:
child
master: 1288.1 i/s
master_sort_improve: 1190.4 i/s - 1.08x slower
xpath_step_optimize_sort_on_demand_sort_improve: 875.3 i/s - 1.47x slower
xpath_step_optimize_sort_improve: 861.3 i/s - 1.50x slower
xpath_step_optimize_sort_on_demand: 92.3 i/s - 13.96x slower
xpath_step_optimize: 91.7 i/s - 14.05x slower
sort_on_demand: 90.6 i/s - 14.21x slower
descendant
master_sort_improve: 75.5 i/s
xpath_step_optimize_sort_on_demand_sort_improve: 75.1 i/s - 1.01x slower
xpath_step_optimize_sort_improve: 68.8 i/s - 1.10x slower
sort_on_demand: 21.4 i/s - 3.52x slower
xpath_step_optimize_sort_on_demand: 21.4 i/s - 3.52x slower
master: 20.9 i/s - 3.61x slower
xpath_step_optimize: 11.7 i/s - 6.45x slower
descendant-descendant
xpath_step_optimize_sort_on_demand_sort_improve: 47.5 i/s
xpath_step_optimize_sort_improve: 41.9 i/s - 1.13x slower
xpath_step_optimize_sort_on_demand: 17.9 i/s - 2.65x slower
master_sort_improve: 8.6 i/s - 5.54x slower
sort_on_demand: 6.7 i/s - 7.07x slower
xpath_step_optimize: 6.1 i/s - 7.84x slower
master: 4.6 i/s - 10.24x slower
descendant-descendant-wildcard
xpath_step_optimize_sort_on_demand_sort_improve: 339.5 i/s
xpath_step_optimize_sort_improve: 155.9 i/s - 2.18x slower
xpath_step_optimize_sort_on_demand: 26.4 i/s - 12.86x slower
master_sort_improve: 10.0 i/s - 33.96x slower
sort_on_demand: 7.7 i/s - 44.30x slower
xpath_step_optimize: 6.8 i/s - 50.06x slower
master: 4.9 i/s - 68.58x slower
ancestor-descendant
xpath_step_optimize_sort_on_demand_sort_improve: 377.9 i/s
xpath_step_optimize_sort_improve: 203.7 i/s - 1.85x slower
xpath_step_optimize_sort_on_demand: 26.3 i/s - 14.39x slower
xpath_step_optimize: 8.7 i/s - 43.24x slower
master_sort_improve: 7.8 i/s - 48.55x slower
sort_on_demand: 6.3 i/s - 59.93x slower
master: 5.0 i/s - 75.46x slower
preceding-following-sibling
xpath_step_optimize_sort_on_demand_sort_improve: 684.1 i/s
xpath_step_optimize_sort_improve: 424.5 i/s - 1.61x slower
xpath_step_optimize_sort_on_demand: 85.8 i/s - 7.98x slower
xpath_step_optimize: 23.7 i/s - 28.91x slower
master_sort_improve: 20.9 i/s - 32.72x slower
sort_on_demand: 19.3 i/s - 35.39x slower
master: 13.9 i/s - 49.11x slower
preceding-following-sibling-positional
xpath_step_optimize_sort_on_demand_sort_improve: 425.4 i/s
xpath_step_optimize_sort_improve: 315.0 i/s - 1.35x slower
xpath_step_optimize_sort_on_demand: 84.3 i/s - 5.05x slower
xpath_step_optimize: 23.3 i/s - 18.22x slower
master_sort_improve: 2.1 i/s - 201.38x slower
sort_on_demand: 2.1 i/s - 204.75x slower
master: 1.9 i/s - 222.08x slower
```
In the "child" and "descendant" scenarios, this PR is slower than master
because it adds one additional `sort` call. The difference will be small
once `sort` is improved.
In most cases, this PR alone does not reach its full potential because
sort becomes the next bottleneck, so combining it with the `sort`
improvement is important.
The difference between "descendant-descendant" and
"descendant-descendant-wildcard" shows that, after optimizing sort, the
next bottleneck will be the namespace lookup in the qname check for
deeply nested XML.