Skip to content

Commit ceaa829

Browse files
authored
Sort nodeset on demand (#330)
Delay sorting, only sort when it is needed. In most case, sorting nodeset is not needed. Sort is only required in: - Final result - Creating nodesets(each nodeset should be axis-ordered) from a single nodeset - Ideally, this can be skipped if the following predicate is not position-dependent - Nodeset passed to a function (first node in document order is used) ### Number of sort operations | XPath | master(before #315) | master(after #315) | this PR | | --- | --- | --- | --- | | `/a/b/c/d/e` | 3 | 4 | 1 | | `(a/b/c/d)[position()>1]/e/f/g` | 5 | 7 | 2 | | `number(/a/b/c/d/e)` | 3 | 4 |1 | | `count(/a/b/c/d/e)` | 3 | 4 | 0 | | `//a//b//c//d//e` | 8 | 9 | 1 | | `/a[1]/b[1]/c[1]/d[1]/e` | 0 | 1 | 1 | #315 removed one `nodesets.size == 1` optimization path. This pull request will reduce the performance regression. To reduce more sort calls, we need to mark nodeset ordering: introducing `Nodeset = Struct.new(:nodes, :order)` but IMO, it shouldn't be done now. If `sort` is optimized, one extra sort won't be a problem. Optimizing `step` will be harder and the code may be complicated. ### Note This pull request will slightly add complexity and a risk to forgot sorting the nodeset in some path. The effect may seem drastic in some case for now, but it's just because `sort` is currently worst `O(n^2)`. We can improve `sort` performance, so there's an option to leave the sort strategy simple.
1 parent 9640216 commit ceaa829

4 files changed

Lines changed: 31 additions & 7 deletions

File tree

lib/rexml/functions.rb

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -85,9 +85,9 @@ def get_namespace( node_set = nil )
8585
if node_set == nil
8686
yield @context[:node] if @context[:node].respond_to?(:namespace)
8787
else
88-
if node_set.respond_to? :each
88+
if node_set.kind_of? Array
8989
result = []
90-
node_set.each do |node|
90+
XPathParser.sort(node_set).each do |node|
9191
result << yield(node) if node.respond_to?(:namespace)
9292
end
9393
result
@@ -149,7 +149,7 @@ def string( object=@context[:node] )
149149
else
150150
case object
151151
when Array
152-
string(object[0])
152+
string(XPathParser.sort(object).first)
153153
when Float
154154
if object.nan?
155155
"NaN"

lib/rexml/xpath_parser.rb

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -156,7 +156,7 @@ def match(path_stack, node)
156156
result = expr(path_stack, nodeset)
157157
case result
158158
when Array # nodeset
159-
result.uniq
159+
XPathParser.sort(result)
160160
else
161161
[result]
162162
end
@@ -323,7 +323,7 @@ def expr( path_stack, nodeset, context=nil )
323323
# If result is a nodeset, apply following predicates
324324
path_stack.unshift(:node)
325325
nodeset = step(path_stack) do
326-
[:iterate_nodesets, [result]]
326+
[:iterate_nodesets, [XPathParser.sort(result)]]
327327
end
328328
else
329329
return result
@@ -571,6 +571,7 @@ def split_positional_predicates(predicates)
571571
end
572572

573573
# Performs an axis scanning step.
574+
# Returns an unordered non-duplicated nodeset of matching nodes.
574575
# The caller provides a scanner method and its argument, which determines the axis to scan and the nodes to scan from:
575576
# step(path_stack) { [scanner_method, scanner_argument] }
576577
# Scanner methods are called with `(scanner_argument, tester_block, selector)`
@@ -621,7 +622,7 @@ def step(path_stack, any_type: :element)
621622
nodes << node
622623
end
623624
end
624-
new_nodeset = sort(nodes.to_a)
625+
new_nodeset = nodes.to_a
625626
ensure
626627
leave(:step, path_stack, new_nodeset) if @debug
627628
end
@@ -761,7 +762,7 @@ def leave(tag, *args)
761762
# in and out of function calls. If I knew what the index of the nodes was,
762763
# I wouldn't have to do this. Maybe add a document IDX for each node?
763764
# Problems with mutable documents. Or, rewrite everything.
764-
def sort(array_of_nodes)
765+
def self.sort(array_of_nodes)
765766
return array_of_nodes if array_of_nodes.size <= 1
766767

767768
new_arry = []

test/test_jaxen.rb

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -87,6 +87,13 @@ def process_value_of(context, variables, namespaces, value_of)
8787
xpath = value_of.attributes["select"]
8888
matched = XPath.match(context, xpath, namespaces, variables, strict: true)
8989

90+
# XPath.match can be a nodeset or a primitive value wrapped in an array.
91+
# We need to unwrap primitive value because Functions doesn't accept array which is not a nodeset.
92+
unless matched.all? { |node| node.is_a?(REXML::Node) }
93+
assert_equal(1, matched.size, 'Primitive value should be a single value')
94+
matched = matched.first
95+
end
96+
9097
message = user_message(context, xpath, matched)
9198
assert_equal(expected || "",
9299
REXML::Functions.string(matched),

test/xpath/test_base.rb

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1505,5 +1505,21 @@ def test_descendant_axis_position_predicate_per_context_node
15051505
result = XPath.match(xmldoc, "//a/descendant::b[1]")
15061506
assert_equal(["b1", "b2"], result.map { |e| e.attributes["id"] })
15071507
end
1508+
1509+
def test_reverse_axis_document_order_sort
1510+
doc = Document.new("<a><b>1</b><c>2</c><d>3</d><e/></a>")
1511+
assert_equal(["b", "c", "d"], XPath.match(doc, "//e/preceding-sibling::*").map(&:name))
1512+
assert_equal(["d"], XPath.match(doc, "//e/preceding-sibling::*[1]").map(&:name))
1513+
assert_equal(["b"], XPath.match(doc, "(//e/preceding-sibling::*)[1]").map(&:name))
1514+
end
1515+
1516+
def test_reverse_axis_function_argument_sort
1517+
doc = Document.new("<a><b>1</b><c>2</c><d>3</d><e/></a>")
1518+
assert_equal(["e"], XPath.match(doc, "//e[name(preceding-sibling::*)='b']").map(&:name))
1519+
assert_equal(["e"], XPath.match(doc, "//e[string(preceding-sibling::*)='1']").map(&:name))
1520+
assert_equal(["e"], XPath.match(doc, "//e[number(preceding-sibling::*)=1]").map(&:name))
1521+
assert_equal(["e"], XPath.match(doc, "//e[10 + preceding-sibling::* = 11]").map(&:name))
1522+
assert_equal(["e"], XPath.match(doc, "//e[preceding-sibling::* = '1']").map(&:name))
1523+
end
15081524
end
15091525
end

0 commit comments

Comments
 (0)