-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[RDF] RDF.RSnapshotOptions with a distributed RDF #21756
Description
Check duplicate issues.
- Checked for duplicates
Description
With a distributed RDF, I would like to use ROOT.RDF.RSnapshotOptions() for writing a TTree into an existing file (and for e.g. lazy evaluation and other options), but I get the error shown at the end of this issue. For using the "UPDATE" option with Snapshot and distributed executor there's naturally the issue of the workers writing to the same file and how to manage this.
The issue is not critical, since the workaround is easy (not using the distributed RDF or not using the RSnapshotOptions and just e.g. hadding the files after distributed processing).
The Traceback of the error seems also confusing, since it says that the issue is TypeError: takes at most 4 arguments (5 given) although there's clearly 3 arguments being passed.
Kind of related to issue #17136.
Error:
Traceback (most recent call last):
File "/eos/home-n/ntoikka/rdf_dask_reproducer/reproducer.py", line 25, in <module>
try:
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/Proxy.py", line 273, in _create_new_op
return get_proxy_for(op, newnode)
File "/cvmfs/sft.cern.ch/lcg/releases/Python/3.13.11-41c2c/x86_64-el9-gcc13-opt/lib/python3.13/functools.py", line 934, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/Proxy.py", line 301, in _
return get_proxy_for.dispatch(Operation.InstantAction)(operation, node)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/Proxy.py", line 290, in _
execute_graph(node)
~~~~~~~~~~~~~^^^^^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/Proxy.py", line 58, in execute_graph
node.get_head().execute_graph()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/HeadNode.py", line 255, in execute_graph
returned_values = self._execute_and_retrieve_results(mapper, local_nodes)
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/HeadNode.py", line 205, in _execute_and_retrieve_results
return self.backend.ProcessAndMerge(self._build_ranges(), mapper, distrdf_reducer)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/Backends/Dask/Backend.py", line 215, in ProcessAndMerge
return final_results.compute()
~~~~~~~~~~~~~~~~~~~~~^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/python3.13/site-packages/dask/base.py", line 374, in compute
(result,) = compute(self, traverse=False, **kwargs)
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/python3.13/site-packages/dask/base.py", line 662, in compute
results = schedule(dsk, keys, **kwargs)
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/Backends/Dask/Backend.py", line 161, in dask_mapper
return mapper(current_range)
^^^^^^^^^^^^^^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/Backends/Base.py", line 105, in distrdf_mapper
mergeables = get_mergeable_values(rdf_plus.rdf, current_range.id, computation_graph_callable,
^^^^^^^^^^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/Backends/Base.py", line 62, in get_mergeable_values
actions = computation_graph_callable(starting_node, range_id, exec_id)
^^^^^^^^^^^^^^^^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/ComputationGraphGenerator.py", line 220, in trigger_computation_graph
_ACTIONS_REGISTER[exec_id] = generate_computation_graph(graph, starting_node, range_id)
^^^^^^^^^^^^^^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/ComputationGraphGenerator.py", line 187, in generate_computation_graph
rdf_node, in_task_op = _call_rdf_operation(node.operation, graph[node.parent_id].rdf_node, range_id)
^^^^^^^^^^^^^^^
File "/cvmfs/sft.cern.ch/lcg/releases/Python/3.13.11-41c2c/x86_64-el9-gcc13-opt/lib/python3.13/functools.py", line 934, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
^^^^^^^^^^^^^^^
File "/cvmfs/sft.cern.ch/lcg/views/LCG_109_swan/x86_64-el9-gcc13-opt/lib/DistRDF/ComputationGraphGenerator.py", line 135, in _call_rdf_operation
rdf_node = rdf_operation(*in_task_op.args, **in_task_op.kwargs)
^^^^^^^^^^^^^^^^^
TypeError: Template method resolution failed:
none of the 3 overloaded methods succeeded. Full details:
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(string_view treename, string_view filename, initializer_list<string> columnList, const ROOT::RDF::RSnapshotOptions& options = RSnapshotOptions()) =>
TypeError: takes at most 4 arguments (5 given)
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(string_view treename, string_view filename, const ROOT::RDF::ColumnNames_t& columnList, const ROOT::RDF::RSnapshotOptions& options = RSnapshotOptions()) =>
TypeError: takes at most 4 arguments (5 given)
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(string_view treename, string_view filename, string_view columnNameRegexp = "", const ROOT::RDF::RSnapshotOptions& options = RSnapshotOptions()) =>
TypeError: takes at most 4 arguments (5 given)
none of the 3 overloaded methods succeeded. Full details:
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(string_view treename, string_view filename, initializer_list<string> columnList, const ROOT::RDF::RSnapshotOptions& options = RSnapshotOptions()) =>
TypeError: takes at most 4 arguments (5 given)
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(string_view treename, string_view filename, const ROOT::RDF::ColumnNames_t& columnList, const ROOT::RDF::RSnapshotOptions& options = RSnapshotOptions()) =>
TypeError: takes at most 4 arguments (5 given)
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(string_view treename, string_view filename, string_view columnNameRegexp = "", const ROOT::RDF::RSnapshotOptions& options = RSnapshotOptions()) =>
TypeError: takes at most 4 arguments (5 given)
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(string_view treename, string_view filename, string_view columnNameRegexp = "", const ROOT::RDF::RSnapshotOptions& options = RSnapshotOptions()) =>
TypeError: takes at most 4 arguments (5 given)
Failed to instantiate "Snapshot(std::string,std::string,std::string,::ROOT::RDF::RSnapshotOptions*)"
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(string_view treename, string_view filename, string_view columnNameRegexp = "", const ROOT::RDF::RSnapshotOptions& options = RSnapshotOptions()) =>
TypeError: takes at most 4 arguments (5 given)
Reproducer
from dask.distributed import LocalCluster, Client
import ROOT
def create_connection():
cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=True, memory_limit="2GiB")
client = Client(cluster)
return client
if __name__ == "__main__":
connection = create_connection()
df0 = ROOT.RDataFrame(1000, executor=connection)
df1 = ROOT.RDataFrame(1000, executor=connection)
ROOT.gRandom.SetSeed(1)
df0 = df0.Define("gaus", "gRandom->Gaus(10, 1)").Define("exponential", "gRandom->Exp(10)")
df1 = df1.Define("gaus", "gRandom->Gaus(10, 1)").Define("exponential", "gRandom->Exp(10)")
# Create a snapshot without options
df0.Snapshot("df0", "df0.root")
# Create a snapshot to update the previous file(s)
options = ROOT.RDF.RSnapshotOptions()
options.fMode = "UPDATE"
try:
df1.Snapshot("df1", "df0.root", options=options) # this is a bit misleading, since the file df0.root doesn't exist because of the _0, _1 split that distribution causes
except Exception as e:
print(f"Error: {e}")
ROOT version
ROOT Version: 6.38.00
Built for linuxx8664gcc on Feb 03 2026, 21:53:45
From tags/6-38-00@6-38-00
Installation method
LCG 109
Operating system
EL9
Additional context
No response
Metadata
Metadata
Assignees
Labels
Type
Projects
Status