
Commit 1fa5145

Phase 13.19.ADF.FIX1 + Phase 13.20.ADF: vector draw diagnostic + export_tree performance
Two phases combined in one commit (both non-invasive, both verified).

=== Phase 13.19.ADF.FIX1: vector draw() kwarg diagnostic (no source changes) ===

Diagnostic test suite for the vector draw() kwarg propagation bug. Architect reproducer: aDF.draw('[y1..y6]:staveITS', group_by='mP3', group_by_bins=6) produced 421 legend entries instead of 6.

K1 tests (boundary diagnostic, monkey-patching DFDraw methods):
  K1_1 signature binding        : PASS
  K1_2 single-expr forwarding   : PASS
  K1_3 draw_batch forwarding    : PASS
  K1_4 draw_figures forwarding  : SKIP (spec format)
  K1_5 vector-expr forwarding   : PASS
Conclusion: ADF forwards kwargs correctly; the bug was internal to dfdraw.

K2 tests (end-to-end output verification, after dfdraw FIX1 fe007b7):
  K2_1 scalar baseline                : PASS
  K2_2 vector+group_by legend bounded : PASS (gate)
  K2_3 production reproducer mirror   : PASS (architect repro)
  K2_4 title not duplicated           : PASS
Verified that dfdraw FIX1 works end-to-end through the ADF pipeline. No ADF source changes are needed for this bug.

=== Phase 13.20.ADF: export_tree metadata batching (AliasDataFrame.py) ===

Performance fix: with N subframes, export_tree opened the ROOT file N+1 times to write metadata. Each TFile.Open('UPDATE') on a 500MB+ file costs ~5-8 s, so the 15 subframes used in production TPC calibration add ~80-130 s of overhead (16 opens at 5-8 s each).

Fix: separate the data write (uproot, unchanged) from the metadata write (ROOT), and batch all metadata into a single TFile.Open/Close via new methods:
  _write_all_data_to_uproot    : recursive data write, no ROOT
  _collect_metadata_targets    : builds a flat (adf, treename) list
  _write_all_metadata_to_root  : single TFile.Open for all trees
  _write_metadata_to_tree      : writes to an already-open TFile

Also fixes a latent nested-subframe crash: the old code passed the uproot file object to ROOT.TFile.Open in the recursive path (line 4800-4801).

E1 roundtrip tests (4/4 PASS): 0, 3, 15 subframes + timing.
E2 fix-specific tests (3/4 PASS, 1 xfail):
  E2_1 single TFile.Open per export : PASS (core gate)
  E2_2 nested subframe roundtrip    : xfail (pre-existing read_tree limitation: read_tree does not recursively load nested subframes; after this fix, export writes them correctly)
  E2_3 backward compatibility       : PASS
  E2_4 standalone metadata write    : PASS

Profile evidence: profile_gr11_tf0.prof, 159 s (11%) of 1452 s total. Expected savings: 80-130 s on production files.
Cross-team review: Consolidated Performance Review item AliceO2Group#12.
Test count: 1510 passed, 7 failed, 1 error, 8 skipped. Pre-existing failures are unchanged (6F+1E from the Phase 13.18 baseline); the only new failure is E2_2 (xfail, pre-existing read_tree limitation).
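The K2_2 gate checks that a vector draw grouped by a continuous column stays bounded by group_by_bins. The commit does not say how the 421 entries arose. The sketch below assumes they came from grouping by raw values once group_by_bins was lost, and it also assumes equal-width binning via pandas.cut (dfdraw's actual binning may differ). The helper n_legend_entries is hypothetical and is not part of dfdraw or ADF.

```python
import numpy as np
import pandas as pd


def n_legend_entries(values, group_by_bins=None):
    """Hypothetical helper: number of groups (legend entries) a group_by column yields.

    With group_by_bins set, the column is cut into that many equal-width
    intervals (pd.cut); without it, every distinct raw value is its own group.
    """
    s = pd.Series(values)
    if group_by_bins is not None:
        s = pd.cut(s, bins=group_by_bins)
    # nunique counts the distinct non-NaN values (observed intervals when binned)
    return s.nunique()


# 421 distinct values of a continuous variable standing in for mP3
mP3 = np.linspace(0.0, 1.0, 421)
print(n_legend_entries(mP3))                   # 421: kwarg lost, one entry per value
print(n_legend_entries(mP3, group_by_bins=6))  # 6: bounded by group_by_bins
```

The 421 distinct values are chosen only to match the reproducer's count; the invariant the gate relies on is that the binned count can never exceed group_by_bins.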
1 parent 1f3b45b commit 1fa5145

3 files changed

Lines changed: 657 additions & 15 deletions


UTILS/dfextensions/AliasDataFrame/AliasDataFrame.py

Lines changed: 72 additions & 15 deletions
@@ -4807,37 +4807,95 @@ def export_tree(self, filename_or_file, treename="tree", dropAliasColumns=True,
                 f[treename] = {col: export_df[col].values for col in export_df.columns}
             return

-        # Full export mode: existing behavior
+        # Full export mode: two-phase write (Phase 13.20.ADF Fix A)
+        # Phase 1: write ALL tree data via uproot (single file open)
+        # Phase 2: write ALL metadata via ROOT (single TFile.Open)
+        # Previous code opened TFile N+1 times for N subframes.
         is_path = isinstance(filename_or_file, str)

         if is_path:
-            with uproot.recreate(filename_or_file,compression=compression) as f:
-                self._write_to_uproot(f, treename, dropAliasColumns)
-            self._write_metadata_to_root(filename_or_file, treename)
+            # Phase 1: uproot data write (main tree + all subframes recursively)
+            with uproot.recreate(filename_or_file, compression=compression) as f:
+                self._write_all_data_to_uproot(f, treename, dropAliasColumns)
+            # Phase 2: ROOT metadata write (single TFile.Open for all trees)
+            self._write_all_metadata_to_root(filename_or_file, treename)
         else:
-            self._write_to_uproot(filename_or_file, treename, dropAliasColumns)
-            for subframe_name, entry in self._subframes.items():
-                entry["frame"]._write_metadata_to_root(filename_or_file, f"{treename}__subframe__{subframe_name}")
+            # Called from recursive data-write path — data only, no metadata
+            self._write_all_data_to_uproot(filename_or_file, treename, dropAliasColumns)

-    def _write_to_uproot(self, uproot_file, treename, dropAliasColumns):
+    def _write_all_data_to_uproot(self, uproot_file, treename, dropAliasColumns):
+        """Write tree data for self + all subframes recursively. No metadata, no TFile.Open."""
         export_cols = [col for col in self.df.columns if not dropAliasColumns or col not in self.aliases]
         dtype_casts = {col: np.float32 for col in export_cols if self.df[col].dtype == np.float16}
         export_df = self.df[export_cols].astype(dtype_casts)

-        #uproot_file[treename] = export_df
         uproot_file[treename] = {col: export_df[col].values for col in export_df.columns}
+        # Recurse for subframes — data only, no metadata
         for subframe_name, entry in self._subframes.items():
-            entry["frame"].export_tree(uproot_file, f"{treename}__subframe__{subframe_name}", dropAliasColumns)
+            sf_treename = f"{treename}__subframe__{subframe_name}"
+            entry["frame"]._write_all_data_to_uproot(uproot_file, sf_treename, dropAliasColumns)
+
+    def _collect_metadata_targets(self, treename):
+        """
+        Recursively collect (adf_instance, treename) pairs for all trees needing metadata.
+
+        Returns a flat list: [(self, treename), (sf1, sf1_treename), (sf2, sf2_treename), ...].
+        Used by _write_all_metadata_to_root to write everything in a single TFile.Open.
+        """
+        targets = [(self, treename)]
+        for sf_name, entry in self._subframes.items():
+            sf_treename = f"{treename}__subframe__{sf_name}"
+            targets.extend(entry["frame"]._collect_metadata_targets(sf_treename))
+        return targets
+
+    def _write_all_metadata_to_root(self, filename, treename):
+        """
+        Write metadata for main tree + all subframes in a single TFile.Open.
+
+        Phase 13.20.ADF Fix A: replaces N+1 separate TFile.Open/Close cycles
+        with 1, saving ~80-130s on production files with 15+ subframes.
+        """
+        targets = self._collect_metadata_targets(treename)
+        f = ROOT.TFile.Open(filename, "UPDATE")
+        try:
+            for adf_instance, tree_name in targets:
+                adf_instance._write_metadata_to_tree(f, tree_name)
+        finally:
+            f.Close()

     def _write_metadata_to_root(self, filename, treename):
         """
-        Write schema metadata to ROOT file.
-
+        Write schema metadata to ROOT file (backward-compatible standalone entry point).
+
+        Opens TFile, writes metadata for this tree only, closes.
+        For batch writing (main + subframes), use _write_all_metadata_to_root instead.
+        """
+        f = ROOT.TFile.Open(filename, "UPDATE")
+        try:
+            self._write_metadata_to_tree(f, treename)
+        finally:
+            f.Close()
+
+    def _write_metadata_to_tree(self, open_tfile, treename):
+        """
+        Write schema metadata to an already-open TFile. No open/close.
+
+        Phase 13.20.ADF Fix A: extracted from _write_metadata_to_root so that
+        _write_all_metadata_to_root can call it N times within a single
+        TFile.Open context.
+
         Phase 4b: Uses unified schema serialization format.
         Also sets TTree aliases for ROOT TTree::Draw compatibility.
         """
-        f = ROOT.TFile.Open(filename, "UPDATE")
-        tree = f.Get(treename)
+        tree = open_tfile.Get(treename)
+        if not tree:
+            import warnings
+            warnings.warn(
+                f"_write_metadata_to_tree: tree '{treename}' not found in file. "
+                f"Metadata for this tree will not be written.",
+                RuntimeWarning
+            )
+            return

         # Set TTree aliases for ROOT compatibility
         for alias, expr in self.aliases.items():
@@ -4877,7 +4935,6 @@ def _write_metadata_to_root(self, filename, treename):
         jmeta = json.dumps(metadata)
         tree.GetUserInfo().Add(ROOT.TObjString(jmeta))
         tree.Write("", ROOT.TObject.kOverwrite)
-        f.Close()

     @staticmethod
     def read_tree(filename, treename="tree", entry_start=None, entry_stop=None,
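The batching pattern in this diff can be exercised without ROOT. The sketch below is a minimal model, not ADF code: Frame, FakeTFile, CountingOpener, write_metadata_per_tree and write_metadata_batched are hypothetical stand-ins, with CountingOpener taking the place of ROOT.TFile.Open so the number of opens can be asserted in the spirit of E2_1. Target collection follows the same recursion and "__subframe__" naming scheme as _collect_metadata_targets.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical stand-in for AliasDataFrame: only the _subframes layout is modelled."""
    _subframes: dict = field(default_factory=dict)

    def collect_metadata_targets(self, treename):
        # Same traversal as _collect_metadata_targets: self first, then every
        # subframe (recursively) named "<parent>__subframe__<name>".
        targets = [(self, treename)]
        for sf_name, entry in self._subframes.items():
            sf_treename = f"{treename}__subframe__{sf_name}"
            targets.extend(entry["frame"].collect_metadata_targets(sf_treename))
        return targets


class FakeTFile:
    """Hypothetical stand-in for an open ROOT.TFile."""

    def __init__(self):
        self.written = []  # tree names whose metadata was "written"
        self.closed = False

    def Close(self):
        self.closed = True


class CountingOpener:
    """Hypothetical replacement for ROOT.TFile.Open that counts opens."""

    def __init__(self):
        self.opens = 0
        self.files = []

    def __call__(self, filename, mode):
        self.opens += 1
        f = FakeTFile()
        self.files.append(f)
        return f


def write_metadata_per_tree(frame, filename, treename, open_file):
    """Models the old cost: one open/close cycle per tree (N+1 for N subframes)."""
    for _, name in frame.collect_metadata_targets(treename):
        f = open_file(filename, "UPDATE")
        try:
            f.written.append(name)
        finally:
            f.Close()


def write_metadata_batched(frame, filename, treename, open_file):
    """Models the fix: collect all targets first, then write them under one open."""
    targets = frame.collect_metadata_targets(treename)
    f = open_file(filename, "UPDATE")
    try:
        for _, name in targets:
            f.written.append(name)  # stands in for _write_metadata_to_tree(f, name)
    finally:
        f.Close()  # runs even if one of the writes raises


# 15 flat subframes, as in the production case described in the commit
main = Frame({f"sf{i}": {"frame": Frame()} for i in range(15)})
old, new = CountingOpener(), CountingOpener()
write_metadata_per_tree(main, "out.root", "tree", old)
write_metadata_batched(main, "out.root", "tree", new)
print(old.opens, new.opens)  # 16 1
```

Collecting targets first keeps the recursion free of I/O, so a single open/close pair guarded by try/finally brackets every metadata write, including those for nested subframes.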
