-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Correct TChain::Add handling of partially loaded chain. #22672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,103 @@ | ||
| // Minimal reproducer: TChain::Add(TChain*) does not update the cached entry | ||
| // count. After merging chains this way, GetEntries() / GetEntriesFast() return | ||
| // a wrong (too small) value, even though every file and every entry is present | ||
| // in the merged chain (a LoadTree walk and CopyTree both see the full sample). | ||
| // | ||
| // Run: root -l -b -q chain_add_getentries.C | ||
| // | ||
| // Observed with ROOT 6.40.00. | ||
|
|
||
| #include "TChain.h" | ||
| #include "TFile.h" | ||
| #include "TTree.h" | ||
| #include "TRandom3.h" | ||
| #include "TSystem.h" | ||
| #include "TROOT.h" | ||
|
|
||
| namespace { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this then is executed as a macro, why this namespace?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (the macro is a copy/paste of the user report).
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah I see: great to have such reproducers delivered to us! Perhaps this can be streamlined before integrating. |
||
|
|
||
| // Create nfiles files under dir, each with a TTree "tree" holding a random | ||
| // number of entries. Returns the total number of entries written. | ||
| Long64_t make_sample(const std::string& dir, int nfiles, TRandom3& rng) | ||
| { | ||
| gSystem->mkdir(dir.c_str(), kTRUE); | ||
| Long64_t total = 0; | ||
| for (int f = 0; f < nfiles; ++f) { | ||
| const std::string fname = dir + "/f_" + std::to_string(f) + ".root"; | ||
| TFile out(fname.c_str(), "RECREATE"); | ||
| TTree t("tree", "tree"); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just adding here that time and again we've had issues with tests opening a TFile and creating a TTree on the stack. Maybe to avoid headeaches better to do both on the heap. |
||
| Double_t x; | ||
| t.Branch("x", &x); | ||
| const Long64_t n = 50 + rng.Integer(200); // 50..249 entries per file | ||
| for (Long64_t i = 0; i < n; ++i) { x = rng.Gaus(); t.Fill(); } | ||
| t.Write(); | ||
| out.Close(); | ||
| total += n; | ||
| } | ||
| return total; | ||
| } | ||
|
|
||
| TChain* make_chain(const std::string& dir, int nfiles) | ||
| { | ||
| TChain* c = new TChain("tree"); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are many places in the test with a bare new without a corresponding delete, it would be much better to handle these via unique_ptr |
||
| for (int f = 0; f < nfiles; ++f) | ||
| c->Add((dir + "/f_" + std::to_string(f) + ".root").c_str()); | ||
| return c; | ||
| } | ||
|
|
||
| } // namespace | ||
|
|
||
| int chain_add_getentries() | ||
| { | ||
| printf("ROOT version: %s\n\n", gROOT->GetVersion()); | ||
|
|
||
| TRandom3 rng(42); | ||
| const std::string base = "chain_add_repro_files"; | ||
| gSystem->Exec(("rm -rf " + base).c_str()); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe something xplatform would be better here? |
||
|
|
||
| // Three independent samples with different file counts. | ||
| const int nA = 4, nB = 7, nC = 5; | ||
| const Long64_t tA = make_sample(base + "/A", nA, rng); | ||
| const Long64_t tB = make_sample(base + "/B", nB, rng); | ||
| const Long64_t tC = make_sample(base + "/C", nC, rng); | ||
| const Long64_t truth = tA + tB + tC; | ||
| printf("entries written: A=%lld B=%lld C=%lld → true total = %lld\n", | ||
| tA, tB, tC, truth); | ||
|
|
||
| // One chain per sample; query each count (e.g. to print per-sample yields). | ||
| TChain* a = make_chain(base + "/A", nA); | ||
| TChain* b = make_chain(base + "/B", nB); | ||
| TChain* c = make_chain(base + "/C", nC); | ||
| printf("per-chain GetEntries(): a=%lld b=%lld c=%lld (sum=%lld)\n", | ||
| a->GetEntries(), b->GetEntries(), c->GetEntries(), | ||
| a->GetEntries() + b->GetEntries() + c->GetEntries()); | ||
|
|
||
| // Merge by adding TChains into a TChain. | ||
| a->Add(b); | ||
| a->Add(c); | ||
|
|
||
| bool result = (a->GetEntries() == truth); | ||
|
|
||
| printf("\nafter a->Add(b); a->Add(c):\n"); | ||
| printf(" GetEntries() = %lld ← %s (expected %lld)\n", | ||
| a->GetEntries(), result ? "Correct" : "WRONG", truth); | ||
| printf(" GetEntriesFast() = %lld ← %s\n", a->GetEntriesFast(), (a->GetEntriesFast() == truth) ? "Correct" : "WRONG"); | ||
| printf(" files in chain = %d (all present)\n", | ||
| a->GetListOfFiles()->GetEntries()); | ||
|
|
||
| // The entries really are all there: an explicit LoadTree walk reaches them. | ||
| Long64_t nav = 0; | ||
| while (a->LoadTree(nav) >= 0) ++nav; | ||
| printf(" reachable via LoadTree loop = %lld ← correct total\n", nav); | ||
|
|
||
| // Workaround: add the files into a single chain instead of chaining chains. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe I'm not following the logic well, but why is the workaround part of the test?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suppose it is no longer strictly necessary but harmless to do the additional cross check. I will update the comment. |
||
| TChain* good = new TChain("tree"); | ||
| for (int f = 0; f < nA; ++f) good->Add((base + "/A/f_" + std::to_string(f) + ".root").c_str()); | ||
| for (int f = 0; f < nB; ++f) good->Add((base + "/B/f_" + std::to_string(f) + ".root").c_str()); | ||
| for (int f = 0; f < nC; ++f) good->Add((base + "/C/f_" + std::to_string(f) + ".root").c_str()); | ||
| printf("\nsingle chain (Add per file): GetEntries() = %lld ← %s\n", | ||
| good->GetEntries(), (good->GetEntries() == truth) ? "Correct" : "WRONG"); | ||
|
|
||
| return result ? 0 : 1; | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -222,19 +222,25 @@ Int_t TChain::Add(TChain* chain) | |
| delete[] fTreeOffset; | ||
| fTreeOffset = trees; | ||
| } | ||
| chain->GetEntries(); //to force the computation of nentries | ||
|
|
||
| TIter next(chain->GetListOfFiles()); | ||
| Int_t nf = 0; | ||
| TChainElement* element = nullptr; | ||
| while ((element = (TChainElement*) next())) { | ||
| Long64_t nentries = element->GetEntries(); | ||
| if (fTreeOffset[fNtrees] == TTree::kMaxEntries) { | ||
| if (nentries == TTree::kMaxEntries) { | ||
| fEntries = TTree::kMaxEntries; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand this line completely. fEntries is the member of this chain (the one where Add(TChain *other) is being called). So why are we setting the number of entries to kMaxEntries if the entries of the other chain are not known? Aren't we losing info on the preexisting number of entries in this chain?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We are not. |
||
| fTreeOffset[fNtrees+1] = TTree::kMaxEntries; | ||
| } else { | ||
| fTreeOffset[fNtrees+1] = fTreeOffset[fNtrees] + nentries; | ||
| if (fTreeOffset[fNtrees] == TTree::kMaxEntries) { | ||
| fTreeOffset[fNtrees+1] = TTree::kMaxEntries; | ||
| } else { | ||
| fTreeOffset[fNtrees+1] = fTreeOffset[fNtrees] + nentries; | ||
| } | ||
| if (fEntries != TTree::kMaxEntries) | ||
| fEntries += nentries; | ||
| } | ||
| fNtrees++; | ||
| fEntries += nentries; | ||
| TChainElement* newelement = new TChainElement(element->GetName(), element->GetTitle()); | ||
| newelement->SetPacketSize(element->GetPacketSize()); | ||
| newelement->SetNumberEntries(nentries); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aclic'd macro works great. I wonder if an executable can be better, making testing a bit faster, of course if no dictionary is required to be generated on the fly.