Summary
tensorflow.xml duplicates a block of <new> + <putfield> field attachments (shuffle, batch, repeat, prefetch, with_options, take, map, filter, concatenate, reduce, enumerate, …) across roughly ten Dataset-allocating methods (from_tensor_slices, TextLineDataset.do, shuffle.do, batch.do, etc.). The block is annotated as load-bearing with this comment on shuffle.do():
NOTE: This block of dataset field initializations is duplicated intentionally. See shuffle/do() for rationale.
The commit that documented the duplication (99a1a96e) attributes the requirement to WALA's 1-CFA context sensitivity limits.
What's stale
PythonTensorAnalysisEngine.getCallGraphBuilder (lines 95-119) explicitly applies 2-CFA to callees in the TF framework:
final ContextSelector targeted2CFA =
new nCFAContextSelector(2, new ContextInsensitiveSelector());
...
if (calleeClass.contains(targetFramework)) {
return targeted2CFA.getCalleeTarget(caller, site, callee, actualParameters);
}
TF callees — including all Dataset methods — use 2-CFA, not 1-CFA. The "load-bearing for 1-CFA" rationale on the XML block doesn't reflect the current engine state.
Proposed work
- Spike: in a worktree, strip the duplication from the dataset-allocating methods (keep only the canonical block in
from_tensor_slices). Run mvn test from root.
- Based on outcome:
- If green: remove the duplication for real; drop or simplify the comment. The maintenance burden of attaching each new dataset method to every fresh allocation goes away.
- If regressions: keep the duplication but update the comment with the actual current rationale — likely structural ("chained
dataset.shuffle(...).batch(...) requires fresh field attachments at each step so chained operations don't alias each other") rather than depth-of-context. The 1-CFA framing goes away either way.
Out of scope
Other intentional duplicates flagged in #425 (orphaned class blocks, wrong class paths). This issue is specifically about the dataset-method field-attachment block.
Background
Surfaced during the #509 design discussion when looking at whether the same XML-attribute-attachment pattern needed for set_shape could be deduplicated by analogy to the dataset case. Turned out the dataset case's rationale itself was stale.
Summary
tensorflow.xmlduplicates a block of<new>+<putfield>field attachments (shuffle,batch,repeat,prefetch,with_options,take,map,filter,concatenate,reduce,enumerate, …) across roughly ten Dataset-allocating methods (from_tensor_slices,TextLineDataset.do,shuffle.do,batch.do, etc.). The block is annotated as load-bearing with this comment onshuffle.do():The commit that documented the duplication (
99a1a96e) attributes the requirement to WALA's 1-CFA context sensitivity limits.What's stale
PythonTensorAnalysisEngine.getCallGraphBuilder(lines 95-119) explicitly applies 2-CFA to callees in the TF framework:TF callees — including all Dataset methods — use 2-CFA, not 1-CFA. The "load-bearing for 1-CFA" rationale on the XML block doesn't reflect the current engine state.
Proposed work
from_tensor_slices). Runmvn testfrom root.dataset.shuffle(...).batch(...)requires fresh field attachments at each step so chained operations don't alias each other") rather than depth-of-context. The 1-CFA framing goes away either way.Out of scope
Other intentional duplicates flagged in #425 (orphaned class blocks, wrong class paths). This issue is specifically about the dataset-method field-attachment block.
Background
Surfaced during the #509 design discussion when looking at whether the same XML-attribute-attachment pattern needed for
set_shapecould be deduplicated by analogy to the dataset case. Turned out the dataset case's rationale itself was stale.