Skip to content

Audit dataset-method field-duplication in tensorflow.xml — comment cites 1-CFA but engine uses 2-CFA #549

@khatchad

Description

@khatchad

Summary

tensorflow.xml duplicates a block of <new> + <putfield> field attachments (shuffle, batch, repeat, prefetch, with_options, take, map, filter, concatenate, reduce, enumerate, …) across roughly ten Dataset-allocating methods (from_tensor_slices, TextLineDataset.do, shuffle.do, batch.do, etc.). The block is annotated as load-bearing with this comment on shuffle.do():

NOTE: This block of dataset field initializations is duplicated intentionally. See shuffle/do() for rationale.

The commit that documented the duplication (99a1a96e) attributes the requirement to WALA's 1-CFA context sensitivity limits.

What's stale

PythonTensorAnalysisEngine.getCallGraphBuilder (lines 95-119) explicitly applies 2-CFA to callees in the TF framework:

final ContextSelector targeted2CFA =
    new nCFAContextSelector(2, new ContextInsensitiveSelector());
...
if (calleeClass.contains(targetFramework)) {
  return targeted2CFA.getCalleeTarget(caller, site, callee, actualParameters);
}

TF callees — including all Dataset methods — use 2-CFA, not 1-CFA. The "load-bearing for 1-CFA" rationale on the XML block doesn't reflect the current engine state.

Proposed work

  1. Spike: in a worktree, strip the duplication from the dataset-allocating methods (keep only the canonical block in from_tensor_slices). Run mvn test from root.
  2. Based on outcome:
    • If green: remove the duplication for real; drop or simplify the comment. The maintenance burden of attaching each new dataset method to every fresh allocation goes away.
    • If regressions: keep the duplication but update the comment with the actual current rationale — likely structural ("chained dataset.shuffle(...).batch(...) requires fresh field attachments at each step so chained operations don't alias each other") rather than depth-of-context. The 1-CFA framing goes away either way.

Out of scope

Other intentional duplicates flagged in #425 (orphaned class blocks, wrong class paths). This issue is specifically about the dataset-method field-attachment block.

Background

Surfaced during the #509 design discussion when looking at whether the same XML-attribute-attachment pattern needed for set_shape could be deduplicated by analogy to the dataset case. Turned out the dataset case's rationale itself was stale.

Metadata

Metadata

Assignees

No one assigned

    Type

    No fields configured for Task.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions