Audit dataset-method field-duplication in `tensorflow.xml` — comment cites 1-CFA but engine uses 2-CFA

## Summary

`tensorflow.xml` duplicates a block of `<new>` + `<putfield>` field attachments (`shuffle`, `batch`, `repeat`, `prefetch`, `with_options`, `take`, `map`, `filter`, `concatenate`, `reduce`, `enumerate`, …) across roughly ten Dataset-allocating methods (`from_tensor_slices`, `TextLineDataset.do`, `shuffle.do`, `batch.do`, etc.). The block is annotated as load-bearing with this comment on `shuffle.do()`:

> NOTE: This block of dataset field initializations is duplicated intentionally. See shuffle/do() for rationale.

The commit that documented the duplication (`99a1a96e`) attributes the requirement to **WALA's 1-CFA context sensitivity limits**.

## What's stale

`PythonTensorAnalysisEngine.getCallGraphBuilder` (lines 95-119) explicitly applies **2-CFA** to callees in the TF framework:

```java
final ContextSelector targeted2CFA =
    new nCFAContextSelector(2, new ContextInsensitiveSelector());
...
if (calleeClass.contains(targetFramework)) {
  return targeted2CFA.getCalleeTarget(caller, site, callee, actualParameters);
}
```

TF callees — including all Dataset methods — use 2-CFA, not 1-CFA. The "load-bearing for 1-CFA" rationale on the XML block doesn't reflect the current engine state.

## Proposed work

1. **Spike**: in a worktree, strip the duplication from the dataset-allocating methods (keep only the canonical block in `from_tensor_slices`). Run `mvn test` from root.
2. Based on outcome:
   - **If green**: remove the duplication for real; drop or simplify the comment. The maintenance burden of attaching each new dataset method to every fresh allocation goes away.
   - **If regressions**: keep the duplication but update the comment with the *actual* current rationale — likely structural ("chained `dataset.shuffle(...).batch(...)` requires fresh field attachments at each step so chained operations don't alias each other") rather than depth-of-context. The 1-CFA framing goes away either way.

## Out of scope

Other intentional duplicates flagged in wala/ML#425 (orphaned class blocks, wrong class paths). This issue is specifically about the dataset-method field-attachment block.

## Background

Surfaced during the wala/ML#509 design discussion when looking at whether the same XML-attribute-attachment pattern needed for `set_shape` could be deduplicated by analogy to the dataset case. Turned out the dataset case's rationale itself was stale.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Audit dataset-method field-duplication in `tensorflow.xml` — comment cites 1-CFA but engine uses 2-CFA #549

Summary

What's stale

Proposed work

Out of scope

Background

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Audit dataset-method field-duplication in tensorflow.xml — comment cites 1-CFA but engine uses 2-CFA #549

Description

Summary

What's stale

Proposed work

Out of scope

Background

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Audit dataset-method field-duplication in `tensorflow.xml` — comment cites 1-CFA but engine uses 2-CFA #549