Potential typo in the "Inference with metadata CFG" figure in Pi0.7: Text Prompt (+) attends to the negative branch? #934

@aiming1998

Hi Physical Intelligence team,

First of all, thank you for the incredible work on Pi0.7 and the highly insightful technical report!
While studying the sequence-packing strategy for inference-time CFG, I noticed a potential discrepancy between the text and the attention-mask figure titled "Inference with metadata CFG" (Fig. 19).

  1. The Text Description
    The text states that, for efficient inference, the positive and negative examples are packed into a single sequence that forms an attention tree with two branches "which do not attend to one another." This is consistent with standard CFG, where the conditional and negative predictions must be computed independently (effectively two separate forward passes packed into one sequence) before being combined.

  2. The Visual Discrepancy
    However, the attention-mask matrix appears to show something different in the Text Prompt (+) rows.
    Moving horizontally left from the causal-mask triangle of Text Prompt (+), there are solid grey blocks under both the Text Prompt (-) and Flow actions (-) columns, which would mean the positive prompt attends to the entire negative branch.
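
A minimal sketch of the mask that the text description implies, written only to make the expected pattern concrete. It is not Pi0.7's implementation. Assumptions (all hypothetical): a shared observation prefix visible to both branches, the column order Text Prompt (-), Flow actions (-), Text Prompt (+), Flow actions (+), arbitrary segment lengths, causal attention inside each text prompt, and bidirectional attention inside each action block that can also read its own branch's prompt. The names `LAYOUT`, `build_cfg_mask`, `cross_branch_pairs`, and `render` are illustrative.

```python
from typing import List, Optional, Tuple

# One packed segment: (name, token count, branch, causal within the segment).
# branch=None marks a hypothetical shared prefix that both CFG branches may read.
Segment = Tuple[str, int, Optional[str], bool]

# Hypothetical layout following the figure's column order; lengths are arbitrary.
LAYOUT: List[Segment] = [
    ("observation", 3, None, False),
    ("text_prompt_neg", 2, "neg", True),
    ("flow_actions_neg", 2, "neg", False),
    ("text_prompt_pos", 2, "pos", True),
    ("flow_actions_pos", 2, "pos", False),
]


def token_index(layout: List[Segment]) -> List[Tuple[int, int]]:
    """Map every packed token to (segment index, position inside that segment)."""
    return [(s, p) for s, (_, length, _, _) in enumerate(layout) for p in range(length)]


def build_cfg_mask(layout: List[Segment]) -> List[List[bool]]:
    """Return mask where mask[q][k] is True if query token q may attend to key token k."""
    tokens = token_index(layout)
    mask = [[False] * len(tokens) for _ in tokens]
    for q, (qs, qp) in enumerate(tokens):
        q_branch = layout[qs][2]
        for k, (ks, kp) in enumerate(tokens):
            k_branch = layout[ks][2]
            if k_branch is None:
                allowed = True  # the shared prefix is visible to every token
            elif k_branch != q_branch:
                allowed = False  # prefix never reads branches; branches never read each other
            elif ks < qs:
                allowed = True  # earlier segment of the same branch (e.g. actions -> own prompt)
            elif ks == qs:
                allowed = kp <= qp if layout[qs][3] else True  # causal vs. bidirectional
            else:
                allowed = False  # later segment of the same branch
            mask[q][k] = allowed
    return mask


def cross_branch_pairs(layout: List[Segment], mask: List[List[bool]]) -> List[Tuple[int, int]]:
    """List (query, key) pairs where one CFG branch attends to the other."""
    branch = [layout[s][2] for s, _ in token_index(layout)]
    return [
        (q, k)
        for q, row in enumerate(mask)
        for k, allowed in enumerate(row)
        if allowed and None not in (branch[q], branch[k]) and branch[q] != branch[k]
    ]


def render(mask: List[List[bool]]) -> str:
    """Draw the mask row by row: '#' = attend, '.' = masked."""
    return "\n".join("".join("#" if a else "." for a in row) for row in mask)


if __name__ == "__main__":
    mask = build_cfg_mask(LAYOUT)
    print(render(mask))
    print("cross-branch pairs:", cross_branch_pairs(LAYOUT, mask))
```

Under these assumptions, the Text Prompt (+) rows contain no `#` under the negative-branch columns, and `cross_branch_pairs` returns an empty list. Setting any such entry to `True`, as the grey blocks in the figure appear to do, makes `cross_branch_pairs` report it.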

My questions:

  1. Is this simply a minor error in the figure, i.e., should the area to the left of the Text Prompt (+) triangle be fully masked out to guarantee strict isolation between the branches?
  2. Or is there a deliberate architectural reason for the positive text prompt to attend to the negative prompt and actions? If so, wouldn't this break the independence of the two CFG branches, since the positive branch's output would then depend on the negative one?

Looking forward to your clarification! Thanks again for the great work.
