Skip to content

Fix the next_block attribute of the new block in ControlFlowGraph.split_block#170

Merged
MatthieuDartiailh merged 2 commits into
MatthieuDartiailh:mainfrom
BergLucas:fix/split_block
Sep 3, 2025
Merged

Fix the next_block attribute of the new block in ControlFlowGraph.split_block#170
MatthieuDartiailh merged 2 commits into
MatthieuDartiailh:mainfrom
BergLucas:fix/split_block

Conversation

@BergLucas
Copy link
Copy Markdown
Contributor

Hi,

First of all, thank you for maintaining this library, it is really useful and complete!

I'm doing this little PR because while I was refactoring Pynguin's instrumentation component (se2p/pynguin#110), I encountered a small bug in the function ControlFlowGraph.split_block. When splitting a BasicBlock, I think that the next_block attribute of the newly created block should be assigned to the previous next_block attribute, but it is not.

If I'm not mistaken, the execution of the function should produce this result:

Input:
BasicBlock1.next_block = BasicBlock2

Expected result:
BasicBlock1.next_block = BasicBlock3
BasicBlock3.next_block = BasicBlock2

However, it produces this result instead:

Actual result:
BasicBlock1.next_block = BasicBlock3
BasicBlock3.next_block = None

Is this a bug, or is it expected behaviour?

Thank you in advance, and have a nice day!

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Aug 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.43%. Comparing base (12df292) to head (29d660f).
⚠️ Report is 18 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #170   +/-   ##
=======================================
  Coverage   95.43%   95.43%           
=======================================
  Files           7        7           
  Lines        2146     2147    +1     
  Branches      481      481           
=======================================
+ Hits         2048     2049    +1     
  Misses         56       56           
  Partials       42       42           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MatthieuDartiailh
Copy link
Copy Markdown
Owner

I will need a bit more time to double check things but this makes sense to me. Could you modify a test to enforce this ?

@BergLucas
Copy link
Copy Markdown
Contributor Author

Here you go, I've modified the few tests about the split_block method to check that it works in each case.

@BergLucas
Copy link
Copy Markdown
Contributor Author

I forgot to mention this, but I found this potential bug because, for now, it's possible that a TryBegin instruction isn't the last instruction in a BasicBlock. However, in my use case, I needed the TryBegin instructions to always be at the end because they jump to another BasicBlock when an exception is thrown. Therefore, I used the split_block function to enforce this rule. But thinking about it, in a basic block, only the last instruction should able to jump to another block, so I wonder if this isn't another bug.

@MatthieuDartiailh
Copy link
Copy Markdown
Owner

Indeed !
The introduction of TryBegin was quite a challenge at the time and I never took the time to think about the possible consequences pour the CFG. Currently, we only introduce block for opcode appearing on opcode.hasjmp. We could extend the logic to opcode in opcode.hasexp when inside a TryBegin (note however that any such opcode could cause the execution of a code block to terminate when encountered outside of a TryBegin). This would likely require to keep track of the target block in case an exception is raised.
I would be open in a PR in that sense if the splitting of block for exception was optional.

Note that the approach you describe is somewhat wrong since it is never the TryBegin that will cause the jump.

@BergLucas
Copy link
Copy Markdown
Contributor Author

Yes, you're right. When I said that the TryBegin instructions caused the jumps, I meant that it was in these instructions that we specified the basic blocks that should be executed in case of exceptions.

I was mainly making this comment because at first, I didn't notice that these instructions weren't necessarily at the end of the basic blocks as, in Python 3.10, their equivalent were the SETUP_FINALLY instructions which were in opcode.hasjrel so they were always at the end and I expected the TryBegin instructions to have the same behaviour.

@MatthieuDartiailh
Copy link
Copy Markdown
Owner

Could you describe a bit more what is your use case and what invariants you need the CFG to uphold ?
I think there is room for improvement, but I would like a concrete example to design the right API.

@BergLucas
Copy link
Copy Markdown
Contributor Author

Hi,

Thank you very much for your quick responses!

To summarise, my use case is to create a networkx.DiGraph that matches a ControlFlowGraph so that I can navigate it more easily and add some instrumentation instructions in the right places.

Here is an example of code with explanations below:

from bytecode.instr import Instr, TryBegin, InstrArg
from bytecode import Bytecode, ControlFlowGraph
import matplotlib.pyplot as plt
import networkx as nx


def foo(x: int):
    try:
        return 10 / 0
    except ZeroDivisionError:
        print("Division by zero error")


cfg = ControlFlowGraph.from_bytecode(Bytecode.from_code(foo.__code__))


def display_instr(instr: InstrArg):
    if isinstance(instr, Instr):
        return f"{instr.name} {instr.arg if isinstance(instr.arg, (int, str)) else ''}"

    return str(instr.__class__.__name__)


def display_graph(graph: nx.DiGraph):
    pos = nx.nx_agraph.graphviz_layout(graph, prog="dot")
    labels = {
        n: "\n".join(display_instr(instr) for instr in d["basic_block"])
        for n, d in graph.nodes(data=True)
    }
    nx.draw(graph, pos, labels=labels)
    plt.show()


def convert3_10(control_flow_graph: ControlFlowGraph) -> nx.DiGraph:
    graph = nx.DiGraph()

    for basic_block in control_flow_graph:
        graph.add_node(id(basic_block), basic_block=basic_block)

    for basic_block in control_flow_graph:
        jump = basic_block.get_jump()
        next_block = basic_block.next_block

        if next_block is not None:
            graph.add_edge(id(basic_block), id(next_block))

        if jump is not None:
            graph.add_edge(id(basic_block), id(jump))

    return graph


display_graph(convert3_10(cfg))


def convert3_11(control_flow_graph: ControlFlowGraph) -> nx.DiGraph:
    graph = nx.DiGraph()

    for basic_block in control_flow_graph:
        graph.add_node(id(basic_block), basic_block=basic_block)

    for basic_block in control_flow_graph:
        last_instr = basic_block[-1]
        if isinstance(last_instr, TryBegin):
            jump = last_instr.target
        else:
            jump = basic_block.get_jump()

        next_block = basic_block.next_block

        if next_block is not None:
            graph.add_edge(id(basic_block), id(next_block))

        if jump is not None:
            graph.add_edge(id(basic_block), id(jump))

    return graph


display_graph(convert3_11(cfg))

Python 3.10

The convert3_10 function shows a simplified implementation of what I was doing in Python 3.10 to create a DiGraph matching a ControlFlowGraph. In this case, the invariant that exists on BasicBlocks stating that we can only jump to another BasicBlock at the last instruction of the block is respected.

The use of the convert3_10 function on the foo function gives the following DiGraph:

310

In this graph, we can see that there is a SETUP_FINALLY instruction that indicates the BasicBlock to execute if an exception is raised.

Python 3.11

In Python 3.11, the SETUP_FINALLY instruction was replaced by a new system that significantly changed the way exceptions are handled. You managed this by creating the TryBegin and TryEnd instructions, which, in my opinion, represent the new system in a rather elegant way. Although these instructions do not actually exist in the Python bytecode, I thought that from the viewpoint of the bytecode library, they would work as if they were real instructions and follow the same rules as real instructions. Therefore, I assumed that the invariant on BasicBlocks would be respected and that by looking only at the last instruction, I would be able to easily find the BasicBlock to which we jump when an exception is raised. Thus, I updated the convert3_10 function, which resulted in the convert3_11 function. However, currently, the TryBegin instructions may violate the invariant, which may be the expected behaviour, but it breaks the convert3_11 function. So I wondered whether it was a bug or the expected behaviour that the TryBegin instructions were not necessarily at the end of the BasicBlocks, given that their equivalent in Python 3.10 (the SETUP_FINALLY instructions) were basically doing the same thing and were always at the end.

The use of the convert3_11 function on the foo function gives the following broken DiGraph:

311

It was more of a thought that occurred to me than a change I absolutely need in the library. If the current behaviour is the one you think is the best, then leave it as it is, and I'll use the split_block function to enforce the behaviour I want only in my use case.

@MatthieuDartiailh
Copy link
Copy Markdown
Owner

Thanks for your answer. It does help me understand your use case.

However it still seem somewhat backward to me since neither SETUP_FINALLY nor TryBegin can actually jump and only set a target for a possible jump in a later instruction. So I won't change the current behavior to align with your expectation.

Since the existence of exception mean the execution can be interrupted at any block I do not see how to make a usable CFG that cwould only raise exception on the last instruction of a block. However there may be room to make it easier to figure out if instruction within a block are within an exception handling region.

I will merge this shortly and release it as part of the next version which adds support for 3.14.

@MatthieuDartiailh MatthieuDartiailh merged commit 06df583 into MatthieuDartiailh:main Sep 3, 2025
14 checks passed
@BergLucas BergLucas deleted the fix/split_block branch September 3, 2025 21:53
@BergLucas
Copy link
Copy Markdown
Contributor Author

Thank you! And no problem, as long as the split_block function is fixed, I can get the behaviour I expected myself, so it's not a big deal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants