--limit <group> still merges non-selected groups' data, causing host data overwrites #1756

@etiennecollin

Describe the bug

When using --limit <group_name>, host data from other groups is still merged into the inventory before filtering occurs. As a result, if the same host exists in multiple groups, the last declared group's data overwrites the others for conflicting keys, even when that group is not part of the limit selection.

This makes it impossible to safely define environment or role-specific host data for the same host across multiple groups when relying on --limit.

For example, given two groups containing the same host with different values for important_data, running with --limit group_a still results in the value from group_b being applied if group_b is declared later in the inventory.

To Reproduce

A user may want several groups to assign different data to the same host, and then select one of them with the --limit <group_name> flag. For example, consider this sample inventory file:

group_a = [
    (
        "@local",
        {
            "important_data": "foo"
        }, # Some data, let's call it data_a
    )
]

group_b = [
    (
        "@local",
        {
            "important_data": "bar"
        }, # Some other data, let's call it data_b
    )
]

In this case, deploying with --limit group_a uses data_b, because data_b overwrites data_a. In other words, reading important_data returns bar instead of the expected foo.

Here is a partial call chain showing where the overwrite happens when running pyinfra inventory.py deploy.py --limit group_a:

  • pyinfra_cli/cli.py:383 where inventory = make_inventory( is called
  • pyinfra_cli/inventory.py:206 where return make_inventory_from_files(inventory, override_data, cwd, group_data_directories) is called
  • pyinfra_cli/inventory.py:362 where fake_inventory = Inventory((all_hosts, all_data), **fake_groups) is called
  • pyinfra/api/inventory.py:58 where self.make_hosts_and_groups(names, groups) is called
  • pyinfra/api/inventory.py:75 where name_to_data[name].update(data) is called

In this case, if the same key is used across multiple groups for the same host, the last declared value overwrites the earlier ones. Later in the make_hosts_and_groups function, host_data is created with host_data = name_to_data[name] and is used to fill self.host_data[sub_name] = sub_data, which is used to initialize the Host objects.
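The overwrite can be reproduced without pyinfra. The following is a minimal sketch that only models the name_to_data[name].update(data) pattern from the call chain above; the groups dict and the merge_host_data helper are hypothetical names used for illustration and are not pyinfra's actual code.

```python
from collections import defaultdict

# Simplified stand-in for the two inventory groups from the example above.
groups = {
    "group_a": [("@local", {"important_data": "foo"})],
    "group_b": [("@local", {"important_data": "bar"})],
}


def merge_host_data(groups):
    """Hypothetical helper: merge per-host data from every group, in declaration order."""
    name_to_data = defaultdict(dict)
    for hosts in groups.values():
        for name, data in hosts:
            # Same pattern as the reported line: on conflicting keys,
            # the group declared last wins.
            name_to_data[name].update(data)
    return dict(name_to_data)


merged = merge_host_data(groups)
print(merged["@local"]["important_data"])  # prints "bar"
```

Because every group is merged before any filtering, a later limit to group_a has no way to recover foo: the value was already replaced.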

Expected behavior

When using --limit, only the selected groups should contribute host data during inventory construction. In the example above, running:

pyinfra inventory.py deploy.py --limit group_a

should result in:

host.data.important_data == "foo"

and data from group_b should not be merged into the host.

Additional context

This appears to happen because inventory data is merged before the --limit filtering is fully applied. The issue is especially problematic for inventories that intentionally reuse the same host across multiple logical groups with different configuration data.

Potential solutions

I’m not familiar with the codebase, but I see a couple of possible directions for addressing this:

  1. Apply --limit filtering earlier in pyinfra_cli/cli.py (before make_inventory is called), so that only the selected groups contribute host data during inventory construction.
  2. Adjust make_hosts_and_groups in pyinfra/api/inventory.py so that data merging is aware of group context and does not allow overwriting data from other groups.

I think the second solution might be better, as the modifications could potentially be limited to make_hosts_and_groups in pyinfra/api/inventory.py.
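To make the two directions concrete, here is a rough sketch on plain dicts. It is not based on pyinfra internals: merge_limited, collect_by_group, and resolve are hypothetical helpers, and the limit is assumed to be a set of group names only (any other kind of limit value is ignored here).

```python
from collections import defaultdict

groups = {
    "group_a": [("@local", {"important_data": "foo"})],
    "group_b": [("@local", {"important_data": "bar"})],
}


def merge_limited(groups, limit=None):
    """Option 1 (hypothetical): drop non-selected groups before merging."""
    name_to_data = defaultdict(dict)
    for group_name, hosts in groups.items():
        if limit is not None and group_name not in limit:
            continue  # this group never contributes host data
        for name, data in hosts:
            name_to_data[name].update(data)
    return dict(name_to_data)


def collect_by_group(groups):
    """Option 2 (hypothetical), step 1: keep each group's host data separate."""
    name_to_group_data = defaultdict(dict)
    for group_name, hosts in groups.items():
        for name, data in hosts:
            name_to_group_data[name][group_name] = dict(data)
    return name_to_group_data


def resolve(name_to_group_data, name, limit=None):
    """Option 2, step 2: merge only the groups that are part of the limit."""
    merged = {}
    for group_name, data in name_to_group_data[name].items():
        if limit is None or group_name in limit:
            merged.update(data)
    return merged


option_1 = merge_limited(groups, limit={"group_a"})
option_2 = resolve(collect_by_group(groups), "@local", limit={"group_a"})
print(option_1["@local"]["important_data"])  # prints "foo"
print(option_2["important_data"])            # prints "foo"
```

In this sketch, option 1 discards the other groups' data entirely, while option 2 keeps it available and defers the choice until the limit is known.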

Meta

Pyinfra installed with uv

❯ uv run pyinfra --support

    If you are having issues with pyinfra or wish to make feature requests, please
    check out the GitHub issues at https://github.com/Fizzadar/pyinfra/issues .
    When adding an issue, be sure to include the following:

    System: Darwin
      Platform: macOS-15.7.3-arm64-arm-64bit
      Release: 24.6.0
      Machine: arm64
    pyinfra: v3.8.0
      click: v8.3.1
      distro: v1.9.0
      gevent: v25.9.1
      jinja2: v3.1.6
      packaging: v26.0
      paramiko: v3.5.1
      pydantic: v2.12.5
      python-dateutil: v2.9.0.post0
      typeguard: v4.5.1
      types-paramiko: v4.0.0.20260508
      typing-extensions: v4.15.0
    Executable: /Users/etiennecollin/github/homelab/.venv/bin/pyinfra
    Python: 3.12.6 (CPython, Clang 18.1.8 )

Metadata

Labels: CLI, bug
