Describe the bug
When using --limit <group_name>, host data from other groups is still merged into the inventory before filtering occurs. As a result, if the same host exists in multiple groups, the last declared group's data overwrites the others for conflicting keys, even when that group is not part of the limit selection.
This makes it impossible to safely define environment or role-specific host data for the same host across multiple groups when relying on --limit.
For example, given two groups containing the same host with different values for important_data, running with --limit group_a still results in the value from group_b being applied if group_b is declared later in the inventory.
To Reproduce
A user may want several groups to assign different data to the same host, and then select which group's data applies with the --limit <group_name> flag. For example, consider this sample inventory file:
group_a = [
    (
        "@local",
        {
            "important_data": "foo"
        },  # Some data, let's call it data_a
    )
]
group_b = [
    (
        "@local",
        {
            "important_data": "bar"
        },  # Some other data, let's call it data_b
    )
]
In this case, deploying with --limit group_a uses data_b, because data_b overwrites data_a. In other words, reading important_data returns bar instead of the expected foo.
Here is a simple partial stack trace of the overwriting behaviour when running pyinfra inventory.py deploy.py --limit group_a:
pyinfra_cli/cli.py:383 where inventory = make_inventory( is called
pyinfra_cli/inventory.py:206 where return make_inventory_from_files(inventory, override_data, cwd, group_data_directories) is called
pyinfra_cli/inventory.py:362 where fake_inventory = Inventory((all_hosts, all_data), **fake_groups) is called
pyinfra/api/inventory.py:58 where self.make_hosts_and_groups(names, groups) is called
pyinfra/api/inventory.py:75 where name_to_data[name].update(data) is called
In this case, if the same key is used across multiple groups for the same host, its value is overwritten by the value from the last declared group. Later in the make_hosts_and_groups function, host_data is created with host_data = name_to_data[name] and is used to fill self.host_data[sub_name] = sub_data, which is used to initialize the Host objects.
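The overwrite can be reproduced outside pyinfra with a few lines of plain Python. This is a simplified sketch, not pyinfra's actual code: the groups dict and the merge_host_data function are hypothetical stand-ins, and only the name_to_data[name].update(data) pattern is taken from the trace above.

```python
from collections import defaultdict

# Hypothetical stand-in for the parsed inventory; mirrors the sample file above.
groups = {
    "group_a": [("@local", {"important_data": "foo"})],
    "group_b": [("@local", {"important_data": "bar"})],
}


def merge_host_data(groups):
    """Merge per-host data across every group, in declaration order."""
    name_to_data = defaultdict(dict)
    for _group_name, hosts in groups.items():
        for name, data in hosts:
            # Same pattern as the update() call in the trace: for a
            # conflicting key, the group declared last wins.
            name_to_data[name].update(data)
    return dict(name_to_data)


merged = merge_host_data(groups)
print(merged["@local"]["important_data"])  # prints "bar"
```

Because the merge never looks at which group the data came from, no later filtering step can recover the value from group_a.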
Expected behavior
When using --limit, only the selected groups should contribute host data during inventory construction. In the example above, running:
pyinfra inventory.py deploy.py --limit group_a
should result in:
host.data.important_data == "foo"
and data from group_b should not be merged into the host.
Additional context
This appears to happen because inventory data is merged before the --limit filtering is fully applied. The issue is especially problematic for inventories that intentionally reuse the same host across multiple logical groups with different configuration data.
Potential solutions
I’m not familiar with the codebase, but I see a couple of possible directions for addressing this:
- Apply --limit filtering earlier in pyinfra_cli/cli.py (before make_inventory is called), so that only the selected groups contribute host data during inventory construction.
- Adjust make_hosts_and_groups in pyinfra/api/inventory.py so that data merging is aware of group context and does not allow overwriting data from other groups.
I think the second solution might be better, as the modifications could potentially be limited to make_hosts_and_groups in pyinfra/api/inventory.py.
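As a rough illustration of what a group-aware merge could look like, here is a minimal sketch in plain Python. It is not a patch against pyinfra: merge_host_data_with_limit, its limit parameter, and the groups dict are all hypothetical, and it assumes the limit contains only group names (it does not handle limiting by host name).

```python
from collections import defaultdict

# Hypothetical stand-in for the parsed inventory; mirrors the sample file.
groups = {
    "group_a": [("@local", {"important_data": "foo"})],
    "group_b": [("@local", {"important_data": "bar"})],
}


def merge_host_data_with_limit(groups, limit=None):
    """Merge per-host data, skipping groups outside the limit.

    `limit` is a set of group names, or None to merge every group.
    """
    name_to_data = defaultdict(dict)
    for group_name, hosts in groups.items():
        if limit is not None and group_name not in limit:
            # Groups outside the limit contribute no host data.
            continue
        for name, data in hosts:
            name_to_data[name].update(data)
    return dict(name_to_data)


limited = merge_host_data_with_limit(groups, limit={"group_a"})
print(limited["@local"]["important_data"])  # prints "foo"
```

With no limit, the sketch falls back to the current behaviour, where the last declared group wins for conflicting keys.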
Meta
Pyinfra installed with uv
❯ uv run pyinfra --support
If you are having issues with pyinfra or wish to make feature requests, please
check out the GitHub issues at https://github.com/Fizzadar/pyinfra/issues .
When adding an issue, be sure to include the following:
System: Darwin
Platform: macOS-15.7.3-arm64-arm-64bit
Release: 24.6.0
Machine: arm64
pyinfra: v3.8.0
click: v8.3.1
distro: v1.9.0
gevent: v25.9.1
jinja2: v3.1.6
packaging: v26.0
paramiko: v3.5.1
pydantic: v2.12.5
python-dateutil: v2.9.0.post0
typeguard: v4.5.1
types-paramiko: v4.0.0.20260508
typing-extensions: v4.15.0
Executable: /Users/etiennecollin/github/homelab/.venv/bin/pyinfra
Python: 3.12.6 (CPython, Clang 18.1.8 )