Grouping models for auto load/unload in models.ini #24236

malko · 2026-06-06T15:29:11Z

Hi there, I've been playing around with llama.cpp a lot lately. I have two GPUs (RTX 4060 Ti and RTX 5060 Ti) and have tuned my models.ini so that models load smoothly on both cards using llama-server. But when I want to change the model on one card, I must first manually unload the current model before loading another. And because my two cards don't have the same amount of VRAM, each of my models is tied to one GPU or the other.

Here's an idea that I think would be a really nice addition:
if models.ini supported a "group" parameter, then when a request comes in for a model in a group, llama-server could first check which model is loaded in that group. If it is not the requested model, it would automatically unload the other models in that group. We could even push this a little further and add a group-max-models setting for anyone who wants a group to hold 2 or 3 models at the same time (I personally have no use for this, but maybe it would be useful for others).

What do you think of this idea?

I know about llama-swap, but this would make it possible to use llama-server directly, without depending on a third-party app.
It would also rely on users to tune their own config, and would avoid complex changes to llama-server's code to check the actual state of available resources on the host: it would simply check a setting before executing the request.

I already know about the --models-max parameter, but this is different: --models-max doesn't take into account which GPU is in use. For example, on my RTX 5060 Ti I load Qwen3.6 35B A3B or GLM4.7, while on my RTX 4060 Ti I run smaller models like Gemma4 E4B. I can't rely on --models-max = 2, because if only Qwen is loaded I still can't run GLM4.7 without unloading Qwen first, since both target the same card. If they were marked as part of the same group, Qwen would be unloaded when I try to "talk" to GLM, because only one model of the group may be loaded at a time.
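
To make the proposed behaviour concrete, here is a minimal Python sketch of the per-group eviction described above. It is only an illustration of the idea, not llama-server code: the `group` key is the proposed (not existing) models.ini setting, the section names and group names are made up, and the `group_max` argument stands in for the suggested group-max-models option. Picking the least recently used model as the one to unload is also my assumption.

```python
import configparser
from collections import OrderedDict

# Hypothetical models.ini content. The "group" key is the proposal from this
# post, not an existing llama-server option; section names are illustrative.
MODELS_INI = """
[qwen3.6-35b-a3b]
group = gpu-5060ti

[glm-4.7]
group = gpu-5060ti

[gemma4-e4b]
group = gpu-4060ti
"""


class GroupedModelRouter:
    """Tracks loaded models per group and decides what to unload."""

    def __init__(self, ini_text, group_max=None):
        cfg = configparser.ConfigParser()
        cfg.read_string(ini_text)
        # Map each model (INI section) to its group, or None if ungrouped.
        self.group_of = {
            name: cfg.get(name, "group", fallback=None)
            for name in cfg.sections()
        }
        # Hypothetical stand-in for "group-max-models": group -> limit.
        self.group_max = dict(group_max or {})
        self.loaded = {}        # group -> OrderedDict of models, oldest first
        self.ungrouped = set()  # loaded models that belong to no group

    def request(self, model):
        """Mark `model` as loaded; return the models to unload first."""
        group = self.group_of[model]
        if group is None:
            # Ungrouped models are left to other limits such as --models-max.
            self.ungrouped.add(model)
            return []
        loaded = self.loaded.setdefault(group, OrderedDict())
        if model in loaded:
            loaded.move_to_end(model)  # already loaded, refresh its recency
            return []
        limit = self.group_max.get(group, 1)  # default: one model per group
        evicted = []
        while len(loaded) >= limit:
            victim, _ = loaded.popitem(last=False)  # least recently used
            evicted.append(victim)
        loaded[model] = True
        return evicted

    def loaded_models(self):
        """Return every model currently considered loaded."""
        names = set(self.ungrouped)
        for models in self.loaded.values():
            names.update(models)
        return names


router = GroupedModelRouter(MODELS_INI)
router.request("qwen3.6-35b-a3b")       # nothing to unload
router.request("gemma4-e4b")            # other group, nothing to unload
print(router.request("glm-4.7"))        # Qwen shares the group, so it is evicted
print(sorted(router.loaded_models()))
```

With this layout, a request for GLM unloads Qwen but leaves Gemma alone, because Gemma sits in the other group. Passing something like `group_max={"gpu-5060ti": 2}` would let two models of that group stay loaded together.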

If this is seen as a good idea, I can try to put together a PR for it if someone points me in the right direction.
