Replies: 2 comments 5 replies
I would also like this: either a timeout that automatically unloads the model and then reloads it on the next request, or explicit load/unload via the API, similar to Ollama.
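Roughly the kind of behaviour I mean, as a client-side sketch (the `load_model`/`unload_model` helpers below are just placeholders, not anything KoboldCpp exposes today):

```python
import threading
import time

IDLE_TIMEOUT = 300  # seconds of inactivity before the model is unloaded

# Placeholder stand-ins for whatever the engine would actually do;
# these are NOT real KoboldCpp calls.
def load_model(path):
    print(f"[load] {path} -> VRAM")
    return {"path": path}

def unload_model(model):
    print(f"[unload] {model['path']} freed from VRAM")

class LazyModel:
    """Unload after an idle timeout, reload lazily on the next request."""

    def __init__(self, path):
        self.path = path
        self.model = None
        self.last_used = time.time()
        self.lock = threading.Lock()
        threading.Thread(target=self._reaper, daemon=True).start()

    def generate(self, prompt):
        with self.lock:
            if self.model is None:                 # reload on demand
                self.model = load_model(self.path)
            self.last_used = time.time()
        return f"(generated text for: {prompt!r})"  # placeholder output

    def _reaper(self):
        while True:
            time.sleep(10)
            with self.lock:
                if self.model and time.time() - self.last_used > IDLE_TIMEOUT:
                    unload_model(self.model)
                    self.model = None
```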
Thanks so much for adding the feature. It properly times out and unloads the model. Is there a setting to have it automatically reload the last model used when it receives the next API query from SillyTavern? Keep up the amazing work with this engine!
It would be great if the API could load a model into VRAM upon receiving a request (if not already loaded), and unload a model from VRAM on request (like keep_alive in Ollama). Having the model always loaded while the KoboldCpp server is running is problematic when using ComfyUI workflows alongside other services, models, and generations (e.g. image generation) that need VRAM. This might also allow model swapping/changing via the API.
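For reference, this is roughly how keep_alive behaves in Ollama's HTTP API (the host and model name here are just assumptions for illustration): a generate request can say how long the model should stay resident, and a follow-up request with keep_alive set to 0 unloads it immediately, freeing VRAM for the ComfyUI side of the workflow.

```python
import requests

OLLAMA = "http://localhost:11434"   # default Ollama endpoint (assumed)
MODEL = "llama3"                    # example model name (assumed)

# Normal generation; keep_alive controls how long the model stays in VRAM
# after the response (Ollama's default is 5 minutes).
resp = requests.post(f"{OLLAMA}/api/generate", json={
    "model": MODEL,
    "prompt": "Hello",
    "stream": False,
    "keep_alive": "5m",
})
print(resp.json()["response"])

# A prompt-less request with keep_alive: 0 asks Ollama to unload the model
# right away, so the VRAM is free for image generation etc.
requests.post(f"{OLLAMA}/api/generate", json={
    "model": MODEL,
    "keep_alive": 0,
})
```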