Feature Suggestion | Serverless Runpod #9

@GeoffMillerAZ

Description

I see that RunPod has a serverless option. Rather than stopping and starting these instances, is it possible to serve these models serverlessly? It looks like you can modify TheBloke's Dockerfile and configure a network volume so the model lives in the volume's workspace.

  • Dockerfile setup
  • create a network volume
  • create an instance on the network volume
  • download the model(s) into the instance, putting them in the volume
  • delete the instance
  • mount the volume to the serverless GPU endpoint's Docker template
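On the VRAM question: if the worker loads the model at container start rather than inside the request handler, only cold starts pay the load cost and warm requests reuse the already-resident model. A minimal sketch of that pattern (the model load is stubbed with a sleep, the path is hypothetical; a real worker would hand `handler` to the RunPod Python SDK via `runpod.serverless.start`, and serverless network volumes typically mount at `/runpod-volume`):

```python
import time

# Hypothetical model location on the mounted network volume.
MODEL_DIR = "/runpod-volume/models/my-model"

def load_model(model_dir):
    """Stub for an expensive model load (e.g. weights into VRAM)."""
    time.sleep(0.1)  # stand-in for real load time
    return {"dir": model_dir, "loaded_at": time.time()}

# Runs once per container cold start, NOT once per request.
MODEL = load_model(MODEL_DIR)

def handler(job):
    """Per-request handler: reuses the already-loaded MODEL."""
    prompt = job["input"]["prompt"]
    return {"output": f"echo from {MODEL['dir']}: {prompt}"}

# In a real serverless worker this would be wired up as:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

So each request does not wait for a fresh load as long as the worker stays warm; scale-to-zero cold starts are the only time the load is paid.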

I am trying to play with this, but I have been busy with work and I'm out of my depth here. I have a lot of questions about whether this is possible or practical. Does each request wait for the model to load into VRAM?

Serverless could be a cheap and easy way to have permanent setups for using AutoGen. It would be especially nice for running multiple serverless GPU endpoints, each with a different model specialized for a specific task, without the cost or risk of leaving an instance running.

Also, can you set a custom API key for your RunPod endpoint, to make sure your endpoints don't get used by someone else?
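As far as I can tell you don't set a custom per-endpoint key; requests are authenticated with your RunPod account API key sent in the `Authorization` header, so the endpoint is already private to holders of that key. A stdlib sketch of building such a request (the endpoint ID and payload are made up for illustration):

```python
import json
import urllib.request

ENDPOINT_ID = "abc123"  # hypothetical endpoint ID
API_KEY = "YOUR_RUNPOD_API_KEY"

def build_request(prompt):
    """Build (but don't send) a runsync request to a serverless endpoint."""
    url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
    body = json.dumps({"input": {"prompt": prompt}}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # Account API key authenticates the call; keep it secret.
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("hello")
```

Sending it would just be `urllib.request.urlopen(req)`; anyone without the key gets rejected by RunPod before reaching the worker.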
