Feature Suggestion | Serverless Runpod #9

@GeoffMillerAZ

Description

I see that RunPod has a serverless option. Rather than stopping and starting these instances, is it possible to serve these models serverlessly? It looks like you can modify TheBloke's Dockerfile and configure a network volume so the model lives in the volume's workspace.

  • Dockerfile setup
  • create a network volume
  • create an instance on the network volume
  • download the model(s) into the instance, putting them in the volume
  • delete the instance
  • mount the volume to the serverless GPU endpoint's Docker template
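On the VRAM question: if the worker loads the model at container start rather than inside the request handler, only cold starts pay the load cost and warm requests reuse the already-resident model. A minimal sketch of that pattern (the model load is stubbed with a sleep, the path is hypothetical; a real worker would hand `handler` to the RunPod Python SDK via `runpod.serverless.start`, and serverless network volumes typically mount at `/runpod-volume`):

```python
import time

# Hypothetical model location on the mounted network volume.
MODEL_DIR = "/runpod-volume/models/my-model"

def load_model(model_dir):
    """Stub for an expensive model load (e.g. weights into VRAM)."""
    time.sleep(0.1)  # stand-in for real load time
    return {"dir": model_dir, "loaded_at": time.time()}

# Runs once per container cold start, NOT once per request.
MODEL = load_model(MODEL_DIR)

def handler(job):
    """Per-request handler: reuses the already-loaded MODEL."""
    prompt = job["input"]["prompt"]
    return {"output": f"echo from {MODEL['dir']}: {prompt}"}

# In a real serverless worker this would be wired up as:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

So each request does not wait for a fresh load as long as the worker stays warm; scale-to-zero cold starts are the only time the load is paid.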

I am trying to play with this, but I have been busy with work and I'm out of my depth here. I have a lot of questions about whether this is possible or practical. Does each request wait for the model to load into VRAM?

Serverless could be a cheap and easy way to have permanent setups for using AutoGen. It would be especially nice for running multiple serverless GPU endpoints, each with a different model specialized for a specific task, without the cost or risk of leaving an instance running.

Also, can you set a custom API key for your RunPod endpoint, to make sure your endpoints don't get used by someone else?
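As far as I can tell you don't set a custom per-endpoint key; requests are authenticated with your RunPod account API key sent in the `Authorization` header, so the endpoint is already private to holders of that key. A stdlib sketch of building such a request (the endpoint ID and payload are made up for illustration):

```python
import json
import urllib.request

ENDPOINT_ID = "abc123"  # hypothetical endpoint ID
API_KEY = "YOUR_RUNPOD_API_KEY"

def build_request(prompt):
    """Build (but don't send) a runsync request to a serverless endpoint."""
    url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
    body = json.dumps({"input": {"prompt": prompt}}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # Account API key authenticates the call; keep it secret.
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("hello")
```

Sending it would just be `urllib.request.urlopen(req)`; anyone without the key gets rejected by RunPod before reaching the worker.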
