
2nd GPU for PFlash? #102

@tomByrer


I happen to have both an RTX 3080 (10 GB) and an RTX 3090.
You mentioned that running the drafter on a second GPU would avoid repeatedly loading and unloading the draft model.
How exactly can I set this up?
Also, could other small LLMs be used as the drafter, such as Qwen Coder 1.5B or Qwen3.5 4B?

Labels: documentation (Improvements or additions to documentation), question (Further information is requested)
