Different models need different quantization settings and trtllm-build flags depending on use case, hardware, expected throughput, and so on.
New requirements when running the build-engine command:
REPL Requirements
Stretch goal
Just so the thought isn't lost: it would be nice for users to publish their build/run settings somewhere, along with the model, use case, hardware, versions, and benchmarks for the given config. Other users could then pull those settings down and just run them, maybe even from the REPL.
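
As a rough illustration of what a published config could carry, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `BuildProfile` name, its fields, and the JSON layout are hypothetical, not an existing format. The `build_flags` entries are passed through untouched and should be checked against the installed trtllm-build version.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class BuildProfile:
    """Hypothetical record of one published build/run configuration."""
    model: str
    use_case: str
    hardware: str
    quantization: str
    versions: dict = field(default_factory=dict)     # e.g. {"tensorrt_llm": "<version>"}
    build_flags: dict = field(default_factory=dict)  # passed through to trtllm-build as-is
    benchmarks: dict = field(default_factory=dict)   # whatever the publisher measured

    def to_json(self) -> str:
        # Stable key order so published profiles diff cleanly.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "BuildProfile":
        return cls(**json.loads(text))

    def build_args(self) -> list:
        """Flatten build_flags into a CLI-style argument list."""
        args = []
        for name, value in sorted(self.build_flags.items()):
            args += [f"--{name}", str(value)]
        return args


# Placeholder values only; none of these describe a real measured setup.
profile = BuildProfile(
    model="example-model",
    use_case="chat",
    hardware="example-gpu",
    quantization="fp16",
    build_flags={"max_batch_size": 8},
)
restored = BuildProfile.from_json(profile.to_json())
```

A profile serialized this way could be stored anywhere that serves text files, and the REPL could load it with `from_json` and hand `build_args()` to the build step.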