Quantize and Build templates #2

@sarmiena

Description

Different models will need different quantizations and trtllm-build flags based on: use case, hardware, expected throughput, etc.

New requirements when running the build-engine command:

  • Allow no JSON file to be present, in which case all quantize/build values are blank. The user then needs to supply the bare minimum values manually using --quantize- --trtllm-build-. Currently this causes an error.
  • Move JSON files from script/models/*.json to script/build_templates/$MODEL/<template_name>.json
  • Add a REPL for asking template preferences

REPL Requirements

  • Asks which template you want to use. Only shows the options available for the model. Choose a number and hit enter.
  • If you select one from the list, it prints the config and asks for confirmation before running.
  • If you say "No", it reprints the options.
  • Has a option
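
The select/confirm/reprint loop described above could look roughly like the sketch below. `choose_template` and its `ask`/`show` parameters are hypothetical names; the prompts are injected so the loop can be exercised without a terminal. It leaves out the last bullet, whose option is not specified here.

```python
import json
from typing import Callable


def choose_template(
    templates: dict[str, dict],
    ask: Callable[[str], str] = input,
    show: Callable[[str], None] = print,
) -> str:
    """Prompt until a template is picked and confirmed, then return its name."""
    names = sorted(templates)
    if not names:
        raise ValueError("no templates available for this model")
    while True:
        # Only the templates that exist for the model are listed.
        for i, name in enumerate(names, start=1):
            show(f"{i}) {name}")
        raw = ask("Template number: ").strip()
        try:
            index = int(raw)
        except ValueError:
            index = 0  # falls outside the valid range below
        if not 1 <= index <= len(names):
            show("Please enter one of the listed numbers.")
            continue
        name = names[index - 1]
        # Print the selected config and ask for confirmation before running.
        show(json.dumps(templates[name], indent=2))
        answer = ask("Use this config? [y/N] ").strip().lower()
        if answer in ("y", "yes"):
            return name
        # Anything else counts as "No": loop back and reprint the options.
```

The caller would build `templates` from the files found for the chosen model and start the build only after this function returns.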

Stretch goal

Just so the thought isn't lost: it would be nice for users to publish their build/run settings somewhere and explain the model, use case, hardware, versions, and benchmarks for the given config. Others could then pull those down and just run them, maybe even from the REPL.
