Skip to content

Latest commit

 

History

History
23 lines (20 loc) · 1.21 KB

File metadata and controls

23 lines (20 loc) · 1.21 KB

TODO

All current tasks completed. See archived list below.

Archived List

  • Simplify current codebase. Just support dual mode, remove all others. (2026-04-23)
  • Changed target LLM from Gemma 4 26B A4B IT to Qwen3.6-27B. Centralized configuration in .env. (2026-04-24)
  • Set Model ID for API calls as coder, an alias. (2026-04-23)
  • Use environment variable for all API keys for security. (ex. .env) — via deploy_nginx.sh + envsubst from template. (2026-04-23)
  • For easy maintenance, keep this system not requiring sudo privilege if possible — implemented via ~/llm-serving symlink, user-level nginx on high ports, removed port 80 config. (2026-04-23)
  • Establish 2 node llama cpp servers
  • Check options
    • Support Flash Attention (enabled by default, auto)
    • Parallel Execution Slot, --parallel 2 per each node (2 GPUs per node)
  • Support API key (read from .env, DPI_FACTORY_API_KEY)
    • Using nginx with Bearer token auth
  • Internet Serving Feature
  • Model re-naming for easy API calling (alias: coder)
    • Use -a coder when starting servers