All current tasks completed. See archived list below.
- Simplify current codebase. Just support dual mode, remove all others. (2026-04-23)
- Changed target LLM from Gemma 4 26B A4B IT to Qwen3.6-27B. Centralized configuration in
.env. (2026-04-24) - Set Model ID for API calls as
coder, an alias. (2026-04-23) - Use environment variable for all API keys for security. (ex.
.env) — viadeploy_nginx.sh+envsubstfrom template. (2026-04-23) - For easy maintenance, keep this system not requiring
sudoprivilege if possible — implemented via~/llm-servingsymlink, user-level nginx on high ports, removed port 80 config. (2026-04-23) - Establish 2 node llama cpp servers
- Check options
- Support Flash Attention (enabled by default, auto)
- Parallel Execution Slot, --parallel 2 per each node (2 GPUs per node)
- Support API key (read from .env, DPI_FACTORY_API_KEY)
- Using nginx with Bearer token auth
- Internet Serving Feature
- outer port: 19101 ✓
- outer API base: http://125.188.35.185:19101/v1 ✓
- API key required
- Model re-naming for easy API calling (alias:
coder)- Use
-a coderwhen starting servers
- Use