Enhance backend configurations with revision support and chat options (#16)
* fix: update Ollama backend default URL to remove /v1 suffix and set chat mode default to True
* feat: add revision support and unified chat template formatting for all backends
- Add an `hf_revision`/`revision` parameter to the model config, the resolve step, and the backend constructors
- Parse and propagate the revision from the model spec (e.g. `repo/model@main`)
- Implement a `format_prompt` utility that applies chat templates via the tokenizer or the Hugging Face model (see the sketch after this list)
- Use `format_prompt` in the mlx-lm, llama-cpp, and openai-compat backends for consistent prompt formatting
- Add `transformers` as a required dependency
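
A hedged sketch of how the revision parsing and `format_prompt` pieces could fit together. `format_prompt` and the `repo/model@main` spec format come from the commit messages above; the function bodies and the `parse_model_spec` name are illustrative assumptions, not the PR's actual code.

```python
# Illustrative sketch only; bodies are assumptions based on the commit
# messages, not the PR's actual implementation.
from transformers import AutoTokenizer


def parse_model_spec(spec: str) -> tuple[str, str | None]:
    """Split a spec like 'repo/model@main' into (model_id, revision)."""
    model_id, sep, revision = spec.partition("@")
    return model_id, (revision if sep else None)


def format_prompt(model_id: str, messages: list[dict],
                  revision: str | None = None) -> str:
    """Render chat messages into one prompt string via the model's chat template."""
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return a string, not token IDs
        add_generation_prompt=True,  # append the assistant-turn header
    )
```

Usage would then look like `model_id, revision = parse_model_spec("repo/model@main")` followed by `format_prompt(model_id, messages, revision=revision)`.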
* feat: add disable_thinking option to suppress reasoning mode across backends
- Introduce `disable_thinking` flag to all backend configs and CLI, defaulting to True for consistent output comparison.
- Implement cross-backend support for disabling reasoning/thinking mode (Qwen3, DeepSeek-R1, Ollama, vLLM, OpenAI/OpenRouter).
- Strip reasoning-trigger tokens from prompts when disabled.
- Route to Ollama's native `/api/chat` endpoint with `think: false` when appropriate (sketched after this list).
- Add tests for prompt formatting, payload construction, and retry logic when disabling thinking.
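
A hedged sketch of the Ollama routing: `/api/chat` is Ollama's native chat endpoint, and recent Ollama versions accept a boolean `think` field in the request body, but the helper below (its name, default host, and lack of the retry logic mentioned above) is an illustration, not the PR's code.

```python
# Sketch of routing to Ollama's /api/chat with thinking disabled.
# Payload shape follows Ollama's chat API; the PR's exact handling may differ.
import requests

OLLAMA_URL = "http://localhost:11434"  # assumed default host


def ollama_chat(model: str, messages: list[dict],
                disable_thinking: bool = True) -> str:
    payload = {"model": model, "messages": messages, "stream": False}
    if disable_thinking:
        payload["think"] = False  # native flag to suppress reasoning output
    resp = requests.post(f"{OLLAMA_URL}/api/chat", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["message"]["content"]
```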
* refactor: move `transformers` dependency to optional `http` extra and cache tokenizer loading
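
The cached tokenizer loading could be as simple as memoizing `AutoTokenizer.from_pretrained`; the `lru_cache` approach and the `load_tokenizer` name below are assumptions, not the PR's code.

```python
# One possible shape for cached tokenizer loading; illustrative only.
from functools import lru_cache

from transformers import AutoTokenizer


@lru_cache(maxsize=None)
def load_tokenizer(model_id: str, revision: str | None = None):
    """Load a tokenizer once per (model_id, revision); repeat calls hit the cache."""
    return AutoTokenizer.from_pretrained(model_id, revision=revision)
```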
* feat: add revision support for vllm-mlx backend and propagate resolved revision in diff command
* feat: propagate hf_revision to diff command and clarify Ollama chat handling comments
* feat: add --chat option to CLI and propagate chat mode to backend configuration
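
A minimal sketch of the CLI wiring, assuming `argparse`; the PR's actual CLI framework and flag handling are unknown, so treat this purely as an illustration of a `--chat` toggle that defaults to True (matching the first commit above).

```python
# Hypothetical CLI wiring for the --chat flag; names are assumptions.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--chat",
    action=argparse.BooleanOptionalAction,  # provides --chat / --no-chat
    default=True,
    help="Format prompts with the model's chat template",
)
args = parser.parse_args()
```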
* fix: refine model spec revision parsing and tighten HF repo detection for tokenizer loading
* fix: pass revision to model loader in mlx backend