Commit 78be2b4

Adopt daemon-style inference config for v0.1.1
1 parent 6c42e7d commit 78be2b4

7 files changed

Lines changed: 476 additions & 155 deletions

CHANGELOG.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+## [0.1.1] - 2026-04-14
+
+### Added
+
+- Support for reading text-generation profiles from the Bitloops daemon config schema, including shared runtimes under `[inference.runtimes.<name>]`.
+- Validation for legacy profile fields, unsupported profile keys, missing runtime references, invalid numeric values, and Ollama chat URLs that do not target `/api/chat`.
+
+### Changed
+
+- `bitloops-inference` now reads `task`, `driver`, and `runtime` from daemon-style profile definitions instead of the older per-profile `kind`, `provider_name`, and `timeout_secs` fields.
+- Non-`text_generation` inference profiles are ignored during config loading and profile discovery, so CLI commands only expose runnable text-generation profiles.
+- Documentation and test fixtures now use the daemon config layout and string-based temperature examples with environment interpolation.
+
+### Fixed
+
+- Request timeouts are now resolved from the referenced runtime’s `request_timeout_secs` value instead of per-profile timeout settings.
+- Config validation now reports when a config file does not define any text-generation profiles.
+
+## [0.1.0] - 2026-04-13
+
+### Added
+
+- Initial `bitloops-inference` Rust workspace with a shared protocol crate and a stdio runtime for out-of-process Bitloops inference.
+- Protocol v1 request and response types for `describe`, `infer`, and `shutdown`, using line-delimited JSON over `stdin` and `stdout`.
+- `run`, `validate-config`, and `describe-profile` CLI commands for running the runtime and inspecting configured inference profiles.
+- OpenAI Chat Completions and Ollama Chat providers with normalised text and `json_object` responses, usage reporting, finish reasons, and provider-specific HTTP error handling.
+- TOML-based profile configuration with environment-variable interpolation and default inference settings for temperature and output token limits.
+- Mocked provider integration tests, child-process protocol-loop tests, hosted-runner CI, and release automation for macOS, Linux, and Windows artefacts.
+
+### Fixed
+
+- Intel macOS release builds now use the correct hosted runner label in the release workflow.
+- Release packaging and GitHub Release artefact publication now generate the expected target-specific archives and clean up stale assets.
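
The 0.1.1 entries describe three loading rules: profiles whose `task` is not `text_generation` are skipped, the request timeout comes from the referenced runtime, and Ollama chat URLs must target `/api/chat`. The sketch below shows one way those rules could fit together. It is not the code from this commit: the `Runtime`, `Profile`, and `ResolvedProfile` types, the `resolve_profiles` function, and the error messages are all hypothetical, and the real crate parses TOML rather than taking prebuilt maps.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of an `[inference.runtimes.<name>]` table.
#[derive(Debug, Clone)]
struct Runtime {
    request_timeout_secs: u64,
}

/// Hypothetical, simplified view of an `[inference.profiles.<name>]` table.
#[derive(Debug, Clone)]
struct Profile {
    task: String,
    driver: String,
    runtime: String,
    base_url: String,
}

/// A profile that passed validation, with its timeout taken from the runtime.
#[derive(Debug, PartialEq)]
struct ResolvedProfile {
    name: String,
    driver: String,
    request_timeout_secs: u64,
}

fn resolve_profiles(
    profiles: &HashMap<String, Profile>,
    runtimes: &HashMap<String, Runtime>,
) -> Result<Vec<ResolvedProfile>, String> {
    let mut resolved = Vec::new();
    for (name, profile) in profiles {
        // Profiles for other tasks are skipped rather than rejected.
        if profile.task != "text_generation" {
            continue;
        }
        // The timeout comes from the referenced runtime, not from the profile.
        let runtime = runtimes.get(&profile.runtime).ok_or_else(|| {
            format!("profile `{name}` references unknown runtime `{}`", profile.runtime)
        })?;
        // Assumption: "targets /api/chat" is checked as a simple suffix match.
        if profile.driver == "ollama_chat" && !profile.base_url.ends_with("/api/chat") {
            return Err(format!("profile `{name}`: Ollama URL must target /api/chat"));
        }
        resolved.push(ResolvedProfile {
            name: name.clone(),
            driver: profile.driver.clone(),
            request_timeout_secs: runtime.request_timeout_secs,
        });
    }
    if resolved.is_empty() {
        return Err("config defines no text-generation profiles".to_string());
    }
    // HashMap iteration order is unspecified, so sort for stable output.
    resolved.sort_by(|a, b| a.name.cmp(&b.name));
    Ok(resolved)
}
```

In this sketch a non-text-generation profile is skipped before its runtime reference is checked, so it cannot cause a validation failure. Whether the real loader behaves that way is not stated in the changelog.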

Cargo.lock

Lines changed: 2 additions & 2 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ members = [
 resolver = "2"
 
 [workspace.package]
-version = "0.1.0"
+version = "0.1.1"
 edition = "2024"
 license = "Apache-2.0"
 
README.md

Lines changed: 17 additions & 14 deletions
@@ -23,32 +23,35 @@ bitloops-inference describe-profile --config config.toml --profile openai_fast
 
 ## Config
 
-Profiles are defined under `[inference.profiles.<name>]`.
+`bitloops-inference` reads the Bitloops daemon inference config. Text-generation profiles live under `[inference.profiles.<name>]` and reference a runtime from `[inference.runtimes.<name>]`.
 
 ```toml
+[inference.runtimes.bitloops_inference]
+request_timeout_secs = 60
+
 [inference.profiles.openai_fast]
-kind = "openai_chat_completions"
-provider_name = "openai"
+task = "text_generation"
+driver = "openai_chat_completions"
+runtime = "bitloops_inference"
 model = "gpt-4.1-mini"
 base_url = "https://api.openai.com/v1/chat/completions"
 api_key = "${OPENAI_API_KEY}"
-temperature = 0.1
-timeout_secs = 60
+temperature = "0.1"
 max_output_tokens = 200
 
 [inference.profiles.ollama_local]
-kind = "ollama_chat"
-provider_name = "ollama"
+task = "text_generation"
+driver = "ollama_chat"
+runtime = "bitloops_inference"
 model = "qwen2.5-coder:14b"
 base_url = "http://127.0.0.1:11434/api/chat"
-temperature = 0.1
-timeout_secs = 120
+temperature = "0.1"
 max_output_tokens = 200
 ```
 
-String fields support `${ENV_VAR}` interpolation. Missing environment variables fail validation immediately.
+String fields support `${ENV_VAR}` interpolation. Missing environment variables fail validation immediately. Non-text-generation profiles in the same daemon config are ignored by `bitloops-inference`.
 
-## Supported provider kinds
+## Supported drivers
 
 - `openai_chat_completions`
 - `ollama_chat`
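
The README hunk above moves `temperature` to a string value and says that `${ENV_VAR}` interpolation fails validation when a variable is missing. A minimal sketch of how that could work follows. Both `interpolate` and `parse_temperature` are hypothetical names, not functions from the crate, and the rule that a temperature must be finite and non-negative is an assumption, since the changelog only mentions "invalid numeric values".

```rust
/// Replace every `${NAME}` in `input` using `lookup`.
/// Fails on the first variable that `lookup` cannot resolve.
fn interpolate<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut output = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy the literal text before the placeholder.
        output.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        let end = after_open
            .find('}')
            .ok_or_else(|| format!("unterminated `${{` in `{input}`"))?;
        let name = &after_open[..end];
        let value = lookup(name)
            .ok_or_else(|| format!("environment variable `{name}` is not set"))?;
        output.push_str(&value);
        rest = &after_open[end + 1..];
    }
    output.push_str(rest);
    Ok(output)
}

/// Parse a string-typed temperature such as `"0.1"` after interpolation.
/// Assumption: only finite, non-negative values are accepted.
fn parse_temperature(raw: &str) -> Result<f32, String> {
    let value: f32 = raw
        .trim()
        .parse()
        .map_err(|_| format!("temperature `{raw}` is not a number"))?;
    if !value.is_finite() || value < 0.0 {
        return Err(format!("temperature `{raw}` must be finite and non-negative"));
    }
    Ok(value)
}
```

Passing the lookup as a closure keeps the function testable without touching the process environment. Against real variables it could be called as `interpolate(raw, |name| std::env::var(name).ok())`.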
@@ -83,19 +86,19 @@ Example responses:
 Run config validation first:
 
 ```bash
-cargo run -p bitloops-inference -- validate-config --config ./config.toml
+cargo run -p bitloops-inference -- validate-config --config ./bitloops-daemon-config.toml
 ```
 
 Describe a profile:
 
 ```bash
-cargo run -p bitloops-inference -- describe-profile --config ./config.toml --profile ollama_local
+cargo run -p bitloops-inference -- describe-profile --config ./bitloops-daemon-config.toml --profile ollama_local
 ```
 
 Start the stdio runtime:
 
 ```bash
-cargo run -p bitloops-inference -- run --config ./config.toml --profile ollama_local
+cargo run -p bitloops-inference -- run --config ./bitloops-daemon-config.toml --profile ollama_local
 ```
 
 You can then write protocol lines to `stdin` manually or from another process.
