-`bitloops-inference` reads the Bitloops daemon inference config. Text-generation profiles live under `[inference.profiles.<name>]` and reference a runtime from `[inference.runtimes.<name>]`.
+`bitloops-inference` reads the Bitloops daemon inference config. Text-generation and structured-generation profiles live under `[inference.profiles.<name>]` and reference a runtime from `[inference.runtimes.<name>]`.

 ```toml
 [inference.runtimes.bitloops_inference]
@@ -49,7 +49,7 @@ temperature = "0.1"
 max_output_tokens = 200
 ```

-String fields support `${ENV_VAR}` interpolation. Missing environment variables fail validation immediately. Non-text-generation profiles in the same daemon config are ignored by `bitloops-inference`.
+String fields support `${ENV_VAR}` interpolation. Missing environment variables fail validation immediately. Profiles unrelated to text or structured generation in the same daemon config are ignored by `bitloops-inference`.

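The `${ENV_VAR}` interpolation and fail-fast validation described above can be illustrated with a minimal Python sketch. This is not the crate's implementation: the function name `interpolate`, the placeholder grammar, and the error type are assumptions made for illustration only.

```python
import os
import re

# Matches ${NAME} placeholders. The exact grammar that bitloops-inference
# accepts is an assumption; only the ${ENV_VAR} form comes from the README.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, env=None):
    """Replace every ${NAME} in value, raising if any variable is missing."""
    env = os.environ if env is None else env
    # Collect all missing names first so the error reports them together.
    missing = sorted({name for name in _PLACEHOLDER.findall(value) if name not in env})
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return _PLACEHOLDER.sub(lambda match: env[match.group(1)], value)
```

Passing an explicit `env` mapping keeps the sketch easy to exercise without touching the real process environment; strings without placeholders are returned unchanged.
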
 The public Bitloops platform gateway has a dedicated `bitloops_platform_chat` driver. It defaults to the production Bitloops platform endpoint, and the Bitloops host can optionally provide `base_url` when a test or non-production override is needed:

@@ -74,8 +74,30 @@ If `base_url` is omitted, `bitloops-inference` uses `https://platform.bitloops.n
 - `openai_chat_completions`
 - `bitloops_platform_chat`
 - `ollama_chat`
+- `codex_exec`
+- `claude_code_print`

-Both providers normalise their outputs into one canonical inference response with `text`, optional `parsed_json`, optional token usage, finish reason, provider name, and model name.
+All providers normalise their outputs into one canonical inference response with `text`, optional `parsed_json`, optional token usage, finish reason, provider name, and model name.
+
+Structured-generation CLI profiles use the runtime command and args directly:
+
+```toml
+[inference.runtimes.codex]
+command = "codex"
+args = []
+startup_timeout_secs = 5
+request_timeout_secs = 300
+
+[inference.profiles.local_agent]
+task = "structured_generation"
+driver = "codex_exec"
+runtime = "codex"
+model = "gpt-5.4-mini"
+temperature = "0.1"
+max_output_tokens = 4096
+```
+
+`codex_exec` writes a temporary JSON Schema file, runs `codex exec --output-schema <schema-file> --output-last-message <result-file>`, and returns the parsed result file as `parsed_json`. `claude_code_print` runs `claude -p --output-format json --allowedTools Read,Grep,Glob` and treats schema adherence as prompt-guided JSON rather than strict schema enforcement.

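The `codex_exec` flow described above (write a schema file, invoke the CLI, parse the result file) might look roughly like the following Python sketch. The two flags come from the README text; everything else is an assumption for illustration, including the helper names `build_codex_argv` and `run_structured`, the placement of runtime `args` before `exec`, and passing the prompt as the final positional argument.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def build_codex_argv(schema_file, result_file, prompt, command="codex", args=()):
    """Assemble the command line; argument ordering is an assumption."""
    return [
        command, *args, "exec",
        "--output-schema", str(schema_file),
        "--output-last-message", str(result_file),
        prompt,  # assumption: the prompt is passed as a trailing positional argument
    ]


def run_structured(prompt, schema, run=subprocess.run, timeout_secs=300):
    """Write the schema to a temp file, run the CLI, return the parsed result file."""
    with tempfile.TemporaryDirectory() as tmp:
        schema_file = Path(tmp) / "schema.json"
        result_file = Path(tmp) / "result.json"
        schema_file.write_text(json.dumps(schema), encoding="utf-8")
        argv = build_codex_argv(schema_file, result_file, prompt)
        # `run` is injectable so the flow can be exercised without the real CLI.
        run(argv, check=True, timeout=timeout_secs)
        text = result_file.read_text(encoding="utf-8")
        return {"text": text, "parsed_json": json.loads(text)}
```

The returned dictionary only mirrors the `text` and `parsed_json` fields of the canonical response; token usage, finish reason, provider name, and model name are left out of the sketch.
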
 ## How Bitloops calls it

@@ -131,6 +153,26 @@ cargo nextest run
 cargo dev-clippy
 ```

+There is also an ad hoc manual performance runner that hits a live provider and prints JSON latency analytics, including per-request timings, min/max/mean/median, p95, p99, throughput, and token totals when the provider reports usage. It does not make assertions or act as part of the automated test suite. Each request appends a random cache-buster suffix to the prompt to reduce the chance of provider-side caching affecting the timings. It expects the worker count, prompt, run count, and token through environment variables:
+
+```bash
+BITLOOPS_INFERENCE_PERF_WORKERS=4 \
+BITLOOPS_INFERENCE_PERF_RUNS=20 \
+BITLOOPS_INFERENCE_PERF_PROMPT="Summarise the benefits of isolating provider logic." \
+BITLOOPS_PLATFORM_GATEWAY_TOKEN=... \
+cargo run -p bitloops-inference --bin bitloops-inference-perf
+```
+
+Optional overrides:
+
+- `BITLOOPS_INFERENCE_PERF_DRIVER`: `bitloops_platform_chat` (default) or `openai_chat_completions`
+- `BITLOOPS_INFERENCE_PERF_BASE_URL`: override the default provider endpoint
+- `BITLOOPS_INFERENCE_PERF_MODEL`: override the default model for the selected driver
+- `BITLOOPS_INFERENCE_PERF_SYSTEM_PROMPT`: override the default system prompt
+- `BITLOOPS_INFERENCE_PERF_TIMEOUT_SECS`
+- `BITLOOPS_INFERENCE_PERF_TEMPERATURE`
+- `BITLOOPS_INFERENCE_PERF_MAX_OUTPUT_TOKENS`
+
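To make the latency analytics listed for the performance runner concrete, here is a small Python sketch of how such a summary could be computed from per-request timings. It is illustrative only: the function names, the output keys, and the nearest-rank percentile method are assumptions, and the runner itself may compute these values differently.

```python
import math
import statistics


def percentile(sorted_ms, pct):
    """Nearest-rank percentile over an ascending list (method is an assumption)."""
    rank = max(1, math.ceil(pct * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def summarise(latencies_ms, wall_clock_secs):
    """Summarise per-request latencies into the statistics named above."""
    ordered = sorted(latencies_ms)
    return {
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        "mean_ms": statistics.fmean(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        # Completed requests per second of total wall-clock time.
        "throughput_rps": len(ordered) / wall_clock_secs,
    }
```

With only a handful of runs, p95 and p99 collapse towards the maximum, so a larger `BITLOOPS_INFERENCE_PERF_RUNS` value gives more meaningful tail figures.
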
 ## CI and releases

 GitHub Actions runs a lean hosted-runner CI pipeline for formatting, clippy, `nextest`, and native release-build smoke checks on Linux, macOS, and Windows.