Commit bc38820

Added support for the use of local CLI agents for inference
1 parent 1033c3b commit bc38820

17 files changed

Lines changed: 1473 additions & 31 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -21,4 +21,5 @@ target
 #.idea/
 
 .claude
-.codex
+.codex
+.opencode

Cargo.lock

Lines changed: 3 additions & 2 deletions

Cargo.toml

Lines changed: 2 additions & 1 deletion
@@ -6,13 +6,14 @@ members = [
 resolver = "2"
 
 [workspace.package]
-version = "0.1.3"
+version = "0.1.4"
 edition = "2024"
 license = "Apache-2.0"
 
 [workspace.dependencies]
 assert_cmd = "2.0.17"
 bitloops-inference-protocol = { path = "crates/bitloops-inference-protocol" }
+getrandom = "0.2.17"
 mockito = "1.7.0"
 pico-args = "0.5.0"
 predicates = "3.1.3"

README.md

Lines changed: 45 additions & 3 deletions
@@ -23,7 +23,7 @@ bitloops-inference describe-profile --config config.toml --profile openai_fast
 
 ## Config
 
-`bitloops-inference` reads the Bitloops daemon inference config. Text-generation profiles live under `[inference.profiles.<name>]` and reference a runtime from `[inference.runtimes.<name>]`.
+`bitloops-inference` reads the Bitloops daemon inference config. Text-generation and structured-generation profiles live under `[inference.profiles.<name>]` and reference a runtime from `[inference.runtimes.<name>]`.
 
 ```toml
 [inference.runtimes.bitloops_inference]
@@ -49,7 +49,7 @@ temperature = "0.1"
 max_output_tokens = 200
 ```
 
-String fields support `${ENV_VAR}` interpolation. Missing environment variables fail validation immediately. Non-text-generation profiles in the same daemon config are ignored by `bitloops-inference`.
+String fields support `${ENV_VAR}` interpolation. Missing environment variables fail validation immediately. Profiles unrelated to text or structured generation in the same daemon config are ignored by `bitloops-inference`.
 
 The public Bitloops platform gateway has a dedicated `bitloops_platform_chat` driver. It defaults to the production Bitloops platform endpoint, and the Bitloops host can optionally provide `base_url` when a test or non-production override is needed:
 
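
Reviewer note on the `${ENV_VAR}` rule in the hunk above: the README says missing variables fail validation immediately. The sketch below shows one way that behaviour could look. The `interpolate` function and its error strings are hypothetical and are not taken from the crate; the variables are passed in as a map so the logic can be exercised without touching the process environment.

```rust
use std::collections::HashMap;

/// Expands `${NAME}` placeholders in `input` using `vars`.
/// Hypothetical helper for illustration; not the crate's actual API.
/// Fails on the first placeholder whose variable is missing.
pub fn interpolate(input: &str, vars: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy the literal text that precedes the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find('}')
            .ok_or_else(|| format!("unterminated placeholder in `{input}`"))?;
        let name = &after[..end];
        let value = vars
            .get(name)
            .ok_or_else(|| format!("missing environment variable `{name}`"))?;
        out.push_str(value);
        rest = &after[end + 1..];
    }
    // Whatever remains contains no further placeholders.
    out.push_str(rest);
    Ok(out)
}
```

In real use the map could be built with `std::env::vars().collect()`. Failing on the first missing name, rather than substituting an empty string, matches the "fail validation immediately" wording.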
@@ -74,8 +74,30 @@ If `base_url` is omitted, `bitloops-inference` uses `https://platform.bitloops.n
 - `openai_chat_completions`
 - `bitloops_platform_chat`
 - `ollama_chat`
+- `codex_exec`
+- `claude_code_print`
 
-Both providers normalise their outputs into one canonical inference response with `text`, optional `parsed_json`, optional token usage, finish reason, provider name, and model name.
+All providers normalise their outputs into one canonical inference response with `text`, optional `parsed_json`, optional token usage, finish reason, provider name, and model name.
+
+Structured-generation CLI profiles use the runtime command and args directly:
+
+```toml
+[inference.runtimes.codex]
+command = "codex"
+args = []
+startup_timeout_secs = 5
+request_timeout_secs = 300
+
+[inference.profiles.local_agent]
+task = "structured_generation"
+driver = "codex_exec"
+runtime = "codex"
+model = "gpt-5.4-mini"
+temperature = "0.1"
+max_output_tokens = 4096
+```
+
+`codex_exec` writes a temporary JSON Schema file, runs `codex exec --output-schema <schema-file> --output-last-message <result-file>`, and returns the parsed result file as `parsed_json`. `claude_code_print` runs `claude -p --output-format json --allowedTools Read,Grep,Glob` and treats schema adherence as prompt-guided JSON rather than strict schema enforcement.
 
 ## How Bitloops calls it
 
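
Reviewer note on the `codex_exec` description in the hunk above: the command line quoted in the README can be assembled with `std::process::Command` without spawning anything. This is a minimal sketch. The function names are hypothetical, placing the runtime `args` before `exec` is an assumption, and how the prompt reaches the CLI is not stated in the README, so it is left out.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Builds (but does not spawn) the invocation quoted in the README text.
/// `program` and `runtime_args` mirror the runtime's `command` and `args`.
/// Hypothetical helper; the argument ordering is an assumption.
pub fn build_codex_exec_command(
    program: &str,
    runtime_args: &[String],
    schema_file: &Path,
    result_file: &Path,
) -> Command {
    let mut command = Command::new(program);
    command
        .args(runtime_args)
        .arg("exec")
        .arg("--output-schema")
        .arg(schema_file)
        .arg("--output-last-message")
        .arg(result_file);
    command
}

/// Renders the argument list as strings, for logging or assertions.
pub fn rendered_args(command: &Command) -> Vec<String> {
    command
        .get_args()
        .map(|arg: &OsStr| arg.to_string_lossy().into_owned())
        .collect()
}
```

With `command = "codex"` and `args = []` from the profile above and the two file paths `schema.json` and `result.json`, `rendered_args` yields `exec --output-schema schema.json --output-last-message result.json`. Keeping construction separate from execution makes the flag layout testable without the CLI installed.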
@@ -131,6 +153,26 @@ cargo nextest run
 cargo dev-clippy
 ```
 
+There is also an ad hoc manual performance runner that hits a live provider and prints JSON latency analytics, including per-request timings, min/max/mean/median, p95, p99, throughput, and token totals when the provider reports usage. It does not make assertions or act as part of the automated test suite. Each request appends a random cache-buster suffix to the prompt to reduce the chance of provider-side caching affecting the timings. It expects the worker count, prompt, run count, and token through environment variables:
+
+```bash
+BITLOOPS_INFERENCE_PERF_WORKERS=4 \
+BITLOOPS_INFERENCE_PERF_RUNS=20 \
+BITLOOPS_INFERENCE_PERF_PROMPT="Summarise the benefits of isolating provider logic." \
+BITLOOPS_PLATFORM_GATEWAY_TOKEN=... \
+cargo run -p bitloops-inference --bin bitloops-inference-perf
+```
+
+Optional overrides:
+
+- `BITLOOPS_INFERENCE_PERF_DRIVER`: `bitloops_platform_chat` (default) or `openai_chat_completions`
+- `BITLOOPS_INFERENCE_PERF_BASE_URL`: override the default provider endpoint
+- `BITLOOPS_INFERENCE_PERF_MODEL`: override the default model for the selected driver
+- `BITLOOPS_INFERENCE_PERF_SYSTEM_PROMPT`: override the default system prompt
+- `BITLOOPS_INFERENCE_PERF_TIMEOUT_SECS`
+- `BITLOOPS_INFERENCE_PERF_TEMPERATURE`
+- `BITLOOPS_INFERENCE_PERF_MAX_OUTPUT_TOKENS`
+
 ## CI and releases
 
 GitHub Actions runs a lean hosted-runner CI pipeline for formatting, clippy, `nextest`, and native release-build smoke checks on Linux, macOS, and Windows.
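
Reviewer note on the latency analytics listed in the hunk above (min/max/mean/median, p95, p99): the commit page does not show how the runner computes them. The sketch below uses the nearest-rank percentile method, which is an assumption, as are the `LatencySummary` type and the function names. With nearest-rank, the median of an even-sized sample is the lower of the two middle values.

```rust
use std::time::Duration;

/// Summary statistics over request latencies.
/// Hypothetical shape; not the runner's real output type.
#[derive(Debug, PartialEq)]
pub struct LatencySummary {
    pub min: Duration,
    pub max: Duration,
    pub mean: Duration,
    pub median: Duration,
    pub p95: Duration,
    pub p99: Duration,
}

/// Nearest-rank percentile over an ascending, non-empty slice:
/// rank = ceil(pct / 100 * n), clamped to 1..=n.
fn percentile(sorted: &[Duration], pct: usize) -> Duration {
    let n = sorted.len();
    let rank = ((pct * n + 99) / 100).clamp(1, n);
    sorted[rank - 1]
}

/// Returns `None` for an empty sample set, otherwise the summary.
pub fn summarise(samples: &[Duration]) -> Option<LatencySummary> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let total: Duration = sorted.iter().sum();
    Some(LatencySummary {
        min: sorted[0],
        max: sorted[sorted.len() - 1],
        mean: total / sorted.len() as u32,
        median: percentile(&sorted, 50),
        p95: percentile(&sorted, 95),
        p99: percentile(&sorted, 99),
    })
}
```

Nearest-rank always returns an observed sample, which keeps the report easy to cross-check against the per-request timings. An interpolating percentile would be a reasonable alternative for small run counts.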

crates/bitloops-inference-protocol/src/lib.rs

Lines changed: 6 additions & 0 deletions
@@ -12,13 +12,17 @@ pub enum ProviderKind {
     #[serde(rename = "openai_chat_completions")]
     OpenAiChatCompletions,
     OllamaChat,
+    CodexExec,
+    ClaudeCodePrint,
 }
 
 impl ProviderKind {
     pub const fn as_str(self) -> &'static str {
         match self {
             Self::OpenAiChatCompletions => "openai_chat_completions",
             Self::OllamaChat => "ollama_chat",
+            Self::CodexExec => "codex_exec",
+            Self::ClaudeCodePrint => "claude_code_print",
         }
     }
 }
@@ -130,6 +134,8 @@ pub struct ProviderMetadata {
 pub struct ProviderCapabilities {
     pub response_modes: Vec<ResponseMode>,
     pub usage_reporting: bool,
+    #[serde(default)]
+    pub structured_output: Vec<String>,
 }
 
 #[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]

crates/bitloops-inference/Cargo.toml

Lines changed: 2 additions & 1 deletion
@@ -6,9 +6,11 @@ license.workspace = true
 
 [dependencies]
 bitloops-inference-protocol.workspace = true
+getrandom.workspace = true
 pico-args.workspace = true
 serde.workspace = true
 serde_json.workspace = true
+tempfile.workspace = true
 thiserror.workspace = true
 toml.workspace = true
 ureq.workspace = true
@@ -17,4 +19,3 @@ ureq.workspace = true
 assert_cmd.workspace = true
 mockito.workspace = true
 predicates.workspace = true
-tempfile.workspace = true
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+use std::process::ExitCode;
+
+fn main() -> ExitCode {
+    match bitloops_inference::run_perf_report_from_env() {
+        Ok(()) => ExitCode::SUCCESS,
+        Err(error) => {
+            eprintln!("{error}");
+            ExitCode::FAILURE
+        }
+    }
+}
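
Reviewer note on the new binary above: the body of `run_perf_report_from_env` is not shown on this page. The README states that the runner expects the worker count, run count, and prompt through environment variables, so the sketch below shows how such settings might be read and validated. `PerfSettings`, `perf_settings_from`, the error messages, and the positive-integer requirement are all assumptions made for illustration. The gateway token and the optional overrides are omitted.

```rust
/// Settings the README says the runner expects.
/// Hypothetical struct; the real implementation is not shown in this commit view.
#[derive(Debug, PartialEq)]
pub struct PerfSettings {
    pub workers: usize,
    pub runs: usize,
    pub prompt: String,
}

/// Reads settings through `lookup`, so callers can pass
/// `|key| std::env::var(key).ok()` in production and a stub in tests.
pub fn perf_settings_from<F>(lookup: F) -> Result<PerfSettings, String>
where
    F: Fn(&str) -> Option<String>,
{
    // A required variable must be present; report its name when it is not.
    let required = |key: &str| lookup(key).ok_or_else(|| format!("{key} is not set"));
    // Counts must parse as integers greater than zero (assumed rule).
    let count = |key: &str| -> Result<usize, String> {
        let raw = required(key)?;
        match raw.trim().parse::<usize>() {
            Ok(value) if value > 0 => Ok(value),
            _ => Err(format!("{key} must be a positive integer, got `{raw}`")),
        }
    };
    Ok(PerfSettings {
        workers: count("BITLOOPS_INFERENCE_PERF_WORKERS")?,
        runs: count("BITLOOPS_INFERENCE_PERF_RUNS")?,
        prompt: required("BITLOOPS_INFERENCE_PERF_PROMPT")?,
    })
}
```

Returning a descriptive `Err` fits the `main` shown above, which prints the error to stderr and exits with `ExitCode::FAILURE`.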
