This document describes the usage and specifications of the local validation and benchmarking features of the `comptext` CLI (`ctxt`).

## 1. Local Validation Command

The `ctxt validate` command prints the standard local validation commands used to check codebase integrity and safety compliance.

### Usage

```bash
ctxt validate
```

### Output

```text
Standard local validation commands:
cargo fmt --all --check
cargo check
cargo test
cargo clippy -- -D warnings
```
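
If the printed commands are to be executed rather than just read, a small wrapper can run them in order and stop at the first failure. The following is a minimal Python sketch, not part of `ctxt`: the helper name `run_validation` and its injectable `runner` parameter are assumptions made for illustration, and a real run requires `cargo` on the `PATH`.

```python
import shlex
import subprocess

# The four commands listed in the `ctxt validate` output above.
VALIDATION_COMMANDS = [
    "cargo fmt --all --check",
    "cargo check",
    "cargo test",
    "cargo clippy -- -D warnings",
]


def run_validation(commands=VALIDATION_COMMANDS, runner=subprocess.run):
    """Run each command in order; return the first failing command, or None.

    `runner` is a hypothetical injection point (defaults to subprocess.run)
    so the loop can be exercised without a Rust toolchain installed.
    """
    for command in commands:
        # Split into an argv list so no shell is involved.
        result = runner(shlex.split(command))
        if result.returncode != 0:
            return command  # stop at the first failure
    return None

# Example call (needs cargo installed): failed = run_validation()
```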

---

## 2. Deterministic Benchmark Command

The `ctxt benchmark` command deterministically evaluates context packaging and model request generation in an offline sandbox.

### Usage

```bash
ctxt benchmark --provider dummy "How should I test this repo?"
```

- **`--provider`**: Optional; defaults to `"dummy"`. Only `"dummy"` is currently supported, which prevents unauthorized live network calls: the command fails closed if any other provider is specified.
- **task description**: The prompt to run the benchmark against.
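
The fail-closed rule for `--provider` can be illustrated with a short Python sketch. This is not the `ctxt` implementation: `resolve_provider`, `SUPPORTED_PROVIDERS`, and the error message are hypothetical names chosen for the example.

```python
SUPPORTED_PROVIDERS = frozenset({"dummy"})  # hypothetical allowlist
DEFAULT_PROVIDER = "dummy"


def resolve_provider(requested=None):
    """Return the provider to use, rejecting anything not allowlisted.

    Failing closed means an unknown provider raises an error instead of
    falling back to a default or attempting a live network call.
    """
    if requested is None:
        return DEFAULT_PROVIDER
    if requested not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {requested!r}")
    return requested
```

With this shape, `resolve_provider()` and `resolve_provider("dummy")` both return `"dummy"`, while any other value raises before a request could be built.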
36
+
37
+
### Artifact Outputs
38
+
39
+
Each benchmark run builds a schema-checked Context Pack and runs the offline model query. It writes a deterministic JSON artifact to `.comptext/benchmark.latest.json` containing:
40
+
41
+
-`schema_version`: Version of the benchmark format.
42
+
-`task`: The prompt task.
43
+
-`provider`: The provider used.
44
+
-`context_pack_path`: Filepath to the generated Context Pack.
45
+
-`request_artifact_path`: Filepath to the generated Model Request.
46
+
-`response_artifact_path`: Filepath to the generated Model Response.
47
+
-`validation_commands`: List of local validation commands.
48
+
-`network`: Network state declaration (always `"offline-only"` in this phase).
49
+
-`secrets`: Secrets handling status (always `"redacted"`).
50
+
-`status`: Benchmark run completion status (always `"success"` if successful).
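
A consumer of the artifact could check these fields before trusting a run. The Python sketch below assumes the artifact is a flat JSON object with the keys listed above. The function names `check_artifact` and `check_artifact_file` are illustrative, value types are not specified in this document, and the real schema may carry more structure.

```python
import json
from pathlib import Path

ARTIFACT_PATH = Path(".comptext/benchmark.latest.json")

# Keys listed in this document; their value types are not specified there.
REQUIRED_KEYS = (
    "schema_version", "task", "provider", "context_pack_path",
    "request_artifact_path", "response_artifact_path",
    "validation_commands", "network", "secrets", "status",
)

# Values this document gives for a successful offline run.
EXPECTED_VALUES = {
    "network": "offline-only",
    "secrets": "redacted",
    "status": "success",
}


def check_artifact(artifact):
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in artifact]
    for key, expected in EXPECTED_VALUES.items():
        if key in artifact and artifact[key] != expected:
            problems.append(f"{key}: expected {expected!r}, got {artifact[key]!r}")
    return problems


def check_artifact_file(path=ARTIFACT_PATH):
    """Load the JSON artifact from disk and run the same checks."""
    return check_artifact(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling `check_artifact_file()` after a benchmark run reads the default path and returns an empty list when every check passes.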