Commit 33abd06

Add validate and benchmark flow (Phase 9)
1 parent a9760a8 commit 33abd06

5 files changed

Lines changed: 365 additions & 35 deletions

PROJEKT.md

Lines changed: 3 additions & 3 deletions
@@ -21,8 +21,8 @@ CompText CLI is an experimental terminal context client for building determinist
 ```text
 CURRENT_PHASE: 9
 CURRENT_TASK: Validate and Benchmark
-LAST_GREEN_PHASE: 8
-STATUS: active
+LAST_GREEN_PHASE: 9
+STATUS: complete
 ```

 ### Autonomy Contract
@@ -83,7 +83,7 @@ git push
 | **Phase 6** | Apply Gate | Implement `ctxt apply` to confirm/apply changes and run verification | **COMPLETE** |
 | **Phase 7** | Provider Config Layer | Support dynamic provider profile switching and configurations | **COMPLETE** |
 | **Phase 8** | OpenAI-Compatible Adapter | Implement OpenAI adapter skeleton | **COMPLETE** |
-| **Phase 9** | Validate and Benchmark | Local validation, dry-runs, and deterministic benchmark flows | **ACTIVE** |
+| **Phase 9** | Validate and Benchmark | Local validation, dry-runs, and deterministic benchmark flows | **COMPLETE** |

 ---

docs/VALIDATE_BENCHMARK.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# Validation and Benchmarking
+
+This document details the usage and specifications for the local validation and benchmarking features of `comptext` CLI (`ctxt`).
+
+## 1. Local Validation Command
+
+The `ctxt validate` command prints the standard local validation commands used to ensure codebase integrity and safety compliance.
+
+### Usage
+```bash
+ctxt validate
+```
+
+### Output
+```text
+Standard local validation commands:
+cargo fmt --all --check
+cargo check
+cargo test
+cargo clippy -- -D warnings
+```
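
Reviewer note: the documented output is a fixed header followed by four commands, so the printing logic can be reduced to a constant list plus a renderer. The sketch below is illustrative only. `VALIDATION_COMMANDS` and `render_validate_output` are hypothetical names, and the two-space indent is an assumption, since the doc does not pin the exact layout.

```rust
/// The four commands listed in the documented `ctxt validate` output.
const VALIDATION_COMMANDS: [&str; 4] = [
    "cargo fmt --all --check",
    "cargo check",
    "cargo test",
    "cargo clippy -- -D warnings",
];

/// Builds the text a `validate`-style subcommand could print.
/// Assumption: each command is indented by two spaces under the header.
fn render_validate_output() -> String {
    let mut out = String::from("Standard local validation commands:\n");
    for cmd in VALIDATION_COMMANDS.iter() {
        out.push_str("  ");
        out.push_str(cmd);
        out.push('\n');
    }
    out
}
```

Keeping the list in one constant would also let the benchmark artifact's `validation_commands` field reuse it, so the two outputs cannot drift apart.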
+
+---
+
+## 2. Deterministic Benchmark Command
+
+The `ctxt benchmark` command evaluates context packaging and model request generation deterministically under an offline sandbox.
+
+### Usage
+```bash
+ctxt benchmark --provider dummy "How should I test this repo?"
+```
+
+- **`--provider`**: Optional argument. Currently, only `"dummy"` is supported to prevent unauthorized live network calls (fails closed if another provider is specified). Defaults to `"dummy"`.
+- **task description**: The target prompt to run the benchmark against.
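
Reviewer note: the fail-closed rule for `--provider` can be expressed as a small allow-list check. This is a minimal sketch under stated assumptions, not the code in `src/cli.rs`. `OFFLINE_PROVIDER`, `resolve_provider`, and the error wording are hypothetical.

```rust
/// The only provider the doc describes as supported in this phase.
const OFFLINE_PROVIDER: &str = "dummy";

/// Resolves the `--provider` value, failing closed on anything but "dummy".
/// `None` models the flag being omitted, which the doc says defaults to "dummy".
fn resolve_provider(requested: Option<&str>) -> Result<&'static str, String> {
    match requested {
        None => Ok(OFFLINE_PROVIDER),
        Some(p) if p == OFFLINE_PROVIDER => Ok(OFFLINE_PROVIDER),
        // Any other value is rejected before any work is done.
        Some(other) => Err(format!(
            "provider '{}' is not allowed: only '{}' is supported offline",
            other, OFFLINE_PROVIDER
        )),
    }
}
```

An allow-list fails closed by construction: a provider added later stays rejected until someone edits the check, whereas a deny-list of known network providers would let unknown names through.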
+
+### Artifact Outputs
+
+Each benchmark run builds a schema-checked Context Pack and runs the offline model query. It writes a deterministic JSON artifact to `.comptext/benchmark.latest.json` containing:
+
+- `schema_version`: Version of the benchmark format.
+- `task`: The prompt task.
+- `provider`: The provider used.
+- `context_pack_path`: Filepath to the generated Context Pack.
+- `request_artifact_path`: Filepath to the generated Model Request.
+- `response_artifact_path`: Filepath to the generated Model Response.
+- `validation_commands`: List of local validation commands.
+- `network`: Network state declaration (always `"offline-only"` in this phase).
+- `secrets`: Secrets handling status (always `"redacted"`).
+- `status`: Benchmark run completion status (always `"success"` if successful).
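
Reviewer note: one way to get a byte-identical artifact across runs is to emit the fields in a fixed order. The std-only sketch below is hypothetical. The `BenchmarkArtifact` struct, its field types, the `json_string` helper, and the sample values (the schema version and the three paths) are assumptions for illustration. The actual implementation may use a serialization crate instead.

```rust
/// Hypothetical in-memory form of the artifact; field names follow the doc,
/// field types are assumptions.
struct BenchmarkArtifact {
    schema_version: String,
    task: String,
    provider: String,
    context_pack_path: String,
    request_artifact_path: String,
    response_artifact_path: String,
    validation_commands: Vec<String>,
    network: String,
    secrets: String,
    status: String,
}

/// Escapes a string as a JSON string literal (quotes, backslashes, control chars).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl BenchmarkArtifact {
    /// Renders fields in a fixed order so equal inputs give identical bytes.
    fn to_json(&self) -> String {
        let cmds: Vec<String> =
            self.validation_commands.iter().map(|c| json_string(c)).collect();
        let fields = [
            ("schema_version", json_string(&self.schema_version)),
            ("task", json_string(&self.task)),
            ("provider", json_string(&self.provider)),
            ("context_pack_path", json_string(&self.context_pack_path)),
            ("request_artifact_path", json_string(&self.request_artifact_path)),
            ("response_artifact_path", json_string(&self.response_artifact_path)),
            ("validation_commands", format!("[{}]", cmds.join(", "))),
            ("network", json_string(&self.network)),
            ("secrets", json_string(&self.secrets)),
            ("status", json_string(&self.status)),
        ];
        let body: Vec<String> = fields
            .iter()
            .map(|(k, v)| format!("  {}: {}", json_string(k), v))
            .collect();
        format!("{{\n{}\n}}\n", body.join(",\n"))
    }
}

/// Sample values; the schema version and all three paths are placeholders.
fn sample_artifact() -> BenchmarkArtifact {
    BenchmarkArtifact {
        schema_version: "1".to_string(),
        task: "How should I test this repo?".to_string(),
        provider: "dummy".to_string(),
        context_pack_path: ".comptext/example.context_pack.json".to_string(),
        request_artifact_path: ".comptext/example.request.json".to_string(),
        response_artifact_path: ".comptext/example.response.json".to_string(),
        validation_commands: vec![
            "cargo fmt --all --check".to_string(),
            "cargo check".to_string(),
            "cargo test".to_string(),
            "cargo clippy -- -D warnings".to_string(),
        ],
        network: "offline-only".to_string(),
        secrets: "redacted".to_string(),
        status: "success".to_string(),
    }
}
```

Calling `sample_artifact().to_json()` twice yields the same string, because the output contains no timestamps, random identifiers, or map iteration order. Any such field added later would break the determinism the doc promises.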

reports/phase_9_status.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Phase 9 Status Report: Validate and Benchmark
+
+## Status Summary
+- **Phase**: Phase 9: Validate and Benchmark
+- **Status**: success
+- **Date**: 2026-06-04
+
+---
+
+## Metadata details
+- **PHASE**: Phase 9: Validate and Benchmark
+- **STATUS**: success
+- **FILES_CHANGED**:
+  - `src/cli.rs`
+  - `tests/cli_smoke.rs`
+  - `docs/VALIDATE_BENCHMARK.md`
+  - `reports/phase_9_status.md`
+  - `PROJEKT.md`
+- **COMMANDS_RUN**:
+  - `cargo fmt --all --check`
+  - `cargo check`
+  - `cargo test`
+  - `cargo clippy -- -D warnings`
+  - `cargo run --bin ctxt -- validate`
+  - `cargo run --bin ctxt -- benchmark --provider dummy "How should I test this repo?"`
+- **VALIDATION**:
+  - Code formatting checked and green.
+  - Compilation successful without warnings.
+  - All 35 tests (27 unit tests, 8 integration smoke tests) passed successfully.
+  - Manual execution of `ctxt validate` and `ctxt benchmark` successfully verified.
+- **ARTIFACTS**:
+  - `.comptext/benchmark.latest.json` (generated during benchmark run, ignored by git)
+- **GIT**: Pending commit and push
+- **NETWORK**: offline-only (no network requests executed)
+- **SECRETS**: Redacted from all outputs and metadata.
+- **POLICY_DECISIONS**: Benchmark execution fails closed if any non-dummy/network provider is specified.
+- **RISKS**: None. Clean offline mock execution maintains sandbox boundaries.
+- **NEXT**: Validate and finalize
