Commit 036e358

docs: add public quickstart to README
1 parent e00bee0 commit 036e358

1 file changed (+59, -2 lines)

README.md

Lines changed: 59 additions & 2 deletions
@@ -6,6 +6,63 @@ This repository contains **benchmark task definitions**, **evaluation configs**,

---

## Quickstart (Public / First-Time Users)

### Who this repo is for

- Researchers evaluating coding agents on realistic software engineering tasks
- Practitioners comparing baseline vs MCP-enabled agent configurations
- Contributors authoring new benchmark tasks or extending evaluation tooling

### What you can do without Harbor

You can inspect task definitions, run validation and analysis scripts, and use the metrics/report pipeline on existing Harbor run outputs.

```bash
git clone https://github.com/sjarmak/CodeContextBench.git
cd CodeContextBench

# Fast repo sanity check (docs/config refs)
python3 scripts/repo_health.py --quick

# Explore task-based docs navigation
sed -n '1,120p' docs/START_HERE_BY_TASK.md

# Inspect available benchmark suites
ls benchmarks
```
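The metrics/report step above aggregates existing Harbor run outputs. As a rough sketch of the kind of aggregation involved, here is a minimal per-config pass-rate computation; the record schema (`task_id`, `config`, `resolved`) is invented for illustration and is not necessarily this repo's actual output format.

```python
# Illustrative only: the real schema of Harbor run outputs may differ; the
# field names "task_id", "config", and "resolved" are hypothetical placeholders.
import json
from collections import defaultdict

def pass_rates(records):
    """Compute per-config pass rates from a list of per-task result records."""
    totals = defaultdict(int)   # tasks attempted per config
    passed = defaultdict(int)   # tasks resolved per config
    for rec in records:
        totals[rec["config"]] += 1
        passed[rec["config"]] += bool(rec["resolved"])
    return {cfg: passed[cfg] / totals[cfg] for cfg in totals}

if __name__ == "__main__":
    sample = [
        {"task_id": "t1", "config": "baseline", "resolved": True},
        {"task_id": "t1", "config": "mcp", "resolved": True},
        {"task_id": "t2", "config": "baseline", "resolved": False},
        {"task_id": "t2", "config": "mcp", "resolved": True},
    ]
    print(json.dumps(pass_rates(sample)))  # {"baseline": 0.5, "mcp": 1.0}
```

This keeps the baseline-vs-MCP comparison explicit by keying every aggregate on the config name rather than averaging across configs.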

### What requires Harbor (benchmark execution)

Running benchmark tasks requires:

- [Harbor](https://github.com/laude-institute/harbor/tree/main) installed and configured
- Docker
- Valid agent/runtime credentials used by your Harbor setup
- A Max subscription (for the default harness path documented in this repo)

Recommended pre-run checks:

```bash
python3 scripts/check_infra.py
python3 scripts/validate_tasks_preflight.py --all
```

Then start with a dry run:

```bash
bash configs/run_selected_tasks.sh --dry-run
```

### First places to read

- `docs/START_HERE_BY_TASK.md` for task-oriented navigation
- `docs/CONFIGS.md` for the 2-config evaluation matrix
- `docs/EVALUATION_PIPELINE.md` for scoring and reporting outputs
- `docs/REPO_HEALTH.md` for the pre-push health gate

---

## Benchmark Suites (SDLC-Aligned)

Eight suites organized by software development lifecycle phase:
@@ -170,6 +227,8 @@ For the full multi-layer evaluation pipeline (verifier, LLM judge, statistical a

## Running with Harbor

This section assumes Harbor is already installed and configured. If not, start with the Quickstart section above and `python3 scripts/check_infra.py`.

### SDLC Tasks

The unified runner executes all 170 SDLC tasks across the 2-config matrix:
@@ -218,8 +277,6 @@ bash configs/run_selected_tasks.sh --selection-file configs/selected_mcp_unique_

All runners support `--baseline-only`, `--full-only`, `--task TASK_ID`, and `--parallel N` flags.
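The flags listed above compose; as an illustrative sketch (printing invocations rather than running them, since execution needs a configured Harbor), the combinations might look like this. `TASK_ID` is a placeholder, not a real task name from this repo.

```shell
# Illustrative only: echo a few runner invocations combining the documented
# flags; replace TASK_ID with a real benchmark task ID before running.
for n in 1 2 4; do
  echo "bash configs/run_selected_tasks.sh --baseline-only --parallel $n"
done
echo "bash configs/run_selected_tasks.sh --task TASK_ID --full-only"
```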

- *(removed by this commit)* Requires [Harbor](https://github.com/laude-institute/harbor/tree/main) installed and configured with a Max subscription.

---

## Quality Assurance & Validation
