Commit 9a7697b

arekay-nv and claude committed
docs: add component design specs and refactor developer documentation
- Add Design.md specs for all 15 top-level components under src/inference_endpoint/
- Restructure AGENTS.md: move code style details to DEVELOPMENT.md, update component table with runner.py and async_utils services
- Update README.md: add Component Design Specs table, use python3 in examples
- Reformat DEVELOPMENT.md: remove emojis, add commit type list, exact-version pinning guidance
- Update CLI_QUICK_REFERENCE.md, LOCAL_TESTING.md, ENDPOINT_CLIENT.md, GITHUB_SETUP.md for consistency
- Fix stale references: pkl→jsonl throughout, CLIError for eval mode, dataset_manager Design.md reflects current supported formats

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent c00652d commit 9a7697b

27 files changed

Lines changed: 1866 additions & 142 deletions

AGENTS.md

Lines changed: 9 additions & 24 deletions
````diff
@@ -9,10 +9,7 @@ High-performance benchmarking tool for LLM inference endpoints targeting 50k+ QPS
 ## Common Commands
 
 ```bash
-# Development setup
-python3.12 -m venv venv && source venv/bin/activate
-pip install -e ".[dev,test]"
-pre-commit install
+# Development setup — see docs/DEVELOPMENT.md for full instructions
 
 # Testing
 pytest                      # All tests (excludes slow/performance)
@@ -73,7 +70,7 @@ CLI is auto-generated from `config/schema.py` Pydantic models via cyclopts. Fiel
 
 - **CLI mode** (`offline`/`online`): cyclopts constructs `OfflineBenchmarkConfig`/`OnlineBenchmarkConfig` (subclasses in `config/schema.py`) directly from CLI args. Type locked via `Literal`. `--dataset` is repeatable with TOML-style format `[perf|acc:]<path>[,key=value...]` (e.g. `--dataset data.csv,samples=500,parser.prompt=article`). Full accuracy support via `accuracy_config.eval_method=pass_at_1` etc.
 - **YAML mode** (`from-config`): `BenchmarkConfig.from_yaml_file()` loads YAML, resolves env vars, and auto-selects the right subclass via Pydantic discriminated union. Optional `--timeout`/`--mode` overrides via `config.with_updates()`.
-- **eval**: Not yet implemented (raises `NotImplementedError`)
+- **eval**: Not yet implemented (raises `CLIError` with a tracking issue link)
 
 ### Config Construction & Validation
 
@@ -137,7 +134,11 @@ src/inference_endpoint/
 │   └── utils.py            # Port range helpers
 ├── async_utils/
 │   ├── loop_manager.py     # LoopManager (uvloop + eager_task_factory)
+│   ├── runner.py           # run_async() — uvloop + eager_task_factory entry point for CLI commands
 │   ├── event_publisher.py  # Async event pub/sub
+│   ├── services/
+│   │   ├── event_logger/        # EventLoggerService: writes EventRecords to JSONL/SQLite
+│   │   └── metrics_aggregator/  # MetricsAggregatorService: real-time metrics (TTFT, TPOT, ISL, OSL)
 │   └── transport/          # ZMQ-based IPC transport layer
 │       ├── protocol.py     # Transport protocols + TransportConfig base
 │       ├── record.py       # Transport records
````
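The `metrics_aggregator` service added above tracks TTFT (time to first token) and TPOT (time per output token). As a rough stdlib-only illustration of how those two metrics are typically derived from per-request timestamps (the field names here are hypothetical stand-ins, not the service's actual records):

```python
from dataclasses import dataclass


@dataclass
class SampleTiming:
    """Hypothetical per-request timing record (times in seconds)."""
    sent_at: float          # request issued
    first_token_at: float   # first streamed token received
    done_at: float          # final token received
    output_tokens: int      # OSL: output sequence length


def ttft(s: SampleTiming) -> float:
    """Time to first token."""
    return s.first_token_at - s.sent_at


def tpot(s: SampleTiming) -> float:
    """Time per output token, excluding the first token."""
    if s.output_tokens <= 1:
        return 0.0
    return (s.done_at - s.first_token_at) / (s.output_tokens - 1)


sample = SampleTiming(sent_at=0.0, first_token_at=0.25, done_at=2.25, output_tokens=101)
print(ttft(sample))  # 0.25
print(tpot(sample))  # 0.02
```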
````diff
@@ -192,25 +193,9 @@ tests/
 
 ## Development Standards
 
-### Code Style
+### Code Style and Pre-commit Hooks
 
-- **Formatter/Linter**: `ruff` (line-length 88, target Python 3.12)
-- **Type checking**: `mypy` (via pre-commit)
-- **Formatting**: `ruff-format` (double quotes, space indent)
-- **License headers**: Required on all Python files (enforced by pre-commit hook `scripts/add_license_header.py`)
-- **Conventional commits**: `feat:`, `fix:`, `docs:`, `test:`, `chore:`
-
-### Pre-commit Hooks
-
-All of these run automatically on commit:
-
-- trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements
-- `ruff` (lint + autofix) and `ruff-format`
-- `mypy` type checking
-- `prettier` for YAML/JSON/Markdown
-- License header enforcement
-
-**Always run `pre-commit run --all-files` before committing.**
+See [Development Guide](docs/DEVELOPMENT.md) for formatting, linting, and pre-commit hook details.
 
 ### Data Types & Serialization
 
@@ -291,7 +276,7 @@ Update AGENTS.md as part of any PR that includes a **significant refactor**, mea
 - **Added or removed CLI commands/subcommands** — update CLI Modes and Common Commands
 - **Changed test infrastructure** (new fixtures, changed markers, new test directories) — update Testing section
 - **Added or removed key dependencies** — update Key Dependencies table
-- **Changed build/tooling** (new pre-commit hooks, changed ruff config, new CI steps) — update Code Style and Pre-commit Hooks
+- **Changed build/tooling** (new pre-commit hooks, changed ruff config, new CI steps) — update [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md)
 - **Changed hot-path patterns** (new transport, changed serialization, new performance constraints) — update Performance Guidelines
 
 ### How to Update
````
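The repeatable `--dataset` flag described in this diff uses the mini-format `[perf|acc:]<path>[,key=value...]`. A minimal self-contained parser sketch for that shape (the helper name and the `perf` default are assumptions for illustration, not the project's actual implementation):

```python
def parse_dataset_spec(spec: str) -> dict:
    """Parse '[perf|acc:]<path>[,key=value...]' into its parts."""
    path_part, *pairs = spec.split(",")
    role = "perf"  # assumed default when no role prefix is given
    if path_part.startswith(("perf:", "acc:")):
        role, path_part = path_part.split(":", 1)
    options = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        options[key] = value
    return {"role": role, "path": path_part, "options": options}


print(parse_dataset_spec("data.csv,samples=500,parser.prompt=article"))
# {'role': 'perf', 'path': 'data.csv', 'options': {'samples': '500', 'parser.prompt': 'article'}}
```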

CONTRIBUTING.md

Lines changed: 2 additions & 0 deletions
````diff
@@ -7,3 +7,5 @@ Generally we encourage people to become MLCommons members if they wish to contri
 Regardless of whether you are a member, your organization (or you as an individual contributor) needs to sign the MLCommons Contributor License Agreement (CLA). Please submit your GitHub username to the [MLCommons Subscription form](https://mlcommons.org/community/subscribe/) to start that process.
 
 MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests.
+
+For project-specific development standards (code style, test requirements, pre-commit hooks, commit format), see the [Development Guide](docs/DEVELOPMENT.md).
````

README.md

Lines changed: 38 additions & 20 deletions
````diff
@@ -66,7 +66,7 @@ inference-endpoint benchmark offline \
 
 ```bash
 # Start local echo server
-python -m inference_endpoint.testing.echo_server --port 8765 &
+python3 -m inference_endpoint.testing.echo_server --port 8765 &
 
 # Test with dummy dataset (included in repo)
 inference-endpoint benchmark offline \
@@ -94,33 +94,51 @@ pytest -m "not performance and not run_explicitly"
 
 ## 📚 Documentation
 
+- [AGENTS.md](AGENTS.md) - Architecture, conventions, and AI agent guidelines
 - [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) - Command-line interface guide
 - [Local Testing Guide](docs/LOCAL_TESTING.md) - Test with echo server
 - [Development Guide](docs/DEVELOPMENT.md) - How to contribute and develop
+- [Performance Architecture](docs/PERF_ARCHITECTURE.md) - Hot-path design and tuning
+- [Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) - CPU affinity and client tuning
 - [GitHub Setup Guide](docs/GITHUB_SETUP.md) - GitHub authentication and setup
 
+### Component Design Specs
+
+Each top-level component under `src/inference_endpoint/` has a corresponding spec:
+
+| Component         | Spec                                                             |
+| ----------------- | ---------------------------------------------------------------- |
+| Core types        | [docs/core/Design.md](docs/core/Design.md)                       |
+| Load generator    | [docs/load_generator/Design.md](docs/load_generator/Design.md)   |
+| Endpoint client   | [docs/endpoint_client/Design.md](docs/endpoint_client/Design.md) |
+| Metrics           | [docs/metrics/Design.md](docs/metrics/Design.md)                 |
+| Config            | [docs/config/Design.md](docs/config/Design.md)                   |
+| Async utils       | [docs/async_utils/Design.md](docs/async_utils/Design.md)         |
+| Dataset manager   | [docs/dataset_manager/Design.md](docs/dataset_manager/Design.md) |
+| Commands (CLI)    | [docs/commands/Design.md](docs/commands/Design.md)               |
+| OpenAI adapter    | [docs/openai/Design.md](docs/openai/Design.md)                   |
+| SGLang adapter    | [docs/sglang/Design.md](docs/sglang/Design.md)                   |
+| Evaluation        | [docs/evaluation/Design.md](docs/evaluation/Design.md)           |
+| Testing utilities | [docs/testing/Design.md](docs/testing/Design.md)                 |
+| Profiling         | [docs/profiling/Design.md](docs/profiling/Design.md)             |
+| Plugins           | [docs/plugins/Design.md](docs/plugins/Design.md)                 |
+| Utils             | [docs/utils/Design.md](docs/utils/Design.md)                     |
+
 ## 🎯 Architecture
 
 The system follows a modular, event-driven architecture:
 
 ```
-┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
-│    Dataset      │    │      Load       │    │    Endpoint     │
-│    Manager      │───▶│    Generator    │───▶│     Client      │
-└─────────────────┘    └─────────────────┘    └─────────────────┘
-         │                      │                      │
-         ▼                      ▼                      ▼
-┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
-│    Metrics      │    │  Configuration  │    │    Endpoint     │
-│   Collector     │◄───│     Manager     │    │   (External)    │
-└─────────────────┘    └─────────────────┘    └─────────────────┘
+Dataset Manager ──► Load Generator ──► Endpoint Client ──► External Endpoint
+
+Metrics Collector
+(EventRecorder + MetricsReporter)
 ```
 
-- **Load Generator**: Central orchestrator managing query lifecycle
-- **Dataset Manager**: Handles benchmark datasets and preprocessing
-- **Endpoint Client**: Abstract interface for endpoint communication
-- **Metrics Collector**: Performance measurement and analysis
-- **Configuration Manager**: System configuration (TBD)
+- **Dataset Manager**: Loads benchmark datasets and applies transform pipelines
+- **Load Generator**: Central orchestrator — controls timing (scheduler), issues queries, and emits sample events
+- **Endpoint Client**: Multi-process HTTP worker pool communicating over ZMQ IPC
+- **Metrics Collector**: Receives sample events from Load Generator; writes to SQLite (EventRecorder), aggregates after the run (MetricsReporter)
 
 ## Accuracy Evaluation
 
````
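The Metrics Collector described in this diff persists per-sample events (EventRecorder writes to SQLite; the event_logger service also supports JSONL). As a generic, stdlib-only illustration of the JSONL side of that pattern (the record shape and helper name are hypothetical):

```python
import json
import os
import tempfile


def append_event(path: str, event: dict) -> None:
    """Append one event as a single JSON line (JSONL)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
append_event(path, {"type": "sample_done", "ttft_ms": 250})
append_event(path, {"type": "sample_done", "ttft_ms": 180})

# Reading the log back is one json.loads per line
with open(path, encoding="utf-8") as f:
    events = [json.loads(line) for line in f]
print(len(events))  # 2
```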
````diff
@@ -132,14 +150,13 @@ configuration. Currently, Inference Endpoints provides the following pre-defined
 - LiveCodeBench (default: lite, release_v6)
 
 However, LiveCodeBench will not work out-of-the-box and requires some additional setup. See the
-[LiveCodeBench](src/inference_endpoint/dataset_manager/predefined/livecodebench/README.md) documentation
-for details and explanations.
+[LiveCodeBench](src/inference_endpoint/evaluation/livecodebench/README.md) documentation for
+details and explanations.
 
 ## 🚧 Pending Features
 
 The following features are planned for future releases:
 
-- [ ] **Performance Tuning** - Advanced performance optimization features
 - [ ] **Submission Ruleset Integration** - Full MLPerf submission workflow support
 - [ ] **Documentation Generation and Hosting** - Sphinx-based API documentation with GitHub Pages
 
@@ -166,7 +183,8 @@ We are grateful to these communities for their contributions to LLM benchmarking
 
 ## 📄 License
 
-This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
+This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE.md) file for
+details.
 
 ## 🔗 Links
 
````

docs/CLI_QUICK_REFERENCE.md

Lines changed: 6 additions & 12 deletions
````diff
@@ -1,13 +1,6 @@
 # CLI Quick Reference
 
-## Architecture
-
-The CLI is auto-generated from Pydantic models in `config/schema.py` using
-cyclopts. schema.py is the single source of truth for both YAML configs and CLI flags.
-
-- **All schema fields** available as CLI flags on each subcommand (dotted kebab-case)
-- **Shorthand aliases** declared via `cyclopts.Parameter(alias="--flag")` on schema fields
-- **`${VAR}` interpolation** in YAML files (with `${VAR:-default}` fallback)
+Command-line reference for all `inference-endpoint` subcommands, flags, load patterns, and usage examples.
 
 ## Commands
 
@@ -109,6 +102,8 @@ Flag names shown as `--full.dotted.path --alias`. Both forms work.
 - `--endpoint-config.api-key --api-key` - API authentication
 - `--endpoint-config.api-type --api-type` - API type: openai/sglang (default: openai)
 - `--report-dir` - Report output directory
+  Note: applies to CLI-driven `benchmark offline` / `benchmark online`; `benchmark from-config`
+  does not expose a CLI override for `report_dir`, so set it in the YAML.
 - `--timeout` - Global timeout in seconds
 - `--enable-cpu-affinity / --no-cpu-affinity` - NUMA-aware CPU pinning (default: true)
 
@@ -169,7 +164,7 @@ Accuracy config is supported in both CLI and YAML:
 inference-endpoint benchmark offline \
   --endpoints URL --model M \
   --dataset perf:perf.jsonl \
-  --dataset acc:eval.jsonl,accuracy_config.eval_method=pass_at_1,accuracy_config.ground_truth=answer \
+  --dataset acc:eval.jsonl,accuracy_config.eval_method=pass_at_1,accuracy_config.ground_truth=answer,accuracy_config.extractor=boxed_math_extractor \
   --mode both
 ```
 
````
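The accuracy example above uses `eval_method=pass_at_1`. For context, pass@k is commonly computed with the unbiased estimator popularized by the HumanEval paper; a self-contained sketch of that standard formula (shown for background, not necessarily the project's exact code):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples per task, c correct, probability
    that at least one of k drawn samples is correct."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill a k-draw
    return 1.0 - comb(n - c, k) / comb(n, k)


# With one sample per task, pass@1 is simply whether it was correct:
print(pass_at_k(1, 1, 1))  # 1.0
# For k=1 the estimator reduces to c/n:
print(round(pass_at_k(10, 3, 1), 6))  # 0.3
```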
````diff
@@ -242,10 +237,9 @@ inference-endpoint init submission
 # 2. Edit submission_template.yaml (set model, datasets, ruleset, endpoint)
 
-# 3. Run (YAML mode)
+# 3. Run (YAML mode - config-driven; CLI only allows --config, --timeout, and --mode; set report-dir in the YAML)
 inference-endpoint benchmark from-config \
-  --config submission_template.yaml \
-  --report-dir official_results
+  --config submission_template.yaml
 ```
 
 ### Validate First
````
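The `from-config` path resolves `${VAR}` references (with `${VAR:-default}` fallback) in the YAML before validation. A stdlib-only sketch of that substitution step (the regex and function name are illustrative assumptions, not the project's actual resolver):

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}
_ENV_REF = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def resolve_env(text: str) -> str:
    """Replace ${VAR} and ${VAR:-default} with environment values."""
    def sub(m: re.Match) -> str:
        name, default = m.group(1), m.group(2)
        value = os.environ.get(name, default)
        if value is None:
            raise KeyError(f"undefined variable with no default: {name}")
        return value
    return _ENV_REF.sub(sub, text)


os.environ["API_KEY"] = "secret"
print(resolve_env("api_key: ${API_KEY}"))          # api_key: secret
print(resolve_env("report_dir: ${OUT_DIR:-results}"))
```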
