
Commit c60b654

doc: Documentation cleanup, refactor (#204)
* docs: add component design specs and refactor developer documentation

- Add Design.md specs for all 15 top-level components under src/inference_endpoint/
- Restructure AGENTS.md: move code style details to DEVELOPMENT.md, update component table with runner.py and async_utils services
- Update README.md: add Component Design Specs table, use python3 in examples
- Reformat DEVELOPMENT.md: remove emojis, add commit type list, exact-version pinning guidance
- Update CLI_QUICK_REFERENCE.md, LOCAL_TESTING.md, ENDPOINT_CLIENT.md, GITHUB_SETUP.md for consistency
- Fix stale references: pkl→jsonl throughout, CLIError for eval mode, dataset_manager Design.md reflects current supported formats

---------

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Co-authored-by: Alice Cheng <alicheng@nvidia.com>
1 parent 8c0c63d commit c60b654

36 files changed

Lines changed: 2848 additions & 942 deletions

File tree

.claude/skills/msgspec-patterns/SKILL.md

Lines changed: 412 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 120 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,120 @@
1+
---
2+
name: msgspec-struct-gc-check
3+
description: Check whether msgspec.Struct types can safely use gc=False. Use when adding or changing msgspec.Struct definitions, or when reviewing code that uses msgspec structs.
4+
allowed-tools: Read, Grep, Glob
5+
---
6+
7+
# msgspec.Struct gc=False Safety Check
8+
9+
## When to use this skill
10+
11+
- Adding or modifying a class that inherits from `msgspec.Struct`
12+
- Reviewing or refactoring code that defines or uses msgspec structs
13+
- Deciding whether to add or remove `gc=False` on a Struct
14+
15+
## Why gc=False matters
16+
17+
Setting `gc=False` on a Struct means instances are **never tracked** by Python's garbage collector. This reduces GC pressure and can improve performance when many structs are allocated. The **only** risk: if a **reference cycle** involves only gc=False structs (or objects not tracked by GC), that cycle will **never be collected** (memory leak).
18+
19+
Reference: [msgspec Structs – Disabling Garbage Collection](https://jcristharif.com/msgspec/structs.html#struct-gc).
20+
21+
## Verified safety constraints
22+
23+
Use these constraints to decide if a Struct can use `gc=False`. All must hold.
24+
25+
### 1. No reference cycles
26+
27+
- The struct (and any container it references) must never be part of a reference cycle.
28+
- **Multiple variables** pointing to the same struct (`x = s; y = x`) are **safe** — that is not a cycle. A cycle is A → B → … → A.
29+
- **Returning** a struct from a function is **safe**. What matters is whether any reference path leads back to the struct (e.g. struct's list contains the struct or something that holds the struct).
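
The difference between aliasing and a real cycle can be shown with the standard library alone. This is a minimal sketch under stated assumptions: `Node` is a hypothetical plain Python class standing in for a struct with a list field (it is not a `msgspec.Struct`), and the cyclic collector is switched off globally with `gc.disable()` to approximate what happens to objects the collector never tracks.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a struct that owns a list field."""

    def __init__(self):
        self.children = []


def alive_after_drop(make_cycle: bool) -> bool:
    """Return True if the object is still alive after every name is deleted."""
    node = Node()
    if make_cycle:
        # node -> children list -> node: a reference path leads back to node.
        node.children.append(node)
    alias = node  # a second name for the same object is NOT a cycle
    probe = weakref.ref(node)
    del node, alias
    return probe() is not None


# Assumption: disabling the collector globally approximates untracked objects.
gc.disable()
try:
    aliases_only = alive_after_drop(make_cycle=False)
    with_cycle = alive_after_drop(make_cycle=True)
finally:
    gc.enable()
gc.collect()  # clean up the cycle that was leaked on purpose
```

With aliases only, reference counting frees the object as soon as the last name goes away. With the self-reference, nothing frees it until a cyclic collection runs, which is the collection that untracked objects never receive.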
30+
31+
### 2. No mutation that could create cycles
32+
33+
- **Do not mutate** struct fields after construction in a way that could introduce a cycle (e.g. set a field to an object that references the struct, or append the struct to its own list/dict).
34+
- **Frozen structs** (`frozen=True`) prevent field reassignment; `force_setattr` in `__post_init__` is one-time init only, so that's acceptable.
35+
- Assigning **scalars** (int, str, bool, float, None) to fields is safe — they cannot form cycles.
36+
37+
### 3. Mutable containers (list, dict, set) on the struct
38+
39+
- If the struct has list/dict/set fields, either:
40+
- **Never mutate** those containers after creation (no `.append`, `.update`, `[...] = ...`, etc.), and never store in them any object that references the struct, or
41+
- Do not use `gc=False` (conservative).
42+
- **Reading** from containers (e.g. `x = struct.foobars[i]`) does not create cycles and is allowed.
43+
44+
### 4. Nested structs
45+
46+
- If a struct holds another Struct (or holds containers that hold Structs), the same rules apply to the whole reference graph: no cycles, no mutation that could create cycles. If any nested Struct uses `gc=False`, the whole graph must still be cycle-free.
47+
48+
### 5. Generic / mixins
49+
50+
- With `gc=False`, the type must be compatible with `__slots__` (e.g. if using `Generic`, the mixin must define `__slots__ = ()`). See msgspec issue #631 / PR #635.
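
The `__slots__` requirement follows ordinary Python class rules, which the sketch below illustrates without msgspec. The class names are hypothetical, and the link to msgspec's own check is an assumption based on the issue cited above: a base class without `__slots__` gives every subclass instance a `__dict__`, while a base declaring `__slots__ = ()` does not.

```python
class PlainMixin:
    """Mixin without __slots__: instances of subclasses gain a __dict__."""


class SlottedMixin:
    """Mixin declaring empty slots: adds no per-instance storage."""

    __slots__ = ()


class WithPlain(PlainMixin):
    __slots__ = ("value",)


class WithSlotted(SlottedMixin):
    __slots__ = ("value",)


plain_has_dict = hasattr(WithPlain(), "__dict__")
slotted_has_dict = hasattr(WithSlotted(), "__dict__")
```

Here `plain_has_dict` is true and `slotted_has_dict` is false, so only the second class stays fully slot-based.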
51+
52+
## Checklist for "can use gc=False"
53+
54+
- [ ] Struct and everything it references can never participate in a reference cycle.
55+
- [ ] No mutation of struct fields after construction that could introduce a cycle (frozen or init-only mutation is ok; scalar assignment is ok).
56+
- [ ] Any list/dict/set fields are never mutated after creation, or we do not use gc=False.
57+
- [ ] No storing the struct (or anything that references it) inside its own container fields.
58+
- [ ] If Generic/mixins are used, `__slots__` compatibility is satisfied.
59+
60+
## Checklist for "must NOT use gc=False"
61+
62+
- [ ] Struct is mutated after creation in a way that could create a cycle (e.g. appending self to a list field).
63+
- [ ] Container fields are mutated after creation and could hold the struct or back-references.
64+
- [ ] Struct is used in a pattern where it's stored in a container that the struct (or its fields) also references.
65+
66+
## Quick per-struct analysis steps
67+
68+
1. List all fields and their types (scalars vs containers vs nested Structs).
69+
2. Search the codebase for: assignments to this struct's fields, mutations of its container fields (`.append`, `.update`, etc.), and any place the struct instance is stored (e.g. in a list/dict that might be referenced by the struct).
70+
3. If only scalars or immutable types, or frozen with no container mutation → likely safe for gc=False.
71+
4. If mutable containers and they're never mutated (and never made to reference the struct) → likely safe; otherwise → do not use gc=False.
72+
73+
## Risky structs: audit and at-risk comment
74+
75+
A struct is **risky** for gc=False if it has a condition that would normally disallow gc=False (e.g. mutable list/dict/set fields), but that condition might never arise in practice (e.g. the field is only ever read, never mutated after construction).
76+
77+
### Auditing a risky struct
78+
79+
1. Identify the at-risk condition (e.g. "has `metadata: dict` that could be mutated").
80+
2. Search the codebase for all uses of that struct and of the at-risk field:
81+
- Any assignment to the field: `obj.field = ...`, `obj.field[key] = ...`, `obj.field.append(...)`, `obj.field.update(...)`, etc.
82+
- Any code path that could store the struct (or something holding it) inside that container.
83+
3. If the audit finds **no** such mutation or cycle-creating storage, the condition never arises and gc=False is acceptable **provided** you add the at-risk marker so future changes are re-audited.
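
The search in step 2 can be partly automated. The sketch below is a rough first pass, not a complete audit: `find_field_mutations` is a hypothetical helper, the list of mutating method names is an assumption, and a regex cannot see mutations made through an alias (for example `m = obj.metadata; m.update(...)`).

```python
import re


def find_field_mutations(source: str, field: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that appear to mutate `.field`."""
    name = re.escape(field)
    pattern = re.compile(
        rf"\.{name}\s*(?:"
        # Mutating method calls (assumed list; extend as needed).
        r"\.(?:append|extend|insert|update|add|pop|remove|clear|setdefault)\s*\("
        # Item assignment such as obj.field[key] = value (not ==).
        r"|\[[^\]]*\]\s*[-+|&^]?=(?!=)"
        # Plain or augmented assignment to the field itself (not ==).
        r"|[-+|&^]?=(?!=)"
        r")"
    )
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if pattern.search(line):
            hits.append((lineno, line.strip()))
    return hits


sample = '''\
value = result.metadata["k"]
result.metadata["k"] = 1
result.metadata.update(extra)
if result.metadata == {}:
    pass
result.metadata = {}
'''
hits = find_field_mutations(sample, "metadata")
```

For the sample text, the reads and the comparison are ignored and the three mutating lines are reported. Any hit still needs a manual look before deciding whether `gc=False` is safe.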
84+
85+
### When audit passes
86+
87+
- Set `gc=False` on the struct.
88+
- Add an **at-risk comment** and docstring note:
89+
90+
- **Above the class**: a short comment stating why gc=False is used despite the at-risk condition, and when the audit was done (e.g. `# gc=False: audit YYYY-MM: <condition> is only read, never mutated.`).
91+
- **In the docstring**: a line that signals to future readers and to this skill that changes touching this struct must be re-audited. Use this format:
92+
93+
`AT-RISK (gc=False): Has <brief condition>. Any change that <what would violate safety> must be audited; if so, remove gc=False.`
94+
95+
- Example (for a struct with a `metadata` dict that is only ever read):
96+
97+
```python
98+
# gc=False: audit 2026-03: metadata dict is only ever read, never mutated after construction.
99+
class QueryResult(msgspec.Struct, ..., gc=False):
100+
"""Result of a completed inference query.
101+
102+
AT-RISK (gc=False): Has mutable container field `metadata`. Any change that
103+
mutates `metadata` after construction or stores this struct in a container
104+
referenced by this struct must be audited; if so, remove gc=False.
105+
...
106+
```
107+
108+
### When touching an at-risk struct
109+
110+
If you are adding or changing code that uses a struct marked AT-RISK (gc=False):
111+
112+
1. Re-run the audit for that struct (searches above).
113+
2. If your change mutates the at-risk field(s) or creates a cycle (e.g. stores the struct in its own container), **remove** `gc=False` from the struct and remove the at-risk comment/docstring line.
114+
3. If your change does not touch the at-risk field or create cycles, the existing gc=False and at-risk comment remain; you may add a short note in the at-risk comment if the audit was re-checked (e.g. update the audit date).
115+
116+
## References
117+
118+
- [msgspec Structs – Disabling Garbage Collection](https://jcristharif.com/msgspec/structs.html#struct-gc)
119+
- [msgspec Performance Tips – Use gc=False](https://jcristharif.com/msgspec/perf-tips.html#use-gc-false)
120+
- [msgspec #631 – Generic structs and gc=False](https://github.com/jcrist/msgspec/issues/631)

.github/workflows/test.yml

Lines changed: 6 additions & 6 deletions
@@ -30,13 +30,13 @@ jobs:
3030
run: |
3131
pytest -xv -m "not slow and not performance" --cov=src --cov-report=xml --cov-report=html
3232
33-
- name: Upload coverage to Codecov
34-
uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0
33+
- name: Upload coverage report
34+
uses: actions/upload-artifact@v4
3535
with:
36-
file: ./coverage.xml
37-
flags: unittests
38-
name: codecov-umbrella
39-
fail_ci_if_error: false
36+
name: coverage-report
37+
path: |
38+
coverage.xml
39+
htmlcov/
4040
4141
audit:
4242
runs-on: ubuntu-latest

.gitignore

Lines changed: 6 additions & 1 deletion
@@ -189,5 +189,10 @@ outputs/
189189
# Example vLLM virtualenv
190190
examples/03_BenchmarkComparison/vllm_venv/
191191

192-
# Cursor artifacts (local development only)
192+
# Agent artifacts (local development only)
193193
.cursor_artifacts/
194+
.claude/agent-memory/
195+
196+
# User-specific local rules (local Docker dev); do not commit
197+
.cursor/rules/local-docker-dev.mdc
198+
CLAUDE.local.md

AGENTS.md

Lines changed: 10 additions & 12 deletions
@@ -73,7 +73,7 @@ CLI is auto-generated from `config/schema.py` Pydantic models via cyclopts. Fiel
7373

7474
- **CLI mode** (`offline`/`online`): cyclopts constructs `OfflineBenchmarkConfig`/`OnlineBenchmarkConfig` (subclasses in `config/schema.py`) directly from CLI args. Type locked via `Literal`. `--dataset` is repeatable with TOML-style format `[perf|acc:]<path>[,key=value...]` (e.g. `--dataset data.csv,samples=500,parser.prompt=article`). Full accuracy support via `accuracy_config.eval_method=pass_at_1` etc.
7575
- **YAML mode** (`from-config`): `BenchmarkConfig.from_yaml_file()` loads YAML, resolves env vars, and auto-selects the right subclass via Pydantic discriminated union. Optional `--timeout`/`--mode` overrides via `config.with_updates()`.
76-
- **eval**: Not yet implemented (raises `NotImplementedError`)
76+
- **eval**: Not yet implemented (raises `CLIError` with a tracking issue link)
7777

7878
### Config Construction & Validation
7979

@@ -137,7 +137,11 @@ src/inference_endpoint/
137137
│ └── utils.py # Port range helpers
138138
├── async_utils/
139139
│ ├── loop_manager.py # LoopManager (uvloop + eager_task_factory)
140+
│ ├── runner.py # run_async() — uvloop + eager_task_factory entry point for CLI commands
140141
│ ├── event_publisher.py # Async event pub/sub
142+
│ ├── services/
143+
│ │ ├── event_logger/ # EventLoggerService: writes EventRecords to JSONL/SQLite
144+
│ │ └── metrics_aggregator/ # MetricsAggregatorService: real-time metrics (TTFT, TPOT, ISL, OSL)
141145
│ └── transport/ # ZMQ-based IPC transport layer
142146
│ ├── protocol.py # Transport protocols + TransportConfig base
143147
│ ├── record.py # Transport records
@@ -192,26 +196,20 @@ tests/
192196

193197
## Development Standards
194198

195-
### Code Style
199+
### Code Style and Pre-commit Hooks
196200

197201
- **Formatter/Linter**: `ruff` (line-length 88, target Python 3.12)
198202
- **Type checking**: `mypy` (via pre-commit)
199203
- **Formatting**: `ruff-format` (double quotes, space indent)
200204
- **License headers**: Required on all Python files (enforced by pre-commit hook `scripts/add_license_header.py`)
201205
- **Conventional commits**: `feat:`, `fix:`, `docs:`, `test:`, `chore:`
202206

203-
### Pre-commit Hooks
204-
205-
All of these run automatically on commit:
206-
207-
- trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements
208-
- `ruff` (lint + autofix) and `ruff-format`
209-
- `mypy` type checking
210-
- `prettier` for YAML/JSON/Markdown
211-
- License header enforcement
207+
All of these hooks run automatically on commit: trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements, `ruff` (lint + autofix), `ruff-format`, `mypy`, `prettier` (YAML/JSON/Markdown), license header enforcement.
212208

213209
**Always run `pre-commit run --all-files` before committing.**
214210

211+
See [Development Guide](docs/DEVELOPMENT.md) for full setup and workflow details.
212+
215213
### Data Types & Serialization
216214

217215
- **Core types** (`Query`, `QueryResult`, `StreamChunk`): `msgspec.Struct` with `frozen=True`, `array_like=True`, `gc=False`, `omit_defaults=True`
@@ -291,7 +289,7 @@ Update AGENTS.md as part of any PR that includes a **significant refactor**, mea
291289
- **Added or removed CLI commands/subcommands** — update CLI Modes and Common Commands
292290
- **Changed test infrastructure** (new fixtures, changed markers, new test directories) — update Testing section
293291
- **Added or removed key dependencies** — update Key Dependencies table
294-
- **Changed build/tooling** (new pre-commit hooks, changed ruff config, new CI steps) — update Code Style and Pre-commit Hooks
292+
- **Changed build/tooling** (new pre-commit hooks, changed ruff config, new CI steps) — update [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md)
295293
- **Changed hot-path patterns** (new transport, changed serialization, new performance constraints) — update Performance Guidelines
296294

297295
### How to Update

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -2,4 +2,6 @@
22

33
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
44

5+
Full guidance is maintained in AGENTS.md (shared with all AI coding agents) and is included below:
6+
57
@AGENTS.md

CONTRIBUTING.md

Lines changed: 2 additions & 0 deletions
@@ -7,3 +7,5 @@ Generally we encourage people to become MLCommons members if they wish to contri
77
Regardless of whether you are a member, your organization (or you as an individual contributor) needs to sign the MLCommons Contributor License Agreement (CLA). Please submit your GitHub username to the [MLCommons Subscription form](https://mlcommons.org/community/subscribe/) to start that process.
88

99
MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests.
10+
11+
For project-specific development standards (code style, test requirements, pre-commit hooks, commit format), see the [Development Guide](docs/DEVELOPMENT.md).

README.md

Lines changed: 38 additions & 20 deletions
@@ -66,7 +66,7 @@ inference-endpoint benchmark offline \
6666

6767
```bash
6868
# Start local echo server
69-
python -m inference_endpoint.testing.echo_server --port 8765 &
69+
python3 -m inference_endpoint.testing.echo_server --port 8765 &
7070

7171
# Test with dummy dataset (included in repo)
7272
inference-endpoint benchmark offline \
@@ -94,33 +94,51 @@ pytest -m "not performance and not run_explicitly"
9494

9595
## 📚 Documentation
9696

97+
- [AGENTS.md](AGENTS.md) - Architecture, conventions, and AI agent guidelines
9798
- [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) - Command-line interface guide
9899
- [Local Testing Guide](docs/LOCAL_TESTING.md) - Test with echo server
99100
- [Development Guide](docs/DEVELOPMENT.md) - How to contribute and develop
101+
- [Performance Architecture](docs/PERF_ARCHITECTURE.md) - Hot-path design and tuning
102+
- [Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) - CPU affinity and client tuning
100103
- [GitHub Setup Guide](docs/GITHUB_SETUP.md) - GitHub authentication and setup
101104

105+
### Component Design Specs
106+
107+
Each top-level component under `src/inference_endpoint/` has a corresponding spec:
108+
109+
| Component | Spec |
110+
| ----------------- | ---------------------------------------------------------------- |
111+
| Core types | [docs/core/DESIGN.md](docs/core/DESIGN.md) |
112+
| Load generator | [docs/load_generator/DESIGN.md](docs/load_generator/DESIGN.md) |
113+
| Endpoint client | [docs/endpoint_client/DESIGN.md](docs/endpoint_client/DESIGN.md) |
114+
| Metrics | [docs/metrics/DESIGN.md](docs/metrics/DESIGN.md) |
115+
| Config | [docs/config/DESIGN.md](docs/config/DESIGN.md) |
116+
| Async utils | [docs/async_utils/DESIGN.md](docs/async_utils/DESIGN.md) |
117+
| Dataset manager | [docs/dataset_manager/DESIGN.md](docs/dataset_manager/DESIGN.md) |
118+
| Commands (CLI) | [docs/commands/DESIGN.md](docs/commands/DESIGN.md) |
119+
| OpenAI adapter | [docs/openai/DESIGN.md](docs/openai/DESIGN.md) |
120+
| SGLang adapter | [docs/sglang/DESIGN.md](docs/sglang/DESIGN.md) |
121+
| Evaluation | [docs/evaluation/DESIGN.md](docs/evaluation/DESIGN.md) |
122+
| Testing utilities | [docs/testing/DESIGN.md](docs/testing/DESIGN.md) |
123+
| Profiling | [docs/profiling/DESIGN.md](docs/profiling/DESIGN.md) |
124+
| Plugins | [docs/plugins/DESIGN.md](docs/plugins/DESIGN.md) |
125+
| Utils | [docs/utils/DESIGN.md](docs/utils/DESIGN.md) |
126+
102127
## 🎯 Architecture
103128

104129
The system follows a modular, event-driven architecture:
105130

106131
```
107-
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
108-
│ Dataset │ │ Load │ │ Endpoint │
109-
│ Manager │───▶│ Generator │───▶│ Client │
110-
└─────────────────┘ └─────────────────┘ └─────────────────┘
111-
│ │ │
112-
▼ ▼ ▼
113-
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
114-
│ Metrics │ │ Configuration │ │ Endpoint │
115-
│ Collector │◄───│ Manager │ │ (External) │
116-
└─────────────────┘ └─────────────────┘ └─────────────────┘
132+
Dataset Manager ──► Load Generator ──► Endpoint Client ──► External Endpoint
133+
134+
Metrics Collector
135+
(event logging + reporting)
117136
```
118137

119-
- **Load Generator**: Central orchestrator managing query lifecycle
120-
- **Dataset Manager**: Handles benchmark datasets and preprocessing
121-
- **Endpoint Client**: Abstract interface for endpoint communication
122-
- **Metrics Collector**: Performance measurement and analysis
123-
- **Configuration Manager**: System configuration (TBD)
138+
- **Dataset Manager**: Loads benchmark datasets and applies transform pipelines
139+
- **Load Generator**: Central orchestrator — controls timing (scheduler), issues queries, and emits sample events
140+
- **Endpoint Client**: Multi-process HTTP worker pool communicating over ZMQ IPC
141+
- **Metrics Collector**: Receives sample events from Load Generator; writes to SQLite (EventRecorder), aggregates after the run (MetricsReporter)
124142

125143
## Accuracy Evaluation
126144

@@ -132,14 +150,13 @@ configuration. Currently, Inference Endpoints provides the following pre-defined
132150
- LiveCodeBench (default: lite, release_v6)
133151

134152
However, LiveCodeBench will not work out-of-the-box and requires some additional setup. See the
135-
[LiveCodeBench](src/inference_endpoint/dataset_manager/predefined/livecodebench/README.md) documentation
136-
for details and explanations.
153+
[LiveCodeBench](src/inference_endpoint/evaluation/livecodebench/README.md) documentation for
154+
details and explanations.
137155

138156
## 🚧 Pending Features
139157

140158
The following features are planned for future releases:
141159

142-
- [ ] **Performance Tuning** - Advanced performance optimization features
143160
- [ ] **Submission Ruleset Integration** - Full MLPerf submission workflow support
144161
- [ ] **Documentation Generation and Hosting** - Sphinx-based API documentation with GitHub Pages
145162

@@ -166,7 +183,8 @@ We are grateful to these communities for their contributions to LLM benchmarking
166183

167184
## 📄 License
168185

169-
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
186+
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE.md) file for
187+
details.
170188

171189
## 🔗 Links
172190

docs/CLI_DESIGN.md

Lines changed: 3 additions & 1 deletion
@@ -172,9 +172,11 @@ InputValidationError 2 Bad user input, invalid config
172172
SetupError 3 Dataset load failure, connection error
173173
ExecutionError 4 Benchmark failed after setup
174174
CLIError 1 Generic CLI error (base class)
175-
NotImplementedError 1 Unimplemented command (eval)
176175
```
177176

177+
The reserved `eval` command currently raises `CLIError` with a tracking issue link rather than a
178+
dedicated exception type.
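
The exit-code table can be read as a small exception hierarchy. The sketch below is an illustration under assumptions, not the project's code: the `exit_code` attribute, the `exit_code_for` helper, the subclassing of `CLIError`, and the fallback of 1 for unrelated exceptions are all hypothetical; only the class names and numeric codes come from the table.

```python
class CLIError(Exception):
    """Generic CLI error (base class); exit code 1."""

    exit_code = 1


class InputValidationError(CLIError):
    """Bad user input or invalid config; exit code 2."""

    exit_code = 2


class SetupError(CLIError):
    """Dataset load failure or connection error; exit code 3."""

    exit_code = 3


class ExecutionError(CLIError):
    """Benchmark failed after setup; exit code 4."""

    exit_code = 4


def exit_code_for(exc: BaseException) -> int:
    """Map an exception to a process exit code (assumed fallback: 1)."""
    return exc.exit_code if isinstance(exc, CLIError) else 1


def run_eval() -> None:
    """Stand-in for the reserved eval command: raises the base CLIError."""
    raise CLIError("eval is not implemented yet; see the tracking issue")


try:
    run_eval()
except CLIError as err:
    eval_exit_code = exit_code_for(err)
```

Because `eval` raises the base class, it exits with code 1 like any other generic CLI error.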
179+
178180
## Development Guide
179181

180182
### Adding a CLI flag
