diff --git a/.changeset/add-score-command.md b/.changeset/add-score-command.md
new file mode 100644
index 0000000000..719b1cc008
--- /dev/null
+++ b/.changeset/add-score-command.md
@@ -0,0 +1,6 @@
+---
+'@redocly/cli': minor
+---
+
+Added new `score` command that analyzes OpenAPI 3.x descriptions and produces an AI Agent Readiness score (0-100).
+Reports normalized subscores, raw per-operation metrics, and top hotspot operations with human-readable explanations. Supports `--format=stylish` (default) and `--format=json` output.
diff --git a/docs/@v2/commands/index.md b/docs/@v2/commands/index.md
index 8dafe429df..add90a21f3 100644
--- a/docs/@v2/commands/index.md
+++ b/docs/@v2/commands/index.md
@@ -13,10 +13,11 @@ Documentation commands:
 
 API management commands:
 
-- [`stats`](stats.md) Gather statistics for a document.
 - [`bundle`](bundle.md) Bundle API description.
-- [`split`](split.md) Split API description into a multi-file structure.
 - [`join`](join.md) Join API descriptions [experimental feature].
+- [`score`](score.md) Score an API for integration simplicity and AI agent readiness.
+- [`split`](split.md) Split API description into a multi-file structure.
+- [`stats`](stats.md) Gather statistics for a document.
 
 Linting commands:
 
diff --git a/docs/@v2/commands/score.md b/docs/@v2/commands/score.md
new file mode 100644
index 0000000000..3de86cc191
--- /dev/null
+++ b/docs/@v2/commands/score.md
@@ -0,0 +1,133 @@
+# `score`
+
+## Introduction
+
+The `score` command analyzes an OpenAPI description and produces a composite **Agent Readiness** score (0–100) that measures how easy the API is to integrate and how usable it is by AI agents and LLM-based tooling. Higher is better.
+
+In addition to the top-level score, the command reports normalized subscores, raw metrics for every operation, and a list of **hotspot operations** — the endpoints most likely to cause integration friction — along with human-readable explanations.
+
+{% admonition type="warning" name="Important" %}
+The `score` command is considered an experimental feature. This means it's still a work in progress and may go through major changes.
+
+The `score` command supports OpenAPI 3.x descriptions only.
+{% /admonition %}
+
+### Metrics
+
+The following raw metrics are collected per operation and aggregated across the document:
+
+| Metric                             | Description                                                                                                                                                                                                         |
+| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Parameter count                    | Total parameters (path, query, header, cookie) per operation.                                                                                                                                                       |
+| Required parameter count           | How many of those parameters are required.                                                                                                                                                                          |
+| Request body presence              | Whether the operation defines a request body.                                                                                                                                                                       |
+| Top-level writable field count     | Number of non-`readOnly` top-level properties in request schemas.                                                                                                                                                   |
+| Max request/response schema depth  | Deepest nesting level in request and response schemas.                                                                                                                                                              |
+| Polymorphism count                 | Number of `oneOf`, `anyOf`, and `allOf` usages. `anyOf` is penalized more heavily because it allows ambiguous combinations of schemas, making it harder for consumers and AI agents to determine the correct shape. |
+| Property count                     | Total schema properties across request and response.                                                                                                                                                                |
+| Description coverage               | Fraction of operations, parameters, and schema properties that have descriptions.                                                                                                                                   |
+| Ambiguous identifier count         | Parameters with generic names (e.g. `id`, `name`, `type`) and no description.                                                                                                                                       |
+| Constraint coverage                | Count of constraining keywords (`enum`, `format`, `pattern`, `minimum`, `maximum`, `minLength`, `maxLength`, `discriminator`, etc.).                                                                                |
+| Request/response example coverage  | Whether request and response media types include `example` or `examples`.                                                                                                                                           |
+| Structured error response coverage | How many 4xx/5xx responses include a content schema or meaningful description.                                                                                                                                      |
+| Security scheme coverage           | Whether operations reference documented security schemes with descriptions.                                                                                                                                         |
+| Cross-operation dependency depth   | Inferred from shared `$ref` usage across operations. Operations that share many schemas form a dependency graph; deeper graphs indicate tightly coupled multi-step interactions.                                    |
+
+### Subscores
+
+The following subscores are normalized to 0–1 and combined into the composite Agent Readiness score:
+
+`parameterSimplicity`, `schemaSimplicity`, `documentationQuality`, `constraintClarity`, `exampleCoverage`, `errorClarity`, `dependencyClarity`, `identifierClarity`, `polymorphismClarity`, `discoverability`.
+
+The `discoverability` subscore reflects the total number of operations in the API. Larger APIs (approaching 1,000+ operations) receive a lower discoverability score because finding the right endpoint becomes harder for both humans and AI agents.
+
+### Hotspots
+
+The command identifies the operations with the lowest scores and provides reasons such as:
+
+- "High parameter count (N)"
+- "Deep schema nesting (depth M)"
+- "Polymorphism (anyOf) without discriminator"
+- "Missing request and response examples"
+- "No structured error responses (4xx/5xx)"
+- "Missing operation description"
+
+## Usage
+
+```bash
+redocly score
+redocly score [--format=