
Commit a4b0a71

ci: gate docstring quality and coverage in CI (#616) (#689)
* ci: gate docstring quality and coverage in CI (#616)

  Add a hard-fail docstring quality gate to the docs-publish workflow:
  - New 'Docstring quality gate' step runs --quality --fail-on-quality --threshold 100; fails if any quality issue is found or coverage drops below 100% (both currently pass in CI)
  - Existing audit_coverage step (soft-fail, threshold 80) retained for the summary coverage metric

  Add typeddict_mismatch checks to audit_coverage.py:
  - typeddict_phantom: Attributes: documents a field not declared in the TypedDict
  - typeddict_undocumented: declared field absent from Attributes: section
  - Mirrors the existing param_mismatch logic for functions

  Pre-commit: enable --fail-on-quality on the manual-stage hook (CI is the hard gate; hook remains stages: [manual] as docs must be pre-built).

  Update CONTRIBUTING.md and docs/docs/guide/CONTRIBUTING.md with TypedDict docstring requirements and the two new audit check kinds.

* style: fix ruff formatting in audit_coverage.py

* ci: remove soft-fail mode labels from docs workflow summary

* docs: improve doc check reporting with line numbers, GHA annotations, and fix hints

  audit_coverage.py:
  - Add file/line fields to every issue dict (repo-relative path + def line)
  - _print_quality_report now shows [file:line] per issue, per-kind Fix:/Ref: hints linking to CONTRIBUTING.md anchors, and emits ::error file=...,line=... GHA annotations so issues appear inline in PR diffs
  - Cap GHA annotations at 10 per check kind with "N more in job log" notice
  - Add _KIND_FIX_HINTS and _gha_file_annotation helpers; _CONTRIB_DOCS_URL constant

  validate.py:
  - Convert all check functions from list[str] to list[dict] errors (file/line/message)
  - Add line-number tracking to validate_source_links, validate_internal_links, validate_anchor_collisions, and validate_doc_imports
  - Emit per-error GHA annotations with file/line; shared 20-annotation budget across all checks so every category gets representation in PR diff
  - Fix icon bug: summary rows now use correct pass/fail icon
  - Group detailed errors by check type with section headers

  docs-publish.yml:
  - Add --orphans and --output /tmp/quality_report.json to quality gate step
  - Upload quality_report.json as docstring-quality-report artifact (30-day retention)

  pyproject.toml:
  - cli/**/*.py: suppress only D2/D3/D4xx style rules; enable D1xx (missing docstrings) as a ruff-level complement to the audit_coverage quality gate

  docs/docs/guide/CONTRIBUTING.md:
  - Add CI docstring checks reference section with per-kind tables (fix instructions + anchors) for all 11 check kinds across 4 categories
  - Add callout explaining GHA annotation cap (10 per kind) and where to find the full list (job log + JSON artifact)

* fix: repair orphan check for Mintlify v2 docs.json and improve summary links

  audit_coverage.py (audit_nav_orphans):
  - Probe docs/docs/docs.json before docs/mint.json so both Mintlify v1 and v2 nav configs are supported
  - Extend _extract to handle plain string page entries used by docs.json (v2 uses "pages": ["api/..."] strings; v1 used {"page": "api/..."} dicts)
  - Previously mint.json was never found, nav_refs stayed empty, and every MDX file was reported as an orphan

  docs-publish.yml (Write job summary):
  - When the quality gate fails, render a prominent markdown callout with a direct link to the CI docstring checks reference section in CONTRIBUTING.md
  - Add a per-kind fix reference table with clickable anchor links to each category section (missing/short, args/returns, class Option C, TypedDict)
  - Per-kind Ref: URLs in the raw log are inside a text block and do not render as links in the step summary; this table surfaces them rendered

* ci: per-kind quality breakdown in summary table, drop misleading skipped notice

  docs-publish.yml:
  - Parse per-kind counts from _print_quality_report section headers in the quality gate log (e.g. "Missing docstrings (12)") and show them as a comma-separated breakdown in the Docstring Quality table cell instead of just the total; this gives developers an immediate view of which categories are failing without expanding the log

  audit_coverage.py:
  - Remove the "skipped (pass --quality to enable)" GHA notice emitted by the coverage-only step; there is always a dedicated quality gate step immediately after so the notice was misleading and redundant

* docs: add fix hints and doc refs to coverage and MDX validation output

  audit_coverage.py:
  - Coverage miss section now shows a structured Fix:/Ref: block with the exact generate-ast.py command and a link to CONTRIBUTING.md#validating-docstrings
  - Missing symbols listed one per line (symbol indented under module) for scannability instead of comma-joined on one long line
  - Emit a ::error or ::warning GHA annotation with symbol/module counts when coverage symbols are undocumented

  validate.py:
  - Add _CHECK_FIX_HINTS dict mapping each check label to a (fix text, ref URL) pair, covering all 8 check types with specific fix instructions and links into CONTRIBUTING.md (root or guide as appropriate)
  - _print_check_errors now prints Fix:/Ref: under each section header, matching the pattern established by _print_quality_report

* docs: add annotation gap checks and public-API-only doc filter

  audit_coverage.py:
  - Add missing_param_type check: fires when Args: section exists but one or more concrete params lack Python type annotations; naturally non-overlapping with no_args (which fires when section is absent)
  - Add missing_return_type check: fires when Returns: section is documented but the function has no return annotation; naturally non-overlapping with no_returns (annotation exists but section absent)
  - Add fix hints and CONTRIBUTING.md anchors for both new check kinds
  - Update kind_labels and iteration order in _print_quality_report

  generate-ast.py:
  - Add remove_internal_modules() post-generation filter step
  - Uses AST-based import analysis: a submodule is internal when the parent __init__.py imports from at least one sibling submodule but not from this one (import-based visibility, not __all__ name-matching)
  - Conservative: keeps module when parent imports nothing (indeterminate) or __init__.py cannot be parsed
  - _CONFIRMED_INTERNAL_MODULES hardcoded set for known internals where parent imports nothing (json_util, backend_instrumentation); these should eventually be renamed with _ prefix per Python convention
  - Package index files (stem == parent dir) are never filtered

  docs.json: nav regenerated by build; internal modules removed from nav

  CONTRIBUTING.md: add missing_param_type / missing_return_type to CI docstring checks reference table

  docs-publish.yml: add both new kinds to summary kind_short and kind_anchors fix-reference table

* fix: align coverage scope with doc generator's public-API filter

  audit_coverage.py was walking all non-_-prefixed source modules via Griffe, including internal modules (json_util, backend_instrumentation, etc.) whose MDX files were removed by remove_internal_modules() in generate-ast.py. This caused coverage to drop because those symbols were no longer 'documented' but were still counted in the denominator.

  Apply the same import-based public-API filter in discover_public_symbols(): skip submodules that the parent __init__.py does not import from, mirroring the generate-ast.py logic. _CONFIRMED_INTERNAL_MODULES kept in sync.

  Also drop the per-kind anchor table from the GHA job summary. Anchors in GitHub Actions summaries only navigate to the top of the referenced document, so the table added noise without working links.

* feat: add param_type_mismatch and return_type_mismatch docstring checks

  Adds two new quality check kinds that fire when the type explicitly stated in an Args:/Returns: docstring entry disagrees with the Python type annotation in the function signature:

    param_type_mismatch: 'param (OldType): ...' vs annotation 'NewType'
    return_type_mismatch: 'Returns: OldType: ...' vs annotation '-> NewType'

  Both checks fire only when BOTH sides have an explicit type; one-sided absence is already handled by missing_param_type / missing_return_type.

  Type comparison uses _types_match() / _normalize_type() which handles:
  - typing aliases: List→list, Dict→dict, Optional→X|None, Union→A|B
  - typing. prefix stripping
  - pipe-union component ordering (str|None == None|str)
  - incidental whitespace

  Known conservative suppressions (prefer false negatives over false positives, since there is no per-site suppression mechanism):
  - Nested generics not fully expandable by regex (e.g. Optional[list[str]]) are silently skipped; both sides must fully normalise to be compared
  - Union with bracket-containing members
  - Callable argument ordering

* docs: document param_type_mismatch and return_type_mismatch check kinds

* fix: correct parent __init__.py path for module files in _is_public_submodule

  For a module file mellea/pkg/submodule.py, Griffe gives filepath ending in .py (not __init__.py). The parent __init__.py is fp.parent/__init__.py. The previous code used fp.parent.parent which is correct for packages (whose filepath IS the __init__.py) but goes one level too far for plain module files: it was checking the grandparent init instead of the parent.

  Effect: genslot, react, unit_test_eval and similar non-exported modules in stdlib/components were incorrectly counted as public symbols, inflating the denominator and lowering the reported coverage percentage.

* style: fix ruff formatting and EN dash in validate.py

* revert: restore full D suppression for cli/ (see #705)

* docs: add artifact download link to quality gate failure summary

* feat: add --warn-only to validate.py for pre-commit informational mode

* docs: fix 36 docstring quality gate failures across 17 files

  - Fix missing_param_type, missing_return_type, param_type_mismatch, return_type_mismatch, no_args, no_returns, and missing docstring issues
  - Add TYPE_CHECKING imports for HuggingFace types in util.py with type: ignore[union-attr] for pre-existing None-safety gaps
  - Add Granite3ChatCompletion import to granite32/33 input.py for correct sanitize() parent signature match
  - Convert reST-style docstrings to Google style in intrinsics/input.py
  - Document AST single-quote normalization for Literal types in CONTRIBUTING.md

* fix: suppress pre-existing mypy errors exposed by new type annotations

  Adding TYPE_CHECKING annotations to util.py made mypy check function bodies it previously skipped (untyped params = implicit Any = no body checking). This exposed a pre-existing Tensor-not-callable issue and a dict-variance issue in mobject.py. Suppress with targeted type: ignore comments. These are not new bugs, just newly visible ones.
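
The normalisation rules listed for `_types_match()` / `_normalize_type()` can be illustrated with a minimal sketch. This is not the code from `audit_coverage.py`: the names `normalize_type` and `types_match`, the alias table, and the convention of returning `None` to mean "too complex, skip the comparison" are all assumptions made for illustration.

```python
from __future__ import annotations

import re

# Hypothetical alias table; the real helper may cover more names.
_ALIASES = {"List": "list", "Dict": "dict", "Tuple": "tuple", "Set": "set"}


def normalize_type(text: str) -> str | None:
    """Return a canonical type string, or None when it cannot be normalised."""
    t = re.sub(r"\s+", "", text)  # drop incidental whitespace
    t = t.replace("typing.", "")  # strip the typing. prefix
    for old, new in _ALIASES.items():
        t = re.sub(rf"\b{old}\b", new, t)
    # Expand only flat Optional[...] / Union[...] (no nested brackets).
    m = re.fullmatch(r"Optional\[([^\[\]]+)\]", t)
    if m:
        t = f"{m.group(1)}|None"
    m = re.fullmatch(r"Union\[([^\[\]]+)\]", t)
    if m:
        t = "|".join(m.group(1).split(","))
    if "Optional[" in t or "Union[" in t:
        return None  # nested form the regexes could not expand
    if "|" in t:
        if "[" in t:
            return None  # union with bracket-containing members
        t = "|".join(sorted(t.split("|")))  # make member order irrelevant
    return t


def types_match(doc_type: str, annotation: str) -> bool | None:
    """True/False when both sides normalise; None when the check is skipped."""
    a, b = normalize_type(doc_type), normalize_type(annotation)
    if a is None or b is None:
        return None
    return a == b


print(types_match("typing.Optional[str]", "None | str"))
print(types_match("Optional[list[str]]", "list[str] | None"))
```

Returning `None` rather than `False` for unparseable types mirrors the stated preference for false negatives: a comparison that cannot be made reliably produces no issue at all.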
1 parent f0e778e commit a4b0a71

25 files changed

Lines changed: 1364 additions & 401 deletions

File tree

.github/workflows/docs-publish.yml

Lines changed: 84 additions & 30 deletions
@@ -105,12 +105,31 @@ jobs:
         id: audit_coverage
         run: |
           set -o pipefail
-          uv run python tooling/docs-autogen/audit_coverage.py --docs-dir docs/docs/api --threshold 80 --quality 2>&1 \
+          uv run python tooling/docs-autogen/audit_coverage.py --docs-dir docs/docs/api --threshold 80 2>&1 \
             | tee /tmp/audit_coverage.log
         continue-on-error: ${{ inputs.strict_validation != true }}

+      - name: Docstring quality gate
+        id: quality_gate
+        run: |
+          set -o pipefail
+          uv run python tooling/docs-autogen/audit_coverage.py \
+            --docs-dir docs/docs/api \
+            --quality --fail-on-quality --threshold 100 \
+            --orphans \
+            --output /tmp/quality_report.json 2>&1 \
+            | tee /tmp/quality_gate.log
+
       # -- Upload artifact for deploy job --------------------------------------

+      - name: Upload quality report
+        if: always()
+        uses: actions/upload-artifact@v7
+        with:
+          name: docstring-quality-report
+          path: /tmp/quality_report.json
+          retention-days: 30
+
       - name: Upload docs artifact
         if: success() || (inputs.strict_validation != true)
         uses: actions/upload-artifact@v7
@@ -141,12 +160,11 @@ jobs:
           markdownlint_outcome = "${{ steps.markdownlint.outcome }}"
           validate_outcome = "${{ steps.validate_mdx.outcome }}"
           coverage_outcome = "${{ steps.audit_coverage.outcome }}"
-          strict = "${{ inputs.strict_validation }}" == "true"
-          mode = "" if strict else " *(soft-fail)*"
-
+          quality_gate_outcome = "${{ steps.quality_gate.outcome }}"
           lint_log = read_log("/tmp/markdownlint.log")
           validate_log = read_log("/tmp/validate_mdx.log")
           coverage_log = read_log("/tmp/audit_coverage.log")
+          quality_gate_log = read_log("/tmp/quality_gate.log")

           # Count markdownlint issues (lines matching file:line:col format)
           lint_issues = len([l for l in lint_log.splitlines() if re.match(r'.+:\d+:\d+ ', l)])
@@ -186,45 +204,81 @@ jobs:

           mdx_detail = parse_validate_detail(validate_log)

-          # Docstring quality annotation emitted by audit_coverage.py into the log
-          # Format: ::notice title=Docstring quality::message
-          # or ::warning title=Docstring quality::message
-          quality_match = re.search(r"::(notice|warning|error) title=Docstring quality::(.+)", coverage_log)
-          if quality_match:
-              quality_level, quality_msg = quality_match.group(1), quality_match.group(2)
-              quality_icon = "✅" if quality_level == "notice" else "⚠️"
-              quality_status = "pass" if quality_level == "notice" else "warning"
-              quality_detail = re.sub(r"\s*—\s*see job summary.*$", "", quality_msg)
-              quality_row = f"| Docstring Quality | {quality_icon} {quality_status}{mode} | {quality_detail} |"
+          # Parse per-kind counts from the quality gate log.
+          # _print_quality_report emits section headers like:
+          # " Missing docstrings (12)"
+          # " Missing Args section (5)"
+          # Capture label -> count from those lines, then build a compact
+          # per-kind breakdown for the summary table cell.
+          kind_short = {
+              "Missing docstrings": "missing",
+              "Short docstrings": "short",
+              "Missing Args section": "no_args",
+              "Missing Returns section": "no_returns",
+              "Missing Yields section (generator)": "no_yields",
+              "Missing Raises section": "no_raises",
+              "Missing class Args section": "no_class_args",
+              "Duplicate Args: in class + __init__ (Option C violation)": "dup_init_args",
+              "Param name mismatches (documented but not in signature)": "param_mismatch",
+              "TypedDict phantom fields (documented but not declared)": "td_phantom",
+              "TypedDict undocumented fields (declared but missing from Attributes:)": "td_undoc",
+              "Missing parameter type annotations (type absent from API docs)": "missing_param_type",
+              "Missing return type annotations (type absent from API docs)": "missing_return_type",
+              "Param type mismatch (docstring vs annotation)": "param_type_mismatch",
+              "Return type mismatch (docstring vs annotation)": "return_type_mismatch",
+          }
+          section_re = re.compile(r"^\s{2}(.+?)\s+\((\d+)\)\s*$", re.MULTILINE)
+          kind_counts = {}
+          for m in section_re.finditer(quality_gate_log):
+              label, count = m.group(1), int(m.group(2))
+              short = kind_short.get(label)
+              if short:
+                  kind_counts[short] = count
+
+          if kind_counts:
+              parts = [f"{v} {k}" for k, v in kind_counts.items()]
+              quality_gate_detail = ", ".join(parts)
           else:
-              quality_row = None
+              # Fall back to the summary annotation message
+              qm = re.search(r"::(notice|warning|error) title=Docstring quality::(.+)", quality_gate_log)
+              quality_gate_detail = re.sub(r"\s*—\s*see job summary.*$", "", qm.group(2)) if qm else ""

-          # Split coverage log at quality section to avoid duplicate output in collapsibles
-          quality_start = coverage_log.find("🔬 Running docstring quality")
-          if quality_start != -1:
-              quality_log = coverage_log[quality_start:]
-              coverage_display_log = coverage_log[:quality_start].strip()
-          else:
-              quality_log = ""
-              coverage_display_log = coverage_log
+          CONTRIB_URL = (
+              "https://github.com/generative-computing/mellea/blob/main"
+              "/docs/docs/guide/CONTRIBUTING.md"
+          )
+          REPO = "${{ github.repository }}"
+          RUN_ID = "${{ github.run_id }}"
+          ARTIFACT_URL = f"https://github.com/{REPO}/actions/runs/{RUN_ID}#artifacts"

           lines = [
               "## Docs Build — Validation Summary\n",
               "| Check | Result | Details |",
               "|-------|--------|---------|",
-              f"| Markdownlint | {icon(markdownlint_outcome)} {markdownlint_outcome}{mode} | {lint_detail} |",
-              f"| MDX Validation | {icon(validate_outcome)} {validate_outcome}{mode} | {mdx_detail} |",
-              f"| API Coverage | {icon(coverage_outcome)} {coverage_outcome}{mode} | {cov_detail} |",
+              f"| Markdownlint | {icon(markdownlint_outcome)} {markdownlint_outcome} | {lint_detail} |",
+              f"| MDX Validation | {icon(validate_outcome)} {validate_outcome} | {mdx_detail} |",
+              f"| API Coverage | {icon(coverage_outcome)} {coverage_outcome} | {cov_detail} |",
+              f"| Docstring Quality | {icon(quality_gate_outcome)} {quality_gate_outcome} | {quality_gate_detail} |",
           ]
-          if quality_row:
-              lines.append(quality_row)
           lines.append("")

+          # When the quality gate fails, surface a direct link to the fix reference.
+          # Per-kind Ref: URLs in the log output are inside a ```text``` block and
+          # don't render as links there.
+          if quality_gate_outcome == "failure":
+              lines += [
+                  "> ❌ **Docstring quality gate failed.** "
+                  f"See the [CI docstring checks reference]({CONTRIB_URL}#ci-docstring-checks-reference) "
+                  "for per-kind fix instructions, or expand **Docstring quality details** below for the full list. \n"
+                  f"> The full machine-readable report is available as the [`docstring-quality-report` artifact]({ARTIFACT_URL}).",
+                  "",
+              ]
+
           for title, log, limit in [
               ("Markdownlint output", lint_log, 5_000),
               ("MDX validation output", validate_log, 5_000),
-              ("API coverage output", coverage_display_log, 5_000),
-              ("Docstring quality details", quality_log, 1_000_000),
+              ("API coverage output", coverage_log, 5_000),
+              ("Docstring quality details", quality_gate_log, 1_000_000),
           ]:
               if log:
                   lines += [
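
To see how the section-header parsing in this hunk behaves, here is a small standalone sketch. The regular expression is copied from the diff; the sample log text, the trimmed `kind_short` table, and the `summarize` helper are invented for illustration and are not real output or code from `audit_coverage.py`.

```python
import re

# Same pattern as in the workflow script: two leading whitespace characters,
# a label, then a parenthesised count at the end of the line.
section_re = re.compile(r"^\s{2}(.+?)\s+\((\d+)\)\s*$", re.MULTILINE)

# Trimmed copy of the label -> short-name table (two entries only).
kind_short = {
    "Missing docstrings": "missing",
    "Missing Args section": "no_args",
}

# Hypothetical log excerpt; real report output may be laid out differently.
sample_log = (
    "Docstring quality report\n"
    "  Missing docstrings (12)\n"
    "    [pkg/example.py:10] example_function\n"
    "  Missing Args section (5)\n"
    "  Unknown heading (3)\n"
)


def summarize(log: str) -> str:
    """Build the compact 'count kind' breakdown used in the summary cell."""
    kind_counts = {}
    for m in section_re.finditer(log):
        short = kind_short.get(m.group(1))
        if short:  # headings not in the table are ignored
            kind_counts[short] = int(m.group(2))
    return ", ".join(f"{v} {k}" for k, v in kind_counts.items())


print(summarize(sample_log))
```

Because the breakdown is keyed on exact heading text, a label renamed in `_print_quality_report` without a matching update to `kind_short` would silently drop out of the summary; the fallback to the annotation message only applies when no heading matches at all.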

.pre-commit-config.yaml

Lines changed: 6 additions & 7 deletions
@@ -49,17 +49,16 @@ repos:
     hooks:
       - id: docs-mdx-validate
         name: Validate generated MDX docs
-        entry: bash -c 'test -d docs/docs/api && uv run --no-sync python tooling/docs-autogen/validate.py docs/docs/api --skip-coverage || true'
+        entry: bash -c 'test -d docs/docs/api && uv run --no-sync python tooling/docs-autogen/validate.py docs/docs/api --skip-coverage --warn-only || true'
         language: system
         pass_filenames: false
         files: (docs/docs/.*\.mdx$|tooling/docs-autogen/)
-      # TODO(#616): Move to normal commit flow once docstring quality issues reach 0.
-      # Griffe loads the full package (~10s), so this is manual-only for now to avoid
-      # slowing down every Python commit. Re-enable (remove stages: [manual]) and add
-      # --fail-on-quality once quality issues are resolved.
+      # Docstring quality gate — manual only (CI is the hard gate via docs-publish.yml).
+      # Run locally with: pre-commit run docs-docstring-quality --hook-stage manual
+      # Requires generated API docs (run `uv run python tooling/docs-autogen/build.py` first).
       - id: docs-docstring-quality
-        name: Audit docstring quality (informational)
-        entry: bash -c 'test -d docs/docs/api && uv run --no-sync python tooling/docs-autogen/audit_coverage.py --quality --docs-dir docs/docs/api || true'
+        name: Audit docstring quality
+        entry: uv run --no-sync python tooling/docs-autogen/audit_coverage.py --quality --fail-on-quality --threshold 0 --docs-dir docs/docs/api
         language: system
         pass_filenames: false
         files: (mellea/.*\.py$|cli/.*\.py$)

CONTRIBUTING.md

Lines changed: 21 additions & 0 deletions
@@ -174,6 +174,25 @@ differs in type or behaviour from the constructor input — for example, when a
 argument is wrapped into a `CBlock`, or when a class-level constant is relevant to
 callers. Pure-echo entries that repeat `Args:` verbatim should be omitted.

+**`TypedDict` classes are a special case.** Their fields *are* the entire public
+contract, so when an `Attributes:` section is present it must exactly match the
+declared fields. The audit will flag:
+
+- `typeddict_phantom` — `Attributes:` documents a field that is not declared in the `TypedDict`
+- `typeddict_undocumented` — a declared field is absent from the `Attributes:` section
+
+```python
+class ConstraintResult(TypedDict):
+    """Result of a constraint check.
+
+    Attributes:
+        passed: Whether the constraint was satisfied.
+        reason: Human-readable explanation.
+    """
+    passed: bool
+    reason: str
+```
+
 #### Validating docstrings

 Run the coverage and quality audit to check your changes before committing:
@@ -194,6 +213,8 @@ Key checks the audit enforces:
 | `no_args` | Standalone function has params but no `Args:` section |
 | `no_returns` | Function has a non-trivial return annotation but no `Returns:` section |
 | `param_mismatch` | `Args:` documents names not present in the actual signature |
+| `typeddict_phantom` | `TypedDict` `Attributes:` documents a field not declared in the class |
+| `typeddict_undocumented` | `TypedDict` has a declared field absent from its `Attributes:` section |

 **IDE hover verification** — open any of these existing classes in VS Code and hover
 over the class name or a constructor call to confirm the hover card shows `Args:` once
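
The two `TypedDict` checks amount to a set comparison between declared fields and documented names. The sketch below shows that idea at runtime using `__annotations__` and a simple docstring scan. It is an illustration only: the actual audit works on Griffe's static model, and `BrokenResult`, `documented_attributes`, and `check_typeddict` are hypothetical names.

```python
from __future__ import annotations

import inspect
import re
from typing import TypedDict


class BrokenResult(TypedDict):
    """Deliberately inconsistent example (hypothetical, not from the repo).

    Attributes:
        passed: Whether the constraint was satisfied.
        score: Documented here but never declared below.
    """

    passed: bool
    reason: str


def documented_attributes(doc: str) -> set[str] | None:
    """Return names listed under ``Attributes:``, or None if the section is absent."""
    lines = doc.splitlines()
    try:
        start = lines.index("Attributes:") + 1
    except ValueError:
        return None
    names: set[str] = set()
    entry_indent = None
    for line in lines[start:]:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            break  # a new top-level section starts
        if entry_indent is None:
            entry_indent = indent  # indentation of the first entry
        match = re.match(r"(\w+)(?:\s*\([^)]*\))?:", line.strip())
        if indent == entry_indent and match:  # skip continuation lines
            names.add(match.group(1))
    return names


def check_typeddict(cls: type) -> dict[str, list[str]]:
    """Compare declared fields with the documented ``Attributes:`` entries."""
    documented = documented_attributes(inspect.getdoc(cls) or "")
    if documented is None:  # no Attributes: section, nothing to compare
        return {"typeddict_phantom": [], "typeddict_undocumented": []}
    declared = set(cls.__annotations__)
    return {
        "typeddict_phantom": sorted(documented - declared),
        "typeddict_undocumented": sorted(declared - documented),
    }


print(check_typeddict(BrokenResult))
```

A class without an `Attributes:` section yields no issues here, matching the rule that the section must match exactly only when it is present.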

cli/alora/intrinsic_uploader.py

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ def upload_intrinsic(
         base_model (str): Base model ID or path (e.g.
             ``"ibm-granite/granite-3.3-2b-instruct"``). Must contain at most
             one ``"/"`` separator.
-        type (Literal["lora", "alora"]): Adapter type, used as the leaf
+        type (Literal['lora', 'alora']): Adapter type, used as the leaf
             directory name in the repository layout.
         io_yaml (str): Path to the ``io.yaml`` configuration file for
             intrinsic input/output processing.

cli/alora/train.py

Lines changed: 19 additions & 4 deletions
@@ -16,7 +16,14 @@
 import typer
 from datasets import Dataset
 from peft import LoraConfig, get_peft_model
-from transformers import AutoModelForCausalLM, AutoTokenizer, TrainerCallback
+from transformers import (
+    AutoModelForCausalLM,
+    AutoTokenizer,
+    TrainerCallback,
+    TrainerControl,
+    TrainerState,
+    TrainingArguments,
+)
 from trl import DataCollatorForCompletionOnlyLM, SFTConfig, SFTTrainer

 # Handle MPS with old PyTorch versions on macOS only
@@ -39,7 +46,9 @@
 )


-def load_dataset_from_json(json_path, tokenizer, invocation_prompt):
+def load_dataset_from_json(
+    json_path: str, tokenizer: AutoTokenizer, invocation_prompt: str
+) -> Dataset:
     """Load a JSONL dataset and format it for SFT training.

     Reads ``item``/``label`` pairs from a JSONL file and builds a HuggingFace
@@ -73,7 +82,7 @@ def load_dataset_from_json(json_path, tokenizer, invocation_prompt):
     return Dataset.from_dict({"input": inputs, "target": targets})


-def formatting_prompts_func(example):
+def formatting_prompts_func(example: dict) -> list[str]:
     """Concatenate input and target columns for SFT prompt formatting.

     Args:
@@ -101,7 +110,13 @@ class SaveBestModelCallback(TrainerCallback):
     def __init__(self):
         self.best_eval_loss = float("inf")

-    def on_evaluate(self, args, state, control, **kwargs):
+    def on_evaluate(
+        self,
+        args: TrainingArguments,
+        state: TrainerState,
+        control: TrainerControl,
+        **kwargs,
+    ):
         """Save the adapter weights if the current evaluation loss is a new best.

         Called automatically by the HuggingFace Trainer after each evaluation

cli/decompose/decompose.py

Lines changed: 26 additions & 0 deletions
@@ -49,6 +49,19 @@ class DecompVersion(StrEnum):
 def reorder_subtasks(
     subtasks: list[DecompSubtasksResult],
 ) -> list[DecompSubtasksResult]:
+    """Topologically sort subtasks by their ``depends_on`` relationships.
+
+    Args:
+        subtasks: List of subtask dicts, each with a ``"tag"`` and optional
+            ``"depends_on"`` field.
+
+    Returns:
+        list[DecompSubtasksResult]: The subtasks reordered so that dependencies
+            come before dependents, with numbering prefixes updated.
+
+    Raises:
+        ValueError: If a circular dependency is detected.
+    """
     subtask_map = {subtask["tag"].lower(): subtask for subtask in subtasks}

     graph = {}
@@ -78,6 +91,19 @@ def reorder_subtasks(
 def verify_user_variables(
     decomp_data: DecompPipelineResult, input_var: list[str] | None
 ) -> DecompPipelineResult:
+    """Validate that all required input variables and dependencies exist.
+
+    Args:
+        decomp_data: The decomposition pipeline result containing subtasks.
+        input_var: User-provided input variable names, or ``None`` for none.
+
+    Returns:
+        DecompPipelineResult: The (possibly reordered) decomposition data.
+
+    Raises:
+        ValueError: If a subtask requires an input variable that was not
+            provided, or depends on a subtask tag that does not exist.
+    """
     if input_var is None:
         input_var = []
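
The behaviour that the new `reorder_subtasks` docstring describes (dependencies first, `ValueError` on a cycle) can be sketched with the standard library's `graphlib`. The function body is not shown in this diff, so the code below is not its implementation: `order_tags` is a hypothetical helper that returns only the lowercased tags and does not update numbering prefixes.

```python
from graphlib import CycleError, TopologicalSorter


def order_tags(subtasks: list[dict]) -> list[str]:
    """Return lowercased subtask tags with dependencies before dependents."""
    # Map each tag to the set of tags it depends on (its predecessors).
    graph = {
        s["tag"].lower(): {d.lower() for d in s.get("depends_on", [])}
        for s in subtasks
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"Circular dependency detected: {exc.args[1]}") from exc


subtasks = [
    {"tag": "C", "depends_on": ["b"]},
    {"tag": "B", "depends_on": ["a"]},
    {"tag": "A"},
]
print(order_tags(subtasks))
```

`graphlib.CycleError` already subclasses `ValueError`; re-raising it explicitly only makes the error message match the wording of the docstring.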

cli/eval/runner.py

Lines changed: 3 additions & 3 deletions
@@ -49,7 +49,7 @@ def __init__(
         self.score = score
         self.validation_reason = validation_reason

-    def to_dict(self):
+    def to_dict(self) -> dict:
         """Serialise the input evaluation result to a plain dictionary.

         Returns:
@@ -84,7 +84,7 @@ def __init__(self, test_eval: TestBasedEval, input_results: list[InputEvalResult
         self.test_eval = test_eval
         self.input_results = input_results

-    def to_dict(self):
+    def to_dict(self) -> dict:
         """Serialise the test evaluation result to a plain dictionary.

         Returns:
@@ -366,7 +366,7 @@ def execute_test_eval(
     return test_result


-def parse_judge_output(judge_output: str):
+def parse_judge_output(judge_output: str) -> tuple[int | None, str]:
     """Parse score and justification from a judge model's output string.

     Args:
