Each task is a proof-generation unit.
Public inputs:
- implementation files
- specification files
- one editable proof file
- one theorem name
Public output:
- the full contents of the editable proof file
Evaluation:
- reject obvious placeholders
- write the file into a temp workspace
- run Lean
- check that the theorem exists
Repo split:
Benchmark/Cases/...: hidden solved proofs for maintenanceBenchmark/Generated/...: public unsolved templatescases/*/*/tasks/*.yaml: public task manifests
The manifest is the public API. Important fields:
implementation_filesspecification_fileseditable_filestheorem_nameproof_family
Hidden maintenance fields:
reference_solution_modulereference_solution_declaration