Skip to content

feat: Add Lean 4 formalization with hybrid Lean/Coq resolution#3

Merged
haasonsaas merged 5 commits into
mainfrom
feat/lean-formalization
Apr 11, 2026
Merged

feat: Add Lean 4 formalization with hybrid Lean/Coq resolution#3
haasonsaas merged 5 commits into
mainfrom
feat/lean-formalization

Conversation

@haasonsaas
Copy link
Copy Markdown
Contributor

Summary

  • LeanTranslator: translates claims to Lean 4 theorems (arithmetic, factorial, fibonacci, gcd, inequality, logic) via structured IR dispatch or regex fallback, using omega and native_decide tactics
  • LeanProver: invokes the lean binary with sorry/admit/axiom detection, result caching, timeout handling, and proper ProofStatus assignment (MACHINE_CHECKED, REFUTED, FORMALIZED_UNPROVED, etc.)
  • HybridLeanCoqResolver: configurable Lean-first or Coq-first with optional fallback, exposing prove_specification() for drop-in use with the existing detector

Replaces the reverted #1 with a clean implementation that follows existing patterns (same FormalSpecProofResult pipeline, same test style, same lint compliance).

Test plan

  • 16 new unit tests: translator (8), prover (6), hybrid resolver (4)
  • All 286 tests pass, 5 expected skips
  • CI lint (ruff check --select ...) passes clean
  • Full ruff check passes clean (no E501)
  • CI should go green on all 3 Python versions + coq-smoke

🤖 Generated with Claude Code

Adds three modules following the existing Coq prover patterns:

- LeanTranslator: translates claims to Lean 4 theorems via IR or regex,
  using omega/native_decide tactics
- LeanProver: invokes the lean binary with sorry/admit detection,
  caching, timeout handling, and proper ProofStatus assignment
- HybridLeanCoqResolver: tries one backend first with configurable
  fallback, exposing prove_specification() for drop-in use

16 new tests covering translation, prover status paths, assumption
detection, timeout, and hybrid fallback behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 11, 2026

PR Summary

Medium Risk
Adds a new lean subprocess-based proof backend and a hybrid fallback resolver, which can change proof outcomes and introduces dependency/timeout/availability behavior differences. Main risks are around solver-status correctness, translation gaps, and runtime environments without Lean installed.

Overview
Adds Lean 4 support to the formal verification module: a new LeanTranslator that turns supported claim IR/text patterns into Lean theorems, and a LeanProver that runs the lean binary with caching, timeouts, and rejection of unsound proofs containing sorry/admit/axiom.

Introduces HybridLeanCoqResolver to try Lean-first (or Coq-first) with optional fallback to the other prover, returning the best available ProofResult. Exports the new resolver/prover/translator from formal_verification.__init__ and adds unit tests covering translation, prover statuses, and hybrid fallback behavior.

Reviewed by Cursor Bugbot for commit 9b8792a. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread formal_verification/lean_translator.py Outdated
Comment on lines +247 to +249
lean = (
f"theorem exists_claim : ∃ {variable} : Nat, "
f"{prop_lean} := ⟨1, by omega⟩"

This comment was marked as outdated.

Comment thread formal_verification/lean_prover.py
- LeanProver: use ProofStatus.default_for_solver_result() for failed
  proofs instead of hardcoding REFUTED — a tactic insufficiency is
  INCONCLUSIVE, not a definitive disproof
- LeanTranslator._exists: infer witness from property bounds (x > 5 → 6,
  x >= 10 → 10, x = 3 → 3) instead of hardcoding 1
- Add test for witness inference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread formal_verification/lean_translator.py
_implication now extracts single-letter free variables from the
hypothesis/conclusion and wraps them in ∀ binders, matching the Coq
translator's behavior.  Without this, implications like "if x > 5
then x > 3" generated invalid Lean 4 with undeclared variables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread formal_verification/lean_translator.py
Comment thread formal_verification/hybrid_lean_coq_resolver.py
…atus

- _multiplication: use native_decide instead of omega for nonlinear
  arithmetic, matching factorial/fibonacci/gcd
- HybridLeanCoqResolver: translation failures now return INCONCLUSIVE
  instead of UNAVAILABLE (which means binary-not-found)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +65 to +66

return self.coq_prover.prove_specification(spec)

This comment was marked as outdated.

When Lean fails and Coq fallback also fails (e.g. UNAVAILABLE), return
the Lean result instead of discarding it.  Symmetric logic applied to
both _lean_then_coq and _coq_then_lean — the initial prover's result
is preserved when the fallback doesn't prove either.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +121 to +131
error_message="Proof attempt timed out",
counter_example=None,
proof_output="",
prover_name="lean",
solver_status=ProofStatus.TIMEOUT.value,
)

if self.use_cache and self.cache:
self.cache.put(spec, result)

return result
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: An unhandled UnicodeDecodeError in _run_lean can cause an UnboundLocalError in prove_specification because the result variable is used before it is assigned.
Severity: MEDIUM

Suggested Fix

Expand the except block in prove_specification to catch other potential exceptions from _run_lean, such as UnicodeDecodeError. When such an exception is caught, assign a sensible default or error state to the result variable before the function attempts to use it, preventing the UnboundLocalError.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: formal_verification/lean_prover.py#L113-L131

Potential issue: In `prove_specification`, the `try...except` block only handles
`subprocess.TimeoutExpired`. If the `_run_lean` function raises a different exception,
such as a `UnicodeDecodeError` from decoding non-UTF-8 output from the Lean binary, the
exception will not be caught. This prevents the `result` variable from being assigned.
The function then proceeds to the `self.cache.put(spec, result)` and `return result`
statements, which will raise an `UnboundLocalError` because `result` was never assigned,
causing a crash.

@haasonsaas haasonsaas merged commit e861fa8 into main Apr 11, 2026
12 checks passed
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is ON, but it could not run because the spend limit has been reached. To enable Bugbot Autofix, have a team admin raise the spend limit in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 9b8792a. Configure here.

)

if self.use_cache and self.cache:
self.cache.put(spec, result)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout results incorrectly cached unlike CoqProver

Medium Severity

The cache.put call is placed after the try/except block, which means timeout results are cached (including to disk). In CoqProver, caching happens inside the try block before the except, so timeout results are intentionally not cached. This inconsistency means a transient timeout in LeanProver permanently prevents retries for the same spec, since subsequent calls return the cached TIMEOUT result without ever invoking lean again.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 9b8792a. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant