Improving error localization in Pulse#4247
Merged
Merged
Conversation
The Pulse prover uses lazy continuations (cont_elab_thunk) where the check_slprop_equiv_ext call inside prove_atom_result is deferred until elaboration time. By then, the with_error_bound scope from the branch checker has already exited, causing errors to be localized to the entire if-expression rather than the offending branch. Fix: capture the current error_range_bound when prove_atom_result is created (while the correct bound is still active), and restore it inside the lazy closure via RU.with_error_bound. Also wrap each branch check in Pulse.Checker.If with RU.with_error_bound using the branch's source range, so that errors within a branch are bounded to that branch. Before: error at (10,2-13,8) pointing to the whole if-expression After: error at (13,4-13,8) pointing to the specific branch statement Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
16 test files in pulse/test/error_messages/ covering various error categories: assertion failures, framing, ghost effects, if-branch mismatches, type mismatches, subtyping, unfolding, etc. Each file uses [@@expect_failure [code]] to verify the expected error is produced. These serve as a regression suite for error localization. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When assert, fold, unfold, assume, or rewrite statements appear as the
first statement in a function body (after '{'), the Menhir parser was
producing ranges that started at the '{' character rather than at the
keyword.
Root cause: withBindersOpt could derive epsilon, causing Menhir's
$startpos to inherit $endpos of the preceding token (LBRACE). This
propagated through pulseStmtNonempty -> pulseStmtNoSeq -> pulseStmt and
into the statement's range.
Fix: Split each 'bs=withBindersOpt KEYWORD ...' alternative into two:
one starting directly with the keyword (no binders), and one with
explicit 'bs=withBinders KEYWORD ...' (which starts with WITH, a real
token). Remove the now-unused withBindersOpt non-terminal.
Before: assert errors at (10,1-11,16) including the '{' line
After: assert errors at (11,2-11,16) pinpointing the assert
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add comprehensive regression test suite for Pulse error localization. Each test file triggers a specific error category and pins down the exact error range via .fst.output.expected files. Test categories covered (35 tests): - Assertion failures (assert, fold, unfold, rewrite, nested block) - Type mismatches (if condition, let binding, return, spec, with-local) - Framing/resource errors (leftover, duplicate, missing post, mutual excl) - Branch mismatches (if, match) - Effect errors (ghost in non-ghost context) - Prover failures (exists intro, invariant payload, subtyping) - Scope/binding errors (open nonexistent, arity mismatch) - Loop errors (while invariant preservation) - Sequence/bind errors (postcondition at continuation) - Closure/local function errors (pre-condition mismatch at call site) - Multiple errors in single function The Makefile uses diff-based comparison (same approach as FStar/tests/error-messages) with: - --print_expected_failures for clean output - --ext fstar:no_absolute_paths for reproducibility - Warning suppression for non-localization noise - Integrated into pulse/test as a SUBDIRS entry Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The prover's prove_pure and prove_with_pure functions create lazy continuations (via cont_elab_thunk) that defer check_prop_validity calls until elaboration time. By then, the with_error_bound scope has exited, causing errors to be localized to the outer term (e.g., entire function body) rather than the specific statement. Fix: capture the current error_range_bound when the deferred closure is created and restore it when check_prop_validity runs inside the closure. Same pattern as the if-branch fix for prove_atom_result. Example: 'fold (my_inv r (-1)); ()' now correctly shows error at the fold statement (13,2-13,22) instead of spanning both statements (13,2-14,4). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add continuation statements (; ()) after error-causing statements in test files to ensure error localization doesn't bleed into following code. This catches regressions where the error range expands to include the continuation. Also uncomment [@@expect_failure] attributes on FoldError, IllTypedInvariant, and NestedBlock so __verify and __diff both succeed in the test harness. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lization Previously, prove_atom_result deferred check_slprop_equiv_ext to a continuation closure. This caused errors from mismatched atoms to be mislocalized when the closure executed after with_error_bound had exited. In particular, 'assert (P); continuation' would show errors at the continuation rather than at the assert statement, because the postcondition prover's deferred closure ran first in continuation order. By running check_slprop_equiv_ext eagerly (at prove time rather than elaboration time), errors are reported with the correct error_range_bound context. This is safe because the check uses the original environment g (not the extended g'), so it doesn't depend on later prover state. The saved_bound/get_error_bound mechanism is retained in prove_pure and prove_with_pure, where the deferred checks need the extended environment g''. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the prover matches an atom (slprop) from context to goal, it needs to verify the equivalence via check_slprop_equiv_ext. Previously this was always deferred in a continuation closure, which caused error mislocalization: if the continuation ran after with_error_bound exited, the error range was wrong. The fix uses a hybrid approach: - If both terms are uvar-free (checked via deep_compress_safe + no_uvars_in_term), run the check eagerly while the correct error_range_bound is active. - If either term has unresolved uvars (which later prover steps may instantiate), defer the check in a continuation but capture/restore the error_range_bound. Also factors out shared helpers (with_saved_bound, check_atom_equiv) to reduce duplication across the eager/deferred paths and the prove_pure/prove_with_pure functions that use the same saved-bound pattern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the prover matches atoms or pure goals, the equivalence/validity checks are deferred in continuation closures (cont_elab_thunk). These closures run after the original with_error_bound scope has exited, causing errors to be reported with incorrect ranges. Fix: capture the active error_range_bound at match time (via get_error_bound) and restore it inside the deferred closure (via with_error_bound). This is done in prove_pure and prove_with_pure via a shared with_saved_bound helper. The atom equivalence check in prove_atom_result remains deferred (inside the continuation) as before—this is necessary because the prover may match atoms before all unification variables are resolved. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Regenerate .output.expected files for both pulse/test/error_messages (21 files) and pulse/test/bug-reports (21 files) to account for: 1. Upstream PR #4258 (gebner_default_smtfail): SMT failure messages now include inline Env/VC details instead of 'Use --query_stats for more details' 2. Parser range improvements from earlier commits in this branch: Bug267.fst and IntroGhost.fst show tighter error ranges due to the withBindersOpt parser fix Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Leftover from the reverted eager atom equivalence check experiment. The inline logic in prove_atom_result's deferred closure does the same work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- RewriteNopWarning.fst: (12,1-13,21) -> (13,2-13,21) due to parser fix - Test.RewriteBy.fst: (31,1-37,8) -> (32,2-37,8) due to parser fix - FStarC.Parser.Dep: sort valid_namespaces list for deterministic output - OpenScopeError.fst: regenerated with sorted namespace list Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The prover's prove_atom_result defers check_slprop_equiv_ext into a continuation closure that executes after the with_error_bound scope has exited, causing errors to inherit the wider enclosing range. Fix: capture the active error_range_bound when the atom match is made, and restore it (via with_saved_bound) when the deferred closure runs. This is the same pattern already used for prove_pure/prove_with_pure. Impact on error localization: - IfBranchMismatch: (11,2-14,8) -> (14,4-14,8) — points to the exact mismatched statement instead of the whole if-else block - MatchBranchMismatch: (11,2-13,15) -> (13,11-13,15) — points to the exact mismatched statement instead of the whole match - TerminalAction: (11,2-12,6) -> (12,2-12,6) — points to the terminal statement instead of the enclosing block Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pulse Error Localization: Comparison Report
Comparing error locations between:
~/.local/bin/fstar.exe(last binary release)stage3/out/bin/fstar.exe(_nik_err_loc)Summary
Improved Error Locations
These tests show tighter error ranges on this branch due to fixes in:
assert/fold/unfold/rewrite(withBindersOpt fix inpulseparser.mly)error_range_boundin deferredprove_pure/prove_with_pureclosureserror_range_boundin deferredprove_atom_resultclosure (atom equivalence check)FailAssertion.fst
FailAssertion.fst(10,1-11,16)FailAssertion.fst(11,2-11,16)✅Improvement: Range narrowed from 2 lines to 1 line(s).
Full baseline output
Full this-branch output
FoldError.fst
FoldError.fst(12,1-14,4)FoldError.fst(13,2-13,22)✅Improvement: Range narrowed from 3 lines to 1 line(s).
Full baseline output
Full this-branch output
IfBranchMismatch.fst
IfBranchMismatch.fst(11,2-14,8)IfBranchMismatch.fst(14,4-14,8)✅Improvement: Range narrowed from 4 lines to 1 line(s).
Full baseline output
Full this-branch output
InvariantPayload.fst
InvariantPayload.fst(12,1-14,4)InvariantPayload.fst(13,2-13,16)✅Improvement: Range narrowed from 3 lines to 1 line(s).
Full baseline output
Full this-branch output
MatchBranchMismatch.fst
MatchBranchMismatch.fst(11,2-13,15)MatchBranchMismatch.fst(13,11-13,15)✅Improvement: Range narrowed from 3 lines to 1 line(s).
Full baseline output
Full this-branch output
TerminalActionPostConditionMismatch.fst
TerminalActionPostConditionMismatch.fst(11,2-12,6)TerminalActionPostConditionMismatch.fst(12,2-12,6)✅Improvement: Range narrowed from 2 lines to 1 line(s).
Full baseline output
Full this-branch output
Same Location, New SMT Detail Format
These tests report the same error location on both versions. The only difference is the
upstream change (PR #4258) that now shows
Env =andVC =inline instead of"Use --query_stats for more details."
AssertExtraContext.fst(11,7-11,43)ClosureError.fst(17,2-17,8)ExistsIntro.fst(12,2-12,4)IllTypedInvariant.fst(13,43-13,44)IllTypedSpec.fst(8,11-8,12)IntroExistsFail.fst(11,2-11,4)LambdaArg.fst(18,2-18,11)MultiError.fst(12,18-12,19)NestedBlock.fst(14,2-14,4)NestedCall.fst(18,2-18,9)OpenScopeError.fst(11,2-11,25)PrePostMismatch.fst(12,2-12,4)ReturnImplicit.fst(12,2-12,4)RewriteFail.fst(11,11-11,33)SequenceError.fst(13,2-13,4)SubtypingFailure.fst(18,2-18,10)WhileInvPreservation.fst(15,4-15,8)WithLocalError.fst(13,21-13,22)Identical Output
These tests produce identical output on both versions (no SMT errors, so no format change).
AtomicMismatch.fst(12,2-12,8)BindTypeMismatch.fst(19,17-19,26)DuplicateResource.fst(11,2-11,4)FramingFailure.fst(11,10-11,13)GhostEffect.fst(21,10-21,18)IfCondType.fst(11,5-11,6)LeftoverResources.fst(11,2-11,4)LetBindingType.fst(11,17-11,19)MissingPost.fst(12,2-12,4)MutualExcl.fst(11,2-11,4)ReturnTypeMismatch.fst(12,2-12,3)UnfoldFailure.fst(13,4-13,6)Remaining Things to Improve
1. Multi-line ranges that could potentially be tighter
These are cases where the reported range is wide. Some may be inherently correct
(e.g., a rewrite that spans multiple lines), while others may benefit from further
narrowing in the future.
Fixes in This Branch
766f80aPulse.Checker.If: wrap branch checking withwith_error_bound branch_range3cedfb6pulseparser.mly: split withBindersOpt alternatives to fix$startposinheritanceb537cbcPulse.Checker.Prover:with_saved_boundinprove_pure/prove_with_pure4f753d2Pulse.Checker.Prover:with_saved_boundinprove_atom_resultad4d24cFStarC.Parser.Dep: sort valid_namespaces for deterministic output