Skip to content

fixes #414#590

Merged
disteph merged 17 commits into
masterfrom
fix-iss-414
May 4, 2026
Merged

fixes #414#590
disteph merged 17 commits into
masterfrom
fix-iss-414

Conversation

@ahmed-irfan
Copy link
Copy Markdown
Member

fixes #414

@ahmed-irfan ahmed-irfan requested a review from disteph July 12, 2025 08:56
@coveralls
Copy link
Copy Markdown

coveralls commented Jul 12, 2025

Coverage Status

coverage: 66.687% (+0.002%) from 66.685% — fix-iss-414 into master

@ahmed-irfan ahmed-irfan marked this pull request as draft July 17, 2025 05:11
Copy link
Copy Markdown
Collaborator

@disteph disteph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the trick of replacing uninterpreted symbols of unit types by the unique value of the type, out of the term-construction function and into the CDCLT internalization process. This is necessary over not doing the trick at all, because of function types that are unit (the CDCLT e-graph doesn't seem to know that you can't have two distinct terms in that type). However, MCSAT no longer benefits from the trick. But MCSAT has a larger problem of not being aware of cardinality constraints on finite function types. It happily says SAT on 5 pairwise distinct functions of type Bool -> Bool. So I made MCSAT refuse to assert formulas that contain equalities on finite functional types.

@disteph disteph requested a review from BrunoDutertre March 2, 2026 16:08
@disteph disteph added this to the Yices 2.8 milestone Mar 2, 2026
@disteph
Copy link
Copy Markdown
Collaborator

disteph commented May 4, 2026

MCSAT Function-Disequality Findings

This note records the experiments run with the PR590 unsupported-theory guard temporarily removed.

The tests were run on fix-iss-414 at commit 2d0d0be9 (Move unsupported MCSAT regressions out of both), with the then-current PR590 finite-function guard in src/mcsat/preprocessor.c temporarily removed and yices_smt2 relinked against that no-guard build.

The same four tests were rerun on origin/master at commit 485acafd (Fix #613 MCSAT curried function model reconstruction (#614)) after rebuilding and relinking yices_smt2. The results are the same: the bug is present on master, not introduced by PR590.

Finite Function Sort: Bool -> Bool

The explicit pairwise-disequality encoding is semantically unsatisfiable: there are only four total functions of type Bool -> Bool, so five pairwise-distinct functions cannot exist.

(set-logic ALL)

(declare-fun f0 (Bool) Bool)
(declare-fun f1 (Bool) Bool)
(declare-fun f2 (Bool) Bool)
(declare-fun f3 (Bool) Bool)
(declare-fun f4 (Bool) Bool)

(assert (not (= f0 f1)))
(assert (not (= f0 f2)))
(assert (not (= f0 f3)))
(assert (not (= f0 f4)))
(assert (not (= f1 f2)))
(assert (not (= f1 f3)))
(assert (not (= f1 f4)))
(assert (not (= f2 f3)))
(assert (not (= f2 f4)))
(assert (not (= f3 f4)))

(check-sat)
(get-model)

With the guard removed, MCSAT returns sat. The produced model violates the asserted disequalities; for example, f0 and f4 are both the constant-false function:

sat
((define-fun f0 ((x!0 Bool)) Bool false)
 (define-fun f1 ((x!0 Bool)) Bool true)
 (define-fun f2 ((x!0 Bool)) Bool (ite (= x!0 true) true false))
 (define-fun f3 ((x!0 Bool)) Bool (ite (= x!0 false) true false))
 (define-fun f4 ((x!0 Bool)) Bool false))

The corresponding distinct spelling does not expose the same search bug:

(assert (distinct f0 f1 f2 f3 f4))

This is not because MCSAT search handles the cardinality constraint soundly. It is because mk_distinct simplifies the term before MCSAT search: if type_card(tau) < n and the cardinality is exact, it returns false_term. For Bool -> Bool, the exact cardinality is 4, so (distinct f0 f1 f2 f3 f4) is simplified to false.

Finite Domain, Infinite Codomain: Bool -> Int

The finite-function guard is not broad enough. A function sort with finite domain and infinite codomain can still have constrained finite image, and then function disequalities require extensional/cardinality reasoning.

This formula is semantically unsatisfiable: every fi maps both Boolean inputs into {0,1}, so there are only four possible constrained function graphs, but five functions are required to be pairwise distinct.

(set-logic ALL)

(declare-fun f0 (Bool) Int)
(declare-fun f1 (Bool) Int)
(declare-fun f2 (Bool) Int)
(declare-fun f3 (Bool) Int)
(declare-fun f4 (Bool) Int)

(assert (not (= f0 f1)))
(assert (not (= f0 f2)))
(assert (not (= f0 f3)))
(assert (not (= f0 f4)))
(assert (not (= f1 f2)))
(assert (not (= f1 f3)))
(assert (not (= f1 f4)))
(assert (not (= f2 f3)))
(assert (not (= f2 f4)))
(assert (not (= f3 f4)))

(assert (or (= (f0 true) 0) (= (f0 true) 1)))
(assert (or (= (f0 false) 0) (= (f0 false) 1)))
(assert (or (= (f1 true) 0) (= (f1 true) 1)))
(assert (or (= (f1 false) 0) (= (f1 false) 1)))
(assert (or (= (f2 true) 0) (= (f2 true) 1)))
(assert (or (= (f2 false) 0) (= (f2 false) 1)))
(assert (or (= (f3 true) 0) (= (f3 true) 1)))
(assert (or (= (f3 false) 0) (= (f3 false) 1)))
(assert (or (= (f4 true) 0) (= (f4 true) 1)))
(assert (or (= (f4 false) 0) (= (f4 false) 1)))

(check-sat)
(get-model)

With the guard removed, MCSAT returns sat. The model uses unreachable else-branch values to make the printed functions look distinct:

sat
((define-fun f0 ((x!0 Bool)) Int (ite (= x!0 true) 0 (ite (= x!0 false) 0 4)))
 (define-fun f1 ((x!0 Bool)) Int (ite (= x!0 true) 0 (ite (= x!0 false) 0 2)))
 (define-fun f2 ((x!0 Bool)) Int (ite (= x!0 true) 0 (ite (= x!0 false) 0 3)))
 (define-fun f3 ((x!0 Bool)) Int 0)
 (define-fun f4 ((x!0 Bool)) Int (ite (= x!0 true) 0 (ite (= x!0 false) 0 1))))

Since x!0 has type Bool, the else values are unreachable. Extensionally, all five functions map both true and false to 0.

The same wrong-sat behavior occurs with the distinct spelling for Bool -> Int. Unlike Bool -> Bool, the function sort Bool -> Int is infinite, so mk_distinct cannot simplify (distinct f0 f1 f2 f3 f4) by exact type cardinality.

Consequence for PR590

The temporary unsupported-theory guard should not be limited to finite function sorts. It should reject equality/disequality atoms whose type contains a function sort that needs disequality monitoring:

  • function sort with finite domain
  • function sort with unit range
  • any enclosing function, tuple, or type-instance sort that contains such a function sort by downward closure

The guard is still a temporary solution. The complete fix is to make MCSAT's UF reasoning enforce these extensional/cardinality constraints during search rather than rejecting the input.

Copy link
Copy Markdown
Collaborator

@disteph disteph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review provided by Codex/GPT5.5 High.
I concur.

Findings
No blocking findings.

I did not find a correctness issue in the branch diff against origin/master at 485acafd. The MCSAT guard now matches the observed unsafe cases: finite-domain function sorts, unit-codomain function sorts, and downward closure through function domains/ranges, tuples, and type instances. The regression harness change for solver-specific .mcsat.gold / .dpllt.gold files is scoped to tests/regress/both and preserves shared .gold behavior by default.

Open Questions
The main residual risk is intentional, not a defect: PR590 remains an over-approximating temporary guard. It rejects some formulas that could be simplified or solved soundly, for example benign equalities over guarded function sorts. That matches the stated goal of refusing inputs that require missing MCSAT function-disequality monitoring.

Review Summary
The unit-type change in src/context/context.c compensates for making unit-typed uninterpreted terms fresh again in src/terms/term_manager.c, preserving CDCL(T) singleton semantics during internalization.

The MCSAT guard in src/mcsat/preprocessor.c correctly applies to both EQ_TERM and DISTINCT_TERM, and the new SMT2 regressions cover the exact master wrong-sat reproducers under both solver modes.

Tests Run
git diff --check origin/master...HEAD

tests/regress/check.sh tests/regress/both build/arm-apple-darwin23.6.0-debug/bin
Result: Pass: 11, Fail: 0

build/arm-apple-darwin23.6.0-debug/bin/issue_414_mcsat
Result: pass

build/arm-apple-darwin23.6.0-debug/bin/issue_414
Result: pass

@disteph disteph removed the request for review from BrunoDutertre May 4, 2026 02:37
@disteph
Copy link
Copy Markdown
Collaborator

disteph commented May 4, 2026

Note: "I made MCSAT refuse to assert formulas that contain equalities on finite functional types" -> I generalized the guard since (i) the search was also incorrect when having equalities or disequalities on function types with finite domains and infinite co-domains (see example Bool -> Int above); equalities/disequalities on D->C implicitly generate equalities/disequalities on C and D (so there a downward closure of the guard I added too).

disteph added 5 commits May 3, 2026 21:29
Address four findings from the PR 590 code review that concern the MCSAT
uninterpreted-function model-reconstruction path.

H2: uf_plugin_get_function_id_from_trail relied on q_get32 succeeding
and called `assert(ok); (void) ok;`. In release builds a q_get32 failure
would leave `id` uninitialized, silently corrupting the function_term,
function_value, and app_term_first hashmap lookups keyed on it. Convert
both uf_plugin_get_function_id_from_trail and its wrapper
uf_plugin_get_function_id to bool returns with an int32_t* out-param so
every caller can fail closed. Update the five call sites to take sound
fallbacks (sort fallback in compare_app_terms_by_function; early
return, continue, or break in the model-builder loops; null_value
propagation out of uf_model_builder_get_function_value).

M1: The `make_fresh_function == null_value` branch in
uf_model_builder_get_function_value previously fell back to
vtbl_mk_function with a default codomain value. The old comment
acknowledged this collapsed function values and that the MCSAT
preprocessor guard was supposed to prevent the case from arising.
Replace the silent fallback with assert(false) in debug builds and a
fail-closed null_value return in release builds (arguments is already
initialized at that point, so delete_ivector is safe). The top-level
uf_plugin_build_model now guards model_map_term on null_value, so an
incomplete model is emitted instead of a wrong-SAT one if the guard
ever has a hole.

H3: Upgrade the cycle-handling comment on the early `find->val =
f_value` write in uf_model_builder_get_function_value. The previous
one-liner only said "avoid recursing forever". The new block documents
why such cycles occur, notes that the cached value at cycle time is
the pre-update base (so the emitted model may not be extensional on
self-referential pairs), and flags that a full fix would need a
visited-set plus a theory-conflict path.

L4: get_function_application_type is called with NULL_TERM for the
div/mod arms but requires a real function term for APP_TERM and
UPDATE_TERM. Add a header comment documenting this invariant and
assert(f != NULL_TERM) in the APP_TERM/UPDATE_TERM arm so the coupling
is no longer silent.

No behavior change on guard-covered inputs (the only path the
preprocessor currently allows to reach these sites). The fail-closed
branches are defensive only.

Refs: #414, PR #590.
PR 590 introduced canonicalization in internalize_to_eterm so that any
term whose type is unit gets remapped to a single representative term.
At the two mapping sites the existing code only mapped `root` if it
was still free. This was sound when reached fresh, but if a recursive
internalization had already mapped `root` to a different egraph
occurrence, the inconsistency would silently propagate.

Add a defensive assert in the !is_free branch at both sites: when
`root != r` and `root` is already mapped, the existing mapping must
agree with the egraph occurrence we are about to return. Cheap insurance
against future changes that route additional kinds of terms through
this path.

No behavior change in correctly canonicalized inputs.

Refs: PR #590 review finding M2.
PR 590 added an identical static `copy_update_maps` helper to both
src/io/concrete_value_printer.c and src/frontend/smt2/smt2_printer.c.
The helper expands an update via vtbl_expand_update and copies the
resulting map ids out of the shared scratch buffer table->hset1 into a
private array, so that recursive printing of value subtrees cannot
clobber the iteration mid-loop.

Promote the helper to vtbl_copy_update_maps in the value-table API
(src/model/concrete_values.h, src/model/concrete_values.c) and replace
both static duplicates with calls into it. The new function lives next
to vtbl_expand_update and carries a header comment that documents the
hset1 reentrancy hazard and the caller's safe_free obligation.

Net: -27 lines in smt2_printer.c, -33 lines in concrete_value_printer.c,
+44 lines in the model/concrete_values pair. No behavior change. All
five call sites still safe_free the returned array.

Refs: PR #590 review finding M3.
PR 590 disabled four MCSAT regressions whose inputs are now rejected by
the new finite-function-equality preprocessor guard. Add a README at
tests/regress/mcsat/README.md that:

  - explains the .smt2.disabled file-suffix convention used in this
    subtree (inputs are preserved rather than deleted so they can be
    re-enabled when the underlying capability is restored),
  - lists every test disabled by commit 48f2e4a with its rejected
    type/shape, the guard branch that rejects it, and the work item
    that should re-enable it (the finite-function extensionality plans
    at the repository root), and
  - notes that older .smt2.disabled files in this subtree predate
    PR 590 and were disabled for unrelated reasons.

Going forward, anyone disabling a test in this directory is expected
to add a row with a one-line reason and a re-enable condition; anyone
re-enabling should remove the corresponding row.

Refs: PR #590 review finding M4, issue #414.
The four review-driven commits earlier in this PR added defensive
fail-closed branches that are unreachable on supported inputs:

  - src/mcsat/uf/uf_plugin.c
      * uf_plugin_get_function_id_from_trail's q_get32 failure path
      * uf_plugin_build_app_model_compare's both-ids-failed fallback
      * uf_model_builder_remember_function_term's early return
      * uf_model_builder_get_function_value's early return on id lookup
      * uf_model_builder_get_function_value's make_fresh_function ==
        null_value branch (rejected by the preprocessor guard)
      * uf_model_builder_prepare's continue on id lookup

  - src/context/context.c
      * two internalize_to_eterm consistency assert()s that fire only
        when root was already mapped during recursive internalization

All of these are never taken in the gcov regression run (either
guarded out by term_needs_function_diseq_guard in preprocessor.c, or
structurally impossible because function ids are allocated as small
non-negative integers by uf_plugin_decide), so they show up as
uncovered lines and regress PR 590's coveralls score.

Wrap each with LCOV_EXCL_LINE (single-line branches) or
LCOV_EXCL_START / LCOV_EXCL_STOP (multi-line blocks). The hardening
itself is preserved; the markers only tell lcov not to count these
deliberately unreachable lines against the coverage metric.

No behavior change.

Refs: PR #590, coveralls build 79245430.
@disteph
Copy link
Copy Markdown
Collaborator

disteph commented May 4, 2026

Code review by Windsurf/Opus4.7 high

Code review: PR 590 (master..fix-iss-414, 13 commits, 38 files, +717/-103)

Scope

Fixes issues #414 and #613 along five axes:

  • Unit-type term creation — drop the mk_uterm collapse at @/Users/sgl/git/yices/yices2/src/terms/term_manager.c:3998-4001.
  • CDCL(T) internalization — re-add the collapse at the egraph layer in internalize_to_eterm (@/Users/sgl/git/yices/yices2/src/context/context.c:2594-2614), with consistency asserts at both mapping sites.
  • MCSAT theory guard — new term_needs_function_diseq_guard rejects EQ_TERM and DISTINCT_TERM whose type transitively contains a finite-domain or unit-codomain function sort (@/Users/sgl/git/yices/yices2/src/mcsat/preprocessor.c:170-219,553,1153-1160).
  • MCSAT UF plugin — curried function model reconstruction plus fail-closed hardening (@/Users/sgl/git/yices/yices2/src/mcsat/uf/uf_plugin.c).
  • Test infrastructuretests/regress/both/ with .mcsat.gold / .dpllt.gold overrides, SMT2 and API regressions, documented .smt2.disabled convention (@/Users/sgl/git/yices/yices2/tests/regress/run_test.sh:194-225, @/Users/sgl/git/yices/yices2/tests/regress/mcsat/README.md).

The vtbl_copy_update_maps extraction into @/Users/sgl/git/yices/yices2/src/model/concrete_values.c preserves the issue #613 hset1 reentrancy fix across all five call sites.

Findings

Medium

M1 — Self-referential cycles may emit non-extensional models. @/Users/sgl/git/yices/yices2/src/mcsat/uf/uf_plugin.c:875-891 caches find->val = f_value before recursing, to break cycles where an application's argument or result shares the head's function_id. The cached value at cycle time is the pre-update base, so the emitted model can fail extensionality on the self-referential pair. The KNOWN LIMITATION comment documents this, but no regression test exercises the path. Follow-up: file an issue linked from the comment. A real fix needs a visited-set plus a theory-conflict path.

M2 — Unbounded recursion in type_needs_function_diseq_guard. @/Users/sgl/git/yices/yices2/src/mcsat/preprocessor.c:170-216 walks FUNCTION_TYPE, TUPLE_TYPE, and INSTANCE_TYPE with no memoization. tests/regress/both/wd_issue301.smt2 runs a 10+-deep nest fine, but on DAG-shaped types with heavy fan-in the repeat work can blow up. Follow-up: either confirm in a comment that type_table_t's hash-consing amortizes repeat visits, or cache results in an int_hmap_t keyed on type_t per preprocessor_apply call.

Low

L1 — DISTINCT guard re-checks every argument. @/Users/sgl/git/yices/yices2/src/mcsat/preprocessor.c:1153-1160 loops over all n args, but SMT-LIB requires distinct to be homogeneous. Checking desc->arg[0] alone suffices and matters on large distincts (LDV traces have hundreds). One-line cleanup.

L2 — Unit-codomain function equality is trivially decidable but rejected. @/Users/sgl/git/yices/yices2/src/mcsat/preprocessor.c:175 rejects f = g : T -> unit even though all such functions are extensionally equal. Sound but unnecessarily conservative. Flag in the guard comment as deliberately coarse.

L3 — Consistency asserts in internalize_to_eterm are debug-only. @/Users/sgl/git/yices/yices2/src/context/context.c:2603-2611,2794-2802 asserts existing root mappings agree with the egraph occurrence being returned. Release builds with NDEBUG compile these out. Consistent with the rest of context.c; upgrade to a runtime check or longjmp only if the project's style dictates.

L4 — H3 cycle path has no regression. The known-limitation branch in @/Users/sgl/git/yices/yices2/src/mcsat/uf/uf_plugin.c:875 is hard but not impossible to provoke. A smoke test with a heavily updated curried function sharing sub-results would lock in the early-cache-write behavior.

Info

I1 — Track degraded MCSAT answers alongside disabled tests. @/Users/sgl/git/yices/yices2/tests/regress/both/wd_issue301.smt2.mcsat.gold is (error "mcsat: unsupported theory") while DPLLT returns sat. Consider extending @/Users/sgl/git/yices/yices2/tests/regress/mcsat/README.md with a sub-table for tests that still pass but with a degraded MCSAT gold.

I2 — API tests are thorough. @/Users/sgl/git/yices/yices2/tests/api/issue_414_mcsat.c systematically probes the guard boundary across nine cases, including the still-accepted Int -> Bool baseline (case 9). @/Users/sgl/git/yices/yices2/tests/api/mcsat_curried_function_model.c locks in the issue #613 model path with YVAL_FUNCTION assertions on the partial application.

Verification

git diff --check                                      # clean
make MODE=debug -j8                                   # clean, no -Wall warnings
tests/regress/check.sh tests/regress/both ...         # 11/11 PASS (mcsat + dpllt)
tests/regress/check.sh tests/regress/mcsat/arrays ... # 23/23 PASS (incl. issue613_*)
make MODE=debug check-api                             # 30/30 PASS (incl. issue_414*, mcsat_curried_function_model)

Pre-existing set-timeout.smt2 / wd/issue401b.smt2 failures are macOS-timeout infrastructure, unrelated.

Note on the disabled tests

I initially flagged the guard's rejection of (Array (BV N) T) equalities as a coverage regression worth tightening. That framing was wrong: the two disabled LDV benchmarks (linux-...locktorture..., 32_1_cilled_...gpio_keys_polled...) each assert direct (= a (store b k v)) atoms over (Array (BV 64) T). Those equalities are exactly the extensionality cases MCSAT cannot soundly decide without the Plan A/B extensionality work. Pre-PR MCSAT did return the expected gold answers on these inputs, but only because the rest of the Boolean skeleton was consistent with those answers regardless of how MCSAT interpreted the array equalities. Any cardinality-threshold relaxation of the guard would be a soundness regression. The disabled state is the correct one; the re-enable condition is extensional-function support, exactly as the disabled-tests README records.

Recommendation

Approve. The four review-driven hardening commits (bb24f567, 8d4154b3, a8994060, 6ac4414a) close all high-severity findings from the prior round. The remaining medium-severity items are follow-up work, not blockers: M1 is a documented limitation with no current regression; M2 is a performance concern only. The low-severity items are style and minor cleanup.

@disteph
Copy link
Copy Markdown
Collaborator

disteph commented May 4, 2026

Code review by Codex/GPT5.5

Findings

No findings. I reviewed the current fix-iss-414 branch against origin/master at merge-base 485acafd9670.

Summary

The PR now looks mergeable from my review pass. The unit-type uninterpreted-term change is covered by CDCL(T) internalization canonicalization and API regressions. The MCSAT guard now covers both equality and distinct, including the finite-domain/infinite-codomain cases and downward closure. The MCSAT UF model hardening avoids the previous release-build UB/collapsing-id paths, and the shared update-map printer helper is a clean extraction.

Residual risk is intentional scope: four MCSAT regressions are disabled behind the temporary finite-function guard, but they are documented in tests/regress/mcsat/README.md.

Tests Run

  • git diff --check origin/master...HEAD
  • make MODE=debug -j4
  • tests/regress/check.sh tests/regress/both build/arm-apple-darwin23.6.0-debug/bin
  • tests/regress/check.sh tests/regress/mcsat/arrays build/arm-apple-darwin23.6.0-debug/bin
  • build/arm-apple-darwin23.6.0-debug/bin/issue_414
  • build/arm-apple-darwin23.6.0-debug/bin/issue_414_mcsat
  • build/arm-apple-darwin23.6.0-debug/bin/mcsat_curried_function_model

@disteph disteph marked this pull request as ready for review May 4, 2026 05:15
@disteph disteph merged commit 87861e4 into master May 4, 2026
34 checks passed
@ahmed-irfan ahmed-irfan deleted the fix-iss-414 branch May 4, 2026 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

yices_new_uninterpreted_term does not produce an uninterpreted term

3 participants