Skip to content

Commit b6da0e1

Browse files
committed
feat(quantifiers): Add pure expression inliner infrastructure
Add Expr::substitute_symbol() and inline_as_pure_expr() for inlining function calls as side-effect-free expression trees. This infrastructure enables the quantifier-pure-expressions branch to generate CBMC quantifier bodies without StatementExpression nodes. Changes: - cprover_bindings: Expr::substitute_symbol() — recursive symbol replacement across all ExprValue variants, with 6 unit tests - goto_ctx.rs: inline_as_pure_expr() — inlines function calls by extracting return expressions, resolving intermediate variables, and substituting parameters. Handles StatementExpression flattening for checked arithmetic (drops Assert/Assume runtime checks). - docs/dev/pure-expression-inliner.md — developer documentation including soundness implications Soundness note: The pure inliner drops overflow and division-by-zero checks when flattening StatementExpression nodes from checked arithmetic. This is documented as a known trade-off — CBMC requires pure expressions in quantifier bodies, and runtime checks are side effects. A future improvement could hoist these checks outside the quantifier. The existing handle_quantifiers post-pass is NOT modified — this is purely additive infrastructure. All existing tests pass unchanged. Full regression: 1289 tests passed, 0 failed.
1 parent 105be6b commit b6da0e1

3 files changed

Lines changed: 424 additions & 2 deletions

File tree

cprover_bindings/src/goto_program/expr.rs

Lines changed: 163 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -353,6 +353,74 @@ impl Expr {
353353

354354
/// Predicates
355355
impl Expr {
356+
/// Replace all occurrences of `Symbol { identifier: old_id }` with `replacement`.
357+
/// Produces a new expression tree with all substitutions applied.
358+
/// Used for quantifier codegen to inline function bodies as pure expressions.
359+
pub fn substitute_symbol(self, old_id: &InternedString, replacement: &Expr) -> Expr {
360+
let loc = self.location;
361+
let typ = self.typ.clone();
362+
let ann = self.size_of_annotation.clone();
363+
let mk = |value: ExprValue| Expr {
364+
value: Box::new(value),
365+
typ: typ.clone(),
366+
location: loc,
367+
size_of_annotation: ann.clone(),
368+
};
369+
let sub = |e: Expr| e.substitute_symbol(old_id, replacement);
370+
let sub_vec = |v: Vec<Expr>| v.into_iter().map(|e| sub(e)).collect();
371+
372+
match *self.value {
373+
ExprValue::Symbol { identifier } if identifier == *old_id => {
374+
replacement.clone().with_location(loc)
375+
}
376+
ExprValue::AddressOf(e) => mk(AddressOf(sub(e))),
377+
ExprValue::Dereference(e) => mk(Dereference(sub(e))),
378+
ExprValue::Typecast(e) => mk(Typecast(sub(e))),
379+
ExprValue::UnOp { op, e } => mk(UnOp { op, e: sub(e) }),
380+
ExprValue::BinOp { op, lhs, rhs } => mk(BinOp { op, lhs: sub(lhs), rhs: sub(rhs) }),
381+
ExprValue::If { c, t, e } => mk(If { c: sub(c), t: sub(t), e: sub(e) }),
382+
ExprValue::Index { array, index } => mk(Index { array: sub(array), index: sub(index) }),
383+
ExprValue::Member { lhs, field } => mk(Member { lhs: sub(lhs), field }),
384+
ExprValue::FunctionCall { function, arguments } => {
385+
mk(FunctionCall { function: sub(function), arguments: sub_vec(arguments) })
386+
}
387+
ExprValue::Array { elems } => mk(Array { elems: sub_vec(elems) }),
388+
ExprValue::Struct { values } => mk(Struct { values: sub_vec(values) }),
389+
ExprValue::Assign { left, right } => mk(Assign { left: sub(left), right: sub(right) }),
390+
ExprValue::ReadOk { ptr, size } => mk(ReadOk { ptr: sub(ptr), size: sub(size) }),
391+
ExprValue::ArrayOf { elem } => mk(ArrayOf { elem: sub(elem) }),
392+
ExprValue::ByteExtract { e, offset } => mk(ByteExtract { e: sub(e), offset }),
393+
ExprValue::SelfOp { op, e } => mk(SelfOp { op, e: sub(e) }),
394+
ExprValue::Union { value, field } => mk(Union { value: sub(value), field }),
395+
ExprValue::Forall { variable, domain } => {
396+
mk(Forall { variable: sub(variable), domain: sub(domain) })
397+
}
398+
ExprValue::Exists { variable, domain } => {
399+
mk(Exists { variable: sub(variable), domain: sub(domain) })
400+
}
401+
ExprValue::Vector { elems } => mk(Vector { elems: sub_vec(elems) }),
402+
ExprValue::ShuffleVector { vector1, vector2, indexes } => mk(ShuffleVector {
403+
vector1: sub(vector1),
404+
vector2: sub(vector2),
405+
indexes: sub_vec(indexes),
406+
}),
407+
// Leaf nodes and statement expressions — no substitution
408+
ExprValue::Symbol { .. }
409+
| ExprValue::IntConstant(_)
410+
| ExprValue::BoolConstant(_)
411+
| ExprValue::CBoolConstant(_)
412+
| ExprValue::DoubleConstant(_)
413+
| ExprValue::FloatConstant(_)
414+
| ExprValue::Float16Constant(_)
415+
| ExprValue::Float128Constant(_)
416+
| ExprValue::PointerConstant(_)
417+
| ExprValue::StringConstant { .. }
418+
| ExprValue::Nondet
419+
| ExprValue::EmptyUnion
420+
| ExprValue::StatementExpression { .. } => self,
421+
}
422+
}
423+
356424
pub fn is_int_constant(&self) -> bool {
357425
match *self.value {
358426
IntConstant(_) => true,
@@ -1762,3 +1830,98 @@ impl Expr {
17621830
exprs
17631831
}
17641832
}
1833+
1834+
#[cfg(test)]
1835+
mod tests {
1836+
use super::*;
1837+
1838+
fn sym(name: &str) -> Expr {
1839+
Expr::symbol_expression(name, Type::signed_int(32))
1840+
}
1841+
1842+
fn int(val: i64) -> Expr {
1843+
Expr::int_constant(val, Type::signed_int(32))
1844+
}
1845+
1846+
#[test]
1847+
fn substitute_symbol_leaf_match() {
1848+
let old: InternedString = "x".into();
1849+
let replacement = int(42);
1850+
let result = sym("x").substitute_symbol(&old, &replacement);
1851+
assert!(matches!(result.value(), ExprValue::IntConstant(v) if *v == 42.into()));
1852+
}
1853+
1854+
#[test]
1855+
fn substitute_symbol_leaf_no_match() {
1856+
let old: InternedString = "x".into();
1857+
let replacement = int(42);
1858+
let result = sym("y").substitute_symbol(&old, &replacement);
1859+
assert!(
1860+
matches!(result.value(), ExprValue::Symbol { identifier } if identifier.to_string() == "y")
1861+
);
1862+
}
1863+
1864+
#[test]
1865+
fn substitute_symbol_in_binop() {
1866+
let old: InternedString = "x".into();
1867+
let replacement = int(10);
1868+
// x + 1 → 10 + 1
1869+
let expr = sym("x").plus(int(1));
1870+
let result = expr.substitute_symbol(&old, &replacement);
1871+
if let ExprValue::BinOp { lhs, rhs, .. } = result.value() {
1872+
assert!(matches!(lhs.value(), ExprValue::IntConstant(v) if *v == 10.into()));
1873+
assert!(matches!(rhs.value(), ExprValue::IntConstant(v) if *v == 1.into()));
1874+
} else {
1875+
panic!("Expected BinOp");
1876+
}
1877+
}
1878+
1879+
#[test]
1880+
fn substitute_symbol_nested() {
1881+
let old: InternedString = "x".into();
1882+
let replacement = int(5);
1883+
// (x + x) * 2 → (5 + 5) * 2
1884+
let expr = sym("x").plus(sym("x")).mul(int(2));
1885+
let result = expr.substitute_symbol(&old, &replacement);
1886+
if let ExprValue::BinOp { lhs, .. } = result.value() {
1887+
if let ExprValue::BinOp { lhs: ll, rhs: lr, .. } = lhs.value() {
1888+
assert!(matches!(ll.value(), ExprValue::IntConstant(v) if *v == 5.into()));
1889+
assert!(matches!(lr.value(), ExprValue::IntConstant(v) if *v == 5.into()));
1890+
} else {
1891+
panic!("Expected inner BinOp");
1892+
}
1893+
} else {
1894+
panic!("Expected outer BinOp");
1895+
}
1896+
}
1897+
1898+
#[test]
1899+
fn substitute_symbol_in_typecast() {
1900+
let old: InternedString = "x".into();
1901+
let replacement = int(7);
1902+
let expr = sym("x").cast_to(Type::signed_int(64));
1903+
let result = expr.substitute_symbol(&old, &replacement);
1904+
if let ExprValue::Typecast(inner) = result.value() {
1905+
assert!(matches!(inner.value(), ExprValue::IntConstant(v) if *v == 7.into()));
1906+
} else {
1907+
panic!("Expected Typecast");
1908+
}
1909+
}
1910+
1911+
#[test]
1912+
fn substitute_preserves_unrelated_symbols() {
1913+
let old: InternedString = "x".into();
1914+
let replacement = int(1);
1915+
// y + x → y + 1
1916+
let expr = sym("y").plus(sym("x"));
1917+
let result = expr.substitute_symbol(&old, &replacement);
1918+
if let ExprValue::BinOp { lhs, rhs, .. } = result.value() {
1919+
assert!(
1920+
matches!(lhs.value(), ExprValue::Symbol { identifier } if identifier.to_string() == "y")
1921+
);
1922+
assert!(matches!(rhs.value(), ExprValue::IntConstant(v) if *v == 1.into()));
1923+
} else {
1924+
panic!("Expected BinOp");
1925+
}
1926+
}
1927+
}
Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
1+
# Pure Expression Inliner
2+
3+
## Overview
4+
5+
The pure expression inliner (`inline_as_pure_expr`) is a function call inlining
6+
mechanism that produces side-effect-free expression trees. Unlike the original
7+
`inline_function_calls_in_expr` which wraps inlined bodies in CBMC
8+
`StatementExpression` nodes, this produces expressions using only pure
9+
constructs: `BinOp`, `UnOp`, `If` (ternary), `Typecast`, etc.
10+
11+
## Motivation
12+
13+
CBMC's quantifier expressions (`forall`, `exists`) reject side effects in their
14+
bodies. The original inliner produced `StatementExpression` nodes which CBMC
15+
treats as side effects, causing invariant violations. The pure inliner eliminates
16+
this by producing expression trees that CBMC can process directly.
17+
18+
## How It Works
19+
20+
For a function call `f(arg1, arg2)` where `f` is defined as:
21+
```c
22+
ret_type f(param1, param2) {
23+
local1 = expr1(param1);
24+
local2 = expr2(local1, param2);
25+
return local2;
26+
}
27+
```
28+
29+
The pure inliner:
30+
1. Collects all assignments: `{local1 → expr1(param1), local2 → expr2(local1, param2)}`
31+
2. Finds the return symbol: `local2`
32+
3. Resolves intermediates: `local2` → `expr2(local1, param2)` → `expr2(expr1(param1), param2)`
33+
4. Substitutes parameters: `expr2(expr1(arg1), arg2)`
34+
5. Flattens `StatementExpression` nodes (e.g., checked arithmetic → just the operation)
35+
6. Recursively inlines any remaining function calls
36+
37+
## Soundness Implications
38+
39+
**Checked arithmetic in quantifier bodies**: When flattening `StatementExpression`
40+
nodes (e.g., from checked division or remainder), the pure inliner drops the
41+
`Assert` and `Assume` statements that check for overflow and division by zero.
42+
This means:
43+
44+
- **Division by zero** inside a quantifier body will NOT be detected. For example,
45+
`forall!(|i in (0, 10)| arr[i] / x == 0)` where `x` could be zero will not
46+
produce a division-by-zero check.
47+
- **Arithmetic overflow** inside a quantifier body will NOT be detected.
48+
49+
This is a known trade-off: CBMC requires pure expressions in quantifier bodies,
50+
and runtime checks are inherently side effects. Users should ensure that
51+
arithmetic operations in quantifier predicates cannot overflow or divide by zero.
52+
53+
**Future improvement**: The dropped assertions could be hoisted outside the
54+
quantifier as preconditions, preserving soundness while keeping the quantifier
55+
body pure.
56+
57+
## Limitations
58+
59+
- **No control flow**: Functions with `if`/`else` or `match` that produce
60+
multiple assignments to the return variable are not fully supported. The
61+
inliner takes the last assignment, which may not be correct for all paths.
62+
- **No loops**: Functions containing loops cannot be inlined as pure expressions.
63+
- **No recursion**: Recursive functions are detected and cause a panic.
64+
- **Checked arithmetic**: Overflow/division-by-zero checks (`Assert` + `Assume`
65+
statements) are dropped when flattening `StatementExpression` nodes. This
66+
means the pure expression doesn't include these runtime checks.
67+
68+
## Files
69+
70+
- `cprover_bindings/src/goto_program/expr.rs` — `Expr::substitute_symbol()`
71+
- `kani-compiler/src/codegen_cprover_gotoc/context/goto_ctx.rs` — `inline_as_pure_expr()`,
72+
`inline_call_as_pure_expr()`, `collect_assignments_from_stmt()`,
73+
`find_return_symbol_in_stmt()`, `resolve_intermediates_iterative()`

0 commit comments

Comments
 (0)