[flow] Overhaul plugin architecture, fix caching bugs, improve spec inference by wrwg · Pull Request #19319 · aptos-labs/aptos-core

wrwg · 2026-04-03T05:25:33Z

Description

Major rework of the MoveFlow plugin's agent/skill system, MCP tools, and prover error messages.

Plugin architecture:

Split workflow templates into tasks + reference material for structured execution
Add /move-init skill, move-test agent; restructure agent preambles
Add guidance for data invariants, global update invariants, loop invariants
Skip #[test] and #[test_only] functions in WP inference
Fix test template doc comment ordering

MCP tool & caching fixes:

Separate compilation and verification diagnostics (per-source BTreeMap) so verify/infer results never mask compilation status
Always build with bytecode — removes conditional rebuild complexity and eliminates a class of mutex deadlocks
Invalidate package cache on tool timeout to prevent mutex deadlock from orphaned spawn_blocking tasks
Fix verification cache: exclude-scoped results no longer pollute cache
Add --tool-timeout (default 120s) for heavy tools
Rename move_package_spec_infer → move_package_wp
Fix output_to_files to merge into existing .spec.move

Spec workflow improvements:

Split spec_lang.md into core + proofs for lighter inference prompts
Restructure timeout resolution: prioritize data/update invariants, then spec helpers, lemmas, proof blocks, expression rewrites
Add rules: preserve pragma opaque (use verify=false on failure), preserve [inferred] markers, never drop aborts_if
Add pragma verify_duration_estimate guidance

Prover:

Better error message for Move functions called in spec context (was "bug: operation not supported")
Replace baby_dex example with baby_swap (constant-product AMM with data and update invariants)

Testing

All 66 flow tests + 212 prover tests pass
New tests: file_output and file_output_merge for .spec.move generation/merging
Full move_pr.sh -t suite: 6093 passed

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

Note

Medium Risk
Touches core MoveFlow MCP tool execution and caching/timeout behavior, so regressions could impact spec inference/verification reliability and deadlock avoidance. Changes are mostly additive/defensive (timeouts, diagnostics separation, file-output mode) but affect critical developer workflows.

Overview
MoveFlow’s agent/skill prompts are reorganized into explicit task lists plus reference-only templates, with new /move-init routing setup and a new move-test agent/skill to drive coverage-based unit-test generation.

MCP tooling is hardened and renamed: move_package_spec_infer becomes move_package_wp with a new spec_output mode to write/merge .spec.move files, compilation/verification/inference diagnostics are separated so later stages don’t mask compile errors, and all heavy tools (wp, verify, test) gain a global --tool-timeout and cache invalidation on timeout to avoid mutex deadlocks.

Move prover behavior is refined: WP inference skips #[test]/#[test_only] functions, .spec.move file output now merges into existing spec files instead of overwriting, and spec translation emits a clearer error when Move functions are referenced in spec context. Examples/docs are updated (including a new baby_swap prover example) and the crate version bumps to 1.0.4.

^{Reviewed by Cursor Bugbot for commit 44b01ca. Bugbot is set up for automated code reviews on this repo. Configure here.}

wrwg · 2026-04-03T05:25:47Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 66b281e8c9ddf9afa74988482d32b88729ede664. Configure here.}

cursor

Differential Security Review

Scope: 0b5fc71..66b281e · 52 files (+1612 / -984)
Risk: LOW — confined to aptos-move/flow developer tooling and Move Prover inference. No production blockchain paths (consensus, VM, mempool, storage) are touched.
Recommendation: APPROVE WITH NOTES

Severity	Count
CRITICAL	0
HIGH	0
MEDIUM	1
LOW	2

[MEDIUM] Silent spec-block drop when existing `.spec.move` lacks a closing `}`

In output_to_files (third_party/move/move-prover/src/inference.rs), the insertion point for new spec blocks is:

let module_close_insert_pos = source.rfind('}').and_then(|brace_pos| {
    source[..brace_pos].rfind('
').map(|p| p + 1).or(Some(brace_pos))
});

If the .spec.move file exists in the compiler's source set but its final } is missing (e.g., truncated by a prior tool failure or interrupted write), rfind('}') returns None, module_close_insert_pos is None, and the later branch:

} else if let Some(insert_pos) = module_close_insert_pos {
    // insert full spec block — this is silently skipped
}

drops every new spec-block insertion without emitting any diagnostic. The tool still reports "inference succeeded, wrote spec files for 1 file(s)" because has_any_inferred passed. The user sees a success response but the new specs were never written.

A completely malformed file is unlikely in practice (the compiler must have compiled it), but a partially valid file missing only its final } can happen after a previous interrupted tool call. Adding a warning/diagnostic when this branch is taken silently would make failure visible.

[LOW] Mutex poison after `spawn_blocking` panic is not handled

Files: package_spec_infer.rs:71 and package_verify.rs:70

Both closures begin with pkg.lock().unwrap(). If the task panics while holding the MutexGuard, the mutex is poisoned. On the next invocation, resolve_package returns the same cached Arc<Mutex<PackageData>> (because invalidate_package is called only on timeout, not on JoinError). The next pkg.lock().unwrap() on the poisoned mutex panics again, making the package permanently unusable until the server restarts.

The outer map_err(|e| rmcp::ErrorData::internal_error(format!("... panicked: {}", e), None)) correctly reports the first panic as an error, but the stale cache entry is not cleared. Calling invalidate_package on JoinError (in addition to Elapsed) would close this gap.

[LOW] `vc_timeout` × `tool_timeout` interaction is undocumented

vc_timeout (per-VC solver budget, default 10s, max 60s) and tool_timeout (outer wall-clock budget, default 120s) are independent. A package with more VCs than tool_timeout / vc_timeout will exhaust the outer budget before any VC finishes. The user gets only "tool timeout (120s exceeded)" — no partial results, no count of completed VCs, and no indication of how to resolve the conflict. The skill guidance encourages users to increase vc_timeout for stubborn VCs, which makes this interaction easy to stumble into. Documenting the relationship (or capping vc_timeout relative to tool_timeout) would help.

Positive changes worth noting

needs_bytecode set removal: Eliminating the two-phase compilation path (check-only → rebuild with bytecode) removes a class of state bugs where PackageData could be used for prover operations with stale or missing bytecode. Always building with bytecode is simpler and correct.
invalidate_package after timeout: The doc comment accurately describes the problem — a timed-out spawn_blocking task still owns the Arc<Mutex<PackageData>> lock. Dropping the cache entry ensures the next caller gets a fresh Arc with an uncontested mutex. This is the right fix for the deadlock.
Verification cache correctness with exclusions: Pre-PR, set_verified was called unconditionally, so verify(fun, exclude=X) would cache a result later reused for verify(fun) (without the exclusion), a correctness bug. The if !has_excludes guard is the correct fix.
clear_diag() before prover/inference runs: Reusing a cached GlobalEnv without clearing diagnostics would accumulate stale errors across tool calls. The explicit clear before each prover and inference run is correct.
is_test_only() filter in inference: Consistently applied across run_spec_inference_inner, output_to_stdout, output_to_files, and output_unified — test-only functions should never have specs inferred and emitted.
Operation::MoveFunction in spec_translator.rs: The dedicated match arm now emits a user-readable "Move function cannot be called in specifications" message instead of the previous "bug: operation not supported in the current context", which was confusing for end users.

_{Sent by Cursor Automation: Security Review Bot}

…nference Major rework of the MoveFlow plugin's agent/skill system, MCP tools, and prover error messages. Plugin architecture: - Split workflow templates into tasks + reference material for structured execution - Add /move-init skill, move-test agent; restructure agent preambles - Add guidance for data invariants, global update invariants, loop invariants - Skip #[test] and #[test_only] functions in WP inference - Fix test template doc comment ordering MCP tool & caching fixes: - Separate compilation and verification diagnostics (per-source BTreeMap) so verify/infer results never mask compilation status - Always build with bytecode (remove conditional rebuild complexity) - Invalidate package cache on tool timeout to prevent mutex deadlock - Fix verification cache: exclude-scoped results no longer pollute cache - Add --tool-timeout (default 120s) for heavy tools - Rename move_package_spec_infer → move_package_wp - Fix output_to_files to merge into existing .spec.move Spec workflow improvements: - Split spec_lang.md into core + proofs for lighter inference prompts - Restructure timeout resolution: prioritize data/update invariants, then spec helpers, lemmas, proof blocks, expression rewrites - Add rules: preserve pragma opaque (use verify=false on failure), preserve [inferred] markers, never drop aborts_if - Add pragma verify_duration_estimate guidance Prover: - Better error for Move functions called in spec context (was "bug: operation not supported") - Replace baby_dex example with baby_swap (constant-product AMM with data and update invariants) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

rahxephon89

Currently running into some issues when using the spec infer in Claude, stamp to quickly unblock

wrwg · 2026-04-07T22:08:15Z

Currently running into some issues when using the spec infer in Claude, stamp to quickly unblock
Still need stamp ;-)

github-actions · 2026-04-08T03:05:28Z

✅ Forge suite `compat` success on `df3874258ca710d1923584a9e41c1863f4b9ca5a` ==> `44b01caeff073be666f80218b985897ee6dddca8`

Compatibility test results for df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8 (PR)
1. Check liveness of validators at old version: df3874258ca710d1923584a9e41c1863f4b9ca5a
compatibility::simple-validator-upgrade::liveness-check : committed: 13203.88 txn/s, latency: 2644.51 ms, (p50: 2700 ms, p70: 3000, p90: 3300 ms, p99: 3900 ms), latency samples: 431360
2. Upgrading first Validator to new version: 44b01caeff073be666f80218b985897ee6dddca8
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6266.44 txn/s, latency: 5351.18 ms, (p50: 5900 ms, p70: 6000, p90: 6100 ms, p99: 6400 ms), latency samples: 215460
3. Upgrading rest of first batch to new version: 44b01caeff073be666f80218b985897ee6dddca8
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6332.97 txn/s, latency: 5350.84 ms, (p50: 5900 ms, p70: 6000, p90: 6100 ms, p99: 6100 ms), latency samples: 220000
4. upgrading second batch to new version: 44b01caeff073be666f80218b985897ee6dddca8
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 10670.76 txn/s, latency: 3033.50 ms, (p50: 3000 ms, p70: 3400, p90: 3700 ms, p99: 4300 ms), latency samples: 360160
5. check swarm health
Compatibility test for df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8 passed
Test Ok

github-actions · 2026-04-08T03:08:51Z

✅ Forge suite `realistic_env_max_load` success on `44b01caeff073be666f80218b985897ee6dddca8`

two traffics test: inner traffic : committed: 14097.02 txn/s, latency: 1286.60 ms, (p50: 1200 ms, p70: 1300, p90: 1500 ms, p99: 1900 ms), latency samples: 5265260
two traffics test : committed: 100.01 txn/s, latency: 677.53 ms, (p50: 600 ms, p70: 700, p90: 800 ms, p99: 1000 ms), latency samples: 1680
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 0.490, avg: 0.475", "ConsensusProposalToOrdered: max: 0.128, avg: 0.121", "ConsensusOrderedToCommit: max: 0.211, avg: 0.161", "ConsensusProposalToCommit: max: 0.322, avg: 0.282"]
Max non-epoch-change gap was: 1 rounds at version 23355 (avg 0.00) [limit 4], 1.62s no progress at version 23355 (avg 0.06s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.30s no progress at version 2712785 (avg 0.30s) [limit 16].
Test Ok

github-actions · 2026-04-08T03:11:04Z

✅ Forge suite `framework_upgrade` success on `df3874258ca710d1923584a9e41c1863f4b9ca5a` ==> `44b01caeff073be666f80218b985897ee6dddca8`

Compatibility test results for df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8 (PR)
Upgrade the nodes to version: 44b01caeff073be666f80218b985897ee6dddca8
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2225.19 txn/s, submitted: 2232.76 txn/s, failed submission: 7.58 txn/s, expired: 7.58 txn/s, latency: 1314.73 ms, (p50: 1200 ms, p70: 1500, p90: 1800 ms, p99: 2700 ms), latency samples: 199681
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2320.13 txn/s, submitted: 2327.00 txn/s, failed submission: 6.86 txn/s, expired: 6.86 txn/s, latency: 1257.55 ms, (p50: 1200 ms, p70: 1400, p90: 1700 ms, p99: 3000 ms), latency samples: 209581
5. check swarm health
Compatibility test for df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8 passed
Upgrade the remaining nodes to version: 44b01caeff073be666f80218b985897ee6dddca8
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1965.28 txn/s, submitted: 1972.38 txn/s, failed submission: 7.10 txn/s, expired: 7.10 txn/s, latency: 1585.66 ms, (p50: 1200 ms, p70: 1500, p90: 2400 ms, p99: 11200 ms), latency samples: 177081
Test Ok

wrwg mentioned this pull request Apr 3, 2026

[prover] Reimplement state labels with mutation builtins and havoc+assume #19276

Merged

wrwg changed the title ~~[flow] Add spec_output mode and streamline tool descriptions~~ [flow] Overhaul plugin architecture, fix caching bugs, improve spec inference Apr 5, 2026

wrwg force-pushed the wrwg/flow-spec branch 2 times, most recently from 45138f1 to 367594e Compare April 5, 2026 18:08

wrwg force-pushed the wrwg/inf-calc branch from a624db2 to 2b5789b Compare April 5, 2026 18:14

wrwg force-pushed the wrwg/flow-spec branch 2 times, most recently from eef1a59 to 4f63d0c Compare April 5, 2026 20:56

wrwg force-pushed the wrwg/inf-calc branch 2 times, most recently from deabd57 to cc75964 Compare April 6, 2026 18:50

wrwg force-pushed the wrwg/flow-spec branch 2 times, most recently from 2d2dbbb to 5f2c44f Compare April 6, 2026 20:11

wrwg force-pushed the wrwg/inf-calc branch from cc75964 to bc6bc30 Compare April 6, 2026 20:11

wrwg force-pushed the wrwg/flow-spec branch 2 times, most recently from 5f2c44f to 7a316cd Compare April 7, 2026 02:34

wrwg force-pushed the wrwg/inf-calc branch 2 times, most recently from 8a4a781 to 12b661c Compare April 7, 2026 05:15

wrwg force-pushed the wrwg/flow-spec branch from 7a316cd to a24c306 Compare April 7, 2026 05:15

Base automatically changed from wrwg/inf-calc to main April 7, 2026 05:59

wrwg force-pushed the wrwg/flow-spec branch from a24c306 to 66b281e Compare April 7, 2026 18:07

wrwg requested review from rahxephon89 and vineethk April 7, 2026 18:08

wrwg marked this pull request as ready for review April 7, 2026 18:09

cursor Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread third_party/move/move-prover/src/inference.rs

cursor Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread third_party/move/move-prover/src/inference.rs

wrwg force-pushed the wrwg/flow-spec branch from 66b281e to 44b01ca Compare April 7, 2026 19:32

rahxephon89 reviewed Apr 7, 2026

View reviewed changes

rahxephon89 approved these changes Apr 7, 2026

View reviewed changes

vineethk approved these changes Apr 8, 2026

View reviewed changes

wrwg enabled auto-merge (squash) April 8, 2026 02:28

This comment has been minimized.

Sign in to view

wrwg merged commit cd3fecd into main Apr 8, 2026
102 of 103 checks passed

wrwg deleted the wrwg/flow-spec branch April 8, 2026 03:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[flow] Overhaul plugin architecture, fix caching bugs, improve spec inference#19319

[flow] Overhaul plugin architecture, fix caching bugs, improve spec inference#19319
wrwg merged 1 commit into
mainfrom
wrwg/flow-spec

wrwg commented Apr 3, 2026 •

edited by cursor Bot

Loading

Uh oh!

wrwg commented Apr 3, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

rahxephon89 left a comment

Uh oh!

wrwg commented Apr 7, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented Apr 8, 2026

Uh oh!

github-actions Bot commented Apr 8, 2026

Uh oh!

github-actions Bot commented Apr 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

wrwg commented Apr 3, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Testing

Uh oh!

wrwg commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Differential Security Review

[MEDIUM] Silent spec-block drop when existing .spec.move lacks a closing }

[LOW] Mutex poison after spawn_blocking panic is not handled

[LOW] vc_timeout × tool_timeout interaction is undocumented

Positive changes worth noting

Uh oh!

Uh oh!

rahxephon89 left a comment

Choose a reason for hiding this comment

Uh oh!

wrwg commented Apr 7, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented Apr 8, 2026

✅ Forge suite compat success on df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8

Uh oh!

github-actions Bot commented Apr 8, 2026

✅ Forge suite realistic_env_max_load success on 44b01caeff073be666f80218b985897ee6dddca8

Uh oh!

github-actions Bot commented Apr 8, 2026

✅ Forge suite framework_upgrade success on df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wrwg commented Apr 3, 2026 •

edited by cursor Bot

Loading

wrwg commented Apr 3, 2026 •

edited

Loading

[MEDIUM] Silent spec-block drop when existing `.spec.move` lacks a closing `}`

[LOW] Mutex poison after `spawn_blocking` panic is not handled

[LOW] `vc_timeout` × `tool_timeout` interaction is undocumented

✅ Forge suite `compat` success on `df3874258ca710d1923584a9e41c1863f4b9ca5a` ==> `44b01caeff073be666f80218b985897ee6dddca8`

✅ Forge suite `realistic_env_max_load` success on `44b01caeff073be666f80218b985897ee6dddca8`

✅ Forge suite `framework_upgrade` success on `df3874258ca710d1923584a9e41c1863f4b9ca5a` ==> `44b01caeff073be666f80218b985897ee6dddca8`