Skip to content

[flow] Overhaul plugin architecture, fix caching bugs, improve spec inference#19319

Merged
wrwg merged 1 commit into
mainfrom
wrwg/flow-spec
Apr 8, 2026
Merged

[flow] Overhaul plugin architecture, fix caching bugs, improve spec inference#19319
wrwg merged 1 commit into
mainfrom
wrwg/flow-spec

Conversation

@wrwg
Copy link
Copy Markdown
Contributor

@wrwg wrwg commented Apr 3, 2026

Description

Major rework of the MoveFlow plugin's agent/skill system, MCP tools, and prover error messages.

Plugin architecture:

  • Split workflow templates into tasks + reference material for structured execution
  • Add /move-init skill, move-test agent; restructure agent preambles
  • Add guidance for data invariants, global update invariants, loop invariants
  • Skip #[test] and #[test_only] functions in WP inference
  • Fix test template doc comment ordering

MCP tool & caching fixes:

  • Separate compilation and verification diagnostics (per-source BTreeMap) so verify/infer results never mask compilation status
  • Always build with bytecode — removes conditional rebuild complexity and eliminates a class of mutex deadlocks
  • Invalidate package cache on tool timeout to prevent mutex deadlock from orphaned spawn_blocking tasks
  • Fix verification cache: exclude-scoped results no longer pollute cache
  • Add --tool-timeout (default 120s) for heavy tools
  • Rename move_package_spec_infermove_package_wp
  • Fix output_to_files to merge into existing .spec.move

Spec workflow improvements:

  • Split spec_lang.md into core + proofs for lighter inference prompts
  • Restructure timeout resolution: prioritize data/update invariants, then spec helpers, lemmas, proof blocks, expression rewrites
  • Add rules: preserve pragma opaque (use verify=false on failure), preserve [inferred] markers, never drop aborts_if
  • Add pragma verify_duration_estimate guidance

Prover:

  • Better error message for Move functions called in spec context (was "bug: operation not supported")
  • Replace baby_dex example with baby_swap (constant-product AMM with data and update invariants)

Testing

  • All 66 flow tests + 212 prover tests pass
  • New tests: file_output and file_output_merge for .spec.move generation/merging
  • Full move_pr.sh -t suite: 6093 passed

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com


Note

Medium Risk
Touches core MoveFlow MCP tool execution and caching/timeout behavior, so regressions could impact spec inference/verification reliability and deadlock avoidance. Changes are mostly additive/defensive (timeouts, diagnostics separation, file-output mode) but affect critical developer workflows.

Overview
MoveFlow’s agent/skill prompts are reorganized into explicit task lists plus reference-only templates, with new /move-init routing setup and a new move-test agent/skill to drive coverage-based unit-test generation.

MCP tooling is hardened and renamed: move_package_spec_infer becomes move_package_wp with a new spec_output mode to write/merge .spec.move files, compilation/verification/inference diagnostics are separated so later stages don’t mask compile errors, and all heavy tools (wp, verify, test) gain a global --tool-timeout and cache invalidation on timeout to avoid mutex deadlocks.

Move prover behavior is refined: WP inference skips #[test]/#[test_only] functions, .spec.move file output now merges into existing spec files instead of overwriting, and spec translation emits a clearer error when Move functions are referenced in spec context. Examples/docs are updated (including a new baby_swap prover example) and the crate version bumps to 1.0.4.

Reviewed by Cursor Bugbot for commit 44b01ca. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown
Contributor Author

wrwg commented Apr 3, 2026

@wrwg wrwg changed the title [flow] Add spec_output mode and streamline tool descriptions [flow] Overhaul plugin architecture, fix caching bugs, improve spec inference Apr 5, 2026
@wrwg wrwg force-pushed the wrwg/flow-spec branch 2 times, most recently from 45138f1 to 367594e Compare April 5, 2026 18:08
@wrwg wrwg force-pushed the wrwg/flow-spec branch 2 times, most recently from eef1a59 to 4f63d0c Compare April 5, 2026 20:56
@wrwg wrwg force-pushed the wrwg/inf-calc branch 2 times, most recently from deabd57 to cc75964 Compare April 6, 2026 18:50
@wrwg wrwg force-pushed the wrwg/flow-spec branch 2 times, most recently from 2d2dbbb to 5f2c44f Compare April 6, 2026 20:11
@wrwg wrwg force-pushed the wrwg/flow-spec branch 2 times, most recently from 5f2c44f to 7a316cd Compare April 7, 2026 02:34
@wrwg wrwg force-pushed the wrwg/inf-calc branch 2 times, most recently from 8a4a781 to 12b661c Compare April 7, 2026 05:15
@wrwg wrwg force-pushed the wrwg/flow-spec branch from 7a316cd to a24c306 Compare April 7, 2026 05:15
Base automatically changed from wrwg/inf-calc to main April 7, 2026 05:59
@wrwg wrwg force-pushed the wrwg/flow-spec branch from a24c306 to 66b281e Compare April 7, 2026 18:07
@wrwg wrwg requested review from rahxephon89 and vineethk April 7, 2026 18:08
@wrwg wrwg marked this pull request as ready for review April 7, 2026 18:09
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 66b281e8c9ddf9afa74988482d32b88729ede664. Configure here.

Comment thread third_party/move/move-prover/src/inference.rs
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Differential Security Review

Scope: 0b5fc71..66b281e · 52 files (+1612 / -984)
Risk: LOW — confined to aptos-move/flow developer tooling and Move Prover inference. No production blockchain paths (consensus, VM, mempool, storage) are touched.
Recommendation: APPROVE WITH NOTES

Severity Count
CRITICAL 0
HIGH 0
MEDIUM 1
LOW 2

[MEDIUM] Silent spec-block drop when existing .spec.move lacks a closing }

In output_to_files (third_party/move/move-prover/src/inference.rs), the insertion point for new spec blocks is:

let module_close_insert_pos = source.rfind('}').and_then(|brace_pos| {
    source[..brace_pos].rfind('
').map(|p| p + 1).or(Some(brace_pos))
});

If the .spec.move file exists in the compiler's source set but its final } is missing (e.g., truncated by a prior tool failure or interrupted write), rfind('}') returns None, module_close_insert_pos is None, and the later branch:

} else if let Some(insert_pos) = module_close_insert_pos {
    // insert full spec block — this is silently skipped
}

drops every new spec-block insertion without emitting any diagnostic. The tool still reports "inference succeeded, wrote spec files for 1 file(s)" because has_any_inferred passed. The user sees a success response but the new specs were never written.

A completely malformed file is unlikely in practice (the compiler must have compiled it), but a partially valid file missing only its final } can happen after a previous interrupted tool call. Adding a warning/diagnostic when this branch is taken silently would make failure visible.


[LOW] Mutex poison after spawn_blocking panic is not handled

Files: package_spec_infer.rs:71 and package_verify.rs:70

Both closures begin with pkg.lock().unwrap(). If the task panics while holding the MutexGuard, the mutex is poisoned. On the next invocation, resolve_package returns the same cached Arc<Mutex<PackageData>> (because invalidate_package is called only on timeout, not on JoinError). The next pkg.lock().unwrap() on the poisoned mutex panics again, making the package permanently unusable until the server restarts.

The outer map_err(|e| rmcp::ErrorData::internal_error(format!("... panicked: {}", e), None)) correctly reports the first panic as an error, but the stale cache entry is not cleared. Calling invalidate_package on JoinError (in addition to Elapsed) would close this gap.


[LOW] vc_timeout × tool_timeout interaction is undocumented

vc_timeout (per-VC solver budget, default 10s, max 60s) and tool_timeout (outer wall-clock budget, default 120s) are independent. A package with more VCs than tool_timeout / vc_timeout will exhaust the outer budget before any VC finishes. The user gets only "tool timeout (120s exceeded)" — no partial results, no count of completed VCs, and no indication of how to resolve the conflict. The skill guidance encourages users to increase vc_timeout for stubborn VCs, which makes this interaction easy to stumble into. Documenting the relationship (or capping vc_timeout relative to tool_timeout) would help.


Positive changes worth noting

  • needs_bytecode set removal: Eliminating the two-phase compilation path (check-only → rebuild with bytecode) removes a class of state bugs where PackageData could be used for prover operations with stale or missing bytecode. Always building with bytecode is simpler and correct.
  • invalidate_package after timeout: The doc comment accurately describes the problem — a timed-out spawn_blocking task still owns the Arc<Mutex<PackageData>> lock. Dropping the cache entry ensures the next caller gets a fresh Arc with an uncontested mutex. This is the right fix for the deadlock.
  • Verification cache correctness with exclusions: Pre-PR, set_verified was called unconditionally, so verify(fun, exclude=X) would cache a result later reused for verify(fun) (without the exclusion), a correctness bug. The if !has_excludes guard is the correct fix.
  • clear_diag() before prover/inference runs: Reusing a cached GlobalEnv without clearing diagnostics would accumulate stale errors across tool calls. The explicit clear before each prover and inference run is correct.
  • is_test_only() filter in inference: Consistently applied across run_spec_inference_inner, output_to_stdout, output_to_files, and output_unified — test-only functions should never have specs inferred and emitted.
  • Operation::MoveFunction in spec_translator.rs: The dedicated match arm now emits a user-readable "Move function cannot be called in specifications" message instead of the previous "bug: operation not supported in the current context", which was confusing for end users.
Open in Web View Automation 

Sent by Cursor Automation: Security Review Bot

Comment thread third_party/move/move-prover/src/inference.rs
…nference

Major rework of the MoveFlow plugin's agent/skill system, MCP tools, and
prover error messages.

Plugin architecture:
- Split workflow templates into tasks + reference material for structured execution
- Add /move-init skill, move-test agent; restructure agent preambles
- Add guidance for data invariants, global update invariants, loop invariants
- Skip #[test] and #[test_only] functions in WP inference
- Fix test template doc comment ordering

MCP tool & caching fixes:
- Separate compilation and verification diagnostics (per-source BTreeMap)
  so verify/infer results never mask compilation status
- Always build with bytecode (remove conditional rebuild complexity)
- Invalidate package cache on tool timeout to prevent mutex deadlock
- Fix verification cache: exclude-scoped results no longer pollute cache
- Add --tool-timeout (default 120s) for heavy tools
- Rename move_package_spec_infer → move_package_wp
- Fix output_to_files to merge into existing .spec.move

Spec workflow improvements:
- Split spec_lang.md into core + proofs for lighter inference prompts
- Restructure timeout resolution: prioritize data/update invariants, then
  spec helpers, lemmas, proof blocks, expression rewrites
- Add rules: preserve pragma opaque (use verify=false on failure),
  preserve [inferred] markers, never drop aborts_if
- Add pragma verify_duration_estimate guidance

Prover:
- Better error for Move functions called in spec context (was "bug: operation
  not supported")
- Replace baby_dex example with baby_swap (constant-product AMM with data
  and update invariants)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wrwg wrwg force-pushed the wrwg/flow-spec branch from 66b281e to 44b01ca Compare April 7, 2026 19:32
Copy link
Copy Markdown
Contributor

@rahxephon89 rahxephon89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently running into some issues when using the spec infer in Claude, stamp to quickly unblock

@wrwg
Copy link
Copy Markdown
Contributor Author

wrwg commented Apr 7, 2026

Currently running into some issues when using the spec infer in Claude, stamp to quickly unblock
Still need stamp ;-)

@wrwg wrwg enabled auto-merge (squash) April 8, 2026 02:28
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

✅ Forge suite compat success on df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8

Compatibility test results for df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8 (PR)
1. Check liveness of validators at old version: df3874258ca710d1923584a9e41c1863f4b9ca5a
compatibility::simple-validator-upgrade::liveness-check : committed: 13203.88 txn/s, latency: 2644.51 ms, (p50: 2700 ms, p70: 3000, p90: 3300 ms, p99: 3900 ms), latency samples: 431360
2. Upgrading first Validator to new version: 44b01caeff073be666f80218b985897ee6dddca8
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6266.44 txn/s, latency: 5351.18 ms, (p50: 5900 ms, p70: 6000, p90: 6100 ms, p99: 6400 ms), latency samples: 215460
3. Upgrading rest of first batch to new version: 44b01caeff073be666f80218b985897ee6dddca8
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6332.97 txn/s, latency: 5350.84 ms, (p50: 5900 ms, p70: 6000, p90: 6100 ms, p99: 6100 ms), latency samples: 220000
4. upgrading second batch to new version: 44b01caeff073be666f80218b985897ee6dddca8
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 10670.76 txn/s, latency: 3033.50 ms, (p50: 3000 ms, p70: 3400, p90: 3700 ms, p99: 4300 ms), latency samples: 360160
5. check swarm health
Compatibility test for df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8 passed
Test Ok

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

✅ Forge suite realistic_env_max_load success on 44b01caeff073be666f80218b985897ee6dddca8

two traffics test: inner traffic : committed: 14097.02 txn/s, latency: 1286.60 ms, (p50: 1200 ms, p70: 1300, p90: 1500 ms, p99: 1900 ms), latency samples: 5265260
two traffics test : committed: 100.01 txn/s, latency: 677.53 ms, (p50: 600 ms, p70: 700, p90: 800 ms, p99: 1000 ms), latency samples: 1680
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 0.490, avg: 0.475", "ConsensusProposalToOrdered: max: 0.128, avg: 0.121", "ConsensusOrderedToCommit: max: 0.211, avg: 0.161", "ConsensusProposalToCommit: max: 0.322, avg: 0.282"]
Max non-epoch-change gap was: 1 rounds at version 23355 (avg 0.00) [limit 4], 1.62s no progress at version 23355 (avg 0.06s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.30s no progress at version 2712785 (avg 0.30s) [limit 16].
Test Ok

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

✅ Forge suite framework_upgrade success on df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8

Compatibility test results for df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8 (PR)
Upgrade the nodes to version: 44b01caeff073be666f80218b985897ee6dddca8
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2225.19 txn/s, submitted: 2232.76 txn/s, failed submission: 7.58 txn/s, expired: 7.58 txn/s, latency: 1314.73 ms, (p50: 1200 ms, p70: 1500, p90: 1800 ms, p99: 2700 ms), latency samples: 199681
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2320.13 txn/s, submitted: 2327.00 txn/s, failed submission: 6.86 txn/s, expired: 6.86 txn/s, latency: 1257.55 ms, (p50: 1200 ms, p70: 1400, p90: 1700 ms, p99: 3000 ms), latency samples: 209581
5. check swarm health
Compatibility test for df3874258ca710d1923584a9e41c1863f4b9ca5a ==> 44b01caeff073be666f80218b985897ee6dddca8 passed
Upgrade the remaining nodes to version: 44b01caeff073be666f80218b985897ee6dddca8
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1965.28 txn/s, submitted: 1972.38 txn/s, failed submission: 7.10 txn/s, expired: 7.10 txn/s, latency: 1585.66 ms, (p50: 1200 ms, p70: 1500, p90: 2400 ms, p99: 11200 ms), latency samples: 177081
Test Ok

@wrwg wrwg merged commit cd3fecd into main Apr 8, 2026
102 of 103 checks passed
@wrwg wrwg deleted the wrwg/flow-spec branch April 8, 2026 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants