[flow] Overhaul plugin architecture, fix caching bugs, improve spec inference#19319
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
45138f1 to
367594e
Compare
eef1a59 to
4f63d0c
Compare
deabd57 to
cc75964
Compare
2d2dbbb to
5f2c44f
Compare
5f2c44f to
7a316cd
Compare
8a4a781 to
12b661c
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 66b281e8c9ddf9afa74988482d32b88729ede664. Configure here.
There was a problem hiding this comment.
Differential Security Review
Scope: 0b5fc71..66b281e · 52 files (+1612 / -984)
Risk: LOW — confined to aptos-move/flow developer tooling and Move Prover inference. No production blockchain paths (consensus, VM, mempool, storage) are touched.
Recommendation: APPROVE WITH NOTES
| Severity | Count |
|---|---|
| CRITICAL | 0 |
| HIGH | 0 |
| MEDIUM | 1 |
| LOW | 2 |
[MEDIUM] Silent spec-block drop when existing .spec.move lacks a closing }
In output_to_files (third_party/move/move-prover/src/inference.rs), the insertion point for new spec blocks is:
let module_close_insert_pos = source.rfind('}').and_then(|brace_pos| {
source[..brace_pos].rfind('
').map(|p| p + 1).or(Some(brace_pos))
});If the .spec.move file exists in the compiler's source set but its final } is missing (e.g., truncated by a prior tool failure or interrupted write), rfind('}') returns None, module_close_insert_pos is None, and the later branch:
} else if let Some(insert_pos) = module_close_insert_pos {
// insert full spec block — this is silently skipped
}drops every new spec-block insertion without emitting any diagnostic. The tool still reports "inference succeeded, wrote spec files for 1 file(s)" because has_any_inferred passed. The user sees a success response but the new specs were never written.
A completely malformed file is unlikely in practice (the compiler must have compiled it), but a partially valid file missing only its final } can happen after a previous interrupted tool call. Adding a warning/diagnostic when this branch is taken silently would make failure visible.
[LOW] Mutex poison after spawn_blocking panic is not handled
Files: package_spec_infer.rs:71 and package_verify.rs:70
Both closures begin with pkg.lock().unwrap(). If the task panics while holding the MutexGuard, the mutex is poisoned. On the next invocation, resolve_package returns the same cached Arc<Mutex<PackageData>> (because invalidate_package is called only on timeout, not on JoinError). The next pkg.lock().unwrap() on the poisoned mutex panics again, making the package permanently unusable until the server restarts.
The outer map_err(|e| rmcp::ErrorData::internal_error(format!("... panicked: {}", e), None)) correctly reports the first panic as an error, but the stale cache entry is not cleared. Calling invalidate_package on JoinError (in addition to Elapsed) would close this gap.
[LOW] vc_timeout × tool_timeout interaction is undocumented
vc_timeout (per-VC solver budget, default 10s, max 60s) and tool_timeout (outer wall-clock budget, default 120s) are independent. A package with more VCs than tool_timeout / vc_timeout will exhaust the outer budget before any VC finishes. The user gets only "tool timeout (120s exceeded)" — no partial results, no count of completed VCs, and no indication of how to resolve the conflict. The skill guidance encourages users to increase vc_timeout for stubborn VCs, which makes this interaction easy to stumble into. Documenting the relationship (or capping vc_timeout relative to tool_timeout) would help.
Positive changes worth noting
needs_bytecodeset removal: Eliminating the two-phase compilation path (check-only → rebuild with bytecode) removes a class of state bugs wherePackageDatacould be used for prover operations with stale or missing bytecode. Always building with bytecode is simpler and correct.invalidate_packageafter timeout: The doc comment accurately describes the problem — a timed-outspawn_blockingtask still owns theArc<Mutex<PackageData>>lock. Dropping the cache entry ensures the next caller gets a freshArcwith an uncontested mutex. This is the right fix for the deadlock.- Verification cache correctness with exclusions: Pre-PR,
set_verifiedwas called unconditionally, soverify(fun, exclude=X)would cache a result later reused forverify(fun)(without the exclusion), a correctness bug. Theif !has_excludesguard is the correct fix. clear_diag()before prover/inference runs: Reusing a cachedGlobalEnvwithout clearing diagnostics would accumulate stale errors across tool calls. The explicit clear before each prover and inference run is correct.is_test_only()filter in inference: Consistently applied acrossrun_spec_inference_inner,output_to_stdout,output_to_files, andoutput_unified— test-only functions should never have specs inferred and emitted.Operation::MoveFunctioninspec_translator.rs: The dedicated match arm now emits a user-readable "Move function cannot be called in specifications" message instead of the previous "bug: operation not supported in the current context", which was confusing for end users.
Sent by Cursor Automation: Security Review Bot
…nference Major rework of the MoveFlow plugin's agent/skill system, MCP tools, and prover error messages. Plugin architecture: - Split workflow templates into tasks + reference material for structured execution - Add /move-init skill, move-test agent; restructure agent preambles - Add guidance for data invariants, global update invariants, loop invariants - Skip #[test] and #[test_only] functions in WP inference - Fix test template doc comment ordering MCP tool & caching fixes: - Separate compilation and verification diagnostics (per-source BTreeMap) so verify/infer results never mask compilation status - Always build with bytecode (remove conditional rebuild complexity) - Invalidate package cache on tool timeout to prevent mutex deadlock - Fix verification cache: exclude-scoped results no longer pollute cache - Add --tool-timeout (default 120s) for heavy tools - Rename move_package_spec_infer → move_package_wp - Fix output_to_files to merge into existing .spec.move Spec workflow improvements: - Split spec_lang.md into core + proofs for lighter inference prompts - Restructure timeout resolution: prioritize data/update invariants, then spec helpers, lemmas, proof blocks, expression rewrites - Add rules: preserve pragma opaque (use verify=false on failure), preserve [inferred] markers, never drop aborts_if - Add pragma verify_duration_estimate guidance Prover: - Better error for Move functions called in spec context (was "bug: operation not supported") - Replace baby_dex example with baby_swap (constant-product AMM with data and update invariants) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rahxephon89
left a comment
There was a problem hiding this comment.
Currently running into some issues when using the spec infer in Claude, stamp to quickly unblock
|
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
✅ Forge suite
|




Description
Major rework of the MoveFlow plugin's agent/skill system, MCP tools, and prover error messages.
Plugin architecture:
/move-initskill,move-testagent; restructure agent preambles#[test]and#[test_only]functions in WP inferenceMCP tool & caching fixes:
BTreeMap) so verify/infer results never mask compilation statusspawn_blockingtasks--tool-timeout(default 120s) for heavy toolsmove_package_spec_infer→move_package_wpoutput_to_filesto merge into existing.spec.moveSpec workflow improvements:
spec_lang.mdinto core + proofs for lighter inference promptspragma opaque(useverify=falseon failure), preserve[inferred]markers, never dropaborts_ifpragma verify_duration_estimateguidanceProver:
"bug: operation not supported")baby_dexexample withbaby_swap(constant-product AMM with data and update invariants)Testing
file_outputandfile_output_mergefor.spec.movegeneration/mergingmove_pr.sh -tsuite: 6093 passed🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com
Note
Medium Risk
Touches core MoveFlow MCP tool execution and caching/timeout behavior, so regressions could impact spec inference/verification reliability and deadlock avoidance. Changes are mostly additive/defensive (timeouts, diagnostics separation, file-output mode) but affect critical developer workflows.
Overview
MoveFlow’s agent/skill prompts are reorganized into explicit task lists plus reference-only templates, with new
/move-initrouting setup and a newmove-testagent/skill to drive coverage-based unit-test generation.MCP tooling is hardened and renamed:
move_package_spec_inferbecomesmove_package_wpwith a newspec_outputmode to write/merge.spec.movefiles, compilation/verification/inference diagnostics are separated so later stages don’t mask compile errors, and all heavy tools (wp,verify,test) gain a global--tool-timeoutand cache invalidation on timeout to avoid mutex deadlocks.Move prover behavior is refined: WP inference skips
#[test]/#[test_only]functions,.spec.movefile output now merges into existing spec files instead of overwriting, and spec translation emits a clearer error when Move functions are referenced in spec context. Examples/docs are updated (including a newbaby_swapprover example) and the crate version bumps to1.0.4.Reviewed by Cursor Bugbot for commit 44b01ca. Bugbot is set up for automated code reviews on this repo. Configure here.