fix: Ivy plugin evaluation fixes (18 issues) #109
Updates panther_ivy to feat/mcp-tool-fixes-and-requirements, which includes:
- Fixed 6 issues in ivy-tools MCP server (ISS-001/002/003/005/006/008)
- All 19 MCP tools verified working via live calls
- Added RFC 9000 requirements manifest (101 requirements)
- Auto-detect local ivy-lsp source for development
Update panther_ivy submodule with the new Pattern Library featuring:
- 13 reusable .ivy template files for 7 formal patterns
- Pattern detection engine with cross-reference validation
- 2 new MCP tools (ivy_pattern_analysis, ivy_pattern_scaffold)
- /nct-add-pattern command and pattern-library skill
- Enhanced coverage gaps, smart suggestions, and LSP diagnostics
Picks up ivy-lsp changes: semantic edge wiring (COVERS, HAS_PARAM, RETURNS_TYPE, INCLUDES), smart suggestions fix, per-action state var filtering, and three new MCP tools (ivy_generate_manifest, ivy_scaffold_check, ivy_quality_gate).
Requirements YAML cleanup, ivy-lsp bug fixes, plugin hook hardening.
Batch 1 (P0): Fix broken wiring
- Fix api/ broken relative imports (api/_shared.py)
- Wire all 25 MCP tools into Claude Code plugin
- Update ivy-tooling-guide skill with complete tool mapping

Batch 2 (P1): Quality debt
- Modularize mcp_server.py (2225 → 728 lines + tools/ package)
- Deduplicate shell script workspace detection (workspace-common.sh)
- Fix PantherIvyVersion docstring (class is used, not dead code)
…lignment)
Python tester: remove shadowed adapt_environment_paths, consolidate 3 protocol name methods to 1, merge duplicate output patterns, remove dead build_tests().
Plugin: fix README inventory counts, CSO-optimize all 15 skill descriptions, add Red Flags/Integration/Common Mistakes sections to process skills, normalize skill name casing to kebab-case.
Update panther_ivy submodule with MCP tool consolidation (25→15), plugin surface reduction (agents 9→4, skills 14→6, commands 6→5), counterexample parser, per-isolate caching, and coverage diff mode. Add strategic evaluation document comparing Ivy tooling to state of the art across 5 dimensions (code intelligence, verification, traceability, AI-assisted specification, protocol analysis).
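Per-isolate caching can be illustrated with a minimal sketch. It assumes verification results are keyed by isolate name plus a SHA-256 hash of the specification text, and that the verifier is passed in as a callback; `VerificationCache` and `get_or_verify` are hypothetical names, not the ivy-lsp API.

```python
import hashlib
import threading
from typing import Callable, Dict, Tuple


class VerificationCache:
    """Hypothetical per-isolate cache: key = (isolate name, hash of spec text)."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str], bool] = {}
        self._lock = threading.Lock()

    @staticmethod
    def _key(isolate: str, source: str) -> Tuple[str, str]:
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        return (isolate, digest)

    def get_or_verify(self, isolate: str, source: str,
                      verify: Callable[[str, str], bool]) -> bool:
        key = self._key(isolate, source)
        # Read under the lock so a concurrent writer cannot be observed mid-update.
        with self._lock:
            if key in self._results:
                return self._results[key]
        # Run the (possibly slow) verifier outside the lock so other isolates are not blocked.
        result = verify(isolate, source)
        with self._lock:
            # If another thread stored a result for this key first, keep that one.
            return self._results.setdefault(key, result)
```

Editing one isolate's text changes only that isolate's key, so cached results for other isolates stay valid.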
ivy-lsp: fix verify cache race, diagnostic layer errors, model init retry, compile path bug, pattern mode validation, logging, quality gate OSError, coverage diff truncation, off-by-one in pattern library, stale tool names in tests. Add counterexample parser, verification cache, and traceability tool tests.
panther-ivy-plugin: fix shell injection in hooks, JSON parsing in hooks, brace counting with comment stripping, stale MCP tool names in agents/skills/commands/READMEs, component counts and version.
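Brace counting with comment stripping avoids a classic pitfall: a `{` or `}` inside an Ivy `#` comment must not affect the balance. A minimal Python sketch of the idea, assuming `#` starts a line comment and double-quoted strings contain no escape sequences; the actual hook implementation is not shown in this PR and the function names are hypothetical.

```python
def strip_comment(line: str) -> str:
    """Drop a trailing '#' comment, ignoring '#' inside double-quoted strings."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif ch == "#" and not in_string:
            return line[:i]
    return line


def brace_balance(source: str) -> int:
    """Return count('{') - count('}') outside comments and strings; 0 means balanced."""
    depth = 0
    for line in source.splitlines():
        in_string = False
        for ch in strip_comment(line):
            if ch == '"':
                in_string = not in_string
            elif not in_string:
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
    return depth
```

A positive result means unclosed blocks remain; a negative one means a stray closing brace.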
Updates submodule pointers for panther-ivy-plugin and ivy-lsp with fixes for all 18 issues from the plugin evaluation report:
- Plugin docs: tool name alignment, skill/agent listing, plugin.json fields, version bump to 0.5.0, Stop hook for session summary.
- Server: cross-reference fuzzy matching, symbol disambiguation with protocol scoping, coverage tag normalization, hover SemanticModel fallback, test_file filtering, output size limits, workDoneProgress capability guard, individual tool aliases for backward compatibility.
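Coverage tag normalization and cross-reference fuzzy matching combine naturally: normalize both sides, try an exact lookup, and only then fall back to fuzzy matching. The sketch uses the standard library's `difflib.get_close_matches`; the tag shapes (`[RFC9000 4.1]`, `rfc9000:4.1`), the 0.8 cutoff, and the function names are assumptions for illustration, not the server's actual rules.

```python
import difflib
import re
from typing import Iterable, Optional


def normalize_tag(tag: str) -> str:
    """Hypothetical canonical form: '[RFC9000 4.1]' and 'rfc9000_4.1' -> 'rfc9000:4.1'."""
    tag = tag.strip().strip("[]").strip().lower()
    # Collapse whitespace, ':', '/' and '_' separators into a single ':'.
    return re.sub(r"[\s:/_]+", ":", tag)


def resolve_reference(name: str, known: Iterable[str]) -> Optional[str]:
    """Exact match on the normalized form first, then a conservative fuzzy fallback."""
    by_norm = {normalize_tag(k): k for k in known}
    key = normalize_tag(name)
    if key in by_norm:
        return by_norm[key]
    close = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=0.8)
    return by_norm[close[0]] if close else None
```

The high cutoff keeps the fallback conservative: a typo such as `rfc900:17.2` still resolves, while an unrelated name returns `None` instead of a wrong match.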
Reviewer's Guide
Updates Ivy plugin documentation to use actual MCP tool names and add evaluation context, introduces backward-compatible MCP tool aliases and plugin metadata changes, and adjusts the LSP server behavior around symbol resolution, diagnostics, and progress/output limits to address the 18 issues from the Ivy tooling evaluation report.
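Under the Language Server Protocol, a server may only send `window/workDoneProgress/create` if the client declared `window.workDoneProgress` in the capabilities of its `initialize` request; a capability guard checks exactly that. The sketch pairs the check with a simple output cap. It reads the raw `initialize` params as a dict, and the `MAX_OUTPUT_CHARS` value and truncation marker are assumptions, since the PR does not state the real limit.

```python
from typing import Any, Dict

MAX_OUTPUT_CHARS = 50_000  # hypothetical limit; the real value is not given in the PR


def supports_work_done_progress(initialize_params: Dict[str, Any]) -> bool:
    """True only if the client advertised capabilities.window.workDoneProgress."""
    capabilities = initialize_params.get("capabilities") or {}
    window = capabilities.get("window") or {}
    return bool(window.get("workDoneProgress", False))


def truncate_output(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Cap tool output, appending a marker that reports how many characters were cut."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    return text[:limit] + f"\n... [truncated {dropped} characters]"
```

When the guard returns `False`, the server can simply skip progress reporting instead of sending a request the client never agreed to handle.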
Pull request overview
Updates the Ivy plugin submodule and adds a strategic evaluation document describing the Ivy tooling ecosystem, consolidation opportunities, and a roadmap to address known evaluation gaps.
Changes:
- Bumps `panther_ivy` submodule to a newer commit.
- Adds a new “Ivy Tooling Ecosystem” strategic evaluation/roadmap document.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| panther/plugins/services/testers/panther_ivy | Updates the git submodule pointer for the Ivy tester integration. |
| docs/superpowers/specs/2026-03-13-ivy-tooling-evaluation.md | Adds a detailed evaluation + consolidation plan and checklist for Ivy MCP/LSP tooling. |
| Structured semantic access | None (text-only context) | Codebase indexing (AST-level) | Lean server API (JSON) | SerAPI (S-expressions) | **21 MCP tools** with typed JSON schemas providing symbol info, include graphs, dependency analysis, traceability matrices, pattern detection |
| Specification generation | General code generation | General with context | Proof step generation | Proof search | `ivy_pattern_scaffold` generates complete Ivy specification templates for 5 pattern types (serdes, shim, entity, variants, monitors) with documentation |
| Workflow guidance | None | Tab-based suggestions | None | None | 14 Claude Code skills + 9 agents with methodology knowledge (specification creation workflow, quality gates, RFC analysis) |
| Quality enforcement | Linting only | Linting + review | Type checking | Type checking | `ivy_quality_gate` (3-tier: minimal/standard/comprehensive) + `ivy_scaffold_check` (14-layer architecture validation) + `ivy_diagnostics` (5-layer graduated analysis) |

**Ivy is ahead in one critical area**: Structured semantic access for AI agents. The 21 MCP tools provide typed, queryable access to the specification model that goes far beyond what any LLM+prover integration offers. LeanDojo and Proverbot9001 give LLMs access to tactic state; Ivy gives LLMs access to the entire specification architecture (symbols, dependencies, requirements, patterns, coverage, quality metrics).

Plus 5 standalone tools kept as-is:
- `ivy_extract_requirements` (4.6/5)
- `ivy_generate_manifest` (3.8/5)
- `ivy_pattern_analysis` (4.4/5)
- `ivy_pattern_scaffold` (4.6/5)

**Total**: 12 merged/kept + 5 standalone = 17. Further consolidation of the 5 standalone tools is not recommended as they each serve distinct, high-scoring functions.

*Correction*: The 5 standalone tools are counted within the 12 above. The final tool count is **12** unique tools total (7 merged entries + 5 kept as-is = 12, after cutting `ivy_smart_suggestions` and absorbing `ivy_lint`).

---

## Appendix C: Comparison with Prior Art in Combined Verification+Traceability
…iew fixes)
Evaluation doc:
- TLA+: "Shipping (SANY-based)" not "In development"
- ProVerif: "Partial" not "Full" LSP
- Tamarin: note web prover provides exploration

Submodule (panther_ivy):
- Fix classify_endpoint_type vs _extract_test_directory_from_name divergence
- Fix docstrings, add edge case tests, clean unused imports
- Fix pattern catalog detection markers
LSP: 17→19 features (add implementation, call_hierarchy to Appendix B). Patterns: 6→7 types (add include-chain). Code metrics: analysis_pipeline 874→896 lines, 37→~17 methods; model.py 252→295 lines; requirement_graph 400+→718 lines. Note backward-compatibility aliases in Appendix A.
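Backward-compatibility aliases let clients that still call pre-consolidation tool names keep working after MCP tools are merged. One way to do this is an alias table that rewrites an old name into the consolidated tool plus fixed arguments. All tool names below (`analyze`, `analyze_symbols`, `analyze_includes`) are hypothetical placeholders, not the actual MCP tool names or the server's dispatch code.

```python
from typing import Any, Callable, Dict, Tuple

Handler = Callable[..., Dict[str, Any]]


def analyze(kind: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical consolidated tool: one entry point, behavior selected by `kind`."""
    return {"kind": kind, **kwargs}


TOOLS: Dict[str, Handler] = {"analyze": analyze}

# Hypothetical legacy names -> (consolidated tool, arguments fixed by the alias).
ALIASES: Dict[str, Tuple[str, Dict[str, Any]]] = {
    "analyze_symbols": ("analyze", {"kind": "symbols"}),
    "analyze_includes": ("analyze", {"kind": "includes"}),
}


def dispatch(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Route a tool call, rewriting legacy alias names onto the consolidated tool."""
    if name in ALIASES:
        target, fixed = ALIASES[name]
        # Fixed arguments are merged last, so they win over caller-supplied values.
        name, arguments = target, {**arguments, **fixed}
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)
```

Merging the fixed arguments last means an alias always selects the same mode, even if a client passes a conflicting `kind`.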
- Fix _extract_test_directory divergence from classify_endpoint_type
- Re-export classify_endpoint_type in api/_shared.py
- Remove 4 duplicate entries from rfc9000_requirements.yaml
…tale comments)
- Revert autoescape=True to False in Jinja env (Markdown, not HTML)
- Switch hardcoded 1/2/3 numbering to bullet points in failure analysis
- Add missing Phases column to fallback markdown reporter
- Restore delegation-pattern "why" context on emit comments
- Fix stale StateManager reference and wrong progress bar comment
- Remove redundant inline comment in metrics_observer
- Document test_name retention and timing-includes-init behavior
- Add file references for backward-compat alias count in eval doc
- Fix pre-existing D212 docstring style warnings in experiment_reporter
✅ PR Validation Passed
Your changes look good! The quick validation checks have passed:
The full CI pipeline will run when this PR is merged or when targeting the main branches.
Summary
Fixes all 18 issues from the Ivy plugin evaluation report to raise the pass rate from 72% toward ≥90%.
Test plan
Summary by Sourcery
Add strategic evaluation and consolidation plan documentation for the Ivy tooling ecosystem, covering current capabilities, over‑engineering analysis, and an implementation roadmap.
Documentation: