fix: Ivy plugin evaluation fixes (18 issues) by ElNiak · Pull Request #109 · ElNiak/PANTHER

ElNiak · 2026-03-16T11:48:12Z

Summary

Fixes all 18 issues from the Ivy plugin evaluation report to raise the pass rate from 72% toward ≥90%.

Plugin docs (14 files): Replace consolidated tool names with actual MCP names across CLAUDE.md, 11 skills, agents, commands. Add tool mapping table, skills/agents listing, caveats.
Server fixes (9 files): Cross-ref fuzzy resolution, symbol disambiguation, coverage normalization, hover fallback, test_file filtering, output limits, workDoneProgress guard.
Version + hooks: Bump to 0.5.0, Stop session-summary hook, plugin.json fields.
Backward compat: 15 individual MCP tool aliases alongside consolidated dispatch.

Test plan

ivy-lsp tests pass (1977 passed, 0 failed)
Re-run evaluation static tests (B.S4, B.S6)
Re-run evaluation live tests (A.L4, B.L1, D.L8)

Summary by Sourcery

Add strategic evaluation and consolidation plan documentation for the Ivy tooling ecosystem, covering current capabilities, over‑engineering analysis, and an implementation roadmap.

Documentation:

Document the current Ivy MCP and LSP capabilities, strengths, and gaps versus other formal verification and specification tools.
Describe over‑engineering issues in Ivy tools, agents, and skills and propose a consolidated, streamlined tooling surface.
Outline missing capabilities and a phased implementation roadmap with verification checklists for improving Ivy’s verification, traceability, and AI-integration features.

Updates panther_ivy to feat/mcp-tool-fixes-and-requirements which includes: - Fixed 6 issues in ivy-tools MCP server (ISS-001/002/003/005/006/008) - All 19 MCP tools verified working via live calls - Added RFC 9000 requirements manifest (101 requirements) - Auto-detect local ivy-lsp source for development

Update panther_ivy submodule with the new Pattern Library featuring: - 13 reusable .ivy template files for 7 formal patterns - Pattern detection engine with cross-reference validation - 2 new MCP tools (ivy_pattern_analysis, ivy_pattern_scaffold) - /nct-add-pattern command and pattern-library skill - Enhanced coverage gaps, smart suggestions, and LSP diagnostics

Picks up ivy-lsp changes: semantic edge wiring (COVERS, HAS_PARAM, RETURNS_TYPE, INCLUDES), smart suggestions fix, per-action state var filtering, and three new MCP tools (ivy_generate_manifest, ivy_scaffold_check, ivy_quality_gate).

Requirements YAML cleanup, ivy-lsp bug fixes, plugin hook hardening.

Batch 1 (P0): Fix broken wiring - Fix api/ broken relative imports (api/_shared.py) - Wire all 25 MCP tools into Claude Code plugin - Update ivy-tooling-guide skill with complete tool mapping Batch 2 (P1): Quality debt - Modularize mcp_server.py (2225 → 728 lines + tools/ package) - Deduplicate shell script workspace detection (workspace-common.sh) - Fix PantherIvyVersion docstring (class is used, not dead code)

…lignment) Python tester: remove shadowed adapt_environment_paths, consolidate 3 protocol name methods to 1, merge duplicate output patterns, remove dead build_tests(). Plugin: fix README inventory counts, CSO-optimize all 15 skill descriptions, add Red Flags/Integration/Common Mistakes sections to process skills, normalize skill name casing to kebab-case.

Update panther_ivy submodule with MCP tool consolidation (25→15), plugin surface reduction (agents 9→4, skills 14→6, commands 6→5), counterexample parser, per-isolate caching, and coverage diff mode. Add strategic evaluation document comparing Ivy tooling to state of the art across 5 dimensions (code intelligence, verification, traceability, AI-assisted specification, protocol analysis).

ivy-lsp: fix verify cache race, diagnostic layer errors, model init retry, compile path bug, pattern mode validation, logging, quality gate OSError, coverage diff truncation, off-by-one in pattern library, stale tool names in tests. Add counterexample parser, verification cache, and traceability tool tests. panther-ivy-plugin: fix shell injection in hooks, JSON parsing in hooks, brace counting with comment stripping, stale MCP tool names in agents/ skills/commands/READMEs, component counts and version.

Updates submodule pointers for panther-ivy-plugin and ivy-lsp with fixes for all 18 issues from the plugin evaluation report: Plugin docs: tool name alignment, skill/agent listing, plugin.json fields, version bump to 0.5.0, Stop hook for session summary. Server: cross-reference fuzzy matching, symbol disambiguation with protocol scoping, coverage tag normalization, hover SemanticModel fallback, test_file filtering, output size limits, workDoneProgress capability guard, individual tool aliases for backward compatibility.

sourcery-ai · 2026-03-16T11:48:19Z

Reviewer's Guide

Updates Ivy plugin documentation to use actual MCP tool names and add evaluation context, introduces backward-compatible MCP tool aliases and plugin metadata changes, and adjusts the LSP server behavior around symbol resolution, diagnostics, and progress/output limits to address the 18 issues from the Ivy tooling evaluation report.

File-Level Changes

Change	Details	Files
Add a comprehensive Ivy tooling evaluation and roadmap document to the docs tree.	Introduce a long-form markdown spec describing current Ivy MCP/LSP capabilities, gaps, and consolidation plan. Document proposed consolidation of MCP tools, agents, and skills, plus quality gates and hooks behavior changes. Capture comparison against other verification ecosystems and define phased implementation and verification checklist.	`docs/superpowers/specs/2026-03-13-ivy-tooling-evaluation.md`
Update Ivy plugin implementation, configuration, and docs to align with the evaluation findings and improve pass rate.	Adjust plugin/server logic in the panther_ivy service to fix fuzzy cross-reference resolution, hover fallback, symbol disambiguation, coverage normalization, and test_file filtering. Add guards and limits for workDoneProgress and tool output sizes to prevent pathological responses during evaluation scenarios. Introduce version bump and plugin.json field updates, and wire up or disable specific hooks (e.g., session-summary, pre/post tool hooks) to match the new quality-gate strategy. Replace consolidated tool names with actual MCP tool identifiers across CLAUDE.md, skills, agents, and command docs, and add tables mapping tools, skills, and agents for the Ivy plugin. Add backward-compatible MCP tool aliases so legacy consolidated tool names still resolve while new per-tool dispatch is available.	`panther/plugins/services/testers/panther_ivy` `.claude/CLAUDE.md` `docs/*` `plugin.json` `skills/` `agents/` `commands/`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've reviewed your changes and they look great!

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

Copilot

Pull request overview

Updates the Ivy plugin submodule and adds a strategic evaluation document describing the Ivy tooling ecosystem, consolidation opportunities, and a roadmap to address known evaluation gaps.

Changes:

Bumps panther_ivy submodule to a newer commit.
Adds a new “Ivy Tooling Ecosystem” strategic evaluation/roadmap document.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
panther/plugins/services/testers/panther_ivy	Updates the git submodule pointer for the Ivy tester integration.
docs/superpowers/specs/2026-03-13-ivy-tooling-evaluation.md	Adds a detailed evaluation + consolidation plan and checklist for Ivy MCP/LSP tooling.

You can also share your feedback on Copilot code review. Take the survey.

+| Structured semantic access | None (text-only context) | Codebase indexing (AST-level) | Lean server API (JSON) | SerAPI (S-expressions) | **21 MCP tools** with typed JSON schemas providing symbol info, include graphs, dependency analysis, traceability matrices, pattern detection |
+| Specification generation | General code generation | General with context | Proof step generation | Proof search | `ivy_pattern_scaffold` generates complete Ivy specification templates for 5 pattern types (serdes, shim, entity, variants, monitors) with documentation |
+| Workflow guidance | None | Tab-based suggestions | None | None | 14 Claude Code skills + 9 agents with methodology knowledge (specification creation workflow, quality gates, RFC analysis) |
+| Quality enforcement | Linting only | Linting + review | Type checking | Type checking | `ivy_quality_gate` (3-tier: minimal/standard/comprehensive) + `ivy_scaffold_check` (14-layer architecture validation) + `ivy_diagnostics` (5-layer graduated analysis) |
+
+**Ivy is ahead in one critical area**: Structured semantic access for AI agents. The 21 MCP tools provide typed, queryable access to the specification model that goes far beyond what any LLM+prover integration offers. LeanDojo and Proverbot9001 give LLMs access to tactic state; Ivy gives LLMs access to the entire specification architecture (symbols, dependencies, requirements, patterns, coverage, quality metrics).


+Plus 5 standalone tools kept as-is:
+- `ivy_extract_requirements` (4.6/5)
+- `ivy_generate_manifest` (3.8/5)
+- `ivy_pattern_analysis` (4.4/5)
+- `ivy_pattern_scaffold` (4.6/5)
+
+**Total**: 12 merged/kept + 5 standalone = 17. Further consolidation of the 5 standalone tools is not recommended as they each serve distinct, high-scoring functions.
+
+*Correction*: The 5 standalone tools are counted within the 12 above. The final tool count is **12** unique tools total (7 merged entries + 5 kept as-is = 12, after cutting `ivy_smart_suggestions` and absorbing `ivy_lint`).
+
+---
+
+## Appendix C: Comparison with Prior Art in Combined Verification+Traceability
+


…iew fixes) Evaluation doc: - TLA+: "Shipping (SANY-based)" not "In development" - ProVerif: "Partial" not "Full" LSP - Tamarin: note web prover provides exploration Submodule (panther_ivy): - Fix classify_endpoint_type vs _extract_test_directory_from_name divergence - Fix docstrings, add edge case tests, clean unused imports - Fix pattern catalog detection markers

LSP: 17→19 features (add implementation, call_hierarchy to Appendix B). Patterns: 6→7 types (add include-chain). Code metrics: analysis_pipeline 874→896 lines, 37→~17 methods; model.py 252→295 lines; requirement_graph 400+→718 lines. Note backward-compatibility aliases in Appendix A.

- Fix _extract_test_directory divergence from classify_endpoint_type - Re-export classify_endpoint_type in api/_shared.py - Remove 4 duplicate entries from rfc9000_requirements.yaml

…tale comments) - Revert autoescape=True to False in Jinja env (Markdown, not HTML) - Switch hardcoded 1/2/3 numbering to bullet points in failure analysis - Add missing Phases column to fallback markdown reporter - Restore delegation-pattern "why" context on emit comments - Fix stale StateManager reference and wrong progress bar comment - Remove redundant inline comment in metrics_observer - Document test_name retention and timing-includes-init behavior - Add file references for backward-compat alias count in eval doc - Fix pre-existing D212 docstring style warnings in experiment_reporter

github-actions · 2026-03-17T13:53:23Z

✅ PR Validation Passed

Your changes look good! The quick validation checks have passed:

✅ Code formatting (Black)
✅ Import sorting (isort)
✅ Linting (flake8)
✅ Basic tests

The full CI pipeline will run when this PR is merged or when targeting the main branches.

ElNiak added 11 commits March 12, 2026 22:45

chore: update panther_ivy submodule (diagnostics tool, LSP wrapper)

1e97a3a

chore: update panther_ivy submodule (PR review fixes)

7f4ceea

Requirements YAML cleanup, ivy-lsp bug fixes, plugin hook hardening.

chore: update panther_ivy submodule (evaluation fixes + test fix)

5ac1b09

Copilot AI review requested due to automatic review settings March 16, 2026 11:48

sourcery-ai Bot reviewed Mar 16, 2026

View reviewed changes

Copilot AI reviewed Mar 16, 2026

View reviewed changes

ElNiak added 6 commits March 16, 2026 14:28

chore: update panther_ivy submodule (tiered parsing cascade migration)

2cbf5d0

chore: update panther_ivy submodule (PR review fixes)

2d94bd6

- Fix _extract_test_directory divergence from classify_endpoint_type - Re-export classify_endpoint_type in api/_shared.py - Remove 4 duplicate entries from rfc9000_requirements.yaml

chore: update panther_ivy submodule (nct-validate simplification)

7f27a2c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Ivy plugin evaluation fixes (18 issues)#109

fix: Ivy plugin evaluation fixes (18 issues)#109
ElNiak wants to merge 17 commits into
productionfrom
fix/ivy-plugin-evaluation-18-issues

ElNiak commented Mar 16, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

sourcery-ai Bot commented Mar 16, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ElNiak commented Mar 16, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by Sourcery

Uh oh!

sourcery-ai Bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

github-actions Bot commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ElNiak commented Mar 16, 2026 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented Mar 16, 2026 •

edited

Loading