Skip to content

fix: Ivy plugin evaluation fixes (18 issues)#109

Open
ElNiak wants to merge 17 commits into
productionfrom
fix/ivy-plugin-evaluation-18-issues
Open

fix: Ivy plugin evaluation fixes (18 issues)#109
ElNiak wants to merge 17 commits into
productionfrom
fix/ivy-plugin-evaluation-18-issues

Conversation

@ElNiak
Copy link
Copy Markdown
Owner

@ElNiak ElNiak commented Mar 16, 2026

Summary

Fixes all 18 issues from the Ivy plugin evaluation report to raise the pass rate from 72% toward ≥90%.

  • Plugin docs (14 files): Replace consolidated tool names with actual MCP names across CLAUDE.md, 11 skills, agents, commands. Add tool mapping table, skills/agents listing, caveats.
  • Server fixes (9 files): Cross-ref fuzzy resolution, symbol disambiguation, coverage normalization, hover fallback, test_file filtering, output limits, workDoneProgress guard.
  • Version + hooks: Bump to 0.5.0, Stop session-summary hook, plugin.json fields.
  • Backward compat: 15 individual MCP tool aliases alongside consolidated dispatch.

Test plan

  • ivy-lsp tests pass (1977 passed, 0 failed)
  • Re-run evaluation static tests (B.S4, B.S6)
  • Re-run evaluation live tests (A.L4, B.L1, D.L8)

Summary by Sourcery

Add strategic evaluation and consolidation plan documentation for the Ivy tooling ecosystem, covering current capabilities, over‑engineering analysis, and an implementation roadmap.

Documentation:

  • Document the current Ivy MCP and LSP capabilities, strengths, and gaps versus other formal verification and specification tools.
  • Describe over‑engineering issues in Ivy tools, agents, and skills and propose a consolidated, streamlined tooling surface.
  • Outline missing capabilities and a phased implementation roadmap with verification checklists for improving Ivy’s verification, traceability, and AI-integration features.

ElNiak added 11 commits March 12, 2026 22:45
Updates panther_ivy to feat/mcp-tool-fixes-and-requirements which includes:
- Fixed 6 issues in ivy-tools MCP server (ISS-001/002/003/005/006/008)
- All 19 MCP tools verified working via live calls
- Added RFC 9000 requirements manifest (101 requirements)
- Auto-detect local ivy-lsp source for development
Update panther_ivy submodule with the new Pattern Library featuring:
- 13 reusable .ivy template files for 7 formal patterns
- Pattern detection engine with cross-reference validation
- 2 new MCP tools (ivy_pattern_analysis, ivy_pattern_scaffold)
- /nct-add-pattern command and pattern-library skill
- Enhanced coverage gaps, smart suggestions, and LSP diagnostics
Picks up ivy-lsp changes: semantic edge wiring (COVERS, HAS_PARAM,
RETURNS_TYPE, INCLUDES), smart suggestions fix, per-action state var
filtering, and three new MCP tools (ivy_generate_manifest,
ivy_scaffold_check, ivy_quality_gate).
Requirements YAML cleanup, ivy-lsp bug fixes, plugin hook hardening.
Batch 1 (P0): Fix broken wiring
- Fix api/ broken relative imports (api/_shared.py)
- Wire all 25 MCP tools into Claude Code plugin
- Update ivy-tooling-guide skill with complete tool mapping

Batch 2 (P1): Quality debt
- Modularize mcp_server.py (2225 → 728 lines + tools/ package)
- Deduplicate shell script workspace detection (workspace-common.sh)
- Fix PantherIvyVersion docstring (class is used, not dead code)
…lignment)

Python tester: remove shadowed adapt_environment_paths, consolidate 3 protocol
name methods to 1, merge duplicate output patterns, remove dead build_tests().
Plugin: fix README inventory counts, CSO-optimize all 15 skill descriptions,
add Red Flags/Integration/Common Mistakes sections to process skills,
normalize skill name casing to kebab-case.
Update panther_ivy submodule with MCP tool consolidation (25→15),
plugin surface reduction (agents 9→4, skills 14→6, commands 6→5),
counterexample parser, per-isolate caching, and coverage diff mode.

Add strategic evaluation document comparing Ivy tooling to state of
the art across 5 dimensions (code intelligence, verification,
traceability, AI-assisted specification, protocol analysis).
ivy-lsp: fix verify cache race, diagnostic layer errors, model init retry,
compile path bug, pattern mode validation, logging, quality gate OSError,
coverage diff truncation, off-by-one in pattern library, stale tool names
in tests. Add counterexample parser, verification cache, and traceability
tool tests.

panther-ivy-plugin: fix shell injection in hooks, JSON parsing in hooks,
brace counting with comment stripping, stale MCP tool names in agents/
skills/commands/READMEs, component counts and version.
Updates submodule pointers for panther-ivy-plugin and ivy-lsp with fixes
for all 18 issues from the plugin evaluation report:

Plugin docs: tool name alignment, skill/agent listing, plugin.json fields,
version bump to 0.5.0, Stop hook for session summary.

Server: cross-reference fuzzy matching, symbol disambiguation with protocol
scoping, coverage tag normalization, hover SemanticModel fallback,
test_file filtering, output size limits, workDoneProgress capability guard,
individual tool aliases for backward compatibility.
Copilot AI review requested due to automatic review settings March 16, 2026 11:48
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Mar 16, 2026

Reviewer's Guide

Updates Ivy plugin documentation to use actual MCP tool names and add evaluation context, introduces backward-compatible MCP tool aliases and plugin metadata changes, and adjusts the LSP server behavior around symbol resolution, diagnostics, and progress/output limits to address the 18 issues from the Ivy tooling evaluation report.

File-Level Changes

Change Details Files
Add a comprehensive Ivy tooling evaluation and roadmap document to the docs tree.
  • Introduce a long-form markdown spec describing current Ivy MCP/LSP capabilities, gaps, and consolidation plan.
  • Document proposed consolidation of MCP tools, agents, and skills, plus quality gates and hooks behavior changes.
  • Capture comparison against other verification ecosystems and define phased implementation and verification checklist.
docs/superpowers/specs/2026-03-13-ivy-tooling-evaluation.md
Update Ivy plugin implementation, configuration, and docs to align with the evaluation findings and improve pass rate.
  • Adjust plugin/server logic in the panther_ivy service to fix fuzzy cross-reference resolution, hover fallback, symbol disambiguation, coverage normalization, and test_file filtering.
  • Add guards and limits for workDoneProgress and tool output sizes to prevent pathological responses during evaluation scenarios.
  • Introduce version bump and plugin.json field updates, and wire up or disable specific hooks (e.g., session-summary, pre/post tool hooks) to match the new quality-gate strategy.
  • Replace consolidated tool names with actual MCP tool identifiers across CLAUDE.md, skills, agents, and command docs, and add tables mapping tools, skills, and agents for the Ivy plugin.
  • Add backward-compatible MCP tool aliases so legacy consolidated tool names still resolve while new per-tool dispatch is available.
panther/plugins/services/testers/panther_ivy
.claude/CLAUDE.md
docs/**
plugin.json
skills/*
agents/*
commands/*

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the Ivy plugin submodule and adds a strategic evaluation document describing the Ivy tooling ecosystem, consolidation opportunities, and a roadmap to address known evaluation gaps.

Changes:

  • Bumps panther_ivy submodule to a newer commit.
  • Adds a new “Ivy Tooling Ecosystem” strategic evaluation/roadmap document.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
panther/plugins/services/testers/panther_ivy Updates the git submodule pointer for the Ivy tester integration.
docs/superpowers/specs/2026-03-13-ivy-tooling-evaluation.md Adds a detailed evaluation + consolidation plan and checklist for Ivy MCP/LSP tooling.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +79 to +84
| Structured semantic access | None (text-only context) | Codebase indexing (AST-level) | Lean server API (JSON) | SerAPI (S-expressions) | **21 MCP tools** with typed JSON schemas providing symbol info, include graphs, dependency analysis, traceability matrices, pattern detection |
| Specification generation | General code generation | General with context | Proof step generation | Proof search | `ivy_pattern_scaffold` generates complete Ivy specification templates for 5 pattern types (serdes, shim, entity, variants, monitors) with documentation |
| Workflow guidance | None | Tab-based suggestions | None | None | 14 Claude Code skills + 9 agents with methodology knowledge (specification creation workflow, quality gates, RFC analysis) |
| Quality enforcement | Linting only | Linting + review | Type checking | Type checking | `ivy_quality_gate` (3-tier: minimal/standard/comprehensive) + `ivy_scaffold_check` (14-layer architecture validation) + `ivy_diagnostics` (5-layer graduated analysis) |

**Ivy is ahead in one critical area**: Structured semantic access for AI agents. The 21 MCP tools provide typed, queryable access to the specification model that goes far beyond what any LLM+prover integration offers. LeanDojo and Proverbot9001 give LLMs access to tactic state; Ivy gives LLMs access to the entire specification architecture (symbols, dependencies, requirements, patterns, coverage, quality metrics).
Comment on lines +405 to +418
Plus 5 standalone tools kept as-is:
- `ivy_extract_requirements` (4.6/5)
- `ivy_generate_manifest` (3.8/5)
- `ivy_pattern_analysis` (4.4/5)
- `ivy_pattern_scaffold` (4.6/5)

**Total**: 12 merged/kept + 5 standalone = 17. Further consolidation of the 5 standalone tools is not recommended as they each serve distinct, high-scoring functions.

*Correction*: The 5 standalone tools are counted within the 12 above. The final tool count is **12** unique tools total (7 merged entries + 5 kept as-is = 12, after cutting `ivy_smart_suggestions` and absorbing `ivy_lint`).

---

## Appendix C: Comparison with Prior Art in Combined Verification+Traceability

ElNiak added 6 commits March 16, 2026 14:28
…iew fixes)

Evaluation doc:
- TLA+: "Shipping (SANY-based)" not "In development"
- ProVerif: "Partial" not "Full" LSP
- Tamarin: note web prover provides exploration

Submodule (panther_ivy):
- Fix classify_endpoint_type vs _extract_test_directory_from_name divergence
- Fix docstrings, add edge case tests, clean unused imports
- Fix pattern catalog detection markers
LSP: 17→19 features (add implementation, call_hierarchy to Appendix B).
Patterns: 6→7 types (add include-chain).
Code metrics: analysis_pipeline 874→896 lines, 37→~17 methods;
model.py 252→295 lines; requirement_graph 400+→718 lines.
Note backward-compatibility aliases in Appendix A.
- Fix _extract_test_directory divergence from classify_endpoint_type
- Re-export classify_endpoint_type in api/_shared.py
- Remove 4 duplicate entries from rfc9000_requirements.yaml
…tale comments)

- Revert autoescape=True to False in Jinja env (Markdown, not HTML)
- Switch hardcoded 1/2/3 numbering to bullet points in failure analysis
- Add missing Phases column to fallback markdown reporter
- Restore delegation-pattern "why" context on emit comments
- Fix stale StateManager reference and wrong progress bar comment
- Remove redundant inline comment in metrics_observer
- Document test_name retention and timing-includes-init behavior
- Add file references for backward-compat alias count in eval doc
- Fix pre-existing D212 docstring style warnings in experiment_reporter
@github-actions
Copy link
Copy Markdown

PR Validation Passed

Your changes look good! The quick validation checks have passed:

  • ✅ Code formatting (Black)
  • ✅ Import sorting (isort)
  • ✅ Linting (flake8)
  • ✅ Basic tests

The full CI pipeline will run when this PR is merged or when targeting the main branches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants