Add agents evaluation and rewrite agents based on the evaluation #1377

Draft
saratpoluri wants to merge 3 commits into main from fix/agents-evaluation

Conversation

@saratpoluri
Contributor

📝 Description

Provide a clear summary of the changes and the context behind them. Describe what was changed, why it was needed, and how the changes address the issue or add value.

✨ Type of Change

Select the type of change your PR introduces:

  • 🐞 Bug fix – Non-breaking change which fixes an issue
  • 🚀 New feature – Non-breaking change which adds functionality
  • 🔨 Refactor – Non-breaking change which refactors the code base
  • 💥 Breaking change – Changes that break existing functionality
  • 📚 Documentation update
  • 🔒 Security update
  • 🧪 Tests
  • 🚂 CI

🧪 Testing Scenarios

Describe how the changes were tested and how reviewers can test them too:

  • ✅ Tested manually
  • 🤖 Ran automated end-to-end tests

✅ Checklist

Before submitting the PR, ensure the following:

  • 🔍 PR title is clear and descriptive
  • 📝 For internal contributors: If applicable, include the JIRA ticket number (e.g., ITEP-123456) in the PR title. Do not include full URLs
  • 💬 I have commented my code, especially in hard-to-understand areas
  • 📄 I have made corresponding changes to the documentation
  • ✅ I have added tests that prove my fix is effective or my feature works

@saratpoluri saratpoluri requested review from Copilot and daddo-intel May 1, 2026 20:43
@saratpoluri saratpoluri marked this pull request as draft May 1, 2026 20:46
Contributor

Copilot AI left a comment

Pull request overview

This PR standardizes and streamlines service-level Agents.md guides across multiple SceneScape microservices, adds a new rubric + efficacy-testing skill for evaluating Agents.md quality, and updates Copilot instructions to reference the new evaluation skill.

Changes:

  • Added “Verification Gate (Standardized)” sections with command paths and pass criteria to multiple Agents.md guides.
  • Rewrote several service Agents.md files to be more concise and KPI/constraint-driven.
  • Introduced a new .github/skills/agent_evaluation/ skill (rubric + efficacy test procedure) and linked it from .github/copilot-instructions.md.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tracker/Agents.md | Adds a standardized verification gate table for tracker changes. |
| tools/tracker/evaluation/Agents.md | Adds a standardized verification gate table for evaluation tooling. |
| mapping/Agents.md | Rewrites mapping agent guide into a concise format and adds verification gate commands. |
| manager/Agents.md | Rewrites manager agent guide into a concise format and adds verification gate commands. |
| controller/Agents.md | Rewrites controller agent guide into a concise format and adds verification gate commands. |
| cluster_analytics/Agents.md | Rewrites cluster analytics agent guide into a concise format and adds verification gate commands. |
| autocalibration/Agents.md | Rewrites autocalibration agent guide into a concise format and adds verification gate commands. |
| .github/skills/agent_evaluation/agents-md-evaluation.md | Adds a scoring rubric and required JSON output format for evaluating Agents.md. |
| .github/skills/agent_evaluation/SKILL.md | Adds the evaluation skill entry and an efficacy-testing procedure. |
| .github/copilot-instructions.md | References the new agent evaluation skill and adds an on-demand loading trigger. |

Comment thread autocalibration/Agents.md
Comment on lines +51 to +56
| Change class | Command path | Pass criteria |
| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------- |
| API/schema contracts | `make autocalibration && make -C autocalibration test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests auto-calibration` | Exit code 0; auto-calibration workflow test passes without new contract/schema failures. |
| Algorithm/concurrency flow | `make autocalibration && make -C autocalibration test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests markerless-unit` | Exit code 0; markerless calibration path passes with no new race/locking regressions. |
| Performance/quality-sensitive changes | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests auto-calibration` | Exit code 0; report before/after p95 completion and reprojection-quality deltas for changed logic. |
| Migrations/persistence | N/A for this service | Must be explicitly marked N/A in the PR when only runtime calibration behavior is changed. |
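
Each command path in the table repeats the same four proxy assignments inline. As a minimal sketch of one way to factor that out, the helper below wraps a single command with those variables. The name `with_proxy` and the override variables `GATE_HTTP_PROXY` / `GATE_HTTPS_PROXY` are hypothetical and not part of the repository; the default URLs are copied from the table as written.

```shell
#!/bin/sh
# with_proxy COMMAND [ARGS...]
# Hypothetical helper: run one external command with the proxy variables
# that the table above sets inline. Defaults mirror the table; the
# GATE_HTTP_PROXY / GATE_HTTPS_PROXY overrides are assumptions for
# illustration only.
with_proxy() (
    # Subshell body keeps the temporary variables out of the caller's scope.
    http_url="${GATE_HTTP_PROXY:-http://proxy-dmz.intel.com:911}"
    https_url="${GATE_HTTPS_PROXY:-http://proxy-dmz.intel.com:912}"
    # env sets the variables only for the wrapped command and
    # passes that command's exit code back to the caller.
    env http_proxy="$http_url" HTTP_PROXY="$http_url" \
        https_proxy="$https_url" HTTPS_PROXY="$https_url" \
        "$@"
)
```

With this in scope, the first row could be written as `make autocalibration && make -C autocalibration test-build && with_proxy make -C tests auto-calibration`. Because `env` only runs external programs, the wrapper suits `make` but not shell functions or builtins.
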
Comment thread mapping/Agents.md
Comment on lines +51 to +56
| Change class | Command path | Pass criteria |
| -------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| API/schema contracts | `make mapping && make -C mapping test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests mapping-unit` | Exit code 0; no new API contract/schema regressions in mapping test output. |
| Reconstruction/localization logic | `make mapping && make -C mapping test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests mapping-unit` | Exit code 0; logic tests pass for both success and failure paths. |
| Performance/resource-sensitive changes | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests mapping-unit` | Exit code 0; include before/after runtime and memory-envelope impact for representative workloads. |
| Migrations/persistence | N/A for this service | Must be explicitly marked N/A in the PR when only runtime mapping behavior is changed. |
Comment thread manager/Agents.md
Comment on lines +51 to +56
| Change class | Command path | Pass criteria |
| ----------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| API/schema contracts | `make manager && make -C manager test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests django-integration-unit` | Exit code 0; no new API contract/schema/permission regressions. |
| Workflow/business logic | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests logic-unit-tests` | Exit code 0; changed workflow tests pass for positive and negative paths. |
| Performance/reliability-sensitive changes | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests openapi-validation` | Exit code 0; no new endpoint failures and measured p95 API latency regression is within agreed budget. |
| Migrations/persistence | `make manager && make -C manager test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests django-integration-unit` | Exit code 0; migration-related tests pass and DB-impacting changes include apply/check evidence in the PR notes. |
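
The performance row asks for a measured p95 latency regression within an agreed budget, but the table does not say how p95 is computed or what the budget is. The sketch below assumes the nearest-rank definition of the 95th percentile and a budget expressed as a percentage over the baseline. The function names `p95` and `within_budget`, the one-sample-per-line input, and the file names in the usage note are all hypothetical.

```shell
#!/bin/sh
# p95: read numeric samples from stdin (one per line) and print the
# 95th percentile using the nearest-rank method: rank = ceil(0.95 * N).
# Exits non-zero if no samples were read.
p95() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            # Integer form of ceil(NR * 95 / 100), avoiding float rounding.
            r = int((NR * 95 + 99) / 100)
            print v[r]
        }'
}

# within_budget BEFORE AFTER BUDGET_PCT
# Succeeds (exit 0) when AFTER is at most BUDGET_PCT percent above BEFORE.
# The budget value itself is an assumption; the source only says "agreed budget".
within_budget() {
    awk -v b="$1" -v a="$2" -v pct="$3" \
        'BEGIN { exit !(a <= b * (1 + pct / 100)) }'
}
```

A report could then compare two sample files, for example `within_budget "$(p95 < before.txt)" "$(p95 < after.txt)" 5`, where `before.txt`, `after.txt`, and the 5 percent budget are placeholders.
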
Comment thread controller/Agents.md
Comment on lines +51 to +56
| Change class | Command path | Pass criteria |
| ------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| API/schema contracts | `make controller && make -C controller test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests logic-unit-tests` | Exit code 0; no new schema validation or payload contract failures. |
| Tracking/association/time logic | `make controller && make -C controller test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests scene-unit` | Exit code 0; continuity/time-handling checks pass with no new logic regressions. |
| Performance-sensitive changes | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests metrics` | Exit code 0; include before/after p95 handler latency and ingest-to-publish latency in report. |
| Migrations/persistence | N/A for this service | Must be explicitly marked N/A in the PR because controller is runtime-state focused. |
Comment thread cluster_analytics/Agents.md
Comment on lines +51 to +56
| Change class | Command path | Pass criteria |
| ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| API/schema contracts | `make cluster_analytics && make -C cluster_analytics test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests unit-tests` | Exit code 0; no new publish-contract/schema regressions in test output. |
| Algorithm/lifecycle logic | `make cluster_analytics && make -C cluster_analytics test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests unit-tests` | Exit code 0; no regressions in stability-oriented checks and scene processing behavior. |
| Performance-sensitive changes | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests metrics` | Exit code 0; include before/after latency and cluster ID churn measurements for tuned logic. |
| Migrations/persistence | N/A for this service | Must be explicitly marked N/A in the PR when no persistent-schema surface is changed. |
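
Every row in these tables uses "Exit code 0" as the first pass criterion, and some rows are marked N/A. As a rough sketch of how a reviewer or agent might record that consistently, the hypothetical `run_gate` function below runs one command, reports PASS or FAIL from its exit code, and records an N/A row as skipped instead of running anything. The function name and output wording are assumptions, not something defined in the PR.

```shell
#!/bin/sh
# run_gate LABEL COMMAND [ARGS...]
# Hypothetical helper: run one verification-gate command and report the
# outcome based only on its exit code. Passing "N/A" as the command
# records the row as skipped.
run_gate() {
    label="$1"
    shift
    if [ "$1" = "N/A" ]; then
        printf 'SKIP %s (N/A for this service)\n' "$label"
        return 0
    fi
    if "$@"; then
        printf 'PASS %s\n' "$label"
    else
        rc=$?   # exit code of the gate command that just failed
        printf 'FAIL %s (exit code %s)\n' "$label" "$rc"
        return "$rc"
    fi
}
```

For example, `run_gate "Performance-sensitive changes" make -C tests metrics` would run the third row, assuming the proxy variables are already exported in the calling shell, and `run_gate "Migrations/persistence" "N/A"` would log the last row as skipped. The non-exit-code criteria (latency and churn measurements) still need to be reported separately.
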

2 participants