Add agents evaluation and rewrite agents based on the evaluation #1377
Draft
saratpoluri wants to merge 3 commits into
Pull request overview
This PR standardizes and streamlines service-level Agents.md guides across multiple SceneScape microservices, adds a new rubric + efficacy-testing skill for evaluating Agents.md quality, and updates Copilot instructions to reference the new evaluation skill.
Changes:
- Added “Verification Gate (Standardized)” sections with command paths and pass criteria to multiple `Agents.md` guides.
- Rewrote several service `Agents.md` files to be more concise and KPI/constraint-driven.
- Introduced a new `.github/skills/agent_evaluation/` skill (rubric + efficacy test procedure) and linked it from `.github/copilot-instructions.md`.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tracker/Agents.md | Adds a standardized verification gate table for tracker changes. |
| tools/tracker/evaluation/Agents.md | Adds a standardized verification gate table for evaluation tooling. |
| mapping/Agents.md | Rewrites mapping agent guide into a concise format and adds verification gate commands. |
| manager/Agents.md | Rewrites manager agent guide into a concise format and adds verification gate commands. |
| controller/Agents.md | Rewrites controller agent guide into a concise format and adds verification gate commands. |
| cluster_analytics/Agents.md | Rewrites cluster analytics agent guide into a concise format and adds verification gate commands. |
| autocalibration/Agents.md | Rewrites autocalibration agent guide into a concise format and adds verification gate commands. |
| .github/skills/agent_evaluation/agents-md-evaluation.md | Adds a scoring rubric and required JSON output format for evaluating Agents.md. |
| .github/skills/agent_evaluation/SKILL.md | Adds the evaluation skill entry and an efficacy-testing procedure. |
| .github/copilot-instructions.md | References the new agent evaluation skill and adds an on-demand loading trigger. |
Comment on lines +51 to +56

| Change class | Command path | Pass criteria |
|---|---|---|
| API/schema contracts | `make autocalibration && make -C autocalibration test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests auto-calibration` | Exit code 0; auto-calibration workflow test passes without new contract/schema failures. |
| Algorithm/concurrency flow | `make autocalibration && make -C autocalibration test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests markerless-unit` | Exit code 0; markerless calibration path passes with no new race/locking regressions. |
| Performance/quality-sensitive changes | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests auto-calibration` | Exit code 0; report before/after p95 completion and reprojection-quality deltas for changed logic. |
| Migrations/persistence | N/A for this service | Must be explicitly marked N/A in the PR when only runtime calibration behavior is changed. |
Comment on lines +51 to +56

| Change class | Command path | Pass criteria |
|---|---|---|
| API/schema contracts | `make mapping && make -C mapping test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests mapping-unit` | Exit code 0; no new API contract/schema regressions in mapping test output. |
| Reconstruction/localization logic | `make mapping && make -C mapping test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests mapping-unit` | Exit code 0; logic tests pass for both success and failure paths. |
| Performance/resource-sensitive changes | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests mapping-unit` | Exit code 0; include before/after runtime and memory-envelope impact for representative workloads. |
| Migrations/persistence | N/A for this service | Must be explicitly marked N/A in the PR when only runtime mapping behavior is changed. |
Comment on lines +51 to +56

| Change class | Command path | Pass criteria |
|---|---|---|
| API/schema contracts | `make manager && make -C manager test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests django-integration-unit` | Exit code 0; no new API contract/schema/permission regressions. |
| Workflow/business logic | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests logic-unit-tests` | Exit code 0; changed workflow tests pass for positive and negative paths. |
| Performance/reliability-sensitive changes | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests openapi-validation` | Exit code 0; no new endpoint failures and measured p95 API latency regression is within agreed budget. |
| Migrations/persistence | `make manager && make -C manager test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests django-integration-unit` | Exit code 0; migration-related tests pass and DB-impacting changes include apply/check evidence in the PR notes. |
Comment on lines +51 to +56

| Change class | Command path | Pass criteria |
|---|---|---|
| API/schema contracts | `make controller && make -C controller test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests logic-unit-tests` | Exit code 0; no new schema validation or payload contract failures. |
| Tracking/association/time logic | `make controller && make -C controller test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests scene-unit` | Exit code 0; continuity/time-handling checks pass with no new logic regressions. |
| Performance-sensitive changes | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests metrics` | Exit code 0; include before/after p95 handler latency and ingest-to-publish latency in report. |
| Migrations/persistence | N/A for this service | Must be explicitly marked N/A in the PR because controller is runtime-state focused. |
Comment on lines +51 to +56

| Change class | Command path | Pass criteria |
|---|---|---|
| API/schema contracts | `make cluster_analytics && make -C cluster_analytics test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests unit-tests` | Exit code 0; no new publish-contract/schema regressions in test output. |
| Algorithm/lifecycle logic | `make cluster_analytics && make -C cluster_analytics test-build && http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests unit-tests` | Exit code 0; no regressions in stability-oriented checks and scene processing behavior. |
| Performance-sensitive changes | `http_proxy=http://proxy-dmz.intel.com:911 HTTP_PROXY=http://proxy-dmz.intel.com:911 https_proxy=http://proxy-dmz.intel.com:912 HTTPS_PROXY=http://proxy-dmz.intel.com:912 make -C tests metrics` | Exit code 0; include before/after latency and cluster ID churn measurements for tuned logic. |
| Migrations/persistence | N/A for this service | Must be explicitly marked N/A in the PR when no persistent-schema surface is changed. |
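
The build-then-test chains in the tables above share one shape: build the service, build its test image, then run a target under `tests`. A minimal sketch of that shape as a shell helper follows. It assumes the proxy variables are already exported in the calling shell instead of being repeated inline on every row; `run_gate` and `MAKE_CMD` are hypothetical names used for illustration and are not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): run one verification-gate chain.
# Assumes http_proxy/HTTP_PROXY/https_proxy/HTTPS_PROXY are exported by the
# caller if the environment needs them; nothing is hard-coded here.
#
# Usage: run_gate <service> <tests-target>
# MAKE_CMD may be overridden (for example MAKE_CMD="echo make") for a dry run.
run_gate() {
    service="$1"
    target="$2"
    # Each step runs only if the previous one succeeded, so the function's
    # exit status is 0 only when the whole chain passes.
    ${MAKE_CMD:-make} "$service" &&
        ${MAKE_CMD:-make} -C "$service" test-build &&
        ${MAKE_CMD:-make} -C tests "$target"
}
```

For example, `MAKE_CMD="echo make" run_gate mapping mapping-unit` prints the three commands without running them, and `run_gate mapping mapping-unit; echo "exit code: $?"` reports the status that the "Exit code 0" pass criteria refer to. Rows that list only a `make -C tests ...` step would call that target directly rather than going through this helper.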
📝 Description
Provide a clear summary of the changes and the context behind them. Describe what was changed, why it was needed, and how the changes address the issue or add value.
✨ Type of Change
Select the type of change your PR introduces:
🧪 Testing Scenarios
Describe how the changes were tested and how reviewers can test them too:
✅ Checklist
Before submitting the PR, ensure the following: