Turn the GitHub repo into a better public collaboration surface

weiyi · weiyi · commit 07a5800f98a5 · 2026-05-03T22:31:47.000+08:00

The repository already had stronger docs and community-health files,
but the public contribution flow still ended at issues and pull
requests. This commit adds a dedicated roadmap, issue-template routing,
labels, and Discussions so the GitHub repo can host open-ended product
conversation without diluting the artifact-focused issue tracker.

Constraint: Keep GitHub collaboration centered on the eval toolchain and avoid generic community process overhead
Rejected: Add many issue forms before a roadmap/discussion surface existed | creates intake without direction
Rejected: Leave broad proposals in regular issues only | mixes product discussion with reproducible bug tracking
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep issues optimized for reproducible defects and artifact contracts; route open-ended direction questions toward roadmap and Discussions
Tested: Enabled GitHub Discussions; created and verified custom labels; verified issue template config and roadmap files; reran root demo script
Not-tested: Manual creation of discussion threads in the GitHub UI
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,11 @@
+blank_issues_enabled: false
+contact_links:
+  - name: Roadmap and project direction
+    url: https://github.com/Jasvina/AgentEvalKit/blob/main/ROADMAP.md
+    about: Read the public roadmap before proposing broad new directions.
+  - name: Discussions
+    url: https://github.com/Jasvina/AgentEvalKit/discussions
+    about: Use Discussions for open-ended questions, ideas, and design conversations.
+  - name: Security reporting
+    url: https://github.com/Jasvina/AgentEvalKit/blob/main/SECURITY.md
+    about: Please report undisclosed security issues privately.
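
Reviewer note: the `Tested:` trailer mentions verifying the issue template config. One way such a check might be scripted is sketched below. It is not part of this commit. The function name `validate_issue_template_config` and the `CONFIG` dict are hypothetical, and the dict is a hand-written mirror of the YAML above; in practice the file would be loaded with a YAML parser (for example PyYAML's `yaml.safe_load`, assumed rather than used here). The check relies on each `contact_links` entry carrying `name`, `url`, and `about`.

```python
from urllib.parse import urlparse

# Keys expected on every contact_links entry.
REQUIRED_LINK_KEYS = ("name", "url", "about")


def validate_issue_template_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means no problems were found."""
    problems: list[str] = []

    # blank_issues_enabled is optional, but must be a boolean when present.
    if "blank_issues_enabled" in config and not isinstance(
        config["blank_issues_enabled"], bool
    ):
        problems.append("blank_issues_enabled must be true or false")

    links = config.get("contact_links", [])
    if not isinstance(links, list):
        problems.append("contact_links must be a list")
        return problems

    for index, link in enumerate(links):
        if not isinstance(link, dict):
            problems.append(f"contact_links[{index}] must be a mapping")
            continue
        for key in REQUIRED_LINK_KEYS:
            value = link.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"contact_links[{index}] is missing '{key}'")
        url = link.get("url")
        if isinstance(url, str) and url.strip():
            parsed = urlparse(url)
            if parsed.scheme != "https" or not parsed.netloc:
                problems.append(f"contact_links[{index}] url is not an https URL")

    return problems


# Hand-written mirror of the config.yml added in this commit (illustration only).
CONFIG = {
    "blank_issues_enabled": False,
    "contact_links": [
        {
            "name": "Roadmap and project direction",
            "url": "https://github.com/Jasvina/AgentEvalKit/blob/main/ROADMAP.md",
            "about": "Read the public roadmap before proposing broad new directions.",
        },
        {
            "name": "Discussions",
            "url": "https://github.com/Jasvina/AgentEvalKit/discussions",
            "about": "Use Discussions for open-ended questions, ideas, and design conversations.",
        },
        {
            "name": "Security reporting",
            "url": "https://github.com/Jasvina/AgentEvalKit/blob/main/SECURITY.md",
            "about": "Please report undisclosed security issues privately.",
        },
    ],
}

if __name__ == "__main__":
    for problem in validate_issue_template_config(CONFIG):
        print(problem)
```

A check like this only covers structure; whether Discussions is actually enabled and the links resolve still needs the manual verification listed in the trailers.
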
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -33,6 +33,8 @@ When in doubt, optimize for sharper scope. A small contribution that makes one e
 
 ## Before you start
 
+If you are proposing broader direction changes, read `ROADMAP.md` first so suggestions stay aligned with the public scope.
+
 Before writing code, it helps to align on the kind of change you are making:
 
 - for bug fixes, include a failing test or a concrete reproduction if possible
diff --git a/README.md b/README.md
@@ -204,9 +204,13 @@ projects/
 - issue and PR templates: `.github/`
 - security policy: `SECURITY.md`
 - support guidance: `SUPPORT.md`
+- public roadmap: `ROADMAP.md`
+- discussions: GitHub Discussions
 
 ## Roadmap
 
+For the longer view, see `ROADMAP.md`.
+
 - add more `AgentCI` integrations and richer HTML diff reports
 - strengthen `TracePack` redaction policies, labeling workflows, and export formats
 - add richer `FailMap` issue templates, trend views, and release-to-release drilldowns
diff --git a/ROADMAP.md b/ROADMAP.md
@@ -0,0 +1,61 @@
+# Roadmap
+
+`AgentEvalKit` is a toolkit for the agent reliability loop:
+
+1. capture real runs
+2. replay and diff them
+3. package them into reusable eval artifacts
+4. cluster recurring failures
+5. slice the same evidence into train/eval/test datasets
+
+This roadmap keeps the repo focused on strengthening that loop.
+
+## Near term
+
+### AgentCI
+
+- add more adapters for real agent runtimes
+- improve regression diffs for step-level and metric-level debugging
+- expand flaky-run detection and replay mismatch reporting
+
+### TracePack
+
+- improve redaction coverage and redaction test cases
+- add richer labeling workflows and metadata normalization
+- strengthen export formats for eval and fine-tuning pipelines
+
+### FailMap
+
+- improve release-over-release comparisons
+- add stronger issue draft templates and routing rules
+- expand trend and drilldown reporting for recurring signatures
+
+### PackSlice
+
+- expand label-aware and temporal splitting strategies
+- improve reproducibility guarantees for repeated slicing runs
+- add stronger diagnostics for split balance and leakage risks
+
+## Cross-tool priorities
+
+- keep JSON artifact contracts stable and explicit
+- make root automation outputs easier to consume in CI and dashboards
+- improve monorepo demos so visitors can understand the full toolchain in minutes
+- add more tests that lock the handoff between tools
+
+## Not the current focus
+
+To keep the repo sharp, it is intentionally not prioritizing:
+
+- generic chat demos
+- broad orchestration frameworks
+- memory layers unrelated to eval artifacts
+- open-ended agent abstractions with no reproducible output contract
+
+## Contribution lens
+
+The highest-value additions usually do at least one of these:
+
+- make one existing tool more useful in a real eval workflow
+- improve the handoff between `AgentCI`, `TracePack`, `FailMap`, and `PackSlice`
+- make artifact outputs easier to validate, compare, or automate against
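
Reviewer note: the PackSlice roadmap items on reproducible slicing and leakage diagnostics can be made concrete with a small sketch. This is an illustration under stated assumptions, not PackSlice's actual implementation: the names `assign_split` and `find_leaking_groups`, the `seed` value, and the record fields `id` and `group` are all hypothetical. The idea is to derive the split from a stable hash of a grouping key, so repeated runs give the same assignment and records sharing a group cannot land in different splits.

```python
import hashlib
from collections import defaultdict


def assign_split(group_key: str, seed: str = "demo-seed",
                 eval_pct: int = 10, test_pct: int = 10) -> str:
    """Map a grouping key to 'train', 'eval', or 'test' deterministically."""
    # SHA-256 is stable across processes and machines, unlike Python's
    # built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(f"{seed}:{group_key}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + eval_pct:
        return "eval"
    return "train"


def find_leaking_groups(records: list[dict]) -> dict[str, set[str]]:
    """Return groups whose records appear in more than one split."""
    splits_by_group: dict[str, set[str]] = defaultdict(set)
    for record in records:
        splits_by_group[record["group"]].add(record["split"])
    return {g: s for g, s in splits_by_group.items() if len(s) > 1}


if __name__ == "__main__":
    # Hypothetical records: several runs can share one group (e.g. a session).
    runs = [
        {"id": "run-1", "group": "session-a"},
        {"id": "run-2", "group": "session-a"},
        {"id": "run-3", "group": "session-b"},
    ]
    for run in runs:
        run["split"] = assign_split(run["group"])
    print(runs)
    print("leaking groups:", find_leaking_groups(runs))
```

Hashing the group rather than the record id trades exact split proportions for a leakage guarantee; with few groups the realized percentages can drift from `eval_pct` and `test_pct`, which is the kind of balance diagnostic the roadmap item points at.
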