
Commit 07a5800

weiyiweiyi authored and committed
Turn the GitHub repo into a better public collaboration surface
The repository already had stronger docs and community-health files, but the public contribution flow still stopped at issues and pull requests. This adds a dedicated roadmap, issue-template routing, labels, and Discussions so the GitHub repo can support open-ended product conversation without diluting the artifact-focused issue tracker.

Constraint: Keep GitHub collaboration centered on the eval toolchain and avoid generic community process overhead
Rejected: Add many issue forms before a roadmap/discussion surface existed | creates intake without direction
Rejected: Leave broad proposals in regular issues only | mixes product discussion with reproducible bug tracking
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep issues optimized for reproducible defects and artifact contracts; route open-ended direction questions toward roadmap and Discussions
Tested: Enabled GitHub Discussions; created and verified custom labels; verified issue template config and roadmap files; reran root demo script
Not-tested: Manual creation of discussion threads in the GitHub UI
1 parent 2cd6640 commit 07a5800

4 files changed

Lines changed: 78 additions & 0 deletions


.github/ISSUE_TEMPLATE/config.yml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+blank_issues_enabled: false
+contact_links:
+  - name: Roadmap and project direction
+    url: https://github.com/Jasvina/AgentEvalKit/blob/main/ROADMAP.md
+    about: Read the public roadmap before proposing broad new directions.
+  - name: Discussions
+    url: https://github.com/Jasvina/AgentEvalKit/discussions
+    about: Use Discussions for open-ended questions, ideas, and design conversations.
+  - name: Security reporting
+    url: https://github.com/Jasvina/AgentEvalKit/blob/main/SECURITY.md
+    about: Please report undisclosed security issues privately.
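
The commit message lists "verified issue template config" under Tested without saying how the check was done. One lightweight way to do such a check is sketched below, assuming the YAML has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`, a third-party package). The function name `check_issue_template_config` is hypothetical, the required keys are simply the three each contact link in this commit uses, and the `https` rule is a stricter house rule added by the sketch, not a GitHub requirement.

```python
from urllib.parse import urlparse

# Keys every contact link in this commit's config.yml carries.
REQUIRED_LINK_KEYS = ("name", "url", "about")


def check_issue_template_config(config: dict) -> list:
    """Hypothetical checker: return human-readable problems; [] means all checks passed."""
    problems = []
    if not isinstance(config.get("blank_issues_enabled"), bool):
        problems.append("blank_issues_enabled should be a boolean")
    links = config.get("contact_links", [])
    if not isinstance(links, list):
        return problems + ["contact_links should be a list"]
    for i, link in enumerate(links):
        if not isinstance(link, dict):
            problems.append(f"contact_links[{i}] should be a mapping")
            continue
        for key in REQUIRED_LINK_KEYS:
            if not link.get(key):
                problems.append(f"contact_links[{i}] is missing '{key}'")
        url = link.get("url", "")
        # House rule of this sketch (assumption): only accept https links.
        if url and urlparse(url).scheme != "https":
            problems.append(f"contact_links[{i}].url should be an https URL")
    return problems


# The config from this commit, written out as the dict yaml.safe_load would return.
config = {
    "blank_issues_enabled": False,
    "contact_links": [
        {"name": "Roadmap and project direction",
         "url": "https://github.com/Jasvina/AgentEvalKit/blob/main/ROADMAP.md",
         "about": "Read the public roadmap before proposing broad new directions."},
        {"name": "Discussions",
         "url": "https://github.com/Jasvina/AgentEvalKit/discussions",
         "about": "Use Discussions for open-ended questions, ideas, and design conversations."},
        {"name": "Security reporting",
         "url": "https://github.com/Jasvina/AgentEvalKit/blob/main/SECURITY.md",
         "about": "Please report undisclosed security issues privately."},
    ],
}

print(check_issue_template_config(config))  # → []
```

Returning a list of problems rather than raising on the first one lets a CI step report every issue in a single run.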

CONTRIBUTING.md

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,8 @@ When in doubt, optimize for sharper scope. A small contribution that makes one e

 ## Before you start

+If you are proposing broader direction changes, read `ROADMAP.md` first so suggestions stay aligned with the public scope.
+
 Before writing code, it helps to align on the kind of change you are making:

 - for bug fixes, include a failing test or a concrete reproduction if possible

README.md

Lines changed: 4 additions & 0 deletions
@@ -204,9 +204,13 @@ projects/
 - issue and PR templates: `.github/`
 - security policy: `SECURITY.md`
 - support guidance: `SUPPORT.md`
+- public roadmap: `ROADMAP.md`
+- discussions: GitHub Discussions

 ## Roadmap

+For the longer view, see `ROADMAP.md`.
+
 - add more `AgentCI` integrations and richer HTML diff reports
 - strengthen `TracePack` redaction policies, labeling workflows, and export formats
 - add richer `FailMap` issue templates, trend views, and release-to-release drilldowns

ROADMAP.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# Roadmap
+
+`AgentEvalKit` is a toolkit for the agent reliability loop:
+
+1. capture real runs
+2. replay and diff them
+3. package them into reusable eval artifacts
+4. cluster recurring failures
+5. slice the same evidence into train/eval/test datasets
+
+This roadmap keeps the repo focused on strengthening that loop.
+
+## Near term
+
+### AgentCI
+
+- add more adapters for real agent runtimes
+- improve regression diffs for step-level and metric-level debugging
+- expand flaky-run detection and replay mismatch reporting
+
+### TracePack
+
+- improve redaction coverage and redaction test cases
+- add richer labeling workflows and metadata normalization
+- strengthen export formats for eval and fine-tuning pipelines
+
+### FailMap
+
+- improve release-over-release comparisons
+- add stronger issue draft templates and routing rules
+- expand trend and drilldown reporting for recurring signatures
+
+### PackSlice
+
+- expand label-aware and temporal splitting strategies
+- improve reproducibility guarantees for repeated slicing runs
+- add stronger diagnostics for split balance and leakage risks
+
+## Cross-tool priorities
+
+- keep JSON artifact contracts stable and explicit
+- make root automation outputs easier to consume in CI and dashboards
+- improve monorepo demos so visitors can understand the full toolchain in minutes
+- add more tests that lock the handoff between tools
+
+## Not the current focus
+
+To keep the repo sharp, it is intentionally not prioritizing:
+
+- generic chat demos
+- broad orchestration frameworks
+- memory layers unrelated to eval artifacts
+- open-ended agent abstractions with no reproducible output contract
+
+## Contribution lens
+
+The highest-value additions usually do at least one of these:
+
+- make one existing tool more useful in a real eval workflow
+- improve the handoff between `AgentCI`, `TracePack`, `FailMap`, and `PackSlice`
+- make artifact outputs easier to validate, compare, or automate against
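
The PackSlice item about reproducibility guarantees for repeated slicing runs can be illustrated with a common technique: hash a stable record ID and map the hash to a split, so the same ID always lands in the same split regardless of input order. This is a minimal sketch of that general technique, not PackSlice's actual algorithm; `assign_split`, `slice_records`, the record IDs, the ratios, and the `salt` parameter are all hypothetical.

```python
import hashlib

# Hypothetical split ratios; they should sum to 1.0.
DEFAULT_RATIOS = (("train", 0.8), ("eval", 0.1), ("test", 0.1))


def assign_split(record_id: str, ratios=DEFAULT_RATIOS, salt: str = "v1") -> str:
    """Deterministically map a record ID to a split name via SHA-256."""
    digest = hashlib.sha256(f"{salt}:{record_id}".encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer and scale it into [0, 1).
    position = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, share in ratios:
        cumulative += share
        if position < cumulative:
            return name
    return ratios[-1][0]  # guard against floating-point rounding in the ratios


def slice_records(record_ids):
    """Group record IDs by split; membership is identical on every run."""
    splits = {}
    for rid in record_ids:
        splits.setdefault(assign_split(rid), []).append(rid)
    return splits


ids = [f"run-{i:04d}" for i in range(1000)]
first = slice_records(ids)
second = slice_records(reversed(ids))  # same IDs, different input order
print({name: len(members) for name, members in first.items()})
```

Because each assignment depends only on the salt and the record ID, appending new records never moves existing ones between splits, while changing the salt deliberately reshuffles everything.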
