Commit 02c5200

weiyiweiyi authored and committed
Rename the repo to better match the actual product scope

The public repository has grown beyond an eval-only story: it now covers replay, regression testing, trace packaging, failure analysis, and dataset slicing for LLM agents. The old name still undersold that broader reliability workflow, so this change renames the repo surface to AgentReliabilityKit and aligns the visible docs, assets, and automation paths with the new positioning.

Constraint: Keep the existing tool names and artifact contracts stable while clarifying the monorepo brand
Rejected: Keep the AgentEvalKit name and only tweak the description | still too narrow for the shipped functionality
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep future top-level naming and copy aligned with the full reliability loop, not just evaluation in isolation
Tested: AgentCI / TracePack / PackSlice test suites; root automation demo script; GitHub repo rename and issue closure verification
Not-tested: FailMap suite on this final rename-only pass

1 parent 7324eaa commit 02c5200

15 files changed

Lines changed: 57 additions & 57 deletions
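
Because the rename touches docs, assets, and automation paths at once, one way to check for leftovers is to scan the working tree for the old spellings. The following is a minimal sketch and not part of the commit: `find_stale_references` and `OLD_NAMES` are hypothetical names, and the scan only looks at file contents, not at file names such as the renamed SVG assets.

```python
from pathlib import Path

# Both spellings of the old name that appear in this commit's diff.
OLD_NAMES = ("AgentEvalKit", "agentevalkit")


def find_stale_references(root: Path, old_names=OLD_NAMES):
    """Return (relative path, line number, stripped line) for lines that still use an old name."""
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the .git metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file, nothing to report
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(name in line for name in old_names):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

Calling `find_stale_references(Path("."))` from the repository root and printing each tuple gives a quick list of places to review. Some hits may be intentional, for example changelog entries that describe the old name.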

.github/FUNDING.yml

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@ liberapay: ""
  issuehunt: ""
  otechie: ""
  custom:
- - https://github.com/Jasvina/AgentEvalKit
+ - https://github.com/Jasvina/AgentReliabilityKit

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 3 additions & 3 deletions
@@ -1,11 +1,11 @@
  blank_issues_enabled: false
  contact_links:
  - name: Roadmap and project direction
- url: https://github.com/Jasvina/AgentEvalKit/blob/main/ROADMAP.md
+ url: https://github.com/Jasvina/AgentReliabilityKit/blob/main/ROADMAP.md
  about: Read the public roadmap before proposing broad new directions.
  - name: Discussions
- url: https://github.com/Jasvina/AgentEvalKit/discussions
+ url: https://github.com/Jasvina/AgentReliabilityKit/discussions
  about: Use Discussions for open-ended questions, ideas, and design conversations.
  - name: Security reporting
- url: https://github.com/Jasvina/AgentEvalKit/blob/main/SECURITY.md
+ url: https://github.com/Jasvina/AgentReliabilityKit/blob/main/SECURITY.md
  about: Please report undisclosed security issues privately.

.github/ISSUE_TEMPLATE/feature_request.md

Lines changed: 1 addition & 1 deletion
@@ -27,6 +27,6 @@ Describe the smallest useful feature that would address it.

  What file, output, or CLI shape would this add or change?

- ## Why this belongs in AgentEvalKit
+ ## Why this belongs in AgentReliabilityKit

  Explain why this strengthens the eval / regression / failure-analysis story rather than adding a generic demo.

.github/workflows/ci.yml

Lines changed: 5 additions & 5 deletions
@@ -128,14 +128,14 @@ jobs:
  with:
  python-version: "3.11"
  - name: Run monorepo automation demo
- run: ./scripts/run_automation_demo.sh /tmp/agentevalkit-automation-demo
+ run: ./scripts/run_automation_demo.sh /tmp/agentreliabilitykit-automation-demo
  - name: Validate automation outputs
  run: |
  python - <<'PY'
  import json
  from pathlib import Path

- out = Path("/tmp/agentevalkit-automation-demo")
+ out = Path("/tmp/agentreliabilitykit-automation-demo")
  assert (out / "manifest.json").exists()
  assert (out / "agentci-summary.json").exists()
  assert (out / "agentci-regression.json").exists()
@@ -144,7 +144,7 @@ jobs:
  assert (out / "packslice" / "summary.json").exists()

  demo_manifest = json.loads((out / "manifest.json").read_text())
- assert demo_manifest["format"] == "agentevalkit-demo-v1"
+ assert demo_manifest["format"] == "agentreliabilitykit-demo-v1"
  assert demo_manifest["summary"]["agentci"]["regression_passed"] is True

  agentci_summary = json.loads((out / "agentci-summary.json").read_text())
@@ -167,5 +167,5 @@ jobs:
  - name: Upload automation demo artifacts
  uses: actions/upload-artifact@v4
  with:
- name: agentevalkit-automation-demo
- path: /tmp/agentevalkit-automation-demo
+ name: agentreliabilitykit-automation-demo
+ path: /tmp/agentreliabilitykit-automation-demo
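
The inline validation step in the workflow stops at the first failed `assert`. For local debugging it can be handy to see every problem at once. The sketch below is a hypothetical standalone variant, not code from the repository: `check_demo_output`, `EXPECTED_FORMAT`, and `REQUIRED_FILES` are illustrative names, and it covers only the artifact paths and manifest keys visible in the hunks of this diff (the real workflow checks more than is shown here).

```python
import json
from pathlib import Path

# New format identifier introduced by this commit.
EXPECTED_FORMAT = "agentreliabilitykit-demo-v1"

# Only the artifact paths visible in the diff hunks; the full workflow checks others too.
REQUIRED_FILES = (
    "manifest.json",
    "agentci-summary.json",
    "agentci-regression.json",
    "packslice/summary.json",
)


def check_demo_output(out: Path) -> list[str]:
    """Collect every problem found instead of stopping at the first one."""
    problems = [f"missing file: {name}" for name in REQUIRED_FILES if not (out / name).exists()]
    manifest_path = out / "manifest.json"
    if not manifest_path.exists():
        return problems  # nothing more to inspect without the manifest
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    if manifest.get("format") != EXPECTED_FORMAT:
        problems.append(f"unexpected format: {manifest.get('format')!r}")
    regression_passed = manifest.get("summary", {}).get("agentci", {}).get("regression_passed")
    if regression_passed is not True:
        problems.append("summary.agentci.regression_passed is not true")
    return problems
```

An empty list means the checked subset looks healthy. A manifest produced before the rename would show up as an `unexpected format` entry rather than an assertion traceback.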

CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
  # Changelog

- All notable changes to `AgentEvalKit` will be documented in this file.
+ All notable changes to `AgentReliabilityKit` will be documented in this file.

  The project currently tracks a single public line of development on `main`, with GitHub releases used to mark meaningful public milestones in the repo's evolution.

@@ -10,7 +10,7 @@ Initial public toolkit release for the repo in its current form.

  ### Added

- - clarified monorepo positioning as `AgentEvalKit`
+ - clarified monorepo positioning as `AgentReliabilityKit`
  - root automation demo with machine-readable `manifest.json`
  - GitHub-facing repository docs and community health files
  - issue templates, PR template, roadmap, labels, Discussions, funding metadata, and code ownership metadata

CODE_OF_CONDUCT.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

  ## Our Commitment

- We want `AgentEvalKit` to be a useful, welcoming open-source project for people working on agent evals, infrastructure, reliability, and research tooling.
+ We want `AgentReliabilityKit` to be a useful, welcoming open-source project for people working on agent evals, infrastructure, reliability, and research tooling.

  Contributors, maintainers, and community members are expected to keep interactions respectful, constructive, and focused on improving the work.

CONTRIBUTING.md

Lines changed: 5 additions & 5 deletions
@@ -1,6 +1,6 @@
- # Contributing to AgentEvalKit
+ # Contributing to AgentReliabilityKit

- Thanks for checking out `AgentEvalKit`.
+ Thanks for checking out `AgentReliabilityKit`.

  This monorepo is intentionally narrow: each project should solve a concrete gap in agent reproducibility, regression testing, failure analysis, or benchmark preparation. Contributions are most useful when they strengthen that end-to-end story instead of adding unrelated demos.

@@ -65,7 +65,7 @@ For monorepo automation checks, the root demo script is often the fastest way to

  ```bash
  chmod +x scripts/run_automation_demo.sh
- ./scripts/run_automation_demo.sh /tmp/agentevalkit-demo
+ ./scripts/run_automation_demo.sh /tmp/agentreliabilitykit-demo
  ```

  That demo now writes a root `manifest.json` alongside the per-tool artifacts, which is the best single file to inspect when you want to confirm the end-to-end handoff shape.
@@ -117,7 +117,7 @@ cd projects/packslice && python -m unittest discover -s tests -v
  End-to-end validation:

  ```bash
- ./scripts/run_automation_demo.sh /tmp/agentevalkit-demo
+ ./scripts/run_automation_demo.sh /tmp/agentreliabilitykit-demo
  ```

  If you change CLI output that is documented in the README, examples, or CI workflow, update those references in the same pull request.
@@ -164,6 +164,6 @@ If you want to propose a new project for the monorepo, start by describing:
  - the missing workflow in today's agent tooling
  - why the problem is not already well served by existing OSS
  - the minimal artifact contract and CLI that would make it useful
- - how it would connect to the rest of `AgentEvalKit`
+ - how it would connect to the rest of `AgentReliabilityKit`

  The best proposals usually start small: one tight workflow, one useful artifact, one clear CLI, and one obvious connection to the rest of the toolchain.
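
The `ci.yml` diff shows the manifest `format` value changing from `agentevalkit-demo-v1` to `agentreliabilitykit-demo-v1`, so a downstream script pinned to the old string would stop matching. As a sketch under that assumption, a consumer could accept both identifiers during a transition. `load_demo_manifest` and `KNOWN_FORMATS` are hypothetical names, and nothing in the commit says the project itself supports the old identifier going forward.

```python
import json
from pathlib import Path

# Format identifiers before and after this commit, as seen in the ci.yml diff.
KNOWN_FORMATS = {"agentevalkit-demo-v1", "agentreliabilitykit-demo-v1"}


def load_demo_manifest(out_dir: Path) -> dict:
    """Load the root manifest.json and reject format identifiers we do not recognize."""
    manifest = json.loads((out_dir / "manifest.json").read_text(encoding="utf-8"))
    fmt = manifest.get("format")
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unrecognized manifest format: {fmt!r}")
    return manifest
```

For example, `load_demo_manifest(Path("/tmp/agentreliabilitykit-demo"))` would read the manifest written by the demo script invocation shown above. Dropping the old identifier from `KNOWN_FORMATS` later turns stale artifacts into an explicit error.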

README.md

Lines changed: 11 additions & 11 deletions
@@ -1,12 +1,12 @@
- # AgentEvalKit
+ # AgentReliabilityKit

- [![CI](https://github.com/Jasvina/AgentEvalKit/actions/workflows/ci.yml/badge.svg)](https://github.com/Jasvina/AgentEvalKit/actions/workflows/ci.yml)
- [![License](https://img.shields.io/github/license/Jasvina/AgentEvalKit)](LICENSE)
- [![Monorepo](https://img.shields.io/badge/layout-agent%20tooling%20monorepo-0a7bbb)](https://github.com/Jasvina/AgentEvalKit)
+ [![CI](https://github.com/Jasvina/AgentReliabilityKit/actions/workflows/ci.yml/badge.svg)](https://github.com/Jasvina/AgentReliabilityKit/actions/workflows/ci.yml)
+ [![License](https://img.shields.io/github/license/Jasvina/AgentReliabilityKit)](LICENSE)
+ [![Monorepo](https://img.shields.io/badge/layout-agent%20tooling%20monorepo-0a7bbb)](https://github.com/Jasvina/AgentReliabilityKit)

  Open-source tooling for agent evals, regression testing, trace packaging, failure clustering, and dataset slicing.

- `AgentEvalKit` is a focused monorepo for a specific gap in the LLM agent stack: teams can build agents, but still struggle to replay failures, turn real traces into reusable eval assets, cluster recurring failure modes, and produce stable train/eval/test slices from the same evidence.
+ `AgentReliabilityKit` is a focused monorepo for a specific gap in the LLM agent stack: teams can build agents, but still struggle to replay failures, turn real traces into reusable eval assets, cluster recurring failure modes, and produce stable train/eval/test slices from the same evidence.

  ## Why this exists

@@ -20,7 +20,7 @@ This repo is built around that loop:
  4. cluster repeated failures across runs or releases
  5. slice the same artifact into reproducible datasets

- That makes `AgentEvalKit` closer to an eval-and-reliability toolkit than a general agent framework.
+ That makes `AgentReliabilityKit` closer to an eval-and-reliability toolkit than a general agent framework.

  ## What you get

@@ -33,7 +33,7 @@ That makes `AgentEvalKit` closer to an eval-and-reliability toolkit than a gener
  ## Toolchain at a glance

  <p align="center">
- <img src="docs/assets/agentevalkit-overview.svg" alt="AgentEvalKit architecture overview" width="100%" />
+ <img src="docs/assets/agentreliabilitykit-overview.svg" alt="AgentReliabilityKit architecture overview" width="100%" />
  </p>

  ```text
@@ -46,13 +46,13 @@ PackSlice -> split packs into balanced train/eval/test datasets
  ## What the demo produces

  <p align="center">
- <img src="docs/assets/agentevalkit-demo-terminal.svg" alt="AgentEvalKit terminal-style demo output" width="100%" />
+ <img src="docs/assets/agentreliabilitykit-demo-terminal.svg" alt="AgentReliabilityKit terminal-style demo output" width="100%" />
  </p>

  Run the end-to-end repo demo with:

  ```bash
- ./scripts/run_automation_demo.sh /tmp/agentevalkit-demo
+ ./scripts/run_automation_demo.sh /tmp/agentreliabilitykit-demo
  ```

  The output is intentionally machine-readable. A successful run gives you a root `manifest.json` plus per-tool artifacts:
@@ -183,7 +183,7 @@ The most useful agent infra repos are usually:
  3. demoable in a few minutes
  4. useful to both researchers and production teams

- `AgentEvalKit` is built around that rule.
+ `AgentReliabilityKit` is built around that rule.

  ## Monorepo structure

@@ -207,7 +207,7 @@ projects/
  - public roadmap: `ROADMAP.md`
  - changelog: `CHANGELOG.md`
  - discussions: GitHub Discussions
- - social preview source: `docs/assets/agentevalkit-social-preview.svg`
+ - social preview source: `docs/assets/agentreliabilitykit-social-preview.svg`

  ## Roadmap
213213

ROADMAP.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
  # Roadmap

- `AgentEvalKit` is a toolkit for the agent reliability loop:
+ `AgentReliabilityKit` is a toolkit for the agent reliability loop:

  1. capture real runs
  2. replay and diff them

SECURITY.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

  ## Scope

- `AgentEvalKit` is a public toolkit for agent eval, regression testing, trace packaging, failure clustering, and dataset slicing. Security reports are especially helpful when they involve:
+ `AgentReliabilityKit` is a public toolkit for agent eval, regression testing, trace packaging, failure clustering, and dataset slicing. Security reports are especially helpful when they involve:

  - secret leakage or incomplete redaction in `TracePack`
  - unsafe artifact handling in `AgentCI`, `FailMap`, or `PackSlice`
