Rename the repo to better match the actual product scope

The public repository has grown beyond an eval-only story: it now covers
replay, regression testing, trace packaging, failure analysis, and
dataset slicing for LLM agents. The old name still undersold that broader
reliability workflow, so this change renames the repo surface to
AgentReliabilityKit and aligns the visible docs, assets, and automation
paths with the new positioning.

Constraint: Keep the existing tool names and artifact contracts stable while clarifying the monorepo brand
Rejected: Keep the AgentEvalKit name and only tweak the description | still too narrow for the shipped functionality
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep future top-level naming and copy aligned with the full reliability loop, not just evaluation in isolation
Tested: AgentCI / TracePack / PackSlice test suites; root automation demo script; GitHub repo rename and issue closure verification
Not-tested: FailMap suite on this final rename-only pass
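
A rename-only pass like this one can be double-checked by scanning the touched docs for leftover uses of the old name. The sketch below is illustrative only and is not part of the repository: `find_stale_references` and `rename_references` are hypothetical helpers, and the sample file contents are copied from the diffs in this commit. Because it matches the exact brand string, tool names such as `AgentCI`, `TracePack`, `FailMap`, and `PackSlice` are left alone, in line with the stated constraint.

```python
from __future__ import annotations

import re

# The old and new monorepo names, as given in the commit message.
OLD_NAME = "AgentEvalKit"
NEW_NAME = "AgentReliabilityKit"


def find_stale_references(
    files: dict[str, str], old_name: str = OLD_NAME
) -> list[tuple[str, int, str]]:
    """Return (filename, line_number, line) for each line that still uses old_name.

    Hypothetical helper: `files` maps a file name to its full text.
    """
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    hits = []
    for filename, text in sorted(files.items()):
        for line_number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((filename, line_number, line.strip()))
    return hits


def rename_references(
    text: str, old_name: str = OLD_NAME, new_name: str = NEW_NAME
) -> str:
    """Replace whole-word occurrences of old_name with new_name (hypothetical helper)."""
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    return pattern.sub(new_name, text)


# Sample inputs taken from the diffs in this commit.
docs = {
    "CHANGELOG.md": "# Changelog\n\nAll notable changes to `AgentEvalKit` will be documented in this file.\n",
    "SECURITY.md": "- unsafe artifact handling in `AgentCI`, `FailMap`, or `PackSlice`\n",
}

stale = find_stale_references(docs)
fixed = {name: rename_references(text) for name, text in docs.items()}
```

Running `find_stale_references` again on `fixed` should come back empty, which gives a quick signal that no visible doc still carries the old brand.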
CHANGELOG.md: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Changelog

-All notable changes to `AgentEvalKit` will be documented in this file.
+All notable changes to `AgentReliabilityKit` will be documented in this file.

 The project currently tracks a single public line of development on `main`, with GitHub releases used to mark meaningful public milestones in the repo's evolution.

@@ -10,7 +10,7 @@ Initial public toolkit release for the repo in its current form.

 ### Added

-- clarified monorepo positioning as `AgentEvalKit`
+- clarified monorepo positioning as `AgentReliabilityKit`
 - root automation demo with machine-readable `manifest.json`
 - GitHub-facing repository docs and community health files
CODE_OF_CONDUCT.md: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 ## Our Commitment

-We want `AgentEvalKit` to be a useful, welcoming open-source project for people working on agent evals, infrastructure, reliability, and research tooling.
+We want `AgentReliabilityKit` to be a useful, welcoming open-source project for people working on agent evals, infrastructure, reliability, and research tooling.

 Contributors, maintainers, and community members are expected to keep interactions respectful, constructive, and focused on improving the work.
CONTRIBUTING.md: 5 additions & 5 deletions
@@ -1,6 +1,6 @@
-# Contributing to AgentEvalKit
+# Contributing to AgentReliabilityKit

-Thanks for checking out `AgentEvalKit`.
+Thanks for checking out `AgentReliabilityKit`.

 This monorepo is intentionally narrow: each project should solve a concrete gap in agent reproducibility, regression testing, failure analysis, or benchmark preparation. Contributions are most useful when they strengthen that end-to-end story instead of adding unrelated demos.

@@ -65,7 +65,7 @@ For monorepo automation checks, the root demo script is often the fastest way to
 That demo now writes a root `manifest.json` alongside the per-tool artifacts, which is the best single file to inspect when you want to confirm the end-to-end handoff shape.
 If you change CLI output that is documented in the README, examples, or CI workflow, update those references in the same pull request.
@@ -164,6 +164,6 @@ If you want to propose a new project for the monorepo, start by describing:
 - the missing workflow in today's agent tooling
 - why the problem is not already well served by existing OSS
 - the minimal artifact contract and CLI that would make it useful
-- how it would connect to the rest of `AgentEvalKit`
+- how it would connect to the rest of `AgentReliabilityKit`

 The best proposals usually start small: one tight workflow, one useful artifact, one clear CLI, and one obvious connection to the rest of the toolchain.
 Open-source tooling for agent evals, regression testing, trace packaging, failure clustering, and dataset slicing.

-`AgentEvalKit` is a focused monorepo for a specific gap in the LLM agent stack: teams can build agents, but still struggle to replay failures, turn real traces into reusable eval assets, cluster recurring failure modes, and produce stable train/eval/test slices from the same evidence.
+`AgentReliabilityKit` is a focused monorepo for a specific gap in the LLM agent stack: teams can build agents, but still struggle to replay failures, turn real traces into reusable eval assets, cluster recurring failure modes, and produce stable train/eval/test slices from the same evidence.

 ## Why this exists

@@ -20,7 +20,7 @@ This repo is built around that loop:
 4. cluster repeated failures across runs or releases
 5. slice the same artifact into reproducible datasets

-That makes `AgentEvalKit` closer to an eval-and-reliability toolkit than a general agent framework.
+That makes `AgentReliabilityKit` closer to an eval-and-reliability toolkit than a general agent framework.

 ## What you get

@@ -33,7 +33,7 @@ That makes `AgentEvalKit` closer to an eval-and-reliability toolkit than a gener
SECURITY.md: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 ## Scope

-`AgentEvalKit` is a public toolkit for agent eval, regression testing, trace packaging, failure clustering, and dataset slicing. Security reports are especially helpful when they involve:
+`AgentReliabilityKit` is a public toolkit for agent eval, regression testing, trace packaging, failure clustering, and dataset slicing. Security reports are especially helpful when they involve:

 - secret leakage or incomplete redaction in `TracePack`
 - unsafe artifact handling in `AgentCI`, `FailMap`, or `PackSlice`