Rename the repo to better match the actual product scope

weiyi · weiyi · commit 02c5200550d1 · 2026-05-03T23:19:41.000+08:00

The public repository has grown beyond an eval-only story: it now covers
replay, regression testing, trace packaging, failure analysis, and
dataset slicing for LLM agents. The old name, AgentEvalKit, undersold that
broader reliability workflow, so this change renames the repo surface to
AgentReliabilityKit and aligns the visible docs, assets, and automation
paths with the new positioning.

Constraint: Keep the existing tool names and artifact contracts stable while clarifying the monorepo brand
Rejected: Keep the AgentEvalKit name and only tweak the description | still too narrow for the shipped functionality
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep future top-level naming and copy aligned with the full reliability loop, not just evaluation in isolation
Tested: AgentCI / TracePack / PackSlice test suites; root automation demo script; GitHub repo rename and issue closure verification
Not-tested: FailMap suite on this final rename-only pass
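
Migration note: besides docs and paths, this diff changes the root demo manifest `format` identifier from `agentevalkit-demo-v1` to `agentreliabilitykit-demo-v1` (see the `scripts/run_automation_demo.sh` and `.github/workflows/ci.yml` hunks). Downstream automation that pins the old string may want a transitional check. The following is a minimal sketch, not code from this repository: the helper names `check_manifest_format` and `load_demo_manifest` and the `allow_legacy` switch are hypothetical, and only the two format strings and the `manifest.json` file name come from the diff.

```python
import json
from pathlib import Path

# Both identifiers appear in this diff: the old one is removed, the new one added.
NEW_FORMAT = "agentreliabilitykit-demo-v1"
OLD_FORMAT = "agentevalkit-demo-v1"


def check_manifest_format(manifest: dict, allow_legacy: bool = True) -> str:
    """Return the manifest's format string, or raise ValueError if unsupported.

    Hypothetical helper for downstream consumers; not part of this repo.
    """
    fmt = manifest.get("format")
    accepted = {NEW_FORMAT, OLD_FORMAT} if allow_legacy else {NEW_FORMAT}
    if fmt not in accepted:
        raise ValueError(f"unsupported manifest format: {fmt!r}")
    return fmt


def load_demo_manifest(output_dir: str, allow_legacy: bool = True) -> dict:
    """Load <output_dir>/manifest.json and verify its format identifier."""
    manifest = json.loads((Path(output_dir) / "manifest.json").read_text())
    check_manifest_format(manifest, allow_legacy=allow_legacy)
    return manifest
```

Once every producer emits the new identifier, passing `allow_legacy=False` turns the check back into a strict pin, which mirrors the assertion in the updated CI workflow.
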
diff --git a/.github/FUNDING.yml b/.github/FUNDING.yml
@@ -8,4 +8,4 @@ liberapay: ""
 issuehunt: ""
 otechie: ""
 custom:
-  - https://github.com/Jasvina/AgentEvalKit
+  - https://github.com/Jasvina/AgentReliabilityKit
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
@@ -1,11 +1,11 @@
 blank_issues_enabled: false
 contact_links:
   - name: Roadmap and project direction
-    url: https://github.com/Jasvina/AgentEvalKit/blob/main/ROADMAP.md
+    url: https://github.com/Jasvina/AgentReliabilityKit/blob/main/ROADMAP.md
     about: Read the public roadmap before proposing broad new directions.
   - name: Discussions
-    url: https://github.com/Jasvina/AgentEvalKit/discussions
+    url: https://github.com/Jasvina/AgentReliabilityKit/discussions
     about: Use Discussions for open-ended questions, ideas, and design conversations.
   - name: Security reporting
-    url: https://github.com/Jasvina/AgentEvalKit/blob/main/SECURITY.md
+    url: https://github.com/Jasvina/AgentReliabilityKit/blob/main/SECURITY.md
     about: Please report undisclosed security issues privately.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -27,6 +27,6 @@ Describe the smallest useful feature that would address it.
 
 What file, output, or CLI shape would this add or change?
 
-## Why this belongs in AgentEvalKit
+## Why this belongs in AgentReliabilityKit
 
 Explain why this strengthens the eval / regression / failure-analysis story rather than adding a generic demo.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -128,14 +128,14 @@ jobs:
         with:
           python-version: "3.11"
       - name: Run monorepo automation demo
-        run: ./scripts/run_automation_demo.sh /tmp/agentevalkit-automation-demo
+        run: ./scripts/run_automation_demo.sh /tmp/agentreliabilitykit-automation-demo
       - name: Validate automation outputs
         run: |
           python - <<'PY'
           import json
           from pathlib import Path
 
-          out = Path("/tmp/agentevalkit-automation-demo")
+          out = Path("/tmp/agentreliabilitykit-automation-demo")
           assert (out / "manifest.json").exists()
           assert (out / "agentci-summary.json").exists()
           assert (out / "agentci-regression.json").exists()
@@ -144,7 +144,7 @@ jobs:
           assert (out / "packslice" / "summary.json").exists()
 
           demo_manifest = json.loads((out / "manifest.json").read_text())
-          assert demo_manifest["format"] == "agentevalkit-demo-v1"
+          assert demo_manifest["format"] == "agentreliabilitykit-demo-v1"
           assert demo_manifest["summary"]["agentci"]["regression_passed"] is True
 
           agentci_summary = json.loads((out / "agentci-summary.json").read_text())
@@ -167,5 +167,5 @@ jobs:
       - name: Upload automation demo artifacts
         uses: actions/upload-artifact@v4
         with:
-          name: agentevalkit-automation-demo
-          path: /tmp/agentevalkit-automation-demo
+          name: agentreliabilitykit-automation-demo
+          path: /tmp/agentreliabilitykit-automation-demo
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,6 +1,6 @@
 # Changelog
 
-All notable changes to `AgentEvalKit` will be documented in this file.
+All notable changes to `AgentReliabilityKit` will be documented in this file.
 
 The project currently tracks a single public line of development on `main`, with GitHub releases used to mark meaningful public milestones in the repo's evolution.
 
@@ -10,7 +10,7 @@ Initial public toolkit release for the repo in its current form.
 
 ### Added
 
-- clarified monorepo positioning as `AgentEvalKit`
+- clarified monorepo positioning as `AgentReliabilityKit`
 - root automation demo with machine-readable `manifest.json`
 - GitHub-facing repository docs and community health files
 - issue templates, PR template, roadmap, labels, Discussions, funding metadata, and code ownership metadata
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
@@ -2,7 +2,7 @@
 
 ## Our Commitment
 
-We want `AgentEvalKit` to be a useful, welcoming open-source project for people working on agent evals, infrastructure, reliability, and research tooling.
+We want `AgentReliabilityKit` to be a useful, welcoming open-source project for people working on agent evals, infrastructure, reliability, and research tooling.
 
 Contributors, maintainers, and community members are expected to keep interactions respectful, constructive, and focused on improving the work.
 
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,6 +1,6 @@
-# Contributing to AgentEvalKit
+# Contributing to AgentReliabilityKit
 
-Thanks for checking out `AgentEvalKit`.
+Thanks for checking out `AgentReliabilityKit`.
 
 This monorepo is intentionally narrow: each project should solve a concrete gap in agent reproducibility, regression testing, failure analysis, or benchmark preparation. Contributions are most useful when they strengthen that end-to-end story instead of adding unrelated demos.
 
@@ -65,7 +65,7 @@ For monorepo automation checks, the root demo script is often the fastest way to
 
 ```bash
 chmod +x scripts/run_automation_demo.sh
-./scripts/run_automation_demo.sh /tmp/agentevalkit-demo
+./scripts/run_automation_demo.sh /tmp/agentreliabilitykit-demo
 ```
 
 That demo now writes a root `manifest.json` alongside the per-tool artifacts, which is the best single file to inspect when you want to confirm the end-to-end handoff shape.
@@ -117,7 +117,7 @@ cd projects/packslice && python -m unittest discover -s tests -v
 End-to-end validation:
 
 ```bash
-./scripts/run_automation_demo.sh /tmp/agentevalkit-demo
+./scripts/run_automation_demo.sh /tmp/agentreliabilitykit-demo
 ```
 
 If you change CLI output that is documented in the README, examples, or CI workflow, update those references in the same pull request.
@@ -164,6 +164,6 @@ If you want to propose a new project for the monorepo, start by describing:
 - the missing workflow in today's agent tooling
 - why the problem is not already well served by existing OSS
 - the minimal artifact contract and CLI that would make it useful
-- how it would connect to the rest of `AgentEvalKit`
+- how it would connect to the rest of `AgentReliabilityKit`
 
 The best proposals usually start small: one tight workflow, one useful artifact, one clear CLI, and one obvious connection to the rest of the toolchain.
diff --git a/README.md b/README.md
@@ -1,12 +1,12 @@
-# AgentEvalKit
+# AgentReliabilityKit
 
-[![CI](https://github.com/Jasvina/AgentEvalKit/actions/workflows/ci.yml/badge.svg)](https://github.com/Jasvina/AgentEvalKit/actions/workflows/ci.yml)
-[![License](https://img.shields.io/github/license/Jasvina/AgentEvalKit)](LICENSE)
-[![Monorepo](https://img.shields.io/badge/layout-agent%20tooling%20monorepo-0a7bbb)](https://github.com/Jasvina/AgentEvalKit)
+[![CI](https://github.com/Jasvina/AgentReliabilityKit/actions/workflows/ci.yml/badge.svg)](https://github.com/Jasvina/AgentReliabilityKit/actions/workflows/ci.yml)
+[![License](https://img.shields.io/github/license/Jasvina/AgentReliabilityKit)](LICENSE)
+[![Monorepo](https://img.shields.io/badge/layout-agent%20tooling%20monorepo-0a7bbb)](https://github.com/Jasvina/AgentReliabilityKit)
 
 Open-source tooling for agent evals, regression testing, trace packaging, failure clustering, and dataset slicing.
 
-`AgentEvalKit` is a focused monorepo for a specific gap in the LLM agent stack: teams can build agents, but still struggle to replay failures, turn real traces into reusable eval assets, cluster recurring failure modes, and produce stable train/eval/test slices from the same evidence.
+`AgentReliabilityKit` is a focused monorepo for a specific gap in the LLM agent stack: teams can build agents, but still struggle to replay failures, turn real traces into reusable eval assets, cluster recurring failure modes, and produce stable train/eval/test slices from the same evidence.
 
 ## Why this exists
 
@@ -20,7 +20,7 @@ This repo is built around that loop:
 4. cluster repeated failures across runs or releases
 5. slice the same artifact into reproducible datasets
 
-That makes `AgentEvalKit` closer to an eval-and-reliability toolkit than a general agent framework.
+That makes `AgentReliabilityKit` closer to an eval-and-reliability toolkit than a general agent framework.
 
 ## What you get
 
@@ -33,7 +33,7 @@ That makes `AgentEvalKit` closer to an eval-and-reliability toolkit than a gener
 ## Toolchain at a glance
 
 <p align="center">
-  <img src="docs/assets/agentevalkit-overview.svg" alt="AgentEvalKit architecture overview" width="100%" />
+  <img src="docs/assets/agentreliabilitykit-overview.svg" alt="AgentReliabilityKit architecture overview" width="100%" />
 </p>
 
 ```text
@@ -46,13 +46,13 @@ PackSlice -> split packs into balanced train/eval/test datasets
 ## What the demo produces
 
 <p align="center">
-  <img src="docs/assets/agentevalkit-demo-terminal.svg" alt="AgentEvalKit terminal-style demo output" width="100%" />
+  <img src="docs/assets/agentreliabilitykit-demo-terminal.svg" alt="AgentReliabilityKit terminal-style demo output" width="100%" />
 </p>
 
 Run the end-to-end repo demo with:
 
 ```bash
-./scripts/run_automation_demo.sh /tmp/agentevalkit-demo
+./scripts/run_automation_demo.sh /tmp/agentreliabilitykit-demo
 ```
 
 The output is intentionally machine-readable. A successful run gives you a root `manifest.json` plus per-tool artifacts:
@@ -183,7 +183,7 @@ The most useful agent infra repos are usually:
 3. demoable in a few minutes
 4. useful to both researchers and production teams
 
-`AgentEvalKit` is built around that rule.
+`AgentReliabilityKit` is built around that rule.
 
 ## Monorepo structure
 
@@ -207,7 +207,7 @@ projects/
 - public roadmap: `ROADMAP.md`
 - changelog: `CHANGELOG.md`
 - discussions: GitHub Discussions
-- social preview source: `docs/assets/agentevalkit-social-preview.svg`
+- social preview source: `docs/assets/agentreliabilitykit-social-preview.svg`
 
 ## Roadmap
 
diff --git a/ROADMAP.md b/ROADMAP.md
@@ -1,6 +1,6 @@
 # Roadmap
 
-`AgentEvalKit` is a toolkit for the agent reliability loop:
+`AgentReliabilityKit` is a toolkit for the agent reliability loop:
 
 1. capture real runs
 2. replay and diff them
diff --git a/SECURITY.md b/SECURITY.md
@@ -2,7 +2,7 @@
 
 ## Scope
 
-`AgentEvalKit` is a public toolkit for agent eval, regression testing, trace packaging, failure clustering, and dataset slicing. Security reports are especially helpful when they involve:
+`AgentReliabilityKit` is a public toolkit for agent eval, regression testing, trace packaging, failure clustering, and dataset slicing. Security reports are especially helpful when they involve:
 
 - secret leakage or incomplete redaction in `TracePack`
 - unsafe artifact handling in `AgentCI`, `FailMap`, or `PackSlice`
diff --git a/docs/assets/agentreliabilitykit-demo-terminal.svg b/docs/assets/agentreliabilitykit-demo-terminal.svg
@@ -1,6 +1,6 @@
 <svg width="1280" height="900" viewBox="0 0 1280 900" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="title desc">
-  <title id="title">AgentEvalKit quick demo output</title>
-  <desc id="desc">A terminal-style screenshot showing the AgentEvalKit automation demo command, generated artifacts, and summary metrics from AgentCI, TracePack, FailMap, and PackSlice.</desc>
+  <title id="title">AgentReliabilityKit quick demo output</title>
+  <desc id="desc">A terminal-style screenshot showing the AgentReliabilityKit automation demo command, generated artifacts, and summary metrics from AgentCI, TracePack, FailMap, and PackSlice.</desc>
   <defs>
     <linearGradient id="bg" x1="0" y1="0" x2="1280" y2="900" gradientUnits="userSpaceOnUse">
       <stop stop-color="#0B1220"/>
@@ -26,18 +26,18 @@
   <circle cx="95" cy="83" r="10" fill="#FF5F57"/>
   <circle cx="127" cy="83" r="10" fill="#FEBC2E"/>
   <circle cx="159" cy="83" r="10" fill="#28C840"/>
-  <text x="550" y="90" fill="#DDE7F5" font-family="Inter, Arial, sans-serif" font-size="22" font-weight="600">AgentEvalKit quick demo</text>
+  <text x="550" y="90" fill="#DDE7F5" font-family="Inter, Arial, sans-serif" font-size="22" font-weight="600">AgentReliabilityKit quick demo</text>
 
   <rect x="92" y="146" width="1096" height="82" rx="18" fill="#0A1322" stroke="#23324B"/>
   <text x="118" y="182" fill="#8BB4FF" font-family="Menlo, Consolas, monospace" font-size="21">$</text>
-  <text x="144" y="182" fill="#E8EEF8" font-family="Menlo, Consolas, monospace" font-size="21">./scripts/run_automation_demo.sh /tmp/agentevalkit-demo</text>
+  <text x="144" y="182" fill="#E8EEF8" font-family="Menlo, Consolas, monospace" font-size="21">./scripts/run_automation_demo.sh /tmp/agentreliabilitykit-demo</text>
   <text x="118" y="212" fill="#8FA6BF" font-family="Inter, Arial, sans-serif" font-size="16">Runs the whole monorepo chain and emits CI-friendly artifacts without scraping human logs.</text>
 
   <text x="92" y="278" fill="#EEF5FF" font-family="Inter, Arial, sans-serif" font-size="28" font-weight="700">Generated outputs</text>
   <text x="92" y="306" fill="#8FA6BF" font-family="Inter, Arial, sans-serif" font-size="16">These are the same outputs the root GitHub Actions workflow validates and uploads as artifacts.</text>
 
   <rect x="92" y="334" width="512" height="234" rx="22" fill="#0A1322" stroke="#22324B"/>
-  <text x="122" y="376" fill="#9CC7FF" font-family="Menlo, Consolas, monospace" font-size="18">/tmp/agentevalkit-demo</text>
+  <text x="122" y="376" fill="#9CC7FF" font-family="Menlo, Consolas, monospace" font-size="18">/tmp/agentreliabilitykit-demo</text>
   <text x="122" y="412" fill="#D6E2F2" font-family="Menlo, Consolas, monospace" font-size="17">├── agentci-summary.json</text>
   <text x="122" y="442" fill="#D6E2F2" font-family="Menlo, Consolas, monospace" font-size="17">├── agentci-regression.json</text>
   <text x="122" y="472" fill="#D6E2F2" font-family="Menlo, Consolas, monospace" font-size="17">├── tracepack-pack/manifest.json</text>
diff --git a/docs/assets/agentreliabilitykit-overview.svg b/docs/assets/agentreliabilitykit-overview.svg
@@ -1,5 +1,5 @@
 <svg width="1280" height="760" viewBox="0 0 1280 760" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="title desc">
-  <title id="title">AgentEvalKit architecture overview</title>
+  <title id="title">AgentReliabilityKit architecture overview</title>
   <desc id="desc">A pipeline diagram showing AgentCI, TracePack, FailMap, and PackSlice connected by portable JSON artifacts and CI automation.</desc>
   <defs>
     <linearGradient id="bg" x1="0" y1="0" x2="1280" y2="760" gradientUnits="userSpaceOnUse">
@@ -37,7 +37,7 @@
   <rect width="1280" height="760" rx="32" fill="url(#bg)"/>
 
   <rect x="48" y="40" width="1184" height="108" rx="24" fill="url(#hero)" filter="url(#shadow)"/>
-  <text x="88" y="88" fill="white" font-family="Inter, Arial, sans-serif" font-size="34" font-weight="700">AgentEvalKit</text>
+  <text x="88" y="88" fill="white" font-family="Inter, Arial, sans-serif" font-size="34" font-weight="700">AgentReliabilityKit</text>
   <text x="88" y="124" fill="#DCEEFF" font-family="Inter, Arial, sans-serif" font-size="18">
     Reproducibility, regression testing, failure clustering, and eval-pack preparation for tool-using LLM agents
   </text>
diff --git a/docs/assets/agentreliabilitykit-social-preview.svg b/docs/assets/agentreliabilitykit-social-preview.svg
@@ -1,6 +1,6 @@
 <svg width="1280" height="640" viewBox="0 0 1280 640" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="title desc">
-  <title id="title">AgentEvalKit social preview</title>
-  <desc id="desc">A GitHub social preview card for AgentEvalKit showing an eval loop, starter status chips, and the four-tool artifact workflow.</desc>
+  <title id="title">AgentReliabilityKit social preview</title>
+  <desc id="desc">A GitHub social preview card for AgentReliabilityKit showing an eval loop, starter status chips, and the four-tool artifact workflow.</desc>
   <defs>
     <linearGradient id="bg" x1="0" y1="0" x2="1280" y2="640" gradientUnits="userSpaceOnUse">
       <stop stop-color="#0B1220"/>
diff --git a/docs/automation.md b/docs/automation.md
@@ -1,6 +1,6 @@
 # Automation Guide
 
-This guide shows how to automate the full `AgentEvalKit` toolchain in a way that works for local scripts, CI jobs, and later dashboards:
+This guide shows how to automate the full `AgentReliabilityKit` toolchain in a way that works for local scripts, CI jobs, and later dashboards:
 
 ```text
 AgentCI   -> record or validate episodes
@@ -35,7 +35,7 @@ For automation, `PYTHONPATH=src python -m ...` is often the simplest because it
 
 Examples in this doc assume:
 
-- repo root: `AgentEvalKit/`
+- repo root: `AgentReliabilityKit/`
 - Python 3.10+
 - commands run from the relevant project directory, or with explicit `cd`
 
@@ -58,14 +58,14 @@ It writes a timestamped output directory under `/tmp` by default and produces:
 To write into a fixed directory instead:
 
 ```bash
-./scripts/run_automation_demo.sh /tmp/agentevalkit-demo
+./scripts/run_automation_demo.sh /tmp/agentreliabilitykit-demo
 ```
 
 ### Root demo manifest contract
 
 The root `manifest.json` produced by `./scripts/run_automation_demo.sh` is intended to be the stable entrypoint for downstream automation. Its current contract is:
 
-- `format`: the top-level manifest format identifier, currently `agentevalkit-demo-v1`
+- `format`: the top-level manifest format identifier, currently `agentreliabilitykit-demo-v1`
 - `generated_at`: UTC timestamp for the demo run
 - `artifact_root`: absolute path to the generated demo directory
 - `toolchain`: ordered list of the tool names included in the run
@@ -134,18 +134,18 @@ PYTHONPATH=src python3 -m tracepack.cli scan \
 
 PYTHONPATH=src python3 -m tracepack.cli build \
   examples/source_episodes \
-  /tmp/agentevalkit-demo/tracepack-pack \
+  /tmp/agentreliabilitykit-demo/tracepack-pack \
   --only-failures \
   --redact \
   --max-per-signature 2 \
   --json
 
 PYTHONPATH=src python3 -m tracepack.cli inspect \
-  /tmp/agentevalkit-demo/tracepack-pack \
+  /tmp/agentreliabilitykit-demo/tracepack-pack \
   --json
 
 PYTHONPATH=src python3 -m tracepack.cli validate \
-  /tmp/agentevalkit-demo/tracepack-pack \
+  /tmp/agentreliabilitykit-demo/tracepack-pack \
   --json
 ```
 
@@ -170,12 +170,12 @@ FailMap reads the TracePack output and turns it into a triage-oriented snapshot.
 cd projects/failmap
 
 PYTHONPATH=src python3 -m failmap.cli cluster \
-  /tmp/agentevalkit-demo/tracepack-pack \
-  /tmp/agentevalkit-demo/failmap-clusters.json \
+  /tmp/agentreliabilitykit-demo/tracepack-pack \
+  /tmp/agentreliabilitykit-demo/failmap-clusters.json \
   --json
 
 PYTHONPATH=src python3 -m failmap.cli summarize \
-  /tmp/agentevalkit-demo/failmap-clusters.json \
+  /tmp/agentreliabilitykit-demo/failmap-clusters.json \
   --json
 ```
 
@@ -213,16 +213,16 @@ PackSlice works directly on the TracePack artifact, so you can prepare datasets
 cd projects/packslice
 
 PYTHONPATH=src python3 -m packslice.cli split \
-  /tmp/agentevalkit-demo/tracepack-pack \
-  /tmp/agentevalkit-demo/packslice \
+  /tmp/agentreliabilitykit-demo/tracepack-pack \
+  /tmp/agentreliabilitykit-demo/packslice \
   --group-by signature \
   --train-ratio 70 \
   --eval-ratio 15 \
   --test-ratio 15 \
   --json
 
 PYTHONPATH=src python3 -m packslice.cli summarize \
-  /tmp/agentevalkit-demo/packslice \
+  /tmp/agentreliabilitykit-demo/packslice \
   --json
 ```
 
@@ -264,13 +264,13 @@ You do not need `jq`; plain Python works everywhere GitHub Actions already has P
 agentci summarize examples/openai_agents_episode.json --json > summary.json
 python -c "import json; data=json.load(open('summary.json')); assert data['tool_calls'] >= 1"
 
-tracepack inspect /tmp/agentevalkit-demo/tracepack-pack --json > inspect.json
+tracepack inspect /tmp/agentreliabilitykit-demo/tracepack-pack --json > inspect.json
 python -c "import json; data=json.load(open('inspect.json')); assert data['case_count'] >= 1"
 
 failmap compare-summary compare.json --json > compare-summary.json
 python -c "import json; data=json.load(open('compare-summary.json')); assert 'summary' in data"
 
-packslice summarize /tmp/agentevalkit-demo/packslice --json > split-summary.json
+packslice summarize /tmp/agentreliabilitykit-demo/packslice --json > split-summary.json
 python -c "import json; data=json.load(open('split-summary.json')); assert len(data['splits']) == 3"
 ```
 
diff --git a/scripts/run_automation_demo.sh b/scripts/run_automation_demo.sh
@@ -3,7 +3,7 @@ set -euo pipefail
 
 ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 TIMESTAMP="$(date +%Y%m%d-%H%M%S)"
-OUTPUT_DIR="${1:-${TMPDIR:-/tmp}/agentevalkit-automation-demo-${TIMESTAMP}}"
+OUTPUT_DIR="${1:-${TMPDIR:-/tmp}/agentreliabilitykit-automation-demo-${TIMESTAMP}}"
 
 mkdir -p "$OUTPUT_DIR"
 
@@ -90,7 +90,7 @@ failmap_cluster = load_json("failmap-cluster.json")
 packslice_summary = load_json("packslice-summary.json")
 
 manifest = {
-    "format": "agentevalkit-demo-v1",
+    "format": "agentreliabilitykit-demo-v1",
     "generated_at": datetime.now(timezone.utc).isoformat(),
     "artifact_root": str(out),
     "toolchain": ["AgentCI", "TracePack", "FailMap", "PackSlice"],