
Commit c36c8aa

ci(fidelity): wire strict fidelity check into lint.yml (#72)
Enforces strict upstream parity for mapped core files in CI. Closes #53.

- scripts/verify_test_fidelity.py: --strict mode (default) fails on any missing test, --update-baseline writes the file with dynamic ts_parity from UPSTREAM_PARITY
- Fails cleanly when upstream checkout is missing (no silent skip-and-exit-0)
- Validates baseline ts_parity against UPSTREAM_PARITY to catch drift after upstream bumps
- lint.yml: clones vercel/chat@4.26.0 to /tmp/vercel-chat then runs --strict; clone step is required (no continue-on-error)
- fidelity_baseline.json: empty, ships at zero-missing for mapped core files (8 of 17 packages/chat/src/*.test.ts)
- Follow-ups: #78 (MAPPING expansion), #79 (SHA pin clone), #80 (fuzzy matcher hyphen)
1 parent 3698ddb commit c36c8aa

9 files changed

Lines changed: 376 additions & 18 deletions


.github/workflows/lint.yml

Lines changed: 16 additions & 1 deletion
@@ -57,6 +57,19 @@ jobs:
         continue-on-error: true
         run: uv run python scripts/audit_test_quality.py
 
+      - name: Clone upstream vercel/chat at pinned parity tag
+        id: clone_upstream
+        run: |
+          git clone --depth 1 --branch chat@4.26.0 \
+            https://github.com/vercel/chat.git /tmp/vercel-chat
+
+      - name: Test fidelity check (strict — zero missing in mapped core files)
+        id: fidelity
+        continue-on-error: true
+        env:
+          TS_ROOT: /tmp/vercel-chat
+        run: uv run python scripts/verify_test_fidelity.py --strict
+
       - name: Pyrefly type check
         id: pyrefly
         continue-on-error: true
@@ -75,6 +88,7 @@ jobs:
           echo "| Ruff check | ${{ steps.ruff_check.outcome }} |" >> $GITHUB_STEP_SUMMARY
           echo "| Ruff format | ${{ steps.ruff_format.outcome }} |" >> $GITHUB_STEP_SUMMARY
           echo "| Test audit | ${{ steps.audit.outcome }} |" >> $GITHUB_STEP_SUMMARY
+          echo "| Test fidelity | ${{ steps.fidelity.outcome }} |" >> $GITHUB_STEP_SUMMARY
           echo "| Pyrefly | ${{ steps.pyrefly.outcome }} |" >> $GITHUB_STEP_SUMMARY
           echo "" >> $GITHUB_STEP_SUMMARY
           if [ "${{ steps.pyrefly.outcome }}" = "success" ]; then
@@ -89,10 +103,11 @@ jobs:
           RUFF_CHECK: ${{ steps.ruff_check.outcome }}
           RUFF_FORMAT: ${{ steps.ruff_format.outcome }}
           AUDIT: ${{ steps.audit.outcome }}
+          FIDELITY: ${{ steps.fidelity.outcome }}
           PYREFLY: ${{ steps.pyrefly.outcome }}
         run: |
           failures=0
-          for var in RUFF_CHECK RUFF_FORMAT AUDIT PYREFLY; do
+          for var in RUFF_CHECK RUFF_FORMAT AUDIT FIDELITY PYREFLY; do
             outcome="${!var}"
             if [ "$outcome" != "success" ]; then
               echo "$var failed (outcome: $outcome)"
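
The gating pattern in this workflow is that each check runs with `continue-on-error: true`, and a final step tallies the recorded outcomes and fails the job if any is not `success`. The Python sketch below restates that tally for illustration only; the workflow itself uses the bash loop in the hunk above, and the function name `failed_checks` is hypothetical.

```python
from collections.abc import Mapping, Sequence


def failed_checks(env: Mapping[str, str], names: Sequence[str]) -> list[str]:
    """Return the check names whose recorded outcome is not 'success'.

    Like the bash loop, a variable that is unset counts as a failure,
    because an empty outcome is not equal to "success".
    """
    return [name for name in names if env.get(name, "") != "success"]


# Hypothetical outcomes, shaped like the step's `env:` block.
outcomes = {
    "RUFF_CHECK": "success",
    "RUFF_FORMAT": "success",
    "AUDIT": "success",
    "FIDELITY": "failure",
    "PYREFLY": "success",
}
checks = ["RUFF_CHECK", "RUFF_FORMAT", "AUDIT", "FIDELITY", "PYREFLY"]

for name in failed_checks(outcomes, checks):
    print(f"{name} failed (outcome: {outcomes.get(name, '')})")
```

Adding `FIDELITY` to the loop is what makes the fidelity step gating: without it, the step's `continue-on-error: true` would let a failed check pass unnoticed.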

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -95,6 +95,18 @@ Parity catch-up with upstream `4.26.0`. No upstream version change.
   (`test_memory_state.py`, `test_state_postgres.py`). Closes the same
   flaky-test hazard fixed for the Redis backend in PR #73.
 
+### CI / Internals
+
+- `verify_test_fidelity.py` now enforces against upstream on every PR
+  (`.github/workflows/lint.yml`); fails when the upstream clone is missing
+  or when any mapped TS file can't be found. Workflow runs `--strict` and
+  the clone step no longer carries `continue-on-error: true`, so infra
+  failures surface immediately at the job level. Baseline shipped empty
+  (all previously-missing tests ported in this release) — strict fidelity
+  for *mapped core files* (8 of 17 `packages/chat/src/*.test.ts` files;
+  see the `MAPPING` dict in `scripts/verify_test_fidelity.py` for the
+  authoritative scope list). Closes #53.
+
 ## 0.4.26.1 (2026-04-23)
 
 Python-only follow-up on `0.4.26`. Still alpha — APIs may change.

CLAUDE.md

Lines changed: 19 additions & 2 deletions
@@ -105,5 +105,22 @@ async mock bugs, and cross-file duplicates. PRs that introduce hard failures
 will not pass CI.
 
 **Fidelity check** (`scripts/verify_test_fidelity.py`) verifies every TS
-`it("...")` has a matching Python `def test_*()`. Must show 0 missing before
-committing test changes.
+`it("...")` in the mapped core files has a matching Python `def test_*()`,
+pinned to `chat@4.26.0`. The `MAPPING` dict in that script is the
+authoritative scope list — it currently covers 8 of 17
+`packages/chat/src/*.test.ts` files (extending it is tracked as a
+follow-up). **CI runs `--strict`** (see `.github/workflows/lint.yml`):
+any missing translation in a mapped file fails the build, and a missing
+upstream checkout also fails (the script exits non-zero when any mapped
+TS file isn't found). Baseline mode (the default without `--strict`) is
+retained for local workflows where a few ports land in flight —
+regenerate via `--update-baseline` after documenting intentional
+divergence in `docs/UPSTREAM_SYNC.md`.
+
+Before the fidelity check can run locally, clone the pinned upstream
+checkout (same command CI uses in `lint.yml`):
+```bash
+git clone --depth 1 --branch chat@4.26.0 \
+  https://github.com/vercel/chat.git /tmp/vercel-chat
+```
+Then `TS_ROOT=/tmp/vercel-chat uv run python scripts/verify_test_fidelity.py --strict`.

docs/UPSTREAM_SYNC.md

Lines changed: 32 additions & 0 deletions
@@ -72,6 +72,38 @@ tests. If upstream tests lock in inconsistent behavior, choose one of:
 - **Preserve parity** and document the inconsistency in the non-parity section below
 - **Intentionally diverge** and document the divergence in the non-parity section
 
+### Test fidelity (strict mode)
+
+`scripts/verify_test_fidelity.py` runs in CI (`.github/workflows/lint.yml`) pinned
+to `vercel/chat@4.26.0` (matches the `UPSTREAM_PARITY` constant in
+`src/chat_sdk/__init__.py`). **CI runs `--strict`** — the repo ships at 0
+missing *for mapped core files* as of `0.4.26.2` and the baseline
+(`scripts/fidelity_baseline.json`) is empty. Scope is defined by the
+`MAPPING` dict in the script: 8 of 17 `packages/chat/src/*.test.ts` files
+today (extending to the remaining 9 is tracked as a follow-up). Unmapped
+files are not checked — tightening scope requires editing `MAPPING` and
+re-running `--strict`.
+
+Infra guardrails:
+
+- The workflow's `Clone upstream vercel/chat at pinned parity tag` step does
+  **not** use `continue-on-error` — a failed clone aborts the job loudly.
+- The script itself fails with exit 1 if any mapped TS file is missing under
+  `TS_ROOT` (defense in depth against silent skips).
+
+Workflows:
+
+| Goal | Command |
+|------|---------|
+| Port a missing test | Write the Python test and land it; CI rejects anything that re-introduces a gap |
+| Add a Python-only divergence (intentional skip) | Document in [Known Non-Parity](#known-non-parity-with-typescript-sdk), then `--update-baseline` and switch the workflow back to non-strict default for that file if truly unavoidable |
+| Upstream sync | After pulling new upstream, run `--strict` — newly-added TS tests appear as missing and CI fails until ported |
+| Final parity check | Same as CI: `TS_ROOT=/tmp/vercel-chat uv run python scripts/verify_test_fidelity.py --strict` |
+
+Baseline mode (the default without `--strict`) is retained for local
+development where a few ports land in flight. Regenerate the baseline via
+`--update-baseline` rather than hand-editing.
+
 ## Divergence Policy
 
 Every divergence from upstream has a cost: merge conflicts on future syncs,
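
The second infra guardrail described in this hunk, exiting with status 1 when any mapped TS file is absent under `TS_ROOT`, might look roughly like the sketch below. The shape of `MAPPING` (assumed here to map a TS path relative to `TS_ROOT` to a Python test path) and both function names are assumptions made for illustration; the script's real structure is not shown in this commit view.

```python
from pathlib import Path


def missing_mapped_files(ts_root: Path, mapping: dict[str, str]) -> list[str]:
    """Return mapped TS paths (relative to ts_root) that do not exist on disk."""
    return [rel for rel in mapping if not (ts_root / rel).is_file()]


def check_checkout(ts_root: Path, mapping: dict[str, str]) -> int:
    """Return a process exit code: 1 if any mapped file is absent, else 0."""
    missing = missing_mapped_files(ts_root, mapping)
    for rel in missing:
        print(f"missing upstream file: {ts_root / rel}")
    return 1 if missing else 0
```

Returning a non-zero code instead of skipping the file is the point of the guardrail: a missing or partial checkout cannot be mistaken for "zero missing tests".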

pyproject.toml

Lines changed: 18 additions & 0 deletions
@@ -2,6 +2,20 @@
 name = "chat-sdk"
 version = "0.4.26.2"
 description = "Multi-platform async chat SDK for Python — port of Vercel Chat"
+keywords = [
+    "chat",
+    "chatbot",
+    "chatops",
+    "slack-bot",
+    "discord-bot",
+    "telegram-bot",
+    "teams-bot",
+    "whatsapp-bot",
+    "bot-framework",
+    "async",
+    "asyncio",
+    "vercel",
+]
 readme = "README.md"
 license = {text = "MIT"}
 requires-python = ">=3.10"
@@ -16,7 +30,11 @@ classifiers = [
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
     "Programming Language :: Python :: 3.13",
+    "Topic :: Communications",
     "Topic :: Communications :: Chat",
+    "Topic :: Internet",
+    "Topic :: Software Development :: Libraries :: Application Frameworks",
+    "Topic :: Software Development :: Libraries :: Python Modules",
     "Typing :: Typed",
 ]
 

scripts/fidelity_baseline.json

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+{
+  "_comment": "Ratchet-down baseline for scripts/verify_test_fidelity.py. This repo ships at strict fidelity for mapped core files (0 missing) against chat@4.26.0, so the baseline is empty. Scope: the MAPPING dict in scripts/verify_test_fidelity.py is the authoritative list of TS files checked; it currently covers 8 of the 17 packages/chat/src/*.test.ts files. Default CI mode runs --strict via .github/workflows/lint.yml; this file is retained for local workflows that want to opt back into baseline mode (e.g. during an upstream sync where several ports land in flight). To baseline genuinely-divergent tests, run scripts/verify_test_fidelity.py --update-baseline after documenting the divergence in docs/UPSTREAM_SYNC.md.",
+  "ts_parity": "chat@4.26.0",
+  "total_ts_tests": 588,
+  "total_missing": 0,
+  "missing": {}
+}
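
The commit message says the script validates the baseline's `ts_parity` against `UPSTREAM_PARITY` to catch drift after an upstream bump. A minimal sketch of such a check follows. It assumes that `UPSTREAM_PARITY` is a bare version string such as `"4.26.0"`, that the baseline stores it with a `chat@` prefix, and that `missing` maps each file to a list of test titles; none of these details are confirmed by this commit view, and `validate_baseline` is a hypothetical name.

```python
import json


def validate_baseline(baseline_text: str, upstream_parity: str) -> list[str]:
    """Return a list of human-readable problems found in the baseline JSON."""
    baseline = json.loads(baseline_text)
    problems: list[str] = []

    # Drift check: the baseline must be pinned to the same upstream tag.
    expected = f"chat@{upstream_parity}"
    if baseline.get("ts_parity") != expected:
        problems.append(
            f"ts_parity is {baseline.get('ts_parity')!r}, expected {expected!r}"
        )

    # Consistency check: total_missing should equal the number of listed tests.
    listed = sum(len(titles) for titles in baseline.get("missing", {}).values())
    if listed != baseline.get("total_missing"):
        problems.append(
            f"total_missing is {baseline.get('total_missing')!r}, "
            f"but {listed} tests are listed"
        )
    return problems
```

With the baseline shipped in this commit and an `upstream_parity` of `"4.26.0"`, the sketch reports no problems; bumping the constant without regenerating the baseline would produce a `ts_parity` mismatch.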
