Chinchill-AI
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
Lines changed: 2 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 34 additions & 25 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/TESTING.md b/docs/TESTING.md
Lines changed: 78 additions & 0 deletions
diff --git a/docs/UPSTREAM_SYNC.md b/docs/UPSTREAM_SYNC.md
Lines changed: 1 addition & 1 deletion
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 1 addition & 3 deletions
@@ -23,5 +23,7 @@ jobs:
         run: uv run ruff check src/
       - name: Format check
         run: uv run ruff format --check src/
+      - name: Test quality audit
+        run: uv run python scripts/audit_test_quality.py
       - name: Run tests
         run: uv run pytest tests/ -q --tb=short --cov=chat_sdk --cov-fail-under=70
@@ -17,19 +17,33 @@ Python port of Vercel Chat SDK. Multi-platform async chat framework.
 - `src/chat_sdk/adapters/` -- 8 platform adapters
 - `src/chat_sdk/shared/` -- Markdown parser, format converter, streaming renderer
 - `src/chat_sdk/state/` -- Memory, Redis, Postgres backends
-- `tests/` -- 2,477+ tests
-
-## Critical Rules
-1. **Never use `datetime.utcnow()`** -- use `datetime.now(tz=timezone.utc)`
-2. **Never use `asyncio.ensure_future`** -- use `asyncio.get_running_loop().create_task()`
-3. **Never pass raw dicts to `self._chat.process_*`** -- use typed dataclasses (ActionEvent, ReactionEvent, etc.)
-4. **Never use camelCase keys in dispatch dicts** -- always snake_case
-5. **Never use `random.choices` for security tokens** -- use `secrets.token_hex`
-6. **Never import optional deps at module level** -- lazy import inside functions
-7. **Always use `hmac.compare_digest` for signature verification** -- never `==`
-8. **Always use `is not None` for empty-string-valid fields** -- never `or`
-9. **Always validate external URLs before HTTP requests** (SSRF prevention)
-10. **Always check `extend_lock` return value** in processing loops
+- `tests/` -- 3,267 tests
+
+## Principles
+
+1. **Every test must fail when the code is wrong.** No `assert True` stubs, no
+   bare truthiness checks when specific values are available, no MagicMock where
+   AsyncMock is needed. If a test can't catch a regression, it's not a test.
+2. **Every async call must be awaited.** An unawaited call silently yields a truthy
+   coroutine object, not its result. Use AsyncMock (not MagicMock) in tests to surface these.
+3. **No two tests should verify the same thing.** Duplicates inflate counts
+   without catching more bugs.
+
+## Port Rules (TS → Python)
+
+These are specific patterns that broke during the port. The principles above
+explain *why*; these explain *what to watch for*.
+
+- `datetime.utcnow()` (deprecated, returns naive datetimes) → `datetime.now(tz=UTC)`
+- `asyncio.ensure_future` → `loop.create_task()` (preferred for scheduling coroutines)
+- Raw dicts to `process_*` → typed dataclasses (ActionEvent, etc.)
+- camelCase dispatch keys → snake_case
+- `random.choices` for tokens → `secrets.token_hex`
+- Optional deps at module level → lazy import
+- `==` for signatures → `hmac.compare_digest`
+- `or` for empty-string-valid fields → `is not None`
+- Validate external URLs before requests (SSRF)
+- Check `extend_lock` return value in loops
 
 ## Adding a New Adapter
 See docs/ARCHITECTURE.md and CONTRIBUTING.md.
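
Several of the port rules listed in this hunk can be sketched in a few lines of Python. This is a minimal illustration rather than code from the repository: the helper names (`utc_now`, `new_token`, `verify_signature`, `pick_text`) and the HMAC-SHA256 signing scheme are assumptions made for the example. It uses `timezone.utc`, which is the same object as the `UTC` alias that only exists on Python 3.11 and later.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timezone
from typing import Optional


def utc_now() -> datetime:
    # Timezone-aware replacement for datetime.utcnow(), which returns a naive value.
    return datetime.now(tz=timezone.utc)


def new_token(nbytes: int = 16) -> str:
    # secrets draws from the OS CSPRNG; random.choices is not suitable for tokens.
    return secrets.token_hex(nbytes)


def verify_signature(secret: bytes, body: bytes, received_hex: str) -> bool:
    # Hypothetical HMAC-SHA256 scheme; each platform defines its own signing format.
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison instead of ==.
    return hmac.compare_digest(expected, received_hex)


def pick_text(override: Optional[str], default: str) -> str:
    # `override or default` would discard a valid empty string.
    return override if override is not None else default
```

The last helper shows why `or` is unsafe for fields where an empty string is meaningful: `pick_text("", "fallback")` keeps the empty string, whereas `"" or "fallback"` would replace it.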
@@ -42,17 +56,12 @@ See docs/UPSTREAM_SYNC.md for TS->Python translation patterns.
 - StreamingMarkdownRenderer's _remend is simplified vs the npm `remend` library
 - No setext headings, no footnotes, no HTML nodes in the parser
 
-## Test Fidelity Verification
+## Test Quality
 
-After modifying or adding tests, run:
-```bash
-python3 scripts/verify_test_fidelity.py
-```
-This verifies every TS `it("...")` test has a matching Python `def test_...()`.
-The script must show `0 missing` before committing test changes.
+**CI runs `scripts/audit_test_quality.py` before tests.** It catches phantoms,
+async mock bugs, and cross-file duplicates. PRs that introduce hard failures
+will not pass CI.
 
-When porting a new TS test file:
-1. Add the mapping to `scripts/verify_test_fidelity.py` MAPPING dict
-2. Run with `--fix` to generate stubs
-3. Translate each stub by reading the TS test body line-by-line
-4. Verify with the script before committing
+**Fidelity check** (`scripts/verify_test_fidelity.py`) verifies every TS
+`it("...")` has a matching Python `def test_*()`. Name match ≠ faithful port —
+the audit script catches the quality side.
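
The "async mock bugs" this hunk refers to can be reproduced in isolation. The sketch below is an assumption-laden illustration, not code from the repository: `buggy_lookup` is a hypothetical function with a missing `await`, and `state` stands in for any object with an async `get` method. It shows that a `MagicMock` lets the bug pass unnoticed, while an `AsyncMock` makes it visible.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def buggy_lookup(state, key):
    # Bug under test: the await on state.get is missing.
    return state.get(key)


async def demo():
    cache = {"k": "v"}

    sync_state = MagicMock()
    sync_state.get = MagicMock(side_effect=lambda k: cache.get(k))
    # MagicMock hands back the plain value, so the missing await goes unnoticed.
    hidden = await buggy_lookup(sync_state, "k")

    async_state = MagicMock()
    async_state.get = AsyncMock(side_effect=lambda k: cache.get(k))
    # AsyncMock hands back a coroutine, so an assertion on the value would fail.
    leaked = await buggy_lookup(async_state, "k")
    is_coroutine = asyncio.iscoroutine(leaked)
    leaked.close()  # suppress the "never awaited" warning in this demo
    return hidden, is_coroutine


hidden, is_coroutine = asyncio.run(demo())
```

With the `MagicMock`, a test asserting `result == "v"` passes despite the bug. With the `AsyncMock`, the same assertion fails because the result is a coroutine object.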
@@ -2,7 +2,7 @@
 
 Multi-platform async chat SDK for Python. Port of [Vercel Chat](https://github.com/vercel/chat).
 
-> **Status: Alpha (0.0.1a10)** — API may change. Not yet tested in production.
+> **Status: Alpha (0.0.1a11)** — API may change. Not yet tested in production.
 
 ## Why chat-sdk?
 
 
@@ -161,6 +161,84 @@ The `asyncio_mode = "auto"` setting means all `async def test_*` functions are a
        assert result.status == 401
    ```
 
+## Test Quality Invariants
+
+Rules learned from past bugs in the TS→Python port process:
+
+### 1. Name match ≠ faithful port
+
+The fidelity script (`scripts/verify_test_fidelity.py`) only checks that a matching
+`def test_*` exists for each TS `it("...")`. It does **not** check assertion quality.
+A test with `assert True` satisfies the fidelity checker but tests nothing.
+
+**Rule:** After porting, every test MUST have real assertions. The only acceptable
+`assert True` tests are JSX-specific ones that have no Python equivalent (currently 3).
+
+### 2. Never use `MagicMock` for async methods
+
+If the real method is `async def`, the mock **must** be `AsyncMock`. `MagicMock`
+returns a truthy Mock object instead of a coroutine. This hides missing `await` in
+production code — the test passes, but production silently gets coroutine objects
+instead of values.
+
+**Example of the bug we found:**
+```python
+# BAD: state.get is async, but MagicMock returns the value synchronously, not a coroutine
+state.get = MagicMock(side_effect=lambda k: cache.get(k))
+
+# GOOD
+state.get = AsyncMock(side_effect=lambda k: cache.get(k))
+```
+
+### 3. Check for duplicates before adding tests
+
+When agents work in parallel on overlapping scopes, they may write the same test
+in different files. Before committing new test files, scan for identical test names
+and bodies across the suite:
+
+```bash
+# Quick check for cross-file exact duplicates
+uv run python -c "
+import ast, os, collections
+bodies = collections.defaultdict(list)
+for root, _, files in os.walk('tests'):
+    for f in sorted(files):
+        if not f.startswith('test_') or not f.endswith('.py'): continue
+        path = os.path.join(root, f)
+        src = open(path).read()
+        for node in ast.walk(ast.parse(src)):
+            if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): continue
+            if not node.name.startswith('test_'): continue
+            body = '\n'.join(src.split('\n')[node.lineno-1:node.end_lineno]).strip()
+            if len(body) > 50: bodies[body].append(f'{path}:{node.lineno} {node.name}')
+for body, locs in bodies.items():
+    files = set(l.split(':')[0] for l in locs)
+    if len(files) > 1:
+        print(f'DUPLICATE: {locs[0].split(\" \")[1]}')
+        for l in locs: print(f'  {l}')
+"
+```
+
+### 4. Phantom absorber audit
+
+After any test changes, run this to catch `assert True`-only tests:
+
+```bash
+uv run python -c "
+import ast, os
+for root, _, files in os.walk('tests'):
+    for f in sorted(files):
+        if not f.startswith('test_') or not f.endswith('.py'): continue
+        path = os.path.join(root, f)
+        for node in ast.walk(ast.parse(open(path).read())):
+            if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): continue
+            if not node.name.startswith('test_'): continue
+            stmts = [s for s in node.body if not (isinstance(s, ast.Expr) and isinstance(s.value, ast.Constant))]
+            if len(stmts) == 1 and isinstance(stmts[0], ast.Assert) and isinstance(stmts[0].test, ast.Constant) and stmts[0].test.value is True:
+                print(f'PHANTOM: {path}:{node.lineno} {node.name}')
+"
+```
+
 ## Known Coverage Gaps
 
 The following modules are under 60% coverage as of the initial alpha release. These are tracked for improvement:
 
@@ -75,7 +75,7 @@ These are intentionally different from TS:
 
 These exist only in the Python port and have no TS equivalent:
 
-- `shared/errors.py`: Typed adapter error hierarchy (`AdapterRateLimitError`, `AuthenticationError`, `ValidationError`, `NetworkError`, `ResourceNotFoundError`, `PermissionError`). TS throws plain `Error` objects.
+- `shared/errors.py`: Typed adapter error hierarchy (`AdapterRateLimitError`, `AuthenticationError`, `ValidationError`, `NetworkError`, `ResourceNotFoundError`, `AdapterPermissionError`). TS throws plain `Error` objects.
 - `testing/__init__.py` + `shared/mock_adapter.py`: Test utilities with `MockAdapter`, `MockStateAdapter`, `create_test_message()`.
 - `from __future__ import annotations` everywhere: Enables PEP 604 union syntax (`X | Y`) on Python 3.10.
 - Input validation on adapter config dataclasses (e.g., rejecting empty `signing_secret`).
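
The last item in the list above, input validation on config dataclasses, might look roughly like the following. This is a sketch under stated assumptions: `ExampleAdapterConfig` and its fields are hypothetical, and the local `ValidationError` class is a stand-in for the one in `shared/errors.py`, whose actual signature is not shown in this diff.

```python
from dataclasses import dataclass


class ValidationError(Exception):
    """Stand-in for the SDK's ValidationError; the real class may differ."""


@dataclass
class ExampleAdapterConfig:
    # Hypothetical config; field names are illustrative only.
    signing_secret: str
    bot_token: str = ""

    def __post_init__(self) -> None:
        # Reject empty or whitespace-only secrets at construction time.
        if not self.signing_secret.strip():
            raise ValidationError("signing_secret must not be empty")
```

Validating in `__post_init__` means a misconfigured adapter fails when it is constructed rather than on the first incoming webhook.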
 
@@ -34,8 +34,7 @@ discord = ["pynacl>=1.5", "aiohttp>=3.9"]
 teams = ["aiohttp>=3.9"]
 telegram = ["aiohttp>=3.9"]
 whatsapp = ["aiohttp>=3.9"]
-google-chat = ["aiohttp>=3.9",
-    "pyjwt[crypto]>=2.8", "google-auth>=2.0", "pyjwt>=2.8"]
+google-chat = ["aiohttp>=3.9", "pyjwt[crypto]>=2.8", "google-auth>=2.0"]
 linear = ["aiohttp>=3.9"]
 all = [
     "slack-sdk>=3.27.0",
@@ -45,7 +44,6 @@ all = [
     "cryptography>=42.0",
     "pynacl>=1.5",
     "aiohttp>=3.9",
-    "pyjwt[crypto]>=2.8",
     "google-auth>=2.0",
 ]