Skip to content

Commit fa2af15

Browse files
fix: comprehensive quality audit — 30+ production bugs, test cleanup, lint
Production bugs fixed: - 3 missing await on state.get/delete in Teams adapter - _from_full_stream missing finish-step handling and text-delta field extraction - _fallback_stream not using StreamingMarkdownRenderer (raw text instead) - _fallback_stream edit loop 500ms cancellation latency (Event instead of bool) - OnLockConflict return type bool vs "force"/"drop" string mismatch - MemoryStateAdapter.disconnect() clearing cache (TS preserves it) - options.limit or N truthiness bug across 15 adapter locations - Teams status code string/int comparison + error dict missing statusCode - Slack Retry-After header int() crash on non-integer values - Token refresh race conditions in GChat, Teams, GitHub (added asyncio.Lock) - _remend bold repair producing invalid markdown (**wor* instead of **wor**) - _extract_message_content: PostableCard "[card]", PostableAst str(dict), PostableMarkdown no AST - Slack empty table cell fallback, Teams table cell escaping incomplete - Teams cards missing task/fetch for modal buttons, ButtonElement missing action_type - AI module media_type snake_case keys (should be camelCase for SDK compat) - PermissionError shadowing Python builtin (renamed to AdapterPermissionError) - Slack installation camelCase/snake_case key incompatibility with TS - GChat subscription pending-waiters re-raising errors (TS silently returns) - Telegram or vs is not None for message_thread_id - Linear TypedDict field names not matching actual JSON keys - Teams _handle_teams_error return type None vs NoReturn - from_full_stream silently dropping non-dict StreamChunk objects - stringify_markdown bullet default "-" vs TS "*" Infrastructure improvements: - Session reuse: all 7 adapters share aiohttp.ClientSession per adapter - Task GC pinning: fire-and-forget tasks held in _background_tasks set - Task cancellation on shutdown: Chat.shutdown() cancels active handler tasks - Unbounded cache eviction: Discord, GChat, GitHub caches now bounded - Memory state periodic expired entry purge Test quality: - 76 exact duplicate tests removed - 59 phantom absorbers converted to real tests or removed - 47 all-weak-assertion tests strengthened with specific value checks - 15 new tests for production fixes (race conditions, session reuse, etc.) - 37 warnings eliminated (AsyncMock for sync methods, unclosed sessions, mark.anyio) - 97 unused imports removed, 70 import sorting fixes, 21 manual lint fixes - Fidelity script regex hardened (word boundary, empty-name guard) API and packaging: - Serialization uses camelCase keys matching TS SDK (with snake_case compat) - StateNotConnectedError replaces bare RuntimeError in state adapters - Chat(**kwargs) validates against ChatConfig fields - AdapterError and Thread exported from top-level __init__.py - Adapters use lazy __getattr__ imports (no optional dep failures) - Event handler fields typed (was Any) for IDE autocomplete - ChatConfig field docstrings added - pyproject.toml duplicate deps removed - ruff check src/ tests/ passes clean, ruff format passes clean - audit_test_quality.py added to CI pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 2982290 commit fa2af15

141 files changed

Lines changed: 4159 additions & 2785 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.github/workflows/test.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,5 +23,7 @@ jobs:
2323
run: uv run ruff check src/
2424
- name: Format check
2525
run: uv run ruff format --check src/
26+
- name: Test quality audit
27+
run: uv run python scripts/audit_test_quality.py
2628
- name: Run tests
2729
run: uv run pytest tests/ -q --tb=short --cov=chat_sdk --cov-fail-under=70

CLAUDE.md

Lines changed: 34 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -17,19 +17,33 @@ Python port of Vercel Chat SDK. Multi-platform async chat framework.
1717
- `src/chat_sdk/adapters/` -- 8 platform adapters
1818
- `src/chat_sdk/shared/` -- Markdown parser, format converter, streaming renderer
1919
- `src/chat_sdk/state/` -- Memory, Redis, Postgres backends
20-
- `tests/` -- 2,477+ tests
21-
22-
## Critical Rules
23-
1. **Never use `datetime.utcnow()`** -- use `datetime.now(tz=timezone.utc)`
24-
2. **Never use `asyncio.ensure_future`** -- use `asyncio.get_running_loop().create_task()`
25-
3. **Never pass raw dicts to `self._chat.process_*`** -- use typed dataclasses (ActionEvent, ReactionEvent, etc.)
26-
4. **Never use camelCase keys in dispatch dicts** -- always snake_case
27-
5. **Never use `random.choices` for security tokens** -- use `secrets.token_hex`
28-
6. **Never import optional deps at module level** -- lazy import inside functions
29-
7. **Always use `hmac.compare_digest` for signature verification** -- never `==`
30-
8. **Always use `is not None` for empty-string-valid fields** -- never `or`
31-
9. **Always validate external URLs before HTTP requests** (SSRF prevention)
32-
10. **Always check `extend_lock` return value** in processing loops
20+
- `tests/` -- 3,267 tests
21+
22+
## Principles
23+
24+
1. **Every test must fail when the code is wrong.** No `assert True` stubs, no
25+
bare truthiness checks when specific values are available, no MagicMock where
26+
AsyncMock is needed. If a test can't catch a regression, it's not a test.
27+
2. **Every async call must be awaited.** Unawaited coroutines silently return
28+
truthy objects. Use AsyncMock (not MagicMock) in tests to surface these.
29+
3. **No two tests should verify the same thing.** Duplicates inflate counts
30+
without catching more bugs.
31+
32+
## Port Rules (TS → Python)
33+
34+
These are specific patterns that broke during the port. The principles above
35+
explain *why*; these explain *what to watch for*.
36+
37+
- `datetime.utcnow()``datetime.now(tz=UTC)` (deprecated, naive)
38+
- `asyncio.ensure_future``loop.create_task()` (deprecated)
39+
- Raw dicts to `process_*` → typed dataclasses (ActionEvent, etc.)
40+
- camelCase dispatch keys → snake_case
41+
- `random.choices` for tokens → `secrets.token_hex`
42+
- Optional deps at module level → lazy import
43+
- `==` for signatures → `hmac.compare_digest`
44+
- `or` for empty-string-valid fields → `is not None`
45+
- Validate external URLs before requests (SSRF)
46+
- Check `extend_lock` return value in loops
3347

3448
## Adding a New Adapter
3549
See docs/ARCHITECTURE.md and CONTRIBUTING.md.
@@ -42,17 +56,12 @@ See docs/UPSTREAM_SYNC.md for TS->Python translation patterns.
4256
- StreamingMarkdownRenderer's _remend is simplified vs the npm `remend` library
4357
- No setext headings, no footnotes, no HTML nodes in the parser
4458

45-
## Test Fidelity Verification
59+
## Test Quality
4660

47-
After modifying or adding tests, run:
48-
```bash
49-
python3 scripts/verify_test_fidelity.py
50-
```
51-
This verifies every TS `it("...")` test has a matching Python `def test_...()`.
52-
The script must show `0 missing` before committing test changes.
61+
**CI runs `scripts/audit_test_quality.py` before tests.** It catches phantoms,
62+
async mock bugs, and cross-file duplicates. PRs that introduce hard failures
63+
will not pass CI.
5364

54-
When porting a new TS test file:
55-
1. Add the mapping to `scripts/verify_test_fidelity.py` MAPPING dict
56-
2. Run with `--fix` to generate stubs
57-
3. Translate each stub by reading the TS test body line-by-line
58-
4. Verify with the script before committing
65+
**Fidelity check** (`scripts/verify_test_fidelity.py`) verifies every TS
66+
`it("...")` has a matching Python `def test_*()`. Name match ≠ faithful port —
67+
the audit script catches the quality side.

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
Multi-platform async chat SDK for Python. Port of [Vercel Chat](https://github.com/vercel/chat).
44

5-
> **Status: Alpha (0.0.1a10)** — API may change. Not yet tested in production.
5+
> **Status: Alpha (0.0.1a11)** — API may change. Not yet tested in production.
66
77
## Why chat-sdk?
88

docs/TESTING.md

Lines changed: 78 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -161,6 +161,84 @@ The `asyncio_mode = "auto"` setting means all `async def test_*` functions are a
161161
assert result.status == 401
162162
```
163163

164+
## Test Quality Invariants
165+
166+
Rules learned from past bugs in the TS→Python port process:
167+
168+
### 1. Name match ≠ faithful port
169+
170+
The fidelity script (`scripts/verify_test_fidelity.py`) only checks that a matching
171+
`def test_*` exists for each TS `it("...")`. It does **not** check assertion quality.
172+
A test with `assert True` satisfies the fidelity checker but tests nothing.
173+
174+
**Rule:** After porting, every test MUST have real assertions. The only acceptable
175+
`assert True` tests are JSX-specific ones that have no Python equivalent (currently 3).
176+
177+
### 2. Never use `MagicMock` for async methods
178+
179+
If the real method is `async def`, the mock **must** be `AsyncMock`. `MagicMock`
180+
returns a truthy Mock object instead of a coroutine. This hides missing `await` in
181+
production code — the test passes, but production silently gets coroutine objects
182+
instead of values.
183+
184+
**Example of the bug we found:**
185+
```python
186+
# BAD — state.get is async, MagicMock returns Mock not coroutine
187+
state.get = MagicMock(side_effect=lambda k: cache.get(k))
188+
189+
# GOOD
190+
state.get = AsyncMock(side_effect=lambda k: cache.get(k))
191+
```
192+
193+
### 3. Check for duplicates before adding tests
194+
195+
When agents work in parallel on overlapping scopes, they may write the same test
196+
in different files. Before committing new test files, scan for identical test names
197+
and bodies across the suite:
198+
199+
```bash
200+
# Quick check for cross-file exact duplicates
201+
uv run python -c "
202+
import ast, os, collections
203+
bodies = collections.defaultdict(list)
204+
for root, _, files in os.walk('tests'):
205+
for f in sorted(files):
206+
if not f.startswith('test_') or not f.endswith('.py'): continue
207+
path = os.path.join(root, f)
208+
src = open(path).read()
209+
for node in ast.walk(ast.parse(src)):
210+
if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): continue
211+
if not node.name.startswith('test_'): continue
212+
body = '\n'.join(src.split('\n')[node.lineno-1:node.end_lineno]).strip()
213+
if len(body) > 50: bodies[body].append(f'{path}:{node.lineno} {node.name}')
214+
for body, locs in bodies.items():
215+
files = set(l.split(':')[0] for l in locs)
216+
if len(files) > 1:
217+
print(f'DUPLICATE: {locs[0].split(\" \")[1]}')
218+
for l in locs: print(f' {l}')
219+
"
220+
```
221+
222+
### 4. Phantom absorber audit
223+
224+
After any test changes, run this to catch `assert True`-only tests:
225+
226+
```bash
227+
uv run python -c "
228+
import ast, os
229+
for root, _, files in os.walk('tests'):
230+
for f in sorted(files):
231+
if not f.startswith('test_') or not f.endswith('.py'): continue
232+
path = os.path.join(root, f)
233+
for node in ast.walk(ast.parse(open(path).read())):
234+
if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): continue
235+
if not node.name.startswith('test_'): continue
236+
stmts = [s for s in node.body if not (isinstance(s, ast.Expr) and isinstance(s.value, ast.Constant))]
237+
if len(stmts) == 1 and isinstance(stmts[0], ast.Assert) and isinstance(stmts[0].test, ast.Constant) and stmts[0].test.value is True:
238+
print(f'PHANTOM: {path}:{node.lineno} {node.name}')
239+
"
240+
```
241+
164242
## Known Coverage Gaps
165243

166244
The following modules are under 60% coverage as of the initial alpha release. These are tracked for improvement:

docs/UPSTREAM_SYNC.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -75,7 +75,7 @@ These are intentionally different from TS:
7575

7676
These exist only in the Python port and have no TS equivalent:
7777

78-
- `shared/errors.py`: Typed adapter error hierarchy (`AdapterRateLimitError`, `AuthenticationError`, `ValidationError`, `NetworkError`, `ResourceNotFoundError`, `PermissionError`). TS throws plain `Error` objects.
78+
- `shared/errors.py`: Typed adapter error hierarchy (`AdapterRateLimitError`, `AuthenticationError`, `ValidationError`, `NetworkError`, `ResourceNotFoundError`, `AdapterPermissionError`). TS throws plain `Error` objects.
7979
- `testing/__init__.py` + `shared/mock_adapter.py`: Test utilities with `MockAdapter`, `MockStateAdapter`, `create_test_message()`.
8080
- `from __future__ import annotations` everywhere: Enables PEP 604 union syntax (`X | Y`) on Python 3.10.
8181
- Input validation on adapter config dataclasses (e.g., rejecting empty `signing_secret`).

pyproject.toml

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -34,8 +34,7 @@ discord = ["pynacl>=1.5", "aiohttp>=3.9"]
3434
teams = ["aiohttp>=3.9"]
3535
telegram = ["aiohttp>=3.9"]
3636
whatsapp = ["aiohttp>=3.9"]
37-
google-chat = ["aiohttp>=3.9",
38-
"pyjwt[crypto]>=2.8", "google-auth>=2.0", "pyjwt>=2.8"]
37+
google-chat = ["aiohttp>=3.9", "pyjwt[crypto]>=2.8", "google-auth>=2.0"]
3938
linear = ["aiohttp>=3.9"]
4039
all = [
4140
"slack-sdk>=3.27.0",
@@ -45,7 +44,6 @@ all = [
4544
"cryptography>=42.0",
4645
"pynacl>=1.5",
4746
"aiohttp>=3.9",
48-
"pyjwt[crypto]>=2.8",
4947
"google-auth>=2.0",
5048
]
5149

0 commit comments

Comments
 (0)