This directory contains conformance tests that validate Agent-Diff API replicas against their real-world production counterparts. The tests compare **response schema/shape** (field presence, types, and structure), **status codes**, **error semantics**, and **mutation behavior** -- not exact values, since IDs and timestamps will naturally differ between environments.
## Per-Service Methodology
### Box (REST API)
**Approach:** Dual-fire against the production Box API and the replica. Each operation is executed against both environments, and the response schemas are compared using recursive shape extraction.

- **What is validated:** Response field presence and types, status code parity, error shapes (404, 400, 409), CRUD operations (folders, files, comments, tasks, hubs, collections, search), file upload/download, file version upload
- **Enterprise-only fields** (54 fields such as `role`, `enterprise`, `sync_state`) are excluded from comparison because they appear only for enterprise Box accounts
- **Last run:** 105/106 passed (99%)
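
The recursive shape extraction described above can be sketched roughly as follows. This is an illustrative sketch, not the suite's actual code: `extract_shape`, `diff_shapes`, and the three-entry `EXCLUDED_FIELDS` set are hypothetical names, and the real exclusion list is longer (54 fields).

```python
from typing import Any

# Hypothetical subset of the enterprise-only fields named in this section.
EXCLUDED_FIELDS = frozenset({"role", "enterprise", "sync_state"})


def extract_shape(value: Any, excluded: frozenset = frozenset()) -> Any:
    """Reduce a JSON value to its structure: keys and type names, no values."""
    if isinstance(value, dict):
        return {
            key: extract_shape(child, excluded)
            for key, child in value.items()
            if key not in excluded
        }
    if isinstance(value, list):
        # A list is represented by the shape of its first element, if any.
        return [extract_shape(value[0], excluded)] if value else []
    return type(value).__name__


def diff_shapes(prod: Any, replica: Any, path: str = "$") -> list[str]:
    """Return readable mismatches between two extracted shapes."""
    if isinstance(prod, dict) and isinstance(replica, dict):
        problems = []
        for key in sorted(prod.keys() | replica.keys()):
            child = f"{path}.{key}"
            if key not in replica:
                problems.append(f"{child}: missing in replica")
            elif key not in prod:
                problems.append(f"{child}: missing in production")
            else:
                problems.extend(diff_shapes(prod[key], replica[key], child))
        return problems
    if isinstance(prod, list) and isinstance(replica, list):
        # Element shapes are only comparable when both lists are non-empty.
        if prod and replica:
            return diff_shapes(prod[0], replica[0], f"{path}[]")
        return []
    return [] if prod == replica else [f"{path}: {prod!r} != {replica!r}"]


# Made-up payloads: IDs differ (ignored), `role` is excluded, `size` has a type mismatch.
prod_response = {"id": "123", "type": "folder", "role": "admin", "size": 0}
replica_response = {"id": "abc", "type": "folder", "size": "0"}
mismatches = diff_shapes(
    extract_shape(prod_response, EXCLUDED_FIELDS),
    extract_shape(replica_response, EXCLUDED_FIELDS),
)
print(mismatches)  # ["$.size: 'int' != 'str'"]
```

Comparing type names rather than values is what lets IDs and timestamps differ between environments without producing a failure.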
### Google Calendar (REST API)
**Approach:** Dual-fire against Google Calendar API v3 and replica. Creates matching resources (calendars, events) in both environments, then validates all operations.
- **Optional data-dependent fields** (55+ fields such as `nextPageToken`, `attendees`, `conferenceData`) are excluded from comparison
### Linear (GraphQL API)
**Approach:** Dual-fire against Linear production GraphQL API and replica. Creates matching resources (issues, labels, comments) in both environments, then validates queries and mutations. Additionally runs **focused schema introspection** to detect drift between production and replica GraphQL schemas.
- **Schema introspection:** Compares focused type surfaces (StringComparator, IssueFilter, Issue, Query, Mutation, etc.) between production and replica schemas
- **Last run:** 89/90 passed (98%) -- the single failure is schema drift on newer Linear API fields (expected as Linear evolves its API)
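
A focused introspection comparison of this kind might look like the sketch below. It assumes each input is the `data` object returned by a standard GraphQL introspection query (with `__schema.types[].fields` and `inputFields`). The helper names and the `newField` entry are hypothetical and are not taken from the test suite or from Linear's schema.

```python
def type_surface(introspection: dict, type_names: set[str]) -> dict[str, set[str]]:
    """Map each focused type name to the names of its fields or input fields."""
    surface = {}
    for gql_type in introspection["__schema"]["types"]:
        if gql_type["name"] in type_names:
            # Object types populate `fields`; input types populate `inputFields`.
            members = (gql_type.get("fields") or []) + (gql_type.get("inputFields") or [])
            surface[gql_type["name"]] = {member["name"] for member in members}
    return surface


def schema_drift(prod: dict, replica: dict, type_names: set[str]) -> dict[str, dict[str, list[str]]]:
    """Report, per focused type, the fields present in only one schema."""
    prod_surface = type_surface(prod, type_names)
    replica_surface = type_surface(replica, type_names)
    drift = {}
    for name in sorted(type_names):
        prod_fields = prod_surface.get(name, set())
        replica_fields = replica_surface.get(name, set())
        if prod_fields != replica_fields:
            drift[name] = {
                "missing_in_replica": sorted(prod_fields - replica_fields),
                "extra_in_replica": sorted(replica_fields - prod_fields),
            }
    return drift


# Made-up introspection fragments; `newField` stands in for a newer production field.
prod_schema = {"__schema": {"types": [
    {"name": "Issue", "fields": [{"name": "id"}, {"name": "title"}, {"name": "newField"}], "inputFields": None},
]}}
replica_schema = {"__schema": {"types": [
    {"name": "Issue", "fields": [{"name": "id"}, {"name": "title"}], "inputFields": None},
]}}
drift = schema_drift(prod_schema, replica_schema, {"Issue"})
print(drift)  # {'Issue': {'missing_in_replica': ['newField'], 'extra_in_replica': []}}
```

Restricting the comparison to a named set of types keeps the report small and avoids flagging parts of the production schema that the replica does not aim to cover.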
### Slack (Docs-Golden)
**Approach:** Replica-only, validated against documented Slack API contracts. Unlike Box/Calendar/Linear, Slack conformance does not compare against a live Slack workspace because live-workspace parity is difficult to standardize (workspace state, installed apps, and permissions vary).
- **No external token required**
- **Methods tested:** 22/28 implemented methods
- **What is validated:** Response field presence (exact key sets), error semantics (`ok: false` with specific error codes), warning shapes, pagination structure
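
A docs-golden check along these lines could be sketched as below. It assumes a contract is expressed as an exact set of top-level keys plus an optional expected error code. `check_against_contract` is a hypothetical helper, not a function from the suite.

```python
from typing import Optional


def check_against_contract(
    response: dict,
    expected_keys: set[str],
    expected_error: Optional[str] = None,
) -> list[str]:
    """Compare a replica response with a documented contract; return any problems."""
    problems = []
    actual_keys = set(response)
    if actual_keys != expected_keys:
        # Exact key-set comparison: both missing and unexpected keys are reported.
        missing = sorted(expected_keys - actual_keys)
        unexpected = sorted(actual_keys - expected_keys)
        problems.append(f"key mismatch: missing={missing} unexpected={unexpected}")
    if expected_error is None:
        if response.get("ok") is not True:
            problems.append("expected ok: true")
    else:
        if response.get("ok") is not False:
            problems.append("expected ok: false")
        if response.get("error") != expected_error:
            problems.append(f"expected error {expected_error!r}, got {response.get('error')!r}")
    return problems


# Made-up replica response for a request that names an unknown channel.
error_response = {"ok": False, "error": "channel_not_found"}
problems = check_against_contract(error_response, {"ok", "error"}, "channel_not_found")
print(problems)  # []
```
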
- The backend replica must be running (`docker-compose up` from `ops/`)
- Slack tests must run inside Docker (`docker exec ops-backend-1 pytest ...`) or have local database access
## Interpreting Results
- **Pass threshold:** The pytest entry points assert a pass rate of at least 70%. This threshold tolerates minor schema differences (e.g., enterprise-only fields, newer API fields) while still catching significant divergence.
- **Schema mismatches** indicate fields present in one environment but not the other. They are logged with the specific field path and should be investigated, although many are benign (optional fields, tier-specific fields).
- **Error parity** means both environments return the same error class (e.g., both return 404, or both return a GraphQL error with similar keywords). Exact error messages may differ.
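
As a rough illustration of the pass threshold, the check reduces to a ratio comparison. The names `PASS_THRESHOLD`, `pass_rate`, and `meets_threshold` are hypothetical; only the 70% figure and the run counts come from this document.

```python
PASS_THRESHOLD = 0.70  # the 70% minimum stated above


def pass_rate(passed: int, total: int) -> float:
    """Fraction of conformance checks that passed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


def meets_threshold(passed: int, total: int, threshold: float = PASS_THRESHOLD) -> bool:
    """True when the pass rate is at or above the threshold."""
    return pass_rate(passed, total) >= threshold


# Counts taken from the "Last run" figures for Box and Linear.
assert meets_threshold(105, 106)
assert meets_threshold(89, 90)
```

A run at exactly 70% passes under this sketch, because the comparison is inclusive (`>=`), which matches the ">= 70%" wording.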
## Coverage Summary
| Service | Protocol | Endpoints Tested | Test Count | Pass Rate | Methodology |