
Commit 596a243

Claude authored and committed
test: add pro stress test — 141 tests across 5 simulated days
Comprehensive E2E test exercising all pro tools on a fresh Supabase:
- 50 scars across 10 domains with 4 severity levels
- 10 design patterns
- 5 architectural decisions
- 10 threads with create/resolve/dedup lifecycle
- 3 markdown docs indexed and searched
- 5 session open/close cycles verifying cross-session persistence
- Recall, confirm_scars, reflect_scars, record_scar_usage lifecycle
- Cache flush/status/health cycle
- Graph traversal, analytics, prepare_context, absorb_observations
- Archive learning, contribute_feedback, help

Test plan documented in PRO-STRESS-TEST-PLAN.md v1.0.
Result: 141/141 PASS on blank Supabase with auto-schema.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 354aef4 commit 596a243

2 files changed

Lines changed: 781 additions & 0 deletions


Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
# GitMem Pro Stress Test Plan v1.0

## Overview

Comprehensive end-to-end test of all GitMem Pro features on a fresh Supabase project. Simulates 5 days of real interactive usage with session cycling, data accumulation, and cross-session persistence verification.

**Test file:** `pro-stress-test.mjs`
**Last run:** 2026-05-25 — 141/141 PASS

## Prerequisites

- Docker (for clean room isolation)
- Fresh Supabase project (blank — schema auto-applied)
- Environment variables:
  - `SUPABASE_URL` — test project URL
  - `SUPABASE_SERVICE_ROLE_KEY` — test project service role key
  - `SUPABASE_ACCESS_TOKEN` — for auto-schema (from `npx supabase login`)
  - `OPENROUTER_API_KEY` — for embeddings
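
Before starting a run, it can help to fail fast when one of the variables listed above is missing. The following is a minimal sketch, not part of the test file: the helper name `missingEnvVars` is hypothetical, and only the four variable names come from this plan.

```javascript
// Variable names taken from the prerequisites list above.
const REQUIRED_ENV_VARS = [
  "SUPABASE_URL",
  "SUPABASE_SERVICE_ROLE_KEY",
  "SUPABASE_ACCESS_TOKEN",
  "OPENROUTER_API_KEY",
];

// Hypothetical helper: returns the names that are unset or blank in `env`.
function missingEnvVars(env) {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Example with an explicit object; pass process.env when running under Node.
const missing = missingEnvVars({
  SUPABASE_URL: "https://example.supabase.co",
  OPENROUTER_API_KEY: "",
});
console.log(missing);
```

An empty result means all four variables are present and non-blank.
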

## How to run

```bash
# From /workspace/gitmem
cd /workspace/gitmem

# Build local tarball
npm run build
npm pack --pack-destination testing/clean-room/
mv testing/clean-room/gitmem-mcp-*.tgz testing/clean-room/gitmem-mcp-local.tgz

# Build Docker image
docker build --no-cache -t gitmem-claude-local -f testing/clean-room/Dockerfile.claude-local testing/clean-room/

# Run (create env file with credentials first)
docker run --rm --env-file /path/to/test.env --user root -i gitmem-claude-local bash -c '
  # Save the test script arriving on stdin (redirected from the host file at the end)
  cat > /tmp/stress-test.mjs
  chown -R developer:developer /home/developer/my-project
  # Initialize the project, then activate the pro license
  su developer -c "cd /home/developer/my-project && gitmem-mcp init --yes --project stress-test" 2>&1 > /dev/null
  su developer -c "cd /home/developer/my-project && echo \"\" | gitmem-mcp activate gitmem_pro_52061b097ac6d8b76c38ef191b74a319" 2>&1
  # Install the MCP SDK in a scratch directory and run the test from the project
  mkdir -p /tmp/test-harness && cd /tmp/test-harness
  npm init -y > /dev/null 2>&1 && npm install @modelcontextprotocol/sdk > /dev/null 2>&1
  cp /tmp/stress-test.mjs /tmp/test-harness/stress-test.mjs
  su developer -c "cd /home/developer/my-project && node /tmp/test-harness/stress-test.mjs"
' < testing/clean-room/pro-stress-test.mjs
```

## Test coverage — 141 tests across 5 simulated days

### Day 1: Initial setup (78 tests)
- `session_start` — first session on blank project
- 50 `create_learning` (scars) — 10 domains, 4 severity levels, real descriptions
- 10 `create_learning` (patterns) — architecture design patterns
- 5 `create_decision` — architectural decisions with rationale
- 10 `create_thread` — unresolved work items
- `list_threads` — verify all 10 visible
- `session_close` — with closing reflection
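
The Day 1 scar set (50 scars spread over 10 domains and 4 severity levels) could be generated along these lines. This is only a sketch: the plan gives the counts but not the labels, so the domain names, severity names, and object field names below are assumptions, and the real test uses hand-written descriptions.

```javascript
// Hypothetical labels: the plan specifies 10 domains and 4 severity levels,
// but does not name all of them.
const DOMAINS = [
  "deploy", "auth", "cache", "frontend", "security",
  "database", "testing", "api", "infra", "docs",
];
const SEVERITIES = ["critical", "high", "medium", "low"];

// Builds perDomain scars for each domain, cycling through the severities.
// Field names are illustrative, not the create_learning schema.
function buildScarFixtures(perDomain = 5) {
  const scars = [];
  for (const domain of DOMAINS) {
    for (let i = 0; i < perDomain; i++) {
      scars.push({
        learning_type: "scar",
        title: `${domain} scar ${i + 1}`,
        severity: SEVERITIES[scars.length % SEVERITIES.length],
        domain,
      });
    }
  }
  return scars;
}

const scars = buildScarFixtures();

// Day 1 total: session_start + scars + patterns + decisions + threads
// + list_threads + session_close.
const day1Total = 1 + scars.length + 10 + 5 + 10 + 1 + 1;
console.log(scars.length, day1Total); // 50 78
```
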

### Day 2: Recall, confirm, resolve (15 tests)
- `session_start` — loads day 1 context, verifies threads carry over
- 5x `recall` — diverse queries (deploy, auth, cache, frontend, security)
- `confirm_scars` — acknowledge recalled scars with APPLYING/N_A decisions
- 3x `resolve_thread` — close completed work items
- `list_threads` — verify 7 open, 3 resolved
- `reflect_scars` — end-of-session scar reflection
- `record_scar_usage` — track scar application
- `session_refresh` — re-surface context mid-session
- `session_close`
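
The `list_threads` check on Day 2 (7 open, 3 resolved after three `resolve_thread` calls) amounts to counting threads by status. A minimal sketch, assuming each thread carries a `status` field with the value `"open"` or `"resolved"`; the actual response shape of `list_threads` is not documented in this plan.

```javascript
// Counts threads by status. The `status` field name is an assumption.
function countThreadsByStatus(threads) {
  const counts = { open: 0, resolved: 0 };
  for (const thread of threads) {
    if (thread.status === "resolved") {
      counts.resolved += 1;
    } else {
      counts.open += 1;
    }
  }
  return counts;
}

// Ten threads from Day 1, the first three resolved on Day 2.
const threads = Array.from({ length: 10 }, (_, i) => ({
  id: `t-${i + 1}`,
  status: i < 3 ? "resolved" : "open",
}));

const counts = countThreadsByStatus(threads);
console.log(counts); // { open: 7, resolved: 3 }
```
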

### Day 3: Docs, search, graph, analytics (21 tests)
- `session_start` — loads 2 days of history
- Write 3 markdown docs (architecture, deployment, API reference — 1000+ words total)
- `index_docs` — embed and index the docs
- 4x `search_docs` — semantic doc search
- 4x `search` — keyword/semantic search across learnings
- 3x `log` — chronological browsing with type filters
- 2x `graph_traverse` — stats and connected_to lenses
- `analyze` — session analytics summary
- 3x `prepare_context` — compact, gate, and full sub-agent briefings
- `absorb_observations` — capture sub-agent findings (4 observations)
- `session_close`

### Day 4: Cache, health, archive, threads (17 tests)
- `session_start`
- 4x cache management — status, health, flush, status-after
- `health` — write operation success rates
- `archive_learning` — soft-delete a scar
- `create_thread` + dedup test (similar text → returns existing)
- `list_threads`, `cleanup_threads`
- 3x `resolve_thread` — close more work items
- `list_threads` — verify final state
- `contribute_feedback` — submit tool improvement suggestion
- `session_close`
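
The dedup behavior exercised on Day 4 (similar text returns the existing thread) can be illustrated with a small stand-in. The plan does not say how GitMem measures similarity, so this sketch substitutes token-overlap (Jaccard) similarity with an arbitrary 0.6 threshold; the function names and thread shape are hypothetical, and an embedding-based comparison could be swapped in.

```javascript
// Lowercased alphanumeric tokens as a set.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard similarity: shared tokens divided by tokens in the union.
function jaccard(a, b) {
  const tokensA = tokenize(a);
  const tokensB = tokenize(b);
  if (tokensA.size === 0 && tokensB.size === 0) return 1;
  let shared = 0;
  for (const token of tokensA) {
    if (tokensB.has(token)) shared += 1;
  }
  return shared / (tokensA.size + tokensB.size - shared);
}

// Returns the existing thread when a similar one is found, else creates one.
// The 0.6 threshold is an assumption chosen for this example.
function createThread(store, text, threshold = 0.6) {
  const existing = store.find((t) => jaccard(t.text, text) >= threshold);
  if (existing) return { thread: existing, deduplicated: true };
  const thread = { id: `t-${store.length + 1}`, text };
  store.push(thread);
  return { thread, deduplicated: false };
}

const store = [];
const first = createThread(store, "Add retry logic to the deploy pipeline");
const second = createThread(store, "Add retry logic to deploy pipeline");
const third = createThread(store, "Rotate the service role key");
console.log(second.deduplicated, third.deduplicated, store.length); // true false 2
```
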

### Day 5: Persistence verification (10 tests)
- `session_start` — loads all 4 previous sessions
- `list_threads` — verify threads survived 5 sessions
- `log` — verify all 60 learnings persisted
- 3x `recall` — verify embeddings still work
- `search_docs` — verify doc index survived
- `analyze` — final analytics across all sessions
- `gitmem-help` — help output
- `session_close`

## Tools tested

| Tool | Tests | Days |
|------|-------|------|
| session_start | 5 | 1-5 |
| session_close | 5 | 1-5 |
| session_refresh | 1 | 2 |
| create_learning | 60 | 1 |
| create_decision | 5 | 1 |
| recall | 8 | 2, 5 |
| confirm_scars | 1 | 2 |
| reflect_scars | 1 | 2 |
| record_scar_usage | 1 | 2 |
| search | 4 | 3 |
| log | 4 | 3, 5 |
| create_thread | 12 | 1, 4 |
| list_threads | 5 | 1, 2, 4, 5 |
| resolve_thread | 6 | 2, 4 |
| cleanup_threads | 1 | 4 |
| index_docs | 1 | 3 |
| search_docs | 5 | 3, 5 |
| graph_traverse | 2 | 3 |
| analyze | 2 | 3, 5 |
| prepare_context | 3 | 3 |
| absorb_observations | 1 | 3 |
| archive_learning | 1 | 4 |
| health | 1 | 4 |
| cache-status | 2 | 4 |
| cache-health | 1 | 4 |
| cache-flush | 1 | 4 |
| contribute_feedback | 1 | 4 |
| gitmem-help | 1 | 5 |
| **TOTAL** | **141** | |
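
A table like the one above can be derived from per-test records rather than maintained by hand, which keeps the per-tool counts and the total in agreement. The following is a sketch under the assumption that each test result is recorded as `{ tool, day }`; that record shape and both function names are hypothetical.

```javascript
// Groups results by tool, counting tests and collecting the days they ran on.
function tallyByTool(results) {
  const tally = new Map();
  for (const { tool, day } of results) {
    const entry = tally.get(tool) || { tests: 0, days: new Set() };
    entry.tests += 1;
    entry.days.add(day);
    tally.set(tool, entry);
  }
  return tally;
}

// Renders one markdown table row per tool, in first-seen order.
function toMarkdownRows(tally) {
  return [...tally].map(([tool, entry]) => {
    const days = [...entry.days].sort((a, b) => a - b).join(", ");
    return `| ${tool} | ${entry.tests} | ${days} |`;
  });
}

// Small illustrative sample, not the real run data.
const sample = [
  { tool: "session_start", day: 1 },
  { tool: "list_threads", day: 1 },
  { tool: "session_start", day: 2 },
  { tool: "list_threads", day: 2 },
  { tool: "list_threads", day: 4 },
  { tool: "list_threads", day: 4 },
];

const rows = toMarkdownRows(tallyByTool(sample));
const total = sample.length;
console.log(rows.join("\n"));
console.log(`| **TOTAL** | **${total}** | |`);
```
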

## What this test validates

1. **Schema auto-application** — blank Supabase → activate → tables created automatically
2. **Data persistence** — learnings, sessions, threads survive across 5 session restarts
3. **Embedding pipeline** — 60 learnings embedded via OpenRouter, searchable via recall
4. **Semantic search** — recall finds relevant scars for diverse query topics
5. **Thread lifecycle** — create → list → resolve → verify resolved count
6. **Thread deduplication** — similar threads detected and deduplicated
7. **Document indexing** — markdown files indexed and searchable
8. **Session continuity** — each new session loads previous context
9. **Multi-agent tools** — prepare_context generates briefings, absorb_observations captures findings
10. **Cache management** — status → flush → verify reload
11. **Analytics** — cross-session analysis works
12. **Knowledge graph** — traverse returns stats and connections
13. **Scar lifecycle** — create → recall → confirm → reflect → record usage → archive
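
Each of the validations above reduces to named pass/fail checks that roll up into a result line such as `141/141 PASS`. A minimal recorder sketch follows; `createRecorder` and its methods are hypothetical names, and reporting the passed count over the total for runs with failures is an assumption, since the plan only shows a fully passing run.

```javascript
// Collects named checks and summarizes them as "<passed>/<total> PASS".
function createRecorder() {
  const results = [];
  return {
    // Records one check; `detail` is optional context for failures.
    check(name, condition, detail = "") {
      results.push({ name, passed: Boolean(condition), detail });
    },
    // Returns the summary line plus the names of failed checks.
    summary() {
      const passed = results.filter((r) => r.passed).length;
      const failed = results.filter((r) => !r.passed).map((r) => r.name);
      return { line: `${passed}/${results.length} PASS`, failed };
    },
  };
}

// Illustrative checks with made-up values.
const recorder = createRecorder();
recorder.check("list_threads shows 10 threads", 10 === 10);
recorder.check("log returns 60 learnings", 60 === 60);
recorder.check("dedup returns existing thread", false, "new thread created");

const report = recorder.summary();
console.log(report.line);   // 2/3 PASS
console.log(report.failed); // [ 'dedup returns existing thread' ]
```
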

## Version history

| Version | Date | Tests | Result |
|---------|------|-------|--------|
| v1.0 | 2026-05-25 | 141 | 141 PASS |
