|
| 1 | +# tidb-stmt-cache |
| 2 | + |
| 3 | +A Spring Boot (Java 17) sample that drives two distinct keploy regressions |
| 4 | +against TiDB and Apache Pulsar in a single app: |
| 5 | + |
| 6 | +| Endpoint | Exercises | |
| 7 | +| --- | --- | |
| 8 | +| `GET /api/kv/{v}` and `GET /api/kv/insert-select/{v}` | MySQL Connector/J prepared-statement cache + HikariCP LIFO pool → orphan `COM_STMT_EXECUTE` matcher path | |
| 9 | +| `POST /events/patch` | Hibernate INSERT + Pulsar `SEND` on a **partitioned** topic with the default `RoundRobinPartition` routing mode → the partition-routing replay regression |
| 10 | + |
| 11 | +Both flows share the same `HikariDataSource` bean (Flipkart's actual shape: |
| 12 | +`autoCommit=false`, `prepStmtCacheSize=500`, `prepStmtCacheSqlLimit=2048`, |
| 13 | +JPA `provider_disables_autocommit=true`). |
| 14 | + |
| 15 | +## Why the Pulsar partitioned topic matters |
| 16 | + |
| 17 | +The Pulsar Java client's default `RoundRobinPartition` router picks a |
| 18 | +**random starting partition** when a producer is constructed, then walks |
| 19 | +through partitions in order. So: |
| 20 | + |
| 21 | +* During recording, the producer might start on partition 5. |
| 22 | +* During replay, a freshly-constructed producer starts on partition 7. |
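The effect of the random start can be sketched in a few lines of bash. This is an illustration only, not the Pulsar client's code: the function name `route_partitions` and the partition count of 8 are assumptions made for the example, and stepping one partition per message is a simplification.

```bash
#!/usr/bin/env bash
# Hypothetical model of round-robin routing: given a partition count, a
# starting partition, and a number of messages, print the partition each
# message would be sent to.
route_partitions() {
  local num_partitions="$1" start="$2" count="$3" i
  for ((i = 0; i < count; i++)); do
    echo $(( (start + i) % num_partitions ))
  done
}

# Recording run: the producer happened to start on partition 5.
echo "record: $(route_partitions 8 5 3 | xargs)"
# Replay run: a fresh producer starts elsewhere, for example partition 7.
echo "replay: $(route_partitions 8 7 3 | xargs)"
```

Under these assumptions the same three requests hit partitions 5, 6, 7 during recording but 7, 0, 1 during replay.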
| 23 | + |
| 24 | +The recorded `SEND` mock targets `…events-partition-5`; the live `SEND` |
| 25 | +during replay targets `…events-partition-7`. Without keploy's |
| 26 | +`baseTopic()` matcher loosening |
| 27 | +(`enterprise/pkg/core/proxy/integrations/pulsar/replayer/replayer.go`), |
| 28 | +no recorded mock matches the live topic and replay fails with |
| 29 | +`pulsar replay: payload-aware mock mismatch`. |
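The idea behind the loosening can be sketched as stripping the `-partition-<N>` suffix before comparing topics. The bash below is only an illustration of that idea: `base_topic` is a hypothetical helper, and the real `baseTopic()` in keploy's Go replayer may be implemented differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip a trailing "-partition-<N>" suffix from a
# Pulsar topic name; names without that suffix are returned unchanged.
base_topic() {
  local topic="$1"
  if [[ "$topic" =~ ^(.*)-partition-[0-9]+$ ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    printf '%s\n' "$topic"
  fi
}

recorded="persistent://public/default/events-partition-5"
live="persistent://public/default/events-partition-7"

# Compare on the base topic instead of the full partition name.
if [[ "$(base_topic "$recorded")" == "$(base_topic "$live")" ]]; then
  echo "match on base topic: $(base_topic "$live")"
else
  echo "mismatch"
fi
```

A strict string comparison of `recorded` and `live` would fail here, while the base-topic comparison succeeds.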
| 30 | + |
| 31 | +## Layout |
| 32 | + |
| 33 | +``` |
| 34 | +. |
| 35 | +├── docker-compose.yml local TiDB + Pulsar (+ partitioned-topic init) |
| 36 | +├── Dockerfile two-stage build → tidb-pulsar-app:dev |
| 37 | +├── k8s/ manifests for the k8s-proxy auto-replay path |
| 38 | +│ ├── 00-namespace.yaml |
| 39 | +│ ├── 10-tidb.yaml |
| 40 | +│ ├── 20-pulsar.yaml includes a Job that pre-creates the partitioned topic |
| 41 | +│ └── 30-app.yaml carries keploy.io/record-session=true for the webhook |
| 42 | +├── pom.xml |
| 43 | +└── src/main/java/com/example/tidbstmtcache/ |
| 44 | + ├── DataSourceConfig.java Flipkart-shape HikariCP bean |
| 45 | + ├── EventsController.java POST /events/patch — JPA save + Pulsar send |
| 46 | + ├── EventEntity.java JPA entity for the `events` table |
| 47 | + ├── EventRepository.java |
| 48 | + ├── PulsarConfig.java PulsarClient + Producer<byte[]> with RoundRobinPartition |
| 49 | + ├── QueryController.java existing orphan-EXECUTE endpoints (unchanged) |
| 50 | + ├── SchemaInitializer.java creates the `kv` table; Hibernate creates `events` |
| 51 | + └── TidbStmtCacheApplication.java |
| 52 | +``` |
| 53 | + |
| 54 | +## Quick path — local docker-compose smoke test |
| 55 | + |
| 56 | +Use this to confirm that the app boots and that partition routing is
| 57 | +non-deterministic across producer creations. This path does **not** drive keploy.
| 58 | + |
| 59 | +```bash |
| 60 | +cd samples-java/tidb-stmt-cache |
| 61 | +docker compose up -d |
| 62 | +# Wait for tidb (port 4000) and pulsar (port 6650) to be ready, and the |
| 63 | +# pulsar-init container to exit 0. |
| 64 | + |
| 65 | +mvn -DskipTests spring-boot:run & # or run from your IDE |
| 66 | +APP_PID=$! |
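| 67 | +# The app takes a while to start; curl retries until port 8080 accepts connections.
| 68 | +curl -s --retry 20 --retry-delay 3 --retry-connrefused -X POST http://localhost:8080/events/patch \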
| 69 | + -H 'Content-Type: application/json' \ |
| 70 | + -d '{ |
| 71 | + "entity_id": "FMPP4037630682", |
| 72 | + "event_name": "delivered", |
| 73 | + "event_timestamp": "2026-05-23T17:07:22+05:30", |
| 74 | + "task_orchestrator": "FSD" |
| 75 | + }' |
| 76 | +# Expect: {"message":"Event patched"} |
| 77 | + |
| 78 | +kill $APP_PID |
| 79 | +docker compose down -v |
| 80 | +``` |
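The "wait for tidb and pulsar" step in the script above can be turned into a small helper. A minimal sketch, assuming bash (it relies on bash's `/dev/tcp` redirection) and services reachable on `localhost`; `wait_for_port` is a hypothetical name, not something shipped with this sample.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll host:port once per second until a TCP
# connection succeeds, or give up after the timeout (default 60 seconds).
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" waited=0
  # The subshell opens the connection and closes it again on exit.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if (( waited >= timeout )); then
      echo "timed out waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example usage (commented out so that sourcing this file has no side effects):
# wait_for_port localhost 4000 120   # TiDB
# wait_for_port localhost 6650 120   # Pulsar
```

An open port only shows that the process is listening; the `pulsar-init` container still has to exit 0 before the partitioned topic exists.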
| 81 | + |
| 82 | +To see the round-robin in action, restart the app between curls and
| 83 | +diff the output of `bin/pulsar-admin topics partitioned-stats persistent://public/default/events`
| 84 | +across runs. The messages will land on different partitions
| 85 | +after each cold start.
| 86 | + |
| 87 | +## Full path — k8s-proxy auto-replay (matches Flipkart prod flow) |
| 88 | + |
| 89 | +The k8s-proxy controller watches the namespace for pods carrying |
| 90 | +`keploy.io/record-session=true` and injects the keploy-agent sidecar. |
| 91 | +After a recording is captured, an auto-replay session reconstructs the |
| 92 | +app pod in isolation and feeds the recorded HTTP requests back through |
| 93 | +it; the agent replays MySQL and Pulsar from mocks. |
| 94 | + |
| 95 | +### 1 · Build the patched enterprise agent image |
| 96 | + |
| 97 | +The matcher fix lives in |
| 98 | +`enterprise/pkg/core/proxy/integrations/pulsar/replayer/replayer.go` |
| 99 | +(`baseTopic` function + its callsites). Build a keploy-agent image that |
| 100 | +includes it — the exact `make` target depends on your enterprise repo |
| 101 | +layout; from the workspace root: |
| 102 | + |
| 103 | +```bash |
| 104 | +cd enterprise
| 105 | +make docker-image AGENT_IMAGE=keploy-agent:partition-fix |
| 106 | +kind load docker-image keploy-agent:partition-fix --name <your-kind-cluster> |
| 107 | +``` |
| 108 | + |
| 109 | +### 2 · Install the k8s-proxy chart pointing at the patched agent |
| 110 | + |
| 111 | +```bash |
| 112 | +cd ../k8s-proxy |
| 113 | +helm upgrade --install k8s-proxy ./charts/k8s-proxy \ |
| 114 | + --namespace k8s-proxy --create-namespace \ |
| 115 | + --set agent.image=keploy-agent:partition-fix \ |
| 116 | + --set webhook.watchNamespaces='{tidb-pulsar-replay}' |
| 117 | +``` |
| 118 | + |
| 119 | +### 3 · Build and load the sample app image |
| 120 | + |
| 121 | +```bash |
| 122 | +cd ../samples-java/tidb-stmt-cache |
| 123 | +mvn -DskipTests package |
| 124 | +docker build -t tidb-pulsar-app:dev . |
| 125 | +kind load docker-image tidb-pulsar-app:dev --name <your-kind-cluster> |
| 126 | +``` |
| 127 | + |
| 128 | +### 4 · Apply the manifests |
| 129 | + |
| 130 | +```bash |
| 131 | +kubectl apply -f k8s/ |
| 132 | +kubectl -n tidb-pulsar-replay wait deploy/tidb deploy/pulsar deploy/tidb-pulsar-app \ |
| 133 | + --for=condition=Available --timeout=5m |
| 134 | +kubectl -n tidb-pulsar-replay wait --for=condition=complete job/pulsar-init-topic --timeout=2m |
| 135 | +``` |
| 136 | + |
| 137 | +### 5 · Record a session |
| 138 | + |
| 139 | +Drive a few `POST /events/patch` requests through the in-cluster Service. |
| 140 | +The keploy-agent sidecar attached to `tidb-pulsar-app` will capture the |
| 141 | +MySQL and Pulsar traffic. |
| 142 | + |
| 143 | +```bash |
| 144 | +kubectl -n tidb-pulsar-replay port-forward svc/tidb-pulsar-app 8080:80 & |
| 145 | +PF_PID=$! |
| 146 | + |
| 147 | +for i in $(seq 1 5); do |
| 148 | + curl -s --retry 5 --retry-delay 1 --retry-connrefused -X POST http://localhost:8080/events/patch \
| 149 | + -H 'Content-Type: application/json' \ |
| 150 | + -d "{\"entity_id\":\"FMPP$i\",\"event_name\":\"delivered\",\"event_timestamp\":\"2026-05-23T17:07:22+05:30\",\"task_orchestrator\":\"FSD\"}" |
| 151 | +done |
| 152 | + |
| 153 | +kill $PF_PID |
| 154 | +``` |
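The escaped JSON in the loop above is easy to get wrong. One option, sketched here with a hypothetical `event_payload` helper, is to build the body with `printf`; the field values are the ones used in the loop, and the helper assumes the entity ID contains no characters that need JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body for POST /events/patch.
# Only entity_id varies; the other fields mirror the recording loop.
event_payload() {
  local entity_id="$1"
  printf '{"entity_id":"%s","event_name":"delivered","event_timestamp":"2026-05-23T17:07:22+05:30","task_orchestrator":"FSD"}\n' "$entity_id"
}

# Print the five bodies the recording loop would send.
for i in $(seq 1 5); do
  event_payload "FMPP$i"
done
```

In the recording loop the body could then be passed as `-d "$(event_payload "FMPP$i")"`.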
| 155 | + |
| 156 | +Confirm a `SEND` mock landed on a specific partition: |
| 157 | + |
| 158 | +```bash |
| 159 | +kubectl -n tidb-pulsar-replay logs deploy/tidb-pulsar-app -c keploy-agent \ |
| 160 | + | grep -E 'commandType.*SEND|topic.*partition-' |
| 161 | +``` |
| 162 | + |
| 163 | +### 6 · Trigger auto-replay |
| 164 | + |
| 165 | +Use the k8s-proxy `Replay` CR (or the REST API, depending on your install).
| 166 | +Example via the OpenAPI-described endpoint:
| 167 | + |
| 168 | +```bash |
| 169 | +kubectl -n k8s-proxy port-forward svc/k8s-proxy 8000:80 & |
| 170 | +curl -s --retry 5 --retry-delay 1 --retry-connrefused -X POST http://localhost:8000/api/v1/replays \
| 171 | + -H 'Content-Type: application/json' \ |
| 172 | + -d '{ |
| 173 | + "namespace": "tidb-pulsar-replay", |
| 174 | + "deployment": "tidb-pulsar-app", |
| 175 | + "testSetIDs": ["<the test-set ID printed in the agent logs>"] |
| 176 | + }' |
| 177 | +``` |
| 178 | + |
| 179 | +### 7 · Assert the regression is fixed |
| 180 | + |
| 181 | +```bash |
| 182 | +kubectl -n tidb-pulsar-replay logs deploy/tidb-pulsar-app -c keploy-agent \ |
| 183 | + | grep -E 'payload-aware mock mismatch|Test passed|result.*passed' |
| 184 | +``` |
| 185 | + |
| 186 | +* **Without the patch** — at least one of the recorded sessions fails |
| 187 | + with `pulsar replay: payload-aware mock mismatch for SEND (topic=…events-partition-<N>)`, |
| 188 | + the app returns HTTP 500, and the testcase is marked failed.
| 189 | +* **With the patch** — the live `SEND` to `…events-partition-<N>` matches |
| 190 | + the recorded mock for `…events-partition-<M>` (same base topic |
| 191 | + `…events`, same payload), the synthetic `SEND_RECEIPT` is returned, |
| 192 | + the app returns HTTP 200, and the testcase passes.
| 193 | + |
| 194 | +## Reproducing the Flipkart symptom exactly |
| 195 | + |
| 196 | +The Strowger recording in `Strowger Playgro Global Shipment (1)/testset/` |
| 197 | +matches this sample structurally — the `POST /events/patch` body, the |
| 198 | +HikariCP shape, and the partitioned Pulsar topic. To run the customer's |
| 199 | +mocks against this app: |
| 200 | + |
| 201 | +1. Replace `k8s/30-app.yaml`'s `PULSAR_TOPIC` env with the customer's |
| 202 | + topic name (`persistent://toss-relayer/gsm-relayers-prod/toss_EKL-E2E-ORCHESTRATOR_gsm_ns`). |
| 203 | +2. Use the customer's `mocks.yaml` instead of a fresh recording. |
| 204 | +3. Run step 6 with the customer's test-set ID. |
| 205 | + |
| 206 | +Without the fix you should see the same `payload-aware mock mismatch |
| 207 | +for SEND (topic=…partition-7)` log line the customer reported. With the |
| 208 | +fix you should see the `SEND` resolve against the customer's |
| 209 | +`partition-5` mock and the testcase pass. |