Commit 43718c1

khareyash05 and claude committed
feat(tidb-stmt-cache): add Pulsar partitioned-topic replay scenario
Extends the existing TiDB orphan-COM_STMT_EXECUTE sample with an HTTP endpoint that exercises Apache Pulsar's RoundRobinPartition routing, the producer-side behaviour that triggers keploy's partition-suffix replay mismatch (keploy/enterprise#2064).

The bean shape, JDBC parameters and JPA settings replicate the Flipkart Global-Shipment-Master service's production config so a recording captured against this sample produces mocks structurally identical to the customer's:

* HikariCP autoCommit=false paired with spring.jpa.properties.hibernate.connection.provider_disables_autocommit=true
* JDBC URL: useServerPrepStmts=true&cachePrepStmts=true&prepStmtCacheSize=500&prepStmtCacheSqlLimit=2048
* Hibernate MySQLDialect (TiDB is wire-compatible on :4000), open-in-view=false
* Pulsar producer with MessageRoutingMode.RoundRobinPartition against an 8-partition topic pre-created by the docker-compose pulsar-init container

New endpoint: POST /events/patch. It JPA-saves an Event row through TiDB, then synchronously publishes the payload to the partitioned topic. It mirrors the Flipkart endpoint shape exactly so the customer's mocks.yaml can be replayed against this sample. Existing /api/kv/* endpoints (orphan-EXECUTE scenario) are unchanged.

Additions:

* EventEntity / EventRepository: JPA persistence, Hibernate auto-DDL creates the `events` table.
* PulsarConfig: PulsarClient + Producer<byte[]> beans.
* EventsController: @Transactional POST /events/patch.
* docker-compose: Apache Pulsar 4.0.3 standalone + one-shot pulsar-init that creates the partitioned topic.
* Dockerfile: two-stage Maven -> JRE 17 build.
* k8s/ manifests: namespace, TiDB Deployment+Service, Pulsar Deployment+Service+Job, app Deployment+Service with keploy.io/record-session=true for the static webhook.
* README: walkthroughs for the local docker-compose smoke path and the in-cluster k8s-proxy auto-replay path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent eb3db5b commit 43718c1

14 files changed

Lines changed: 923 additions & 71 deletions

tidb-stmt-cache/Dockerfile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Two-stage build so the runtime image does not carry Maven, the Maven
+# local repo, or the source tree. The runtime image is what k8s pulls.
+FROM maven:3.9-eclipse-temurin-17 AS build
+WORKDIR /src
+# Copy the POM first so dependency resolution caches when only source
+# changes. Maven would otherwise re-fetch the entire dep set on every
+# source edit, which makes iteration painful.
+COPY pom.xml ./
+RUN mvn -B -q dependency:go-offline
+COPY src ./src
+RUN mvn -B -DskipTests package
+
+FROM eclipse-temurin:17-jre
+WORKDIR /app
+# Spring Boot's maven plugin emits the jar to target/. Pinning the jar
+# name keeps the COPY deterministic (no wildcard depending on artifact
+# version expansion).
+COPY --from=build /src/target/tidb-stmt-cache-0.0.1-SNAPSHOT.jar /app/app.jar
+EXPOSE 8080
+ENTRYPOINT ["java", "-jar", "/app/app.jar"]

tidb-stmt-cache/README.md

Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@
+# tidb-stmt-cache
+
+A Spring Boot (Java 17) sample that drives two distinct keploy regressions
+against TiDB and Apache Pulsar in a single app:
+
+| Endpoint | Exercises |
+| --- | --- |
+| `GET /api/kv/{v}` and `GET /api/kv/insert-select/{v}` | MySQL Connector/J prepared-statement cache + HikariCP LIFO pool → orphan `COM_STMT_EXECUTE` matcher path |
+| `POST /events/patch` | Hibernate INSERT + Pulsar `SEND` on a **partitioned** topic with the default `RoundRobinPartition` routing mode → the partition-routing replay regression |
+
+Both flows share the same `HikariDataSource` bean (Flipkart's actual shape:
+`autoCommit=false`, `prepStmtCacheSize=500`, `prepStmtCacheSqlLimit=2048`,
+JPA `provider_disables_autocommit=true`).
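
For orientation, the JDBC URL parameters named in the commit message can be assembled as in the sketch below. The class name, host, port and database name are hypothetical placeholders chosen for illustration; only the four query parameters come from this commit, and the sample's real `DataSourceConfig` may build the URL differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class JdbcUrlSketch {
    // Builds a MySQL-protocol JDBC URL carrying the prepared-statement cache
    // parameters listed in the commit message. Host, port and database are
    // hypothetical inputs, not values taken from the sample.
    static String buildUrl(String host, int port, String database) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("useServerPrepStmts", "true");
        params.put("cachePrepStmts", "true");
        params.put("prepStmtCacheSize", "500");
        params.put("prepStmtCacheSqlLimit", "2048");
        // LinkedHashMap keeps insertion order, so the query string is stable.
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return "jdbc:mysql://" + host + ":" + port + "/" + database + "?" + query;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("localhost", 4000, "test"));
    }
}
```
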
+
+## Why the Pulsar partitioned topic matters
+
+The Pulsar Java client's default `RoundRobinPartition` router picks a
+**random starting partition** when a producer is constructed, then walks
+through partitions in order. So:
+
+* During recording, the producer might start on partition 5.
+* During replay, a freshly-constructed producer might start on partition 7.
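
The divergence described in the bullets can be illustrated with a small stand-alone simulation. This is a simplified sketch of the described behaviour (random start, then one step per send), not the Pulsar client's actual router; the class and method names are made up for the example.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {
    // Simplified stand-in for a round-robin partition router: the start
    // offset is chosen once per producer, then each send advances by one.
    static final class Router {
        private final int numPartitions;
        private final AtomicInteger counter;

        Router(int numPartitions, Random random) {
            this.numPartitions = numPartitions;
            this.counter = new AtomicInteger(random.nextInt(numPartitions));
        }

        int nextPartition() {
            // floorMod keeps the result non-negative even if the counter wraps.
            return Math.floorMod(counter.getAndIncrement(), numPartitions);
        }
    }

    public static void main(String[] args) {
        // Two independently constructed routers stand in for the recording
        // run and the replay run; their start offsets are unrelated.
        Router recording = new Router(8, new Random());
        Router replay = new Router(8, new Random());
        for (int i = 0; i < 3; i++) {
            System.out.println("record -> events-partition-" + recording.nextPartition()
                    + ", replay -> events-partition-" + replay.nextPartition());
        }
    }
}
```

Each run prints three record/replay pairs. Because the two start offsets are drawn independently, the partition numbers in a pair usually differ, which is the condition that makes an exact topic-name match fail.
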
+
+The recorded `SEND` mock targets `…events-partition-5`; the live `SEND`
+during replay targets `…events-partition-7`. Without keploy's
+`baseTopic()` matcher loosening
+(`enterprise/pkg/core/proxy/integrations/pulsar/replayer/replayer.go`),
+no recorded mock matches the live topic and replay fails with
+`pulsar replay: payload-aware mock mismatch`.
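
The idea behind the loosening can be sketched in a few lines. The real `baseTopic` lives in keploy's Go replayer and is not shown in this commit, so the Java below is only a hypothetical rendering of the concept: Pulsar names the partitions of a partitioned topic `<topic>-partition-<N>`, and stripping that suffix makes two partitions of the same topic compare equal.

```java
import java.util.regex.Pattern;

public class BaseTopicSketch {
    // Matches the "-partition-<N>" suffix Pulsar appends to partition names.
    private static final Pattern PARTITION_SUFFIX = Pattern.compile("-partition-\\d+$");

    // Hypothetical equivalent of the matcher's normalisation step:
    // drop the partition suffix, leave non-partitioned names untouched.
    static String baseTopic(String topic) {
        return PARTITION_SUFFIX.matcher(topic).replaceFirst("");
    }

    public static void main(String[] args) {
        String recorded = "persistent://public/default/events-partition-5";
        String live = "persistent://public/default/events-partition-7";
        System.out.println(recorded.equals(live));                       // false
        System.out.println(baseTopic(recorded).equals(baseTopic(live))); // true
        System.out.println(baseTopic(live)); // persistent://public/default/events
    }
}
```
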
+
+## Layout
+
+```
+.
+├── docker-compose.yml          local TiDB + Pulsar (+ partitioned-topic init)
+├── Dockerfile                  two-stage build → tidb-pulsar-app:dev
+├── k8s/                        manifests for the k8s-proxy auto-replay path
+│   ├── 00-namespace.yaml
+│   ├── 10-tidb.yaml
+│   ├── 20-pulsar.yaml          includes a Job that pre-creates the partitioned topic
+│   └── 30-app.yaml             carries keploy.io/record-session=true for the webhook
+├── pom.xml
+└── src/main/java/com/example/tidbstmtcache/
+    ├── DataSourceConfig.java           Flipkart-shape HikariCP bean
+    ├── EventsController.java           POST /events/patch: JPA save + Pulsar send
+    ├── EventEntity.java                JPA entity for the `events` table
+    ├── EventRepository.java
+    ├── PulsarConfig.java               PulsarClient + Producer<byte[]> with RoundRobinPartition
+    ├── QueryController.java            existing orphan-EXECUTE endpoints (unchanged)
+    ├── SchemaInitializer.java          creates the `kv` table; Hibernate creates `events`
+    └── TidbStmtCacheApplication.java
+```
+
+## Quick path: local docker-compose smoke test
+
+Use this to confirm the app boots and the partition routing is
+non-deterministic across producer creations. Does **not** drive keploy.
+
+```bash
+cd samples-java/tidb-stmt-cache
+docker compose up -d
+# Wait for tidb (port 4000) and pulsar (port 6650) to be ready, and the
+# pulsar-init container to exit 0.
+
+mvn -DskipTests spring-boot:run &   # or run from your IDE
+APP_PID=$!
+
+curl -s -X POST http://localhost:8080/events/patch \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "entity_id": "FMPP4037630682",
+    "event_name": "delivered",
+    "event_timestamp": "2026-05-23T17:07:22+05:30",
+    "task_orchestrator": "FSD"
+  }'
+# Expect: {"message":"Event patched"}
+
+kill $APP_PID
+docker compose down -v
+```
+
+To see the round-robin in action, restart the app between curls and
+diff the `bin/pulsar-admin topics partitioned-stats --per-partition persistent://public/default/events`
+output: the per-partition message counts will grow on different
+partitions after each cold start.
+
+## Full path: k8s-proxy auto-replay (matches Flipkart prod flow)
+
+The k8s-proxy controller watches the namespace for pods carrying
+`keploy.io/record-session=true` and injects the keploy-agent sidecar.
+After a recording is captured, an auto-replay session reconstructs the
+app pod in isolation and feeds the recorded HTTP requests back through
+it; the agent replays MySQL and Pulsar from mocks.
+
+### 1 · Build the patched enterprise agent image
+
+The matcher fix lives in
+`enterprise/pkg/core/proxy/integrations/pulsar/replayer/replayer.go`
+(`baseTopic` function + its callsites). Build a keploy-agent image that
+includes it. The exact `make` target depends on your enterprise repo
+layout; from the workspace root:
+
+```bash
+cd ../enterprise
+make docker-image AGENT_IMAGE=keploy-agent:partition-fix
+kind load docker-image keploy-agent:partition-fix --name <your-kind-cluster>
+```
+
+### 2 · Install the k8s-proxy chart pointing at the patched agent
+
+```bash
+cd ../k8s-proxy
+helm upgrade --install k8s-proxy ./charts/k8s-proxy \
+  --namespace k8s-proxy --create-namespace \
+  --set agent.image=keploy-agent:partition-fix \
+  --set webhook.watchNamespaces='{tidb-pulsar-replay}'
+```
+
+### 3 · Build and load the sample app image
+
+```bash
+cd ../samples-java/tidb-stmt-cache
+mvn -DskipTests package
+docker build -t tidb-pulsar-app:dev .
+kind load docker-image tidb-pulsar-app:dev --name <your-kind-cluster>
+```
+
+### 4 · Apply the manifests
+
+```bash
+kubectl apply -f k8s/
+kubectl -n tidb-pulsar-replay wait deploy/tidb deploy/pulsar deploy/tidb-pulsar-app \
+  --for=condition=Available --timeout=5m
+kubectl -n tidb-pulsar-replay wait --for=condition=complete job/pulsar-init-topic --timeout=2m
+```
+
+### 5 · Record a session
+
+Drive a few `POST /events/patch` requests through the in-cluster Service.
+The keploy-agent sidecar attached to `tidb-pulsar-app` will capture the
+MySQL and Pulsar traffic.
+
+```bash
+kubectl -n tidb-pulsar-replay port-forward svc/tidb-pulsar-app 8080:80 &
+PF_PID=$!
+
+for i in $(seq 1 5); do
+  curl -s -X POST http://localhost:8080/events/patch \
+    -H 'Content-Type: application/json' \
+    -d "{\"entity_id\":\"FMPP$i\",\"event_name\":\"delivered\",\"event_timestamp\":\"2026-05-23T17:07:22+05:30\",\"task_orchestrator\":\"FSD\"}"
+done
+
+kill $PF_PID
+```
+
+Confirm a `SEND` mock landed on a specific partition:
+
+```bash
+kubectl -n tidb-pulsar-replay logs deploy/tidb-pulsar-app -c keploy-agent \
+  | grep -E 'commandType.*SEND|topic.*partition-'
+```
+
+### 6 · Trigger auto-replay
+
+Use the k8s-proxy `Replay` CR (or REST API, depending on your install).
+Example via the openapi-described endpoint:
+
+```bash
+kubectl -n k8s-proxy port-forward svc/k8s-proxy 8000:80 &
+curl -s -X POST http://localhost:8000/api/v1/replays \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "namespace": "tidb-pulsar-replay",
+    "deployment": "tidb-pulsar-app",
+    "testSetIDs": ["<the test-set ID printed in the agent logs>"]
+  }'
+```
+
+### 7 · Assert the regression is fixed
+
+```bash
+kubectl -n tidb-pulsar-replay logs deploy/tidb-pulsar-app -c keploy-agent \
+  | grep -E 'payload-aware mock mismatch|Test passed|result.*passed'
+```
+
+* **Without the patch**: at least one of the recorded sessions fails
+  with `pulsar replay: payload-aware mock mismatch for SEND (topic=…events-partition-<N>)`,
+  the app returns HTTP 500, and the testcase is marked failed.
+* **With the patch**: the live `SEND` to `…events-partition-<N>` matches
+  the recorded mock for `…events-partition-<M>` (same base topic
+  `…events`, same payload), the synthetic `SEND_RECEIPT` is returned,
+  the app returns HTTP 200, and the testcase passes.
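
The two outcomes can be reproduced in miniature. The sketch below is an illustrative model under stated assumptions, not keploy's replayer: `SendMock`, `match` and the `looseTopic` flag are hypothetical names, and the only rule it encodes is the one stated in the bullets (same base topic and same payload, versus an exact topic-name comparison).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SendMatcherSketch {
    // Hypothetical shape of a recorded SEND: target topic plus raw payload.
    record SendMock(String topic, byte[] payload) {}

    static String baseTopic(String topic) {
        return topic.replaceFirst("-partition-\\d+$", "");
    }

    // looseTopic=false models the unpatched behaviour (exact topic match);
    // looseTopic=true models the patched one (base topic match).
    static Optional<SendMock> match(List<SendMock> mocks, String liveTopic,
                                    byte[] livePayload, boolean looseTopic) {
        for (SendMock mock : mocks) {
            boolean topicOk = looseTopic
                    ? baseTopic(mock.topic()).equals(baseTopic(liveTopic))
                    : mock.topic().equals(liveTopic);
            if (topicOk && Arrays.equals(mock.payload(), livePayload)) {
                return Optional.of(mock);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] payload = "delivered".getBytes(StandardCharsets.UTF_8);
        List<SendMock> mocks = List.of(
                new SendMock("persistent://public/default/events-partition-5", payload));
        String live = "persistent://public/default/events-partition-7";
        System.out.println(match(mocks, live, payload, false).isPresent()); // false
        System.out.println(match(mocks, live, payload, true).isPresent());  // true
    }
}
```
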
+
+## Reproducing the Flipkart symptom exactly
+
+The Strowger recording in `Strowger Playgro Global Shipment (1)/testset/`
+matches this sample structurally: the `POST /events/patch` body, the
+HikariCP shape, and the partitioned Pulsar topic. To run the customer's
+mocks against this app:
+
+1. Replace `k8s/30-app.yaml`'s `PULSAR_TOPIC` env with the customer's
+   topic name (`persistent://toss-relayer/gsm-relayers-prod/toss_EKL-E2E-ORCHESTRATOR_gsm_ns`).
+2. Use the customer's `mocks.yaml` instead of a fresh recording.
+3. Run step 6 with the customer's test-set ID.
+
+Without the fix you should see the same `payload-aware mock mismatch
+for SEND (topic=…partition-7)` log line the customer reported. With the
+fix you should see the `SEND` resolve against the customer's
+`partition-5` mock and the testcase pass.

tidb-stmt-cache/docker-compose.yml

Lines changed: 47 additions & 4 deletions
@@ -1,4 +1,4 @@
-# Minimal single-node TiDB for keploy e2e.
+# Minimal TiDB + Apache Pulsar stack for keploy e2e.
 #
 # Why this stack and not the full pingcap/tidb-docker-compose with PD + TiKV?
 # - The sample only exercises the SQL layer (MySQL wire protocol on :4000).
@@ -8,12 +8,55 @@
 # process, in-memory) mode when no PD address is supplied, which is exactly
 # what we want for keploy CI: ~5s boot, volatile data, no extra containers.
 #
-# Pin: v8.5.x is the LTS line current at the time this sample was added.
-# Bump as new LTS lines ship; the matcher behaviour we're testing has been
-# stable across TiDB versions because it depends on the MySQL wire protocol.
+# Pulsar runs in standalone mode (one broker + embedded bookkeeper + zk).
+# The pulsar-init sidecar exists for a single purpose: create the events
+# topic as PARTITIONED with 8 partitions BEFORE the app boots its Java
+# producer. Without that, the Java client would auto-create a
+# non-partitioned topic on first SEND, and the bug we are testing
+# (RoundRobinPartition cursor divergence across record/replay) would not
+# reproduce because there is only one partition to route to.
+#
+# Pin: TiDB v8.5.x is the LTS line current at the time this sample was
+# added. Pulsar 4.0.3 matches the broker version typically deployed in
+# Flipkart's Strowger environment (Pulsar Server 3.x client compatibility
+# is preserved through the 4.x line).
 services:
   tidb:
     image: pingcap/tidb:v8.5.6
     ports:
       - "4000:4000"     # MySQL wire protocol
       - "10080:10080"   # status / readiness
+
+  pulsar:
+    image: apachepulsar/pulsar:4.0.3
+    command: bin/pulsar standalone
+    ports:
+      - "6650:6650"   # binary protocol; Java client connects here
+      - "8080:8080"   # admin REST
+    healthcheck:
+      test: ["CMD", "bin/pulsar-admin", "brokers", "healthcheck"]
+      interval: 5s
+      timeout: 5s
+      retries: 30
+      start_period: 30s
+
+  # One-shot container that waits for the broker to be healthy, then
+  # pre-creates the events topic as partitioned. Exits 0 on success; the
+  # app does NOT declare depends_on for it because the Pulsar Java client
+  # resolves topic metadata lazily on first SEND, so as long as
+  # pulsar-init has finished before /events/patch is first hit, we are
+  # safe. In practice the first SEND only happens after Spring Boot has
+  # started, by which point pulsar-init has finished, so this ordering holds.
+  pulsar-init:
+    image: apachepulsar/pulsar:4.0.3
+    depends_on:
+      pulsar:
+        condition: service_healthy
+    entrypoint:
+      - /bin/sh
+      - -c
+      - |
+        bin/pulsar-admin --admin-url http://pulsar:8080 \
+          topics create-partitioned-topic \
+          persistent://public/default/events --partitions 8
+    restart: "no"

tidb-stmt-cache/k8s/00-namespace.yaml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: tidb-pulsar-replay
+  labels:
+    # k8s-proxy's static webhook only mutates pods in namespaces that
+    # match the configured watch list (see chart values
+    # webhook.namespaceLabel + watchNamespace). The default label key is
+    # `kubernetes.io/metadata.name`, auto-stamped by the
+    # NamespaceDefaultLabelName admission plugin, so we just rely on
+    # the controller's chart to point at this namespace. If you flip
+    # the chart to a custom label key, add it here.
+    kubernetes.io/metadata.name: tidb-pulsar-replay

tidb-stmt-cache/k8s/10-tidb.yaml

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+# Single-node TiDB in unistore mode. No PD, no TiKV: fast boot, in-memory
+# state, identical SQL surface to a real cluster. Matches the docker-compose
+# topology so a recording captured locally and one captured in-cluster
+# share the same MySQL wire protocol behaviour.
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: tidb
+  namespace: tidb-pulsar-replay
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: tidb
+  template:
+    metadata:
+      labels:
+        app: tidb
+    spec:
+      containers:
+        - name: tidb
+          image: pingcap/tidb:v8.5.6
+          imagePullPolicy: IfNotPresent
+          ports:
+            - name: mysql
+              containerPort: 4000
+            - name: status
+              containerPort: 10080
+          readinessProbe:
+            tcpSocket:
+              port: 4000
+            initialDelaySeconds: 5
+            periodSeconds: 2
+            failureThreshold: 30
+          resources:
+            requests:
+              memory: "256Mi"
+              cpu: "100m"
+            limits:
+              memory: "1Gi"
+              cpu: "1000m"
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: tidb
+  namespace: tidb-pulsar-replay
+spec:
+  selector:
+    app: tidb
+  ports:
+    - name: mysql
+      port: 4000
+      targetPort: 4000
