Commit 6f274f6

chore: add grpc-protoscope sample app for Keploy gRPC field-ordering bug
1 parent e053bd5 commit 6f274f6

13 files changed

Lines changed: 1740 additions & 0 deletions

grpc-protoscope/README.md

Lines changed: 349 additions & 0 deletions
@@ -0,0 +1,349 @@
# grpc-protoscope: Reproducing the Keploy gRPC Field-Ordering Bug

## Table of Contents

1. [The Issue Reported by the Client](#1-the-issue-reported-by-the-client)
2. [Root Cause Analysis](#2-root-cause-analysis)
3. [About This Sample Application](#3-about-this-sample-application)
4. [How to Run](#4-how-to-run)
5. [Reproducing the Bug with Keploy](#5-reproducing-the-bug-with-keploy)
6. [Files in This Repository](#6-files-in-this-repository)

---

## 1. The Issue Reported by the Client

A user reported that Keploy was **failing gRPC tests** even though the recorded and replayed responses had **identical structure and values**. The only difference was the **order of individual fields** inside nested protobuf sub-messages.

### Client's Exact Input

```yaml
expected: |-
  1: 67.0i32 # 0x42860000i32
  4: {"{\"hits\":[{\"_index\":\"pvid_search_products_v4\",\"_score\":15100000000000000000,\"_sou"
  "rce\":{\"_rankingInfo\":{\"typosPresent\":true,\"numberOfWordsMatched\":1}},\"match_type"
  "\":\"Other\",\"attributes\":{\"subThemes\":null},\"_id\":\"4f30407c-6a3c-4a4e-8a3d-652217d"
  "4b6cb_d67c25f8-3adb-40c1-9113-b46d54a6e8aa\",\"trimming_meta\":{\"trimming_type\":\"L3"
  "\"}}]}"}
  8: 0
  9: { 3: { 1: { 2: {2: 0.0} # 0x0i64
  1: {"candidateCnt"}}
  1: { 2: {3: {"OVS"}}
  1: {"type"}}}
  2: { 1: { 2: {2: 1.0} # 0x3ff0000000000000i64
  1: {"candidateCnt"}}
  1: { 2: {2: 1.0} # 0x3ff0000000000000i64
  1: {"resultCnt"}}}}
actual: |-
  1: 67.0i32 # 0x42860000i32
  4: {"{\"hits\":[{\"_index\":\"pvid_search_products_v4\",\"_score\":15100000000000000000,\"_sou"
  "rce\":{\"_rankingInfo\":{\"typosPresent\":true,\"numberOfWordsMatched\":1}},\"match_type"
  "\":\"Other\",\"attributes\":{\"subThemes\":null},\"_id\":\"4f30407c-6a3c-4a4e-8a3d-652217d"
  "4b6cb_d67c25f8-3adb-40c1-9113-b46d54a6e8aa\",\"trimming_meta\":{\"trimming_type\":\"L3"
  "\"}}]}"}
  8: 0
  9: { 3: { 1: { 2: {3: {"OVS"}}
  1: {"type"}}
  1: { 2: {2: 0.0} # 0x0i64
  1: {"candidateCnt"}}}
  2: { 1: { 2: {2: 1.0} # 0x3ff0000000000000i64
  1: {"candidateCnt"}}
  1: { 2: {2: 1.0} # 0x3ff0000000000000i64
  1: {"resultCnt"}}}}
```

### The Failure Classification

```yaml
failure_info:
  risk: HIGH
  category:
    - SCHEMA_BROKEN
```

### What's Actually Different?

If you look closely at field `9.3` (the availability facet bucket), the **same two sub-messages** appear but in **reversed order**:

**Expected** (recorded):
```
9: { 3: { 1: { 2: {2: 0.0} # candidateCnt (numeric=0.0)
1: {"candidateCnt"}}
1: { 2: {3: {"OVS"}} # type (text="OVS")
1: {"type"}}}
```

**Actual** (replayed):
```
9: { 3: { 1: { 2: {3: {"OVS"}} # type (text="OVS"), now first
1: {"type"}}
1: { 2: {2: 0.0} # 0x0i64 # candidateCnt, now second
1: {"candidateCnt"}}}
```

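The values are **identical**: `candidateCnt = 0.0` and `type = "OVS"`. Only the wire serialization order changed. Protobuf leaves the wire order of map entries unspecified, and each entry here (a name in field `1`, a value in field `2`) has the same wire shape as a map entry, which suggests the client treats these entries as an unordered set of name/value pairs. Under that reading the reordering is **perfectly valid**. Elements of an ordinary `repeated` field do keep their order on the wire, so the reordering is harmless only when the application does not depend on entry order.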

---

## 2. Root Cause Analysis

The bug lives in **three interacting layers** in Keploy's codebase.

### Layer 1: Protoscope Assigns Position-Dependent Indentation

Keploy uses the [`protoscope`](https://github.com/protocolbuffers/protoscope) library to convert raw protobuf wire bytes into human-readable text. The protoscope renderer assigns **indentation based on position**, not content.

When a sub-message is small enough, protoscope inlines it on the same line as the parent `{`:

```
9: { 3: { 1: { 2: {2: 0.0} # 0x0i64 ← 6 spaces indent (inline)
1: {"candidateCnt"}} ← 2 spaces indent (next line)
```

The **first** sub-message gets deeper inline indentation (it continues on the same line as `{`). The **second** sub-message starts on a new line with less indentation. So when the wire order flips, the same content gets **different leading whitespace**.

### Layer 2: Canonicalization Sorts With Indentation Included

The canonicalization function in `pkg/matcher/grpc/canonical.go` (`CanonicalizeTopLevelBlocks`) is designed to make protoscope text order-insensitive. It:

1. Splits text into "top-level field blocks" (lines starting with `\d+:`)
2. Recursively canonicalizes the content inside each `{...}` block
3. **Sorts blocks lexicographically**
4. Joins them back

The problem: `normalizeWhitespace()` only trims **trailing** whitespace and collapses blank lines. It does **not** strip or normalize **leading** indentation. So when `sort.Strings(blocks)` runs, the sort order is determined by the leading spaces, not the content:

```
"      2: {2: 0.0}"  sorts before  "  1: {\"candidateCnt\"}"
```

because `"      "` (6 spaces) sorts before `"  1"` (2 spaces then `1`) in ASCII: the two strings first differ at the third byte, where a space (0x20) is less than `1` (0x31). But when the wire order flips, the indentation flips too, producing a different sorted result, even though the actual protobuf data is identical.
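
To make the effect concrete, here is a minimal, self-contained sketch. It does not call Keploy's code: `canonicalRaw` is a hypothetical stand-in for the sort-and-join step, and the block strings and their 6-space and 2-space prefixes are illustrative assumptions based on the description above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// canonicalRaw mimics the sort-and-join step while leaving indentation in place.
// Hypothetical helper, not Keploy's actual function.
func canonicalRaw(blocks []string) string {
	sorted := append([]string(nil), blocks...) // copy; keep the caller's slice intact
	sort.Strings(sorted)
	return strings.Join(sorted, "\n")
}

func main() {
	// Assumption: the first block in a rendering carries 6 leading spaces and
	// the second carries 2, regardless of what the block contains.
	candidate := `1: {1: {"candidateCnt"} 2: {2: 0.0}}`
	typ := `1: {1: {"type"} 2: {3: {"OVS"}}}`

	recorded := []string{"      " + candidate, "  " + typ}
	replayed := []string{"      " + typ, "  " + candidate}

	// Same two entries, yet the results differ: the spaces decide the order.
	fmt.Println(canonicalRaw(recorded) == canonicalRaw(replayed)) // false
}
```

In both inputs the 6-space block sorts first, so the "canonical" text simply mirrors the wire order it was supposed to hide.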
121+
122+
### Layer 3: Non-JSON Mismatch Is Classified as SCHEMA_BROKEN
123+
124+
In `pkg/matcher/grpc/match.go`, when the two canonicalized strings don't match:
125+
126+
```go
127+
if !decodedDataNormal {
128+
if json.Valid([]byte(expectedDecodedData)) && json.Valid([]byte(actualDecodedData)) {
129+
// JSON comparison with failure assessment
130+
} else {
131+
// non-JSON payload mismatch → Broken
132+
currentRisk = models.High
133+
currentCategories = append(currentCategories, models.SchemaBroken)
134+
}
135+
}
136+
```
137+
138+
Since protoscope text is **not valid JSON**, it falls into the `else` branch, which unconditionally classifies the failure as `HIGH` risk / `SCHEMA_BROKEN` — the most alarming category.
139+
140+
### Summary of the Chain
141+
142+
```
143+
Wire bytes have different field order (valid in protobuf)
144+
→ protoscope assigns different indentation
145+
→ canonicalization sorts by indentation instead of content
146+
→ canonicalized strings differ
147+
→ classified as SCHEMA_BROKEN / HIGH risk
148+
```
149+
150+
### The Fix
151+
152+
The fix needs to **strip leading whitespace from each block before sorting** in `canonicalizeRecursive`:
153+
154+
```go
// Before sorting, strip leading whitespace so that
// sort order depends on content, not position-dependent indentation.
for i := range blocks {
	blocks[i] = strings.TrimLeft(blocks[i], " \t")
}
sort.Strings(blocks)
```
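
As a quick sanity check of that idea, the sketch below applies the same trim-then-sort step to two hypothetical renderings of the same entries. `canonicalTrimmed` and the sample strings are illustrative assumptions, not Keploy's actual code, and the blocks are single-line for simplicity.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// canonicalTrimmed strips leading spaces and tabs from every block, then sorts
// and joins. Hypothetical helper that mirrors the proposed fix.
func canonicalTrimmed(blocks []string) string {
	trimmed := make([]string, len(blocks))
	for i, b := range blocks {
		trimmed[i] = strings.TrimLeft(b, " \t")
	}
	sort.Strings(trimmed)
	return strings.Join(trimmed, "\n")
}

func main() {
	candidate := `1: {1: {"candidateCnt"} 2: {2: 0.0}}`
	typ := `1: {1: {"type"} 2: {3: {"OVS"}}}`

	// Assumed position-dependent indentation: 6 spaces first, 2 spaces second.
	recorded := []string{"      " + candidate, "  " + typ}
	replayed := []string{"      " + typ, "  " + candidate}

	// With indentation removed, both wire orders canonicalize to the same text.
	fmt.Println(canonicalTrimmed(recorded) == canonicalTrimmed(replayed)) // true
}
```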
162+
163+
---
164+
165+
## 3. About This Sample Application
166+
167+
This is a minimal Go gRPC client-server app that reproduces the exact conditions from the bug report.
168+
169+
### Why a Normal gRPC Server Isn't Enough
170+
171+
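Go's standard `proto.Marshal()` writes the elements of a `repeated` field in slice order, so a server that builds the response the same way on every call emits the entries in the same order every time, and the bug would never trigger. (Map entry order is a separate matter: it is only stable when `proto.MarshalOptions{Deterministic: true}` is set. The schema below uses a `repeated` field, so that does not apply here.)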

### What This Server Does Differently

The server uses **raw wire encoding** via `google.golang.org/protobuf/encoding/protowire` to manually construct the protobuf response bytes with `rand.Shuffle()` on the repeated field entries:

```go
availEntries := [][]byte{
	buildFacetEntry("candidateCnt", &zero, nil),
	buildFacetEntry("type", nil, &ovs),
}
rand.Shuffle(len(availEntries), func(i, j int) {
	availEntries[i], availEntries[j] = availEntries[j], availEntries[i]
})
```
A `rawCodec` gRPC codec passes these pre-built bytes straight to the wire without re-marshaling, preserving the randomized field ordering.

### Proto Schema

```protobuf
message FacetValue {
  oneof value {
    double numeric = 2;
    string text = 3;
  }
}

message FacetEntry {
  string name = 1;
  FacetValue data = 2;
}

message FacetBucket {
  repeated FacetEntry entries = 1;
}

message FacetInfo {
  FacetBucket pricing = 2;
  FacetBucket availability = 3;
}

message SearchResponse {
  float score = 1;
  string hits_json = 4;
  int32 total = 8;
  FacetInfo facets = 9;
}
```

The field numbers (`1`, `4`, `8`, `9`) and nesting structure match the bug report exactly.

### Example: Recorded Test Case (Protoscope Format)

When Keploy records this server's response, the YAML test case looks like this:

```yaml
decoded_data: |
  1: 67.0i32 # 0x42860000i32
  4: {
    "{\"hits\":[{\"_index\":\"pvid_search_products_v4\",..."
  }
  8: 0
  9: {
    3: {
      1: {
        1: {"type"}
        2: {3: {"OVS"}}
      }
      1: {
        1: {"candidateCnt"}
        2: {2: 0.0} # 0x0i64
      }
    }
    2: {
      1: {
        1: {"candidateCnt"}
        2: {2: 1.0} # 0x3ff0000000000000i64
      }
      1: {
        1: {"resultCnt"}
        2: {2: 1.0} # 0x3ff0000000000000i64
      }
    }
  }
```

On the next run (test mode), the `rand.Shuffle` may flip the inner field order, producing different protoscope indentation and triggering the `SCHEMA_BROKEN` false positive.

---

## 4. How to Run

### Prerequisites

- Go 1.21+
- `protoc` compiler (only needed if modifying the `.proto` file)

### Run without Keploy

The paths in the commands below are from the author's machine; substitute the location of your own checkout.

```bash
# Terminal 1: start the server
cd /home/anju/grpc-protoscope
go run ./server/

# Terminal 2: call it (run multiple times to see different field orderings)
cd /home/anju/grpc-protoscope
go run ./client/
go run ./client/
go run ./client/
```

You'll see the facet entries printed in different orders across calls.

---

## 5. Reproducing the Bug with Keploy

### Step 1: Build Keploy from source (if needed)

```bash
cd /home/anju/keploy
go build -ldflags="-X main.apiServerURI=https://api.keploy.io" -o keploy
```

### Step 2: Record a test case

```bash
cd /home/anju/grpc-protoscope

# Start recording
/home/anju/keploy/keploy record -c "go run ./server/"
```

In another terminal, trigger the gRPC call:

```bash
cd /home/anju/grpc-protoscope
go run ./client/
```

Then press `Ctrl+C` in the recording terminal. Keploy saves the test case in `keploy/test-set-0/tests/test-1.yaml`.

### Step 3: Replay (test mode)

```bash
cd /home/anju/grpc-protoscope
/home/anju/keploy/keploy test -c "go run ./server/"
```

**Expected result:** Because `rand.Shuffle` randomizes field ordering each time, ~50% of test runs will produce a different wire order than the recording, triggering:

```
failure_info:
  risk: HIGH
  category:
    - SCHEMA_BROKEN
```

If the test passes (same random order happened to match), delete the `keploy/` folder and repeat steps 2–3.

---

## 6. Files in This Repository

```
grpc-protoscope/
├── README.md                ← This file
├── proto/search.proto       ← Protobuf schema matching the bug report structure
├── searchpb/                ← Generated Go protobuf/gRPC code
│   ├── search.pb.go
│   └── search_grpc.pb.go
├── server/main.go           ← gRPC server with randomized wire field ordering
├── client/main.go           ← gRPC client that calls the Search RPC
├── go.mod
├── go.sum
└── keploy/                  ← Keploy test artifacts (created after recording)
    └── test-set-0/tests/test-1.yaml
```
