|
| 1 | +# grpc-protoscope — Reproducing the Keploy gRPC Field-Ordering Bug |
| 2 | + |
| 3 | +## Table of Contents |
| 4 | + |
| 5 | +1. [The Issue Reported by the Client](#1-the-issue-reported-by-the-client) |
| 6 | +2. [Root Cause Analysis](#2-root-cause-analysis) |
| 7 | +3. [About This Sample Application](#3-about-this-sample-application) |
| 8 | +4. [How to Run](#4-how-to-run) |
| 9 | +5. [Reproducing the Bug with Keploy](#5-reproducing-the-bug-with-keploy) |
| 10 | +6. [Files in This Repository](#6-files-in-this-repository) |
| 11 | + |
| 12 | +--- |
| 13 | + |
| 14 | +## 1. The Issue Reported by the Client |
| 15 | + |
| 16 | +A user reported that Keploy was **failing gRPC tests** even though the recorded and replayed responses had **identical structure and values**. The only difference was the **order of the entries** inside a nested protobuf sub-message. |
| 17 | + |
| 18 | +### Client's Exact Input |
| 19 | + |
| 20 | +```yaml |
| 21 | +expected: |- |
| 22 | + 1: 67.0i32 # 0x42860000i32 |
| 23 | + 4: {"{\"hits\":[{\"_index\":\"pvid_search_products_v4\",\"_score\":15100000000000000000,\"_sou" |
| 24 | + "rce\":{\"_rankingInfo\":{\"typosPresent\":true,\"numberOfWordsMatched\":1}},\"match_type" |
| 25 | + "\":\"Other\",\"attributes\":{\"subThemes\":null},\"_id\":\"4f30407c-6a3c-4a4e-8a3d-652217d" |
| 26 | + "4b6cb_d67c25f8-3adb-40c1-9113-b46d54a6e8aa\",\"trimming_meta\":{\"trimming_type\":\"L3" |
| 27 | + "\"}}]}"} |
| 28 | + 8: 0 |
| 29 | + 9: { 3: { 1: { 2: {2: 0.0} # 0x0i64 |
| 30 | + 1: {"candidateCnt"}} |
| 31 | + 1: { 2: {3: {"OVS"}} |
| 32 | + 1: {"type"}}} |
| 33 | + 2: { 1: { 2: {2: 1.0} # 0x3ff0000000000000i64 |
| 34 | + 1: {"candidateCnt"}} |
| 35 | + 1: { 2: {2: 1.0} # 0x3ff0000000000000i64 |
| 36 | + 1: {"resultCnt"}}}} |
| 37 | +actual: |- |
| 38 | + 1: 67.0i32 # 0x42860000i32 |
| 39 | + 4: {"{\"hits\":[{\"_index\":\"pvid_search_products_v4\",\"_score\":15100000000000000000,\"_sou" |
| 40 | + "rce\":{\"_rankingInfo\":{\"typosPresent\":true,\"numberOfWordsMatched\":1}},\"match_type" |
| 41 | + "\":\"Other\",\"attributes\":{\"subThemes\":null},\"_id\":\"4f30407c-6a3c-4a4e-8a3d-652217d" |
| 42 | + "4b6cb_d67c25f8-3adb-40c1-9113-b46d54a6e8aa\",\"trimming_meta\":{\"trimming_type\":\"L3" |
| 43 | + "\"}}]}"} |
| 44 | + 8: 0 |
| 45 | + 9: { 3: { 1: { 2: {3: {"OVS"}} |
| 46 | + 1: {"type"}} |
| 47 | + 1: { 2: {2: 0.0} # 0x0i64 |
| 48 | + 1: {"candidateCnt"}}} |
| 49 | + 2: { 1: { 2: {2: 1.0} # 0x3ff0000000000000i64 |
| 50 | + 1: {"candidateCnt"}} |
| 51 | + 1: { 2: {2: 1.0} # 0x3ff0000000000000i64 |
| 52 | + 1: {"resultCnt"}}}} |
| 53 | +``` |
| 54 | +
|
| 55 | +### The Failure Classification |
| 56 | +
|
| 57 | +```yaml |
| 58 | +failure_info: |
| 59 | + risk: HIGH |
| 60 | + category: |
| 61 | + - SCHEMA_BROKEN |
| 62 | +``` |
| 63 | +
|
| 64 | +### What's Actually Different? |
| 65 | +
|
| 66 | +If you look closely at field `9.3` (the availability facet bucket), the **same two sub-messages** appear but in **reversed order**: |
| 67 | + |
| 68 | +**Expected** (recorded): |
| 69 | +``` |
| 70 | +9: { 3: { 1: { 2: {2: 0.0} # candidateCnt (numeric=0.0) |
| 71 | + 1: {"candidateCnt"}} |
| 72 | + 1: { 2: {3: {"OVS"}} # type (text="OVS") |
| 73 | + 1: {"type"}}} |
| 74 | +``` |
| 75 | +
|
| 76 | +**Actual** (replayed): |
| 77 | +``` |
| 78 | +9: { 3: { 1: { 2: {3: {"OVS"}} # type (text="OVS") — now first |
| 79 | + 1: {"type"}} |
| 80 | + 1: { 2: {2: 0.0} # 0x0i64 # candidateCnt — now second |
| 81 | + 1: {"candidateCnt"}}} |
| 82 | +``` |
| 83 | +
|
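| 84 | +The values are **identical**: `candidateCnt = 0.0` and `type = "OVS"`. Only the wire serialization order changed. Protobuf does not guarantee the wire order of `map` entries, which are encoded as repeated key/value sub-messages (key in field `1`, value in field `2`, the same shape seen here), so such a reordering is **perfectly valid**. Only a plain `repeated` field preserves element order. |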
| 85 | +
|
| 86 | +--- |
| 87 | +
|
| 88 | +## 2. Root Cause Analysis |
| 89 | +
|
| 90 | +The bug lives in **three interacting layers** in Keploy's codebase. |
| 91 | +
|
| 92 | +### Layer 1: Protoscope Assigns Position-Dependent Indentation |
| 93 | +
|
| 94 | +Keploy uses the [`protoscope`](https://github.com/protocolbuffers/protoscope) library to convert raw protobuf wire bytes into human-readable text. The protoscope renderer assigns **indentation based on position**, not content. |
| 95 | +
|
| 96 | +When a sub-message is small enough, protoscope inlines it on the same line as the parent `{`: |
| 97 | +
|
| 98 | +``` |
| 99 | +9: { 3: { 1: { 2: {2: 0.0} # 0x0i64 ← 6 spaces indent (inline) |
| 100 | + 1: {"candidateCnt"}} ← 2 spaces indent (next line) |
| 101 | +``` |
| 102 | +
|
| 103 | +The **first** sub-message gets deeper inline indentation (it continues on the same line as `{`). The **second** sub-message starts on a new line with less indentation. So when the wire order flips, the same content gets **different leading whitespace**. |
| 104 | +
|
| 105 | +### Layer 2: Canonicalization Sorts With Indentation Included |
| 106 | +
|
| 107 | +The canonicalization function in `pkg/matcher/grpc/canonical.go` (`CanonicalizeTopLevelBlocks`) is designed to make protoscope text order-insensitive. It: |
| 108 | +
|
| 109 | +1. Splits text into "top-level field blocks" (lines starting with `\d+:`) |
| 110 | +2. Recursively canonicalizes the content inside each `{...}` block |
| 111 | +3. **Sorts blocks lexicographically** |
| 112 | +4. Joins them back |
| 113 | +
|
| 114 | +The problem: `normalizeWhitespace()` only trims **trailing** whitespace and collapses blank lines. It does **not** strip or normalize **leading** indentation. So when `sort.Strings(blocks)` runs, the sort order is determined by the leading spaces, not the content: |
| 115 | +
|
| 116 | +``` |
| 117 | +"      2: {2: 0.0}" sorts before "  1: {\"candidateCnt\"}" |
| 118 | +``` |
| 119 | +
|
| 120 | +because `"      "` (6 spaces) sorts before `"  1"` (2 spaces then `1`) in ASCII: the space character (0x20) compares lower than `1` (0x31). But when the wire order flips, the indentation flips too, producing a different sorted result even though the actual protobuf data is identical. |
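
The effect can be reproduced in isolation. The following is a minimal sketch, not Keploy's actual code: `sortedJoin` is a hypothetical helper, and the two input slices are hand-written stand-ins for the same pair of blocks rendered with position-dependent indentation (6 spaces for whichever block comes first, 2 spaces for the other).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortedJoin (hypothetical helper) sorts a copy of blocks and joins them.
// When trim is true, leading spaces and tabs are stripped before sorting.
func sortedJoin(blocks []string, trim bool) string {
	out := make([]string, len(blocks))
	copy(out, blocks)
	if trim {
		for i := range out {
			out[i] = strings.TrimLeft(out[i], " \t")
		}
	}
	sort.Strings(out)
	return strings.Join(out, "\n")
}

func main() {
	// Same content in both slices; only the indentation follows the position.
	orderA := []string{"      2: {2: 0.0}", "  1: {\"candidateCnt\"}"}
	orderB := []string{"      1: {\"candidateCnt\"}", "  2: {2: 0.0}"}

	// With indentation kept, the canonical forms differ.
	fmt.Println(sortedJoin(orderA, false) == sortedJoin(orderB, false)) // false
	// With indentation stripped, both collapse to the same text.
	fmt.Println(sortedJoin(orderA, true) == sortedJoin(orderB, true)) // true
}
```

Sorting on the raw strings lets the leading spaces decide both the order and the final text, while trimming first makes the result depend on content only.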
| 121 | +
|
| 122 | +### Layer 3: Non-JSON Mismatch Is Classified as SCHEMA_BROKEN |
| 123 | +
|
| 124 | +In `pkg/matcher/grpc/match.go`, when the two canonicalized strings don't match: |
| 125 | +
|
| 126 | +```go |
| 127 | +if !decodedDataNormal { |
| 128 | + if json.Valid([]byte(expectedDecodedData)) && json.Valid([]byte(actualDecodedData)) { |
| 129 | + // JSON comparison with failure assessment |
| 130 | + } else { |
| 131 | + // non-JSON payload mismatch → Broken |
| 132 | + currentRisk = models.High |
| 133 | + currentCategories = append(currentCategories, models.SchemaBroken) |
| 134 | + } |
| 135 | +} |
| 136 | +``` |
| 137 | + |
| 138 | +Since protoscope text is **not valid JSON**, it falls into the `else` branch, which unconditionally classifies the failure as `HIGH` risk / `SCHEMA_BROKEN` — the most alarming category. |
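
To see why protoscope text always takes the `else` branch, the JSON check can be tried on its own. This is an illustrative sketch only: `isJSONPair` is a hypothetical name that mirrors the condition quoted above, and the sample strings are shortened stand-ins for real payloads.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isJSONPair (hypothetical helper) mirrors the condition in the snippet above:
// both payloads must be valid JSON for the JSON comparison path to run.
func isJSONPair(expected, actual string) bool {
	return json.Valid([]byte(expected)) && json.Valid([]byte(actual))
}

func main() {
	protoscopeText := "1: 67.0i32\n8: 0"

	// Protoscope output is not JSON, so the non-JSON branch is taken.
	fmt.Println(isJSONPair(protoscopeText, protoscopeText)) // false
	// Two JSON documents would go through the JSON comparison instead.
	fmt.Println(isJSONPair(`{"a":1}`, `{"a":2}`)) // true
}
```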
| 139 | + |
| 140 | +### Summary of the Chain |
| 141 | + |
| 142 | +``` |
| 143 | +Wire bytes have different field order (valid in protobuf) |
| 144 | + → protoscope assigns different indentation |
| 145 | + → canonicalization sorts by indentation instead of content |
| 146 | + → canonicalized strings differ |
| 147 | + → classified as SCHEMA_BROKEN / HIGH risk |
| 148 | +``` |
| 149 | + |
| 150 | +### The Fix |
| 151 | + |
| 152 | +The fix needs to **strip leading whitespace from each block before sorting** in `canonicalizeRecursive`: |
| 153 | + |
| 154 | +```go |
| 155 | +// Before sorting, strip leading whitespace so that |
| 156 | +// sort order depends on content, not position-dependent indentation. |
| 157 | +for i := range blocks { |
| 158 | + blocks[i] = strings.TrimLeft(blocks[i], " \t") |
| 159 | +} |
| 160 | +sort.Strings(blocks) |
| 161 | +``` |
| 162 | + |
| 163 | +--- |
| 164 | + |
| 165 | +## 3. About This Sample Application |
| 166 | + |
| 167 | +This is a minimal Go gRPC client-server app that reproduces the exact conditions from the bug report. |
| 168 | + |
| 169 | +### Why a Normal gRPC Server Isn't Enough |
| 170 | + |
| 171 | +Go's standard `proto.Marshal()` writes the elements of a `repeated` field in slice order. A normal gRPC server that builds the `entries` list the same way on every call therefore emits the same wire order each time, and the bug would never trigger. |
| 172 | + |
| 173 | +### What This Server Does Differently |
| 174 | + |
| 175 | +The server uses **raw wire encoding** via `google.golang.org/protobuf/encoding/protowire` to manually construct the protobuf response bytes with `rand.Shuffle()` on the repeated field entries: |
| 176 | + |
| 177 | +```go |
| 178 | +availEntries := [][]byte{ |
| 179 | + buildFacetEntry("candidateCnt", &zero, nil), |
| 180 | + buildFacetEntry("type", nil, &ovs), |
| 181 | +} |
| 182 | +rand.Shuffle(len(availEntries), func(i, j int) { |
| 183 | + availEntries[i], availEntries[j] = availEntries[j], availEntries[i] |
| 184 | +}) |
| 185 | +``` |
| 186 | + |
| 187 | +A `rawCodec` gRPC codec passes these pre-built bytes straight to the wire without re-marshaling, preserving the randomized field ordering. |
| 188 | + |
| 189 | +### Proto Schema |
| 190 | + |
| 191 | +```protobuf |
| 192 | +message FacetValue { |
| 193 | + oneof value { |
| 194 | + double numeric = 2; |
| 195 | + string text = 3; |
| 196 | + } |
| 197 | +} |
| 198 | +
|
| 199 | +message FacetEntry { |
| 200 | + string name = 1; |
| 201 | + FacetValue data = 2; |
| 202 | +} |
| 203 | +
|
| 204 | +message FacetBucket { |
| 205 | + repeated FacetEntry entries = 1; |
| 206 | +} |
| 207 | +
|
| 208 | +message FacetInfo { |
| 209 | + FacetBucket pricing = 2; |
| 210 | + FacetBucket availability = 3; |
| 211 | +} |
| 212 | +
|
| 213 | +message SearchResponse { |
| 214 | + float score = 1; |
| 215 | + string hits_json = 4; |
| 216 | + int32 total = 8; |
| 217 | + FacetInfo facets = 9; |
| 218 | +} |
| 219 | +``` |
| 220 | + |
| 221 | +The field numbers (`1`, `4`, `8`, `9`) and nesting structure match the bug report exactly. |
| 222 | + |
| 223 | +### Example: Recorded Test Case (Protoscope Format) |
| 224 | + |
| 225 | +When Keploy records this server's response, the YAML test case looks like this: |
| 226 | + |
| 227 | +```yaml |
| 228 | +decoded_data: | |
| 229 | + 1: 67.0i32 # 0x42860000i32 |
| 230 | + 4: { |
| 231 | + "{\"hits\":[{\"_index\":\"pvid_search_products_v4\",..." |
| 232 | + } |
| 233 | + 8: 0 |
| 234 | + 9: { |
| 235 | + 3: { |
| 236 | + 1: { |
| 237 | + 1: {"type"} |
| 238 | + 2: {3: {"OVS"}} |
| 239 | + } |
| 240 | + 1: { |
| 241 | + 1: {"candidateCnt"} |
| 242 | + 2: {2: 0.0} # 0x0i64 |
| 243 | + } |
| 244 | + } |
| 245 | + 2: { |
| 246 | + 1: { |
| 247 | + 1: {"candidateCnt"} |
| 248 | + 2: {2: 1.0} # 0x3ff0000000000000i64 |
| 249 | + } |
| 250 | + 1: { |
| 251 | + 1: {"resultCnt"} |
| 252 | + 2: {2: 1.0} # 0x3ff0000000000000i64 |
| 253 | + } |
| 254 | + } |
| 255 | + } |
| 256 | +``` |
| 257 | +
|
| 258 | +On the next run (test mode), the `rand.Shuffle` may flip the inner field order, producing different protoscope indentation — triggering the `SCHEMA_BROKEN` false positive. |
| 259 | + |
| 260 | +--- |
| 261 | + |
| 262 | +## 4. How to Run |
| 263 | + |
| 264 | +### Prerequisites |
| 265 | + |
| 266 | +- Go 1.21+ |
| 267 | +- `protoc` compiler (only needed if modifying the `.proto` file) |
| 268 | + |
| 269 | +### Run without Keploy |
| 270 | + |
| 271 | +```bash |
| 272 | +# Terminal 1 — start the server |
| 273 | +cd /home/anju/grpc-protoscope |
| 274 | +go run ./server/ |
| 275 | +
|
| 276 | +# Terminal 2 — call it (run multiple times to see different field orderings) |
| 277 | +cd /home/anju/grpc-protoscope |
| 278 | +go run ./client/ |
| 279 | +go run ./client/ |
| 280 | +go run ./client/ |
| 281 | +``` |
| 282 | + |
| 283 | +You'll see the facet entries printed in different orders across calls. |
| 284 | + |
| 285 | +--- |
| 286 | + |
| 287 | +## 5. Reproducing the Bug with Keploy |
| 288 | + |
| 289 | +### Step 1: Build Keploy from source (if needed) |
| 290 | + |
| 291 | +```bash |
| 292 | +cd /home/anju/keploy |
| 293 | +go build -ldflags="-X main.apiServerURI=https://api.keploy.io" -o keploy |
| 294 | +``` |
| 295 | + |
| 296 | +### Step 2: Record a test case |
| 297 | + |
| 298 | +```bash |
| 299 | +cd /home/anju/grpc-protoscope |
| 300 | +
|
| 301 | +# Start recording |
| 302 | +/home/anju/keploy/keploy record -c "go run ./server/" |
| 303 | +``` |
| 304 | + |
| 305 | +In another terminal, trigger the gRPC call: |
| 306 | + |
| 307 | +```bash |
| 308 | +cd /home/anju/grpc-protoscope |
| 309 | +go run ./client/ |
| 310 | +``` |
| 311 | + |
| 312 | +Then press `Ctrl+C` in the recording terminal. Keploy saves the test case in `keploy/test-set-0/tests/test-1.yaml`. |
| 313 | + |
| 314 | +### Step 3: Replay (test mode) |
| 315 | + |
| 316 | +```bash |
| 317 | +cd /home/anju/grpc-protoscope |
| 318 | +/home/anju/keploy/keploy test -c "go run ./server/" |
| 319 | +``` |
| 320 | + |
| 321 | +**Expected result:** Because `rand.Shuffle` randomizes field ordering each time, ~50% of test runs will produce a different wire order than the recording, triggering: |
| 322 | + |
| 323 | +```yaml |
| 324 | +failure_info: |
| 325 | + risk: HIGH |
| 326 | + category: |
| 327 | + - SCHEMA_BROKEN |
| 328 | +``` |
| 329 | +
|
| 330 | +If the test passes (same random order happened to match), delete the `keploy/` folder and repeat steps 2–3. |
| 331 | +
|
| 332 | +--- |
| 333 | +
|
| 334 | +## 6. Files in This Repository |
| 335 | +
|
| 336 | +``` |
| 337 | +grpc-protoscope/ |
| 338 | +├── README.md ← This file |
| 339 | +├── proto/search.proto ← Protobuf schema matching the bug report structure |
| 340 | +├── searchpb/ ← Generated Go protobuf/gRPC code |
| 341 | +│ ├── search.pb.go |
| 342 | +│ └── search_grpc.pb.go |
| 343 | +├── server/main.go ← gRPC server with randomized wire field ordering |
| 344 | +├── client/main.go ← gRPC client that calls the Search RPC |
| 345 | +├── go.mod |
| 346 | +├── go.sum |
| 347 | +└── keploy/ ← Keploy test artifacts (created after recording) |
| 348 | + └── test-set-0/tests/test-1.yaml |
| 349 | +``` |