Skip to content

feat: per-key cache miss tracking for arbitrary commands#374

Open
fcostaoliveira wants to merge 13 commits into
masterfrom
feat/arbitrary-command-miss-tracking
Open

feat: per-key cache miss tracking for arbitrary commands#374
fcostaoliveira wants to merge 13 commits into
masterfrom
feat/arbitrary-command-miss-tracking

Conversation

@fcostaoliveira
Copy link
Copy Markdown
Collaborator

@fcostaoliveira fcostaoliveira commented May 9, 2026

Summary

Extends miss/hit accounting (today only available for GET/MGET) to any Redis command supplied via --command, with per-key bucket granularity. Driven by a vendored snapshot of Redis' src/commands/*.json (8.8) processed at build time into a static C++ registry — no JSON parser linked into the binary, no runtime probe required for the common case.

Two commits for review:

  1. build: vendor Redis 8.8 commands.json + command_meta registry — pure infrastructure; introduces make commands-meta targets and the codegen pipeline. No behavior change.
  2. feat: per-key cache miss tracking for arbitrary commands — the actual feature; ~700 lines of C++.

Reply-shape coverage

Shape Examples Tracking
SingleNullBulk GET, HGET, ZSCORE, GETEX, GETDEL, LPOP, SPOP null reply ⇒ miss; 1 bucket
ArrayPerElementNulls MGET, HMGET, ZMSCORE walk top-level mbulk; per-position null mask
EmptyCollection SMEMBERS, LRANGE, HGETALL, HKEYS, HVALS, ZRANGE empty == miss heuristic
IntegerMembership EXISTS, SISMEMBER, HEXISTS integer reply N is the hit count
NotMissable / Unknown SET, INCR, modules… feature silently disabled

CLI

--command-miss-tracking={auto,off}    default: auto

auto enables tracking for any --command whose first token (or CONTAINER SUB for subcommands) resolves to a miss-bearing reply shape in the registry. off keeps the old JSON shape (no Hits/Misses fields, no Per-Key section) for back-compat with downstream tooling.

JSON output (mb.json, under ALL STATS)

  • Per-arbitrary-command entries gain real Hits/sec / Misses/sec (previously hardcoded to 0.0).
  • New nested Per-Key Misses section:
"Per-Key Misses": {
    "MGET": {
        "Total Hits": 380, "Total Misses": 820,
        "key[0] Hits": 130, "key[0] Misses": 270,
        "key[1] Hits": 130, "key[1] Misses": 270,
        "key[2] Hits": 120, "key[2] Misses": 280
    }
}

Make targets (maintainer-only)

make commands-meta                       # default REDIS_VERSION=8.8
make commands-meta REDIS_VERSION=8.6     # bump
make commands-meta-fetch                 # JSONs only (slow, network)
make commands-meta-gen                   # regen .h from JSONs (fast)
make commands-meta-check                 # CI guard for stale .h

End-user builds do not require Python; AM_PATH_PYTHON resolves to : if absent.

Smoke results

3 keys preloaded (memtier-1..3), 10-key uniform-random range, 200 reqs × 2 clients (~30% expected hit rate):

Command Total Hits / Misses Buckets
GET __key__ 120 / 280 key[0]
MGET __key__ __key__ __key__ 380 / 820 key[0..2] (no phantom 4th)
EXISTS __key__ 120 / 280 key[0]
SMEMBERS __key__ 120 / 280 key[0]
GET __key__ w/ --command-miss-tracking=off (no section)

Implementation notes

  • m_keep_value=true is forced only on connections that need reply-array inspection (ArrayPerElementNulls / EmptyCollection). SingleNullBulk and IntegerMembership reuse the parser's existing scalar m_hits / m_status and pay no allocation cost.
  • Memcached / module / unknown commands resolve to spec=NULL and the feature is silently disabled — no changes to the memcached path.
  • Thread-safe: per-thread run_stats carry per-command miss totals and per-key vectors; merged into the printer's run_stats after threads join.
  • Mismatches between user-supplied __key__ placeholder count and spec-resolved key count emit a startup warning but do not fail (user might be testing custom modules).

Test plan

  • make format, make format-check
  • make commands-meta-check (codegen idempotent)
  • make -j$(nproc) clean
  • ./memtier_benchmark --help shows the new flag
  • Man page entry present
  • Bash completion entry present
  • Smoke battery for all four reply shapes against local Redis 8 (results above)
  • CI / RLTest run (will be exercised by tests/test_arbitrary_command_misses.py)

🤖 Generated with Claude Code


Note

Medium Risk
Adds non-trivial response-shape parsing and new stats aggregation on the hot arbitrary-command reply path, which could affect correctness/performance for some commands and reply edge cases. Build now vendors a large Redis command metadata snapshot and introduces codegen targets, increasing maintenance surface area.

Overview
Adds per-key hit/miss accounting for --command arbitrary Redis commands, expanding beyond the existing GET/MGET-only tracking. At startup, commands are resolved against a new static command_meta registry (generated from a vendored Redis commands.json snapshot) to determine key positions and reply “shapes”, and at runtime client.cpp inspects responses to compute per-command and per-bucket hits/misses (with optional value materialization only when array-walking is needed).

Introduces the command_meta subsystem (command_meta.{h,cpp} + generated command_meta_data.h) and a maintainer workflow (make commands-meta*, optional Python) to fetch/regen the metadata; the Redis JSON snapshot is committed under deps/commands_json and un-ignored in .gitignore. Stats plumbing is extended via run_stats::update_arbitrary_op_misses() to accumulate totals and per-key vectors, and shell completion is updated to include the new --command-miss-tracking option.

Reviewed by Cursor Bugbot for commit f4716fd. Bugbot is set up for automated code reviews on this repo. Configure here.

fcostaoliveira and others added 2 commits May 9, 2026 12:30
Adds the codegen infrastructure that lets memtier_benchmark statically
resolve key positions and reply shapes for arbitrary Redis commands
(needed by the upcoming per-key miss tracking feature).

- scripts/fetch_commands_meta.sh: resolves a Redis ref (branch/tag/sha)
  to a commit, downloads the tarball, extracts src/commands/*.json into
  deps/commands_json/, and pins the sha in REDIS_TAG.txt.
- scripts/gen_command_meta.py: parses each JSON, emits a single
  command_meta_data.h with static CommandSpec initializers (key_specs as
  a tagged union over Index/Keyword x Range/Keynum, reply_shape inferred
  from reply_schema with a curated override table for the well-known set).
- command_meta.{h,cpp}: typed surface (BeginSearch, FindKeys, KeySpec,
  CommandSpec, ReplyShape) with a case-insensitive lookup() over the
  static table. Subcommands are looked up via the canonical
  "CONTAINER SUB" form (e.g. "XGROUP CREATE").

Make targets (maintainer-only; `make all` is unaffected):
    make commands-meta                       # default REDIS_VERSION=8.8
    make commands-meta REDIS_VERSION=8.6     # override branch/tag/sha
    make commands-meta-fetch                 # JSONs only
    make commands-meta-gen                   # regen .h from JSONs
    make commands-meta-check                 # CI guard for stale .h

Vendored snapshot: redis/redis@8.8 (sha 369cbad8...), 425 commands;
distribution: 342 NotMissable, 37 SingleNullBulk, 9 ArrayPerElementNulls,
6 EmptyCollection, 3 IntegerMembership, 28 Unknown.

End-user builds do not require Python; AM_PATH_PYTHON resolves to ":" if
absent. The 425-file directory is built into a single static array at
codegen time, no runtime JSON parser is linked in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the existing GET/MGET hit/miss accounting to any --command the
user supplies, with per-key bucket granularity. Driven by the static
command_meta registry (vendored Redis 8.8 commands.json) that resolves
key positions and reply shape at startup.

What gets tracked, by reply shape:
- SingleNullBulk (GET, HGET, ZSCORE, GETEX, GETDEL, LPOP, RPOP, SPOP,
  RANDOMKEY, LINDEX): null reply == miss; one bucket.
- ArrayPerElementNulls (MGET, HMGET, ZMSCORE): walk the top-level
  mbulk; per-position null mask is the per-key miss vector.
- EmptyCollection (SMEMBERS, LRANGE, HGETALL, HKEYS, HVALS, ZRANGE):
  empty array heuristically counted as miss.
- IntegerMembership (EXISTS, SISMEMBER, HEXISTS): integer reply N is
  the hit count; first N positions marked as hits.

CLI:
    --command-miss-tracking={auto,off}   default: auto

JSON output (mb.json) under "ALL STATS":
- Existing per-arbitrary-command entries gain real Hits/sec and
  Misses/sec scalars (previously hardcoded to 0.0).
- New nested "Per-Key Misses" section with Total Hits / Total Misses
  and key[N] Hits / key[N] Misses for each command that produced
  per-position data. Section is omitted entirely when
  --command-miss-tracking=off, keeping legacy consumers stable.

Plumbing:
- arbitrary_command resolves spec_key_positions via key_specs evaluation
  (Index/Keyword x Range/Keynum, including variadic and EVAL-style
  keynum). Mismatched __key__ counts emit a startup warning but don't
  fail.
- arbitrary_request carries a back-pointer to the source command so the
  reply handler can inspect shape without lookup.
- m_keep_value=true is forced on connections only when ArrayPerElement
  Nulls or EmptyCollection commands are configured (SingleNullBulk and
  IntegerMembership use the parser's existing scalar counters and pay
  no allocation cost).
- Memcached / module / unknown commands resolve to spec=NULL and the
  feature stays silently disabled.

Tests: tests/test_arbitrary_command_misses.py covers all four shapes,
the off switch, and a NotMissable command (SET) producing no Per-Key
entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fcostaoliveira fcostaoliveira requested a review from paulorsousa May 9, 2026 11:33
@jit-ci
Copy link
Copy Markdown

jit-ci Bot commented May 9, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

…ader

Two CI failures from the initial PR run:

1. macOS Apple Clang rejected `enum class` case labels with
   "integral constant expression must have integral or unscoped
   enumeration type" because the compiler defaulted to C++03 mode.
   We were relying on Linux GCC's modern default. Add `-std=c++11` to
   CXXFLAGS in configure.ac so every build uses the language version
   declared in AGENTS.md.

2. The `C++ Format Check` GitHub Action does
       find . -name "*.cpp" -o -name "*.h" | grep -v deps/ | grep -v autom4te.cache/
   without excluding command_meta_data.h. The local Makefile.am had
   carried an exclusion to skip clang-format on the generated header,
   which let it drift out of style. Drop the local exclusion (so local
   and CI agree) and have `make commands-meta-gen` clang-format its own
   output going forward, then re-format the existing checked-in header
   to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread client.cpp Outdated
Cursor Bugbot caught a HIGH severity crash: the per-key miss handler
called el->as_bulk() unconditionally on every top-level reply element,
which assert(0)s when the element is an mbulk_size_el. This fires for
GEOPOS, COMMAND INFO, SORT_RO and any other command whose
reply_schema infers ArrayPerElementNulls (array items oneOf containing
null) but whose non-null items are nested arrays rather than bulks.

Add public is_bulk() / is_mbulk_size() type predicates on mbulk_element
so callers can avoid as_bulk()/as_mbulk_size() asserts when walking
heterogeneous reply arrays. The handler now distinguishes:

  - bulk_el with value/value_len > 0     -> hit
  - bulk_el with value==NULL/len==0      -> miss (null bulk, $-1)
  - mbulk_size_el with non-empty children -> hit (nested array value)
  - mbulk_size_el with empty children    -> miss (null array, *-1, or
                                                  empty array, *0)

GEOPOS replies *1\r\n*-1\r\n on missing keys; the parser materializes
the inner *-1 as an empty mbulk_size_el (indistinguishable from *0),
so empty-array-in-position is the right miss signal there.

New regression test exercises GEOPOS against pre-populated geo keys
across a wider range to assert the walker counts both hits and misses
without crashing.

Reported-by: cursor[bot] (Bugbot review on PR #374)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread run_stats.cpp
Comment thread client.cpp
Two issues caught by Cursor Bugbot on commit af91a0f:

1. (Medium) `client.cpp`: ArrayPerElementNulls capped to one element for
   HMGET. The walk used `request->m_keys` (== spec key count) as the
   bucket cap, but HMGET / ZMSCORE / HGETDEL / HGETEX have 1 Redis key
   and N reply elements (one per field/member). Capping to 1 silently
   discarded all but the first position's hit/miss attribution and
   broke the aggregate. Fix: size per_key_hit to the reply length
   directly and update num_keys to match, so bucket count tracks the
   semantically interesting axis (the field/member positions) rather
   than the spec key count. MGET continues to work 1:1 because there
   the spec key count and reply length already match.

2. (Medium) `run_stats.cpp`: lazy resize via `assign()` zeroed
   accumulated counters whenever the per-key vector size differed from
   the previous call. Combined with #1's variable-size replies (and
   future merges across threads with different observed sizes), this
   would silently discard stats. Fix: switch to `resize()` with
   grow-only semantics, mirroring the merge() path that already uses
   resize for the same reason.

New regression test exercises HMGET with three fields, only two of
which exist on populated hashes; asserts exactly three buckets and
that the never-present field reports 0 hits / all misses.

Reported-by: cursor[bot] (Bugbot review on PR #374, commit af91a0f)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread config_types.cpp Outdated
Comment thread client.cpp Outdated
fcostaoliveira and others added 2 commits May 9, 2026 14:10
The redis-py 4.x signature changed `geoadd(name, *args)` -> `geoadd(name,
values, nx=False, xx=False, ch=False)`, which made
`conn.geoadd(key, lon, lat, member)` raise
`DataError: GEOADD allows either 'nx' or 'xx', not both` because the
"member" string is interpreted as a flag. CI runs use redis-py 4.x and
this is what surfaced as "Exception raised during test execution" on
PR #374's UBSan / ASAN / CI test workflows.

Use `conn.execute_command("GEOADD", ...)` to stay version-agnostic.
…_key_spec

Two more Cursor Bugbot findings on commit 46b9b2d, both legitimate:

1. (Medium) `config_types.cpp`: off-by-one in evaluate_key_spec. Redis
   commands.json positions are 1-based with position 0 == command name;
   in our 0-based command_args vector that maps directly to args[N], no
   -1 offset. The previous `args[idx - 1]` accesses read the wrong
   slot in the Keyword path (always read the command name on the first
   iteration) and in the Keynum path (read EVAL's "script" instead of
   "numkeys", failing strtol and leaving spec_key_positions empty).
   Range never accessed args[] so the bug was latent.

   Also fix the Range / Keynum bounds: `last > argc` -> `last >= argc`
   (clamp to argc-1, the last valid index), `numidx > argc` -> `numidx
   >= argc`, and `p > argc` -> `p >= argc` for the same reason.

   Caught by Bugbot; not yet exposed by tests (no test exercises
   Keyword/Keynum specs).

2. (Low) `client.cpp`: empty-string bulk could be classified as miss.
   Use `value != NULL` (not `value_len > 0`) so that any non-null bulk
   slot counts as a hit, including $0 empty strings. Today the
   redis_protocol parser collapses $0 and $-1 into the same in-memory
   shape (value=NULL, value_len=0), so the change is a no-op until the
   parser preserves the distinction; documented inline. Forward-
   compatible with a future parser fix.

Verified locally: EVAL (Keynum path) parses numkeys correctly and runs
without crashing. All 8 tests in test_arbitrary_command_misses.py pass.

Reported-by: cursor[bot] (Bugbot review on PR #374, commit 46b9b2d)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread client.cpp
…ot get_hits()

Cursor Bugbot finding on commit 2930249 (Medium):

response->get_hits() counts every non-null bulk the parser walks. For
SingleNullBulk commands that return arrays on success — BLPOP/BRPOP/
BZPOPMAX/BZPOPMIN return [key, value] (get_hits() == 2), and LPOP key
COUNT N returns an N-element array (get_hits() == N) — that overcounts
hits and underflows misses.

Switch SingleNullBulk to look at the top-level RESP type header (carried
in m_status) directly: $-1 / *-1 == miss, anything else == hit. Always
emit a single bucket because variadic-key blocking commands (BLPOP) put
the winning key inside the reply array and we don't currently parse it,
so per-key attribution isn't meaningful beyond aggregate hit/miss.

Verified locally: GET (1-key, established case) still reports correct
hit rate; LPOP key COUNT N (the array-reply scenario) now records one
hit per successful pop instead of N. All 8 tests in
test_arbitrary_command_misses.py pass.

Reported-by: cursor[bot] (Bugbot review on PR #374, commit 2930249)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread client.cpp Outdated
Comment thread client.cpp Outdated
…size

Cursor Bugbot findings on commit 8d40cfc:

1. (Medium) SingleNullBulk null sentinels were $-1 and *-1, but Redis
   replies *0 (empty array) on missing key for SPOP/SRANDMEMBER when
   used with the count argument (e.g. `SPOP key 1`). My check
   misclassified the empty array as a hit. Add *0 to the null-sentinel
   set. Verified live: SPOP __key__ 1 with 3/10 keys populated and
   destructive depletion now reports 15 hits / 385 misses (3 keys * 5
   items per set = 15 successful pops, rest empty/missing).

2. (Low) per_key_hit was unconditionally pre-resized to num_keys at the
   top of the if-block, then re-assigned in 3 of the 4 cases — a wasted
   allocation on every response. Drop the pre-resize and let each case
   assign() to its actual bucket count once. EmptyCollection and
   IntegerMembership now do their own assign(num_keys, ...) explicitly
   (they previously relied on the pre-resize for indexing).

Reported-by: cursor[bot] (Bugbot review on PR #374, commit 8d40cfc)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread client.cpp
…ply isn't an array

Cursor Bugbot finding on commit 16f5054 (Medium):

In the ArrayPerElementNulls path, when response->get_mbulk_value()
returns NULL (top-level reply was +OK / -ERR / a single bulk / or any
non-mbulk shape — possible if a misshape inferred ArrayPerElementNulls
or for some module commands), the else branch incremented misses by
num_keys but left per_key_hit empty. update_arbitrary_op_misses then
recorded the aggregate but skipped the per-key vector update, drifting
the two counters out of sync.

Account uniformly: collapse to a single miss bucket so aggregate and
per-key stay consistent regardless of reply shape.

Reported-by: cursor[bot] (Bugbot review on PR #374, commit 16f5054)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread client.cpp Outdated
…ault branch

Cursor Bugbot finding on commit e0b7272 (Low):

The default branch in the ReplyShape switch is unreachable today —
miss_tracking_enabled is only set for the four handled miss-bearing
shapes — but if a future shape is added without a case, the call to
update_arbitrary_op_misses below would silently record zero-valued
hits/misses with an empty per_key_hit, drifting the stats without any
signal. Guard the call on a non-empty per_key_hit so the intent is
explicit and a future unhandled shape becomes a clear no-op rather
than a silent "all zero" entry.

Reported-by: cursor[bot] (Bugbot review on PR #374, commit e0b7272)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread config_types.cpp
Cursor Bugbot finding on commit ca8a20c (Low):

When find_keys.range.limit > 0, the previous formula
    last = start_pos + total - 1
treated `total` as a contiguous count, which is wrong once step > 1
(positions are strided: start_pos, +step, +2*step, ...). The corrected
formula is
    last = start_pos + (total - 1) * step

Also tighten the initial total calculation to be stride-aware:
    total = (last - start_pos) / step + 1
instead of `last - start_pos + 1`, which previously over-counted slots
between the strided positions.

Latent today — Bugbot itself notes that no registered Redis command
combines limit > 0 with step > 1 (XREAD/XREADGROUP, the only commands
using limit, both have step=1) — so this is a defensive correctness
fix that future-proofs the evaluator.

Reported-by: cursor[bot] (Bugbot review on PR #374, commit ca8a20c)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit dbff900. Configure here.

Comment thread client.cpp
num_keys = 1;
hits = is_null ? 0 : 1;
misses = is_null ? 1 : 0;
if (!is_null) per_key_hit[0] = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent NULL status handling between reply shape cases

Low Severity

The SingleNullBulk case treats a NULL status pointer (from response->get_status()) as a cache HIT because the is_null condition short-circuits to false. In contrast, the IntegerMembership case treats NULL status as all MISSES (since n stays 0). This asymmetry means that if the parser ever fails to populate the status for a non-error response, SingleNullBulk silently records phantom hits while IntegerMembership records phantom misses.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit dbff900. Configure here.

Adds four RLTest cases that observably exercise the Bugbot-found bugs
that previously had no automated coverage (verified only via local
smoke tests):

- test_arbitrary_spop_with_count_empty_array_is_miss   (#7)
  Pre-loads 3 sets * 5 members, runs SPOP key COUNT 1; without the *0
  null-sentinel fix, every call would record a hit. With the fix only
  ~15 destructive pops succeed and the rest report misses.

- test_arbitrary_blpop_variadic_keys_single_bucket    (#6)
  Pre-loads memtier-1 with enough items for every BLPOP call to return
  immediately, and verifies "Per-Key Misses" carries exactly key[0]
  (not key[0..2]). Without the fix, response->get_hits()=2 from the
  [key, value] reply would mark two of three buckets as hit and one
  as miss per call.

- test_arbitrary_xread_keyword_spec_evaluates_correctly (#4 Keyword)
  Pre-loads 3 streams; XREAD COUNT 1 STREAMS __key__ 0 across a 10-key
  range must produce both hits and misses, proving the Keyword
  begin_search resolved correctly. Without the args[idx] fix, the
  STREAMS keyword search would land on the wrong slot.

- test_arbitrary_eval_keynum_spec_runs_cleanly         (#4 Keynum)
  EVAL return 1 1 __key__ exercises the Keynum begin_search path.
  EVAL is NotMissable, so we assert no Per-Key entry but Count > 0,
  proving resolve_command_meta read numkeys from the right slot.

Findings #2 (lazy resize), #5 (empty bulk), #8 (alloc hygiene),
#9 (mbulk NULL), #10 (default switch) and #11 (Range step>1) remain
without dedicated tests because they're either parser-collapsed
(#5), defensive/unreachable (#9, #10, #11), or non-functional (#8);
documented in PR comments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant