This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Elastickv is an experimental, cloud-oriented distributed key-value store written in Go (module github.com/bootjp/elastickv, Go 1.25.0 with toolchain go1.26.2). It exposes multiple wire protocols (gRPC RawKV/Transactional, Redis, DynamoDB-compatible HTTP, S3-compatible HTTP, SQS-compatible HTTP) on top of a Raft-replicated, MVCC/OCC storage engine. Not production-ready.
```bash
make test    # go test -v -race ./...
make lint    # golangci-lint --config=.golangci.yaml run --fix
make run     # go run cmd/server/demo.go (built-in 3-node single-process demo)
make client  # go run cmd/client/client.go
make gen     # regenerate protobufs (cd proto && make gen)
```

Run a single test or package:

```bash
go test -run TestName ./store/...
go test -race ./kv/...
```

If $GOCACHE is sandbox-blocked (macOS), create the cache dirs first (Go errors out if GOTMPDIR does not exist), then prefix the command:

```bash
mkdir -p "$(pwd)/.cache/tmp" "$(pwd)/.golangci-cache"
GOCACHE=$(pwd)/.cache GOTMPDIR=$(pwd)/.cache/tmp go test ./...
GOCACHE=$(pwd)/.cache GOLANGCI_LINT_CACHE=$(pwd)/.golangci-cache golangci-lint run
```

Single-node server (etcd/raft is the default backend):

```bash
go run . --address "127.0.0.1:50051" --redisAddress "127.0.0.1:6379" --raftId "n1" --raftBootstrap
```

The local Jepsen runner (builds, starts a 3-node cluster on 5005{1,2,3} / 6379{1,2,3} / 6380{1,2,3} / 6390{1,2,3}, runs DynamoDB workloads):

```bash
./scripts/run-jepsen-local.sh                            # full cycle
./scripts/run-jepsen-local.sh --no-rebuild --no-cluster  # reuse running cluster
```

Direct Jepsen invocation requires isolating Leiningen state from $HOME:

```bash
cd jepsen && HOME=$(pwd)/tmp-home LEIN_HOME=$(pwd)/.lein \
  LEIN_JVM_OPTS="-Duser.home=$(pwd)/tmp-home" /tmp/lein test
# Same pattern under jepsen/redis/ with HOME=$(pwd)/../tmp-home etc.
```

Protobuf regeneration is version-pinned and will fail unless the toolchain matches: libprotoc 29.3, protoc-gen-go v1.36.11, protoc-gen-go-grpc 1.6.1 (see proto/Makefile).
Pre-commit hook (runs `make lint`) is opt-in: `git config --local core.hooksPath .githooks`.
The full diagrams live in `docs/architecture_overview.md` — read it before non-trivial changes touching coordination, replication, or routing. Big picture:
- Adapters (`adapter/`) — Per-protocol ingress: `redis.go`, `dynamodb.go`, `grpc.go`, `s3.go`, `sqs.go` (with `sqs_auth.go`/`sqs_catalog.go`/`sqs_keys.go`/`sqs_messages.go`), `distribution_server.go` (operator/control plane). The S3 and SQS adapters share the SigV4 path (`sigv4.go`, `s3_auth.go`, `sqs_auth.go`) and static-credentials loader. `redis_proxy.go` and the standalone `cmd/redis-proxy/` implement a phased Redis-to-Elastickv migration proxy with dual-write/shadow-read modes (see `proxy/`).
- Data plane (`kv/`) — `ShardedCoordinator` (`sharded_coordinator.go`) is the entry point all adapters dispatch into. It resolves keys via `ShardRouter` (`shard_router.go`) against the in-memory `RouteEngine` cache, then drives `ShardStore` (`shard_store.go`) per Raft group. Transactions live in `transaction.go`/`txn_codec.go`; OCC and lock resolution in `lock_resolver.go`. Leader-only reads go through `lease_state.go`.
- Replication (`internal/raftengine/`, `kv/fsm.go`) — The only backend is `etcd/raft` under `internal/raftengine/etcd` (the hashicorp backend was dropped in `a35245a`; the `--raftEngine` flag still advertises `hashicorp` in `main.go` but `newRaftFactory` rejects anything other than `etcd`). Each Raft data dir contains a `raft-engine` marker so the process refuses to reopen a dir under a different backend. Note: README and `docs/etcd_raft_migration_operations.md` still reference `go run ./cmd/etcd-raft-migrate`, but that directory was deleted in `a35245a` — the migrator is no longer in-tree. The KV FSM (`kv/fsm.go`) applies committed entries to the storage layer and to the HLC ceiling.
- Storage (`store/`) — MVCC over Pebble (`mvcc_store.go`, `lsm_store.go`); OCC, TTL/expiry, snapshots (`snapshot_pebble.go`), and per-type helpers for Redis collections (`hash_helpers.go`, `list_helpers.go`, `set_helpers.go`, `zset_helpers.go`, `stream_helpers.go`).
- Control plane (`distribution/`) — Durable route catalog persisted in reserved keys of the default Raft group. `engine.go` is the read-side cache; `watcher.go` polls the catalog and applies versioned snapshots into the engine; `catalog.go` is the storage layer. Operator RPCs (`ListRoutes`, `SplitRange` — same-group split only) are on `proto.Distribution`. All routing decisions read from the cached `RouteEngine`, not from the catalog directly.
- Timestamp Oracle (`kv/hlc.go`, `kv/hlc_wall.go`) — All HLC timestamps are issued exclusively by the Raft leader via `ShardedCoordinator`/`Coordinator` — followers never call `HLC.Next()` for persistence. The 64-bit value splits into an upper 48-bit physical half (Unix ms) and a lower 16-bit logical counter, and the two halves take very different paths (see the sketch after this list):
  - Physical (upper 48 bits) — Raft-agreed. The leader periodically (`hlcRenewalInterval = 1s`, window `hlcPhysicalWindowMs = 3s`) proposes a ceiling entry through the default Raft group; FSM apply on every node calls `SetPhysicalCeiling`. `Next()` clamps the physical half to `max(wall_ms, ceiling_ms)`, so a newly elected leader can never issue a timestamp inside the previous leader's lease window.
  - Logical (lower 16 bits) — in-memory only. Advanced by atomic CAS on each `Next()` call; no Raft round-trip and no consensus per timestamp. This is what keeps issuance in the nanosecond range.
  - The coordinator and FSM must share the same `*HLC` instance (wired via `WithHLC`/`NewKvFSMWithHLC`) so the in-memory counter and the replicated ceiling stay coupled.
- Process entrypoints — `main.go` is the multi-binary server (gRPC + Redis + DynamoDB + S3 + SQS + admin + metrics + pprof). Per-protocol bootstrapping is split into `main_s3.go` and `main_sqs.go`; SigV4 static credentials load via `main_sigv4_creds.go`. SQS exposure is opt-in via `--sqsAddress` (with `--sqsRegion` and `--sqsCredentialsFile`); leave `--sqsAddress` empty to disable. `cmd/server/demo.go` is a single-process 3-node demo. `cmd/client/`, `cmd/redis-proxy/`, `cmd/elastickv-admin/`, and `cmd/raftadmin/` are standalone tools. `multiraft_runtime.go` and `shard_config.go` wire shard groups to addresses for multi-group deployments (`--raftRedisMap`, `--raftDynamoMap`, `--raftS3Map`, `--raftSqsMap`).
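Only the ceiling round-trips through Raft; each individual timestamp is a local atomic operation. A minimal sketch of that shape, with hypothetical field and method layouts (the real `kv/hlc.go` differs):

```go
// Hypothetical sketch of the clamping described above. Names and layout
// are illustrative, not the actual kv/hlc.go API.
package hlc

import (
	"sync/atomic"
	"time"
)

// Lower 16 bits: in-memory logical counter; upper 48 bits: physical ms.
const logicalBits = 16

type HLC struct {
	last    atomic.Uint64 // last issued timestamp (packed 48+16)
	ceiling atomic.Uint64 // Raft-replicated physical floor, in Unix ms
}

// SetPhysicalCeiling is invoked from FSM apply on every node when a
// leader-proposed ceiling entry commits; it only ever moves forward.
func (h *HLC) SetPhysicalCeiling(ms uint64) {
	for {
		cur := h.ceiling.Load()
		if ms <= cur || h.ceiling.CompareAndSwap(cur, ms) {
			return
		}
	}
}

// Next issues a monotonically increasing timestamp without any consensus
// round-trip: a wall-clock read, a ceiling clamp, and a CAS loop.
func (h *HLC) Next() uint64 {
	for {
		wall := uint64(time.Now().UnixMilli())
		if c := h.ceiling.Load(); c > wall {
			wall = c // clamp: physical half = max(wall_ms, ceiling_ms)
		}
		last := h.last.Load()
		candidate := wall << logicalBits
		if candidate <= last {
			candidate = last + 1 // same millisecond: bump the logical half
		}
		if h.last.CompareAndSwap(last, candidate) {
			return candidate
		}
	}
}
```

The CAS loop absorbs same-millisecond calls into the logical half, which is why per-timestamp cost stays off the consensus path entirely.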
`gofmt` + the linters in `.golangci.yaml` (`gocritic`, `gocyclo`, `gosec`, `wrapcheck`, `errorlint`, `mnd`, etc.) are enforced. Avoid `//nolint` — refactor instead.
- Errors: wrap with `github.com/cockroachdb/errors` (the `wrapcheck` linter enforces wrapping at boundaries); see the first sketch after this list.
- Logging: structured `slog` with stable keys (`key`, `commit_ts`, `route_id`, …).
- Test files are co-located (`*_test.go`); prefer table-driven tests (see the second sketch after this list). `pgregory.net/rapid` is available for property tests (`store/mvcc_store_prop_test.go`, `adapter/redis_transcoder_prop_test.go`, `adapter/grpc_transcoder_prop_test.go`).
- After changes to replication, MVCC, OCC, or the Redis adapter, run the relevant Jepsen suite — these are the integration-level safety net.
- When code review surfaces a defect (incorrect behavior, regression, edge case), first add a failing test that reproduces the issue, then make it pass with the fix. Push the test and the fix together (one commit or two adjacent commits) so the regression is locked down. Do not respond to a review-identified defect with a fix-only change.
- HLC: do not issue persistence timestamps from non-leader nodes; OCC decisions assume leader-issued timestamps. Never use the local wall clock (`time.Now()`/`hlc_wall.go` directly) for snapshot reads, MVCC visibility checks, OCC validation, lease/expiry decisions, or any other ordering-sensitive read — always go through `HLC.Next()` (writes/commits) or the leader-issued read timestamp pipeline. Local wall clocks are only valid for diagnostics/metrics and as the input that bounds the physical ceiling. Keep wall clocks reasonably synchronized across nodes.
- Route catalog mutations must go through `SplitRange` (or future control-plane RPCs) so the catalog version bumps and watchers fan out — never write catalog keys directly.
- Commits: short imperative summary, optional scope prefix matching the touched area (`store:`, `adapter:`, `kv:`, `docs:`, …). PR descriptions should call out behavior change, risk, and the test evidence (`go test`, `make lint`, relevant Jepsen suite).
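A minimal sketch of the error-wrapping and logging conventions together; `applyCommit`, `Store`, and the message text are hypothetical stand-ins, and only the `cockroachdb/errors` and `slog` usage is the point:

```go
// Hypothetical boundary function illustrating the two conventions.
package kv

import (
	"log/slog"

	"github.com/cockroachdb/errors"
)

// Store is a stand-in interface so the sketch is self-contained.
type Store interface {
	Commit(key []byte, commitTS uint64) error
}

func applyCommit(s Store, key []byte, commitTS uint64) error {
	if err := s.Commit(key, commitTS); err != nil {
		// wrapcheck: wrap at the boundary so the failure site is attributable.
		return errors.Wrapf(err, "apply commit key=%q commit_ts=%d", key, commitTS)
	}
	// Structured slog with stable keys, matching the convention above.
	slog.Info("commit applied", "key", string(key), "commit_ts", commitTS)
	return nil
}
```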
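And the table-driven shape the test convention asks for, shown against a hypothetical `encodeKey` helper:

```go
// Hypothetical table-driven test; encodeKey stands in for whatever
// function is actually under test.
package kv

import (
	"bytes"
	"testing"
)

func encodeKey(ns, key string) []byte {
	return []byte(ns + "/" + key)
}

func TestEncodeKey(t *testing.T) {
	tests := []struct {
		name string
		ns   string
		key  string
		want []byte
	}{
		{name: "simple", ns: "default", key: "a", want: []byte("default/a")},
		{name: "empty key", ns: "default", key: "", want: []byte("default/")},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := encodeKey(tt.ns, tt.key); !bytes.Equal(got, tt.want) {
				t.Fatalf("encodeKey(%q, %q) = %q, want %q", tt.ns, tt.key, got, tt.want)
			}
		})
	}
}
```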
After every code change, run five independent review passes — one lens at a time; do not collapse them. Each lens has a different failure mode, and merging them tends to skip cases. Record the result of each pass (even a one-line "no issues") in the PR description.
- Data loss — Can any committed write be lost or silently overwritten? Check Raft propose/apply ordering, FSM idempotency, snapshot/restore round-trips, Pebble sync semantics (`lsm_store_sync_mode_*`), TTL/expiry deletes, retention/compaction (`store/mvcc_store_retention_test.go`, `kv/compactor.go`), and crash-restart paths. New failure modes (`return nil` after an error, swallowed `Apply` errors, missing `WAL.Sync`) are the usual culprits.
- Concurrency / distributed failures — Race conditions, lock ordering, deadlocks, leader change mid-operation, follower forwarding while leadership flips, partial Raft membership changes, partition + heal, slow follower, snapshot-during-apply, OCC conflict resolver paths (`kv/lock_resolver.go`), and the lease-read window (`kv/lease_state.go`). Run the relevant `go test -race` and the matching Jepsen suite.
- Performance — Hot-path allocations, lock contention, fan-out across shards, extra Raft round-trips per request (especially anything that would force consensus on a per-`Next()` HLC tick), N+1 reads against Pebble, Lua/transcoder churn (`adapter/redis_lua_pool.go`, `adapter/grpc_transcoder.go`), and metric cardinality. Check existing benchmarks (`*_benchmark_test.go`) and add one if a hot path changed.
- Data consistency — MVCC visibility, OCC commit-ts ordering, HLC physical-ceiling invariant, snapshot read isolation, route-catalog versioning + watcher fan-out, cross-shard transaction atomicity (`kv/transaction.go`, `kv/txn_codec.go`), DynamoDB/Redis adapter semantics versus the upstream contract, and the lease-read freshness bound. Reads that bypass `HLC.Next()` or the leader-issued read pipeline are bugs.
- Test coverage — New/changed branches must have unit tests (table-driven, co-located `*_test.go`); property tests via `pgregory.net/rapid` for codecs/transcoders; OCC/HLC/MVCC behavior changes need targeted tests under `kv/` and `store/`; replication/Redis/MVCC changes need the corresponding Jepsen workload. If a reviewer found the defect, the regression test (per the convention above) must be in the same PR.
`docs/design/` holds dated proposals and as-implemented records. Filenames carry one of three lifecycle markers:
- `*_proposed_*.md` — Design accepted, no implementation yet (or implementation just started).
- `*_partial_*.md` — Some milestones/phases of the design have shipped, but the full proposal is not yet complete. The doc tracks which milestones have landed and what remains. Example: `2026_02_18_partial_hotspot_shard_split.md` (Milestone 1 of the hotspot-split design has shipped; later milestones are still open).
- `*_implemented_*.md` — All milestones of the proposal have shipped; the doc is preserved as the as-built record.
Check this directory before designing anything new — there is likely a recent precedent (HLC lease, FSM compaction, S3 adapter, lease reads, Lua commit batching, TTL inline value, centralized TSO proposal, hotspot shard split, etc.). `docs/design/README.md` indexes them.
Design-doc-first workflow. For any change that goes beyond a single-file edit — new feature, new adapter, new control-plane RPC, schema/wire-format change, or any modification touching replication / MVCC / OCC / HLC / routing — write a `*_proposed_*.md` design doc first and land it before the implementation. Do not start implementation until the proposal has been reviewed and accepted. The PR may carry both the doc and the implementation (in that order: doc commit first, implementation commits after) as long as the doc is reviewable on its own. Lifecycle transitions: rename `*_proposed_*.md` → `*_partial_*.md` once the first milestone ships (and update the doc to record what landed and what is still open); rename `*_partial_*.md` → `*_implemented_*.md` once the final milestone ships. Use `git mv` so the history follows the rename.