Status note. This started as a groupcache-backed, fleet-shareable cache (hence the filename). It shipped as a process-local
sync.Map cache; groupcache was ripped out for simplicity. There is no cross-process sharing, no peer pool, and no window-encoded key; TTL is a plain per-entry expiry. The sections below are revised to match what shipped; ignore any lingering peer/window phrasing.
objgitd stores git repos as S3/Tigris objects. Every Stat/Open on a path that
doesn't exist costs up to two S3 round-trips: a HeadObject (→ NoSuchKey)
then a directory-probe ListObjectsV2 (internal/s3fs/basic.go:140+:163; the probe
in OpenFile at :98). git does an enormous number of these — loose objects on a
packed repo, packed-refs vs. loose refs, info/, config, alternates, .keep/.idx
siblings — each paid in series against an object store.
The fix: cache directory listings, keyed by parent prefix, and answer Stat/Open
from them. A listing records every child name plus its kind/size/mtime, so a lookup of a
key whose parent prefix is cached answers with zero round-trips when the child is
absent (negative hit) or is a sub-directory, and with one HeadObject/GetObject (for
authoritative content/metadata, which listings don't carry) when it's a file. The first
Stat/Open touching an un-cached folder lists that parent folder in full and
populates the cache, warming every sibling at once.
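A minimal sketch of how a cached parent listing can answer a lookup. The names `listings`, `child`, and `lookup` are illustrative assumptions rather than the shipped types, a plain map stands in for the cache, and prefixes are assumed to keep their trailing slash.

```go
package main

import (
	"fmt"
	"strings"
)

// child is a hypothetical listing record: one name under a parent prefix.
type child struct {
	Name  string
	IsDir bool
	Size  int64
}

// listings maps a parent prefix (with trailing "/") to its children.
// A plain map stands in for the real cache in this sketch.
type listings map[string][]child

// splitKey splits on the last "/"; a key without a slash has prefix "".
func splitKey(key string) (prefix, base string) {
	i := strings.LastIndex(key, "/")
	if i < 0 {
		return "", key
	}
	return key[:i+1], key[i+1:]
}

// lookup classifies key using only the cached listing of its parent.
// It returns "uncached" when the parent has not been listed yet.
func (l listings) lookup(key string) string {
	prefix, base := splitKey(key)
	children, ok := l[prefix]
	if !ok {
		return "uncached" // caller lists the parent once, then retries
	}
	for _, c := range children {
		if c.Name == base {
			if c.IsDir {
				return "dir" // answered without touching S3
			}
			return "file" // existence known; metadata/body still come from elsewhere
		}
	}
	return "absent" // negative hit, answered without touching S3
}

func main() {
	l := listings{"objects/ab/": {{Name: "cdef01", Size: 120}, {Name: "sub", IsDir: true}}}
	for _, k := range []string{"objects/ab/cdef01", "objects/ab/sub", "objects/ab/ffff", "objects/cd/0000"} {
		fmt.Println(k, "=>", l.lookup(k))
	}
}
```

The "uncached" outcome is the only one that forces a listing; every sibling lookup after that is answered from memory.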
Backend: process-local sync.Maps (one for folder listings, one for recursive
subtrees, one for per-object heads). Each cached entry carries two pieces of bookkeeping:
- Per-entry TTL. An entry stores expires = now + TTL; past it the entry is ignored and re-listed. This bounds how long a write this process cannot see (one made by another process) stays hidden. (Correctness does not require the background warmer; the warmer just keeps hot entries fresh and sweeps expired ones, since a sync.Map has no LRU.)
- Per-prefix local generation for precise invalidation. A process-local counter per prefix is bumped on every local write under that prefix (and its ancestors; see subtree caching). An entry whose stored generation no longer matches is ignored, so the next read re-lists and sees the write immediately (read-after-write correctness).
Concurrent identical fills are coalesced with
golang.org/x/sync/singleflight.
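The two pieces of bookkeeping can be sketched as follows. This is a simplified illustration under stated assumptions, not the shipped code: the `cache` and `entry` types, the `store`/`lookup` method names, and the trailing-slash prefix convention are hypothetical, and singleflight coalescing is left out so the example needs only the standard library. The generation is captured before the (simulated) list call so that a write racing the fill still invalidates the stored entry.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

// epoch is a fixed reference time so the example is deterministic.
var epoch = time.Unix(0, 0)

// entry is a hypothetical cached listing with its bookkeeping.
type entry struct {
	names   []string
	gen     uint64
	expires time.Time
}

// cache is a simplified stand-in for the listing cache.
type cache struct {
	ttl      time.Duration
	listings sync.Map // prefix -> entry
	mu       sync.Mutex
	gens     map[string]uint64 // per-prefix local generation
}

func newCache(ttl time.Duration) *cache {
	return &cache{ttl: ttl, gens: make(map[string]uint64)}
}

// gen returns the current local generation of prefix.
func (c *cache) gen(prefix string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.gens[prefix]
}

// invalidate bumps the generation of prefix and of every ancestor up to "".
func (c *cache) invalidate(prefix string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for {
		c.gens[prefix]++
		if prefix == "" {
			return
		}
		trimmed := strings.TrimSuffix(prefix, "/")
		prefix = trimmed[:strings.LastIndex(trimmed, "/")+1]
	}
}

// store caches names under prefix, tagged with gen and an expiry of now+TTL.
func (c *cache) store(prefix string, names []string, gen uint64, now time.Time) {
	c.listings.Store(prefix, entry{names: names, gen: gen, expires: now.Add(c.ttl)})
}

// lookup hits only while the entry is unexpired and its generation is current.
func (c *cache) lookup(prefix string, now time.Time) ([]string, bool) {
	v, ok := c.listings.Load(prefix)
	if !ok {
		return nil, false
	}
	e := v.(entry)
	if e.gen != c.gen(prefix) || !now.Before(e.expires) {
		return nil, false
	}
	return e.names, true
}

func main() {
	c := newCache(60 * time.Second)
	g := c.gen("refs/tags/") // captured before the list call would start
	c.store("refs/tags/", []string{"v1"}, g, epoch)

	_, hit := c.lookup("refs/tags/", epoch.Add(30*time.Second))
	fmt.Println("within TTL:", hit)

	c.invalidate("refs/tags/") // a local write under the prefix
	_, hit = c.lookup("refs/tags/", epoch.Add(30*time.Second))
	fmt.Println("after local write:", hit)
}
```

Stale entries are never deleted on invalidation here; they simply stop matching, which keeps the write path to a counter bump.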
Accepted limitation: staleness for writes made outside this process is bounded by the TTL. An object that another process just created or deleted may read stale for up to one TTL. This is safe for git's content (objects are immutable and content-addressed, so positive listings never go wrong); the TTL bounds the
negative-staleness risk. Operators tune it with -s3-cache-ttl; -s3-cache-ttl 0
disables the cache and restores today's exact behavior.
Defaults: on by default, tunable; always single-process.
- Cache key space is the canonical full S3 key. Chroot returns a new *S3FS with a different root, so the *ListingCache is shared by pointer across a root fs and all chroot children, and prefixes are full-canonical (fs3.key() / cleanPath: root-joined, leading slash stripped). chroot.go:21 must copy the cache pointer.
- Cache key is just the canonical prefix; the entry holds {gen, expires} alongside its payload. A read hits only when entry.gen == gen(prefix) and now < entry.expires.
- Both ReadDir and Stat/Open route through list(), so each folder is listed once and the result feeds both; singleflight coalesces concurrent identical fills (one list even when a clone fans out across siblings).
- Listing payload carries enough to serve both consumers without a second call: per child {name, kind(file|dir), size, mtimeUnixNano} (CommonPrefixes → dir with zero size/mtime, Contents → file; file wins on the pathological file+prefix collision, matching today's Head-first precedence). Stored as a Go value in the sync.Map, with no serialization.
- In-process writes bump the local generation of the parent prefix at write completion, so the cached entry no longer matches and the next read re-lists:
  - s3WriteFile.Close / s3MultipartUploadFile.Close: after the upload succeeds.
  - Rename (covers TempFile → final pack promotion), Remove, MkdirAll.
- Positive Stat/Open of a file is served from a second cache, the head cache (see Head cache), rather than a foreground HeadObject. The head cache is seeded straight from each listing (ListObjectsV2 already returns every file's size and mtime), so a positive lookup costs no extra round-trip. Listings omit the user metadata the unix-metadata feature needs, so a caller that requires it (i.e. unix-metadata enabled) treats a listing-seeded entry as a miss and fills via a real HeadObject. Open still issues GetObject for the body but skips its HeadObject. A NoSuchKey from a delete racing the listing maps to NotExist.
- No new direct dependency. golang.org/x/sync/singleflight (already vendored for errgroup) dedupes concurrent fills.
The three sync.Map caches, the TTL/generation key logic, and the background warmer.
type CacheConfig struct {
TTL, RefreshInterval, IdleTTL time.Duration
DisableHeadPrefetch bool // zero value = seed heads from listings on
RecursivePrefixes []string // nil → {"refs/"}; empty → subtree caching off
MaxSubtreeKeys int // <=0 → 50000
}
type childKind uint8 // kindFile / kindDir
type childEntry struct { Name string; Kind childKind; Size, Mtime int64 }
type headData struct { Size, Mtime int64; Meta map[string]string }
type headCacheEntry struct { data headData; gen uint64; expires time.Time; hasMeta bool }
type listingEntry struct { entries []childEntry; gen uint64; expires time.Time }
type subtreeEntry struct { data subtreeData; gen uint64; expires time.Time }
type ListingCache struct {
ttl time.Duration
cfg CacheConfig
client s3Client
bucket string
separator string
roots []string // normalised RecursivePrefixes, longest first
clock func() time.Time // overridable in tests
listings sync.Map // prefix → listingEntry
subtrees sync.Map // root → subtreeEntry
heads sync.Map // object key → headCacheEntry
sf, headSF singleflight.Group // coalesce concurrent fills
hits, misses atomic.Int64 // for metrics
mu sync.Mutex
gens map[string]uint64 // per-prefix local generation
seen map[string]time.Time // prefixes accessed → driven by the warmer
}
- NewListingCache(cfg, client, bucket, separator): applies defaults and normalises the recursive roots; no groups, no pool.
- list(ctx, prefix) ([]childEntry, error): the one entry point both ReadDir and Stat/Open use. Routes recursive prefixes to subtree; otherwise listFolder (sync.Map lookup gated on gen+expiry, fillFolder via singleflight on a miss).
- gen(prefix) / invalidate(prefix): invalidate bumps the generation of prefix and every ancestor and records seen (so the warmer refreshes them).
- RunWarmer(ctx): time.NewTicker(RefreshInterval); each tick drops seen entries idle past IdleTTL, re-fills the rest (routing recursive → subtree, deduped), then sweeps expired entries from all three maps (no LRU, so the warmer bounds growth). No-op when RefreshInterval<=0. Returns on ctx.Done().
- Stats() accessor exports hit/miss counters and resident item counts to metrics.
- splitKey(key) (prefix, base): split on last /; no slash → ("", key).
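The warmer's tick loop and expiry sweep might look roughly like the sketch below. It is an illustration under assumptions: `expiryMap`, `sweep`, `size`, and the free function `runWarmer` are hypothetical stand-ins, the map stores only an expiry per key, and the idle-eviction and re-fill steps are omitted.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// epoch is a fixed reference time, long past when the program runs.
var epoch = time.Unix(0, 0)

// expiryMap is a hypothetical stand-in for one of the cache's sync.Maps,
// storing only each key's expiry.
type expiryMap struct{ m sync.Map } // key -> time.Time

func (e *expiryMap) put(key string, expires time.Time) { e.m.Store(key, expires) }

// sweep deletes every entry whose expiry is at or before now and
// reports how many it removed. sync.Map permits Delete during Range.
func (e *expiryMap) sweep(now time.Time) (removed int) {
	e.m.Range(func(k, v any) bool {
		if !now.Before(v.(time.Time)) {
			e.m.Delete(k)
			removed++
		}
		return true
	})
	return removed
}

// size counts resident entries (sync.Map has no Len).
func (e *expiryMap) size() (n int) {
	e.m.Range(func(_, _ any) bool { n++; return true })
	return n
}

// runWarmer calls tick once per interval until ctx is cancelled.
// It returns immediately when interval <= 0.
func runWarmer(ctx context.Context, interval time.Duration, tick func(now time.Time)) {
	if interval <= 0 {
		return
	}
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-t.C:
			tick(now)
		}
	}
}

func main() {
	var em expiryMap
	em.put("refs/", epoch)                           // expired long ago
	em.put("objects/ab/", time.Now().Add(time.Hour)) // still live

	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	runWarmer(ctx, 10*time.Millisecond, func(now time.Time) { em.sweep(now) })
	fmt.Println("entries left after warming:", em.size())
}
```

Because the sweep is the only thing that removes entries, disabling the warmer trades bounded memory for fewer background requests; lazy fills keep working either way.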
The per-object head cache is a sync.Map of headCacheEntry, keyed by
canonical object key — like the listing and subtree caches. An
entry is a hit only while (a) unexpired (expires, one TTL past fill), (b) still tagged
with its parent prefix's current local generation — so invalidate's generation bump
drops every cached head under the prefix without a map scan, mirroring listing
invalidation — and (c) carrying user metadata when the caller needs it (hasMeta).
- Seeded from listings (seedHeads), not separately fetched. When the listing getter fills a folder, it stores a headCacheEntry for every file directly from the ListObjectsV2 data (size + mtime), with hasMeta=false. This warms the head cache with zero extra HeadObject calls, since the listing already paid for the data. CacheConfig.DisableHeadPrefetch turns seeding off (zero value = on).
- headInfo(ctx, key, needMeta) (*s3.HeadObjectOutput, error) serves a foreground lookup. A warm hit costs no round-trip. needMeta reports whether the caller needs the x-amz-meta-* user metadata (true iff unix-metadata is enabled): when true, a listing-seeded (hasMeta=false) entry is treated as a miss and filled via one real HeadObject (which stores hasMeta=true). headSF (a singleflight.Group) dedupes concurrent fills for the same key.
- The warmer also sweeps expired head entries each tick. The sync.Map has no LRU, so the warmer is what bounds its growth to roughly the live working set.
- Stat/Open of a present file call headInfo with needMeta = fs3.unixMeta != nil; newS3ReadFile gains an optional precomputed *s3.HeadObjectOutput so it skips its own HeadObject and only GetObjects the body.
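The three hit conditions for a head entry can be written as a single predicate. This is a sketch under assumptions: `headEntry` mirrors the fields described above, but the `usable` method and its parameter names are hypothetical, and the caller is assumed to pass in the parent prefix's current generation.

```go
package main

import (
	"fmt"
	"time"
)

// epoch is a fixed reference time so the example is deterministic.
var epoch = time.Unix(0, 0)

// headEntry is a simplified cached head: size/mtime plus bookkeeping.
type headEntry struct {
	Size, Mtime int64
	gen         uint64    // parent prefix's generation at fill time
	expires     time.Time // one TTL past fill
	hasMeta     bool      // true only when filled by a real HeadObject
}

// usable reports whether e may serve a lookup made at now, given the
// parent prefix's current generation and whether user metadata is needed.
func (e headEntry) usable(parentGen uint64, now time.Time, needMeta bool) bool {
	switch {
	case !now.Before(e.expires):
		return false // (a) expired
	case e.gen != parentGen:
		return false // (b) a local write bumped the parent's generation
	case needMeta && !e.hasMeta:
		return false // (c) listing-seeded entry lacks user metadata
	}
	return true
}

func main() {
	seeded := headEntry{Size: 120, gen: 3, expires: epoch.Add(60 * time.Second)}
	now := epoch.Add(10 * time.Second)

	fmt.Println("plain lookup:", seeded.usable(3, now, false))
	fmt.Println("needs metadata:", seeded.usable(3, now, true))
	fmt.Println("after local write:", seeded.usable(4, now, false))
}
```

A false result is treated as a miss, so the caller falls through to one real HeadObject and stores the result with hasMeta set.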
Caching one folder per ListObjectsV2 is wasteful for bounded namespaces that callers
walk folder-by-folder — refs/ above all (refs/, refs/heads/, refs/tags/,
refs/remotes/…, each its own delimited list). A single delimiter-less
ListObjectsV2 over refs/ returns the whole subtree; every descendant folder's
listing and every negative lookup beneath it is then synthesised in memory.
- A second sync.Map (subtrees, keyed by root) holds subtreeData{ Objects []subtreeObject; Truncated bool } (the flat key+size+mtime set under a root) in a subtreeEntry{gen, expires}, the same TTL/generation scheme as folder listings.
- list(prefix) routes: if recursiveRoot(prefix) matches a configured root, serve from subtree(root) via synthesizeListing(objects, prefix) (remainder-after-prefix with a / ⇒ child dir, deduped; else child file). Otherwise the existing delimited path. A complete subtree is authoritative for negative lookups (it has all keys under the root); only !Truncated subtrees are trusted.
- Bounded. listSubtree stops once it exceeds MaxSubtreeKeys (default 50000) and reports Truncated; list then falls back to the delimited per-folder listing, so an unbounded namespace can't blow up memory. The truncated marker is itself cached, so the fallback costs one near-free subtree-cache hit plus the folder list.
- Invalidation walks ancestors. invalidate(prefix) now bumps the generation of prefix and every parent up to "" (ancestorPrefixes), so a write to refs/tags/v1 invalidates the refs/ subtree entry (and the root listing's). Trade-off: a broader blast radius, since a write also re-lists the coarser folders above it on their next read. Acceptable because writes are pushes and a re-list after a push is expected.
- Head seeding extends to subtrees: a complete scan seeds every file's head (seedSubtreeHeads), each tagged with its own parent prefix's generation. A truncated scan seeds nothing (leaves heads to the fallback path).
- Warmer routing mirrors list: seen prefixes under a recursive root warm that root's subtree (deduped across siblings) rather than per-folder.
- Config: CacheConfig.RecursivePrefixes (nil ⇒ {"refs/"}; explicit empty ⇒ off) and MaxSubtreeKeys. main.go exposes -s3-cache-recursive-prefixes (default refs/, empty disables) and -s3-cache-max-subtree-keys (default 50000).
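Synthesising a folder listing from a flat subtree scan comes down to a prefix strip and a split on the first separator. The sketch below assumes a bare slice of keys instead of subtreeObject values, a hypothetical `child` type, and first-seen ordering; it does not model sizes, mtimes, or the file-wins rule for a file+prefix collision.

```go
package main

import (
	"fmt"
	"strings"
)

// child is a hypothetical synthesised listing record.
type child struct {
	Name  string
	IsDir bool
}

// synthesizeListing derives the direct children of prefix from the flat
// key set of a complete (non-truncated) subtree scan. A remainder that
// still contains "/" yields a child directory (deduplicated); otherwise
// the remainder is a child file. Children appear in first-seen order.
func synthesizeListing(keys []string, prefix string) []child {
	seen := make(map[string]bool)
	var out []child
	for _, k := range keys {
		if !strings.HasPrefix(k, prefix) {
			continue
		}
		rest := k[len(prefix):]
		name, isDir := rest, false
		if i := strings.Index(rest, "/"); i >= 0 {
			name, isDir = rest[:i], true
		}
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, child{Name: name, IsDir: isDir})
	}
	return out
}

func main() {
	keys := []string{
		"refs/heads/main",
		"refs/heads/feature/x",
		"refs/tags/v1",
	}
	fmt.Println(synthesizeListing(keys, "refs/"))
	fmt.Println(synthesizeListing(keys, "refs/heads/"))
	fmt.Println(synthesizeListing(keys, "refs/remotes/")) // empty: a negative answer with no S3 call
}
```

An empty result is only trustworthy because the scan was complete, which is why truncated subtrees fall back to the delimited path.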
- Add cache *ListingCache to S3FS; WithListingCache(c) Option. NewS3FS's client parameter widens to an s3Client interface (the concrete *storage.Client satisfies it) so tests can substitute a counting stub.
- Copy cache: fs3.cache into the new S3FS literal (chroot.go:21).
- Extract listChildren(ctx, client, bucket, separator, prefix) ([]childEntry, error): the paginated ListObjectsV2 loop, classifying CommonPrefixes → dir, Contents → file with size/mtime, dirs-then-files preserving S3 order. A free function so the getter (which holds only the raw client) can reuse it. Used by ReadDir (cache off) and the listing getter.
- ReadDir: when cache != nil, get the entries via cache.list(ctx, prefix) and build []fs.DirEntry from them (rebuilding newDirInfo/newFileInfo from the payload); otherwise list directly as today.
- MkdirAll: cache.invalidate(parent prefix of filename) after the PutObject.
- Helper resolve(ctx, key) (childEntry, found, known bool): (_,_,false) when cache==nil; else prefix, base := splitKey(key), entries, err := cache.list(...); on error known=false (fall back to the live path, since cache problems never fail the op); else scan entries for base and return it.
- Stat: after the temp-buffer check, call resolve. If known: absent → &os.PathError{Op:"stat",…,Err: fs.ErrNotExist}; dir → newDirInfo; file → cache.headInfo → newFileInfoFromHead (head NoSuchKey → NotExist). Not known → the existing HeadObject+probe fallback.
- OpenFile O_RDONLY (after temp check): same resolve; absent → NotExist, dir → newS3DirFile, file → cache.headInfo then newS3ReadFile(…, ho) (skips its HeadObject); not known → existing fallback with a nil head.
- Rename: invalidate parent prefix of both src and dst on success.
- Remove: invalidate parent prefix of key on success.
- Add a cache *ListingCache field + constructor arg to s3WriteFile / s3MultipartUploadFile; their Close calls cache.invalidate(parent prefix of f.key) after a successful upload (nil-guarded). Update the two OpenFile call sites (basic.go:114, :117).
- A Prometheus collector that reads ListingCache.Stats() ({Hits, Misses, ListingItems, SubtreeItems, HeadItems}) and exports objgit_s3_listing_cache_hits_total, _misses_total, and _items{kind=listing|subtree|head}. No repo label. (Cache fills are already counted as ListObjectsV2/HeadObject via observeS3.) main registers it only when the cache is enabled.
- Flags (kebab-case + flagenv):
  - -s3-cache-ttl (Duration, default 60s): per-entry TTL; <=0 disables the cache.
  - -s3-cache-refresh (Duration, default 30s): warmer interval; <=0 disables the warmer (lazy fill still works).
  - -s3-cache-idle (Duration, default 10m): drop un-accessed prefixes from the warmer.
  - -s3-cache-recursive-prefixes (String, default refs/): comma-separated subtree roots; empty disables subtree caching.
  - -s3-cache-max-subtree-keys (Int, default 50000): subtree scan cap.
- When ttl > 0: build cache := s3fs.NewListingCache(cfg, client, *bucket, "/"), pass s3fs.WithListingCache(cache) into NewS3FS, register the metrics collector, and add g.Go(func() error { cache.RunWarmer(gCtx); return nil }) to the errgroup.
- Add the cache settings to the startup slog.Info line.
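The flag surface above could be parsed along these lines. This is a sketch using only the standard library flag package; the flagenv integration is not modelled, and the `cacheFlags` struct, `parseCacheFlags`, and `enabled` are hypothetical names. The one subtle point it illustrates is keeping an explicitly empty prefix list non-nil, so that "empty disables" stays distinguishable from "unset, use the default".

```go
package main

import (
	"flag"
	"fmt"
	"strings"
	"time"
)

// cacheFlags holds the parsed cache settings (hypothetical struct).
type cacheFlags struct {
	TTL, Refresh, Idle time.Duration
	RecursivePrefixes  []string // non-nil but empty means subtree caching is off
	MaxSubtreeKeys     int
}

// enabled reports whether the cache should be wired at all.
func (f cacheFlags) enabled() bool { return f.TTL > 0 }

// parseCacheFlags parses the cache flags from args using the defaults
// listed above.
func parseCacheFlags(args []string) (cacheFlags, error) {
	fs := flag.NewFlagSet("objgitd", flag.ContinueOnError)
	ttl := fs.Duration("s3-cache-ttl", 60*time.Second, "per-entry TTL; <=0 disables the cache")
	refresh := fs.Duration("s3-cache-refresh", 30*time.Second, "warmer interval; <=0 disables the warmer")
	idle := fs.Duration("s3-cache-idle", 10*time.Minute, "drop un-accessed prefixes from the warmer")
	roots := fs.String("s3-cache-recursive-prefixes", "refs/", "comma-separated subtree roots; empty disables")
	maxKeys := fs.Int("s3-cache-max-subtree-keys", 50000, "subtree scan cap")
	if err := fs.Parse(args); err != nil {
		return cacheFlags{}, err
	}

	prefixes := []string{} // deliberately non-nil
	for _, p := range strings.Split(*roots, ",") {
		if p = strings.TrimSpace(p); p != "" {
			prefixes = append(prefixes, p)
		}
	}
	return cacheFlags{
		TTL:               *ttl,
		Refresh:           *refresh,
		Idle:              *idle,
		RecursivePrefixes: prefixes,
		MaxSubtreeKeys:    *maxKeys,
	}, nil
}

func main() {
	f, err := parseCacheFlags([]string{"-s3-cache-ttl", "2m", "-s3-cache-recursive-prefixes", "refs/,info/"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Printf("enabled=%v settings=%+v\n", f.enabled(), f)
}
```

With `-s3-cache-ttl 0` the `enabled` check is false, which corresponds to skipping cache construction, the metrics collector, and the warmer goroutine altogether.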
Within a single git operation the repeated negative lookups happen within milliseconds,
so even a 60s TTL eliminates essentially all redundant HeadObject+probe pairs, and the
first miss in a folder warms every sibling via one parent listing (deduped by
singleflight). Local writes bump the per-prefix generation (and its ancestors'), so a
push reads its own objects immediately. git object content is immutable, so
positive/ReadDir results are never wrong about what exists — only the recency of
newly-added entries is TTL-bounded. The residual risk is a negative read of an object
another process just created (this cache is process-local), bounded by one TTL; -s3-cache-ttl 0
opts out entirely.
- go build ./...; go mod tidy; go test ./...
- Cache disabled = no behavior change: the cache is only wired when -s3-cache-ttl>0; existing protocol tests (go test ./cmd/objgitd/..., needs git on PATH) must pass with the cache off and on.
- New unit tests in internal/s3fs (table-driven tt, counting-stub storage.Client):
  - Populate-on-miss: first Stat of an absent key in a never-listed folder issues exactly one ListObjectsV2 (the parent) and zero HeadObject; a second absent sibling → zero S3; a present sibling → only HeadObject.
  - ReadDir then Stat/Open of an absent sibling → zero S3; a dir child → zero S3.
  - Local invalidation: Create+Close / Rename / Remove / MkdirAll bump the generation so a following Stat/ReadDir re-lists and sees the change (read-after-write).
  - TTL expiry: advancing time past TTL expires the entry → re-list (inject a clock or a settable now in ListingCache for the test).
  - Warmer: RunWarmer re-fills accessed prefixes and evicts idle ones past IdleTTL.
  - Chroot sharing: ReadDir on the root then Stat of an absent child through a chroot resolves from the same cached prefix (same canonical key).
  - Head seeding: one listing fill seeds every file's head from the ListObjectsV2 data with zero HeadObjects; Stat of a seeded file then does zero further HeadObjects (counting-stub tests disable seeding for determinism; a dedicated test enables it and asserts no heads are issued).
  - Subtree caching: one read of any refs/ folder scans the subtree once; other refs/ folders and negative lookups beneath them then do zero S3; a write to a sibling refs/ folder re-scans (ancestor invalidation) and is visible; a subtree past MaxSubtreeKeys falls back to a delimited listing yet still returns correctly.
- End-to-end: ./objgitd -bucket $BUCKET -allow-push; clone a packed repo twice and confirm objgit_s3_requests_total{operation="HeadObject"} grows far slower than with -s3-cache-ttl 0, and watch objgit_s3_listing_cache_hits_total climb.