Today internal/s3fs/file.go's newS3ReadFile does a GetObject and
ioutil.ReadAlls the entire object body into a *bytes.Reader at Open time
(file.go:83-99). Every read handle therefore holds the whole object in RAM —
including whole pack files, which since feat: keep pushed packs whole can be
the entire repository. This wastes memory and forces a full download even when
the consumer reads only a few bytes (e.g. a 1-byte FSObject probe, or an
.idx fanout lookup).
The goal: opening a file in read mode should not download anything. A
sequential Read should lazily start a GetObject whose body is read on
demand; ReadAt should fetch only the bytes it needs via S3 Range requests,
backed by a bounded in-memory read-ahead window so the many small sequential
ReadAt calls go-git issues (zlib ~512B reads, idx binary-search lookups) don't
amplify into hundreds of tiny GETs.
- Loose objects: sequential
Readstart→finish → streaming body is ideal. - Pack
.pack:FSObject.Readerprobes with a 1-byteReadAtand reopens the file onos.ErrClosed, thenio.NewSectionReader(...).ReadAtat the object offset; theScannerusesSeek+sequentialRead. → both paths. - Index
.idx: pureReadAtat arbitrary offsets (header, fanout, hash and offset tables). → random access, bursty-sequential.
Rewrite s3ReadFile to be lazy with two independent, access-pattern-matched
mechanisms, and drop both the eager GetObject/ReadAll and the dedicated
HeadObject from the read path.
enrichedFileInfo only reads ContentLength, LastModified, and Metadata
(fileinfo.go:50-96) — all of which s3.GetObjectOutput also carries. So instead
of a standalone HeadObject, derive the metadata from whatever fetch happens
first:
- If
basic.gosupplied a cache-residenthead(theheadInfopath), use it as today — zero fetches. - Otherwise leave
head == nil/size == -1(unknown) and fetch nothing at open. The firstRead/ReadAtdoes aGetObjectanyway; build a synthetic*s3.HeadObjectOutputfrom its response (ContentLength,LastModified,Metadata,ETag,ContentType) and populatehead/size.- For a non-ranged GET (sequential
Readfrom offset 0),ContentLengthis the full size. - For a ranged GET (
ReadAt, orReadafter aSeek),ContentLengthis the range length; parse the full size fromGetObjectOutput.ContentRange(bytes <start>-<end>/<total>→total). Add a tinyparseContentRangehelper.
- For a non-ranged GET (sequential
Stat()falls back to a lazyHeadObjectonly whenheadis still nil (a consumer stats without ever reading and the cache didn't seed it). This is the sole residualHeadObject, and it's rare.
Net effect on the uncached open→read path: one GetObject total, down from
today's HeadObject + full GetObject.
Replace the reader *bytes.Reader field. New fields (one sync.Mutex guards
all mutable state for simplicity; reads on a single handle are effectively
serial):
client s3Client
bucket, key, name string
head *s3.HeadObjectOutput // nil until first fetch (or cache-supplied)
size int64 // full object size; -1 = unknown until first fetch
mu sync.Mutex
closed bool
// sequential streaming path (Read/Seek)
pos int64 // logical cursor
body io.ReadCloser // open GetObject body, nil until first Read
bodyPos int64 // object offset the body currently sits at
// random-access read-ahead window (ReadAt)
win []byte // cached chunk
winStart int64 // object offset of win[0]
newS3ReadFile: if head is supplied, set size = *head.ContentLength;
otherwise set head = nil, size = -1. Either way return immediately — no
GetObject, no HeadObject.
A shared adoptMeta(out, ranged) helper populates head/size from a
*s3.GetObjectOutput the first time one comes back (no-op if head already
set): synthesize s3.HeadObjectOutput{ContentLength, LastModified, Metadata, …};
set size from ContentLength when not ranged, else from
parseContentRange(out.ContentRange).
- Return
os.ErrClosedif closed. - If
size >= 0andpos >= size, return0, io.EOF. - If
body == nil: issueGetObjectwithRange: bytes=<pos>-(omitRangewhenpos == 0),observeS3("GetObject", …),adoptMetafrom the response, setbody,bodyPos = pos. - Read from
bodyintop, advanceposandbodyPosbyn. On the body'sio.EOF, return it through.
- Return
os.ErrClosedif closed. - Compute new absolute position (
SeekStart/SeekCurrent/SeekEndusingsize). Reject negative. - If
body != niland the new position!= pos, closebodyand nil it (nextReadreopens with the newRange). Setpos.
- Return
os.ErrClosedif closed (preserves theFSObjectreopen contract). - If
size >= 0andoff >= size, return0, io.EOF. - Serve from
winwhen[off, off+len(p))is covered by[winStart, winStart+len(win)):copyand return. - On miss: fetch
chunk := max(len(p), readChunkSize)bytes (clamped tosizewhen known) starting atoffviaGetObjectwithRange: bytes=<off>-<off+chunk-1>,observeS3("GetObject", …),adoptMetafrom the response (learnssizefromContentRangeon the first call),io.ReadFullthe body into a newwin, setwinStart = off, thencopyintop. - EOF handling without pre-known size: if S3 returns
416 InvalidRange(offat/after EOF) treat as0, io.EOF— and the416response still carries the total viaContentRange, soadoptMetacan setsize. If the satisfied length< len(p)because the range hit EOF, return the bytes copied plusio.EOF(honour theio.ReaderAtcontract: short read ⇒ non-nil error).
readChunkSize is a package const defaulting to 1 << 20 (1 MiB). RAM per open
handle is bounded to one chunk (+ at most one streaming body buffer), not the
whole object. (No flag/Option for now — confirmed not needed.)
- If
head != nil(cache-seeded or already adopted from a fetch), returnenrichedFileInfo{HeadObjectOutput: *head, …}as today. - If
head == nil(nothing read yet, no cache seed), do a one-time lazyHeadObject, store it, and return it. This is the only survivingHeadObjectcall.
- Idempotent:
ErrFileClosedif already closed. - Close
bodyif non-nil, dropwin, setclosed = true.Read/ReadAt/Seekthen returnos.ErrClosed— exactly the contractTestS3ReadFileClosedandFSObjectdepend on.
- Range header formatter:
bytes=start-andbytes=start-end. AWS SDKs3.GetObjectInput.Rangeis a*string. parseContentRange(s *string) int64: parse the/<total>suffix of abytes start-end/totalheader; returns -1 if absent/unparseable.adoptMeta(out *s3.GetObjectOutput, ranged bool): build and store the synthetichead+sizeif not already set.
internal/s3fs/file.go— the rewrite above (s3ReadFilestruct,newS3ReadFile,Read,ReadAt,Seek,Close;Stat/Name/Write*/Lockunchanged).s3WriteFile,s3MultipartUploadFile,s3DirFileuntouched.internal/s3fs/file_test.go—TestS3ReadFileClosedbuilds&s3ReadFile{name: "k", reader: bytes.NewReader(...)}; update the literal to the new struct (e.g. setclosedviaClose()on a handle with a stub client / non-nilsize) while keeping the threeos.ErrClosedsubtests.
- The
tempReadFile/tempBufferpath (tempfs.go) is a separate read path for in-flight temp packs — unchanged. - The
OpenFileresolution logic inbasic.go(cacheresolve,headInfo, directory probe) — unchanged; it still handshead(or nil) tonewS3ReadFile. ListingCache, metrics observer wiring,s3Clientinterface — unchanged (GetObjectInput.Rangealready exists on the SDK type).
go build ./...andgo test ./internal/s3fs/...— unit tests, including the updatedTestS3ReadFileClosed.- Add a table-driven test (follow the
xe-go:go-table-driven-testsskill for structure/naming) with the existingstubClient(listingcache_test.go) extended to honour theRangefield (return the sliced bytes and a matchingContentRange/ContentLength, and416for out-of-range), asserting:- open issues no
HeadObjectand noGetObject; Readissues the firstGetObjectlazily, then streams;Statafter it reports the size derived from the response (noHeadObject);ReadAtlearnssizefromContentRangeand serves a second nearby offset from the window without a secondGetObject;ReadAtpast EOF returnsio.EOF(via short read /416);Staton a never-read handle triggers exactly one lazyHeadObject;- post-
Closecalls returnos.ErrClosed.
- open issues no
go test ./cmd/objgitd/...— full protocol tests (clone/push over HTTP, git://, SSH) exercise real pack/idx reads through go-git'sFSObjectandScanner, validating the streaming + range paths end to end (requiresgiton PATH).- Manual smoke (optional): run
./objgitd -bucket $BUCKET -allow-push, push a repo, clone it back, and confirm pack/idx reads succeed while watchingobjgit_s3_*metrics for a sane GetObject count (read-ahead should keep it far below one-per-ReadAt).