Commit e90cae8

docs: add per-repo storage design plans
Captures the design for per-repo {orgID}/{repoName} filesystem dispatch and the per-keypair bucket-per-repo Tigris resolver.

Assisted-by: Claude Opus 4.8 via Claude Code
Signed-off-by: Xe Iaso <xe@tigrisdata.com>
1 parent 5bb834b commit e90cae8

2 files changed

Lines changed: 391 additions & 0 deletions

docs/plans/per-repo-bucket.md

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
1+
# Per-keypair Tigris resolver: a bucket per repo
2+
3+
## Context
4+
5+
The per-repo `repofs.Resolver` hook is already in place: every git request resolves
6+
to a `billy.Filesystem` via `Resolve(ctx, ref, cred)`, and the HTTP Basic-auth
7+
pair already arrives as `repofs.Credential{Username, Password}`. The default
8+
`BucketResolver` chroots one shared bucket.
9+
10+
We now want a real backend: treat the Basic-auth credential as a **Tigris
11+
keypair** (username = access key ID, password = secret access key), build a
12+
`storage.Client` per keypair, and give **each repository its own Tigris bucket**,
13+
created on first push. This replaces the single-shared-bucket model as the
14+
production default.
15+
16+
Decisions (confirmed):
17+
18+
- **Credential = keypair**: `username` → access key ID, `password` → secret.
19+
- **Bucket name**: `objgit-{base36(sha256(orgID/repoName))[:N]}` — deterministic,
20+
DNS-valid (lowercase alnum), collision-resistant (a truncated hash, so not strictly collision-free).
21+
- **Create on push only**: the resolver creates the bucket only on the write
22+
path; reads of a missing bucket are a 404.
23+
- **Replace default**: `main.go` wires the Tigris resolver. `BucketResolver`
24+
stays in `repofs` for tests (memfs), just not wired in production.
25+
- **All S3 ops go through `github.com/tigrisdata/storage-go`** (`*storage.Client`,
26+
which embeds `*s3.Client`); never construct a bare AWS `s3.Client`.
27+
28+
## 1. Gate creation on the Resolver interface (`internal/repofs`)
29+
30+
`Resolve` needs to know read vs. write so it only creates buckets on push. Add a
31+
`create bool`:
32+
33+
```go
34+
type Resolver interface {
35+
Resolve(ctx context.Context, ref RepoRef, cred Credential, create bool) (billy.Filesystem, error)
36+
}
37+
```
38+
39+
- `BucketResolver.Resolve` ignores `create` (chroot is creation-free).
40+
- `daemon.load` (read) passes `create=false`; `daemon.loadOrInit` (push) passes
41+
`create=true` (`cmd/objgitd/git_protocol.go`).
42+
- Update the `recordingResolver` test stub in `cmd/objgitd/http_test.go`.
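
The flag threading above can be sketched with a recording stub. This is a minimal illustration, not the real code: `Filesystem` and `chrootFS` are hypothetical stand-ins for `billy.Filesystem` so the example builds with the standard library only, and `demo` exists purely to exercise both paths.

```go
package main

import (
	"context"
	"fmt"
)

// Filesystem is a stand-in for billy.Filesystem (assumption, stdlib-only sketch).
type Filesystem interface{ Root() string }

type chrootFS struct{ root string }

func (c chrootFS) Root() string { return c.root }

type RepoRef struct{ OrgID, Name string }

func (r RepoRef) Path() string { return r.OrgID + "/" + r.Name }

type Credential struct{ Username, Password string }

// Resolver carries the create flag described in this section.
type Resolver interface {
	Resolve(ctx context.Context, ref RepoRef, cred Credential, create bool) (Filesystem, error)
}

// recordingResolver remembers the create value of the most recent call.
type recordingResolver struct{ lastCreate bool }

func (r *recordingResolver) Resolve(_ context.Context, ref RepoRef, _ Credential, create bool) (Filesystem, error) {
	r.lastCreate = create
	return chrootFS{root: ref.Path()}, nil
}

type daemon struct{ resolver Resolver }

// load is the read path: it never asks the resolver to create storage.
func (d *daemon) load(ctx context.Context, ref RepoRef, cred Credential) (Filesystem, error) {
	return d.resolver.Resolve(ctx, ref, cred, false)
}

// loadOrInit is the push path: the resolver may create storage.
func (d *daemon) loadOrInit(ctx context.Context, ref RepoRef, cred Credential) (Filesystem, error) {
	return d.resolver.Resolve(ctx, ref, cred, true)
}

// demo runs both paths and reports the flag each one passed.
func demo() (readCreate, pushCreate bool) {
	rec := &recordingResolver{}
	d := &daemon{resolver: rec}
	ref := RepoRef{OrgID: "acme", Name: "demo"}
	ctx := context.Background()

	if _, err := d.load(ctx, ref, Credential{}); err != nil {
		panic(err)
	}
	readCreate = rec.lastCreate
	if _, err := d.loadOrInit(ctx, ref, Credential{}); err != nil {
		panic(err)
	}
	pushCreate = rec.lastCreate
	return readCreate, pushCreate
}

func main() {
	r, p := demo()
	fmt.Println("read passes create =", r)
	fmt.Println("push passes create =", p)
}
```

A stub of this shape is also a plausible form for the updated `recordingResolver`, since it makes the flag assertable without touching storage.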
43+
44+
Add a `BucketName()` helper — but put it in the Tigris package (below), since the
45+
`objgit-` prefix and hashing are storage policy, not neutral identity.
46+
47+
## 2. New package: `internal/tigrisfs`
48+
49+
The concrete, Tigris-backed `repofs.Resolver`. Depends on `storage-go`, `s3fs`,
50+
and (for the not-found sentinel) go-git `transport`. Keeping it out of `repofs`
51+
preserves `repofs`'s transport/storage neutrality.
52+
53+
```go
54+
package tigrisfs
55+
56+
// Resolver implements repofs.Resolver against Tigris, one bucket per repo.
57+
type Resolver struct {
58+
// newClient builds a storage.Client from a keypair. Defaults to
59+
// storage.New(ctx, storage.WithAccessKeypair(id, secret)); overridable for tests.
60+
newClient func(ctx context.Context, cred repofs.Credential) (*storage.Client, error)
61+
fsOpts []s3fs.Option // listing/pack cache opts applied per-bucket S3FS
62+
63+
mu sync.Mutex
64+
clients map[string]*cachedClient // keyed by access key ID (cred.Username)
65+
}
66+
67+
type cachedClient struct {
68+
raw *storage.Client // bucket ops (CreateBucket/HeadBucket) go here
69+
hardened s3fs.S3Client // object I/O handed to s3fs (see note on Harden)
70+
}
71+
```
72+
73+
`Resolve`:
74+
75+
1. Reject empty `cred.Username`/`cred.Password` (auth required → surfaces as 401
76+
at the HTTP layer; see error mapping).
77+
2. Look up / build the cached client for `cred.Username` (build via `newClient`,
78+
then `hardened = s3fs.Harden(raw)`). Cache under a mutex.
79+
3. `bucket := bucketName(ref)`.
80+
4. If `create`: `ensureBucket(ctx, raw, bucket)` → `CreateBucket`; treat
81+
`*types.BucketAlreadyOwnedByYou` / `*types.BucketAlreadyExists` (via
82+
`errors.As`) as success.
83+
Else: `HeadBucket`; on `*types.NotFound`/`*types.NoSuchBucket` return
84+
`transport.ErrRepositoryNotFound`.
85+
5. Return `s3fs.NewS3FS(hardened, bucket, r.fsOpts...)` (root `""` — the bucket
86+
_is_ the repo).
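
Steps 1, 3, and 4 can be sketched against an in-memory fake. Everything here is an assumption for illustration: `bucketAPI`, `fakeBuckets`, `resolveBucket`, and `tryResolve` are hypothetical names, and plain sentinel errors with `errors.Is` stand in for the typed S3 errors the plan matches with `errors.As`. Client caching (step 2) and the `s3fs` construction (step 5) are left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Stand-in sentinels (assumptions) for tigrisfs.ErrNoCredential,
// transport.ErrRepositoryNotFound, and the typed S3 bucket errors.
var (
	ErrNoCredential       = errors.New("credential required")
	ErrRepositoryNotFound = errors.New("repository not found")
	errAlreadyOwned       = errors.New("bucket already owned by you")
	errNoSuchBucket       = errors.New("no such bucket")
)

// bucketAPI is the assumed slice of the raw client used for bucket calls.
type bucketAPI interface {
	CreateBucket(ctx context.Context, name string) error
	HeadBucket(ctx context.Context, name string) error
}

// fakeBuckets is an in-memory bucketAPI that records the calls it receives.
type fakeBuckets struct {
	exists map[string]bool
	calls  []string
}

func (f *fakeBuckets) CreateBucket(_ context.Context, name string) error {
	f.calls = append(f.calls, "CreateBucket")
	if f.exists[name] {
		return errAlreadyOwned
	}
	f.exists[name] = true
	return nil
}

func (f *fakeBuckets) HeadBucket(_ context.Context, name string) error {
	f.calls = append(f.calls, "HeadBucket")
	if !f.exists[name] {
		return errNoSuchBucket
	}
	return nil
}

// resolveBucket validates the credential, then creates the bucket on the
// write path or probes it on the read path.
func resolveBucket(ctx context.Context, api bucketAPI, user, pass, bucket string, create bool) (string, error) {
	if user == "" || pass == "" {
		return "", ErrNoCredential
	}
	if create {
		// "Already owned" counts as success, so repeated pushes are idempotent.
		if err := api.CreateBucket(ctx, bucket); err != nil && !errors.Is(err, errAlreadyOwned) {
			return "", fmt.Errorf("create bucket %s: %w", bucket, err)
		}
		return bucket, nil
	}
	if err := api.HeadBucket(ctx, bucket); err != nil {
		if errors.Is(err, errNoSuchBucket) {
			return "", ErrRepositoryNotFound
		}
		return "", fmt.Errorf("head bucket %s: %w", bucket, err)
	}
	return bucket, nil
}

// tryResolve drives resolveBucket with a fixed secret and bucket name.
func tryResolve(api *fakeBuckets, user string, create bool) error {
	_, err := resolveBucket(context.Background(), api, user, "secret", "objgit-demo", create)
	return err
}

func main() {
	api := &fakeBuckets{exists: map[string]bool{}}
	fmt.Println("read before push:", tryResolve(api, "key", false))
	fmt.Println("push:", tryResolve(api, "key", true))
	fmt.Println("read after push:", tryResolve(api, "key", false))
	fmt.Println("no credential:", tryResolve(api, "", true))
	fmt.Println("calls:", api.calls)
}
```

Because the credential check runs first, an empty keypair never reaches the bucket API, which is the ordering the unit tests in section 6 rely on.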
87+
88+
```go
89+
func bucketName(ref repofs.RepoRef) string {
90+
sum := sha256.Sum256([]byte(ref.Path()))
91+
b36 := new(big.Int).SetBytes(sum[:]).Text(36) // 0-9a-z
92+
// left-pad to a fixed width so truncation is deterministic, then take N.
93+
return "objgit-" + leftPad(b36, 50, '0')[:32] // "objgit-" + 32 = 39 chars, < 63
94+
}
95+
```
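
The snippet above relies on a `leftPad` helper that the plan does not define. A self-contained sketch, assuming a straightforward `leftPad` and taking the canonical `orgID/name` string directly instead of a `repofs.RepoRef`, might look like this (`validBucketName` is a hypothetical check added only for the demo):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
	"strings"
)

// leftPad left-pads s with pad up to width (hypothetical helper).
func leftPad(s string, width int, pad byte) string {
	if len(s) >= width {
		return s
	}
	return strings.Repeat(string(pad), width-len(s)) + s
}

// bucketName hashes the canonical "orgID/name" path into a bucket name.
func bucketName(repoPath string) string {
	sum := sha256.Sum256([]byte(repoPath))
	b36 := new(big.Int).SetBytes(sum[:]).Text(36) // digits 0-9a-z
	// A 256-bit value needs at most 50 base-36 digits, so padding to 50
	// gives every hash the same width before truncating to 32.
	return "objgit-" + leftPad(b36, 50, '0')[:32]
}

// validBucketName checks length and the lowercase-alnum-plus-hyphen alphabet.
func validBucketName(s string) bool {
	if len(s) == 0 || len(s) > 63 {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !(c >= 'a' && c <= 'z') && !(c >= '0' && c <= '9') && c != '-' {
			return false
		}
	}
	return true
}

func main() {
	for _, p := range []string{"acme/demo", "acme/other"} {
		name := bucketName(p)
		fmt.Println(p, "->", name, "valid:", validBucketName(name))
	}
}
```

The width of 50 follows from 36^49 < 2^256 < 36^50, so no SHA-256 value can overflow the padded form.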
96+
97+
**Harden note:** `s3fs.Harden` returns the 9-method object-only `s3Client`
98+
wrapper — it does **not** expose `CreateBucket`/`HeadBucket`. So the resolver
99+
keeps the raw `*storage.Client` for the (rare) bucket calls and hands the
100+
hardened wrapper to `s3fs` for the hot object path. Both are storage-go clients;
101+
no bare AWS client is created.
102+
103+
**s3fs export:** `s3fs.NewS3FS` currently takes an unexported `s3Client`
104+
interface. An external package can still satisfy it (method set is exported), but
105+
to name the field type in `cachedClient` cleanly, export the interface as
106+
`s3fs.S3Client` (alias/rename of the existing `s3Client`). Small, mechanical
107+
change in `internal/s3fs/filesystem.go`.
108+
109+
## 3. `main.go` wiring (replace default)
110+
111+
- Wire the keypair-mode resolver instead of `BucketResolver`:
112+
`resolver: tigrisfs.New(tigrisfs.WithFSOptions(fsOpts...))`.
113+
- The default `newClient` does `storage.New(ctx, storage.WithAccessKeypair(...))`
114+
(endpoint defaults to the global Tigris endpoint; add a `-tigris-endpoint`
115+
flag later if needed).
116+
- `sysFS` (SSH host key) still uses the ambient `-bucket` `fsys` built today;
117+
repos no longer use it. The existing `-bucket` flag becomes "system bucket"
118+
(host key only). Note this in flag help.
119+
- Per-keypair `ListingCache`/`PackCache`: the caches `main` builds today are
120+
bound to one bucket and no longer fit a bucket-per-repo world. For this pass,
121+
pass no per-bucket caches (or a bounded per-(keyID,bucket) cache later);
122+
**log that repo-side caching is disabled** so it isn't mistaken for working.
123+
124+
## 4. Error mapping
125+
126+
- Empty/invalid credential → resolver returns a sentinel (`tigrisfs.ErrNoCredential`);
127+
HTTP `resolve` maps it to `401 WWW-Authenticate: Basic` (reuse the existing
128+
`auth.Unauthenticated` rendering path, or special-case the error).
129+
- Missing bucket on read → `transport.ErrRepositoryNotFound` → existing 404 path.
130+
- A bad keypair surfaces as an S3 `AccessDenied` mid-call; log and return 500
131+
(acceptable for now — real authz is a later seam).
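
The mapping above could be rendered by one small function. This is a sketch under assumptions: the sentinels are local stand-ins for `tigrisfs.ErrNoCredential` and `transport.ErrRepositoryNotFound`, the `objgit` realm string is made up, and `writeResolveError` and `statusFor` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Local stand-ins (assumptions) for the sentinels named in the plan.
var (
	ErrNoCredential       = errors.New("credential required")
	ErrRepositoryNotFound = errors.New("repository not found")
	errAccessDenied       = errors.New("AccessDenied") // models a bad keypair mid-call
)

// writeResolveError renders a resolver error as an HTTP response.
func writeResolveError(w http.ResponseWriter, err error) {
	switch {
	case errors.Is(err, ErrNoCredential):
		w.Header().Set("WWW-Authenticate", `Basic realm="objgit"`) // realm is an assumption
		http.Error(w, "authentication required", http.StatusUnauthorized)
	case errors.Is(err, ErrRepositoryNotFound):
		http.Error(w, "repository not found", http.StatusNotFound)
	default:
		// Everything else, including AccessDenied, is a 500 for now.
		http.Error(w, "internal error", http.StatusInternalServerError)
	}
}

// statusFor runs writeResolveError against a recorder and returns the status.
func statusFor(err error) int {
	rec := httptest.NewRecorder()
	writeResolveError(rec, err)
	return rec.Code
}

func main() {
	fmt.Println("no credential:", statusFor(ErrNoCredential))
	fmt.Println("missing bucket:", statusFor(fmt.Errorf("resolve: %w", ErrRepositoryNotFound)))
	fmt.Println("bad keypair:", statusFor(errAccessDenied))
}
```

Using `errors.Is` keeps the mapping intact when the resolver wraps a sentinel with extra context.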
132+
133+
## 5. Caching & concurrency
134+
135+
- One `cachedClient` per access key ID, guarded by a mutex (or `sync.Map`).
136+
Building a `storage.Client` loads AWS config (network-free) and is the main
137+
cost we're avoiding per request.
138+
- Bound the map (simple max or LRU, e.g. 1024 keypairs) as a follow-up; note the
139+
unbounded-growth risk in a comment for now.
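
A bounded, mutex-guarded cache along these lines might look like the sketch below. All names (`clientCache`, `entry`, `buildClient`) are hypothetical, and `client` stands in for the `cachedClient` pair. One detail goes beyond the plan and is an assumption: the sketch compares the presented secret before reusing an entry, because a cache keyed on access key ID alone would hand a cached client to a caller who sends the right key ID with a wrong secret.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"sync"
)

// client stands in for the cached raw/hardened client pair (assumption).
type client struct{ keyID string }

type entry struct {
	secret string
	cl     *client
}

// clientCache holds at most max clients, keyed by access key ID.
type clientCache struct {
	mu      sync.Mutex
	max     int
	entries map[string]entry
	builds  int // how many clients were built, for observation
}

func newClientCache(max int) *clientCache {
	return &clientCache{max: max, entries: make(map[string]entry)}
}

// buildClient models the expensive construction the cache avoids.
func buildClient(keyID, secret string) (*client, error) {
	if keyID == "" || secret == "" {
		return nil, fmt.Errorf("empty keypair")
	}
	return &client{keyID: keyID}, nil
}

// get returns the cached client for keyID when the secret matches, and
// otherwise builds and stores a new one. Holding the lock during the build
// serializes builds, which is acceptable for a sketch.
func (c *clientCache) get(keyID, secret string) (*client, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	e, ok := c.entries[keyID]
	if ok && subtle.ConstantTimeCompare([]byte(e.secret), []byte(secret)) == 1 {
		return e.cl, nil
	}
	cl, err := buildClient(keyID, secret)
	if err != nil {
		return nil, err
	}
	c.builds++
	if !ok && len(c.entries) >= c.max {
		// Simple bound: drop one arbitrary entry. An LRU would evict more fairly.
		for k := range c.entries {
			delete(c.entries, k)
			break
		}
	}
	c.entries[keyID] = entry{secret: secret, cl: cl}
	return cl, nil
}

func main() {
	c := newClientCache(2)
	a, _ := c.get("k1", "s1")
	b, _ := c.get("k1", "s1")
	fmt.Println("reused:", a == b, "builds:", c.builds)
	for _, k := range []string{"k2", "k3"} {
		c.get(k, "s")
	}
	fmt.Println("entries:", len(c.entries))
}
```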
140+
141+
## 6. Tests
142+
143+
- `internal/tigrisfs/tigrisfs_test.go` (unit, no network):
144+
- `bucketName` is deterministic, `objgit-`-prefixed, ≤ 63 chars, lowercase
145+
alnum, and differs for different `orgID/repoName`.
146+
- Empty credential → `ErrNoCredential` (checked before `newClient`).
147+
- `create` gating: with a fake `newClient` returning a client whose bucket ops
148+
are observable, assert `CreateBucket` is called iff `create==true` and
149+
`HeadBucket` otherwise. (Use a small fake satisfying the bucket-op + object
150+
method set; inject via `newClient`.)
151+
- Integration test gated by real creds (skip when
152+
`TIGRIS_STORAGE_ACCESS_KEY_ID`/`_SECRET_ACCESS_KEY` unset — see the
153+
`tigris-storage` skill's `skipIfNoCreds` pattern): push to
154+
`acme/itest-<unique>.git`, assert the bucket is created and a clone round-trips;
155+
clean up the bucket after.
156+
- `internal/repofs` and `cmd/objgitd` existing tests keep using `BucketResolver`
157+
(memfs); update them only for the new `create` parameter.
158+
159+
## Verification
160+
161+
```text
162+
go build ./...
163+
go test ./internal/repofs/... ./internal/tigrisfs/... ./cmd/objgitd/...
164+
```
165+
166+
End-to-end against real Tigris (credentials in the AWS/Tigris env):
167+
168+
```text
169+
./objgitd -bucket $SYS_BUCKET -http-bind :8080 -allow-push
170+
# username = Tigris access key ID, password = secret access key
171+
git push http://$KEYID:$SECRET@localhost:8080/acme/demo.git main # from an existing local repo; first push creates bucket objgit-<hash>
172+
# verify the bucket exists:
173+
tigris bucket list | grep objgit-
174+
git clone http://$KEYID:$SECRET@localhost:8080/acme/demo.git # second clone reuses cached client + bucket
175+
git clone http://localhost:8080/acme/demo.git # no creds -> 401
176+
```

docs/plans/per-repo-fs-dispatch.md

Lines changed: 215 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,215 @@
1+
# Per-repo filesystem resolution + `{orgID}/{repoName}` paths (HTTP focus)
2+
3+
## Context
4+
5+
Today the `daemon` holds a single static `fs billy.Filesystem` (the whole bucket)
6+
and a single `loader transport.Loader`, both built once in `main.go`. Every
7+
transport passes a **raw, unvalidated, variable-depth** path straight into
8+
`auth.Request.Repo` and into `load`/`loadOrInit`, which `Chroot`s the one bucket
9+
fs by that path.
10+
11+
We want two coupled changes:
12+
13+
1. **Restrict repo paths to `{orgID}/{repoName}`.** `orgID` is an opaque
14+
reference a later API call will validate; for now it's accepted as-is. Paths
15+
that aren't exactly two segments are rejected. The `.git` suffix is stripped
16+
from the repo name (`org/repo.git` and `org/repo` resolve to the same repo;
17+
storage key `org/repo/`).
18+
2. **Discover the billy filesystem per-repo via a pluggable hook**, and **pass
19+
the HTTP Basic-auth username/password into that hook** so a real backend can
20+
route an org to its own bucket/credentials based on who's calling. The
21+
default hook preserves today's behavior (chroot the one bucket fs, ignoring
22+
the credential).
23+
24+
**Scope:** this pass targets the **HTTP** transport. SSH is explicitly out of
25+
scope. The shared resolution layer is transport-agnostic, so git:// and SSH get
26+
only the mechanical edits needed to keep compiling (they pass an empty
27+
credential); their auth semantics are unchanged.
28+
29+
## New package: `internal/repofs`
30+
31+
Transport-neutral, mirroring how `internal/auth` is structured. Imports only
32+
`context`, `errors`, `path`, `strings`, and `go-billy/v6`.
33+
34+
```go
35+
package repofs
36+
37+
var ErrInvalidPath = errors.New("repository path must be of the form {orgID}/{repoName}")
38+
39+
// RepoRef identifies a repository. OrgID is opaque (validated later); Name has
40+
// any trailing ".git" stripped.
41+
type RepoRef struct {
42+
OrgID string
43+
Name string
44+
}
45+
46+
// Path is the canonical storage/identity path "orgID/name".
47+
func (r RepoRef) Path() string { return path.Join(r.OrgID, r.Name) }
48+
49+
// Parse trims surrounding slashes, requires exactly two non-empty segments,
50+
// and strips a trailing ".git" from the name. OrgID is not otherwise validated.
51+
func Parse(raw string) (RepoRef, error)
52+
53+
// Credential carries the HTTP Basic-auth username/password (zero value = none).
54+
// Unvalidated; the Resolver decides what to do with it.
55+
type Credential struct {
56+
Username string
57+
Password string
58+
}
59+
60+
// Resolver maps a RepoRef (plus the caller's credential) to the
61+
// billy.Filesystem rooted at that repository. This is the hook a real backend
62+
// implements to route an org to its bucket.
63+
type Resolver interface {
64+
Resolve(ctx context.Context, ref RepoRef, cred Credential) (billy.Filesystem, error)
65+
}
66+
67+
// BucketResolver is the default Resolver: chroot one base filesystem (the whole
68+
// bucket) to ref.Path(), ignoring the credential. Preserves current behavior.
69+
type BucketResolver struct{ Base billy.Filesystem }
70+
func (b BucketResolver) Resolve(_ context.Context, ref RepoRef, _ Credential) (billy.Filesystem, error) {
71+
return b.Base.Chroot(ref.Path())
72+
}
73+
```
74+
75+
`Parse` is the single validation path. Add unit tests for valid input,
76+
missing/extra segments, empty segments, trailing slash, and `.git` stripping.
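
The plan gives only the signature of `Parse`. One possible body matching its doc comment is sketched below. Rejecting `.` and `..` segments is an extra assumption, added because `path.Join` would otherwise collapse them and break the two-segment shape.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

var ErrInvalidPath = errors.New("repository path must be of the form {orgID}/{repoName}")

type RepoRef struct {
	OrgID string
	Name  string
}

// Path is the canonical storage/identity path "orgID/name".
func (r RepoRef) Path() string { return path.Join(r.OrgID, r.Name) }

// Parse trims surrounding slashes, requires exactly two non-empty segments,
// and strips a trailing ".git" from the name.
func Parse(raw string) (RepoRef, error) {
	parts := strings.Split(strings.Trim(raw, "/"), "/")
	if len(parts) != 2 {
		return RepoRef{}, ErrInvalidPath
	}
	org := parts[0]
	name := strings.TrimSuffix(parts[1], ".git")
	for _, seg := range []string{org, name} {
		// "." and ".." are rejected as an assumption beyond the plan.
		if seg == "" || seg == "." || seg == ".." {
			return RepoRef{}, ErrInvalidPath
		}
	}
	return RepoRef{OrgID: org, Name: name}, nil
}

func main() {
	for _, raw := range []string{"acme/demo.git", "/acme/demo/", "demo.git", "a/b/c", "acme//demo"} {
		ref, err := Parse(raw)
		fmt.Printf("%q -> %q, err=%v\n", raw, ref.Path(), err)
	}
}
```

The inputs in `main` line up with the table-driven cases listed above: a valid path, a trailing slash, a missing segment, an extra segment, and an empty segment.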
77+
78+
## `daemon` changes (`cmd/objgitd/git_protocol.go`)
79+
80+
Replace the `fs` and `loader` fields:
81+
82+
```go
83+
type daemon struct {
84+
sysFS billy.Filesystem // bucket-level storage (SSH host key); NOT repo-scoped
85+
resolver repofs.Resolver
86+
authz auth.Authorizer
87+
allowHooks bool
88+
hookTimeout time.Duration
89+
}
90+
```
91+
92+
Rewrite resolution to go through the hook (threading the credential), building
93+
the storer per resolved fs. Reuse go-git's bare-repo detection
94+
(`FilesystemLoader.load` returns `ErrRepositoryNotFound` when no `config` exists
95+
at the chroot root):
96+
97+
```go
98+
// storerFor returns the bare-repo storer rooted at fs, or
99+
// transport.ErrRepositoryNotFound when none exists there.
100+
func storerFor(fs billy.Filesystem) (storage.Storer, error) {
101+
return transport.NewFilesystemLoader(fs, false).Load(&url.URL{Path: "/"})
102+
}
103+
104+
func (d *daemon) load(ctx context.Context, ref repofs.RepoRef, cred repofs.Credential) (storage.Storer, error) {
105+
fs, err := d.resolver.Resolve(ctx, ref, cred)
106+
if err != nil { return nil, err }
107+
st, err := storerFor(fs)
108+
if err != nil { return nil, err }
109+
if err := ensureHEAD(st); err != nil { slog.Warn("...", "repo", ref.Path(), "err", err) }
110+
return st, nil
111+
}
112+
113+
func (d *daemon) loadOrInit(ctx context.Context, ref repofs.RepoRef, cred repofs.Credential) (storage.Storer, error) {
114+
fs, err := d.resolver.Resolve(ctx, ref, cred)
115+
if err != nil { return nil, err }
116+
st, err := storerFor(fs)
117+
if err == nil { ensureHEAD(st); return st, nil }
118+
if !errors.Is(err, transport.ErrRepositoryNotFound) { return nil, err }
119+
st = filesystem.NewStorage(fs, cache.NewObjectLRUDefault())
120+
if _, err := git.Init(st, git.WithDefaultBranch(plumbing.NewBranchReferenceName("main"))); err != nil {
121+
return nil, fmt.Errorf("init bare repo: %w", err)
122+
}
123+
metrics.ReposCreated()
124+
slog.Info("created repository", "repo", ref.Path())
125+
return st, nil
126+
}
127+
```
128+
129+
The old `d.fs.Chroot(repoPath)` step is gone — `Resolve` returns the repo-root
130+
fs directly, so resolution happens once per request.
131+
132+
## HTTP transport (`cmd/objgitd/http.go` + `main.go`) — primary work
133+
134+
Replace the suffix-dispatch `ServeHTTP` with an `http.ServeMux` (built by a new
135+
`d.httpHandler()` method, wired in `main.go` as the server `Handler`). With a
136+
fixed two-segment path the wildcards the old code couldn't use now work:
137+
138+
- `GET /{orgID}/{repoName}/info/refs`
139+
- `POST /{orgID}/{repoName}/git-upload-pack`
140+
- `POST /{orgID}/{repoName}/git-receive-pack`
141+
142+
Handlers read `r.PathValue("orgID")`/`r.PathValue("repoName")`, build the ref via
143+
`repofs.Parse(path.Join(orgID, repoName))`, and 400 on `ErrInvalidPath`.
144+
ServeMux 404s anything that isn't exactly two segments before the suffix, so the
145+
shape is enforced for free.
146+
147+
`resolve` extracts the Basic-auth credential and threads it through:
148+
149+
```go
150+
func credFromRequest(r *http.Request) (auth.Credential, repofs.Credential) {
151+
if u, p, ok := r.BasicAuth(); ok {
152+
return auth.BasicAuth{Username: u, Password: p}, repofs.Credential{Username: u, Password: p}
153+
}
154+
return auth.Anonymous{}, repofs.Credential{}
155+
}
156+
```
157+
158+
(or keep the existing `auth` credential helper and build the `repofs.Credential`
159+
inline). `resolve`, `handleInfoRefs`, `handleRPC`, and `d.receivePack` change
160+
their `repoPath string` parameter to a `repofs.RepoRef`; `resolve` passes the
161+
`repofs.Credential` to `load`/`loadOrInit`. Logging/hook context uses
162+
`ref.Path()`. Remove the variable-depth comment block and the now-unused
163+
`strings` import if it drops out.
164+
165+
## git:// and SSH — mechanical only (out of scope)
166+
167+
`git_protocol.go handle` and `ssh.go handleSSH` must adapt to the new
168+
`load`/`loadOrInit` signatures: parse their raw path with `repofs.Parse`
169+
(rendering `ErrInvalidPath` in their own dialect — pktline error / stderr+exit)
170+
and pass an empty `repofs.Credential{}`. `ssh.go`'s host-key load switches from
171+
`d.fs` to `d.sysFS`. No further redesign of these transports.
172+
173+
## `main.go` changes
174+
175+
- Keep building the base bucket fs (`fsys`) as today.
176+
- `d := &daemon{ sysFS: fsys, resolver: repofs.BucketResolver{Base: fsys}, authz: ..., allowHooks: ..., hookTimeout: ... }` — drop the `loader` field.
177+
- HTTP server `Handler: d.httpHandler()` instead of `Handler: d`.
178+
- Drop the `transport.NewFilesystemLoader` call; remove the `transport` import
179+
from `main.go` if it becomes unused.
180+
181+
## Behavioral note / migration
182+
183+
Stripping `.git` and requiring an org changes the storage key from `repo.git/`
184+
to `org/repo/`. Repos created under the old layout won't resolve under the new
185+
scheme. Acceptable for the current stage; no migration is in scope.
186+
187+
## Tests
188+
189+
- New `internal/repofs/repofs_test.go` — table-driven `Parse` cases (and a tiny
190+
`BucketResolver.Resolve` check that it chroots to `ref.Path()`).
191+
- Update `cmd/objgitd/http_test.go` (and the shared helpers in
192+
`git_protocol_test.go` it reuses): remotes gain an org segment (`/test.git` →
193+
`/acme/test.git`), and storage-key assertions drop `.git`
194+
(`/test.git/config` → `/acme/test/config`; `assertPackedRepo(t, fs,
195+
"/acme/test")`). The git:// tests in `git_protocol_test.go` need the same path
196+
updates to keep passing.
197+
- Optionally add an HTTP test that a single-segment path returns 404 and that a
198+
Basic-auth credential reaches a stub resolver.
199+
200+
## Verification
201+
202+
```text
203+
go build ./...
204+
go test ./internal/repofs/...
205+
go test -run TestSmartHTTP ./cmd/objgitd/... # requires git on PATH
206+
go test ./cmd/objgitd/...
207+
```
208+
209+
End-to-end against a real bucket:
210+
211+
```text
212+
./objgitd -bucket $BUCKET -http-bind :8080 -allow-push
213+
git clone http://user:pass@localhost:8080/acme/demo.git # creates acme/demo/ on first push; user/pass reach the resolver
214+
git clone http://localhost:8080/demo.git # single segment -> 404
215+
```
