Skip to content

feat: in-process embedded permissions and per-schema-version caching#3166

Open
josephschorr wants to merge 5 commits into
authzed:mainfrom
josephschorr:embedded-permissions
Open

feat: in-process embedded permissions and per-schema-version caching#3166
josephschorr wants to merge 5 commits into
authzed:mainfrom
josephschorr:embedded-permissions

Conversation

@josephschorr

@josephschorr josephschorr commented Jun 9, 2026

Copy link
Copy Markdown
Member

Summary

Adds the ability to embed SpiceDB's permission engine in-process (no gRPC server) and makes per-schema-version caching of derived artifacts possible, eliminating repeated work on the check hot path.

Changes

  • datalayer: cache schema-derived artifacts on the stored-schema cache. The stored-schema cache already shares one instance per schema version (keyed by schema hash). Bundle lazily-built, schema-derived caches (compiled caveats today; type systems / reachability later) into that instance via a new CachedSchema type, so they share the schema's lifetime and are discarded, together, when the schema changes. Register a kind with RegisterDerivedCache; retrieve the per-schema instance with GetDerivedCache. The stored-schema cache now holds *datalayer.CachedSchema; pkg/datastore is unchanged.

  • datastore: honor schema mode when writing bootstrap data. NewDatastore's bootstrap always wrote schema in legacy per-definition form regardless of the configured schema mode, so a server reading from the unified schema would find none. Adds a BootstrapSchemaMode option; the zero value preserves prior behavior.

  • caveats: cache compiled caveats per schema version. ComputeCheck constructs a fresh CaveatRunner per call, so its per-runner deserialized-caveat cache never persists across checks. Deserializing a caveat rebuilds its CEL environment, which is expensive and — with a rich caveat type set — dominated check cost. This caches compiled caveats on the schema-derived cache, keyed by caveat name, so the CEL environment is built once per schema version rather than once per check. Falls back to per-runner caching when the reader is not backed by a unified stored schema.

  • embedded: add in-process permissions library. Adds pkg/embedded, a focused library for running permission checks in-process against a datastore via the dispatch engine — no gRPC, caveat context passed as native Go values. Includes a README and tests.

  • datastore: cost the stored-schema cache by estimated byte size. The cache previously admitted every entry at a fixed cost of 1, so a byte-denominated MaxCost (the server's 32MiB default, or the embedded SchemaCacheMaxCostBytes) actually bounded the number of cached schema versions rather than memory. Each entry is now costed by ReadOnlyStoredSchema.EstimatedSize(): a rough schema byte size plus, for each registered schema-derived cache kind, an estimate of the bytes it adds once populated. Datastore readers pass the exact serialized size they just read; other callers fall back to a cheap schema-text proxy, avoiding a full proto.Size walk (~1.5ms on a 1k-definition schema). RegisterDerivedCache now takes a size estimator, and the compiled-caveat cache estimates per-caveat CEL overhead.

Testing

  • mage lint:all passes; unit suite passes.
  • New tests: pkg/datalayer (derived-cache registry: lazy build, per-schema isolation, duplicate/unregistered handling), internal/caveats (compiled-caveat cache), pkg/embedded (checks across both legacy and unified schema modes, incl. caveated/conditional results), and stored-schema cost estimation (EstimatedSize reflects schema bytes plus registered derived-cache estimators).

@josephschorr josephschorr requested a review from a team as a code owner June 9, 2026 02:22
@github-actions github-actions Bot added area/cli Affects the command line area/api v1 Affects the v1 API area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) labels Jun 9, 2026
@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

@josephschorr josephschorr force-pushed the embedded-permissions branch 2 times, most recently from 96c22be to 7fc3209 Compare June 9, 2026 19:33

@tstirrat15 tstirrat15 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

Comment thread internal/caveats/schemacache.go Outdated
Comment on lines +17 to +19
if err := datalayer.RegisterDerivedCache(compiledCaveatCacheKey, func() any { return &CompiledCaveatCache{} }); err != nil {
spiceerrors.MustPanicf("failed to register compiled caveat cache: %v", err)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we expecting to have many of these? Like do we need a registry, or can this be singleton?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registry, to avoid an import cycle: a typed field per cache kind would make pkg/datastore depend on the packages that own those caches (caveats now, type systems/reachability later) — which already depend on it. Registration inverts that. We only expect a handful.

Comment thread pkg/datalayer/derivedcache.go Outdated
)

// CachedSchema bundles a read-only stored schema with lazily-built, schema-derived caches
// (for example compiled caveats, type systems, or reachability). Instances are produced and

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These being the other examples of things that we would register?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — compiled caveats today; type systems and reachability later.

Comment thread pkg/datalayer/derivedcache.go Outdated

// DerivedCacheKey identifies a kind of schema-derived cache (e.g. compiled caveats). Create
// one per kind, typically as a package-level var, via NewDerivedCacheKey.
type DerivedCacheKey struct{ name string }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own edification: why is this a struct with a single member, rather than a subtype of string directly?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the key is opaque — it can only come from NewDerivedCacheKey, not a bare string literal at some call site. Can make it a string subtype if you'd rather.

Comment thread pkg/datalayer/derivedcache.go Outdated
// GetDerivedCache returns the schema-derived cache of type T registered under key for the
// given cached schema, building it lazily on first access.
func GetDerivedCache[T any](c *CachedSchema, key DerivedCacheKey) T {
return c.derivedCache(key).(T)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MustBugF here? Or are we not worried about a panic here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah — it now returns an error instead of relying on the bare type assertion.

Comment on lines +42 to +46
// SchemaCacheMaxCostBytes, when > 0, enables an in-memory stored-schema cache of the
// given size in bytes. Caching the stored schema across checks is what allows
// schema-derived caches (e.g. compiled caveats) to persist; strongly recommended when
// SchemaMode reads from the unified schema.
SchemaCacheMaxCostBytes int64

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious - why do we need a max cost here but not on the other schemas?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the memory bound for the one cache the embedded library stands up. It's a real byte budget now — each entry is costed by the schema size plus the derived caches hanging off it.

}

// Permissions issues in-process permission checks against a datastore.
type Permissions struct {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PermissionsChecker? Or is the idea that this is PermissionsService minus the Service?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding the name until the API shape settles (your thread with miparnisari).

Comment thread pkg/embedded/permissions.go Outdated
Comment on lines +116 to +118
// CheckRequest is a single, fully-consistent permission check. The caveat context is passed
// as native Go values (no structpb / base64 round-trip).
type CheckRequest struct {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mark that this should track the protobuf API shape? Is there a way to get a derived golang shape from the protobuf shape? Like I'm assuming this is because we don't want to pay the serde cost on Structs?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly — passing native Go values avoids building a structpb.Struct just for the server to convert it straight back. Added a note that it mirrors v1.CheckPermissionRequest and should be kept in sync.

}
}

// Close releases resources held by the checker. It does not close the datastore.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the datastore, is that because we don't want to have to do dependency checking on the datastore?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller owns the datastore — it may be shared and outlive the checker, so closing it here would be wrong. Close only releases what this object created.

@miparnisari miparnisari left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of the approach taken in this PR. You're constructing things that we already have (e.g you wrote a new CheckRequest object and embedded.NewPermissions) and we don't offer the entire API surface. We are also losing request validation I think?

Can we do something like this instead? Reuse our existing objects:

type server struct {
    dl  datalayer
    v1svc.NewPermissionsServer
    v1svc.NewSchemaServer
    v1svc.NewWatchServer
}

// server inherits the entire API surface, plus:

func  New(dl datalayer, options ...) *server {
    // configure everything
    // including eg the protovalidate object
}

func (s *server) Close() {
}

It's probably more work, but i think it's better for the long run since we don't duplicate code.

If you look at our existing devcontext code, it looks very similar to what I suggested 😉 we could replace all of it with this server object.

func (dc *DevContext) RunV1InMemoryService() (*grpc.ClientConn, func(), error) {
listener := bufconn.Listen(defaultConnBufferSize)
s := grpc.NewServer(
grpc.ChainUnaryInterceptor(
datalayer.UnaryServerInterceptor(dc.DataLayer),
consistency.UnaryServerInterceptor("development", consistency.TreatMismatchingTokensAsError),
),
grpc.ChainStreamInterceptor(
datalayer.StreamServerInterceptor(dc.DataLayer),
consistency.StreamServerInterceptor("development", consistency.TreatMismatchingTokensAsError),
),
)
validator, err := genutil.NewProtoValidator()
if err != nil {
return nil, nil, err
}
ps := v1svc.NewPermissionsServer(dc.Dispatcher, v1svc.PermissionsServerConfig{
MaxUpdatesPerWrite: 50,
MaxPreconditionsCount: 50,
MaximumAPIDepth: 50,
MaxCaveatContextSize: 0,
ExpiringRelationshipsEnabled: true,
CaveatTypeSet: caveattypes.Default.TypeSet,
PerformanceInsightMetricsEnabled: false,
}, validator)
ss := v1svc.NewSchemaServer(v1svc.SchemaServerConfig{
CaveatTypeSet: caveattypes.Default.TypeSet,
AdditiveOnly: false,
ExpiringRelsEnabled: true,
PerformanceInsightMetricsEnabled: false,
}, validator)
v1.RegisterPermissionsServiceServer(s, ps)
v1.RegisterSchemaServiceServer(s, ss)
go func() {
if err := s.Serve(listener); err != nil {
log.Err(err).Msg("error when serving in-memory server")
}
}()
conn, err := grpchelpers.NewBufferedClient(listener)
return conn, func() {
conn.Close()
listener.Close()
s.Stop()
}, err
}

@josephschorr

Copy link
Copy Markdown
Member Author

I'm not a fan of the approach taken in this PR. You're constructing things that we already have (e.g you wrote a new CheckRequest object and embedded.NewPermissions) and we don't offer the entire API surface. We are also losing request validation I think?

Can we do something like this instead? Reuse our existing objects:

Not without re-introducing a requirement to encode/decode obejcts as structpb's, which is a loss of performance we should not need to pay, especially for this simplified use case

@miparnisari

Copy link
Copy Markdown
Contributor

which is a loss of performance we should not need to pay

The penalty comes only in caveated checks. IMO it doesn't justify the maintenance burden that comes with the duplication and the poor DX that comes from only offering one API (CheckPermission).

@josephschorr josephschorr force-pushed the embedded-permissions branch from 7fc3209 to 44a772b Compare June 25, 2026 17:37
@josephschorr

Copy link
Copy Markdown
Member Author

which is a loss of performance we should not need to pay

The penalty comes only in caveated checks. IMO it doesn't justify the maintenance burden that comes with the duplication and the poor DX that comes from only offering one API (CheckPermission).

It is a non-trivial amount of overhead and many "in memory" systems do use caveats for dynamic or real-time behavior, so I do think it is worth it

@github-actions github-actions Bot added area/datastore Affects the storage system and removed area/api v1 Affects the v1 API labels Jun 25, 2026
Add a lazily-built, schema-derived cache layer to ReadOnlyStoredSchema. Because a
single ReadOnlyStoredSchema instance is shared across callers for a given schema
version via the stored-schema cache, derived caches (e.g. compiled caveats) hung
off it live exactly as long as that schema version and are discarded, together,
when the schema changes.

Cache kinds register a factory at init via RegisterDerivedCache and are retrieved
with the generic GetDerivedCache. The schema-reader adapter exposes the shared
stored schema so consumers can reach it.
NewDatastore's bootstrap always wrote schema in legacy per-definition form
regardless of the configured schema mode, so a server reading from the unified
schema would find none. Add a BootstrapSchemaMode option so the bootstrap
datalayer writes schema in the requested mode. The zero value preserves the
prior (legacy) behavior.

When bootstrapping the first schema under the unified schema, the populate path
reads the (absent) existing schema to resolve references; tolerate
ErrSchemaNotFound there and resolve against the definitions being written, the
same way the schema-write service does.
Deserializing a caveat rebuilds its CEL environment, which is expensive. When the
reader is backed by a unified stored schema, cache compiled caveats on that schema
version (via the schema-derived cache on ReadOnlyStoredSchema) so the cost is paid
once per schema rather than once per check. Falls back to per-runner caching when
no stored schema is available.
Add pkg/embedded, a thin in-process wrapper over the dispatch engine
(computed.ComputeCheck + a local-only dispatcher) that issues fully-consistent
permission checks against a datastore without standing up a gRPC server. Caveat
context is passed as native Go values, avoiding the structpb/base64 round-trip
the gRPC path pays. The schema is optionally cached across checks, which lets the
schema-derived compiled-caveat cache persist.
The stored-schema cache previously admitted every entry at a fixed cost of 1, so a
MaxCost configured in bytes (e.g. the server's 32MiB default, or the embedded
SchemaCacheMaxCostBytes) actually bounded the number of cached schema versions rather
than memory.

Cost each entry by ReadOnlyStoredSchema.EstimatedSize() instead: a rough schema byte
size plus, for each registered schema-derived cache kind, an estimate of the bytes it
adds once populated. Datastore readers pass the exact serialized size they just read;
other callers fall back to a cheap schema-text proxy, avoiding a full proto walk
(proto.Size is ~1.5ms on a 1k-definition schema). RegisterDerivedCache now takes a size
estimator, and the compiled-caveat cache estimates per-caveat CEL overhead.
@josephschorr josephschorr force-pushed the embedded-permissions branch from 44a772b to a6f9516 Compare June 25, 2026 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cli Affects the command line area/datastore Affects the storage system area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants