
Create a key based file database so each key has its own file #538

Merged
satyakigh merged 2 commits into main from keyeddb on Apr 27, 2026

Conversation


@satyakigh satyakigh commented Apr 26, 2026

Summary

Restructures the FileStore persistence layer from a single-file-per-store model to a one-file-per-key model. Each key gets its own encrypted file on disk instead of all keys being serialized into one monolithic JSON blob.

Key Semantics

In the context of this language server, a "key" is an AWS region (e.g., us-east-1, eu-west-1) most of the time. Each store (such as public_schemas) caches resource schemas per region. The number of keys is bounded by the number of AWS regions a user works with — typically single digits, at most a few dozen. This means the per-key file count stays small and hash collisions are effectively impossible.

Architecture Change

Old (EncryptedFileStore)

  • One encrypted file per store: {storeName}.enc
  • File contains all key-value pairs: { "key1": value1, "key2": value2, ... }
  • Every write acquires a lock, re-reads the entire file from disk, mutates the in-memory record, and serializes the full record back via atomic write (tmp + rename)
  • Constructor eagerly writes an empty {} to disk if the file doesn't exist

New (KeyedFileStore + EncryptedFile)

  • One encrypted file per key: {storeName}.{stableHashCode(key)}.enc
  • Each file contains an EncryptedEntry envelope: { "key": "original-key", "value": <data> }
  • The key is stored inside the encrypted payload for recovery during startup directory scan
  • KeyedFileStore maintains a Map<string, EncryptedFile> mapping logical keys to file handles
  • EncryptedFile is a new low-level class handling single-file encryption, locking, read, write, and delete
  • No files created until first put() — lazy initialization
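The naming scheme and envelope described above can be sketched as follows. `stableHashCode` here is a hypothetical FNV-1a stand-in (the PR does not show its actual implementation), and `EncryptedEntry` mirrors the envelope shape listed above; the encryption step itself is omitted:

```typescript
// Hypothetical stand-in for stableHashCode: 32-bit FNV-1a over the key.
// The PR's real hash function may differ; only stability across runs matters.
function stableHashCode(key: string): string {
  let hash = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, keep unsigned
  }
  return hash.toString(16);
}

// Envelope written (after encryption) into each per-key file; keeping the
// original key inside the payload is what makes the startup scan recoverable.
interface EncryptedEntry<T> {
  key: string;
  value: T;
}

function fileNameFor(storeName: string, key: string): string {
  return `${storeName}.${stableHashCode(key)}.enc`;
}

const name = fileNameFor("public_schemas", "us-east-1");
console.log(name); // "public_schemas.<hex hash>.enc"
```

With single-digit key counts per store, a 32-bit stable hash is more than enough to keep filenames collision-free in practice.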

What This Enables

  • Reduced write amplification: Writing one key no longer serializes/deserializes the entire store. Significant I/O improvement for stores with large schemas.
  • Better concurrency: Each key has its own file lock. Two processes writing different keys no longer contend on the same lock.
  • Granular deletion: Removing a key deletes one file instead of rewriting the entire store.
  • Reduced corruption blast radius: A corrupted file only affects one key, not the entire store.

Behavioral Differences

| Behavior | Old | New |
| --- | --- | --- |
| Files on disk | 1 per store | 1 per key per store |
| Write scope | Full store rewrite | Single file per key |
| Read-before-write | Yes (re-reads entire store under lock) | No (in-memory cache per file) |
| Cross-process visibility | Immediate on next write | On restart or loadAllKeys() |
| Empty store on disk | File created immediately with {} | No file until first put() |
| Lock granularity | One lock per store | One lock per key file |
| Corruption blast radius | Entire store | Single key |
| clear() | Writes empty {} to single file | Deletes all individual key files |
| keys() | Returns from in-memory record | Scans directory + decrypts all files |

@satyakigh satyakigh marked this pull request as ready for review April 27, 2026 14:44
@satyakigh satyakigh requested a review from a team as a code owner April 27, 2026 14:44

@kddejong kddejong left a comment


Review Feedback

1. TOCTOU race in EncryptedFile.put() — first write is unlocked

```typescript
if (!this.exists()) {
    await this.save();   // no lock
    return true;
}
const release = await lock(this.file, LOCK_OPTIONS);
```

Two language server processes writing the same region key for the first time will both see exists() === false and both skip the lock. The atomic rename prevents corruption, but proper-lockfile also has a problem on the remove() side — if one process deletes the data file, another process calling lock() on it gets ENOENT. Since the language server can run as multiple concurrent sessions (one per IDE window), this is a realistic scenario during simultaneous startup.

In practice the data is idempotent (same public schemas), so there is no data loss, but the unhandled ENOENT from proper-lockfile could surface as an uncaught exception.

Suggestion: Always create the file before locking, or use a lock file path independent of the data file (e.g., {file}.lk).

2. loadAllKeys() re-scans and re-decrypts everything on every call

loadAllKeys() is called from the constructor, keys(), stats(), and clear(). Since keys() and stats() are called every 60 seconds from the metrics interval, this results in a full readdirSync + N × (lockSync + readFileSync + decrypt) every minute per language server process. The old code just returned Object.keys(this.content) and did a single statSync.

Additionally, recoverKey() unconditionally creates a new EncryptedFile and sets it in the map, replacing any existing instance — discarding the in-memory reference and doing redundant I/O for keys already loaded.

Suggestion: Only scan for new files — skip filenames already mapped in keysToFiles. Or limit the full directory scan to construction time and let keys()/stats() return from the in-memory map only.

Comment thread src/datastore/file/EncryptedFile.ts
Comment thread src/datastore/file/KeyedFileStore.ts Outdated
@satyakigh satyakigh merged commit e4a44a8 into main Apr 27, 2026
15 checks passed
@satyakigh satyakigh deleted the keyeddb branch April 27, 2026 17:52
