Playing around with Rust and distributed systems.
- node = storage node responsible for storing and retrieving data chunks in its own (1:1) SQLite database
- control-plane = core orchestrator service that handles chunk -> node mappings, performs health checks, and handles replication and node management. Exposes basic admin endpoints for inspecting the state of the system.
- cli = client to upload and download files
- assembler = service that stores file manifests so the cli client can request a file by filename, which the assembler maps to the client id and content hash for retrieval
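A minimal sketch of what the assembler might store and how it could look a file up, assuming manifests are held in memory and keyed by (client id, filename). The `Manifest` fields and the `ManifestStore` type are hypothetical names for illustration, not the project's actual schema; a real assembler would persist them.

```rust
use std::collections::HashMap;

/// Hypothetical manifest shape; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Manifest {
    pub filename: String,
    pub client_id: String,
    /// Hash of the whole file's contents (the file's unique id).
    pub content_hash: String,
    /// Chunk ids in index order, so chunk `i` is `chunks[i]`.
    pub chunks: Vec<String>,
}

/// In-memory stand-in for the assembler's manifest storage,
/// keyed by (client_id, filename).
#[derive(Default)]
pub struct ManifestStore {
    by_owner_and_name: HashMap<(String, String), Manifest>,
}

impl ManifestStore {
    /// Stores (or overwrites) the manifest for this client's filename.
    pub fn put(&mut self, manifest: Manifest) {
        let key = (manifest.client_id.clone(), manifest.filename.clone());
        self.by_owner_and_name.insert(key, manifest);
    }

    /// Looks up the manifest a client uploaded under `filename`.
    pub fn get(&self, client_id: &str, filename: &str) -> Option<&Manifest> {
        self.by_owner_and_name
            .get(&(client_id.to_string(), filename.to_string()))
    }
}
```

Keying on the pair means two clients can upload files with the same name without clobbering each other's manifests.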
UPLOAD
- User uploads file via cli
- Cli hashes the file contents to produce a unique id, breaks the file into fixed-size chunks and sends them to the control-plane to be stored
- Cli also generates a manifest of the file containing metadata such as the original filename, client id and a list of chunks generated as -. This is sent to the assembler.
- Control plane uses a consistent hash ring ("ring hash") to determine which node to send each chunk to; each chunk is replicated n times based on the policy config, and each replica is sent to its node to be stored.
- Nodes receive chunks and PUT them into their db under the chunk id
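The chunking and placement steps above can be sketched with the standard library alone. This assumes a consistent hash ring with virtual nodes (the `vnodes` parameter) and uses `DefaultHasher` as a stand-in hash; `HashRing`, `replicas_for` and `split_into_chunks` are hypothetical names, not the control plane's actual API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in hash. DefaultHasher::new() is deterministic within one build, but
/// its algorithm may change between Rust releases, so a real ring would use a
/// fixed hash function instead.
fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Splits file contents into fixed-size chunks; the last chunk may be shorter.
pub fn split_into_chunks(data: &[u8], chunk_size: usize) -> Vec<&[u8]> {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    data.chunks(chunk_size).collect()
}

/// Consistent hash ring: each node is placed at `vnodes` points on a u64 ring.
pub struct HashRing {
    ring: BTreeMap<u64, String>,
    vnodes: usize,
}

impl HashRing {
    pub fn new(vnodes: usize) -> Self {
        Self { ring: BTreeMap::new(), vnodes }
    }

    /// Adds a node by hashing "<node>#<i>" for each virtual node i.
    pub fn add_node(&mut self, node: &str) {
        for i in 0..self.vnodes {
            self.ring.insert(hash_of(&format!("{node}#{i}")), node.to_string());
        }
    }

    /// Walks clockwise from the key's position (wrapping around) and collects
    /// up to `n` distinct nodes: the first is the primary, the rest replicas.
    pub fn replicas_for(&self, key: &str, n: usize) -> Vec<String> {
        let start = hash_of(key);
        let mut nodes: Vec<String> = Vec::new();
        for (_, node) in self.ring.range(start..).chain(self.ring.range(..start)) {
            if nodes.len() == n {
                break;
            }
            if !nodes.contains(node) {
                nodes.push(node.clone());
            }
        }
        nodes
    }
}
```

Because placement depends only on the chunk id and the current set of nodes, the same `replicas_for` call can be repeated at download time to find where a chunk should live. Virtual nodes spread keys more evenly across a small fleet, and adding or removing a node only moves the keys adjacent to its points on the ring.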
DOWNLOAD
- User requests download of a filename via cli
- Cli sends filename and client id to assembler to handle file retrieval
- Assembler finds the manifest matching the client and file name
- Uses futures to request every chunk in the manifest from the control plane concurrently, then awaits all the results.
- The control plane's ring hash is deterministic, so it knows which nodes may hold each chunk; it sends GET requests to those nodes and returns whichever response arrives first to the assembler.
- Once all chunks are received, the assembler rebuilds the file in index order and sends it back to the cli client as a file download. If any chunk fails, it aborts.
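The download path can be sketched as two pieces: racing the replicas of one chunk and keeping the first response, then rebuilding the file in index order and aborting on any failed chunk. Plain threads and an `mpsc` channel stand in here for the async futures the services use, and `Fetch`, `fetch_first` and `reassemble` are hypothetical names. This version also assumes a replica that answers with an error should be skipped, so a chunk only fails if every replica fails.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical replica fetch: returns the chunk bytes or an error message.
pub type Fetch = Box<dyn FnOnce() -> Result<Vec<u8>, String> + Send>;

/// Asks every replica at once and returns the first successful response.
pub fn fetch_first(replicas: Vec<Fetch>) -> Result<Vec<u8>, String> {
    let (tx, rx) = mpsc::channel();
    for fetch in replicas {
        let tx = tx.clone();
        thread::spawn(move || {
            // Sending fails harmlessly if we already returned a winner.
            let _ = tx.send(fetch());
        });
    }
    // Drop our sender so the loop ends once every replica has answered.
    drop(tx);
    let mut last_err = String::from("no replicas");
    for result in rx.iter() {
        match result {
            Ok(bytes) => return Ok(bytes),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

/// Rebuilds the file from per-chunk results in manifest index order,
/// aborting on the first chunk that could not be fetched.
pub fn reassemble(chunks: Vec<Result<Vec<u8>, String>>) -> Result<Vec<u8>, String> {
    let mut file = Vec::new();
    for (index, chunk) in chunks.into_iter().enumerate() {
        let bytes = chunk.map_err(|e| format!("chunk {index} failed: {e}"))?;
        file.extend_from_slice(&bytes);
    }
    Ok(file)
}
```

Even when chunks arrive out of order, collecting results into a vector indexed by chunk position keeps reassembly a simple in-order concatenation.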
BACKGROUND
- Control-plane performs health checks on nodes, watching for failed or absent responses
- If a node fails its health check, the control plane marks it as dead, finds the chunks that node held and replicates them to other nodes to maintain n replicas.
- It then spawns a new node to keep the node fleet at the configured size threshold.
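A sketch of the decision logic behind the background steps above, assuming health-check results are recorded as last-success timestamps in seconds and that the control plane tracks placements as a map from chunk id to the nodes holding it. `dead_nodes` and `plan_rereplication` are hypothetical; actually copying chunk data and spawning replacement nodes are left out.

```rust
use std::collections::HashMap;

/// A node is dead if its last successful health check is older than
/// `timeout` seconds, or if it has never answered one.
pub fn dead_nodes(nodes: &[String], last_ok: &HashMap<String, u64>, now: u64, timeout: u64) -> Vec<String> {
    nodes
        .iter()
        .filter(|n| match last_ok.get(n.as_str()) {
            Some(&t) => now.saturating_sub(t) > timeout,
            None => true, // absent response: never reported in
        })
        .cloned()
        .collect()
}

/// Removes `dead` from every chunk's holder list and picks live nodes that
/// don't already hold the chunk until it is back to `replicas` copies.
/// Returns the planned (chunk id, target node) copies.
pub fn plan_rereplication(
    placements: &mut HashMap<String, Vec<String>>,
    dead: &str,
    live: &[String],
    replicas: usize,
) -> Vec<(String, String)> {
    let mut moves = Vec::new();
    for (chunk, holders) in placements.iter_mut() {
        if !holders.iter().any(|n| n.as_str() == dead) {
            continue;
        }
        holders.retain(|n| n.as_str() != dead);
        for candidate in live {
            if holders.len() >= replicas {
                break;
            }
            if candidate.as_str() != dead && !holders.contains(candidate) {
                holders.push(candidate.clone());
                moves.push((chunk.clone(), candidate.clone()));
            }
        }
    }
    moves
}
```

Picking the first eligible live node is the simplest choice but skews load. Choosing targets from the hash ring instead would keep placements consistent with the ring lookups the download path relies on.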
ENVOY?
- I originally tried this using Envoy as the orchestrator. It works fine for single-node storage, but it's harder to get explicit control over replicas and to track where chunks are stored. Maybe Envoy comes back later in some form; I do want to explore proxies more.
- Would be cool to have a UI that hooks into the admin endpoints and visualises the node network and how data moves through it.
- Maybe some basic controls to stress test the network and see how it performs.