Skip to content

Commit f0b7a4b

Browse files
epoberezkinspaced4ndyevgeny-simplexshumvgolove
authored
messaging services (#1667)
* smp server: messaging services (#1565) * smp server: refactor message delivery to always respond SOK to subscriptions * refactor ntf subscribe * cancel subscription thread and reduce service subscription count when queue is deleted * subscribe rcv service, deliver sent messages to subscribed service * subscribe rcv service to messages (TODO delivery on subscription) * WIP * efficient initial delivery of messages to subscribed service * test: delivery to client with service certificate * test: upgrade/downgrade to/from service subscriptions * remove service association from agent API, add per-user flag to use the service * agent client (WIP) * service certificates in the client * rfc about drift detection, and SALL to mark end of message delivery * fix test * fix test * add function for postgresql message storage * update migration * servers: maintain xor-hash of all associated queue IDs in PostgreSQL (#1668) * servers: maintain xor-hash of all associated queue IDs in PostgreSQL (#1615) * ntf server: maintain xor-hash of all associated queue IDs via PostgreSQL triggers * smp server: xor hash with triggers * fix sql and using pgcrypto extension in tests * track counts and hashes in smp/ntf servers via triggers, smp server stats for service subscription, update SMP protocol to pass expected count and hash in SSUB/NSSUB commands * agent migrations with functions/triggers * remove agent triggers * try tracking service subs in the agent (WIP, does not compile) * Revert "try tracking service subs in the agent (WIP, does not compile)" This reverts commit 59e9081. * comment * agent database triggers * service subscriptions in the client * test / fix client services * update schema * fix postgres migration * update schema * move schema test to the end * use static function with SQLite to avoid dynamic wrapper * agent: fail when per-connection transport isolation is used with services (#1670) * agent: service subscription events (#1671) * agent: use server keyhash when loading service record * agent: process queue/service associations with delayed subscription results * agent: service subscription events * agent: finalize initial service subscriptions, remove associations on service ID changes (#1672) * agent: remove service/queue associations when service ID changes * agent: check that service ID in NEW response matches session ID in transport session * agent subscription WIP * test * comment * enable tests * update queries * agent: option to add SQLite aggregates to DB connection (#1673) * agent: add build_relations_vector function to sqlite * update aggregate * use static aggregate * remove relations --------- Co-authored-by: Evgeny Poberezkin <evgeny@poberezkin.com> * add test, treat BAD_SERVICE as temp error, only remove queue associations on service errors * add packZipWith for backward compatibility with GHC 8.10.7 --------- Co-authored-by: spaced4ndy <8711996+spaced4ndy@users.noreply.github.com> * servers: service stats and logging, allow services without option (removed), report errors during service message delivery, remove threads when service subscription ended (#1676) * smp server: always allow services without option * smp server: maintain IDs hash in session subscription states * smp server: service message delivery error handling * ntf server: log subscription count and hash differences * smp server: remove delivery threads when service subscription ended/client disconnected * agent: remove service queue association when service ID changed, process ENDS event, test migrating to/from service (#1677) * agent: remove service queue association when service ID changed * agent: process ENDS event * agent: send service subscription error event * agent: test migrating to/from service subscriptions, fixes * agent: always remove service when disabled, fix service subscriptions * ntf server: use different client certs for each SMP server, remove support for store log (#1681) * ntf server: remove support for store log * ntf server: use different client certificates for each SMP server * smp protocol: fix encoding for SOKS/ENDS responses (#1683) * agent: create user with option to enable client service (#1684) * agent: create user with option to enable client service * handle HTTP2 errors * do not catch async exceptions * agent: minor fixes * docs: update protocol (#1705) * docs: agent threat model * update protocol docs * update RFCs (#1730) * update RFCs * update * update overview * update terminology * original language in threat model --------- Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com> * docs: fix minor issues in protocols * docs: add e2e encrypted message wire encoding to PQDR spec * docs: add missing encodings and other protocol corrections * docs: move implemented rfcs * smp: service fixes (#1737) * smp: deliver service subscription to correct client * tests: more resilient to concurrency * optimize PostgreSQL query * fix service re-association after server "downgrade" * correctly handle service removed from server (and ID changed) * remove unused --------- Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com> * prometheus: fix metrics names (#1747) * test: rcv service re-association on restart (#1746) * agent: correct log message * docs: update whitepaper * smp: fix messaging client service issues (#1751) * services: fix minor issues * fix accounting for subscribed service queues, add prometheus stats * fix uncorrelated subquery * fix potential race condition when inserting service defensively, as it is also prevented by how client is created --------- Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com> * agent: refactor cleanup if no pending subs (#1757) * smp server: batch processing of subscription messages (#1753) * smp server: batch processing of subscription messages * refactor * empty line * fix --------- Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com> * smp: batch queue association updates on subscriptions (#1760) * smp: batch queue association updates on subscriptions * refactor to fused batching * simpler * batch assoc functions * clean up * fix --------- Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com> * agent: use primary key index in setRcvServiceAssocs (#1783) * agent: use primary key index in setRcvServiceAssocs Previous WHERE rcv_id = ? did not match the (host, port, rcv_id) primary key prefix and fell back to a table scan via idx_rcv_queues_client_notice_id. With ~390k rows per queue, each update in a 1350-row batch scanned the whole table, yielding ~290s per batch and a multi-hour rcv-services migration. * agent: pass SMPServer explicitly to setRcvServiceAssocs Avoid extracting host/port from the first queue inside setRcvServiceAssocs. The caller already has SMPServer in scope (from tSess) and the call chain is short, so threading it through is simpler than inspecting the list. Removes the empty-list guard from setRcvServiceAssocs (it remains in processRcvServiceAssocs). --------- Co-authored-by: spaced4ndy <8711996+spaced4ndy@users.noreply.github.com> Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com> Co-authored-by: sh <37271604+shumvgolove@users.noreply.github.com>
1 parent f03cec7 commit f0b7a4b

132 files changed

Lines changed: 7599 additions & 2476 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
Lines changed: 152 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,152 @@
1+
# Server: batched SUB command processing
2+
3+
Implementation plan for Part 1 of [RFC 2026-03-28-subscription-performance](../rfcs/2026-03-28-subscription-performance.md).
4+
5+
## Current state
6+
7+
When a batch of ~135 SUB commands arrives, the server already batches:
8+
- Queue record lookups (`getQueueRecs` in `receive`, Server.hs:1151)
9+
- Command verification (`verifyLoadedQueue`, Server.hs:1152)
10+
11+
But command processing is per-command (`foldrM process` in `client`, Server.hs:1372-1375). Each SUB calls `subscribeQueueAndDeliver` which calls `tryPeekMsg` - one DB query per queue. For Postgres, that's ~135 individual `SELECT ... FROM messages WHERE recipient_id = ? ORDER BY message_id ASC LIMIT 1` queries per batch.
12+
13+
## Goal
14+
15+
Replace ~135 individual message peek queries with 1 batched query per batch. No protocol changes.
16+
17+
## Implementation
18+
19+
### Step 1: Add `tryPeekMsgs` to MsgStoreClass
20+
21+
File: `src/Simplex/Messaging/Server/MsgStore/Types.hs`
22+
23+
Add to `MsgStoreClass`:
24+
25+
```haskell
26+
tryPeekMsgs :: s -> [StoreQueue s] -> ExceptT ErrorType IO (Map RecipientId Message)
27+
```
28+
29+
Returns a map from recipient ID to earliest pending message for each queue that has one. Queues with no messages are absent from the map.
30+
31+
### Step 2: Parameterize `deliver` to accept pre-fetched message
32+
33+
File: `src/Simplex/Messaging/Server.hs`
34+
35+
Currently `deliver` (inside `subscribeQueueAndDeliver`, line 1641) calls `tryPeekMsg ms q`. Add a parameter for an optional pre-fetched message:
36+
37+
```haskell
38+
deliver :: Maybe Message -> (Bool, Maybe Sub) -> M s ResponseAndMessage
39+
deliver prefetchedMsg (hasSub, sub_) = do
40+
stats <- asks serverStats
41+
fmap (either ((,Nothing) . err) id) $ liftIO $ runExceptT $ do
42+
msg_ <- maybe (tryPeekMsg ms q) (pure . Just) prefetchedMsg
43+
...
44+
```
45+
46+
When `Nothing` is passed, falls back to individual `tryPeekMsg` (existing behavior). When `Just msg` is passed, uses it directly (batched path).
47+
48+
### Step 3: Pre-fetch messages before the processing loop
49+
50+
File: `src/Simplex/Messaging/Server.hs`
51+
52+
Currently (lines 1372-1375):
53+
54+
```haskell
55+
forever $
56+
atomically (readTBQueue rcvQ)
57+
>>= foldrM process ([], [])
58+
>>= \(rs_, msgs) -> ...
59+
```
60+
61+
Add a pre-fetch step before the existing loop:
62+
63+
```haskell
64+
forever $ do
65+
batch <- atomically (readTBQueue rcvQ)
66+
msgMap <- prefetchMsgs batch
67+
foldrM (process msgMap) ([], []) batch
68+
>>= \(rs_, msgs) -> ...
69+
```
70+
71+
`prefetchMsgs` scans the batch, collects queues from SUB commands that have a verified queue (`q_ = Just (q, _)`), calls `tryPeekMsgs` once, returns the map. For batches with no SUBs it returns an empty map (no DB call).
72+
73+
`process` passes the looked-up message (or Nothing) through to `processCommand` and down to `deliver`.
74+
75+
The `foldrM process` loop, `processCommand`, `subscribeQueueAndDeliver`, and all other command handlers stay structurally the same. Only `deliver` gains one parameter, and the `client` loop gains one pre-fetch call.
76+
77+
### Step 4: Review
78+
79+
Review the typeclass signature and server usage. Confirm the interface has the right shape before implementing store backends.
80+
81+
### Step 5: Implement for each store backend
82+
83+
#### Postgres
84+
85+
File: `src/Simplex/Messaging/Server/MsgStore/Postgres.hs`
86+
87+
Single query using `DISTINCT ON`:
88+
89+
```sql
90+
SELECT DISTINCT ON (recipient_id)
91+
recipient_id, msg_id, msg_ts, msg_quota, msg_ntf_flag, msg_body
92+
FROM messages
93+
WHERE recipient_id IN ?
94+
ORDER BY recipient_id, message_id ASC
95+
```
96+
97+
Build `Map RecipientId Message` from results.
98+
99+
#### STM
100+
101+
File: `src/Simplex/Messaging/Server/MsgStore/STM.hs`
102+
103+
Loop over queues, call `tryPeekMsg` for each, collect into map.
104+
105+
#### Journal
106+
107+
File: `src/Simplex/Messaging/Server/MsgStore/Journal.hs`
108+
109+
Loop over queues, call `tryPeekMsg` for each, collect into map.
110+
111+
### Step 6: Handle edge cases
112+
113+
1. **Mixed batches**: `prefetchMsgs` collects only SUB queues. Non-SUB commands get Nothing for the pre-fetched message and process unchanged.
114+
115+
2. **Already-subscribed queues**: Include in pre-fetch - `deliver` is called for re-SUBs too (delivers pending message).
116+
117+
3. **Service subscriptions**: The pre-fetch doesn't care about service state. `sharedSubscribeQueue` handles service association in STM; message peek is the same.
118+
119+
4. **Error queues**: Verification errors from `receive` are Left values in the batch. `prefetchMsgs` only looks at Right values with SUB commands.
120+
121+
5. **Empty pre-fetch**: If batch has no SUBs (e.g., all ACKs), `prefetchMsgs` returns empty map, no DB call made.
122+
123+
### Step 7: Batch other commands (future, not in scope)
124+
125+
The same pattern (pre-fetch before loop, parameterize handler) can extend to:
126+
- `ACK` with `tryDelPeekMsg` - batch delete+peek
127+
- `GET` with `tryPeekMsg` - same map lookup
128+
129+
Lower priority since these don't have the N-at-once pattern of subscriptions.
130+
131+
## File changes summary
132+
133+
| File | Change |
134+
|---|---|
135+
| `src/Simplex/Messaging/Server/MsgStore/Types.hs` | Add `tryPeekMsgs` to typeclass |
136+
| `src/Simplex/Messaging/Server/MsgStore/Postgres.hs` | Implement `tryPeekMsgs` with batch SQL |
137+
| `src/Simplex/Messaging/Server/MsgStore/STM.hs` | Implement `tryPeekMsgs` as loop |
138+
| `src/Simplex/Messaging/Server/MsgStore/Journal.hs` | Implement `tryPeekMsgs` as loop |
139+
| `src/Simplex/Messaging/Server.hs` | Add `prefetchMsgs`, parameterize `deliver` |
140+
141+
## Testing
142+
143+
1. Existing server tests must pass unchanged (correctness preserved).
144+
2. Add a test that subscribes a batch of queues (some with pending messages, some without) and verifies all get correct SOK + MSG responses.
145+
3. Prometheus metrics: existing `qSub` stat should still increment correctly.
146+
147+
## Performance expectation
148+
149+
For 300K queues across ~2200 batches:
150+
- Before: ~300K individual DB queries
151+
- After: ~2200 batched DB queries (one per batch of ~135)
152+
- ~136x reduction in DB round-trips
Lines changed: 126 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,126 @@
1+
# Server: batch queue service associations
2+
3+
When a batch of SUB or NSUB commands arrives from a service client, each command that needs a new or removed service association calls `setQueueService` individually - one DB write per command. For 135 commands per batch, that's 135 individual `UPDATE msg_queues` queries.
4+
5+
## Goal
6+
7+
Reduce to at most 2 DB queries per batch (one for rcv associations, one for ntf associations), using `UPDATE ... RETURNING recipient_id` to identify which queues were actually updated.
8+
9+
Also fuse message pre-fetch and association batching into a single batch preparation step with a clean contract.
10+
11+
## Contract
12+
13+
```haskell
14+
prepareBatch :: Maybe ServiceId -> NonEmpty (VerifiedTransmission s) -> M s (Either ErrorType (Map RecipientId (Maybe Message, Maybe (Either ErrorType ()))))
15+
```
16+
17+
`Left e` = batch-level failure (message pre-fetch or association query failed entirely). All SUBs/NSUBs in the batch get this error.
18+
19+
`Right map` = per-queue results as a tuple:
20+
- `Maybe Message` - pre-fetched message for SUB queues, `Nothing` for NSUB or no message
21+
- `Maybe (Either ErrorType ())` - association result. `Nothing` = no update needed. `Just (Right ())` = update succeeded. `Just (Left e)` = update failed for this queue.
22+
23+
One map, one lookup per queue. `processCommand` passes both values to `subscribeQueueAndDeliver` / `subscribeNotifications` -> `sharedSubscribeQueue`.
24+
25+
Queues not in the map (non-SUB/NSUB commands, failed verification) are not affected.
26+
27+
## prepareBatch implementation
28+
29+
One accumulating fold over the batch, collecting three lists:
30+
- `subMsgQs :: [StoreQueue s]` - SUB queues for message pre-fetch
31+
- `rcvAssocQs :: [StoreQueue s]` - SUB queues needing `rcv_service_id` update (`clntServiceId /= rcvServiceId qr`)
32+
- `ntfAssocQs :: [StoreQueue s]` - NSUB queues needing `ntf_service_id` update (`clntServiceId /= ntfServiceId` from `NtfCreds`)
33+
34+
Classification reads from the already-loaded `QueueRec` in `VerifiedTransmission` - no extra DB query.
35+
36+
Then three store calls (each skipped if its list is empty):
37+
1. `tryPeekMsgs ms subMsgQs` -> `Map RecipientId Message`
38+
2. `setRcvQueueServices (queueStore ms) clntServiceId rcvAssocQs` -> `Set RecipientId`
39+
3. `setNtfQueueServices (queueStore ms) clntServiceId ntfAssocQs` -> `Set RecipientId`
40+
41+
Then one pass to merge results into `Map RecipientId (Maybe Message, Maybe (Either ErrorType ()))`:
42+
- For each SUB queue: `(M.lookup rId msgMap, assocResult rId rcvUpdated rcvAssocQs)`
43+
- For each NSUB queue: `(Nothing, assocResult rId ntfUpdated ntfAssocQs)`
44+
45+
Where `assocResult rId updated assocQs` = if the queue was in `assocQs` (needed update), then `Just (Right ())` if `rId` is in `updated`, else `Just (Left AUTH)`. If not in `assocQs` (no update needed), `Nothing`.
46+
47+
If any of the three calls fails entirely, return `Left e`.
48+
49+
## Store interface
50+
51+
Replace the polymorphic `setQueueServices` with two plain functions in `QueueStoreClass`:
52+
53+
```haskell
54+
setRcvQueueServices :: s -> Maybe ServiceId -> [q] -> IO (Set RecipientId)
55+
setNtfQueueServices :: s -> Maybe ServiceId -> [q] -> IO (Set RecipientId)
56+
```
57+
58+
No `SParty p` polymorphism. Each function knows its column.
59+
60+
### Postgres implementation
61+
62+
`setRcvQueueServices`:
63+
```sql
64+
UPDATE msg_queues SET rcv_service_id = ?
65+
WHERE recipient_id IN ? AND deleted_at IS NULL
66+
RETURNING recipient_id
67+
```
68+
69+
`setNtfQueueServices`:
70+
```sql
71+
UPDATE msg_queues SET ntf_service_id = ?
72+
WHERE recipient_id IN ? AND notifier_id IS NOT NULL AND deleted_at IS NULL
73+
RETURNING recipient_id
74+
```
75+
76+
After each batch query, for each queue in the returned set:
77+
1. Read QueueRec TVar, update with new serviceId
78+
2. Write store log entry
79+
80+
### STM implementation
81+
82+
Loop over queues, call existing per-item logic, collect succeeded `RecipientId`s into a Set.
83+
84+
## Downstream changes in Server.hs
85+
86+
### processCommand
87+
88+
Gains one parameter: `Map RecipientId (Maybe Message, Maybe (Either ErrorType ()))`.
89+
90+
SUB case: `M.lookup entId prepared` gives `Just (msg_, assocResult)` or `Nothing`. Pass both to `subscribeQueueAndDeliver`.
91+
92+
NSUB case: `M.lookup entId prepared` gives `Just (Nothing, assocResult)` or `Nothing`. Pass `assocResult` to `subscribeNotifications`.
93+
94+
Forwarded commands: pass `M.empty`.
95+
96+
### subscribeQueueAndDeliver
97+
98+
Takes `Maybe Message` and `Maybe (Either ErrorType ())` as before. No change in how it uses them.
99+
100+
### sharedSubscribeQueue
101+
102+
Takes `Maybe (Either ErrorType ())`. On paths needing association update:
103+
- `Just (Left e)` -> return error
104+
- `Just (Right ())` -> skip `setQueueService`, proceed with STM work
105+
- `Nothing` -> no update needed, proceed with existing logic
106+
107+
## Implementation order (top-down)
108+
109+
1. Define the `prepareBatch` contract and thread one map through `processCommand` -> `subscribeQueueAndDeliver` / `subscribeNotifications` -> `sharedSubscribeQueue` (Server.hs)
110+
2. Implement `prepareBatch` with the fold, three calls, and merge (Server.hs)
111+
3. Add `setRcvQueueServices` and `setNtfQueueServices` to `QueueStoreClass` (Types.hs)
112+
4. Implement for Postgres with batch `UPDATE ... RETURNING` (Postgres.hs)
113+
5. Implement for STM as loop (STM.hs)
114+
6. Implement for Journal as delegation (Journal.hs)
115+
116+
At step 2, store functions can initially be stubs returning empty sets. Steps 3-6 fill in the real implementations.
117+
118+
## Files changed
119+
120+
| File | Change |
121+
|---|---|
122+
| `src/Simplex/Messaging/Server.hs` | `prepareBatch` with fold + merge; one map parameter through `processCommand` -> `subscribeQueueAndDeliver` / `subscribeNotifications` -> `sharedSubscribeQueue` |
123+
| `src/Simplex/Messaging/Server/QueueStore/Types.hs` | Add `setRcvQueueServices`, `setNtfQueueServices` to `QueueStoreClass` |
124+
| `src/Simplex/Messaging/Server/QueueStore/Postgres.hs` | Implement with batch `UPDATE ... RETURNING` + per-item TVar/log updates |
125+
| `src/Simplex/Messaging/Server/QueueStore/STM.hs` | Implement as loop |
126+
| `src/Simplex/Messaging/Server/MsgStore/Journal.hs` | Delegate to underlying store |

rfcs/2024-09-01-smp-message-storage.md renamed to plans/done/2024-09-01-smp-message-storage.md

Lines changed: 12 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,17 @@
1-
# SMP server message storage
1+
2+
# SMP router message storage
23

34
## Problem
45

5-
Currently SMP servers store all queues in server memory. As the traffic grows, so does the number of undelivered messages. What is worse, Haskell is not avoiding heap fragmentation when messages are allocated and then de-allocated - undelivered messages use ByteString and GC cannot move them around, as they use pinned memory.
6+
Currently SMP routers store all queues in router memory. As the traffic grows, so does the number of undelivered messages. What is worse, Haskell is not avoiding heap fragmentation when messages are allocated and then de-allocated - undelivered messages use ByteString and GC cannot move them around, as they use pinned memory.
67

78
## Possible solutions
89

910
### Solution 1: solve only GC fragmentation problem
1011

1112
Move from ByteString to some other primitive to store messages in memory long term, e.g. ShortByteString, or manage allocation/de-allocation of stored messages manually in some other way.
1213

13-
Pros: the simplest solution that avoids substantial re-engineering of the server.
14+
Pros: the simplest solution that avoids substantial re-engineering of the router.
1415

1516
Cons:
1617
- not a long term solution, as memory growth still has limits.
@@ -22,12 +23,12 @@ Use files or RocksDB to store messages.
2223

2324
Pros:
2425
- much lower memory usage.
25-
- no message loss in case of abnormal server termination (important until clients have delivery redundancy).
26+
- no message loss in case of abnormal router termination (important until clients have delivery redundancy).
2627
- this is a long term solution, and at some point it might need to be done anyway.
2728

2829
Cons:
2930
- substantial re-engineering costs and risks.
30-
- metadata privacy. Currently we only save undelivered messages when server is restarted, with this approach all messages will be stored for some time. this argument is limited, as hosting providers of VMs can make memory snapshots too, on the other hand they are harder to analyze than files. On another hand, with this approach messages will be stored for a shorter time.
31+
- metadata privacy. Currently we only save undelivered messages when router is restarted, with this approach all messages will be stored for some time. this argument is limited, as hosting providers of VMs can make memory snapshots too, on the other hand they are harder to analyze than files. On another hand, with this approach messages will be stored for a shorter time.
3132

3233
#### RocksDB and other key-value stores
3334

@@ -67,7 +68,7 @@ queueLogLine =
6768
%s"write_msg=" digits
6869
```
6970

70-
When queue is first requested by the server:
71+
When queue is first requested by the router:
7172

7273
```c
7374
if queue folder exists:
@@ -87,7 +88,7 @@ nextReadMsg = read_msg
8788
open write_file in AppendMode
8889
```
8990

90-
When message is added to the queue (assumes that queue state is loaded to server memory, if not the previous section will be done first):
91+
When message is added to the queue (assumes that queue state is loaded to router memory, if not the previous section will be done first):
9192

9293
```c
9394
if write_msg > max_queue_messages:
@@ -128,7 +129,7 @@ else
128129
nextReadByte = current position in file
129130
```
130131

131-
When message delivery is acknowledged, the read queue needs to be advanced, and possibly switched to read from the current write_queue:
132+
When message delivery is acknowledged, the read queue needs to be advanced, and possibly switched to read from the current write queue:
132133

133134
```c
134135
if nextReadByte == read_byte:
@@ -162,9 +163,9 @@ Most Linux systems use EXT4 filesystem where the file lookup time scales linearl
162163

163164
So storing all queue folders in one folder won't scale.
164165

165-
To solve this problem we could use recipient queue ID in base64url format not as a folder name, but as a folder path, splitting it to path fragments of some length. The number of fragments can be configurable and migration to a different fragment size can be supported as the number of queues on a given server grows.
166+
To solve this problem we could use recipient queue ID in base64url format not as a folder name, but as a folder path, splitting it to path fragments of some length. The number of fragments can be configurable and migration to a different fragment size can be supported as the number of queues on a given router grows.
166167

167-
Currently, queue ID is 24 bytes random number, thus allowing 2^192 possible queue IDs. If we assume that a server must hold 1b queues, it means that we have ~2^162 possible addresses for each existing queue. 24 bytes in base64 is 32 characters that can be split into say 8 fragments with 4 characters each, so that queue folder path for queue with ID `abcdefghijklmnopqrstuvwxyz012345` would be:
168+
Currently, queue ID is 24 bytes random number, thus allowing 2^192 possible queue IDs. If we assume that a router must hold 1b queues, it means that we have ~2^162 possible addresses for each existing queue. 24 bytes in base64 is 32 characters that can be split into say 8 fragments with 4 characters each, so that queue folder path for queue with ID `abcdefghijklmnopqrstuvwxyz012345` would be:
168169

169170
`/var/opt/simplex/messages/abcd/efgh/ijkl/mnop/qrst/uvwx/yz01/2345`
170171

@@ -174,6 +175,6 @@ So we could use an unequal split of path, two letters each and the last being lo
174175

175176
`/var/opt/simplex/messages/ab/cd/ef/ghijklmnopqrstuvwxyz012345`
176177

177-
The first three levels in this case can have 4096 subfolders each, and it gives 68b possible subfolders (64^2^3), so the last level will be sparse in case of 1b queues on the server. So we could make it 4 levels with 2 letters to never think about it, accounting for a large variance of the random numbers distribution:
178+
The first three levels in this case can have 4096 subfolders each, and it gives 68b possible subfolders (64^2^3), so the last level will be sparse in case of 1b queues on the router. So we could make it 4 levels with 2 letters to never think about it, accounting for a large variance of the random numbers distribution:
178179

179180
`/var/opt/simplex/messages/ab/cd/ef/gh/ijklmnopqrstuvwxyz012345`

0 commit comments

Comments
 (0)