|
| 1 | +# Proposal: Dynamic witness capacity allocation |
| 2 | + |
| 3 | +This is a proposal I made to increase witness utilization, originally posted on |
| 4 | +Matrix. I also had some in-person discussions about this with rgdd, notes on |
| 5 | +those below. |
| 6 | + |
| 7 | +## Original proposal |
| 8 | + |
| 9 | +Before the witness network, the process for a log operator to find a witness |
| 10 | +was: |
| 11 | + |
| 12 | + (1) Find out that a witness/witness operator (conflated for now) exists |
| 13 | + (2) Vet the witness |
| 14 | + (3) Ask the witness to configure the log and wait for that to happen |
| 15 | + |
| 16 | +(or more likely: Ask somebody to set up a witness for them) |
| 17 | + |
| 18 | +The witness operator perspective mostly comes in at step 3 where they need to |
| 19 | +decide if they can (and want to, but let's suppose this is always the case for |
| 20 | +now) support the log. This is not a trivial "yes" because there will always be |
| 21 | +capacity limits to consider. |
| 22 | + |
| 23 | +The witness network helps with 1 and 3 by providing a list of witnesses and |
| 24 | +managing their configuration. This means that log operators only need to talk |
| 25 | +to the witness network maintainers, and once their log gets accepted, they can |
| 26 | +have their pick from any witness that configures the respective log lists |
| 27 | +(after a short delay, until the witnesses have updated their lists). |
| 28 | + |
| 29 | +This currently works by the witness network maintainers packing logs into |
| 30 | +specific "performance tiers" defined by the maximum number of logs and |
| 31 | +witnessing requests per second (qps) a given witness can support. Witnesses are |
| 32 | +supposed to configure log lists starting from the lowest performance profile up |
| 33 | +to the highest one that they can still support in aggregate (i.e. including the |
| 34 | +lower tiers). |
| 35 | + |
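As a rough illustration of the tier rule described above, here is a minimal sketch of how a witness might decide which lists to configure. The function name `lists_to_configure` and the reading of "in aggregate" as a plain sum of the tier qps values are assumptions made for illustration; the 10qps and 100qps tiers are the ones mentioned in this document, and the "maximum number of logs" dimension is ignored.

```python
def lists_to_configure(tiers_qps, witness_capacity_qps):
    """Pick tiers lowest-first while the running total still fits.

    Assumption: supporting a tier "in aggregate" means the sum of that
    tier's qps and all lower tiers' qps must not exceed the capacity.
    """
    chosen = []
    total = 0.0
    for qps in sorted(tiers_qps):
        if total + qps > witness_capacity_qps:
            break  # this tier (and every higher one) does not fit
        total += qps
        chosen.append(qps)
    return chosen


if __name__ == "__main__":
    # A witness that can handle 50 qps only configures the 10qps list,
    # leaving 40 qps of its capacity unused.
    print(lists_to_configure([10, 100], 50))
```

Under this reading, a 50 qps witness leaves most of its capacity idle because the next bucket is too large, which is one way the bucket sizes affect utilization.
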
| 36 | +In the following, we'll mostly consider the qps dimension of resource usage, |
| 37 | +since this is the much tighter bottleneck in practice. |
| 38 | + |
| 39 | +The way logs get assigned to performance profiles is currently not documented, |
| 40 | +but seems to follow a strategy that tries to minimize qps utilization per |
| 41 | +performance profile (i.e. even though the 10qps list would have capacity to |
| 42 | +accommodate a 1qps log, 1qps logs get allocated to the 100qps list first since |
| 43 | +adding a 1qps log to the former would use up 10% of its capacity). |
| 44 | + |
| 45 | +There are a number of problems with the current approach: |
| 46 | + |
| 47 | + - Logs will only make use of a subset of witnesses available to them, but the |
| 48 | + log list capacity planning can't know which, so it must happen based on |
| 49 | + worst case (all logs in a list fully utilize all witnesses available to |
| 50 | + them), which leads to underutilization within a given performance profile. |
| 51 | + |
| 52 | + - Similarly, a witness must assume that all logs from the lists it configures |
| 53 | + will make use of it, so it configures fewer log lists than it could |
| 54 | + actually handle in practice, also leading to underutilization and fewer |
| 55 | + witnesses being available to logs in higher profiles. |
| 56 | + |
| 57 | + - On top of that, this scheme requires picking somewhat arbitrary bucket sizes |
| 58 | + for the performance profiles. |
| 59 | + |
| 60 | +The core issue here is that the witness network itself tries to do capacity |
| 61 | +planning for the witness operators and tries to do so for all of them |
| 62 | +simultaneously. Furthermore, it does this in advance, without knowing how log |
| 63 | +operators will make use of witnesses. |
| 64 | + |
| 65 | +An alternative approach could be to remove this capacity planning component |
| 66 | +from the witness network and only have it be a place where witnesses and logs |
| 67 | +can advertise their existence. To that end, consider the following architecture: |
| 68 | + |
| 69 | + - There is only one list/pool of logs (containing the same data as |
| 70 | + today, including estimated qps). When a log is retired it is marked as |
| 71 | + inactive. (Could also be removed from the list, but the following description |
| 72 | + is clearer that way) |
| 73 | + |
| 74 | + - As long as they are below capacity, witnesses keep importing *all* logs in |
| 75 | + this list. They also provide an interface where log operators can query if |
| 76 | + their log is configured (i.e. the witness would accept add-checkpoint |
| 77 | + requests for it). A log will be advertised as supported as long as |
| 78 | + activating that individual log would not exceed that witness's capacity. |
| 79 | + |
| 80 | + - Witnesses keep track of utilized qps. They do this by aggregating the |
| 81 | + advertised qps from the log list over all the logs that have sent them at |
| 82 | + least one checkpoint and are not marked as inactive in the log list. |
| 83 | + |
| 84 | + - Once a witness has reached its locally-configured qps limit, it stops |
| 85 | + advertising support for/accepting checkpoints from logs from which it |
| 86 | + hasn't received any checkpoints yet. |
| 87 | + |
| 88 | + - It still keeps updating the log list to see if logs have been marked as |
| 89 | + inactive, which might free up capacity if one such log has previously been |
| 90 | + active on this witness. In that case, it starts advertising logs again. |
| 91 | + |
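The witness-side bookkeeping described in the list above could be sketched roughly as follows. This is a minimal in-memory model, not an implementation of any existing witness: the names `LogEntry`, `Witness`, `update_log_list`, `is_supported` and `add_checkpoint` are hypothetical, and refusing checkpoints from logs marked inactive is an assumption the proposal does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    origin: str          # hypothetical log identifier
    qps: float           # estimated qps advertised in the log list
    active: bool = True  # False once the log is retired


@dataclass
class Witness:
    capacity_qps: float                           # locally configured limit
    log_list: dict = field(default_factory=dict)  # origin -> LogEntry
    claimed: set = field(default_factory=set)     # logs that sent a checkpoint

    def update_log_list(self, entries):
        """Re-import the single shared log list."""
        self.log_list = {e.origin: e for e in entries}

    def utilized_qps(self):
        """Sum advertised qps over claimed logs that are still active."""
        return sum(
            self.log_list[o].qps
            for o in self.claimed
            if o in self.log_list and self.log_list[o].active
        )

    def is_supported(self, origin):
        """Query interface: would a checkpoint from this log be accepted?"""
        entry = self.log_list.get(origin)
        if entry is None or not entry.active:
            return False  # assumption: retired logs are no longer served
        if origin in self.claimed:
            return True   # already counted against the capacity
        return self.utilized_qps() + entry.qps <= self.capacity_qps

    def add_checkpoint(self, origin):
        """Accept a checkpoint if supported; the first one claims capacity."""
        if not self.is_supported(origin):
            return False
        self.claimed.add(origin)
        return True


if __name__ == "__main__":
    w = Witness(capacity_qps=1.0)
    w.update_log_list([LogEntry("a", 0.5), LogEntry("b", 0.5), LogEntry("c", 0.5)])
    w.add_checkpoint("a")
    w.add_checkpoint("b")
    print(w.is_supported("c"))  # witness is at capacity
    w.update_log_list([LogEntry("a", 0.5, active=False),
                       LogEntry("b", 0.5), LogEntry("c", 0.5)])
    print(w.is_supported("c"))  # capacity freed by the retired log
```

Capacity is only reserved when a log actually sends its first checkpoint, and it is released again once the log list marks that log as inactive.
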
| 92 | +This resolves the problems described above: |
| 93 | + |
| 94 | + - If a log decides not to make use of a witness for one reason or another, |
| 95 | + the capacity for that log is not needlessly reserved on that witness. |
| 96 | + |
| 97 | + - All witnesses with spare capacity are available to all logs. |
| 98 | + |
| 99 | + - Witness operators have fine-grained control over the capacity of their |
| 100 | + witness and witnesses can reach closer to 100% utilization. |
| 101 | + |
| 102 | +When a witness is at capacity, a witness operator can easily deploy another |
| 103 | +witness which will start picking up new/different logs (since a single log is |
| 104 | +unlikely to use multiple witnesses from the same operator unless for |
| 105 | +redundancy). |
| 106 | + |
| 107 | +Besides increased complexity, there are also some further downsides: |
| 108 | + |
| 109 | + - Unlike today, a log can't be certain it will get picked up by the witnesses |
| 110 | + it likes (or possibly any witness) if it has been accepted into the network. |
| 111 | + Thus, a log operator needs to query individual witnesses to see if they have |
| 112 | + picked up the log. But to some extent this is already the case today, since |
| 113 | + log list downloads might only happen weekly, for example. |
| 114 | + |
| 115 | + - Since logs decide which witnesses they claim, ecosystem diversity can be |
| 116 | + affected by log choices. For example, a 1qps log takes a hypothetical 1qps |
| 117 | + witness fully out of the ecosystem, but it would likely be better for |
| 118 | + resilience to partition that same witness among ten 0.1qps logs. |
| 119 | + |
| 120 | + - There is also the potential for race conditions. For example, a log operator |
| 121 | + looks at the witnesses with spare capacity, carefully vets one of them, and |
| 122 | + a few seconds before they start making use of it, somebody else claims all |
| 123 | + spare capacity of that witness. |
| 124 | + |
| 125 | +## Discussion notes |
| 126 | + |
| 127 | +Discussion based on this with rgdd during the 2025-05 Sigsum community meeting. |
| 128 | +I'm writing this from memory a few days later so it's probably a bit |
| 129 | +inaccurate. |
| 130 | + |
| 131 | + - The main issue with managing the witness network is that we're dealing with |
| 132 | + a scarce resource (witness capacity). If every witness could do 500 qps and |
| 133 | + had unlimited storage, all witnesses could just be free-for-all and we |
| 134 | + wouldn't have to have these discussions. However, this is not the case. |
| 135 | + |
| 136 | + - The goal of the witness network is not only to help coordinate between logs |
| 137 | + and witnesses, but also manage this scarce resource in a thoughtful way. |
| 138 | + I wasn't aware that this part was a deliberate decision, which |
| 139 | + invalidates the above proposal to an extent. |
| 140 | + |
| 141 | + At the same time, the witness network aims to serve the long tail of logs. |
| 142 | + The assumption is that heavy hitters such as MTC will curate their own set |
| 143 | + of witnesses for policy reasons anyway. |
| 144 | + |
| 145 | + So this fits together well - for example an average Sigsum log does 0.1qps. |
| 146 | + Others might do even less or only produce a checkpoint sporadically. Thus |
| 147 | + even 10qps of capacity could serve a lot of logs (possibly *all* of the |
| 148 | + long-tail ones). |
| 149 | + |
| 150 | + - A part of the awkwardness is that the witness network maintainers do not |
| 151 | + want to be in the position to be able to DoS logs. Thus, they deliberately |
| 152 | + aren't able to cause deconfiguration of logs. However, logs (especially |
| 153 | + things like CT logs) retire frequently. Ideally, logs would be able to |
| 154 | + signal this to the world (and witnesses in particular) cryptographically, |
| 155 | + but the proposed mechanism (tombstones) has not been fleshed out yet. |
| 156 | + |
| 157 | + Putting this off was "fine" since there was enough spare capacity as not to |
| 158 | + have to worry about this now. |
| 159 | + |
| 160 | + - However, CT logs starting to be admitted to the witness network |
| 161 | + compounded this issue and prompted the creation of the 100k log list (which |
| 162 | + in turn prompted the above proposal). Maybe creating such a big list was a |
| 163 | + mistake. |
| 164 | + |
| 165 | + - Maybe eventually having multiple 10qps lists (maybe grouped somehow so that |
| 166 | + witness operators can choose which parts of the ecosystem to support) would |
| 167 | + be better. This would also help with better bin-packing on the witness side. |
| 168 | + |
| 169 | + Probably leaving the 100k CT log list as is though? |
| 170 | + |
| 171 | +Further notes added by rgdd: |
| 172 | + |
| 173 | + - Seems like 100qps might have been an unnecessarily big jump, which, e.g., |
| 174 | + has made it difficult for some (potential) witness operators to configure |
| 175 | + it. |
| 176 | + |
| 177 | + - When doing some napkin math, the current 10qps list would likely be able to |
| 178 | + accommodate a lot of the "long tail" / lower-frequency logs; and perhaps one |
| 179 | + or two high-profile ones with higher qps like Go's checksum database. |
| 180 | + |
| 181 | + - From CT, we're probably expecting something like 10 qps. |
| 182 | + |
| 183 | + - From MTC, we're probably talking about a qps in the same ballpark (?) |
| 184 | + |
| 185 | + - We don't have that many other high-qps logs right now, and having something |
| 186 | + like 10qps reserved for that will probably serve us well for some time. |
| 187 | + |
| 188 | + - So if it increases the number of participating witnesses, then it might be a |
| 189 | + better trade-off to have several 10qps lists (.2, .3) where we basically |
| 190 | + have one which is the "longer tail one" and another which is the "higher-qps |
| 191 | + one". And the "higher-qps one" we expect to fill up a bit quicker, and when |
| 192 | + it's full we will create another one. Or maybe we should even create multiple |
| 193 | + ones right away, and witnesses configure as many as they can even though, |
| 194 | + e.g., .3 is not being populated quite yet? |
| 195 | + |
| 196 | + - Working on defining tombstones for proper deallocation is worthwhile to do |
| 197 | + soon since CT is interested in taking part (and sharding is frequent there). |