Commit ed2e3e0

add proposal/discussion on capacity planning
1 parent 995d5ce commit ed2e3e0

1 file changed

Lines changed: 197 additions & 0 deletions

# Proposal: Dynamic witness capacity allocation

This is a proposal I made to increase witness utilization, originally posted on
Matrix. I also discussed it in person with rgdd; notes on those discussions are
below.

## Original proposal

Before the witness network, the process for a log operator to find a witness
was:

(1) Find out that a witness/witness operator (conflated for now) exists
(2) Vet the witness
(3) Ask the witness to configure the log and wait for that to happen

(or more likely: Ask somebody to set up a witness for them)

The witness operator's perspective mostly comes in at step 3, where they need to
decide if they can (and want to, but let's suppose this is always the case for
now) support the log. This is not a trivial "yes" because there will always be
capacity limits to consider.

The witness network helps with steps 1 and 3 by providing a list of witnesses
and managing their configuration. This means that log operators only need to
talk to the witness network maintainers, and once their log gets accepted, they
can take their pick from any witness that configures the respective log lists
(after a short while, once the witnesses have updated them).

This currently works by the witness network maintainers packing logs into
specific "performance tiers" defined by the maximum number of logs and
witnessing requests per second (qps) a given witness can support. Witnesses are
supposed to configure log lists starting from the lowest performance profile up
to the highest one that they can still support in aggregate (i.e. including the
lower tiers).

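The "highest profile still supported in aggregate" rule can be sketched roughly
as follows. This is only an illustration of the rule as described above, not
the witness network's actual tooling: the `Tier` type, the function name and
all tier sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """A performance tier; the values used with it here are made up."""
    name: str
    max_qps: float  # witnessing requests per second the list may consume
    max_logs: int   # number of logs the list may contain


def tiers_to_configure(tiers, witness_qps, witness_max_logs):
    """Return the names of the tiers a witness can configure.

    Tiers are taken from the lowest profile upwards. Each tier's limits are
    added to those of the tiers below it, since the witness has to support
    all configured lists at the same time.
    """
    chosen = []
    used_qps = 0.0
    used_logs = 0
    for tier in sorted(tiers, key=lambda t: t.max_qps):
        if used_qps + tier.max_qps > witness_qps:
            break
        if used_logs + tier.max_logs > witness_max_logs:
            break
        chosen.append(tier.name)
        used_qps += tier.max_qps
        used_logs += tier.max_logs
    return chosen
```

With hypothetical tiers of 10 qps and 100 qps, a witness that can handle
100 qps would only configure the 10 qps list, because both lists together
would need 110 qps.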
In the following, we'll mostly consider the qps dimension of resource usage,
since this is the much tighter bottleneck in practice.

The way logs get assigned to performance profiles is currently not documented,
but it seems to follow a strategy that tries to minimize qps utilization per
performance profile (i.e. even though the 10qps list would have capacity to
accommodate a 1qps log, 1qps logs get allocated to the 100qps list first since
adding a 1qps log to the former would use up 10% of its capacity).

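To make that guess concrete, here is a small sketch of such an assignment
strategy. Since the real procedure is undocumented, this is purely my reading
of the observed behaviour; the function name and the data layout are
assumptions.

```python
def pick_list(log_qps, lists):
    """Pick the list in which the log uses the smallest share of capacity.

    `lists` maps a list name to a (capacity_qps, allocated_qps) pair.
    Lists without enough remaining capacity are skipped. Returns None if
    no list can take the log.
    """
    best_name = None
    best_share = None
    for name, (capacity, allocated) in lists.items():
        if allocated + log_qps > capacity:
            continue  # the log does not fit into this list anymore
        share = log_qps / capacity  # fraction of the list this log would use
        if best_share is None or share < best_share:
            best_name, best_share = name, share
    return best_name
```

For a 1qps log and two empty lists, this picks the 100qps list (1% of its
capacity) over the 10qps list (10% of its capacity), matching the example
above.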
There are a number of problems with the current approach:

- Logs will only make use of a subset of the witnesses available to them, but
  the log list capacity planning can't know which ones, so it must be based on
  the worst case (all logs in a list fully utilize all witnesses available to
  them), which leads to underutilization within a given performance profile.

- Similarly, a witness must assume that all logs from the lists it configures
  will make use of it, so it configures fewer log lists than it could
  actually handle in practice, also leading to underutilization and fewer
  witnesses being available to logs in higher profiles.

- On top of that, this scheme requires picking somewhat arbitrary bucket sizes
  for the performance profiles.

The core issue here is that the witness network itself tries to do capacity
planning for the witness operators and tries to do so for all of them
simultaneously. Furthermore, it does this in advance, without knowing how log
operators will make use of witnesses.

An alternative approach could be to remove this capacity planning component
from the witness network and only have it be a place where witnesses and logs
can advertise their existence. To that end, consider the following architecture:

- There is only one list/pool of logs (containing the same data as
  today, including estimated qps). When a log is retired, it is marked as
  inactive. (It could also be removed from the list, but the following
  description is clearer this way.)

- As long as they are below capacity, witnesses keep importing *all* logs in
  this list. They also provide an interface where log operators can query if
  their log is configured (i.e. the witness would accept add-checkpoint
  requests for it). A log will be advertised as supported as long as
  activating that individual log would not exceed that witness's capacity.

- Witnesses keep track of utilized qps. They do this by aggregating the
  advertised qps from the log list over all the logs that have sent them at
  least one checkpoint and are not marked as inactive in the log list.

- Once a witness has reached its locally-configured qps limit, it stops
  advertising support for/accepting checkpoints from logs from which it
  hasn't received any checkpoints yet.

- It still keeps updating the log list to see if logs have been marked as
  inactive, which might free up capacity if one such log has previously been
  active on this witness. In that case, it starts advertising support for
  unclaimed logs again.

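The witness-side bookkeeping described in the list above could look roughly
like this. It is a sketch under my own assumptions, not an existing witness
implementation: the class and method names are made up, logs are identified by
an `origin` string, and I assume a witness refuses checkpoints from logs that
are marked as inactive.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    origin: str          # hypothetical identifier of the log
    qps: float           # advertised (estimated) qps from the log list
    active: bool = True  # False once the log is marked as retired


@dataclass
class Witness:
    qps_limit: float                              # locally configured limit
    log_list: dict = field(default_factory=dict)  # origin -> LogEntry
    claimed: set = field(default_factory=set)     # logs that sent a checkpoint

    def update_log_list(self, entries):
        """Import the complete, current log list."""
        self.log_list = {e.origin: e for e in entries}

    def utilized_qps(self):
        """Sum the advertised qps over claimed logs that are still active."""
        return sum(
            self.log_list[o].qps
            for o in self.claimed
            if o in self.log_list and self.log_list[o].active
        )

    def is_supported(self, origin):
        """Would this witness accept a checkpoint from the given log?"""
        entry = self.log_list.get(origin)
        if entry is None or not entry.active:
            return False  # assumption: unknown or retired logs are refused
        if origin in self.claimed:
            return True   # capacity for this log is already accounted for
        return self.utilized_qps() + entry.qps <= self.qps_limit

    def add_checkpoint(self, origin):
        """Accept a checkpoint if possible, claiming capacity on first use."""
        if not self.is_supported(origin):
            return False
        self.claimed.add(origin)
        return True
```

Capacity is only reserved once a log actually sends its first checkpoint, and
it is released implicitly when the log list marks that log as inactive, because
`utilized_qps` is recomputed from the current list on every query.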
This resolves the problems described above:

- If a log decides not to make use of a witness for one reason or another,
  the capacity for that log is not needlessly reserved on that witness.

- All witnesses with spare capacity are available to all logs.

- Witness operators have fine-grained control over the capacity of their
  witness, and witnesses can get closer to 100% utilization.

When a witness is at capacity, a witness operator can easily deploy another
witness, which will start picking up new/different logs (since a single log is
unlikely to use multiple witnesses from the same operator other than for
redundancy).

But other than increased complexity, there are also some further downsides:

- Unlike today, a log can't be certain it will get picked up by the witnesses
  it likes (or possibly any witness) once it has been accepted into the
  network. Thus, a log operator needs to query individual witnesses to see if
  they have picked up the log. But to some extent this is already the case
  today, since log list downloads might only happen weekly, for example.

- Since logs decide which witnesses they claim, ecosystem diversity can be
  affected by log choices. For example, a 1qps log takes a hypothetical 1qps
  witness fully out of the ecosystem, but it would likely be better for
  resilience to partition that same witness among ten 0.1qps logs.

- There is also the potential for race conditions. For example, a log operator
  looks at all the witnesses with spare capacity and carefully vets them, and
  a few seconds before they start making use of one, somebody else claims all
  spare capacity of that witness.

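The last two downsides both come from capacity being handed out in arrival
order. The following toy model illustrates this; the function name is
hypothetical and rates are given as integers in milli-qps only to keep the
arithmetic exact.

```python
def first_come_first_served(capacity_mqps, arrivals):
    """Return the names of the logs a witness ends up serving.

    `arrivals` is a list of (log_name, advertised_mqps) pairs in the order
    in which the logs send their first checkpoint. A log is accepted only
    if its advertised rate still fits into the remaining capacity.
    """
    served = []
    used = 0
    for name, mqps in arrivals:
        if used + mqps <= capacity_mqps:
            served.append(name)
            used += mqps
    return served


small_logs = [(f"small-{i}", 100) for i in range(10)]  # ten 0.1qps logs
big_log = [("big", 1000)]                              # one 1qps log
```

With a 1qps witness (`capacity_mqps=1000`), `big_log + small_logs` results in
only the big log being served, while `small_logs + big_log` results in the ten
small logs being served and the big one being refused. Which of the two
happens depends purely on who sends a checkpoint first.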
## Discussion notes

These notes are based on a discussion with rgdd during the 2025-05 Sigsum
community meeting. I'm writing this from memory a few days later, so it's
probably a bit inaccurate.

- The main issue with managing the witness network is that we're dealing with
  a scarce resource (witness capacity). If every witness could do 500 qps and
  had unlimited storage, all witnesses could just be free-for-all and we
  wouldn't have to have these discussions. However, this is not the case.

- The goal of the witness network is not only to help coordinate between logs
  and witnesses, but also to manage this scarce resource in a thoughtful way.
  I wasn't aware that this part was a deliberate decision, which invalidates
  the above proposal to an extent.

  At the same time, the witness network aims to serve the long tail of logs.
  The assumption is that heavy hitters such as MTC will curate their own set
  of witnesses for policy reasons anyway.

  So this fits together well: for example, an average Sigsum log does 0.1qps.
  Others might do even less or only produce a checkpoint sporadically. Thus
  even 10qps of capacity could serve a lot of logs (possibly *all* of the
  long-tail ones).

- A part of the awkwardness is that the witness network maintainers do not
  want to be in a position where they are able to DoS logs. Thus, they
  deliberately aren't able to cause deconfiguration of logs. However, logs
  (especially things like CT logs) retire frequently. Ideally, logs would be
  able to signal this to the world (and witnesses in particular)
  cryptographically, but the proposed mechanism (tombstones) has not been
  fleshed out yet.

  Putting this off was "fine" since there was enough spare capacity not to
  have to worry about it for now.

- However, CT logs starting to be admitted to the witness network
  compounded this issue and prompted the creation of the 100k log list (which
  in turn prompted the above proposal). Maybe creating such a big list was a
  mistake.

- Maybe eventually having multiple 10qps lists (maybe grouped somehow so that
  witness operators can choose which parts of the ecosystem to support) would
  be better. This would also help with better bin-packing on the witness side.

  Probably leaving the 100k CT log list as is though?

Further notes added by rgdd:

- Seems like 100qps might have been an unnecessarily big jump, which, e.g.,
  has made it difficult for some (potential) witness operators to configure
  it.

- When doing some napkin math, the current 10qps list would likely be able to
  accommodate a lot of the "long tail" / lower-frequency logs, and perhaps one
  or two high-profile ones with higher qps like Go's checksum database.

- From CT, we're probably expecting something like 10 qps.

- From MTC, we're probably talking about a qps in the same ballpark (?)

- We don't have that many other high-qps logs right now, and having something
  like 10qps reserved for that will probably serve us well for some time.

- So if it increases the number of participating witnesses, then it might be a
  better trade-off to have several 10qps lists (.2, .3) where we basically
  have one which is the "longer tail" one and another which is the "higher-qps"
  one. We expect the "higher-qps" one to fill up a bit quicker, and when
  it's full we will create another one. Or maybe we should even create multiple
  ones right away, and witnesses configure as many as they can even though,
  e.g., .3 is not being populated quite yet?

- Working on defining tombstones for proper deallocation is worthwhile to do
  soon since CT is interested in taking part (and sharding is frequent there).
