This PR builds on #8753
This is a hefty PR, but it's not as bad as it looks. Over 4k lines of it
is in the example log file in the second commit. There's also some moved
and unmodified code that I'll point out.
This PR introduces a new test tool for the trust-quorum protocol:
tqdb. tqdb is a REPL that takes event traces produced by the `cluster`
proptest and uses them for deterministic replay of actions against test
state.
The test state includes a "universe" of real protocol nodes, a fake
nexus, and fake networks. The proptest and the debugger share this
state, which lives in the `trust-quorum-test-utils` crate.
The debugger supports stepping through individual events, setting
breakpoints, snapshotting and diffing states, and viewing the event log
itself.
The purpose of tqdb is twofold:
1. Allow for debugging of failed proptests. This is non-trivial in
some cases, even with shrunken tests, because the generated
actions are high-level and are all generated up front. The actual
operations, such as reconfigurations, are derived from these
high-level generated actions in conjunction with the current state
of the system. Therefore the set of failing generated actions
doesn't really tell you much. You have to look at the logs and
the assertion that fired, and reason about the failure with
incomplete information. Now, for each concrete action taken, we
record the event in a log. In the case of a failure, an event log can be
loaded into tqdb, with a breakpoint set right before the failure. A
snapshot of the state can be taken, and then the failing event can
be applied. The diff will tell you what changed and allow you to
inspect the actual state of the system. Full visibility into your
failure is now possible.
2. The trust quorum protocol is non-trivial. Tqdb allows developers
to see in detail how the protocol behaves and understand what is
happening in certain situations. Event logs can be created by hand
(or script) for particularly interesting scenarios and then run
through tqdb.
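
To make the workflow in (1) concrete, here is a minimal sketch of
"run to a breakpoint, snapshot, apply the failing event, diff". It is
not tqdb's actual code: `Event`, `State`, `Debugger`, `diff` and
`diff_at` are all hypothetical stand-ins, and the real test state is
far richer than a map of node names to epochs.

```rust
use std::collections::BTreeMap;

/// Hypothetical event type (assumption; not the real tqdb event).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Event {
    Set { node: String, epoch: u64 },
    Remove { node: String },
}

/// Hypothetical test state: node name -> last epoch seen.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct State {
    nodes: BTreeMap<String, u64>,
}

impl State {
    /// Deterministic: the same state and event always give the same result.
    fn apply(&mut self, event: &Event) {
        match event {
            Event::Set { node, epoch } => {
                self.nodes.insert(node.clone(), *epoch);
            }
            Event::Remove { node } => {
                self.nodes.remove(node);
            }
        }
    }
}

/// Hypothetical replay driver over a recorded event log.
struct Debugger {
    log: Vec<Event>,
    next: usize,
    state: State,
}

impl Debugger {
    fn new(log: Vec<Event>) -> Self {
        Debugger { log, next: 0, state: State::default() }
    }

    /// Apply the next event. Returns false once the log is exhausted.
    fn step(&mut self) -> bool {
        match self.log.get(self.next) {
            Some(event) => {
                self.state.apply(event);
                self.next += 1;
                true
            }
            None => false,
        }
    }

    /// Stop right before the event at index `breakpoint` is applied.
    fn run_to(&mut self, breakpoint: usize) {
        while self.next < breakpoint && self.step() {}
    }

    fn snapshot(&self) -> State {
        self.state.clone()
    }
}

/// One line per difference: `-` removed, `~` changed, `+` added.
fn diff(before: &State, after: &State) -> Vec<String> {
    let mut out = Vec::new();
    for (node, old) in &before.nodes {
        match after.nodes.get(node) {
            None => out.push(format!("- {node}: {old}")),
            Some(new) if new != old => out.push(format!("~ {node}: {old} -> {new}")),
            Some(_) => {}
        }
    }
    for (node, new) in &after.nodes {
        if !before.nodes.contains_key(node) {
            out.push(format!("+ {node}: {new}"));
        }
    }
    out
}

/// Replay up to `index`, snapshot, apply that one event, and diff.
fn diff_at(log: Vec<Event>, index: usize) -> Vec<String> {
    let mut dbg = Debugger::new(log);
    dbg.run_to(index);
    let before = dbg.snapshot();
    dbg.step();
    diff(&before, &dbg.snapshot())
}
```

Because replay is deterministic, calling `diff_at` with the index of
the failing event isolates exactly what that event changed.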
In order to get the diff functionality to work as I wanted, I had to
implement `Eq` for types that implemented `subtle::ConstantTimeEq` in
both `gfss` (our secret sharing library), and `trust-quorum` crates.
However, it is unknown whether the compiler could break the
constant-time guarantees of these implementations. Therefore, a feature
flag was added so that only the `test-utils` and `tqdb` crates are able
to use these implementations. They are not used in the production
codebase. Feature unification is not at play here because neither
`test-utils` nor `tqdb` is part of the product.
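
For reviewers who want a picture of the pattern, the following is a
rough sketch only. `Share` and its `ct_eq` method are hypothetical
stand-ins written against std, not the real `gfss` types or the
`subtle` API, and the feature name in the comment is made up for
illustration.

```rust
/// Hypothetical stand-in for a secret value (assumption; not a `gfss` type).
#[derive(Debug, Clone)]
struct Share(Vec<u8>);

impl Share {
    /// Compare every byte pair without an early exit by OR-ing the XORs.
    /// This imitates the idea behind a constant-time comparison, but nothing
    /// here guarantees the compiler keeps it constant-time.
    fn ct_eq(&self, other: &Share) -> bool {
        if self.0.len() != other.0.len() {
            return false;
        }
        let mut acc = 0u8;
        for (a, b) in self.0.iter().zip(other.0.iter()) {
            acc |= a ^ b;
        }
        acc == 0
    }
}

// In a real crate these impls would sit behind a feature gate so that only
// test tooling can enable them, for example (hypothetical feature name):
// #[cfg(feature = "test-only-eq")]
impl PartialEq for Share {
    fn eq(&self, other: &Self) -> bool {
        self.ct_eq(other)
    }
}

impl Eq for Share {}
```

With `Eq` available, a containing state type can derive `PartialEq` and
`Eq`, which is what state diffing needs.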