|
1 | | -TODO |
| 1 | +# Copilot Verifier |
| 2 | + |
| 3 | +This "Copilot Verifier" is an add-on to the |
| 4 | +[Copilot Stream DSL](https://copilot-language.github.io) |
| 5 | +for verifying the correctness of C code generated |
| 6 | +by the `copilot-c99` package. |
| 7 | + |
| 8 | +The main idea of the verifier is to use the |
| 9 | +[Crucible symbolic simulator](https://github.com/galoisinc/crucible) |
| 10 | +to interpret the semantics of the generated C program and |
| 11 | +and to produce verification conditions sufficient to guarantee |
| 12 | +that the meaning of the generated program corresponds in a precise |
| 13 | +way to the meaning of the original stream specification. The generated |
| 14 | +verification conditions are then dispatched to SMT solvers to |
| 15 | +be automatically solved. We will have more to say about exactly |
| 16 | +what is meant by this correspondence below. |
| 17 | + |
| 18 | +## Using the Copilot Verifier |
| 19 | + |
| 20 | +The main interface to the verifier is the `Copilot.Verifier.verify` |
| 21 | +function, which takes some options to drive the code generation |
| 22 | +process and a Copilot `Spec` object. It will invoke the Copilot |
| 23 | +code generator to obtain C sources, compile the C sources into |
| 24 | +LLVM bitcode using the `clang` compiler front-end, then |
| 25 | +parse and interpret the bitcode using `crucible`. After generating |
| 26 | +verification conditions, it will dispatch them to an SMT solver, |
| 27 | +and the result of those queries will be presented to the user. |
| 28 | + |
| 29 | +There are a number of examples (based on similar examples |
| 30 | +from the main Copilot repository) demonstrating how to |
| 31 | +incorporate the verifier into a Copilot program. |
| 32 | +See the `copilot-verifier/examples` subdirectory of this repository. |
| 33 | + |
| 34 | +## Details of the Verification |
| 35 | + |
| 36 | +### Synopsis of Copilot semantics |
| 37 | + |
| 38 | +The Copilot DSL represents a certain nicely-behaved |
| 39 | +class of discrete-time stream programs. Each `Stream` |
| 40 | +in Copilot denotes an infinite stream of values; one may |
| 41 | +just as well think that `Stream a` represents a pure mathematical |
| 42 | +function `ℕ → a` from natural numbers to values of type `a`. |
| 43 | +See the |
| 44 | +[Copilot manual](https://ntrs.nasa.gov/api/citations/20200003164/downloads/20200003164.pdf) |
| 45 | +for more details of the Copilot language itself and its semantics. |
| 46 | + |
| 47 | +One of the central design considerations for Copilot is that is should |
| 48 | +be possible to implement stream programs using a fixed, finite amount |
| 49 | +of storage. As a result, Copilot will only accept stream programs |
| 50 | +that access a bounded amount of their input streams (including any |
| 51 | +recursive stream references). This allows an |
| 52 | +implementation strategy where the generated C code can use fixed-size |
| 53 | +ring buffers to store a limited number of previous stream values. |
| 54 | + |
| 55 | +The execution model for the generated programs is that the program |
| 56 | +starts in a state corresponding to stream values at time `t = 0`; |
| 57 | +"external" stream input values are placed in distinguished global |
| 58 | +variables by the calling environment, which then executes a `step()` |
| 59 | +function to move to the next time step. The `step()` function captures |
| 60 | +the values of these external inputs and does whatever computation is |
| 61 | +necessary to update its internal state from time `t=n` to time |
| 62 | +`t=n+1`. In addition, it will call any defined "trigger" functions |
| 63 | +if the stream state at that time step satisfies the defined guard condition. |
| 64 | +In short, the generated C program steps one moment at a time through |
| 65 | +the stream program, consuming a sequence of input values provided by |
| 66 | +a cooperating environment and calling handler functions whenever |
| 67 | +states of interest occur. |
| 68 | + |
| 69 | +### The Desired Correspondence |
| 70 | + |
| 71 | +What does it mean, then, for a generated C program in this style |
| 72 | +to correctly implement a given stream program? The intuition |
| 73 | +is basically that after `n` calls to the `step` function, |
| 74 | +the state of the ring-buffers of the C program should correctly |
| 75 | +compute the value of the corresponding stream expression |
| 76 | +evaluated at index `n`, assuming the C program has been fed |
| 77 | +inputs corresponding to the first `n` values of the external stream |
| 78 | +inputs. Moreover, the trigger functions should be called from |
| 79 | +the `step` function exactly at the time values when the stream expressions |
| 80 | +evaluate to true. |
| 81 | + |
| 82 | +The notion of correspondence for the values flowing in streams is |
| 83 | +relatively straightforward: these values consist of fixed-width |
| 84 | +machine integers, floating-point values, structs and fixed-length |
| 85 | +arrays. For each, the appropriate notion of equality is fairly clear. |
| 86 | + |
| 87 | +Both the original `Stream` program and the generated C program |
| 88 | +can be viewed straightforwardly as a transition system, and under |
| 89 | +this view, the correspondence we want to establish is a bisimulation |
| 90 | +between the states of the high-level stream program and the low-level |
| 91 | +C program. The proof method for bisimulation requires us to provide |
| 92 | +a "correspondence" relation between the program states, and then prove |
| 93 | +three things about this relation: |
| 94 | + |
| 95 | +1. that the initial states of the programs are in the relation; |
| 96 | +2. if we assume two arbitrary program states begin in the relation |
| 97 | +and each takes a single transition (consuming corresponding inputs), |
| 98 | +the resulting states are back in the relation; |
| 99 | +3. that any observable |
| 100 | +actions taken by one program are exactly mirrored by the other. |
| 101 | + |
| 102 | +On the high-level side of the bisimulation, the program |
| 103 | +state is essentially just the value of the current time step `n`, |
| 104 | +whereas on the C side it consists of the regions of global memory that |
| 105 | +contain the ring-buffers and their current indices. The transition |
| 106 | +relation for the high-level program just increments the time value by |
| 107 | +one, and the transition for the C program is defined by the action |
| 108 | +of the generated `step()` function. |
| 109 | + |
| 110 | +Suppose `s` is one of the stream definitions in the original Copilot |
| 111 | +program which is required to retain `k` previous values; |
| 112 | +let `buf` be the global variable name of the ring-buffer in the C |
| 113 | +program, and `idx` be the global variable name maintaining the |
| 114 | +current index into the ring buffer. Then the correspondence |
| 115 | +relation is basically that `0 <= idx < k` and |
| 116 | +`s[n+i] = buf[(idx+i) mod k]` as `i` ranges from `0 .. k-1`. |
| 117 | +By abuse of notation, here we mean that `s[j]` is |
| 118 | +the value of the stream expression `s` evaluated at index `j`, |
| 119 | +whereas `buf[j]` means the value obtained by reading the `j`th value |
| 120 | +of the buffer `buf` from memory. The overall correspondence relation |
| 121 | +is a conjunction of statements like this, one for each stream |
| 122 | +expression that is realized via a buffer. |
| 123 | + |
| 124 | +### Implementing the Bisimulation proof steps |
| 125 | + |
| 126 | +The kind of program correspondence property we desire is a largely |
| 127 | +mechanical affair. As the code under consideration is automatically |
| 128 | +generated, it has a very regular structure and is specifically |
| 129 | +intended to implement the semantics we wich to verify it against. As |
| 130 | +such, we expect these proofs to be highly automatable. |
| 131 | + |
| 132 | +The proof principle of bisimulation itself is not amenable |
| 133 | +to reduction to SMT, as if falls outside the first-order theories |
| 134 | +those solvers understand. Likewise, the semantics of Copilot |
| 135 | +and C might possibly be reducible directly to SMT, but it would be |
| 136 | +impractical to do so. However, we can reduce the individual |
| 137 | +proof obligations mentioned above into a series of lower-level |
| 138 | +logical statements that are amenable to SMT proof by |
| 139 | +defining the logical semantics of stream programs, and using |
| 140 | +symbolic simulation techniques to interpret the semantics of the |
| 141 | +C program. Performing this reduction is the key contribution |
| 142 | +of `copilot-verifier`. |
| 143 | + |
| 144 | +#### Initial state correspondence |
| 145 | + |
| 146 | +The proof first obligation we must discharge is to show that |
| 147 | +the initial states of the two programs correspond. For each |
| 148 | +stream `s` there is a corresponding `k`, which is the length of |
| 149 | +the ring-buffer implementing it. We must simply verify that |
| 150 | +the C program initializes its buffer with the first `k` values |
| 151 | +of the stream `s`, and that the `idx` value starts at 0. |
| 152 | +Because of the restrictions Copilot places on programs, these |
| 153 | +first `k` values must be specified concretely and will not be |
| 154 | +able to depend on any external inputs. As such, this step |
| 155 | +is quite easily performed, as it requires only direct evaluation |
| 156 | +of concrete inputs. |
| 157 | + |
| 158 | +#### Transition correspondence |
| 159 | + |
| 160 | +The bulk of the proof effort consists of demonstrating that |
| 161 | +the bisimulation relation is preserved by transitions. |
| 162 | +In order to do this step, we must begin with totally symbolic |
| 163 | +initial states: we know nothing except that we are at some |
| 164 | +arbitrary time value `n`, and that the C program buffers |
| 165 | +correspond to their stream values as required by the relation. |
| 166 | +Thus, we create fresh symbolic variables for the streams |
| 167 | +from `n` to `n + k-1`, and for the values of all the involved |
| 168 | +external streams at time `n`. Then, we run forward the Copilot |
| 169 | +program by evaluating the stream recurrence expression to |
| 170 | +compute the value of each stream at time `n+k`. |
| 171 | + |
| 172 | +Next we set up an initial state of the C program by choosing, |
| 173 | +for each ring buffer, an arbitrary value for its current index |
| 174 | +within its allowed range, and then writing the variables |
| 175 | +corresponding to each stream value into the buffers at |
| 176 | +their appropriate offsets. The symbolic simulator is then |
| 177 | +invoked to compute the state update effects of the `step()` |
| 178 | +function. Afterward, we read the poststate values from the |
| 179 | +ring-buffers and verify that they correspond to the stream |
| 180 | +values from `n+1` up to `n+k`. |
| 181 | + |
| 182 | +As part of symbolic simulation, Crucible may also generate |
| 183 | +side-conditions that relate to memory safety of the program, or to |
| 184 | +error conditions that must be avoided. All of the bisimulation |
| 185 | +equivalence conditions and the safety side-conditions will be |
| 186 | +submitted to an SMT solver. |
| 187 | + |
| 188 | +#### Observable effects |
| 189 | + |
| 190 | +For our purposes, the only observable effects of a Copilot program |
| 191 | +relate to any "trigger" functions defined in the spec. Our task is to |
| 192 | +show that the generated C code calls the external trigger functions if |
| 193 | +and only if the corresponding guard condition is true, and that the |
| 194 | +arguments passed to those functions are as expected. |
| 195 | +This proof obligation is proved in the same phase along with |
| 196 | +the transition relation proof above because the `step()` function |
| 197 | +is responsible for both invoking triggers and for performing state |
| 198 | +updates. |
| 199 | + |
| 200 | +The technique we use to perform this aspect of the proof is to |
| 201 | +install "override" functions to the external symbols corresponding |
| 202 | +to the C trigger functions before we begin symbolic simulation. |
| 203 | +In a real system, the external environment would be responsible |
| 204 | +for implementing these functions and taking whatever appropriate |
| 205 | +action is necessary when the triggers fire. However, we are verifying |
| 206 | +the generated code in isolation from its environment, so we have no |
| 207 | +implementation in hand. Instead, the override will |
| 208 | +essentially implement a stub function that simply captures its |
| 209 | +arguments and the path condition under which it was called. |
| 210 | +After simulation completes, the captured arguments and path condition |
| 211 | +are required to be equivalent to the corresponding trigger guard |
| 212 | +and arguments from the Copilot spec. These conditions are |
| 213 | +discharged to the SMT solver at the same time as the transition |
| 214 | +relation obligations. |
| 215 | + |
| 216 | +Because of the way we model the trigger functions, we make a number of |
| 217 | +implicit assumptions about how the actual implementations of those |
| 218 | +functions must behave. The most important of those assumptions is that |
| 219 | +the trigger functions must not modify any memory under the control of |
| 220 | +the Copilot program, including its ring buffers and stack. We also |
| 221 | +assume that the trigger functions are well defined, i.e. they are |
| 222 | +memory safe and do not perform any undefined behavior. |
| 223 | + |
| 224 | +### Caveats About the Verifier |
| 225 | + |
| 226 | +We rely on the `clang` compiler front-end to consume C source files |
| 227 | +and produce LLVM intermediate language, which then becomes the input |
| 228 | +to the later verification steps. To the extent that the input program |
| 229 | +is not fully-portable C, `clang` may make implementation-specific |
| 230 | +decisions about how to compile the program which might be made |
| 231 | +different if compiled by a different compiler (e.g. `gcc`). We expect |
| 232 | +this aspect to be mitigated by the fact that Copilot programs are |
| 233 | +automatically generated into a rather simple subset of the C language, |
| 234 | +and is designed to be as simple as possible. |
| 235 | +Any code-generation bugs in `clang` itself may affect the soundness |
| 236 | +of our verifier. Again, however, Copilot generates a well-understood |
| 237 | +subset of the language, and we expect `clang` to be well-tested on |
| 238 | +the code patterns produced. |
| 239 | + |
| 240 | +The semantics of LLVM bitcode, as encoded in the `crucible-llvm` |
| 241 | +package, may have errors that affect soundness. We mitigate this risk |
| 242 | +by testing our semantics against a corpus of verification problems |
| 243 | +produced for the SV-COMP verification competition, paying special |
| 244 | +attention to any soundness issues that arise. `Crux`, a standalone |
| 245 | +verification system based on `crucible-llvm`, was a participant in the |
| 246 | +2022 edition of SV-COMP. |
| 247 | + |
| 248 | +The semantics of Copilot programs, as encoded in the |
| 249 | +`Copilot.Theorem.What4` module may have errors that affect soundness. |
| 250 | +For the moment we do not have an effective mitigation strategy for |
| 251 | +this risk other than manual examination and comparison against the |
| 252 | +intended semantics of Copilot, as encoded in the interpreter. |
| 253 | + |
| 254 | +There is limited SMT solver support for floating-point values, |
| 255 | +especially for transcendental functions like the trig primitives. As a |
| 256 | +result, we reason about floating point expressions via uninterpreted |
| 257 | +functions. In other words, we leave the semantics of the |
| 258 | +floating-point operations totally abstract, and simply verify that the |
| 259 | +Copilot program and the corresponding C program apply the same |
| 260 | +operations in the same order. This is sound, but leaves the possibility |
| 261 | +that the compiler will apply some correct transformation to |
| 262 | +floating-point expressions that we are nonetheless unable to verify. |
| 263 | +However, on low optimizations and without the `--fast-math` flag, |
| 264 | +compilers generally (and `clang` in particular) are very reluctant to |
| 265 | +rearrange floating-point code, and the verifications generally succeed. |
0 commit comments