|
| 1 | +Encryption and IPLD, 2021 |
| 2 | +========================= |
| 3 | + |
| 4 | +This is an exploration report about the role and relationship of encryption relating to IPLD, |
| 5 | +gathering some thoughts and recent updates in early 2021. |
| 6 | +It's meant to be useful as a reference piece for further discussion at this time. |
| 7 | + |
| 8 | +This document takes input from tons of people. |
| 9 | +It's written by warpfork in the immediate aftermath of close conversation with Mikeal, |
| 10 | +but also has tons of input from Carson of Textile (via the 2021.01.11 IPLD Weekly meeting), |
| 11 | +and also draws on other notes exchanged over time with projects such as the Ceramic Network, 3Box, Peergos, Qri, and others. |
| 12 | +(Even if you don't see your name here, it's likely you've contributed something -- |
| 13 | +this topic has just been a long time brewing, so attributing all inspirations completely is now hard!) |
| 14 | +Thank you to all these folks for their efforts. |
| 15 | + |
| 16 | + |
| 17 | +Overview |
| 18 | +-------- |
| 19 | + |
| 20 | +People frequently want to implement encryption as part of decentralized systems. |
| 21 | +So it's no surprise that people also frequently want a story for how encryption and IPLD should interact. |
| 22 | + |
| 23 | +For a long time, IPLD has been agnostic to any sort of encryption. |
| 24 | +(We've been afraid of doing a _wrong_ thing and baking it into specs.) |
| 25 | +Instead, we've asked that people using IPLD figure out how to compose IPLD and encryption on their own. |
| 26 | +It may now be time for this to change, as we gather lots of input from the community. |
| 27 | + |
| 28 | +In this document, we're going to cover three major topic groups: |
| 29 | + |
| 30 | +- 1. A proposal for encryption primitives in IPLD, and a plan for how to use multicodec indicators for encryption! |
| 31 | +- 2. A section about usage conventions we see which have repeatedly emerged, and seem useful, and thus now seem worth identifying and creating vocabulary for. |
| 32 | +- 3. A section for gathering notes about use cases, tradeoff notes, and general cautions about general applied cryptography. |
| 33 | + |
| 34 | +Comments and feedback on each of these topic groups are welcome. |
| 35 | + |
| 36 | + |
| 37 | +Encryption Codecs |
| 38 | +----------------- |
| 39 | + |
| 40 | +The IPLD team is now ready to plan (and implement) encryption which is signaled by multicodec indicators |
| 41 | +(and thus works anywhere CIDs are used), |
| 42 | +and works in the natural way an IPLD codec is expected to work. |
| 43 | + |
| 44 | +(This is a big change in stance. |
| 45 | +Previously, we've considered it unclear whether codecs) |
| 46 | + |
| 47 | +There are a couple of details about how we expect this to work which are recent realizations, |
| 48 | +and so this document might be nearly the first description of them: |
| 49 | + |
| 50 | + |
| 51 | +### encryption codecs use multicodec indicators |
| 52 | + |
| 53 | +As stated in the summary above: encryption will use multicodec indicators. |
| 54 | + |
| 55 | +This means we'll reserve new numbers in the multicodec table. |
| 56 | +We'll expect to see values like "AES-GCM" appear in the same table as "DAG-CBOR". |
| 57 | + |
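To make the "same table" idea concrete, here is a minimal Python sketch that reads the multicodec indicator out of a binary CIDv1 and names it. The codes `0x71` (DAG-CBOR) and `0x12` (sha2-256) come from the multicodec table; `ENCRYPTED_EXAMPLE` is a made-up placeholder rather than a reserved number, and all function names are illustrative only.

```python
import hashlib
from typing import Tuple

DAG_CBOR = 0x71               # registered multicodec code for DAG-CBOR
SHA2_256 = 0x12               # registered multihash code for sha2-256
ENCRYPTED_EXAMPLE = 0x300001  # ASSUMPTION: placeholder, not a registered code

CODEC_NAMES = {DAG_CBOR: "dag-cbor", ENCRYPTED_EXAMPLE: "example-encryption"}


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def make_cid_v1(codec: int, block: bytes) -> bytes:
    """Build a binary CIDv1: version, codec indicator, then a sha2-256 multihash."""
    digest = hashlib.sha256(block).digest()
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    return encode_varint(1) + encode_varint(codec) + multihash


def codec_of(cid: bytes) -> str:
    """Read the codec indicator from a binary CIDv1 and name it."""
    version, pos = decode_varint(cid)
    if version != 1:
        raise ValueError("only CIDv1 is handled in this sketch")
    codec, _ = decode_varint(cid, pos)
    return CODEC_NAMES.get(codec, hex(codec))
```

Nothing about the CID layout changes for an encrypted block; only the codec number differs, which is why this works anywhere CIDs are already used.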
| 58 | + |
| 59 | +### encryption codecs are still codecs of the usual contracts |
| 60 | + |
| 61 | +Codecs which do encryption will look like regular IPLD codecs. |
| 62 | + |
| 63 | +What does this mean? Well, in our recent improvements to formalizations, we now describe a codec as |
| 64 | +the operation "decode" -- `function (rawByteStream) -> (ipldDataModelNode | error)` -- |
| 65 | +and the operation "encode" -- `function (ipldDataModelNode, writeableBytestream) -> (error)`. |
| 66 | +(Loosely. This is pseudocode, not any particular programming language.) |
| 67 | + |
| 68 | +(Okay, what did _that_ mean? ;) ...I'll do it again in plain language.) |
| 69 | + |
| 70 | +The key detail that is important for IPLD's soundness is: |
| 71 | +the encoded data stream must be transformable to a "node" -- which must be describable _entirely_ and _purely_ by the IPLD Data Model -- |
| 72 | +and then back again, from that "node" to an encoded data stream. |
| 73 | + |
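As one way to picture that contract, the sketch below writes the two operations down as a Python `Protocol` and checks the round trip with a deliberately tiny stand-in codec. The `Codec` name, the signatures, and the JSON-based stand-in are assumptions for illustration; real IPLD libraries define their own interfaces, and the Data Model has kinds (bytes, links) that plain JSON does not cover.

```python
import io
import json
from typing import Any, BinaryIO, Protocol


class Codec(Protocol):
    """The whole contract: bytes in, node out; node in, bytes out. No extra parameters."""

    def decode(self, raw: BinaryIO) -> Any: ...

    def encode(self, node: Any, out: BinaryIO) -> None: ...


class ToyJsonCodec:
    """Stand-in codec (an assumption for this sketch, not DAG-JSON)."""

    def decode(self, raw: BinaryIO) -> Any:
        # Errors surface as exceptions, playing the role of the "| error" result.
        return json.loads(raw.read().decode("utf-8"))

    def encode(self, node: Any, out: BinaryIO) -> None:
        # Sorted keys and fixed separators keep the output deterministic.
        text = json.dumps(node, sort_keys=True, separators=(",", ":"))
        out.write(text.encode("utf-8"))


def round_trip(codec: Codec, encoded: bytes) -> bytes:
    """Decode to a node, then encode that node again."""
    node = codec.decode(io.BytesIO(encoded))
    out = io.BytesIO()
    codec.encode(node, out)
    return out.getvalue()
```

Note that neither method has anywhere to put a key; that absence is the property the rest of this section builds on.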
| 74 | +Okay, background established. Now: why does this matter to encryption? |
| 75 | + |
| 76 | +Two reasons: |
| 77 | + |
| 78 | +- that contract means *no additional parameters* are allowed. So, for encryption, it means keys don't -- *can't* -- enter into this yet. |
| 79 | +- that contract means we always have to be able to transform the encoded form into *something*. |
| 80 | + |
| 81 | + |
| 82 | +#### encryption codecs are defined as destructuring ciphertext |
| 83 | + |
| 84 | +... *not* as yielding cleartext. This may be unintuitive, but is important. |
| 85 | + |
| 86 | +First, an example: many encryption schemes have two components in their ciphertext: |
| 87 | +some sort of "initialization vector" (commonly known as an "IV"), which is a number unique to that ciphertext; |
| 88 | +and then the ciphertext body itself. |
| 89 | +So, for such an encryption scheme, the relevant IPLD codec would probably produce a _map_, matching this schema: |
| 90 | + |
| 91 | +```ipldsch |
| 92 | +type CodecResult struct { |
| 93 | + iv Bytes |
| 94 | + body Bytes |
| 95 | +} representation map |
| 96 | +``` |
| 97 | + |
| 98 | +(The actual serial form may look like anything it wants (likely, some binary length-prefixed format), |
| 99 | +because that's the responsibility of the codec implementation to define; |
| 100 | +this small schema just describes the Data Model view we might expect to be yielded.) |
| 101 | + |
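A minimal Python sketch of such a codec follows. The wire format (a 4-byte big-endian length before each field) is invented for the example, as are the function names; the point is only that decoding yields the `iv`/`body` map and never touches a key.

```python
import struct
from typing import Tuple


def encode(node: dict) -> bytes:
    """Serialize {"iv": bytes, "body": bytes} as two length-prefixed fields."""
    iv, body = node["iv"], node["body"]
    return struct.pack(">I", len(iv)) + iv + struct.pack(">I", len(body)) + body


def _read_field(raw: bytes, pos: int) -> Tuple[bytes, int]:
    """Read one length-prefixed field; return (field bytes, next position)."""
    if pos + 4 > len(raw):
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack_from(">I", raw, pos)
    start, end = pos + 4, pos + 4 + length
    if end > len(raw):
        raise ValueError("truncated field")
    return raw[start:end], end


def decode(raw: bytes) -> dict:
    """Destructure ciphertext into a Data Model map. No key material involved."""
    iv, pos = _read_field(raw, 0)
    body, pos = _read_field(raw, pos)
    if pos != len(raw):
        raise ValueError("trailing bytes after body")
    return {"iv": iv, "body": body}
```

Because `decode` either returns a map or raises, its behavior is fully defined for any input bytes, with or without access to keys.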
| 102 | +This is neat in several ways: |
| 103 | + |
| 104 | +- It means that processing the data into Data Model is always *defined* -- even if you don't have key material. |
| 105 | + - This in turn means IPLD Selectors, and all sorts of other stuff, *work normally* over encrypted data. |
| 106 | + (Not over the cleartext, obviously -- then the encryption wouldn't be doing much, would it? |
| 107 | + But their operation is *defined*, so they can be used safely and predictably.) |
| 108 | +- It means we have a way to access the ciphertext. |
| 109 | + - ... That may not sound like a big deal, but it's been a weird and interesting bugaboo in a lot of other previous proposals about how to fit encryption into IPLD. |
| 110 | +- It means we don't have to solve the problem of how to get key material into a codec. |
| 111 | + - This is a big deal because it means, well, a bunch of our abstraction layers in IPLD don't... uh, shatter. Good. |
| 112 | + |
| 113 | +Okay, but how do we get to cleartext then? Let us proceed to the next section! |
| 114 | + |
| 115 | +#### getting to cleartext when using encryption codecs involves feature detection |
| 116 | + |
| 117 | +Encryption codecs in IPLD libraries will have extra methods on them, and support some kind of "feature detection" to advertise this. |
| 118 | +Those additional methods will accept key material as a parameter, and return an IPLD Data Model Node... of the *cleartext*. |
| 119 | +(I.e., the "node" returned here, and the "node" returned by the codec alone, will be *very* different data.) |
| 120 | + |
| 121 | +How exactly this looks will vary by language and library implementation; |
| 122 | +different languages will have different idioms for doing feature detection. |
| 123 | + |
| 124 | + |
| 125 | +### key management is out of band |
| 126 | + |
| 127 | +Keys still need to be supplied to the encrypt and decrypt methods of an encryption codec when they are used. |
| 128 | +This key supply and management is something that must be handled "out of band". |
| 129 | + |
| 130 | +We don't have a total strategy for automatic application of keys in large graphs. |
| 131 | +And we probably won't, either. |
| 132 | +We expect that most applications using cryptography will have some key management strategy that is unique to them, |
| 133 | +and will probably _not_ want their IPLD library dictating anything about key management. |
| 134 | +(For example, many complex applications using cryptography may involve key derivation strategies, |
| 135 | +which can even be content or data-organization aware -- we cannot possibly specify such things in IPLD; we need to just accept instructions on that.) |
| 136 | + |
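As a sketch of what "accepting instructions" could look like, the example below has the application hand the library a plain callable that maps a location in the graph to a key. The `KeyProvider` type, the path-based policy, and `keys_for` are all assumptions for illustration; HMAC-SHA256 is used as a simple stand-in derivation, and a real application would choose its own vetted KDF and policy.

```python
import hashlib
import hmac
from typing import Callable, Dict, List, Optional

# The application decides what this does; the library only calls it.
KeyProvider = Callable[[str], Optional[bytes]]


def path_derived_keys(master_key: bytes, readable_prefix: str) -> KeyProvider:
    """Example policy: one derived key per path, only under a given prefix."""

    def provide(path: str) -> Optional[bytes]:
        if not path.startswith(readable_prefix):
            return None  # no key: the caller sees only the ciphertext node
        return hmac.new(master_key, path.encode("utf-8"), hashlib.sha256).digest()

    return provide


def keys_for(paths: List[str], provider: KeyProvider) -> Dict[str, Optional[bytes]]:
    """Library-side view: ask the provider per block, dictate nothing else."""
    return {path: provider(path) for path in paths}
```

The library side stays ignorant of how keys are derived, stored, or rotated; swapping in a different policy means passing a different callable.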
| 137 | +IPLD will be open to future work on library functions for how to handle key management in practice. |
| 138 | +If we can find sufficiently common patterns, they may be worth library features. |
| 139 | +However, we should be comfortable understanding that there may actually not be single answers to key management, |
| 140 | +and the number of features relating to it that belong in IPLD libraries might be correspondingly minimal. |
| 141 | + |
| 142 | + |
| 143 | +### encryption codecs can be used recursively |
| 144 | + |
| 145 | +TODO (this emerges fairly naturally but deserves comment and example) |
| 146 | + |
| 147 | + |
| 148 | +### limitations of this approach |
| 149 | + |
| 150 | +#### double hashing |
| 151 | + |
| 152 | +This approach is roughly "mac-then-encrypt-then-mac" (if you're from the era of crypto education which called things "MAC" rather than "MIC" (which would make much more sense (but, I digress))). |
| 153 | + |
| 154 | +In other words: we hash things twice, and one of the hashes ends up in the output data body (because it's inside the ciphertext). |
| 155 | + |
| 156 | +There's nothing wrong with this (it's certainly cryptographically sound!); it's just slightly excessive and does spend a few bytes. |
| 157 | + |
| 158 | + |
| 159 | + |
| 160 | +Conventions and Usage Patterns around Encryption in IPLD |
| 161 | +--------------------------------------------------------- |
| 162 | + |
| 163 | +General notice: there is no single solution for how to compose crypto systems. |
| 164 | +Many tradeoffs exist in design of applications using encryption. |
| 165 | +In some situations, metadata and size and access pattern concealment don't matter; |
| 166 | +in others, they're critically important, and an infinite amount of performance penalty is an acceptable trade. |
| 167 | +We can't make these decisions for applications. |
| 168 | +In this document, we'll limit our scope to talking about patterns that we've seen, |
| 169 | +and building some vocabulary around them, and sharing the ideas that seem to have good results. |
| 170 | + |
| 171 | + |
| 172 | +### Desirable traits |
| 173 | + |
| 174 | +Some frequently identified desires when working with encrypted data include: |
| 175 | + |
| 176 | +- ability to use "pinning" services without special integrations or disclosure of key material |
| 177 | +- ability to tersely identify subtrees, e.g. for purposes such as network transfer |
| 178 | + |
| 179 | +These are things which are well-provided for when using IPLD without encryption, |
| 180 | +but require some additional design when using IPLD with encryption, since the link structure of documents is generally itself encrypted. |
| 181 | + |
| 182 | +Mind: these goals are complicated: if they didn't require information that is _encrypted_, they wouldn't be worth special mention in the first place. |
| 183 | +It's very important to be sure you also consider the [cryptography caveats](#introduction-to-cryptography-caveats) when working with these goals. |
| 184 | + |
| 185 | + |
| 186 | +### Pattern: Cleartext Manifest over CIDs of Encrypted Data |
| 187 | + |
| 188 | +Key concepts: |
| 189 | + |
| 190 | +- All content is encrypted at block level (using the systems described in the [Encryption Codecs](#encryption-codecs) section) (so, we have a set of CIDs, all of which have a multicodec indicator that indicates some kind of encryption codec). |
| 191 | +- We still want to be able to pin the whole set, or fetch the whole set using one query. |
| 192 | + |
| 193 | +The solution to this is pretty clear: we want some merkle tree of cleartext IPLD objects, and that tree should just link to the encrypted data CIDs. |
| 194 | + |
| 195 | +#### manifest tree structure can be any form |
| 196 | + |
| 197 | +An interesting trait of the manifest pattern is that to provide its key benefits -- |
| 198 | +e.g. being able to refer to the whole set of data at once -- |
| 199 | +it doesn't actually _matter_ exactly _what_ tree structure or layout algorithm is used. |
| 200 | + |
| 201 | +HAMTs or Chunky Trees can both be used; or for small enough data, a plain map in a single block. |
| 202 | +Anything that reaches the goals works; there's little or no need to standardize on this. |
| 203 | + |
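Here is a minimal Python sketch of the smallest version of this pattern, a plain map in a single block. Hex SHA-256 digests stand in for real CIDs, an in-memory dict stands in for a block store, and the JSON manifest layout is invented for the example; none of this is a proposed format.

```python
import hashlib
import json
from typing import Dict, List, Set


def put(store: Dict[str, bytes], block: bytes) -> str:
    """Store a block under its content address (hex SHA-256 as a CID stand-in)."""
    address = hashlib.sha256(block).hexdigest()
    store[address] = block
    return address


def build_manifest(store: Dict[str, bytes], encrypted_blocks: List[bytes]) -> str:
    """Write a cleartext manifest that links to every encrypted block."""
    links = sorted(put(store, block) for block in encrypted_blocks)
    manifest = json.dumps({"links": links}, separators=(",", ":")).encode("utf-8")
    return put(store, manifest)


def blocks_to_pin(store: Dict[str, bytes], manifest_address: str) -> Set[str]:
    """What a pinning service would do: follow cleartext links, no keys needed."""
    manifest = json.loads(store[manifest_address])
    return {manifest_address, *manifest["links"]}
```

A single address (the manifest's) is enough to name the whole set, and the service walking it never needs to decrypt anything. The manifest does reveal how many blocks there are, which is one of the tradeoffs covered in the caveats section.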
| 204 | + |
| 205 | + |
| 206 | +Introduction to Cryptography Caveats |
| 207 | +------------------------------------ |
| 208 | + |
| 209 | +Designing cryptographic systems is _tricky_ -- to put it mildly. |
| 210 | + |
| 211 | +We can't always offer complete systems and complete guidance to cryptographic work. |
| 212 | +What we can do in IPLD is offer some components and, sometimes, some patterns of suggested use. |
| 213 | +How to put those things together (and do so safely) is still fundamentally the responsibility of the application developer. |
| 214 | + |
| 215 | +We also can't provide a complete introduction and set of coursework on how to compose cryptographic systems! |
| 216 | +Those are educational resources you'll need to find elsewhere if you haven't already. |
| 217 | + |
| 218 | +With all those caveats made, though, we'd like to offer a few pointers to topics you should at least be aware of. |
| 219 | +These topics are also especially relevant to the combination of encryption and IPLD because of how they involve tradeoffs |
| 220 | +(and, some of those tradeoffs are things that inform *why* we don't move certain kinds of features into IPLD specs -- it's because there's more than one way to go about it). |
| 221 | + |
| 222 | +### access patterns of ciphertext can leak hints about the cleartext |
| 223 | + |
| 224 | +### size of ciphertext may leak hints about the cleartext |