This repository was archived by the owner on Jan 8, 2026. It is now read-only.

Commit 7ef4620

exploration report on encryption in IPLD
1 parent d8ae7e9 commit 7ef4620

1 file changed

Lines changed: 224 additions & 0 deletions

@@ -0,0 +1,224 @@
1+
Encryption and IPLD, 2021
2+
=========================
3+
4+
This is an exploration report about encryption and its relationship to IPLD,
5+
gathering some thoughts and recent updates in early 2021.
6+
It's meant to be useful as a reference piece for further discussion at this time.
7+
8+
This document takes input from tons of people.
9+
It's written by warpfork in the immediate aftermath of close conversation with Mikeal,
10+
but also has tons of input from Carson of Textile (via the 2021.01.11 IPLD Weekly meeting),
11+
and also draws on other notes exchanged over time with projects such as the Ceramic Network, 3Box, Peergos, Qri, and others.
12+
(Even if you don't see your name here, it's likely you've contributed something --
13+
this topic has just been a long time brewing, so attributing all inspirations completely is now hard!)
14+
Thank you to all these folks for their efforts.
15+
16+
17+
Overview
18+
--------
19+
20+
People frequently want to implement encryption as part of decentralized systems.
21+
So, it's no surprise that people also frequently want a story for how encryption and IPLD should interact.
22+
23+
For a long time, IPLD has been agnostic to any sort of encryption.
24+
(We've been afraid of doing a _wrong_ thing and baking it into specs.)
25+
Instead, we've asked that people using IPLD figure out how to compose IPLD and encryption on their own.
26+
It may now be time for this to change, as we gather lots of input from the community.
27+
28+
In this document, we're going to cover three major topic groups:
29+
30+
- 1. A proposal for encryption primitives in IPLD, and a plan for how to use multicodec indicators for encryption!
31+
- 2. A section about usage conventions we see which have repeatedly emerged, and seem useful, and thus now seem worth identifying and creating vocabulary for.
32+
- 3. A section for gathering notes about use cases, tradeoffs, and general cautions about applied cryptography.
33+
34+
Comments and feedback on each of these topic groups are welcome.
35+
36+
37+
Encryption Codecs
38+
-----------------
39+
40+
The IPLD team is now ready to plan (and implement) encryption which is signaled by multicodec indicators
41+
(and thus works anywhere CIDs are used),
42+
and works in the natural way an IPLD codec is expected to work.
43+
44+
(This is a big change in stance.
45+
Previously, we've considered it unclear whether codecs)
46+
47+
There are a couple of details about how we expect this to work which are recent realizations,
48+
and so this document might be nearly the first description of them:
49+
50+
51+
### encryption codecs use multicodec indicators
52+
53+
As stated in the summary above: encryption will use multicodec indicators.
54+
55+
This means we'll reserve new numbers in the multicodec table.
56+
We'll expect to see values like "AES-GCM" appear in the same table as "DAG-CBOR".
57+
58+
59+
### encryption codecs are still codecs of the usual contracts
60+
61+
Codecs which do encryption will look like regular IPLD codecs.
62+
63+
What does this mean? Well, in our recent improvements to formalizations, we now describe a codec as
64+
the operation "decode" -- `function (rawByteStream) -> (ipldDataModelNode | error)` --
65+
and the operation "encode" -- `function (ipldDataModelNode, writeableBytestream) -> (error)`.
66+
(Loosely. This is pseudocode, not any particular programming language.)
67+
68+
(Okay, what did _that_ mean? ;) ...I'll do it again in plain language.)
69+
70+
The key detail that is important for IPLD's soundness is:
71+
the encoded data stream must be transformable to a "node" -- which must be describable _entirely_ and _purely_ by the IPLD Data Model --
72+
and then back again, from that "node" to an encoded data stream.
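
As a rough illustration of that contract, here is a minimal Python sketch. The `decode` and `encode` names follow the pseudocode above; `ToyJsonCodec` and its use of JSON are assumptions made purely for illustration, and are not any real IPLD codec or library API. The point is only the shape: bytes in, node out; node in, bytes out; no other parameters.

```python
import io
import json
from typing import Any, BinaryIO


class ToyJsonCodec:
    """Hypothetical codec: Data Model nodes <-> bytes, with no extra parameters."""

    def decode(self, raw: BinaryIO) -> Any:
        # Bytes in, Data Model node out (an exception stands in for "error").
        return json.loads(raw.read().decode("utf-8"))

    def encode(self, node: Any, out: BinaryIO) -> None:
        # Data Model node in, bytes written to the stream.
        text = json.dumps(node, sort_keys=True, separators=(",", ":"))
        out.write(text.encode("utf-8"))


codec = ToyJsonCodec()
buf = io.BytesIO()
codec.encode({"hello": "world", "n": 1}, buf)
encoded = buf.getvalue()

# Round trip: bytes -> node -> bytes.
node = codec.decode(io.BytesIO(encoded))
buf2 = io.BytesIO()
codec.encode(node, buf2)
```

Note that neither method has anywhere to put a key, which is exactly the constraint discussed next.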
73+
74+
Okay, background established. Now: why does this matter to encryption?
75+
76+
Two reasons:
77+
78+
- that contract means *no additional parameters* are allowed. So, for encryption, it means keys don't -- *can't* -- enter into this yet.
79+
- that contract means we always have to be able to transform the encoded form into *something*.
80+
81+
82+
#### encryption codecs are defined as destructuring ciphertext
83+
84+
... *not* as yielding cleartext. This may be unintuitive, but is important.
85+
86+
First, an example: many encryption schemes have two components in their ciphertext:
87+
some sort of "initialization vector" (commonly known as an "IV"), which is a number unique to that ciphertext;
88+
and then the ciphertext body itself.
89+
So, for such an encryption scheme, the relevant IPLD codec would probably produce a _map_, matching this schema:
90+
91+
```ipldsch
type CodecResult struct {
	iv Bytes
	body Bytes
} representation map
```
97+
98+
(The actual serial form may look like anything it wants (likely, some binary length-prefixed format),
99+
because that's the responsibility of the codec implementation to define;
100+
this small schema just describes the Data Model view we might expect to be yielded.)
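
To make the "destructuring" idea concrete, here is a small Python sketch of such a codec. The serial layout (a 4-byte big-endian IV length, then the IV, then the body) is an assumption invented for this example; a real encryption codec would define its own format. Note that no key appears anywhere.

```python
import struct


def encode(node: dict) -> bytes:
    """Serialize {"iv": bytes, "body": bytes} as: 4-byte IV length, IV, body."""
    iv, body = node["iv"], node["body"]
    return struct.pack(">I", len(iv)) + iv + body


def decode(raw: bytes) -> dict:
    """Destructure the serial form into a Data Model map. No key is involved."""
    if len(raw) < 4:
        raise ValueError("truncated block: missing IV length prefix")
    (iv_len,) = struct.unpack(">I", raw[:4])
    if len(raw) < 4 + iv_len:
        raise ValueError("truncated block: IV shorter than declared")
    return {"iv": raw[4:4 + iv_len], "body": raw[4 + iv_len:]}


block = encode({"iv": b"\x00" * 12, "body": b"opaque ciphertext bytes"})
node = decode(block)
```

The `body` stays opaque ciphertext here; the codec only tells you where the pieces are.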
101+
102+
This is neat in several ways:
103+
104+
- It means that processing the data into Data Model is always *defined* -- even if you don't have key material.
105+
- This in turn means IPLD Selectors, and all sorts of other stuff, *work normally* over encrypted data.
106+
(Not over the cleartext, obviously -- then the encryption wouldn't be doing much, would it?
107+
But their operation is *defined*, so they can be used safely and predictably.)
108+
- It means we have a way to access the ciphertext.
109+
- ... That may not sound like a big deal, but it's been a weird and interesting bugaboo in a lot of other previous proposals about how to fit encryption into IPLD.
110+
- It means we don't have to solve the problem of how to get key material into a codec.
111+
- This is a big deal because it means, well, a bunch of our abstraction layers in IPLD don't... uh, shatter. Good.
112+
113+
Okay, but how do we get to cleartext then? Let us proceed to the next section!
114+
115+
#### getting to cleartext when using encryption codecs involves feature detection
116+
117+
Encryption codecs in IPLD libraries will have extra methods on them, and support some kind of "feature detection" to advertise this.
118+
Those additional methods will accept key material as a parameter, and return an IPLD Data Model Node... of the *cleartext*.
119+
(I.e., the "node" returned here, and the "node" returned by the codec alone, will be *very* different data.)
120+
121+
How exactly this looks will vary by language and library implementation;
122+
different languages will have different idioms for doing feature detection.
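
For a sense of what feature detection could look like in one language, here is a hedged Python sketch. Everything in it is hypothetical: the `Decrypter` protocol, the `decrypt` method name, and the `load` helper are invented for illustration. The "cipher" is a toy SHA-256 keystream that is NOT secure and exists only so the example runs without third-party packages. To keep it short, the cleartext node is just a Bytes value.

```python
import hashlib
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class Decrypter(Protocol):
    """Hypothetical optional feature: ciphertext node + key -> cleartext."""

    def decrypt(self, node: dict, key: bytes) -> bytes: ...


def _toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # NOT a real cipher: a SHA-256 counter stream used only so the sketch runs.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


class PlainCodec:
    def decode(self, raw: bytes) -> dict:
        return {"body": raw}


class ToyEncryptionCodec:
    def decode(self, raw: bytes) -> dict:
        # The plain codec contract: destructure only (first 12 bytes are the IV).
        return {"iv": raw[:12], "body": raw[12:]}

    def decrypt(self, node: dict, key: bytes) -> bytes:
        # The extra method: accepts key material and yields the cleartext.
        stream = _toy_keystream(key, node["iv"], len(node["body"]))
        return bytes(a ^ b for a, b in zip(node["body"], stream))


def load(codec, raw: bytes, key: Optional[bytes] = None):
    node = codec.decode(raw)
    # Feature detection: only codecs that advertise decrypt() get the key.
    if key is not None and isinstance(codec, Decrypter):
        return codec.decrypt(node, key)
    return node


key = b"k" * 32
iv = b"\x01" * 12
cleartext = b"hello ipld"
stream = _toy_keystream(key, iv, len(cleartext))
body = bytes(a ^ b for a, b in zip(cleartext, stream))
raw = iv + body
```

With this, `load(ToyEncryptionCodec(), raw)` gives the ciphertext map, `load(ToyEncryptionCodec(), raw, key)` gives the cleartext, and `load(PlainCodec(), raw, key)` quietly ignores the key because that codec doesn't advertise the feature.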
123+
124+
125+
### key management is out of band
126+
127+
Keys still need to be supplied to the encrypt and decrypt methods of an encryption codec when they are used.
128+
This key supply and management is something that must be handled "out of band".
129+
130+
We don't have a total strategy for automatic application of keys in large graphs.
131+
And we probably won't, either.
132+
We expect that most applications using cryptography will have some key management strategy that is unique to them,
133+
and will probably _not_ want their IPLD library dictating anything about key management.
134+
(For example, many complex applications using cryptography may involve key derivation strategies,
135+
which can even be content or data-organization aware -- we cannot possibly specify such things in IPLD; we need to just accept instructions on that.)
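
One way to picture "just accept instructions" is a callback supplied by the application. The sketch below is an assumption-laden illustration: `KeyResolver`, `make_path_keyed_resolver`, and `open_block` are hypothetical names, and per-path derivation with HMAC-SHA256 is only one possible strategy, not a recommendation.

```python
import hashlib
import hmac
from typing import Callable

# Hypothetical: the application maps a path in its own data layout to a key.
KeyResolver = Callable[[str], bytes]


def make_path_keyed_resolver(master_key: bytes) -> KeyResolver:
    """Derive a distinct key per path using HMAC-SHA256 (one possible strategy)."""

    def resolve(path: str) -> bytes:
        return hmac.new(master_key, path.encode("utf-8"), hashlib.sha256).digest()

    return resolve


def open_block(path: str, ciphertext_node: dict, resolve_key: KeyResolver, decrypt):
    # Library-side code never decides which key to use; it asks the application.
    return decrypt(ciphertext_node, resolve_key(path))


resolver = make_path_keyed_resolver(b"m" * 32)
key_photos = resolver("photos/2021")
key_notes = resolver("notes")
```

An application with a completely different scheme (per-user keys, a key server, whatever) would swap in its own resolver without the rest changing.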
136+
137+
IPLD will be open to future work on library functions for how to handle key management in practice.
138+
If we can find sufficiently common patterns, they may be worth library features.
139+
However, we should be comfortable understanding that there may actually not be single answers to key management,
140+
and the number of features relating to it that belong in IPLD libraries might be correspondingly minimal.
141+
142+
143+
### encryption codecs can be used recursively
144+
145+
TODO (this emerges fairly naturally but deserves comment and example)
146+
147+
148+
### limitations of this approach
149+
150+
#### double hashing
151+
152+
This approach is roughly "mac-then-encrypt-then-mac" (if you're from the era of crypto education which called things "MAC" rather than "MIC" (which would make much more sense (but, I digress))).
153+
154+
In other words: we hash things twice, and one of the hashes ends up in the output data body (because it's inside the ciphertext).
155+
156+
There's nothing wrong with this (it's certainly cryptographically sound!); it's just slightly excessive and does spend a few bytes.
157+
158+
159+
160+
Conventions and Usage Patterns around Encryption in IPLD
161+
---------------------------------------------------------
162+
163+
General notice: there are not single solutions to how to compose crypto systems.
164+
Many tradeoffs exist in design of applications using encryption.
165+
In some situations, metadata and size and access pattern concealment don't matter;
166+
in others, they're critically important, and an infinite amount of performance penalty is an acceptable trade.
167+
We can't make these decisions for applications.
168+
In this document, we'll limit our scope to talking about patterns that we've seen,
169+
and building some vocabulary around them, and sharing the ideas that seem to have good results.
170+
171+
172+
### Desirable traits
173+
174+
Some frequently identified desires when working with encrypted data include:
175+
176+
- ability to use "pinning" services without special integrations or disclosure of key material
177+
- ability to tersely identify subtrees, e.g. for purposes such as network transfer
178+
179+
These are things which are well-provided for when using IPLD without encryption,
180+
but require some additional design when using IPLD with encryption, since the link structure of documents is generally itself encrypted.
181+
182+
Mind: these goals are complicated: if they didn't require information that is _encrypted_, they wouldn't be worth special mention in the first place.
183+
It's very important to be sure you also consider the [cryptography caveats](#introduction-to-cryptography-caveats) when working with these goals.
184+
185+
186+
### Pattern: Cleartext Manifest over CIDs of Encrypted Data
187+
188+
Key concepts:
189+
190+
- All content is encrypted at block level (using the systems described in the [Encryption Codecs](#encryption-codecs) section) (so, we have a set of CIDs, all of which have a multicodec indicator that indicates some kind of encryption codec).
191+
- We still want to be able to pin the whole set, or fetch the whole set using one query.
192+
193+
The solution to this is pretty clear: we want some merkle tree of cleartext IPLD objects, and that tree should just link to the encrypted data CIDs.
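
A toy version of this pattern, sketched in Python under loud assumptions: `fake_cid` is just a hex SHA-256 digest standing in for a real CID, the block store is a plain dict, and the manifest is a single JSON block. All function names are hypothetical.

```python
import hashlib
import json


def fake_cid(block: bytes) -> str:
    # Stand-in for a real CID: just a hex SHA-256 digest of the block bytes.
    return hashlib.sha256(block).hexdigest()


def put(store: dict, block: bytes) -> str:
    cid = fake_cid(block)
    store[cid] = block
    return cid


def build_manifest(store: dict, encrypted_blocks: list) -> str:
    """Store the encrypted blocks plus one cleartext manifest listing their ids."""
    links = [put(store, b) for b in encrypted_blocks]
    manifest = json.dumps({"links": links}).encode("utf-8")
    return put(store, manifest)


def collect_for_pinning(store: dict, manifest_cid: str) -> set:
    """What a pinning service could do: walk the manifest, no keys required."""
    manifest = json.loads(store[manifest_cid])
    return {manifest_cid, *manifest["links"]}


store = {}
root = build_manifest(store, [b"ciphertext-1", b"ciphertext-2", b"ciphertext-3"])
pinned = collect_for_pinning(store, root)
```

The pinning side only ever reads the cleartext manifest; the encrypted blocks are carried along by id without being opened.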
194+
195+
#### manifest tree structure can be any form
196+
197+
An interesting trait of the manifest pattern is that to provide its key benefits --
198+
e.g. being able to refer to the whole set of data at once --
199+
it doesn't actually _matter_ exactly _what_ tree structure or layout algorithm is used.
200+
201+
HAMTs or Chunky Trees can both be used; or for small enough data, a plain map in a single block.
202+
Anything that reaches the goals works; there's little or no need to standardize on this.
203+
204+
205+
206+
Introduction to Cryptography Caveats
207+
------------------------------------
208+
209+
Designing cryptographic systems is _tricky_ -- to put it mildly.
210+
211+
We can't always offer complete systems and complete guidance to cryptographic work.
212+
What we can do in IPLD is offer some components and, sometimes, some patterns of suggested use.
213+
How to put those things together (and do so safely) is still fundamentally the responsibility of the application developer.
214+
215+
We also can't provide a complete introduction and set of coursework on how to compose cryptographic systems!
216+
Those are educational resources you'll need to find elsewhere if you haven't gotten them already.
217+
218+
With all those caveats made, though, we'd like to offer a few pointers to topics you should at least be aware of.
219+
These topics are also especially relevant to the combination of encryption and IPLD because of how they involve tradeoffs
220+
(and, some of those tradeoffs are things that inform *why* we don't move certain kinds of features into IPLD specs -- it's because there's more than one way to go about it).
221+
222+
### access patterns of ciphertext can leak hints about the cleartext
223+
224+
### size of ciphertext may leak hints about the cleartext
