11Here we talk about PII redaction rules.
2- Tentatively I have just called them "filters ".
2+ These are called "transforms".
33
44## Proposed syntax
55
6- Each filter is made up of two parts: a matcher, and a transform .
6+ Each transform is made up of two parts: a matcher and an action.
77
88**Matcher**
99
@@ -29,7 +29,7 @@ This configuration syntax was chosen mainly because of simplicity.
2929It is easy to explain, and to understand.
3030The thing that matches, is the thing that satisfies all conditions at the same
3131time.
32- Some pseudo code gets the idea across
32+ Some pseudocode gets the idea across (although the real logic is not that simple):
3333```
3434def matched(matcher, method, path, jsonBody, queryParams, etc...):
3535 if matcher.pathPattern:
@@ -53,44 +53,23 @@ Look at the first code example.
5353It is currently ambiguous if we want to redact the path or the JSON body.
5454Common sense tells us it is the body, but that still leaves room for confusion.
5555What if both a query param and a json path is provided?
56- Here is a quick set of rules to make it work and still be common sensical
5756enough:
58- - We can split matches into "common fields" and "matching fields".
57+
58+ Therefore,
59+ - We split matcher fields into "common fields" and "matching fields".
5960- There can only be one matching field. There can be many common fields.
60- - Common fields are like ` method ` , ` pathPattern ` , etc.
61- - Matching fields are like ` jsonPath ` , ` queryPath ` , etc.
62- - Common fields are things nobody really cares to redact. They're just used to
63- narrow down the search.
64- - Matching fields are things that will be redacted. Only one is allowed to be
65- present so that it's clear what is going to be modified.
66-
67- Available common fields:
68- - ` method `
69- - ` pathPattern `
70- - ` host ` (have to investigate what the SDK sees, ` 127.0.0.1 ` might not always
71- represent localhost for example)
72-
73- Available matching fields:
74- - ` jsonPath `
75- - ` queryPath `
76- - ` headerPath `
77-
78- Questions:
79- - Does this cover most (sensible) cases? Are most things matchable with this
80- syntax?
81- - Is this intuitive enough?
82-
83-
84- ** Transforms**
85-
86- Transforms specify how to mutate the span.
87- This is relatively simple.
88- There only need to be three kinds of transforms:
61+ - Common fields are never redacted and are only used for matching, like `method`, `pathPattern`, etc.
62+ - Matching fields are the other fields, like `jsonPath`, `queryPath`, etc. Only
63+ one may be present so that it's clear what is going to be modified.
64+
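The "only one matching field" rule can be checked mechanically. Below is a minimal sketch, assuming a hypothetical `Matcher` shape; the names `presentMatchingFields` and `isUnambiguous` are illustrative and not part of the SDK. The matching field names follow the JSON examples in this document (`jsonPath`, `queryParam`, `headerName`), which differ slightly from the names used in the list above.

```typescript
// Hypothetical matcher shape; field names follow the examples in this document.
interface Matcher {
  // Common fields: only narrow down the match.
  direction?: "inbound" | "outbound";
  method?: string;
  pathPattern?: string;
  host?: string;
  // Matching fields: select the value that gets modified.
  jsonPath?: string;
  queryParam?: string;
  headerName?: string;
}

const MATCHING_FIELDS = ["jsonPath", "queryParam", "headerName"] as const;

// Returns the names of the matching fields that are set on a matcher.
function presentMatchingFields(matcher: Matcher): string[] {
  return MATCHING_FIELDS.filter((field) => matcher[field] !== undefined);
}

// A matcher is unambiguous when at most one matching field is set.
function isUnambiguous(matcher: Matcher): boolean {
  return presentMatchingFields(matcher).length <= 1;
}
```

The check allows zero matching fields because the drop examples in this document match on common fields only; whether that should be restricted to drop actions is left open here.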
65+ **Actions**
66+
67+ Actions specify how to mutate the span.
89681. Redact. This replaces the value with a hash.
90692. Mask. This replaces the value with a repeated (or random) character. This
9170 satisfies the use case of fixed-width strings (phone numbers, zip codes,
9271 etc.)
93- 3 . Custom . User specifies a string to act as replacement. This can be used for
72+ 3. Replace. The user specifies a string to act as the replacement. This can be used for
9473 things like test tokens, etc.
95744. Drop. This drops the whole span.
9675
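The four actions above can be sketched as a single function over a discriminated union. This is only an illustration under stated assumptions: the `Action` type, `applyAction`, and `standInHash` are hypothetical names, the hash is a non-cryptographic placeholder (the document does not say which hash is used), and `maskChar` is assumed to be a single character.

```typescript
// Hypothetical action shapes, mirroring the four kinds listed above.
type Action =
  | { type: "redact" }
  | { type: "mask"; maskChar: string }
  | { type: "replace"; replaceWith: string }
  | { type: "drop" };

// Stand-in hash (32-bit FNV-1a over UTF-16 code units). It is NOT
// cryptographic and is only an assumption for illustration.
function standInHash(value: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Returns the new value, or null to signal that the whole span is dropped.
function applyAction(value: string, action: Action): string | null {
  switch (action.type) {
    case "redact":
      return standInHash(value);
    case "mask":
      // One mask character per original character, so the length is kept.
      return action.maskChar.repeat(value.length);
    case "replace":
      return action.replaceWith;
    case "drop":
      return null;
  }
}
```

For instance, `applyAction("123-45-6789", { type: "mask", maskChar: "X" })` yields a string of eleven `X` characters, which matches the mask example later in this document.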
@@ -102,7 +81,7 @@ An example config:
10281 "method": "POST",
10382 "jsonPath": "$.user.password"
10483 },
105- "transform ": {
84+ "action": {
10685 "type": "redact"
10786 }
10887}
@@ -111,26 +90,42 @@ An example config:
11190## Configuration
11291
11392Each instrumentation will have their own configuration.
114- The user will specify this during ` initialize ` under the ` <xyz>Config ` field,
115- where xyz is the instrumentation module they're configuring.
116- For example, ` httpConfig ` holds stuff regarding the HTTP instrumentation (not
117- necessarily just filters!).
118- The big class then just injects the configurations to the right instrumentation
119- module.
120-
121- One idea I had is, since it's all in code anyway, we could just have users
122- provide functions that we call.
123- That's great iff we don't want to be able to serialize configs.
124- In other words, we can consider doing it this way only if
125- - We don't really need users to provide a config file, just do it in code
126- - We don't really need to save the config file (for example to share with other
127- services)
128- Which actually we can get by with?
129-
130- ## Refined Transform Configuration Format
93+ This would ideally sit in `config.yaml`, though it's also possible to supply it
94+ as an argument to `initialize()`.
13195
13296``` typescript
133-
97+ // During SDK initialization
98+ TuskDrift.initialize({
99+   httpConfig: {
100+     filters: [
101+       // Inbound request filters
102+       {
103+         matcher: {
104+           direction: "inbound",
105+           method: "POST",
106+           pathPattern: "/api/auth/*",
107+           jsonPath: "$.password"
108+         },
109+         action: { type: "redact" }
110+       }
111+     ]
112+   },
113+   pgConfig: {
114+     filters: [
115+       // Database query filters
116+       {
117+         matcher: {
118+           direction: "outbound",
119+           jsonPath: "$.query"
120+         },
121+         action: {
122+           type: "replace",
123+           replaceWith: "SELECT * FROM users WHERE id = ?"
124+         }
125+       }
126+     ]
127+   }
128+ });
134129```
135130
136131## Examples
@@ -146,7 +141,7 @@ Which actually we can get by with?
146141 "pathPattern" : " /api/auth/login" ,
147142 "jsonPath" : " $.password"
148143 },
149- "transform " : {
144+ "action": {
150145 "type" : " redact"
151146 }
152147}
@@ -163,16 +158,15 @@ Which actually we can get by with?
163158 "pathPattern" : " /api/user/lookup" ,
164159 "queryParam" : " ssn"
165160 },
166- "transform " : {
161+ "action": {
167162 "type" : " mask" ,
168163 "maskChar" : " X" ,
169- "preserveLength" : true
170164 }
171165}
172166```
173167
174168**Before**: `/api/user/lookup?ssn=123-45-6789&name=john`
175- ** After** : ` /api/user/lookup?ssn=XXX-XX-XXXX &name=john `
169+ **After**: `/api/user/lookup?ssn=XXXXXXXXXXX&name=john`
176170
177171#### Example: Replace
178172``` json
@@ -181,7 +175,7 @@ Which actually we can get by with?
181175 "direction" : " inbound" ,
182176 "headerName" : " Authorization"
183177 },
184- "transform " : {
178+ "action": {
185179 "type" : " replace" ,
186180 "replaceWith" : " Bearer test-token-12345"
187181 }
@@ -198,7 +192,7 @@ Which actually we can get by with?
198192 "direction" : " inbound" ,
199193 "pathPattern" : " /admin/internal/*"
200194 },
201- "transform " : {
195+ "action": {
202196 "type" : " drop"
203197 }
204198}
@@ -217,7 +211,7 @@ Which actually we can get by with?
217211 "pathPattern" : " /api/user/profile" ,
218212 "jsonPath" : " $.data.creditCard"
219213 },
220- "transform " : {
214+ "action": {
221215 "type" : " redact"
222216 }
223217}
@@ -234,10 +228,9 @@ Which actually we can get by with?
234228 "pathPattern" : " /api/users" ,
235229 "jsonPath" : " $.users[*].phone"
236230 },
237- "transform " : {
231+ "action": {
238232 "type" : " mask" ,
239233 "maskChar" : " *" ,
240- "preserveLength" : false
241234 }
242235}
243236```
@@ -255,7 +248,7 @@ Which actually we can get by with?
255248 "host" : " api.stripe.com" ,
256249 "headerName" : " Authorization"
257250 },
258- "transform " : {
251+ "action": {
259252 "type" : " redact"
260253 },
261254 "description" : " Redact Stripe API keys"
@@ -274,16 +267,15 @@ Which actually we can get by with?
274267 "method" : " POST" ,
275268 "jsonPath" : " $.customer.creditCard.number"
276269 },
277- "transform " : {
270+ "action": {
278271 "type" : " mask" ,
279272 "maskChar" : " *" ,
280- "preserveLength" : true
281273 }
282274}
283275```
284276
285277**Before**: `{"customer": {"creditCard": {"number": "4111111111111111"}}}`
286- ** After** : ` {"customer": {"creditCard": {"number": "************1111 "}}} `
278+ **After**: `{"customer": {"creditCard": {"number": "****************"}}}`
287279
288280#### Example: Replace
289281``` json
@@ -293,7 +285,7 @@ Which actually we can get by with?
293285 "host" : " database.internal.com" ,
294286 "jsonPath" : " $.auth.password"
295287 },
296- "transform " : {
288+ "action": {
297289 "type" : " replace" ,
298290 "replaceWith" : " test-db-password"
299291 }
@@ -313,7 +305,7 @@ Which actually we can get by with?
313305 "host" : " api.external-service.com" ,
314306 "jsonPath" : " $.users[*].email"
315307 },
316- "transform " : {
308+ "action": {
317309 "type" : " redact"
318310 }
319311}
@@ -330,58 +322,31 @@ Which actually we can get by with?
330322 "host" : " api.bank.com" ,
331323 "jsonPath" : " $.accounts[*].accountNumber"
332324 },
333- "transform " : {
325+ "action": {
334326 "type" : " mask" ,
335327 "maskChar" : " X" ,
336- "preserveLength" : false
337328 }
338329}
339330```
340331
341332**Before**: `{"accounts": [{"type": "checking", "accountNumber": "1234567890123456"}]}`
342- ** After** : ` {"accounts": [{"type": "checking", "accountNumber": "XXXXXXXXXXXX3456"}]} `
343-
344- ## Configuration Integration
345-
346- We can just set it up like this at the start.
347-
348- ``` typescript
349- // During SDK initialization
350- TuskDrift .initialize ({
351- httpConfig: {
352- filters: [
353- // Inbound request filters
354- {
355- matcher: {
356- direction: " inbound" ,
357- method: " POST" ,
358- pathPattern: " /api/auth/*" ,
359- jsonPath: " $.password"
360- },
361- transform: { type: " redact" }
362- }
363- ]
364- },
365- pgConfig: {
366- filters: [
367- // Database query filters
368- {
369- matcher: {
370- direction: " outbound" ,
371- jsonPath: " $.query"
372- },
373- transform: {
374- type: " replace" ,
375- replaceWith: " SELECT * FROM users WHERE id = ?"
376- }
377- }
378- ]
379- }
380- });
381- ```
382-
383- We could choose to keep this configuration method in the future.
384- Or, we can migrate to a JSON file most likely.
333+ **After**: `{"accounts": [{"type": "checking", "accountNumber": "XXXXXXXXXXXXXXXX"}]}`
334+
335+ # Implementation details
336+
337+ Each instrumentation will get its own transform engine, since each instrumented
338+ package will probably have its own input and output formats, and hence its own
339+ matchers, actions, configuration, and processing strategy.
340+
341+ Furthermore, we can't just apply transforms at export time, because the spans of
342+ a trace might be exported in different batches.
343+ Hence we have to apply them during span creation.
344+ Right now this matters mainly for span dropping: outbound spans should be
345+ marked as dropped instead of disappearing, so that tests can still run for the
346+ parent trace.
347+ Inbound traces, however, will be dropped entirely, since there is nothing left
348+ to test.
349+ This is similar to opting a whole endpoint out of recording.
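The drop behaviour described above could look roughly like this at span creation time. This is a sketch under assumptions: the `Span` shape, the `dropped` flag, and `applyDrop` are hypothetical and do not reflect the SDK's actual types.

```typescript
// Hypothetical span shape; the real SDK's span type will differ.
interface Span {
  name: string;
  direction: "inbound" | "outbound";
  dropped?: boolean;
}

// Applies a drop decision when the span is created.
// Returns null when nothing should be recorded at all.
function applyDrop(span: Span, matchedDropRule: boolean): Span | null {
  if (!matchedDropRule) {
    return span;
  }
  if (span.direction === "inbound") {
    // Inbound: the whole trace is dropped, so there is nothing to keep.
    return null;
  }
  // Outbound: keep a marker so the parent trace can still be tested.
  return { ...span, dropped: true };
}
```

Returning a marked copy for outbound spans, rather than removing them, keeps the parent trace complete enough to replay.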
385350
386351# Some things to think about
387352