Commit f832394

Fetch transforms (#24)
* add fetch transform engine * fix lots of lint errors * fix deps * fix test
1 parent bd94b5c commit f832394

26 files changed

Lines changed: 1576 additions & 234 deletions

docs/Transforms.md

Lines changed: 79 additions & 114 deletions
Original file line number | Diff line number | Diff line change
@@ -1,9 +1,9 @@
11
Here we talk about PII redaction rules.
2-
Tentatively I have just called them "filters".
2+
These are called "transforms".
33

44
## Proposed syntax
55

6-
Each filter is made up of two parts: a matcher, and a transform.
6+
Each transform is made up of two parts: a matcher, and an action.
77

88
**Matcher**
99

@@ -29,7 +29,7 @@ This configuration syntax was chosen mainly because of simplicity.
2929
It is easy to explain, and to understand.
3030
The thing that matches, is the thing that satisfies all conditions at the same
3131
time.
32-
Some pseudo code gets the idea across
32+
Some pseudo code gets the idea across (although it's not that simple)
3333
```
3434
def matched(matcher, method, path, jsonBody, queryParams, etc...):
3535
if matcher.pathPattern:
@@ -53,44 +53,23 @@ Look at the first code example.
5353
It is currently ambiguous if we want to redact the path or the JSON body.
5454
Common sense tells us it is the body, but that still leaves room for confusion.
5555
What if both a query param and a json path is provided?
56-
Here is a quick set of rules to make it work and still be common sensical
5756
enough:
58-
- We can split matches into "common fields" and "matching fields".
57+
58+
Therefore,
59+
- We split matches into "common fields" and "matching fields".
5960
- There can only be one matching field. There can be many common fields.
60-
- Common fields are like `method`, `pathPattern`, etc.
61-
- Matching fields are like `jsonPath`, `queryPath`, etc.
62-
- Common fields are things nobody really cares to redact. They're just used to
63-
narrow down the search.
64-
- Matching fields are things that will be redacted. Only one is allowed to be
65-
present so that it's clear what is going to be modified.
66-
67-
Available common fields:
68-
- `method`
69-
- `pathPattern`
70-
- `host` (have to investigate what the SDK sees, `127.0.0.1` might not always
71-
represent localhost for example)
72-
73-
Available matching fields:
74-
- `jsonPath`
75-
- `queryPath`
76-
- `headerPath`
77-
78-
Questions:
79-
- Does this cover most (sensible) cases? Are most things matchable with this
80-
syntax?
81-
- Is this intuitive enough?
82-
83-
84-
**Transforms**
85-
86-
Transforms specify how to mutate the span.
87-
This is relatively simple.
88-
There only need to be three kinds of transforms:
61+
- Common fields are fields you don't redact, and only use for matching, like `method`, `pathPattern`, etc.
62+
- Matching fields are the other fields like `jsonPath`, `queryPath`, etc. Only
63+
one is present so that it's clear what is going to be modified.
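The "at most one matching field" rule above can be checked mechanically. The following is only a sketch: the `Matcher` shape and the `validateMatcher` helper are hypothetical, and the field names (`jsonPath`, `queryParam`, `headerName`) are taken from the JSON examples in this document rather than from the SDK's actual types.

```typescript
// Hypothetical matcher shape; field names follow the JSON examples in this document.
interface Matcher {
  // Common fields: only narrow down which requests match.
  direction?: "inbound" | "outbound";
  method?: string;
  pathPattern?: string;
  host?: string;
  // Matching fields: select the value that the action will modify.
  jsonPath?: string;
  queryParam?: string;
  headerName?: string;
}

const MATCHING_FIELDS = ["jsonPath", "queryParam", "headerName"] as const;

// Lists the matching fields that are actually set on a matcher.
function matchingFieldsOf(matcher: Matcher): string[] {
  return MATCHING_FIELDS.filter((field) => matcher[field] !== undefined);
}

// Throws if more than one matching field is set, since it would then be
// ambiguous which value the action should modify. Zero matching fields is
// allowed (for example, a "drop" action matched on pathPattern alone).
function validateMatcher(matcher: Matcher): void {
  const present = matchingFieldsOf(matcher);
  if (present.length > 1) {
    throw new Error(
      `Matcher has multiple matching fields: ${present.join(", ")}`
    );
  }
}
```

With this check, `{ method: "POST", jsonPath: "$.user.password" }` passes, while a matcher that sets both `jsonPath` and `queryParam` is rejected before any span is processed.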
64+
65+
**Actions**
66+
67+
Actions specify how to mutate the span.
8968
1. Redact. This replaces the value with a hash.
9069
2. Masking. This replaces the value with a repeated (or random) character. This
9170
satisfies the use case of fixed width strings (phone numbers, zip codes,
9271
etc.)
93-
3. Custom. User specifies a string to act as replacement. This can be used for
72+
3. Replace. User specifies a string to act as replacement. This can be used for
9473
things like testing token, etc.
9574
4. Drop. Just drops the whole span.
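As a rough illustration of the four actions listed above, here is a minimal sketch. The `Action` type and `applyAction` function are hypothetical names, not the SDK's API. The choice of SHA-256 for redaction is an assumption (the document only says "a hash"), and masking is assumed to replace every character, which is consistent with the Before/After examples in this document.

```typescript
import { createHash } from "node:crypto";

// Hypothetical action shapes, modeled on the JSON examples in this document.
type Action =
  | { type: "redact" }
  | { type: "mask"; maskChar?: string }
  | { type: "replace"; replaceWith: string }
  | { type: "drop" };

// Returns the transformed value, or null to signal that the whole span
// should be dropped.
function applyAction(value: string, action: Action): string | null {
  switch (action.type) {
    case "redact":
      // Assumption: SHA-256 hex digest stands in for "a hash".
      return createHash("sha256").update(value).digest("hex");
    case "mask": {
      // Use a single mask character; fall back to "*" if none is given.
      const ch = (action.maskChar ?? "*").charAt(0) || "*";
      // Every character is replaced, so the original length is preserved.
      return ch.repeat(value.length);
    }
    case "replace":
      return action.replaceWith;
    case "drop":
      return null;
  }
}
```

Returning `null` for `drop` keeps the decision about how to drop the span (remove it or mark it) with the caller, which matters because inbound and outbound spans may need different handling.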
9675

@@ -102,7 +81,7 @@ An example config:
10281
"method": "POST",
10382
"jsonPath": "$.user.password"
10483
},
105-
"transform": {
84+
"action": {
10685
"type": "redact"
10786
}
10887
}
@@ -111,26 +90,42 @@ An example config:
11190
## Configuration
11291

11392
Each instrumentation will have their own configuration.
114-
The user will specify this during `initialize` under the `<xyz>Config` field,
115-
where xyz is the instrumentation module they're configuring.
116-
For example, `httpConfig` holds stuff regarding the HTTP instrumentation (not
117-
necessarily just filters!).
118-
The big class then just injects the configurations to the right instrumentation
119-
module.
120-
121-
One idea I had is, since it's all in code anyway, we could just have users
122-
provide functions that we call.
123-
That's great iff we don't want to be able to serialize configs.
124-
In other words, we can consider doing it this way only if
125-
- We don't really need users to provide a config file, just do it in code
126-
- We don't really need to save the config file (for example to share with other
127-
services)
128-
Which actually we can get by with?
129-
130-
## Refined Transform Configuration Format
93+
This would ideally sit in the config.yaml, though it's also possible to supply it
94+
as an argument to `initialize()`.
13195

13296
```typescript
133-
97+
// During SDK initialization
98+
TuskDrift.initialize({
99+
httpConfig: {
100+
filters: [
101+
// Inbound request filters
102+
{
103+
matcher: {
104+
direction: "inbound",
105+
method: "POST",
106+
pathPattern: "/api/auth/*",
107+
jsonPath: "$.password"
108+
},
109+
action: { type: "redact" }
110+
}
111+
]
112+
},
113+
pgConfig: {
114+
filters: [
115+
// Database query filters
116+
{
117+
matcher: {
118+
direction: "outbound",
119+
jsonPath: "$.query"
120+
},
121+
action: {
122+
type: "replace",
123+
replaceWith: "SELECT * FROM users WHERE id = ?"
124+
}
125+
}
126+
]
127+
}
128+
});
134129
```
135130

136131
## Examples
@@ -146,7 +141,7 @@ Which actually we can get by with?
146141
"pathPattern": "/api/auth/login",
147142
"jsonPath": "$.password"
148143
},
149-
"transform": {
144+
"action": {
150145
"type": "redact"
151146
}
152147
}
@@ -163,16 +158,15 @@ Which actually we can get by with?
163158
"pathPattern": "/api/user/lookup",
164159
"queryParam": "ssn"
165160
},
166-
"transform": {
161+
"action": {
167162
"type": "mask",
168163
"maskChar": "X",
169-
"preserveLength": true
170164
}
171165
}
172166
```
173167

174168
**Before**: `/api/user/lookup?ssn=123-45-6789&name=john`
175-
**After**: `/api/user/lookup?ssn=XXX-XX-XXXX&name=john`
169+
**After**: `/api/user/lookup?ssn=XXXXXXXXXXX&name=john`
176170

177171
#### Example: Replace
178172
```json
@@ -181,7 +175,7 @@ Which actually we can get by with?
181175
"direction": "inbound",
182176
"headerName": "Authorization"
183177
},
184-
"transform": {
178+
"action": {
185179
"type": "replace",
186180
"replaceWith": "Bearer test-token-12345"
187181
}
@@ -198,7 +192,7 @@ Which actually we can get by with?
198192
"direction": "inbound",
199193
"pathPattern": "/admin/internal/*"
200194
},
201-
"transform": {
195+
"action": {
202196
"type": "drop"
203197
}
204198
}
@@ -217,7 +211,7 @@ Which actually we can get by with?
217211
"pathPattern": "/api/user/profile",
218212
"jsonPath": "$.data.creditCard"
219213
},
220-
"transform": {
214+
"action": {
221215
"type": "redact"
222216
}
223217
}
@@ -234,10 +228,9 @@ Which actually we can get by with?
234228
"pathPattern": "/api/users",
235229
"jsonPath": "$.users[*].phone"
236230
},
237-
"transform": {
231+
"action": {
238232
"type": "mask",
239233
"maskChar": "*",
240-
"preserveLength": false
241234
}
242235
}
243236
```
@@ -255,7 +248,7 @@ Which actually we can get by with?
255248
"host": "api.stripe.com",
256249
"headerName": "Authorization"
257250
},
258-
"transform": {
251+
"action": {
259252
"type": "redact"
260253
},
261254
"description": "Redact Stripe API keys"
@@ -274,16 +267,15 @@ Which actually we can get by with?
274267
"method": "POST",
275268
"jsonPath": "$.customer.creditCard.number"
276269
},
277-
"transform": {
270+
"action": {
278271
"type": "mask",
279272
"maskChar": "*",
280-
"preserveLength": true
281273
}
282274
}
283275
```
284276

285277
**Before**: `{"customer": {"creditCard": {"number": "4111111111111111"}}}`
286-
**After**: `{"customer": {"creditCard": {"number": "************1111"}}}`
278+
**After**: `{"customer": {"creditCard": {"number": "****************"}}}`
287279

288280
#### Example: Replace
289281
```json
@@ -293,7 +285,7 @@ Which actually we can get by with?
293285
"host": "database.internal.com",
294286
"jsonPath": "$.auth.password"
295287
},
296-
"transform": {
288+
"action": {
297289
"type": "replace",
298290
"replaceWith": "test-db-password"
299291
}
@@ -313,7 +305,7 @@ Which actually we can get by with?
313305
"host": "api.external-service.com",
314306
"jsonPath": "$.users[*].email"
315307
},
316-
"transform": {
308+
"action": {
317309
"type": "redact"
318310
}
319311
}
@@ -330,58 +322,31 @@ Which actually we can get by with?
330322
"host": "api.bank.com",
331323
"jsonPath": "$.accounts[*].accountNumber"
332324
},
333-
"transform": {
325+
"action": {
334326
"type": "mask",
335327
"maskChar": "X",
336-
"preserveLength": false
337328
}
338329
}
339330
```
340331

341332
**Before**: `{"accounts": [{"type": "checking", "accountNumber": "1234567890123456"}]}`
342-
**After**: `{"accounts": [{"type": "checking", "accountNumber": "XXXXXXXXXXXX3456"}]}`
343-
344-
## Configuration Integration
345-
346-
We can just set it up like this at the start.
347-
348-
```typescript
349-
// During SDK initialization
350-
TuskDrift.initialize({
351-
httpConfig: {
352-
filters: [
353-
// Inbound request filters
354-
{
355-
matcher: {
356-
direction: "inbound",
357-
method: "POST",
358-
pathPattern: "/api/auth/*",
359-
jsonPath: "$.password"
360-
},
361-
transform: { type: "redact" }
362-
}
363-
]
364-
},
365-
pgConfig: {
366-
filters: [
367-
// Database query filters
368-
{
369-
matcher: {
370-
direction: "outbound",
371-
jsonPath: "$.query"
372-
},
373-
transform: {
374-
type: "replace",
375-
replaceWith: "SELECT * FROM users WHERE id = ?"
376-
}
377-
}
378-
]
379-
}
380-
});
381-
```
382-
383-
We could choose to keep this configuration method in the future.
384-
Or, we can migrate to a JSON file most likely.
333+
**After**: `{"accounts": [{"type": "checking", "accountNumber": "XXXXXXXXXXXXXXXX"}]}`
334+
335+
# Implementation details
336+
337+
Each instrumentation will get its own transform engine, since each different
338+
package will probably have its own input output formats, and hence transform
339+
matchers and actions, and therefore configurations and processing strategies.
340+
341+
Furthermore we can't just apply transforms at export time because the spans and
342+
traces might be exported in different batches.
343+
Hence we have to do this during span creation.
344+
Right now this is mainly for span dropping, because outbound spans should be
345+
marked as dropped instead of disappearing, so that tests can still run for the
346+
parent trace.
347+
However incoming traces will be dropped entirely, since there's no way to test
348+
further.
349+
Kind of like opting a whole endpoint out of recording.
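The difference between dropping inbound and outbound spans described above could be sketched as follows. `SpanRecord` and `applyDrop` are hypothetical names used only for illustration, and omitting the payload from the placeholder is an assumption, since the document only says the span is "marked as dropped".

```typescript
// Hypothetical, simplified span shape for illustration only.
interface SpanRecord {
  direction: "inbound" | "outbound";
  name: string;
  payload?: unknown;
  dropped?: boolean;
}

// Applies a "drop" action at span creation time.
// - Inbound: returns null, so nothing is recorded for that request.
// - Outbound: returns a placeholder marked as dropped, so the parent
//   trace still has an entry for the call and its tests can run.
function applyDrop(span: SpanRecord): SpanRecord | null {
  if (span.direction === "inbound") {
    return null;
  }
  // Assumption: the placeholder carries no payload.
  return { direction: span.direction, name: span.name, dropped: true };
}
```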
385350

386351
# Some things to think about
387352

package-lock.json

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 2 additions & 1 deletion
@@ -34,7 +34,8 @@
3434
"clean": "rm -rf dist",
3535
"lint": "eslint 'src/**/*.ts'",
3636
"lint:fix": "eslint 'src/**/*.ts' --fix",
37-
"test": "ava --workerThreads false"
37+
"test": "ava --workerThreads false",
38+
"test:docker": "docker compose -f docker-compose.test.yml up -d --wait"
3839
},
3940
"keywords": [
4041
"instrumentation",

src/core/TuskDrift.ts

Lines changed: 2 additions & 1 deletion
@@ -32,7 +32,7 @@ import {
3232
TuskConfig,
3333
OriginalGlobalUtils,
3434
} from "./utils";
35-
import { TransformConfigs } from "../instrumentation/libraries/http/HttpTransformEngine";
35+
import { TransformConfigs } from "../instrumentation/libraries/types";
3636

3737
export interface InitParams {
3838
apiKey?: string;
@@ -168,6 +168,7 @@ export class TuskDriftCore {
168168
new FetchInstrumentation({
169169
enabled: true,
170170
mode: this.mode,
171+
transforms,
171172
});
172173

173174
new TcpInstrumentation({

src/core/utils/configUtils.ts

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ import fs from "fs";
22
import path from "path";
33
import yaml from "js-yaml";
44
import { logger } from "./logger";
5-
import { TransformConfigs } from "src/instrumentation/libraries/http/HttpTransformEngine";
5+
import { TransformConfigs } from "src/instrumentation/libraries/types";
66

77
export interface TuskConfig {
88
service?: {
