-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
fix(core): Sanitize lone surrogates in log body and attributes #20245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from 6 commits
1274120
b3fc9db
cae61cd
3572bda
fdfa633
cf4aa9d
5394c90
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,4 @@ | ||
| import type { Attributes } from '../attributes'; | ||
| import { serializeAttributes } from '../attributes'; | ||
| import { getGlobalSingleton } from '../carrier'; | ||
| import type { Client } from '../client'; | ||
|
|
@@ -161,14 +162,14 @@ | |
| const serializedLog: SerializedLog = { | ||
| timestamp, | ||
| level, | ||
| body: message, | ||
| body: _INTERNAL_removeLoneSurrogates(String(message)), | ||
| trace_id: traceContext?.trace_id, | ||
| severity_number: severityNumber ?? SEVERITY_TEXT_TO_SEVERITY_NUMBER[level], | ||
| attributes: { | ||
| attributes: sanitizeLogAttributes({ | ||
| ...serializeAttributes(scopeAttributes), | ||
| ...serializeAttributes(logAttributes, true), | ||
| [sequenceAttr.key]: sequenceAttr.value, | ||
| }, | ||
| }), | ||
| }; | ||
|
|
||
| captureSerializedLog(client, serializedLog); | ||
|
|
@@ -220,3 +221,41 @@ | |
| // The reference to the Client <> LogBuffer map is stored on the carrier to ensure it's always the same | ||
| return getGlobalSingleton('clientToLogBufferMap', () => new WeakMap<Client, Array<SerializedLog>>()); | ||
| } | ||
|
|
||
| /** | ||
| * Sanitizes serialized log attributes by replacing lone surrogates in both | ||
| * keys and string values with U+FFFD. | ||
| */ | ||
| function sanitizeLogAttributes(attributes: Attributes): Attributes { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. m: just thinking out loud here but wdyt about doing this within
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's good feedback @Lms24 👍 I hesitated a bit to make a wider change in the scope of this PR to avoid possible side effects and limited the scope just to logs which I guess are more vulnerable to free-form user/developer text input. Looking at it again I think it makes sense to do this in |
||
| const sanitized: Attributes = {}; | ||
| for (const [key, attr] of Object.entries(attributes)) { | ||
| const sanitizedKey = _INTERNAL_removeLoneSurrogates(key); | ||
| if (attr.type === 'string') { | ||
| sanitized[sanitizedKey] = { ...attr, value: _INTERNAL_removeLoneSurrogates(attr.value as string) }; | ||
| } else { | ||
| sanitized[sanitizedKey] = attr; | ||
| } | ||
| } | ||
| return sanitized; | ||
| } | ||
|
|
||
| /** | ||
| * Replaces unpaired UTF-16 surrogates with U+FFFD (replacement character). | ||
| * | ||
| * Lone surrogates (U+D800–U+DFFF not part of a valid pair) cause `serde_json` | ||
| * on the server to reject the entire log/span batch when they appear in | ||
| * JSON-escaped form (e.g. `\uD800`). Replacing them at the SDK level ensures | ||
| * only the offending characters are lost instead of the whole payload. | ||
| * | ||
| * Uses the native `String.prototype.toWellFormed()` when available | ||
| * (Node 20+, Chrome 111+, Safari 15.4+, Firefox 119+, Hermes). | ||
| * On older runtimes without native support, returns the string as-is. | ||
| */ | ||
| export function _INTERNAL_removeLoneSurrogates(str: string): string { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm given this is quite a lot of code and hence a bundle size impact, have we considered alternatives?
If both options are a "No", that's fine, too. Just want to make sure we have our bases covered.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, handling this in Relay makes a lot more sense. The SDK side fix is probably more of an extra check if a relay fix is deployed. I'll check what would this involve.
I think it is supported. I'll further look into this.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Lms24 This is now ready for another pass 🤞 |
||
| // isWellFormed/toWellFormed are ES2024 (not in our TS lib target), so we feature-detect via Object(). | ||
| const strObj: Record<string, unknown> = Object(str); | ||
| if (typeof strObj['isWellFormed'] === 'function') { | ||
| return (strObj['isWellFormed'] as () => boolean)() ? str : (strObj['toWellFormed'] as () => string)(); | ||
| } | ||
| return str; | ||
antonis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| } | ||
Uh oh!
There was an error while loading. Please reload this page.