Commit 2d544fd

feat: accept non-normalized encoding labels in legacyHookDecode
1 parent 055e0b6 commit 2d544fd

3 files changed

Lines changed: 8 additions & 5 deletions

README.md

Lines changed: 2 additions & 3 deletions
@@ -227,15 +227,14 @@ Given a `TypedArray` or an `ArrayBuffer` instance `input`, returns either of:
 Implements [decode](https://encoding.spec.whatwg.org/#decode) legacy hook.
 
 Given a `TypedArray` or an `ArrayBuffer` instance `input` and an optional `fallbackEncoding`
-normalized encoding name, sniffs encoding from BOM with `fallbackEncoding` fallback and then
+encoding [label](https://encoding.spec.whatwg.org/#names-and-labels),
+sniffs encoding from BOM with `fallbackEncoding` fallback and then
 decodes the `input` using that encoding, skipping BOM if it was present.
 
 Notes:
 
 * BOM-sniffed encoding takes precedence over `fallbackEncoding` option per spec.
   Use with care.
-* `fallbackEncoding` must be ASCII-lowercased encoding name,
-  e.g. a result of `normalizeEncoding(label)` call.
 * Always operates in non-fatal [mode](https://encoding.spec.whatwg.org/#textdecoder-error-mode),
   aka replacement. It can convert different byte sequences to equal strings.
 
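The README now accepts any encoding label as `fallbackEncoding`, not just a pre-normalized name. Under the WHATWG Encoding standard, resolving a label means three steps: strip leading and trailing ASCII whitespace, ASCII-lowercase the result, and look it up in the label table. The sketch below illustrates those steps. `getEncodingFromLabel` and its table are hypothetical. The table is a small subset of the spec's labels (UTF-8 and UTF-16 only) and is not the library's `normalizeEncoding`.

```javascript
// HYPOTHETICAL helper illustrating WHATWG "get an encoding" for a label.
// The table is a small subset of the spec's label list (UTF-8/UTF-16 only).
const LABELS = new Map([
  // UTF-8
  ['unicode-1-1-utf-8', 'utf-8'],
  ['unicode11utf8', 'utf-8'],
  ['unicode20utf8', 'utf-8'],
  ['utf-8', 'utf-8'],
  ['utf8', 'utf-8'],
  ['x-unicode20utf8', 'utf-8'],
  // UTF-16BE
  ['unicodefffe', 'utf-16be'],
  ['utf-16be', 'utf-16be'],
  // UTF-16LE
  ['csunicode', 'utf-16le'],
  ['iso-10646-ucs-2', 'utf-16le'],
  ['ucs-2', 'utf-16le'],
  ['unicode', 'utf-16le'],
  ['unicodefeff', 'utf-16le'],
  ['utf-16', 'utf-16le'],
  ['utf-16le', 'utf-16le'],
])

// ASCII whitespace: TAB, LF, FF, CR, SPACE (nothing else, e.g. not U+00A0).
const ASCII_WS = /^[\t\n\f\r ]+|[\t\n\f\r ]+$/g

function getEncodingFromLabel(label) {
  const trimmed = String(label).replace(ASCII_WS, '')
  // ASCII lowercase only: String#toLowerCase would also fold non-ASCII
  // characters (e.g. U+212A KELVIN SIGN -> 'k'), which the spec does not do.
  const lowered = trimmed.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20))
  return LABELS.get(lowered) ?? null // null stands for the spec's "failure"
}
```

With this, `'Utf8'`, `'unicodefeff'` and `' UnicodeFFFE\n'` resolve to `'utf-8'`, `'utf-16le'` and `'utf-16be'` respectively. A complete table would also cover the single-byte legacy encodings and the `replacement` labels.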
fallback/encoding.js

Lines changed: 2 additions & 2 deletions
@@ -256,13 +256,13 @@ export function getBOMEncoding(input) {
 // https://encoding.spec.whatwg.org/#decode
 // Warning: encoding sniffed from BOM takes preference over the supplied one
 // Warning: lossy, performs replacement, no option of throwing
-// Expects normalized (lower-case) encoding as input. Completely ignores it and even skips validation when BOM is found
+// Completely ignores encoding and even skips validation when BOM is found
 // Unlike TextDecoder public API, additionally supports 'replacement' encoding
 export function legacyHookDecode(input, fallbackEncoding = 'utf-8') {
   let u8 = fromSource(input)
   const bomEncoding = getBOMEncoding(u8)
   if (bomEncoding) u8 = u8.subarray(bomEncoding === 'utf-8' ? 3 : 2)
-  const enc = bomEncoding ?? fallbackEncoding // "the byte order mark is more authoritative than anything else"
+  const enc = bomEncoding ?? normalizeEncoding(fallbackEncoding) // "the byte order mark is more authoritative than anything else"
 
   if (enc === 'utf-8') return utf8toStringLoose(u8)
   if (enc === 'utf-16le' || enc === 'utf-16be') {

tests/encoding/generic.test.js

Lines changed: 4 additions & 0 deletions
@@ -111,6 +111,10 @@ describe('legacyHookDecode', () => {
       ['feffd70020', '\uD700\uFFFD'],
       ['feffd80820', '\uFFFD\uFFFD'],
     ],
+    // non-normalized names
+    Utf8: [['c280', '\x80']],
+    unicodefeff: [['c280', '\u80C2']],
+    UnicodeFFFE: [['c280', '\uC280']],
   }
 
   test('null encoding', (t) => {
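Each new fixture decodes the same two bytes, `c2 80`; only the label differs. Decoded as UTF-8, the pair is a single two-byte sequence for U+0080. Decoded as UTF-16, it is a single 16-bit code unit, and the encoding's byte order decides which byte is the high byte. The sketch below works through that arithmetic. `hexToBytes` and `utf16CodeUnits` are illustrative helpers, not the test suite's own code. They skip surrogate validation and odd trailing bytes.

```javascript
// Illustrative helpers (hypothetical, not from the test suite).
// Parse a hex fixture string such as 'c280' into bytes.
function hexToBytes(hex) {
  const out = new Uint8Array(hex.length / 2)
  for (let i = 0; i < out.length; i++) out[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16)
  return out
}

// Combine byte pairs into 16-bit code units. No surrogate checks, and an odd
// trailing byte is simply dropped here (a real decoder would emit U+FFFD).
function utf16CodeUnits(bytes, littleEndian) {
  const units = []
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    const lo = littleEndian ? bytes[i] : bytes[i + 1]
    const hi = littleEndian ? bytes[i + 1] : bytes[i]
    units.push(lo | (hi << 8))
  }
  return units
}

const bytes = hexToBytes('c280') // Uint8Array [0xc2, 0x80]

// UTF-8 (label 'Utf8'): 0xc2 = 0b110_00010 opens a 2-byte sequence,
// 0x80 = 0b10_000000 continues it; payload bits 00010 + 000000 = 0x80.
const cp = ((bytes[0] & 0x1f) << 6) | (bytes[1] & 0x3f)

// UTF-16LE (label 'unicodefeff'): the first byte is the low byte.
const le = utf16CodeUnits(bytes, true) // [0x80c2]
// UTF-16BE (label 'UnicodeFFFE'): the first byte is the high byte.
const be = utf16CodeUnits(bytes, false) // [0xc280]

console.log(String.fromCodePoint(cp) === '\x80') // true
console.log(String.fromCharCode(...le) === '\u80C2') // true
console.log(String.fromCharCode(...be) === '\uC280') // true
```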
