UTF-8 encoder allows encoding code points in the range #xD800 - #xDFFF #47

@Gleefre

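Code points in this range are surrogates and do not represent Unicode characters.
Encoding them also breaks the non-ambiguity of the :utf-8 encoding, since the encoder emits octets that the decoder then rejects: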

(babel:string-to-octets (string (code-char #xd800)))
; => #(237 160 128)
(babel:octets-to-string *)
; Evaluation aborted on #<BABEL-ENCODINGS:CHARACTER-OUT-OF-RANGE {10053D9533}>.
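
To see where the octets above come from, here is a minimal sketch of the generic three-octet UTF-8 bit layout (1110xxxx 10xxxxxx 10xxxxxx) applied without any surrogate check. The function name `utf-8-three-octets` is hypothetical and is not part of Babel; it only illustrates the arithmetic and is not a description of how Babel's encoder is implemented.

```lisp
;; Hypothetical helper for illustration only; not part of Babel.
(defun utf-8-three-octets (code)
  "Encode CODE (#x0800-#xFFFF) with the 3-octet UTF-8 layout, with no surrogate check."
  (assert (<= #x0800 code #xFFFF))
  (vector (logior #xE0 (ldb (byte 4 12) code))   ; 1110xxxx: top 4 bits
          (logior #x80 (ldb (byte 6 6) code))    ; 10xxxxxx: middle 6 bits
          (logior #x80 (ldb (byte 6 0) code))))  ; 10xxxxxx: low 6 bits

(utf-8-three-octets #xD800)
; => #(237 160 128)
```

A conforming encoder would reject any code in #xD800 - #xDFFF before reaching this step.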

For comparison, SBCL signals an error in this case:

(sb-ext:string-to-octets (string (code-char #xd800)))
; Evaluation aborted on #<SB-IMPL::OCTETS-ENCODING-ERROR {10013BEA23}>.

This seems to affect some other UTF/UCS encodings as well (such as :utf-16be and :utf-16le).
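
Until the encoders reject such input themselves, a caller could guard against it before encoding. The following is a rough sketch under that assumption; `surrogate-code-p`, `find-surrogate` and `check-encodable` are hypothetical names, not Babel functions, and the error signalled is a plain `simple-error` rather than any Babel condition.

```lisp
;; Hypothetical helpers for illustration only; not part of Babel.
(defun surrogate-code-p (code)
  "Return true if CODE lies in the surrogate range #xD800-#xDFFF."
  (<= #xD800 code #xDFFF))

(defun find-surrogate (string)
  "Return the index of the first surrogate character in STRING, or NIL."
  (position-if #'surrogate-code-p string :key #'char-code))

(defun check-encodable (string)
  "Signal an error if STRING contains a surrogate; otherwise return STRING."
  (let ((index (find-surrogate string)))
    (when index
      (error "Cannot encode surrogate code point #x~X at index ~D."
             (char-code (char string index)) index))
    string))
```

With these in place, `(babel:string-to-octets (check-encodable s))` would signal an error for a string containing `(code-char #xd800)` instead of producing octets that cannot be decoded. This only applies on implementations where `code-char` returns a character for surrogate codes, as SBCL does in the examples above.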
