Implement SASLprep (RFC 4013) for AES-256 password normalization

pypdf fails to decrypt AES-256 encrypted PDFs when the password contains Unicode
characters, even if the password is correct.

There is a TODO in pypdf/_encryption.py at line 1009 inside verify_v5:

-TODO: use SASLprep process

The PDF specification for AES-256 (Revision 5/6) requires passwords to be normalized 
using SASLprep (RFC 4013) before UTF-8 encoding. Currently _encode_password tries 
latin-1 first then falls back to utf-8 without any SASLprep normalization, so the 
byte representation does not match what a spec-compliant PDF creator produced.
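To make the mismatch concrete, here is a minimal sketch. `encode_latin1_first` is a hypothetical stand-in that mimics the latin-1-then-UTF-8 behavior described above; it is not pypdf's actual code. The "expected" side applies only NFKC, which is one step of SASLprep, as an approximation of what a spec-compliant writer would encode.

```python
import unicodedata


def encode_latin1_first(password: str) -> bytes:
    """Hypothetical stand-in for the behavior described above (not pypdf code)."""
    try:
        return password.encode("latin-1")
    except UnicodeEncodeError:
        return password.encode("utf-8")


# "pässwört" written with precomposed characters (U+00E4, U+00F6).
password = "p\u00e4ssw\u00f6rt"

# Bytes produced by the latin-1-first strategy.
legacy = encode_latin1_first(password)

# Approximation of the spec-compliant bytes: NFKC (one SASLprep step), then UTF-8.
expected = unicodedata.normalize("NFKC", password).encode("utf-8")

print(legacy.hex())        # one byte per character
print(expected.hex())      # two bytes each for the two non-ASCII characters
print(legacy == expected)  # the two byte strings differ
```

For this particular password NFKC leaves the string unchanged, so the difference here comes from the choice of encoding; normalization starts to matter for inputs such as compatibility characters or non-ASCII spaces.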

I reproduced this and confirmed that it fails. The script I used is included below.

## Environment

Which environment were you using when you encountered the problem?

```bash
$ python -m platform
Linux-6.19.14-200.fc43.x86_64-x86_64-with-glibc2.42

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==6.11.0, crypt_provider=('cryptography', '47.0.0'), PIL=12.2.0
```

## Code + PDF

This is a minimal, complete example that shows the issue:

```python
import pikepdf
from pypdf import PdfReader, PasswordType

pdf = pikepdf.Pdf.new()
pdf.save("sample.pdf")

with pikepdf.open("sample.pdf") as pdf:
    pdf.save(
        "encrypted_unicode.pdf",
        encryption=pikepdf.Encryption(
            owner="owner",
            user="pässwört",
            R=6
        )
    )

reader = PdfReader("encrypted_unicode.pdf")
result = reader.decrypt("pässwört")
print(result)
# Output:   0 (PasswordType.NOT_DECRYPTED) with the script as written
# Expected: 1 (PasswordType.USER_PASSWORD), which is the result when user="password"
```
## Possible Fix

The fix would likely go in _encode_password: normalize the password using
SASLprep before UTF-8 encoding, but only for AES-256 encryption revisions
(R=5/6).

Python's standard library exposes stringprep tables but does not provide a
complete RFC 4013 SASLprep implementation.

Before working on it, I wanted to ask which direction would be preferred:

- adding a lightweight dependency such as saslprep
- or implementing a minimal RFC 4013-compatible normalization layer, possibly using
  unicodedata + stringprep, to keep pypdf dependency-free
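For the second option, a rough sketch of what a standard-library-only layer could look like is shown here. The function name `saslprep_sketch` and the `allow_unassigned` flag are hypothetical, and this is my reading of RFC 4013 rather than a reviewed implementation: map, normalize with NFKC, reject prohibited and unassigned code points, then apply the bidirectional check.

```python
import stringprep
import unicodedata

# Tables that RFC 4013 section 2.3 prohibits in the output.
_PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties / deprecated
    stringprep.in_table_c9,       # tagging characters
)


def saslprep_sketch(text: str, allow_unassigned: bool = False) -> str:
    """Hypothetical helper: a minimal SASLprep sketch, not a vetted implementation."""
    # Mapping: drop "commonly mapped to nothing" (B.1) and turn
    # non-ASCII spaces (C.1.2) into U+0020.
    mapped = "".join(
        " " if stringprep.in_table_c12(ch) else ch
        for ch in text
        if not stringprep.in_table_b1(ch)
    )
    # Normalization: NFKC, using the Unicode 3.2 database that stringprep targets.
    result = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)

    for ch in result:
        if any(is_prohibited(ch) for is_prohibited in _PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
        if not allow_unassigned and stringprep.in_table_a1(ch):
            raise ValueError(f"unassigned code point U+{ord(ch):04X}")

    # Bidirectional check: a string containing RandALCat characters (D.1) must
    # not contain LCat characters (D.2) and must start and end with RandALCat.
    if any(stringprep.in_table_d1(ch) for ch in result):
        if any(stringprep.in_table_d2(ch) for ch in result):
            raise ValueError("mixed right-to-left and left-to-right characters")
        if not (stringprep.in_table_d1(result[0]) and stringprep.in_table_d1(result[-1])):
            raise ValueError("right-to-left string must start and end with RandALCat")
    return result


print(saslprep_sketch("I\u00adX"))  # soft hyphen is mapped to nothing
print(saslprep_sketch("\u2168"))    # ROMAN NUMERAL NINE is NFKC-normalized
```

Under these assumptions, `_encode_password` could call such a helper for R=5/6 and then encode the result as UTF-8. Whether unassigned code points should be rejected or allowed for passwords is left open here, which is why it is exposed as a flag.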

Implement SASLprep (RFC 4013) for AES-256 password normalization #3777