Skip to content

Commit 2cb6e5d

Browse files
icanhasmathclaude
andcommitted
Harden header/host control-char validation: unicode values + trailing newline
Two defense-in-depth gaps in the 2.7.18.14 header-injection backports (CVE-2024-6923 / control-char cluster), surfaced in review of PR #85: - wsgiref.headers._check_string only checked `str`, so a `unicode` header name/value carrying control characters bypassed the guard and could still be serialized. Check `basestring` instead. - email.generator.NEWLINE_WITHOUT_FWSP used consuming character classes ([^ \t]) that require a following character, so a value ending in a bare CR/LF/CRLF was not rejected -- the generator then appends its own newline, prematurely terminating the header block. Add an end-of-string anchor so a trailing bare CR/LF is caught too, and drop the redundant `\r\n` branch (a CRLF that is not a valid fold is already matched by the lone-LF branch or the end anchor) -- per PR #86 review. Valid CRLF folding is still permitted. Regression tests extended in test_wsgiref (unicode names/values) and test_email (trailing-newline values). A third review comment, about urlparse._check_bracketed_host leaking TypeError from socket.inet_pton on unicode hosts, does not reproduce: CPython 2.7's "s" arg parser encodes unicode, so invalid hosts raise UnicodeEncodeError (a ValueError subclass) or socket.error, both already handled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent fdf0d24 commit 2cb6e5d

4 files changed

Lines changed: 22 additions & 5 deletions

File tree

Lib/email/generator.py

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -16,9 +16,11 @@
1616
from email.errors import HeaderWriteError
1717

1818
# Matches a CR/LF that is NOT part of a valid header folding (i.e. not
19-
# immediately followed by folding whitespace). Used to detect injected
20-
# newlines in generated headers (CVE-2024-6923).
21-
NEWLINE_WITHOUT_FWSP = re.compile(r'\r\n[^ \t]|\r[^ \n\t]|\n[^ \t]')
19+
# immediately followed by folding whitespace), including a bare CR/LF at the
20+
# end of the string. Used to detect injected newlines in generated headers
21+
# (CVE-2024-6923). A '\r\n' need not be matched on its own: a CRLF that isn't
22+
# a valid fold is already caught by the lone-LF branch or the end anchor.
23+
NEWLINE_WITHOUT_FWSP = re.compile(r'\r[^ \n\t]|\n[^ \t]|[\r\n]$')
2224

2325
UNDERSCORE = '_'
2426
NL = '\n'

Lib/email/test/test_email.py

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -87,7 +87,13 @@ def test_string_rejects_header_injection(self):
8787
from cStringIO import StringIO
8888
for bad in ('value\r\nInjected: header',
8989
'value\nInjected: header',
90-
'value\rstuff'):
90+
'value\rstuff',
91+
# A bare CR/LF/CRLF at end-of-string is also an injected
92+
# newline: the generator appends its own newline after it,
93+
# prematurely terminating the header block.
94+
'value\n',
95+
'value\r',
96+
'value\r\n'):
9197
msg = Message()
9298
msg['Subject'] = bad
9399
g = Generator(StringIO(), maxheaderlen=0)

Lib/test/test_wsgiref.py

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -309,8 +309,14 @@ def testControlCharactersRejected(self):
309309
self.assertRaises(ValueError, h.add_header, 'Foo', 'a\nb')
310310
self.assertRaises(ValueError, h.add_header, 'Foo', 'ok', baz='x\ry')
311311
self.assertRaises(ValueError, Headers, [('Foo', 'a\nb')])
312+
# unicode names/values must be rejected too (a str-only check would
313+
# let a unicode value carrying control characters slip through).
314+
self.assertRaises(ValueError, h.__setitem__, 'Foo', u'bar\r\nInjected: 1')
315+
self.assertRaises(ValueError, h.__setitem__, u'Ba\nd', 'value')
316+
self.assertRaises(ValueError, Headers, [(u'Foo', u'a\nb')])
312317
# Benign headers still work.
313318
h['Foo'] = 'bar'
319+
h[u'Bar'] = u'baz'
314320
h.add_header('Content-Disposition', 'attachment', filename='ok.txt')
315321

316322
def testMappingInterface(self):

Lib/wsgiref/headers.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,10 @@
1818

1919
def _check_string(value):
2020
"""Reject header names/values containing control characters."""
21-
if isinstance(value, str) and _control_chars_re.search(value):
21+
# Check both str and unicode: in Python 2 a unicode header value would
22+
# otherwise bypass the guard and could still be serialized, leaving the
23+
# response-splitting protection incomplete.
24+
if isinstance(value, basestring) and _control_chars_re.search(value):
2225
raise ValueError("Control characters not allowed in headers")
2326
return value
2427

0 commit comments

Comments
 (0)