Fix charset decoding for server.request.body.files_content in commons-fileupload#11212
Conversation
…ileItemContentReader
… reuse across integrations
…Test Charset fallback scenarios are covered by MultipartContentDecoderTest. One integration test is kept to verify that getContentType() is passed through.
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 63 metrics, 8 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~33f9eb1764, baseline=1.62.0-SNAPSHOT~75fe2b3c55
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.064 s) : 0, 1064236
Total [baseline] (8.825 s) : 0, 8825491
Agent [candidate] (1.064 s) : 0, 1064393
Total [candidate] (8.837 s) : 0, 8837398
section iast
Agent [baseline] (1.241 s) : 0, 1241349
Total [baseline] (9.528 s) : 0, 9527842
Agent [candidate] (1.248 s) : 0, 1247917
Total [candidate] (9.545 s) : 0, 9545178
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~33f9eb1764, baseline=1.62.0-SNAPSHOT~75fe2b3c55
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.232 ms) : 0, 1232
crashtracking [candidate] (1.218 ms) : 0, 1218
BytebuddyAgent [baseline] (636.228 ms) : 0, 636228
BytebuddyAgent [candidate] (635.815 ms) : 0, 635815
AgentMeter [baseline] (29.472 ms) : 0, 29472
AgentMeter [candidate] (29.47 ms) : 0, 29470
GlobalTracer [baseline] (249.048 ms) : 0, 249048
GlobalTracer [candidate] (249.158 ms) : 0, 249158
AppSec [baseline] (32.761 ms) : 0, 32761
AppSec [candidate] (32.626 ms) : 0, 32626
Debugger [baseline] (60.495 ms) : 0, 60495
Debugger [candidate] (59.872 ms) : 0, 59872
Remote Config [baseline] (598.483 µs) : 0, 598
Remote Config [candidate] (598.748 µs) : 0, 599
Telemetry [baseline] (8.366 ms) : 0, 8366
Telemetry [candidate] (8.37 ms) : 0, 8370
Flare Poller [baseline] (9.929 ms) : 0, 9929
Flare Poller [candidate] (11.26 ms) : 0, 11260
section iast
crashtracking [baseline] (1.225 ms) : 0, 1225
crashtracking [candidate] (1.221 ms) : 0, 1221
BytebuddyAgent [baseline] (821.637 ms) : 0, 821637
BytebuddyAgent [candidate] (827.744 ms) : 0, 827744
AgentMeter [baseline] (11.275 ms) : 0, 11275
AgentMeter [candidate] (11.271 ms) : 0, 11271
GlobalTracer [baseline] (237.53 ms) : 0, 237530
GlobalTracer [candidate] (237.381 ms) : 0, 237381
AppSec [baseline] (30.496 ms) : 0, 30496
AppSec [candidate] (30.574 ms) : 0, 30574
Debugger [baseline] (62.45 ms) : 0, 62450
Debugger [candidate] (62.557 ms) : 0, 62557
Remote Config [baseline] (534.095 µs) : 0, 534
Remote Config [candidate] (527.302 µs) : 0, 527
Telemetry [baseline] (7.935 ms) : 0, 7935
Telemetry [candidate] (7.926 ms) : 0, 7926
Flare Poller [baseline] (3.345 ms) : 0, 3345
Flare Poller [candidate] (3.344 ms) : 0, 3344
IAST [baseline] (28.955 ms) : 0, 28955
IAST [candidate] (29.225 ms) : 0, 29225
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~33f9eb1764, baseline=1.62.0-SNAPSHOT~75fe2b3c55
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.071 s) : 0, 1070587
Total [baseline] (11.074 s) : 0, 11073615
Agent [candidate] (1.064 s) : 0, 1064355
Total [candidate] (11.111 s) : 0, 11111029
section appsec
Agent [baseline] (1.279 s) : 0, 1279411
Total [baseline] (11.19 s) : 0, 11189722
Agent [candidate] (1.269 s) : 0, 1268854
Total [candidate] (11.152 s) : 0, 11152139
section iast
Agent [baseline] (1.247 s) : 0, 1247351
Total [baseline] (11.263 s) : 0, 11263068
Agent [candidate] (1.243 s) : 0, 1243260
Total [candidate] (11.211 s) : 0, 11210656
section profiling
Agent [baseline] (1.19 s) : 0, 1190027
Total [baseline] (10.992 s) : 0, 10991747
Agent [candidate] (1.204 s) : 0, 1203760
Total [candidate] (11.022 s) : 0, 11021941
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~33f9eb1764, baseline=1.62.0-SNAPSHOT~75fe2b3c55
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.228 ms) : 0, 1228
crashtracking [candidate] (1.21 ms) : 0, 1210
BytebuddyAgent [baseline] (638.614 ms) : 0, 638614
BytebuddyAgent [candidate] (635.624 ms) : 0, 635624
AgentMeter [baseline] (29.714 ms) : 0, 29714
AgentMeter [candidate] (29.607 ms) : 0, 29607
GlobalTracer [baseline] (250.617 ms) : 0, 250617
GlobalTracer [candidate] (249.936 ms) : 0, 249936
AppSec [baseline] (33.089 ms) : 0, 33089
AppSec [candidate] (32.889 ms) : 0, 32889
Debugger [baseline] (60.753 ms) : 0, 60753
Debugger [candidate] (60.927 ms) : 0, 60927
Remote Config [baseline] (601.425 µs) : 0, 601
Remote Config [candidate] (609.0 µs) : 0, 609
Telemetry [baseline] (10.749 ms) : 0, 10749
Telemetry [candidate] (9.183 ms) : 0, 9183
Flare Poller [baseline] (9.127 ms) : 0, 9127
Flare Poller [candidate] (8.438 ms) : 0, 8438
section appsec
crashtracking [baseline] (1.24 ms) : 0, 1240
crashtracking [candidate] (1.223 ms) : 0, 1223
BytebuddyAgent [baseline] (686.073 ms) : 0, 686073
BytebuddyAgent [candidate] (674.611 ms) : 0, 674611
AgentMeter [baseline] (12.438 ms) : 0, 12438
AgentMeter [candidate] (12.237 ms) : 0, 12237
GlobalTracer [baseline] (251.247 ms) : 0, 251247
GlobalTracer [candidate] (250.792 ms) : 0, 250792
AppSec [baseline] (185.726 ms) : 0, 185726
AppSec [candidate] (186.52 ms) : 0, 186520
Debugger [baseline] (65.468 ms) : 0, 65468
Debugger [candidate] (65.283 ms) : 0, 65283
Remote Config [baseline] (589.489 µs) : 0, 589
Remote Config [candidate] (562.502 µs) : 0, 563
Telemetry [baseline] (7.988 ms) : 0, 7988
Telemetry [candidate] (7.899 ms) : 0, 7899
Flare Poller [baseline] (6.789 ms) : 0, 6789
Flare Poller [candidate] (8.145 ms) : 0, 8145
IAST [baseline] (24.874 ms) : 0, 24874
IAST [candidate] (24.951 ms) : 0, 24951
section iast
crashtracking [baseline] (1.219 ms) : 0, 1219
crashtracking [candidate] (1.246 ms) : 0, 1246
BytebuddyAgent [baseline] (825.872 ms) : 0, 825872
BytebuddyAgent [candidate] (822.708 ms) : 0, 822708
AgentMeter [baseline] (11.444 ms) : 0, 11444
AgentMeter [candidate] (11.291 ms) : 0, 11291
GlobalTracer [baseline] (237.775 ms) : 0, 237775
GlobalTracer [candidate] (237.555 ms) : 0, 237555
AppSec [baseline] (32.167 ms) : 0, 32167
AppSec [candidate] (30.661 ms) : 0, 30661
Debugger [baseline] (64.118 ms) : 0, 64118
Debugger [candidate] (63.798 ms) : 0, 63798
Remote Config [baseline] (532.535 µs) : 0, 533
Remote Config [candidate] (522.196 µs) : 0, 522
Telemetry [baseline] (7.961 ms) : 0, 7961
Telemetry [candidate] (7.909 ms) : 0, 7909
Flare Poller [baseline] (3.455 ms) : 0, 3455
Flare Poller [candidate] (3.424 ms) : 0, 3424
IAST [baseline] (26.625 ms) : 0, 26625
IAST [candidate] (28.15 ms) : 0, 28150
section profiling
crashtracking [baseline] (1.179 ms) : 0, 1179
crashtracking [candidate] (1.208 ms) : 0, 1208
BytebuddyAgent [baseline] (694.486 ms) : 0, 694486
BytebuddyAgent [candidate] (704.139 ms) : 0, 704139
AgentMeter [baseline] (8.96 ms) : 0, 8960
AgentMeter [candidate] (9.095 ms) : 0, 9095
GlobalTracer [baseline] (208.714 ms) : 0, 208714
GlobalTracer [candidate] (210.324 ms) : 0, 210324
AppSec [baseline] (32.757 ms) : 0, 32757
AppSec [candidate] (33.182 ms) : 0, 33182
Debugger [baseline] (66.122 ms) : 0, 66122
Debugger [candidate] (66.31 ms) : 0, 66310
Remote Config [baseline] (594.393 µs) : 0, 594
Remote Config [candidate] (583.26 µs) : 0, 583
Telemetry [baseline] (8.109 ms) : 0, 8109
Telemetry [candidate] (8.177 ms) : 0, 8177
Flare Poller [baseline] (3.533 ms) : 0, 3533
Flare Poller [candidate] (3.671 ms) : 0, 3671
ProfilingAgent [baseline] (93.988 ms) : 0, 93988
ProfilingAgent [candidate] (94.885 ms) : 0, 94885
Profiling [baseline] (94.544 ms) : 0, 94544
Profiling [candidate] (95.44 ms) : 0, 95440
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 21 metrics, 15 unstable metrics. Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~33f9eb1764, baseline=1.62.0-SNAPSHOT~75fe2b3c55
dateFormat X
axisFormat %s
section baseline
no_agent (19.55 ms) : 19353, 19746
. : milestone, 19550,
appsec (18.735 ms) : 18546, 18924
. : milestone, 18735,
code_origins (17.975 ms) : 17798, 18152
. : milestone, 17975,
iast (17.978 ms) : 17798, 18157
. : milestone, 17978,
profiling (18.788 ms) : 18601, 18975
. : milestone, 18788,
tracing (17.933 ms) : 17754, 18111
. : milestone, 17933,
section candidate
no_agent (19.372 ms) : 19176, 19569
. : milestone, 19372,
appsec (18.68 ms) : 18492, 18868
. : milestone, 18680,
code_origins (17.776 ms) : 17599, 17952
. : milestone, 17776,
iast (17.887 ms) : 17709, 18064
. : milestone, 17887,
profiling (18.438 ms) : 18255, 18622
. : milestone, 18438,
tracing (17.694 ms) : 17519, 17869
. : milestone, 17694,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~33f9eb1764, baseline=1.62.0-SNAPSHOT~75fe2b3c55
dateFormat X
axisFormat %s
section baseline
no_agent (1.27 ms) : 1258, 1282
. : milestone, 1270,
iast (3.318 ms) : 3272, 3364
. : milestone, 3318,
iast_FULL (6.068 ms) : 6006, 6130
. : milestone, 6068,
iast_GLOBAL (3.689 ms) : 3627, 3750
. : milestone, 3689,
profiling (2.267 ms) : 2246, 2289
. : milestone, 2267,
tracing (1.953 ms) : 1936, 1969
. : milestone, 1953,
section candidate
no_agent (1.237 ms) : 1225, 1249
. : milestone, 1237,
iast (3.374 ms) : 3328, 3420
. : milestone, 3374,
iast_FULL (5.983 ms) : 5922, 6045
. : milestone, 5983,
iast_GLOBAL (3.686 ms) : 3625, 3747
. : milestone, 3686,
profiling (2.152 ms) : 2133, 2172
. : milestone, 2152,
tracing (1.932 ms) : 1916, 1948
. : milestone, 1932,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 2 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~33f9eb1764, baseline=1.62.0-SNAPSHOT~75fe2b3c55
dateFormat X
axisFormat %s
section baseline
no_agent (14.932 s) : 14932000, 14932000
. : milestone, 14932000,
appsec (14.726 s) : 14726000, 14726000
. : milestone, 14726000,
iast (18.183 s) : 18183000, 18183000
. : milestone, 18183000,
iast_GLOBAL (18.028 s) : 18028000, 18028000
. : milestone, 18028000,
profiling (15.033 s) : 15033000, 15033000
. : milestone, 15033000,
tracing (14.76 s) : 14760000, 14760000
. : milestone, 14760000,
section candidate
no_agent (15.457 s) : 15457000, 15457000
. : milestone, 15457000,
appsec (14.523 s) : 14523000, 14523000
. : milestone, 14523000,
iast (19.089 s) : 19089000, 19089000
. : milestone, 19089000,
iast_GLOBAL (17.891 s) : 17891000, 17891000
. : milestone, 17891000,
profiling (15.069 s) : 15069000, 15069000
. : milestone, 15069000,
tracing (14.763 s) : 14763000, 14763000
. : milestone, 14763000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~33f9eb1764, baseline=1.62.0-SNAPSHOT~75fe2b3c55
dateFormat X
axisFormat %s
section baseline
no_agent (1.486 ms) : 1475, 1498
. : milestone, 1486,
appsec (3.822 ms) : 3601, 4043
. : milestone, 3822,
iast (2.284 ms) : 2214, 2355
. : milestone, 2284,
iast_GLOBAL (2.314 ms) : 2243, 2384
. : milestone, 2314,
profiling (2.111 ms) : 2056, 2167
. : milestone, 2111,
tracing (2.074 ms) : 2020, 2128
. : milestone, 2074,
section candidate
no_agent (1.482 ms) : 1471, 1494
. : milestone, 1482,
appsec (3.784 ms) : 3567, 4001
. : milestone, 3784,
iast (2.28 ms) : 2210, 2350
. : milestone, 2280,
iast_GLOBAL (2.319 ms) : 2248, 2390
. : milestone, 2319,
profiling (2.526 ms) : 2360, 2693
. : milestone, 2526,
tracing (2.072 ms) : 2018, 2126
. : milestone, 2072,
|
…entDecoder Replaces hardcoded UTF-8 (no-charset default) and ISO-8859-1 (fallback) with Charset.defaultCharset() in both cases, per reviewer feedback.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e74778b436
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…partContentDecoder RFC 2045 allows quoted parameter values (charset="UTF-8"). Without stripping the quotes Charset.forName rejects the name and decodeBytes falls back to the JVM default instead of the declared charset.
…ecoder String#split is forbidden (uses regex internally). Replace with an explicit char scan to find the first ; , or space after charset=.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 228e1c467a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…oder.extractCharset Replace toLowerCase(Locale.ROOT).indexOf with an inline ASCII case-insensitive scan to avoid allocating a full lowercase copy of the Content-Type string. Also use the already-computed end variable as the loop bound.
All files in this package (StoredByteBody, StoredBodySupplier, MultipartContentDecoder, etc.) are AppSec HTTP body inspection infrastructure.
When FileItemContentReader truncates at MAX_CONTENT_BYTES a cut in the middle of a multibyte character no longer triggers the fallback path. REPLACE substitutes the incomplete sequence with U+FFFD using the declared charset; REPORT was throwing and silently switching to the JVM default charset for the whole string.
|
@codex review |
With CodingErrorAction.REPLACE the decoder never throws CharacterCodingException, making the catch branch unreachable.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 22536b19ea
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Substring search could match 'xcharset=' as 'charset=', allowing a client-controlled decoy parameter to override the real charset. Now requires the match to be at position 0 or preceded by ';' or ' '.
CharsetDecoder.decode(ByteBuffer) declares throws CharacterCodingException even though CodingErrorAction.REPLACE makes it unreachable; the compiler still requires the exception to be caught or declared.
|
@codex review |
|
Codex Review: Didn't find any major issues. What shall we delve into next? ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|
Introduces MultipartContentDecoder (internal-api) to decode multipart file bytes using the charset declared in each part's Content-Type header, with JVM-default fallback and REPLACE on malformed input. Mirrors the approach in PR #11212 for commons-fileupload so all three integrations share the same decoding logic.
9d73760
into
master
| try { | ||
| return Charset.forName(name); | ||
| } catch (IllegalArgumentException e) { | ||
| return null; |
There was a problem hiding this comment.
Does this risk a NullPointerException elsewhere?
Should we fallback to a default Charset instead?
There was a problem hiding this comment.
Right now it's only used in decodeBytes method, where that is done
if (charset == null) charset = Charset.defaultCharset();
Probably It makes more sense to return it where you mention, not sure if in the future it will be necessary to know exactly if we are not able to get the charset
I could change it in my next PR related with this topic if you want :)
Introduces MultipartContentDecoder (internal-api) to decode multipart file bytes using the charset declared in each part's Content-Type header, with JVM-default fallback and REPLACE on malformed input. Mirrors the approach in PR #11212 for commons-fileupload so all three integrations share the same decoding logic.
What Does This Do
MultipartContentDecoderininternal-api/src/main/java/datadog/trace/api/http/— a shared utility that decodes multipart file content bytes using the charset declared in each part'sContent-TypeheaderFileItemContentReader.readContent()now delegates toMultipartContentDecoder.decodeBytes(buf, total, fileItem.getContentType())instead of hardcoding ISO-8859-1MultipartContentDecoder.extractCharset()is case-insensitive, handles RFC 2045 quoted values (charset="UTF-8"), and guards against parameter name substring false positives (xcharset=…no longer matches ascharset=)CodingErrorAction.REPLACEis used so that truncation at a multibyte character boundary (from theMAX_CONTENT_BYTEScap) produces U+FFFD for the incomplete sequence rather than falling back to the JVM default charset for the entire stringinternal-api/src/main/java/datadog/trace/api/http/to@DataDog/asm-javaMotivation
FileItemContentReaderwas decoding all uploaded file content with hardcoded ISO-8859-1. Files uploaded as UTF-8 or another charset arrived at the WAF with garbled non-ASCII characters, preventing detection on any content outside the ASCII range.Additional Notes
Charset fallback diverges from
StoredByteBody: when no charset is declared in the part'sContent-Type,MultipartContentDecoderfalls back toCharset.defaultCharset()(the JVM default) rather than hardcoding UTF-8 → ISO-8859-1 asStoredByteBodydoes. This was deliberately chosen after discussion with Manu — the JVM default is a safer assumption for a multi-framework utility than an opinionated constant.catch (CharacterCodingException e)is unreachable at runtime:CharsetDecoder.decode(ByteBuffer)declaresthrows CharacterCodingExceptionas a checked exception so the compiler requires the catch, butCodingErrorAction.REPLACEensures the exception is never actually thrown. The catch block rethrows asIllegalStateExceptionto make the intent clear.MultipartContentDecoderis designed for reuse: the same class is used in PR #11198 (Tomcat and Netty integrations) so the charset parsing logic lives in one place.Jira ticket: APPSEC-61875