Skip to content

ICU-23448 Fix Invalid-enum-value crash in udat_open#4056

Open
FrankYFTang wants to merge 9 commits into
unicode-org:mainfrom
FrankYFTang:fix-udat-open-invalid-enum
Open

ICU-23448 Fix Invalid-enum-value crash in udat_open#4056
FrankYFTang wants to merge 9 commits into
unicode-org:mainfrom
FrankYFTang:fix-udat-open-invalid-enum

Conversation

@FrankYFTang

@FrankYFTang FrankYFTang commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Checklist

  • Required: Issue filed: ICU-23448
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable
  • Approver: Feel free to merge on my behalf

Validate timeStyle and dateStyle in udat_open to prevent passing invalid styles to C++ APIs, which causes Undefined Behavior (load of invalid enum value) flagged by UBSAN.

Also update date_format_fuzzer.cpp to avoid passing invalid enums to C++ APIs directly, and instead test udat_open validation with raw bytes.
Add a unit test in cdattst.c to verify.

Bug: https://issues.oss-fuzz.com/issues/477633331

FrankYFTang added a commit to FrankYFTang/icu that referenced this pull request Jun 30, 2026
@FrankYFTang FrankYFTang force-pushed the fix-udat-open-invalid-enum branch from 44d3718 to 4acb852 Compare June 30, 2026 18:07
@jira-pull-request-webhook

Copy link
Copy Markdown

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang FrankYFTang requested review from richgillam and sffc and removed request for sffc June 30, 2026 18:08
Validate timeStyle and dateStyle in udat_open to prevent passing
invalid styles to C++ APIs, which causes Undefined Behavior (load of
invalid enum value) flagged by UBSAN.

Also update date_format_fuzzer.cpp to avoid passing invalid enums to
C++ APIs directly, and instead test udat_open validation with raw bytes.
Add a unit test in cdattst.c to verify.

Bug: https://issues.oss-fuzz.com/issues/477633331
These uninitialized members were causing crashes in tests under UBSAN.
Prevent invalid styles from causing integer overflow and other issues.
Added unit test TestInvalidStyles.
@FrankYFTang FrankYFTang force-pushed the fix-udat-open-invalid-enum branch from 4acb852 to d2f04f4 Compare June 30, 2026 19:14
@jira-pull-request-webhook

Copy link
Copy Markdown

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/datefmt.cpp is different
  • icu4c/source/i18n/udat.cpp is different
  • icu4c/source/test/fuzzer/date_format_fuzzer.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@richgillam richgillam left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOKTM, but a couple things didn't completely make sense to me...

Comment thread icu4c/source/i18n/datefmt.cpp Outdated
static_assert(sizeof(dateStyle) == sizeof(dateStyleInt), "sizes must match");
static_assert(sizeof(timeStyle) == sizeof(timeStyleInt), "sizes must match");
std::memcpy(&dateStyleInt, &dateStyle, sizeof(dateStyle));
std::memcpy(&timeStyleInt, &timeStyle, sizeof(timeStyle));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple dumb questions: Why do you need to convert it into an int and back here? Is this necessary to check if the input is a valid enum value?

And would it make sense to just treat invalid enum values are equivalent to kNone rather than just failing on construction?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Converting to int via memcpy is necessary to avoid UBSAN (Undefined Behavior Sanitizer) warnings. In C++, casting an arbitrary invalid integer to an enum and then using it (e.g. in comparison) is undefined behavior. memcpy allows us to inspect the raw bits safely without triggering UB.
  2. Returning nullptr (failing on construction) is safer than treating invalid values as kNone. Treating invalid values as kNone might hide bugs in user code and lead to unexpected formatting behavior instead of a clear failure.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's incredibly ugly, but if we have to do it that way, I guess we have to do it that way. I do think it'd help to have some comments in there explaining why we have to do it that way, though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memcpy() is unnecessary. Assigning an enum to an int is safe and conventional.

Also, the static_asserts+memcpy could be a portability problem: There is no guarantee that an enum is always the same size as an int32_t. (More likely same as an int.)

Please just use

   int32_t dateStyleInt = dateStyle;
   int32_t timeStyleInt = timeStyle;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Replaced memcpy and static_assert with simple assignment. Verified that it compiles and passes tests under UBSAN/ASAN (even with -O3).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted back to memcpy (with static_assert size checks) because simple assignment actually triggered UBSan errors in CI (e.g. runtime error: load of value 100000, which is not a valid value for type 'EStyle'). In C++, loading an invalid value as an enum is undefined behavior, and UBSan catches this during the assignment. memcpy avoids loading it as enum type.

Comment thread icu4c/source/i18n/udat.cpp Outdated
@FrankYFTang FrankYFTang requested a review from richgillam June 30, 2026 22:01
richgillam
richgillam previously approved these changes Jun 30, 2026
Comment thread icu4c/source/i18n/datefmt.cpp Outdated

#include "cstring.h"
#include "windtfmt.h"
#include <cstring>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually include

  1. standard C headers
  2. standard C++ headers (like this one)
  3. unicode/utypes.h
  4. other public ICU headers
  5. internal ICU headers

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Removed the unnecessary <cstring> include and moved <typeinfo> to the top of the file.

Comment thread icu4c/source/i18n/datefmt.cpp Outdated
static_assert(sizeof(dateStyle) == sizeof(dateStyleInt), "sizes must match");
static_assert(sizeof(timeStyle) == sizeof(timeStyleInt), "sizes must match");
std::memcpy(&dateStyleInt, &dateStyle, sizeof(dateStyle));
std::memcpy(&timeStyleInt, &timeStyle, sizeof(timeStyle));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memcpy() is unnecessary. Assigning an enum to an int is safe and conventional.

Also, the static_asserts+memcpy could be a portability problem: There is no guarantee that an enum is always the same size as an int32_t. (More likely same as an int.)

Please just use

   int32_t dateStyleInt = dateStyle;
   int32_t timeStyleInt = timeStyle;

Comment thread icu4c/source/i18n/datefmt.cpp Outdated
Comment on lines +426 to +433
static UBool isValidStyle(int32_t style) {
return (style >= DateFormat::kFull && style <= DateFormat::kShort) ||
(style >= DateFormat::kFullRelative && style <= DateFormat::kShortRelative);
}

static UBool isValidTimeStyle(int32_t style) {
return (style >= DateFormat::kFull && style <= DateFormat::kShort);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put these into an anonymous namespace and remove static.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Moved the helper functions into an anonymous namespace and removed static in both datefmt.cpp and udat.cpp.

Comment thread icu4c/source/i18n/datefmt.cpp Outdated
dateStyleInt = dateStyleInt + kDateOffset;
}
return create(timeStyle, dateStyle, aLocale);
return create(static_cast<EStyle>(timeStyleInt), static_cast<EStyle>(dateStyleInt), aLocale);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the timeStyle is unchanged, so please just use it, rather than casting back from the int

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Used timeStyle directly in the create call.

Comment thread icu4c/source/i18n/datefmt.cpp Outdated
Comment thread icu4c/source/i18n/datefmt.cpp Outdated
Comment thread icu4c/source/i18n/datefmt.cpp
@markusicu markusicu self-assigned this Jun 30, 2026
@FrankYFTang

Copy link
Copy Markdown
Contributor Author

PTAL

richgillam
richgillam previously approved these changes Jul 1, 2026

@richgillam richgillam left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your changes and explanations are good enough for me; I hope they'll also be good enough for Markus.

@FrankYFTang

Copy link
Copy Markdown
Contributor Author

"int32_t dateStyleInt = dateStyle;" trigger ubsan while the value in dateStyle is not a valid enum, change back to memcpy

PTAL

@markusicu markusicu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@markusicu

Copy link
Copy Markdown
Member

"int32_t dateStyleInt = dateStyle;" trigger ubsan while the value in dateStyle is not a valid enum

Making this kind of code "undefined behavior" makes C++ nearly impossible to work with.
Thanks for agreeing to suppress UBSan here instead, so that we can write sane code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants