Skip to content

Fix out-of-bounds reads in Arabic shaping space handling#4043

Open
ubeddulla wants to merge 1 commit into
unicode-org:mainfrom
ubeddulla:ushape-shaping-bounds
Open

Fix out-of-bounds reads in Arabic shaping space handling#4043
ubeddulla wants to merge 1 commit into
unicode-org:mainfrom
ubeddulla:ushape-shaping-bounds

Conversation

@ubeddulla

Copy link
Copy Markdown

handleGeneratedSpaces and expandCompositCharAtNear in ushape.cpp walk the shaping scratch buffer one element past its end: the begin-spacing loop starts its index at sourceLength so it reads dest[sourceLength], and the near lamalef expansion reads dest[i+1] without checking i+1 is still in range. u_shapeArabic hands these helpers a buffer of exactly sourceLength UChars, heap-allocated once the input passes the 300-element stack fallback, so U_SHAPE_LAMALEF_BEGIN, U_SHAPE_TASHKEEL_BEGIN and U_SHAPE_LETTERS_UNSHAPE | U_SHAPE_LAMALEF_NEAR read out of bounds on a ~400-char input, which AddressSanitizer reports at ushape.cpp:828. The begin loop now starts at sourceLength-1 like its sibling at line 929, and the lamalef lookahead is guarded with (i < sourceLength-1) the way the nearby code at line 1396 already does. The new cintltst case fails under asan before the change and passes after, and the existing shaping output is unchanged.

Checklist

  • Required: Issue filed: ICU-NNNNN
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable
  • Approver: Feel free to merge on my behalf

handleGeneratedSpaces started its begin-spacing loop at index sourceLength and expandCompositCharAtNear read dest[i+1] without a range check, so both touched one element past the sourceLength-sized scratch buffer that u_shapeArabic allocates. Start the loop at sourceLength-1 like its sibling and guard the lamalef lookahead with (i < sourceLength-1) as the nearby code already does.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants