Skip byte counting at EOF in ExtendedBufferedReader.read #615

Merged: garydgregory merged 1 commit into apache:master from rootvector2:eof-byte-count-guard on Jun 20, 2026

Conversation

@rootvector2

Contributor

When trackBytes is on, ExtendedBufferedReader.read() feeds the EOF sentinel (-1) to getEncodedCharLength, which narrows it to (char) -1 (U+FFFF) and pushes it through the charset encoder. A single-byte charset such as ISO-8859-1 or US-ASCII cannot encode U+FFFF, so byte tracking throws UnmappableCharacterException at end of input on an otherwise valid file; with UTF-8 or UTF-16 it silently adds a few phantom bytes to the count. I found this while running getBytePosition with a non-UTF charset. The read(char[], off, len) overload already guards against this with len > 0, so this change mirrors that guard for the single-char path.
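For readers who want to reproduce the effect outside the library, here is a minimal standalone sketch. It is not the ExtendedBufferedReader code: the class EofByteCountSketch, the helper encodedLength (a simplified stand-in for getEncodedCharLength that ignores surrogate pairs), and the guardEof flag are all hypothetical names made up for illustration. It only assumes the standard java.nio.charset API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

public class EofByteCountSketch {

    // Hypothetical stand-in for getEncodedCharLength: number of bytes the
    // encoder produces for a single char (surrogate pairs are not handled).
    public static int encodedLength(CharsetEncoder encoder, char c) throws CharacterCodingException {
        return encoder.encode(CharBuffer.wrap(new char[] {c})).limit();
    }

    // Reads one char at a time and sums the encoded byte lengths.
    // With guardEof == false, the EOF sentinel -1 is narrowed to (char) -1
    // (U+FFFF) and encoded too, which is the behavior the PR describes.
    public static long countBytes(Reader reader, Charset charset, boolean guardEof) throws IOException {
        CharsetEncoder encoder = charset.newEncoder();
        long bytes = 0;
        int current;
        do {
            current = reader.read();
            if (!guardEof || current != -1) {
                bytes += encodedLength(encoder, (char) current);
            }
        } while (current != -1);
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // Guarded: only the three real characters are counted.
        System.out.println(countBytes(new StringReader("abc"), StandardCharsets.UTF_8, true));  // prints 3
        // Unguarded: U+FFFF encodes to three extra bytes in UTF-8.
        System.out.println(countBytes(new StringReader("abc"), StandardCharsets.UTF_8, false)); // prints 6
        // Unguarded with a single-byte charset: U+FFFF cannot be encoded.
        try {
            countBytes(new StringReader("abc"), StandardCharsets.ISO_8859_1, false);
        } catch (UnmappableCharacterException e) {
            System.out.println("unmappable at EOF: " + e.getMessage());
        }
    }
}
```

The guard in this sketch is the `current != -1` check before the length lookup; the actual change in the merged commit may be phrased differently.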

@garydgregory changed the title from "skip byte counting at EOF in ExtendedBufferedReader.read" to "Skip byte counting at EOF in ExtendedBufferedReader.read" on Jun 20, 2026
@garydgregory

Member

Good catch

@garydgregory

Member

Good catch @rootvector2, merged 🚀

@garydgregory merged commit 5f60ca5 into apache:master on Jun 20, 2026
16 checks passed
garydgregory added a commit that referenced this pull request Jun 20, 2026

2 participants