Azure: Fix ADLSInputStream.readTail for length over file size#16883

Open
thswlsqls wants to merge 1 commit into apache:main from thswlsqls:fix/adls-readtail-clamp-negative-start

thswlsqls (Contributor) commented Jun 20, 2026

Summary

  • ADLSInputStream.readTail(buffer, offset, length) computes the read start as fileSize - length. When the requested tail length exceeds the file size, that start goes negative and Azure SDK FileRange rejects it with IllegalArgumentException, so reading the tail of a small file fails.
  • Clamp the start to 0 with Math.max(0, fileSize - length), so the whole file is read and the actual number of bytes read is returned, per the RangeReadable.readTail contract.
  • This matches the sibling GCSInputStream.readTail, which already clamps the same way (gcp/src/main/java/org/apache/iceberg/gcp/gcs/GCSInputStream.java). S3InputStream is unaffected because it uses an HTTP suffix range (bytes=-{length}).
  • This path is reached in practice through ParquetIO, which delegates readTail to the cloud stream when prefetching the Parquet footer of a small file.
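A minimal, self-contained sketch of the clamp, assuming an in-memory byte array stands in for the remote ADLS file. TailReadSketch, readTail, and resolveSuffixRange here are hypothetical illustrations, not the actual ADLSInputStream code or any Azure SDK API (the real stream issues a ranged read through FileRange). It shows that with Math.max(0, fileSize - length) a tail request longer than the file returns the whole file and its size. For comparison, it also shows how an HTTP suffix range (bytes=-N, RFC 9110) resolves to the same start server-side, which is why S3InputStream never computes a negative offset.

```java
import java.util.Arrays;

/** Hypothetical in-memory illustration of the readTail clamp; not Iceberg code. */
public class TailReadSketch {

  // Stands in for the remote file; the real stream reads via the Azure SDK.
  private final byte[] file;

  public TailReadSketch(byte[] file) {
    this.file = file;
  }

  /**
   * Reads up to length bytes from the end of the file into buffer at offset and
   * returns the number of bytes actually read (the file size when length exceeds it).
   */
  public int readTail(byte[] buffer, int offset, int length) {
    long fileSize = file.length;
    // Unclamped, fileSize - length is negative when length > fileSize,
    // which is the value FileRange rejects. Clamp it to the start of the file.
    long readStart = Math.max(0L, fileSize - length);
    int bytesToRead = (int) (fileSize - readStart);
    System.arraycopy(file, (int) readStart, buffer, offset, bytesToRead);
    return bytesToRead;
  }

  /**
   * For comparison: how an HTTP suffix range "bytes=-suffixLength" resolves per
   * RFC 9110. A suffix longer than the representation selects all of it.
   * Assumes fileSize > 0 and suffixLength > 0. Returns {firstBytePos, lastBytePos}.
   */
  static long[] resolveSuffixRange(long fileSize, long suffixLength) {
    long first = Math.max(0L, fileSize - suffixLength);
    return new long[] {first, fileSize - 1};
  }

  public static void main(String[] args) {
    TailReadSketch stream = new TailReadSketch(new byte[] {1, 2, 3, 4, 5});

    // Requested tail length (16) exceeds the file size (5): the whole file is read.
    byte[] buffer = new byte[16];
    int read = stream.readTail(buffer, 0, 16);
    System.out.println("bytes read: " + read); // bytes read: 5
    System.out.println(Arrays.toString(Arrays.copyOf(buffer, read))); // [1, 2, 3, 4, 5]

    // Ordinary case: a tail shorter than the file reads only the last bytes.
    byte[] small = new byte[2];
    int readSmall = stream.readTail(small, 0, 2);
    System.out.println("bytes read: " + readSmall + " -> " + Arrays.toString(small)); // bytes read: 2 -> [4, 5]

    // A suffix range resolves to the same clamped start.
    System.out.println(Arrays.toString(resolveSuffixRange(5, 16))); // [0, 4]
  }
}
```

The clamp keeps the read contiguous and ending at the last byte, so callers such as a Parquet footer reader still find the footer at the end of the returned bytes; they just receive fewer bytes than requested and must use the returned count.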

Testing done

  • Added TestADLSInputStream#testReadTailLengthLargerThanFileSize, asserting that readTail with a length greater than the file size reads the whole file and returns the file size (the test fails without the fix).
  • Ran ./gradlew :iceberg-azure:test --tests "org.apache.iceberg.azure.adlsv2.TestADLSInputStream"; all 5 tests passed.
  • Ran ./gradlew :iceberg-azure:spotlessCheck :iceberg-azure:checkstyleMain :iceberg-azure:checkstyleTest; all passed.
  • Did not run the full :iceberg-azure:check locally because it pulls in the Azurite/Docker integration tests; unit tests, spotless, and checkstyle were run separately instead.

AI Disclosure

  • Model: Claude Opus 4.8
  • Platform/Tool: Claude Code

When the requested tail length exceeds the file size, readStart became
negative and Azure SDK FileRange rejected it with IllegalArgumentException,
so reading the tail of a small file failed.

Clamp the start position to 0, matching GCSInputStream.readTail, so the
whole file is read and the actual number of bytes read is returned per the
RangeReadable.readTail contract.

Generated-by: Claude Code

github-actions bot added the AZURE label Jun 20, 2026