Problem
When reading Avro or Parquet files on S3 via ArrowFileSystemFileIO, the readers call io->fs()->OpenInputFile() directly on the underlying Arrow filesystem, bypassing ArrowFileSystemFileIO::ResolvePath(). As a result, the s3:// scheme prefix is never stripped, and Arrow's S3FileSystem rejects the path because it expects bare bucket/key paths, not full URIs. The writers bypass ResolvePath() in the same way when they call io->fs()->OpenOutputStream().
Consumers using iceberg-cpp to scan tables with S3-backed storage hit this when the REST catalog returns manifest/data file paths with s3:// schemes — the file paths flow through the Avro manifest reader untransformed.
Error
Invalid: Expected an S3 object path of the form 'bucket/key...', got a URI:
's3://warehouse/default/test_table/metadata/snap-487842974509551922-0-dc0a55d6-5df1-4ffa-a01c-b7481e5c663c.avro'
This originates from Arrow's S3Path::FromString() in s3fs.cc:
if (internal::IsLikelyUri(s)) {
  return Status::Invalid(
      "Expected an S3 object path of the form 'bucket/key...', got a URI: '", s, "'");
}
Variant errors depending on FileIO configuration
When the FileIO falls back to the local filesystem instead of S3:
Invalid: The filesystem expected a URI with one of the schemes (file) but received
s3://warehouse/testing/sample/metadata/snap-4065910918800248368-0-422aac49-0c9d-43c9-8cb4-3d21931e29f2.avro
Affected code paths
avro_reader.cc — calls io->fs()->OpenInputFile() directly
parquet_reader.cc — calls io->fs()->OpenInputFile() directly
avro_writer.cc — calls io->fs()->OpenOutputStream() directly
parquet_writer.cc — calls io->fs()->OpenOutputStream() directly
All four bypass ArrowFileSystemFileIO::ResolvePath(), which handles URI scheme stripping.
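To illustrate the transformation the readers and writers skip, here is a minimal standalone sketch of scheme stripping. StripUriScheme is a hypothetical helper written for this issue, not the actual ArrowFileSystemFileIO::ResolvePath() implementation, and it deliberately avoids any Arrow dependency so it can be compiled on its own.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper (assumption, not iceberg-cpp code): returns the location
// with a leading "<scheme>://" removed, e.g. "s3://bucket/key" -> "bucket/key".
// Locations without a valid scheme prefix are returned unchanged.
std::string StripUriScheme(std::string_view location) {
  constexpr std::string_view kSeparator = "://";
  const auto pos = location.find(kSeparator);
  if (pos == std::string_view::npos || pos == 0) {
    return std::string(location);  // no scheme present
  }

  // A URI scheme starts with a letter, followed by letters, digits, '+', '-' or '.'.
  const std::string_view scheme = location.substr(0, pos);
  if (!std::isalpha(static_cast<unsigned char>(scheme.front()))) {
    return std::string(location);
  }
  for (const char c : scheme) {
    const bool ok = std::isalnum(static_cast<unsigned char>(c)) ||
                    c == '+' || c == '-' || c == '.';
    if (!ok) {
      return std::string(location);  // "://" appears later in a plain path
    }
  }

  return std::string(location.substr(pos + kSeparator.size()));
}
```

If the readers and writers passed each location through a step like this (or through ResolvePath() itself) before calling OpenInputFile() or OpenOutputStream(), S3FileSystem would receive the bucket/key form it expects.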