MicrosoftAzure::list_with_offset returns empty on OneLake since 0.13.0 (regression from #623) #695

@djouallah

Describe the bug

Against Microsoft Fabric OneLake (*.dfs.fabric.microsoft.com), ObjectStore::list_with_offset(prefix, offset) returns zero entries even when the prefix contains files whose paths sort strictly after offset. The equivalent list(prefix) call on the same store returns the correct files, so the data is reachable; only the offset-based listing is broken.

This regressed in object_store 0.13.0 via #623, which replaced the default fallback with an Azure-specific implementation that uses the ADLS Gen2 startFrom URI parameter. OneLake's REST surface does not handle startFrom the same way the standard ADLS Gen2 endpoint does.

Impact

Every downstream project that calls list_with_offset against OneLake is broken on object_store >= 0.13.0:

  • delta-kernel-rs (used by DuckDB's delta extension, delta-rs): loading a Delta table with a _last_checkpoint hint fails with Invalid Checkpoint: Had a _last_checkpoint hint but didn't find any checkpoints. See delta-io/delta-kernel-rs#2433 and the (now-closed) workaround attempt #2437.
  • lakehq/sail (does NOT use delta-kernel-rs; independently hits the same bug): lakehq/sail#1730.

To Reproduce

Minimal, no-delta-kernel reproducer below. The only thing swapped between the two runs is the object_store pin.

Cargo.toml:

[package]
name = "onelake-repro"
version = "0.0.1"
edition = "2021"

[dependencies]
# Swap between "=0.12.5" (works) and "=0.13.2" (broken)
object_store = { version = "=0.13.2", features = ["azure"] }
futures = "0.3"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
url = "2"
anyhow = "1"

src/main.rs:

use std::env;

use anyhow::{anyhow, Context, Result};
use futures::stream::StreamExt;
use object_store::azure::{AzureConfigKey, MicrosoftAzureBuilder};
use object_store::path::Path;
use object_store::{ObjectMeta, ObjectStore};

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() -> Result<()> {
    let args: Vec<String> = env::args().collect();
    if args.len() != 5 {
        return Err(anyhow!("usage: onelake-repro <workspace> <lakehouse> <table> <checkpoint_version>"));
    }
    let workspace = &args[1];
    let lakehouse = &args[2];
    let table = &args[3];
    let ckpt_version: u64 = args[4].parse()?;

    let token = env::var("AZURE_STORAGE_TOKEN")
        .context("AZURE_STORAGE_TOKEN not set")?;

    let url = format!(
        "abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{lakehouse}.Lakehouse/Tables/{table}/"
    );

    let store = MicrosoftAzureBuilder::new()
        .with_url(url.as_str())
        .with_config(AzureConfigKey::Token, token)
        .build()?;

    let prefix_str = format!("{lakehouse}.Lakehouse/Tables/{table}/_delta_log");
    let prefix = Path::from(prefix_str.as_str());
    let offset = Path::from(format!("{prefix_str}/{ckpt_version:020}").as_str());

    let a = collect(store.list(Some(&prefix))).await?;
    println!("A) list(prefix): {} entries", a.len());
    for loc in &a { println!("   {loc}"); }

    let b = collect(store.list_with_offset(Some(&prefix), &offset)).await?;
    println!("\nB) list_with_offset(prefix, offset): {} entries", b.len());
    for loc in &b { println!("   {loc}"); }

    Ok(())
}

async fn collect<S>(mut s: S) -> Result<Vec<String>>
where S: futures::Stream<Item = object_store::Result<ObjectMeta>> + Unpin
{
    let mut out = vec![];
    while let Some(m) = s.next().await { out.push(m?.location.to_string()); }
    out.sort();
    Ok(out)
}

Run:

export AZURE_STORAGE_TOKEN=$(az account get-access-token --resource https://storage.azure.com/ --query accessToken -o tsv)
cargo run --release -- <workspace> <lakehouse> <table> <checkpoint_version>

Expected behavior

list_with_offset(prefix, offset) should return exactly the files in list(prefix) whose location is lexicographically greater than offset.
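The expected result can be written down as a small oracle. The sketch below is stdlib-only and uses plain strings in place of object_store's Path; the function name expected_after_offset is hypothetical, and it assumes Path ordering agrees with byte-wise string comparison for names like these.

```rust
/// Hypothetical oracle: the locations from a full listing that sort strictly
/// after `offset`, returned in sorted order.
///
/// Assumption: plain byte-wise `&str` comparison stands in for the ordering
/// that object_store applies to `Path` values.
pub fn expected_after_offset(all: &[&str], offset: &str) -> Vec<String> {
    let mut kept: Vec<String> = all
        .iter()
        .filter(|loc| **loc > offset) // strictly greater: the offset itself is excluded
        .map(|loc| loc.to_string())
        .collect();
    kept.sort();
    kept
}
```

Two consequences of this ordering matter for Delta logs: a name that extends the offset (such as 00000000000000000010.checkpoint.parquet for offset 00000000000000000010) sorts after it, and _last_checkpoint sorts after every zero-padded version file because '_' compares greater than the ASCII digits.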

Actual behavior

Against the same OneLake table (a Delta table with _last_checkpoint at v10):

With object_store = "=0.12.5" (works):

A) list(prefix): 11 entries
   _delta_log/00000000000000000005.json
   _delta_log/00000000000000000006.json
   _delta_log/00000000000000000007.json
   _delta_log/00000000000000000008.json
   _delta_log/00000000000000000009.json
   _delta_log/00000000000000000010.checkpoint.parquet
   _delta_log/00000000000000000010.json
   _delta_log/00000000000000000011.json
   _delta_log/00000000000000000012.json
   _delta_log/00000000000000000013.json
   _delta_log/_last_checkpoint

B) list_with_offset(prefix, _delta_log/00000000000000000010): 6 entries
   _delta_log/00000000000000000010.checkpoint.parquet
   _delta_log/00000000000000000010.json
   _delta_log/00000000000000000011.json
   _delta_log/00000000000000000012.json
   _delta_log/00000000000000000013.json
   _delta_log/_last_checkpoint

With object_store = "=0.13.2" (broken):

A) list(prefix): 11 entries       <-- identical to above
   ...

B) list_with_offset(prefix, _delta_log/00000000000000000010): 0 entries

(Only list_with_offset differs between the two runs.)

Suspected cause

#623 added a direct list_with_offset implementation for Azure that sends startFrom=<offset> per the ADLS Gen2 list-blobs API. OneLake's endpoint apparently does not implement startFrom compatibly: it returns an empty list regardless of the offset value.

This matches lonless9's analysis on lakehq/sail#1730 and the related Azurite#2619.
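If startFrom cannot be relied on for OneLake, one possible mitigation is to list the prefix in full and apply the offset on the client. The sketch below models that filter with the standard library only; Entry and filter_after_offset are hypothetical names, Entry is a stand-in for the fields of ObjectMeta that matter here, and a real implementation would apply the same predicate to the stream returned by ObjectStore::list rather than to a Vec.

```rust
/// Hypothetical stand-in for the parts of `ObjectMeta` this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub location: String,
    pub size: u64,
}

/// Client-side offset filter over an already-fetched listing.
///
/// `Ok` entries are kept only when their location sorts strictly after
/// `offset`; `Err` items pass through untouched so that a listing failure is
/// still visible to the caller instead of being silently dropped.
pub fn filter_after_offset<E>(
    listing: Vec<Result<Entry, E>>,
    offset: &str,
) -> Vec<Result<Entry, E>> {
    listing
        .into_iter()
        .filter(|item| match item {
            Ok(entry) => entry.location.as_str() > offset,
            Err(_) => true,
        })
        .collect()
}
```

The trade-off is that every object under the prefix is still transferred and inspected, so this gives correctness without the request savings that a working server-side offset would provide.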

Environment

  • object_store 0.13.2 (also 0.13.0 and 0.13.1; all three include #623, "Azure ADLS list_with_offset support")
  • OneLake endpoint onelake.dfs.fabric.microsoft.com
  • Service-principal / Azure CLI bearer token (same auth in both runs; auth is not the issue)
  • Observed on Windows 11 / rustc 1.95.0, but not platform-dependent

Repro and report co-drafted with Claude Code (Claude Opus 4.7).