Commit bf0cb35

TomNicholas, claude, and pre-commit-ci[bot] authored
Test that we forbid inlined refs when parsing kerchunk parquet (#865)
* add empty release notes
* interpret a kerchunk reference containing a nan as meaning an uninitialized zarr chunk
* raise a clearer error if kerchunk doesn't specify codecs properly
* test
* test: add test for inlined refs in kerchunk parquet files

  Adds test_notimplemented_read_inline_refs_parquet to verify that kerchunk parquet files with inlined chunk data (populated 'raw' column) raise NotImplementedError, matching the existing JSON behavior.

  Related: #489

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* fix indentation bug
* add unit test for nan behaviour
* add integration test for sparse kerchunk parquet arrays

  Tests that kerchunk parquet files with sparse chunks (some missing) can be read correctly by VirtualiZarr.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* add release notes for kerchunk parquet bug fixes

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent ec4236c commit bf0cb35

1 file changed

Lines changed: 25 additions & 0 deletions

virtualizarr/tests/test_parsers/test_kerchunk.py

@@ -356,6 +356,31 @@ def test_notimplemented_read_inline_refs(tmp_path, netcdf4_inlined_ref, local_re
             pass
 
 
+@requires_kerchunk
+@pytest.mark.skipif(not has_fastparquet, reason="fastparquet not installed")
+def test_notimplemented_read_inline_refs_parquet(
+    tmp_path, netcdf4_inlined_ref, local_registry
+):
+    # Test that parquet references with inlined data raise NotImplementedError
+    # https://github.com/zarr-developers/VirtualiZarr/issues/489
+    from kerchunk.df import refs_to_dataframe
+
+    ref_filepath = tmp_path / "ref.parquet"
+    refs_to_dataframe(fo=netcdf4_inlined_ref, url=ref_filepath.as_posix())
+
+    parser = KerchunkParquetParser()
+    with pytest.raises(
+        NotImplementedError,
+        match="Reading inlined reference data is currently not supported",
+    ):
+        with open_virtual_dataset(
+            url=ref_filepath.as_posix(),
+            registry=local_registry,
+            parser=parser,
+        ) as _:
+            pass
+
+
 @pytest.mark.parametrize("skip_variables", ["a", ["a"]])
 def test_skip_variables(refs_file_factory, skip_variables, local_registry):
     refs_file = refs_file_factory()
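
For readers unfamiliar with the behaviour this commit tests, here is a minimal, dependency-free sketch of the two rules named in the commit message: a populated 'raw' value means inlined data and is rejected, and a NaN reference is treated as an uninitialized (missing) chunk. This is not VirtualiZarr's actual implementation. The helper names `is_missing` and `classify_row`, and the row layout (path, offset, size, raw), are assumptions made for illustration; only the error message text is taken from the test in the diff.

```python
import math

# Error text copied from the `match=` argument in the test; everything else
# in this sketch is hypothetical and only illustrates the intended behaviour.
INLINE_ERROR = "Reading inlined reference data is currently not supported"


def is_missing(value):
    """Return True for None or a float NaN (how an empty cell may appear)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def classify_row(path, offset, size, raw):
    """Classify one reference row (hypothetical helper, assumed columns).

    - populated `raw`  -> inlined chunk data, not supported
    - missing `path`   -> uninitialized chunk, returned as None
    - otherwise        -> a virtual reference to a byte range
    """
    if not is_missing(raw):
        raise NotImplementedError(INLINE_ERROR)
    if is_missing(path):
        return None
    return {"path": path, "offset": int(offset), "length": int(size)}


# Example rows: a normal byte-range reference and a sparse (missing) chunk.
virtual = classify_row("file:///data/air.nc", 4096, 512, None)
sparse = classify_row(float("nan"), 0, 0, None)
```

Checking `raw` before `path` is a design choice in this sketch: an inlined row has no meaningful path, so testing the path first would misreport it as a missing chunk rather than raising.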
