Commit 17add6a

Enhance distributed job monitoring and update package version (#103)
* fix: enhance distributed job monitoring and update package version
* Update src/eopf_geozarr/s2_optimization/s2_multiscale.py

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* chore: update changelog with unreleased features and fixes

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent a5b682f commit 17add6a

4 files changed

Lines changed: 55 additions & 5 deletions

.vscode/launch.json

Lines changed: 2 additions & 2 deletions
@@ -200,9 +200,9 @@
  "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:202511-s02msil2a-eu/15/products/cpm_v262/S2B_MSIL2A_20251115T091139_N0511_R050_T35SLU_20251115T111807.zarr",
  // "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:202511-s02msil2a-eu/16/products/cpm_v262/S2A_MSIL2A_20251116T085431_N0511_R107_T35SQD_20251116T103813.zarr",
  // "s3://esa-zarr-sentinel-explorer-fra/tests-output/sentinel-2-l2a-opt/S2A_MSIL2A_20250908T100041_N0511_R122_T32TQM_20250908T115116.zarr",
- "s3://esa-zarr-sentinel-explorer-fra/tests-output/sentinel-2-l2a-staging/S2B_MSIL2A_20251115T091139_N0511_R050_T35SLU_20251115T111807.zarr",
+ // "s3://esa-zarr-sentinel-explorer-fra/tests-output/sentinel-2-l2a-staging/S2B_MSIL2A_20251115T091139_N0511_R050_T35SLU_20251115T111807.zarr",
  // "s3://esa-zarr-sentinel-explorer-fra/tests-output/sentinel-2-l2a-staging/S2A_MSIL2A_20251116T085431_N0511_R107_T35SQD_20251116T103813.zarr",
- // "./tests-output/eopf_geozarr/s2l2_optimized.zarr",
+ "./tests-output/eopf_geozarr/s2l2_optimized_dis.zarr",
  "--spatial-chunk",
  "256",
  "--compression-level",

CHANGELOG.md

Lines changed: 31 additions & 0 deletions
@@ -5,54 +5,81 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+
+- Distributed job monitoring with proper Future status tracking when distributed client is available (#103)
+- Post-write verification to catch silent write failures and invalid output datasets
+
+### Changed
+
+- Improved `stream_write_dataset` to use `client.compute()` for better status monitoring when distributed client is active
+- Enhanced error reporting with specific failure context and dataset path information
+- Added fallback mechanisms for distributed features when client is unavailable
+
+### Fixed
+
+- Fixed issue where CLI would not exit with error code when write operations failed silently
+
 ## [0.6.0] - 2025-12-18
 
 ### Added
+
 - Spatial Zarr Convention models and metadata support (#100)
 
 ### Changed
+
 - Updated multiscales metadata handling for improved compatibility
 - Set up VCS versioning based on git tags for automatic version management
 - Improved linting configuration by dropping isort and black in favor of stronger linting
 
 ### Fixed
+
 - Prevented crash in quality-mask downsampling for Sentinel-2 processing
 - Fixed S3 path test issues
 - Improved runtime imports for better performance
 
 ## [0.3.0] - 2025-11-04
 
 ### Added
+
 - `eopf_geozarr.s2_optimization` module with streaming multiscale generation, CLI commands, and validation for Sentinel-2 L2A.
 - End-to-end sharding support spanning CLI flags, conversion helpers, Dask execution, and encoding metadata.
 - Geo Projection attribute extension documentation plus schema to lock GeoZarr metadata expectations.
 
 ### Changed
+
 - Tightened spatial chunk and shard defaults to cut write overhead on large scenes.
 - Relocated the entire test suite under `src/eopf_geozarr/tests` and broadened type coverage for tooling.
 - Smoothed multiscale metadata handling during streaming writes to keep Sentinel datasets consistent.
 
 ### Fixed
+
 - Preserved coordinate dtypes in overview levels and stopped auxiliary coordinate write failures.
 - Prevented streaming metadata consolidation from overwriting existing groups between runs.
 
 ## [0.2.0] - 2025-09-22
 
 ### Added
+
 - Sentinel-1 GRD integration tests and CLI wiring to enforce GeoZarr compliance end to end.
 - Reprojection utilities with GCP selection and grid-mapping output for Sentinel-1 converts.
 
 ### Changed
+
 - Extended `create_geozarr_dataset` to understand VV/VH polarization groups and build GCP-backed overviews.
 - Tuned chunk-size calculation and encoding helpers so shard dimensions and auxiliaries align.
 
 ### Fixed
+
 - Stopped auxiliary coordinate writes from failing in overviews when chunked.
 - Silenced noisy CLI warnings and aligned launch configs with the packaged tests.
 
 ## [0.1.0] - 2025-01-25
 
 ### Added
+
 - Initial release of EOPF GeoZarr library
 - Core conversion functionality from EOPF datasets to GeoZarr-spec 0.4 compliant format
 - Command-line interface with `convert`, `info`, and `validate` commands
@@ -75,6 +102,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - PyPI package configuration with proper dependencies
 
 ### Features
+
 - **Conversion Module**: Core tools for EOPF to GeoZarr transformation
 - `create_geozarr_dataset`: Main conversion function
 - `setup_datatree_metadata_geozarr_spec_compliant`: Metadata setup for GeoZarr compliance
@@ -85,6 +113,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **Utility Functions**: Helper functions for data processing and validation
 
 ### Technical Details
+
 - Built on xarray, zarr, and rioxarray
 - Supports Python 3.11+
 - Follows CF conventions for geospatial metadata
@@ -93,6 +122,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Band-by-band processing for memory efficiency
 
 ### Dependencies
+
 - xarray >= 2025.7.1
 - zarr >= 3.0.10
 - rioxarray >= 0.13.0
@@ -103,6 +133,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - aiohttp >= 3.8.1
 
 ### Development
+
 - Pre-commit hooks for code quality
 - Black, isort, flake8, and mypy for code formatting and linting
 - Pytest for testing with coverage reporting
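
The "post-write verification" and "exit with error code" entries in the Unreleased section above could be approached roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `verify_zarr_output` and `main` are hypothetical names, the store is assumed to be a local directory, and the check only looks for root metadata files (`zarr.json` for Zarr v3, `.zgroup` or `.zmetadata` for Zarr v2).

```python
import sys
from pathlib import Path

# Root metadata file names: zarr.json (Zarr v3), .zgroup / .zmetadata (Zarr v2).
ROOT_METADATA_NAMES = ("zarr.json", ".zgroup", ".zmetadata")


def verify_zarr_output(output_path: str) -> None:
    """Raise RuntimeError if a local Zarr store is missing or has no root metadata.

    Hypothetical helper for illustration only.
    """
    root = Path(output_path)
    if not root.is_dir():
        raise RuntimeError(f"Output dataset was not created: {output_path}")
    if not any((root / name).is_file() for name in ROOT_METADATA_NAMES):
        raise RuntimeError(f"Output dataset has no root Zarr metadata: {output_path}")


def main(output_path: str) -> int:
    """Return a process exit code: 0 on success, 1 if verification fails."""
    try:
        verify_zarr_output(output_path)
    except RuntimeError as error:
        # Report the failing path so a silent write failure becomes visible.
        print(f"ERROR: {error}", file=sys.stderr)
        return 1
    return 0
```

A CLI entry point would pass the return value to `sys.exit`, so that a failed or empty write surfaces as a non-zero exit status. A remote store (for example an `s3://` URL) would need a filesystem-aware check instead of `pathlib`.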

src/eopf_geozarr/s2_optimization/s2_multiscale.py

Lines changed: 21 additions & 2 deletions
@@ -901,9 +901,28 @@ def stream_write_dataset(
         try:
             import distributed
 
-            distributed.progress(write_job, notebook=False)
+            # Try to get current client for better status monitoring
+            try:
+                client = distributed.Client.current()
+                # Use client.compute to get a proper Future with status
+                future = client.compute(write_job)
+                log.info("Using distributed client for write job monitoring")
+
+                try:
+                    distributed.progress(future, notebook=False)
+                except Exception as progress_error:
+                    log.warning("Could not display progress bar: {}", e=progress_error)
+
+                # Get result and raise if computation failed
+                future.result()
+            except ValueError:
+                # No current client, fall back to regular distributed.progress
+                log.info("No distributed client available, using regular progress")
+                distributed.progress(write_job, notebook=False)
+                write_job.compute()
+
         except Exception as e:
-            log.warning("Could not display progress bar: {}", e=e)
+            log.warning("Could not use distributed features: {}", e=e)
             write_job.compute()
     else:
         log.info("Writing zarr file...")
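
The client-or-fallback pattern in this hunk can be sketched in isolation. In the hunk as shown, `future.result()` sits inside the outer `try`, so an exception from the computation itself appears to reach `except Exception` and trigger a second `write_job.compute()`. The variation below guards only the client lookup, so a computation failure propagates to the caller. It is a sketch under assumptions, not the commit's code: `run_write_job` and the injectable `get_client` parameter are hypothetical names added for illustration and testing, while `distributed.Client.current()`, `Client.compute()`, and `Future.result()` are the calls the hunk already uses.

```python
import logging
from typing import Any, Callable, Optional

log = logging.getLogger(__name__)


def _current_distributed_client() -> Any:
    """Return the active dask.distributed client.

    Raises ImportError if distributed is not installed and ValueError
    if no client is currently active.
    """
    import distributed

    return distributed.Client.current()


def run_write_job(write_job: Any, get_client: Optional[Callable[[], Any]] = None) -> Any:
    """Compute write_job, using a distributed Future when a client is active.

    Hypothetical helper: only the client lookup is guarded, so an error raised
    by the computation itself propagates instead of triggering a second run.
    """
    lookup = get_client or _current_distributed_client
    try:
        client = lookup()
    except (ImportError, ValueError) as lookup_error:
        log.info("No distributed client available: %s", lookup_error)
        return write_job.compute()

    future = client.compute(write_job)
    # Future.result() blocks until completion and re-raises remote exceptions.
    return future.result()
```

Keeping the lookup and the computation in separate `try` scopes means a `ValueError` raised by the job is not mistaken for "no client", which is the main design difference from the nested form above.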

uv.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.
