Commit b117364
Add Sentinel-2 Optimization Module with CLI and Data Processing Enhancements (#58)
* feat: add sharding support for GeoZarr conversion and CLI
* update launch configurations for GeoZarr conversion with new data sources and adjusted parameters
* feat: enable sharding in GeoZarr conversion launch configuration
* fix: update sharding codec handling in _create_sharded_encoding function
* refactor: streamline sharding configuration in _create_geozarr_encoding function
* feat: enhance sharding logic in _create_geozarr_encoding and add _calculate_shard_dimension utility
* feat: improve sharding configuration and validation in _create_geozarr_encoding
* fix: refine shard dimension calculation and improve divisor check in utility functions
* Add dataset tree structure and test script for sharding fix
- Introduced a new dataset tree structure for Sentinel-2 data, detailing conditions, quality, and measurements.
- Added a comprehensive test script to verify the sharding fix for GeoZarr conversion.
- Implemented tests for shard dimension calculations and encoding creation with sharding enabled/disabled.
- Enhanced output for better debugging and validation of shard dimensions against chunk dimensions.
* feat: enable sharding in Dask cluster setup and enhance chunking logic for sharded variables
* Add Sentinel-2 Optimization Module with CLI Integration and Data Processing
- Created the `s2_optimization` module for optimizing Sentinel-2 Zarr datasets.
- Implemented CLI commands for converting Sentinel-2 datasets to optimized structures.
- Developed band mapping and resolution definitions for Sentinel-2 optimization.
- Added the `S2OptimizedConverter` class for handling the conversion process.
- Implemented data consolidation logic to reorganize Sentinel-2 structure.
- Created multiscale pyramid generation for optimized data.
- Added downsampling operations for various data types (reflectance, classification, quality masks).
- Implemented validation logic for optimized Sentinel-2 datasets.
- Developed unit tests for band mapping, converter functionality, and resampling operations.
* feat: enhance S2 data consolidator with comprehensive extraction methods and testing framework
* Add comprehensive tests for S2MultiscalePyramid class
- Implement unit tests for initialization, pyramid levels structure, chunk alignment, and shard dimension calculations.
- Create tests for encoding generation, dataset writing, and level dataset creation with various resolutions.
- Include integration tests for realistic measurements data and edge cases handling.
- Ensure coverage for time separation logic and coordinate preservation during processing.
* feat: simplify chunk alignment and sharding logic in S2MultiscalePyramid
* feat: integrate S2 optimization commands into CLI and enhance converter functionality
* feat: add S2L2A optimized conversion command to CLI and update launch configuration
* feat: enhance S2 converter and multiscale pyramid with optimized encoding and rechunking
* feat: enhance sharding logic to ensure compatibility with chunk dimensions in S2MultiscalePyramid
* feat: add downsampling for 10m data and adjust dataset creation for levels 3+
* feat: add support for Dask cluster in S2 optimization commands and enhance progress tracking for Zarr writes
* feat: add compression level option for GeoZarr conversion
* feat: implement Dask parallelization for multiscale pyramid creation and downsampling
* feat: enhance multiscale pyramid creation with streaming Dask parallelization and improved memory management
* feat: configure Dask client to use 3 workers with 8GB memory each for improved parallel processing
* fix: update import path for geozarr functions in S2OptimizedConverter
* feat: refactor multiscales metadata handling and root consolidation in S2OptimizedConverter
* feat: add comprehensive unit tests for S2OptimizedConverter and related functionalities
* feat: implement geographic metadata writing in S2MultiscalePyramid and add corresponding unit tests
* feat: skip duplicate variables during downsampling in S2MultiscalePyramid
* feat: enhance CRS handling by adding grid mapping variable to dataset attributes
* feat: add grid mapping variable writing for datasets in S2MultiscalePyramid
* feat: skip already present variables during downsampling in S2MultiscalePyramid
* feat: reduce memory limit for Dask client to 4GB and add geographic metadata writing in S2MultiscalePyramid
* Refactor test cases and improve code formatting in S2 resampling tests and sharding fix
- Reorganized import statements and improved code formatting for better readability in `test_s2_resampling.py`.
- Updated sample data creation functions to use consistent array formatting and improved attribute handling.
- Enhanced assertions in tests to ensure clarity and consistency.
- Improved test output messages in `test_sharding_fix.py` for better debugging and understanding of test results.
- Ensured that shard dimensions are properly calculated and validated against chunk dimensions in the sharding tests.
* feat: update memory limit for Dask client to 8GB and adjust spatial chunk size to 256 in S2MultiscalePyramid
* feat: add new CLI command for converting to GeoZarr S2L2A optimized format with sharding support
* feat: implement batched parallel downsampling for S2 datasets and improve classification downsampling method
* fix: update measurement group keys and enhance dataset loading with decoding options
* feat: add streaming support for multiscale pyramid creation in S2 converter
* feat: add --enable-streaming option for experimental streaming mode in S2 optimization command
* fix: avoid passing coordinates in lazy dataset creation to prevent alignment issues
* feat: implement Zarr v3 compatible encoding for optimized datasets
* fix: enhance measurements group writing by consolidating metadata and improving Zarr group handling
* feat: enhance streaming write with advanced chunking and sharding support
* feat: enhance encoding for streaming writes with advanced chunking and sharding support
* fix: improve root-level metadata consolidation with proper Zarr group creation and linking
* feat: add streaming support to S2 optimized converter and update measurements group handling
* fix: change root Zarr group creation mode from 'w' to 'a' for appending data
* refactor: streamline Zarr group handling and metadata consolidation in S2 converter
* fix: streamline root Zarr group creation by removing existence check and ensuring proper attributes are set
* fix: correct multiscales attribute assignment and update group prefix handling
refactor: replace os.path.exists with fs_utils.path_exists for level path check
* feat: add downsampled coordinates creation for multiscale pyramid levels
* fix: update launch configuration for S2A MSIL2A dataset and adjust grid mapping attributes in streaming pyramid creation
* Refactor downsample factor calculation in S2StreamingMultiscalePyramid
Updated the downsample factor calculation to use resolution ratios from pyramid_levels. This change improves clarity by explicitly referencing the resolutions of level 2 and the target level, ensuring accurate downsampling based on the defined pyramid structure.
* streaming as default
* refactor: update S2 optimization process to preserve original data structure and enhance multiscale pyramid creation
* refactor: enhance multiscale creation by preserving all original groups and improving empty group handling
* refactor: enhance group writing by preserving original chunking and encoding for non-measurement groups
* refactor: preserve original chunking during dataset writing by rechunking variables individually
* refactor: enhance downsampling process by organizing resolution groups and creating from coarsest available resolution
* refactor: improve error handling and verbosity in downsampling process for multiscale pyramid creation
* refactor: update band mapping for Sentinel-2 by adding 'b10' to native bands and quality data
* refactor: update tile dimensions calculation and enhance multiscales metadata handling in S2 multiscale pyramid creation
* refactor: simplify multiscales metadata addition by removing unnecessary datatree loading and streamline dataset writing return
* refactor: simplify variable naming in multiscale pyramid creation and remove unused downsampling operations
* refactor: streamline zarr group creation and multiscales metadata handling in S2 converter
* refactor: change Zarr write mode from 'a' to 'r+' in S2 converter and multiscale classes
* refactor: change Zarr write mode from 'r+' to 'a' and simplify DataTree initialization in S2 converter and multiscale classes
* fix: correct parameter name from 'modea' to 'mode' in DataTree zarr writing
* feat: add missing parent groups creation in root-level metadata consolidation
* feat: enhance root-level group creation by identifying and creating missing intermediary groups in Zarr structure
* fix: correct parameter name from 'zqarr_format' to 'zarr_format' in S2OptimizedConverter
* fix: store result of multiscales metadata addition in processed_groups
* fix: update NATIVE_BANDS to include 'b10' and adjust pyramid levels count in tests
* fix: update coordinate creation in downsampling for consistency and improve test for geo metadata integration with level creation
* refactor: remove unused fixture and update tests for pyramid levels and chunk dimensions
* Remove Sentinel-2 Zarr Conversion Optimization Plan and associated test script
- Deleted the comprehensive optimization plan for the Sentinel-2 Zarr conversion, which included details on the current state, proposed structure, technical specifications, implementation plan, and expected benefits.
- Removed the test script for verifying the sharding fix in GeoZarr conversion, which included tests for shard dimensions and encoding creation.
* Implement feature X to enhance user experience and fix bug Y in module Z
* delete: remove dataset_tree_simplified.txt as it is no longer needed
1 parent 66fc5ac commit b117364
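Several entries above refine the shard dimension calculation so that shards stay compatible with chunk dimensions. The Zarr v3 sharding codec requires each shard dimension to be a whole multiple of the inner chunk dimension. The sketch below shows one way to satisfy that constraint. `calculate_shard_dimension` is a hypothetical helper written for illustration, not the repository's `_calculate_shard_dimension`, and rounding up to cover the whole axis is an assumed policy.

```python
def calculate_shard_dimension(data_dim: int, chunk_dim: int) -> int:
    """Return a shard length that is a whole multiple of chunk_dim.

    Hypothetical helper for illustration only; the policy (one shard
    spanning the whole axis, rounded up to a chunk multiple) is an
    assumption, not the behaviour of the code in this commit.
    """
    if data_dim <= 0 or chunk_dim <= 0:
        raise ValueError("dimensions must be positive")
    if chunk_dim >= data_dim:
        # A single chunk already covers the axis, so the shard is one chunk.
        return chunk_dim
    # Ceiling division: number of chunks needed to cover the axis.
    n_chunks = -(-data_dim // chunk_dim)
    return n_chunks * chunk_dim


# 10980 pixels is the width of a Sentinel-2 tile at 10 m resolution.
shard = calculate_shard_dimension(10980, 256)
print(shard, shard % 256)
```

Under these assumptions the shard may extend past the array edge. That is acceptable because only the shard-to-chunk divisibility is mandatory.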
17 files changed
Lines changed: 3558 additions & 5 deletions
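The entry "Refactor downsample factor calculation in S2StreamingMultiscalePyramid" describes deriving the factor from the ratio between the level 2 resolution and the target level resolution. The following is a minimal sketch of that idea. The contents of `PYRAMID_LEVELS` and the function name are assumptions made for illustration: only 10, 20 and 60 m are native Sentinel-2 resolutions, and the coarser levels are invented here.

```python
# Hypothetical level -> resolution table, in metres.
# Levels 3 and above are assumed values, not taken from the commit.
PYRAMID_LEVELS = {0: 10, 1: 20, 2: 60, 3: 120, 4: 240, 5: 480}


def downsample_factor(target_level: int, base_level: int = 2,
                      levels: dict = PYRAMID_LEVELS) -> int:
    """Integer factor needed to go from base_level to target_level."""
    base_res = levels[base_level]
    target_res = levels[target_level]
    if target_res % base_res != 0:
        # Also rejects finer targets, e.g. 10 m when the base is 60 m.
        raise ValueError(
            f"{target_res} m is not an integer multiple of {base_res} m"
        )
    return target_res // base_res


for level in (3, 4, 5):
    print(level, downsample_factor(level))
```

Reading both resolutions from the table keeps the factor tied to the declared pyramid structure instead of a hard-coded power of two.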
File tree
- .vscode
- src/eopf_geozarr
- conversion
- s2_optimization
- tests
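The commit message lists downsampling operations for reflectance, classification and quality-mask data without saying which reduction each one uses. As a rough illustration of why the data types need different treatment, the sketch below assumes block averaging for reflectance, a majority vote for classification, and a logical OR for masks. All function names are hypothetical and none of this is taken from the `s2_optimization` module.

```python
import numpy as np


def _blocks(arr: np.ndarray, factor: int) -> np.ndarray:
    """Group a 2-D array into (rows, cols, factor*factor) blocks."""
    h2, w2 = arr.shape[0] // factor, arr.shape[1] // factor
    # Drop trailing rows/columns that do not fill a complete block.
    trimmed = arr[: h2 * factor, : w2 * factor]
    return (
        trimmed.reshape(h2, factor, w2, factor)
        .swapaxes(1, 2)
        .reshape(h2, w2, factor * factor)
    )


def downsample_reflectance(arr: np.ndarray, factor: int) -> np.ndarray:
    # Continuous values: average each block (assumed method).
    return _blocks(arr.astype("float64"), factor).mean(axis=-1)


def downsample_classification(arr: np.ndarray, factor: int) -> np.ndarray:
    # Categorical values: keep the most frequent class per block, since
    # averaging class codes would produce meaningless labels.
    blocks = _blocks(arr, factor)
    out = np.empty(blocks.shape[:2], dtype=arr.dtype)
    for i in range(blocks.shape[0]):
        for j in range(blocks.shape[1]):
            values, counts = np.unique(blocks[i, j], return_counts=True)
            out[i, j] = values[np.argmax(counts)]  # ties -> smallest code
    return out


def downsample_quality_mask(arr: np.ndarray, factor: int) -> np.ndarray:
    # Boolean flags: a coarse pixel is flagged if any fine pixel was.
    return _blocks(arr.astype(bool), factor).any(axis=-1)


data = np.array([[1, 1, 2, 2],
                 [1, 3, 2, 2],
                 [4, 4, 5, 6],
                 [4, 4, 7, 5]])
print(downsample_reflectance(data, 2))
print(downsample_classification(data, 2))
print(downsample_quality_mask(data > 5, 2))
```

The explicit loop in the classification case is kept for readability. A production version would likely vectorise it or run it lazily per Dask chunk.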