- CHS-CSB_Processing is a Python 3.11 pipeline that ingests raw bathymetry logs, cleans, filters, and tags soundings, georeferences them with vessel sensor data, optionally applies IWLS water-level reduction, and then exports data plus metadata.
- Core orchestration lives in `src/csb_processing.py` (`processing_workflow`). Treat it as the behavior anchor for feature changes.
- User entry points are thin wrappers over the same core flow: the CLI in `src/cli.py` and the NiceGUI app in `src/web_ui.py` + `src/app/processing_handler.py`.
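To make the "thin wrapper" idea concrete, here is a minimal sketch under stated assumptions. `WorkflowParams`, `build_cli_params`, and `build_ui_params` are hypothetical, and `processing_workflow` is a stand-in that only reports what it would process. The real functions in `src/cli.py`, `src/app/processing_handler.py`, and `src/csb_processing.py` have their own signatures. When both wrappers reduce to "build parameters, then delegate", a CLI/UI parity check becomes a comparison of the parameters they build.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class WorkflowParams:
    """Hypothetical parameter bundle; the real processing_workflow signature may differ."""

    files: tuple[Path, ...]
    output: Path
    vessel: str | None = None
    waterline: float | None = None


def processing_workflow(params: WorkflowParams) -> str:
    """Stand-in for the core flow: it only reports what it would process."""
    return f"{len(params.files)} file(s) -> {params.output}"


def build_cli_params(files: list[str], output: str, vessel: str | None = None) -> WorkflowParams:
    # CLI side (hypothetical): command-line strings are converted into the shared bundle.
    return WorkflowParams(tuple(Path(f) for f in files), Path(output), vessel=vessel)


def build_ui_params(selected: list[Path], output_dir: Path, vessel: str | None = None) -> WorkflowParams:
    # UI side (hypothetical): the GUI already holds Path objects but must build the same bundle.
    return WorkflowParams(tuple(selected), output_dir, vessel=vessel)


if __name__ == "__main__":
    cli = build_cli_params(["log.csv"], "out", vessel="V1")
    ui = build_ui_params([Path("log.csv")], Path("out"), vessel="V1")
    assert cli == ui, "CLI and UI built different parameters"
    print(processing_workflow(cli))  # 1 file(s) -> out
```

Keeping the wrappers this thin means a new option has to go through the shared parameters. A mismatch between entry points then shows up as an inequality instead of as silently divergent behavior.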
- `src/csb_processing.py::processing_workflow(...)`: canonical end-to-end behavior.
- `src/cli.py::process_bathymetric_data(...)` and `src/cli.py::convert_gpkg(...)`: user-facing validation and defaults.
- `src/app/processing_handler.py::ProcessingHandler.process_files(...)`: UI path that must keep behavior parity with the CLI.
- `src/CONFIG_csb-processing.toml`: executable defaults for options, export formats, IWLS settings, and the CARIS section.
- `flow.mermaid`: visual mirror of the implemented branches (with and without the water-level loop).
- Parser selection is factory-based in `src/ingestion/factory_parser.py` (`FACTORY_PARSER`): extension normalization plus header matching.
- `ParserFiles` in `src/ingestion/parser_models.py` enforces a single parser type per run (mixed formats raise `MultipleParsersError`).
- Data contracts are Pandera models in `src/schema/model.py`; many functions assume `DataLoggerSchema` or `DataLoggerWithTideZoneSchema` columns/dtypes.
- `processing_workflow` sequence (`src/csb_processing.py`): load config -> parse -> `cleaner.clean_data` -> georeference -> optional IWLS iteration loop -> export data + metadata.
- Water-level mode iterates over Voronoi zones and excluded stations until `Depth_processed_meter` has no NaN or the maximum number of iterations is reached.
- `flow.mermaid` mirrors the implemented workflow; use it to reason about regressions across branches of the pipeline.
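The parser-selection rules (extension normalization, header matching, one parser type per run) can be sketched with a plain dictionary factory. Everything here is hypothetical: the parser classes, the `FACTORY` layout, and the header strings are placeholders, and the real `FACTORY_PARSER` and `ParserFiles` may be structured differently. `MultipleParsersError` borrows the project's name but is redefined locally.

```python
from __future__ import annotations

from pathlib import Path


class MultipleParsersError(Exception):
    """Local stand-in for the project's error raised on mixed input formats."""


class ParserA:  # hypothetical parser classes
    pass


class ParserB:
    pass


# Hypothetical factory layout: normalized extension -> (header prefix, parser) candidates.
FACTORY: dict[str, list[tuple[str, type]]] = {
    ".csv": [("time,lat,lon,depth", ParserA), ("timestamp,x,y,z", ParserB)],
}


def select_parser(path: Path, first_line: str) -> type:
    """Pick a parser from the file extension, then disambiguate with the header."""
    extension = path.suffix.lower()  # normalize ".CSV" to ".csv"
    for header, parser in FACTORY.get(extension, []):
        if first_line.strip().lower().startswith(header):
            return parser
    raise ValueError(f"No parser matches {path.name!r}")


def select_single_parser(first_lines: dict[Path, str]) -> type:
    """Resolve every file and require exactly one parser type for the whole run."""
    if not first_lines:
        raise ValueError("No input files")
    parsers = {select_parser(path, line) for path, line in first_lines.items()}
    if len(parsers) > 1:
        raise MultipleParsersError(sorted(p.__name__ for p in parsers))
    return parsers.pop()
```

Collecting the parsers into a set before checking its size means mixed inputs fail before any file is parsed. This mirrors the "single parser type per run" rule rather than failing halfway through a batch.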
- Output structure is fixed by `get_data_structure(...)` in `src/csb_processing.py`: `<output>/Data`, `<output>/Tide`, `<output>/Log`.
- The completion criterion for the IWLS loop in `processing_workflow` is that `schema_ids.DEPTH_PROCESSED_METER` contains no NaN.
- Export naming convention comes from `src/export/export_helpers.py::get_export_file_name(...)` (`CH-<logger>-<vessel>-<start>-<end>`).
- Final export schema projection uses `src/export/export_helpers.py::finalize_geodataframe(...)` and `schema.DataLoggerSchema.__annotations__.keys()`.
- The tide-zone join contract uses `src/tide/tide_zone_processing.py::add_tide_zone_id_to_geodataframe(...)` with renamed IDs (`id -> Tide_zone_id`, etc.).
- Config loading is cached (`@lru_cache`) in `src/config/helper.py::load_config(...)`; avoid side effects that depend on re-reading the TOML within the same process.
- Set up dependencies with uv (documented in `README.en.md`/`README.fr.md`): `uv sync`.
- Process files: `python src/cli.py process <files...> --output <dir>`.
- Convert already processed geospatial files: `python src/cli.py convert <files...> --output <dir> --format <fmt>` (input restricted to `.gpkg`/`.geojson`).
- Launch the GUI: `python src/web_ui.py` (Windows helper script: `run_WebUI.bat`).
- Build docs from `docs/`: `make html` (or `make.bat html` on Windows).
- There is no automated test suite today (`tests/` is empty), so validate changes with targeted CLI/UI runs.
- Logging pattern: module-level `LOGGER = logger.bind(name="...")` (examples: `CSB-Processing.WorkFlow`, `CSB-Processing.CLI`, `CSB-Processing.Export.Helpers`).
- CLI/UI parity pattern: file filtering logic is duplicated in `src/cli.py::is_valid_file`/`get_files` and `src/app/processing_handler.py::is_valid_file`/`get_files`.
- Vessel source exclusivity pattern: `--vessel` vs `--waterline` is enforced in the CLI and in the validator/UI; keep both paths aligned.
- Relative path resolution pattern:
  - vessel config: `src/vessel/vessel_config_json_manager.py` resolves non-absolute paths from `Path(__file__).parent.parent`.
  - IWLS cache: `src/config/iwls_api_config.py::CacheConfig.validate_cache_path` resolves relative paths and creates the folder.
- Duration validation pattern: regex `^\d+\s*(min|h)$` in both `src/config/processing_config.py` and `src/config/iwls_api_config.py`.
- Optional CARIS dependency pattern: runtime imports inside `src/export/export_format.py` (`from caris_api import ...` inside functions only).
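The shared duration rule `^\d+\s*(min|h)$` accepts strings such as `"30 min"` or `"2h"`. The following sketch validates such a value with that exact pattern and converts it to a `timedelta`. The helper name `parse_duration` is hypothetical, and the project's validators may only check the format without converting.

```python
import re
from datetime import timedelta

# Pattern shared by the config validators; group 1 captures the unit.
DURATION_RE = re.compile(r"^\d+\s*(min|h)$")


def parse_duration(value: str) -> timedelta:
    """Hypothetical helper: validate '<number> <min|h>' and convert it to a timedelta."""
    match = DURATION_RE.match(value)
    if match is None:
        raise ValueError(f"Invalid duration {value!r}; expected '<number> <min|h>'")
    amount = int(value[: match.start(1)])  # int() ignores the optional whitespace
    unit = match.group(1)
    return timedelta(minutes=amount) if unit == "min" else timedelta(hours=amount)
```

The pattern rejects decimals (`"1.5 h"`) and any other unit, so a value like 1.5 hours has to be written as `"90 min"`.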
- IWLS integration boundary: `src/iwls_api.py` initializes config/environment/API/station handler; HTTP handling lives in `src/iwls_api_request/*` (rate limiting, retry adapter, optional cache session).
- Tide computations are split between Voronoi (`src/tide/voronoi/*`), tide-zone mapping (`src/tide/tide_zone_processing.py`), and time-series fetch/interpolation (`src/tide/time_serie/*`).
- The export boundary is `export.export_processed_data_to_file_types(...)` in `src/export/export_helpers.py`; prefer extending the export factory over branching in callers.
- CARIS is optional globally but required for `csar` export; `src/config/caris_config.py` validates install paths before the run.
- Keep runtime CARIS imports in `src/export/export_format.py` (`from caris_api import ...` inside functions) to avoid hard dependency failures when CARIS is absent.
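The sketch below shows the deferred-import pattern. The optional module is imported only inside the function that needs it, so importing the export module, and running every non-CSAR export, keeps working when CARIS is missing. `caris_api` is the module name given above. The exporter names, the `export_all` loop, and the error message are hypothetical, and the real CARIS calls are deliberately left out. Whether the project's own export loop continues past a failed format is not stated here; this sketch chooses to.

```python
def export_to_csar(rows: list[dict], output: str) -> str:
    """Hypothetical CSAR exporter: CARIS is imported only when this runs."""
    try:
        import caris_api  # noqa: F401  deferred import; absent on machines without CARIS
    except ImportError as exc:
        raise RuntimeError("CSAR export requires CARIS; other formats are unaffected.") from exc
    # The real conversion would call the CARIS API here; it is omitted in this sketch.
    return f"{output}: {len(rows)} rows"


def export_to_csv(rows: list[dict], output: str) -> str:
    """Hypothetical non-CSAR exporter: nothing on this path touches CARIS."""
    return f"{output}: {len(rows)} rows"


EXPORTERS = {"csar": export_to_csar, "csv": export_to_csv}


def export_all(rows: list[dict], stem: str, formats: list[str]) -> tuple[dict, dict]:
    """Run each requested format; a failing optional format does not stop the others."""
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    for fmt in formats:
        try:
            results[fmt] = EXPORTERS[fmt](rows, f"{stem}.{fmt}")
        except RuntimeError as exc:
            errors[fmt] = str(exc)  # record the failure and continue
    return results, errors
```

On a machine without CARIS, `export_all(rows, "out", ["csar", "csv"])` should return the CSV result alongside a recorded CSAR error instead of aborting the whole export.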
- Add a new raw input parser:
  - implement a parser class under `src/ingestion/` following the `DataParserABC` pattern,
  - register its header/extension in `FACTORY_PARSER` (`src/ingestion/factory_parser.py`),
  - map parser -> logger type in `DATA_TYPE_MAPPING` (`src/ingestion/parser_models.py`).
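The three parser steps can be sketched as follows. `ParserABCSketch` is a hypothetical stand-in for `DataParserABC`, since its real abstract methods are not described here. `NewLoggerParser`, `PARSER_REGISTRY`, `LOGGER_TYPES`, the CSV header, and the `"NewLogger"` label are also placeholders, not the project's actual structures.

```python
from __future__ import annotations

import csv
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ParserABCSketch(ABC):
    """Hypothetical stand-in for DataParserABC; the real abstract methods may differ."""

    @abstractmethod
    def read(self, path: Path) -> list[dict]:
        """Return one dict per sounding."""


class NewLoggerParser(ParserABCSketch):
    """Step 1: a hypothetical parser for a CSV log with a fixed header."""

    HEADER = "time,lat,lon,depth"

    def read(self, path: Path) -> list[dict]:
        with path.open(newline="") as fh:
            return [
                {"time": r["time"], "lat": float(r["lat"]), "lon": float(r["lon"]),
                 "depth": float(r["depth"])}
                for r in csv.DictReader(fh)
            ]


# Step 2: register header/extension (hypothetical layout, not the real FACTORY_PARSER).
PARSER_REGISTRY: dict[str, list[tuple[str, type[ParserABCSketch]]]] = {
    ".csv": [(NewLoggerParser.HEADER, NewLoggerParser)],
}

# Step 3: map parser -> logger type (the label is a hypothetical placeholder).
LOGGER_TYPES: dict[type[ParserABCSketch], str] = {NewLoggerParser: "NewLogger"}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "sample.csv"
        log.write_text("time,lat,lon,depth\n2024-01-01T00:00:00Z,48.5,-68.5,12.3\n")
        _, parser_cls = PARSER_REGISTRY[log.suffix.lower()][0]
        rows = parser_cls().read(log)
        print(LOGGER_TYPES[parser_cls], rows[0]["depth"])  # NewLogger 12.3
```

Using an abstract base class means a parser that forgets to implement `read` fails at instantiation with a `TypeError`, not later during a run.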
- Add a new export format:
  - implement `export_geodataframe_to_<fmt>(...)` in `src/export/export_format.py`,
  - register it in `FACTORY_EXPORT_GEODATAFRAME` (`src/export/factory_export.py`),
  - expose it in `config.processing_config.FileTypes` and the CLI `--format` choices.
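One way to keep the three export registration points aligned is sketched below. CLI choices are derived from the file-type enum, and an import-time check rejects any file type that has no registered exporter. `FileTypes` here is a hypothetical two-member stand-in for `config.processing_config.FileTypes`. `FACTORY_EXPORT`, the `export_rows_to_*` writers (which take plain row dicts rather than GeoDataFrames), and the `export.<fmt>` file name are also hypothetical. Whether the project's `--format` choices are already derived this way is not stated here.

```python
from __future__ import annotations

import argparse
import csv
import json
import tempfile
from enum import Enum
from pathlib import Path


class FileTypes(str, Enum):
    """Hypothetical stand-in for config.processing_config.FileTypes."""

    CSV = "csv"
    JSON = "json"


def export_rows_to_csv(rows: list[dict], path: Path) -> Path:
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path


def export_rows_to_json(rows: list[dict], path: Path) -> Path:
    path.write_text(json.dumps(rows))
    return path


# Hypothetical registry mirroring FACTORY_EXPORT_GEODATAFRAME: one entry per file type.
FACTORY_EXPORT = {FileTypes.CSV: export_rows_to_csv, FileTypes.JSON: export_rows_to_json}

# Fail at import time if a file type was exposed without a registered exporter.
_missing = set(FileTypes) - set(FACTORY_EXPORT)
if _missing:
    raise RuntimeError(f"No exporter registered for: {sorted(m.value for m in _missing)}")


def build_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="convert")
    # Choices are derived from the enum, so the CLI cannot drift from FileTypes.
    parser.add_argument("--format", choices=[ft.value for ft in FileTypes], required=True)
    return parser


def export(rows: list[dict], output_dir: Path, fmt: str) -> Path:
    file_type = FileTypes(fmt)  # raises ValueError for unknown formats
    return FACTORY_EXPORT[file_type](rows, output_dir / f"export.{file_type.value}")


if __name__ == "__main__":
    args = build_arg_parser().parse_args(["--format", "csv"])
    with tempfile.TemporaryDirectory() as tmp:
        out = export([{"lat": 48.5, "depth": 12.3}], Path(tmp), args.format)
        print(out.name, out.read_text().splitlines()[0])  # export.csv lat,depth
```

With this layout, adding a format means adding one enum member and one factory entry. Forgetting the factory entry fails immediately instead of surfacing as a `KeyError` mid-run.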
- Add/change dataframe columns:
  - update `src/schema/model.py` first,
  - then update transformations (`cleaner`, `georeference`, tide-zone join),
  - then update export finalization (`finalize_geodataframe(...)`).
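The schema must be updated first because the final export projects columns onto the schema's annotated fields (`DataLoggerSchema.__annotations__.keys()` in `finalize_geodataframe`). Below is a stdlib-only sketch of that projection. A hypothetical plain class stands in for the Pandera model, and apart from `Depth_processed_meter` its column names are placeholders. `finalize_rows` is also hypothetical and works on dicts rather than a GeoDataFrame. The effect to notice: a column added upstream but not declared in the schema is silently dropped at export.

```python
class DataLoggerSchemaSketch:
    """Hypothetical plain-class stand-in for the Pandera DataLoggerSchema."""

    Time_UTC: str  # hypothetical column names, except Depth_processed_meter
    Longitude: float
    Latitude: float
    Depth_processed_meter: float


def finalize_rows(rows: list[dict], schema: type) -> list[dict]:
    """Project each row onto the schema's annotated columns, in declaration order."""
    columns = list(schema.__annotations__.keys())
    # Undeclared columns are dropped; declared-but-missing ones become None here.
    return [{col: row.get(col) for col in columns} for row in rows]
```

A Python detail worth checking when schemas inherit from each other: a class's `__annotations__` lists only the fields declared on that class itself, not inherited ones. A subclass such as a tide-zone schema would therefore expose only its own extra fields through `__annotations__`.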
- Change workflow behavior:
  - verify that both entry points (`src/cli.py` and `src/app/processing_handler.py`) still produce the same processing parameters,
  - re-check loop termination and excluded-station logic in `processing_workflow`.
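The loop-termination and station-exclusion logic can be sketched as follows. Only the two stopping conditions come from the text: no NaN left in `Depth_processed_meter`, or the iteration budget is spent. Everything else is hypothetical: `water_level_at`, `zone_stations`, the `zone`/`observed_depth` keys, the "nearest remaining station first" choice, and the reduction formula. The real pipeline works per Voronoi zone on GeoDataFrames, not row by row.

```python
from __future__ import annotations

import math
from typing import Callable

DEPTH_PROCESSED_METER = "Depth_processed_meter"


def has_nan(rows: list[dict]) -> bool:
    return any(math.isnan(row[DEPTH_PROCESSED_METER]) for row in rows)


def water_level_loop(
    rows: list[dict],
    zone_stations: dict[str, list[str]],  # hypothetical: zone -> stations, preferred first
    water_level_at: Callable[[dict, str], float | None],  # hypothetical fetch/interpolate
    max_iterations: int = 5,
) -> tuple[int, set[str]]:
    """Fill NaN depths; return (iterations used, excluded stations)."""
    excluded: set[str] = set()
    iteration = 0
    # The two stopping conditions: no NaN left, or the iteration budget is spent.
    while has_nan(rows) and iteration < max_iterations:
        iteration += 1
        for row in rows:
            if not math.isnan(row[DEPTH_PROCESSED_METER]):
                continue
            available = [s for s in zone_stations[row["zone"]] if s not in excluded]
            if not available:
                continue  # every station for this zone is excluded
            station = available[0]
            level = water_level_at(row, station)
            if level is None:
                excluded.add(station)  # no usable series: the next station is tried instead
            else:
                # Assumed reduction: observed depth minus water level above datum.
                row[DEPTH_PROCESSED_METER] = row["observed_depth"] - level
    return iteration, excluded
```

The `iteration < max_iterations` guard keeps the loop bounded even when every station of a zone ends up excluded. In that case the NaN depths remain and the loop simply stops at the budget.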
- If you change processing semantics, check both the CLI (`src/cli.py`) and the UI handler (`src/app/processing_handler.py`), because both call `processing_workflow`.
- When adding/changing dataframe columns, update Pandera schemas first, then downstream transformations/exports.
- Prefer adding new parser/export types via factories (`FACTORY_PARSER`, `FACTORY_EXPORT_GEODATAFRAME`) instead of command-level `if`/`else` branching.
- Treat `src/CONFIG_csb-processing.toml` as executable documentation for defaults and expected config shape.
- CLI smoke test: run one `process` command and verify that `Data/`, `Tide/`, and `Log/` are created.
- UI parity check: confirm that `ProcessingHandler` uses the same vessel/waterline behavior as the CLI defaults.
- If IWLS is touched: verify station resolution and exclusions through one iteration (`StationVoronoi-<n>.gpkg` appears).
- If export is touched: validate at least one vector format (`.gpkg`), plus any tabular/raster format you modified.
- If `csar` is touched: ensure a missing CARIS install still fails gracefully without breaking non-CSAR exports.
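The folder checks in this list can be scripted. The sketch below assumes only the layout stated above: `Data/`, `Tide/`, and `Log/` under the output folder, plus a `StationVoronoi-<n>.gpkg` file in `Tide/` when IWLS runs. It further assumes `<n>` is an integer. The function name and its list-of-problems return format are hypothetical, and running the `process` command itself is left to you.

```python
import re
import tempfile
from pathlib import Path

EXPECTED_DIRS = ("Data", "Tide", "Log")
# Assumes <n> is an integer, e.g. StationVoronoi-1.gpkg.
STATION_VORONOI_RE = re.compile(r"^StationVoronoi-\d+\.gpkg$")


def check_output(output: Path, expect_iwls: bool = False) -> list[str]:
    """Return the problems found in a processing output folder; an empty list means OK."""
    problems = [f"missing {name}/" for name in EXPECTED_DIRS if not (output / name).is_dir()]
    if expect_iwls:
        tide = output / "Tide"
        has_voronoi = tide.is_dir() and any(STATION_VORONOI_RE.match(p.name) for p in tide.iterdir())
        if not has_voronoi:
            problems.append("no Tide/StationVoronoi-<n>.gpkg")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        for name in ("Data", "Tide"):
            (out / name).mkdir()
        print(check_output(out))  # ['missing Log/']
```

After `python src/cli.py process <file> --output <dir>`, `check_output(Path("<dir>"))` should return an empty list. Pass `expect_iwls=True` for a water-level run.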
- If you touch workflow logic in `processing_workflow` -> check parity params in `src/cli.py::process_bathymetric_data` and `src/app/processing_handler.py::_run_processing_workflow` -> validate with `python src/cli.py process <file> --output <dir>`.
- If you touch file acceptance rules -> update both `src/cli.py::is_valid_file`/`get_files` and `src/app/processing_handler.py::is_valid_file`/`get_files` -> validate with one CLI run and one UI run (`python src/web_ui.py`).
- If you add parser support -> add a parser class under `src/ingestion/`, register it in `src/ingestion/factory_parser.py::FACTORY_PARSER`, then map it in `src/ingestion/parser_models.py::DATA_TYPE_MAPPING` -> validate with `python src/cli.py process <new-format-file> --output <dir>`.
- If you add an export format -> implement `export_geodataframe_to_<fmt>(...)` in `src/export/export_format.py`, register it in `src/export/factory_export.py::FACTORY_EXPORT_GEODATAFRAME`, expose it in `config.processing_config.FileTypes` and the CLI `--format` -> validate with `python src/cli.py convert <input.gpkg> --output <dir> --format <fmt>`.
- If you change dataframe columns/types -> update `src/schema/model.py` first (`DataLoggerSchema`/`DataLoggerWithTideZoneSchema`), then tide/georeference transforms, then `src/export/export_helpers.py::finalize_geodataframe` -> validate with one `process` run and by reading the output `.gpkg`.
- If you touch tide-zone joins -> preserve the rename contract in `src/tide/tide_zone_processing.py::add_tide_zone_id_to_geodataframe` (`id`/`code`/`name` -> `Tide_zone_*`) -> validate with an IWLS-enabled run and check that the tide-zone columns are non-empty.
- If you touch IWLS behavior -> review `src/iwls_api.py::initialize_iwls_api`, `src/tide/time_serie/*`, and loop termination on `schema_ids.DEPTH_PROCESSED_METER` in `processing_workflow` -> validate that `Tide/StationVoronoi-<n>.gpkg` is produced.
- If you touch config semantics -> update validators in `src/config/processing_config.py`/`src/config/iwls_api_config.py` and keep the `"<number> <min|h>"` duration pattern -> validate by running the CLI with the default and a custom `--config`.
- If you touch CARIS/CSAR -> keep runtime imports inside `src/export/export_format.py` and path checks in `src/config/caris_config.py` -> validate both `--format csar` (with CARIS) and non-CSAR formats (without CARIS).
- If you touch docs-facing behavior -> update the relevant README section and keep `flow.mermaid` consistent with `processing_workflow` branches -> validate the docs build from `docs/` using `make html` or `make.bat html`.
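The tide-zone rename contract can be pinned with a small mapping plus a check that the renamed columns are populated after an IWLS-enabled run. The target names `Tide_zone_code` and `Tide_zone_name` are inferred from the `Tide_zone_*` pattern and the `id -> Tide_zone_id` example, so treat them as assumptions. The helpers are hypothetical and operate on plain dicts. With a (Geo)DataFrame, the equivalent rename is `df.rename(columns=TIDE_ZONE_RENAME)`.

```python
# Assumed mapping: only id -> Tide_zone_id is stated explicitly; the others follow Tide_zone_*.
TIDE_ZONE_RENAME = {"id": "Tide_zone_id", "code": "Tide_zone_code", "name": "Tide_zone_name"}


def rename_tide_zone_columns(zone: dict) -> dict:
    """Rename joined tide-zone attributes; other keys pass through unchanged."""
    return {TIDE_ZONE_RENAME.get(key, key): value for key, value in zone.items()}


def empty_tide_zone_columns(rows: list[dict]) -> list[str]:
    """Return renamed tide-zone columns that are missing or empty in every row."""
    return [
        col for col in TIDE_ZONE_RENAME.values()
        if all(row.get(col) in (None, "") for row in rows)
    ]
```

An empty list from `empty_tide_zone_columns` is the "non-empty tide-zone columns" condition from the checklist. Any name it returns points to a join that matched nothing or a rename that drifted.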