Skip to content

Latest commit

 

History

History
741 lines (571 loc) · 29.5 KB

File metadata and controls

741 lines (571 loc) · 29.5 KB

Table of Contents


CHS-CSB-Processing

CHS-CSB-Processing is currently under development. Several features are still incomplete or under development. You can visit the documentation for more information.


Description

This module is designed to automate the processing of bathymetric data files. It allows you to:

  • Identify and load appropriate files (CSV, TXT, XYZ, GeoJSON).
  • Perform georeferencing based on specific configurations.
  • Manage vessel identifiers and associated configurations.
  • Convert processed files to different output formats.

The supported file formats are as follows:

  • OFM: .xyz extension with at least the columns LON, LAT, DEPTH, TIME in the header.
  • DCDB: .csv extension with at least the columns LON, LAT, DEPTH, TIME in the header.
  • Lowrance: .csv extension with at least the columns Longitude[°WGS84], Latitude[°WGS84], WaterDepth[Feet], DateTime[UTC] in the header. These files are the result of SL3 files from Lowrance exported by the tool SL3Reader.
  • Actisense: coming soon.
  • BlackBox: .TXT extension without header with columns in the order Time, Date, Latitude, Longitude, Speed (km/h) and Depth (m).
  • WIBL: numeric extension (e.g., .1, .2, .3, etc.).
  • HydroBlock: .TXT extension, semicolon-delimited, with the header timestamp;latitude;longitude;chartdatumheight. Depth is already reduced to chart datum (positive axis downward).

Installation

Prerequisites

This project uses UV as a Python package and environment manager. UV is a modern, fast, and reliable tool that simplifies dependency management.

To install UV, follow the instructions on the official page.

Environment Setup

Once UV is installed, create the virtual environment with all necessary dependencies:

# Navigate to the project directory
cd .\CHS-CSB_Processing

# Create the virtual environment and install dependencies
uv sync

This command:

  • Automatically creates a Python virtual environment
  • Installs all dependencies specified in pyproject.toml
  • Ensures all package versions are compatible

To activate the virtual environment:

# Windows
.venv\Scripts\activate

# macOS/Linux
source .venv/bin/activate

You are now ready to use the CHS-CSB-Processing module!


Graphical User Interface (GUI)

A graphical user interface (GUI) is available to facilitate the processing of bathymetric files. You can launch the GUI by running the following command:

python web_ui.py

Command-Line Interface Tutorial for Bathymetric File Processing

This tutorial provides a detailed explanation of how to use the command-line module to process and georeference bathymetric data files. It covers every parameter and provides practical examples.

The CLI now offers two main commands:

  • process: for processing and georeferencing bathymetric data
  • convert: for converting GPKG/GeoJSON files to different formats

Available Commands

1. process Command

Processes bathymetric data files and georeferences them.

Syntax

python cli.py process [FILES...] [OPTIONS]

Arguments

  • FILES: One or more paths to files or directories to process. Supported formats:
    • .csv, .txt, .xyz, .geojson
    • Numeric extensions (.1, .2, .3, etc.)

Available Options

Option Type Required Description
--output Path Yes Output directory path
--vessel Text No Vessel identifier. If not specified, a default vessel with lever arms at 0 will be used. Incompatible with --waterline
--waterline Decimal No Vessel waterline in meters. If not specified, a value of 0 will be used. Incompatible with --vessel
--vessel-name Text No Vessel name used for export. Overrides the name from vessel configuration if provided. Available independently of --vessel and --waterline
--config Path No Configuration file path. If not specified, the default configuration file will be used
--apply-water-level Flag No Apply water level reduction when georeferencing soundings (default: enabled). Use --no-apply-water-level to disable
--water-level-station Text No Water level station code(s) to use for processing. Can be specified multiple times. (Stations list)
--excluded-station Text No Water level station code(s) to exclude from processing. Can be specified multiple times. (Stations list)
--already-at-chart-datum Flag No Indicates that input data is already reduced to chart datum. Water level reduction is automatically disabled. Modifies TVU calculation (per-datalogger constant from datalogger_uncertainty.json). Incompatible with --apply-water-level
--merge-files Flag No Merge all input files into a single output file (default: enabled). Use --no-merge-files to process each file separately; the output filename will then match the input filename

Usage Examples

Basic processing with default vessel:

python cli.py process data.csv --output ./results

Processing with specific vessel:

python cli.py process data.csv --vessel "CCGS_CARTIER" --output ./results

Processing with custom waterline:

python cli.py process data.csv --waterline 2.5 --output ./results

Processing with custom configuration:

python cli.py process data.csv --config ./custom_config.toml --output ./results

Processing without water level reduction:

python cli.py process data.csv --no-apply-water-level --output ./results

Processing with specific water level station:

python cli.py process data.csv --water-level-station "04435" --water-level-station "04436" --output ./results

Processing excluding certain water level stations:

python cli.py process data.csv --excluded-station "04435" --excluded-station "04436" --output ./results

Processing multiple files:

python cli.py process file1.csv file2.xyz directory/ --output ./results

Processing with a custom vessel name for export:

python cli.py process data.csv --vessel "CCGS_CARTIER" --vessel-name "Cartier Survey 2025" --output ./results

Processing data already reduced to chart datum:

python cli.py process data.csv --already-at-chart-datum --output ./results

Processing multiple files without merging (one output per file):

python cli.py process file1.csv file2.xyz --no-merge-files --output ./results

2. convert Command

Converts GPKG/GeoJSON files to different output formats.

Syntax

python cli.py convert [INPUT_FILES...] [OPTIONS]

Arguments

  • INPUT_FILES: One or more GPKG (.gpkg) or GeoJSON (.geojson) files to convert

Available Options

Option Type Required Description
--output Path Yes Output directory path
--format Multiple choice Yes Desired output format(s). Can be specified multiple times to export to multiple formats
--config Path No Configuration file path. If not specified, the default configuration file will be used
--group-by-iho-order Boolean No Group data by IHO order during export (default: false)

Supported Output Formats

The following formats are available via the --format option:

  • geojson - GeoJSON format
  • gpkg - GeoPackage format
  • csar - CSAR format
  • parquet - Apache Parquet format
  • feather - Apache Feather format
  • csv - CSV format
  • geotiff - GeoTIFF format

Usage Examples

Convert to single format:

python cli.py convert input.gpkg --output ./output --format geojson

Convert to multiple formats:

python cli.py convert input.gpkg --output ./output --format geojson --format csv --format parquet

Convert with IHO order grouping:

python cli.py convert input.gpkg --output ./output --format gpkg --group-by-iho-order true

Convert multiple files:

python cli.py convert file1.gpkg file2.geojson --output ./output --format geotiff

Convert with custom configuration:

python cli.py convert input.gpkg --output ./output --format csv --config ./custom_config.toml

Error Handling

The module includes robust error handling to avoid unexpected interruptions. Below are the main cases covered:

Invalid Files

  • Issue: If a provided file is invalid (incorrect format or non-existent).
  • Solution: The script logs an error and skips invalid files.
    [ERROR] No valid files to process.

Missing Parameters

  • Issue: If a required parameter such as --output is missing or there are no files to process.
  • Solution: The script displays an error message explaining the missing parameter.
    [ERROR] The --output parameter is required.

Restrictions and Validations

  1. Mutually exclusive options: The --vessel and --waterline options cannot be used simultaneously in the process command.

  2. Waterline validation: The --waterline value must be positive.

  3. Supported files for process:

    • Extensions: .csv, .txt, .xyz, .geojson
    • Numeric extensions: .1, .2, .3, etc.
  4. Supported files for convert:

    • Only: .gpkg and .geojson

Default Behavior

  • If neither vessel nor waterline is specified, a default vessel with lever arms at 0 will be used
  • If no configuration file is provided, the default configuration will be used
  • Water level reduction is applied by default during georeferencing (--apply-water-level)
  • Input files are merged into a single output file by default (--merge-files)
  • Processing can be performed on individual files or entire directories (recursive processing)

The CLI displays explicit error messages in French and English to facilitate debugging and ensure correct system usage.


Processing Flow Diagram

flowchart TD
%% Global styles for the diagram
    classDef start fill: #66FF66, stroke: #009900, stroke-width: 2px, color: #000000, font-weight: bold
    classDef process fill: #B9E0FF, stroke: #0078D7, stroke-width: 1px, color: #000000
    classDef decision fill: #FFD700, stroke: #B8860B, stroke-width: 1px, color: #000000
    classDef endNode fill: #FFA500, stroke: #FF4500, stroke-width: 2px, color: #000000, font-weight: bold
    classDef highlighted fill: #CCEBFF, stroke: #0078D7, stroke-width: 1px, color: #000000, font-weight: bold
    classDef iteration fill: #FF9ED2, stroke: #E63F8B, stroke-width: 1px, color: #000000
    classDef export fill: #CCFFCC, stroke: #009900, stroke-width: 1px, color: #000000
%% Diagram nodes
    Start[Begin CSB workflow] --> Config[Loading configurations]
    Config --> CarisConfig{CSAR format required ?}
    CarisConfig -->|Yes| LoadCarisAPI[Loading Caris API configuration]
    CarisConfig -->|No| VesselConfig["<b>Loading vessel manager</b>
    • Retrieving vessel configuration"]
LoadCarisAPI --> VesselConfig

VesselConfig --> ParseFiles["<b>Parsing raw files:</b>
• Format identification
• Data reading
• Conversion to GeoDataFrame
• Pre-filtering of data"]
ParseFiles --> CleanData["<b>Data cleaning and filtering:</b>
• Duplicate removal
• Min/max depth filtering
• Invalid coordinate filtering
• Invalid timestamp filtering
• Min/max speed filtering"]
CleanData --> CheckData{Valid data ?}
CheckData -->|No|EndNoData[End: no valid data]
CheckData -->|Yes|Outliers[Outlier detection]
Outliers --> GetSensors[Retrieving sensor configurations based on vessel]

GetSensors --> ApplyWL{Apply water level ?}
ApplyWL -->|No|GeoreferenceNoWL["<b>Georeferencing without water level:</b>
• TVU calculation
• THU calculation
• Survey order calculation"]
GeoreferenceNoWL --> ExportNoWL[Exporting data & metadata]
ExportNoWL --> EndNoWL[End: georeferenced data without water level]

ApplyWL -->|Yes|LoadIWLS[Loading IWLS API configuration]
LoadIWLS --> RunIter["run = 1, excluded_stations = (empty)"]

RunIter --> IterCheck{run <= max_iterations ?}
IterCheck -->|No|EndMaxRun[Incomplete processing: max iterations reached]
EndMaxRun --> PlotWL

IterCheck -->|Yes|GetVoronoi["Retrieving tidal zones (Voronoi)
without excluded stations"]
GetVoronoi --> GetTideInfo[Retrieving tidal zone information
intersecting with data]

GetTideInfo --> GetWaterLevel[Retrieving water level data
for each station]
GetWaterLevel --> ExportWL[Exporting water level data]
ExportWL --> Georeference["<b>Georeferencing bathymetric data:</b>
• Applying water levels
• TVU calculation
• THU calculation
• Survey order calculation"]

Georeference --> DataComplete{Processing completed ?
DEPTH_PROCESSED_METER without NaN ?}
DataComplete -->|Yes|PlotWL[Creating water level charts]

DataComplete -->|No|AddExcluded[Adding problematic stations
to exclude]
AddExcluded --> Increment[run += 1]
Increment --> IterCheck

PlotWL --> ExportData[Exporting processed data]
ExportData --> ExportMetadata[Exporting metadata]
ExportMetadata --> End[End: georeferenced data with water level]

%% Applying styles to nodes
class Start start
class Config, LoadCarisAPI, VesselConfig, GetSensors, LoadIWLS process
class CarisConfig, ApplyWL, IterCheck, DataComplete,CheckData decision
class EndNoData, EndNoWL, EndMaxRun, End endNode
class ParseFiles, CleanData, GeoreferenceNoWL, Georeference highlighted
class RunIter, GetVoronoi, GetTideInfo, GetWaterLevel, AddExcluded, Increment iteration
class ExportNoWL, ExportWL, PlotWL,ExportData, ExportMetadata export
class Outliers process
Loading

Configuration File (TOML)

The TOML configuration file defines parameters for processing. Below is an example of the default configuration file ( ./src/CONFIG_csb-processing.toml):

[IWLS.API.TimeSeries]
priority = ["wlo", "wlp"]  # Priority of time series to retrieve.
max_time_gap = "1 min"  # Maximum interval without data before interpolation.
threshold_interpolation_filling = "4 h"  # Threshold for interpolation and filling missing data.
wlo_qc_flag_filter = ["NOT_EVAL", "QUESTIONABLE", "BAD", "MISSING", "2", "3"]  # Quality filters for wlo.
buffer_time = "24 h"  # Buffer time to retrieve data needed for interpolation.

[IWLS.API.Profile]
active = "public"  # Active profile: {"dev", "prod", "public"}.

[IWLS.API.Environment.dev]
name = "DEV"
endpoint = "EndpointPrivateDev"
calls = 15  # Maximum number of calls per period.
period = 1  # Period (in seconds).

[IWLS.API.Environment.prod]
name = "PROD"
endpoint = "EndpointPrivateProd"
calls = 10
period = 1

[IWLS.API.Environment.public]
name = "PUBLIC"
endpoint = "EndpointPublic"
calls = 15
period = 1

[IWLS.API.Cache]
ttl = 86400  # Cache data lifetime (in seconds).
cache_path = "./cache"  # Directory for cache storage.

[DATA.Transformation.filter]
min_latitude = -90
max_latitude = 90
min_longitude = -180
max_longitude = 180
min_speed = 0  # Minimum speed in knots (optional).
max_speed = 30  # Maximum speed in knots (optional).
min_depth = 0
max_depth = 1000  # Maximum depth value (optional).
filter_to_apply = [
    "LATITUDE_FILTER",
    "LONGITUDE_FILTER",
    "TIME_FILTER",
    "SPEED_FILTER",
    "DEPTH_FILTER"
]

[DATA.Georeference.water_level]
water_level_tolerance = "15 min"  # Tolerance for georeferencing water levels.

[DATA.Georeference.uncertainty.tvu]
constant_tvu_wlo = 0.04  # TVU constant for WLO water levels.
default_constant_tvu_wlp = 0.35  # TVU constant for WLP water levels.
depth_coefficient_tvu = 0.5  # Depth coefficient for TVU calculation.
default_depth_ssp_error_coefficient = 4.1584  # Default SSP error coefficient.
max_distance_ssp = 30000  # Maximum distance to link an SSP value (in meters).

[DATA.Georeference.uncertainty.thu]
cone_angle_sonar = 20  # Sonar cone angle for THU calculation (in degrees).
constant_thu = 3  # THU constant.

[CSB.Processing.vessel]
manager_type = "VesselConfigJsonManager"
json_config_path = "./TCSB_VESSELSLIST.json"  # Path to vessel configuration file.

[CSB.Processing.export]
export_format = ["gpkg", "csv"]  # Formats of files for exporting processed data.
resolution = 0.00005  # Grid resolution for exporting data in raster format (in degrees).
group_by_iho_order = false  # Group data by IHO order.

[CSB.Processing.options]
log_level = "INFO"  # Log level: {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}.
max_iterations = 5  # Maximum number of iterations {int}.
decimal_precision = 1  # Decimal precision for processed data.

[CARIS.Environment]
base_path = "C:/Program Files/CARIS"  # Path to the CARIS installation directory.
software = "BASE Editor"  # CARIS software to use.
version = "6.1"  # CARIS software version.
python_version = "3.11"  # Python version to use.
args = []  # Additional arguments for exporting data in CSAR format.

Main Sections

  • [IWLS.API.TimeSeries] (Optional): Parameters for time series. If no parameter is defined, default values will be used and no interpolation will be performed.

    • priority: List of time series to use by priority (e.g., ["wlo", "wlp"]).
    • max_time_gap: Maximum time without data before interpolation (format: "<number> <unit>", e.g., "1 min").
    • threshold_interpolation_filling: Threshold for interpolation and filling missing data (e.g., "4 h").
    • wlo_qc_flag_filter: Quality filters for WLO data.
    • buffer_time: Buffer time for interpolations (format: "<number> <unit>", e.g., "24 h").
  • [IWLS.API.Profile] (Optional): Defines the active profile ("dev", "prod", "public"). A public profile is used by default with 15 calls per second.

  • [IWLS.API.Environment.<profile>] (Optional): Environment-specific parameters

    • name: Name of the environment (e.g., "PUBLIC").
    • endpoint: API endpoint (e.g., "EndpointPublic"). Note that only public endpoints are accessible to everyone.
    • calls: Maximum number of calls per period.
    • period: Time period for calls.
  • [IWLS.API.Cache] (Optional): Defines cache management.

    • ttl: Cache data lifetime (in seconds).
    • cache_path: Directory for cache storage.
  • [DATA.Transformation.filter] (Optional): Defines geographic, depth and speed limits for tagging inconsistent data.

    • min_speed: Minimum speed in knots (optional).
    • max_speed: Maximum speed in knots (optional).
    • max_depth: Maximum depth value in meters (optional).
    • filter_to_apply: List of filters to apply. Data is directly rejected if the filter is applied, otherwise data is simply tagged. Available filters are:
      • DEPTH_FILTER: Depth filter (limit defined by min_depth and max_depth).
      • LATITUDE_FILTER: Latitude filter (limit defined by min_latitude and max_latitude).
      • LONGITUDE_FILTER: Longitude filter (limit defined by min_longitude and max_longitude).
      • TIME_FILTER: Time filter (checks valid timestamps).
      • SPEED_FILTER: Speed filter (limit defined by min_speed and max_speed).
  • [DATA.Georeference.water_level] (Optional): Defines tolerance for georeferencing based on water levels (format: "<number> <unit>", e.g., "15 min").

  • [DATA.Georeference.uncertainty.tvu] (Optional): Configuration for TVU (Total Vertical Uncertainty) calculation.

    • constant_tvu_wlo: TVU constant for WLO water levels (default: 0.04).
    • default_constant_tvu_wlp: TVU constant for WLP water levels (default: 0.35).
    • depth_coefficient_tvu: Depth coefficient for TVU calculation (default: 0.5).
    • default_depth_ssp_error_coefficient: Default SSP error coefficient (default: 4.1584).
    • max_distance_ssp: Maximum distance to link an SSP value in meters (default: 30000).

    TVU Calculation Formula:

    TVU = c + (a × d)
    where:
    - c = chart datum reduction component, determined by mode:
        • apply_water_level=True  → station_uncertainty.json (WLO: constant_tvu_wlo, WLP: default_constant_tvu_wlp or value in station_uncertainty.json)
        • apply_water_level=False → 0.0
        • already_at_chart_datum=True → datalogger_uncertainty.json[DataLoggerType]["constant_tvu"] if available, otherwise 0.0
    - a = depth coefficient = depth_coefficient_tvu + SSP coefficient (default_depth_ssp_error_coefficient or value in canadian_water_ssp_errors.gpkg based on position)
    - d = raw depth in meters
    
  • [DATA.Georeference.uncertainty.thu] (Optional): Configuration for THU (Total Horizontal Uncertainty) calculation.

    • cone_angle_sonar: Sonar cone angle for THU calculation in degrees (default: 20).
    • constant_thu: THU constant (default: 3).

    THU Calculation Formula:

    THU = c + (d × tan(θ/2))
    where:
    - c = THU constant = datalogger_uncertainty.json[DataLoggerType]["constant_thu"] if available for the identified sensor type, otherwise constant_thu (TOML value)
    - d = raw depth in meters
    - θ = sonar beam angular opening (cone_angle_sonar, in degrees)
    
  • [CSB.Processing.vessel] (Optional): Configures the vessel manager and vessel file. Required only if you use vessels for georeferencing.

    • manager_type: Type of vessel manager (e.g., "VesselConfigJsonManager").
    • json_config_path (Used with "VesselConfigJsonManager"): Path to the vessel configuration file.
  • [CSB.Processing.export] (Optional): Export parameters for processed data.

    • export_format: List of file formats for exporting processed data: {"geojson", "gpkg", "csar", "parquet", "feather", "csv", "geotiff"} (e.g., ["gpkg", "csv"]).
    • resolution: Grid resolution for exporting data in raster format (in degrees).
    • group_by_iho_order: Group data by IHO order: {true, false}.
  • [CSB.Processing.options] (Optional): Processing options.

    • log_level: Log level: {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}.
    • max_iterations: Maximum number of iterations (default: 5).
    • decimal_precision: Number of significant decimal places for processed data.
  • [CARIS.Environment] (Optional): CARIS environment-specific parameters. Used to export data in CSAR format.

    • base_path: Path to CARIS software installation (default: "C:/Program Files/CARIS").
    • software: CARIS software used (e.g., "BASE Editor", "HIPS and SIPS").
    • version: Specific version of CARIS software (e.g., "6.1").
    • python_version: Python version used by the CARIS API (e.g., "3.11").
    • args: Additional arguments for exporting data in CSAR format.

Static Uncertainty Files (src/static/uncertainty/)

These JSON and GeoPackage files define uncertainty parameters used during TVU and THU calculation. They are loaded automatically during processing.

datalogger_uncertainty.json — Per-sensor-type constants

This file allows overriding the constant_thu and constant_tvu TOML values for specific sensor types. It is used in two contexts:

Constant Usage context
constant_thu Always — JSON value takes priority over TOML for the identified sensor type
constant_tvu Only when --already-at-chart-datum — data bypasses tidal correction

Format:

{
  "HydroBlock": {
    "constant_thu": 0.1,
    "constant_tvu": 0.2
  }
}

The key matches the DataLoggerType value identified during raw file parsing. If the sensor type is absent from the file, TOML values (constant_thu, constant_tvu) are used.

station_uncertainty.json — TVU uncertainty per tide station

Provides the TVU chart datum reduction component (in meters) for each WLP station by code. Used only in apply_water_level=True mode.

canadian_water_ssp_errors.gpkg — Sound speed errors

Provides the SSP error coefficient (%) by geographic zone, spatially joined to soundings to refine the depth component of TVU. Search radius: max_distance_ssp (default 30 km). If no value is found within this radius, default_depth_ssp_error_coefficient is used.


Vessel File (Vessels)

The vessel configuration file is a JSON file containing the necessary information for each vessel, such as the identifier, axis conventions, and associated data. The path to the JSON file is defined in the TOML configuration file. Here is an example file:

[
  {
    "id": "42134324",
    "name": "Tuktoyaktuk",
    "axis_convention": "CARIS",
    "navigation": [
      {
        "time_stamp": "2021-09-25T00:00:00Z",
        "x": 0.0,
        "y": 0.0,
        "z": 0.0
      },
      {
        "time_stamp": "2024-11-11T00:00:00Z",
        "x": 0.4,
        "y": 0.0,
        "z": 0.0
      }
    ],
    "motion": [
      {
        "time_stamp": "2021-09-25T00:00:00Z",
        "x": 0.0,
        "y": 0.0,
        "z": 0.0
      }
    ],
    "sounder": [
      {
        "time_stamp": "2021-09-25T00:00:00Z",
        "x": 0.0,
        "y": 0.0,
        "z": 0.0
      }
    ],
    "waterline": [
      {
        "time_stamp": "2021-09-25T00:00:00Z",
        "z": 0.0
      },
      {
        "time_stamp": "2024-11-11T00:00:00Z",
        "z": -0.65
      }
    ],
    "sound_speed": [
      {
        "time_stamp": "2021-09-25T00:00:00Z",
        "ssp": false,
        "sound_speed": 1500.0
      }
    ]
  }
]

Field Descriptions

  • id: Unique vessel identifier.
  • name: Vessel name.
  • axis_convention: Axis convention used (e.g., "CARIS").
  • navigation: List of navigation positions with their coordinates (x, y, z) and a timestamp (time_stamp).
  • motion: Vessel motion data (same fields as navigation).
  • sounder: Vessel sounder data (same fields as navigation).
  • waterline: Waterline information, including elevation (z) and a timestamp.
  • ssp_applied: Indicates whether the sound speed profile model has been applied (ssp).

For all time_stamp attributes, the format must be ISO 8601 (e.g., "2021-09-25T00:00:00Z"). Additionally, the time_stamp indicates the date from which the configuration is valid.