137 changes: 60 additions & 77 deletions docs/cli/t4sanity.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,23 @@
`t4sanity` performs sanity checks on T4 datasets, reporting any issues in a structured format.
It checks the dataset directories and versions, tries to load them using the `Tier4` library, and reports any exceptions or warnings.
`t4sanity` performs sanity checks on T4 datasets, reporting any issues regarding the [dataset requirements](../schema/requirement.md).

```shell
$ t4sanity -h

Usage: t4sanity [OPTIONS] DB_PARENT

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * db_parent TEXT Path to parent directory of the databases [default: None] [required] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --version -v Show the application version and exit. │
│ --output -o TEXT Path to output JSON file. [default: None] │
│ --revision -rv TEXT Specify if you want to load the specific version. [default: None] │
│ --include-warning -iw Indicates whether to report any warnings. │
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the installation. │
│ --help -h Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * data_root TEXT Path to root directory of a dataset. [default: None] [required] │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --version -v Show the application version and exit. │
│ --output -o TEXT Path to output JSON file. │
│ --revision -rv TEXT Specify if you want to check the specific version. │
│ --exclude -e TEXT Exclude specific rules or rule groups. │
│ --include-warning -iw Indicates whether to report any warnings. │
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the installation. │
│ --help -h Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```

## Shell Completion
@@ -33,86 +33,69 @@ t4sanity --install-completion
As an example, consider the following dataset structure:

```shell
<DATA_ROOT>
├── dataset1
│ └── <VERSION>
│ ├── annotation
│ ├── data
| ...
├── dataset2
│ ├── annotation
│ ├── data
| ...
...
<DATA_ROOT; (DATASET ID)>
├── <VERSION>
│ ├── annotation
│ ├── data
| ...
```

Then, you can run sanity checks with `t4sanity <DATA_ROOT>`:

```shell
>>>Sanity checking...: 1it [00:00, 9.70it/s]
✅ No exceptions occurred!!
```

### Exclude Warnings

To run the sanity check while ignoring warnings, provide the path to the parent directory of the datasets:

```shell
$ t4sanity <DATA_ROOT>

>>>Sanity checking...: 2it [00:00, 18.69it/s]
⚠️ Encountered some exceptions!!
+-----------+---------+--------+------------------------------------------------------------------------------------------------+
| DatasetID | Version | Status | Message |
+-----------+---------+--------+------------------------------------------------------------------------------------------------+
| dataset1 | 2 | ERROR | bbox must be (xmin, ymin, xmax, ymax) and xmin <= xmax && ymin <= ymax: (1532, 198, 1440, 265) |
| dataset2 | 1 | OK | |
+-----------+---------+--------+------------------------------------------------------------------------------------------------+
```

### Include Warnings

To run the sanity check and report any warnings, use the `-iw; --include-warning` option:
>>>Sanity checking...: 1it [00:00, 9.70it/s]

```shell
$ t4sanity <DATA_ROOT> -iw

>>>Sanity checking...: 2it [00:00, 21.54it/s]
⚠️ Encountered some exceptions!!
+-----------+---------+---------+------------------------------------------------------------------------------------------------+
| DatasetID | Version | Status | Message |
+-----------+---------+---------+------------------------------------------------------------------------------------------------+
| dataset1 | 2 | ERROR | bbox must be (xmin, ymin, xmax, ymax) and xmin <= xmax && ymin <= ymax: (1532, 198, 1440, 265) |
| dataset2 | 1 | WARNING | Category token is empty for surface ann: 0c15d9c143fb2723c16ac7e0c735b0a8 |
+-----------+---------+---------+------------------------------------------------------------------------------------------------+
=== DatasetID: dataset1 ===
STR001: ✅
STR002: ✅
STR003: ✅
STR004: ✅
STR005: ✅
STR006: ✅
STR007: ✅
STR008: ✅
...

+-----------+---------+---------+-------+---------+----------+-------+
| DatasetID | Version | Status | Rules | Success | Failures | Skips |
+-----------+---------+---------+-------+---------+----------+-------+
| dataset1 | 0 | SUCCESS | 44 | 44 | 0 | 0 |
+-----------+---------+---------+-------+---------+----------+-------+
```

### Dump Results as JSON

To dump results into JSON, use the `-o; --output` option:

```shell
$ t4sanity <DATA_ROOT> -o results.json

>>>Sanity checking...: 2it [00:00, 21.54it/s]
...
t4sanity <DATA_ROOT> -o result.json
```

Then a JSON file named `results.json` will be generated:
Then a JSON file named `result.json` will be generated as follows:

```json
[
  {
    "dataset_id": "dataset1",
    "version": 2,
    "status": "ERROR",
    "message": "bbox must be (xmin, ymin, xmax, ymax) and xmin <= xmax && ymin <= ymax: (1532, 198, 1440, 265)"
  },
  {
    "dataset_id": "dataset2",
    "version": 1,
    "status": "WARNING",
    "message": "Category token is empty for surface ann: 0c15d9c143fb2723c16ac7e0c735b0a8"
  }
]
{
  "dataset_id": "<DatasetID: str>",
  "version": <Version: int>,
  "reports": [
    {
      "id": "<RuleID: str>",
      "name": "<RuleName: str>",
      "description": "<Description: str>",
      "status": "<SUCCESS/FAILURE/SKIPPED: str>",
      "reasons": "<[<Reason1>, <Reason2>, ...]: [str; N] | null>" // Failure or skipped reasons, null if success
    },
  ]
}
```
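
A minimal sketch of how the dumped report could be consumed, assuming the file follows the layout shown above (a `reports` list whose entries carry `id`, `status`, and `reasons`). The helper names `load_result` and `failed_rules` are hypothetical and are not part of `t4-devkit`.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_result(path: str) -> dict:
    """Read a report written with `t4sanity <DATA_ROOT> -o result.json`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def failed_rules(result: dict) -> dict[str, list[str]]:
    """Map each failed rule ID to its reasons (hypothetical helper)."""
    failures: dict[str, list[str]] = {}
    for report in result.get("reports", []):
        if report.get("status") == "FAILURE":
            # `reasons` is documented as null on success; fall back to an
            # empty list defensively in case it is missing here too.
            failures[report["id"]] = report.get("reasons") or []
    return failures
```

Calling `failed_rules(load_result("result.json"))` would then give a compact view of what needs fixing, which may be convenient in CI scripts.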

### Exclude Checks

The `-e; --exclude` option lets you exclude specific checks by specifying **rule IDs or groups**:

```shell
# Exclude STR001 and all FMT-relevant rules
t4sanity <DATA_ROOT> -e STR001 -e FMT
```
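
The matching behind such an exclusion could look roughly like the sketch below. This is not the `t4-devkit` implementation. It assumes that a group is the alphabetic prefix of a rule ID (for example `FMT` in `FMT003`), and the function name `is_excluded` is hypothetical.

```python
from __future__ import annotations


def is_excluded(rule_id: str, excludes: list[str] | None) -> bool:
    """Return True if `rule_id` matches an excluded rule ID or rule group.

    Assumption: the group is the rule ID with its trailing digits removed.
    """
    if not excludes:
        return False
    group = rule_id.rstrip("0123456789")
    return rule_id in excludes or group in excludes
```

With `excludes = ["STR001", "FMT"]`, this sketch skips `STR001` and every `FMT` rule while leaving `STR002` active.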
75 changes: 75 additions & 0 deletions docs/schema/requirement.md
@@ -0,0 +1,75 @@
# Dataset Requirements

## Structure (`STR`)

| ID | Name | Severity | Description |
| -------- | ----------------------------- | -------- | -------------------------------------------------------------------- |
| `STR001` | `version-dir-presence` | `Warn` | `version/` directory exists under the dataset root directory. |
| `STR002` | `annotation-dir-presence` | `Error` | `annotation/` directory exists under the dataset root directory. |
| `STR003` | `data-dir-presence` | `Error` | `data/` directory exists under the dataset root directory. |
| `STR004` | `map-dir-presence` | `Error` | `map/` directory exists under the dataset root directory. |
| `STR005` | `bag-dir-presence` | `Error` | `input_bag/` directory exists under the dataset root directory. |
| `STR006` | `status-file-presence` | `Error` | `status.json` file exists under the dataset root directory. |
| `STR007` | `schema-files-presence` | `Error` | Mandatory schema JSON files exist under the `annotation/` directory. |
| `STR008` | `lanelet-file-presence` | `Warn` | `lanelet2_map.osm` file exists under the `map/` directory. |
| `STR009` | `pointcloud-map-dir-presence` | `Warn` | `pointcloud_map.pcd` directory exists under the `map/` directory. |
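
As an illustration, the `Error`-level presence rules `STR002` to `STR006` could be approximated with `pathlib` as sketched below. The function name `missing_entries` is hypothetical, and `root` is assumed to be the directory that directly contains `annotation/`, `data/`, and the other entries. `STR007` is left out because the list of mandatory schema files is not given here.

```python
from __future__ import annotations

from pathlib import Path

# Names taken from the table above (STR002-STR006).
REQUIRED_DIRS = ("annotation", "data", "map", "input_bag")
REQUIRED_FILES = ("status.json",)


def missing_entries(root: str | Path) -> list[str]:
    """List required directories and files that are absent under `root`."""
    root = Path(root)
    missing = [f"{name}/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    missing += [name for name in REQUIRED_FILES if not (root / name).is_file()]
    return missing
```

An empty return value means all five entries were found. The actual checker reports each rule separately, which this sketch does not attempt.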

## Schema Record (`REC`)

| ID | Name | Severity | Description |
| -------- | ----------------------------- | -------- | --------------------------------------- |
| `REC001` | `scene-single` | `Error` | Exactly one `Scene` record exists. |
| `REC002` | `sample-not-empty` | `Error` | `Sample` record is not empty. |
| `REC003` | `sample-data-not-empty` | `Error` | `SampleData` record is not empty. |
| `REC004` | `ego-pose-not-empty` | `Error` | `EgoPose` record is not empty. |
| `REC005` | `calibrated-sensor-non-empty` | `Error` | `CalibratedSensor` record is not empty. |
| `REC006` | `instance-not-empty` | `Error` | `Instance` record is not empty. |
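
The record-count rules above can be pictured with the following sketch. It assumes the schema tables have already been loaded into a plain dictionary that maps a table name to its list of records. Both that layout and the name `check_record_counts` are assumptions made for illustration.

```python
from __future__ import annotations

# Rule ID -> table that must contain at least one record (REC002-REC006).
NON_EMPTY_RULES = {
    "REC002": "Sample",
    "REC003": "SampleData",
    "REC004": "EgoPose",
    "REC005": "CalibratedSensor",
    "REC006": "Instance",
}


def check_record_counts(tables: dict[str, list[dict]]) -> list[str]:
    """Return the IDs of the violated REC rules, in rule order."""
    violations = []
    # REC001: exactly one Scene record is expected.
    if len(tables.get("Scene", [])) != 1:
        violations.append("REC001")
    for rule_id, table_name in NON_EMPTY_RULES.items():
        if not tables.get(table_name):
            violations.append(rule_id)
    return violations
```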

## Reference (`REF`)

| ID | Name | Severity | Description |
| -------- | ------------------------------------- | -------- | ------------------------------------------------------------------------- |
| `REF001` | `scene-to-log` | `Error` | `Scene.log_token` refers to `Log` record. |
| `REF002` | `scene-to-first-sample` | `Error` | `Scene.first_sample_token` refers to `Sample` record. |
| `REF003` | `scene-to-last-sample` | `Error` | `Scene.last_sample_token` refers to `Sample` record. |
| `REF004` | `sample-to-scene` | `Error` | `Sample.scene_token` refers to `Scene` record. |
| `REF005` | `sample-data-to-sample` | `Error` | `SampleData.sample_token` refers to `Sample` record. |
| `REF006` | `sample-data-to-ego-pose` | `Error` | `SampleData.ego_pose_token` refers to `EgoPose` record. |
| `REF007` | `sample-data-to-calibrated-sensor` | `Error` | `SampleData.calibrated_sensor_token` refers to `CalibratedSensor` record. |
| `REF008` | `calibrated-sensor-to-sensor` | `Error` | `CalibratedSensor.sensor_token` refers to `Sensor` record. |
| `REF009` | `instance-to-category` | `Error` | `Instance.category_token` refers to `Category` record. |
| `REF010` | `instance-to-first-sample-annotation` | `Error` | `Instance.first_annotation_token` refers to `SampleAnnotation` record. |
| `REF011` | `instance-to-last-sample-annotation` | `Error` | `Instance.last_annotation_token` refers to `SampleAnnotation` record. |
| `REF012` | `lidarseg-to-sample-data` | `Error` | `LidarSeg.sample_data_token` refers to `SampleData` record. |
| `REF013` | `sample-data-filename-presence` | `Error` | `SampleData.filename` exists. |
| `REF014` | `sample-data-info-filename-presence` | `Error` | `SampleData.info_filename` exists if it is not `None`. |
| `REF015` | `lidarseg-filename-presence` | `Error` | `LidarSeg.filename` exists if `lidarseg.json` exists. |
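
The token-reference rules (`REF001` to `REF012`) all follow the same pattern: a field in one table must match the token of a record in another. A minimal sketch of that pattern is shown below. It assumes each record is a dictionary with a `token` key, and the function name `dangling_tokens` is hypothetical.

```python
from __future__ import annotations


def dangling_tokens(records: list[dict], field: str, targets: list[dict]) -> list[str]:
    """Return values of `field` that do not match any `token` in `targets`."""
    known = {target["token"] for target in targets}
    return [record[field] for record in records if record[field] not in known]
```

For example, `REF004` would correspond to `dangling_tokens(samples, "scene_token", scenes)` returning an empty list.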

## Format (`FMT`)

| ID | Name | Severity | Description |
| -------- | ------------------------- | -------- | ------------------------------------------------- |
| `FMT001` | `attribute-field` | `Error` | All types of `Attribute` fields are valid. |
| `FMT002` | `calibrated-sensor-field` | `Error` | All types of `CalibratedSensor` fields are valid. |
| `FMT003` | `category-field` | `Error` | All types of `Category` fields are valid. |
| `FMT004` | `ego-pose-field` | `Error` | All types of `EgoPose` fields are valid. |
| `FMT005` | `instance-field` | `Error` | All types of `Instance` fields are valid. |
| `FMT006` | `log-field` | `Error` | All types of `Log` fields are valid. |
| `FMT007` | `map-field` | `Error` | All types of `Map` fields are valid. |
| `FMT008` | `sample-field` | `Error` | All types of `Sample` fields are valid. |
| `FMT009` | `sample-annotation-field` | `Error` | All types of `SampleAnnotation` fields are valid. |
| `FMT010` | `sample-data-field` | `Error` | All types of `SampleData` fields are valid. |
| `FMT011` | `scene-field` | `Error` | All types of `Scene` fields are valid. |
| `FMT012` | `sensor-field` | `Error` | All types of `Sensor` fields are valid. |
| `FMT013` | `visibility-field` | `Error` | All types of `Visibility` fields are valid. |
| `FMT014` | `lidarseg-field` | `Error` | All types of `LidarSeg` fields are valid. |
| `FMT015` | `object-ann-field` | `Error` | All types of `ObjectAnn` fields are valid. |
| `FMT016` | `surface-ann-field` | `Error` | All types of `SurfaceAnn` fields are valid. |
| `FMT017` | `keypoint-field` | `Error` | All types of `Keypoint` fields are valid. |
| `FMT018` | `vehicle-state-field` | `Error` | All types of `VehicleState` fields are valid. |
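
A field-type check of this kind could be sketched with `dataclasses` as follows. `SensorRecord` is a simplified, hypothetical stand-in and not the `t4-devkit` schema class, and the check only handles plain class annotations such as `str` or `int`.

```python
from dataclasses import dataclass, fields


@dataclass
class SensorRecord:
    """Simplified stand-in for a schema table (hypothetical)."""

    token: str
    channel: str
    modality: str


def invalid_fields(record: dict, schema: type) -> list:
    """Return names of fields that are missing or hold a value of the wrong type."""
    bad = []
    for field in fields(schema):
        # `field.type` is the annotated class here because the annotations
        # above are plain classes, not strings or generic aliases.
        if field.name not in record or not isinstance(record[field.name], field.type):
            bad.append(field.name)
    return bad
```

Real schema tables contain optional and nested fields, so an actual validator would need richer type handling than `isinstance`.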

## Tier4 Instance (`TIV`)

| ID | Name | Severity | Description |
| -------- | ------------ | -------- | ----------------------------------------------- |
| `TIV001` | `load-tier4` | `Error` | Ensure `Tier4` instance is loaded successfully. |
1 change: 1 addition & 0 deletions mkdocs.yaml
@@ -10,6 +10,7 @@ nav:
- Home: schema/index.md
- Schema Tables: schema/table.md
- Sensor Data: schema/data.md
- Requirements: schema/requirement.md
- Tutorials:
- Initialization: tutorials/initialize.md
- Visualization: tutorials/render.md
1 change: 1 addition & 0 deletions pyproject.toml
@@ -20,6 +20,7 @@ dependencies = [
    "typer>=0.15.3",
    "tabulate>=0.9.0",
    "tqdm>=4.67.1",
    "returns>=0.26.0",
]

[dependency-groups]
42 changes: 14 additions & 28 deletions t4_devkit/cli/sanity.py
@@ -1,14 +1,10 @@
from __future__ import annotations

from pathlib import Path

import typer
from tabulate import tabulate
from tqdm import tqdm

from t4_devkit.common.io import save_json
from t4_devkit.common.sanity import DBException, sanity_check
from t4_devkit.common.serialize import serialize_dataclasses
from t4_devkit.common.serialize import serialize_dataclass
from t4_devkit.sanity import print_sanity_result, sanity_check

from .version import version_callback

@@ -20,18 +16,6 @@
)


def _run_sanity_check(
    db_parent: str,
    *,
    revision: str | None = None,
    include_warning: bool = False,
) -> list[DBException]:
    return [
        sanity_check(db_root, revision=revision, include_warning=include_warning)
        for db_root in tqdm(Path(db_parent).glob("*"), desc=">>>Sanity checking...")
    ]


@cli.command()
def main(
    version: bool = typer.Option(
@@ -42,25 +26,27 @@ def main(
        callback=version_callback,
        is_eager=True,
    ),
    db_parent: str = typer.Argument(..., help="Path to parent directory of the databases."),
    data_root: str = typer.Argument(..., help="Path to root directory of a dataset."),
    output: str | None = typer.Option(None, "-o", "--output", help="Path to output JSON file."),
    revision: str | None = typer.Option(
        None, "-rv", "--revision", help="Specify if you want to check the specific version."
    ),
    excludes: list[str] | None = typer.Option(
        None, "-e", "--exclude", help="Exclude specific rules or rule groups."
    ),
    include_warning: bool = typer.Option(
        False, "-iw", "--include-warning", help="Indicates whether to report any warnings."
    ),
) -> None:
    exceptions = _run_sanity_check(db_parent, revision=revision, include_warning=include_warning)
    result = sanity_check(
        data_root=data_root,
        revision=revision,
        excludes=excludes,
        include_warning=include_warning,
    )

    if all(e.is_ok() for e in exceptions):
        print("✅ No exceptions occurred!!")
    else:
        print("⚠️ Encountered some exceptions!!")
        headers = ["DatasetID", "Version", "Status", "Message"]
        table = [[e.dataset_id, e.version, e.status, e.message] for e in exceptions]
        print(tabulate(table, headers=headers, tablefmt="pretty"))
    print_sanity_result(result)

    if output:
        serialized = serialize_dataclasses(exceptions)
        serialized = serialize_dataclass(result)
        save_json(serialized, output)