feat: add detailed sanity checking #212

Merged
ktro2828 merged 21 commits into main from feat/sanity-details on Nov 11, 2025
Conversation

@ktro2828 ktro2828 commented Oct 23, 2025

Collaborator

What

This pull request introduces a new, extensible framework for dataset sanity checking, including a registry-based checker system, a context object for passing dataset metadata, and a set of modular, schema-driven field validation checkers. It also updates the CLI to use the new system and improves output formatting. The changes are organized into the following themes:

1. Sanity Checker Framework and Registry

  • Introduced a new base Checker class in t4_devkit/sanity/checker.py for implementing individual rule checkers, with support for skip logic and standardized result reporting.
  • Added a registry mechanism for checkers, allowing easy registration and discovery of all rule checkers.
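A registry-based checker system of this kind can be sketched roughly as follows. This is a minimal illustration, not the code in t4_devkit/sanity/checker.py: the names `CHECKERS`, `register`, `can_skip`, `check`, and the rule ID `DEMO001` are all assumptions made for the example.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

# Hypothetical registry: maps a rule ID such as "STR001" to its checker class.
CHECKERS: dict[str, type[Checker]] = {}


def register(rule_id: str):
    """Class decorator that records a checker class under its rule ID."""

    def decorator(cls):
        if rule_id in CHECKERS:
            raise ValueError(f"duplicate rule ID: {rule_id}")
        cls.rule_id = rule_id
        CHECKERS[rule_id] = cls
        return cls

    return decorator


class Checker(ABC):
    """Hypothetical base class with skip logic and a uniform result shape."""

    rule_id: str

    def can_skip(self, context: dict) -> str | None:
        """Return a reason to skip, or None to run the check."""
        return None

    @abstractmethod
    def check(self, context: dict) -> list[str]:
        """Return failure messages; an empty list means success."""

    def __call__(self, context: dict) -> tuple[str, str, list[str]]:
        reason = self.can_skip(context)
        if reason is not None:
            return (self.rule_id, "SKIPPED", [reason])
        failures = self.check(context)
        return (self.rule_id, "FAILURE" if failures else "SUCCESS", failures)


@register("DEMO001")  # illustrative rule ID, not one of the PR's rules
class VersionIsPresent(Checker):
    def check(self, context: dict) -> list[str]:
        return [] if "version" in context else ["'version' is missing"]


# Discover and run every registered checker against one context.
results = [CHECKERS[rule]()({"version": 0}) for rule in sorted(CHECKERS)]
```

Registering through a decorator keeps each rule in its own module while still letting a runner discover all rules from a single mapping.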

2. Context and Result Handling

  • Added a SanityContext class (t4_devkit/sanity/context.py) to encapsulate dataset metadata and provide convenient access to dataset paths and schema files.
  • Standardized result and reporting structures for checkers, supporting success, failure, and skipped states.
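As a rough sketch of the idea, a context object and a three-state result could look like the following. The class names `DatasetContext` and `RuleReport`, their fields, and the assumption that schema files are JSON files under the annotation directory are illustrative and may differ from the actual SanityContext and result types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    SKIPPED = "SKIPPED"


@dataclass(frozen=True)
class DatasetContext:
    """Hypothetical stand-in for a context object holding dataset metadata."""

    root: Path
    version: str | None = None

    @property
    def data_root(self) -> Path:
        # Assumption: versioned datasets keep their content under <root>/<version>.
        return self.root / self.version if self.version else self.root

    @property
    def annotation_dir(self) -> Path:
        return self.data_root / "annotation"

    def schema_file(self, name: str) -> Path:
        # Assumption: one JSON file per schema table.
        return self.annotation_dir / f"{name}.json"


@dataclass
class RuleReport:
    """Hypothetical per-rule result carrying a status and optional reasons."""

    rule_id: str
    status: Status
    reasons: list[str] = field(default_factory=list)


ctx = DatasetContext(Path("/data/dataset1"), version="0")
report = RuleReport("STR001", Status.SUCCESS)
```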

3. Modular Field Validation Checkers

  • Implemented a suite of field validation checkers (FMT001–FMT006) for different schema types (e.g., Attribute, CalibratedSensor, Category, EgoPose, Instance, Log), each as a separate module under t4_devkit/sanity/format/, and registered them in the new system.
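One way to picture schema-driven field validation is to derive the expected fields from a schema dataclass and compare each record against them. The sketch below is an assumption-laden illustration: the `Category` fields shown and the `validate_records` helper are hypothetical and are not taken from the PR.

```python
from dataclasses import dataclass, fields


@dataclass
class Category:
    """Hypothetical schema; field names and types are illustrative only."""

    token: str
    name: str
    description: str


def validate_records(schema: type, records: list[dict]) -> list[str]:
    """Return one failure message per missing or wrongly typed field."""
    expected = {f.name: f.type for f in fields(schema)}
    failures = []
    for index, record in enumerate(records):
        for name, typ in expected.items():
            if name not in record:
                failures.append(f"record {index}: missing field '{name}'")
            elif not isinstance(record[name], typ):
                failures.append(f"record {index}: '{name}' should be {typ.__name__}")
    return failures


ok = validate_records(Category, [{"token": "a", "name": "car", "description": ""}])
bad = validate_records(Category, [{"token": "a", "name": 1}])
```

Because the expected fields come from the schema class itself, the same helper can serve every schema type, which is the appeal of a schema-driven approach.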

4. CLI Refactor and Output Improvements

  • Updated the CLI (t4_devkit/cli/sanity.py) to use the new checker/result system, including improved summary and detailed reporting with tabular output, and support for serializing results.
  • Added a new dependency, returns, to support functional error handling and optional types.
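The general idea behind functional error handling is to return failures as values instead of raising them. The following standard-library sketch illustrates that idea only; it deliberately does not use or reproduce the API of the returns package, and `Ok`, `Err`, `safe`, and `parse_version` are hypothetical names.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable


@dataclass(frozen=True)
class Ok:
    value: Any


@dataclass(frozen=True)
class Err:
    error: Exception


def safe(func: Callable) -> Callable:
    """Wrap a function so exceptions come back as Err values instead of raising."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return Ok(func(*args, **kwargs))
        except Exception as exc:
            return Err(exc)

    return wrapper


@safe
def parse_version(text: str) -> int:
    return int(text)


good = parse_version("3")       # an Ok wrapping 3
broken = parse_version("oops")  # an Err wrapping the ValueError
```

A checker that receives an `Err` can report a failure with the captured reason rather than aborting the whole run.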

5. Documentation

  • Added a comprehensive requirements document (docs/schema/requirement.md) listing all dataset structure, schema, reference, and format rules, serving as the basis for the implemented checkers.

How to Use?

  • CLI (t4sanity command):

For the CLI, the input must be the root directory path of a single dataset.

t4sanity </path/to/dataset> -o result.json

Sample of console output:

=== DatasetID: d38afbf2-aa2b-4040-81df-dfc7b0c8c327 ===
  STR001: ✅
  STR002: ✅
  STR003: ✅
  STR004: ✅
  STR005: ✅
  STR006: ✅
  STR007: ✅
  STR008: ✅
  STR009: ✅
  SCH001: ✅
  SCH002: ✅
  SCH003: ✅
  SCH004: ✅
  SCH005: ✅
  SCH006: ✅
  REF001: ✅
  REF002: ✅
  REF003: ✅
  REF004: ✅
  REF005: ✅
  REF006: ✅
  REF007: ✅
  REF008: ✅
  REF009: ✅
  REF010: ✅
  REF011: ✅
  FMT001: ✅
  FMT002: ✅
  FMT003: ✅
  FMT004: ✅
  FMT005: ✅
  FMT006: ✅
  FMT007: ✅
  FMT008: ✅
  FMT009: ✅
  FMT010: ✅
  FMT011: ✅
  FMT012: ✅
  FMT013: ✅
  FMT014: ✅
  FMT015: ✅
  FMT016: ✅
  FMT017: ✅
  FMT018: ✅

...

======================================== Summary ========================================
+--------------------------------------+---------+---------+-------+---------+----------+-------+
|              DatasetID               | Version | Status  | Rules | Success | Failures | Skips |
+--------------------------------------+---------+---------+-------+---------+----------+-------+
| d38afbf2-aa2b-4040-81df-dfc7b0c8c327 |    0    | SUCCESS |  44   |   44    |    0     |   0   |
+--------------------------------------+---------+---------+-------+---------+----------+-------+
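The summary row can be thought of as a simple count over per-rule statuses. The sketch below reproduces the numbers in the sample output under one stated assumption: that the overall status is FAILURE as soon as any rule fails. The `summarize` helper is hypothetical.

```python
from collections import Counter


def summarize(statuses: dict[str, str]) -> dict:
    """Aggregate per-rule statuses into the columns of the summary table."""
    counts = Counter(statuses.values())
    return {
        # Assumption: a single failing rule makes the whole dataset fail.
        "Status": "FAILURE" if counts["FAILURE"] else "SUCCESS",
        "Rules": len(statuses),
        "Success": counts["SUCCESS"],
        "Failures": counts["FAILURE"],
        "Skips": counts["SKIPPED"],
    }


# Rule IDs as listed in the sample console output: 9 + 6 + 11 + 18 = 44 rules.
rule_ids = [
    f"{prefix}{i:03d}"
    for prefix, n in (("STR", 9), ("SCH", 6), ("REF", 11), ("FMT", 18))
    for i in range(1, n + 1)
]
summary = summarize({rule: "SUCCESS" for rule in rule_ids})
```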

Sample output of JSON file:

result.json

  • sanity_check(...) function in your codebase:
from t4_devkit.common.io import save_json
from t4_devkit.common.serialize import serialize_dataclass
from t4_devkit.sanity import print_sanity_result, sanity_check

result = sanity_check("/path/to/dataset")  # path to a single dataset root

# save result as JSON file
serialized = serialize_dataclass(result)
save_json(serialized, "result.json")

# print result if you want
print_sanity_result(result)
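For readers unfamiliar with serializing dataclasses to JSON, the standard library offers a comparable pattern. This sketch does not show how serialize_dataclass or save_json are implemented, and the layout of result.json is not shown in this PR; `RuleResult`, `DatasetResult`, and their fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RuleResult:
    """Hypothetical per-rule result."""

    rule_id: str
    status: str
    reasons: list[str] = field(default_factory=list)


@dataclass
class DatasetResult:
    """Hypothetical per-dataset result."""

    dataset_id: str
    rules: list[RuleResult]


result = DatasetResult("example-dataset", [RuleResult("STR001", "SUCCESS")])

# asdict() converts nested dataclasses into plain dicts and lists recursively.
payload = json.dumps(asdict(result), indent=2)
```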

Copilot AI review requested due to automatic review settings October 23, 2025 12:35
@ktro2828 ktro2828 marked this pull request as draft October 23, 2025 12:35
@ktro2828 ktro2828 linked an issue Oct 23, 2025 that may be closed by this pull request
@github-actions github-actions Bot added the dependencies, documentation, and new-feature labels Oct 23, 2025
github-actions Bot commented Oct 23, 2025

Contributor

☂️ Python Coverage

current status: ❌

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|-------|---------|----------|-----------|--------|
| 3834  | 2125    | 55%      | 50%       | 🟢     |
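The overall coverage figure follows directly from the line counts reported above; a quick check of the arithmetic, with variable names chosen only for this example:

```python
lines, covered, threshold = 3834, 2125, 50

coverage = covered / lines * 100  # percentage of lines covered
passed = coverage >= threshold

print(f"{coverage:.1f}% covered (threshold {threshold}%): {'pass' if passed else 'fail'}")
```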

New Files

File Coverage Status
t4_devkit/sanity/__init__.py 0% 🔴
t4_devkit/sanity/checker.py 0% 🔴
t4_devkit/sanity/context.py 0% 🔴
t4_devkit/sanity/format/__init__.py 0% 🔴
t4_devkit/sanity/format/base.py 0% 🔴
t4_devkit/sanity/format/fmt001.py 0% 🔴
t4_devkit/sanity/format/fmt002.py 0% 🔴
t4_devkit/sanity/format/fmt003.py 0% 🔴
t4_devkit/sanity/format/fmt004.py 0% 🔴
t4_devkit/sanity/format/fmt005.py 0% 🔴
t4_devkit/sanity/format/fmt006.py 0% 🔴
t4_devkit/sanity/format/fmt007.py 0% 🔴
t4_devkit/sanity/format/fmt008.py 0% 🔴
t4_devkit/sanity/format/fmt009.py 0% 🔴
t4_devkit/sanity/format/fmt010.py 0% 🔴
t4_devkit/sanity/format/fmt011.py 0% 🔴
t4_devkit/sanity/format/fmt012.py 0% 🔴
t4_devkit/sanity/format/fmt013.py 0% 🔴
t4_devkit/sanity/format/fmt014.py 0% 🔴
t4_devkit/sanity/format/fmt015.py 0% 🔴
t4_devkit/sanity/format/fmt016.py 0% 🔴
t4_devkit/sanity/format/fmt017.py 0% 🔴
t4_devkit/sanity/format/fmt018.py 0% 🔴
t4_devkit/sanity/record/__init__.py 0% 🔴
t4_devkit/sanity/record/base.py 0% 🔴
t4_devkit/sanity/record/rec001.py 0% 🔴
t4_devkit/sanity/record/rec002.py 0% 🔴
t4_devkit/sanity/record/rec003.py 0% 🔴
t4_devkit/sanity/record/rec004.py 0% 🔴
t4_devkit/sanity/record/rec005.py 0% 🔴
t4_devkit/sanity/record/rec006.py 0% 🔴
t4_devkit/sanity/reference/__init__.py 0% 🔴
t4_devkit/sanity/reference/base.py 0% 🔴
t4_devkit/sanity/reference/ref001.py 0% 🔴
t4_devkit/sanity/reference/ref002.py 0% 🔴
t4_devkit/sanity/reference/ref003.py 0% 🔴
t4_devkit/sanity/reference/ref004.py 0% 🔴
t4_devkit/sanity/reference/ref005.py 0% 🔴
t4_devkit/sanity/reference/ref006.py 0% 🔴
t4_devkit/sanity/reference/ref007.py 0% 🔴
t4_devkit/sanity/reference/ref008.py 0% 🔴
t4_devkit/sanity/reference/ref009.py 0% 🔴
t4_devkit/sanity/reference/ref010.py 0% 🔴
t4_devkit/sanity/reference/ref011.py 0% 🔴
t4_devkit/sanity/reference/ref012.py 0% 🔴
t4_devkit/sanity/reference/ref013.py 0% 🔴
t4_devkit/sanity/reference/ref014.py 0% 🔴
t4_devkit/sanity/reference/ref015.py 0% 🔴
t4_devkit/sanity/registry.py 0% 🔴
t4_devkit/sanity/result.py 0% 🔴
t4_devkit/sanity/run.py 0% 🔴
t4_devkit/sanity/safety.py 0% 🔴
t4_devkit/sanity/structure/__init__.py 0% 🔴
t4_devkit/sanity/structure/str001.py 0% 🔴
t4_devkit/sanity/structure/str002.py 0% 🔴
t4_devkit/sanity/structure/str003.py 0% 🔴
t4_devkit/sanity/structure/str004.py 0% 🔴
t4_devkit/sanity/structure/str005.py 0% 🔴
t4_devkit/sanity/structure/str006.py 0% 🔴
t4_devkit/sanity/structure/str007.py 0% 🔴
t4_devkit/sanity/structure/str008.py 0% 🔴
t4_devkit/sanity/structure/str009.py 0% 🔴
t4_devkit/sanity/tier4/__init__.py 0% 🔴
t4_devkit/sanity/tier4/tiv001.py 0% 🔴
TOTAL 0% 🔴

Modified Files

No covered modified files...

updated for commit: 5b054ad by action🐍

Copilot AI left a comment

Contributor

Pull Request Overview

This PR introduces a comprehensive sanity checking framework for dataset validation, replacing the previous implementation with a modular, registry-based system that organizes checks by category (structure, schema, reference, format) and provides improved reporting capabilities.

Key changes include:

  • New extensible checker framework with base classes, registry, and context objects for managing dataset metadata
  • Suite of 44+ modular validation checkers organized by rule category (STR, SCH, REF, FMT)
  • Enhanced CLI with detailed reporting, summary tables, and result serialization

Reviewed Changes

Copilot reviewed 60 out of 60 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| t4_devkit/schema/builder.py | Added build_schema_safe function for error-safe schema building |
| t4_devkit/sanity/*.py | Core framework files (checker, context, result, registry, run, safety) |
| t4_devkit/sanity/structure/*.py | Structure validation checkers (STR001-STR009) |
| t4_devkit/sanity/schema/*.py | Schema validation checkers (SCH001-SCH006) |
| t4_devkit/sanity/reference/*.py | Reference validation checkers (REF001-REF011) |
| t4_devkit/sanity/format/*.py | Format validation checkers (FMT001-FMT018) |
| t4_devkit/cli/sanity.py | Updated CLI to use new framework with improved output |
| pyproject.toml | Added returns dependency |
| docs/schema/requirement.md | New requirements documentation |

Comment thread t4_devkit/schema/builder.py Outdated
Comment thread t4_devkit/sanity/structure/str003.py Outdated
Comment thread t4_devkit/sanity/structure/str002.py Outdated
Comment thread t4_devkit/sanity/schema/sch002.py Outdated
@ktro2828 ktro2828 force-pushed the feat/sanity-details branch 4 times, most recently from 001bb62 to 531a48e Compare October 28, 2025 08:34
@ktro2828 ktro2828 marked this pull request as ready for review October 31, 2025 02:16

@shekharhimanshu shekharhimanshu left a comment

Contributor

Thank you for the PR! I have asked one question.

Comment thread docs/schema/requirement.md
ktro2828 commented Nov 5, 2025

Collaborator Author

@shekharhimanshu @SamratThapa120 Let me ask about the usage of the t4sanity CLI.
Currently, the t4sanity CLI requires the path to a parent directory containing multiple datasets, as follows:

<DB_PARENT>
├── dataset1
│   └── <VERSION>
│       ├── annotation
│       ├── data
│       ...
├── dataset2
│   ├── annotation
│   ├── data
│   ...
...

Running t4sanity <DB_PARENT> -o result.json outputs a single JSON file.
That result.json contains a list of results, one per dataset.

Would you prefer validating only a single dataset and generating a JSON file containing the result for that dataset alone?

@shekharhimanshu

Contributor

Would you prefer validating only a single dataset and generating a JSON file containing the result for that dataset alone?

@ktro2828
IMO, validating a single dataset is sufficient (multiple datasets can be validated by running the command for each one individually).
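Running the check once per dataset, as suggested here, could be scripted along these lines. This is a sketch under assumptions: `check_each_dataset` is a hypothetical helper, and the checker is passed in as a callable so the example does not depend on t4_devkit being installed.

```python
from pathlib import Path
from typing import Callable


def check_each_dataset(parent: Path, check: Callable[[str], object]) -> dict[str, object]:
    """Call `check` on every immediate subdirectory of `parent`, in name order."""
    results = {}
    for dataset_dir in sorted(p for p in parent.iterdir() if p.is_dir()):
        results[dataset_dir.name] = check(str(dataset_dir))
    return results
```

With the function shown in the PR description, the call would presumably be `check_each_dataset(Path("/path/to/parent"), sanity_check)`, producing one result per dataset directory.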

SamratThapa120 commented Nov 6, 2025

Contributor

Would you prefer validating only a single dataset and generating a JSON file containing the result for that dataset alone?

@ktro2828 I agree with @shekharhimanshu; validating a single dataset seems sufficient.

@SamratThapa120

Contributor

@ktro2828 Sorry for the delay, I will finish the review by tomorrow.

@SamratThapa120 SamratThapa120 left a comment

Contributor

Thanks for the great feature. I have left a question.

Comment thread t4_devkit/sanity/format/base.py
ktro2828 commented Nov 6, 2025

Collaborator Author

@shekharhimanshu @SamratThapa120 Thank you both for the comments. I updated the t4sanity CLI to check only a single dataset in 4116639.

@ktro2828 Sorry for the delay, I will finish the review by tomorrow.

@SamratThapa120 No worries, take your time! I'm sorry for this huge PR, too.

@shekharhimanshu shekharhimanshu left a comment

Contributor

Thank you for this PR. LGTM! 💯

Comment thread docs/schema/requirement.md
Signed-off-by: ktro2828 <kotaro.uetake@tier4.jp>
@ktro2828 ktro2828 force-pushed the feat/sanity-details branch from f6a71f5 to ae62130 Compare November 10, 2025 06:56
Signed-off-by: ktro2828 <kotaro.uetake@tier4.jp>

@SamratThapa120 SamratThapa120 left a comment

Contributor

LGTM 💯.

Please resolve this in subsequent PRs:
#212 (comment)

@ktro2828 ktro2828 merged commit d919ae6 into main Nov 11, 2025
5 checks passed
@ktro2828 ktro2828 deleted the feat/sanity-details branch November 11, 2025 07:57
Labels

dependencies, documentation, new-feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEAT] extend sanity checking

5 participants