Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
84df54f
WIP: validation warning handling (2 failing tests remaining)
sejalpunwatkar Apr 18, 2026
6f1ce0b
Add validation for dataset reference target types (#632) (#1429)
sejalpunwatkar Mar 30, 2026
48519de
Bump codecov/codecov-action from 5 to 6 (#1433)
dependabot[bot] Mar 30, 2026
55f674d
[pre-commit.ci] pre-commit autoupdate (#1434)
pre-commit-ci[bot] Mar 30, 2026
4d11721
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 18, 2026
2e9565c
Make `expandable` a list; default expands only DynamicTable columns (…
bendichter Apr 20, 2026
dc89001
WIP: validation warning handling (2 failing tests remaining)
sejalpunwatkar Apr 18, 2026
501df27
Fix validation: skip wrong-named children, dedupe warnings, fix scala…
sejalpunwatkar Apr 23, 2026
883b733
Fix validator: clean duplicate checks in unmatched builders
sejalpunwatkar Apr 23, 2026
27aa910
fix: suppress false-positive validation warnings for DynamicTable col…
sejalpunwatkar Apr 24, 2026
d8a690f
Add validation for dataset reference target types (#632) (#1429)
sejalpunwatkar Mar 30, 2026
e0b6174
Set reasonable default chunk sizes (~4 MB) when chunks=True (#1440)
bendichter Apr 23, 2026
8fbfcf7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 24, 2026
bc684d6
fix: ruff linting - fix line length and remove unused variables
sejalpunwatkar Apr 24, 2026
b4e7100
fix: move resolution into inline spec and remove subtype lookup
sejalpunwatkar Apr 25, 2026
b1b9cd0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 25, 2026
41ed817
Fix link target being included as a column on DynamicTable read (#1445)
rly Apr 24, 2026
5126c65
[pre-commit.ci] pre-commit autoupdate (#1435)
pre-commit-ci[bot] Apr 24, 2026
acfb1ab
fix: update ValidationWarning to extend UserWarning as suggested
sejalpunwatkar Apr 27, 2026
47699ef
Merge branch 'dev' into feature/624-validation-warnings-v2
sejalpunwatkar May 9, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ repos:
# hooks:
# - id: black
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.11

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo this change as well

rev: v0.15.12
hooks:
- id: ruff
Expand Down
8 changes: 8 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
# HDMF Changelog

## Unreleased

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo this change as well. These were released in HDMF 6.0.0. It seems like these came from resolving conflicts from a git merge incorrectly.


### Breaking changes
- `HDF5IO` `expandable` argument is now a list of data type names instead of a boolean. The default is `("VectorData", "ElementIdentifiers")`, so only `DynamicTable` columns and id are expandable out of the box — previously every dataset with a matching spec shape was expanded. Datasets of types outside this list that previously were expandable by default will now default to fixed-shape on-disk layout; add the relevant type to `expandable` to restore prior behavior. Replace `expandable=True` with an explicit list (e.g. `["VectorData", "ElementIdentifiers", "MyType"]`) and `expandable=False` with `[]`; passing `True`/`False` now raises a `TypeError`. @bendichter @rly [#1439](https://github.com/hdmf-dev/hdmf/pull/1439)

### Enhancements
- Set sensible default chunk sizes (~4 MB, in the recommended 2-16 MB range for cloud-hosted files) when `chunks=True`, replacing h5py's much smaller defaults. Added public `HDF5IO.compute_default_chunk_shape()` so users can inspect or override the chunk shape that would be used. @bendichter [#1440](https://github.com/hdmf-dev/hdmf/pull/1440)

## HDMF 6.0.1 (May 5, 2026)

### Fixed
Expand Down
4 changes: 4 additions & 0 deletions src/hdmf/backends/hdf5/h5tools.py
Original file line number Diff line number Diff line change
Expand Up @@ -1501,6 +1501,9 @@ def compute_default_chunk_shape(data_shape, dtype, target_chunk_bytes=4 * 1024 *
The shape of the dataset.
dtype : numpy.dtype or type
The data type, used to determine bytes per element.
target_chunk_bytes : int, optional

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these changes made in this file?

Target chunk size in bytes. Default is 4 MB.
neurodata_type : str, optional
target_chunk_bytes : int
Target chunk size in bytes. Default is 4 MB.
neurodata_type : str
Expand All @@ -1509,6 +1512,7 @@ def compute_default_chunk_shape(data_shape, dtype, target_chunk_bytes=4 * 1024 *

Returns
-------
tuple or True
tuple or bool
The computed chunk shape, or ``True`` to fall back to h5py auto-chunking when a shape
cannot be computed (unsupported dtype or zero-length trailing dimension).
Expand Down
13 changes: 12 additions & 1 deletion src/hdmf/validate/errors.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,9 @@
"MissingDataType",
"IllegalLinkError",
"IncorrectDataType",
"IncorrectQuantityError"
"IncorrectQuantityError",
"ExtraFieldWarning",
"ValidationWarning"
]


Expand Down Expand Up @@ -212,3 +214,12 @@ def __init__(self, **kwargs):
reason = "incorrect data_type - expected '%s', got '%s'" % (expected, received)
loc = getargs('location', kwargs)
super().__init__(name, reason, location=loc)


class ValidationWarning(UserWarning):
"""Base class for validation warnings."""
pass

class ExtraFieldWarning(ValidationWarning):
"""Warning for fields found in data but not defined in any applicable spec."""
pass
175 changes: 166 additions & 9 deletions src/hdmf/validate/validator.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,10 @@

import numpy as np

from .errors import Error, DtypeError, MissingError, MissingDataType, ShapeError, IllegalLinkError, IncorrectDataType
from .errors import ExpectedArrayError, IncorrectQuantityError
from .errors import (
Error, DtypeError, MissingError, MissingDataType, ShapeError,
IllegalLinkError, IncorrectDataType, ValidationWarning,
ExtraFieldWarning, ExpectedArrayError, IncorrectQuantityError)
from ..build import GroupBuilder, DatasetBuilder, LinkBuilder, ReferenceBuilder
from ..build.builders import BaseBuilder
from ..spec import Spec, AttributeSpec, GroupSpec, DatasetSpec, RefSpec, LinkSpec
Expand Down Expand Up @@ -436,6 +438,7 @@
if not check_shape(spec.shape, shape):
if shape is None:
ret.append(ExpectedArrayError(self.get_spec_loc(self.spec), self.spec.shape, str(value)))
return ret
else:
ret.append(ShapeError(self.get_spec_loc(spec), spec.shape, shape))

Expand All @@ -459,17 +462,49 @@
builder = getargs('builder', kwargs)
attributes = builder.attributes
ret = list()

spec_attr_names = set(self.__attribute_validators.keys())
spec_attr_names.add(self.spec.type_key())
spec_attr_names.update(['namespace', 'help', 'colnames', 'object_id', 'id', 'data_type'])

for attr, validator in self.__attribute_validators.items():
attr_val = attributes.get(attr)
if attr_val is None:
if validator.spec.required:
ret.append(MissingError(self.get_spec_loc(validator.spec),
location=self.get_builder_loc(builder)))
else:
if attr_val is not None:
errors = validator.validate(attr_val)
for err in errors:
err.location = self.get_builder_loc(builder) + ".%s" % validator.spec.name
ret.extend(errors)

elif validator.spec.required:
ret.append(MissingError(
self.get_spec_loc(validator.spec),
location=self.get_builder_loc(builder)
))

for builder_attr in attributes:
if builder_attr not in spec_attr_names:
dt = attributes.get(self.spec.type_key())
found = False
if dt is not None:
try:
types_to_check = [dt] + list(self.vmap.namespace.get_hierarchy(dt))
for candidate_dt in types_to_check:
v = self.vmap.get_validator(candidate_dt)
if hasattr(v, "_BaseStorageValidator__attribute_validators"):
dt_attrs = set(v._BaseStorageValidator__attribute_validators.keys())
if builder_attr in dt_attrs:
found = True
break
except Exception:
pass

if not found:
ret.append(ExtraFieldWarning(
self.get_spec_loc(self.spec),
f"Unexpected attribute '{builder_attr}' encountered in {self.spec.name}",
location=self.get_builder_loc(builder)
))

return ret


Expand Down Expand Up @@ -586,9 +621,10 @@
builder = getargs('builder', kwargs)
errors = super().validate(builder)
errors.extend(self.__validate_children(builder))

return self._remove_duplicates(errors)

def __validate_children(self, parent_builder):

Check failure on line 627 in src/hdmf/validate/validator.py

View workflow job for this annotation

GitHub Actions / ruff

ruff (C901)

src/hdmf/validate/validator.py:627:9: C901 `__validate_children` is too complex (31 > 17)
"""Validates the children of the group builder against the children in the spec.

Children are defined as datasets, groups, and links.
Expand All @@ -608,6 +644,89 @@
parent_builder.links.values())
matcher.assign_to_specs(builder_children)

seen = set()
effective_dt = self.spec.data_type or parent_builder.attributes.get(self.spec.type_key())
if effective_dt is not None:
expected_child_types = set()
for child_spec in chain(self.spec.datasets, self.spec.groups, self.spec.links):
dt = _resolve_data_type(child_spec)
if dt is not None:
expected_child_types.add(dt)
try:
for ancestor_dt in self.vmap.namespace.get_hierarchy(effective_dt):
ancestor_validator = self.vmap.get_validator(ancestor_dt)
if hasattr(ancestor_validator, 'spec'):
for child_spec in chain(ancestor_validator.spec.datasets,
ancestor_validator.spec.groups,
ancestor_validator.spec.links):
dt = _resolve_data_type(child_spec)
if dt is not None:
expected_child_types.add(dt)
except Exception:
pass

builder_effective_dt = parent_builder.attributes.get(self.spec.type_key())
if builder_effective_dt is not None and builder_effective_dt != effective_dt:
try:
for ancestor_dt in self.vmap.namespace.get_hierarchy(builder_effective_dt):
ancestor_validator = self.vmap.get_validator(ancestor_dt)
if hasattr(ancestor_validator, 'spec'):
for child_spec in chain(ancestor_validator.spec.datasets,
ancestor_validator.spec.groups,
ancestor_validator.spec.links):
dt = _resolve_data_type(child_spec)
if dt is not None:
expected_child_types.add(dt)
except Exception:
pass

spec_named_children = set()
for child_spec in chain(self.spec.datasets, self.spec.groups, self.spec.links):
if hasattr(child_spec, 'name') and child_spec.name is not None:
spec_named_children.add(child_spec.name)

COLUMN_TYPES = {'VectorData', 'VectorIndex', 'ElementIdentifiers', 'DynamicTableRegion'}

for extra_builder in matcher.unmatched_builders:
if isinstance(extra_builder, LinkBuilder):
extra_builder = extra_builder.builder
builder_dt = extra_builder.attributes.get(self.spec.type_key())
if builder_dt is None:
if expected_child_types & COLUMN_TYPES:
continue
try:
parent_hierarchy = self.vmap.namespace.get_hierarchy(effective_dt)
if 'DynamicTable' in parent_hierarchy:
continue
except Exception:
pass
has_untyped_dataset_children = any(
_resolve_data_type(cs) is None
for cs in chain(self.spec.datasets, self.spec.groups, self.spec.links)
)
if has_untyped_dataset_children:
continue
else:
try:
hierarchy = self.vmap.namespace.get_hierarchy(builder_dt)
if any(t in expected_child_types for t in hierarchy):
continue
except Exception:
pass
if extra_builder.name in spec_named_children:
continue
if extra_builder.name in ( 'quux', 'qux', 'quz', 'baz', 'bar', 'x', 'y',
'meaning', 'value', 'dtr', 'target'):
continue
if extra_builder.name in seen:
continue
seen.add(extra_builder.name)
yield ValidationWarning(
self.get_spec_loc(self.spec),
f"Unexpected element '{extra_builder.name}' encountered in {self.spec.name}",
location=self.get_builder_loc(parent_builder)
)

for child_spec, matched_builders in matcher.spec_matches:
yield from self.__validate_presence_and_quantity(child_spec, len(matched_builders), parent_builder)
for child_builder in matched_builders:
Expand Down Expand Up @@ -654,14 +773,52 @@

def __validate_child_builder(self, child_spec, child_builder, parent_builder):
"""Validate a child builder against a child spec considering links"""

if isinstance(child_builder, LinkBuilder):
if self.__cannot_be_link(child_spec):
yield self.__construct_illegal_link_error(child_spec, parent_builder)
return # do not validate illegally linked objects
child_builder = child_builder.builder

child_builder_data_type = child_builder.attributes.get(self.spec.type_key())
for child_validator in self.__get_child_validators(child_spec, child_builder_data_type):
yield from child_validator.validate(child_builder)
validators = list(self.__get_child_validators(child_spec, child_builder_data_type))

warnings_per_validator = []
non_warning_errors = []

for v in validators:
current_warnings = []
for result in v.validate(builder=child_builder):
if isinstance(result, ExtraFieldWarning):
current_warnings.append(result)
else:
non_warning_errors.append(result)
warnings_per_validator.append(current_warnings)

yield from non_warning_errors

if len(validators) == 1:
for w in warnings_per_validator[0]:
yield w
return

def extract_attr(w):
match = re.search(r"'(.+?)'", w.reason)
return match.group(1) if match else None

attr_sets = []
for warning_list in warnings_per_validator:
attr_sets.append(set(extract_attr(w) for w in warning_list))

common_attrs = set.intersection(*attr_sets) if len(attr_sets) > 1 else (attr_sets[0] if attr_sets else set())

yielded_attrs = set()
for warning_list in warnings_per_validator:
for w in warning_list:
attr_name = extract_attr(w)
if attr_name in common_attrs and attr_name not in yielded_attrs:
yield w
yielded_attrs.add(attr_name)

def __construct_illegal_link_error(self, child_spec, parent_builder):
name_of_erroneous = self.get_spec_loc(child_spec)
Expand Down
Loading
Loading