Commit 3083c46

Add BMFF item semantics and color summaries
Introduce item type-name and semantic labeling plus bounded primary color/profile summaries for BMFF handling. Adds ColrProp and ItemSemantic (nclx/nclc/rICC/prof and profile byte counts) and classification/emission logic in src/openmeta/bmff_fields_decode.cc, emitting fields like item.type_name, item.semantic, primary.color_type, primary.nclx_*, and primary.color_profile_bytes. Updates and expands unit tests to cover nclx, rICC profiles, short/ignored colr boxes, and item semantic labeling. Documentation, CHANGES.md, and VERSION bumped to 0.4.23 to reflect the new features.
1 parent c17477a commit 3083c46

8 files changed

Lines changed: 746 additions & 13 deletions

CHANGES.md

Lines changed: 13 additions & 0 deletions
@@ -1,5 +1,18 @@
 # OpenMeta Changes
 
+## 0.4.23 - 2026-05-21
+
+Changes compared with `0.4.22`.
+
+### Added
+
+- Added BMFF item type-name and semantic labels for image, EXIF, XMP, JUMBF,
+C2PA, ICC profile, URI, thumbnail, derived-image, auxiliary, and
+content-description items.
+- Added bounded primary BMFF color/property summaries for `colr` `nclx`,
+`nclc`, `rICC`, and `prof` properties, including ICC profile byte counts and
+nclx color fields.
+
 ## 0.4.22 - 2026-05-21
 
 Changes compared with `0.4.21`.

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.4.22
+0.4.23

docs/development.md

Lines changed: 47 additions & 1 deletion
@@ -658,7 +658,53 @@ This policy surface is intentionally marked draft and may be refined.
 plus `format_icc_tag_display_value(...)` for shared CLI/Python rendering)
 - ISO-BMFF (HEIF/AVIF/CR3) container-derived fields: `src/openmeta/bmff_fields_decode.cc`
 - Emitted during `simple_meta_read(...)` as `MetaKeyKind::BmffField` entries.
-- Current fields: `ftyp.*`, primary item properties (`meta.primary_item_id`, `primary.width`, `primary.height`, `primary.rotation_degrees`, `primary.mirror` from `pitm` + `iprp/ipco ispe/irot/imir` + `ipma`), item-info rows from `iinf/infe` (`item.info_count`, `item.id`, `item.type`, `item.name`, `item.content_type`, `item.content_encoding`, `item.uri_type`; emitted even when `meta` has no `pitm`, plus `primary.item_type`, `primary.item_name`, `primary.content_type`, `primary.content_encoding`, `primary.uri_type` aliases when `pitm` is present), bounded `iref.*` relation fields (`ref_type`, `ref_type_name`, `from_item_id`, `to_item_id`, `edge_count`), typed derived relation rows (`iref.auxl.*`, `iref.dimg.*`, `iref.thmb.*`, `iref.cdsc.*`, and other safe ASCII FourCC relation families), per-type relation counters (`iref.<type>.edge_count`) and per-type unique source/target counters (`iref.<type>.from_item_unique_count`, `iref.<type>.to_item_unique_count`), per-type graph-summary aliases (`iref.graph.<type>.edge_count`, `iref.graph.<type>.from_item_unique_count`, `iref.graph.<type>.to_item_unique_count`), typed relation item summaries (`iref.<type>.item_count`, `iref.<type>.item_id`, `iref.<type>.item_out_edge_count`, `iref.<type>.item_in_edge_count`), relation-graph summaries (`iref.item_count`, `iref.from_item_unique_count`, `iref.to_item_unique_count`, row-wise `iref.item_id` + `iref.item_out_edge_count` + `iref.item_in_edge_count`), bounded primary-linked image-role rows (`primary.linked_item_role_count`, row-wise `primary.linked_item_id` + `primary.linked_item_type` + `primary.linked_item_name` + `primary.linked_item_role` when `iinf/infe` data exists), and `auxC`-based aux semantics (`aux.item_count`, `aux.item_id`, `aux.semantic`, `aux.type`, `aux.subtype_hex`, `aux.subtype_kind`, `aux.subtype_text`, `aux.subtype_uuid`, `aux.subtype_u32`, `aux.subtype_u64`, `aux.alpha_count`, `aux.depth_count`, `aux.disparity_count`, `aux.matte_count`, `primary.auxl_count`, `primary.auxl_semantic`, `primary.depth_count`, `primary.depth_item_id`, `primary.alpha_count`, `primary.alpha_item_id`, `primary.disparity_count`, `primary.disparity_item_id`, `primary.matte_count`, `primary.matte_item_id`, `primary.dimg_count`, `primary.dimg_item_id`, `primary.thmb_count`, `primary.thmb_item_id`, `primary.cdsc_count`, `primary.cdsc_item_id`, ...). Full multi-image scene modeling beyond that primary-linked role surface is still follow-up work.
+- Current fields: `ftyp.*`, primary item properties
+(`meta.primary_item_id`, `primary.width`, `primary.height`,
+`primary.rotation_degrees`, `primary.mirror` from `pitm` + `iprp/ipco
+ispe/irot/imir` + `ipma`), primary `colr` summaries
+(`primary.color_type`, `primary.color_type_name`,
+`primary.nclx_colour_primaries`,
+`primary.nclx_transfer_characteristics`,
+`primary.nclx_matrix_coefficients`, `primary.nclx_full_range_flag`, and
+`primary.color_profile_bytes` for bounded ICC profile carriers), item-info
+rows from `iinf/infe` (`item.info_count`, `item.id`, `item.type`,
+`item.type_name`, `item.semantic`, `item.name`, `item.content_type`,
+`item.content_encoding`, `item.uri_type`; emitted even when `meta` has no
+`pitm`, plus `primary.item_type`, `primary.item_type_name`,
+`primary.item_semantic`, `primary.item_name`, `primary.content_type`,
+`primary.content_encoding`, `primary.uri_type` aliases when `pitm` is
+present), bounded `iref.*` relation fields (`ref_type`, `ref_type_name`,
+`from_item_id`, `to_item_id`, `edge_count`), typed derived relation rows
+(`iref.auxl.*`, `iref.dimg.*`, `iref.thmb.*`, `iref.cdsc.*`, and other safe
+ASCII FourCC relation families), per-type relation counters
+(`iref.<type>.edge_count`) and per-type unique source/target counters
+(`iref.<type>.from_item_unique_count`,
+`iref.<type>.to_item_unique_count`), per-type graph-summary aliases
+(`iref.graph.<type>.edge_count`,
+`iref.graph.<type>.from_item_unique_count`,
+`iref.graph.<type>.to_item_unique_count`), typed relation item summaries
+(`iref.<type>.item_count`, `iref.<type>.item_id`,
+`iref.<type>.item_out_edge_count`, `iref.<type>.item_in_edge_count`),
+relation-graph summaries (`iref.item_count`,
+`iref.from_item_unique_count`, `iref.to_item_unique_count`, row-wise
+`iref.item_id` + `iref.item_out_edge_count` +
+`iref.item_in_edge_count`), bounded primary-linked image-role rows
+(`primary.linked_item_role_count`, row-wise `primary.linked_item_id` +
+`primary.linked_item_type` + `primary.linked_item_type_name` +
+`primary.linked_item_name` + `primary.linked_item_semantic` +
+`primary.linked_item_role` when `iinf/infe` data exists), and
+`auxC`-based aux semantics (`aux.item_count`, `aux.item_id`,
+`aux.semantic`, `aux.type`, `aux.subtype_hex`, `aux.subtype_kind`,
+`aux.subtype_text`, `aux.subtype_uuid`, `aux.subtype_u32`,
+`aux.subtype_u64`, `aux.alpha_count`, `aux.depth_count`,
+`aux.disparity_count`, `aux.matte_count`, `primary.auxl_count`,
+`primary.auxl_semantic`, `primary.depth_count`, `primary.depth_item_id`,
+`primary.alpha_count`, `primary.alpha_item_id`,
+`primary.disparity_count`, `primary.disparity_item_id`,
+`primary.matte_count`, `primary.matte_item_id`, `primary.dimg_count`,
+`primary.dimg_item_id`, `primary.thmb_count`, `primary.thmb_item_id`,
+`primary.cdsc_count`, `primary.cdsc_item_id`, ...). Full multi-image scene
+modeling beyond that primary-linked role surface is still follow-up work.
 - `auxC` subtype interpretation now includes `ascii_z` and `u64be` kinds in addition to earlier numeric/FourCC/UUID/ASCII forms.
 - Parsing is intentionally bounded (depth/box count caps) and ignores unknown properties.
 - JUMBF/C2PA decode (draft phase-3): `src/openmeta/jumbf_decode.cc`
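
For readers unfamiliar with the `colr` property, the following is a minimal sketch of how a bounded summary like the one described above could be derived from a `colr` payload. It is not the code in `src/openmeta/bmff_fields_decode.cc`: the `ColrSummary` struct and the `summarize_colr` and `read_u16be` helpers are hypothetical names introduced for illustration. It assumes the ISO BMFF layout (a 4-byte `colour_type`, then three 16-bit fields plus a flag byte for `nclx`, three 16-bit fields for `nclc`, or ICC profile bytes for `rICC`/`prof`) and that short or unknown boxes are simply ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical summary type for illustration; OpenMeta's ColrProp may differ.
struct ColrSummary {
    char color_type[5] = {0, 0, 0, 0, 0};  // FourCC plus NUL terminator
    bool has_nclx_fields = false;
    std::uint16_t colour_primaries = 0;
    std::uint16_t transfer_characteristics = 0;
    std::uint16_t matrix_coefficients = 0;
    bool has_full_range_flag = false;  // only `nclx` carries this flag
    bool full_range_flag = false;
    bool has_profile = false;
    std::size_t profile_bytes = 0;  // size only; profile bytes are not copied
};

// Reads a big-endian 16-bit value from two bytes.
static std::uint16_t read_u16be(const std::uint8_t* p) {
    assert(p != nullptr);
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// `payload` is assumed to be the `colr` box body after the box header.
// Returns std::nullopt for short or unrecognized payloads.
std::optional<ColrSummary> summarize_colr(const std::uint8_t* payload,
                                          std::size_t size) {
    if (payload == nullptr || size < 4) return std::nullopt;

    ColrSummary out;
    std::memcpy(out.color_type, payload, 4);
    const std::uint8_t* body = payload + 4;
    const std::size_t body_size = size - 4;

    const bool is_nclx = std::memcmp(out.color_type, "nclx", 4) == 0;
    const bool is_nclc = std::memcmp(out.color_type, "nclc", 4) == 0;
    if (is_nclx || is_nclc) {
        const std::size_t need = is_nclx ? 7 : 6;
        if (body_size < need) return std::nullopt;  // short box: ignore
        out.has_nclx_fields = true;
        out.colour_primaries = read_u16be(body);
        out.transfer_characteristics = read_u16be(body + 2);
        out.matrix_coefficients = read_u16be(body + 4);
        if (is_nclx) {
            out.has_full_range_flag = true;
            out.full_range_flag = (body[6] & 0x80) != 0;  // top bit
        }
        return out;
    }

    if (std::memcmp(out.color_type, "rICC", 4) == 0 ||
        std::memcmp(out.color_type, "prof", 4) == 0) {
        out.has_profile = true;
        out.profile_bytes = body_size;  // report the count, keep it bounded
        return out;
    }
    return std::nullopt;  // unknown colour_type: ignore
}
```

Reporting only the byte count for ICC carriers keeps the summary bounded regardless of profile size, which matches the "bounded" wording above; whether the real decoder also caps the accepted size is not stated in this commit text.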

docs/interpretation_status.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ explicit outcome:
 | Color, white balance, and matrices | DNG color/calibration/reduction/forward matrix groups, white-balance vector groups, ICC metadata, RAW color/source-processing safety buckets, transfer hints, per-family grouped vendor color/WB candidates, long-tail camera-to-XYZ/RGB, style/color, and white-balance gain aliases, and cross-family concept candidates with full grouped value vectors are identified. Matrix/vector groups require numeric payloads with conservative minimum shapes before promotion. | Medium-high, about 82-90%. | Deeper camera/vendor color science interpretation is intentionally conservative, especially for rendered-image transfer. |
 | Lens correction and RAW processing | Lens-correction groups, black/white levels, linearization, CFA/sensor layout, raw-storage identifiers, vendor RAW/source-processing buckets, creative/picture style, film simulation, dynamic-range, optical correction, and raw-development aliases, per-family vendor raw-storage/sensor/source-processing table candidates, transfer hints, transfer diagnostics, and concept candidates with grouped table/vector values are classified for query and transfer safety. Lens-correction grouped tables require numeric payloads before promotion. | Medium-high, about 82-89%. | Long-tail per-model correction tables and richer numeric normalization. |
 | Vendor MakerNotes | Broad MakerNote naming and source-processing classification exists for common vendors and several live computational/thermal vendors. Unknown entries remain lossless and source-private subgroups distinguish preview, face geometry, computational, thermal, stitch/panorama, pixel-shift, multi-shot, composite, auto-lighting, RAW crop/active-area, source color-transform, source style/rendering aliases, lens-correction, raw-level processing data, and Phase One/Leaf RAW-processing fields handled by direct classification plus dedicated normalized helpers. Classified multi-field vendor groups now surface as grouped query/interpretation candidates where safe to expose structurally. | Medium-high, about 83-90%. | ExifTool-style long-tail print conversions, encrypted/custom settings, and per-model private tables. |
-| BMFF item graph, HEIF/AVIF/CR3, JUMBF, and C2PA | BMFF derived fields, item-info rows, bounded relations, primary-linked roles, aux semantics, and draft C2PA/JUMBF structural fields are exposed. | Medium, about 60-70%. | Full BMFF scene modeling and full C2PA manifest/policy semantics. |
+| BMFF item graph, HEIF/AVIF/CR3, JUMBF, and C2PA | BMFF derived fields, item-info rows, item type/semantic labels for common metadata carriers, bounded relations, primary-linked roles, aux semantics, primary color/profile property summaries, and draft C2PA/JUMBF structural fields are exposed. | Medium, about 63-73%. | Full BMFF scene modeling and full C2PA manifest/policy semantics. |
 | Photoshop IRB | Raw resources are preserved and a bounded interpreted subset is decoded for fixed-layout resources, including resolution/version/print data, alpha names/identifiers, captions, QuickMask info, URL/list data, channel options, and clipping-path names. | Medium, about 62-72%. | Broader resource-specific interpretation. |
 | Semantic query/search and records | Query helpers expose raw matches, confidence, provenance, value shapes, normalized candidates, canonical crop/active-area rectangles, Fujifilm RAF raw crop/zoom rectangles, Canon/Nikon/Sony crop and border patterns, border margins, exposure/gain roles, selected vendor/MakerNote exposure-name aliases, per-family grouped vendor records, descriptive EXIF/IPTC/XMP concepts, expanded source color/style/lens/source-processing aliases, source-processing buckets, optional RapidFuzz near-miss matching, structured interpretation records, and bounded cross-family concept resolution for orientation, date/time, exposure/gain, color/profile, GPS, geometry, lens-correction, and RAW-processing with parsed date/time fields, timezone/precision classification, combined GPS timestamps, GPS altitude-reference state and display token, canonical geometry origin/size/rect/margins, normalized exposure values, shape-checked grouped value vectors, transfer hints, rendered/compatible safety booleans, and tolerance-aware GPS/exposure/color/geometry conflicts. | Medium-high, about 79-85%. | More long-tail per-model concept aliases and richer localized policy wording. |
 | Transfer-safety classification | Compatible-file versus rendered-image safety policies classify source-specific image geometry, color/profile, RAW-processing, MakerNote, JUMBF/C2PA, and vendor-private data, with concept-level diagnostics that report keep/drop/requires-target-image-spec actions, severity, and role-specific default message text before prepare. | High, about 89-93%. | More per-family policy tests and optional host localization hooks. |

docs/metadata_support.md

Lines changed: 4 additions & 0 deletions
@@ -140,10 +140,14 @@ OpenMeta now has a bounded semantic model on top of raw item discovery:
 - `ftyp.*`
 - primary item properties
 - `iinf/infe` item-info rows
+- item type-name and semantic labels for EXIF, XMP, JUMBF, C2PA, ICC profile,
+image, URI, auxiliary, thumbnail, derived-image, and content-description items
 - typed `iref.<type>.*` rows
 - graph summaries
 - `auxC`-typed auxiliary semantics
 - bounded primary-linked image-role fields
+- primary `colr` summaries for `nclx`/`nclc` color fields and ICC profile-size
+carriers
 
 This is intentionally smaller than a full QuickTime/BMFF semantic model.
 
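
As an illustration of the item type-name and semantic labeling listed above, here is a minimal sketch of a classifier keyed on the `infe` item type and content type. It is an assumption-laden example, not OpenMeta's `ItemSemantic` logic: the `ItemSemanticSketch` enum, the `classify_item` and `semantic_label` functions, and all label strings are hypothetical. Only widely used HEIF/AVIF item types are covered (`Exif`, `mime` with `application/rdf+xml` for XMP, `uri `, `hvc1`, `av01`, `grid`, `iovl`, `iden`). JUMBF, C2PA, and ICC carriers are left out because this commit text does not state how they are identified, and thumbnail, auxiliary, and content-description roles are assumed to need `iref` data rather than the item type alone.

```cpp
#include <string_view>

// Hypothetical label set for illustration; not OpenMeta's ItemSemantic enum.
enum class ItemSemanticSketch {
    Unknown,
    Image,
    DerivedImage,
    Exif,
    Xmp,
    Uri,
    OtherMime
};

// Classifies an item from its `infe` item_type FourCC and content_type.
// `content_type` is only meaningful for `mime` items and may be empty.
ItemSemanticSketch classify_item(std::string_view item_type,
                                 std::string_view content_type) {
    if (item_type == "Exif") return ItemSemanticSketch::Exif;
    if (item_type == "mime") {
        // XMP packets are carried as MIME items of this content type.
        if (content_type == "application/rdf+xml") {
            return ItemSemanticSketch::Xmp;
        }
        return ItemSemanticSketch::OtherMime;
    }
    if (item_type == "uri ") return ItemSemanticSketch::Uri;
    // Derived image items reference other items through `dimg` relations.
    if (item_type == "grid" || item_type == "iovl" || item_type == "iden") {
        return ItemSemanticSketch::DerivedImage;
    }
    // Coded image items (HEVC and AV1).
    if (item_type == "hvc1" || item_type == "av01") {
        return ItemSemanticSketch::Image;
    }
    return ItemSemanticSketch::Unknown;  // unknown types stay unlabeled
}

// Maps the sketch enum to a display label (label strings are assumptions).
const char* semantic_label(ItemSemanticSketch s) {
    switch (s) {
        case ItemSemanticSketch::Image: return "image";
        case ItemSemanticSketch::DerivedImage: return "derived_image";
        case ItemSemanticSketch::Exif: return "exif";
        case ItemSemanticSketch::Xmp: return "xmp";
        case ItemSemanticSketch::Uri: return "uri";
        case ItemSemanticSketch::OtherMime: return "mime";
        case ItemSemanticSketch::Unknown: break;
    }
    return "unknown";
}
```

Keeping unknown types as an explicit `Unknown` value, rather than guessing, is in line with the document's stated aim of a model that is intentionally smaller than a full BMFF semantic model.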

docs/sphinx/interpretation_status.rst

Lines changed: 5 additions & 3 deletions
@@ -154,9 +154,11 @@ Coverage matrix
 - ExifTool-style long-tail print conversions, encrypted/custom settings,
 and per-model private tables.
 * - BMFF item graph, HEIF/AVIF/CR3, JUMBF, and C2PA
-- BMFF derived fields, item-info rows, bounded relations, primary-linked
-roles, aux semantics, and draft C2PA/JUMBF structural fields are exposed.
-- Medium, about 60-70%.
+- BMFF derived fields, item-info rows, item type/semantic labels for
+common metadata carriers, bounded relations, primary-linked roles, aux
+semantics, primary color/profile property summaries, and draft
+C2PA/JUMBF structural fields are exposed.
+- Medium, about 63-73%.
 - Full BMFF scene modeling and full C2PA manifest/policy semantics.
 * - Photoshop IRB
 - Raw resources are preserved and a bounded interpreted subset is decoded
