Commit ba4c041

Add descriptive metadata query
Introduce a Descriptive MetadataQueryKind with semantic kinds (Title, Description, Creator, Keywords) and corresponding match-term bits, plus the query_descriptive_metadata API. Add EXIF/IPTC/XMP tag constants and matching helpers (exact and fuzzy term matching, XMP leaf extraction, IPTC dataset mapping) to reconcile common descriptive fields across EXIF/IPTC/XMP (title/headline, description/caption, creator/author, keywords/subject). Wire descriptive candidates into interpretation output and query dispatch, update kind/semantic name strings, and add unit tests covering EXIF/IPTC/XMP reconciliation. Update docs and CHANGES, and bump VERSION to 0.4.21.
1 parent fe4b971 commit ba4c041

12 files changed

Lines changed: 404 additions & 25 deletions

CHANGES.md

Lines changed: 12 additions & 0 deletions
@@ -1,5 +1,17 @@
 # OpenMeta Changes

+## 0.4.21 - 2026-05-21
+
+Changes compared with `0.4.20`.
+
+### Added
+
+- Added `query_descriptive_metadata(...)` for bounded descriptive
+EXIF/IPTC/XMP reconciliation across title/headline,
+description/caption, creator/author, and keywords/subject metadata.
+- Structured interpretation now includes descriptive query records, preserving
+source entry provenance for UI and host reconciliation workflows.
+
 ## 0.4.20 - 2026-05-21

 Changes compared with `0.4.19`.

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.4.20
+0.4.21

docs/interpretation_status.md

Lines changed: 2 additions & 2 deletions
@@ -32,7 +32,7 @@ explicit outcome:
 | --- | --- | --- | --- |
 | Standard EXIF/TIFF/DNG tag names and typed values | Standard tag names, common scalar/vector values, DNG crop/color/exposure/RAW-processing fields, GeoTIFF key names, and common EXIF/TIFF/DNG numeric value-name helpers are available. Exposure time, aperture, ISO sensitivity, exposure bias, exposure program, gain, and raw exposure-adjustment records now flow into concept candidates. | High, about 91-95%. | More enum-style human-readable values and richer conflict handling between duplicated families. |
 | ICC profiles | ICC header/tag table decode plus interpreted `desc`, text, signatures, XYZ, curves, named-color, measurement, viewing-condition, MFT/MAB/MBA, numeric array, and malformed/limit handling. | High, about 90-95%. | Full color-management policy remains host-owned; OpenMeta interprets profile metadata, not rendered color transforms. |
-| IPTC-IIM and portable XMP | IPTC datasets and XMP properties decode into typed entries, and bounded EXIF/IPTC-to-XMP projection exists for transfer/writeback. | Medium-high, about 75-85%. | Full MWG-style reconciliation of duplicated EXIF/XMP/IPTC concepts is still bounded. |
+| IPTC-IIM and portable XMP | IPTC datasets and XMP properties decode into typed entries, bounded EXIF/IPTC-to-XMP projection exists for transfer/writeback, and common descriptive EXIF/IPTC/XMP concepts such as title/headline, description/caption, creator/author, and keywords/subject are queryable with source-entry provenance. | Medium-high, about 78-86%. | Full MWG-style reconciliation of duplicated EXIF/XMP/IPTC concepts remains bounded. |
 | Orientation | EXIF/TIFF orientation query, LibRaw flip mapping, generic orientation helpers for index, rotation degrees, mirrored state, dimension swap, rotation-only fallback, human-readable labels, and EXIF-vs-XMP conflict reporting in the LibRaw bridge. | High, about 90-95%. | Higher-level policy for resolving container and host pixel-orientation state remains host-specific. |
 | Geometry, crop, active area, and borders | DNG crop/active-area/masked-area tags, Phase One/Leaf geometry, Fujifilm RAF raw crop/zoom rectangles, Canon aspect/crop metadata, Nikon Capture crop bounds, Sony panorama crop margins, canonical border margins, vendor RAW-processing geometry buckets, and fuzzy crop/border-style paths are queryable. | High, about 88-92%. | More vendor-specific normalized rectangles and stronger output contracts for ambiguous multi-tag geometry. |
 | Exposure and gain | Standard EXIF exposure time, f-number, exposure program, photographic sensitivity, exposure bias, exposure index, gain control, selected DNG baseline/raw-preview gain fields, matching XMP paths, and selected decoded vendor/MakerNote exposure names are queryable and promoted into cross-family exposure roles. Standard EXIF exposure program and gain-control values carry human-readable labels in concept candidates. Capture exposure facts are marked safe, while raw/DNG exposure adjustments are marked unsafe for rendered-image transfer. | Medium-high, about 87-91%. | More vendor MakerNote exposure print conversions and richer per-vendor exposure/gain labels. |
@@ -41,7 +41,7 @@ explicit outcome:
 | Vendor MakerNotes | Broad MakerNote naming and source-processing classification exists for common vendors and several live computational/thermal vendors. Unknown entries remain lossless and source-private subgroups distinguish preview, face geometry, computational, thermal, stitch/panorama, pixel-shift, multi-shot, composite, auto-lighting, RAW crop/active-area, source color-transform, source style/rendering aliases, lens-correction, raw-level processing data, and Phase One/Leaf RAW-processing fields handled by direct classification plus dedicated normalized helpers. Classified multi-field vendor groups now surface as grouped query/interpretation candidates where safe to expose structurally. | Medium-high, about 83-90%. | ExifTool-style long-tail print conversions, encrypted/custom settings, and per-model private tables. |
 | BMFF item graph, HEIF/AVIF/CR3, JUMBF, and C2PA | BMFF derived fields, item-info rows, bounded relations, primary-linked roles, aux semantics, and draft C2PA/JUMBF structural fields are exposed. | Medium, about 60-70%. | Full BMFF scene modeling and full C2PA manifest/policy semantics. |
 | Photoshop IRB | Raw resources are preserved and a bounded interpreted subset is decoded for fixed-layout resources. | Medium, about 60-70%. | Broader resource-specific interpretation. |
-| Semantic query/search and records | Query helpers expose raw matches, confidence, provenance, value shapes, normalized candidates, canonical crop/active-area rectangles, Fujifilm RAF raw crop/zoom rectangles, Canon/Nikon/Sony crop and border patterns, border margins, exposure/gain roles, selected vendor/MakerNote exposure-name aliases, per-family grouped vendor records, expanded source color/style/lens/source-processing aliases, source-processing buckets, optional RapidFuzz near-miss matching, structured interpretation records, and bounded cross-family concept resolution for orientation, date/time, exposure/gain, color/profile, GPS, geometry, lens-correction, and RAW-processing with parsed date/time fields, timezone/precision classification, combined GPS timestamps, GPS altitude-reference state and display token, canonical geometry origin/size/rect/margins, normalized exposure values, shape-checked grouped value vectors, transfer hints, rendered/compatible safety booleans, and tolerance-aware GPS/exposure/color/geometry conflicts. | Medium-high, about 78-84%. | More long-tail per-model concept aliases and richer localized policy wording. |
+| Semantic query/search and records | Query helpers expose raw matches, confidence, provenance, value shapes, normalized candidates, canonical crop/active-area rectangles, Fujifilm RAF raw crop/zoom rectangles, Canon/Nikon/Sony crop and border patterns, border margins, exposure/gain roles, selected vendor/MakerNote exposure-name aliases, per-family grouped vendor records, descriptive EXIF/IPTC/XMP concepts, expanded source color/style/lens/source-processing aliases, source-processing buckets, optional RapidFuzz near-miss matching, structured interpretation records, and bounded cross-family concept resolution for orientation, date/time, exposure/gain, color/profile, GPS, geometry, lens-correction, and RAW-processing with parsed date/time fields, timezone/precision classification, combined GPS timestamps, GPS altitude-reference state and display token, canonical geometry origin/size/rect/margins, normalized exposure values, shape-checked grouped value vectors, transfer hints, rendered/compatible safety booleans, and tolerance-aware GPS/exposure/color/geometry conflicts. | Medium-high, about 79-85%. | More long-tail per-model concept aliases and richer localized policy wording. |
 | Transfer-safety classification | Compatible-file versus rendered-image safety policies classify source-specific image geometry, color/profile, RAW-processing, MakerNote, JUMBF/C2PA, and vendor-private data, with concept-level diagnostics that report keep/drop/requires-target-image-spec actions, severity, and role-specific default message text before prepare. | High, about 89-93%. | More per-family policy tests and optional host localization hooks. |

 ## Competitor Position

docs/quick_start.md

Lines changed: 4 additions & 1 deletion
@@ -145,7 +145,10 @@ Use exact keys for:
 For semantic inspection UI, use `openmeta/metadata_query.h`. It reports raw
 matches plus normalized candidates for areas such as crop/active-area,
 exposure/gain, white balance, color, lens correction, orientation, and
-RAW-processing metadata. These helpers use deterministic built-in name/tag
+RAW-processing metadata. `query_descriptive_metadata(...)` also exposes a
+bounded EXIF/IPTC/XMP reconciliation view for common descriptive fields:
+title/headline, description/caption, creator/author, and keywords/subject.
+These helpers use deterministic built-in name/tag
 matching by default. If OpenMeta is configured with
 `-DOPENMETA_ENABLE_RAPIDFUZZ=ON`, the same query helpers also use RapidFuzz to
 match near-miss property names such as misspelled crop/border/padding paths.
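The reconciliation described in this hunk can be illustrated with a minimal, self-contained sketch. It is not OpenMeta's implementation: `Semantic`, `semantic_from_iptc`, `semantic_from_exif_tag`, `xmp_leaf`, and `semantic_from_xmp_leaf` are hypothetical names chosen for illustration. The IPTC-IIM dataset numbers, EXIF tag ids, and XMP property names are the standard ones; matching XMP on the leaf name alone (ignoring the namespace prefix) is a simplifying assumption. Requires C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical semantic buckets mirroring the four kinds added in this commit.
enum class Semantic : std::uint8_t { Unknown, Title, Description, Creator, Keywords };

// IPTC-IIM application record (record 2) datasets for descriptive fields.
constexpr Semantic semantic_from_iptc(std::uint8_t record, std::uint8_t dataset) noexcept
{
    if (record != 2U) {
        return Semantic::Unknown;
    }
    switch (dataset) {
    case 5U:    // 2:05 Object Name
    case 105U:  // 2:105 Headline
        return Semantic::Title;
    case 120U: return Semantic::Description;  // 2:120 Caption/Abstract
    case 80U: return Semantic::Creator;       // 2:80 By-line
    case 25U: return Semantic::Keywords;      // 2:25 Keywords
    default: return Semantic::Unknown;
    }
}

// Standard EXIF/TIFF tags plus the Windows XP* tags that carry descriptive text.
constexpr Semantic semantic_from_exif_tag(std::uint16_t tag) noexcept
{
    switch (tag) {
    case 0x9C9BU: return Semantic::Title;        // XPTitle
    case 0x010EU: return Semantic::Description;  // ImageDescription
    case 0x013BU:                                // Artist
    case 0x9C9DU: return Semantic::Creator;      // XPAuthor
    case 0x9C9EU: return Semantic::Keywords;     // XPKeywords
    default: return Semantic::Unknown;
    }
}

// Reduce a path such as "dc:subject[1]" to its leaf name "subject":
// drop everything up to the last ':' or '/', then drop any array index.
constexpr std::string_view xmp_leaf(std::string_view path) noexcept
{
    const std::size_t sep = path.find_last_of(":/");
    if (sep != std::string_view::npos) {
        path.remove_prefix(sep + 1U);
    }
    const std::size_t idx = path.find('[');
    if (idx != std::string_view::npos) {
        path = path.substr(0U, idx);
    }
    return path;
}

// Map an XMP leaf name (dc:title, photoshop:Headline, dc:description,
// dc:creator, dc:subject) onto the same semantic buckets.
constexpr Semantic semantic_from_xmp_leaf(std::string_view leaf) noexcept
{
    if (leaf == "title" || leaf == "Headline") return Semantic::Title;
    if (leaf == "description") return Semantic::Description;
    if (leaf == "creator") return Semantic::Creator;
    if (leaf == "subject") return Semantic::Keywords;
    return Semantic::Unknown;
}
```

With all three families mapped to one bucket, a caller can group an IPTC By-line, an EXIF Artist tag, and an XMP `dc:creator` entry under `Semantic::Creator` while still keeping each source entry separate for provenance.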

docs/sphinx/interpretation_status.rst

Lines changed: 12 additions & 8 deletions
@@ -80,11 +80,14 @@ Coverage matrix
 - Full color-management policy remains host-owned; OpenMeta interprets
 profile metadata, not rendered color transforms.
 * - IPTC-IIM and portable XMP
-- IPTC datasets and XMP properties decode into typed entries, and bounded
-EXIF/IPTC-to-XMP projection exists for transfer/writeback.
-- Medium-high, about 75-85%.
-- Full MWG-style reconciliation of duplicated EXIF/XMP/IPTC concepts is
-still bounded.
+- IPTC datasets and XMP properties decode into typed entries, bounded
+EXIF/IPTC-to-XMP projection exists for transfer/writeback, and common
+descriptive EXIF/IPTC/XMP concepts such as title/headline,
+description/caption, creator/author, and keywords/subject are queryable
+with source-entry provenance.
+- Medium-high, about 78-86%.
+- Full MWG-style reconciliation of duplicated EXIF/XMP/IPTC concepts
+remains bounded.
 * - Orientation
 - EXIF/TIFF orientation query, LibRaw flip mapping, and generic
 orientation helpers for index, rotation degrees, mirrored state,
@@ -166,8 +169,9 @@ Coverage matrix
 RAF raw crop/zoom rectangles, Canon/Nikon/Sony crop and border
 patterns, border margins, exposure/gain roles, selected
 vendor/MakerNote exposure-name aliases, per-family grouped vendor
-records, expanded source color/style/lens/source-processing aliases,
-source-processing buckets, optional RapidFuzz near-miss matching,
+records, descriptive EXIF/IPTC/XMP concepts, expanded source
+color/style/lens/source-processing aliases, source-processing buckets,
+optional RapidFuzz near-miss matching,
 structured interpretation records, and bounded cross-family concept
 resolution for orientation, date/time, exposure/gain,
 color/profile, GPS, geometry, lens-correction, and RAW-processing with
@@ -177,7 +181,7 @@ Coverage matrix
 origin/size/rect/margins, normalized exposure values, shape-checked
 grouped value vectors, transfer hints, rendered/compatible safety
 booleans, and tolerance-aware GPS/exposure/color/geometry conflicts.
-- Medium-high, about 78-84%.
+- Medium-high, about 79-85%.
 - More long-tail per-model concept aliases and richer localized policy
 wording.
 * - Transfer-safety classification

docs/sphinx/quick_start.rst

Lines changed: 4 additions & 1 deletion
@@ -123,7 +123,10 @@ the metadata family and key:
 For semantic inspection UI, use ``openmeta/metadata_query.h``. It reports raw
 matches plus normalized candidates for areas such as crop/active-area,
 exposure/gain, white balance, color, lens correction, orientation, and
-RAW-processing metadata. These helpers use deterministic built-in name/tag
+RAW-processing metadata. ``query_descriptive_metadata(...)`` also exposes a
+bounded EXIF/IPTC/XMP reconciliation view for common descriptive fields:
+title/headline, description/caption, creator/author, and keywords/subject.
+These helpers use deterministic built-in name/tag
 matching by default. If OpenMeta is configured with
 ``-DOPENMETA_ENABLE_RAPIDFUZZ=ON``, the same query helpers also use RapidFuzz to
 match near-miss property names such as misspelled crop/border/padding paths.

src/include/openmeta/metadata_query.h

Lines changed: 12 additions & 0 deletions
@@ -24,6 +24,7 @@ enum class MetadataQueryKind : uint8_t {
 LensCorrection,
 Orientation,
 RawProcessing,
+Descriptive,
 };

 enum class MetadataQuerySemanticKind : uint8_t {
@@ -46,6 +47,10 @@ enum class MetadataQuerySemanticKind : uint8_t {
 SensorGeometry,
 RawStorage,
 SourceProcessing,
+Title,
+Description,
+Creator,
+Keywords,
 };

 enum class MetadataQueryValueShape : uint8_t {
@@ -94,6 +99,10 @@ enum class MetadataQueryMatchTerm : uint32_t {
 Raw = 1U << 25U,
 Storage = 1U << 26U,
 SourceProcessing = 1U << 27U,
+Title = 1U << 28U,
+Description = 1U << 29U,
+Creator = 1U << 30U,
+Keywords = 1U << 31U,
 };

 struct MetadataQueryMatch final {
@@ -165,6 +174,9 @@ query_orientation_metadata(const MetaStore& store);
 MetadataQueryResult
 query_raw_processing_metadata(const MetaStore& store);

+MetadataQueryResult
+query_descriptive_metadata(const MetaStore& store);
+
 bool
 metadata_query_fuzzy_search_available() noexcept;
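The four new match-term bits occupy positions 28 through 31, so `Keywords` takes the last free bit of the `uint32_t` underlying type. The sketch below shows one way such a bit-flag enum could be combined and tested. `MatchTerm`, `operator|`, and `has_term` are hypothetical stand-ins written for this example; the header hunk above does not show which helpers OpenMeta provides for `MetadataQueryMatchTerm`.

```cpp
#include <cstdint>

// Hypothetical stand-in holding only the four enumerators added in this
// commit; the real MetadataQueryMatchTerm has further members in lower bits.
enum class MatchTerm : std::uint32_t {
    None        = 0U,
    Title       = 1U << 28U,
    Description = 1U << 29U,
    Creator     = 1U << 30U,
    Keywords    = 1U << 31U,  // highest bit of a 32-bit mask
};

// Combine two term masks (assumed helper, not taken from the header).
constexpr MatchTerm operator|(MatchTerm a, MatchTerm b) noexcept
{
    return static_cast<MatchTerm>(static_cast<std::uint32_t>(a)
                                  | static_cast<std::uint32_t>(b));
}

// True when `mask` contains at least one bit of `term`.
constexpr bool has_term(MatchTerm mask, MatchTerm term) noexcept
{
    return (static_cast<std::uint32_t>(mask)
            & static_cast<std::uint32_t>(term)) != 0U;
}
```

Because bit 31 is now used, any further match term would need a wider underlying type or a second mask field; that is an observation about the bit layout, not a statement about the project's plans.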

src/openmeta/metadata_concepts.cc

Lines changed: 20 additions & 4 deletions
@@ -1867,7 +1867,11 @@ namespace {
 case MetadataQuerySemanticKind::CfaLayout:
 case MetadataQuerySemanticKind::SensorGeometry:
 case MetadataQuerySemanticKind::RawStorage:
-case MetadataQuerySemanticKind::SourceProcessing: break;
+case MetadataQuerySemanticKind::SourceProcessing:
+case MetadataQuerySemanticKind::Title:
+case MetadataQuerySemanticKind::Description:
+case MetadataQuerySemanticKind::Creator:
+case MetadataQuerySemanticKind::Keywords: break;
 }
 return MetadataConceptRole::Primary;
 }
@@ -1986,7 +1990,11 @@ namespace {
 case MetadataQuerySemanticKind::CfaLayout:
 case MetadataQuerySemanticKind::SensorGeometry:
 case MetadataQuerySemanticKind::RawStorage:
-case MetadataQuerySemanticKind::SourceProcessing: break;
+case MetadataQuerySemanticKind::SourceProcessing:
+case MetadataQuerySemanticKind::Title:
+case MetadataQuerySemanticKind::Description:
+case MetadataQuerySemanticKind::Creator:
+case MetadataQuerySemanticKind::Keywords: break;
 }
 return MetadataConceptRole::Primary;
 }
@@ -2029,7 +2037,11 @@ namespace {
 case MetadataQuerySemanticKind::ColorMatrix:
 case MetadataQuerySemanticKind::LensCorrection:
 case MetadataQuerySemanticKind::Orientation:
-case MetadataQuerySemanticKind::ExposureGain: break;
+case MetadataQuerySemanticKind::ExposureGain:
+case MetadataQuerySemanticKind::Title:
+case MetadataQuerySemanticKind::Description:
+case MetadataQuerySemanticKind::Creator:
+case MetadataQuerySemanticKind::Keywords: break;
 }
 return MetadataConceptRole::Primary;
 }
@@ -2255,7 +2267,11 @@ namespace {
 case MetadataQuerySemanticKind::Linearization:
 case MetadataQuerySemanticKind::CfaLayout:
 case MetadataQuerySemanticKind::RawStorage:
-case MetadataQuerySemanticKind::SourceProcessing: break;
+case MetadataQuerySemanticKind::SourceProcessing:
+case MetadataQuerySemanticKind::Title:
+case MetadataQuerySemanticKind::Description:
+case MetadataQuerySemanticKind::Creator:
+case MetadataQuerySemanticKind::Keywords: break;
 }
 return MetadataConceptRole::Primary;
 }
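All four hunks in this file follow the same pattern: a `switch` over `MetadataQuerySemanticKind` with no `default` label, where kinds without a dedicated role fall through to `break` and the function returns `MetadataConceptRole::Primary`. A plausible reason for listing every enumerator is that GCC and Clang then warn (`-Wswitch`, part of `-Wall`) when a new enumerator is left unhandled. The reduced sketch below illustrates the pattern; `SemanticKind`, `Role`, `Role::Rotation`, `Role::Exposure`, and `role_for` are hypothetical and much smaller than the real enums.

```cpp
#include <cstdint>

// Hypothetical, reduced versions of the enums touched by this commit.
enum class SemanticKind : std::uint8_t { Orientation, ExposureGain, Title, Keywords };
enum class Role : std::uint8_t { Primary, Rotation, Exposure };

// Exhaustive switch without a `default` label: every enumerator is listed,
// so adding a new SemanticKind triggers -Wswitch until it is handled here.
constexpr Role role_for(SemanticKind kind) noexcept
{
    switch (kind) {
    case SemanticKind::Orientation: return Role::Rotation;
    case SemanticKind::ExposureGain: return Role::Exposure;
    case SemanticKind::Title:
    case SemanticKind::Keywords: break;  // no dedicated role; use the fallback
    }
    return Role::Primary;
}
```

Under this pattern the descriptive kinds need no new logic, only a listing next to the other kinds that resolve to the fallback role.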

src/openmeta/metadata_interpretation.cc

Lines changed: 1 addition & 0 deletions
@@ -111,6 +111,7 @@ interpret_metadata(const MetaStore& store)
 append_kind_records(store, MetadataQueryKind::Color, &out);
 append_kind_records(store, MetadataQueryKind::LensCorrection, &out);
 append_kind_records(store, MetadataQueryKind::RawProcessing, &out);
+append_kind_records(store, MetadataQueryKind::Descriptive, &out);
 return out;
 }
