Commit 73782e3

Extend metadata concept resolution and RAW handling
Add lens-correction and RAW-processing concept kinds/roles and preserve grouped matrix/vector/table values in concept candidates. Introduce has_values/values in MetadataConceptCandidate, copy/merge/value-key helpers, and numeric/date tolerance-aware conflict checks (GPS coordinate/altitude tolerances). Treat candidates that share source entries as non-conflicting. Expand vendor RAW/source-processing classification (pixel-shift, multi-shot, composite, auto-lighting) and update query/interpretation plumbing to emit full normalized value vectors. Update docs and tests to reflect improved coverage and behavior.
1 parent 513294e commit 73782e3
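The conflict rule described in the commit message can be sketched in C++ as below. This is a minimal illustration, not OpenMeta's implementation: `GpsCandidateSketch`, `gps_candidates_conflict`, and both tolerance values are hypothetical. They only demonstrate the two rules the message names: candidates that share a source entry never conflict, and numeric coordinates and altitude conflict only when they differ by more than an explicit tolerance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a GPS concept candidate; not the OpenMeta type.
struct GpsCandidateSketch {
    std::vector<std::uint32_t> source_entries;  // ids of the store entries behind this candidate
    std::optional<double> latitude_deg;          // signed decimal degrees
    std::optional<double> longitude_deg;         // signed decimal degrees
    std::optional<double> altitude_m;            // signed metres (negative = below sea level)
};

// Assumed tolerances, chosen only for illustration.
constexpr double kCoordToleranceDeg = 1e-6;  // roughly 0.1 m of latitude
constexpr double kAltitudeToleranceM = 0.5;

// True when both candidates were derived from at least one common store entry.
bool shares_source_entry(const GpsCandidateSketch& a, const GpsCandidateSketch& b) {
    for (std::uint32_t id : a.source_entries) {
        if (std::find(b.source_entries.begin(), b.source_entries.end(), id) !=
            b.source_entries.end()) {
            return true;
        }
    }
    return false;
}

// A field conflicts only when both sides have a value and they differ by more than tol.
bool differs(const std::optional<double>& a, const std::optional<double>& b, double tol) {
    return a && b && std::fabs(*a - *b) > tol;
}

bool gps_candidates_conflict(const GpsCandidateSketch& a, const GpsCandidateSketch& b) {
    // Alternate views of the same evidence are never reported as conflicts.
    if (shares_source_entry(a, b)) return false;
    return differs(a.latitude_deg, b.latitude_deg, kCoordToleranceDeg) ||
           differs(a.longitude_deg, b.longitude_deg, kCoordToleranceDeg) ||
           differs(a.altitude_m, b.altitude_m, kAltitudeToleranceM);
}
```

Comparing with a tolerance rather than exact equality keeps rounding differences between, for example, EXIF rationals and XMP decimal text from surfacing as false conflicts.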

19 files changed

Lines changed: 638 additions & 134 deletions

CHANGES.md

Lines changed: 12 additions & 0 deletions
@@ -113,13 +113,25 @@ Changes compared with `0.4.8`.
 active area, border, and sensor-geometry records. Geometry candidates expose
 canonical origin, size, rect, and margin fields when the query/interpretation
 layer can normalize them.
+- Extended cross-family concept resolution with full normalized value vectors
+and new lens-correction and RAW-processing concept families. Color/white
+balance, lens-correction, black/white level, linearization, CFA layout,
+raw-storage, and source-processing concepts now preserve grouped
+interpretation values instead of only the first scalar preview values.
+- GPS concept conflict checks now compare numeric coordinates and altitude with
+explicit tolerances, while grouped candidates that share source entries are
+treated as alternate views of the same evidence rather than conflicts.
 - Semantic crop queries now expose canonical border-margin candidates for
 parseable border/padding XMP text, DNG masked-area candidates, and Phase
 One/Leaf geometry margins.
 - Vendor RAW/source-processing classification now distinguishes source-private
 preview, face-geometry, computational, thermal, and stitch/panorama buckets
 in addition to the existing color, white-balance, geometry, storage,
 lens-correction, raw-data, sensor, and private-table groups.
+- Vendor RAW/source-processing classification now also treats common
+computational MakerNote terms such as pixel-shift, multi-shot, composite, and
+auto-lighting optimizer fields as source-private processing metadata for
+audit and rendered-transfer safety decisions.
 - Added focused regression coverage for compatible-file versus rendered-image
 transfer safety: compatible mode keeps serializable source RAW/camera-specific
 metadata, while rendered mode drops source-specific metadata and uses
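As a rough sketch of the keyword-style classification the new changelog entry describes, the following folds a MakerNote field name and checks it against the four terms listed there. The enum, both function names, and the substring-matching strategy are assumptions for illustration; the actual classifier's term list and rules are not part of this diff.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical bucket enum; OpenMeta's real classification type may differ.
enum class SketchBucket { kUnclassified, kSourceProcessing };

// Lowercase and drop separators so "PixelShift", "pixel_shift", and
// "Pixel-Shift" all fold to the same string.
std::string fold_field_name(std::string_view name) {
    std::string out;
    for (unsigned char c : name) {
        if (std::isalnum(c)) out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

SketchBucket classify_field_sketch(std::string_view field_name) {
    // Folded forms of the terms named in the changelog entry (assumed list).
    static constexpr std::array<std::string_view, 4> kTerms = {
        "pixelshift", "multishot", "composite", "autolighting"};
    const std::string folded = fold_field_name(field_name);
    for (std::string_view term : kTerms) {
        if (folded.find(term) != std::string::npos) return SketchBucket::kSourceProcessing;
    }
    return SketchBucket::kUnclassified;
}
```

Folding before matching lets one term cover the different spellings vendors use for the same concept; a production classifier would likely need a more conservative rule to avoid matching unrelated fields.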

docs/api_stability.md

Lines changed: 2 additions & 2 deletions
@@ -26,8 +26,8 @@ different status.
 | EXIF/TIFF/DNG numeric value names: `exif_tag_numeric_value_name(...)` and focused helpers | `openmeta/exif_value_names.h` | Stable | Small helper contract for common enum-like TIFF/EXIF/DNG numeric values such as compression, photometric interpretation, planar configuration, exposure program, metering mode, light source, flash, color space, white balance, scene capture type, gain control, CFA layout, and DNG calibration illuminants. Unknown values return an empty string and remain lossless numeric metadata. |
 | Semantic metadata query: `query_metadata(...)`, `query_crop_metadata(...)`, focused query helpers, and `metadata_query_fuzzy_search_available()` | `openmeta/metadata_query.h` | Experimental | Query contract for inspection matches plus normalized candidates. Current coverage includes crop/active-area/border margins, exposure/gain, white balance, color, lens correction, orientation, and RAW/source-processing metadata across standard tags, selected DNG tags, fuzzy XMP paths, and vendor RAW-processing classification. Matches report `exact_match`, `fuzzy_match`, and `fuzzy_score` so tools can label exact results separately from RapidFuzz near-miss hits. `OPENMETA_ENABLE_RAPIDFUZZ=ON` adds optional near-miss XMP/property-path scoring. Grouped candidates include `matrix_set`, `vector_set`, and `table` shapes for related non-crop metadata, including RAW black/white levels, linearization, CFA/sensor layout, source geometry, raw-storage identifiers, and source-processing buckets. Python `Document` and `TransferSourceSnapshot` mirror this as thin dictionary-returning wrappers. |
 | Structured metadata interpretation records: `interpret_metadata(...)`, `interpret_metadata_query(...)` | `openmeta/metadata_interpretation.h` | Experimental | Thin structured projection over semantic query candidates. Records carry query class, semantic kind, normalized shape, confidence, source entry ids, and normalized origin/size/rect/margins/value arrays where available. Current scope covers orientation, geometry/crop/border, exposure/gain, color/white-balance, lens-correction, and RAW/source-processing records. Python `Document` and `TransferSourceSnapshot` expose matching dictionary wrappers. |
-| Cross-family concept resolution: `resolve_metadata_concepts(...)`, `resolve_metadata_concept(...)` | `openmeta/metadata_concepts.h` | Experimental | First bounded resolver for duplicated host-facing concepts. Current scope reports candidates, candidate source entries, source families, preferred entries, normalized numeric/text keys, normalized date/time fields, date/time precision, timezone kind, normalized geometry fields, and same-role conflicts for orientation, date/time, color/profile, GPS, and geometry evidence across EXIF, XMP, IPTC, ICC, PNG text, and query-backed interpretation records where applicable. Geometry candidates cover crop, active area, border, and sensor geometry with canonical origin, size, rect, and margin fields when available. GPS date/time is combined from `GPSDateStamp` plus `GPSTimeStamp` when both entries exist, and GPS altitude candidates expose altitude-reference code plus below-sea-level state when reference metadata is present. It is intended for inspection UI and host policy decisions; it does not rewrite metadata or hide ambiguity. Python `Document` and `TransferSourceSnapshot` expose matching dictionary wrappers. |
-| Vendor RAW-processing summaries: `vendor_raw_processing_from_store(...)`, `classify_vendor_raw_processing_field(...)` | `openmeta/vendor_raw_processing.h` | Experimental | Conservative grouped source-RAW/source-processing field summaries for decoded Sony, Canon, Nikon, Fujifilm, Pentax, Panasonic, Olympus, Kodak, Minolta, Sigma, Samsung, Ricoh, Apple, DJI, Google, FLIR, Casio, Sanyo, KyoceraRaw, Reconyx, HP, JVC, GE, Motorola, Nintendo, and Microsoft MakerNotes, including vendor-private, computational, thermal, preview, face-geometry, stitch/panorama, Apple computational capture/HDR/motion, DJI pose/thermal, Google HDR+/shot-log, and FLIR radiometric/raw-value buckets. Intended for audit/UI and rendered-transfer safety decisions, not for writing vendor RAW/source-processing values into rendered targets. |
+| Cross-family concept resolution: `resolve_metadata_concepts(...)`, `resolve_metadata_concept(...)` | `openmeta/metadata_concepts.h` | Experimental | First bounded resolver for duplicated host-facing concepts. Current scope reports candidates, candidate source entries, source families, preferred entries, normalized numeric/text keys, full normalized value vectors, normalized date/time fields, date/time precision, timezone kind, normalized geometry fields, and same-role conflicts for orientation, date/time, color/profile, GPS, geometry, lens-correction, and RAW-processing evidence across EXIF, XMP, IPTC, ICC, PNG text, and query-backed interpretation records where applicable. Geometry candidates cover crop, active area, border, and sensor geometry with canonical origin, size, rect, and margin fields when available. Color/white-balance, lens-correction, and RAW-processing concepts preserve grouped matrix/vector/table values for host inspection; they do not make source-bound values safe to serialize into rendered targets. GPS date/time is combined from `GPSDateStamp` plus `GPSTimeStamp` when both entries exist, and GPS altitude candidates expose altitude-reference code plus below-sea-level state when reference metadata is present. It is intended for inspection UI and host policy decisions; it does not rewrite metadata or hide ambiguity. Python `Document` and `TransferSourceSnapshot` expose matching dictionary wrappers. |
+| Vendor RAW-processing summaries: `vendor_raw_processing_from_store(...)`, `classify_vendor_raw_processing_field(...)` | `openmeta/vendor_raw_processing.h` | Experimental | Conservative grouped source-RAW/source-processing field summaries for decoded Sony, Canon, Nikon, Fujifilm, Pentax, Panasonic, Olympus, Kodak, Minolta, Sigma, Samsung, Ricoh, Apple, DJI, Google, FLIR, Casio, Sanyo, KyoceraRaw, Reconyx, HP, JVC, GE, Motorola, Nintendo, and Microsoft MakerNotes, including vendor-private, computational, thermal, preview, face-geometry, stitch/panorama, Apple computational capture/HDR/motion, DJI pose/thermal, Google HDR+/shot-log, pixel-shift/multi-shot/composite/auto-lighting source processing, and FLIR radiometric/raw-value buckets. Intended for audit/UI and rendered-transfer safety decisions, not for writing vendor RAW/source-processing values into rendered targets. |
 | Transfer safety audit: `transfer_safety_audit_from_store(...)` | `openmeta/metadata_transfer.h` | Experimental | Preflight summary of source entries and entries filtered or invalidated by `TransferSafetyMode`, including Sony/Canon/Nikon/Fujifilm/Pentax/Panasonic/Olympus/Kodak/Minolta/Sigma/Samsung/Ricoh/Apple/DJI/Google/FLIR/Casio/Sanyo/KyoceraRaw/Reconyx/HP/JVC/GE/Motorola/Nintendo/Microsoft RAW/source-processing buckets. Intended for diagnostics and host UI before preparing rendered-image transfers. |
 | Raw-carrier passthrough audit: `raw_carrier_passthrough_audit_from_snapshot(...)` | `openmeta/metadata_transfer.h` | Experimental | Diagnostic preflight for opt-in raw carriers. Reports candidate carriers and primary block reasons such as missing payload, target incompatibility, safety filtering, content-bound C2PA, explicit profile policy, missing decoded-entry links, or unsupported carrier kind. Hosts can call it directly before enabling snapshot passthrough. |
 | Source snapshot type and read helpers: `TransferSourceSnapshot`, `read_transfer_source_snapshot_file(...)`, `read_transfer_source_snapshot_bytes(...)`, `build_transfer_source_snapshot(...)` | `openmeta/metadata_transfer.h` | Experimental | Current snapshots are decoded-store-backed by default. Opt-in raw carriers preserve bounded source payload/provenance records and snapshot-local decoded entry ids for host diagnostics and bounded passthrough decisions. Const reuse is safe when callers do not mutate the snapshot and do not share returned result objects across writers. |
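The GPS behavior documented in the concept-resolution row can be illustrated with two small helpers. This sketch assumes the standard EXIF encodings (`GPSDateStamp` as `YYYY:MM:DD` text, `GPSTimeStamp` as three UTC rationals for hour, minute, and second, and `GPSAltitudeRef` 0 for above and 1 for below sea level). `RationalSketch`, `combine_gps_datetime`, and `signed_gps_altitude` are hypothetical names, not OpenMeta API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical mirror of an EXIF RATIONAL (unsigned numerator / denominator).
struct RationalSketch {
    unsigned num;
    unsigned den;
};

// Combine GPSDateStamp ("YYYY:MM:DD") with GPSTimeStamp (hour, minute, second
// rationals, UTC) into an ISO 8601 UTC string. Fractional seconds are truncated.
// Returns nullopt when the date text is malformed or a denominator is zero.
std::optional<std::string> combine_gps_datetime(const std::string& date_stamp,
                                                const std::array<RationalSketch, 3>& hms) {
    int year = 0, month = 0, day = 0;
    if (std::sscanf(date_stamp.c_str(), "%4d:%2d:%2d", &year, &month, &day) != 3) {
        return std::nullopt;
    }
    int parts[3];
    for (std::size_t i = 0; i < 3; ++i) {
        if (hms[i].den == 0) return std::nullopt;
        parts[i] = static_cast<int>(std::floor(static_cast<double>(hms[i].num) / hms[i].den));
    }
    char buf[40];
    std::snprintf(buf, sizeof(buf), "%04d-%02d-%02dT%02d:%02d:%02dZ",
                  year, month, day, parts[0], parts[1], parts[2]);
    return std::string(buf);
}

// Apply GPSAltitudeRef (0 = above sea level, 1 = below sea level) to GPSAltitude.
// A missing reference is treated as above sea level (an assumption of this sketch);
// other reference codes are left unresolved.
std::optional<double> signed_gps_altitude(RationalSketch altitude,
                                          std::optional<unsigned> altitude_ref) {
    if (altitude.den == 0) return std::nullopt;
    const double metres = static_cast<double>(altitude.num) / altitude.den;
    if (!altitude_ref || *altitude_ref == 0) return metres;
    if (*altitude_ref == 1) return -metres;
    return std::nullopt;
}
```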

docs/development.md

Lines changed: 11 additions & 9 deletions
@@ -17,8 +17,8 @@ model should stay compact:
 | Area | Purpose | Readiness |
 | --- | --- | --- |
 | Decoding | Find metadata carriers and decode EXIF, XMP, IPTC, ICC, Photoshop IRB, JUMBF/C2PA, EXR, and related blocks into `MetaStore` entries. | High, about 98-100% for the current target scope. |
-| Interpretation | Normalize names and values, group entries by meaning, and classify source-bound data such as RAW crop, color, lens-correction, sensor, and vendor-private fields. | Medium-high, about 80%. |
-| Query | Find entries by name, fuzzy term, or semantic group, then expose normalized query candidates, structured interpretation records, and bounded cross-family concept resolutions for crop/border/active-area, exposure/gain, color/WB, orientation, date/time, GPS, lens-correction, and RAW/source-processing fields across standard and vendor metadata. | Medium, about 50-60%. |
+| Interpretation | Normalize names and values, group entries by meaning, and classify source-bound data such as RAW crop, color, lens-correction, sensor, and vendor-private fields. | Medium-high, about 82%. |
+| Query | Find entries by name, fuzzy term, or semantic group, then expose normalized query candidates, structured interpretation records, and bounded cross-family concept resolutions for crop/border/active-area, exposure/gain, color/WB, orientation, date/time, GPS, lens-correction, and RAW/source-processing fields across standard and vendor metadata. | Medium, about 63-70%. |
 | Creation | Build fresh metadata entries from host-provided values. | Medium, about 55-65%. |
 | Editing | Modify existing logical metadata entries while preserving valid surrounding structure. | Medium, about 60-70%. |
 | Transfer | Move metadata between files using explicit compatible-file or rendered-image safety policies. | Medium-high, about 80-85%. |
@@ -64,13 +64,15 @@ into records with query class, semantic kind, normalized shape, confidence,
 source entries, and normalized geometry/value arrays where available.
 
 For cross-family duplicated concepts, use `openmeta/metadata_concepts.h`.
-It currently resolves orientation, date/time, color/profile, GPS, and geometry
-into candidate lists with candidate source entries, source families, preferred
-entries, normalized compare keys, parsed date/time fields, date/time precision,
-timezone kind, GPS altitude-reference state, canonical geometry
-origin/size/rect/margins, and same-role conflict flags. This is deliberately an
-inspection/policy surface; host code still decides whether a conflict should be
-shown, ignored, or corrected during editing/transfer.
+It currently resolves orientation, date/time, color/profile, GPS, geometry,
+lens-correction, and RAW-processing into candidate lists with candidate source
+entries, source families, preferred entries, normalized compare keys, parsed
+date/time fields, date/time precision, timezone kind, GPS altitude-reference
+state, canonical geometry origin/size/rect/margins, full normalized value
+vectors for grouped matrix/vector/table records, and same-role conflict flags.
+This is deliberately an inspection/policy surface; host code still decides
+whether a conflict should be shown, ignored, or corrected during
+editing/transfer.
 
 ## Build Prerequisites
 
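To show why full value vectors matter for normalized compare keys, here is a hypothetical sketch that keys a candidate on every normalized element rather than only the first scalar, then merges candidates whose keys match. `ValueCandidateSketch`, `value_key`, `merge_by_value_key`, and the six-significant-digit rounding are all assumptions, loosely modeled on the copy/merge/value-key helpers named in the commit message.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical candidate shape; only values and source ids matter for this sketch.
struct ValueCandidateSketch {
    std::vector<double> values;            // full normalized vector, e.g. a row-major 3x3 matrix
    std::vector<unsigned> source_entries;  // store entry ids behind the candidate
};

// Build a compare key from every element, not just the first scalar, so two
// matrices that share a first element but differ later get different keys.
// Rounding to `digits` significant digits absorbs tiny encoding differences.
std::string value_key(const std::vector<double>& values, int digits = 6) {
    std::string key;
    char buf[64];
    for (double v : values) {
        std::snprintf(buf, sizeof(buf), "%.*g;", digits, v);
        key += buf;
    }
    return key;
}

// Merge candidates with identical keys, collecting all of their source entries.
std::map<std::string, ValueCandidateSketch> merge_by_value_key(
    const std::vector<ValueCandidateSketch>& candidates) {
    std::map<std::string, ValueCandidateSketch> merged;
    for (const auto& c : candidates) {
        auto& slot = merged[value_key(c.values)];
        if (slot.values.empty()) slot.values = c.values;
        slot.source_entries.insert(slot.source_entries.end(),
                                   c.source_entries.begin(), c.source_entries.end());
    }
    return merged;
}
```

Keying on the whole vector is what lets two color matrices that agree only in their first coefficient be reported as distinct candidates instead of silently collapsing.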
docs/host_integration.md

Lines changed: 12 additions & 8 deletions
@@ -50,15 +50,19 @@ confidence, and normalized geometry/value arrays.
 
 For host code that needs to reconcile duplicated concepts across metadata
 families, use `metadata_concepts.h`. It reports orientation, date/time,
-color/profile, GPS, and geometry candidates with source families, preferred
-entries, and same-role conflict flags. Geometry candidates expose crop,
-active-area, border, and sensor-geometry roles with canonical origin, size,
-rect, and margin fields when available. Date/time candidates include parsed
-date/time fields when the source value is recognizable, plus precision and
-timezone-kind fields. GPS timestamps combine `GPSDateStamp` with
+color/profile, GPS, geometry, lens-correction, and RAW-processing candidates
+with source families, preferred entries, and same-role conflict flags. Geometry
+candidates expose crop, active-area, border, and sensor-geometry roles with
+canonical origin, size, rect, and margin fields when available. Color/white
+balance, lens-correction, and RAW-processing candidates expose full normalized
+value vectors for grouped matrix/vector/table records. Date/time candidates
+include parsed date/time fields when the source value is recognizable, plus
+precision and timezone-kind fields. GPS timestamps combine `GPSDateStamp` with
 `GPSTimeStamp` when both are present, and GPS altitude candidates report
 whether `GPSAltitudeRef` marked the height as below sea level. Treat this as an
-inspection and policy input rather than an automatic metadata rewrite decision.
+inspection and policy input rather than an automatic metadata rewrite decision;
+source-bound color, lens, and RAW-processing values still need rendered-transfer
+safety filtering.
 
 ## Adapter Classes
 
@@ -83,7 +87,7 @@ OpenMeta splits host integration surfaces deliberately:
 `metadata_interpretation.h` for query-backed semantic records
 - concept-resolution utility:
 `metadata_concepts.h` for cross-family orientation, date/time, color/profile,
-GPS, and geometry conflict inspection
+GPS, geometry, lens-correction, and RAW-processing conflict inspection
 
 ## 1. Read Into `MetaStore`
 
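The rendered-transfer filtering that the updated host_integration.md text still requires could look roughly like this on the host side. It is a sketch under stated assumptions: `SketchTransferMode` and `SketchEntry` are hypothetical stand-ins rather than OpenMeta's `TransferSafetyMode` or store types, and the `source_bound` flag stands in for whatever classification the host obtains from the audit and concept helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirrors; OpenMeta's TransferSafetyMode and entry types are not reproduced here.
enum class SketchTransferMode { kCompatibleFile, kRenderedImage };

struct SketchEntry {
    std::string name;
    bool source_bound;  // e.g. RAW color matrices, lens-correction tables, vendor processing fields
};

// Compatible-file transfers keep every entry; rendered-image transfers drop
// source-bound entries, matching the changelog's description of the two modes.
std::vector<SketchEntry> filter_for_transfer(std::vector<SketchEntry> entries,
                                             SketchTransferMode mode) {
    if (mode == SketchTransferMode::kRenderedImage) {
        entries.erase(std::remove_if(entries.begin(), entries.end(),
                                     [](const SketchEntry& e) { return e.source_bound; }),
                      entries.end());
    }
    return entries;
}
```

Resolved concept candidates stay useful for inspection in either mode; the filter only decides what gets serialized into the target.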