
Add text/plain validation support with error codes and test vectors#9

Open
erik-sv wants to merge 5 commits into contentauth:main from encypherai:upstream/feat/text-support

@erik-sv erik-sv commented May 5, 2026

Summary

Adds text/plain format support to the conformance validation tool, including format-specific error codes and negative test vectors.

Depends on:

Changes

  • Text error codes: manifest.text.missing, manifest.text.bad_magic, manifest.text.bad_version, manifest.text.corrupted_wrapper, manifest.text.double_wrapper. Added to all rubric YAML files so text assets are evaluated alongside other formats.

  • Test vectors: Five text files covering the signed happy path and four failure modes (bad magic bytes, unsupported version, corrupted wrapper structure, duplicate wrapper). These serve as regression fixtures for text validation.

  • c2pa-rs submodule update: Points to the commit that includes the TextIO asset handler, enabling native text/plain reading and writing in the validation pipeline.

Context

C2PA Section A.7 defines text as a supported asset type. The c2pa-text reference implementation encodes JUMBF manifest bytes as invisible Unicode Variation Selectors. This PR enables the conformance tool to validate text assets the same way it validates images, audio, and video.
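
To make the encoding idea concrete, here is a minimal sketch of a byte-to-variation-selector codec. The mapping is an assumption and is not confirmed by this PR: bytes 0..=15 map to U+FE00..=U+FE0F and bytes 16..=255 map to U+E0100..=U+E01EF. The function names are hypothetical and are not the c2pa-text or c2pa-rs API. The sketch also omits any wrapper framing (magic bytes, version, length), so it only illustrates how arbitrary bytes can ride along invisibly after visible text.

```rust
/// Map one byte to a Unicode variation selector.
/// Assumed mapping (illustrative only): 0..=15 -> U+FE00..=U+FE0F,
/// 16..=255 -> U+E0100..=U+E01EF.
fn byte_to_selector(byte: u8) -> char {
    let code = if byte < 16 {
        0xFE00 + byte as u32
    } else {
        0xE0100 + (byte as u32 - 16)
    };
    char::from_u32(code).expect("variation selectors are valid scalar values")
}

/// Inverse of `byte_to_selector`; returns None for any other character.
fn selector_to_byte(c: char) -> Option<u8> {
    match c as u32 {
        cp @ 0xFE00..=0xFE0F => Some((cp - 0xFE00) as u8),
        cp @ 0xE0100..=0xE01EF => Some((cp - 0xE0100 + 16) as u8),
        _ => None,
    }
}

/// Append the payload to the visible text as invisible selectors.
fn embed(visible: &str, payload: &[u8]) -> String {
    let mut out = String::from(visible);
    out.extend(payload.iter().map(|&b| byte_to_selector(b)));
    out
}

/// Collect every variation selector in the text back into bytes.
/// Note: this naive version also picks up selectors that were already
/// part of the visible text (for example emoji presentation selectors).
fn extract(text: &str) -> Vec<u8> {
    text.chars().filter_map(selector_to_byte).collect()
}
```

Calling `extract(&embed("hello", payload))` returns the original payload as long as the visible text contains no variation selectors of its own, which is one reason a real format would likely need explicit framing around the payload.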

Test plan

  • cargo test passes with text_io available in the vendored c2pa-rs
  • Signed text file validates and produces correct crJSON
  • Negative test vectors trigger the expected error codes
  • Rubric evaluation includes text error code traits

erik-sv and others added 5 commits May 5, 2026 20:26
Add full rubric evaluation pipeline for the C2PA asset conformance
program's composable rubric framework.

json-formula-rs:
- Add normalize_expression() for bare true/false/null keyword rewriting
- Add arg_count() helper and $argN parameterized named expression support
  in register_expression() with globals injection and save/restore pattern

profile-evaluator-rs:
- Add evaluate_rubric_conformance() for whole-crJSON conformance evaluation
  with failIfMatched support and true/false trait bucketing
- Add evaluate_rubric_signals() for per-manifest signal detection with
  inception/transformation grouping, ingredient index resolution,
  assertedBy extraction, and mimeType derivation
- Support both report_text (profiles) and reportText (rubrics) field names
- 36 golden fixture tests matching upstream Python reference evaluator
  output (1 documented deviation: startsWith array projection in ii2i)
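
One possible reading of the failIfMatched bucketing described above, as a rough sketch: a trait passes when its expression result differs from its failIfMatched flag, and trait IDs are sorted into passed and failed buckets. The `TraitResult` and `Buckets` types and the `bucket_traits` function are hypothetical and are not the profile-evaluator-rs API; the actual semantics may differ.

```rust
/// Hypothetical shape of one evaluated rubric statement.
struct TraitResult {
    id: String,
    /// Value the rubric expression produced for this crJSON.
    matched: bool,
    /// Mirrors the rubric's failIfMatched field (assumed semantics).
    fail_if_matched: bool,
}

#[derive(Debug, Default, PartialEq)]
struct Buckets {
    passed: Vec<String>,
    failed: Vec<String>,
}

/// Sort trait IDs into passed/failed buckets.
fn bucket_traits(results: &[TraitResult]) -> Buckets {
    let mut buckets = Buckets::default();
    for r in results {
        // A normal trait passes when it matches; a fail_if_matched
        // trait passes when it does not match.
        if r.matched != r.fail_if_matched {
            buckets.passed.push(r.id.clone());
        } else {
            buckets.failed.push(r.id.clone());
        }
    }
    buckets
}
```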

c2pa-validate CLI:
- Add --rubric, --rubric-dir, --rubric-mode (conformance/signals),
  --emit-crjson, --crjson, --rubric-strict flags
- Remove crJSON evaluation bail that blocked rubric eval on crJSON inputs
- Add rubric_results to CrJsonValidationReport for structured output

Test fixtures:
- 5 rubric YAML files from c2pa-org/conformance PR #324
- 18 golden test scenarios (54 files) from upstream test suite

When --rubric is used with binary assets, the CLI now extracts crJSON
and runs rubric evaluation even when trust verification fails. This
supports the conformance program onboarding workflow where products
use self-signed certificates before receiving program-issued certs.

The pipeline attempts trust verification first, then falls back to
reading with verify_trust disabled. The crJSON validationResults still
reflect the untrusted state, so the trusted_success trait fails as
expected while all other conformance traits can be evaluated.
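
The fallback order described above might look roughly like the following sketch. The `CrJson` and `ReadError` types and the `read_with_fallback` function are hypothetical stand-ins, not the c2pa-validate or c2pa-rs API. The reader is passed in as a closure that takes a `verify_trust` flag. Retrying only on a trust failure (and not on other errors) is an assumption of this sketch.

```rust
/// Hypothetical, heavily simplified stand-in for an extracted crJSON report.
#[derive(Debug, Clone, PartialEq)]
struct CrJson {
    /// Whether the signing certificate chained to a trusted root.
    trusted: bool,
    body: String,
}

/// Hypothetical error type for the read step.
#[derive(Debug, PartialEq)]
enum ReadError {
    Untrusted,
    Malformed(String),
}

/// Try a verified read first; only when that fails on trust, retry with
/// trust verification disabled. Other errors are returned unchanged.
fn read_with_fallback<F>(mut read: F) -> Result<CrJson, ReadError>
where
    F: FnMut(bool) -> Result<CrJson, ReadError>,
{
    match read(true) {
        Err(ReadError::Untrusted) => read(false),
        other => other,
    }
}
```

With a reader that rejects a self-signed asset under `verify_trust = true`, the second attempt still yields a report whose `trusted` field is false, so a trait like trusted_success can fail while the remaining traits are evaluated.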

Also fix structured JSON output for CrJsonValidation report items so
rubric results are properly serialized instead of null.

…andling

Move $argN injection and bare-keyword normalization from json-formula-rs
to profile-evaluator-rs. This keeps json-formula-rs upstream-compatible
while supporting parameterized named expressions in rubric evaluation.

json-formula-rs retains only register_function() and globals_mut() as
additions over upstream. All expression preprocessing now runs in the
evaluator layer at registration time.

Add 6 manifest.text.* well-formedness codes (corruptedWrapper,
multipleWrappers, invalidMagic, unsupportedVersion, lengthMismatch,
emptyManifest) to all four rubric YAMLs. These codes enable
well_formed_success to catch text wrapper validation failures.

Add text test vectors in testfiles/encypher-assets/document/text/:
- signed_test.txt: valid signed text (positive vector)
- signed_test_txt.json: crJSON extracted from signed text
- double_wrapper.txt: duplicate wrapper (negative)
- corrupted_wrapper.txt: truncated JUMBF (negative)
- bad_magic.txt: wrong magic bytes (negative)
- bad_version.txt: unsupported version byte (negative)
@erik-sv erik-sv closed this May 5, 2026
@erik-sv erik-sv reopened this May 5, 2026