
Add text/plain asset handler via c2pa-text crate #2117

Open
erik-sv wants to merge 2 commits into contentauth:main from erik-sv:feat/text-io-handler


erik-sv commented May 5, 2026

Summary

Adds a TextIO asset handler for text/plain files, implementing C2PA manifest read, write, and remove support for plain text assets per Section A.7 of the C2PA specification.

The handler uses the c2pa-text reference implementation crate, which encodes JUMBF manifest bytes as invisible Unicode Variation Selectors. This encoding is reversible, preserves the visible text content, and survives copy-paste in most environments.
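A minimal sketch of the byte-to-selector idea, assuming the common mapping where byte values 0 to 15 become VS1 to VS16 (U+FE00 to U+FE0F) and 16 to 255 become VS17 to VS256 (U+E0100 to U+E01EF). The function names are hypothetical, and the sketch omits whatever header and framing the real c2pa-text wrapper adds around the JUMBF bytes.

```rust
/// Map one byte to a Unicode Variation Selector (assumed mapping):
/// 0..=15 -> U+FE00..U+FE0F, 16..=255 -> U+E0100..U+E01EF.
fn byte_to_vs(b: u8) -> char {
    let cp = if b < 16 {
        0xFE00 + u32::from(b)
    } else {
        0xE0100 + (u32::from(b) - 16)
    };
    // Both ranges contain only valid Unicode scalar values.
    char::from_u32(cp).expect("variation selector is a valid scalar value")
}

/// Inverse mapping; returns None for any character that is not a selector.
fn vs_to_byte(c: char) -> Option<u8> {
    let cp = c as u32;
    match cp {
        0xFE00..=0xFE0F => Some((cp - 0xFE00) as u8),
        0xE0100..=0xE01EF => Some((cp - 0xE0100 + 16) as u8),
        _ => None,
    }
}

/// Encode arbitrary bytes (e.g. a JUMBF manifest) as invisible selectors.
fn encode_bytes(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| byte_to_vs(b)).collect()
}

/// Decode a run of selectors back to bytes; fails on any non-selector char.
fn decode_bytes(s: &str) -> Option<Vec<u8>> {
    s.chars().map(vs_to_byte).collect()
}

/// What a reader "sees": the text with every selector removed.
fn visible_text(s: &str) -> String {
    s.chars().filter(|&c| vs_to_byte(c).is_none()).collect()
}
```

Each byte becomes one invisible code point, so the visible text is unchanged. Note that selectors in the U+FE00 range take 3 bytes in UTF-8 and those in the U+E0100 range take 4, so the embedded wrapper is several times larger on disk than the manifest it carries.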

Changes

  • sdk/src/asset_handlers/text_io.rs (new): TextIO struct implementing CAIReader, CAIWriter, and AssetIO. Hash object positions cover the full text with an exclusion range for the embedded manifest.
  • sdk/src/asset_handlers/mod.rs: Register text_io module.
  • sdk/src/jumbf_io.rs: Add TextIO to reader, writer, and test handler lists.
  • sdk/src/utils/mime.rs: Map txt extension and text/plain MIME type.
  • sdk/Cargo.toml: Add c2pa-text dependency (crates.io v1.1.0).
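A sketch of the "full text with an exclusion range" layout described for text_io.rs. `Region` and `RegionKind` are hypothetical stand-ins, not the SDK's actual hash-position types, and all values are assumed to be byte counts in the same unit as `str::len()`.

```rust
/// Hypothetical classification of a byte range in the asset.
#[derive(Debug, Clone, PartialEq, Eq)]
enum RegionKind {
    Data,     // covered by the data hash
    Manifest, // excluded from the hash (the embedded wrapper)
}

/// Hypothetical stand-in for a hash object position.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Region {
    offset: usize,
    length: usize,
    kind: RegionKind,
}

/// Split a text asset of `total_len` bytes around a wrapper occupying
/// `wrapper_offset..wrapper_offset + wrapper_len` (all in bytes).
fn text_regions(total_len: usize, wrapper_offset: usize, wrapper_len: usize) -> Vec<Region> {
    let wrapper_end = wrapper_offset + wrapper_len;
    assert!(wrapper_end <= total_len, "wrapper must lie inside the text");

    let mut regions = Vec::new();
    // Text before the wrapper, if any.
    if wrapper_offset > 0 {
        regions.push(Region { offset: 0, length: wrapper_offset, kind: RegionKind::Data });
    }
    // The wrapper itself is the exclusion range.
    regions.push(Region { offset: wrapper_offset, length: wrapper_len, kind: RegionKind::Manifest });
    // Text after the wrapper, if any.
    if wrapper_end < total_len {
        regions.push(Region {
            offset: wrapper_end,
            length: total_len - wrapper_end,
            kind: RegionKind::Data,
        });
    }
    regions
}
```

Because the regions tile the whole asset, their lengths always sum to the file size, which is an easy invariant to check in the handler's tests.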

Notes

  • NFC normalization (Section A.7.5) is handled by the c2pa-text crate in both embed_manifest and extract_manifest.
  • ExtractionResult.offset and ExtractionResult.length are byte offsets (via Rust's char_indices()), matching the byte-level HashObjectPositions contract.
  • AssetPatch is not implemented in this PR. The c2pa-text crate's encode_wrapper_padded function supports fixed-size wrappers for in-place patching, but wiring that up is deferred to a follow-up.
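To illustrate why a fixed-size wrapper enables in-place patching, here is a byte-level sketch using a hypothetical layout (a 4-byte length prefix, the payload, then zero padding). This is not the format produced by encode_wrapper_padded; the helper names are invented for illustration. One caveat: if bytes are mapped to selectors in both the U+FE00 and U+E0100 ranges, those encode to 3 and 4 UTF-8 bytes respectively, so a real padded wrapper must also keep the encoded length fixed, and how the crate handles that is not shown here.

```rust
/// Hypothetical fixed-capacity layout, NOT the c2pa-text wire format:
/// [u32 big-endian payload length][payload][zero padding up to `capacity`].
fn pad_payload(payload: &[u8], capacity: usize) -> Option<Vec<u8>> {
    // Reject payloads that do not fit the reservation or the length prefix.
    if payload.len() > capacity || payload.len() > u32::MAX as usize {
        return None;
    }
    let mut out = Vec::with_capacity(4 + capacity);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out.resize(4 + capacity, 0); // total size is always 4 + capacity
    Some(out)
}

/// Recover the real payload from a padded buffer.
fn unpad_payload(buf: &[u8]) -> Option<&[u8]> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = len.checked_add(4)?;
    buf.get(4..end)
}

/// Overwrite a padded buffer in place. Succeeds only if the new payload fits
/// the original reservation, so the buffer's size never changes.
fn patch_in_place(buf: &mut [u8], new_payload: &[u8]) -> bool {
    if buf.len() < 4 {
        return false;
    }
    match pad_payload(new_payload, buf.len() - 4) {
        Some(padded) => {
            buf.copy_from_slice(&padded);
            true
        }
        None => false,
    }
}
```

The design trade-off is reserved space: every signed file carries padding, in exchange for re-signing without shifting the bytes that follow the wrapper.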

Context

C2PA Section A.7 defines text as a supported asset type. The c2pa-text crate provides the encoding layer. This handler integrates it into c2pa-rs so that plain text files can be signed and verified the same way as images, audio, and video.

Encypher co-chairs the C2PA text task force and maintains the c2pa-text reference implementation.

Test plan

  • cargo test passes with text_io registered in all handler maps
  • Round-trip: sign a .txt file, read back via Reader, verify manifest integrity
  • text/plain and txt resolve correctly in MIME utility functions
  • Hash exclusion range correctly spans the embedded manifest bytes
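For the MIME check in the test plan, a sketch of the expected mapping. Both functions are hypothetical helpers mirroring the txt / text/plain registration described for sdk/src/utils/mime.rs; the real c2pa-rs function names and tables may differ.

```rust
/// Hypothetical extension -> MIME lookup for the text handler.
fn extension_to_mime(ext: &str) -> Option<&'static str> {
    // Extensions are matched case-insensitively ("TXT" == "txt").
    match ext.to_ascii_lowercase().as_str() {
        "txt" => Some("text/plain"),
        _ => None,
    }
}

/// Hypothetical MIME -> extension lookup for the text handler.
fn mime_to_extension(mime: &str) -> Option<&'static str> {
    match mime.to_ascii_lowercase().as_str() {
        "text/plain" => Some("txt"),
        // Other text/* subtypes are deliberately not claimed by this handler.
        _ => None,
    }
}
```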

Add a TextIO asset handler that embeds and extracts C2PA JUMBF manifests
in plain text files using the c2pa-text crate. The crate encodes binary
manifest data as invisible Unicode Variation Selectors, following the
C2PA text embedding specification (Section A.7).

The handler implements CAIReader, CAIWriter, and AssetPatch for full
read/write/patch support. Hash object positions span the entire text
content with an exclusion range covering the embedded manifest bytes.

Registers "txt" and "text/plain" as supported types in the MIME utility
and adds TextIO to all three handler maps (readers, writers, file-based).

The c2pa-text reference implementation is at:
  https://github.com/encypherai/c2pa-text

dcondrey (Contributor) commented May 14, 2026

A few things I noticed:

  1. The git dependency on c2pa-text won't work for downstream consumers pulling c2pa-rs from crates.io. This needs to be a published crate or vendored.
  2. The spec requires NFC normalization before hashing (Section A.7.5). I don't see that in the handler or in c2pa-text. If the crate doesn't normalize, hashes will differ across platforms that store text in different normalization forms.
  3. The PR description mentions AssetPatch support but I don't see an implementation.
  4. get_object_locations_from_stream uses text.len() as the byte length. Are result.offset and result.length from c2pa-text guaranteed to be byte offsets rather than character offsets? The hash exclusion range needs to be in bytes.
  5. Only text/plain and txt are registered. The spec's unstructured text method applies to any text file without a format-specific embedding section — worth considering whether this should handle additional text/* subtypes.
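Point 2 can be seen without any normalization library: the same visible word has different bytes in NFC and NFD, so any byte-level hash differs unless both sides normalize first. The sketch uses FNV-1a purely as a std-only stand-in for the cryptographic hash a C2PA data hash would actually use; `fnv1a_64` and `cafe_forms` are hypothetical helpers.

```rust
/// 64-bit FNV-1a over raw bytes (stand-in for the real data hash).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// "café" in two normalization forms that render identically:
/// NFC uses precomposed U+00E9; NFD uses "e" + U+0301 COMBINING ACUTE ACCENT.
fn cafe_forms() -> (&'static str, &'static str) {
    ("caf\u{e9}", "cafe\u{301}")
}
```

The NFC form is 5 bytes and the NFD form is 6, so a hash computed over one will not verify against the other.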

Git dependencies are rejected by crates.io during publish. c2pa-text
v1.1.0 is already published on crates.io, so reference it directly.

erik-sv (Author) commented May 15, 2026

@dcondrey Thanks for the close read. Addressed inline:

1. Git dependency. You're right: crates.io rejects git sources. c2pa-text is already published on crates.io at v1.1.0. Fixed in 40c165d; the dependency now points to the registry version.

2. NFC normalization. The c2pa-text crate handles this. embed_manifest normalizes the input text to NFC before appending the wrapper, and extract_manifest normalizes the clean text it returns. See lib.rs L162; the crate depends on unicode-normalization for this. The handler doesn't need to normalize separately.

3. AssetPatch. Good catch; the original PR description was wrong. The handler implements CAIReader, CAIWriter, and AssetIO, not AssetPatch. Updated the description. The c2pa-text crate does expose encode_wrapper_padded for fixed-size wrappers that would enable in-place patching, but wiring that up is follow-up work.

4. Byte offsets. ExtractionResult.offset and .length are byte offsets. The extraction code uses Rust's char_indices(), which yields (byte_index, char) tuples, and the wrapper end is either chars[j].0 (byte position) or text.len() (byte count). These are in the same unit as text.len() in the handler, so the hash exclusion ranges are correct.
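A sketch of why char_indices() keeps everything in bytes: it yields the byte index of each char, and the fallback end is text.len(), also a byte count. `is_vs` and `find_vs_run` are hypothetical helpers, not c2pa-text functions. A bare search like this would also match an emoji presentation selector such as U+FE0F, so a real extractor has to recognize the crate's actual wrapper structure rather than any selector run.

```rust
/// True for the variation selector ranges U+FE00..U+FE0F and U+E0100..U+E01EF.
fn is_vs(c: char) -> bool {
    matches!(c as u32, 0xFE00..=0xFE0F | 0xE0100..=0xE01EF)
}

/// Find the first run of selectors and return (offset, length) in BYTES.
fn find_vs_run(text: &str) -> Option<(usize, usize)> {
    // char_indices() yields (byte_index, char), not (char_index, char).
    let chars: Vec<(usize, char)> = text.char_indices().collect();
    let i = chars.iter().position(|&(_, c)| is_vs(c))?;
    let start = chars[i].0;
    // The run ends at the byte index of the next non-selector char,
    // or at text.len() if the selectors run to the end of the text.
    let end = chars[i..]
        .iter()
        .find(|&&(_, c)| !is_vs(c))
        .map(|&(byte_idx, _)| byte_idx)
        .unwrap_or(text.len());
    Some((start, end - start))
}
```

With non-ASCII text before the wrapper, the character index and byte offset diverge (for "naïve text" it is 10 chars but 11 bytes), which is exactly the case where mixing units would misplace the exclusion range.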

5. Additional text/* subtypes. Agreed this is worth considering. Starting with text/plain keeps the initial surface small. Formats with structural syntax (markdown, HTML, CSV) may need their own handlers to avoid embedding in positions that break format parsers, so they're better handled incrementally.
