Add mermaid diagrams to Xet protocol spec docs#2299

Open
rajatarya wants to merge 5 commits into main from rajat/docs-xet-protocol-diagrams

Conversation

Contributor

@rajatarya rajatarya commented Mar 18, 2026

Summary

  • Replace ASCII art wire layouts with mermaid packet diagrams across xorb.md and shard.md (chunk header, all shard binary structures, footer)
  • Add block diagram to index.md showing the overall Xet architecture flow
  • Add flowcharts for the CDC chunking algorithm (chunking.md) and hash computation paths (hashing.md)
  • Add sequence diagram for the file ID resolve flow (file-id.md)

Existing sequence diagrams in upload-protocol.md, download-protocol.md, and auth.md were already good and left unchanged.

Test plan

  • Verify mermaid diagrams render correctly on HF docs site
  • Check packet diagram bit ranges match the documented byte offsets
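The second test-plan item can be partly automated. The following is a minimal sketch, assuming a field layout is described as `(label, byte_offset, byte_length)` tuples; the `FIELDS` list and both helper names are hypothetical and are not taken from the spec. It converts byte offsets to the inclusive bit ranges a 1 unit = 1 bit packet diagram would show, and flags fields that do not start where the previous field ended.

```python
from typing import List, Tuple

# Hypothetical layout for illustration only: (label, byte_offset, byte_length).
FIELDS: List[Tuple[str, int, int]] = [
    ("field_a", 0, 4),
    ("field_b", 4, 4),
]


def bit_range(byte_offset: int, byte_length: int) -> Tuple[int, int]:
    """Inclusive (start, end) bit positions of a field at 1 unit = 1 bit."""
    start = byte_offset * 8
    return start, start + byte_length * 8 - 1


def check_contiguous(fields: List[Tuple[str, int, int]]) -> List[str]:
    """Return labels of fields that do not begin where the previous one ended."""
    problems = []
    expected = fields[0][1] if fields else 0
    for label, offset, length in fields:
        if offset != expected:
            problems.append(label)
        expected = offset + length
    return problems


for label, offset, length in FIELDS:
    start, end = bit_range(offset, length)
    print(f"{start}-{end}: {label}")
```

Comparing the printed ranges against the ranges written in each diagram is one way to catch a mislabeled offset before review.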

🤖 Generated with Claude Code


Note

Low Risk
Low risk documentation-only change; primary risk is incorrect mermaid syntax or mislabeled offsets/bit ranges causing reader confusion.

Overview
Adds mermaid diagrams across the Xet protocol spec docs to replace/augment text-only explanations.

This introduces: a CDC decision flowchart in chunking.md, a hashing flowchart in hashing.md, a file-id resolve sequence diagram in file-id.md, and an overall architecture block diagram in index.md. It also replaces several ASCII-art binary layout tables with mermaid packet diagrams in shard.md and xorb.md, and tweaks the deduplication flow diagram labels in deduplication.md.

Written by Cursor Bugbot for commit 0f4a765. This will update automatically on new commits.

- index.md: block diagram showing overall Xet architecture (file → chunks → xorbs → shard → CAS)
- xorb.md: packet diagram for chunk header wire layout (replaces ASCII art)
- shard.md: packet diagrams for all binary structures — header, FileDataSequenceHeader/Entry, FileVerificationEntry, FileMetadataExt, CASChunkSequenceHeader/Entry, footer (replaces ASCII art)
- chunking.md: flowchart for the CDC boundary decision algorithm
- hashing.md: flowchart showing the four hash computation paths (chunk, xorb, file, verification)
- file-id.md: sequence diagram for the resolve URL → X-Xet-Hash flow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rajatarya rajatarya requested a review from assafvayner March 18, 2026 02:33
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

rajatarya and others added 2 commits March 17, 2026 19:39
All packet diagrams now use 1 unit = 1 byte instead of 1 unit = 1 bit.
This prevents 32-byte hash fields from spanning 8 rows with repeated labels,
making the diagrams much more compact and readable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 8-byte chunk header is too small for byte-level units (cells are
unreadably tiny on the 32-unit row). Bit-level gives 2 well-proportioned
rows of 32 bits each with readable labels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
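
The row counts quoted in the two commit messages above follow from simple arithmetic on a 32-unit row. A small sketch, where the function name `rows_needed` is a hypothetical helper and the row width of 32 is the value the commit messages mention:

```python
import math

ROW_WIDTH = 32  # units per row, as described in the commit messages


def rows_needed(size_bytes: int, bits_per_unit: int) -> int:
    """Rows a field occupies; bits_per_unit is 1 for bit-level, 8 for byte-level."""
    units = size_bytes * 8 // bits_per_unit
    return math.ceil(units / ROW_WIDTH)


# 32-byte hash: 256 bit-units span 8 rows, 32 byte-units fit on 1 row.
print(rows_needed(32, bits_per_unit=1), rows_needed(32, bits_per_unit=8))
# 8-byte chunk header: 64 bit-units give 2 full rows; 8 byte-units fill
# only a quarter of one row, which is why its cells are tiny.
print(rows_needed(8, bits_per_unit=1), rows_needed(8, bits_per_unit=8))
```

This is why the byte-level unit suits the hash-heavy shard structures while the 8-byte chunk header reads better at bit level.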
@rajatarya
Contributor Author

The Mermaid rendering doesn't look great; it adds unnecessary `<p>` tags in the renderings. Let me know if you'd rather I abandon these changes because they don't look good.

rajatarya and others added 2 commits March 17, 2026 19:47
Some markdown-to-HTML converters split on blank lines within fenced
code blocks and inject <p> tags before mermaid processes the content.
Removing all blank lines inside mermaid blocks fixes this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
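
The fix described in the commit message above can be sketched as a small preprocessing pass. This is an illustration under assumptions, not the change made in the PR: the function name is hypothetical, and it assumes mermaid blocks open with a three-backtick `mermaid` fence and close with a bare three-backtick line.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example fence-safe


def strip_blank_lines_in_mermaid(markdown: str) -> str:
    """Drop blank lines that fall inside mermaid fenced blocks only."""
    out = []
    in_mermaid = False
    for line in markdown.split("\n"):
        stripped = line.strip()
        if not in_mermaid and stripped.startswith(FENCE + "mermaid"):
            in_mermaid = True          # opening fence: keep the line
        elif in_mermaid and stripped == FENCE:
            in_mermaid = False         # closing fence: keep the line
        elif in_mermaid and stripped == "":
            continue                   # blank line inside the block: drop it
        out.append(line)
    return "\n".join(out)
```

Blank lines outside mermaid blocks are left alone, so paragraph breaks in the surrounding markdown are preserved.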
Unquoted square brackets in mermaid flowchart nodes (e.g. A[Text]) can
be interpreted as markdown link references by some parsers, causing
<p></p> tags to wrap each label. Using quoted strings (A["Text"]) fixes this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
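
The quoting change described in the commit message above is mechanical enough to script. A minimal sketch, assuming only plain rectangle nodes of the form `A[Text]` need rewriting; the function name and regular expression are illustrative, and whether quoting actually resolves the rendering problem depends on the markdown converter in use.

```python
import re

# Matches an identifier followed by [label], where the label is not already
# quoted and the bracket does not open another node shape such as
# [[...]], [(...)], [/.../] or [\...\].
_UNQUOTED_LABEL = re.compile(r'\b(\w+)\[(?![\[(/\\])([^\[\]"]+)\]')


def quote_labels(line: str) -> str:
    """Rewrite A[Text] as A["Text"], leaving already-quoted labels untouched."""
    return _UNQUOTED_LABEL.sub(r'\1["\2"]', line)


print(quote_labels("A[File Input] --> B[Content-Defined Chunking]"))
```

Already-quoted labels contain a `"` and so never match, which makes the rewrite safe to run more than once on the same file.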
Member

@julien-c julien-c left a comment

the extra markup in the rendered output looks suspect, @mishig25

Contributor

@assafvayner assafvayner left a comment

some visuals not rendering right, but it's worth fixing imo

Comment thread docs/xet/deduplication.md
Comment on lines 57 to 63
```diff
 graph TD
-A[File Input] --> B[Content-Defined Chunking]
-B --> C[Hash Computation]
-C --> D[Chunk Creation]
-D --> E[Deduplication Query]
+A["File Input"] --> B["Content-Defined Chunking"]
+B --> C["Hash Computation"]
+C --> D["Chunk Creation"]
+D --> E["Deduplication Query"]
```
Contributor

doing this change doesn't render correctly, we get a bunch of `<p>`

Comment thread docs/xet/shard.md
```diff
-│(8 bytes)│ (8 bytes)│ │(8 bytes)│
-└─────────┴──────────┴─────────────────────────────────────────────────────────────────────────────┴─────────┘
-104 112 120 192 200
+```mermaid
```
Contributor

need to make these not scroll, stretch them vertically.

Comment thread docs/xet/hashing.md
```mermaid
ChunkHash --> CH
CH -->|"Merkle Tree\n+ INTERNAL_NODE_KEY"| XorbHash["Xorb Hash"]
CH -->|"Merkle Tree\n+ INTERNAL_NODE_KEY\nthen blake3(root, zeros)"| FileHash["File Hash"]
CH -->|"blake3(concat hashes,\nVERIFICATION_KEY)"| VerifHash["Term Verification Hash"]
```
Contributor

bunch of `<p>` in this one too, probably needs to drop the quotes


4 participants