This document explains what happens when you run `ipfs add` to import files into IPFS. Understanding this flow helps when debugging, optimizing imports, or building applications on top of IPFS.
- The Big Picture
- Try It Yourself
- Step by Step
- Options
- UnixFS Format
- Code Architecture
- Further Reading
When you add a file to IPFS, three main things happen:
- Chunking - The file is split into smaller pieces
- DAG Building - Those pieces are organized into a tree structure (a Merkle DAG)
- Pinning - The root of the tree is pinned so it persists in your local node
The result is a Content Identifier (CID) - a hash that uniquely identifies your content and can be used to retrieve it from anywhere in the IPFS network.
```mermaid
flowchart LR
    A["Your File<br/>(bytes)"] --> B["Chunker<br/>(split data)"]
    B --> C["DAG Builder<br/>(tree)"]
    C --> D["CID<br/>(hash)"]
```
```shell
# Add a simple file
echo "Hello World" > hello.txt
ipfs add hello.txt
# added QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u hello.txt

# See what's inside
ipfs cat QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
# Hello World

# View the DAG structure
ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
```

Big files are split into chunks because:
- Large files need to be broken down for efficient transfer
- Identical chunks across files are stored only once (deduplication)
- You can fetch parts of a file without downloading the whole thing
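The fixed-size strategy and the deduplication it enables can be sketched in a few lines of Go (illustrative only; boxo's real chunker streams data rather than slicing a byte buffer):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// splitFixed mimics the size-N strategy: cut every `size` bytes.
func splitFixed(data []byte, size int) [][]byte {
	var chunks [][]byte
	for off := 0; off < len(data); off += size {
		end := off + size
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[off:end])
	}
	return chunks
}

func main() {
	// 1000 zero bytes chunked into 256-byte pieces: 256+256+256+232.
	chunks := splitFixed(make([]byte, 1000), 256)
	fmt.Println("chunks:", len(chunks)) // chunks: 4

	// A store keyed by hash deduplicates identical chunks for free:
	// the three full zero-chunks collapse into a single block.
	store := map[[32]byte][]byte{}
	for _, c := range chunks {
		store[sha256.Sum256(c)] = c
	}
	fmt.Println("unique blocks:", len(store)) // unique blocks: 2
}
```

Because blocks are addressed by the hash of their contents, deduplication is not a special feature; it falls out of the storage model.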
Chunking strategies (set with `--chunker`):

| Strategy | Description | Best For |
|---|---|---|
| `size-N` | Fixed-size chunks | General use |
| `rabin` | Content-defined chunks using a rolling hash | Deduplication across similar files |
| `buzhash` | Alternative content-defined chunking | Similar to rabin |
See `ipfs add --help` for current defaults, or the `Import` section of the node config for making them permanent.
Content-defined chunking (rabin/buzhash) finds natural boundaries in the data. This means if you edit the middle of a file, only the changed chunks need to be re-stored - the rest can be deduplicated.
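A toy Go sketch shows why content-defined boundaries survive edits. The window size, hash, and mask below are simplified stand-ins for what rabin/buzhash actually do (real chunkers use a true rolling hash plus minimum and maximum chunk sizes):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkCDC cuts wherever a hash of the previous 8 bytes matches a mask,
// so boundaries depend only on nearby content, not on absolute offsets.
func chunkCDC(data []byte) [][]byte {
	const window, mask = 8, 0x3F // ~64-byte average chunks
	var chunks [][]byte
	start := 0
	for i := window; i < len(data); i++ {
		var h uint32
		for _, b := range data[i-window : i] {
			h = h*31 + uint32(b)
		}
		if h&mask == 0 {
			chunks = append(chunks, data[start:i])
			start = i
		}
	}
	return append(chunks, data[start:])
}

// shared counts how many chunks two chunkings have in common, by hash.
func shared(a, b [][]byte) int {
	count := func(cs [][]byte) map[[32]byte]int {
		m := map[[32]byte]int{}
		for _, c := range cs {
			m[sha256.Sum256(c)]++
		}
		return m
	}
	ha, hb, n := count(a), count(b), 0
	for k, ca := range ha {
		if cb := hb[k]; cb < ca {
			n += cb
		} else {
			n += ca
		}
	}
	return n
}

func main() {
	// Deterministic pseudo-random "file".
	data := make([]byte, 4096)
	seed := uint32(1)
	for i := range data {
		seed = seed*1664525 + 1013904223
		data[i] = byte(seed >> 24)
	}
	// Edit: insert 10 bytes near the front, shifting everything after.
	edited := append(append(append([]byte{}, data[:100]...), []byte("NEW BYTES!")...), data[100:]...)

	orig := chunkCDC(data)
	fmt.Println("chunks:", len(orig), "shared after edit:", shared(orig, chunkCDC(edited)))
	// Nearly all chunks are shared: boundaries depend only on nearby
	// bytes, so chunking resynchronizes right after the edited region.
}
```

With a fixed-size chunker the same insertion would shift every later boundary, invalidating almost all chunks after the edit.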
Each chunk becomes a leaf node in a tree. If a file has many chunks, intermediate nodes group them together. This creates a Merkle DAG (Directed Acyclic Graph) where:
- Each node is identified by a hash of its contents
- Parent nodes contain links (hashes) to their children
- The root node's hash becomes the file's CID
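The hash-linking can be sketched with plain SHA-256 over concatenated child hashes (real UnixFS nodes are dag-pb protobufs carrying sizes and other metadata, but the verification property is the same):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// merkleRoot builds a minimal Merkle tree: leaves hash raw chunk data,
// parents hash the concatenation of their children's hashes. The root
// hash therefore covers every byte below it.
func merkleRoot(chunks [][]byte, fanout int) [32]byte {
	level := make([][32]byte, len(chunks))
	for i, c := range chunks {
		level[i] = sha256.Sum256(c) // leaf = hash of chunk data
	}
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += fanout {
			end := i + fanout
			if end > len(level) {
				end = len(level)
			}
			var buf []byte
			for _, h := range level[i:end] {
				buf = append(buf, h[:]...) // parent links = child hashes
			}
			next = append(next, sha256.Sum256(buf))
		}
		level = next
	}
	return level[0]
}

func main() {
	chunks := [][]byte{[]byte("aa"), []byte("bb"), []byte("cc"), []byte("dd"), []byte("ee")}
	root := merkleRoot(chunks, 2)
	chunks[2] = []byte("cX") // tamper with one chunk...
	fmt.Println(root != merkleRoot(chunks, 2)) // ...and the root changes: true
}
```

This is why a CID is enough to verify an entire file: any change anywhere in the tree propagates up to the root hash.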
Layout strategies:
Balanced layout (default):

```mermaid
graph TD
    Root --> Node1[Node]
    Root --> Node2[Node]
    Node1 --> Leaf1[Leaf]
    Node1 --> Leaf2[Leaf]
    Node2 --> Leaf3[Leaf]
```
All leaves at similar depth. Good for random access - you can jump to any part of the file efficiently.
Trickle layout (`--trickle`):

```mermaid
graph TD
    Root --> Leaf1[Leaf]
    Root --> Node1[Node]
    Root --> Node2[Node]
    Node1 --> Leaf2[Leaf]
    Node2 --> Leaf3[Leaf]
```
Leaves added progressively. Good for streaming - you can start reading before the whole file is added.
As the DAG is built, each node is stored in the blockstore:
- Normal mode: Data is copied into IPFS's internal storage (`~/.ipfs/blocks/`)
- Filestore mode (`--nocopy`): Only references to the original file are stored, saving disk space, but the original file must remain in place
By default, added content is pinned (`ipfs add --pin=true`). Pinning tells your IPFS node to keep the data; without a pin, content may eventually be garbage-collected to free up space.
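A minimal model of that relationship, with hypothetical types rather than kubo's real pinner: the garbage collector keeps only blocks reachable from pinned roots, so pinning a DAG's root protects every block beneath it.

```go
package main

import "fmt"

// dag maps each block to the blocks it links to (a toy blockstore view).
type dag map[string][]string

// live returns every block reachable from the pinned roots - exactly
// the set a garbage collector would keep.
func live(links dag, pins []string) map[string]bool {
	seen := map[string]bool{}
	var walk func(string)
	walk = func(b string) {
		if seen[b] {
			return
		}
		seen[b] = true
		for _, child := range links[b] {
			walk(child)
		}
	}
	for _, p := range pins {
		walk(p)
	}
	return seen
}

func main() {
	links := dag{
		"root":   {"leaf1", "leaf2"},
		"orphan": {"leaf3"},
		"leaf1":  nil, "leaf2": nil, "leaf3": nil,
	}
	kept := live(links, []string{"root"})
	// leaf1 survives via the pinned root; orphan and leaf3 do not.
	fmt.Println(kept["leaf1"], kept["orphan"]) // true false
}
```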
Instead of pinning, you can use the Mutable File System (MFS) to organize content using familiar paths like `/photos/vacation.jpg` instead of raw CIDs:
```shell
# Add directly to MFS path
ipfs add --to-files=/backups/ myfile.txt

# Or copy an existing CID into MFS
ipfs files cp /ipfs/QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u /docs/hello.txt
```

Content in MFS is implicitly pinned and stays organized across node restarts.
Run ipfs add --help to see all available options for controlling chunking, DAG layout, CID format, pinning behavior, and more.
IPFS uses UnixFS to represent files and directories. UnixFS is an abstraction layer that:
- Gives names to raw data blobs (so you can have `/foo/bar.txt` instead of just hashes)
- Represents directories as lists of named links to other nodes
- Organizes large files as trees of smaller chunks
- Makes these structures cryptographically verifiable - any tampering is detectable because it would change the hashes
With `--raw-leaves`, leaf nodes store raw data without the UnixFS wrapper. This is more efficient and is the default when using CIDv1.
The add flow spans several layers:
```mermaid
flowchart TD
    subgraph CLI ["CLI Layer (kubo)"]
        A["core/commands/add.go<br/>parses flags, shows progress"]
    end

    subgraph API ["CoreAPI Layer (kubo)"]
        B["core/coreapi/unixfs.go<br/>UnixfsAPI.Add() entry point"]
    end

    subgraph Adder ["Adder (kubo)"]
        C["core/coreunix/add.go<br/>orchestrates chunking, DAG building, MFS, pinning"]
    end

    subgraph Boxo ["boxo libraries"]
        D["chunker/ - splits data into chunks"]
        E["ipld/unixfs/ - DAG layout and UnixFS format"]
        F["mfs/ - mutable filesystem abstraction"]
        G["pinning/ - pin management"]
        H["blockstore/ - block storage"]
    end

    A --> B --> C --> Boxo
```
| Component | Location |
|---|---|
| CLI command | `core/commands/add.go` |
| API implementation | `core/coreapi/unixfs.go` |
| Adder logic | `core/coreunix/add.go` |
| Chunking | `boxo/chunker` |
| DAG layouts | `boxo/ipld/unixfs/importer` |
| MFS | `boxo/mfs` |
| Pinning | `boxo/pinning/pinner` |
The `Adder` type in `core/coreunix/add.go` is the workhorse. It:
- Creates an MFS root - temporary in-memory filesystem for building the DAG
- Processes files recursively - chunks each file and builds DAG nodes
- Commits to blockstore - persists all blocks
- Pins the result - keeps content from being removed
- Returns the root CID
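Those steps can be sketched end to end. Everything here is hypothetical and heavily condensed (a single-level DAG, a plain map as blockstore); it only shows the ordering the Adder enforces: chunk, build, commit, pin, return the root.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// addAndPin sketches the Adder pipeline. The map arguments stand in for
// the blockstore and pin set; the returned hash plays the role of the CID.
func addAndPin(data []byte, store map[[32]byte][]byte, pins map[[32]byte]bool) [32]byte {
	// 1. Chunking (tiny 4-byte chunks so the example stays small).
	var hashes [][32]byte
	for off := 0; off < len(data); off += 4 {
		end := off + 4
		if end > len(data) {
			end = len(data)
		}
		h := sha256.Sum256(data[off:end])
		store[h] = data[off:end] // 3. commit each leaf block
		hashes = append(hashes, h)
	}
	// 2. DAG building: one root node linking every leaf.
	var buf []byte
	for _, h := range hashes {
		buf = append(buf, h[:]...)
	}
	root := sha256.Sum256(buf)
	store[root] = buf // commit the root block too
	// 4. Pin the root so GC keeps the whole DAG.
	pins[root] = true
	return root // 5. the root hash is returned as the "CID"
}

func main() {
	store := map[[32]byte][]byte{}
	pins := map[[32]byte]bool{}
	root := addAndPin([]byte("hello world"), store, pins)
	// 3 leaf blocks ("hell", "o wo", "rld") plus 1 root block.
	fmt.Println(len(store), pins[root]) // 4 true
}
```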
Key methods:

- `AddAllAndPin()` - main entry point
- `addFileNode()` - handles a single file or directory
- `add()` - chunks data and builds the DAG using boxo's layout builders