Vite-plus implements a sophisticated caching system to avoid re-running tasks when their inputs haven't changed. This document describes the architecture, design decisions, and implementation details of the task cache system.
The task cache system enables:
- Incremental builds: Only run tasks when inputs have changed
- Shared caching: Multiple tasks with identical commands can share cache entries
- Individual task run caching: Tasks with different arguments get separate cache entries
- Content-based hashing: Cache keys based on actual content, not timestamps
- Output replay: Cached stdout/stderr are replayed exactly as originally produced
- Two-tier caching: Command-level cache shared across tasks, with task-run associations
Consider a `build` task whose command is `echo $foo` and a `test` task whose command is `echo $foo && echo $bar`. The task cache system can hit the same cache entry for the `build` task and for the first subcommand of the `test` task:

1. The user runs `vite run build` -> no cache hit; run `echo $foo` and create a cache entry.
2. The user runs `vite run test`:
   1. `echo $foo` -> hit the cache created in step 1 and replay.
   2. `echo $bar` -> no cache hit; run `echo $bar` and create a cache entry.
3. The user changes the env `$foo`.
4. The user runs `vite run test`:
   1. `echo $foo`:
      1. The cache system locates the cache that was created in step 1 and hit in step 2.1.
      2. Compare the command fingerprint and report a cache miss because `$foo` changed.
      3. Re-run and replace the cache entry with a new one.
   2. `echo $bar` -> hit the cache created in step 2.2 and replay.
5. The user runs `vite run build`: hit the cache created in step 4.1.3 and replay.
┌──────────────────────────────────────────────────────────────────────────────────────┐
│ Task Execution Flow │
├──────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Task Request │
│ ──────────────── │
│ app#build │
│ │ │
│ ▼ │
│ 2. Cache Key Generation │
│ ────────────────────── │
│ • Command fingerprint (includes cwd) │
│ • Task arguments │
│ • Environment variables │
│ │ │
│ ▼ │
│ 3. Cache Lookup (SQLite) │
│ ──────────────────────── │
│ ┌─────────────────┬──────────────────────┬──────────────────────────┐ │
│ │ Cache Hit │ Cache Not Found │ Cache Found but Miss │ │
│ └────────┬────────┴─────────┬────────────┴──────────────────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ 4a. Validate Fingerprint 4b. Execute Task ◀───── 4c. Report what change │
│ ──────────────────────── ──────────────── caused the miss │
│ • Config match? • Run command │
│ • Inputs unchanged? • Monitor files (fspy) │
│ • Command same? • Capture stdout/stderr │
│ │ │ │
│ ▼ ▼ │
│ 5a. Replay Outputs 5b. Store in Cache │
│ ────────────────── ────────────────── │
│ • Write to stdout • Save fingerprint │
│ • Write to stderr • Save outputs │
│ • Update database │
│ │
└──────────────────────────────────────────────────────────────────────────────────────┘
The command cache key uniquely identifies a command execution context:
pub struct CommandCacheKey {
pub command_fingerprint: CommandFingerprint, // Execution context
pub args: Arc<[Str]>, // CLI arguments
}
The command fingerprint captures the complete execution context:
pub struct CommandFingerprint {
pub cwd: RelativePathBuf, // Working directory, relative to workspace root
pub command: TaskCommand, // Shell script or parsed command
pub envs_without_pass_through: HashMap<Str, Str>, // Environment variables (excludes pass-through)
}
pub enum TaskCommand {
ShellScript(Str), // Raw shell script
Parsed(TaskParsedCommand), // Parsed command with program and args
}
This ensures cache invalidation when:
- Working directory changes (package location changes)
- Command or arguments change
- Declared environment variables differ (pass-through envs don't affect cache)
The envs_without_pass_through field is crucial for cache correctness:
- Only includes envs explicitly declared in the task's `envs` array
- Does NOT include pass-through envs (PATH, CI, etc.)
- These envs become part of the cache key
When a task runs:
- All envs (including pass-through) are available to the process
- Only declared envs affect the cache key
- If a declared env changes value, cache will miss
- If a pass-through env changes, cache will still hit
For built-in tasks (lint, build, test):
- The resolver provides envs which become part of the fingerprint
- If resolver provides different envs between runs, cache breaks
- Each built-in task type must have unique task name to avoid cache collision
The complete task fingerprint includes input files tracked during execution:
pub struct TaskFingerprint {
pub resolved_config: ResolvedTaskConfig, // Task configuration
pub command_fingerprint: CommandFingerprint, // Command execution context
pub inputs: HashMap<RelativePathBuf, PathFingerprint>, // Input file states
}
The task ID uniquely identifies a task:
pub struct TaskId {
/// The name in `vite-task.json`, or the name of the `package.json` script containing this task.
/// See [`terminologies.md`](./terminologies.md) for details
pub task_group_name: Str,
/// The path of the package containing this task, relative to the monorepo root.
/// We don't use package names as they can be the same for different packages.
pub package_dir: RelativePathBuf,
/// The index of the subcommand in a parsed command (`echo A && echo B`).
/// None if the task is the last command.
pub subcommand_index: Option<usize>,
}
The cache system maintains a (CommandCacheKey, TaskId) relationship in order to locate the previous cache of the same task. This is a one-to-many relationship.
Vite-plus uses fspy to monitor file system access during task execution:
┌──────────────────────────────────────────────────────────────┐
│ File System Monitoring │
├──────────────────────────────────────────────────────────────┤
│ │
│ Task Execution: │
│ ────────────── │
│ 1. Start fspy monitoring │
│ 2. Execute task command │
│ 3. Capture accessed files │
│ 4. Stop monitoring │
│ │ │
│ ▼ │
│ Fingerprint Generation: │
│ ────────────────────── │
│ For each accessed file: │
│ • Check if file exists │
│ • If file: Hash contents with xxHash3 │
│ • If directory: Record structure │
│ • If missing: Mark as NotFound │
│ │ │
│ ▼ │
│ Path Fingerprint Types: │
│ ────────────────────── │
│ enum PathFingerprint { │
│ NotFound, // File doesn't exist │
│ FileContentHash(u64), // xxHash3 of content │
│ Folder(Option<HashMap>), // Directory listing │
│ } ▲ │
│ │ │
│ This value is `None` when fspy reports that the task is │
│ opening a folder but not reading its entries. This can │
│ happen when the opened folder is used as a dirfd for │
│ `openat(2)`. In such a case, the folder's entries don't need │
│ to be fingerprinted. │
│ Folders with empty entries fingerprinted are represented as │
│ `Folder(Some(empty hashmap))`. │
│ │
└──────────────────────────────────────────────────────────────┘
When a cache entry exists, the fingerprint is validated to detect changes:
pub enum CacheMiss {
NotFound, // No cache entry exists
FingerprintMismatch { // Cache exists but invalid
reason: FingerprintMismatchReason,
},
}
pub enum FingerprintMismatchReason {
ConfigChanged, // Task configuration changed
CommandChanged, // Command fingerprint differs
InputsChanged, // Input files modified
}
Vite-plus uses SQLite with WAL (Write-Ahead Logging) mode for cache storage:
// Database initialization
let conn = Connection::open(cache_path)?;
conn.pragma_update(None, "journal_mode", "WAL")?; // Better concurrency
conn.pragma_update(None, "synchronous", "NORMAL")?; // Balance speed/safety
-- Simple key-value store for commands cache
CREATE TABLE commands (
key BLOB PRIMARY KEY, -- Serialized CommandsCacheKey
value BLOB -- Serialized CachedTask
);
-- One-to-many relationships between commands and tasks
CREATE TABLE commands_tasks (
command_key BLOB, -- Serialized CommandsCacheKey
task_id BLOB -- Serialized TaskId
);
Cache entries are serialized using bincode for efficient storage:
pub struct CachedTask {
pub fingerprint: TaskFingerprint, // Complete task state
pub std_outputs: Arc<[StdOutput]>, // Captured outputs
}
pub struct StdOutput {
pub kind: OutputKind, // StdOut or StdErr
pub content: MaybeString, // Binary or UTF-8 content
}
┌──────────────────────────────────────────────────────────────┐
│ Cache Hit Process │
├──────────────────────────────────────────────────────────────┤
│ │
│ 1. Generate Cache Keys │
│ ────────────────────── │
│ TaskRunKey { │
│ task_id: TaskId { ... }, │
│ args: ["--production"] │
│ } │
│ CommandFingerprint { │
│ cwd: "packages/app", │
│ command: Parsed(...), │
│ envs_without_pass_through: {...}, │
│ pass_through_envs: {...} │
│ } │
│ │ │
│ ▼ │
│ 2. Query Command Cache │
│ ────────────────────── │
│ SELECT value FROM command_cache WHERE key = command_fp │
│ │ │
│ ▼ │
│ 3. Deserialize CommandCacheValue │
│ ───────────────────────────── │
│ CommandCacheValue { │
│ post_run_fingerprint: PostRunFingerprint { ... }, │
│ std_outputs: [StdOutput, ...] │
│ } │
│ │ │
│ ▼ │
│ 4. Validate Post-Run Fingerprint │
│ ───────────────────────────────── │
│ • Check input file hashes │
│ • Detect file content changes │
│ │ │
│ ▼ │
│ 5. Replay Outputs & Update Association │
│ ────────────────────────────────────── │
│ • Write to stdout/stderr │
│ • Preserve original order │
│ • Update taskrun_to_command mapping │
│ │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Cache Miss Process │
├──────────────────────────────────────────────────────────────┤
│ │
│ 1. Execute Task with Monitoring │
│ ─────────────────────────────── │
│ • Start fspy file monitoring │
│ • Capture stdout/stderr │
│ • Execute command │
│ • Stop monitoring │
│ │ │
│ ▼ │
│ 2. Generate Post-Run Fingerprint │
│ ───────────────────────────────── │
│ • Hash all accessed files │
│ • Record file system access patterns │
│ │ │
│ ▼ │
│ 3. Create CommandCacheValue │
│ ────────────────────────── │
│ CommandCacheValue { │
│ post_run_fingerprint: generated_fingerprint, │
│ std_outputs: captured_outputs │
│ } │
│ │ │
│ ▼ │
│ 4. Store in Database Tables │
│ ─────────────────────────── │
│ INSERT OR REPLACE INTO command_cache │
│ VALUES (command_fingerprint, cache_value) │
│ INSERT OR REPLACE INTO taskrun_to_command │
│ VALUES (task_run_key, command_fingerprint) │
│ │
└──────────────────────────────────────────────────────────────┘
Cache entries are automatically invalidated when:
- Command changes: Different command, arguments, or working directory
- Package location changes: Working directory (`cwd`) in command fingerprint changes
- Environment changes: Modified declared environment variables (pass-through values don't affect cache)
- Pass-through config changes: Pass-through environment names added/removed from configuration
- Input files change: Content hash differs (detected via xxHash3)
- File structure changes: Files added, removed, or type changed
- Built-in task location: Built-in tasks run from different directories get separate caches
// Two-level fingerprint validation during cache lookup
pub async fn try_hit(
&self,
task: &ResolvedTask,
fs: &impl FileSystem,
base_dir: &AbsolutePath,
) -> Result<Result<CommandCacheValue, CacheMiss>, Error> {
let task_run_key = TaskRunKey { task_id: task.id(), args: task.args.clone() };
let command_fingerprint = &task.resolved_command.fingerprint;
if let Some(cache_value) = self.get_command_cache_by_command_fingerprint(command_fingerprint).await? {
// Command fingerprint matches, validate post-run fingerprint
if let Some(post_run_mismatch) = cache_value.post_run_fingerprint.validate(fs, base_dir)? {
Ok(Err(CacheMiss::FingerprintMismatch(
FingerprintMismatch::PostRunFingerprintMismatch(post_run_mismatch),
)))
} else {
// Cache hit, update association
self.upsert_taskrun_to_command(&task_run_key, command_fingerprint).await?;
Ok(Ok(cache_value))
}
} else if let Some(old_command_fp) = self.get_command_fingerprint_by_task_run_key(&task_run_key).await? {
// Task run exists but command fingerprint changed
Ok(Err(CacheMiss::FingerprintMismatch(
FingerprintMismatch::CommandFingerprintMismatch(
command_fingerprint.diff(&old_command_fp),
),
)))
} else {
// No cache found
Ok(Err(CacheMiss::NotFound))
}
}
Vite-plus uses xxHash3 for file content hashing, providing excellent performance:
use xxhash_rust::xxh3::xxh3_64;
pub fn hash_file_content(content: &[u8]) -> u64 {
xxh3_64(content) // ~10GB/s on modern CPUs
}
Instead of scanning all possible input files, fspy monitors actual file access:
┌──────────────────────────────────────────────────────────────┐
│ Efficient File Tracking │
├──────────────────────────────────────────────────────────────┤
│ │
│ Traditional Approach: │
│ ──────────────────── │
│ Scan all src/**/*.ts files → Hash everything │
│ Problem: Hashes files never accessed │
│ │
│ Vite-plus Approach: │
│ ────────────────── │
│ Monitor with fspy → Hash only accessed files │
│ Benefit: Minimal work, accurate dependencies │
│ │
└──────────────────────────────────────────────────────────────┘
// WAL mode for better concurrency
conn.pragma_update(None, "journal_mode", "WAL")?;
// Balanced durability for performance
conn.pragma_update(None, "synchronous", "NORMAL")?;
// Prepared statements for efficiency
let mut stmt = conn.prepare_cached(
"SELECT value FROM tasks WHERE key = ?"
)?;
Using bincode for compact, fast serialization:
// Efficient binary encoding
let key_bytes = bincode::encode_to_vec(&cache_key, config)?;
let value_bytes = bincode::encode_to_vec(&cached_task, config)?;
// Direct storage without text conversion
stmt.execute(params![key_bytes, value_bytes])?;
The cache location can be configured via environment variable:
# Custom cache location
VITE_CACHE_PATH=/tmp/vite-cache vite run build
# Default: node_modules/.vite/task-cache in workspace root
vite run build
Tasks can be marked as cacheable in vite-task.json:
{
"tasks": {
"build": {
"command": "tsc && rollup -c",
"cacheable": true,
"dependsOn": ["^build"]
},
"deploy": {
"command": "deploy-script.sh",
"cacheable": false // Never cache deployment tasks
},
"test": {
"command": "jest",
"cacheable": true
}
}
}
- Default: Tasks are cacheable unless explicitly disabled
- Compound commands: Each subcommand cached independently
- Dependencies: Cache considers task dependencies
pub struct StdOutput {
pub kind: OutputKind, // StdOut or StdErr
pub content: MaybeString, // Binary-safe content
}
pub struct MaybeString(Vec<u8>);
Outputs are captured exactly as produced:
- Preserves order of stdout/stderr interleaving
- Handles binary output (e.g., from tools that output progress bars)
- Maintains ANSI color codes and formatting
When a task hits cache, outputs are replayed exactly:
┌──────────────────────────────────────────────────────────────┐
│ Output Replay │
├──────────────────────────────────────────────────────────────┤
│ │
│ Cached Outputs: │
│ ────────────── │
│ [ │
│ StdOutput { kind: StdOut, "Compiling..." }, │
│ StdOutput { kind: StdErr, "Warning: ..." }, │
│ StdOutput { kind: StdOut, "✓ Build complete" } │
│ ] │
│ │ │
│ ▼ │
│ Replay Process: │
│ ────────────── │
│ 1. Write "Compiling..." to stdout │
│ 2. Write "Warning: ..." to stderr │
│ 3. Write "✓ Build complete" to stdout │
│ │ │
│ ▼ │
│ Result: Identical output as original execution │
│ │
└──────────────────────────────────────────────────────────────┘
// Task: app#build --production
TaskRunKey {
task_id: TaskId {
task_group_id: TaskGroupId {
task_group_name: "build".into(),
is_builtin: false,
config_path: RelativePathBuf::from("packages/app"),
},
subcommand_index: None,
},
args: vec!["--production"].into(),
}
CommandFingerprint {
cwd: RelativePathBuf::from("packages/app"),
command: TaskCommand::ShellScript("tsc && rollup -c".into()),
envs_without_pass_through: btreemap! {
"NODE_ENV".into() => "production".into()
},
pass_through_envs: btreeset! { "PATH".into(), "HOME".into() },
}
// Synthetic task (e.g., "vite lint" in a task script)
TaskRunKey {
task_id: TaskId {
task_group_id: TaskGroupId {
task_group_name: "lint".into(),
is_builtin: true,
config_path: RelativePathBuf::from("packages/frontend"), // Current working directory
},
subcommand_index: None,
},
args: vec![].into(),
}
CommandFingerprint {
cwd: RelativePathBuf::from("packages/frontend"),
command: TaskCommand::Parsed(TaskParsedCommand {
program: "/usr/local/bin/oxlint".into(),
args: vec![".".into()].into(),
envs: HashMap::new(),
}),
envs_without_pass_through: BTreeMap::new(),
pass_through_envs: btreeset! { "PATH".into() },
}
# Enable debug logging
VITE_LOG=debug vite run build
# Show cache operations
VITE_LOG=trace vite run build
[DEBUG] Cache lookup for app#build
[DEBUG] Cache key: TaskCacheKey { command_fingerprint: ..., args: ... }
[DEBUG] Cache hit! Validating fingerprint...
[DEBUG] Fingerprint mismatch: InputsChanged
[DEBUG] File src/index.ts changed (hash: 0x1234... → 0x5678...)
[DEBUG] Cache miss, executing task
- NotFound: No cache entry exists (first run or after cache clear)
- CommandFingerprintMismatch: Command, args, environment variables, or pass-through config changed
- PostRunFingerprintMismatch: Source files modified or file structure changed
From the test cases, cache miss messages include:
- `Cache miss: foo.txt content changed` - Input file content changed
- `Cache miss: Command fingerprint changed: CommandFingerprintDiff { ... }` - Command changed
- Pass-through env config change: `pass_through_envs: BTreeSetDiff { added: {}, removed: {"MY_ENV2"} }`
- Environment value change: `envs_without_pass_through: HashMapDiff { altered: {"FOO": Some("1")}, removed: {} }`
Ensure commands produce identical outputs for identical inputs:
// ❌ Bad: Non-deterministic output
{
"tasks": {
"build": {
"command": "echo Built at $(date) && tsc"
}
}
}
// ✅ Good: Deterministic output
{
"tasks": {
"build": {
"command": "tsc && echo Build complete"
}
}
}
Tasks with identical commands automatically share cache entries:
{
"scripts": {
"script1": "cat foo.txt",
"script2": "cat foo.txt"
}
}
Behavior:
- `vite run script1` creates a command cache for `cat foo.txt`
- `vite run script2` hits the same command cache (shared)
- If `foo.txt` changes, both tasks will see a cache miss on the next run
- A cache update from either task benefits the other
Tasks with different arguments get separate cache entries:
# These create separate caches
vite run echo -- a # TaskRunKey with args: ["a"]
vite run echo -- b # TaskRunKey with args: ["b"]
Leverage compound commands for per-subcommand caching:
{
"scripts": {
"build": "tsc && rollup -c && terser dist/bundle.js"
}
Benefit: Each `&&`-separated subcommand is cached independently. If only the terser configuration changes, the TypeScript and rollup steps will still hit cache.
{
"tasks": {
"deploy": {
"command": "deploy-to-production.sh",
"cacheable": false // Always run fresh
},
"notify": {
"command": "slack-webhook.sh",
"cacheable": false // Side effect: sends notification
}
}
}
The cache system automatically tracks accessed files:
// This file access is automatically tracked
import config from './config.json';
// Dynamic imports are also tracked
const module = await import(`./locales/${lang}.json`);
// File system operations are monitored
const data = fs.readFileSync('data.txt');
No need to manually specify inputs - fspy captures actual dependencies.
# Initial run creates command cache
> vite run script1
Cache not found
bar
# Different task, same command - hits shared cache
> vite run script2
Cache hit, replaying
bar
# File change invalidates shared cache
> echo baz > foo.txt
> vite run script2
Cache miss: foo.txt content changed
baz
# Original task benefits from updated cache
> vite run script1
Cache hit, replaying
baz
# Different args create separate caches
> vite run echo -- a
Cache not found
a
> vite run echo -- b
Cache not found
b
# Each argument combination has its own cache
> vite run echo -- a
Cache hit, replaying
a
> vite run echo -- b
Cache hit, replaying
b
# Different directories create separate caches for tasks
> cd folder1 && vite run lint
Cache not found
Found 0 warnings and 0 errors.
> cd folder2 && vite run lint
Cache not found # Different cwd = different cache
Found 0 warnings and 0 errors.
# Each directory maintains its own cache
> cd folder1 && vite run lint
Cache hit, replaying
Found 0 warnings and 0 errors.
┌──────────────────────────────────────────────────────────────┐
│ Cache System Architecture │
├──────────────────────────────────────────────────────────────┤
│ │
│ crates/vite_task/src/ │
│ ├── cache.rs # Two-tier cache storage system │
│ │ ├── CommandCacheValue # Cached execution results │
│ │ ├── TaskRunKey # Task run identification │
│ │ ├── TaskCache # Main cache interface │
│ │ └── try_hit() # Two-level cache lookup │
│ │ │
│ ├── fingerprint.rs # Post-run fingerprint generation │
│ │ ├── PostRunFingerprint # Input file states │
│ │ ├── PathFingerprint # File/directory state │
│ │ └── PostRunFingerprintMismatch # Validation results │
│ │ │
│ ├── config/mod.rs # Command fingerprint generation │
│ │ └── CommandFingerprint # Command execution context │
│ │ │
│ ├── execute.rs # Task execution with caching │
│ │ ├── execute_with_cache() # Main execution flow │
│ │ ├── monitor_files() # fspy integration │
│ │ └── capture_outputs() # Output collection │
│ │ │
│ └── schedule.rs # Task scheduling and cache lookup │
│ └── schedule_tasks() # Cache-aware task execution │
│ │
└──────────────────────────────────────────────────────────────┘
// Generate task run key for cache lookup
impl TaskCache {
pub async fn try_hit(&self, task: &ResolvedTask) -> Result<...> {
let task_run_key = TaskRunKey {
task_id: task.id(),
args: task.args.clone(),
};
let command_fingerprint = &task.resolved_command.fingerprint;
// ... two-tier lookup logic
}
}
// Validates cached post-run fingerprint against current file system state
impl PostRunFingerprint {
pub fn validate(
&self,
fs: &impl FileSystem,
base_dir: &AbsolutePath,
) -> Result<Option<PostRunFingerprintMismatch>, Error> {
let input_mismatch = self.inputs.par_iter().find_map_any(|(input_path, cached_fp)| {
let full_path = base_dir.join(input_path);
let current_fp = PathFingerprint::create(&full_path, fs);
if cached_fp != &current_fp {
Some(PostRunFingerprintMismatch::InputContentChanged {
path: input_path.clone(),
})
} else {
None
}
});
Ok(input_mismatch)
}
}
- Cache key generation: ~1μs per task
- File hashing: ~10GB/s with xxHash3
- Database operations: <1ms for typical queries
- Fingerprint validation: ~10μs per task
- Output replay: Near-zero overhead
The cache system adds minimal overhead while providing significant speedups for unchanged tasks, making incremental builds in large monorepos extremely efficient.