Skip to content

Latest commit

 

History

History
491 lines (373 loc) · 14.9 KB

File metadata and controls

491 lines (373 loc) · 14.9 KB

RFC: Vite+ Cache Fingerprint Ignore Patterns

Summary

Add support for glob-based ignore patterns to the cache fingerprint calculation, allowing tasks to exclude specific files/directories from triggering cache invalidation while still including important files within ignored directories.

Motivation

Current cache fingerprint behavior tracks all files accessed during task execution. This causes unnecessary cache invalidation in scenarios like:

  1. Package installation tasks: The node_modules directory changes frequently, but only package.json files within it are relevant for cache validation
  2. Build output directories: Generated files in dist/ or .next/ that should not invalidate the cache
  3. Large dependency directories: When only specific files within large directories matter for reproducibility

Example Use Case

For an install task that runs pnpm install:

  • Changes to node_modules/**/*/index.js should NOT invalidate the cache
  • Changes to node_modules/**/*/package.json SHOULD invalidate the cache
  • This allows cache hits when dependencies remain the same, even if their internal implementation files have different timestamps or minor variations

Proposed Solution

Configuration Schema

Extend TaskConfig in vite-task.json to support a new optional field fingerprintIgnores:

{
  "tasks": {
    "my-task": {
      "command": "echo bar",
      "cacheable": true,
      "fingerprintIgnores": [
        "node_modules/**/*",
        "!node_modules/**/*/package.json"
      ]
    }
  }
}

Ignore Pattern Syntax

The ignore patterns follow standard glob syntax with gitignore-style semantics:

  1. Basic patterns:

    • node_modules/**/* - ignore all files under node_modules
    • dist/ - ignore the dist directory
    • *.log - ignore all log files
  2. Negation patterns (prefixed with !):

    • !node_modules/**/*/package.json - include package.json files even though node_modules is ignored
    • !important.log - include important.log even though *.log is ignored
  3. Pattern evaluation order:

    • Patterns are evaluated in order
    • Later patterns override earlier ones
    • Negation patterns can "un-ignore" files matched by earlier patterns
    • Last match wins semantics

Implementation Details

1. Configuration Schema Changes

File: crates/vite_task/src/config/mod.rs

pub struct TaskConfig {
    // ...

    // New field
    #[serde(default)]
    pub(crate) fingerprint_ignores: Option<Vec<Str>>,
}

2. CommandFingerprint Schema Changes

File: crates/vite_task/src/config/mod.rs

Add fingerprint_ignores to CommandFingerprint to ensure cache invalidation when ignore patterns change:

pub struct CommandFingerprint {
    pub cwd: RelativePathBuf,
    pub command: TaskCommand,
    pub envs_without_pass_through: BTreeMap<Str, Str>,
    pub pass_through_envs: BTreeSet<Str>,

    // New field
    pub fingerprint_ignores: Option<Vec<Str>>,
}

Why this is needed: Including fingerprint_ignores in CommandFingerprint ensures that when ignore patterns change, the cache is invalidated. This prevents incorrect cache hits when the set of tracked files changes.

Example scenario:

  • First run with fingerprintIgnores: ["node_modules/**/*"] → tracks only non-node_modules files
  • Change config to fingerprintIgnores: [] → should track ALL files
  • Without this field in CommandFingerprint → cache would incorrectly HIT
  • With this field → cache correctly MISSES, re-creates fingerprint with all files

3. Fingerprint Creation Changes

File: crates/vite_task/src/fingerprint.rs

Modify PostRunFingerprint::create() to filter paths based on ignore patterns:

impl PostRunFingerprint {
    pub fn create(
        executed_task: &ExecutedTask,
        fs: &impl FileSystem,
        base_dir: &AbsolutePath,
        fingerprint_ignores: Option<&[Str]>,  // New parameter
    ) -> Result<Self, Error> {
        let ignore_matcher = fingerprint_ignores
            .filter(|patterns| !patterns.is_empty())
            .map(GlobPatternSet::new)
            .transpose()?;

        let inputs = executed_task
            .path_reads
            .par_iter()
            .filter(|(path, _)| {
                if let Some(ref matcher) = ignore_matcher {
                    !matcher.is_match(path)
                } else {
                    true
                }
            })
            .flat_map(|(path, path_read)| {
                Some((|| {
                    let path_fingerprint =
                        fs.fingerprint_path(&base_dir.join(path).into(), *path_read)?;
                    Ok((path.clone(), path_fingerprint))
                })())
            })
            .collect::<Result<HashMap<RelativePathBuf, PathFingerprint>, Error>>()?;
        Ok(Self { inputs })
    }
}

4. Task Resolution Integration

File: crates/vite_task/src/config/task_command.rs

Update resolve_command() to include fingerprint_ignores in the fingerprint:

impl ResolvedTaskConfig {
    pub(crate) fn resolve_command(...) -> Result<ResolvedTaskCommand, Error> {
        // ...
        Ok(ResolvedTaskCommand {
            fingerprint: CommandFingerprint {
                cwd,
                command,
                envs_without_pass_through: task_envs.envs_without_pass_through.into_iter().collect(),
                pass_through_envs: self.config.pass_through_envs.iter().cloned().collect(),
                fingerprint_ignores: self.config.fingerprint_ignores.clone(),  // Pass through
            },
            all_envs: task_envs.all_envs,
        })
    }
}

5. Cache Update Integration

File: crates/vite_task/src/cache.rs

Update CommandCacheValue::create() to pass ignore patterns:

impl CommandCacheValue {
    pub fn create(
        executed_task: ExecutedTask,
        fs: &impl FileSystem,
        base_dir: &AbsolutePath,
        fingerprint_ignores: Option<&[Str]>,  // New parameter
    ) -> Result<Self, Error> {
        let post_run_fingerprint = PostRunFingerprint::create(
            &executed_task,
            fs,
            base_dir,
            fingerprint_ignores,
        )?;
        Ok(Self {
            post_run_fingerprint,
            std_outputs: executed_task.std_outputs,
            duration: executed_task.duration,
        })
    }
}

6. Execution Flow Integration

File: crates/vite_task/src/schedule.rs

Update cache creation to pass fingerprint_ignores from the task config:

if !skip_cache && exit_status.success() {
    let cached_task = CommandCacheValue::create(
        executed_task,
        fs,
        base_dir,
        task.resolved_config.config.fingerprint_ignores.as_deref(),
    )?;
    cache.update(&task, cached_task).await?;
}

Performance Considerations

  1. Pattern compilation: Glob patterns compiled once per fingerprint creation (lazy)
  2. Filtering overhead: Path filtering happens during fingerprint creation (only when caching)
  3. Memory impact:
    • fingerprint_ignores stored in CommandFingerprint (Vec)
    • Compiled GlobPatternSet created only when needed, not cached
  4. Parallel processing: Existing parallel iteration over paths is preserved
  5. Cache key size: Minimal increase (~100 bytes for typical ignore patterns)

Edge Cases

  1. Empty ignore list: No filtering applied (backward compatible)

    • None → no filtering
    • Some([]) → no filtering (empty array treated same as None)
  2. Conflicting patterns: Later patterns take precedence (last-match-wins)

  3. Invalid glob syntax: Return error during fingerprint creation

    • Detected early when PostRunFingerprint is created
    • Task execution completes, but cache save fails with clear error
  4. Absolute paths in patterns: Treated as relative to package directory

  5. Directory vs file patterns: Both supported via glob syntax

  6. Config changes: Changing fingerprint_ignores invalidates cache

    • Patterns are part of CommandFingerprint
    • Different patterns → different cache key
    • Ensures correct file tracking

Alternative Designs Considered

Alternative 1: inputs field extension

Extend the existing inputs field to support ignore patterns:

{
  "inputs": {
    "include": ["src/**/*"],
    "exclude": ["src/**/*.test.js"]
  }
}

Rejected because:

  • The inputs field currently uses a different mechanism (pre-execution declaration)
  • This feature is about post-execution fingerprint filtering
  • Mixing the two concepts would be confusing

Alternative 2: Separate fingerprintExcludes field

Only support exclude patterns (no negation):

{
  "fingerprintExcludes": ["node_modules/**/*"]
}

Rejected because:

  • Cannot express "ignore everything except X" patterns
  • Less flexible for complex scenarios
  • Gitignore-style syntax is more familiar to developers

Alternative 3: Include/Exclude separate fields

{
  "fingerprintExcludes": ["node_modules/**/*"],
  "fingerprintIncludes": ["node_modules/**/*/package.json"]
}

Rejected because:

  • More verbose
  • Less clear precedence rules
  • Gitignore-style is a proven pattern

Migration Path

Backward Compatibility

This feature is fully backward compatible:

  • Existing task configurations work unchanged
  • Default value for fingerprintIgnores is None (when omitted)
  • No behavior changes when field is absent or null
  • Empty array [] is treated the same as None (no filtering)

Testing Strategy

Unit Tests

File: crates/vite_task/src/fingerprint.rs (10 tests added)

  1. PostRunFingerprint::create() tests (8 tests):

    • test_postrun_fingerprint_no_ignores - Verify None case includes all paths
    • test_postrun_fingerprint_empty_ignores - Verify empty array includes all paths
    • test_postrun_fingerprint_ignore_node_modules - Basic ignore pattern
    • test_postrun_fingerprint_negation_pattern - Negation support for package.json
    • test_postrun_fingerprint_multiple_ignore_patterns - Multiple patterns
    • test_postrun_fingerprint_wildcard_patterns - File extension wildcards
    • test_postrun_fingerprint_complex_negation - Nested negation patterns
    • test_postrun_fingerprint_invalid_pattern - Error handling for bad syntax
  2. CommandFingerprint tests (2 tests):

    • test_command_fingerprint_with_fingerprint_ignores - Verify cache invalidation when ignores change
    • test_command_fingerprint_ignores_order_matters - Verify pattern order affects cache key
  3. vite_glob tests (existing):

    • Pattern matching already tested in vite_glob crate
    • Negation pattern precedence
    • Last-match-wins semantics

Integration Tests

Snap-test: packages/cli/snap-tests/fingerprint-ignore-test/

Test fixture structure:

fingerprint-ignore-test/
  package.json
  vite-task.json  # with fingerprintIgnores config
  steps.json      # test commands
  snap.txt        # expected output snapshot

Test scenario validates:

  1. First run → Cache miss (initial execution)
  2. Second run → Cache hit (no changes)
  3. Modify node_modules/pkg-a/index.js → Cache hit (ignored by pattern)
  4. Modify dist/bundle.js → Cache hit (ignored by pattern)
  5. Modify node_modules/pkg-a/package.json → Cache miss (NOT ignored due to negation)

This validates the complete feature including:

  • Ignore patterns filter correctly
  • Negation patterns work
  • Cache invalidation happens at the right times
  • Config changes invalidate cache

Documentation Requirements

User Documentation

Add to task configuration docs:

### fingerprintIgnores

Type: `string[]`
Default: `[]`

Glob patterns to exclude files from cache fingerprint calculation.
Patterns starting with `!` are negation patterns that override earlier excludes.

Example:

```json
{
  "tasks": {
    "install": {
      "command": "pnpm install",
      "cacheable": true,
      "fingerprintIgnores": [
        "node_modules/**/*",
        "!node_modules/**/*/package.json"
      ]
    }
  }
}
```

This configuration ignores all files in node_modules except package.json files, which are still tracked for cache validation.

### Examples Documentation

Add common patterns:

1. **Package installation**:
   ```json
   "fingerprintIgnores": [
     "node_modules/**/*",
     "!node_modules/**/*/package.json",
     "!node_modules/.pnpm/lock.yaml"
   ]
  1. Build outputs:

    "fingerprintIgnores": [
      "dist/**/*",
      ".next/**/*",
      "build/**/*"
    ]
  2. Temporary files:

    "fingerprintIgnores": [
      "**/*.log",
      "**/.DS_Store",
      "**/tmp/**"
    ]

Implementation Status

IMPLEMENTED - All functionality complete and tested

Summary of Changes

  1. Schema Changes - Added fingerprint_ignores: Option<Vec<Str>> to:

    • TaskConfig (config/mod.rs:51) - User-facing configuration
    • CommandFingerprint (config/mod.rs:272) - Cache key component
  2. Logic Updates - Fingerprint creation and validation:

    • PostRunFingerprint::create() filters paths (fingerprint.rs:85-118)
    • CommandCacheValue::create() passes patterns (cache.rs:29-42)
    • ResolvedTaskConfig::resolve_command() includes in fingerprint (task_command.rs:99-113)
    • schedule.rs execution flow integration (schedule.rs:236-242)
  3. Testing - Comprehensive coverage:

    • 8 unit tests for PostRunFingerprint::create() with filtering
    • 2 unit tests for CommandFingerprint with ignore patterns
    • 1 snap-test for end-to-end validation
    • All 71 tests pass
  4. Documentation:

    • Complete RFC with implementation details
    • Test fixtures with examples
    • Inline code documentation explaining rationale

Key Design Decisions

  1. Option type: Option<Vec<Str>> provides true optional semantics
  2. Include in CommandFingerprint: Ensures cache invalidation on config changes
  3. Leverage vite_glob: Reuses existing, battle-tested pattern matcher
  4. Filter at creation time: Paths filtered when creating PostRunFingerprint
  5. Order preservation: Vec maintains pattern order (last-match-wins semantics)

Files Modified

  • crates/vite_task/src/config/mod.rs (+13 lines)
  • crates/vite_task/src/config/task_command.rs (+2 lines)
  • crates/vite_task/src/fingerprint.rs (+397 lines including tests)
  • crates/vite_task/src/cache.rs (+2 lines)
  • crates/vite_task/src/execute.rs (+4 lines)
  • crates/vite_task/src/schedule.rs (+4 lines)
  • packages/cli/snap-tests/fingerprint-ignore-test/ (new fixture)

Conclusion

This feature successfully adds glob-based ignore patterns to cache fingerprint calculation:

  • ✅ Solves real caching problems (especially for install tasks)
  • ✅ Uses familiar gitignore-style syntax
  • ✅ Fully backward compatible
  • ✅ Minimal performance impact
  • ✅ Complete test coverage
  • ✅ Production-ready implementation

The implementation leverages the proven vite_glob crate and integrates cleanly with existing fingerprint and cache systems.