Add dataset augmentation and reduction utilities for SINDY workflow #578

Closed
ChrisRackauckas-Claude wants to merge 2 commits into SciML:master from ChrisRackauckas-Claude:investigate-issue-332

Conversation

@ChrisRackauckas-Claude

Summary

This PR implements the feature requested in #332, adding dataset augmentation and reduction utilities to automate the SINDY workflow:

New Functions

  1. delay_embedding(X, num_delays; τ=1) - Create time-delay coordinates from data

    • Supports both vector and matrix inputs
    • Configurable delay interval τ
    • Useful for state-space reconstruction based on Takens' embedding theorem and for HAVOK analysis
  2. hankel_matrix(x, num_rows) - Build Hankel matrices from 1D time series

    • Enables Hankel DMD and related methods
    • Standard format for time-delay analysis
  3. truncated_svd(X, rank) - Compute truncated singular value decomposition

    • Returns a named tuple with U, S, and V truncated to the specified rank
    • Building block for dimension reduction
  4. reduce_dimension(X, rank) - Project data onto the top singular vectors

    • reduce_dimension(X, rank) - Manual rank specification
    • reduce_dimension(X) - Automatic rank selection using an optimal threshold
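
To make the two embedding utilities concrete, here is a minimal sketch of what a Hankel matrix and a delay embedding compute. The names `sketch_hankel` and `sketch_delay_embedding` are hypothetical stand-ins, not the PR's code. The sketch assumes a states × time layout and assumes that `num_delays` counts the lagged copies added to the current snapshot; the PR's actual conventions may differ.

```julia
# Hypothetical helper (not the PR's implementation):
# H[i, j] = x[i + j - 1], so every anti-diagonal of H is constant.
function sketch_hankel(x::AbstractVector, num_rows::Integer)
    1 <= num_rows <= length(x) ||
        throw(ArgumentError("need 1 <= num_rows <= length(x)"))
    num_cols = length(x) - num_rows + 1
    return [x[i + j - 1] for i in 1:num_rows, j in 1:num_cols]
end

# Hypothetical helper: stack the snapshot matrix X (states × time) on top of
# num_delays shifted copies of itself, each shifted by a further τ samples.
function sketch_delay_embedding(X::AbstractMatrix, num_delays::Integer; τ::Integer = 1)
    num_delays >= 0 || throw(ArgumentError("num_delays must be non-negative"))
    τ >= 1 || throw(ArgumentError("τ must be at least 1"))
    num_cols = size(X, 2) - num_delays * τ
    num_cols >= 1 || throw(ArgumentError("time series too short for this embedding"))
    blocks = [X[:, (1 + k * τ):(num_cols + k * τ)] for k in 0:num_delays]
    return reduce(vcat, blocks)
end

# A vector is treated as a single-state time series.
sketch_delay_embedding(x::AbstractVector, num_delays::Integer; τ::Integer = 1) =
    sketch_delay_embedding(reshape(collect(x), 1, :), num_delays; τ = τ)

x = collect(1:5)
H = sketch_hankel(x, 3)              # 3×3 matrix with rows 1:3, 2:4, 3:5
E = sketch_delay_embedding(x, 2)     # same matrix under these conventions
```

Under these assumptions, a scalar series embedded with `τ = 1` and `num_delays = num_rows - 1` reproduces the Hankel matrix, which is why the two utilities are usually discussed together.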

Use Case: HAVOK Analysis

These utilities enable HAVOK-style workflows:

```julia
using DataDrivenDiffEq

# Embed a scalar time series using delay coordinates.
# `lorenz_solution` is assumed to be a previously computed states × time array.
x = lorenz_solution[1, :]  # Scalar measurement
H = hankel_matrix(x, 100)  # Build Hankel matrix with 100 rows

# Reduce dimension via SVD
X_reduced = reduce_dimension(H, 15)  # Keep top 15 modes

# Create problem with embedded coordinates
prob = DiscreteDataDrivenProblem(X_reduced)

# Discover dynamics with SINDY
# ... (use DataDrivenSparse)
```
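
The reduction step in this workflow can be illustrated with the `LinearAlgebra` standard library alone. The functions `sketch_truncated_svd` and `sketch_reduce_dimension` below are hypothetical stand-ins for the PR's utilities. In particular, projecting with `U' * X` is an assumption about what "project data onto top singular vectors" means here. The test signal is synthetic rather than a Lorenz trajectory.

```julia
using LinearAlgebra

# Hypothetical stand-in: keep the leading `rank` singular triplets of X.
function sketch_truncated_svd(X::AbstractMatrix, rank::Integer)
    1 <= rank <= minimum(size(X)) ||
        throw(ArgumentError("need 1 <= rank <= minimum(size(X))"))
    F = svd(X)
    return (U = F.U[:, 1:rank], S = F.S[1:rank], V = F.V[:, 1:rank])
end

# Hypothetical stand-in: coordinates of each column of X in the basis of the
# leading left singular vectors. The result is rank × time.
function sketch_reduce_dimension(X::AbstractMatrix, rank::Integer)
    F = sketch_truncated_svd(X, rank)
    return F.U' * X
end

# Synthetic scalar measurement: two sinusoids give a Hankel matrix of rank 4.
t = 0.05 .* (0:399)
x = sin.(t) .+ 0.5 .* cos.(3 .* t)
H = [x[i + j - 1] for i in 1:50, j in 1:(length(x) - 49)]   # 50 × 351

Z = sketch_reduce_dimension(H, 4)    # 4 × 351 reduced coordinates
F = sketch_truncated_svd(H, 4)
relative_error = norm(F.U * Diagonal(F.S) * F.V' - H) / norm(H)
```

With this choice of projection, `Z` equals `Diagonal(F.S) * F.V'`, so the reduced coordinates are the right singular vectors scaled by their singular values. Those rows play the role of the eigen-time-delay coordinates in HAVOK-style analyses.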

References

  • Brunton et al. (2017) "Chaos as an intermittently forced linear system" https://doi.org/10.1038/s41467-017-00030-8
  • Arbabi & Mezić (2017) "Ergodic theory, dynamic mode decomposition, and computation of spectral properties of the Koopman operator"

Testing

All existing tests pass, and new tests cover:

  • Hankel matrix construction and structure
  • Delay embedding dimensions and correctness
  • Truncated SVD accuracy
  • Dimension reduction behavior
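
As an illustration of the kind of property such tests can check, the sketch below uses only the `Test` and `LinearAlgebra` standard libraries. It is not taken from the PR's test suite. It builds its own small matrices and verifies the Hankel anti-diagonal structure and the Eckart-Young error identities for a rank-`r` truncation.

```julia
using LinearAlgebra, Test

# Small deterministic inputs, built inline so the example is self-contained.
x = collect(1.0:12.0)
H = [x[i + j - 1] for i in 1:4, j in 1:9]      # 4 × 9 Hankel matrix
A = [sin(i * j) for i in 1:6, j in 1:8]        # generic 6 × 8 test matrix

F = svd(A)
r = 3
A_r = F.U[:, 1:r] * Diagonal(F.S[1:r]) * F.V[:, 1:r]'   # rank-r truncation

@testset "illustrative property checks" begin
    # Hankel structure: entries are constant along each anti-diagonal.
    @test all(H[i, j] == H[i + 1, j - 1] for i in 1:3, j in 2:9)

    # Eckart-Young: the truncation error is determined by the discarded
    # singular values, in both the Frobenius and the spectral norm.
    @test norm(A - A_r) ≈ sqrt(sum(abs2, F.S[(r + 1):end]))
    @test opnorm(A - A_r) ≈ F.S[r + 1]
end
```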

Fixes #332

cc @ChrisRackauckas

🤖 Generated with Claude Code

claude added 2 commits January 2, 2026 05:31
This PR implements the feature requested in SciML#332, adding support for:

1. **Delay embedding** (`delay_embedding`): Create time-delay coordinates from data,
   useful for Takens' embedding theorem and HAVOK analysis

2. **Hankel matrix construction** (`hankel_matrix`): Build Hankel matrices from
   1D time series, enabling Hankel DMD and related methods

3. **Dimensionality reduction** (`reduce_dimension`, `truncated_svd`): Project data
   onto principal components with automatic or manual rank selection

These utilities enable HAVOK-style workflows where:
- Time series data is embedded via delay coordinates or Hankel matrices
- SVD is used to identify dominant modes
- Sparse regression discovers dynamics in the reduced space

References:
- Brunton et al. (2017) "Chaos as an intermittently forced linear system"
- Arbabi & Mezic (2017) "Ergodic theory, dynamic mode decomposition,
  and computation of spectral properties of the Koopman operator"

Fixes SciML#332

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Automating the SINDY workflow with Dataset Augmentation and Reduction