Skip to content

Commit 63e31a2

Browse files
pavelkomarovclaude
andcommitted
JOSS paper third pass: address reviewer feedback
- Single-line paragraphs throughout - "implements" not "organizes" - Remove "time" as dimension name; use "sample spacing", "sample interval", etc. - Recapitulate optimization framework in Statement of Need - Fix derivative pkg description (drop ambiguous "missing data handling") - "Updated package design" replaces "The revision moved away from..." - Expand missing data and variable step into separate paragraphs - Emphasize both as new additions for practicality - Huber loss leads the robustness paragraph; piecewise smoothness is secondary - Remove `pip install pynumdiff[advanced]` sentence - "scales linearly" replaces "runs in time linear" - Deprecated interface sentence qualified "pending removal" - Comma/semicolon review throughout Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent b86064d commit 63e31a2

1 file changed

Lines changed: 26 additions & 118 deletions

File tree

paper/paper.md

Lines changed: 26 additions & 118 deletions
Original file line numberDiff line numberDiff line change
@@ -28,157 +28,65 @@ bibliography: paper.bib
2828

2929
# Summary
3030

31-
Computing derivatives from measured data is a foundational requirement across science and
32-
engineering. Whether identifying governing equations from experimental observations, designing
33-
control laws, or processing sensor streams, researchers routinely need derivative estimates
34-
from discrete, noisy measurements. The textbook approach — finite differencing — amplifies
35-
noise proportionally to $1/\Delta t$, making it unreliable for real data. Smoothing before
36-
differencing helps, but the choice of algorithm and its tuning parameters substantially
37-
affect the result, and no single choice is universally best.
38-
39-
PyNumDiff is an open-source Python package that consolidates a broad suite of numerical
40-
differentiation methods for noisy data under a unified API. Seven families of algorithms are
41-
organized: (1) prefiltering followed by finite difference calculation; (2) iterated finite
42-
differencing; (3) polynomial fitting; (4) basis function fitting; (5) total variation
43-
regularization; (6) Kalman smoothing; and (7) local approximation with linear models. All
44-
methods return a smoothed signal estimate and a derivative estimate as a matched pair
45-
`(x_hat, dxdt_hat)`. A companion paper [@komarov2025] provides a comprehensive theoretical
46-
taxonomy of these methods, benchmarks their relative performance, and guides method selection
47-
for different application scenarios.
31+
Computing derivatives from measured data is a foundational requirement across science and engineering. Whether identifying governing equations from experimental observations, designing control laws, or processing sensor streams, researchers routinely need derivative estimates from discrete, noisy measurements. The textbook approach — finite differencing — amplifies noise proportionally to $1/\Delta t$, making it unreliable for real data. Smoothing before differencing helps, but the choice of algorithm and its tuning parameters substantially affect the result, and no single choice is universally best.
32+
33+
PyNumDiff is an open-source Python package that consolidates a broad suite of numerical differentiation methods for noisy data under a unified API. Seven families of algorithms are implemented: (1) prefiltering followed by finite difference calculation; (2) iterated finite differencing; (3) polynomial fitting; (4) basis function fitting; (5) total variation regularization; (6) Kalman smoothing; and (7) local approximation with linear models. All methods return a smoothed signal estimate and a derivative estimate as a matched pair `(x_hat, dxdt_hat)`. A companion paper [@komarov2025] provides a comprehensive theoretical taxonomy of these methods, benchmarks their relative performance, and guides method selection for different application scenarios.
4834

4935

5036
# Statement of Need
5137

52-
Estimating derivatives from noisy measurements arises throughout experimental science,
53-
data-driven modeling, system identification, and control. The field has produced a diverse
54-
ecosystem of specialized algorithms, each with different strengths regarding outlier
55-
robustness, computational cost, handling of irregular time steps, or treatment of missing
56-
data. Without a consolidated library, practitioners either default to the nearest available
57-
tool or implement bespoke solutions that are difficult to compare or reproduce.
38+
Estimating derivatives from noisy measurements arises throughout experimental science, data-driven modeling, system identification, and control. The field has produced a diverse ecosystem of specialized algorithms, each with different strengths regarding outlier robustness, computational cost, handling of irregular sample spacing, or treatment of missing observations. Without a consolidated library, practitioners either default to the nearest available tool or implement bespoke solutions that are difficult to compare or reproduce.
5839

59-
PyNumDiff addresses this gap. Its unified interface lets users compare methods on the same
60-
data, exploit specialized capabilities, and select hyperparameters without requiring
61-
ground-truth derivatives. The package is particularly valuable in workflows that use
62-
derivatives as regression targets: Sparse Identification of Nonlinear Dynamics (SINDy)
63-
[@brunton2016discovering], for example, learns governing equations by regressing measured
64-
derivatives, making reliable derivative estimates a prerequisite for accurate model
65-
identification. More broadly, any system identification or control design pipeline that
66-
begins by estimating rates of change from sensor data is a natural user of PyNumDiff.
40+
PyNumDiff addresses this gap. Its unified interface lets users compare methods on the same data, exploit specialized capabilities, and select hyperparameters without requiring ground-truth derivatives. Derivative estimation inherently trades off fidelity to the data against smoothness of the result. PyNumDiff frames this as a multi-objective optimization problem: find hyperparameter settings that minimize a weighted combination of data fidelity (how closely the smoothed signal matches the noisy measurements) and derivative roughness (the total variation of the estimated derivative). A single weight `tvgamma` interpolates between these objectives; when ground-truth derivatives are available, they can be used as a direct optimization target instead. This framework is universal across methods, enabling principled comparison and selection. The package is particularly valuable in workflows that use derivatives as regression targets: Sparse Identification of Nonlinear Dynamics (SINDy) [@brunton2016discovering], for example, learns governing equations by regressing measured derivatives, making reliable derivative estimates a prerequisite for accurate model identification.
6741

6842

6943
# State of the Field
7044

71-
Relevant Python tools exist but none covers the full scope of PyNumDiff. `numpy.gradient`
72-
and `scipy.signal.savgol_filter` [@virtanen2020scipy] are widely available but address only
73-
a narrow slice of the method space. The `findiff` package provides high-order finite
74-
difference stencils for clean simulation data rather than noisy measurements. The standalone
75-
`TVRegDiff` code [@chartrand2011numerical] implements a single total variation regularization
76-
method; PyNumDiff includes and extends this. The `derivative` package [@derivative_pkg]
77-
implements several of the same methods, but without multidimensional support, missing data
78-
handling, or a hyperparameter optimization framework. No existing Python package combines
79-
the breadth of PyNumDiff's seven method families with a consistent API and the full set of
80-
capabilities described below.
45+
Relevant Python tools exist but none covers the full scope of PyNumDiff. `numpy.gradient` and `scipy.signal.savgol_filter` [@virtanen2020scipy] are widely available but address only a narrow slice of the method space. The `findiff` package provides high-order finite difference stencils suited to clean simulation data rather than noisy measurements. The standalone `TVRegDiff` code [@chartrand2011numerical] implements a single total variation regularization method; PyNumDiff includes and extends this. The `derivative` package [@derivative_pkg] implements several of the same methods but without multidimensional support, principled hyperparameter selection, or the broader set of capabilities described below. No existing Python package combines the breadth of PyNumDiff's seven method families with a consistent API and the practical features this version introduces.
8146

82-
The original PyNumDiff publication [@vanBreugel2022] introduced the core concept and a
83-
multi-objective optimization framework for hyperparameter selection. The present version
84-
represents a substantial revision of the codebase: methods were reorganized into a cleaner
85-
taxonomy, corrected, and extended; the interface was unified; test coverage was dramatically
86-
improved; and several new capabilities were added throughout.
47+
The original PyNumDiff publication [@vanBreugel2022] introduced the core set of methods and the multi-objective optimization framework described above. The present version represents a substantial revision: methods were reorganized into a cleaner taxonomy, corrected, and extended; the interface was unified and made more explicit; test coverage was expanded; and several capabilities — multidimensional support, irregular sample spacing, missing data, and circular domains — were added throughout to make the package more practical and widely applicable.
8748

8849

8950
# Software Design
9051

91-
**Unified API.** All differentiation methods share the call signature
52+
**Updated package design.** All differentiation methods share the call signature
9253

9354
```python
9455
x_hat, dxdt_hat = method(x, dt_or_t, **params)
9556
```
9657

97-
where `x` is a NumPy array [@harris2020array] of measurements, `dt_or_t` is either a scalar
98-
step size or an array of sample locations (for variable step size), and keyword arguments
99-
configure the method. The two return values always match the shape of `x`. The revision
100-
moved away from positional parameter lists toward explicit keyword arguments throughout,
101-
making calls self-documenting and eliminating a common source of user error. Deprecated
102-
positional interfaces are retained with warnings.
103-
104-
**Multidimensional support and interface consistency.** An `axis` parameter controls which
105-
dimension is differentiated, allowing all non-deprecated methods to operate on blocks of
106-
time series simultaneously without reshaping or looping in user code. The implementation
107-
iterates over all non-time axes using `np.ndindex`, applying the algorithm to each vector
108-
independently.
109-
110-
**Missing data and variable step size.** Several methods handle NaN-valued entries as
111-
missing observations and non-uniform sample spacing: `splinediff`, `polydiff`, `rbfdiff`,
112-
`rtsdiff`, and `robustdiff`. This supports realistic experimental scenarios where sensors
113-
drop samples or operate at irregular rates. For the Kalman-based methods, variable step size
114-
is handled correctly by computing the discrete-time transition matrix via matrix exponential
115-
at each actual time increment, rather than using a fixed $\Delta t$ extracted from the first
116-
two samples — a subtle but important distinction for data with irregular spacing.
117-
118-
**Outlier robustness.** `robustdiff` solves a convex optimization problem that replaces the
119-
least-squares Kalman cost with Huber loss terms on both measurement residuals and process
120-
residuals, following the robust smoothing framework of @aravkin2013 and using `cvxpy`
121-
[@diamond2016cvxpy] as the optimization backend. The problem is formulated as a sparse
122-
system so that it runs in time linear in the number of samples, making it practical for
123-
long time series. `tvrdiff` similarly promotes piecewise-smooth solutions by penalizing
124-
the total variation of the derivative, and the data fidelity term can be replaced with a
125-
Huber loss for additional robustness to outliers. Both methods are available via
126-
`pip install pynumdiff[advanced]`.
127-
128-
**Circular and wrapped domains.** `rtsdiff` accepts a `circular=True` flag for quantities
129-
like angles that live on a periodic domain. Innovation residuals are wrapped to
130-
$[-\pi, \pi]$ before each Kalman update step via an `innovation_fn` hook on the underlying
131-
`kalman_filter` primitive, and `x_hat` is returned wrapped to the same range. This avoids
132-
the erroneous large-magnitude spikes that naive smoothers produce when a signal crosses the
133-
$\pm\pi$ boundary.
134-
135-
**Hyperparameter optimization.** Every method has tuning parameters, so PyNumDiff provides
136-
a multi-objective optimization framework in `pynumdiff.optimize` that minimizes a weighted
137-
combination of a smoothness penalty and a data fidelity term [@vanBreugel2020numerical].
138-
When ground-truth derivatives are available, derivative error is minimized directly.
139-
The smoothness weight `tvgamma` can be initialized from the signal's estimated cutoff
140-
frequency $f_c$ via the empirical formula
58+
where `x` is a NumPy array [@harris2020array] of measurements; `dt_or_t` is either a scalar step size or an array of sample locations; and keyword arguments configure the method. The two return values always match the shape of `x`. The updated design favors explicit keyword arguments throughout, making calls self-documenting and eliminating a common source of user error. Prior positional call signatures are preserved with deprecation warnings pending removal in a future release.
59+
60+
**Multidimensional support.** An `axis` parameter controls which dimension is differentiated, allowing all non-deprecated methods to operate on blocks of data simultaneously without reshaping or looping in user code. The implementation iterates over the remaining axes using `np.ndindex`, applying the algorithm to each vector independently.
61+
62+
**Variable sample spacing.** `splinediff`, `polydiff`, `rbfdiff`, `rtsdiff`, and `robustdiff` accept an array of sample locations in place of a scalar step size, enabling differentiation of irregularly sampled data. For the Kalman-based methods, this is handled correctly by computing the discrete-time transition matrix via matrix exponential at each actual sample interval rather than using a fixed $\Delta t$ — a subtle but important distinction whose absence can silently corrupt derivative estimates.
63+
64+
**Missing data.** `splinediff`, `polydiff`, `rtsdiff`, and `robustdiff` treat NaN-valued entries as missing observations, excluding them from fitting and imputing estimates via the model. This supports sensors that occasionally drop samples without requiring the user to preprocess or interpolate the input.
65+
66+
**Outlier robustness.** `robustdiff` replaces the quadratic Kalman cost with Huber loss terms on both measurement and process residuals, following the robust smoothing framework of @aravkin2013 and using `cvxpy` [@diamond2016cvxpy] as the optimization backend. The problem is formulated as a sparse system, so it scales linearly with the number of samples. `tvrdiff` similarly replaces the quadratic data fidelity term with a Huber loss when outliers are present; its total variation penalty on the derivative additionally promotes piecewise-smooth solutions appropriate for signals with abrupt transitions.
67+
68+
**Circular and wrapped domains.** `rtsdiff` accepts a `circular=True` flag for quantities like angles that live on a periodic domain. Innovation residuals are wrapped to $[-\pi, \pi]$ before each Kalman update step via an `innovation_fn` hook on the underlying `kalman_filter` primitive, and `x_hat` is returned wrapped to the same range. This avoids the erroneous large-magnitude spikes that naive smoothers produce when a signal crosses the $\pm\pi$ boundary.
69+
70+
**Hyperparameter optimization.** Every method has tuning parameters, so PyNumDiff provides a multi-objective optimization framework in `pynumdiff.optimize` that minimizes the weighted combination described in the Statement of Need [@vanBreugel2020numerical]. The smoothness weight `tvgamma` can be initialized from the signal's estimated cutoff frequency $f_c$ via the empirical formula
14171
$$\texttt{tvgamma} = \exp(-1.6\ln f_c - 0.71\ln \Delta t - 5.1).$$
142-
The optimization was substantially improved in the current version: intermediate evaluations
143-
are cached to avoid redundant function calls, the loss function was made robust to outliers
144-
via Huber penalty, and the Kalman parameter space was reduced from two independent noise
145-
variances to their log-ratio, which is the quantity the result actually depends on
146-
[@komarov2025].
72+
The optimization was substantially improved in the current version: intermediate evaluations are cached to avoid redundant function calls; the loss function is robustified via Huber penalty so that outliers do not bias hyperparameter selection; and the Kalman parameter space was reduced from two independent noise variances to their log-ratio, which is the quantity the result actually depends on [@komarov2025].
14773

148-
**Testing and continuous integration.** The test suite validates all methods against a set
149-
of analytic test functions with known derivatives, checking both noiseless and noisy cases
150-
across the full dynamic range of expected accuracy. Tests are run automatically on every
151-
push and pull request via GitHub Actions, with coverage tracked via Coveralls. Care was
152-
taken to avoid tautological tests in which the implementation directly determines the
153-
expected result.
74+
**Testing and continuous integration.** The test suite validates all methods against analytic test functions with known derivatives, checking both noiseless and noisy cases across the full expected accuracy range. Care was taken to avoid tautological tests in which the implementation directly determines the expected result. Tests run automatically on every push and pull request via GitHub Actions, with line coverage tracked via Coveralls.
15475

15576

15677
# Research Impact
15778

158-
PyNumDiff has been applied in experimental biology to estimate flight kinematics from motion
159-
capture, in control engineering for observer design, and in data-driven dynamics
160-
identification [@vanBreugel2022]. The package is available on PyPI
161-
(`pip install pynumdiff`) and documented at
162-
[pynumdiff.readthedocs.io](https://pynumdiff.readthedocs.io/master/). The companion Taxonomy
163-
paper [@komarov2025], submitted to the Journal of Computational Physics, provides the
164-
theoretical analysis motivating the method collection and benchmarks all included methods.
79+
PyNumDiff has been applied in experimental biology to estimate flight kinematics from motion capture, in control engineering for observer design, and in data-driven dynamics identification [@vanBreugel2022]. The package is available on PyPI (`pip install pynumdiff`) and documented at [pynumdiff.readthedocs.io](https://pynumdiff.readthedocs.io/master/). The companion Taxonomy paper [@komarov2025], submitted to the Journal of Computational Physics, provides the theoretical analysis motivating the method collection and benchmarks all included methods.
16580

16681

16782
# AI Usage Disclosure
16883

169-
The draft of this paper was prepared with assistance from Claude Sonnet 4.6 (Anthropic),
170-
integrating material from the repository, documentation, release history, and related
171-
publications. The authors reviewed and edited all content and take full responsibility for
172-
its accuracy.
84+
The draft of this paper was prepared with assistance from Claude Sonnet 4.6 (Anthropic), integrating material from the repository, documentation, release history, and related publications. The authors reviewed and edited all content and take full responsibility for its accuracy.
17385

17486

17587
# Acknowledgements
17688

177-
The authors thank Yuying Liu and Bingni W. Brunton for their contributions to the original
178-
PyNumDiff package [@vanBreugel2022], and Sasha Aravkin for discussions on convex
179-
optimization techniques that informed the robust differentiation methods. This work was
180-
supported by the National Science Foundation AI Institute in Dynamic Systems (grant
181-
number 2112085).
89+
The authors thank Yuying Liu and Bingni W. Brunton for their contributions to the original PyNumDiff package [@vanBreugel2022], and Sasha Aravkin for discussions on convex optimization techniques that informed the robust differentiation methods. This work was supported by the National Science Foundation AI Institute in Dynamic Systems (grant number 2112085).
18290

18391

18492
# References

0 commit comments

Comments
 (0)