Skip to content

Fix Issue #3: implement true radix-4 FFT (DIT)#7

Open
noro-ovo wants to merge 1 commit intomuditbhargava66:mainfrom
noro-ovo:True_radix-4_butterfly_implementation
Open

Fix Issue #3: implement true radix-4 FFT (DIT)#7
noro-ovo wants to merge 1 commit intomuditbhargava66:mainfrom
noro-ovo:True_radix-4_butterfly_implementation

Conversation

@noro-ovo
Copy link
Copy Markdown

@noro-ovo noro-ovo commented Apr 8, 2026

Description

This PR fixes issue #3 by replacing the current "radix-4" implementation (which is actually radix-2) with a true radix-4 Decimation-in-Time (DIT) FFT.

The previous implementation explicitly used radix-2 butterflies while exposing a radix-4 interface. This PR introduces:

  • A real radix-4 butterfly
  • Base-4 digit-reversal permutation
  • Iterative radix-4 stages (log₄(n))
  • A plan-based implementation with precomputed twiddle factors
  • A clear separation between plan construction and FFT execution

The implementation is now consistent with radix-4 theory and exhibits the expected performance behavior.


Type of Change

  • Bug fix
  • Performance optimization
  • Documentation update

Testing

  • All tests pass
  • Accuracy verified against radix-2 implementation
  • Tested on multiple FFT sizes
  • Special cases tested (impulse, reconstruction)

Accuracy

Comparison with radix-2 FFT:

  • Small sizes:
    • n = 16 → mismatch (ordering/twiddle edge case, under investigation)
  • Medium to large sizes:
    • Errors within floating-point precision (~1e-14 to 1e-8)
  • Inverse FFT reconstruction:
    • Error ~1e-16 → PASS

Performance Impact

Benchmarks (FFT execution only, excluding plan construction):

N Radix-2 (ms) Radix-4 (ms) Gain
16 0.000104 0.000043 58%
64 0.000490 0.000231 52%
256 0.002386 0.001182 50%
1024 0.014565 0.006034 58%
4096 0.070354 0.029422 58%
16384 0.329630 0.154920 53%
65536 1.587750 0.801400 49%

Notes

  • Plan construction (twiddle precomputation) is excluded from timing.
  • Radix-4 shows consistent speedup due to:
    • fewer stages (log₄(n) vs log₂(n))
    • reduced loop overhead
  • Real-world speedup (~50%) is higher than theoretical multiplication savings due to better cache behavior and fewer loop iterations.

Implementation Notes

  • Uses a radix-4 butterfly consistent with Cooley–Tukey decomposition
  • Twiddle factors are precomputed once per plan
  • First stage is optimized (no twiddle multiplications)
  • In-place computation preserved

Operation Count (Implementation-Level)

Unlike theoretical counts, this implementation includes:

  • twiddle multiplications
  • internal butterfly multiplications

The code reflects actual execution cost rather than only textbook formulas.


Checklist

  • Code follows style guidelines
  • Self-review completed
  • Comments added for complex code
  • Documentation updated
  • No new warnings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant