Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions doc/source/fft.rst
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,8 @@ Radix 2 transforms
space, but their value is irrelevant. The value of ``trunc`` must be
divisible by 2.

This is a truncated FFT variant in the sense of [vdH2004]_.

.. function:: void fft_truncate1(mp_limb_t ** ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t ** t1, mp_limb_t ** t2, mp_size_t trunc)

As for ``fft_radix2`` except that only the first ``trunc``
Expand Down Expand Up @@ -267,6 +269,8 @@ Radix 2 transforms
up to ``trunc`` being careful to note that this involves doubling the
coefficients from ``trunc - n`` up to ``n``.

This inverse truncated transform follows the approach of [vdH2004]_.

.. function:: void ifft_truncate1(mp_limb_t ** ii, mp_size_t n, flint_bitcnt_t w, mp_limb_t ** t1, mp_limb_t ** t2, mp_size_t trunc)

Computes the first ``trunc`` coefficients of the radix 2 inverse
Expand Down
4 changes: 4 additions & 0 deletions doc/source/fft_small.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,9 @@
This module currently requires building FLINT with support for
AVX2 or NEON instructions.

The truncated FFT routines used internally by this module follow the
approach described by van der Hoeven in [vdH2004]_.

Integer multiplication
--------------------------------------------------------------------------------

Expand All @@ -19,6 +22,7 @@ Integer multiplication
.. function:: void mpn_ctx_init(mpn_ctx_t R, ulong p)

Initialize multiplication context object with initial prime ``p``.
Usually ``p`` is a constant provided by :func:`get_default_mpn_ctx`.

.. function:: void mpn_ctx_clear(mpn_ctx_t R)

Expand Down
4 changes: 2 additions & 2 deletions doc/source/machine_vectors.rst
Original file line number Diff line number Diff line change
Expand Up @@ -235,13 +235,13 @@ Other assumptions are not yet documented.
vec4d vec4d_reduce_pm1n_to_pmhn(vec4d a, vec4d n)
vec8d vec8d_reduce_pm1n_to_pmhn(vec8d a, vec8d n)

Return `a \bmod n` reduced to `[-n/2, n/2]` given given `a \in [-n,n]`.
Return `a \bmod n` reduced to `[-n/2, n/2]` given `a \in [-n,n]`.

.. function:: vec1d vec1d_reduce_2n_to_n(vec1d a, vec1d n)
vec4d vec4d_reduce_2n_to_n(vec4d a, vec4d n)
vec8d vec8d_reduce_2n_to_n(vec8d a, vec8d n)

Return `a \bmod n` reduced to `[0,n)` given given `a \in [0,2n)`.
Return `a \bmod n` reduced to `[0,n)` given `a \in [0,2n)`.

.. function:: vec1d vec1d_reduce_to_0n(vec1d a, vec1d n, vec1d ninv)
vec4d vec4d_reduce_to_0n(vec4d a, vec4d n, vec4d ninv)
Expand Down
2 changes: 2 additions & 0 deletions doc/source/references.rst
Original file line number Diff line number Diff line change
Expand Up @@ -357,6 +357,8 @@ References

.. [vdH1995] \J. van der Hoeven, "Automatic numerical expansions". Proc. of the conference Real numbers and computers (1995), 261-274. https://www.texmacs.org/joris/ane/ane-abs.html

.. [vdH2004] \J. van der Hoeven, "Truncated FFTs", ISSAC 2004 notes, https://www.texmacs.org/joris/issac04/issac04.html

.. [vdH2006] \J. van der Hoeven, "Computations with effective real numbers". Theoretical Computer Science, Volume 351, Issue 1, 14 February 2006, Pages 52-60. https://doi.org/10.1016/j.tcs.2005.09.060

All referenced works: [AbbottBronsteinMulders1999]_, [Apostol1997]_, [Ari2011]_, [Ari2012]_, [Arn2010]_, [Arn2012]_, [ArnoldMonagan2011]_,
Expand Down
55 changes: 34 additions & 21 deletions src/fft_small.h
Original file line number Diff line number Diff line change
Expand Up @@ -84,23 +84,8 @@ FLINT_FORCE_INLINE ulong n_trailing_zeros(ulong x) {
return x == 0 ? FLINT_BITS : flint_ctz(x);
}

/*
nbits is a mess

without assuming x != 0:
on x86 we want 64 - LZCNT
on arm we want 64 - CLZ

the problem is gcc decided that __builtin_clz is undefined on zero
input even though both instructions LZCNT and CLZ are defined

assuming x != 0:
on x86 we want BSR + 1
*/
FLINT_FORCE_INLINE ulong n_nbits(ulong x) {
if (x == 0)
return 0;
return FLINT_BITS - flint_clz(x);
return FLINT_BIT_COUNT(x);
}

FLINT_FORCE_INLINE ulong n_nbits_nz(ulong x) {
Expand All @@ -123,9 +108,30 @@ FLINT_FORCE_INLINE slong z_max(slong a, slong b) {return FLINT_MAX(a, b);}
int fft_small_mulmod_satisfies_bounds(ulong n);

/*
Let `e(theta) = exp(2*pi*i*theta)`, let `++` denotes concatenation, and `*`
denotes pointwise multiplication.
Consider an infinite sequence `w` of complex numbers defined as follows.

w_0 = {1}
w_i = w_{i-1} ++ e(2^-i) * w_{i-1}

For example

w_1 = {1, -1}
w_2 = {1, -1, i, -i}
w_3 = {1, -1, i, -i, e(1/8), e(5/8), e(3/8), e(7/8)}

Note that the `2i+1`-th element of `w` is the negation of the `2i`-th element,
so we only need to store every other element of `w`. Specific elements of `w`
can be directly accessed with `sd_fft_ctx_w`.

In actual implementation, we work modulo a prime `p` such that `p-1` has
high, but finite, 2-valuation. Most of the calculation works there as well,
except the sequence `w` cannot be infinite.

The twiddle factors are split across SD_FFT_CTX_W2TAB_SIZE tables:

[0] = {e(1)} original index 0
[0] = {e(0)} original index 0
[1] = {e(1/4)} original index 1
[2] = {e(1/8), e(3/8)} original index 2,3
[3] = {e(1/16), e(5/16), e(3/16), e(7/16)} original index 4,5,6,7
Expand Down Expand Up @@ -177,9 +183,13 @@ do { \
#define SD_FFT_CTX_W2TAB_INIT 12
#define SD_FFT_CTX_W2TAB_SIZE 40

/* This context is the one expected to sit in a global position */
/*
This context is the one expected to sit in a global position.
One container of this context is the thread_local storage for mpn_ctx_t.
See sd_fft_ctx_init_prime for the meaning of each field.
*/
typedef struct {
double p;
double p; /* the FFT prime */
double pinv;
nmod_t mod;
ulong primitive_root;
Expand All @@ -189,6 +199,8 @@ typedef struct {
unsigned int w2tab_depth;
#endif
double* w2tab[SD_FFT_CTX_W2TAB_SIZE];
/* see above for the layout of w2tab. Each entry is an integer
in the range [-p/2, p/2]. */
#if FLINT_USES_PTHREAD
pthread_mutex_t mutex;
#endif
Expand All @@ -214,7 +226,7 @@ FLINT_FORCE_INLINE double* sd_fft_ctx_blk_index(double* d, ulong I)
/*
location of the bit-reversed eval:
with out_data = fft(in_data) of length 2^L, then
eval_poly(in_data, sd_fft_ctx_w(, i)) = out_data[sd_fft_ctx_trunc_index(L, i)]
eval_poly(in_data, sd_fft_ctx_w(Q, i)) = out_data[sd_fft_ctx_trunc_index(L, i)]
*/
FLINT_FORCE_INLINE ulong sd_fft_ctx_trunc_index(ulong L, ulong i)
{
Expand Down Expand Up @@ -257,6 +269,7 @@ void sd_fft_ctx_point_sqr(const sd_fft_ctx_t Q,
w = {e(0), e(1/2), e(1/4), e(3/4), e(1/8), e(5/8), e(3/8), e(7/8), ...}
Only the terms of even index are explicitly stored, and they are split
among several tables.
The recursive definition of the infinite sequence `w` can be found above.
*/

/* look up w[2*j] */
Expand All @@ -275,7 +288,7 @@ FLINT_FORCE_INLINE double sd_fft_ctx_w2inv(const sd_fft_ctx_t Q, ulong j)
return (j == 0) ? -1.0 : Q->w2tab[j_bits][j_mr];
}

/* look up w[j] */
/* look up w[j]. Definition of w above */
FLINT_FORCE_INLINE double sd_fft_ctx_w(const sd_fft_ctx_t Q, ulong j)
{
double r = sd_fft_ctx_w2(Q, j/2);
Expand Down
9 changes: 5 additions & 4 deletions src/fft_small/mpn_mul.c
Original file line number Diff line number Diff line change
Expand Up @@ -896,6 +896,11 @@ DEFINE_IT(7, 6, 5)
DEFINE_IT(8, 7, 6)
#undef DEFINE_IT

/*
Specialized helper function, currently only called from mpn_ctx_init.
Assume p is odd and p - 1 has high 2-valuation, return some number q
(not necessarily prime) less than p such that q - 1 has high 2-valuation.
*/
static ulong next_fft_number(ulong p)
{
ulong bits, l, q;
Expand Down Expand Up @@ -969,10 +974,6 @@ static void fill_vec_two_pow_tab(
flint_aligned_free(ps);
}





void mpn_ctx_init(mpn_ctx_t R, ulong p)
{
ulong i;
Expand Down
9 changes: 9 additions & 0 deletions src/fft_small/sd_fft.c
Original file line number Diff line number Diff line change
Expand Up @@ -1010,6 +1010,15 @@ static void sd_fft_trunc_internal(

/********************* interface functions ***********************/

/*
compute a truncated FFT inline in the array `d`, assuming terms beyond `itrunc`
first terms are zero. The output satisfies
eval_poly(in_data, sd_fft_ctx_w(Q, i)) = out_data[sd_fft_ctx_trunc_index(L, i)]
Usually it only make sense to have `otrunc >= itrunc`.
The array `d` need to have size at least the smallest power of 2 larger than or equal
to `n_max(itrunc, otrunc)`. See [vdH2004]_.
This function is tested in `test/t-sd_fft.c`.
*/
void sd_fft_trunc(
sd_fft_ctx_t Q,
double* d,
Expand Down
13 changes: 12 additions & 1 deletion src/fft_small/sd_fft_ctx.c
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,12 @@ void sd_fft_ctx_clear(sd_fft_ctx_t Q)
#endif
}

/*
Initialize FFT context.
pp is a prime with at most ~ 50 bits (exactly representable with a `double`)
such that pp - 1 has sufficiently high 2-valuation.
Used in sd_fft_trunc, sd_ifft_trunc, sd_fft_ctx_point_mul, etc.
*/
void sd_fft_ctx_init_prime(sd_fft_ctx_t Q, ulong pp)
{
ulong N, i, k, l;
Expand All @@ -47,7 +53,12 @@ void sd_fft_ctx_init_prime(sd_fft_ctx_t Q, ulong pp)
2^(SD_FFT_CTX_W2TAB_INIT-1) entries: 1, e(1/4), e(1/8), e(3/8), ...

Q->w2tab[j] is itself a table of length 2^(j-1) containing 2^(j+1) st
roots of unity.
roots of unity. More documentation on the layout of w2tab can be found
before the definition of SD_FFT_CTX_W2TAB_SIZE.

All entries in w2tab are exactly-representable integers modulo pp, but
they're stored as `double` to make use of the vectorized functions in
machine_vectors.h.
*/
N = n_pow2(SD_FFT_CTX_W2TAB_INIT - 1);
t = (double*) flint_aligned_alloc(4096, n_round_up(N*sizeof(double), 4096));
Expand Down
5 changes: 5 additions & 0 deletions src/fft_small/sd_ifft.c
Original file line number Diff line number Diff line change
Expand Up @@ -1497,6 +1497,11 @@ static void sd_ifft_trunc_internal(

/********************* interface functions ***********************/

/*
Truncated inverse Fourier transform, inverse of sd_fft_trunc, assume the entries
past `trunc` first entries of the _output_ are zero. See [vdH2004]_.
The array `d` need to have size at least `n_pow2(L)`.
*/
void sd_ifft_trunc(
sd_fft_ctx_t Q,
double* d,
Expand Down
Loading