Skip to content

Commit 65198c1

Browse files
Merge pull request #2658 from fredrik-johansson/compose
Implement Kinoshita-Li power series composition
2 parents 4a0ffb0 + 689eba5 commit 65198c1

18 files changed

Lines changed: 1125 additions & 118 deletions

doc/source/fmpq_poly.rst

Lines changed: 27 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1516,8 +1516,6 @@ Power series composition
15161516
of the inputs and the output.
15171517

15181518
This implementation uses Brent-Kung algorithm 2.1 [BrentKung1978]_.
1519-
The default ``fmpz_poly`` composition algorithm is automatically
1520-
used when the composition can be performed over the integers.
15211519

15221520
.. function:: void fmpq_poly_compose_series_brent_kung(fmpq_poly_t res, const fmpq_poly_t poly1, const fmpq_poly_t poly2, slong n)
15231521

@@ -1526,8 +1524,25 @@ Power series composition
15261524
to be zero.
15271525

15281526
This implementation uses Brent-Kung algorithm 2.1 [BrentKung1978]_.
1529-
The default ``fmpz_poly`` composition algorithm is automatically
1530-
used when the composition can be performed over the integers.
1527+
1528+
.. function:: void _fmpq_poly_compose_series_kinoshita_li(fmpz * res, fmpz_t den, const fmpz * poly1, const fmpz_t den1, slong len1, const fmpz * poly2, const fmpz_t den2, slong len2, slong n)
1529+
1530+
Sets ``(res, den, n)`` to the composition of
1531+
``(poly1, den1, len1)`` and ``(poly2, den2, len2)`` modulo `x^n`,
1532+
where the constant term of ``poly2`` is required to be zero.
1533+
1534+
Assumes that ``len1, len2, n > 0``, that ``len1, len2 <= n``,
1535+
and that ``res`` has space for ``n`` coefficients.
1536+
1537+
This implementation uses the Kinoshita-Li algorithm [KL2024]_.
1538+
1539+
.. function:: void fmpq_poly_compose_series_kinoshita_li(fmpq_poly_t res, const fmpq_poly_t poly1, const fmpq_poly_t poly2, slong n)
1540+
1541+
Sets ``res`` to the composition of ``poly1`` and ``poly2``
1542+
modulo `x^n`, where the constant term of ``poly2`` is required
1543+
to be zero.
1544+
1545+
This implementation uses the Kinoshita-Li algorithm [KL2024]_.
15311546

15321547
.. function:: void _fmpq_poly_compose_series(fmpz * res, fmpz_t den, const fmpz * poly1, const fmpz_t den1, slong len1, const fmpz * poly2, const fmpz_t den2, slong len2, slong n)
15331548

@@ -1540,10 +1555,10 @@ Power series composition
15401555
space for ``n`` coefficients. Does not support aliasing between any
15411556
of the inputs and the output.
15421557

1543-
This implementation automatically switches between the Horner scheme
1544-
and Brent-Kung algorithm 2.1 depending on the size of the inputs.
1545-
The default ``fmpz_poly`` composition algorithm is automatically
1546-
used when the composition can be performed over the integers.
1558+
This implementation automatically switches between the Horner scheme,
1559+
Brent-Kung algorithm 2.1 and the Kinoshita-Li algorithm depending on the
1560+
size of the inputs. The default ``fmpz_poly`` composition algorithm is
1561+
automaticallyused when the composition can be performed over the integers.
15471562

15481563
.. function:: void fmpq_poly_compose_series(fmpq_poly_t res, const fmpq_poly_t poly1, const fmpq_poly_t poly2, slong n)
15491564

@@ -1641,7 +1656,8 @@ Power series reversion
16411656
the linear term is required to be nonzero. Assumes that `n > 0`.
16421657
Does not support aliasing between any of the inputs and the output.
16431658

1644-
This implementation defaults to using Newton iteration.
1659+
This implementation chooses between fast Lagrange inversion and
1660+
Newton iteration depending on the inputs.
16451661
The default ``fmpz_poly`` reversion algorithm is automatically
16461662
used when the reversion can be performed over the integers.
16471663

@@ -1651,7 +1667,8 @@ Power series reversion
16511667
The constant term of ``poly2`` is required to be zero and
16521668
the linear term is required to be nonzero.
16531669

1654-
This implementation defaults to using Newton iteration.
1670+
This implementation chooses between fast Lagrange inversion and
1671+
Newton iteration depending on the inputs.
16551672
The default ``fmpz_poly`` reversion algorithm is automatically
16561673
used when the reversion can be performed over the integers.
16571674

doc/source/gr_poly.rst

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -798,13 +798,16 @@ Power series composition and reversion
798798
int gr_poly_compose_series_brent_kung(gr_poly_t res, const gr_poly_t poly1, const gr_poly_t poly2, slong n, gr_ctx_t ctx)
799799
int _gr_poly_compose_series_divconquer(gr_ptr res, gr_srcptr poly1, slong len1, gr_srcptr poly2, slong len2, slong n, gr_ctx_t ctx)
800800
int gr_poly_compose_series_divconquer(gr_poly_t res, const gr_poly_t poly1, const gr_poly_t poly2, slong n, gr_ctx_t ctx)
801+
int _gr_poly_compose_series_kinoshita_li(gr_ptr res, gr_srcptr poly1, slong len1, gr_srcptr poly2, slong len2, slong n, gr_ctx_t ctx)
802+
int gr_poly_compose_series_kinoshita_li(gr_poly_t res, const gr_poly_t poly1, const gr_poly_t poly2, slong n, gr_ctx_t ctx)
801803
int _gr_poly_compose_series(gr_ptr res, gr_srcptr poly1, slong len1, gr_srcptr poly2, slong len2, slong n, gr_ctx_t ctx)
802804
int gr_poly_compose_series(gr_poly_t res, const gr_poly_t poly1, const gr_poly_t poly2, slong n, gr_ctx_t ctx)
803805

804806
Sets *res* to the power series composition `h(x) = f(g(x))` truncated
805807
to order `O(x^n)` where `f` is given by *poly1* and `g` is given by *poly2*,
806808
respectively using Horner's rule, the Brent-Kung baby step-giant step
807-
algorithm [BrentKung1978]_, divide-and-conquer, and an automatic choice between the algorithms.
809+
algorithm [BrentKung1978]_, divide-and-conquer, the quasilinear-complexity
810+
Kinoshita-Li algorithm [KL2024]_, and an automatic choice between the algorithms.
808811

809812
The default algorithm also handles short input and
810813
special-form input `g = ax^n` efficiently.

doc/source/references.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -217,6 +217,8 @@ References
217217
218218
.. [Kar1998] \E. A. Karatsuba, "Fast evaluation of the Hurwitz zeta function and Dirichlet L-series", Problems of Information Transmission 34:4 (1998), 342-353, http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=ppi&paperid=425&option_lang=eng
219219
220+
.. [KL2024] \Y. Kinoshita and B. Li, "Power series composition in near-linear time", 2024, https://arxiv.org/abs/2404.05177
221+
220222
.. [Knu1997] \Knuth, D. E. The Art of Computer Programming, volume 2: Seminumerical algorithms, 1997
221223
222224
.. [Kob2010] \A. Kobel, "Certified Complex Numerical Root Finding", Seminar on Computational Geometry and Geometric Computing (2010), http://www.mpi-inf.mpg.de/departments/d1/teaching/ss10/Seminar_CGGC/Slides/02_Kobel_NRS.pdf

src/fmpq_poly.h

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -683,6 +683,13 @@ void _fmpq_poly_compose_series_brent_kung(fmpz * res, fmpz_t den,
683683
void fmpq_poly_compose_series_brent_kung(fmpq_poly_t res,
684684
const fmpq_poly_t poly1, const fmpq_poly_t poly2, slong n);
685685

686+
void _fmpq_poly_compose_series_kinoshita_li(fmpz * res, fmpz_t den,
687+
const fmpz * poly1, const fmpz_t den1, slong len1,
688+
const fmpz * poly2, const fmpz_t den2, slong len2, slong n);
689+
690+
void fmpq_poly_compose_series_kinoshita_li(fmpq_poly_t res,
691+
const fmpq_poly_t poly1, const fmpq_poly_t poly2, slong n);
692+
686693
void _fmpq_poly_compose_series(fmpz * res, fmpz_t den,
687694
const fmpz * poly1, const fmpz_t den1, slong len1,
688695
const fmpz * poly2, const fmpz_t den2, slong len2, slong n);

src/fmpq_poly/compose_series.c

Lines changed: 13 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -19,12 +19,23 @@ _fmpq_poly_compose_series(fmpz * res, fmpz_t den, const fmpz * poly1,
1919
const fmpz_t den1, slong len1, const fmpz * poly2,
2020
const fmpz_t den2, slong len2, slong n)
2121
{
22-
if (len1 <= 20)
22+
if (fmpz_is_one(den2))
23+
{
24+
_fmpz_poly_compose_series(res, poly1, len1, poly2, len2, n);
25+
fmpz_set(den, den1);
26+
_fmpq_poly_canonicalise(res, den, n);
27+
return;
28+
}
29+
30+
if (FLINT_MIN(len1, n) < 8)
2331
_fmpq_poly_compose_series_horner(res, den, poly1, den1, len1,
2432
poly2, den2, len2, n);
25-
else
33+
else if (n < 250)
2634
_fmpq_poly_compose_series_brent_kung(res, den, poly1, den1, len1,
2735
poly2, den2, len2, n);
36+
else
37+
_fmpq_poly_compose_series_kinoshita_li(res, den, poly1, den1, len1,
38+
poly2, den2, len2, n);
2839
}
2940

3041
void

src/fmpq_poly/compose_series_brent_kung.c

Lines changed: 69 additions & 85 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
/*
22
Copyright (C) 2010 Sebastian Pancratz
3-
Copyright (C) 2011 Fredrik Johansson
3+
Copyright (C) 2011, 2026 Fredrik Johansson
44
55
This file is part of FLINT.
66
@@ -14,48 +14,19 @@
1414
#include "fmpz.h"
1515
#include "fmpz_vec.h"
1616
#include "fmpz_poly.h"
17-
#include "fmpq.h"
18-
#include "fmpq_mat.h"
17+
#include "fmpz_mat.h"
1918
#include "fmpq_poly.h"
2019

21-
static void
22-
_fmpq_mat_get_row(fmpz * rnum, fmpz_t den, fmpq_mat_t A, slong i)
23-
{
24-
slong j;
25-
fmpz_t t;
26-
fmpz_init(t);
27-
fmpz_one(den);
28-
29-
for (j = 0; j < fmpq_mat_ncols(A); j++)
30-
fmpz_lcm(den, den, fmpq_mat_entry_den(A, i, j));
31-
32-
for (j = 0; j < fmpq_mat_ncols(A); j++)
33-
{
34-
fmpz_divexact(t, den, fmpq_mat_entry_den(A, i, j));
35-
fmpz_mul(rnum + j, fmpq_mat_entry_num(A, i, j), t);
36-
}
37-
38-
fmpz_clear(t);
39-
}
40-
41-
4220
void
4321
_fmpq_poly_compose_series_brent_kung(fmpz * res, fmpz_t den, const fmpz * poly1,
4422
const fmpz_t den1, slong len1, const fmpz * poly2,
4523
const fmpz_t den2, slong len2, slong n)
4624
{
47-
fmpq_mat_t A, B, C;
48-
fmpz_t tden, uden, hden;
49-
fmpz *t, *u, *h, *swap;
50-
slong i, j, m;
51-
52-
if (fmpz_is_one(den2))
53-
{
54-
_fmpz_poly_compose_series(res, poly1, len1, poly2, len2, n);
55-
fmpz_set(den, den1);
56-
_fmpq_poly_canonicalise(res, den, n);
57-
return;
58-
}
25+
fmpz_mat_t A, B, C;
26+
fmpz *A_den, *t, *h;
27+
fmpz_t lcd, scale;
28+
fmpz_t C_den, tden, hden;
29+
slong i, m;
5930

6031
if (n == 1)
6132
{
@@ -67,85 +38,98 @@ _fmpq_poly_compose_series_brent_kung(fmpz * res, fmpz_t den, const fmpz * poly1,
6738

6839
m = n_sqrt(n) + 1;
6940

70-
fmpq_mat_init(A, m, n);
71-
fmpq_mat_init(B, m, m);
72-
fmpq_mat_init(C, m, n);
41+
fmpz_mat_init(A, m, n);
42+
fmpz_mat_init(B, m, m);
43+
fmpz_mat_init(C, m, n);
44+
A_den = _fmpz_vec_init(m);
7345

46+
fmpz_init(lcd);
47+
fmpz_init(scale);
48+
fmpz_init(C_den);
7449
fmpz_init(tden);
75-
fmpz_init(uden);
7650
fmpz_init(hden);
7751
h = _fmpz_vec_init(n);
7852
t = _fmpz_vec_init(n);
79-
u = _fmpz_vec_init(n);
8053

81-
/* Set rows of B to the segments of poly1 */
54+
/* Set rows of B to the segments of poly1 with common denominator den1. */
8255
for (i = 0; i < len1; i++)
56+
fmpz_set(fmpz_mat_entry(B, i / m, i % m), poly1 + i);
57+
/* Remark: if the poly is non-canonical e.g. due to being a truncation
58+
of a longer power series, it could be helpful to remove its content
59+
here. We could also consider removing content of B row by row. */
60+
61+
/* Set rows of A to the numerators of powers of poly2 with corresponding
62+
denominators in A_den[i]. */
63+
fmpz_one(fmpz_mat_entry(A, 0, 0));
64+
fmpz_one(A_den + 0);
65+
_fmpz_vec_set(fmpz_mat_row(A, 1), poly2, len2);
66+
fmpz_set(A_den + 1, den2);
67+
/* Optional: may improve performance if poly2 is non-canonical e.g. due
68+
to being a truncation of a longer power series. */
69+
_fmpq_poly_canonicalise(fmpz_mat_row(A, 1), A_den + 1, n);
70+
71+
for (i = 2; i < m; i++)
8372
{
84-
fmpz_set(fmpq_mat_entry_num(B, i / m, i % m), poly1 + i);
85-
fmpz_set(fmpq_mat_entry_den(B, i / m, i % m), den1);
86-
fmpq_canonicalise(fmpq_mat_entry(B, i / m, i % m));
73+
fmpz * Ai = fmpz_mat_row(A, i);
74+
fmpz * Ai1 = fmpz_mat_row(A, i - 1);
75+
fmpz * Ai_den = A_den + i;
76+
fmpz * Ai1_den = A_den + i - 1;
77+
78+
_fmpq_poly_mullow(Ai, Ai_den, Ai1, Ai1_den, n, poly2, den2, len2, n);
79+
_fmpq_poly_canonicalise(Ai, Ai_den, n);
8780
}
8881

89-
/* Set rows of A to powers of poly2 */
90-
fmpq_set_si(fmpq_mat_entry(A, 0, 0), WORD(1), WORD(1));
82+
/* Compute h = poly2 ^ m */
83+
_fmpq_poly_mullow(h, hden, fmpz_mat_row(A, m - 1), A_den + m - 1, n, poly2, den2, len2, n);
84+
_fmpq_poly_canonicalise(h, hden, n);
9185

92-
for (i = 0; i < len2; i++)
93-
{
94-
fmpz_set(fmpq_mat_entry_num(A, 1, i), poly2 + i);
95-
fmpz_set(fmpq_mat_entry_den(A, 1, i), den2);
96-
fmpq_canonicalise(fmpq_mat_entry(A, 1, i));
97-
}
86+
/* Matrix multiply C = B * A over the integers.
87+
B (m x m) has common denominator den1.
88+
A (m x n) has row denominator A_den[i].
9889
99-
_fmpz_vec_set(h, poly2, len2);
100-
fmpz_set(hden, den2);
90+
We could remove gcd of A columnwise before multiplying, but
91+
this is slower for small to moderate n where we want to use Brent-Kung
92+
and faster only for large n where we want to use Kinoshita-Li instead. */
93+
fmpz_one(lcd);
94+
for (i = 0; i < m; i++)
95+
fmpz_lcm(lcd, lcd, A_den + i);
10196

102-
for (i = 2; i < m; i++)
97+
for (i = 0; i < m; i++)
10398
{
104-
_fmpq_poly_mullow(t, tden, h, hden, n, poly2, den2, len2, n);
105-
_fmpq_poly_canonicalise(t, tden, n);
106-
107-
for (j = 0; j < n; j++)
108-
{
109-
fmpz_set(fmpq_mat_entry_num(A, i, j), t + j);
110-
fmpz_set(fmpq_mat_entry_den(A, i, j), tden);
111-
fmpq_canonicalise(fmpq_mat_entry(A, i, j));
112-
}
113-
swap = t; t = h; h = swap;
114-
fmpz_swap(hden, tden);
99+
fmpz_divexact(scale, lcd, A_den + i);
100+
if (!fmpz_is_one(scale))
101+
_fmpz_vec_scalar_mul_fmpz(fmpz_mat_row(A, i), fmpz_mat_row(A, i), n, scale);
115102
}
116103

117-
/* Compute h = poly2 ^ m */
118-
_fmpq_poly_mullow(t, tden, h, hden, n, poly2, den2, len2, n);
119-
_fmpq_poly_canonicalise(t, tden, n);
120-
swap = t; t = h; h = swap;
121-
fmpz_swap(hden, tden);
104+
fmpz_mat_mul(C, B, A);
105+
fmpz_mat_clear(A);
106+
_fmpz_vec_clear(A_den, m);
107+
fmpz_mat_clear(B);
122108

123-
/* Matrix multiply */
124-
fmpq_mat_mul(C, B, A);
125-
fmpq_mat_clear(A);
126-
fmpq_mat_clear(B);
109+
fmpz_mul(C_den, den1, lcd);
127110

111+
_fmpz_vec_set(res, fmpz_mat_row(C, m - 1), n);
112+
fmpz_set(den, C_den);
113+
_fmpq_poly_canonicalise(res, den, n);
128114
/* Evaluate block composition using the Horner scheme */
129-
_fmpq_mat_get_row(res, den, C, m - 1);
130-
131115
for (i = m - 2; i >= 0; i--)
132116
{
133117
_fmpq_poly_mullow(t, tden, res, den, n, h, hden, n, n);
134-
/* we could canonicalise t here, but it does not seem to make
135-
much of a difference */
136-
_fmpq_mat_get_row(u, uden, C, i);
137-
_fmpq_poly_add(res, den, t, tden, n, u, uden, n);
118+
/* Remark: we could canonicalise t and/or fmpz_mat_row(C, i)
119+
here; in practice this seems to be slower at least for moderate n. */
120+
_fmpq_poly_add(res, den, t, tden, n, fmpz_mat_row(C, i), C_den, n);
138121
}
139122

140123
_fmpq_poly_canonicalise(res, den, n);
141124

142-
fmpq_mat_clear(C);
125+
fmpz_mat_clear(C);
143126

144127
_fmpz_vec_clear(t, n);
145-
_fmpz_vec_clear(u, n);
146128
_fmpz_vec_clear(h, n);
129+
fmpz_clear(lcd);
130+
fmpz_clear(scale);
131+
fmpz_clear(C_den);
147132
fmpz_clear(tden);
148-
fmpz_clear(uden);
149133
fmpz_clear(hden);
150134
}
151135

src/fmpq_poly/compose_series_horner.c

Lines changed: 1 addition & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -20,13 +20,7 @@ _fmpq_poly_compose_series_horner(fmpz * res, fmpz_t den, const fmpz * poly1,
2020
const fmpz_t den1, slong len1, const fmpz * poly2,
2121
const fmpz_t den2, slong len2, slong n)
2222
{
23-
if (fmpz_is_one(den2))
24-
{
25-
_fmpz_poly_compose_series(res, poly1, len1, poly2, len2, n);
26-
fmpz_set(den, den1);
27-
_fmpq_poly_canonicalise(res, den, n);
28-
}
29-
else if (n == 1)
23+
if (n == 1)
3024
{
3125
fmpz_set(res, poly1);
3226
fmpz_set(den, den1);

0 commit comments

Comments
 (0)