
Profile for nmod_poly_interpolate and improvements of interpolation at points in geometric progression #2659

Merged: vneiger merged 13 commits into flintlib:main from vneiger:geometric_interpolate on May 4, 2026

@vneiger (Collaborator) commented on May 3, 2026:

This PR follows on from #2657 and focuses on improving interpolation. It

  • adds a profile file comparing the various nmod_poly_interpolate functions, for general points and for points in geometric progression,
  • simplifies part of the evaluation at a geometric progression, thanks to recent improvements of mulmid.

And, for interpolation at a geometric progression:

  • add comments to (understand and) explain what is going on
  • modify the main function so that the number of values to interpolate can be smaller than the number of points used to build the precomputed data. In an algorithm performing multiple evaluations/interpolations at various numbers of points, the precomputation can thus be built once and for all for the maximum number of points needed, and every interpolation with fewer points can reuse it (running as fast as with a smaller, ad hoc precomputation)
  • incidentally, these changes allow removing some data from the precomputation that had become redundant or unused in the current version (this speeds up the precomputation by about 20% and reduces its memory footprint by 40%)
  • extend the test file to cover cases where the precomputation is larger than the number of points
  • update the documentation [edit: done]
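To illustrate why a precomputation built for N points can serve any len <= N, here is a minimal C sketch. It is not FLINT's code and not its fast algorithm: it uses naive O(len^2) Newton divided differences at the points q^i, under the assumptions of a prime modulus p < 2^32 and q^k != 1 for 1 <= k < N. The names `geom_precomp`, `geom_precomp_init`, `geom_precomp_clear` and `geom_interpolate` are hypothetical. The point it shows is that the tables (q^i, q^-i, 1/(q^k - 1)) depend only on the index, so the first len entries of a size-N table are exactly the size-len table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Arithmetic modulo a prime p < 2^32 (products then fit in 64 bits). */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p) { return (a * b) % p; }
static uint64_t submod(uint64_t a, uint64_t b, uint64_t p) { return (a + p - b) % p; }

static uint64_t powmod(uint64_t a, uint64_t e, uint64_t p)
{
    uint64_t r = 1;
    a %= p;
    while (e) {
        if (e & 1) r = mulmod(r, a, p);
        a = mulmod(a, a, p);
        e >>= 1;
    }
    return r;
}

/* Inverse via Fermat's little theorem (p prime, a != 0 mod p). */
static uint64_t invmod(uint64_t a, uint64_t p) { return powmod(a, p - 2, p); }

/* Hypothetical precomputation for the points x_i = q^i, 0 <= i < N.
   Each entry depends only on its index, never on N. */
typedef struct {
    size_t N;
    uint64_t p;
    uint64_t *x;    /* x[i] = q^i */
    uint64_t *xinv; /* xinv[i] = q^(-i) */
    uint64_t *dinv; /* dinv[k] = 1 / (q^k - 1) for k >= 1; dinv[0] unused */
} geom_precomp;

static void geom_precomp_clear(geom_precomp *G)
{
    free(G->x); free(G->xinv); free(G->dinv);
    G->x = G->xinv = G->dinv = NULL;
}

/* Requires N >= 1. Returns -1 on allocation failure, if q = 0 mod p, or if
   q^k = 1 for some 1 <= k < N (points not pairwise distinct). */
static int geom_precomp_init(geom_precomp *G, uint64_t q, size_t N, uint64_t p)
{
    G->N = N; G->p = p;
    G->x = malloc(N * sizeof *G->x);
    G->xinv = malloc(N * sizeof *G->xinv);
    G->dinv = malloc(N * sizeof *G->dinv);
    q %= p;
    if (!G->x || !G->xinv || !G->dinv || q == 0) { geom_precomp_clear(G); return -1; }
    uint64_t qinv = invmod(q, p);
    G->x[0] = 1; G->xinv[0] = 1; G->dinv[0] = 0;
    for (size_t i = 1; i < N; i++) {
        G->x[i] = mulmod(G->x[i - 1], q, p);
        if (G->x[i] == 1) { geom_precomp_clear(G); return -1; }
        G->xinv[i] = mulmod(G->xinv[i - 1], qinv, p);
        G->dinv[i] = invmod(submod(G->x[i], 1, p), p);
    }
    return 0;
}

/* Interpolates vals[i] = f(q^i), 0 <= i < len, into coeffs[0..len-1]
   (monomial basis), reading only table entries of index < len.
   Naive O(len^2) method, NOT the fast one (~two polynomial multiplications).
   Returns -1 if len > N or on allocation failure. */
static int geom_interpolate(uint64_t *coeffs, const uint64_t *vals,
                            size_t len, const geom_precomp *G)
{
    uint64_t p = G->p;
    if (len == 0) return 0;
    if (len > G->N) return -1;
    uint64_t *c = malloc(len * sizeof *c);
    if (!c) return -1;
    for (size_t i = 0; i < len; i++) c[i] = vals[i] % p;

    /* Divided differences: after pass k, c[i] = f[x_{i-k}, ..., x_i] for i >= k.
       x_i - x_{i-k} = q^(i-k) (q^k - 1), so its inverse is xinv[i-k] * dinv[k]. */
    for (size_t k = 1; k < len; k++)
        for (size_t i = len - 1; i >= k; i--)
            c[i] = mulmod(submod(c[i], c[i - 1], p),
                          mulmod(G->xinv[i - k], G->dinv[k], p), p);

    /* Horner in the Newton basis: f = c0 + (x - x_0)(c1 + (x - x_1)(c2 + ...)). */
    size_t deg = 0;
    coeffs[0] = c[len - 1];
    for (size_t k = len - 1; k-- > 0; ) {
        uint64_t t = G->x[k];
        coeffs[deg + 1] = coeffs[deg];  /* multiply current poly by (x - t) */
        for (size_t j = deg; j >= 1; j--)
            coeffs[j] = submod(coeffs[j - 1], mulmod(t, coeffs[j], p), p);
        coeffs[0] = submod(c[k], mulmod(t, coeffs[0], p), p);  /* then add c_k */
        deg++;
    }
    free(c);
    return 0;
}
```

Design note: because the tables are indexed by point index rather than by the total number of points, nothing has to be recomputed or truncated when fewer values are interpolated; a single `geom_precomp` built for N = 8 points serves every call with len from 1 to 8.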

Conclusions from the timings (below, for a 63-bit modulus, on AMD Zen 4):

  • Efficiency is unchanged when the number of points matches that of the precomputed data, apart from the slightly faster precomputation. The theoretical cost of interpolating at len points in geometric progression is that of two polynomial multiplications in degree len, and the observed timings are very close to this, even when the precomputation is included, as soon as the number of points exceeds about 12 (they are sometimes even better than two multiplications, probably due to some thresholds).
  • When the number of points in the precomputation increases, the new variant incurs no penalty at all, whereas the cost of the former variant grows linearly with the precomputation size. In other words, the second table below shows a speedup by a factor of 2 or a bit more (resp. 8 or a bit more) when the precomputation is twice (resp. 8 times) as large as the actual number of values to interpolate.
             |               GENERAL POINTS                |   GEOMETRIC PROGRESSION   | POLY_MUL
  len points | newton   baryc    fast     w/ tree  tree    | fast     w/ prec  precomp |
    1      1 | 2.9e-03  3.3e-03  5.3e-02  1.7e-02  3.1e-02 | 4.4e-02  3.0e-03  7.1e-02 | 8.6e-03
    2      2 | 6.9e-02  1.7e-01  2.7e-01  3.9e-02  2.1e-01 | 1.1e-01  5.0e-02  8.4e-02 | 1.6e-02
    3      3 | 1.8e-01  2.9e-01  3.8e-01  6.6e-02  3.0e-01 | 1.5e-01  6.8e-02  1.0e-01 | 2.7e-02
    4      4 | 3.5e-01  4.2e-01  4.7e-01  9.4e-02  3.8e-01 | 2.0e-01  1.0e-01  1.4e-01 | 3.8e-02
    6      6 | 8.6e-01  7.5e-01  8.0e-01  1.7e-01  6.3e-01 | 2.9e-01  1.5e-01  1.9e-01 | 6.9e-02
    8      8 | 1.6e+00  1.1e+00  1.1e+00  2.5e-01  8.7e-01 | 3.9e-01  2.0e-01  2.5e-01 | 1.1e-01
   10     10 | 2.6e+00  1.7e+00  1.6e+00  3.6e-01  1.2e+00 | 4.7e-01  2.5e-01  3.0e-01 | 1.5e-01
   12     12 | 3.8e+00  2.2e+00  2.0e+00  4.6e-01  1.5e+00 | 5.6e-01  3.2e-01  3.5e-01 | 2.5e-01
   16     16 | 7.0e+00  3.7e+00  2.9e+00  7.3e-01  2.1e+00 | 7.8e-01  4.9e-01  4.4e-01 | 4.5e-01
   20     20 | 1.2e+01  5.5e+00  4.0e+00  1.1e+00  2.9e+00 | 1.0e+00  6.7e-01  5.0e-01 | 6.0e-01
   30     30 | 2.9e+01  1.2e+01  7.2e+00  2.2e+00  4.9e+00 | 1.6e+00  1.2e+00  7.1e-01 | 1.1e+00
   45     45 | 1.3e+02  2.4e+01  1.3e+01  4.3e+00  8.8e+00 | 2.8e+00  2.1e+00  1.0e+00 | 1.7e+00
   70     70 | 3.7e+02  5.6e+01  2.6e+01  8.8e+00  1.7e+01 | 5.3e+00  4.3e+00  1.5e+00 | 3.1e+00
  100    100 | 7.8e+02  1.1e+02  4.2e+01  1.5e+01  2.7e+01 | 9.0e+00  7.6e+00  2.1e+00 | 4.3e+00
  200    200 | 3.2e+03  4.7e+02  1.1e+02  3.9e+01  7.5e+01 | 2.1e+01  1.9e+01  4.1e+00 | 8.6e+00
  400    400 | 1.3e+04  2.0e+03  3.2e+02  9.7e+01  2.2e+02 | 4.0e+01  3.5e+01  8.0e+00 | 1.8e+01
  800    800 | 0.0e+00  7.8e+03  8.8e+02  2.3e+02  6.3e+02 | 7.8e+01  7.0e+01  1.6e+01 | 3.5e+01
 1600   1600 | 0.0e+00  0.0e+00  2.1e+03  5.3e+02  1.5e+03 | 1.5e+02  1.3e+02  3.1e+01 | 6.8e+01
 3200   3200 | 0.0e+00  0.0e+00  5.0e+03  1.3e+03  3.7e+03 | 3.1e+02  2.7e+02  6.2e+01 | 1.4e+02
 6400   6400 | 0.0e+00  0.0e+00  1.1e+04  2.9e+03  8.5e+03 | 6.5e+02  5.7e+02  1.2e+02 | 3.1e+02
12800  12800 | 0.0e+00  0.0e+00  2.6e+04  6.5e+03  2.0e+04 | 1.4e+03  1.3e+03  2.5e+02 | 6.8e+02
precomp ||   = 1 * length    ||   = 2 * length    ||   = 8 * length   
----------------------------------------------------------------------
length  ||  before |  after  ||  before |  after  ||  before |  after 
      1 || 3.1e-03 | 2.9e-03 || 3.3e-03 | 3.1e-03 || 3.2e-03 | 3.0e-03
      2 || 5.2e-02 | 5.0e-02 || 9.8e-02 | 5.3e-02 || 6.4e-01 | 5.0e-02
      3 || 6.6e-02 | 6.7e-02 || 1.5e-01 | 7.1e-02 || 8.3e-01 | 6.6e-02
      4 || 9.9e-02 | 1.0e-01 || 1.9e-01 | 1.0e-01 || 1.2e+00 | 1.0e-01
      6 || 1.5e-01 | 1.5e-01 || 3.1e-01 | 1.5e-01 || 2.3e+00 | 1.5e-01
      8 || 2.0e-01 | 2.0e-01 || 4.8e-01 | 2.0e-01 || 4.0e+00 | 2.0e-01
     10 || 2.5e-01 | 2.5e-01 || 6.6e-01 | 2.5e-01 || 5.2e+00 | 2.5e-01
     12 || 3.2e-01 | 3.3e-01 || 8.4e-01 | 3.2e-01 || 7.0e+00 | 3.2e-01
     16 || 5.0e-01 | 4.9e-01 || 1.3e+00 | 4.9e-01 || 1.2e+01 | 5.0e-01
     20 || 6.7e-01 | 6.7e-01 || 1.8e+00 | 6.7e-01 || 1.7e+01 | 6.7e-01
     30 || 1.2e+00 | 1.2e+00 || 3.3e+00 | 1.2e+00 || 2.1e+01 | 1.2e+00
     45 || 2.1e+00 | 2.2e+00 || 6.3e+00 | 2.1e+00 || 3.0e+01 | 2.1e+00
     70 || 4.3e+00 | 4.2e+00 || 1.3e+01 | 4.3e+00 || 4.8e+01 | 4.3e+00
    100 || 7.6e+00 | 7.6e+00 || 1.9e+01 | 7.6e+00 || 6.5e+01 | 7.6e+00
    200 || 1.9e+01 | 1.9e+01 || 3.5e+01 | 1.8e+01 || 1.3e+02 | 1.9e+01
    400 || 3.5e+01 | 3.5e+01 || 6.6e+01 | 3.5e+01 || 2.7e+02 | 3.5e+01
    800 || 6.8e+01 | 6.7e+01 || 1.4e+02 | 6.6e+01 || 6.1e+02 | 6.6e+01
   1600 || 1.3e+02 | 1.3e+02 || 2.9e+02 | 1.3e+02 || 1.2e+03 | 1.3e+02
   3200 || 2.6e+02 | 2.7e+02 || 6.0e+02 | 2.7e+02 || 2.6e+03 | 2.7e+02
   6400 || 5.6e+02 | 5.5e+02 || 1.2e+03 | 5.7e+02 || 5.4e+03 | 5.6e+02
  12800 || 1.2e+03 | 1.2e+03 || 2.6e+03 | 1.2e+03 || 1.2e+04 | 1.2e+03
  25600 || 2.5e+03 | 2.5e+03 || 5.4e+03 | 2.6e+03 || 2.5e+04 | 2.5e+03
  51200 || 5.4e+03 | 5.3e+03 || 1.1e+04 | 5.4e+03 || 5.6e+04 | 5.4e+03
 102400 || 1.2e+04 | 1.1e+04 || 2.5e+04 | 1.2e+04 || 1.2e+05 | 1.2e+04
 204800 || 2.5e+04 | 2.5e+04 || 5.6e+04 | 2.5e+04 || 2.7e+05 | 2.5e+04
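As a rough arithmetic check of the two conclusions above, here are the len = 100 rows of both tables (timings as reported, in the profile's own unit; M(100) is read from the POLY_MUL column, and the interpolation and precomputation times from the "w/ prec" and "precomp" columns):

```math
\begin{aligned}
2\,M(100) &\approx 2 \times 4.3 = 8.6, \\
T_{\text{w/ prec}}(100) + T_{\text{precomp}}(100) &\approx 7.6 + 2.1 = 9.7 \approx 1.13 \times 2\,M(100), \\
\left.\frac{T_{\text{before}}}{T_{\text{after}}}\right|_{\text{precomp} = 2\cdot\text{length}} &\approx \frac{19}{7.6} = 2.5,
\qquad
\left.\frac{T_{\text{before}}}{T_{\text{after}}}\right|_{\text{precomp} = 8\cdot\text{length}} \approx \frac{65}{7.6} \approx 8.6.
\end{aligned}
```

The first two lines reflect "close to two multiplications, precomputation included"; the ratios in the last line match the factor 2 (resp. 8) "or a bit more".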

vneiger added 12 commits May 1, 2026 16:17
…ar multiplication with precomputation when the modulus has <= 63 bits
… points in precomputation differs from the number of values to interpolate
second step of geometric interpolate: explain computation
…lify precomputations by removing data that is not used anymore
  interpolation
- the documentation sometimes used `len` and sometimes `n` for the number
  of points, with `n` also used for the modulus in nearby
  functions: unify to `len` for the number of points, or `ilen` (input
  length) and `olen` (output length) when necessary
- some cleaning in impl.h, making these functions static
- minor fix in the profile and interpolation function to avoid a
  memory leak
@vneiger vneiger marked this pull request as ready for review May 3, 2026 12:35
@fredrik-johansson (Collaborator) commented:

LGTM. Feel free to merge both PRs if you're happy with them.

@vneiger (Collaborator, Author) commented on May 4, 2026:

Thanks for the feedback!

@vneiger vneiger merged commit 4a0ffb0 into flintlib:main May 4, 2026
21 of 22 checks passed