Ildg 1.2 #15

Merged
edbennett merged 68 commits into telos-collaboration:develop from gray95:ildg-1.2
Apr 2, 2026

@gray95
Collaborator

@gray95 gray95 commented Dec 3, 2025

  • Added the ability to write ILDG 1.2 spec-compliant lattice gauge fields (SU(N) and Sp(2N)).
  • Added the ability to read ILDG 1.2 spec fields while retaining backwards compatibility with older fields.
  • Added the ability to write reduced-format SU and Sp fields.
  • Added the ability to read reduced-format SU and Sp fields (currently SU with N = 2, 3 only).
  • Added Test_ildg_reducedfmt_io.cc to tests/IO, which checks that the various read/write options work.

Much of the logic for writing fields is resolved at compile time in IldgWriter::writeConfiguration, via template arguments and if constexpr branches; the relevant template parameters are group_name, matrix_fmt, and fp_fmt.
The logic for reading fields in IldgReader::readConfiguration has to run at runtime. Extra intermediate datatypes for Sp(2N) fields are specified in parallelIO/MetaData.h.
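
To illustrate the compile-time selection described above, here is a minimal sketch of choosing a storage layout with template arguments and if constexpr. The enum tags, storedRows, and bytesPerLink are hypothetical names and not the code in this PR. The row counts (N-1 rows for reduced SU(N), half the rows for reduced Sp(2N)) are an assumption about the reduced layout, not a quotation from the ILDG 1.2 specification.

```cpp
#include <cstddef>

// Hypothetical tags standing in for the PR's group_name and matrix_fmt
// template parameters, whose real types are not shown in this thread.
enum class Group { SU, Sp };
enum class MatrixFmt { Full, Reduced };

// Number of matrix rows written per link for an Nc x Nc matrix.
// Every branch is resolved at compile time; discarded branches are
// never instantiated, so the static_assert only fires for reduced Sp.
template <Group G, MatrixFmt F, int Nc>
constexpr int storedRows() {
  if constexpr (F == MatrixFmt::Full) {
    return Nc;      // full format: every row is stored
  } else if constexpr (G == Group::SU) {
    return Nc - 1;  // assumed: last row is rebuilt on reading
  } else {
    static_assert(Nc % 2 == 0, "Sp(2N) needs an even matrix dimension");
    return Nc / 2;  // assumed: lower half is rebuilt on reading
  }
}

// Bytes per link: stored rows x Nc columns x (re, im) x sizeof(Fp).
// Fp plays the role of the PR's fp_fmt parameter (float or double).
template <Group G, MatrixFmt F, int Nc, typename Fp>
constexpr std::size_t bytesPerLink() {
  return static_cast<std::size_t>(storedRows<G, F, Nc>()) * Nc * 2 * sizeof(Fp);
}
```

Because the layout is fixed by the template arguments, a writer built this way has no runtime branching per site. A reader cannot do the same, since the layout is only known after the file's metadata has been parsed.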


@edbennett edbennett left a comment


Hi Gaurav

Many thanks for your work on this so far; I think we're almost there.

I've requested some changes in the attached review. Most of them are pretty trivial, relating to not wanting to pollute the Git logs. (Upstream has got annoyed at other people doing this before, and I want us to stay on their good side.)

One thing I haven't highlighted specific instances of: where you add new functionality (either new functions, or significant edits to existing ones), please can you add detailed comments describing what you've done, linking to reference material where relevant? I realise this isn't standard practice in Grid, but our lives would have been a lot easier if it were, and so I'm aiming for our new contributions to be better about this to make life easier for the next person.

For example, reconstructSp should have a 5–10 line comment above it describing what the expected block structure of a symplectic matrix is, and hence what A and B in the comments mean, and have a link to the relevant section of the ILDG binary spec 1.2.
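
As an illustration of the kind of comment and block structure meant here, the sketch below rebuilds a unitary symplectic matrix from its top half, using the standard block form [[A, B], [-B*, A*]]. The function name reconstructSpSketch is hypothetical and this is not the reconstructSp from the PR. It also assumes that the reduced format keeps the top N rows, which should be checked against the ILDG binary spec 1.2.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Matrix = std::vector<std::vector<Complex>>;

// A 2N x 2N matrix U that is both unitary and symplectic with respect to
// Omega = [[0, 1], [-1, 0]] has the block structure
//
//     U = [  A    B  ]
//         [ -B*   A* ]
//
// where A and B are N x N complex blocks and '*' denotes element-wise
// complex conjugation (no transpose). The top N rows (A | B) therefore
// determine the whole matrix.
//
// Hypothetical stand-in for illustration: takes the N stored rows, each of
// length 2N, and returns the full 2N x 2N matrix.
Matrix reconstructSpSketch(const Matrix& topRows) {
  const std::size_t n = topRows.size();
  Matrix full(2 * n, std::vector<Complex>(2 * n));
  for (std::size_t i = 0; i < n; ++i) {
    assert(topRows[i].size() == 2 * n);  // each stored row spans all 2N columns
    for (std::size_t j = 0; j < n; ++j) {
      const Complex a = topRows[i][j];      // element of A
      const Complex b = topRows[i][n + j];  // element of B
      full[i][j] = a;
      full[i][n + j] = b;
      full[n + i][j] = -std::conj(b);     // lower-left block: -B*
      full[n + i][n + j] = std::conj(a);  // lower-right block: A*
    }
  }
  return full;
}
```

For N = 1 this reduces to the familiar SU(2) form [[a, b], [-b*, a*]].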

Comment thread Grid/parallelIO/IldgIO.h Outdated
Comment thread Grid/parallelIO/IldgIO.h Outdated
Comment thread Grid/parallelIO/IldgIO.h Outdated
Comment thread Grid/parallelIO/IldgIO.h Outdated
Comment thread Grid/parallelIO/IldgIO.h Outdated
Comment thread Grid/parallelIO/MetaData.h Outdated
Comment thread Grid/parallelIO/MetaData.h Outdated
Comment on lines 99 to 100
@@ -107,2 +99,2 @@
public:
constexpr static const char* const Name = "Binary";


This change and the next few aren't related to this PR; in principle you could rebase your branch onto the current telos-collaboration:develop to get GitHub to remove them.

Comment thread tests/IO/Test_ildg_reducedfmt_io.cc Outdated
Comment thread tests/IO/Test_ildg_reducedfmt_io.cc Outdated
@edbennett

P.S. One thing I meant to add in the message yesterday: please make sure to update the documentation in documentation/manual.rst, where the ILDG checkpointers are explicitly mentioned.

…teConfiguration now has template args specifying the gauge group and whether to write to reduced storage format. Started working on a reduced storage test.
…ow the default is to write to double (with no group reduction). added Test_ildg_reducedfmt_io for testing writing/reading to reduced group format and single/double precision.
gray95 and others added 22 commits March 11, 2026 17:15
…y explain the <rows/> element logic when reading ildg lattices.
…ckwards compatibility. this necessitates explicitly specifying an additional template param in the ildg reduced format test.
…ckwards compatibility. this necessitates explicitly specifying an additional template param in the ildg reduced format test.
…emove little endian from initialisation check
merge feature - hmc checkpointing with ildg reduced format read/write functionality.
@edbennett

Many thanks. I tidied up a couple of loose ends.

I've just tested this on Tursa, and it doesn't build, giving errors like:

../../Grid/libGrid.a(Init.o): In function `Grid::is_perm_even(std::vector<int, std::allocator<int> >&)':
/mnt/lustre/tursafs1/home/dp208/dp208/shared/src/Grid_20260401_gsr/Grid/parallelIO/MetaData.h:256: multiple definition of `Grid::is_perm_even(std::vector<int, std::allocator<int> >&)'
Test_hmc_Sp_ildg_pureGaugeWilson.o:/mnt/lustre/tursafs1/home/dp208/dp208/shared/src/Grid_20260401_gsr/Grid/parallelIO/MetaData.h:256: first defined here
../../Grid/libGrid.a(FlightRecorder.o): In function `Grid::is_perm_even(std::vector<int, std::allocator<int> >&)':
/mnt/lustre/tursafs1/home/dp208/dp208/shared/src/Grid_20260401_gsr/Grid/parallelIO/MetaData.h:256: multiple definition of `Grid::is_perm_even(std::vector<int, std::allocator<int> >&)'
Test_hmc_Sp_ildg_pureGaugeWilson.o:/mnt/lustre/tursafs1/home/dp208/dp208/shared/src/Grid_20260401_gsr/Grid/parallelIO/MetaData.h:256: first defined here
../../Grid/libGrid.a(StaggeredKernelsInstantiation.o): In function `cudaLaunchKernel<char>':
/mnt/lustre/tursafs1/home/dp208/dp208/shared/src/Grid_20260401_gsr/Grid/parallelIO/MetaData.h:256: multiple definition of `Grid::is_perm_even(std::vector<int, std::allocator<int> >&)'
Test_hmc_Sp_ildg_pureGaugeWilson.o:/mnt/lustre/tursafs1/home/dp208/dp208/shared/src/Grid_20260401_gsr/Grid/parallelIO/MetaData.h:256: first defined here

configure line was:

../configure --disable-fermion-instantiations --prefix /home/dp208/dp208/shared/prefixes/prefix_grid_20260224 --with-gmp=/home/dp208/dp208/dc-benn2/prefix-gmp --with-mpfr=/home/dp208/dp208/dc-benn2/prefix-gmp --with-lime=/home/dp208/dp208/dc-benn2/prefix --enable-comms=mpi --enable-simd=GPU --enable-shm=nvlink --enable-gen-simd-width=64 --enable-accelerator=cuda --enable-Sp --enable-Nc=4 --disable-unified CXX=nvcc LDFLAGS="-cudart shared -lcublas" CXXFLAGS="-ccbin mpicxx -gencode arch=compute_80,code=sm_80 -std=c++17 -cudart shared"

Have you been able to successfully test on Tursa?

@gray95
Collaborator Author

gray95 commented Apr 1, 2026

Huh, I will build on Tursa and take a look at this. I have only tested on Sunbird so far.

@gray95
Collaborator Author

gray95 commented Apr 1, 2026

That should fix the Tursa compilation issue. I have compiled and successfully run Test_io_mungers, Test_ildg_reducedfmt_io and Test_hmc_Sp_ildg_pureGaugeWilson. You can see some of the resulting Slurm output in /home/dp208/dp208/dc-ray2/grid-ildg.
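
The thread does not show the change that resolved the link error. A "multiple definition" error pointing at a header line usually means a non-inline, non-template function is defined in a header that several translation units include. One common remedy is to mark the function inline; another is to move the definition into a single source file. The sketch below shows the inline approach. Only the name and argument type of is_perm_even come from the linker log; the bool return type, the cycle-counting algorithm, and the assumption that the vector holds a permutation of 0..n-1 are illustrative guesses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 'inline' allows this definition to appear in every translation unit that
// includes the header without violating the one-definition rule at link time.
// Returns true if perm, assumed to be a permutation of 0..n-1, is even.
// A permutation of n elements with c cycles has parity (n - c) mod 2.
inline bool is_perm_even(std::vector<int>& perm) {
  const std::size_t n = perm.size();
  std::vector<bool> visited(n, false);
  std::size_t cycles = 0;
  for (std::size_t start = 0; start < n; ++start) {
    if (visited[start]) continue;  // already part of an earlier cycle
    ++cycles;
    // Walk the cycle containing 'start', marking each element as seen.
    for (std::size_t j = start; !visited[j];
         j = static_cast<std::size_t>(perm[j])) {
      assert(perm[j] >= 0 && static_cast<std::size_t>(perm[j]) < n);
      visited[j] = true;
    }
  }
  return (n - cycles) % 2 == 0;
}
```

The non-const reference parameter is kept only to match the signature in the log; a const reference would work equally well for this sketch.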

@edbennett edbennett marked this pull request as ready for review April 2, 2026 23:24

@edbennett edbennett left a comment


lgtm

@edbennett edbennett merged commit b07a948 into telos-collaboration:develop Apr 2, 2026
4 checks passed
@gray95 gray95 deleted the ildg-1.2 branch April 8, 2026 22:38