Hey folks, I am jotting down some thoughts about a C API for OpenPMD.
I am doing this because, over the winter, I took on a brief holiday project to partially implement the OpenPMD standard with the BeamPhysics extension in C using the HDF5 backend. I simply wanted to load particles from tools such as Bmad in other projects I work on. Scope creep led me to implement a good portion of the OpenPMD standard, and after I showed it to @ax3l, he recommended coming here and adding some notes on the experience.
You can check out my C implementation here: github.com/electronsandstuff/Parcel
Now for the notes.
- The Series and Iteration objects became C resource handles in Parcel.
- `pmd_open_series` and `pmd_open_iteration` create the handles and abstract over file-based vs group-based series. The HDF5 access-mode rules govern how the file is opened, whether writing is allowed, and how new files are created.
- One thing I liked was that I don't have a separate flag for creating a file-based vs group-based series. Instead, `pmd_open_series` checks whether the iteration pattern `%T` is in the filename and treats the series as file-based if so, or group-based if not.
- `pmd_list_iterations` returns all iterations within the series. This was challenging for file-based series because it involves scanning the disk, so it relies on caching for performance. It detects all unique iterations, even if they are non-consecutive.
- Each of the root attributes and iteration attributes has a getter / setter function.
- Keeping all files in a file-based series in sync was a challenge. I am sure the C++ code sitting behind the C API here already deals with this, however.
- There are also difficulties if a file-based series is created with no iterations yet. Root attributes have to be cached until iterations are written so that the setters behave consistently.
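For anyone curious, here is a minimal sketch of how the `%T` check and the disk scan for file-based series could work. This is not Parcel's actual code: `series_is_file_based` and `match_iteration` are hypothetical names, and the sketch assumes a plain `%T` marker with no zero-padding width and ignores integer overflow.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a series path counts as file-based
 * when it contains the iteration marker "%T". */
bool series_is_file_based(const char *path)
{
    assert(path != NULL);
    return strstr(path, "%T") != NULL;
}

/* Hypothetical helper: given a pattern such as "data_%T.h5" and a file
 * name found on disk such as "data_00100.h5", extract the iteration
 * number. Returns true and stores the number in *iteration on a match. */
bool match_iteration(const char *pattern, const char *name,
                     unsigned long long *iteration)
{
    assert(pattern != NULL && name != NULL && iteration != NULL);

    const char *marker = strstr(pattern, "%T");
    if (marker == NULL)
        return false; /* group-based pattern, nothing to match */

    size_t prefix_len = (size_t)(marker - pattern);
    const char *suffix = marker + 2; /* text after "%T" */
    size_t suffix_len = strlen(suffix);
    size_t name_len = strlen(name);

    /* The name needs the prefix, at least one digit, and the suffix. */
    if (name_len < prefix_len + suffix_len + 1)
        return false;
    if (strncmp(name, pattern, prefix_len) != 0)
        return false;
    if (strcmp(name + name_len - suffix_len, suffix) != 0)
        return false;

    /* Everything between prefix and suffix must be decimal digits. */
    const char *digits = name + prefix_len;
    size_t digits_len = name_len - prefix_len - suffix_len;
    unsigned long long value = 0;
    for (size_t i = 0; i < digits_len; i++) {
        if (!isdigit((unsigned char)digits[i]))
            return false;
        value = value * 10 + (unsigned long long)(digits[i] - '0');
    }

    *iteration = value;
    return true;
}
```

Running `match_iteration` over every entry in the series directory and collecting the distinct results would give the list of iterations, including non-consecutive ones. Caching that list avoids rescanning the disk on every call.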
- Particle Groups
- For loading, I used a struct of arrays and a non-allocating function to read particle data. The intent is that users can integrate Parcel with their existing code and data structures. They set the pointers in the Parcel struct to the start of their correctly sized arrays (a separate function gets the array size from the file first), and Parcel then drops the data into those arrays. If a pointer is NULL, that field is ignored.
- Writing works similarly: pass the Parcel struct with user-supplied arrays, or NULL for any field that is not used.
- Note that this implementation lacks support for MPIO, which would be nice in some parallel codes.
- One thing that isn't perfect is that on series or iteration open, default values are used and written to the handle or disk. It is then up to the user to change the default.
- I also think some simplification could happen specifically for the use case of a series with a single iteration. This is common when just saving particles from a simulation, yet reading it back requires opening the series, listing the iterations, opening the iteration, and then reading the particles. It might be worthwhile to have a way to load directly from a single iteration.
- I liked this API's support for user-provided particle structs. It is of course also possible to add functions that let the user read field by field.
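To make the struct-of-arrays idea concrete, here is a rough sketch of a non-allocating read where NULL pointers mean "skip this field". None of these names come from Parcel: `particle_arrays`, `mock_particle_source`, `mock_particle_count`, and `mock_read_particles` are hypothetical, the three fields are only illustrative, and an in-memory source stands in for the HDF5 file so the example runs on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct of arrays. The caller owns every buffer and
 * sets a pointer to NULL to skip that field. */
typedef struct {
    double *x;
    double *px;
    double *weight;
} particle_arrays;

/* Stand-in for the particle data stored on disk (assumption: the real
 * implementation would read from an HDF5 file instead). */
typedef struct {
    size_t n;
    const double *x;
    const double *px;
    const double *weight;
} mock_particle_source;

/* Step 1: ask how many particles there are so buffers can be sized. */
size_t mock_particle_count(const mock_particle_source *src)
{
    assert(src != NULL);
    return src->n;
}

/* Step 2: copy each field into the caller's buffer, skipping NULL
 * destinations. Never allocates. Returns the number of fields copied. */
int mock_read_particles(const mock_particle_source *src,
                        particle_arrays *dst)
{
    assert(src != NULL && dst != NULL);

    size_t bytes = src->n * sizeof(double);
    int copied = 0;

    if (dst->x != NULL) {
        memcpy(dst->x, src->x, bytes);
        copied++;
    }
    if (dst->px != NULL) {
        memcpy(dst->px, src->px, bytes);
        copied++;
    }
    if (dst->weight != NULL) {
        memcpy(dst->weight, src->weight, bytes);
        copied++;
    }
    return copied;
}
```

A caller that only needs positions and weights would size two arrays from `mock_particle_count`, point `x` and `weight` at them, leave `px` as NULL, and call `mock_read_particles` once. Writing could mirror this by copying in the other direction.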
Overall, I would recommend implementing something similar, with handles, open/close functions, and setters/getters, and reviewing the pros and cons I listed.