New IO capabilities via FieldAPI IO by piotrows · Pull Request #121 · ecmwf-ifs/dwarf-p-cloudsc

piotrows · 2025-06-14T13:26:50Z

This PR adds HDF5 IO feature to a FieldAPI variant of CLOUDSC. The changes proposed are:

- An extra routine, TEST_FIELD_HDF5, placed next to the field-based driver of the Fortran variant. It performs FieldAPI HDF5 calls to write and subsequently read the CLOUDSC input data: the routine queries the host pointers of the FieldAPI variables, stores them all in an HDF5 file (or files, in the MPI variant), and then reads them back to verify that the write/read was performed correctly.
- Fiat is added as a FieldAPI dependency, since the FieldAPI IO module depends on MPI for generality. The Fiat-specific files duplicated in the CLOUDSC dwarf are compiled only if Fiat/FieldAPI is requested to be disabled.

reuterbal · 2025-07-25T08:07:19Z

The HDF5 functionality has now been added to field_api main. Could this also be merged into develop_ecmwf, as that's the branch we use in cloudsc? (@awnawab)

piotrows · 2025-07-25T08:29:23Z

> The HDF5 functionality has now been added to field_api main. Could this also be merged into develop_ecmwf, as that's the branch we use in cloudsc? (@awnawab)

Yes please! I need it for further developments.

awnawab · 2025-07-25T08:45:23Z

> The HDF5 functionality has now been added to field_api main. Could this also be merged into develop_ecmwf, as that's the branch we use in cloudsc? (@awnawab)

Sure, I'll rebase ecmwf-develop over main.

awnawab · 2025-07-25T15:05:19Z

F-API develop-ecmwf now rebased over main.

reuterbal · 2025-07-29T08:11:14Z

@piotrows Can you update the field_api branch in the bundle.yml and resolve the conflicts? Then this should be ready for a look I believe?

piotrows · 2025-08-02T12:23:25Z

> @piotrows Can you update the field_api branch in the bundle.yml and resolve the conflicts? Then this should be ready for a look I believe?

Done, let's have a look together. I've made this development a bit more optional.

mlange05

First of all, apologies for the delayed review comments, and second of all many thanks for taking this on. I think this is already very useful and I've run a basic example locally and things do indeed work as I'd expect them to. So, many thanks for this and much appreciated.

Next up, I've left a few comments mostly about structure and where to call what. In short, I think it would be better if we would add the file-write capabilities as utilities of the field-based state and flux objects, and call these outside of the driver layer, triggered by env variables to allow run-time switching instead of compile-time switching.

I've also noted two things:

- Nothing in the tests actually tests the correct execution of the output file, so silent failure would not be caught. Maybe a simple check for the existence of the dumped file(s) could be added to one of the ctests?
- Executing the same run twice only works with a data set of the same size, however changing the working set size on a re-run fails. I'm assuming we want to replace existing output files on the re-run instead of writing into the existing file?

mlange05 · 2025-09-03T14:41:36Z

+    TYPE(CLOUDSC_STATE_TYPE)  ,INTENT(IN) :: TENDENCY_TMP
+    TYPE(CLOUDSC_STATE_TYPE)  ,INTENT(IN) :: TENDENCY_LOC
+
+    REAL(KIND=JPRB), POINTER    :: PAUX_PT(:,:,:)


I don't think the pointer declarations are needed here, or am I missing something?

They are needed for the FA IO API so the shape of the variable becomes explicit.

Are they truly needed there though? From my reading of https://github.com/ecmwf-ifs/field_api/blob/main/src/io/field_RANKSUFF_hdf5_module.fypp#L50 it uses SELF%PTR throughout, so there's no need to pass an externally declared pointer in. I also think this makes it look much more cumbersome than it needs to be, as the field object should have all the information locally to just write its content.

mlange05 · 2025-09-03T14:47:56Z

      write(0,1003) NUMPROC,NUMOMP,NGPTOTG,NPROMA,NGPBLKS
    end if

+#ifdef FIELD_API_IO 


To me, the option to write the input state to file would be a run-time option, and not a compile-time option. How about we rename TEST_FIELD_HDF5 to WRITE_STATE_HDF5 and make it a property of the CLOUDSC_FIELD_STATE instead of invoking it from the kernel? This way we could toggle this via an environment flag in src/cloudsc_fortran/dwarf_cloudsc.F90 right after loading the input state from the classic input files?

Similarly, we could then write the output by implementing the same boilerplate on CLOUDSC_FLUX_TYPE instead of adding the I/O to the kernel driver.
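A minimal sketch of the run-time toggle suggested above, assuming a hypothetical environment variable named CLOUDSC_WRITE_HDF5 (this name is not part of the PR). It only shows the switch itself, using the standard GET_ENVIRONMENT_VARIABLE intrinsic; the actual write call on the state object is left as a placeholder.

```fortran
program toggle_hdf5_dump
  implicit none

  ! CLOUDSC_WRITE_HDF5 is a hypothetical variable name used for illustration only
  if (env_flag_set("CLOUDSC_WRITE_HDF5")) then
    print '(a)', "HDF5 dump requested: the state's write routine would be called here"
  else
    print '(a)', "HDF5 dump not requested: skipping"
  end if

contains

  ! Returns .true. if the named environment variable holds a "truthy" value
  logical function env_flag_set(name) result(is_set)
    character(len=*), intent(in) :: name
    character(len=16) :: val
    integer :: length, stat

    val = ""
    call get_environment_variable(name, val, length, stat)

    ! stat == 0 means the variable exists and its value fits into val
    is_set = .false.
    if (stat == 0 .and. length > 0) then
      is_set = any(trim(val) == [character(len=4) :: "1", "ON", "on", "TRUE", "true"])
    end if
  end function env_flag_set

end program toggle_hdf5_dump
```

With such a switch the same binary can be run with or without the dump, which is the difference from the `#ifdef FIELD_API_IO` guard in the hunk above.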

> How about we rename TEST_FIELD_HDF5 to WRITE_STATE_HDF5

Could be done but what we are doing at the moment is testing (write+read, not a write alone).

> make it a property of the CLOUDSC_FIELD_STATE instead of invoking it from the kernel?

Well, this is an option, but then it loses the virtue of a top-level, clear example that teaches the FieldAPI IO. So it really depends on whether we want to expand CLOUDSC capabilities or would rather prefer a FA IO tutorial, or perhaps both.

> Well, this is an option, but then it loses the virtue of a top-level, clear example that teaches the FieldAPI IO. So it really depends on whether we want to expand CLOUDSC capabilities or would rather prefer a FA IO tutorial, or perhaps both.

I disagree on this one - having a pure procedural test routine does not make a good "tutorial". Instead, the derived types CLOUDSC_FIELD_STATE and CLOUDSC_FLUX_STATE are intended as examples of group types used in the EC-physics context. Adding independent read/write routines to those types is the main goal here, so why not illustrate this here in exactly this form?

mlange05 · 2025-09-03T14:55:34Z

+        help : Enable Field API IO (requires MPI) [ON|OFF]
+        cmake : >
+            FIELD_API_ENABLE_IO={{value}}
+            FIELD_API_ENABLE_FIAT={{value}}


Should we not always enable FIAT now that it is part of the bundle? Also, could we sensibly always enable Field-API-IO, or do we need that toggle for installs without hdf5?

@reuterbal was strongly opposing permanent dependence on FIAT.
Since the availability of HDF5 for non-standard compilers is not obvious, perhaps we could stick to an optional HDF5...

Ok, that makes sense. In that case please leave this as is.

mlange05 · 2025-09-03T14:56:05Z

            ${COMMON_LIB_LIBS}
+            $<${HAVE_FIELD_API}:fiat>
+            $<${HAVE_FIELD_API}:field_api_${prec}>
+#            $<$<NOT:$<BOOL:${HAVE_FIELD_API}>>:parkind_${prec}>


Debug leftover?

piotrows · 2025-09-03T16:15:21Z

> Nothing in the tests actually tests the correct execution of the output file, so silent failure would not be caught. Maybe a simple check for the existence of the dumped file(s) could be added to one of the ctests?

Actually, there is a check. If you look closely, there is first a store to the output file and then an immediate read. If the store is corrupt, the read will also be corrupt and the usual norms will fail.

piotrows · 2025-09-03T16:19:04Z

> Executing the same run twice only works with a data set of the same size, however changing the working set size on a re-run fails. I'm assuming we want to replace existing output files on the re-run instead of writing into the existing file?

This is a design question ... the current solution prevents users from overwriting their previously stored data and was chosen deliberately. We can extend the API to support both ways, or we can change the default behavior.

piotrows · 2025-09-03T16:38:09Z

@mlange05 @reuterbal
Quick summary of where we are, feel free to further comment:

1. BR suggested (offline) that I should get rid of the quest for replacing Fiat files with Fiat lib, because it makes things complicated and the Atos CI runs fail due to linking errors. I am not fully convinced (perhaps we could link Fiat static to satisfy the GPU compiler needs), but certainly we have a decision to make.
2. I have a feeling that HDF5 should remain optional for CLOUDSC at compile time, but run-time is fine to me as well (HDF5 2.x (in devel) shall get rid of autotools and be easier to install, I believe)
3. Explicit example of FA IO vs. deeper integration with CLOUDSC aiming at expanding CLOUDSC capabilities, or both.
4. Do we need another evaluation of the HDF5 store correctness?

mlange05 · 2025-09-04T05:58:12Z

> @mlange05 @reuterbal Quick summary of where we are, feel free to further comment:
>
> 1. BR suggested (offline) that I should get rid of the quest for replacing Fiat files with Fiat lib, because it makes things complicated and the Atos CI runs fail due to linking errors. I am not fully convinced (perhaps we could link Fiat static to satisfy the GPU compiler needs), but certainly we have a decision to make.
> 2. I have a feeling that HDF5 should remain optional for CLOUDSC at compile time, but run-time is fine to me as well (HDF5 2.x (in devel) shall get rid of autotools and be easier to install, I believe)
> 3. Explicit example of FA IO vs. deeper integration with CLOUDSC aiming at expanding CLOUDSC capabilities, or both.
> 4. Do we need another evaluation of the HDF5 store correctness?

1. Agreed. Let's keep this a PR for one feature and only add what's needed for that.
2. Agreed, let's keep the compile-time option. I'd still prefer the actual use of it to mimic the binary I/O and use run-time flags for that.
3. Deep integration was always the aim, as we don't need a second unit test (F-API unit tests this already). Ideally it would behave like the binary I/O and either optionally or always replace it for F-API-enabled drivers.
4. I don't think the in-situ test is the right way. I'd prefer to have a ctest that checks if the file was created to test the writes, and maybe let F-API-based drivers read input/reference from HDF5 to test reads (either optionally or always)?
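The file-existence check proposed in the last point could be sketched as a tiny standalone program that a ctest runs after the driver. The program name and the idea of passing the file name as the first command-line argument are assumptions for illustration; nothing like this exists in the PR. It relies on ERROR STOP producing a non-zero exit status, which is what common compilers do but is formally processor-dependent.

```fortran
program check_hdf5_output
  implicit none
  character(len=512) :: fname
  integer :: stat
  logical :: found

  ! The file to look for is expected as the first command-line argument
  if (command_argument_count() < 1) error stop "usage: check_hdf5_output <file>"

  call get_command_argument(1, fname, status=stat)
  if (stat /= 0) error stop "could not read the file name argument"

  ! INQUIRE only checks that the file exists, not that its content is valid
  inquire(file=trim(fname), exist=found)
  if (.not. found) error stop "expected output file is missing"

  print '(2a)', "found ", trim(fname)
end program check_hdf5_output
```

Such a check would catch a silently skipped write, but not a corrupt one; reading the reference data back from HDF5, as suggested above, would cover that part.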

reuterbal

Many thanks, overall this looks very nice. I agree with the API and integration points that @mlange05 raised and have also pointed out how to fix the fiat/field_api integration.

reuterbal · 2025-09-04T11:32:36Z

+    TYPE(CLOUDSC_STATE_TYPE)  ,INTENT(IN) :: TENDENCY_TMP
+    TYPE(CLOUDSC_STATE_TYPE)  ,INTENT(IN) :: TENDENCY_LOC
+
+    REAL(KIND=JPRB), POINTER    :: PAUX_PT(:,:,:)


I'm guessing this is required to have shape information embedded into the subroutine interface that allows Fortran to pick the correct variant for the field?

If the pointer was removed, something like

CALL WRITE_HDF5_PERRANK_DATA(PAUX%F_PT, PAUX_PVERVEL, "cloudsc_data", "PT", HDFEXISTS=.TRUE.)

would then have to have either a dimension keyword

CALL WRITE_HDF5_PERRANK_DATA(PAUX%F_PT, 3, "cloudsc_data", "PT", HDFEXISTS=.TRUE.)

or specify the actual implementation in the routine name, hypothetically paraphrasing:

CALL WRITE_HDF5_PERRANK_DATA_3RB(PAUX%F_PT, "cloudsc_data", "PT", HDFEXISTS=.TRUE.)

I might be misinterpreting this, but if the above is the case, then pointers with a certain dimension are an unintuitive way of specifying this information. @awnawab might have a suggestion here?
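The guess above, that the pointer's rank is what lets Fortran pick the matching specific routine, can be illustrated with a self-contained sketch of generic resolution by rank. The names below (rank_dispatch_demo, write_data and its two variants) are hypothetical and not the Field API interface; the sketch only shows the language mechanism.

```fortran
module rank_dispatch_demo
  implicit none
  private
  public :: write_data

  ! One generic name; the compiler selects the specific procedure
  ! from the rank of the actual array argument at compile time.
  interface write_data
    module procedure write_data_2d, write_data_3d
  end interface write_data

contains

  subroutine write_data_2d(ptr, name)
    real, intent(in) :: ptr(:,:)
    character(len=*), intent(in) :: name
    print '(3a,2(1x,i0))', "2D variant selected for ", name, ", shape:", shape(ptr)
  end subroutine write_data_2d

  subroutine write_data_3d(ptr, name)
    real, intent(in) :: ptr(:,:,:)
    character(len=*), intent(in) :: name
    print '(3a,3(1x,i0))', "3D variant selected for ", name, ", shape:", shape(ptr)
  end subroutine write_data_3d

end module rank_dispatch_demo

program rank_dispatch_main
  use rank_dispatch_demo, only: write_data
  implicit none
  real, target  :: buf(4,3,2)
  real, pointer :: p3(:,:,:), p2(:,:)

  buf = 0.0
  p3 => buf          ! rank-3 view of the data
  p2 => buf(:,:,1)   ! rank-2 slice of the same data

  call write_data(p3, "PT")      ! resolves to write_data_3d
  call write_data(p2, "SLICE")   ! resolves to write_data_2d
end program rank_dispatch_main
```

If the rank-specific field type were the dispatching argument instead, the same resolution could happen without a separately declared pointer, which is the simplification the thread is asking about.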

reuterbal · 2025-09-04T11:40:37Z

+set(CLOUDSC_FIAT_SOURCES
+    module/parkind1.F90
+    module/ec_pmon_mod.F90
+    module/oml_mod.F90
+    module/abor1.F90
+)


My suggestion would be to roll back this change and restore these files in the list above.

Suggested change

-set(CLOUDSC_FIAT_SOURCES
-    module/parkind1.F90
-    module/ec_pmon_mod.F90
-    module/oml_mod.F90
-    module/abor1.F90
-)

reuterbal · 2025-09-04T11:40:51Z

        SOURCES
            ${CLOUDSC_COMMON_SOURCES}
            ${COMMON_LIB_SOURCES}
+            $<$<NOT:$<BOOL:${HAVE_FIAT}>>:${CLOUDSC_FIAT_SOURCES}>


This can then also be removed

Suggested change

-            $<$<NOT:$<BOOL:${HAVE_FIAT}>>:${CLOUDSC_FIAT_SOURCES}>

reuterbal · 2025-09-04T11:51:30Z

      write(0,1003) NUMPROC,NUMOMP,NGPTOTG,NPROMA,NGPBLKS
    end if

+#ifdef FIELD_API_IO 


How about splitting the two into a write and read routine and calling them in this order from here?
That would work as a demonstration of how to use either mode individually.

For a full-on switch to HDF5 as inputs, we would need to replicate the inputs in a third format committed to the repository. I think that is a good idea but could also be tackled in a follow-on PR maybe?
The input reading is still very much set up as "Serialbox first, otherwise use the legacy hdf5 read method". We could first remove that and fully switch to HDF5, then add the field_api hdf5 reader.

awnawab · 2025-09-06T09:33:53Z

F-API develop-ecmwf rebased over main and now also contains ecmwf-ifs/field_api#108.

reuterbal · 2025-09-08T20:43:55Z

+            FIELD_API_ENABLE_IO={{value}}
+            FIELD_API_ENABLE_FIAT={{value}}
+            ENABLE_FIELD_API_ENABLE_IO={{value}}
+            ENABLE_MPI=ON


Is there really a need to forcefully enable MPI?

> Is there really a need to forcefully enable MPI?

The way it is coded in FA at the moment, yes. I am working on it on the FA side.

> Is there really a need to forcefully enable MPI?

@reuterbal @awnawab I've made MPI/MPL fully optional, what do you think about the current state?

That's great, many thanks! Looks very nice on the CLOUDSC side now, but I haven't looked at Field-API. I'll let @awnawab judge the F-API angle ;-)

I think we can simplify it a little in F-API. My comments that follow are based on what I saw in napz-IO-mpi-independent-option, please disregard if you had plans to change that.

Right now the FIELD_API IO feature requires fiat and for fiat to be built with MPI. For the moment, let's keep these two assumptions, and if in the future they become limitations we can address them then.

The only MPI related information we need in the IO functionality is the MPI communicator. The current rank and total number of ranks can be queried from the communicator. So I would propose we simply make the MPI communicator an optional argument to any IO routine that needs it. If this argument is not present, then we can assume serial IO. We can print a message either way to NULOUT so the user isn't surprised.

This allows us to remove all the IO related options: IO_SERIAL, IO_MPI, IO_MPL. FIAT links to MPI publicly, so linking to FIAT means we can import the MPI runtime library too in F-API. Since that is currently a requirement for the IO feature, we don't even need to ifdef guard it.

Apologies if I've missed something, and we can also chat about it offline if you'd like 😄
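The proposal above, an optional communicator argument whose absence means serial IO, can be sketched as follows. No MPI library is used here: the integer `comm` is only a stand-in for a communicator handle, and `write_field` is a hypothetical name rather than a Field API routine.

```fortran
module io_comm_demo
  implicit none
  private
  public :: write_field

contains

  subroutine write_field(name, comm)
    character(len=*), intent(in) :: name
    ! Stand-in for an MPI communicator handle (assumption, no MPI is linked here)
    integer, intent(in), optional :: comm

    if (present(comm)) then
      ! A real implementation would query rank and size from the communicator
      print '(3a,i0)', "parallel write of ", name, " on communicator ", comm
    else
      ! No communicator given: fall back to serial IO and say so
      print '(3a)', "serial write of ", name, " (no communicator given)"
    end if
  end subroutine write_field

end module io_comm_demo

program io_comm_main
  use io_comm_demo, only: write_field
  implicit none

  call write_field("PT")            ! serial path
  call write_field("PT", comm=42)   ! parallel path; 42 is an arbitrary placeholder
end program io_comm_main
```

The mode is then chosen per call at run time, which is why the separate IO_SERIAL, IO_MPI and IO_MPL build options would no longer be needed under this proposal.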

I think you misunderstood me. Having fiat as a dependency when we're building against field api is perfectly valid. I just don't like having fiat always as a dependency on non-fapi targets.

> It just means we increment the rank by 1 by default,

If it is not set in stone in MPL, then I would see it as unsafe to encode such an implicit assumption into non-MPL code.

> Having 4 IO related options seems like overkill

I think there are 3: SERIAL, MPI, MPL. At least two need to be specified at compile time (serial and parallel); the third one merely allows independence from FIAT.

> An IO option coupled with an MPI option should be enough, and allows us to be independent of FIAT.

Again, from a domain scientist's point of view, there should be an option of storing the data in an organized manner that maps in a unique way onto a computational mesh, so the h5 file can e.g. be viewed in an external viewer or auto-processed in Python for postprocessing. At least until we move on to Atlas.

> I think you misunderstood me. Having fiat as a dependency when we're building against field api is perfectly valid. I just don't like having fiat always as a dependency on non-fapi targets.

My main point was actually to provide an MPI-independent setup. You had a good point that this should be relaxed, and indeed I see that for the sake of testing new compilers it is very handy not to rely on the accompanying MPI.

reuterbal · 2025-09-11T08:03:57Z

    target_compile_options( ${TARGET}-${prec}-lib
        PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:SHELL:${CLOUDSC_CUDA_OPT_FLAGS} ${CLOUDSC_CUDA_FLAGS}>
    )
+    target_link_libraries(${TARGET}-${prec}-lib PRIVATE $<${HAVE_MPI}:MPI::MPI_C>)


This should be redundant to the addition above (if that is even needed)

reuterbal · 2025-09-11T08:09:27Z

                    DESCRIPTION "Support for task-level parallelism using MPI"
                    DEFAULT OFF
-                    REQUIRED_PACKAGES "MPI COMPONENTS Fortran" )
+                    REQUIRED_PACKAGES "MPI COMPONENTS Fortran C CXX" )


Still curious what makes this necessary?

As stated earlier, dependence of hdf5.h on mpi.h. I am not sure if CXX will be needed here, but it is hard to exclude at the moment.

reuterbal · 2025-09-11T08:13:52Z

+   if( HAVE_FIELD_API_ENABLE_IO_MPL )
+     list(APPEND CLOUDSC_DEFINITIONS FIELD_API_IO_MPL)
+   elseif( HAVE_FIELD_API_ENABLE_IO_MPI )
+     find_package(MPI REQUIRED )


I think it makes more sense to check here that the MPI feature is enabled. The bundle implicitly enforces this but a CMake-build without bundle would allow a halfway-house otherwise

Suggested change

-     find_package(MPI REQUIRED )
+     if( NOT HAVE_MPI )
+         ecbuild_error("Cannot enable FIELD_API_ENABLE_IO_MPI when feature MPI is disabled.")
+     endif()

Actually, at the moment it is activated in FieldAPI itself:
elseif(HAVE_IO_MPI)
ecbuild_enable_mpi( COMPONENTS FORTRAN REQUIRED)

so this section of cloudsc can indeed be deleted.
However, if we link to a parallel version of HDF5, it is probably necessary to activate MPI anyway, even without --with-mpi in the bundle.

reuterbal · 2025-09-11T08:15:14Z

+  Field API is used in newer versions of the IFS. Optionally, this version also tests 
+  Field  API IO.  To test IO write/read feature,  add --with-field-api-io at the build stage.
+  Note that this enables MPI by default. The IO feature is embedded in the 
+  cloudsc-fortran Field API test.


This probably needs a small update later.

piotrows added 4 commits August 2, 2025 14:00

First draft of the cloudsc store using field_api hdf5io

1522dd7

Fiat is now optional and replaces duplicated files of cloudsc

a5f8798

HDF5 output test complete

12f9f89

Make Field API IO optional

224903b

piotrows force-pushed the napz-hdf5io-pr branch from d027734 to 224903b Compare August 2, 2025 12:00

Update documentation

064722b

piotrows changed the title ~~Napz hdf5io pr~~ New IO capabilities via FieldAPI IO Aug 2, 2025

Merge branch 'develop' into napz-hdf5io-pr

ab7f0f6

piotrows marked this pull request as ready for review August 2, 2025 12:21

piotrows requested review from reuterbal, wdeconinck and wertysas August 2, 2025 12:21

piotrows added 3 commits August 6, 2025 19:52

Restore compilation of gpu variant and develop-ecmwf branch of field_api

20ef684

Correct misplaced cmake entry

9eaec81

Bump down Loki version

4f1f7e5

piotrows marked this pull request as draft September 3, 2025 08:06

mlange05 reviewed Sep 3, 2025

reuterbal requested changes Sep 4, 2025

Merge branch 'develop' into napz-hdf5io-pr

088d7f2

piotrows added 14 commits September 6, 2025 15:11

Disable caching for CI

dc95167

Fix spacing

dd9ac5b

Disable Fiat build by default

3fb48c0

Build fiat when building field_api and remove spurious path

c007421

Remove spurious comment

5cbf8f9

Remove separation of Fiat sources

6a3dfc3

Do not link fiat to common

b7e11c7

Bump software versions

214e3b6

Do not link field_api to common

0a85172

enforce oml linking

570a2e0

Rearrange bundle so fckit is before field_api

eff8730

split test routine for fieldapi IO into read/write

4ccb643

Remove dependence on oml, wrong suggestion from CI?

d1ea252

Disable testing of fiat and field_api inside cloudsc

cb4441a

reuterbal reviewed Sep 8, 2025

piotrows added 2 commits September 10, 2025 23:51

Make MPI and MPL fully optional - cloudsc side

9a47bac

Switch to special FA branch to accomodate optional MPI in FA again

2c1cfaf

reuterbal requested changes Sep 11, 2025

piotrows and others added 4 commits September 11, 2025 11:00

Remove spurious comment

a113138

Co-authored-by: Balthasar Reuter <6384870+reuterbal@users.noreply.github.com>

Cleanup

a7fd474

FieldAPI IO update

d099df9

Update defines to execute the IO test

33bf481

piotrows mentioned this pull request Sep 30, 2025

HDF5 IO with Fiat and MPI independent option (over main) ecmwf-ifs/field_api#119

Merged

Zbigniew Piotrowski and others added 6 commits September 30, 2025 16:49

update atos CI with FA io option

d6f7445

update atos CI with FA io option

0a80a13

Update versions in bundle.yml to direct to the develop-ecmwf FA and the latest ecbuild/eckit/atlas

99329a2

Remove unused definitions and singal error on improper mpi and FieldAPI IO combinations

5b51681

Bump software versions for Github CI

16eb274

Bump CI software versions

0034e29
