diff --git a/README.md b/README.md
index b28c771..9abc972 100644
--- a/README.md
+++ b/README.md
@@ -165,9 +165,33 @@ The precise run configurations should be taken from the data spreadsheets that l
 
 ### Correctness testing
 
-Correctness can be verified using the [validate.py](./validate.py) script.
+To check that the benchmark software is working correctly, a matrix
+comparison job must be run on 1 GPU and 8 GPUs with no collocation (`qmode=1`), 10000
+total dofs (`ndofs_global`), and in both cases should produce the same
+output `ynorm` and `znorm` (within numerical roundoff precision).
+For a problem with 10000 dofs, the numerical value of the `ynorm` and
+`znorm` should be 1.141577508 to 9 decimal places. The console output
+and the JSON file should be reported.
+
+The same correctness test should be performed with the CG operator on
+1 and 8 GPUS:
+
+- Correctness comparison with matrix result: `bench_dolfinx --mat_comp --cg
+ --ndofs_global=10000 --degree=3 --json mat_comp_cg.json`
+
+In this case, `ynorm` and `znorm` should be 167.5924472. Console output
+and JSON should be reported.
 
-The validation script should be run as follows and produce output similar to the following:
+For both these tests, the [validate.py](./validate.py) script can be used to check
+the values of `ynorm` and `znorm` meet the accuracy requirements (see description
+below of how to use the script).
+
+### Benchmark run validation
+
+Benchmark runs can be verified using the [validate.py](./validate.py) script.
+
+The validation script takes as input the JSON and console output from the
+benchmark code. For example:
 
 ```
 ./validate output.json output.out
@@ -178,49 +202,20 @@ The validation script should be run as follows and produce output similar to the
 nreps : 1000
 scalar size : 64
- MAT COMP performance: 0.2957402083152624 Gdofs/s
+ Stencil performance: 0.2957402083152624 Gdofs/s
 Validation: PASSED
 ```
-Sanity check: The matrix comparison must be run on 1 GPU and 8 GPUs with no collocation (`qmode=1`), 10000
-total dofs (`ndofs_global`), and in both cases should produce the same
-output `ynorm` and `znorm` (within numerical roundoff precision).
-For a problem with 10000 dofs, the numerical value of the `ynorm` and
-`znorm` should be 1.141577508 to 9 decimal places. The console output
-and the JSON file should be reported.
-
-For the acceptance tests, with `--qmode=0`, all GPU-based computations must
-yield the same answer as a CPU-based variant, subject to numerical
-roundoffs.
-
-The same correctness test should be performed with the CG operator on
-1 and 8 GPUS:
-
-- Correctness comparison with matrix result: `bench_dolfinx --mat_comp --cg
- --ndofs_global=10000 --degree=3 --json mat_comp_cg.json`
-
-In this case, `ynorm` and `znorm` should be 167.5924472. Console output
-and JSON should be reported.
-
-
 ### Performance results
-In addition to testing for correctness, `validate.py` will also print the Computation Rate, which is the sole FoM for the benchmark.
+In addition to validating the benchmark run, `validate.py` will also
+print the Computation Rate, which is the sole FoM for the benchmark.
 The Computation Rate printed by `validate.py` corresponds to the
 total throughput in billion degrees of freedom per second (Gdofs/s).
-
-
 ### Reference data
 #### LUMI-G (MI250x): Throughput in GDoFs/s for 2-64 nodes (8-256 GPUs)
@@ -295,6 +290,7 @@ The following changes to this document have been made since initial release:
 |