Feature metadata serialization #305

Open
biqar wants to merge 19 commits into hpc-io:develop from biqar:feature_metadata-serialization
@biqar biqar commented Mar 17, 2026

Related Issues / Pull Requests

Related Issue: 282

Description

Integrate BULKI serializer for pdc server checkpointing.

What changes are proposed in this pull request?

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality not to work as expected; for instance, examples in this repository must be updated too)
  • This change requires a documentation update

Checklist:

  • My code modifies existing public API, or introduces new public API, and I updated or wrote docstrings
  • I have commented my code
  • My code requires documentation updates, and I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Added code to add kv-tags at scale. Also added code to get and verify tags at scale. The latter test code will be used to verify the metadata checkpointing after a server restart.
…nd from file instead of using a buffer

Also added version control for the bulki-based checkpointing.
@biqar biqar requested a review from a team as a code owner March 17, 2026 19:01

biqar commented Mar 26, 2026

Benchmark Summary (10000 objects, 100 tags)

Serial

| Metric | Previous | New (BULKI) |
|---|---|---|
| Close Time (s) | 1.78 | 5.34 |
| Restart Time (s) | 2.09 | 0.21 |

Parallel (-N 1 -n 5 -c 2)

| Metric | Previous | New (BULKI) |
|---|---|---|
| Close Time (s) | 0.85 | 0.45 |
| Restart Time (s) | 0.77 | 0.05 |


biqar commented Mar 26, 2026

@houjun and @jeanbez Please let me know the next steps for merging this PR.

@biqar biqar mentioned this pull request Mar 26, 2026

houjun commented Mar 30, 2026

@biqar a couple of things:

  • Add your new checkpoint/restart tests to the CMake test run; the change in src/tests/CMakeLists.txt only compiles them.
  • For the benchmarking results, can you run with more variations in the number of objects and kv-tags, run each variation at least 5 times, and report the min, avg, and max?
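
The min/avg/max reporting requested above could be produced with a small helper like the following. This is a minimal sketch: the function name `summarize` and the sample timings are hypothetical and are not taken from the PR or from PDC.

```python
from statistics import mean


def summarize(samples):
    """Return (min, avg, max) for a non-empty list of timings in seconds."""
    if not samples:
        raise ValueError("need at least one timing sample")
    return min(samples), mean(samples), max(samples)


# Hypothetical timings (seconds) from five runs of one configuration.
runs = [1.21, 1.15, 1.24, 1.19, 1.22]
lo, avg, hi = summarize(runs)
print(f"min={lo:.6f} avg={avg:.6f} max={hi:.6f}")
```

Running each configuration through the same helper keeps the reported columns consistent across variations.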


biqar commented Apr 2, 2026

Todos:

  1. More rigorous testing
     a. Strong scaling (fix the total amount of work and vary the number of workers). Fix the work at 100K objects / 100 tags and vary the number of parallel servers:
        -n 2 -c 2
        -n 4 -c 2
        -n 8 -c 2
        -n 16 -c 2
     b. Weak scaling (fix the workload per worker). The number of items to checkpoint is objects + objects × tags:
        10K/100 with -n 2 -c 2: 10K + 10K × 100 = 1,010,000
        20K/100 with -n 4 -c 2: 20K + 20K × 100 = 2,020,000
        40K/100 with -n 8 -c 2: 40K + 40K × 100 = 4,040,000
        80K/100 with -n 16 -c 2: 80K + 80K × 100 = 8,080,000
  2. Add the new checkpoint/restart test to the CMake test run. Create a new version of run_checkpoint_restart_test.sh that accepts the program parameters, then add another version for MPI testing.
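
The weak-scaling sizes in the plan follow from objects + objects × tags. The sketch below recomputes them and checks that the per-server share stays constant; it assumes items are spread evenly across servers, which the thread does not state, and all names are illustrative.

```python
def checkpoint_items(num_objects, tags_per_object):
    """Objects plus all of their tags, per the formula in the todo list."""
    return num_objects + num_objects * tags_per_object


TAGS_PER_OBJECT = 100
# (server count, total objects) pairs from the weak-scaling plan.
weak_configs = [(2, 10_000), (4, 20_000), (8, 40_000), (16, 80_000)]

per_server = []
for servers, objects in weak_configs:
    total = checkpoint_items(objects, TAGS_PER_OBJECT)
    # Assumption: items are distributed evenly across servers.
    share = total // servers
    per_server.append(share)
    print(f"-n {servers}: total={total:,} per-server={share:,}")
```

If the even-distribution assumption holds, every configuration gives each server the same number of items, which is what makes this a weak-scaling test.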


biqar commented Apr 30, 2026

Strong Scaling Performance Comparison

The total work is fixed at 100K objects with 100 tags each while the number of parallel servers varies. Each table reports the min, avg, and max of checkpoint time, server close time, and server restart time.

Checkpoint Time (seconds)

| Servers | Prev Min | Prev Avg | Prev Max | New Min | New Avg | New Max |
|---|---|---|---|---|---|---|
| -n 2 -c 2 | 1.146857 | 1.208034 | 1.239199 | 359.340963 | 360.498139 | 361.789536 |
| -n 4 -c 2 | 0.689133 | 0.706156 | 0.735507 | 62.639350 | 62.806907 | 63.095135 |
| -n 8 -c 2 | 0.405211 | 0.406731 | 0.409733 | 11.268601 | 11.297985 | 11.330935 |
| -n 16 -c 2 | 0.257778 | 0.266691 | 0.278840 | 3.188340 | 3.225553 | 3.251896 |

Server Close Time (seconds)

| Servers | Prev Min | Prev Avg | Prev Max | New Min | New Avg | New Max |
|---|---|---|---|---|---|---|
| -n 2 -c 2 | 1.259604 | 1.315604 | 1.348103 | 359.350626 | 360.510299 | 361.805794 |
| -n 4 -c 2 | 0.801891 | 0.823133 | 0.859166 | 62.651894 | 62.817672 | 63.105154 |
| -n 8 -c 2 | 0.550356 | 0.558271 | 0.564412 | 11.284968 | 11.311051 | 11.347749 |
| -n 16 -c 2 | 0.478174 | 0.488428 | 0.501940 | 3.202700 | 3.235686 | 3.258656 |

Server Restart Time (seconds)

| Servers | Prev Min | Prev Avg | Prev Max | New Min | New Avg | New Max |
|---|---|---|---|---|---|---|
| -n 2 -c 2 | 0.265927 | 0.266252 | 0.266456 | 0.566464 | 0.594174 | 0.615873 |
| -n 4 -c 2 | 0.137422 | 0.138248 | 0.139483 | 0.269829 | 0.283413 | 0.290344 |
| -n 8 -c 2 | 0.070470 | 0.071441 | 0.072440 | 0.142268 | 0.144746 | 0.148689 |
| -n 16 -c 2 | 0.040891 | 0.040909 | 0.040929 | 0.078482 | 0.079175 | 0.079950 |

Observations:

  1. Checkpoint creation times are dramatically higher with the BULKI serializer across all server counts: roughly 300× slower at -n 2 and still ~12× slower at -n 16. This is a significant degradation.
  2. Restart times show a more modest degradation, roughly 2× slower across all configurations with the new BULKI deserializer.
  3. Both versions show good strong scaling behavior (times decrease as the server count increases), but BULKI's absolute times are far higher.
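
The slowdown factors quoted in these observations can be recomputed from the average columns of the strong-scaling checkpoint table. The sketch below does that and also reports the speedup gained each time the server count doubles; the variable and function names are illustrative only.

```python
servers = [2, 4, 8, 16]
# Average checkpoint times (seconds) from the strong-scaling table.
prev_avg = [1.208034, 0.706156, 0.406731, 0.266691]
new_avg = [360.498139, 62.806907, 11.297985, 3.225553]

# How many times slower BULKI is than the previous serializer.
slowdown = [new / prev for new, prev in zip(new_avg, prev_avg)]


def doubling_speedups(times):
    """Speedup obtained each time the server count doubles."""
    return [before / after for before, after in zip(times, times[1:])]


prev_speedups = doubling_speedups(prev_avg)
new_speedups = doubling_speedups(new_avg)

for n, factor in zip(servers, slowdown):
    print(f"-n {n}: BULKI is {factor:.1f}x slower")
print("prev speedup per doubling:", [round(s, 2) for s in prev_speedups])
print("new speedup per doubling: ", [round(s, 2) for s in new_speedups])
```

Assuming each doubling of servers halves the per-server work, a speedup above 2× per doubling would suggest that the BULKI checkpoint cost grows faster than linearly with the per-server item count. That is a reading of these numbers, not a profiled result.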

Weak Scaling Performance Comparison

The workload per worker is fixed. Each table reports the min, avg, and max of checkpoint time, server close time, and server restart time.

Checkpoint Time (seconds)

| Servers (Workload) | Prev Min | Prev Avg | Prev Max | New Min | New Avg | New Max |
|---|---|---|---|---|---|---|
| -n 2 -c 2 (10K/100) | 0.155281 | 0.159597 | 0.162538 | 1.807282 | 1.835119 | 1.886592 |
| -n 4 -c 2 (20K/100) | 0.164595 | 0.166748 | 0.170526 | 1.813283 | 1.828611 | 1.848121 |
| -n 8 -c 2 (40K/100) | 0.183058 | 0.185744 | 0.188479 | 1.818360 | 1.830279 | 1.846697 |
| -n 16 -c 2 (80K/100) | 0.209380 | 0.217758 | 0.228390 | 2.080548 | 2.100604 | 2.138375 |

Close Time (seconds)

| Servers (Workload) | Prev Min | Prev Avg | Prev Max | New Min | New Avg | New Max |
|---|---|---|---|---|---|---|
| -n 2 -c 2 (10K/100) | 0.259814 | 0.263879 | 0.267591 | 1.812648 | 1.844281 | 1.902948 |
| -n 4 -c 2 (20K/100) | 0.283108 | 0.285033 | 0.286081 | 1.820875 | 1.835485 | 1.853913 |
| -n 8 -c 2 (40K/100) | 0.329559 | 0.337133 | 0.342381 | 1.825027 | 1.840674 | 1.857495 |
| -n 16 -c 2 (80K/100) | 0.434089 | 0.443785 | 0.450590 | 2.092237 | 2.114594 | 2.151783 |

Restart Time (seconds)

| Servers (Workload) | Prev Min | Prev Avg | Prev Max | New Min | New Avg | New Max |
|---|---|---|---|---|---|---|
| -n 2 -c 2 (10K/100) | 0.029562 | 0.030135 | 0.030464 | 0.059689 | 0.059926 | 0.060158 |
| -n 4 -c 2 (20K/100) | 0.029578 | 0.030246 | 0.030927 | 0.056596 | 0.057787 | 0.059115 |
| -n 8 -c 2 (40K/100) | 0.029415 | 0.030096 | 0.030845 | 0.056701 | 0.057546 | 0.058305 |
| -n 16 -c 2 (80K/100) | 0.032854 | 0.033069 | 0.033227 | 0.064084 | 0.065558 | 0.066550 |

Observations

  1. BULKI's checkpoint creation times are roughly 10–11× slower.
  2. Both versions show excellent weak scaling (fairly flat across n2–n8, with a slight uptick at n16).
  3. Restart times degrade by roughly 2×.

Issue: Unexpected Checkpoint Overhead After Restart with BULKI Serializer

Description

The BULKI Serializer integration introduces unexpected performance overhead in the second checkpoint cycle. Since no new data is added after the restart, both checkpoint times should be roughly equivalent. However, we observe a consistent 29–52% increase in the second checkpoint time compared to the first.

Workflow

  1. Start server and create 100K objects with 100 tags each
  2. Stop the server → record 1st checkpoint time
  3. Restart the server
  4. Read and verify all objects/tags
  5. Stop the server → record 2nd checkpoint time

Expected Behavior

Since no new data is added after the restart, the 1st and 2nd checkpoint times should be approximately equal (as shown in the binary serializer checkpoint time comparison below).

Observed Behavior

The 2nd checkpoint time is consistently 29–52% more expensive than the 1st under the BULKI Serializer.

BULKI Serializer: Checkpoint Time Comparison (seconds)

| Servers | 1st Checkpoint (Avg) | 2nd Checkpoint (Avg) | Overhead (%) |
|---|---|---|---|
| -n 2 -c 2 | 360.498139 | 549.173542 | +52.34% |
| -n 4 -c 2 | 62.806907 | 81.879827 | +30.37% |
| -n 8 -c 2 | 11.297985 | 15.036968 | +33.09% |
| -n 16 -c 2 | 3.225553 | 4.169183 | +29.26% |

Binary Serializer: Checkpoint Time Comparison (seconds)

| Servers | 1st Checkpoint (Avg) | 2nd Checkpoint (Avg) | Overhead (%) |
|---|---|---|---|
| -n 2 -c 2 | 1.208034 | 1.423412 | +17.83% |
| -n 4 -c 2 | 0.706156 | 0.695221 | -1.55% |
| -n 8 -c 2 | 0.406731 | 0.410047 | +0.82% |
| -n 16 -c 2 | 0.266691 | 0.274068 | +2.77% |
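
The overhead column appears to be the relative increase of the 2nd checkpoint time over the 1st. The sketch below applies that formula, which is inferred from the reported values rather than stated in the PR, to two rows of the tables above; the function name is illustrative.

```python
def overhead_percent(first, second):
    """Relative change of the 2nd checkpoint time versus the 1st, in percent."""
    return (second - first) / first * 100.0


# (label, 1st checkpoint avg, 2nd checkpoint avg) taken from the tables above.
rows = [
    ("BULKI  -n 2 -c 2", 360.498139, 549.173542),
    ("Binary -n 4 -c 2", 0.706156, 0.695221),
]

for label, first, second in rows:
    print(f"{label}: {overhead_percent(first, second):+.2f}%")
```

A negative value, as in the binary -n 4 row, simply means the 2nd checkpoint was slightly faster than the 1st.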

@TheAssembler1
Collaborator

@biqar could you time each BULKI operation so we can see what is taking so long in the strong scaling runs? For instance, if the BULKI_ENTITY_append_BULKI operation is taking a long time, we can look at preallocating the array with a known capacity rather than starting from empty_BULKI_Array_Entity and expanding on every append.
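
To illustrate why the growth policy of an append-heavy array can matter, the following sketch counts how many existing elements would be copied under two generic policies. It models a plain dynamic array and is not BULKI's actual implementation, whose growth behavior is not described in this thread; all function names are hypothetical.

```python
def copies_grow_by_one(n):
    """Elements copied if every append reallocates to an exact-fit buffer."""
    copied = 0
    size = 0
    for _ in range(n):
        copied += size  # all existing elements move to the new buffer
        size += 1
    return copied


def copies_doubling(n):
    """Elements copied if capacity doubles whenever the buffer is full."""
    copied = 0
    size = 0
    capacity = 0
    for _ in range(n):
        if size == capacity:
            copied += size  # existing elements move only on a resize
            capacity = max(1, capacity * 2)
        size += 1
    return copied


for n in (1_000, 10_000):
    print(n, copies_grow_by_one(n), copies_doubling(n))
```

Exact-fit growth copies n(n-1)/2 elements in total, which is quadratic in n, while doubling copies fewer than 2n, and preallocating to a known capacity copies none. Per-operation timings would show whether this pattern is actually what the checkpoint path is hitting.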

@jeanbez jeanbez added the type: enhancement New feature or request label May 15, 2026
