Feature metadata serialization #305
Added code to add kv-tags at scale. Also added code to get and verify tags at scale. The latter test code will be used to verify metadata checkpointing after a server restart.
…nd from file instead of using a buffer. Also added version control for the BULKI-based checkpointing.
Added comment, fixed formatting suggested by GitHub clang-format style checker, brought back PDC_TIMING log from the stable version.
…into feature_metadata-serialization
Benchmark Summary (10000 objects, 100 tags)
Serial
Parallel (
| Metric | Previous | New (BULKI) |
|---|---|---|
| Close Time (s) | 0.85 | 0.45 |
| Restart Time (s) | 0.77 | 0.05 |
@biqar a couple of things:
Todos:
Strong Scaling Performance Comparison
Total work is fixed at creating 100K objects and 100 tags while the number of parallel servers varies. Reporting min/max/avg of:
Checkpoint Time (seconds)
Server Close Time (seconds)
Server Restart Time (seconds)
Observations:
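The min/max/avg figures reported for each server count can be produced by a simple reduction over the per-server timings. The following is a minimal sketch, assuming the timings have already been gathered into one array; `time_stats` and `summarize_times` are hypothetical names and are not part of PDC.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of one metric (e.g. checkpoint time) across all servers. */
struct time_stats {
    double min;
    double max;
    double avg;
};

/* Reduce per-server timings (in seconds) to min/max/avg.
 * Assumes the caller has collected one value per server into `seconds`. */
static struct time_stats summarize_times(const double *seconds, size_t n)
{
    assert(seconds != NULL && n > 0);

    struct time_stats s = { seconds[0], seconds[0], 0.0 };
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (seconds[i] < s.min) s.min = seconds[i];
        if (seconds[i] > s.max) s.max = seconds[i];
        sum += seconds[i];
    }
    s.avg = sum / (double)n;
    return s;
}
```

A large gap between `max` and `avg` points at a straggler server, which is often more informative for scaling studies than the average alone.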
Weak Scaling Performance Comparison
The workload per worker is fixed. Reporting min/max/avg of:
Checkpoint Time (seconds)
Close Time (seconds)
Restart Time (seconds)
Observations
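The difference between the two experiments is how the object count per server is derived. The sketch below illustrates that distinction only; `objects_for_server` is a hypothetical helper, and the even split with the remainder going to the lowest ranks is an assumption, not necessarily how the benchmark distributes work.

```c
#include <assert.h>
#include <stddef.h>

enum scaling_mode { STRONG_SCALING, WEAK_SCALING };

/* Number of objects assigned to server `rank` out of `nservers`.
 *   STRONG_SCALING: `amount` is the fixed total, split as evenly as possible
 *                   (assumed policy: remainder goes to the lowest ranks).
 *   WEAK_SCALING:   `amount` is the fixed per-server workload. */
static size_t objects_for_server(enum scaling_mode mode, size_t amount,
                                 size_t nservers, size_t rank)
{
    assert(nservers > 0 && rank < nservers);

    if (mode == WEAK_SCALING)
        return amount;

    size_t base  = amount / nservers;
    size_t extra = amount % nservers;
    return base + (rank < extra ? 1 : 0);
}
```

Under strong scaling the per-server share shrinks as servers are added, so fixed per-server costs become more visible; under weak scaling the share stays constant and ideal timings stay flat.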
Issue: Unexpected Checkpoint Overhead After Restart with BULKI Serializer
Description
The BULKI Serializer integration introduces unexpected performance overhead in the second checkpoint cycle. Since no new data is added after the restart, both checkpoint times should be roughly equivalent. However, we observe a consistent 29–52% increase in the second checkpoint time compared to the first.
Workflow
Expected Behavior
Since no new data is added after the restart, the 1st and 2nd checkpoint times should be approximately equal (as shown in the existing binary-serializer checkpoint time comparison below).
Observed Behavior
The 2nd checkpoint time is consistently 29–52% higher than the 1st under the BULKI Serializer.
BULKI Serializer: Checkpoint Time Comparison (seconds)
Binary Serializer: Checkpoint Time Comparison (seconds)
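The percentage increase quoted in the issue is the relative difference between the second and first checkpoint times. A minimal sketch of that calculation follows, with a tolerance check that could flag a regression; `overhead_percent`, `within_tolerance`, and the tolerance value are hypothetical and not part of the PR.

```c
#include <assert.h>

/* Relative change of the 2nd checkpoint time versus the 1st, in percent.
 * Positive means the 2nd checkpoint was slower. */
static double overhead_percent(double first_s, double second_s)
{
    assert(first_s > 0.0);
    return (second_s - first_s) / first_s * 100.0;
}

/* Returns 1 if the two checkpoint times differ by at most `tol_percent`
 * in either direction, 0 otherwise. */
static int within_tolerance(double first_s, double second_s, double tol_percent)
{
    double pct = overhead_percent(first_s, second_s);
    if (pct < 0.0)
        pct = -pct;
    return pct <= tol_percent;
}
```

For example, a first checkpoint of 2.0 s followed by a second of 3.0 s gives an overhead of 50%, which a 5% tolerance would flag.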
@biqar could you time each BULKI operation so we can see what's taking so long on the strong scaling? For instance if the |
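One way to get a per-operation breakdown is to accumulate wall-clock time per phase and print a summary when the checkpoint finishes. The sketch below assumes a POSIX system with `clock_gettime`; the phase names (`build`, `serialize`, `write`) and all identifiers are hypothetical placeholders, not actual BULKI or PDC functions.

```c
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical phase labels; the real checkpoint steps may differ. */
enum phase { PHASE_BUILD, PHASE_SERIALIZE, PHASE_WRITE, PHASE_COUNT };

static const char *phase_names[PHASE_COUNT] = { "build", "serialize", "write" };

struct phase_timer {
    double total[PHASE_COUNT]; /* accumulated seconds per phase */
    long   calls[PHASE_COUNT]; /* number of timed intervals per phase */
};

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Record one timed interval [start, end] against phase `p`. */
static void phase_add(struct phase_timer *t, enum phase p, double start, double end)
{
    assert(t != NULL && (int)p >= 0 && (int)p < PHASE_COUNT && end >= start);
    t->total[p] += end - start;
    t->calls[p] += 1;
}

/* Print one line per phase: call count and accumulated time. */
static void phase_report(const struct phase_timer *t, FILE *out)
{
    for (int p = 0; p < PHASE_COUNT; p++)
        fprintf(out, "%-10s calls=%ld total=%.6f s\n",
                phase_names[p], t->calls[p], t->total[p]);
}
```

Each operation of interest would be bracketed by two `now_seconds()` calls and followed by `phase_add`; calling `phase_report` once per checkpoint on every server keeps the logging overhead out of the timed regions.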
Related Issues / Pull Requests
Related Issue: 282
Description
Integrate the BULKI serializer for PDC server checkpointing.
What changes are proposed in this pull request?
Checklist: