libbpf-tools: Add generic datastructure_helpers (vec + hashmap) and use in biotop #5507
Bojun-Seo wants to merge 2 commits into
Add datastructure_helpers.h and datastructure_helpers.c implementing two general-purpose data structures for shared use across libbpf-tools:

- struct vec: a realloc-based dynamic array with amortized O(1) push_back
- struct hashmap: a separate-chaining hash map. Each bucket holds a singly-linked list of nodes with key and value stored inline. Uses FNV-1a hashing. The bucket array doubles when the average chain length exceeds 2, keeping expected lookup O(1).

FNV-1a is released into the public domain under CC0 1.0.

Unit tests are added under libbpf-tools/tests/ along with a standalone Makefile so the tests can be built and run without BPF or kernel support:

    make -C libbpf-tools/tests test

Why: Several libbpf-tools carry their own ad-hoc dynamic-array or lookup implementations. A shared helper avoids this duplication and gives future tools a standard building block without pulling in an external dependency.
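
For readers unfamiliar with the pattern, the realloc-based growth behind the amortized O(1) push_back can be sketched roughly as follows. This is only an illustration: the sketch_vec names, the initial capacity of 8, and the doubling factor are assumptions, not the API or constants from this PR.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not the PR's actual API. */
struct sketch_vec {
	void *data;        /* contiguous element storage */
	size_t elem_size;  /* size of one element in bytes */
	size_t len;        /* number of elements in use */
	size_t cap;        /* number of elements allocated */
};

static void sketch_vec_init(struct sketch_vec *v, size_t elem_size)
{
	assert(elem_size > 0);
	v->data = NULL;
	v->elem_size = elem_size;
	v->len = 0;
	v->cap = 0;
}

/* Copy *elem to the end of the array. Capacity doubles when full
 * (assumed policy), so n pushes cost O(n) copies in total.
 * Returns 0 on success or -ENOMEM. */
static int sketch_vec_push(struct sketch_vec *v, const void *elem)
{
	if (v->len == v->cap) {
		size_t new_cap = v->cap ? v->cap * 2 : 8;
		void *p = realloc(v->data, new_cap * v->elem_size);

		if (!p)
			return -ENOMEM; /* old buffer is still valid */
		v->data = p;
		v->cap = new_cap;
	}
	memcpy((char *)v->data + v->len * v->elem_size, elem, v->elem_size);
	v->len++;
	return 0;
}

/* O(1) indexed access; returns NULL when i is out of range. */
static void *sketch_vec_at(const struct sketch_vec *v, size_t i)
{
	return i < v->len ? (char *)v->data + i * v->elem_size : NULL;
}

static void sketch_vec_free(struct sketch_vec *v)
{
	free(v->data);
	v->data = NULL;
	v->len = 0;
	v->cap = 0;
}
```

Assigning the realloc result to a temporary first keeps the old buffer reachable if the allocation fails, which is what makes the negative-errno return safe for the caller.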
Replace the ad-hoc struct vector / grow_vector / free_vector implementation with ds_vec and ds_hashmap from datastructure_helpers.

- struct disk entries are stored in a ds_vec (parse_disk_stat)
- a ds_hashmap keyed by (major, minor) is populated at the same time, turning search_disk_name from an O(n) linear scan into an O(1) lookup
- datastructure_helpers.o is added to COMMON_OBJ in the Makefile so all tools can link against it

No functional change to the tool's output or behaviour.

Why: The ad-hoc vector in biotop.c is functionally equivalent to ds_vec but exists only in that one file, making it a maintenance liability. Removing it in favour of the shared library reduces duplication. The O(n) scan in search_disk_name is also an unnecessary cost on systems with many block devices; a keyed hashmap lookup is a straightforward improvement.
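
To make the (major, minor) keyed lookup concrete, here is a minimal sketch of the idea. Everything in it is hypothetical: struct sketch_disk, the fixed bucket count, and the simple multiplicative hash are stand-ins chosen for brevity, whereas the PR uses ds_hashmap with FNV-1a and a growing bucket array.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_NBUCKETS 64 /* fixed for brevity; the PR's map grows */

/* Hypothetical record; biotop's real struct disk may differ. */
struct sketch_disk {
	int major;
	int minor;
	char name[32];
	struct sketch_disk *next; /* next entry in the same bucket */
};

struct sketch_disk_map {
	struct sketch_disk *buckets[SKETCH_NBUCKETS];
};

static unsigned int sketch_bucket(int major, int minor)
{
	/* Simple mixing for illustration only; the PR hashes with FNV-1a. */
	unsigned int h = (unsigned int)major * 31u + (unsigned int)minor;

	return h % SKETCH_NBUCKETS;
}

/* Insert at the head of the bucket chain. Returns 0 or -ENOMEM. */
static int sketch_disk_add(struct sketch_disk_map *m, int major, int minor,
			   const char *name)
{
	struct sketch_disk *d;
	unsigned int b = sketch_bucket(major, minor);

	assert(name != NULL);
	d = calloc(1, sizeof(*d));
	if (!d)
		return -ENOMEM;
	d->major = major;
	d->minor = minor;
	snprintf(d->name, sizeof(d->name), "%s", name);
	d->next = m->buckets[b];
	m->buckets[b] = d;
	return 0;
}

/* Walk only one bucket chain instead of the whole disk list. */
static const char *sketch_disk_name(const struct sketch_disk_map *m,
				    int major, int minor)
{
	const struct sketch_disk *d = m->buckets[sketch_bucket(major, minor)];

	for (; d; d = d->next)
		if (d->major == major && d->minor == minor)
			return d->name;
	return NULL;
}

static void sketch_disk_map_free(struct sketch_disk_map *m)
{
	for (size_t i = 0; i < SKETCH_NBUCKETS; i++) {
		while (m->buckets[i]) {
			struct sketch_disk *d = m->buckets[i];

			m->buckets[i] = d->next;
			free(d);
		}
	}
}
```

The lookup cost depends on the length of one chain rather than on the number of block devices, which is the improvement the commit message describes.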
Please refer to the following ADR, which my AI and I wrote.
Keep Shared Data Structures in C (In-Tree) for libbpf-tools
Context
A new shared helper module was introduced for reusable data structures in
This ADR records why I intentionally chose in-tree C data structures and how we will keep quality at an acceptable maintenance level.
Decision
Keep and evolve shared data structures as in-tree C code (
Why This Decision
1. Fits the existing project model
2. Dependency and packaging discipline
In-tree C helpers introduce no new external runtime/build dependency. This helps distro packaging, minimal build environments, and reproducible CI behavior.
3. License and provenance clarity
The helpers are under project-compatible licensing and avoid introducing third-party code provenance and update policy overhead in a fast-moving systems repository.
4. Operational simplicity for low-level tools
These tools run in constrained and diverse environments. A small C helper with explicit memory behavior is easier to reason about in this context than adding a C++ runtime surface or third-party dependency lifecycle.
5. API control for this codebase
The helper API is intentionally small and explicit (
Addressing the Quality Concern Directly
The concern is valid: custom containers can regress in correctness if not constrained. To reduce that risk, let's apply the following controls:
Alternatives Considered
A) External C container library (for example, uthash/c-vector style)
Pros:
Cons:
Decision: not selected.
B) C++ standard library containers (
Description
Add a new datastructure_helpers library (datastructure_helpers.h / datastructure_helpers.c) to libbpf-tools/ that provides two generic, reusable data structures for user-space libbpf tool code:
- ds_vec: a realloc-based dynamic array storing elements inline (amortised O(1) push, O(1) indexed access, built-in qsort wrapper).
- ds_hashmap: a separate-chaining hash map storing keys and values inline after each node header, using FNV-1a 64-bit hashing and a 2× bucket-array growth policy (expected O(1) insert / lookup / delete).
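
As a rough illustration of the hashing and growth policy named above, the standard FNV-1a 64-bit loop and a load check could look like the sketch below. The function names are hypothetical, and the threshold of 2 entries per bucket is taken from the first commit message rather than from the code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard FNV-1a 64-bit parameters. */
#define FNV1A64_OFFSET_BASIS 0xcbf29ce484222325ULL
#define FNV1A64_PRIME        0x100000001b3ULL

/* Hash len bytes at key: XOR each byte in, then multiply by the prime. */
static uint64_t sketch_fnv1a64(const void *key, size_t len)
{
	const unsigned char *p = key;
	uint64_t h = FNV1A64_OFFSET_BASIS;

	assert(key != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV1A64_PRIME;
	}
	return h;
}

/* Map a hash to a bucket; nr_buckets must be non-zero. */
static size_t sketch_bucket_index(uint64_t hash, size_t nr_buckets)
{
	assert(nr_buckets > 0);
	return (size_t)(hash % nr_buckets);
}

/* Assumed policy: double the bucket array once the average chain
 * length (entries / buckets) exceeds 2. */
static int sketch_should_grow(size_t nr_entries, size_t nr_buckets)
{
	return nr_entries > 2 * nr_buckets;
}
```

Because a node's bucket index depends on the bucket count, doubling the array means re-linking every node into its new bucket; that one-off O(n) cost is what gets amortised across inserts.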
The second commit migrates biotop.c to use these helpers in place of the ad-hoc struct vector / grow_vector / free_vector code that existed locally in that file. As a side effect, search_disk_name() is upgraded from an O(n) linear scan to an expected O(1) hashmap lookup.
A unit-test binary (libbpf-tools/tests/test_datastructure_helpers) with its own Makefile is included and covers the full public API of both structures.
Why this approach
Several libbpf-tools already duplicate small dynamic-array or
map-lookup patterns (biotop's disk list being one example). A shared
helper library avoids this duplication without pulling in a heavy
external dependency.
Why a new file rather than extending map_helpers?
map_helpers is specifically for BPF map I/O. Mixing general-purpose user-space data structures there would blur its purpose.
Why a new FNV-1a-based hashmap rather than the existing libbpf hashmap?
libbpf's internal hashmap API is not part of its stable public surface and is not designed for direct use by tools. ds_hashmap is a thin, self-contained alternative with an API tailored to the libbpf-tools coding style (pass-by-pointer, error returns as negative errno).
Why separate-chaining rather than open addressing?
Separate chaining keeps insertion O(1) amortised without the tombstone
complexity of open addressing, which matters for the delete + re-insert
patterns some tools may need.
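
The tombstone-free delete can be sketched as below: removing a key from a chain is a plain pointer unlink, so a later re-insert of the same key needs no special handling. The names, the int key/value types, and the fixed bucket count are assumptions for illustration and do not reflect the ds_hashmap API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_CHAIN_BUCKETS 8 /* fixed for brevity */

/* Hypothetical node type for illustration. */
struct sketch_node {
	int key;
	int value;
	struct sketch_node *next;
};

struct sketch_chain_map {
	struct sketch_node *buckets[SKETCH_CHAIN_BUCKETS];
};

/* Return the link that points at the node holding key, or the
 * NULL link at the end of the chain when key is absent. */
static struct sketch_node **sketch_find_link(struct sketch_chain_map *m, int key)
{
	struct sketch_node **link =
		&m->buckets[(unsigned int)key % SKETCH_CHAIN_BUCKETS];

	while (*link && (*link)->key != key)
		link = &(*link)->next;
	return link;
}

/* Insert or update. Returns 0 or -ENOMEM. */
static int sketch_put(struct sketch_chain_map *m, int key, int value)
{
	struct sketch_node **link = sketch_find_link(m, key);

	if (!*link) {
		struct sketch_node *n = calloc(1, sizeof(*n));

		if (!n)
			return -ENOMEM;
		n->key = key;
		*link = n; /* append at the chain tail */
	}
	(*link)->value = value;
	return 0;
}

/* Returns 0 and stores the value, or -ENOENT when key is absent. */
static int sketch_get(struct sketch_chain_map *m, int key, int *value)
{
	struct sketch_node *n = *sketch_find_link(m, key);

	assert(value != NULL);
	if (!n)
		return -ENOENT;
	*value = n->value;
	return 0;
}

/* Unlink and free the node; neighbours in the chain are untouched. */
static int sketch_del(struct sketch_chain_map *m, int key)
{
	struct sketch_node **link = sketch_find_link(m, key);
	struct sketch_node *n = *link;

	if (!n)
		return -ENOENT;
	*link = n->next;
	free(n);
	return 0;
}
```

With open addressing, the same delete would have to leave a marker behind so that probes for colliding keys keep working; here the chain simply gets shorter.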
Checklist
libbpf-tools:,libbpf-tools/biotop:)