
libbpf-tools: Add generic datastructure_helpers (vec + hashmap) and use in biotop#5507

Open
Bojun-Seo wants to merge 2 commits into iovisor:master from Bojun-Seo:datastructure

@Bojun-Seo
Contributor

Description

Add a new datastructure_helpers library (datastructure_helpers.h /
datastructure_helpers.c) to libbpf-tools/ that provides two generic,
reusable data structures for user-space libbpf tool code:

  • ds_vec — a realloc-based dynamic array storing elements inline
    (amortised O(1) push, O(1) indexed access, built-in qsort wrapper).
  • ds_hashmap — a separate-chaining hash map storing keys and values
    inline after each node header, using FNV-1a 64-bit hashing and a 2×
    bucket-array growth policy (expected O(1) insert / lookup / delete).
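
For readers unfamiliar with the pattern, the following is a minimal sketch of a realloc-based dynamic array with inline elements, doubling growth, and a qsort wrapper. All names (`sketch_vec`, `sketch_vec_push`, and so on) are hypothetical and chosen for illustration; they are not the ds_vec API from this PR, and the initial capacity of 8 is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names: this is NOT the ds_vec API from the PR. */
struct sketch_vec {
	void *data;		/* contiguous storage, elements stored inline */
	size_t elem_size;	/* size of one element in bytes */
	size_t len;		/* number of elements in use */
	size_t cap;		/* number of elements allocated */
};

static void sketch_vec_init(struct sketch_vec *v, size_t elem_size)
{
	memset(v, 0, sizeof(*v));
	v->elem_size = elem_size;
}

/* Append a copy of *elem; doubling the capacity keeps push amortised O(1). */
static int sketch_vec_push(struct sketch_vec *v, const void *elem)
{
	if (v->len == v->cap) {
		size_t new_cap = v->cap ? v->cap * 2 : 8;	/* 8 is an assumed start */
		void *p = realloc(v->data, new_cap * v->elem_size);

		if (!p)
			return -ENOMEM;	/* old buffer is still valid */
		v->data = p;
		v->cap = new_cap;
	}
	memcpy((char *)v->data + v->len * v->elem_size, elem, v->elem_size);
	v->len++;
	return 0;
}

/* O(1) indexed access; returns NULL when idx is out of range. */
static void *sketch_vec_at(const struct sketch_vec *v, size_t idx)
{
	return idx < v->len ? (char *)v->data + idx * v->elem_size : NULL;
}

/* Thin qsort wrapper over the inline storage. */
static void sketch_vec_sort(struct sketch_vec *v,
			    int (*cmp)(const void *, const void *))
{
	if (v->len > 1)
		qsort(v->data, v->len, v->elem_size, cmp);
}

static void sketch_vec_free(struct sketch_vec *v)
{
	free(v->data);
	memset(v, 0, sizeof(*v));
}

static int sketch_cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/* Push 3, 1, 2, sort, and return the smallest element (or a negative errno). */
static int sketch_vec_demo(void)
{
	struct sketch_vec v;
	int vals[] = { 3, 1, 2 };
	int first, err;
	size_t i;

	sketch_vec_init(&v, sizeof(int));
	for (i = 0; i < 3; i++) {
		err = sketch_vec_push(&v, &vals[i]);
		if (err) {
			sketch_vec_free(&v);
			return err;
		}
	}
	sketch_vec_sort(&v, sketch_cmp_int);
	assert(sketch_vec_at(&v, 3) == NULL);	/* out of range */
	first = *(int *)sketch_vec_at(&v, 0);
	sketch_vec_free(&v);
	return first;
}
```

Errors are returned as negative errno values, matching the convention described for the helpers; a failed `realloc` leaves the existing contents untouched.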

The second commit migrates biotop.c to use these helpers in place of the
ad-hoc struct vector / grow_vector / free_vector code that existed
locally in that file. As a side effect, search_disk_name() is upgraded from
an O(n) linear scan to an O(1) hashmap lookup.

A unit-test binary (libbpf-tools/tests/test_datastructure_helpers) with its
own Makefile is included and covers the full public API of both structures.

Why this approach

Several libbpf-tools already duplicate small dynamic-array or
map-lookup patterns (biotop's disk list being one example). A shared
helper library avoids this duplication without pulling in a heavy
external dependency.

Why a new file rather than extending map_helpers?
map_helpers is specifically for BPF map I/O. Mixing general-purpose
user-space data structures there would blur its purpose.

Why a self-contained ds_hashmap (with FNV-1a) rather than the existing libbpf hashmap?
libbpf's internal hashmap API is not part of its stable public surface
and is not designed for direct use by tools. ds_hashmap is a thin,
self-contained alternative with an API tailored to the libbpf-tools coding
style (pass-by-pointer, error returns as negative errno).
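
As a reference point, FNV-1a 64-bit is small enough to show in full. The sketch below uses the published offset basis and prime; the function names (`sketch_fnv1a64`, `sketch_bucket_index`) and the modulo-based bucket selection are assumptions for illustration and may differ from what ds_hashmap does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; hashes an arbitrary byte range with FNV-1a (64-bit). */
static uint64_t sketch_fnv1a64(const void *key, size_t len)
{
	const unsigned char *p = key;
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV 64-bit offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];			/* FNV-1a: XOR the byte first... */
		h *= 0x100000001b3ULL;		/* ...then multiply by the prime */
	}
	return h;
}

/* Assumed bucket selection: plain modulo over the bucket count. */
static size_t sketch_bucket_index(uint64_t hash, size_t nbuckets)
{
	return (size_t)(hash % nbuckets);
}

/* Check the function against published FNV-1a 64-bit test vectors. */
static void sketch_fnv1a64_selfcheck(void)
{
	assert(sketch_fnv1a64("", 0) == 0xcbf29ce484222325ULL);
	assert(sketch_fnv1a64("a", 1) == 0xaf63dc4c8601ec8cULL);
}
```

Because the hash runs over raw key bytes, two keys that compare equal field by field but differ in padding bytes would hash differently, which is why the key-usage contract for struct keys matters.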

Why separate-chaining rather than open addressing?
Separate chaining keeps insertion O(1) amortised without the tombstone
complexity of open addressing, which matters for the delete + re-insert
patterns some tools may need.
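
To illustrate the point about deletes, here is a minimal separate-chaining sketch with keys and values stored inline after the node header. Everything in it is hypothetical (`sketch_map`, `sketch_node`, the fixed bucket count); it is not the ds_hashmap implementation, and it omits the bucket-array growth described in this PR to stay short.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NBUCKETS 16	/* fixed here; a real map would grow this */

/* Hypothetical layout: node header, then key bytes, then value bytes. */
struct sketch_node {
	struct sketch_node *next;
	unsigned char kv[];
};

struct sketch_map {
	struct sketch_node *buckets[SKETCH_NBUCKETS];
	size_t count, key_size, val_size;
};

static uint64_t sketch_hash(const void *key, size_t len)
{
	const unsigned char *p = key;
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a 64-bit offset basis */

	while (len--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;		/* FNV-1a 64-bit prime */
	}
	return h;
}

/* Return the link that points at the matching node, or at the chain's end. */
static struct sketch_node **sketch_find_link(struct sketch_map *m, const void *key)
{
	struct sketch_node **pp =
		&m->buckets[sketch_hash(key, m->key_size) % SKETCH_NBUCKETS];

	while (*pp && memcmp((*pp)->kv, key, m->key_size))
		pp = &(*pp)->next;
	return pp;
}

/* Insert a new key or overwrite the value of an existing one. */
static int sketch_map_put(struct sketch_map *m, const void *key, const void *val)
{
	struct sketch_node **pp = sketch_find_link(m, key);

	if (!*pp) {
		struct sketch_node *node = malloc(sizeof(*node) + m->key_size + m->val_size);

		if (!node)
			return -ENOMEM;
		node->next = NULL;
		memcpy(node->kv, key, m->key_size);
		*pp = node;
		m->count++;
	}
	memcpy((*pp)->kv + m->key_size, val, m->val_size);
	return 0;
}

static int sketch_map_get(struct sketch_map *m, const void *key, void *val_out)
{
	struct sketch_node *node = *sketch_find_link(m, key);

	if (!node)
		return -ENOENT;
	memcpy(val_out, node->kv + m->key_size, m->val_size);
	return 0;
}

static int sketch_map_del(struct sketch_map *m, const void *key)
{
	struct sketch_node **pp = sketch_find_link(m, key), *node = *pp;

	if (!node)
		return -ENOENT;
	*pp = node->next;	/* plain unlink: no tombstone is left behind */
	free(node);
	m->count--;
	return 0;
}

/* Delete followed by re-insert; returns the re-inserted value (2). */
static int sketch_map_demo(void)
{
	struct sketch_map m = { .key_size = sizeof(int), .val_size = sizeof(int) };
	int key = 42, val = 1, out = 0;

	if (sketch_map_put(&m, &key, &val))
		return -ENOMEM;
	sketch_map_del(&m, &key);
	assert(m.count == 0);
	val = 2;
	if (sketch_map_put(&m, &key, &val))
		return -ENOMEM;
	sketch_map_get(&m, &key, &out);
	sketch_map_del(&m, &key);	/* free the only node before returning */
	return out;
}
```

Deleting a node is a single pointer update on its chain, so a later insert of the same key sees an ordinary chain with no marker entries to skip or reclaim.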


Checklist

  • Commit prefix matches changed area (libbpf-tools:, libbpf-tools/biotop:)
  • Commit body explains why this change is needed

Bojun-Seo added 2 commits May 4, 2026 15:12
Add datastructure_helpers.h and datastructure_helpers.c implementing
two general-purpose data structures for shared use across libbpf-tools:

- struct vec: a realloc-based dynamic array with amortized O(1) push_back

- struct hashmap: a separate-chaining hash map.  Each bucket holds a
  singly-linked list of nodes with key and value stored inline.  Uses
  FNV-1a hashing.  The bucket array doubles when the average chain
  length exceeds 2, keeping expected lookup O(1).

FNV-1a is released into the public domain under CC0 1.0.

Unit tests are added under libbpf-tools/tests/ along with a standalone
Makefile so the tests can be built and run without BPF or kernel support:

  make -C libbpf-tools/tests test

Why:
Several libbpf-tools have duplicated their own ad-hoc dynamic-array or
lookup implementations. A shared helper avoids this duplication and
gives future tools a standard building block without pulling in an
external dependency.

Replace the ad-hoc struct vector / grow_vector / free_vector
implementation with ds_vec and ds_hashmap from datastructure_helpers.

- struct disk entries are stored in a ds_vec (parse_disk_stat)
- a ds_hashmap keyed by (major, minor) is populated at the same time,
  turning search_disk_name from an O(n) linear scan into an O(1) lookup
- datastructure_helpers.o is added to COMMON_OBJ in the Makefile so
  all tools can link against it

No functional change to the tool's output or behaviour.

Why:
The ad-hoc vector in biotop.c is functionally equivalent to ds_vec but
exists only in that one file, making it a maintenance liability. Removing
it in favour of the shared library reduces duplication. The O(n) scan in
search_disk_name is also an unnecessary cost on systems with many block
devices; a keyed hashmap lookup is a straightforward improvement.
@Bojun-Seo
Contributor Author

Please refer to the following ADR that my AI and I wrote.

Keep Shared Data Structures in C (In-Tree) for libbpf-tools

  • Scope: libbpf-tools/datastructure_helpers.{h,c} and users (for example, biotop)

Context

A new shared helper module was introduced for reusable data structures in libbpf-tools:

  • ds_vec: generic dynamic array
  • ds_hashmap: generic hash map (separate chaining)

This ADR records why I intentionally chose in-tree C data structures and how we will keep quality at an acceptable maintenance level.

Decision

Keep and evolve shared data structures as in-tree C code (ds_vec, ds_hashmap) instead of adding an external C library dependency or rewriting userspace tools in C++.

Why This Decision

1. Fits the existing project model

libbpf-tools userspace code is C-based. Keeping shared utilities in C avoids a mixed-language toolchain and keeps build, debugging, packaging, and review flows consistent with the rest of the directory.

2. Dependency and packaging discipline

In-tree C helpers introduce no new external runtime/build dependency. This helps distro packaging, minimal build environments, and reproducible CI behavior.

3. License and provenance clarity

The helpers are under project-compatible licensing and avoid introducing third-party code provenance and update policy overhead in a fast-moving systems repository.

4. Operational simplicity for low-level tools

These tools run in constrained and diverse environments. A small C helper with explicit memory behavior is easier to reason about in this context than adding a C++ runtime surface or third-party dependency lifecycle.

5. API control for this codebase

The helper API is intentionally small and explicit (ds_-prefixed symbols). We can tailor behavior and conventions to libbpf-tools needs without inheriting unrelated features or semantics from broader-purpose libraries.

Addressing the Quality Concern Directly

The concern is valid: custom containers can regress in correctness if not constrained.

To reduce that risk, let's apply the following controls:

  1. Narrow API surface
    • Keep only operations currently needed by libbpf-tools.
    • Avoid feature creep (iterators with hidden ownership, allocator plugins, etc.).
  2. Defensive implementation choices already in place
    • ds_ symbol prefixing to avoid collisions with libbpf symbols.
    • Hash map node layout with alignment-safe value placement (key_stride separate from key_size).
    • Clear key-usage contract in docs: struct keys must be zero-initialized when padding is possible.
  3. Unit-test coverage in-tree
    • Behavioral coverage for insert/find/update/delete/iteration/clear/resize.
    • Coverage for less obvious correctness cases:
      • char-key + 64-bit value alignment expectations,
      • struct-key padding behavior and correct usage pattern.
  4. Keep implementation understandable
    • Prefer straightforward algorithms over clever abstractions.
    • Maintain comments where misuse is likely (key padding, ownership, expected complexity).
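
The key-padding and alignment points can be sketched as follows. This assumes that key_stride means "key_size rounded up to the value's alignment", which is one plausible reading of the bullet above and not confirmed by the PR; `sketch_key`, `sketch_key_stride`, and `sketch_key_init` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key type; on common ABIs, padding bytes follow 'tag'. */
struct sketch_key {
	char tag;
	int id;
};

/*
 * Assumed meaning of key_stride: key_size rounded up to 'align' (a power
 * of two), so a value placed right after the key starts on an aligned offset.
 */
static size_t sketch_key_stride(size_t key_size, size_t align)
{
	return (key_size + align - 1) & ~(align - 1);
}

/*
 * Zero the whole key before filling the fields so padding bytes are
 * deterministic; byte-wise hashing and memcmp then agree for equal keys.
 * This is the usual idiom on mainstream compilers.
 */
static void sketch_key_init(struct sketch_key *k, char tag, int id)
{
	memset(k, 0, sizeof(*k));
	k->tag = tag;
	k->id = id;
}

/* Byte-wise comparison, as a map storing raw key bytes would do it. */
static int sketch_keys_equal(const struct sketch_key *a, const struct sketch_key *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

static void sketch_key_selfcheck(void)
{
	struct sketch_key a, b;

	sketch_key_init(&a, 'x', 7);
	sketch_key_init(&b, 'x', 7);
	assert(sketch_keys_equal(&a, &b));
	/* A 1-byte char key followed by a 64-bit value needs an 8-byte stride. */
	assert(sketch_key_stride(sizeof(char), 8) == 8);
}
```

A key declared as `struct sketch_key k; k.tag = 'x'; k.id = 7;` without the memset leaves its padding indeterminate, so a byte-wise lookup may miss an entry that is logically present.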

Alternatives Considered

A) External C container library (for example, uthash/c-vector style)

Pros:

  • Mature and widely used.
  • Less custom code to maintain.

Cons:

  • Adds third-party dependency/provenance and update policy burden.
  • API/behavior may not align exactly with libbpf-tools requirements.
  • Extra maintenance cost for vendoring, sync, and compatibility handling.

Decision: not selected.

B) C++ standard library containers (std::vector, std::unordered_map)

Pros:

  • Mature implementations and familiar abstractions.

Cons:

  • Introduces mixed-language build/runtime concerns in a C-focused directory.
  • Changes toolchain assumptions and potential ABI/runtime expectations.
  • Larger conceptual and operational delta than required for this scope.

Decision: not selected.

Consequences

Positive:

  • No new external dependency.
  • Consistent C-only workflow in libbpf-tools.
  • Reusable shared helper across tools.

Negative:

  • We own correctness and maintenance of the code.
