
Commit 621dfd5

libibverbs: Introduce Completion Counters verbs
Extend the verbs interface to support Completion Counters, which can be seen as a lightweight alternative to polling a CQ. A completion counter object separately counts successful and error completions, can be attached to multiple QPs, and can be configured to count completions of only a subset of operation types. This is especially useful for batch or credit-based workloads running on accelerators, but it can serve many other types of applications as well.

Expose the supported number of completion counters through the query device extended verb.

Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
1 parent 8b9cdb7 commit 621dfd5

6 files changed

Lines changed: 492 additions & 0 deletions


libibverbs/examples/devinfo.c

Lines changed: 1 addition & 0 deletions
```diff
@@ -585,6 +585,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
 printf("\tmax_srq_sge:\t\t\t%d\n", device_attr.orig_attr.max_srq_sge);
 }
 printf("\tmax_pkeys:\t\t\t%d\n", device_attr.orig_attr.max_pkeys);
+printf("\tmax_comp_cntr:\t\t\t%d\n", device_attr.max_comp_cntr);
 printf("\tlocal_ca_ack_delay:\t\t%d\n", device_attr.orig_attr.local_ca_ack_delay);
 
 print_odp_caps(&device_attr);
```

libibverbs/man/CMakeLists.txt

Lines changed: 9 additions & 0 deletions
```diff
@@ -14,7 +14,9 @@ rdma_man_pages(
 ibv_create_ah.3
 ibv_create_ah_from_wc.3
 ibv_create_comp_channel.3
+ibv_create_comp_cntr.3.md
 ibv_create_counters.3.md
+ibv_qp_attach_comp_cntr.3.md
 ibv_create_cq.3
 ibv_create_cq_ex.3
 ibv_modify_cq.3
@@ -98,6 +100,13 @@ rdma_alias_man_pages(
 ibv_create_ah.3 ibv_destroy_ah.3
 ibv_create_ah_from_wc.3 ibv_init_ah_from_wc.3
 ibv_create_comp_channel.3 ibv_destroy_comp_channel.3
+ibv_create_comp_cntr.3 ibv_destroy_comp_cntr.3
+ibv_create_comp_cntr.3 ibv_set_comp_cntr.3
+ibv_create_comp_cntr.3 ibv_set_err_comp_cntr.3
+ibv_create_comp_cntr.3 ibv_inc_comp_cntr.3
+ibv_create_comp_cntr.3 ibv_inc_err_comp_cntr.3
+ibv_create_comp_cntr.3 ibv_read_comp_cntr.3
+ibv_create_comp_cntr.3 ibv_read_err_comp_cntr.3
 ibv_create_counters.3 ibv_destroy_counters.3
 ibv_create_cq.3 ibv_destroy_cq.3
 ibv_create_flow.3 ibv_destroy_flow.3
```
Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
---
date: 2026-02-09
footer: libibverbs
header: "Libibverbs Programmer's Manual"
layout: page
license: 'Licensed under the OpenIB.org BSD license (FreeBSD Variant) - See COPYING.md'
section: 3
title: ibv_create_comp_cntr
tagline: Verbs
---

# NAME

**ibv_create_comp_cntr**, **ibv_destroy_comp_cntr** - Create or destroy a
completion counter

**ibv_set_comp_cntr**, **ibv_set_err_comp_cntr** - Set the value of a
completion or error counter

**ibv_inc_comp_cntr**, **ibv_inc_err_comp_cntr** - Increment a completion or
error counter

**ibv_read_comp_cntr**, **ibv_read_err_comp_cntr** - Read the value of a
completion or error counter

# SYNOPSIS

```c
#include <infiniband/verbs.h>

struct ibv_comp_cntr *ibv_create_comp_cntr(struct ibv_context *context,
                                           struct ibv_comp_cntr_init_attr *cc_attr);

int ibv_destroy_comp_cntr(struct ibv_comp_cntr *comp_cntr);

int ibv_set_comp_cntr(struct ibv_comp_cntr *comp_cntr, uint64_t value);
int ibv_set_err_comp_cntr(struct ibv_comp_cntr *comp_cntr, uint64_t value);
int ibv_inc_comp_cntr(struct ibv_comp_cntr *comp_cntr, uint64_t amount);
int ibv_inc_err_comp_cntr(struct ibv_comp_cntr *comp_cntr, uint64_t amount);
int ibv_read_comp_cntr(struct ibv_comp_cntr *comp_cntr, uint64_t *value);
int ibv_read_err_comp_cntr(struct ibv_comp_cntr *comp_cntr, uint64_t *value);
```

# DESCRIPTION

Completion counters provide a lightweight completion mechanism as an
alternative or extension to completion queues (CQs). Rather than generating
individual completion queue entries, a completion counter tracks the aggregate
number of completed operations. This makes them well suited to applications
that only need to know how many requests have completed, not the details of
each one; examples include credit-based flow control and tracking responses
from remote peers.
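
The credit-based case can be illustrated with a small, self-contained model. Assuming a sender that allows at most `window` operations in flight and uses one counter to count its send completions, the credits it can still spend are the window minus the posted operations not yet counted as complete. `struct credit_window`, `credits_available()`, and `try_consume_credit()` are hypothetical names, not part of libibverbs; in a real program the `completed` argument would be a value read with **ibv_read_comp_cntr**().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender bookkeeping (not part of libibverbs): at most
 * `window` operations may be outstanding at any time. */
struct credit_window {
	uint64_t window; /* maximum number of in-flight operations */
	uint64_t posted; /* operations posted so far */
};

/* Credits left, given how many completions have been counted so far
 * (for example, a value obtained from a completion counter). */
static uint64_t credits_available(const struct credit_window *cw,
				  uint64_t completed)
{
	uint64_t in_flight = cw->posted - completed;

	return in_flight >= cw->window ? 0 : cw->window - in_flight;
}

/* Spend one credit for a new post; returns false if none is available. */
static bool try_consume_credit(struct credit_window *cw, uint64_t completed)
{
	if (credits_available(cw, completed) == 0)
		return false;
	cw->posted++;
	return true;
}
```

A single counter read replaces draining one CQ entry per completion, which is the point of the credit-based use case described above.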

Each completion counter maintains two distinct 64-bit values: a completion
count that is incremented on successful completions, and an error count that
is incremented when operations complete in error.

**ibv_create_comp_cntr**() allocates a new completion counter for the RDMA
device context *context*. The properties of the counter are defined by
*cc_attr*. The maximum number of completion counters a device supports is
reported by the *max_comp_cntr* field of **ibv_device_attr_ex**.

**ibv_destroy_comp_cntr**() releases all resources associated with the
completion counter *comp_cntr*. The counter must not be attached to any QP
when it is destroyed.

**ibv_set_comp_cntr**() sets the completion count of *comp_cntr* to *value*.

**ibv_set_err_comp_cntr**() sets the error count of *comp_cntr* to *value*.

**ibv_inc_comp_cntr**() increments the completion count of *comp_cntr* by
*amount*.

**ibv_inc_err_comp_cntr**() increments the error count of *comp_cntr* by
*amount*.

**ibv_read_comp_cntr**() reads the current completion count of *comp_cntr*
into *value*.

**ibv_read_err_comp_cntr**() reads the current error count of *comp_cntr*
into *value*.

## External memory

By default, the memory backing the counter values is allocated internally.
When the **IBV_COMP_CNTR_INIT_WITH_EXTERNAL_MEM** flag is set in
*ibv_comp_cntr_init_attr.flags*, the application provides its own memory for
the completion and error counts via the *comp_cntr_ext_mem* and
*err_cntr_ext_mem* fields. The external memory is described by an
**ibv_memory_location** structure, which supports two modes: a virtual address
(**IBV_MEMORY_LOCATION_VA**), where the application supplies a direct pointer,
or a DMA-BUF reference (**IBV_MEMORY_LOCATION_DMABUF**), where the application
supplies a file descriptor and an offset into an exported DMA-BUF. When using
DMA-BUF, the *ptr* field may also be set to provide a process-accessible
mapping of the memory, which may enable more efficient counter reads.

Using external memory allows the counter values to reside in
application-managed buffers or in memory exported through DMA-BUF, enabling
zero-copy observation of completion progress by co-located processes or
devices.

# ARGUMENTS

## ibv_comp_cntr

```c
struct ibv_comp_cntr {
	struct ibv_context *context;
	uint32_t handle;
	uint64_t comp_count_max_value;
	uint64_t err_count_max_value;
};
```

*context*
: Device context associated with the completion counter.

*handle*
: Kernel object handle for the completion counter.

*comp_count_max_value*
: The maximum value the completion count can hold. An increment that would
exceed this value wraps the counter around to zero.

*err_count_max_value*
: The maximum value the error count can hold. An increment that would exceed
this value wraps the counter around to zero.
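
An application that samples a count periodically can compute how many completions occurred between two samples even across a wrap. This is a minimal sketch that assumes an increment past the maximum continues counting from zero (that is, counting modulo *max_value* + 1) and that fewer than *max_value* + 1 completions occur between two samples; `cntr_delta()` is a hypothetical helper, not a libibverbs function.

```c
#include <stdint.h>

/* Completions counted between samples `prev` and `cur` of a counter whose
 * maximum value is `max_value`. Assumes the counter continues from zero
 * after max_value and wrapped at most once between the two samples.
 * Hypothetical helper, not part of libibverbs. */
static uint64_t cntr_delta(uint64_t prev, uint64_t cur, uint64_t max_value)
{
	if (cur >= prev)
		return cur - prev;
	/* Wrapped: steps from prev up to max_value, one step to zero,
	 * then steps from zero up to cur. */
	return (max_value - prev) + 1 + cur;
}
```

The samples would come from **ibv_read_comp_cntr**() (or **ibv_read_err_comp_cntr**()), and *max_value* from *comp_count_max_value* (or *err_count_max_value*).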

## ibv_comp_cntr_init_attr

```c
struct ibv_comp_cntr_init_attr {
	uint32_t comp_mask;
	uint32_t flags;
	struct ibv_memory_location comp_cntr_ext_mem;
	struct ibv_memory_location err_cntr_ext_mem;
};
```

*comp_mask*
: Bitmask specifying which fields in the structure are valid.

*flags*
: Creation flags. The following flags are supported:

    **IBV_COMP_CNTR_INIT_WITH_EXTERNAL_MEM** - Use application-provided
    memory for the counter values, as specified by *comp_cntr_ext_mem*
    and *err_cntr_ext_mem*.

*comp_cntr_ext_mem*
: Memory location for the completion count when using external memory.

*err_cntr_ext_mem*
: Memory location for the error count when using external memory.

## ibv_memory_location

```c
enum ibv_memory_location_type {
	IBV_MEMORY_LOCATION_VA,
	IBV_MEMORY_LOCATION_DMABUF,
};

struct ibv_memory_location {
	uint8_t *ptr;
	struct {
		uint64_t offset;
		int32_t fd;
		uint32_t reserved;
	} dmabuf;
	uint8_t type;
	uint8_t reserved[7];
};
```

*type*
: The type of memory location: **IBV_MEMORY_LOCATION_VA** for a virtual
address, or **IBV_MEMORY_LOCATION_DMABUF** for a DMA-BUF reference.

*ptr*
: Virtual address pointer. Required when *type* is
**IBV_MEMORY_LOCATION_VA**. When *type* is **IBV_MEMORY_LOCATION_DMABUF**,
it may optionally be set to a process-accessible mapping of the DMA-BUF
memory; if no such mapping is provided, it should be NULL.

*dmabuf.fd*
: DMA-BUF file descriptor (used when *type* is
**IBV_MEMORY_LOCATION_DMABUF**).

*dmabuf.offset*
: Offset within the DMA-BUF (used when *type* is
**IBV_MEMORY_LOCATION_DMABUF**).
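
The two modes of **ibv_memory_location** can be filled in as sketched below. So that the sketch compiles without headers that include this API, it carries a local copy of the enum and structure listed above; `memloc_va()` and `memloc_dmabuf()` are hypothetical helpers. Zeroing the structure first is a defensive choice that leaves *ptr* NULL and the reserved fields cleared when they are unused; this page does not state requirements for the reserved fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the definitions documented above. */
enum ibv_memory_location_type {
	IBV_MEMORY_LOCATION_VA,
	IBV_MEMORY_LOCATION_DMABUF,
};

struct ibv_memory_location {
	uint8_t *ptr;
	struct {
		uint64_t offset;
		int32_t fd;
		uint32_t reserved;
	} dmabuf;
	uint8_t type;
	uint8_t reserved[7];
};

/* Describe counter memory by virtual address (hypothetical helper). */
static void memloc_va(struct ibv_memory_location *loc, void *addr)
{
	memset(loc, 0, sizeof(*loc));
	loc->type = IBV_MEMORY_LOCATION_VA;
	loc->ptr = addr;
}

/* Describe counter memory inside an exported DMA-BUF. `mapping` may be
 * NULL when no process-accessible mapping exists (hypothetical helper). */
static void memloc_dmabuf(struct ibv_memory_location *loc, int fd,
			  uint64_t offset, void *mapping)
{
	memset(loc, 0, sizeof(*loc));
	loc->type = IBV_MEMORY_LOCATION_DMABUF;
	loc->dmabuf.fd = fd;
	loc->dmabuf.offset = offset;
	loc->ptr = mapping;
}
```

When **IBV_COMP_CNTR_INIT_WITH_EXTERNAL_MEM** is set in *flags*, one such location would be stored in *comp_cntr_ext_mem* and a separate one in *err_cntr_ext_mem*.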

# RETURN VALUE

**ibv_create_comp_cntr**() returns a pointer to the allocated ibv_comp_cntr
object, or NULL if the request fails (and sets errno to indicate the failure
reason).

**ibv_destroy_comp_cntr**(), **ibv_set_comp_cntr**(),
**ibv_set_err_comp_cntr**(), **ibv_inc_comp_cntr**(),
**ibv_inc_err_comp_cntr**(), **ibv_read_comp_cntr**(), and
**ibv_read_err_comp_cntr**() return 0 on success, or an errno value
indicating the failure reason.

# ERRORS

ENOTSUP
: Completion counters are not supported on this device, or the
requested operation is not supported for the given counter
configuration.

ENOMEM
: Not enough resources to create the completion counter.

EINVAL
: Invalid argument(s) passed.

EBUSY
: The completion counter is still attached to a QP
(**ibv_destroy_comp_cntr**() only).

# NOTES

Counter values must only be updated using **ibv_set_comp_cntr**(),
**ibv_set_err_comp_cntr**(), **ibv_inc_comp_cntr**(), or
**ibv_inc_err_comp_cntr**(). Counter memory supplied by the application
must not be modified directly.

Updates made to counter values (e.g., via **ibv_set_comp_cntr**() or
**ibv_inc_comp_cntr**()) may not be immediately visible when reading the
counter via **ibv_read_comp_cntr**() or **ibv_read_err_comp_cntr**(). There
may be a short delay before an update is observed, but the final value is
eventually reflected.
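
Since a read may lag behind an update, code waiting for a given number of completions should poll until the value appears rather than rely on a single read. The sketch below polls the completion count until it reaches a target and gives up early if the error count changes. To stay self-contained and runnable without a device, it reads through hypothetical callbacks with the same shape as **ibv_read_comp_cntr**() and **ibv_read_err_comp_cntr**() (a counter handle and a `uint64_t *` output, returning 0 or an errno value) and includes a simulated counter; `struct cntr_reader`, `wait_for_count()`, and `struct sim_cntr` are illustrative names only. The target is assumed to be below the counter's maximum value, so wrap-around is ignored.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical reader: callbacks shaped like ibv_read_comp_cntr() and
 * ibv_read_err_comp_cntr(), but taking an opaque counter handle. */
struct cntr_reader {
	void *cntr;
	int (*read)(void *cntr, uint64_t *value);
	int (*read_err)(void *cntr, uint64_t *value);
};

/* Poll until the completion count reaches `target`. Returns 0 on success,
 * EIO if the error count changed while waiting, ETIMEDOUT after
 * `max_polls` reads, or the error returned by a reader callback. */
static int wait_for_count(const struct cntr_reader *r, uint64_t target,
			  unsigned long max_polls)
{
	uint64_t err_start, err, count;
	int ret;

	ret = r->read_err(r->cntr, &err_start);
	if (ret)
		return ret;

	for (unsigned long i = 0; i < max_polls; i++) {
		ret = r->read(r->cntr, &count);
		if (ret)
			return ret;
		if (count >= target)
			return 0;

		/* An error completion ends the wait early. */
		ret = r->read_err(r->cntr, &err);
		if (ret)
			return ret;
		if (err != err_start)
			return EIO;
	}
	return ETIMEDOUT;
}

/* Simulated counter for exercising wait_for_count() without hardware:
 * each read of the completion count reveals one more completion,
 * imitating updates that become visible over time. */
struct sim_cntr {
	uint64_t count;
	uint64_t err;
};

static int sim_read(void *cntr, uint64_t *value)
{
	struct sim_cntr *s = cntr;

	*value = s->count++;
	return 0;
}

static int sim_read_err(void *cntr, uint64_t *value)
{
	*value = ((struct sim_cntr *)cntr)->err;
	return 0;
}
```

With a real counter, the callbacks would wrap **ibv_read_comp_cntr**() and **ibv_read_err_comp_cntr**(); a production loop would typically also back off or yield between reads instead of spinning.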

Applications should ensure that the counter value is stable, that is, that no
completions or other updates to it are in flight, before calling
**ibv_set_comp_cntr**() or **ibv_set_err_comp_cntr**(). Otherwise, concurrent
updates may be lost.

# SEE ALSO

**ibv_qp_attach_comp_cntr**(3), **ibv_create_cq**(3),
**ibv_create_cq_ex**(3), **ibv_create_qp**(3)

# AUTHORS

Michael Margolin <mrgolin@amazon.com>
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
---
date: 2026-02-09
footer: libibverbs
header: "Libibverbs Programmer's Manual"
layout: page
license: 'Licensed under the OpenIB.org BSD license (FreeBSD Variant) - See COPYING.md'
section: 3
title: ibv_qp_attach_comp_cntr
tagline: Verbs
---

# NAME

**ibv_qp_attach_comp_cntr** - Attach a completion counter to a QP

# SYNOPSIS

```c
#include <infiniband/verbs.h>

int ibv_qp_attach_comp_cntr(struct ibv_qp *qp,
                            struct ibv_comp_cntr *comp_cntr,
                            struct ibv_comp_cntr_attach_attr *attr);
```

# DESCRIPTION

**ibv_qp_attach_comp_cntr**() attaches the completion counter *comp_cntr* to
the queue pair *qp*. The *attr* argument specifies which operation types
should update the counter.

The QP must be in the **IBV_QPS_RESET** or **IBV_QPS_INIT** state when a
completion counter is attached. Attempting to attach a counter to a QP in any
other state fails with EINVAL.

Once attached, the completion counter starts counting completions from the
QP. Attaching the same completion counter to multiple QPs accumulates the
completions of all attached QPs into the same counter.

The *op_mask* field of *attr* controls which operation completions are
counted. Local operations (**IBV_COMP_CNTR_ATTACH_OP_SEND**,
**IBV_COMP_CNTR_ATTACH_OP_RECV**, **IBV_COMP_CNTR_ATTACH_OP_RDMA_READ**,
**IBV_COMP_CNTR_ATTACH_OP_RDMA_WRITE**) count completions of work requests
posted to the local QP. Remote operations
(**IBV_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_READ**,
**IBV_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_WRITE**) count completions of incoming
RDMA operations initiated by the remote side. Supported *op_mask* values may
vary by device; unsupported values result in an ENOTSUP error.

Multiple completion counters can be attached to the same QP, provided their
*op_mask* values do not overlap. Each QP and operation type pair can be
associated with at most one completion counter. Attempting to attach a
counter with an *op_mask* that overlaps an already attached counter fails
with EBUSY.

There is no explicit detach operation. A completion counter is implicitly
detached when the QP it is attached to is destroyed. A completion counter
cannot be destroyed while it is still attached to any QP; every QP it is
attached to must be destroyed first.

# ARGUMENTS

*qp*
: The queue pair to attach the completion counter to.

*comp_cntr*
: The completion counter to attach, previously created with
**ibv_create_comp_cntr**().

*attr*
: Attach attributes specifying which operation types update the counter.

## ibv_comp_cntr_attach_attr

```c
enum ibv_comp_cntr_attach_op {
	IBV_COMP_CNTR_ATTACH_OP_SEND = 1 << 0,
	IBV_COMP_CNTR_ATTACH_OP_RECV = 1 << 1,
	IBV_COMP_CNTR_ATTACH_OP_RDMA_READ = 1 << 2,
	IBV_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_READ = 1 << 3,
	IBV_COMP_CNTR_ATTACH_OP_RDMA_WRITE = 1 << 4,
	IBV_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_WRITE = 1 << 5,
};

struct ibv_comp_cntr_attach_attr {
	uint32_t comp_mask;
	uint32_t op_mask;
};
```

*comp_mask*
: Bitmask specifying which fields in the structure are valid.

*op_mask*
: Bitmask of **ibv_comp_cntr_attach_op** values specifying which
operation types should update the counter.
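
The overlap rule described under DESCRIPTION can be modeled as a per-QP record of claimed operation types. The sketch below carries a local copy of the **ibv_comp_cntr_attach_op** values above and rejects an *op_mask* that shares a bit with an earlier one, mirroring the documented EBUSY behavior. `struct qp_cntr_claims` and `claim_ops()` are hypothetical; they model the rule only and do not describe how libibverbs or any provider implements it.

```c
#include <errno.h>
#include <stdint.h>

/* Local copy of the values documented above. */
enum ibv_comp_cntr_attach_op {
	IBV_COMP_CNTR_ATTACH_OP_SEND = 1 << 0,
	IBV_COMP_CNTR_ATTACH_OP_RECV = 1 << 1,
	IBV_COMP_CNTR_ATTACH_OP_RDMA_READ = 1 << 2,
	IBV_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_READ = 1 << 3,
	IBV_COMP_CNTR_ATTACH_OP_RDMA_WRITE = 1 << 4,
	IBV_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_WRITE = 1 << 5,
};

/* Hypothetical per-QP record of operation types already bound to a
 * completion counter. */
struct qp_cntr_claims {
	uint32_t claimed;
};

/* Model of the overlap rule: record op_mask and return 0 if none of its
 * operation types is claimed yet on this QP, otherwise return EBUSY and
 * leave the existing claims unchanged. */
static int claim_ops(struct qp_cntr_claims *qp, uint32_t op_mask)
{
	if (qp->claimed & op_mask)
		return EBUSY;
	qp->claimed |= op_mask;
	return 0;
}
```

For example, after one counter claims SEND and RECV and a second claims REMOTE_RDMA_WRITE, a request for RECV and RDMA_READ is rejected because RECV is already taken, even though RDMA_READ is free.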

# RETURN VALUE

**ibv_qp_attach_comp_cntr**() returns 0 on success, or an errno value
indicating the failure reason.

# ERRORS

EINVAL
: Invalid argument(s) passed, or the QP is not in the **IBV_QPS_RESET** or
**IBV_QPS_INIT** state.

ENOTSUP
: The requested operation, or one of the operation types in *op_mask*, is
not supported on this device.

EBUSY
: The *op_mask* overlaps with a completion counter already attached
to this QP.

# SEE ALSO

**ibv_create_comp_cntr**(3), **ibv_create_qp**(3)

# AUTHORS

Michael Margolin <mrgolin@amazon.com>
