Hi,
while working with ib_write_bw, I noticed that it fails to reach high throughput when the message size (given by -s) is not a multiple of a certain small power of 2.
For the final tests I used these parameters on both server and client: --rate_limit 60 --rate_limit_type SW -D 20 --burst_size 1 -s 65600. With -s 65600 (a multiple of 64), ib_write_bw reaches 60Gbit/s, but with -s 65568 (a multiple of 32, but not of 64) the throughput is consistently and significantly lower (around 53Gbit/s). With -s 65600 on the server and -s 65568 on the client, it also reaches 60Gbit/s, even though the messages on the wire are still 65568 bytes.
After looking through the code, the issue seems to be that the RDMA receive memory address is not cache-line aligned. In perftest_communication.c:914, the vaddr given to the client points into the second half of ctx->buf, but it is computed without the cache-line-size alignment used in other parts of the code. Changing
my_dest[i].vaddr = (uintptr_t)ctx->buf[0] + (user_param->num_of_qps + i)*BUFF_SIZE(ctx->size,ctx->cycle_buffer);
to
my_dest[i].vaddr = (uintptr_t)ctx->buf[0] + (user_param->num_of_qps + i)*INC(BUFF_SIZE(ctx->size,ctx->cycle_buffer), ctx->cache_line_size);
resolves the issue. I'm not very familiar with the code, so there may be other places where this alignment is missing.
Have a nice day!