Ub ring transport dev#8

Open
zchuango wants to merge 84 commits into master from ub_ring_transport_dev

@zchuango zchuango commented May 7, 2026

What problem does this PR solve?

Issue Number: resolve

Problem Summary:

What is changed and the side effects?

Changed:

Side effects:

  • Performance effects:

  • Breaking backward compatibility:


Check List:

giriraj-singh-couchbase and others added 30 commits May 9, 2026 09:57
…pache#3138)

* Implemented Couchbase binary protocol support

* added support for single connection type for couchbase

* removed unnecessary cout statements

* added protocol code for helo packet

* fixed vbucketID code for identification, fixed add and get functions

* Added test cases for threaded get and add functions

* Added Error Handling code and made upsert and delete examples

* added makefile for example/couchbase_c++

* fixed bugs in couchbase header files

* Added License and formatted to google c++ norms

* fixed bugs, added support for collections and added couchbase_client.md

* fixed license issue

* added custom logic for caching collectionIDs

* added caching of collection manifests

* Added example code for multithreaded demonstration

* updated CMake

* Abstracted CRUD operations

* Added pipeline/batching support

* commented unused variables

* Updated support for C++17

* fixed some issue.

* Using Mutex instead of shared lock to support c++11

* Formatted code to google c++ format

* Introduced local cache per-instance of CouchbaseOperations and added functionality to handle server side manifest updates.

* Delete MODULE.bazel.lock

Unnecessary file

* Fixed bugs in local collection cache and collection refresh logic

* remove recurring statements

* Fixed bugs/repetitive calls to refreshing manifest on server

* Formatted function/variable naming scheme and formatted code in c++ google format

* removed unnecessary code

* updated comments

* updated comments

* updated documentation

* updated documentation

* updated documentation

* updated documentation

* Updated documentation

* Updated documentation

* Update documentation

* Added features and fixed bugs in multithreaded environment

Using connection_groups to differentiate between connections across CouchbaseOperations instances to different buckets.

Renamed the CollectionManifestTracker class to CollectionManifestManager and moved all related functionality into it; previously the refresh method lived outside this class.

Added two authentication methods: authenticate (not secure) and authenticateSSL (secure)

* Updated multithreaded and single threaded code.

Added an example where a single instance is being shared across the threads when operating on single bucket.

* updated documentation

Updated the documentation on thread-safe operations and fixed small discrepancies.

* removed commented code and updated readme to have links for cluster download certificate

* removed unused code.

* Added traditional bRPC coding approach

The traditional bRPC coding approach doesn't use high-level functions but gives the user more control.

fixed formatting issues.

fixed the bug in couchbase.cpp where the check for an empty cache was inverted

* updated couchbase_example.md

* added unit test cases

* removed using namespace std from couchbase.h

* restored original CMakeLists.txt
The bthread name is shown when checking bthread status by curl
ip:port/bthreads/xxx, which helps to debug when bthread trace is not
enabled.
There are two kinds of problems:
1. signed number overflow is undefined behavior;
2. vsnprintfT may return E2BIG instead of EOVERFLOW.
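The first problem can be illustrated with a minimal sketch. The helper name `checked_add` is hypothetical and is not taken from the commit; it only shows one common way to detect overflow before it happens instead of relying on signed wraparound, which is undefined behavior in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of brpc): adds two int64_t values and
// reports overflow instead of performing an addition that would overflow.
bool checked_add(int64_t a, int64_t b, int64_t* out) {
    assert(out != nullptr);
    // Compare against the limits before adding, so the addition itself
    // never overflows.
    if (b > 0 && a > std::numeric_limits<int64_t>::max() - b) {
        return false;  // would overflow upward
    }
    if (b < 0 && a < std::numeric_limits<int64_t>::min() - b) {
        return false;  // would overflow downward
    }
    *out = a + b;
    return true;
}
```

A caller checks the return value and treats `false` as an error rather than using a wrapped result.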
* feat: support more ssl verify mode

* 1
This commit adds full support for RISC-V 64-bit architecture to brpc.

Changes include:
- Add RISC-V atomic operations implementation
- Add RISC-V architecture detection in build system
- Add RISC-V context switching (bthread support)
- Add RISC-V clock cycle counter support (rdcycle)
- Update CMake and Makefile for RISC-V compilation

All core functionalities have been tested and verified in QEMU RISC-V
environment, including:
- Atomic operations (32-bit and 64-bit)
- Memory barriers
- Context switching
- Clock cycle counting

Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
Co-authored-by: Haigang Xi <xhgang@blackwingasset.com>
* Bugfix: The failure of ibv_post_send is caused by polling send CQE before recv CQE

* Split send and recv comp channel

* Use wr_id to update _sq_window_size

* Send CQ and recv CQ share comp channel

* Add IMM window

* Deallocate polling cq

* Update RDMA documents
…he#3184)

Signed-off-by: jiasheng.yu <jiashengyu@deepglint.com>
Co-authored-by: jiasheng.yu <jiashengyu@deepglint.com>
The root cause is unique_ptr has constexpr destructor since C++23

libcxx/include/__memory/unique_ptr.h:75:19: error: invalid application of 'sizeof' to an incomplete type 'brpc::RedisCommandHandler'
   75 |     static_assert(sizeof(_Tp) >= 0, "cannot delete an incomplete type");
      |                   ^~~~~~~~~~~
libcxx/include/__memory/unique_ptr.h:290:7: note: in instantiation of member function 'std::default_delete<brpc::RedisCommandHandler>::operator()' requested here
  290 |       __deleter_(__tmp);
      |       ^
libcxx/include/__memory/unique_ptr.h:259:71: note: in instantiation of member function 'std::unique_ptr<brpc::RedisCommandHandler>::reset' requested here
  259 |   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 ~unique_ptr() { reset(); }
      |                                                                       ^
src/brpc/redis.h:220:14: note: in instantiation of member function 'std::unique_ptr<brpc::RedisCommandHandler>::~unique_ptr' requested here
  220 |     explicit RedisConnContext(const RedisService* rs)
      |              ^
src/brpc/redis.h:190:7: note: forward declaration of 'brpc::RedisCommandHandler'
  190 | class RedisCommandHandler;

Co-authored-by: yin.li <yin.li@okg.com>
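The error above follows a general pattern: a class holds a `std::unique_ptr` to a forward-declared type, and a constructor defined inline forces the `unique_ptr` destructor to be instantiated while the type is still incomplete. The sketch below shows one common way to avoid that, by declaring the constructor and destructor in the class and defining them only after the pointee type is complete. The names `Handler` and `ConnContext` are hypothetical stand-ins, and this is not necessarily the fix the commit applied.

```cpp
#include <cassert>
#include <memory>

class Handler;  // forward declaration only; Handler is incomplete here

class ConnContext {
public:
    ConnContext();   // declared here, defined after Handler is complete
    ~ConnContext();  // same: keeps unique_ptr's deleter out of this scope
    int HandlerValue() const;

private:
    std::unique_ptr<Handler> handler_;
};

// The full definition of the pointee type.
class Handler {
public:
    int value() const { return 42; }
};

// Handler is complete at this point, so instantiating
// std::default_delete<Handler> is valid.
ConnContext::ConnContext() : handler_(new Handler) {}
ConnContext::~ConnContext() = default;

int ConnContext::HandlerValue() const {
    assert(handler_ != nullptr);
    return handler_->value();
}
```

In a real code base the out-of-line definitions would normally live in the `.cpp` file that includes the full header of the pointee type.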
…e#3187)

* Wrap absl::string_view as std::string to support protobuf v30+

Closes apache#3181

* remove unnecessary specialization for absl::string_view

* keep path for consistency
* feat: enable TLS key logging via SSLKEYLOGFILE env

* fix
* Fix port parsing validation in str2endpoint

Signed-off-by: Anant Shukla <anantshukla836@gmail.com>

* Add unit tests for rejecting trailing characters after port parsing

Signed-off-by: Anant Shukla <anantshukla836@gmail.com>

---------

Signed-off-by: Anant Shukla <anantshukla836@gmail.com>
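Rejecting trailing characters after a port number can be sketched as follows. `parse_port_strict` is a hypothetical helper written for illustration; it is not brpc's `str2endpoint` and does not reproduce its signature. It assumes the port is given as a NUL-terminated decimal string.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper (not brpc's str2endpoint): parses a decimal port and
// rejects empty input, signs, leading spaces, trailing characters, and
// values outside 0..65535.
bool parse_port_strict(const char* s, int* port) {
    assert(s != nullptr && port != nullptr);
    if (*s < '0' || *s > '9') {
        return false;  // strtol would otherwise accept "+80" or " 80"
    }
    char* end = nullptr;
    errno = 0;
    const long v = std::strtol(s, &end, 10);
    if (errno != 0 || end == s) {
        return false;  // out of range for long, or nothing parsed
    }
    if (*end != '\0') {
        return false;  // trailing characters such as "8080abc"
    }
    if (v < 0 || v > 65535) {
        return false;  // not a valid port number
    }
    *port = static_cast<int>(v);
    return true;
}
```

The key step is the `*end != '\0'` check: `strtol` stops at the first non-digit and reports success, so the caller has to verify that the whole string was consumed.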
* bind_client_ip

* fix UT & review

* add client_host UT

* updated to support SO_BINDTODEVICE.

* updated to support SO_BINDTODEVICE and bind client_host.

* review
…he#3199)

* Add the transport layer to support communication protocols of different device vendors.

* Refine the SocketMode name style and clean some unused code

* Refine Transport Debug method param and RdmaTransport WaitEpollOut code

* format the code, remove indentation for top class and variables in new file

* review code

---------

Co-authored-by: wenjiecn <3252896864@qq.com>
Co-authored-by: Haigang Xi <xhgang@blackwingasset.com>
1. The return value of CreateTransport should be std::unique_ptr.
2. Delete BAIDU_REGISTER_ERRNO in transport_factory.h.
3. Optimize some code formatting.
…pache#3219)

* feat(auto_cl): add error rate threshold for punishment attenuation

Add new GFlag `auto_cl_error_rate_punish_threshold` to enable
error-rate-based punishment attenuation in AutoConcurrencyLimiter.

Problem: Low error rates (e.g., 1.3% sporadic timeouts) cause
disproportionate avg_latency inflation (+31%), leading the limiter
to mistakenly shrink max_concurrency and trigger ELIMIT rejections.

Solution: Inspired by Alibaba Sentinel's threshold-based approach:
- threshold=0 (default): Original behavior preserved (backward compat)
- threshold>0 (e.g., 0.1): Error rates below threshold produce zero
  punishment; above it, punishment scales linearly from 0 to full

Example: With threshold=0.1, a 5% error rate produces no punishment,
while a 50% error rate produces 44% of the original punishment.
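The threshold behavior described above can be sketched as a scaling factor applied to the original punishment. The function name `punish_scale` is hypothetical, and the linear formula is an assumption inferred from the numbers in the message (threshold 0.1 and error rate 0.5 give 0.4 / 0.9, about 44%); it is not copied from AutoConcurrencyLimiter.

```cpp
#include <cassert>

// Hypothetical sketch: returns the fraction of the original punishment to
// apply. Assumes linear scaling from 0 at the threshold to 1 at an error
// rate of 1.0, which matches the 5% and 50% examples in the commit message.
double punish_scale(double error_rate, double threshold) {
    assert(error_rate >= 0.0 && error_rate <= 1.0);
    assert(threshold >= 0.0 && threshold < 1.0);
    if (threshold <= 0.0) {
        return 1.0;  // threshold disabled: original behavior, full punishment
    }
    if (error_rate <= threshold) {
        return 0.0;  // below the threshold: no punishment
    }
    return (error_rate - threshold) / (1.0 - threshold);
}
```

With `threshold = 0.1`, an error rate of 0.05 yields a factor of 0, and an error rate of 0.5 yields roughly 0.44.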

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…thread_freebsd.cc (apache#3223)

The tracked_objects.h header and ThreadData::InitializeThreadContext()
were part of Chromium's base library profiling subsystem, which was
never ported to brpc. The Linux (platform_thread_linux.cc) and macOS
(platform_thread_mac.mm) equivalents already had these references
removed. This causes a compile error on FreeBSD:

  fatal error: 'butil/tracked_objects.h' file not found
* Copy http headers from main controller to sub controller

* Support custom modification of sub controllers
zchuango and others added 29 commits May 9, 2026 09:57
…ache#3272)

Doc comments in src/butil/process_util.h read 'on sucess' (lines 31, 36). Fixed to 'on success'. Comment-only change.

Signed-off-by: SAY-5 <SAY-5@users.noreply.github.com>
Co-authored-by: SAY-5 <SAY-5@users.noreply.github.com>
Add missing .previous directive after each .note.GNU-stack section in ARM inline
assembly blocks. This ensures proper section switching and prevents potential
assembler errors when building with asan.

See apache#1186
)

Commit 12fb539 ("Use monotonic time instead of wall time", apache#3268)
switched the three time-source calls in SamplerCollector::run() from
gettimeofday_us() to cpuwide_time_ns(), but the surrounding code still
treats the timestamps as microseconds:

- abstime += 1000000L now represents 1 ms (not 1 s), causing the
  sampler to spin at ~1 kHz instead of 1 Hz;
- usleep(abstime - now) receives a nanosecond delta, which usleep()
  interprets as microseconds.

Use cpuwide_time_us() instead, which preserves the monotonic behavior
from apache#3268 while keeping the existing microsecond-based arithmetic
correct.
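The unit mismatch can be checked with simple arithmetic. The sketch below uses a hypothetical helper, `step_seconds`, to show how long a fixed step of 1000000 ticks lasts depending on the resolution of the clock it is added to; it does not reproduce SamplerCollector::run().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a step expressed in clock ticks into
// seconds, given how many ticks the clock produces per second.
double step_seconds(int64_t step_ticks, int64_t ticks_per_second) {
    assert(ticks_per_second > 0);
    return static_cast<double>(step_ticks) /
           static_cast<double>(ticks_per_second);
}

// The step used by the loop: intended to mean one second in microseconds.
const int64_t kStep = 1000000L;
const int64_t kMicrosPerSecond = 1000000L;
const int64_t kNanosPerSecond = 1000000000L;
```

Here `step_seconds(kStep, kMicrosPerSecond)` is one second, while `step_seconds(kStep, kNanosPerSecond)` is one millisecond, which is why the loop ran about a thousand times more often once the timestamps were in nanoseconds.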

Fixes apache#3277.

Co-authored-by: huangjun <huangjun@xsky.com>
…pache#3282)

read_proc_status can be sampled while default bvars are initialized before main().
If reading /proc/self/stat fails at that time, logging through glog may access
uninitialized glog state and crash.

Print the warning to stderr instead, matching the read_proc_io fallback.

Signed-off-by: zhoulei <zhoulei@xsky.com>
Co-authored-by: zhoulei <zhoulei@xsky.com>
…rs (apache#3283)

PR apache#3268 ("Use monotonic time instead of wall time") switched
LocalityAwareLoadBalancer::Weight::Update's end_time_us and
LocalityAwareLoadBalancer::Describe's now to butil::cpuwide_time_us(),
but every caller that supplies CallInfo::begin_time_us still uses
butil::gettimeofday_us():

  - Channel::CallMethod (channel.cpp:451) -> Controller::IssueRPC ->
    Controller::Call::begin_time_us -> SelectIn::begin_time_us ->
    CallInfo::begin_time_us
  - Controller::OnVersionedRPCReturned retry sites
    (controller.cpp:672, 715) call IssueRPC(gettimeofday_us()) on
    backup-request and regular retries

The mismatched time domains make

    latency = end_time_us - ci.begin_time_us
            = cpuwide_now - wallclock_begin
            ~= -1.7e15 us

trigger the `if (latency <= 0) { /* time skews, ignore */ return 0; }`
short-circuit on every call. _time_q never accumulates samples,
_avg_latency stays at 0, and locality-aware weight feedback is silently
disabled.
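The effect of mixing time domains can be reproduced with a small stand-in for the latency check quoted above. `LatencyStats` and `record_latency` are hypothetical names, and the timestamps in the usage note are illustrative values, not measurements; only the `latency <= 0` guard mirrors the quoted code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accumulator standing in for the load balancer's statistics.
struct LatencyStats {
    int64_t total_us = 0;
    int64_t samples = 0;

    int64_t avg_us() const { return samples > 0 ? total_us / samples : 0; }
};

// Records one sample. Mirrors the quoted guard: a non-positive latency is
// treated as clock skew and ignored.
bool record_latency(LatencyStats* stats, int64_t begin_us, int64_t end_us) {
    assert(stats != nullptr);
    const int64_t latency = end_us - begin_us;
    if (latency <= 0) {
        return false;  // time skews, ignore
    }
    stats->total_us += latency;
    stats->samples += 1;
    return true;
}
```

If `begin_us` is a wall-clock value around 1.7e15 and `end_us` is a monotonic value that counts from boot, every sample is rejected and the average stays at 0. When both timestamps come from the same clock, the sample is accepted and the average reflects the elapsed time.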

Visible downstream symptom: cold-start `list://` channels with `lb=la`
and 2 backends occasionally fail RPCs with EHOSTDOWN ("Fail to select
server from list://...") on retry even when one backend is healthy.
Bisected reproduction in xsky/brpc fork:

  - 51 commit range c41e838..604dad0c (1.16.1 .. 1.17.0-rc2)
  - master code + LA-driven multipath probe at 2 backends, max_retry=1,
    repeat 500x:
      * commit 771de31 (one before apache#3268): 0/500 fail
      * commit 12fb539 (apache#3268):           25/500 fail
      * commit 12fb539 + revert only Weight::Update::end_time_us to
        gettimeofday_us:                    0/500 fail

This commit reverts the LA-side of apache#3268's clock change so the LB lines
up with its existing callers again. Channel::CallMethod and the retry
paths in Controller stay on butil::gettimeofday_us(), which preserves
the wall-clock semantics of Controller::_begin_time_us /
Controller::latency_us() that public users rely on.

Adds test/brpc_load_balancer_unittest.cpp::la_records_latency_with_consistent_time_source
which drives a series of SelectServer + Feedback cycles against
LocalityAwareLoadBalancer (no Server / Channel needed) and asserts
that _avg_latency reflects the elapsed time, rather than being stuck
at 0 because of a time-source mismatch.

Co-authored-by: huangjun <huangjun@xsky.com>
@zchuango zchuango force-pushed the ub_ring_transport_dev branch from 4bd8328 to a6a852e Compare May 9, 2026 09:58