Commit 9a6dda3
huangjun
Fix time source mismatch in client RPC path after #3268 migration

PR #3268 ("Use monotonic time instead of wall time") migrated several
hot paths from butil::gettimeofday_us() to butil::cpuwide_time_us(),
including LocalityAwareLoadBalancer::Weight::Update which now reads
end_time_us from the cpuwide clock. The migration was incomplete on
the caller side: Channel::CallMethod and the retry sites in
Controller::OnVersionedRPCReturned still passed gettimeofday_us() as
the begin time, which is then plumbed through Controller::IssueRPC ->
Controller::Call::begin_time_us -> SelectIn::begin_time_us ->
CallInfo::begin_time_us into Weight::Update.

In Weight::Update the latency is computed as

    latency = end_time_us - ci.begin_time_us
            = cpuwide_now - wallclock_begin
           ~= -1.7e15 us  (huge negative)

triggering the

    if (latency <= 0) { /* time skews, ignore the sample */ return 0; }

short-circuit. _time_q never accumulates samples, _avg_latency stays
at 0, and locality-aware feedback is silently disabled. The dual time
domain also leaks into Controller::latency_us() (controller.h:215)
where _begin_time_us was wallclock and the fallback used cpuwide.
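
The short-circuit described above can be illustrated with a minimal, self-contained sketch. `ToyWeight` is a hypothetical stand-in written for this note, not the brpc `Weight` class, and the timestamps are illustrative constants chosen to resemble a wall-clock begin time (about 1.7e15 us since 1970) and a monotonic end time with a much smaller origin. It only shows why a cross-domain subtraction is always rejected while a same-domain one is accepted.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the latency feedback state; NOT the brpc Weight class.
struct ToyWeight {
    int64_t sample_count = 0;
    int64_t latency_sum_us = 0;

    // Returns the accepted latency, or 0 when the sample is dropped,
    // mirroring the `latency <= 0` guard quoted in the commit message.
    int64_t Update(int64_t begin_time_us, int64_t end_time_us) {
        const int64_t latency = end_time_us - begin_time_us;
        if (latency <= 0) {
            return 0;  // treated as time skew, sample ignored
        }
        ++sample_count;
        latency_sum_us += latency;
        return latency;
    }

    int64_t avg_latency_us() const {
        return sample_count == 0 ? 0 : latency_sum_us / sample_count;
    }
};

// Illustrative fixed timestamps (assumptions, not measured values).
constexpr int64_t kWallBeginUs = 1700000000000000LL;  // wall clock, us since 1970
constexpr int64_t kMonoBeginUs = 5000000LL;           // monotonic clock, small origin
constexpr int64_t kMonoEndUs   = 5001200LL;           // 1200 us later, same clock

inline void demo_mixed_domains() {
    ToyWeight mixed;
    // Wall-clock begin, monotonic end: hugely negative, so the sample is dropped.
    assert(mixed.Update(kWallBeginUs, kMonoEndUs) == 0);
    assert(mixed.avg_latency_us() == 0);

    ToyWeight consistent;
    // Both stamps from the same clock: the sample is accepted.
    assert(consistent.Update(kMonoBeginUs, kMonoEndUs) == 1200);
    assert(consistent.avg_latency_us() == 1200);
}
```

Because every sample on the mixed path is rejected, the average never leaves 0, which is the "silently disabled" state the message describes.
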

Visible downstream effect: cold-start `list://` channels with `lb=la`
and 2 backends occasionally fail RPCs with EHOSTDOWN (`Fail to select
server from list://...`) on retry even when one backend is healthy.

Bisected reproduction in xsky/brpc fork:
  - 51 commit range c41e838..604dad0c (1.16.1 .. 1.17.0-rc2)
  - master code + LA-driven multipath probe at 2 backends, max_retry=1,
    repeat 500x:
      * commit 771de31 (one before #3268): 0/500 fail
      * commit 12fb539 (#3268): 25/500 fail
      * commit 12fb539 + revert only Weight::Update::end_time_us back
        to gettimeofday: 0/500 fail
  - Reverting the single end_time_us assignment confirms the time
    source mismatch is the regression vector.

This commit aligns the client-side callers with #3268's intent, so
all RPC time accounting uses cpuwide_time_us consistently:

    src/brpc/channel.cpp:451      CallMethod entry begin time
    src/brpc/channel.cpp:628      sync RPC end time
    src/brpc/controller.cpp:672   backup-request retry IssueRPC
    src/brpc/controller.cpp:705   retry-backoff deadline check
    src/brpc/controller.cpp:715   regular retry IssueRPC
    src/brpc/controller.cpp:803   circuit-breaker feedback latency
    src/brpc/controller.cpp:980   async OnRPCEnd
    src/brpc/controller.cpp:1019  backup-thread OnRPCEnd
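
The goal of the call-site changes listed above is that every begin and end stamp comes from one clock. One way to picture that invariant is a small timer that is handed its clock once and uses it for both readings. This is an illustrative sketch only: `RpcTimer`, `ClockFn`, and `fake_clock_us` are hypothetical names invented here, and the commit itself does not add such a helper; it changes each call site directly.

```cpp
#include <cassert>
#include <cstdint>

// Any function returning "now" in microseconds (hypothetical alias).
using ClockFn = int64_t (*)();

// Hypothetical helper: begin and end are always read from the same clock,
// so the elapsed time cannot mix two time domains.
class RpcTimer {
public:
    explicit RpcTimer(ClockFn now) : _now(now), _begin_us(now()) {}

    int64_t begin_us() const { return _begin_us; }

    // Reads the same clock that produced _begin_us.
    int64_t elapsed_us() const { return _now() - _begin_us; }

private:
    ClockFn _now;
    int64_t _begin_us;
};

// Deterministic fake clock for the example: advances 250 us on each read.
inline int64_t fake_clock_us() {
    static int64_t t = 1000;
    t += 250;
    return t;
}

inline void demo_single_clock() {
    RpcTimer timer(fake_clock_us);       // first read of the clock
    assert(timer.elapsed_us() == 250);   // second read, one step later
    assert(timer.elapsed_us() == 500);   // third read, two steps later
}
```

In real code the clock argument would be a single project-wide choice, such as the cpuwide clock this commit standardizes on, rather than something each caller picks.
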

Adds test/brpc_load_balancer_unittest.cpp::la_records_latency_with_consistent_time_source
which documents the LB-side invariant: when CallInfo::begin_time_us is
in the cpuwide_time domain (matching Weight::Update::end_time_us) the
LA feedback path produces a positive _avg_latency; when it is in the
wallclock domain the time-skew short-circuit fires and _avg_latency
stays 0.

1 parent 5fdb0d8 commit 9a6dda3
3 files changed
Lines changed: 86 additions & 8 deletions
Diff hunks (line contents not preserved):
  File 1: one line replaced at each of lines 451 and 628
          (line numbers match the src/brpc/channel.cpp entries above)
  File 2: one line replaced at each of lines 672, 705, 715, 803, 980, and 1019
          (line numbers match the src/brpc/controller.cpp entries above)
  File 3: 78 lines added as new lines 1306-1383, after original line 1305