Use critical section for batch counters in producer default_router#684
Use critical section for batch counters in producer default_router#684zzzming wants to merge 1 commit into
Conversation
| lastChangeTimestamp int64 | ||
| msgCounter uint32 | ||
| cumulativeBatchSize uint32 | ||
| sync.RWMutex |
There was a problem hiding this comment.
I see that we only ever acquire write access; can this be a regular sync.Mutex?
There was a problem hiding this comment.
Actually, I'm concerned that the use of a mutex will incur overhead and affect performance, especially under concurrent publishing. I've added a bench test to get quantitative numbers in #693
There was a problem hiding this comment.
@zzzming , I've attempted to address the races when updating the default router state in a lock-free manner here: Please take a look and see if you are satisfied: #694
Comparing the results from the parallel default router bench test we can see that the use of a mutex has an impact on performance (old==this PR, new==#694):
name old time/op new time/op delta
DefaultRouterParallel 27.4ns ± 3% 14.8ns ± 2% -45.79% (p=0.000 n=10+8)
DefaultRouterParallel-2 35.2ns ± 1% 41.9ns ± 0% +18.96% (p=0.000 n=9+7)
DefaultRouterParallel-4 49.4ns ± 6% 44.1ns ± 8% -10.84% (p=0.000 n=10+9)
DefaultRouterParallel-8 58.8ns ± 6% 53.2ns ± 3% -9.53% (p=0.000 n=10+8)
DefaultRouterParallel-16 69.7ns ± 3% 51.3ns ± 0% -26.43% (p=0.000 n=10+8)
EDIT: I can't explain why DefaultRouterParallel-2 is slower on the branch that doesn't use the mutex.
There was a problem hiding this comment.
Update: the implementation in #694 was modified slightly and now DefaultRouterParallel-2 is no longer slower in the lock-free implementation. New bench comparison is below:
name old time/op new time/op delta
DefaultRouterParallel 27.4ns ± 3% 14.9ns ± 6% -45.51% (p=0.000 n=10+9)
DefaultRouterParallel-2 35.2ns ± 1% 33.7ns ± 8% ~ (p=0.161 n=9+9)
DefaultRouterParallel-4 49.4ns ± 6% 28.8ns ± 4% -41.66% (p=0.000 n=10+9)
DefaultRouterParallel-8 58.8ns ± 6% 36.3ns ± 1% -38.32% (p=0.000 n=10+8)
DefaultRouterParallel-16 69.7ns ± 3% 39.5ns ±21% -43.38% (p=0.000 n=10+10)
Motivation
A group of batch counters are not properly critical section protected in the producer's default_router. Read and update of these counters should be protected as a whole, instead of individually synchronized using Atomic access. Multiple goroutines can access and update these counters at the same time. Basically it is not thread safe.
Modifications
The change use mutex lock to protect the entire counters during the batch partition decision making process.
Minor improvement includes to evaluate message max size, byte size, and batch window on a needed basis.
Verifying this change
Does this pull request potentially affect one of the following parts:
Documentation