Commit a655b65

Udpate README.md
1 parent 836cc18 commit a655b65

2 files changed

+107
-47
lines changed

README.md

Lines changed: 53 additions & 23 deletions
Original file line number | Diff line number | Diff line change
@@ -31,60 +31,90 @@ flowchart TD
3131
classDef QUEUE fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
3232
3333
subgraph PRODUCERS [MP]
34-
P1[P1: Enqueue]
35-
P2[P2: Enqueue]
34+
P1[/P1: Enqueue/]
35+
ControlBlock_P1 -.-> P1
36+
P2[/P2: Enqueue/]
37+
ControlBlock_P2 -.-> P2
38+
P3[/P3: Enqueue/]
39+
ControlBlock_P3 -.-> P3
3640
end
3741
class PRODUCERS THREAD
3842
3943
subgraph CONSUMER [SC]
40-
C[SC: Dequeue]
44+
C[\C: Dequeue\]
45+
C -.-> ControlBlock_C
4146
end
4247
class CONSUMER THREAD
4348
4449
subgraph MPSC_QUEUE [Queue Instance]
45-
direction LR
46-
Head
47-
Tail
48-
Tail --> NodeA[Node]
49-
NodeA --> NodeB[Node]
50-
NodeB --> Head
50+
subgraph CONSUMER_VISIBLE [Consumer Visible]
51+
Tail(tail) -.-> NodeA(node)
52+
subgraph PRODUCER_VISIBLE [Producer Visible]
53+
NodeA --> NodeB(Node)
54+
NodeB --> NodeC(Node)
55+
NodeC --> NodeD(Node)
56+
NodeD --> NodeE(Node)
57+
NodeE --> NodeF(Node)
58+
NodeF --> NodeG(Node)
59+
NodeG --> NodeH(Node)
60+
NodeH --> Head(head)
61+
end
62+
end
5163
end
5264
class MPSC_QUEUE QUEUE
5365
66+
subgraph GLOBAL_MANAGER [Global Manager]
67+
Note1[really malloc/free here]:::note
68+
ControlBlockMap
69+
Page1[Page] --> Page2[Page]
70+
end
71+
classDef note fill:none, stroke:none;
72+
class GLOBAL_MANAGER manager
73+
5474
subgraph THREAD_LOCAL_NODE_POOL [Static Node Pool]
55-
subgraph THREAD_LOCAL_CACHE [theard_local Dmitry Vyukov]
75+
subgraph THREAD_LOCAL_CACHE [theard_local]
5676
direction TB
5777
LocalChunk_P1
5878
LocalChunk_P2
79+
LocalChunk_P3
5980
LocalChunk_C
6081
end
6182
6283
subgraph GLOBAL_NODE_POOL [global chunk stack]
6384
GlobalStackTop[Tagged Pointer]
64-
GlobalMutex[Page]
85+
GlobalStackBottom(NULL)
6586
NextChunk1
6687
NextChunk2
6788
end
6889
end
6990
class NODE_ALLOCATOR GLOBAL
7091
71-
P1 --> Head
72-
P2 --> Head
73-
C --> Tail
74-
75-
P1 -.-> LocalChunk_P1
76-
LocalChunk_P1 -- "Miss: Pop Chunk O(1)" -->GlobalStackTop
77-
P2 -.-> LocalChunk_P2
78-
LocalChunk_P2 -- "Miss: Pop Chunk O(1)" --> GlobalStackTop
79-
C -.-> LocalChunk_C
80-
LocalChunk_C -- "Recycle: Push Chunk O(1)" --> GlobalStackTop
92+
P1 -- "Preempt" --> Head
93+
P2 -- "Preempt" --> Head
94+
P3 -- "Preempt" --> Head
95+
NodeA -- "Consume" --> C
96+
C -.-> Tail
97+
98+
ControlBlockMap --> ControlBlock_C
99+
ControlBlockMap --> ControlBlock_P1
100+
ControlBlockMap --> ControlBlock_P2
101+
ControlBlockMap --> ControlBlock_P3
102+
103+
LocalChunk_P1 -.-> ControlBlock_P1
104+
GlobalStackTop -- "Pop Chunk O(1)" --> LocalChunk_P1
105+
LocalChunk_P2 -.-> ControlBlock_P2
106+
GlobalStackTop -- "Pop Chunk O(1)" --> LocalChunk_P2
107+
LocalChunk_P3 -.-> ControlBlock_P3
108+
GlobalStackTop -- "Pop Chunk O(1)" --> LocalChunk_P3
109+
ControlBlock_C -.-> LocalChunk_C
110+
LocalChunk_C -- "Push Chunk O(1)" --> GlobalStackTop
81111
82112
GlobalStackTop --> NextChunk1
83113
NextChunk1 --> NextChunk2
84-
NextChunk2 -- "Empty: Request Page" --> GlobalMutex
114+
NextChunk2 -- "Empty: Request Page" --> GlobalStackBottom
85115
```
86116

87-
Due to the characteristic of utilizing a global chunk stack for O(1) allocation of thread-local queues in units of 'chunks', the thread-local queues for producers and the consumer have a high chance of achieving efficient reuse via the stack in **SPSClike** scenarios.
117+
Due to the characteristic of utilizing a global chunk stack for allocation of thread-local queues in units of 'chunks', the thread-local queues for producers and the consumer have a high chance of achieving efficient reuse via the stack in **SPSClike** scenarios.
88118

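The global chunk stack in the diagram (a tagged-pointer top, `Pop Chunk O(1)` and `Push Chunk O(1)` edges, a `NULL` bottom) can be illustrated with a small lock-free stack. This is only a sketch under stated assumptions, not the library's implementation: `Chunk`, `ChunkStack`, and `kNil` are hypothetical names, and chunks are addressed by a 32-bit index packed with a 32-bit tag into one 64-bit word, where the real queue may well tag a raw pointer instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; none of this is taken from daking::MPSC_queue.
constexpr std::uint32_t kNil = 0xFFFFFFFFu;  // plays the role of the NULL stack bottom

struct Chunk {
    std::atomic<std::uint32_t> next{kNil};  // index of the chunk below this one
};

class ChunkStack {
public:
    explicit ChunkStack(Chunk* chunks) : chunks_(chunks) {}

    // Push chunk `idx`: constant work per attempt, retried only if another
    // thread changed the top in between.
    void push(std::uint32_t idx) {
        std::uint64_t old_top = top_.load(std::memory_order_relaxed);
        std::uint64_t new_top;
        do {
            chunks_[idx].next.store(index_of(old_top), std::memory_order_relaxed);
            new_top = pack(idx, tag_of(old_top) + 1);  // bump the tag on every update
        } while (!top_.compare_exchange_weak(old_top, new_top,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }

    // Pop the top chunk; returns kNil when the stack is empty.
    std::uint32_t pop() {
        std::uint64_t old_top = top_.load(std::memory_order_acquire);
        for (;;) {
            std::uint32_t idx = index_of(old_top);
            if (idx == kNil) return kNil;
            std::uint32_t next = chunks_[idx].next.load(std::memory_order_relaxed);
            std::uint64_t new_top = pack(next, tag_of(old_top) + 1);
            // The tag makes a stale `old_top` fail the CAS even if the same
            // index has been popped and pushed again (the ABA case).
            if (top_.compare_exchange_weak(old_top, new_top,
                                           std::memory_order_acquire,
                                           std::memory_order_acquire)) {
                return idx;
            }
        }
    }

private:
    static std::uint64_t pack(std::uint32_t idx, std::uint32_t tag) {
        return (static_cast<std::uint64_t>(tag) << 32) | idx;
    }
    static std::uint32_t index_of(std::uint64_t v) { return static_cast<std::uint32_t>(v); }
    static std::uint32_t tag_of(std::uint64_t v) { return static_cast<std::uint32_t>(v >> 32); }

    Chunk* chunks_;
    std::atomic<std::uint64_t> top_{pack(kNil, 0)};
};
```

A chunk that the consumer has just pushed is the first one the next `pop()` returns, which is one way to read the reuse claim for SPSC-like workloads.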
89119
In scenarios with uniform competition among multiple producers, constrained by the limitations of the linked list structure, continuous `enqueue` operations lead to frequent CAS contention for the head of the list, which touches the performance floor of this queue.
90120

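The contention described above comes from every producer updating the same head pointer. A minimal sketch of that pattern follows, assuming a linked list with one dummy node, producers that CAS the head, and a single consumer that advances the tail. `Node` and `MpscSketch` are hypothetical names, and the real `daking::MPSC_queue` takes its nodes from the chunk pool rather than from `new`/`delete`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch; names and layout are assumptions, not the library's API.
struct Node {
    int value{};
    std::atomic<Node*> next{nullptr};
};

class MpscSketch {
public:
    MpscSketch() : head_(new Node), tail_(head_.load()) {}  // start with one dummy node

    ~MpscSketch() {
        int ignored;
        while (dequeue(ignored)) {}
        delete tail_;
    }

    // Any producer thread: claim the head with a CAS, then link the previous
    // head to the new node. Under uniform competition most attempts fail
    // and retry, which is the contention cost.
    void enqueue(int v) {
        Node* node = new Node;
        node->value = v;
        Node* prev = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(prev, node,
                                            std::memory_order_acq_rel,
                                            std::memory_order_relaxed)) {
            // Another producer won; `prev` now holds the fresh head, so retry.
        }
        prev->next.store(node, std::memory_order_release);
    }

    // Consumer thread only: no CAS is needed on this side.
    bool dequeue(int& out) {
        Node* next = tail_->next.load(std::memory_order_acquire);
        if (next == nullptr) return false;  // empty, or a producer has not linked yet
        out = next->value;
        delete tail_;  // the old dummy; `next` becomes the new dummy
        tail_ = next;
        return true;
    }

private:
    std::atomic<Node*> head_;  // producers append here
    Node* tail_;               // consumer reads from here
};
```

With one active producer the CAS succeeds on the first try nearly every time, so the cost only shows up when several producers enqueue at the same rate.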
README.zh.md

Lines changed: 54 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -31,66 +31,96 @@ flowchart TD
3131
classDef QUEUE fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
3232
3333
subgraph PRODUCERS [MP]
34-
P1[P1: Enqueue]
35-
P2[P2: Enqueue]
34+
P1[/P1: Enqueue/]
35+
ControlBlock_P1 -.-> P1
36+
P2[/P2: Enqueue/]
37+
ControlBlock_P2 -.-> P2
38+
P3[/P3: Enqueue/]
39+
ControlBlock_P3 -.-> P3
3640
end
3741
class PRODUCERS THREAD
3842
3943
subgraph CONSUMER [SC]
40-
C[SC: Dequeue]
44+
C[\C: Dequeue\]
45+
C -.-> ControlBlock_C
4146
end
4247
class CONSUMER THREAD
4348
4449
subgraph MPSC_QUEUE [Queue Instance]
45-
direction LR
46-
Head
47-
Tail
48-
Tail --> NodeA[Node]
49-
NodeA --> NodeB[Node]
50-
NodeB --> Head
50+
subgraph CONSUMER_VISIBLE [Consumer Visible]
51+
Tail(tail) -.-> NodeA(node)
52+
subgraph PRODUCER_VISIBLE [Producer Visible]
53+
NodeA --> NodeB(Node)
54+
NodeB --> NodeC(Node)
55+
NodeC --> NodeD(Node)
56+
NodeD --> NodeE(Node)
57+
NodeE --> NodeF(Node)
58+
NodeF --> NodeG(Node)
59+
NodeG --> NodeH(Node)
60+
NodeH --> Head(head)
61+
end
62+
end
5163
end
5264
class MPSC_QUEUE QUEUE
5365
66+
subgraph GLOBAL_MANAGER [Global Manager]
67+
Note1[really malloc/free here]:::note
68+
ControlBlockMap
69+
Page1[Page] --> Page2[Page]
70+
end
71+
classDef note fill:none, stroke:none;
72+
class GLOBAL_MANAGER manager
73+
5474
subgraph THREAD_LOCAL_NODE_POOL [Static Node Pool]
55-
subgraph THREAD_LOCAL_CACHE [thead_local Dmitry Vyukov]
75+
subgraph THREAD_LOCAL_CACHE [theard_local]
5676
direction TB
5777
LocalChunk_P1
5878
LocalChunk_P2
79+
LocalChunk_P3
5980
LocalChunk_C
6081
end
6182
6283
subgraph GLOBAL_NODE_POOL [global chunk stack]
6384
GlobalStackTop[Tagged Pointer]
64-
GlobalMutex[Page]
85+
GlobalStackBottom(NULL)
6586
NextChunk1
6687
NextChunk2
6788
end
6889
end
6990
class NODE_ALLOCATOR GLOBAL
7091
71-
P1 --> Head
72-
P2 --> Head
73-
C --> Tail
74-
75-
P1 -.-> LocalChunk_P1
76-
LocalChunk_P1 -- "Miss: Pop Chunk O(1)" -->GlobalStackTop
77-
P2 -.-> LocalChunk_P2
78-
LocalChunk_P2 -- "Miss: Pop Chunk O(1)" --> GlobalStackTop
79-
C -.-> LocalChunk_C
80-
LocalChunk_C -- "Recycle: Push Chunk O(1)" --> GlobalStackTop
92+
P1 -- "Preempt" --> Head
93+
P2 -- "Preempt" --> Head
94+
P3 -- "Preempt" --> Head
95+
NodeA -- "Consume" --> C
96+
C -.-> Tail
97+
98+
ControlBlockMap --> ControlBlock_C
99+
ControlBlockMap --> ControlBlock_P1
100+
ControlBlockMap --> ControlBlock_P2
101+
ControlBlockMap --> ControlBlock_P3
102+
103+
LocalChunk_P1 -.-> ControlBlock_P1
104+
GlobalStackTop -- "Pop Chunk O(1)" --> LocalChunk_P1
105+
LocalChunk_P2 -.-> ControlBlock_P2
106+
GlobalStackTop -- "Pop Chunk O(1)" --> LocalChunk_P2
107+
LocalChunk_P3 -.-> ControlBlock_P3
108+
GlobalStackTop -- "Pop Chunk O(1)" --> LocalChunk_P3
109+
ControlBlock_C -.-> LocalChunk_C
110+
LocalChunk_C -- "Push Chunk O(1)" --> GlobalStackTop
81111
82112
GlobalStackTop --> NextChunk1
83113
NextChunk1 --> NextChunk2
84-
NextChunk2 -- "Empty: Request Page" --> GlobalMutex
114+
NextChunk2 -- "Empty: Request Page" --> GlobalStackBottom
85115
```
86116

87117

88-
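Because the global chunk stack is used to allocate thread-local queues in O(1) in units of `chunk`, the thread-local queues of the producers and the consumer have a good chance of being reused efficiently through the stack in **SPSClike** scenarios.
118+
Because the global chunk stack allocates thread-local queues in units of `chunk`, the thread-local queues of the producers and the consumer have a good chance of being reused efficiently through the stack in **SPSClike** scenarios.
89119
In scenarios where multiple producers compete uniformly, the linked-list structure means that sustained `enqueue` operations cause frequent CAS contention on the list head, which is where this queue reaches its performance floor.
90120
Therefore, `daking::MPSC_queue` is suitable for:
91121
1. **Scenarios with non-uniform production and message bursts**, that is, scenarios where producers burst in non-uniform peaks.
92122
This greatly reduces MPSC CAS contention and quickly brings throughput back to SPSC-like performance.
93-
2. **Scenarios where producers enqueue in batches**, that is, where producers aggregate what they produce or hold a write buffer and enqueue it in bulk.
123+
1. **Scenarios where producers enqueue in batches**, that is, where producers aggregate what they produce or hold a write buffer and enqueue it in bulk.
94124
This is because `daking::MPSC_queue:enqueue_bulk` first links the data into a list segment using efficient thread_local operations, and then merges that segment into the queue with a single CAS.
95125
The performance tests below demonstrate both points.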
96126

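The bulk-enqueue behaviour described above (link the batch into a private list segment first, then merge it with a single CAS) can be sketched as follows. This is an illustration under assumptions, not the actual `enqueue_bulk`: `BulkNode`, `BulkSketch`, and the `splices()` counter are hypothetical, and nodes come from `new` here rather than from the thread-local chunk pool.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch; not the signature or internals of daking::MPSC_queue.
struct BulkNode {
    int value{};
    std::atomic<BulkNode*> next{nullptr};
};

class BulkSketch {
public:
    BulkSketch() : head_(new BulkNode), tail_(head_.load()) {}  // one dummy node

    ~BulkSketch() {
        int ignored;
        while (dequeue(ignored)) {}
        delete tail_;
    }

    void enqueue_bulk(const int* values, std::size_t count) {
        if (count == 0) return;
        // Step 1: build the segment privately. No other thread can see these
        // nodes yet, so there is no contention while linking them.
        BulkNode* first = new BulkNode;
        first->value = values[0];
        BulkNode* last = first;
        for (std::size_t i = 1; i < count; ++i) {
            BulkNode* n = new BulkNode;
            n->value = values[i];
            last->next.store(n, std::memory_order_relaxed);
            last = n;
        }
        // Step 2: one successful CAS moves the head past the whole segment.
        BulkNode* prev = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(prev, last,
                                            std::memory_order_acq_rel,
                                            std::memory_order_relaxed)) {
        }
        splices_.fetch_add(1, std::memory_order_relaxed);
        // Publishing the link makes the entire segment visible to the consumer.
        prev->next.store(first, std::memory_order_release);
    }

    // Consumer thread only.
    bool dequeue(int& out) {
        BulkNode* next = tail_->next.load(std::memory_order_acquire);
        if (next == nullptr) return false;
        out = next->value;
        delete tail_;
        tail_ = next;
        return true;
    }

    // Number of successful head updates, kept only for illustration.
    std::size_t splices() const { return splices_.load(std::memory_order_relaxed); }

private:
    std::atomic<BulkNode*> head_;
    BulkNode* tail_;
    std::atomic<std::size_t> splices_{0};
};
```

Enqueuing a batch of N items this way updates the shared head once instead of N times, which is why batching producers contend less.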