@@ -43,29 +43,6 @@ defmodule GenStage do
4343 producers to send data using different "strategies". See
4444 `GenStage.Dispatcher` for more information.
4545
46- Many developers tend to create layers of stages, such as A, B and
47- C, for achieving concurrency. If all you want is concurrency, starting
48- multiple instances of the same stage is enough. Layers in GenStage must
49- be created when there is a need for back-pressure or to route the data
50- in different ways.
51-
52- For example, if you need the data to go over multiple steps but
53- without a need for back-pressure or without a need to break the
54- data apart, do not design it as such:
55-
56- [Producer] -> [Step 1] -> [Step 2] -> [Step 3]
57-
58- Instead it is better to design it as:
59-
60- [Consumer]
61- /
62- [Producer]-<-[Consumer]
63- \
64- [Consumer]
65-
66- where "Consumer" are multiple processes running the same code that
67- subscribe to the same "Producer".
68-
6946 ## Example
7047
7148 Let's define the simple pipeline below:
@@ -167,6 +144,47 @@ defmodule GenStage do
167144 you subscribe all of them, demand will start flowing upstream and
168145 events downstream.
169146
147+ ## Usage guidelines
148+
149+ As you get familiar with GenStage, you may be tempted to create layers
150+ of stages, such as A, B and C, to achieve concurrency. For example,
151+ stage A does step 1 in your company workflow, stage B does step 2 and
152+ so forth. That's an anti-pattern.
153+
154+ The same guideline that applies to processes also applies to GenStage:
155+ use processes/stages to model runtime properties, such as concurrency and
156+ data-transfer, and not for code organization or domain design purposes.
157+ For the latter, you should use modules and functions.
158+
159+ If your domain has to process the data in multiple steps, you should write
160+ that logic in separate modules and not directly in a `GenStage`. You only add
161+ stages according to runtime needs, typically when you need to provide
162+ back-pressure or leverage concurrency. This way you are free to experiment with
163+ different `GenStage` pipelines without touching your business rules.
164+
165+ In particular, if your logic has three distinct steps, rather than starting
166+ a separate stage for each step, it may be best to start multiple
167+ instances of a single stage that executes all steps. Instead of this:
168+
169+ [Producer Stage] -> [Stage Step 1] -> [Stage Step 2] -> [Stage Step 3]
170+
171+ You should rather have this:
172+
173+ [Consumer Step 1 + Step 2 + Step 3]
174+ /
175+ [Producer]->-[Consumer Step 1 + Step 2 + Step 3]
176+ \
177+ [Consumer Step 1 + Step 2 + Step 3]
178+
179+ The benefit of this approach is that you can scale the code based on the machine
180+ resources and runtime needs rather than the number of steps during development.
181+
182+ Finally, if you don't need back-pressure at all and you just need to process
183+ data that is already in-memory in parallel, a simpler solution is available
184+ directly in Elixir via `Task.async_stream/2`. This function consumes a stream
185+ of data, with each entry running in a separate task. The maximum number of tasks
186+ is configurable via the `:max_concurrency` option.
187+
170188 ## Demand
171189
172190 When implementing consumers, we often set the `:max_demand` and
@@ -202,7 +220,7 @@ defmodule GenStage do
202220 50 seconds to be consumed by C, which will then request another
203221 batch of 50 items.
204222
205- ## `init` and `:subscribe_to`
223+ ### `init` and `:subscribe_to`
206224
207225 In the example above, we have started the processes A, B, and C
208226 independently and subscribed them later on. But most often it is
@@ -305,30 +323,6 @@ defmodule GenStage do
305323 concurrently running in a `ConsumerSupervisor` is at most `max_demand` and
306324 the average amount of children is `(max_demand + min_demand) / 2`.
307325
308- ## Usage guidelines
309-
310- As you get familiar with GenStage, you may want to organize your stages
311- according to your business domain. For example, stage A does step 1 in
312- your company workflow, stage B does step 2 and so forth. That's an anti-
313- pattern.
314-
315- The same guideline that applies to processes also applies to GenStage:
316- use processes/stages to model runtime properties, such as concurrency and
317- data-transfer, and not for code organization or domain design purposes.
318- For the latter, you should use modules and functions.
319-
320- If your domain has to process the data in multiple steps, you should write
321- that logic in separate modules and not directly in a `GenStage`. You only add
322- stages according to the runtime needs, typically when you need to provide back-
323- pressure or leverage concurrency. This way you are free to experiment with
324- different `GenStage` pipelines without touching your business rules.
325-
326- Finally, if you don't need back-pressure at all and you just need to process
327- data that is already in-memory in parallel, a simpler solution is available
328- directly in Elixir via `Task.async_stream/2`. This function consumes a stream
329- of data, with each entry running in a separate task. The maximum number of tasks
330- is configurable via the `:max_concurrency` option.
331-
332326 ## Buffering
333327
334328 In many situations, mismatches might happen between how many events can be produced
@@ -946,6 +940,8 @@ defmodule GenStage do
946940
947941 ## Examples
948942
943+ ### Demand with buffering
944+
949945 In the following example, the producer emits enough events to at least
950946 satisfy the demand, letting GenStage buffer any excess events:
951947
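The relocated "Usage guidelines" text recommends keeping multi-step business logic in plain modules and, when no back-pressure is needed, running it in parallel with `Task.async_stream`. A minimal sketch of that advice follows. The `Workflow` module, its three step functions, and the sample input are hypothetical names chosen for illustration; they do not appear in the GenStage documentation.

```elixir
# Hypothetical business logic kept in a plain module, independent of any stage.
defmodule Workflow do
  # Three illustrative steps; real steps would hold the domain rules.
  def step1(n), do: n + 1
  def step2(n), do: n * 2
  def step3(n), do: Integer.to_string(n)

  # Runs all three steps for a single entry.
  def run(n), do: n |> step1() |> step2() |> step3()
end

# In-memory data processed in parallel, at most two tasks at a time.
# Results are returned in input order as {:ok, value} tuples.
results =
  1..5
  |> Task.async_stream(&Workflow.run/1, max_concurrency: 2)
  |> Enum.map(fn {:ok, value} -> value end)

IO.inspect(results)
```

Because the steps live in `Workflow` rather than in a stage, the same `Workflow.run/1` could later be called from a single consumer stage's `handle_events` callback if back-pressure becomes necessary, without changing the business rules.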