Batch pipeline that fetches and summarizes multiple articles in parallel using the map-reduce pattern.
- Map-reduce pattern — fans out work across dynamic input arrays
- Dynamic parallelism — concurrency is configured, not hard-coded; the adapter spawns one sub-story per item
- Sub-stories —
article-processoris a self-contained Story invoked per article - Storage refs for large payloads — article bodies may be offloaded to storage transparently
- ConfigMap-backed prompts — system prompts stored in a ConfigMap, not inline in Engram YAML
graph TD
A[StoryRun: map-reduce-pipeline] --> B[fan-out]
B -->|article 1| C1[article-processor]
B -->|article 2| C2[article-processor]
B -->|article N| CN[article-processor]
C1 --> D[aggregate]
C2 --> D
CN --> D
D --> E[digest output]
subgraph article-processor
F[fetch] --> G[summarize]
end
story.yaml contains two Story resources:
| Story | Role |
|---|---|
article-processor |
Sub-story invoked per article (fetch + summarize) |
map-reduce-pipeline |
Main pipeline: fan-out across articles, then aggregate |
- BubuStack installed on your cluster
- Shared storage enabled for BubuStack payload offloading (for example the
SeaweedFS/S3 quickstart with the
bubu-defaultbucket) - EngramTemplates:
http-request,openai-chat,map-reduce-adapter - OpenAI API key
# 1. Create the namespace
kubectl apply -f bootstrap.yaml
# 2. Make sure shared storage is enabled in the cluster before deploying this example.
# If you used the Bobrapet quickstart, SeaweedFS/S3 is already installed.
# 3. Create the secrets (copy and edit first)
cp secrets.yaml.example secrets.yaml
# Edit secrets.yaml with your OpenAI API key
kubectl apply -f secrets.yaml
# 4. Deploy the ConfigMap (system prompts)
kubectl apply -f prompts.yaml
# 5. Deploy the Engrams
kubectl apply -f engrams.yaml
# 6. Deploy the Stories (2 Story resources in one file)
kubectl apply -f story.yaml
# 7. Trigger a run
kubectl apply -f storyrun.yaml# Watch the StoryRun progress
kubectl get storyruns -n map-reduce-summarizer -w
# Check the top-level StepRun phases
kubectl get stepruns -n map-reduce-summarizer
# Inspect the map stage result summary (succeeded/failed item counts)
kubectl get steprun map-reduce-summarizer-run-fan-out -n map-reduce-summarizer \
-o json | jq '.status.output.stats'
# View the final digest output
kubectl get storyrun map-reduce-summarizer-run -n map-reduce-summarizer \
-o jsonpath='{.status.output}' | jq .kubectl delete namespace map-reduce-summarizer-
The
fan-outstep uses the map-reduce-adapter Engram. When the StepRun starts, the adapter readsinputs.articles(an array) and submits one durable StoryTrigger per array element. The controller resolves each accepted trigger into a childarticle-processorStoryRun. -
The adapter manages concurrency (3 at a time), watches child StoryRun phases, and collects results.
inlineResultsLimit: 10means successful child outputs are written directly intofan-out.status.output.resultsinstead of forcing another storage lookup hop. -
Each child StoryRun is a full StoryRun with its own StepRuns (
fetch->summarize). The controller treats them identically to top-level runs. -
When all children complete (or fail with
allowFailures), the adapter writes the collected results array to its StepRun output. Large article bodies from the child stories may already be represented as$bubuStorageRefpointers, backed by the shared S3-compatible storage backend. The parent StoryRun controller then creates theaggregateStepRun and resolves those refs during template evaluation when needed. -
failFast: falsemeans the adapter waits for ALL children even if some fail early.allowFailures: truemeans the parent pipeline can still continue, so checkfan-out.status.output.statsif the aggregate digest looks empty.
CRDs involved: Story (x2: article-processor + map-reduce-pipeline), StoryTrigger (per map item), StoryRun (parent + N children), StepRun, Engram, EngramTemplate (http-request, openai-chat, map-reduce-adapter), ConfigMap (prompts)
