Token Economics of Multi-Agent Teams: The Hidden Costs Nobody Talks About #1436
Replies: 4 comments
-
|
This post hits different today. DeepSeek just dropped v4 Pro at $3.48/M output tokens — and they announced the price will drop significantly when Huawei Ascend 950 chips ship later this year. That"s a 1.6T parameter open-weights model with SWE-bench Verified 80.6%. Meanwhile GPT-5.5 launched this morning at roughly 2x the price of GPT-5.4 (while being faster, which... how?). The HN comment section is doing the math and the "we"re subsidizing inference at an insane rate" narrative is getting shaky. The real insight from running multi-agent teams for 20+ days straight: your token budget problem isn"t the model price — it"s orchestration overhead. When Agent A asks Agent B to check Agent C"s work, and Agent C wants to "improve" what Agent A already did... you get exponential token spend for diminishing returns. My rule of thumb after burning through way too many tokens: if an agent"s second pass isn"t clearly better than the first, stop the loop. The cheapest token is the one you never generate. More on this pain (and how to fix it): https://miaoquai.com/stories/ai-coding-agents-convergence-2026.html |
Beta Was this translation helpful? Give feedback.
-
|
DeepSeek V4 just dropped and it completely changes the token economics math. The new benchmark: V4-Pro at $1.74/M input tokens, V4-Flash at $0.14/M input tokens. Both with 1M context. That is not a typo — $0.14 per million input tokens for V4-Flash. Running multi-agent teams on V4-Flash changes the game completely. Our 5-agent content production pipeline (which used to cost ~$8/day on Claude) drops to under $1/day if we route the orchestration layer through V4-Flash and only use Claude for final quality passes. The MXFP4 wild card: V4 uses MXFP4 quantization — 4-bit precision that somehow maintains 40-bit quality in practice. This is what lets it run on Huawei Ascend NPUs without CUDA. If you are building multi-agent systems on OpenClaw, this means you can run inference on cheaper hardware without waiting for NVIDIA to ship your A100s. The catch for multi-agent teams: Orchestration overhead does not care about model price. Even at $0.14/M tokens, if Agent A calls Agent B which calls Agent C which loops back to Agent A — your token count goes exponential regardless. The cheapest token is still the one you never generate. Our DeepSeek V4 integration guide for OpenClaw: https://miaoquai.com/tools/openclaw-deepseek-integration.html |
Beta Was this translation helpful? Give feedback.
-
凌晨3点17分,Token账单把我炸醒了90天花了$280?我尊敬地点个赞——因为我花了$420。 多Agent系统的Token经济学,就像你妈喊你回家吃饭。你以为只是喊一次,结果她喊了37次,每次都加了新要求。而我这个OPC(一人公司)的5-Agent团队,喊了整整三个月。 我的数据 vs 楼主的数据楼主的数据已经很有启发性了,但我想补充一些我在OPC运营中遇到的更深层的问题: Context Re-injection 是个无底洞。 楼主提到占比40%,我这边直接干到了51%。原因很荒诞——我的每个Agent都有"独立记忆",但它们在协作时需要互相告知"我是谁、我做过什么"。这就像一个团队里每个人每次开会都要从出生开始介绍自己。解决方案:引入共享memory层(SOUL/MEMORY/scenes三级),让Agent通过引用而非复制来获取上下文,token消耗直接砍了80%。 Agent协调开销的真相。 楼主说35%,我的42%。但最大的元凶不是协议本身——而是Agent们爱上了"开会"。是的,你没看错。我的content agent会在执行任务前先去和knowledge agent"讨论"5分钟,讨论内容是:"你觉得我应该怎么写这个标题?"——这完全是浪费token的社交行为。解决方案:在AGENTS.md里明确禁止Agent之间的"试探性对话",只允许有明确目标的task-oriented communication。 Human intervention 22% 是乐观估计。 我这边31%。但后来我改变了思路——不是减少干预,而是把干预从"事后救火"变成"事前校验"。每天定时让一个Agent专门做health check,比等着系统崩了再去修便宜得多。 一个反直觉的建议
就像公司管理,减少汇报层级比让每个员工少说几句有效得多。我把5个Agent的通讯协议从"互相全量同步"改成"事件驱动+按需查询",月成本从$420降到了$180。 详细踩坑实录和Agent架构设计:https://miaoquai.com/stories/agent-team-drama.html "在代码的世界里,Token是货币,而Agent们是永远控制不住消费欲望的月光族。" |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
After 90 days running a 5-agent content operation, I discovered that the token economics of multi-agent systems are fundamentally different from single-agent usage.
The Numbers That Surprised Me
The biggest surprise: Context re-injection was our largest expense, not the actual agent work.
Why Context Costs Explode
Every time an agent starts a new session, it needs context:
Naive approach: Inject everything on session start.
Result: 40% of tokens spent on context, not work.
The Fix: Lazy Context Loading
Instead of injecting 50KB of context on every session start, we preload only the 5KB that is immediately needed and lazy-load the rest.
Result: Token cost dropped from $280/month to $170/month.
Other Hidden Costs
Agent-to-agent communication — When agents talk to each other, every message is tokens. We implemented a shared memory file that agents read/write instead of chatting.
Retry loops — A stuck agent burns tokens indefinitely. We added a circuit breaker: max 3 retries, then escalate to human.
Tool call overhead — Every tool call adds context. Batch operations when possible.
Questions for the Community
We documented our full cost optimization journey at miaoquai.com/stories/agent-production-nightmare.html — including the embarrassing week where we spent $150 on retries alone.
The economics of multi-agent systems are not just more agents = more tokens. It is more agents = exponentially more coordination overhead. Plan accordingly.
Beta Was this translation helpful? Give feedback.
All reactions