🚀 OpenSpace × OpenViking — The Upgrade That Changes Everything

Your AI agents just learned how to remember.

From cold-start amnesia to team-wide collective intelligence. In six rounds of engineering.

Language: English · Tiếng Việt · 中文


📢 The headline

Every task makes every agent smarter. Now that learning persists across sessions, users, and machines — and it does it automatically, privately, and measurably.

| Before the upgrade | After the upgrade |
| --- | --- |
| 🧊 Cold-start amnesia every task | 🧠 Cross-session memory |
| 🔁 Re-learn the same user preferences | ✅ Alice's "I prefer bar charts" sticks |
| 💥 Repeat the same tool failures | 🛡️ Anti-patterns block known bad approaches |
| 🏝️ Each agent is an island | 🌐 Team-wide shared intelligence |
| 📦 Host agents locked out of memory | 🔓 5 MCP tools for direct access |
| 🔓 Secrets may leak into shared memory | 🔒 Regex PII scrubber, default ON |
| 📊 No way to measure impact | 📈 Full telemetry in every response |

🎯 The 60-second pitch

Your agents are already smart within a single task — OpenSpace makes sure of that. Skills guide execution, evolution repairs broken workflows, quality monitoring filters out degraded tools. You're already getting 46% fewer tokens and 4.2× better performance.

But between tasks, every agent goes back to zero.

  • The agent that spent 8 iterations discovering "chromedriver 124 needs Chrome 124+" forgets it by tomorrow.
  • Alice says "I prefer bar charts" on Tuesday — has to say it again on Wednesday.
  • One agent's hard-won tool knowledge stays trapped on one machine.

We call this the "relearn gap". It's the difference between "my skill DB got better" and "the next similar task is measurably faster for everyone".

The OpenSpace × OpenViking upgrade closes that gap — and then goes further.


🔥 The 5 upgrades that rewrite the game

1. 🧠 Cross-session memory — your agents finally learn

Before: Every new task started cold. No user preferences. No tool failure history. No past solutions. Just a fresh agent reading the same SKILL.md files for the 100th time.

After: Before the agent does anything, it asks OpenViking — a purpose-built context database — what it should know. Six parallel semantic searches pull back:

  • User preferences — "Alice prefers bar charts, saves XLSX not CSV"
  • Tool knowledge — "shell:xlsx_to_csv fails on .xlsm, use gnumeric first"
  • Successful skills — "sales-dashboard-builder solved this in 8 iterations"
  • Past cases — "cleaned NaN rows before aggregation last time"
  • Reusable patterns — "weekly reports always include 4-week rolling average"
  • Anti-patterns — "chromedriver 124 version mismatch — avoided this last week" ⚠️

All injected into the agent's iter-1 prompt. Stripped at iter 2 to save tokens. Zero overhead when Viking is offline.
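
The enrichment flow described above can be sketched roughly as follows. This is a minimal illustration rather than the integration's actual code: `search`, `enrich`, `build_prompt`, the category identifiers, and the in-memory `fake_store` are all hypothetical stand-ins for the real searches against Viking.

```python
import asyncio

# The six categories named above; the identifiers themselves are assumptions.
CATEGORIES = [
    "user_preferences", "tool_hints", "skill_hints",
    "case_hints", "pattern_hints", "antipattern_hints",
]

async def search(category: str, query: str) -> list[str]:
    """Hypothetical stand-in for one semantic search against the memory store."""
    await asyncio.sleep(0)  # placeholder for the HTTP round-trip
    fake_store = {
        "user_preferences": ["Alice prefers bar charts"],
        "antipattern_hints": ["chromedriver 124 needs Chrome 124+"],
    }
    return fake_store.get(category, [])

async def enrich(query: str) -> dict[str, list[str]]:
    """Run all six searches concurrently and keep only non-empty results."""
    results = await asyncio.gather(*(search(c, query) for c in CATEGORIES))
    return {c: hits for c, hits in zip(CATEGORIES, results) if hits}

def build_prompt(task: str, hints: dict[str, list[str]], iteration: int) -> str:
    """Attach hints on iteration 1 only; later iterations get the bare task."""
    if iteration != 1 or not hints:
        return task
    lines = [f"[{cat}] {hit}" for cat, items in hints.items() for hit in items]
    return task + "\n\nKnown context:\n" + "\n".join(lines)

hints = asyncio.run(enrich("build a sales dashboard"))
print(build_prompt("build a sales dashboard", hints, iteration=1))
print(build_prompt("build a sales dashboard", hints, iteration=2))
```

Running the searches with `asyncio.gather` keeps the added wait close to the slowest single search rather than the sum of all six, and returning the bare task from iteration 2 onward mirrors the "stripped at iter 2" behavior.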


2. 🌐 The team brain — one learns, all benefit

Before: Agent A on your laptop figured out how to scrape a tricky site. Agent B on your colleague's machine had to figure it out from scratch the next day. Knowledge didn't move between agents without a manual cloud upload.

After: Agent A's session commit automatically extracts memories into the team's shared Viking instance. Hours later, Agent B on a different machine — connected to the same Viking namespace — runs a similar task. Its pre-execution enrichment retrieves what Agent A learned. No upload. No sharing button. No manual step.

Monday 10am:  Agent A (Claude Code) fixes chromedriver on macOS
Monday 10:03: Viking extracts "chromedriver 124 needs Chrome 124+" → team memory
Monday 2pm:   Agent B (Codex, different machine) starts a Selenium task
Monday 2pm:   Viking surfaces the fix immediately → Agent B skips 3 debugging iterations

Privacy boundary is asymmetric by design:

  • Agent memories (tools, patterns, skills, cases, anti-patterns) → shared team-wide
  • User memories (preferences, profile) → isolated per user via OPENVIKING_USER_ID

Alice's preferences never leak to Bob. But Alice's discovery that xlsx_to_csv breaks on .xlsm helps Bob immediately.
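
One way to picture the asymmetric boundary is as a storage prefix that includes the user ID only for user memories. The sketch below is an assumption about how such scoping could look; `memory_scope`, the category sets, and the path layout are hypothetical and not taken from OpenViking.

```python
# Category groupings follow the two bullets above; the names are assumptions.
SHARED_CATEGORIES = {"tools", "patterns", "skills", "cases", "antipatterns"}
USER_CATEGORIES = {"preferences", "profile"}

def memory_scope(namespace: str, category: str, user_id: str) -> str:
    """Return a hypothetical storage prefix for a memory of the given category."""
    if category in SHARED_CATEGORIES:
        # Agent memories: one prefix per team namespace, shared by every user.
        return f"{namespace}/agent/{category}"
    if category in USER_CATEGORIES:
        # User memories: the user ID is part of the prefix, so users never overlap.
        return f"{namespace}/user/{user_id}/{category}"
    raise ValueError(f"unknown category: {category}")

print(memory_scope("acme-eng", "tools", "alice"))        # same prefix for Bob
print(memory_scope("acme-eng", "preferences", "alice"))  # Alice only
print(memory_scope("acme-eng", "preferences", "bob"))    # Bob only
```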


3. 🔓 Universal access — host agents finally plug in

Before: Host agents like OpenClaw, Claude Code, Codex, Cursor, and nanobot could only benefit from Viking if they went through a full OpenSpace execute_task() round-trip. Their direct chat surface — where users spend most of their time — had no access to cross-session memory.

After: 5 MCP tools exposed directly to every host agent:

| Tool | What it does |
| --- | --- |
| openviking_retrieve_memory | Pull cross-session abstracts for any query |
| openviking_remember | Record explicit user statements as memories |
| openviking_forget_memory | Delete / deprecate a specific memory |
| openviking_report_stale_memory | Flag outdated memory without deleting |
| openviking_memory_status | Health + configuration introspection |

Now when you chat with OpenClaw and it needs to personalize a reply, it calls openviking_retrieve_memory("user dashboard preferences") as naturally as any other MCP tool. Alice's "I prefer bar charts" is available everywhere — even when OpenClaw isn't delegating to OpenSpace.

Biggest architectural win of Round 6. Closes the #1 gap in the original design.
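
At the wire level, an MCP host invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for `openviking_retrieve_memory`. The envelope follows the MCP specification, but the `query` argument name is an assumption based on the example call above, and `tool_call_request` is a hypothetical helper.

```python
import json

def tool_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

request = tool_call_request(
    1,
    "openviking_retrieve_memory",
    {"query": "user dashboard preferences"},  # argument name is an assumption
)
print(request)
```

In practice the host agent's MCP client builds this message for you; the point is only that the memory tools look like any other MCP tool from the host's side.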


4. 🎯 Mid-execution recovery — dynamic memory at any iteration

Before: The agent got its cross-session context once, at the start of the task. If it discovered mid-execution that the real problem was different from what the initial query suggested, it had to explore from scratch.

After: The grounding agent has a new tool — retrieve_memory(query, category) — it can call at any iteration. When a tool fails unexpectedly, when an approach isn't working, when it needs a different kind of knowledge — it queries Viking dynamically.

Categories include antipatterns so the agent can ask "has this failure been seen before?" mid-task and avoid going in circles.

The agent calls it only when genuinely stuck, typically 0–2 times per task. No overhead in the common case, massive savings when it matters.
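
The "query only when stuck" behavior can be illustrated with a small recovery loop. Everything here is a hypothetical sketch: the `retrieve_memory` stub returns canned data instead of querying Viking, and `run_with_recovery` and `launch_browser` are invented for the example.

```python
from typing import Callable

def retrieve_memory(query: str, category: str) -> list[str]:
    """Hypothetical stub for the retrieve_memory(query, category) tool."""
    known = {
        ("chromedriver version mismatch", "antipatterns"):
            ["chromedriver 124 needs Chrome 124+"],
    }
    return known.get((query, category), [])

def run_with_recovery(step: Callable[[list[str]], str],
                      failure_query: str) -> tuple[str, int]:
    """Try a step; on failure, consult anti-patterns once and retry with the hints."""
    lookups = 0
    try:
        return step([]), lookups  # common case: no memory lookup at all
    except RuntimeError:
        lookups += 1
        hints = retrieve_memory(failure_query, "antipatterns")
        return step(hints), lookups

def launch_browser(hints: list[str]) -> str:
    """Toy step that fails unless it has been told about the known fix."""
    if not hints:
        raise RuntimeError("session not created: version mismatch")
    return f"launched after applying: {hints[0]}"

outcome, lookups = run_with_recovery(launch_browser, "chromedriver version mismatch")
print(outcome, lookups)
```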


5. 🛡️ Privacy by default — secrets never leave your machine unscrubbed

Before: Whatever the agent saw in a task — API keys, emails, customer data, credit card numbers — could end up in a session committed to the shared team Viking. Nobody's fault, but the blast radius of one careless task was unbounded.

After: Every string OpenSpace writes to Viking passes through a regex PII scrubber (default ON, opt-out via OPENVIKING_SCRUB_PII=false). The scrubber redacts:

| Category | What gets redacted |
| --- | --- |
| 🔑 API keys | Anthropic sk-ant-*, OpenAI sk-proj-* / sk-*, OpenRouter, GitHub ghp_* / github_pat_*, AWS AKIA* / ASIA*, GCP AIza*, Slack xox* |
| 🎟️ Tokens | JWT (3 base64 segments), Authorization: Bearer ... headers |
| 🔐 Credentials | Basic-auth URLs (https://user:pass@host), RSA/EC/OpenSSH private key blocks |
| 📧 PII | Email addresses, phone numbers (strict E.164), SSN, IP addresses |
| 💳 Financial | Credit card numbers (Luhn-validated; 16-digit numbers that fail the Luhn check are left untouched) |

Placeholders are stable ([REDACTED_ANTHROPIC_KEY], [REDACTED_EMAIL], etc.) so Viking's memory extraction still produces useful abstracts — just without the secrets.

Idempotent, zero-dep, microsecond overhead. Privacy hardening that doesn't slow anything down.
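
To make the scrubbing idea concrete, here is a deliberately small sketch covering three of the categories above. It is not the shipped scrubber: the regexes are simplified, `scrub` and `luhn_valid` are hypothetical names, and `[REDACTED_CARD]` is an assumed placeholder (only `[REDACTED_ANTHROPIC_KEY]` and `[REDACTED_EMAIL]` appear in the text above).

```python
import re

def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Illustrative patterns only; far narrower than the categories listed above.
PATTERNS = [
    (re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}"), "[REDACTED_ANTHROPIC_KEY]"),
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
]
CARD = re.compile(r"\b\d{16}\b")

def scrub(text: str) -> str:
    """Replace known secret shapes with stable placeholders."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    # Redact 16-digit runs only when they pass the Luhn check.
    return CARD.sub(
        lambda m: "[REDACTED_CARD]" if luhn_valid(m.group()) else m.group(),
        text,
    )

sample = "key sk-ant-abc123DEF456 for alice@example.com card 4111111111111111 order 1234567812345678"
print(scrub(sample))
```

Because the placeholders themselves match none of the patterns, scrubbing an already-scrubbed string changes nothing, which is what makes the operation idempotent.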


📊 The numbers that matter

Token savings — compounded with OpenSpace's existing 46% reduction

| Scenario | Delta vs baseline (no Viking) | Delta vs non-OpenSpace agent |
| --- | --- | --- |
| 🧊 Cold start (new task, no memory) | +0% (zero overhead) | −46% |
| 🔥 Warm cache (similar prior task) | −25–40% | −58–68% |
| 🎯 Phase 2 avoided (selector picks right skill first try) | −55% | −76% |
| 🌐 Cross-agent (Agent B benefits from Agent A) | −30–45% | −62–72% |
| ⚠️ Anti-pattern avoidance (Round 6) | −15–30% | −54–64% |

Realistic team deploying for a week: expect 55–65% fewer tokens vs a baseline (non-OpenSpace) agent once the memory base has populated.
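
The "compounded" column can be approximated by assuming the two reductions multiply, so the combined saving is 1 − (1 − 0.46) × (1 − Viking delta). That multiplicative model is an assumption made for illustration, and `combined_saving` is a hypothetical helper. It matches the table's rounded figures closely for most rows but not exactly for every range endpoint, so treat the table rather than the formula as the source of truth.

```python
OPENSPACE_SAVING = 0.46  # existing reduction quoted above

def combined_saving(viking_delta: float, base: float = OPENSPACE_SAVING) -> float:
    """Total saving vs a non-OpenSpace agent if the two reductions multiply."""
    return 1 - (1 - base) * (1 - viking_delta)

scenarios = [
    ("cold start", 0.0),
    ("warm cache, low end", 0.25),
    ("warm cache, high end", 0.40),
    ("Phase 2 avoided", 0.55),
]
for label, delta in scenarios:
    print(f"{label}: {combined_saving(delta):.1%}")
```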

Latency impact

| Phase | Added latency | Blocking the user? |
| --- | --- | --- |
| Pre-execution enrichment | 50–200ms (6 parallel HTTP) | Yes, <1% of total task time |
| Iter-1 prompt expansion | ~40ms | Hidden in LLM call |
| Analyzer enrichment | 50–200ms | Analyzer already takes 10s+ |
| Feedback session | ~200ms per skill | No, runs after the user gets the response |
| Skill resource push | ~200ms per skill | No, runs after the user gets the response |
| Mid-iteration retrieve_memory tool | ~200ms per call | Only when LLM chooses to call |

Worst-case critical-path overhead: ~250ms on tasks that typically take 30s–5min. Less than 1%.

Observability — every execution returns telemetry

result = await openspace.execute(task)
# result["viking"] contains:
{
    "enabled": True,
    "available": True,
    "query": "build dashboard\nPrior user turns: prefer bar charts",
    "enrichment_chars": 1243,
    "hit_counts": {
        "tool_hints": 3,
        "pattern_hints": 1,
        "skill_hints": 2,
        "user_preferences": 1,
        "case_hints": 2,
        "antipattern_hints": 0,
    },
    "selector_hints_chars": 187,
    "analysis_context_used": True,
    "feedback_status": "committed",
    "pushed_skills": 1,
}

Plus a single-line log per task for dashboards:

Viking telemetry: available=True hits=9 enrich_chars=1243 feedback=committed pushed=1

No measurement theater. No hand-waving. Every decision is traceable.
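
For dashboards that consume the log line rather than the dict, a small parser is enough. This is a sketch under the assumption that the line keeps the `key=value` shape shown above; `parse_telemetry` is a hypothetical helper, not part of OpenSpace.

```python
import re

LINE = "Viking telemetry: available=True hits=9 enrich_chars=1243 feedback=committed pushed=1"

def parse_telemetry(line: str) -> dict:
    """Turn the key=value pairs of a telemetry log line into typed values."""
    prefix = "Viking telemetry:"
    if prefix not in line:
        raise ValueError("not a Viking telemetry line")
    fields = {}
    for key, raw in re.findall(r"(\w+)=(\S+)", line.split(prefix, 1)[1]):
        if raw in ("True", "False"):
            fields[key] = raw == "True"
        elif raw.isdigit():
            fields[key] = int(raw)
        else:
            fields[key] = raw
    return fields

parsed = parse_telemetry(LINE)
print(parsed)

# Cross-check against the dict example: hits equals the sum of hit_counts.
hit_counts = {"tool_hints": 3, "pattern_hints": 1, "skill_hints": 2,
              "user_preferences": 1, "case_hints": 2, "antipattern_hints": 0}
print(sum(hit_counts.values()) == parsed["hits"])
```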


🎭 Who wins

👤 Solo developer

"I run OpenSpace on my laptop with OpenViking in a docker container. My agents remember everything I've taught them — my preferred table libraries, the weird edge cases of my internal tools, the hacks I've accumulated for our legacy API. Every new task feels like talking to someone who already knows me."

What you get:

  • Zero-config per-user isolation (falls back to $USER)
  • No team setup required
  • Memories persist across agent restarts
  • Full privacy — nothing leaves your machine

Setup time: 30 seconds.

docker run -d -p 1933:1933 volcengine/openviking:latest
export OPENVIKING_ENABLED=true

👥 Small team (2–10 developers)

"Our team deployed a shared Viking instance on the office network. Within a week, every dev was benefiting from every other dev's tool discoveries. The agents figured out our internal API quirks, learned which skills worked and which didn't, and stopped making the same mistakes over and over."

What you get:

  • Team-wide agent memory (tool knowledge, patterns, skills, cases, anti-patterns)
  • Per-user isolation for preferences
  • Automatic cross-agent knowledge propagation
  • One Viking instance per team — trivial ops footprint

Setup:

# Shared team .env
OPENVIKING_URL=http://viking.internal:1933
OPENVIKING_API_KEY=<team-key>
OPENVIKING_NAMESPACE=acme-eng
OPENVIKING_SCRUB_PII=true
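
A rough sketch of how these variables might be read on the client side is shown below. The variable names come from this document, and the `$USER` fallback and the scrubber's default ON are stated in it; every other default (disabled unless enabled, `http://localhost:1933`, namespace `default`, minimum score `0.0`) is an assumption, as are the `VikingConfig` and `load_config` names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class VikingConfig:
    enabled: bool
    url: str
    namespace: str
    user_id: str
    scrub_pii: bool
    min_score: float

def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean env var; an unset variable yields `default`."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes")

def load_config(env: Optional[Mapping[str, str]] = None) -> VikingConfig:
    """Build a config from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return VikingConfig(
        enabled=_flag(env, "OPENVIKING_ENABLED", False),          # assumed default
        url=env.get("OPENVIKING_URL", "http://localhost:1933"),   # assumed default
        namespace=env.get("OPENVIKING_NAMESPACE", "default"),     # assumed default
        # Falls back to $USER when no explicit user ID is set.
        user_id=env.get("OPENVIKING_USER_ID") or env.get("USER", "unknown"),
        scrub_pii=_flag(env, "OPENVIKING_SCRUB_PII", True),       # default ON
        min_score=float(env.get("OPENVIKING_MIN_SCORE", "0.0")),  # assumed default
    )

team = load_config({
    "OPENVIKING_ENABLED": "true",
    "OPENVIKING_URL": "http://viking.internal:1933",
    "OPENVIKING_NAMESPACE": "acme-eng",
    "USER": "alice",
})
print(team)
```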

🏢 Enterprise deployment

"We have multiple teams sharing one Viking instance with strict isolation. Each team has its own namespace, data is scrubbed of PII before committing, and we get full telemetry on which memories get hit. Compliance is happy, security is happy, engineers are happy."

What you get:

  • Multi-tenant namespace isolation
  • PII/secret scrubbing (default ON)
  • OPENVIKING_MIN_SCORE quality threshold
  • OPENVIKING_PUSH_SKILLS=false for ultra-sensitive teams
  • Per-execution audit telemetry in every response
  • No data egress — everything runs on your infrastructure

Compliance checklist:

  • Viking deployed on internal network (no public endpoint)
  • OPENVIKING_NAMESPACE set per team
  • OPENVIKING_SCRUB_PII=true (default)
  • OPENVIKING_USER_ID set from corporate auth system
  • OPENVIKING_MIN_SCORE=0.5 for quality-critical workloads
  • Viking session retention policy configured (30 days typical)

🏆 What's objectively better after the upgrade

| Dimension | Before | After | Change |
| --- | --- | --- | --- |
| Cross-session memory | ❌ None | ✅ 6 categories | |
| Host agent Viking access | ❌ Only via execute_task | ✅ 5 direct MCP tools | Universal |
| Mid-iteration memory query | ❌ None | retrieve_memory tool | New capability |
| Negative feedback loop | ❌ Failures discarded | ✅ Anti-pattern memories | Learns from failure |
| User satisfaction signal | ❌ No channel | provide_feedback() API | New API |
| PII / secret protection | ❌ No scrubbing | ✅ Regex scrubber, default ON | Privacy by default |
| Retrieval quality control | ❌ No threshold | OPENVIKING_MIN_SCORE | Quality gate |
| Staleness detection | ❌ No mechanism | ✅ Env fingerprints + report_stale_memory | Versioned memories |
| Multi-tenant isolation | ❌ Single pool | ✅ Namespace + user_id prefix | Multi-team safe |
| Observability | ❌ Log parsing | result["viking"] dict + logs | Measurable |
| Test coverage | ⚪ 0 tests | ✅ 63 tests, all passing | Verified |

🧭 How the upgrade happened — six rounds of engineering

| Round | Focus | Delta |
| --- | --- | --- |
| 1 | Initial integration (client, enrichment, hooks) | 20 tests, basic MVP |
| 2 | Audit + 10 fixes (token leak, selector hints, history-aware query, analyzer share, rich feedback, skill push, health cache, API verification, namespace, quality filter) | All 10 findings addressed |
| 3 | Unresolved questions (identity fallback, opt-out env var, observability stats, extraction SLA) | VikingExecutionStats + provide_feedback plumbing |
| 4 | Bilingual docs (EN + VI): philosophy, architecture, token economics, configuration, operations | 10 files, ~4,000 lines |
| 5 | Trilingual parity (+CN) | 15 files total, full cross-links |
| 6 | The big upgrade: 5 access paths, PII scrubber, anti-patterns, MCP host tools, mid-iteration tool, quality threshold | 63 tests, all 5 gaps closed |

Total investment: 6 rounds of iterative development with audit → fix → document → verify cycles.

Total output:

  • ~3,500 lines of production code
  • 15 documentation files across 3 languages (6,000+ lines of docs)
  • 63 passing tests covering every new code path
  • Zero regression risk — every Viking call is best-effort with graceful degradation
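
The "best-effort with graceful degradation" idea can be sketched as a thin wrapper that turns timeouts and errors into a default value. This is an illustration of the general pattern, assuming an asyncio-based client; `best_effort` and the three toy coroutines are hypothetical names.

```python
import asyncio

async def best_effort(coro, default, timeout: float = 0.5):
    """Await a coroutine, returning `default` on timeout or any ordinary error."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except Exception:
        # Covers timeouts, connection errors, and bad responses alike.
        return default

async def slow_search():
    await asyncio.sleep(10)  # simulates a hung server
    return ["never returned"]

async def broken_search():
    raise ConnectionError("Viking offline")

async def healthy_search():
    return ["chromedriver 124 needs Chrome 124+"]

async def main():
    return (
        await best_effort(slow_search(), [], timeout=0.05),
        await best_effort(broken_search(), []),
        await best_effort(healthy_search(), []),
    )

results = asyncio.run(main())
print(results)
```

Catching `Exception` rather than `BaseException` means task cancellation still propagates, so the wrapper does not interfere with shutting the agent down.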

🎬 Getting started — 3 commands to upgrade

# 1. Run OpenViking locally (or deploy on team LAN)
docker run -d -p 1933:1933 volcengine/openviking:latest

# 2. Enable the integration
export OPENVIKING_ENABLED=true

# 3. Run a task — first one primes the memory base
openspace --query "your task"

# Second similar task — now benefits from cross-session memory
openspace --query "similar task"

That's it. No migration. No data model changes. No host agent modifications. The integration is drop-in and reversible via a single env var.


❓ FAQ

Q: What happens if OpenViking is offline?
A: OpenSpace runs exactly as before the upgrade. Every Viking call is wrapped in timeout + try/except. Zero regression risk.

Q: How much does it cost to run OpenViking?
A: Viking is a single HTTP server. Docker container footprint is small. Most teams run it on the same box as their dev environment or a tiny LAN VM. No SaaS fees.

Q: Will this leak my secrets to a shared instance?
A: No. The PII scrubber is default ON. API keys, tokens, emails, credit cards, and private keys are redacted before leaving your machine. You can disable with OPENVIKING_SCRUB_PII=false only when you trust the Viking endpoint completely.

Q: What if I don't want my evolved skills shared with the team?
A: Set OPENVIKING_PUSH_SKILLS=false. Session feedback still commits (useful for memory extraction), but evolved SKILL.md content stays local.

Q: Do host agents need to be modified?
A: No. The 5 MCP tools are registered automatically on OpenSpace's MCP server. Any host agent that supports MCP (Claude Code, Codex, OpenClaw, nanobot, Cursor, etc.) sees them in its tool list without any code changes.

Q: Can I measure the actual impact?
A: Yes. Every execute() returns a viking dict with telemetry (hit counts, enrichment chars, feedback status, pushed skills). Log-based measurement with grep "Viking telemetry" also works.

Q: What's the minimum scale where this pays off?
A: Solo devs see warm-cache benefits from task #2. Teams see cross-agent benefits from day 2 onwards. Enterprise deployments see compounding network effects as team size grows.

Q: Is this production-ready?
A: 63 tests passing, every failure mode verified for graceful degradation, 3-language documentation, 6 rounds of iterative hardening. Yes.


📚 Full documentation

| Topic | English | Tiếng Việt | 中文 |
| --- | --- | --- | --- |
| Philosophy + architecture | docs/openviking/README.md | README_VI.md | README_CN.md |
| Technical architecture | architecture.md | architecture_VI.md | architecture_CN.md |
| Token economics | token-economics.md | token-economics_VI.md | token-economics_CN.md |
| Configuration reference | configuration.md | configuration_VI.md | configuration_CN.md |
| Operations + troubleshooting | operations.md | operations_VI.md | operations_CN.md |

💬 One-line pitches for different audiences

  • For CTO: "Cut your agent token bill by 55–65% with a drop-in privacy-first upgrade that takes 3 commands to deploy."
  • For engineers: "Your agents stop asking you the same questions. They remember what worked, what didn't, and what you prefer — across sessions, across machines."
  • For security: "PII scrubber default ON. Multi-tenant namespace isolation. No SaaS. No data egress. Every write is traceable via telemetry."
  • For product: "Users don't notice the upgrade — they just notice the agent seems to 'get them' better and responds faster."
  • For ops: "One docker container. Graceful degradation on every path. 63 tests. Rollback is a single env var."

🎨 Visual summary — the before/after story

┌─────────────────────────────────────────────────────────────┐
│                        BEFORE                               │
│                                                             │
│   Task 1 ──▶ Agent ──▶ Solves it ──▶ forgets everything     │
│                                                             │
│   Task 2 ──▶ Agent (cold) ──▶ Solves it AGAIN               │
│                                                             │
│   Agent A ─ can't see ─ Agent B's learnings                 │
│                                                             │
│   User says "prefer bar charts" → forgotten after restart   │
│                                                             │
│   Host agent chat → no access to memory                     │
│                                                             │
│   Secrets → may leak into shared memory                     │
└─────────────────────────────────────────────────────────────┘

                             ⬇  Round 6 upgrade  ⬇

┌─────────────────────────────────────────────────────────────┐
│                         AFTER                               │
│                                                             │
│   Task 1 ──▶ Agent ──▶ Solves it ──▶ feedback to Viking     │
│                                           │                 │
│                                           ▼                 │
│                              Memory extracted (8 categories)│
│                                           │                 │
│   Task 2 ──▶ Agent ──▶ enrichment reads memory ──▶ faster   │
│                                                             │
│   Agent A's Monday win ──▶ Agent B's Tuesday starting point │
│                                                             │
│   "prefer bar charts" → persists forever, per-user isolated │
│                                                             │
│   Host agent chat → 5 MCP tools, direct access              │
│                                                             │
│   Secrets → scrubbed before leaving process (default ON)    │
│                                                             │
│   Failures → anti-pattern memories → "AVOID" warnings       │
│                                                             │
│   Every call → telemetry in result["viking"]                │
└─────────────────────────────────────────────────────────────┘

👤 About the author

Nguyen Ngoc Tuan
Founder & CEO, Transform Group · Lark Platinum Partner

Builder of the OpenSpace × OpenViking integration. Six rounds of engineering over short, focused iterations. Every decision traceable. Every limitation documented. Every gap closed.

Connect: facebook.com/khongphaituan


🙏 Built on the shoulders of giants

  • OpenSpace — the self-evolving engine for AI agents (HKU-DS). Without it, there's nothing to enrich.
  • OpenViking — the context database for AI agents (Volcano Engine). Without it, there's no cross-session memory.

This integration is a compound layer, not a replacement. All of OpenSpace's existing token-saving and self-evolution capabilities remain exactly as they were. Viking simply closes the relearn gap between tasks.


🧠 Cross-session memory is here. Your agents finally remember.

🚀 Ready to upgrade? See Configuration or jump straight to Operations for the deployment checklist.