OpenSpace × OpenViking Integration

Cross-session intelligence for self-evolving AI agents. Every task makes every agent smarter — and now that learning persists across sessions, users, and machines.

Language: English · Tiếng Việt · 中文


Table of Contents

  1. Philosophy — Why we built this (this page)
  2. Architecture — User flows, data flows, component breakdown
  3. Token Economics — How much does it actually save?
  4. Configuration — Env vars, identity resolution, multi-tenancy
  5. Operations — Business rules, deployment, testing, troubleshooting

Philosophy

The problem: smart per-task, dumb across tasks

OpenSpace already makes agents smart within a single task. Skills guide execution, evolution repairs broken workflows, quality monitoring filters out degraded tools. On published benchmarks this delivers 4.2× better performance with 46% fewer tokens.

But between tasks, OpenSpace goes back to cold start. The agent that just spent 8 iterations discovering "chromedriver 124 needs Chrome 124+" has no memory of that lesson when the next task starts. The user who explicitly said "I prefer bar charts" last Tuesday has to say it again on Wednesday. One agent's hard-won tool knowledge stays trapped inside that agent's local database unless someone manually uploads it to the cloud.

We call this the relearn gap — the gap between "my skill DB got better" and "the next similar task is measurably faster."

The insight: OpenViking already solved the hard part

OpenViking is a Context Database for AI Agents built by the Volcano Engine team. It treats agent context (memories, resources, skills) as a filesystem with L0/L1/L2 tiered loading. Memory extraction runs automatically when a session is committed. Retrieval is semantic, with directory-prefix filters. Everything is queryable over a small HTTP API at localhost:1933.

What OpenViking gives us out of the box:

  • Automatic extraction — drop session messages, get structured memories (tool knowledge, user preferences, patterns, past cases) on the other end. No manual curation.
  • L0 abstracts — ~100 tokens per retrieval result. Cheap to inject, cheap to deduplicate.
  • Directory semantics: viking://tenants/acme/agent/memories/tools/ is a real path. Multi-tenant isolation is trivial prefix matching.
  • Local-first — runs as a single HTTP server on your box or team LAN. No SaaS lock-in, no data egress.
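The prefix-matching claim above can be illustrated with a minimal sketch. The helper names (`tenant_prefix`, `is_visible`) and the `chromedriver` leaf are hypothetical, and this is not OpenViking's implementation; only the `viking://tenants/acme/agent/memories/tools/` path shape comes from the text.

```python
def tenant_prefix(namespace: str) -> str:
    """Build the URI prefix that scopes every path for one tenant (assumed layout)."""
    # The trailing slash keeps "acme" from also matching "acme-labs".
    return f"viking://tenants/{namespace}/"


def is_visible(uri: str, namespace: str) -> bool:
    """A memory is visible to a tenant only if its URI sits under that tenant's prefix."""
    return uri.startswith(tenant_prefix(namespace))


# Hypothetical memory URIs from two different tenants.
uris = [
    "viking://tenants/acme/agent/memories/tools/chromedriver",
    "viking://tenants/globex/agent/memories/tools/chromedriver",
]
acme_only = [u for u in uris if is_visible(u, "acme")]
```

Including the trailing slash in the prefix is the one detail worth getting right: without it, a tenant named `acme` would also see paths belonging to `acme-labs`.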

OpenSpace needed cross-session memory. OpenViking is exactly that. The integration writes itself.

The design principle: compound, don't replace

We did not rebuild any of OpenSpace's existing token-saving machinery. The two-phase skill-first execution, progressive disclosure skill selection, BM25 + embedding pre-filter, iter-2 skill context stripping, tool auto-search, message truncation — all of it stays exactly as-is.

OpenViking is a complementary layer:

OpenSpace (unchanged)              | OpenViking (added)
-----------------------------------+-----------------------------------
Within-task intelligence           | Cross-task intelligence
Skills as executable procedures    | Memories as experiential knowledge
Evolution creates new skills       | Extraction creates new memories
Quality metrics filter bad skills  | Past cases surface good solutions
Workspace-local execution          | Namespace-scoped across team

Every enrichment call is best-effort. If OpenViking is down, OpenSpace runs exactly as it did before the integration. Zero regression risk, strict cost ceiling.
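One way to picture the best-effort contract is a wrapper that swallows any retrieval failure and caps how much text enrichment may add. This is a sketch under stated assumptions: `best_effort_enrich`, `fetch`, and `max_chars` are hypothetical names, and the character cap is only one possible reading of "strict cost ceiling".

```python
import logging
from typing import Callable

logger = logging.getLogger("viking.sketch")


def best_effort_enrich(fetch: Callable[[str], str], query: str, max_chars: int = 2000) -> str:
    """Return enrichment text, or "" if the memory backend fails for any reason."""
    try:
        text = fetch(query)
    except Exception as exc:  # deliberately broad: enrichment must never fail the task
        logger.warning("enrichment skipped: %s", exc)
        return ""
    # Cap the prompt cost that enrichment is allowed to add (assumed policy).
    return text[:max_chars]


def failing_fetch(query: str) -> str:
    """Stand-in for a backend that is down."""
    raise ConnectionError("Viking unreachable")


# With the backend down, the caller simply gets no enrichment.
degraded = best_effort_enrich(failing_fetch, "build dashboard")
```

An empty string means the prompt is built exactly as it would be without the integration, which is the behaviour the paragraph above describes.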

The five access paths

Viking knowledge flows into OpenSpace through five distinct paths, each chosen because it's where a cross-session hint creates the most leverage at that stage:

  1. Skill selector prompt (before skill selection)
     Leverage: User preferences and past similar cases help the selector pick skills already known to succeed for this user's style, avoiding Phase 2 fallback entirely in many cases.

  2. Grounding agent system prompt (iter 1 only, then stripped)
     Leverage: Known tool pitfalls, pattern hints, and anti-patterns (known failure modes) short-circuit 2–4 exploration iterations. Stripped at iter 2 to save prompt tokens, just like skill context.

  3. Analyzer prompt (post-execution analysis)
     Leverage: Cross-session tool issue history converges evolution decisions faster, turning speculative CAPTURED suggestions into targeted FIX suggestions that edit existing skills instead of authoring new ones.

  4. Mid-iteration retrieve_memory tool (any iteration, on-demand)
     Leverage: When the grounding agent discovers mid-execution that it needs different knowledge (a tool failed unexpectedly, an approach isn't working), it calls retrieve_memory(query, category) like any other tool. Categories include antipatterns for "has this failure been seen before?"

  5. Direct MCP tools for host agents (any time, outside OpenSpace execution)
     Leverage: Host agents (OpenClaw, Claude Code, Codex, nanobot, …) get five MCP tools: openviking_retrieve_memory, openviking_remember, openviking_forget_memory, openviking_report_stale_memory, and openviking_memory_status. The host's chat surface can therefore access cross-session memories without ever delegating a task to OpenSpace's execution engine.

Five access points, five measurable wins — and paths 4 and 5 close the two biggest gaps of the original three-point design.
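The "iter 1 only, then stripped" behaviour of path 2 can be sketched in a few lines. The function name, the section heading, and the prompt layout are assumptions made for illustration, not OpenSpace's actual prompt format.

```python
def build_system_prompt(base: str, enrichment: str, iteration: int) -> str:
    """Attach cross-session hints on the first iteration only."""
    if iteration == 1 and enrichment:
        # Hypothetical section heading; the real prompt layout may differ.
        return f"{base}\n\n## Cross-session hints\n{enrichment}"
    # From iteration 2 onward the hints are dropped to save prompt tokens.
    return base


base_prompt = "You are a grounding agent."
hints = "AVOID: chromedriver 124 needs Chrome 124+"

first = build_system_prompt(base_prompt, hints, iteration=1)
second = build_system_prompt(base_prompt, hints, iteration=2)
```

The hints are paid for once, on the iteration where they can change the plan, and cost nothing on every later iteration.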

The bidirectional feedback loop

When OpenSpace's evolver produces an evolved skill, we write it back to OpenViking via two channels:

  1. Rich session — task description, final response, tool sequence, per-skill evolution metadata, environment fingerprint (OS/Python/tool versions for future staleness detection). OpenViking's extraction pipeline turns this into durable memories automatically.

  2. Structured skill resource — the evolved SKILL.md content pushed to /api/v1/skills so Viking indexes it as a first-class queryable resource. Another OpenSpace instance somewhere else on the team LAN can now retrieve this skill via find_skill_knowledge().

No manual cloud upload. No explicit "share this skill" button. The act of solving a task automatically contributes back to the collective intelligence.
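The environment fingerprint mentioned in channel 1 might look like the following sketch, built only from the Python standard library. The field names and both helpers are hypothetical, and this version records only whether a tool is on PATH rather than its version, which the real fingerprint is described as capturing.

```python
import platform
import shutil
import sys


def environment_fingerprint(tools: tuple[str, ...] = ("git",)) -> dict:
    """Collect OS, Python version, and whether each named tool is on PATH."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "tools_on_path": {name: shutil.which(name) is not None for name in tools},
    }


def changed_fields(recorded: dict, current: dict) -> list[str]:
    """List fingerprint fields that differ, as a crude staleness signal."""
    return sorted(key for key in recorded if recorded[key] != current.get(key))


fingerprint = environment_fingerprint(("definitely-not-a-real-tool-xyz",))
drift = changed_fields(
    {"os": "Linux", "python": "3.11.0"},
    {"os": "Linux", "python": "3.12.1"},
)
```

Comparing the fingerprint stored with a memory against the current environment gives a cheap hint that a lesson such as a version-specific workaround may no longer apply.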

Negative feedback matters too. Failures are not discarded — they become anti-pattern memories routed to viking://agent/memories/antipatterns/. When execution status is error or incomplete, OpenSpace automatically records the failure reason, the tool sequence that failed, and the environment context. Future tasks querying Viking see these as "AVOID" warnings in their enrichment block. Host agents and direct users can also provide explicit feedback via OpenSpace.provide_feedback(task_id, "negative", comment) or the openviking_remember MCP tool.

The result is a learning loop that captures both what worked and what didn't — same architecture as OpenSpace's own skill evolution, now extended across sessions via Viking.
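As a rough illustration of the negative-feedback path, the sketch below turns a failed execution into an anti-pattern record. The record shape and the `build_antipattern` helper are assumptions; the target path, the `error` and `incomplete` statuses, and the "AVOID" label come from the text above.

```python
from typing import Optional

ANTIPATTERN_ROOT = "viking://agent/memories/antipatterns/"  # path from the text
FAILURE_STATUSES = {"error", "incomplete"}  # statuses named in the text


def build_antipattern(
    status: str,
    reason: str,
    tool_sequence: list[str],
    environment: dict,
) -> Optional[dict]:
    """Return an anti-pattern record for failed runs, or None for successful ones."""
    if status not in FAILURE_STATUSES:
        return None
    return {
        "target": ANTIPATTERN_ROOT,
        "label": "AVOID",
        "reason": reason,
        "tool_sequence": list(tool_sequence),  # copy so later edits don't leak in
        "environment": dict(environment),
    }


record = build_antipattern(
    "error",
    "chromedriver 124 needs Chrome 124+",
    ["shell", "browser_open"],  # hypothetical tool names
    {"os": "Linux"},
)
```

Successful runs return `None` here because they travel through the normal session feedback channel instead.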

The privacy stance

Shared intelligence is powerful — and a privacy hazard if done carelessly. We drew the isolation boundary deliberately and enforce it with multiple defense layers:

  • Agent memories (tools, patterns, skills, cases, antipatterns) are shared across the team by convention. Tool knowledge is the whole point of collective intelligence; duplicating it per user defeats the purpose.
  • User memories (preferences, profile) are isolated per user via the user_id URI segment. "Alice prefers bar charts" must never leak into Bob's sessions.
  • PII / secret scrubbing happens before any content leaves the process. A regex-based scrubber (openspace/viking/scrubber.py) redacts API keys (Anthropic, OpenAI, GitHub, AWS, GCP, Slack, OpenRouter), JWTs, private keys, basic-auth URLs, emails, phone numbers, credit cards (Luhn-validated), SSNs, and IP addresses. Default ON. Opt out via OPENVIKING_SCRUB_PII=false only when you fully trust the Viking endpoint.
  • Skill content push is opt-out via env var. Teams working on privacy-sensitive client data set OPENVIKING_PUSH_SKILLS=false; session feedback still commits, but no skill bodies leave the box.
  • Namespace is never auto-derived. Accidental cross-team memory mixing is too dangerous to solve with heuristics. Teams set OPENVIKING_NAMESPACE=<team> explicitly or they get single-tenant behavior.
  • Quality threshold (OPENVIKING_MIN_SCORE) lets teams reject low-confidence retrievals so wrong memories don't reach the prompt.
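To make the scrubbing bullet concrete, here is a minimal sketch of regex-based redaction with Luhn validation for card numbers. It is not the contents of openspace/viking/scrubber.py: the patterns, the placeholder tokens, and the function names are assumptions, and only two of the listed categories (emails and credit cards) are covered.

```python
import re

# Simplified patterns for illustration; a production scrubber would be stricter.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional separators


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def scrub(text: str) -> str:
    """Redact emails, and digit runs that pass the Luhn check."""
    text = EMAIL_RE.sub("[EMAIL]", text)

    def _card(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Only redact when the checksum passes, to avoid eating order numbers.
        return "[CARD]" if luhn_ok(digits) else match.group()

    return CARD_RE.sub(_card, text)


clean = scrub("mail alice@example.com card 4111 1111 1111 1111")
```

The Luhn check is what keeps the scrubber from redacting every long digit run: a 16-digit identifier that fails the checksum is left untouched.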

See Configuration and Operations for the full privacy model.
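The privacy toggles named above could be read from the environment roughly as follows. This is a sketch, not the integration's loader: the dataclass, the helper names, the accepted "false" spellings, and the 0.0 default for OPENVIKING_MIN_SCORE are assumptions. The variable names and the default-on behaviour of scrubbing and skill push come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean env var; the accepted spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in {"false", "0", "no", "off"}


@dataclass(frozen=True)
class VikingPrivacySettings:
    scrub_pii: bool
    push_skills: bool
    namespace: Optional[str]  # None means single-tenant behavior
    min_score: float


def load_privacy_settings(env: Mapping[str, str] = os.environ) -> VikingPrivacySettings:
    return VikingPrivacySettings(
        scrub_pii=_flag(env, "OPENVIKING_SCRUB_PII", True),      # default ON
        push_skills=_flag(env, "OPENVIKING_PUSH_SKILLS", True),  # opt-out
        namespace=env.get("OPENVIKING_NAMESPACE") or None,       # never auto-derived
        min_score=float(env.get("OPENVIKING_MIN_SCORE", "0.0")), # assumed default
    )


defaults = load_privacy_settings({})
locked_down = load_privacy_settings(
    {"OPENVIKING_PUSH_SKILLS": "false", "OPENVIKING_NAMESPACE": "acme", "OPENVIKING_MIN_SCORE": "0.5"}
)
```

Passing the environment in as a mapping keeps the loader easy to test and makes the "never auto-derived" rule visible: a missing namespace stays `None` rather than being guessed.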

The observability promise

If we can't measure it, we can't trust it. Every OpenSpace.execute() call returns a viking key in its result dict:

{
    "viking": {
        "enabled": True,
        "available": True,
        "query": "build dashboard\nPrior user turns: prefer bar charts",
        "enrichment_chars": 1243,
        "hit_counts": {"tool_hints": 3, "user_preferences": 1, ...},
        "feedback_status": "committed",
        "pushed_skills": 1,
    }
}

Plus a single-line log summary per task:

Viking telemetry: available=True hits=9 enrich_chars=1243 feedback=committed pushed=1

Benchmark pipelines, MCP host agents, and team dashboards can measure real-world impact without parsing execution logs.
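A consumer of the result dict might condense the viking key like this. The helper name is hypothetical, and treating `hits` as the sum of `hit_counts` is an assumption about how the log line is derived; the key names come from the example above.

```python
def summarize_viking(result: dict) -> str:
    """Format the viking telemetry block of an execute() result as one line."""
    viking = result.get("viking") or {}
    # Assumption: total hits is the sum over all hit_counts categories.
    hits = sum(viking.get("hit_counts", {}).values())
    return (
        f"available={viking.get('available', False)} "
        f"hits={hits} "
        f"enrich_chars={viking.get('enrichment_chars', 0)} "
        f"feedback={viking.get('feedback_status', 'n/a')} "
        f"pushed={viking.get('pushed_skills', 0)}"
    )


sample = {
    "viking": {
        "enabled": True,
        "available": True,
        "enrichment_chars": 1243,
        "hit_counts": {"tool_hints": 3, "user_preferences": 1},
        "feedback_status": "committed",
        "pushed_skills": 1,
    }
}
line = summarize_viking(sample)
```

Because every field is read with a default, the same helper works on results produced with the integration disabled or with Viking unreachable.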

See Token Economics for quantified savings analysis.


What you'll find in the other pages

Architecture — The end-to-end picture. Four user flows (cold start, warm cache, cross-agent propagation, analysis enrichment). Concrete data flow sequences with latency budgets. Component breakdown with file-level entry points.

Token Economics — Baseline assumptions, per-mechanism savings breakdown, realistic scenarios (cold / warm / cross-agent), how this compounds with OpenSpace's existing 46% reduction. Failure mode cost analysis.

Configuration — Every env var documented with precedence rules. OpenSpaceConfig fields. Identity resolution fallback chain. Multi-tenant deployment patterns. Privacy toggles.

Operations — Business rule decisions (deployment model, retention, ownership). Test matrix (30 tests, all passing). Troubleshooting playbook. Rollout checklist. What to do when Viking goes down.


Status

  • Audit findings addressed: 10/10
  • Unresolved questions resolved: 4/4
  • Business rules decided: 5/5
  • Post-audit architectural gaps closed: 5/5 (host MCP access, mid-iter retrieval, negative feedback, staleness signals, quality threshold)
  • Privacy layer: regex-based PII/secret scrubber, default ON
  • Access paths: 5 (selector, grounding, analyzer, mid-iter tool, host MCP tools)
  • MCP tools for host agents: 5 (retrieve_memory, remember, forget_memory, report_stale_memory, memory_status)
  • Tests: 63 passing (client, scrubber, MCP tools, negative feedback, mid-iter tool, identity, stats)
  • Production readiness: graceful degradation verified for every failure mode

Author & Credits

Nguyen Ngoc Tuan
Founder & CEO, Transform Group
Lark Platinum Partner
facebook.com/khongphaituan

Built on top of:

  • OpenSpace — Self-evolving engine for AI agents by HKU-DS
  • OpenViking — Context Database for AI Agents by Volcano Engine

The integration is MIT-licensed alongside OpenSpace. Contributions and feedback welcome.