Language: English · Tiếng Việt · 中文
The end-to-end picture of how OpenSpace and OpenViking interact at runtime.
┌──────────────────────────────────────────────────────────────────────┐
│ Host Agent (OpenClaw, Claude Code, Codex, ...) │
│ │
│ Path A: execute_task(instruction) ─── delegated task │
│ Path B: openviking_* MCP tools ───── direct Viking access (NEW) │
└──────┬─────────────────────────────────────────┬──────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────┐ ┌──────────────────────────┐
│ OpenSpace (unchanged) │ │ mcp_tools.py (NEW) │
│ │ │ │
│ ┌──────────┐ ┌──────────────┐ │ │ 5 MCP tools exposed │
│ │ Skill │ │ Grounding │ ◀─┐ │ │ to the host agent: │
│ │ Registry │▶│ Agent │ │ │ │ retrieve_memory │
│ └──────────┘ └──────────────┘ │ │ │ remember │
│ ▲ ▲ │ │ │ forget_memory │
│ │ hints │ iter-1 ctx │ │ │ report_stale_memory │
│ │ │ + mid-iter │ │ │ memory_status │
│ │ │ retrieve │ │ └────────────┬─────────────┘
│ │ │ _memory │ │ │
│ ┌────┴────────────┴────────┐ │ │ │
│ │ Execution Analyzer │ │ │ │
│ └────────────┬─────────────┘ │ │ │
│ │ evolved skills │ │ │
│ ┌────────────▼─────────────┐ │ │ │
│ │ Skill Evolver │ │ │ │
│ └────────────┬─────────────┘ │ │ │
│ │ │ │ │
└───────────────┼──────────────────┼──┼────────────────┼─────────────┘
│ │ │ │
┌───────────────┴──────────────────┴──┴────────────────┴─────────────┐
│ openspace/viking/ │
│ │
│ ┌────────────────┐ ┌──────────────────┐ ┌────────────────────┐ │
│ │ OpenVikingClient│ │ VikingEnrichment │ │ RetrieveMemoryTool │ │
│ │ • HTTP + URIs │ │ • compose_query │ │ • LocalTool for │ │
│ │ • health cache │◀─│ • enrich_pre │ │ mid-iteration │ │
│ │ • score_thresh │ │ • feedback_evo │ │ dynamic lookup │ │
│ │ • delete res │ │ • feedback_neg │ │ │ │
│ │ • find_antipat │ │ • report_stale │ └────────────────────┘ │
│ └────────────────┘ │ • push_skills │ │
│ ┌────────────────┐ │ • env fingerprint│ ┌────────────────────┐ │
│ │ scrubber.py │─▶│ (applies scrub) │ │ config.py │ │
│ │ • regex PII │ └──────────────────┘ │ • identity resolve │ │
│ │ • Luhn CC check│ │ • push/scrub/score │ │
│ │ • private keys │ │ • stats │ │
│ └────────────────┘ └────────────────────┘ │
└──────────────────────┬───────────────────────────────────────────────┘
│ HTTP (5s timeout, graceful fail)
▼
┌──────────────────────────────────────────────────────────────────────┐
│ OpenViking server (localhost:1933) │
│ │
│ /api/v1/search/find — L0 abstracts (with score_threshold) │
│ /api/v1/sessions — rich feedback sessions (positive + negative)│
│ /api/v1/skills — structured skill resource push │
│ /api/v1/resources — DELETE for forget_memory │
│ /health — liveness probe │
│ │
│ Background: memory extraction (8 categories incl. antipatterns) │
└──────────────────────────────────────────────────────────────────────┘
| File | Role |
|---|---|
openspace/viking/client.py |
OpenVikingClient — async HTTP, URI builders, two-tier health cache, skill push, find_antipatterns, score threshold, delete_resource |
openspace/viking/enrichment.py |
VikingEnrichment — pre-execution (6 parallel finds), analysis, rich feedback, negative feedback, skill push, stale reporting, env fingerprinting |
openspace/viking/config.py |
resolve_viking_identity, resolve_viking_push_enabled, resolve_viking_min_score, resolve_viking_scrub_pii, VikingExecutionStats (zero-dep) |
openspace/viking/scrubber.py |
[NEW] Regex-based PII / secret scrubber — API keys, JWTs, emails, phones, credit cards (Luhn), private keys |
openspace/viking/mcp_tools.py |
[NEW] 5 MCP tools exposed to host agents: retrieve_memory, remember, forget_memory, report_stale_memory, memory_status |
openspace/viking/retrieve_memory_tool.py |
[NEW] RetrieveMemoryTool — LocalTool exposing Viking query as a grounding-agent tool for mid-iteration use |
openspace/viking/__init__.py |
Public API exports |
openspace/tool_layer.py |
Config fields, client init, pre-execution enrichment, feedback wiring (positive + negative), telemetry, provide_feedback() public API |
openspace/agents/grounding_agent.py |
_viking_context attribute, iter-2 strip, _viking_client attribute, mid-iteration RetrieveMemoryTool registration |
openspace/skill_engine/registry.py |
select_skills_with_llm accepts cross_session_hints |
openspace/skill_engine/analyzer.py |
Injected viking_client, analysis prompt enrichment, telemetry attr |
openspace/mcp_server.py |
_maybe_register_viking_tools() auto-registers the 5 host MCP tools on startup |
User (new) → OpenSpace.execute("analyze sales.xlsx and make dashboard")
│
├─ Viking /health: 200 (or timeout → empty enrichment, flow continues)
├─ Viking /search/find × 5 in parallel (tool/pattern/skill/pref/case)
│ → all return [] (nothing learned yet)
├─ viking_context = "" → skip injection (no overhead)
├─ Skill-first Phase 1 (same as today)
├─ Execution... 12 iterations, 45 tool calls
├─ Analyzer runs → evolution_suggestions: [DERIVED(sales-dashboard-builder)]
├─ Evolver creates new skill locally
├─ Viking feedback session (rich: task + response + tool sequence + skill metadata)
│ → commit → Viking async extraction
│ • Extracts into viking://.../agent/memories/skills/
│ • Extracts user pref from task text into user/memories/preferences/
│ • Extracts tool knowledge (shell:xlsx_to_csv success)
├─ Viking skill resource push (if OPENVIKING_PUSH_SKILLS=true)
└─ Response → user, telemetry attached
Outcome: zero Viking benefit for this task. Baseline cost = same as today. Overhead: ~5 failed HTTP calls = few ms (timeout guarded). The memories created here become available to future tasks.
User → OpenSpace.execute("make weekly dashboard for this week's sales")
│
├─ compose_query = task + last 3 user turns (from conversation_history)
├─ Viking /search/find × 5 in parallel:
│ • find_user_preferences → "user prefers bar charts, saves XLSX not CSV"
│ • find_tool_knowledge → "shell:xlsx_to_csv ok for .xlsx, fails .xlsm"
│ • find_skill_knowledge → "sales-dashboard-builder solved similar in 8 iter"
│ • find_cases → "past task cleaned NaN rows before aggregation"
│
├─ Build two enrichment blocks:
│ 1. context_injection (~800 tokens, full L0 abstracts)
│ 2. selector_hints (~150 tokens, only user prefs + top skills/cases)
│
├─ _select_and_inject_skills(task, cross_session_hints=selector_hints)
│ → Selector LLM sees hints block → picks sales-dashboard-builder confidently
│
├─ grounding_agent._viking_context = context_injection
├─ Phase 1 executes → LLM already knows:
│ ✓ user wants bar chart → skips clarification
│ ✓ file likely .xlsx → skips .xlsm probing
│ ✓ NaN handling → skips 2 debugging iterations
│ → Completes in 6 iterations instead of 12
│
├─ iter 2+: viking_context AND skill_context stripped (grounding_agent.py:232-251)
├─ Analyzer runs → confident, no evolution suggested
└─ Response → user
Outcome: ~50% fewer iterations, ~25–40% fewer tokens on this task. See Token Economics for the math.
Agent A (Claude Code user, Monday 10am)
└─ OpenSpace → task fixes "chromedriver version mismatch"
└─ Viking extracts tool knowledge:
"chromedriver 124 needs Chrome 124+"
→ viking://tenants/acme/agent/memories/tools/chromedriver.md
Hours later...
Agent B (Codex user, different machine, Monday 2pm)
└─ OpenSpace → "scrape example.com"
└─ Viking find_tool_knowledge("chromedriver setup")
→ returns [{abstract: "chromedriver 124 needs Chrome 124+", score: 0.88}]
└─ Agent B skips the entire 3-iteration debugging loop
Key insight: Both agents connect to the same team's Viking server (same OPENVIKING_NAMESPACE). Agent A's session commit triggers automatic memory extraction; Agent B's subsequent find_tool_knowledge call retrieves it naturally. No manual upload step, no explicit "share" button.
This is OpenSpace's published "one agent learns, all agents benefit" slogan turned from an aspirational cloud-upload workflow into an automatic cross-agent feedback loop.
Task fails at iter 15 — analyzer kicks in
│
├─ analyzer._build_analysis_prompt(context) [async]
│ ├─ Build normal prompt (traj + skill content + conversations)
│ ├─ Check self._viking_client (shared from OpenSpace init — no new HTTP conn)
│ ├─ Extract tool_issues from traj_records with status=error
│ ├─ enricher.enrich_analysis_context(task, tool_issues):
│ │ • find_tool_knowledge(task) — general tool patterns
│ │ • find_cases(task) — past solutions
│ │ • find_tool_knowledge(tool_issues[0]) — specific known fixes
│ ├─ Append formatted context to resource_info
│ └─ Set self._last_viking_context_chars (telemetry)
│
├─ analysis LLM now knows "this error pattern → solution X was applied last week"
│ → emits evolution_type=FIX (targeted edit)
│ → FIX is cheaper than CAPTURED (new skill from scratch)
│
└─ Evolver processes suggestion
Why this matters: Analyzer is typically the most expensive LLM call in an OpenSpace task (30k+ input tokens from traj + conversations + skill content). A small enrichment (~500 additional tokens) can convert a speculative new-skill suggestion into a targeted 2-line fix, saving an entire subsequent evolution cycle.
OpenSpace.execute(task, context={conversation_history, ...})
│
├── [if viking_client && is_available()]
│ min_score = resolve_viking_min_score(config.openviking_min_score)
│ VikingEnrichment.enrich_pre_execution(
│ task, conversation_history, score_threshold=min_score
│ )
│ │
│ ├── compose_query(task, history)
│ │ • Extract last 3 USER turns (text parts only for multimodal)
│ │ • Join with "Prior user turns: ... || ..."
│ │ • Truncate to 500 chars
│ │
│ ├── asyncio.gather( ── 6 PARALLEL FINDS (was 5) ──
│ │ find_tool_knowledge(query, limit=5, score_threshold=min),
│ │ find_pattern_knowledge(query, limit=5, score_threshold=min),
│ │ find_skill_knowledge(query, limit=5, score_threshold=min),
│ │ find_user_preferences(query, limit=3, score_threshold=min),
│ │ find_cases(query, limit=3, score_threshold=min),
│ │ find_antipatterns(query, limit=3, score_threshold=min), ← NEW
│ │ )
│ │ → 6 parallel HTTP POSTs, each 5s timeout
│ │ → each hits /api/v1/search/find with scoped target_uri
│ │ AND score_threshold in payload
│ │ → client-side safety net drops any hit below score_threshold
│ │
│ ├── Dedupe abstracts, build two output blocks:
│ │ 1. context_injection (markdown with 5 sections including
│ │ "## Known Failure Modes (AVOID)" for anti-patterns)
│ │ 2. selector_hints (shorter — prefs + top skills + top cases
│ │ + "AVOID (prior failures)" bullets)
│ │
│ └── return {context_injection, selector_hints, hit_counts,
│ antipattern_hints, query, ...}
│
├── _viking_stats.query / hit_counts / enrichment_chars populated
├── grounding_agent._viking_context = context_injection
│
└── _select_and_inject_skills(task, cross_session_hints=selector_hints)
└── registry.select_skills_with_llm(task, ..., cross_session_hints=...)
└── prompt includes "# Cross-Session Hints (non-authoritative)"
Latency budget: 6 parallel HTTP calls, 5s timeout each. Typical cached response: 50–200ms (parallelism means 6 finds cost ~same as 1). Worst case: timeout → [] → empty context → zero prompt overhead.
Token cost injected: 6 categories × ~3 abstracts × ~100 tokens ≈ 1,800 tokens, only for iter 1. Stripped from iter 2 onward by grounding_agent.py.
Quality filtering: when OPENVIKING_MIN_SCORE=0.5 is set (or openviking_min_score config field), low-confidence hits are dropped at both the server-side score_threshold parameter and a client-side safety net in find_memories(). This prevents wrong memories from reaching the prompt.
OpenSpace.execute() → finally: _maybe_analyze_execution(task_id, ...)
│
├── analyzer.analyze_execution() → analysis with evolution_suggestions
│ └── _build_analysis_prompt() [ASYNC]
│ └── Uses self._viking_client (shared, NOT new connection)
│ └── enrich_analysis_context → resource_info block
│ └── self._last_viking_context_chars = N (telemetry)
│
├── _viking_stats.analysis_context_used = (chars > 0)
│
├── evolver.process_analysis() → evolved_records
│
├── scrub_enabled = resolve_viking_scrub_pii(config.openviking_scrub_pii)
│
├── [POSITIVE PATH — if evolved && viking_client]
│ │
│ ├── Read SKILL.md content from disk for each evolved record
│ ├── Extract tool_sequence from execution_result.tool_executions (first 30)
│ ├── Extract task_description from grounding_agent._current_instruction
│ ├── Extract final_response from execution_result.response
│ │
│ ├── VikingEnrichment.feedback_evolution(
│ │ task_id,
│ │ feedback_data, # [{name, origin, content, description, ...}, ...]
│ │ task_description=...,
│ │ final_response=...,
│ │ tool_sequence=...,
│ │ scrub_pii=scrub_enabled, ← NEW: applies scrubber.py
│ │ )
│ │ │
│ │ ├── clean = _make_scrub(scrub_enabled) — regex PII redactor
│ │ ├── create_session("openspace-evo-<task_id>")
│ │ ├── add_session_message("user", clean(task_description))
│ │ ├── add_session_message("assistant", clean(f"Final response:..."))
│ │ ├── add_session_message("assistant", clean("Tool sequence: ..."))
│ │ ├── add_session_message("assistant",
│ │ │ f"Environment: {_env_fingerprint()}") ← NEW
│ │ ├── add_session_message("assistant", clean(f"Skill 'X' evolved...")) # × N
│ │ └── commit_session(...)
│ │ → Viking schedules background memory extraction
│ │
│ ├── _viking_stats.feedback_status = "committed"
│ │
│ └── [if resolve_viking_push_enabled()]
│ push_evolved_skills(feedback_data, scrub_pii=scrub_enabled)
│ → each skill content passed through scrubber before POST /api/v1/skills
│ → metadata includes env fingerprint
│ _viking_stats.pushed_skills = N
│
└── [NEGATIVE PATH — NEW — if status in (error, incomplete) && viking_client]
│
├── failure_reason = execution_result.error or response
├── tool_sequence = execution_result.tool_executions[:30]
│
└── VikingEnrichment.feedback_negative(
task_id,
task_description=...,
failure_reason=...,
tool_sequence=...,
scrub_pii=scrub_enabled,
)
│
├── create_session("openspace-neg-<task_id>")
├── add_session_message("assistant",
│ "POLARITY: negative — this is a FAILED execution record...")
├── add_session_message("user", clean(task_description))
├── add_session_message("assistant", clean(f"Failure reason: ..."))
├── add_session_message("assistant", clean(f"Tool sequence: ..."))
├── add_session_message("assistant",
│ f"Environment: {_env_fingerprint()}")
└── commit_session(...)
→ Viking extracts into viking://.../agent/memories/antipatterns/
→ Future tasks see this as an "AVOID" warning
Privacy defense: When OPENVIKING_SCRUB_PII=true (default), every user-authored string passes through the regex scrubber before reaching Viking. API keys, JWTs, emails, phones, Luhn-valid credit cards, and private key blocks are replaced with [REDACTED_*] placeholders. Idempotent — scrubbing already-scrubbed text is a no-op.
OpenSpace-side cost: 4–6 HTTP calls per evolved skill (session create + 3 messages + commit + optional resource push). Zero impact on user-perceived latency — happens in the finally block after execute() already has a return value.
OpenViking stores each context in three tiers:
| Tier | Size | When fetched |
|---|---|---|
| L0 abstract | ~100 tokens | Returned by /api/v1/search/find (default) |
| L1 overview | ~1–2k tokens | On-demand via /api/v1/resources/overview |
| L2 detail | Unlimited | On-demand via /api/v1/resources/read |
Our current integration only fetches L0. 28 abstracts (6 categories × ~5 limit) × 100 tokens = ~2.8k tokens worst-case; after dedup and truncation usually 800–1,800 tokens. This is intentionally conservative. A future enhancement could fetch L1 for the top-1 highest-scoring case when confidence exceeds a threshold, but this is not wired in today.
grounding_agent.process() → iteration loop
│
│ iter 1: gets pre-execution enrichment (D1 path)
│ iter 2: skill_context AND viking_context stripped
│ iter 3+: LLM may discover it needs different knowledge
│
▼
LLM emits tool call → retrieve_memory(query="error X", category="antipatterns")
│
▼
RetrieveMemoryTool._arun(query, category, limit=5)
│
├── is_available() check (cached)
├── Normalize category → target_uri via client.agent_memory_uri(...)
│ or client.user_memory_uri(...)
├── find_memories(query, target_uri, limit, score_threshold)
│ → 1 HTTP POST to /api/v1/search/find
│
└── Format result as plain-text block:
"# Retrieved from OpenViking — query: 'error X'
# Category: antipatterns, 3 result(s)
## WARNING: these are FAILED approaches...
1. [score=0.82] chromedriver 124 fails on macOS <14
uri: viking://.../antipatterns/...
2. ..."
│
▼
LLM receives the tool result → decides next action
→ may avoid the failed approach
→ may call retrieve_memory again with different category
→ may proceed with alternative strategy
Why this matters: pre-execution enrichment only sees the initial query. If the agent discovers mid-execution that the real problem is different from what the query suggested, D4 lets it pull fresh knowledge without waiting for the next task. The tool is registered alongside retrieve_skill in grounding_agent.py when openviking_mid_iter_tool=True and _viking_client is set.
Cost: 1 HTTP call per invocation. Agent only calls when genuinely stuck — typically 0–2 times per task. Overhead is negligible compared to the alternative (exploration loop that might take 3+ iterations).
Host agent's chat LLM (e.g. OpenClaw Claude or Codex)
│
│ User types: "What were my preferences for dashboard colors again?"
│
▼
Host LLM sees MCP tool list → picks openviking_retrieve_memory
│
▼
MCP server routes call to tool_retrieve_memory(get_client, query, category, limit)
│
├── get_client() → _get_viking_client()
│ → lazy-init OpenSpace → return os._viking_client
│
├── Validate: non-empty query, is_available() True
├── _resolve_target_uri(client, category="preferences")
│ → viking://tenants/<ns>/user/<uid>/memories/preferences/
│
├── client.find_memories(query, target_uri, limit)
│
└── Return JSON to MCP client:
{
"status": "ok",
"data": {
"query": "...",
"category": "preferences",
"target_uri": "viking://...",
"count": 2,
"results": [
{"uri": "...", "abstract": "prefers teal accent, white background",
"score": 0.91, "category": "preferences"},
...
]
}
}
│
▼
Host LLM receives the result → uses it to personalize the chat reply
→ NO OpenSpace execute_task() call required
→ User gets a personalized response immediately
This is the biggest architectural improvement of Round 6. The previous design required a full execute_task() round-trip through OpenSpace for the host to benefit from Viking memories. Now the host can retrieve (and write, and forget, and report stale) memories directly — making cross-session personalization universally available to every host agent that supports MCP.
The 5 tools registered by register_viking_mcp_tools(mcp, _get_viking_client):
| Tool | Purpose | Typical caller |
|---|---|---|
openviking_retrieve_memory(query, category?, limit?) |
Fetch L0 abstracts | Host LLM personalizing replies |
openviking_remember(content, category, polarity?) |
Write new memory | Host LLM capturing explicit user statements |
openviking_forget_memory(uri, reason?) |
Delete / deprecate | Host LLM responding to "forget this" |
openviking_report_stale_memory(uri, reason) |
Flag stale (softer than forget) | Host LLM noticing outdated memory |
openviking_memory_status() |
Health + config introspection | Host LLM surfacing integration state in UI |
All tools return structured JSON ({"status": "ok"|"error", ...}) so the host LLM can handle Viking unavailability gracefully.
The grounding agent iteration loop enforces strict context hygiene to keep per-iteration token cost constant:
# openspace/agents/grounding_agent.py:226-251
while current_iteration < max_iterations:
current_iteration += 1
# Strip skill context AND viking context at iteration 2.
# Both are planning hints — useful for the first LLM call that
# builds the execution plan, then pure noise once tool results
# start accumulating in the message history.
if current_iteration == 2:
drop_contents = set()
if self._skill_context:
drop_contents.add(self._skill_context)
viking_ctx = getattr(self, "_viking_context", "")
if viking_ctx and viking_ctx.strip():
drop_contents.add(viking_ctx)
if drop_contents:
messages = [
m for m in messages
if not (m.get("role") == "system" and m.get("content") in drop_contents)
]
# Per-iteration caps and truncation also apply
if current_iteration >= 2:
messages = cap_message_content(messages)
if current_iteration >= 5:
messages = truncate_messages(messages, keep_recent=8, max_tokens_estimate=120000)Without the iter-2 strip, every iteration would re-send the 1,500-token Viking block to the LLM. A 12-iteration task would waste ~18k tokens on stale hints. The strip is the most important token-saving mechanism in the Viking integration.
When OPENVIKING_NAMESPACE=acme and OPENVIKING_USER_ID=alice, the client builds:
Agent-shared (team-wide):
viking://tenants/acme/agent/memories/tools/
viking://tenants/acme/agent/memories/patterns/
viking://tenants/acme/agent/memories/skills/
viking://tenants/acme/agent/memories/cases/
User-isolated (per-user within team):
viking://tenants/acme/user/alice/memories/preferences/
Single-tenant dev machines (no namespace, no user_id set, fallback $USER=jimmy):
Agent-shared (global):
viking://agent/memories/tools/
User-isolated (global user folder):
viking://user/memories/preferences/
See Configuration for the complete resolution chain and deployment patterns.
OpenSpace.execute(task)
│
├─[viking_enabled=False]────────────────────▶ No Viking code runs, ever.
│
├─[httpx not installed]─────────────────────▶ _get_client() returns None,
│ all methods return {}/[]
│
├─[server unreachable]──────────────────────▶ 5s timeout → exception →
│ caught → returns {}/[]
│
├─[server returns 4xx/5xx]──────────────────▶ Status check → None →
│ returns {}/[]
│
├─[server returns malformed JSON]───────────▶ json.loads exception →
│ returns {}
│
├─[find returns unexpected shape]───────────▶ Parser tries multiple
│ envelopes → returns []
│
├─[commit fails after messages added]───────▶ Exception caught,
│ logger.warning, no raise
│
└─[skill push fails for one record]─────────▶ Per-record try/except,
pushed counter reflects
only successes
Every path leads to graceful degradation. The user-visible contract is: if Viking is broken, OpenSpace runs as if Viking did not exist.
Next: Token Economics quantifies the actual savings with realistic baselines.