diff --git a/CLI.md b/CLI.md index 6ac629f84..3d02c5bee 100644 --- a/CLI.md +++ b/CLI.md @@ -164,7 +164,7 @@ Use `prompt_toolkit` for: | `/compression-model ` | Switch compression model | | `/exit` | Exit the REPL | | `/help` | Show available commands and brief descriptions | -| `/load-instructions` | Load agent instruction files into next prompt | +| `/instructions-load` | Load agent instruction files into next prompt | | `/plan [on\|off\|N]` | Toggle or set planning interval (default: 22) | | `/pwd` | Show current working directory | | `/repeat ` | Run the same prompt N times, each on a fresh agent with current context | diff --git a/README.md b/README.md index dba6fc890..7f0f7a30b 100644 --- a/README.md +++ b/README.md @@ -18,7 +18,7 @@ limitations under the License. * 💻 **Interactive CLI ([`bpsa`](#cli-bpsa)):** Multi-turn REPL with slash commands, command history, tab completion, session stats, and auto-approve mode. * 🔄 **Infinite runtime CLI ([`ad-infinitum`](#cli-ad-infinitum)):** Allows agents to **run ad infinitum** via autonomous looping. -* 🗜️ **Context compression**: Automatic LLM-based summarization of older memory steps to manage context window size during long-running tasks. +* 🗜️ **Context compression**: Biologically inspired [automatic LLM-based summarization](docs/compression.md) of older memory steps to manage context window size during long-running tasks. * 🌐 **Browser integration:** Control a headed Chromium browser from agent code blocks via Playwright (`--browser` flag). * 🖥️ **GUI interaction:** Launch, screenshot, click, type, and send keys to native GUI applications on X11 via xdotool/ImageMagick (`--gui-x11` flag). * 👁️ **Image loading:** Agents can load and visually inspect image files (plots, screenshots, diagrams) via the built-in `load_image` tool — always available, no flags needed. 
diff --git a/docs/compression-plan.md b/docs/compression-plan.md deleted file mode 100644 index 32eba32da..000000000 --- a/docs/compression-plan.md +++ /dev/null @@ -1,97 +0,0 @@ -# Context Compression Implementation Plan - -## Overview -Add a hybrid rolling summarization system to smolagents that compresses older memory steps via LLM summarization while keeping recent steps in full detail. - -## Files to Create/Modify - -### 1. CREATE: `src/smolagents/bp_compression.py` -New module containing all compression logic: - -```python -@dataclass -class CompressionConfig: - enabled: bool = True - keep_recent_steps: int = 5 # Recent steps to keep in full - max_uncompressed_steps: int = 10 # Compress when exceeds this - estimated_token_threshold: int = 0 # Token-based trigger (0=disabled) - compression_model: Model | None = None # Cheaper model for compression - preserve_error_steps: bool = True - preserve_final_answer_steps: bool = True - max_compressed_steps: int = 0 # Merge compressed summaries (0=disabled) - -@dataclass -class CompressedHistoryStep(MemoryStep): - summary: str - compressed_step_numbers: list[int] - original_step_count: int - # Implements to_messages() returning summary as USER message - -class ContextCompressor: - def should_compress(steps) -> bool - def compress(steps) -> list[MemoryStep] # Returns new list with compressed history -``` - -Key functions: -- `estimate_tokens(text)` - Character-based heuristic (~4 chars/token) -- `should_preserve_step(step, config)` - Check if step must be kept -- `create_compression_prompt(steps)` - Build LLM prompt for summarization -- `create_compression_callback(compressor)` - Callback for automatic triggering - -### 2. 
MODIFY: `src/smolagents/agents.py` -Add to `MultiStepAgent.__init__` (around line 360): -- New parameter: `compression_config: CompressionConfig | None = None` -- New method: `_setup_compression()` that registers compression callback - -Integration via existing callback system (lines 425-443) - no changes to core agent loop. - -### 3. MODIFY: `src/smolagents/__init__.py` -Add exports: -```python -from .bp_compression import CompressionConfig, CompressedHistoryStep, ContextCompressor -``` - -### 4. CREATE: `tests/test_compression.py` -Tests for: -- `CompressedHistoryStep.to_messages()` and `dict()` serialization -- Token estimation functions -- `should_preserve_step()` logic -- `ContextCompressor.should_compress()` threshold behavior -- Integration test with mock model - -## Implementation Sequence - -1. Create `bp_compression.py` with all classes and functions -2. Modify `MultiStepAgent.__init__` to accept `compression_config` -3. Add `_setup_compression()` method to register callback -4. Update `__init__.py` exports -5. Create test file -6. Run tests to verify - -## Usage Example -```python -from smolagents import CodeAgent, CompressionConfig, LiteLLMModel - -config = CompressionConfig( - keep_recent_steps=5, - max_uncompressed_steps=8, - compression_model=LiteLLMModel(model_id="gpt-4o-mini"), # Cheap model -) - -agent = CodeAgent( - tools=[...], - model=main_model, - compression_config=config, -) -``` - -## Design Decisions -- **New file vs existing**: New `bp_compression.py` keeps related logic together, follows pattern of `monitoring.py` -- **Callback-based**: Uses existing callback system for clean integration without modifying agent loop -- **Token estimation**: Character heuristic (4 chars/token) since no proactive token counting exists -- **Graceful fallback**: If compression LLM call fails, keep original steps and log warning - -## Verification -1. Run existing tests: `pytest tests/test_memory.py tests/test_agents.py` -2. 
Run new tests: `pytest tests/test_compression.py` -3. Manual test: Create agent with compression enabled, run multi-step task, verify memory gets compressed diff --git a/docs/compression.md b/docs/compression.md new file mode 100644 index 000000000..bd87c5c4b --- /dev/null +++ b/docs/compression.md @@ -0,0 +1,301 @@ + + +# Context Compression & Knowledge Extraction + +## Overview +A hybrid rolling summarization system for BPSA that compresses older memory steps via LLM summarization while keeping recent steps in full detail. Knowledge is extracted incrementally during compression and further refined when compressed summaries accumulate. + +## Inspirations from Biology + +The two-phase compression pipeline was designed from first principles, yet it converges +strikingly closely on the **Standard Model of Memory Consolidation** — the dominant +neuroscientific theory of how biological brains move experiences from short-term storage +into long-term knowledge. The parallels are not superficial; they appear to reflect deep +structural constraints that any system managing finite working memory over unbounded +experience must eventually solve. This convergence is a hypothesis, not a proven fact — +but the hypothesis is a strong one: finite capacity + unbounded experience + the need for +generalisation are universal pressures, and similar pressures tend to produce similar +solutions regardless of substrate. + +### The Deepest Parallel + +The entire two-phase design mirrors the **Standard Model of Memory Consolidation**: + +``` +Experience → Hippocampus (short-lived, detailed) + ↓ (sleep / Phase 1) + Compressed replay → early neocortex + ↓ (later consolidation / Phase 2) + Abstract semantic knowledge → late neocortex + ↓ + Hippocampus no longer needed for retrieval +``` + +Replace hippocampus with "action steps", early neocortex with "CompressedHistoryStep", +late neocortex with "knowledge store" — and you have BPSA's compression pipeline almost +exactly. 
+ +> *Note: the Standard Model's claim that the hippocampus becomes unnecessary for retrieval +> is contested by Multiple Trace Theory (Nadel & Moscovitch, 1997), which argues the +> hippocampus remains involved in detailed episodic retrieval indefinitely. BPSA's +> architecture — which does eventually discard original steps — maps onto the Standard +> Model regardless of which biological theory proves correct.* + +--- + +### 1. Working Memory vs. Long-Term Memory +* **BPSA:** Recent steps are kept in **full detail** (`keep_recent_steps`). Older steps are compressed into summaries. +* **Human mind:** The **prefrontal cortex** holds a small working memory buffer (~7±2 items, Miller 1956) in full resolution. Older experiences are consolidated over time: the hippocampus holds the initial detailed trace and orchestrates its gradual transfer to the neocortex, where a compressed, generalised form eventually lives independently. + +> *"Keep 40 recent steps in full" is literally what your brain does right now — you remember today in detail, last Tuesday as a blur.* + +--- + +### 2. Sleep Consolidation → Phase 1 + Phase 2 +* **BPSA:** Two-phase pipeline — Phase 1 compresses live steps + extracts knowledge; Phase 2 merges accumulated compressed steps into deeper knowledge. +* **Human mind:** Sleep has **two consolidation phases** — slow-wave sleep (SWS) replays episodic memories from hippocampus to neocortex (Phase 1 analog), and REM sleep is associated with abstracting and integrating those replays into semantic knowledge (Phase 2 analog). The analogy is functional: SWS and REM differ in their underlying neural mechanisms (sharp-wave ripples vs. theta oscillations) rather than being a clean "first pass / second pass" distinction, but the broad directionality — from detailed episodic replay toward abstract semantic integration — maps well onto BPSA's two phases. 
+ +> *Phase 2 in BPSA ("merge_compressed when they accumulate") is functionally analogous to the later sleep stages that refine, consolidate, and eventually render raw episodic traces unnecessary for retrieval.* + +--- + +### 3. Episodic vs. Semantic Memory +* **BPSA:** `CompressedHistoryStep` = what happened (events, actions taken). `knowledge` store = what is currently true (facts, beliefs, current state). +* **Human mind:** **Episodic memory** = "I did X at time T." **Semantic memory** = "X is true." Neuroscience has identified these as distinct systems with different neural substrates. Old episodic memories gradually convert to semantic ones — exactly what Phase 2 does. + +> *"Compressed history = events/changes over time; knowledge = current beliefs/facts" — this is straight from cognitive psychology textbooks.* + +--- + +### 4. Schemas / Semantic Networks → Tagged XML Knowledge +* **BPSA:** Knowledge stored as tagged XML sections (``, ``, ``). Sections (tags) can be added, updated, or deleted via diff operations. +* **Human mind:** Cognitive psychologists call these **schemas** — organised clusters of knowledge with labels and relationships, updated incrementally as new information arrives. The `merge_context()` add/update/delete operations mirror how schemas are revised. + +--- + +### 5. Deliberate Belief Revision → Agent-Driven Knowledge Updates +* **BPSA:** The `update_knowledge` tool lets the *agent itself* explicitly revise its knowledge store at any point during live execution. +* **Human mind:** **Deliberate belief revision** — the conscious, intentional process of updating one's own knowledge when new evidence or reasoning warrants it. 
This is distinct from *metacognition* in the strict cognitive science sense (which additionally involves monitoring uncertainty and regulating reasoning strategies); what the agent does here is closer to deliberate note-taking and self-correction — updating a belief because a step's outcome has changed what is known to be true. + + +## Architecture + +### Two-Phase Compression Pipeline + +**Phase 1 — Step Compression + Knowledge Extraction:** Older action steps are summarized by the LLM into `CompressedHistoryStep` instances. The same LLM call also extracts knowledge updates, which are applied to the persistent knowledge store immediately. The LLM receives both the full compressed history (past events) and the full knowledge store (current facts) so it can avoid all duplication and propose corrections. Recent steps are kept in full detail. + +**Phase 2 — Knowledge Refinement:** When compressed steps accumulate beyond a threshold, older ones are merged into the knowledge store via a separate LLM call. The merged compressed steps are then removed entirely. This phase refines and consolidates knowledge that may have been partially captured in Phase 1. + +``` +Steps accumulate → Phase 1: compress older steps + ↓ + LLM produces + optional + ↓ ↓ + CompressedHistoryStep merge_context() → memory.knowledge + ↓ + (when too many compressed steps accumulate) + ↓ + Phase 2: extract knowledge from old compressed steps + ↓ + merge_context() → memory.knowledge + ↓ + Old compressed steps removed + ↓ + Knowledge injected into LLM context + as ... 
message +``` + +## Files + +### `src/smolagents/bp_compression.py` +All compression and knowledge logic: + +```python +@dataclass +class CompressionConfig: + enabled: bool = True + keep_recent_steps: int = 5 # Recent steps to keep in full + max_uncompressed_steps: int = 10 # Compress when exceeds this + estimated_token_threshold: int = 0 # Token-based trigger (0=disabled) + compression_model: Model | None = None # Separate model for compression (None=use main) + max_summary_tokens: int = 50000 # Max tokens for generated summary + preserve_error_steps: bool = False # Keep error steps uncompressed + preserve_final_answer_steps: bool = True # Keep final_answer steps uncompressed + max_compressed_steps: int = 32 # Merge compressed steps when exceeds this + keep_compressed_steps: int = 22 # Recent compressed steps to keep during merge + min_compression_chars: int = 4096 # Skip compression if content below this + +@dataclass +class CompressedHistoryStep(MemoryStep): + summary: str + compressed_step_numbers: list[int] + original_step_count: int + timing: Timing | None + compression_token_usage: TokenUsage | None + # to_messages() renders as [COMPRESSED HISTORY - N steps summarized] + +class ContextCompressor: + def should_compress(steps) -> bool + def compress(steps, knowledge) -> tuple[list[MemoryStep], str] + def should_merge_compressed(steps) -> bool + def merge_compressed(steps, knowledge) -> tuple[list[MemoryStep], str] +``` + +Key functions: +- `estimate_tokens(text)` — Character-based heuristic (~4 chars/token) +- `estimate_step_tokens(step)` — Token estimate for a memory step +- `should_preserve_step(step, config)` — Check if step must be kept +- `create_compression_prompt(steps, knowledge, existing_summaries)` — Build LLM prompt for step summarization with full context: existing compressed history (to avoid duplicating events) and knowledge store (current facts, updatable). 
Requests structured `` + optional `` output +- `parse_compression_output(raw_output)` — Parse structured LLM output into `(summary, knowledge_updates)` with graceful fallback for unstructured output +- `create_knowledge_extraction_prompt(steps, tag_names)` — Build LLM prompt for Phase 2 knowledge extraction +- `create_merge_prompt(steps)` — Build prompt for merging compressed steps +- `list_xml_tag_names(text)` — Extract XML tag names from a string +- `merge_context(existing, updates)` — Apply tagged XML diff (add/update/delete) +- `create_compression_callback(compressor)` — Callback for automatic triggering + +### `src/smolagents/agents.py` +Integration in `MultiStepAgent`: +- `__init__` accepts `compression_config: CompressionConfig | None = None` +- `_setup_compression()` registers the compression callback +- `write_memory_to_messages()` injects `memory.knowledge` as a `` message just before the last message in context +- System prompt log line shows Context and Knowledge char counts + +### `src/smolagents/memory.py` +- `AgentMemory.knowledge: str = ""` — Persistent knowledge store (tagged XML) +- Reset on `memory.reset()` + +### `src/smolagents/bp_tools.py` +- `UpdateKnowledge` tool — Allows the agent to explicitly update its knowledge store via `update_knowledge(updates='content')` + +### `src/smolagents/bp_cli.py` +- `print_turn_summary()` shows Context and Knowledge char counts +- `/compress` command handles tuple return from `compress()` +- Environment variable configuration (see below) + +### `tests/test_compression.py` +Tests for: +- `CompressedHistoryStep.to_messages()` and `dict()` serialization +- Token estimation functions +- `should_preserve_step()` logic +- `ContextCompressor.should_compress()` threshold behavior +- `ContextCompressor.compress()` — tuple return, knowledge extraction, fallback for unstructured output +- `parse_compression_output()` — structured output, summary-only, fallback, empty/None input +- `merge_context()` add/update/delete 
operations +- `list_xml_tag_names()` extraction +- Integration test with mock model + +## Knowledge Store + +The knowledge store (`memory.knowledge`) is a plain string of tagged XML: + +```xml +1. Setup done +2. Now implementing API +The database uses PostgreSQL 14 with pgvector extension +API endpoints implemented, testing in progress +``` + +**Three sources of updates:** +1. **Phase 1 (automatic):** `compress()` extracts `` from the same LLM call that produces the summary — knowledge starts accumulating from the very first compression cycle +2. **Phase 2 (automatic):** `merge_compressed()` extracts knowledge from old compressed summaries when they accumulate beyond the threshold — refines and consolidates +3. **Manual:** The `update_knowledge` tool lets the agent explicitly add/update/delete sections at any time + +**`merge_context(existing, updates)` applies three operations:** +- `content` where tag exists → **UPDATE** (replace content) +- `content` where tag is new → **APPEND** +- `` or `` (self-closing/empty) → **DELETE** + +**Injection:** Knowledge is inserted as a `...` USER message just before the last message in the LLM context, giving it high attention weight. + +### Phase 1 Knowledge Extraction + +During Phase 1 compression, the LLM receives: +- The full current knowledge store as `` context +- Instructions to output structured format: + +``` + +Concise summary of compressed steps... 
+ + +new or updated content + + +``` + +The `parse_compression_output()` function handles parsing with graceful fallback: +- If `` tags present → extract summary and knowledge_updates separately +- If no `` tags → entire output becomes the summary (backwards compatible) +- If no `` → no knowledge changes applied + +This design means: +- **Zero extra LLM calls** — knowledge extraction piggybacks on the existing compression call +- **Higher fidelity** — Phase 1 has access to full original steps (not lossy summaries) +- **Immediate availability** — knowledge accumulates from the first compression, not after 32+ steps + +## BPSA CLI Configuration + +Environment variables (with defaults used by the CLI): + +| Variable | Default | Description | +|---|---|---| +| `BPSA_COMPRESSION_ENABLED` | `1` | Enable compression | +| `BPSA_COMPRESSION_KEEP_RECENT_STEPS` | `40` | Recent steps to keep uncompressed | +| `BPSA_COMPRESSION_MAX_UNCOMPRESSED_STEPS` | `50` | Trigger threshold for compression | +| `BPSA_COMPRESSION_KEEP_COMPRESSED_STEPS` | `80` | Compressed steps to keep on merge | +| `BPSA_COMPRESSION_MAX_COMPRESSED_STEPS` | `120` | Trigger threshold for merge | +| `BPSA_COMPRESSION_TOKEN_THRESHOLD` | `0` | Token-based trigger (0=disabled) | +| `BPSA_COMPRESSION_MODEL` | same as main | Model ID for compression | +| `BPSA_COMPRESSION_MAX_SUMMARY_TOKENS` | `50000` | Max tokens in summary | +| `BPSA_COMPRESSION_PRESERVE_ERROR_STEPS` | `0` | Keep error steps uncompressed | +| `BPSA_COMPRESSION_PRESERVE_FINAL_ANSWER_STEPS` | `1` | Keep final_answer steps | +| `BPSA_COMPRESSION_MIN_CHARS` | `4096` | Min chars before compressing | + +Note: The CLI defaults differ from `CompressionConfig` defaults to suit interactive use (more steps kept). 
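
As a rough illustration of the table above, the environment variables could be turned into `CompressionConfig` keyword arguments along the following lines. This is a minimal sketch, not the actual `bp_cli.py` code: the helper name `compression_kwargs_from_env` and the boolean parsing rule (treating `1`, `true`, `yes`, `on` as true) are assumptions, and `BPSA_COMPRESSION_MODEL` is left out because it needs a model object rather than a plain value. Only the variable names, field names, and defaults are taken from this document.

```python
import os
from typing import Mapping, Optional

# Variable name -> (CompressionConfig field, CLI default), copied from the table above.
_INT_VARS = {
    "BPSA_COMPRESSION_KEEP_RECENT_STEPS": ("keep_recent_steps", 40),
    "BPSA_COMPRESSION_MAX_UNCOMPRESSED_STEPS": ("max_uncompressed_steps", 50),
    "BPSA_COMPRESSION_KEEP_COMPRESSED_STEPS": ("keep_compressed_steps", 80),
    "BPSA_COMPRESSION_MAX_COMPRESSED_STEPS": ("max_compressed_steps", 120),
    "BPSA_COMPRESSION_TOKEN_THRESHOLD": ("estimated_token_threshold", 0),
    "BPSA_COMPRESSION_MAX_SUMMARY_TOKENS": ("max_summary_tokens", 50000),
    "BPSA_COMPRESSION_MIN_CHARS": ("min_compression_chars", 4096),
}
_BOOL_VARS = {
    "BPSA_COMPRESSION_ENABLED": ("enabled", True),
    "BPSA_COMPRESSION_PRESERVE_ERROR_STEPS": ("preserve_error_steps", False),
    "BPSA_COMPRESSION_PRESERVE_FINAL_ANSWER_STEPS": ("preserve_final_answer_steps", True),
}

# Assumption: the real CLI may accept a different set of truthy strings.
_TRUTHY = {"1", "true", "yes", "on"}


def compression_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: build CompressionConfig kwargs from BPSA_* variables."""
    if env is None:
        env = os.environ
    kwargs = {}
    for var, (field, default) in _INT_VARS.items():
        raw = env.get(var, "").strip()
        # Unset or empty variables fall back to the documented CLI default.
        kwargs[field] = int(raw) if raw else default
    for var, (field, default) in _BOOL_VARS.items():
        raw = env.get(var, "").strip().lower()
        kwargs[field] = (raw in _TRUTHY) if raw else default
    return kwargs


if __name__ == "__main__":
    # Show the effective settings for the current environment.
    for name, value in compression_kwargs_from_env().items():
        print(f"{name} = {value}")
```

Since the keys match the `CompressionConfig` field names listed earlier, the result could presumably be passed as `CompressionConfig(**compression_kwargs_from_env())`, with `compression_model` supplied separately.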
+ +## Usage Example + +### Programmatic +```python +from smolagents import CodeAgent, CompressionConfig, LiteLLMModel + +config = CompressionConfig( + keep_recent_steps=5, + max_uncompressed_steps=10, + compression_model=LiteLLMModel(model_id="gpt-4o-mini"), # Cheaper model + max_compressed_steps=32, + keep_compressed_steps=22, +) + +agent = CodeAgent( + tools=[...], + model=main_model, + compression_config=config, +) +``` + +### BPSA CLI +```bash +export BPSA_COMPRESSION_ENABLED=1 +export BPSA_COMPRESSION_KEEP_RECENT_STEPS=40 +export BPSA_COMPRESSION_MAX_UNCOMPRESSED_STEPS=50 +bpsa +``` + +## Design Decisions +- **New file vs existing:** `bp_compression.py` keeps all compression/knowledge logic together, follows pattern of `monitoring.py` +- **Callback-based:** Uses existing callback system for clean integration without modifying the agent loop +- **Token estimation:** Character heuristic (4 chars/token) since no proactive token counting exists +- **Graceful fallback:** If compression LLM call fails, keep original steps and log warning. If LLM doesn't follow structured format, entire output becomes the summary with no knowledge changes. +- **Combined summary + knowledge in Phase 1:** Single LLM call produces both summary and knowledge updates. The LLM sees the full compressed history AND knowledge store so it can avoid all duplication. The prompt explains the distinction: compressed history = events/changes over time, knowledge = current beliefs/facts. Zero extra cost. +- **Two-phase design:** Phase 1 extracts knowledge from full original steps (high fidelity). Phase 2 refines/consolidates from compressed summaries when they accumulate. Both phases use `merge_context()` for consistent tagged XML operations. 
+- **Tagged XML for knowledge:** Simple, parseable format that supports incremental updates via diff operations +- **Knowledge placement:** Injected near end of context for high attention weight in transformer models +- **Min chars threshold:** Avoids wasting LLM calls on already-concise content + +## Verification +1. Run existing tests: `pytest tests/test_memory.py tests/test_agents.py` +2. Run compression tests: `pytest tests/test_compression.py` +3. Manual test: Create agent with compression enabled, run multi-step task, verify memory gets compressed and knowledge accumulates from Phase 1 + + diff --git a/pyproject.toml b/pyproject.toml index 0a166246e..8dbd31d34 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" [project] name = "bpsa" -version = "1.23.10" +version = "1.23.11" description = "Beyond Python SmolAgents (BPSA) — a multi-language, multi-agent framework forked from HuggingFace smolagents." authors = [ { name="Joao Paulo Schwarz Schuler" }, diff --git a/src/smolagents/__init__.py b/src/smolagents/__init__.py index f9d71ea65..4e0d07dcc 100644 --- a/src/smolagents/__init__.py +++ b/src/smolagents/__init__.py @@ -14,7 +14,7 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
-__version__ = "1.23.10" +__version__ = "1.23.11" from .agent_types import * # noqa: I001 from .agents import * # Above noqa avoids a circular dependency due to cli.py diff --git a/src/smolagents/agents.py b/src/smolagents/agents.py index c05b5b4f7..5c4052df6 100644 --- a/src/smolagents/agents.py +++ b/src/smolagents/agents.py @@ -31,7 +31,7 @@ from pathlib import Path from typing import TYPE_CHECKING, Any, Literal, Type, TypeAlias, TypedDict, Union from .bp_executors import LocalExecExecutor -from .bp_tools import get_file_size, force_directories, remove_after_markers, PlanningTool, MoveActionStepToMemory, RetrieveActionStepFromMemory, SummarizeActionStep, GetToolDescriptionsTool +from .bp_tools import get_file_size, force_directories, remove_after_markers, PlanningTool, MoveActionStepToMemory, RetrieveActionStepFromMemory, SummarizeActionStep, UpdateKnowledge, GetToolDescriptionsTool from .bp_utils import bp_parse_code_blobs, fix_nested_tags from .bp_utils import is_valid_python_code from. 
utils import MAX_LENGTH_TRUNCATE_CONTENT @@ -366,7 +366,7 @@ def __init__( else: self.logger = logger - self.monitor = Monitor(self.model, self.logger) + self.monitor = Monitor(self.model, self.logger, memory=self.memory) self._setup_step_callbacks(step_callbacks) self._setup_compression(compression_config) self.stream_outputs = False @@ -446,6 +446,10 @@ def _bind_memory_tools(self): tool = SummarizeActionStep() tool.set_agent(self) self.tools["summarize_actionstep"] = tool + if "update_knowledge" not in self.tools: + tool = UpdateKnowledge() + tool.set_agent(self) + self.tools["update_knowledge"] = tool def _bind_tool_descriptions(self): """Add get_tool_descriptions tool and populate it with full docs from all other tools.""" @@ -558,11 +562,20 @@ def run( title=self.name if hasattr(self, "name") else None, ) breakdown = self.get_prompt_char_breakdown() + ctx_chars = self.get_context_char_size() + knowledge = getattr(self.memory, "knowledge", "") + knowledge_chars = len(knowledge) if knowledge else 0 + extras = "" + if ctx_chars > 0: + extras += f" | Context: {ctx_chars:,} chars" + if knowledge_chars > 0: + extras += f" | Knowledge: {knowledge_chars:,} chars" self.logger.log( Text( f"[System prompt: {breakdown['total']:,} chars" f" | Instructions: {breakdown['instructions']:,} chars" - f" | Tool descriptions: {breakdown['tools']:,} chars]", + f" | Tool descriptions: {breakdown['tools']:,} chars" + f"{extras}]", style="dim", ), level=LogLevel.INFO, @@ -855,11 +868,23 @@ def write_memory_to_messages( """ Reads past llm_outputs, actions, and observations or errors from the memory into a series of messages that can be used as input to the LLM. Adds a number of keywords (such as PLAN, error, etc) to help - the LLM. + the LLM. If the agent has accumulated knowledge, it is injected just before the last message. 
""" messages = self.memory.system_prompt.to_messages(summary_mode=summary_mode) for memory_step in self.memory.steps: messages.extend(memory_step.to_messages(summary_mode=summary_mode)) + + # Inject knowledge near the end of context (just before the last message) + if self.memory.knowledge and self.memory.knowledge.strip(): + knowledge_msg = ChatMessage( + role=MessageRole.USER, + content=[{"type": "text", "text": f"\n{self.memory.knowledge}\n"}], + ) + if len(messages) > 1: + messages.insert(len(messages) - 1, knowledge_msg) + else: + messages.append(knowledge_msg) + return messages def get_context_char_size(self) -> int: diff --git a/src/smolagents/bp_ad_infinitum.py b/src/smolagents/bp_ad_infinitum.py index 9c8055aca..1a82b974b 100644 --- a/src/smolagents/bp_ad_infinitum.py +++ b/src/smolagents/bp_ad_infinitum.py @@ -197,6 +197,8 @@ def print_banner(config: dict): browser_str = "[green]on[/]" if config.get("browser") else "off" gui_str = "[green]on[/]" if config.get("gui") else "off" + mcp_count = len(config.get("mcp") or []) + mcp_str = f"[green]{mcp_count} server(s)[/]" if mcp_count else "off" console.print( Panel.fit( @@ -209,7 +211,8 @@ def print_banner(config: dict): f"Inject folder: {tree_str} | " f"Cooldown: {config['cooldown']}s\n" f"Browser: {browser_str} | " - f"GUI: {gui_str}", + f"GUI: {gui_str} | " + f"MCP: {mcp_str}", border_style="blue", ) ) @@ -226,9 +229,9 @@ def print_banner(config: dict): def run_loop(model, tasks, cycles, max_steps, plan_interval, tree_folder, cooldown, - browser_enabled=False, gui_enabled=False): + browser_enabled=False, gui_enabled=False, mcp_servers=None): """Core autonomous loop: cycles x tasks, fresh agent per task.""" - from smolagents.bp_cli import _shutdown_browser, _shutdown_gui, build_agent + from smolagents.bp_cli import _shutdown_browser, _shutdown_gui, _shutdown_mcp, build_agent original_dir = os.getcwd() total_start = time.time() @@ -259,7 +262,7 @@ def run_loop(model, tasks, cycles, max_steps, plan_interval, 
tree_folder, cooldo if tree_folder: prompt += inject_tree(tree_folder) - agent = build_agent(model, browser_enabled=browser_enabled, gui_enabled=gui_enabled) + agent = build_agent(model, browser_enabled=browser_enabled, gui_enabled=gui_enabled, mcp_servers=mcp_servers) if plan_interval: agent.planning_interval = plan_interval @@ -287,6 +290,7 @@ def run_loop(model, tasks, cycles, max_steps, plan_interval, tree_folder, cooldo total_tasks_run += 1 console.print(f"[red]FAIL[/] {task_label} | {elapsed:.1f}s | {e}") finally: + _shutdown_mcp(agent) _shutdown_browser(agent) _shutdown_gui(agent) @@ -355,6 +359,10 @@ def main(): "--gui-x11", action="store_true", default=None, help="Enable native GUI interaction tools (overrides BPSA_GUI)", ) + parser.add_argument( + "--mcp", action="append", metavar="URL_OR_CMD", dest="mcp", + help="MCP server to connect (URL or shell command); repeatable", + ) args = parser.parse_args() # Install Ctrl+C handler @@ -381,6 +389,8 @@ def main(): browser_enabled = args.browser if args.browser else get_env_bool("BPSA_BROWSER") gui_enabled = args.gui_x11 if args.gui_x11 else get_env_bool("BPSA_GUI") + from smolagents.bp_cli import _parse_mcp_servers + mcp_servers = _parse_mcp_servers(args.mcp or []) or None # Load tasks console.print("[dim]Loading tasks...[/]") @@ -397,6 +407,7 @@ def main(): "cooldown": cooldown, "browser": browser_enabled, "gui": gui_enabled, + "mcp": mcp_servers, } print_banner(config) @@ -405,7 +416,7 @@ def main(): # Run the loop run_loop(model, tasks, cycles, max_steps, plan_interval, tree_folder, cooldown, - browser_enabled=browser_enabled, gui_enabled=gui_enabled) + browser_enabled=browser_enabled, gui_enabled=gui_enabled, mcp_servers=mcp_servers) if __name__ == "__main__": diff --git a/src/smolagents/bp_cli.py b/src/smolagents/bp_cli.py index f6dbfefc2..3982d2f96 100644 --- a/src/smolagents/bp_cli.py +++ b/src/smolagents/bp_cli.py @@ -325,7 +325,7 @@ def build_model(override_model_id=None): return model -def 
build_agent(model, approval_callback=None, browser_enabled=False, gui_enabled=False): +def build_agent(model, approval_callback=None, browser_enabled=False, gui_enabled=False, mcp_servers=None): from smolagents import CodeAgent from smolagents.bp_thinkers import ( DEFAULT_THINKER_COMPRESSION, DEFAULT_THINKER_MAX_STEPS, @@ -337,6 +337,7 @@ def build_agent(model, approval_callback=None, browser_enabled=False, gui_enable tools = list(DEFAULT_THINKER_TOOLS) browser_manager = None gui_manager = None + mcp_client = None # Image tools — always available (Pillow only; tesseract optional for OCR) from smolagents.bp_tools import LoadImageTool, load_image_callback @@ -356,6 +357,11 @@ def build_agent(model, approval_callback=None, browser_enabled=False, gui_enable gui_manager, gui_tools = create_gui_tools() tools.extend(gui_tools) + if mcp_servers: + from smolagents import MCPClient + mcp_client = MCPClient(mcp_servers, structured_output=True) + tools.extend(mcp_client.__enter__()) + step_cbs = [_compact_step_callback, load_image_callback] if gui_manager: from smolagents.bp_tools_gui import gui_screenshot_callback @@ -392,6 +398,9 @@ def build_agent(model, approval_callback=None, browser_enabled=False, gui_enable if gui_manager: agent._gui_manager = gui_manager + if mcp_client: + agent._mcp_client = mcp_client + return agent @@ -530,6 +539,9 @@ def print_turn_summary(turn_num: int, elapsed: float, input_tokens: int, output_ ctx_chars = agent.get_context_char_size() if ctx_chars > 0: line += f" | Context: {format_tokens(ctx_chars)} chars" + knowledge = getattr(agent.memory, "knowledge", "") + if knowledge: + line += f" | Knowledge: {format_tokens(len(knowledge))} chars" line += f" | Auto-approve: {'on' if _auto_approve else 'off'}" line += "[/]" console.print(line) @@ -600,12 +612,14 @@ def _save_aliases(aliases: dict): SLASH_COMMANDS = [ "/alias", "/auto-approve", "/cd", "/clear", "/compress", "/compression", - "/compression-keep-recent-steps", 
"/compression-max-uncompressed-steps", + "/compression-keep-recent-steps", "/compression-keep-compressed-steps", + "/compression-max-uncompressed-steps", "/compression-max-compressed-steps", + "/compression-set-high", "/compression-set-low", "/compression-set-medium", "/compression-model", "/dictation", "/exit", "/help", - "/load-instructions", "/plan", "/pwd", "/redo", "/repeat", "/repeat-prompt", "/run-prompt", "/run-py", "/save", + "/instructions-load", "/plan", "/pwd", "/redo", "/repeat", "/repeat-prompt", "/run-prompt", "/run-py", "/save", "/session-load", "/session-save", "/show-compression-stats", "/show-memory-stats", "/show-stats", - "/save-step", "/set-max-steps", "/show-step", "/show-steps", "/show-tools", "/undo-steps", "/verbose", + "/save-step", "/set-max-steps", "/show-knowledge", "/show-step", "/show-steps", "/show-tools", "/undo-steps", "/verbose", ] @@ -617,19 +631,24 @@ def print_help(): table.add_row("!!", "Run an OS command; output is appended to the next prompt sent to the agent") table.add_row("!!!", "Run an OS command and immediately send the output to the agent for analysis") table.add_row("/alias ", "Define alias (saved to ~/.bpsa_aliases). 
No args=list, -d =delete") - table.add_row("/auto-approve \[on|off]", "Toggle or set auto-approve for tag execution") + table.add_row(r"/auto-approve \[on|off]", "Toggle or set auto-approve for tag execution") table.add_row("/cd ", "Change working directory") table.add_row("/clear", "Clear screen, reset agent and conversation history") - table.add_row("/compress \[N]", "Force compression now, or compress a specific step N") - table.add_row("/compression \[on|off]", "Toggle compression on/off") + table.add_row(r"/compress \[N]", "Force compression now, or compress a specific step N") + table.add_row(r"/compression \[on|off]", "Toggle compression on/off") table.add_row("/compression-keep-recent-steps ", "Change keep_recent_steps") table.add_row("/compression-max-uncompressed-steps ", "Change max_uncompressed_steps") + table.add_row("/compression-keep-compressed-steps ", "Change keep_compressed_steps") + table.add_row("/compression-max-compressed-steps ", "Change max_compressed_steps") + table.add_row("/compression-set-high", "Set compression preset: HIGH (aggressive)") + table.add_row("/compression-set-medium", "Set compression preset: MEDIUM (balanced)") + table.add_row("/compression-set-low", "Set compression preset: LOW (conservative)") table.add_row("/compression-model ", "Switch compression model") table.add_row(r"/dictation \[on|off]", "Toggle dictation (requires BPSA_DICTATION_TRANSCRIBER)") table.add_row("/exit", "Exit the REPL") table.add_row("/help", "Show this help message") - table.add_row("/load-instructions", "Load agent instruction files into next prompt") - table.add_row("/plan \[on|off|N]", "Toggle or set planning interval (default: 22)") + table.add_row("/instructions-load", "Load agent instruction files into next prompt") + table.add_row(r"/plan \[on|off|N]", "Toggle or set planning interval (default: 22)") table.add_row("/pwd", "Show current working directory") table.add_row("/redo", "Re-run the last prompt (undo last steps and run again)") 
table.add_row("/repeat ", "Run the same prompt N times, each on a fresh agent with current context") @@ -644,10 +663,11 @@ def print_help(): table.add_row("/show-memory-stats", "Show memory breakdown: steps, tokens, compressed vs uncompressed") table.add_row("/show-step ", "Show full content of a specific step") table.add_row("/show-steps", "Show one-line summary of all memory steps") + table.add_row("/show-knowledge", "Show the full content of the knowledge store") table.add_row("/show-stats", "Show session statistics") table.add_row("/set-max-steps ", "Change max_steps for the agent") table.add_row("/show-tools", "List all loaded tools") - table.add_row("/undo-steps \[N]", "Remove last N steps from memory (default: 1)") + table.add_row(r"/undo-steps \[N]", "Remove last N steps from memory (default: 1)") table.add_row("/verbose", "Toggle verbose output") console.print(table) console.print() @@ -869,6 +889,11 @@ def print_stats(session_stats: dict, agent=None): table.add_row("System prompt", f"{breakdown['total']:,} chars") table.add_row(" Instructions", f"{breakdown['instructions']:,} chars") table.add_row(" Tool descriptions", f"{breakdown['tools']:,} chars") + if agent: + knowledge = getattr(agent.memory, "knowledge", "") + knowledge_chars = len(knowledge) if knowledge else 0 + table.add_row("", "") + table.add_row("Knowledge", f"{knowledge_chars:,} chars") console.print(table) console.print() @@ -1013,6 +1038,12 @@ def cmd_compression_stats(agent): total_chars = sum(len(s.summary) for s in steps if isinstance(s, CompressedHistoryStep)) compression_count = compressor._compression_count if compressor else 0 + # Knowledge store info + knowledge = getattr(agent.memory, "knowledge", "") + knowledge_chars = len(knowledge) + from smolagents.bp_compression import list_xml_tag_names + knowledge_tags = list_xml_tag_names(knowledge) if knowledge else [] + console.print() console.print(Rule("[bold]Compression Stats", style="blue")) stats_table = Table(show_header=False, 
box=None, padding=(0, 2)) @@ -1023,6 +1054,8 @@ def cmd_compression_stats(agent): stats_table.add_row("Original steps compressed", str(compressed_original)) stats_table.add_row("Compression runs", str(compression_count)) stats_table.add_row("Compressed summary chars", f"{total_chars:,}") + stats_table.add_row("Knowledge chars", f"{knowledge_chars:,}") + stats_table.add_row("Knowledge sections", f"{len(knowledge_tags)} ({', '.join(knowledge_tags)})" if knowledge_tags else "0") console.print(stats_table) console.print() @@ -1061,8 +1094,12 @@ def cmd_memory_stats(agent): table.add_row("Total memory steps", str(total_steps)) for type_name, count in sorted(type_counts.items()): table.add_row(f" {type_name}", str(count)) + # Knowledge store + knowledge = getattr(agent.memory, "knowledge", "") + knowledge_chars = len(knowledge) table.add_row("Total chars", f"{total_chars:,}") table.add_row("Estimated tokens", f"{total_tokens:,}") + table.add_row("Knowledge chars", f"{knowledge_chars:,}") console.print(table) console.print() @@ -1135,7 +1172,7 @@ def cmd_compress(agent, args: str): old_threshold = compressor.config.max_uncompressed_steps compressor.config.max_uncompressed_steps = 0 # Force trigger original_len = len(agent.memory.steps) - agent.memory.steps = compressor.compress(agent.memory.steps) + agent.memory.steps, agent.memory.knowledge = compressor.compress(agent.memory.steps, agent.memory.knowledge) compressor.config.max_uncompressed_steps = old_threshold new_len = len(agent.memory.steps) if new_len < original_len: @@ -1203,6 +1240,106 @@ def cmd_compression_max_uncompressed(agent, args: str): console.print("[red]Invalid number. 
Usage: /compression-max-uncompressed-steps [/]") +def cmd_compression_keep_compressed(agent, args: str): + """Change keep_compressed_steps.""" + config = _get_compression_config(agent) + if config is None: + return + args = args.strip() + if not args: + console.print(f"[cyan]Current keep_compressed_steps: {config.keep_compressed_steps}[/]") + console.print("[dim]Usage: /compression-keep-compressed-steps [/]") + return + try: + n = int(args) + if n < 0: + raise ValueError + config.keep_compressed_steps = n + console.print(f"[green]keep_compressed_steps set to {n}[/]") + except ValueError: + console.print("[red]Invalid number. Usage: /compression-keep-compressed-steps [/]") + + +def cmd_compression_max_compressed(agent, args: str): + """Change max_compressed_steps.""" + config = _get_compression_config(agent) + if config is None: + return + args = args.strip() + if not args: + console.print(f"[cyan]Current max_compressed_steps: {config.max_compressed_steps}[/]") + console.print("[dim]Usage: /compression-max-compressed-steps [/]") + return + try: + n = int(args) + if n < 0: + raise ValueError + config.max_compressed_steps = n + console.print(f"[green]max_compressed_steps set to {n}[/]") + except ValueError: + console.print("[red]Invalid number. 
Usage: /compression-max-compressed-steps [/]") + + +def cmd_compression_set_high(agent): + """Set compression to HIGH preset (aggressive).""" + config = _get_compression_config(agent) + if config is None: + return + config.keep_recent_steps = 20 + config.max_uncompressed_steps = 25 + config.keep_compressed_steps = 10 + config.max_compressed_steps = 15 + table = Table(show_header=False, box=None) + table.add_column(style="cyan", no_wrap=True) + table.add_column(style="green") + table.add_row("Compression preset", "HIGH") + table.add_row("keep_recent_steps", "20") + table.add_row("max_uncompressed_steps", "25") + table.add_row("keep_compressed_steps", "10") + table.add_row("max_compressed_steps", "15") + console.print(table) + + +def cmd_compression_set_normal(agent): + """Set compression to MEDIUM preset (balanced).""" + config = _get_compression_config(agent) + if config is None: + return + config.keep_recent_steps = 40 + config.max_uncompressed_steps = 50 + config.keep_compressed_steps = 15 + config.max_compressed_steps = 20 + table = Table(show_header=False, box=None) + table.add_column(style="cyan", no_wrap=True) + table.add_column(style="green") + table.add_row("Compression preset", "MEDIUM") + table.add_row("keep_recent_steps", "40") + table.add_row("max_uncompressed_steps", "50") + table.add_row("keep_compressed_steps", "15") + table.add_row("max_compressed_steps", "20") + console.print(table) + + +def cmd_compression_set_low(agent): + """Set compression to LOW preset (conservative).""" + config = _get_compression_config(agent) + if config is None: + return + config.keep_recent_steps = 90 + config.max_uncompressed_steps = 100 + config.keep_compressed_steps = 20 + config.max_compressed_steps = 40 + table = Table(show_header=False, box=None) + table.add_column(style="cyan", no_wrap=True) + table.add_column(style="green") + table.add_row("Compression preset", "LOW") + table.add_row("keep_recent_steps", "90") + table.add_row("max_uncompressed_steps", "100") +
table.add_row("keep_compressed_steps", "20") + table.add_row("max_compressed_steps", "40") + console.print(table) + + def cmd_compression_model(agent, args: str): """Switch compression model.""" config = _get_compression_config(agent) @@ -1270,7 +1407,9 @@ def cmd_session_save(agent, session_stats: dict, args: str): filename += ".json" try: count = save_session(filename, agent, session_stats) - console.print(f"[green]Session saved to {filename} ({count} steps).[/]") + knowledge = getattr(agent.memory, "knowledge", "") + knowledge_info = f", knowledge: {len(knowledge):,} chars" if knowledge else "" + console.print(f"[green]Session saved to {filename} ({count} steps{knowledge_info}).[/]") except Exception as e: console.print(f"[red]Failed to save session: {e}[/]") @@ -1286,12 +1425,21 @@ def cmd_session_load(agent, args: str) -> dict | None: console.print("[yellow]Usage: /session-load [/]") return None if not os.path.isfile(filename): - console.print(f"[red]File not found: {filename}[/]") - return None + # Try appending .json if no extension was given + if not os.path.splitext(filename)[1] and os.path.isfile(filename + ".json"): + filename = filename + ".json" + else: + if not os.path.splitext(filename)[1]: + console.print(f"[red]File not found: {filename} (also tried {filename}.json)[/]") + else: + console.print(f"[red]File not found: {filename}[/]") + return None try: stats = load_session(filename, agent) step_count = len(agent.memory.steps) - console.print(f"[green]Session loaded from {filename} ({step_count} steps).[/]") + knowledge = getattr(agent.memory, "knowledge", "") + knowledge_info = f", knowledge: {len(knowledge):,} chars" if knowledge else "" + console.print(f"[green]Session loaded from {filename} ({step_count} steps{knowledge_info}).[/]") return stats except Exception as e: console.print(f"[red]Failed to load session: {e}[/]") @@ -1478,6 +1626,23 @@ def cmd_show_steps(agent): console.print() + +def cmd_show_knowledge(agent): + """Show the full content 
of the knowledge store.""" + from smolagents.bp_compression import list_xml_tag_names + + knowledge = getattr(agent.memory, "knowledge", "") + if not knowledge: + console.print("[yellow]Knowledge is empty.[/]") + return + + knowledge_tags = list_xml_tag_names(knowledge) + sections_info = f"{len(knowledge_tags)} section(s): {', '.join(knowledge_tags)}" if knowledge_tags else "no sections" + console.print(Rule(f"[bold]Knowledge[/] [dim]({len(knowledge):,} chars, {sections_info})[/]", style="blue")) + console.print(knowledge) + console.print() + + def cmd_undo(agent, args: str): """Remove the last N steps from agent memory. Default N=1.""" from smolagents.memory import SystemPromptStep @@ -1621,6 +1786,38 @@ def _shutdown_gui(agent): manager.shutdown() +def _parse_mcp_servers(mcp_list: list[str]): + """Parse a list of MCP server strings into server_parameters dicts/objects. + + Each entry is either: + - An HTTP URL: {"url": "...", "transport": "streamable-http"} + - A command string: StdioServerParameters(command, args=[...]) + """ + import shlex + from mcp import StdioServerParameters + result = [] + for spec in mcp_list: + spec = spec.strip() + if not spec: + continue + if spec.startswith("http://") or spec.startswith("https://"): + result.append({"url": spec, "transport": "streamable-http"}) + else: + parts = shlex.split(spec) + result.append(StdioServerParameters(command=parts[0], args=parts[1:])) + return result + + +def _shutdown_mcp(agent): + """Disconnect MCP client if one exists on the agent.""" + client = getattr(agent, "_mcp_client", None) + if client: + try: + client.__exit__(None, None, None) + except Exception: + pass + + def prepend_instructions(task: str, instructions: str | None) -> str: if instructions: return instructions+""" @@ -1629,13 +1826,13 @@ def prepend_instructions(task: str, instructions: str | None) -> str: return task -def run_one_shot(task: str, skip_instructions: bool = False, auto_approve: bool = True, browser_enabled: bool = False, 
gui_enabled: bool = False): +def run_one_shot(task: str, skip_instructions: bool = False, auto_approve: bool = True, browser_enabled: bool = False, gui_enabled: bool = False, mcp_servers=None): global _auto_approve _auto_approve = auto_approve try_load_dotenv() check_required_env() model = build_model() - agent = build_agent(model, approval_callback=interactive_approval_callback, browser_enabled=browser_enabled, gui_enabled=gui_enabled) + agent = build_agent(model, approval_callback=interactive_approval_callback, browser_enabled=browser_enabled, gui_enabled=gui_enabled, mcp_servers=mcp_servers) instructions = None if not skip_instructions: console.print("[dim]Loading agent instructions...[/]") @@ -1657,16 +1854,17 @@ def run_one_shot(task: str, skip_instructions: bool = False, auto_approve: bool if manager: manager.shutdown() _shutdown_gui(agent) + _shutdown_mcp(agent) -def run_repl(skip_instructions: bool = False, auto_approve: bool = True, browser_enabled: bool = False, gui_enabled: bool = False): +def run_repl(skip_instructions: bool = False, auto_approve: bool = True, browser_enabled: bool = False, gui_enabled: bool = False, mcp_servers=None): global _auto_approve _auto_approve = auto_approve try_load_dotenv() check_required_env() model = build_model() - agent = build_agent(model, approval_callback=interactive_approval_callback, browser_enabled=browser_enabled, gui_enabled=gui_enabled) + agent = build_agent(model, approval_callback=interactive_approval_callback, browser_enabled=browser_enabled, gui_enabled=gui_enabled, mcp_servers=mcp_servers) model_id = get_env("BPSA_MODEL_ID") server_model = get_env("BPSA_SERVER_MODEL", default="OpenAIServerModel") tool_count = count_tools(agent) @@ -1788,6 +1986,7 @@ def get_input(): _shutdown_voice() _shutdown_browser(agent) _shutdown_gui(agent) + _shutdown_mcp(agent) console.print("[dim]Goodbye![/]") break @@ -1843,6 +2042,7 @@ def get_input(): _shutdown_voice() _shutdown_browser(agent) _shutdown_gui(agent) + 
_shutdown_mcp(agent) console.print("[dim]Goodbye![/]") break elif cmd == "/help": @@ -1884,7 +2084,8 @@ def get_input(): elif cmd == "/clear": _shutdown_browser(agent) _shutdown_gui(agent) - agent = build_agent(model, approval_callback=interactive_approval_callback, browser_enabled=browser_enabled, gui_enabled=gui_enabled) + _shutdown_mcp(agent) + agent = build_agent(model, approval_callback=interactive_approval_callback, browser_enabled=browser_enabled, gui_enabled=gui_enabled, mcp_servers=mcp_servers) session_stats = { "turns": 0, "total_time": 0.0, @@ -1930,7 +2131,7 @@ def get_input(): elif cmd == "/pwd": console.print(f"[cyan]{os.getcwd()}[/]") continue - elif cmd == "/load-instructions": + elif cmd == "/instructions-load": console.print("[dim]Loading agent instructions...[/]") instructions = load_agent_instructions() if instructions: @@ -1979,6 +2180,21 @@ def get_input(): elif cmd == "/compression-max-uncompressed-steps": cmd_compression_max_uncompressed(agent, cmd_args) continue + elif cmd == "/compression-keep-compressed-steps": + cmd_compression_keep_compressed(agent, cmd_args) + continue + elif cmd == "/compression-max-compressed-steps": + cmd_compression_max_compressed(agent, cmd_args) + continue + elif cmd == "/compression-set-high": + cmd_compression_set_high(agent) + continue + elif cmd == "/compression-set-medium": + cmd_compression_set_normal(agent) + continue + elif cmd == "/compression-set-low": + cmd_compression_set_low(agent) + continue elif cmd == "/compression-model": cmd_compression_model(agent, cmd_args) continue @@ -1991,6 +2207,9 @@ def get_input(): elif cmd == "/show-steps": cmd_show_steps(agent) continue + elif cmd == "/show-knowledge": + cmd_show_knowledge(agent) + continue elif cmd == "/undo-steps": cmd_undo(agent, cmd_args) continue @@ -2173,6 +2392,10 @@ def main(): "--gui-x11", action="store_true", help="Enable native GUI interaction tools (screenshot, click, type, key via xdotool/ImageMagick on X11)", ) + parser.add_argument( + 
"--mcp", action="append", metavar="URL_OR_CMD", dest="mcp", + help="Connect an MCP server. Use a URL for HTTP servers or a shell command for stdio servers. Can be repeated for multiple servers.", + ) subparsers = parser.add_subparsers(dest="command") run_parser = subparsers.add_parser("run", help="Run a one-shot task") @@ -2184,20 +2407,21 @@ def main(): from smolagents.bp_utils import get_env_bool browser_enabled = args.browser or get_env_bool("BPSA_BROWSER") gui_enabled = args.gui_x11 or get_env_bool("BPSA_GUI") + mcp_servers = _parse_mcp_servers(args.mcp or []) or None # Piped input detection if not sys.stdin.isatty() and args.command is None: task = sys.stdin.read().strip() if task: - run_one_shot(task, skip_instructions=skip_instructions, auto_approve=auto_approve, browser_enabled=browser_enabled, gui_enabled=gui_enabled) + run_one_shot(task, skip_instructions=skip_instructions, auto_approve=auto_approve, browser_enabled=browser_enabled, gui_enabled=gui_enabled, mcp_servers=mcp_servers) else: fail("No input provided via pipe.") return if args.command == "run": - run_one_shot(args.task, skip_instructions=skip_instructions, auto_approve=auto_approve, browser_enabled=browser_enabled, gui_enabled=gui_enabled) + run_one_shot(args.task, skip_instructions=skip_instructions, auto_approve=auto_approve, browser_enabled=browser_enabled, gui_enabled=gui_enabled, mcp_servers=mcp_servers) else: - run_repl(skip_instructions=skip_instructions, auto_approve=auto_approve, browser_enabled=browser_enabled, gui_enabled=gui_enabled) + run_repl(skip_instructions=skip_instructions, auto_approve=auto_approve, browser_enabled=browser_enabled, gui_enabled=gui_enabled, mcp_servers=mcp_servers) if __name__ == "__main__": diff --git a/src/smolagents/bp_compression.py b/src/smolagents/bp_compression.py index 1f17a681a..983a2fffd 100644 --- a/src/smolagents/bp_compression.py +++ b/src/smolagents/bp_compression.py @@ -8,6 +8,7 @@ via LLM summarization while keeping recent steps in full 
detail. """ +import re import time from dataclasses import dataclass, field from logging import getLogger @@ -31,6 +32,10 @@ "estimate_step_tokens", "create_compression_callback", "create_merge_prompt", + "merge_context", + "list_xml_tag_names", + "create_knowledge_extraction_prompt", + "parse_compression_output", ] @@ -230,14 +235,77 @@ def should_preserve_step(step: MemoryStep, config: CompressionConfig) -> bool: return False -def create_compression_prompt(steps_to_compress: list[MemoryStep]) -> str: +def _build_post_steps_section(post_steps: list["MemoryStep"] | None) -> str: + """Build a prompt section from steps that follow the compressed batch. + + These steps are shown to the compressor as read-only context so it can avoid + writing stale knowledge that has already been superseded by later activity. + + Args: + post_steps: Steps occurring after the batch being compressed. May be None or empty. + + Returns: + A formatted prompt section string, or empty string if nothing to show. + """ + if not post_steps: + return "" + + post_step_descs = [] + for step in post_steps: + if isinstance(step, ActionStep): + desc = f"Step {step.step_number}:" + if step.model_output: + output = str(step.model_output)[:500] + desc += f"\n{output}" + if step.observations: + obs = str(step.observations)[:300] + desc += f"\n{obs}" + post_step_descs.append("" + desc + "") + elif isinstance(step, PlanningStep): + plan = (step.plan or "")[:400] + post_step_descs.append("" + plan + "") + elif isinstance(step, CompressedHistoryStep): + summary = (step.summary or "")[:400] + post_step_descs.append(f"{summary}") + + if not post_step_descs: + return "" + + post_steps_text = "\n".join(post_step_descs) + return f""" +The following steps occurred AFTER the batch you are summarizing. Use them to understand +what is still current and what has already been superseded. Do NOT summarize these steps -- +they will remain in full detail. Only use them as context to avoid writing stale knowledge. 
+ +{post_steps_text} + +""" + + +def create_compression_prompt( + steps_to_compress: list[MemoryStep], + knowledge: str = "", + existing_summaries: list["CompressedHistoryStep"] | None = None, + post_steps: list[MemoryStep] | None = None, +) -> str: """Create the prompt for the compression LLM call. Builds a structured representation of the steps to compress and asks the LLM to generate a concise summary preserving key information. + The prompt provides two types of existing context to avoid duplication: + - **Compressed history** (existing_summaries): chronological record of past events + and changes. The new summary should complement, not repeat, this history. + - **Knowledge** (knowledge): current beliefs and facts. The LLM can propose + updates to knowledge when the execution history reveals corrections or + important new information. + Args: steps_to_compress: List of memory steps to summarize. + knowledge: Current knowledge store content (tagged XML). Empty string if none. + existing_summaries: Already-compressed history steps to avoid duplicating. + post_steps: Steps that occurred AFTER the batch being compressed. Shown to the + compressor so it can see what is still current vs already superseded. Returns: The prompt string for the compression LLM call. @@ -265,15 +333,99 @@ def create_compression_prompt(steps_to_compress: list[MemoryStep]) -> str: steps_text = "<\n>".join(step_descriptions) - return f"""Summarize the following agent execution history () into a concise summary. -{COMMON_COMPRESSION_INSTRUCTIONS} + # Build compressed history section + history_section = "" + if existing_summaries: + history_parts = [] + for s in existing_summaries: + history_parts.append(s.summary) + history_text = "\n---\n".join(history_parts) + history_section = f""" +The following is the compressed history of earlier work (events and changes over time). +Do NOT repeat any information already captured in the compressed history. 
+Your summary should only describe NEW events, actions, and changes from the execution history below. + + +{history_text} + +""" -This is the execution history: + # Build knowledge section + has_knowledge = knowledge and knowledge.strip() + has_history = bool(existing_summaries) + + if has_knowledge: + knowledge_section = f""" +The agent has a persistent knowledge store containing current beliefs and facts: + +{knowledge} + +""" + else: + knowledge_section = "" + + # Build subsequent steps section (steps AFTER the batch being compressed) + post_steps_section = _build_post_steps_section(post_steps) + + # Build deduplication and output instructions + dedup_parts = [] + if has_history: + dedup_parts.append("the compressed history (past events)") + if has_knowledge: + dedup_parts.append("the knowledge store (current facts)") + + if dedup_parts: + dedup_instruction = f"Do NOT repeat information already in {' or '.join(dedup_parts)}." + else: + dedup_instruction = "" + + output_instruction = f""" +{dedup_instruction} + +There are two distinct stores: +- **Compressed history** captures events, changes, and what happened over time. +- **Knowledge** captures current beliefs, facts, and the latest state of things. + +Episodic Memory vs. Semantic Memory +- **Compressed History** = Episodic Memory = what happened (events, actions taken). +- **Knowledge** = Semantic Memory = what is currently true (facts, beliefs, current state). + +In the human mind: +- **Episodic memory** = "I did X at time T." +- **Semantic memory** = "X is true." + +Your summary will be added to the compressed history (Episodic Memory). It should describe what happened +(events, actions, outcomes, changes) without repeating prior history entries. + +If the execution history reveals important new facts or corrections to your existing knowledge (Semantic Memory), +include a section.
Use XML tags to add, update, or delete sections: +- To ADD or UPDATE: new content +- To DELETE an obsolete section: + +You can add/update/delete as many s as you see fit. + +If no knowledge updates are needed, omit the section entirely. +In the case that you spot any other error in the knowledge, you can fix as you see fit. + +Output format: + +Your concise summary of new events and changes... + + +...tagged updates if any... + +""" + + return f"""Hello super-intelligence! +This task is involved in your context compression. +To your own benefit, please summarize the following agent execution history into a concise summary. +{COMMON_COMPRESSION_INSTRUCTIONS} +{history_section}{knowledge_section}{post_steps_section}{output_instruction} +This is the execution history to summarize: {steps_text} - -SUMMARY:""" +""" def create_merge_prompt(compressed_steps: list[CompressedHistoryStep]) -> str: @@ -312,6 +464,182 @@ def create_merge_prompt(compressed_steps: list[CompressedHistoryStep]) -> str: CONSOLIDATED SUMMARY:""" + + + +def parse_compression_output(raw_output: str) -> tuple[str, str]: + """Parse structured compression output into summary and knowledge updates. + + Expects output in the format: + ... + ... + + Falls back gracefully: if no tags found, treats the entire + output as the summary with no knowledge updates. + + Args: + raw_output: Raw LLM output from the compression call. + + Returns: + Tuple of (summary, knowledge_updates). knowledge_updates may be empty string. + """ + if not raw_output: + return "", "" + + # Try to extract ... + summary_match = re.search(r'(.*?)', raw_output, re.DOTALL) + if summary_match: + summary = summary_match.group(1).strip() + else: + # Fallback: no tags, use everything before or entire output + knowledge_start = raw_output.find('') + if knowledge_start >= 0: + summary = raw_output[:knowledge_start].strip() + else: + summary = raw_output.strip() + + # Try to extract ... 
+ knowledge_match = re.search(r'(.*?)', raw_output, re.DOTALL) + knowledge_updates = knowledge_match.group(1).strip() if knowledge_match else "" + + return summary, knowledge_updates + + +def list_xml_tag_names(text: str) -> list[str]: + """List unique top-level XML tag names found in text. + + Args: + text: String containing XML-tagged content. + + Returns: + Sorted list of unique tag names. + """ + if not text or not text.strip(): + return [] + names = set() + for m in re.finditer(r'<([\w][\w-]*)[\s>/]', text): + names.add(m.group(1)) + return sorted(names) + + +def merge_context(existing: str, updates: str) -> str: + """Merge tagged XML updates into existing context using three rules: + + 1. **UPDATE**: If a tag in ``updates`` has content and exists in ``existing``, replace it. + 2. **DELETE**: If a tag in ``updates`` is self-closing (``<tag/>``) or empty + (``<tag></tag>``), remove it from ``existing``. + 3. **APPEND**: If a tag in ``updates`` has content and does NOT exist in + ``existing``, append it. + + Args: + existing: The current knowledge string (tagged XML). + updates: New tagged XML with updates, deletions, or additions. + + Returns: + Updated knowledge string after applying all operations. + """ + if not updates or not updates.strip(): + return existing + + result = existing if existing else "" + processed_tags = set() + + # 1. Self-closing tags: <tag/> or <tag /> -> DELETE + for m in re.finditer(r'<([\w][\w-]*)\s*/>', updates): + tag = m.group(1) + if tag in processed_tags: + continue + processed_tags.add(tag) + result = re.sub(rf'<{re.escape(tag)}>.*?</{re.escape(tag)}>\s*', '', result, flags=re.DOTALL) + result = re.sub(rf'<{re.escape(tag)}\s*/>\s*', '', result, flags=re.DOTALL) + + # 2.
Content tags: <tag>content</tag> + for m in re.finditer(r'<([\w][\w-]*)>(.*?)</\1>', updates, re.DOTALL): + tag = m.group(1) + content = m.group(2) + if tag in processed_tags: + continue + processed_tags.add(tag) + full_tag = f'<{tag}>{content}</{tag}>' + if content.strip() == '': + # Empty content -> DELETE + result = re.sub(rf'<{re.escape(tag)}>.*?</{re.escape(tag)}>\s*', '', result, flags=re.DOTALL) + elif re.search(rf'<{re.escape(tag)}>.*?</{re.escape(tag)}>', result, re.DOTALL): + # Tag exists -> UPDATE + result = re.sub(rf'<{re.escape(tag)}>.*?</{re.escape(tag)}>', full_tag, result, flags=re.DOTALL) + else: + # Tag doesn't exist -> APPEND + result = result.rstrip() + '\n' + full_tag + '\n' + + return result + + +def create_knowledge_extraction_prompt( + compressed_steps: list[CompressedHistoryStep], + existing_tag_names: list[str] | None = None, + post_steps: list[MemoryStep] | None = None, +) -> str: + """Create a prompt for extracting knowledge from compressed summaries. + + Instead of rewriting the full knowledge, this prompt asks the LLM to produce + a tagged XML diff: updates to existing sections, new sections, or deletions. + + Args: + compressed_steps: List of CompressedHistoryStep instances to extract knowledge from. + existing_tag_names: List of tag names currently in the knowledge store. + post_steps: Steps occurring after the compressed batch. Passed as read-only context + so the LLM avoids writing knowledge that has already been superseded. + + Returns: + The prompt string for the knowledge extraction LLM call.
+ """ + summaries = [] + for i, step in enumerate(compressed_steps, 1): + summaries.append( + f"Summary {i} (covering {step.original_step_count} steps, " + f"step numbers: {step.compressed_step_numbers}):\n{step.summary}" + ) + summaries_text = "\n\n".join(summaries) + total_steps = sum(step.original_step_count for step in compressed_steps) + + if existing_tag_names: + tag_list = ", ".join(existing_tag_names) + existing_section = f""" +Existing knowledge sections: {tag_list} + +Rules: +- To UPDATE an existing section, use the same tag name with new content +- To DELETE a section that is no longer relevant, use an empty self-closing tag: +- To ADD new information, use a new descriptive tag name +- Only output sections that are new, changed, or should be deleted +- Do NOT output sections that have not changed""" + else: + existing_section = """ +There are no existing knowledge sections yet. Create new tagged sections for +the important information found in the summaries below. +Use descriptive tag names (e.g., , , , ).""" + + post_steps_section = _build_post_steps_section(post_steps) + + return f"""Hello super-intelligence! +This task is involved in your context compression. +Please extract key knowledge from the following {len(compressed_steps)} summaries +covering {total_steps} total steps of agent execution. +These summaries are about to be removed from the context. Therefore, updating the knowledge +with any relevant information is important. In the case that you spot any other error in +the knowledge, you can fix as you see fit. + +Output the knowledge as XML-tagged sections. Each section should contain concise, +factual information that would be useful for continuing the task. +{existing_section} + +{COMMON_COMPRESSION_INSTRUCTIONS} +{post_steps_section} + +{summaries_text} + + +KNOWLEDGE UPDATE:""" + + class ContextCompressor: """Manages context compression for agent memory. 
@@ -370,25 +698,27 @@ def should_compress(self, steps: list[MemoryStep]) -> bool: return False - def compress(self, steps: list[MemoryStep]) -> list[MemoryStep]: + def compress(self, steps: list[MemoryStep], knowledge: str = "") -> tuple[list[MemoryStep], str]: """Compress older steps while preserving recent and critical steps. This method: 1. Identifies which steps must be preserved (TaskStep, errors, etc.) 2. Keeps the most recent N compressible steps in full detail 3. Compresses remaining old steps into a summary via LLM - 4. Returns a new list with compressed history + 4. Optionally extracts knowledge updates from the same LLM call + 5. Returns a new step list and updated knowledge Args: steps: Current list of memory steps. + knowledge: Current knowledge store content (tagged XML). Returns: - New list with older steps compressed into a CompressedHistoryStep. + Tuple of (new_steps, updated_knowledge). """ start_time = time.time() if not self.should_compress(steps): - return steps + return steps, knowledge # Separate preserved steps and compressible steps preserved_indices = set() @@ -411,7 +741,7 @@ def compress(self, steps: list[MemoryStep]) -> list[MemoryStep]: compressible_indices = [i for i in range(len(steps)) if i not in preserved_indices] if len(compressible_indices) <= self.config.keep_recent_steps: - return steps # Nothing to compress + return steps, knowledge # Nothing to compress # Steps to keep in full detail (most recent compressible ones) recent_to_keep = set(compressible_indices[-self.config.keep_recent_steps :]) @@ -420,7 +750,7 @@ def compress(self, steps: list[MemoryStep]) -> list[MemoryStep]: to_compress_indices = [i for i in compressible_indices if i not in recent_to_keep] if not to_compress_indices: - return steps + return steps, knowledge steps_to_compress = [steps[i] for i in to_compress_indices] @@ -443,10 +773,24 @@ def compress(self, steps: list[MemoryStep]) -> list[MemoryStep]: ) else: logger.info(f"Compression skipped: content 
({original_chars} chars) < min_compression_chars ({self.config.min_compression_chars})") - return steps + return steps, knowledge + + # Collect existing compressed history for deduplication + existing_summaries = [s for s in steps if isinstance(s, CompressedHistoryStep)] + + # Steps occurring AFTER the batch being compressed (kept in full detail). + # Pass these to the compressor so it can see what is still current and + # avoid writing knowledge that was already superseded by later steps. + max_to_compress_index = max(to_compress_indices) + post_steps = [ + steps[i] for i in range(max_to_compress_index + 1, len(steps)) + if not isinstance(steps[i], (TaskStep, CompressedHistoryStep)) + ] - # Generate summary using LLM - compression_prompt = create_compression_prompt(steps_to_compress) + # Generate summary using LLM (history + knowledge aware) + compression_prompt = create_compression_prompt( + steps_to_compress, knowledge, existing_summaries, post_steps=post_steps + ) try: summary_message = self.compression_model.generate( @@ -457,14 +801,15 @@ def compress(self, steps: list[MemoryStep]) -> list[MemoryStep]: ) ] ) - summary = summary_message.content - if isinstance(summary, list): - summary = " ".join(item.get("text", "") for item in summary if isinstance(item, dict)) + raw_output = summary_message.content + if isinstance(raw_output, list): + raw_output = " ".join(item.get("text", "") for item in raw_output if isinstance(item, dict)) + summary, knowledge_updates = parse_compression_output(raw_output) compression_token_usage = summary_message.token_usage except Exception as e: logger.warning(f"Compression failed, keeping original steps: {e}") - return steps + return steps, knowledge # Safety check: skip compression if summary is larger than original summary_chars = len(summary) if summary else 0 @@ -477,7 +822,7 @@ def compress(self, steps: list[MemoryStep]) -> list[MemoryStep]: ) else: logger.info(f"Compression skipped: summary ({summary_chars} chars) >= original 
({original_chars} chars)") - return steps + return steps, knowledge # Build compressed step compressed_step_numbers = [] @@ -533,7 +878,21 @@ def compress(self, steps: list[MemoryStep]) -> list[MemoryStep]: f"(kept {len(new_steps)} steps total, compression #{self._compression_count})" ) - return new_steps + # Apply knowledge updates if any were extracted + updated_knowledge = knowledge + if knowledge_updates: + updated_knowledge = merge_context(knowledge, knowledge_updates) + knowledge_chars = len(updated_knowledge) if updated_knowledge else 0 + if self.agent_logger: + tag_names = list_xml_tag_names(updated_knowledge) + self.agent_logger.log_markdown( + content=f"Knowledge updated during compression. " + f"Store: {knowledge_chars:,} chars, sections: {tag_names}.", + title="Knowledge Update (Phase 1)", + level=LogLevel.INFO, + ) + + return new_steps, updated_knowledge def should_merge_compressed(self, steps: list[MemoryStep]) -> bool: """Check if compressed steps should be merged. @@ -561,30 +920,30 @@ def should_merge_compressed(self, steps: list[MemoryStep]) -> bool: mergeable_count = compressed_count - self.config.keep_compressed_steps return mergeable_count >= 2 - def merge_compressed(self, steps: list[MemoryStep]) -> list[MemoryStep]: - """Merge multiple CompressedHistoryStep instances into one. + def merge_compressed(self, steps: list[MemoryStep], knowledge: str = "") -> tuple[list[MemoryStep], str]: + """Merge older compressed history steps by extracting knowledge. - Collects all compressed history steps, generates a consolidated summary - via LLM, and replaces them with a single merged CompressedHistoryStep. - If keep_compressed_steps is set, the most recent N compressed steps are - preserved and only older ones are merged. + Instead of rewriting all compressed summaries into a single prose summary, + this method extracts tagged XML knowledge from the older compressed steps + and merges it into the existing knowledge store using ``merge_context()``. 
+ The merged compressed steps are then removed from the step list. - Note: This is a lossy operation (summary of summaries). Information - fidelity decreases with each merge cycle. + If keep_compressed_steps is set, the most recent N compressed steps are + preserved and only older ones are processed. Args: steps: Current list of memory steps. + knowledge: Current knowledge string (tagged XML). Returns: - New list with older compressed history steps merged into one, - preserving the most recent keep_compressed_steps instances. + Tuple of (new_steps, updated_knowledge). """ start_time = time.time() compressed_steps = [step for step in steps if isinstance(step, CompressedHistoryStep)] if len(compressed_steps) <= 1: - return steps + return steps, knowledge # Determine which compressed steps to keep vs merge keep_count = self.config.keep_compressed_steps @@ -598,7 +957,7 @@ def merge_compressed(self, steps: list[MemoryStep]) -> list[MemoryStep]: steps_to_merge = compressed_steps if len(steps_to_merge) <= 1: - return steps # Not enough to merge + return steps, knowledge # Not enough to merge # Skip merge if combined content is too small to be worth an LLM call pre_merge_chars = sum(len(step.summary) for step in steps_to_merge) @@ -611,10 +970,18 @@ def merge_compressed(self, steps: list[MemoryStep]) -> list[MemoryStep]: ) else: logger.info(f"Merge skipped: combined summaries ({pre_merge_chars} chars) < min_compression_chars ({self.config.min_compression_chars})") - return steps - - # Build merge prompt and call LLM - merge_prompt = create_merge_prompt(steps_to_merge) + return steps, knowledge + + # Build knowledge extraction prompt and call LLM + # post_steps: everything NOT being merged (kept compressed + live recent steps) + # so the extractor knows what is still current vs already superseded + merge_set_ids = set(id(s) for s in steps_to_merge) + post_steps = [ + s for s in steps + if id(s) not in merge_set_ids and not isinstance(s, (TaskStep, SystemPromptStep)) + ] + 
existing_tag_names = list_xml_tag_names(knowledge) + merge_prompt = create_knowledge_extraction_prompt(steps_to_merge, existing_tag_names, post_steps) try: merge_message = self.compression_model.generate( @@ -625,85 +992,50 @@ def merge_compressed(self, steps: list[MemoryStep]) -> list[MemoryStep]: ) ] ) - merged_summary = merge_message.content - if isinstance(merged_summary, list): - merged_summary = " ".join( - item.get("text", "") for item in merged_summary if isinstance(item, dict) + knowledge_updates = merge_message.content + if isinstance(knowledge_updates, list): + knowledge_updates = " ".join( + item.get("text", "") for item in knowledge_updates if isinstance(item, dict) ) - - merge_token_usage = merge_message.token_usage except Exception as e: - logger.warning(f"Compressed step merge failed, keeping original steps: {e}") - return steps + logger.warning(f"Knowledge extraction failed, keeping original steps: {e}") + return steps, knowledge - # Safety check: skip merge if consolidated summary is larger than combined originals + # Apply the tagged XML diff to the knowledge store original_chars = sum(len(step.summary) for step in steps_to_merge) - merged_chars = len(merged_summary) if merged_summary else 0 - if merged_chars >= original_chars: - if self.agent_logger: - self.agent_logger.log_markdown( - content=f"Merge skipped: consolidated summary ({merged_chars:,} chars) " - f"is larger than combined originals ({original_chars:,} chars)", - title="Compressed Step Merge Skipped", - level=LogLevel.INFO, - ) - else: - logger.info( - f"Merge skipped: consolidated summary ({merged_chars} chars) " - f">= combined originals ({original_chars} chars)" - ) - return steps - - # Accumulate metadata from merged compressed steps - all_step_numbers = [] - total_original_count = 0 - for step in steps_to_merge: - all_step_numbers.extend(step.compressed_step_numbers) - total_original_count += step.original_step_count - - merged_step = CompressedHistoryStep( - 
summary=merged_summary, - compressed_step_numbers=sorted(set(all_step_numbers)), - original_step_count=total_original_count, - timing=Timing(start_time=start_time, end_time=time.time()), - compression_token_usage=merge_token_usage, - ) + updated_knowledge = merge_context(knowledge, knowledge_updates) if knowledge_updates else knowledge + + # Accumulate metadata + total_original_count = sum(step.original_step_count for step in steps_to_merge) - # Rebuild steps: replace merged CompressedHistoryStep instances with the single - # merged one, while preserving kept compressed steps in their original positions + # Remove merged compressed steps from the step list merge_set = set(id(step) for step in steps_to_merge) - new_steps = [] - merged_inserted = False - - for step in steps: - if isinstance(step, CompressedHistoryStep) and id(step) in merge_set: - if not merged_inserted: - new_steps.append(merged_step) - merged_inserted = True - # Skip subsequent merged compressed steps (replaced by merged) - else: - new_steps.append(step) + new_steps = [step for step in steps if not (isinstance(step, CompressedHistoryStep) and id(step) in merge_set)] # Log kept_count = len(compressed_steps) - len(steps_to_merge) + knowledge_chars = len(updated_knowledge) if updated_knowledge else 0 + elapsed = time.time() - start_time if self.agent_logger: - compression_ratio = (1 - merged_chars / original_chars) * 100 if original_chars > 0 else 0 kept_msg = f" Kept {kept_count} recent compressed steps." 
if kept_count > 0 else "" + tag_names = list_xml_tag_names(updated_knowledge) self.agent_logger.log_markdown( - content=f"Merged {len(steps_to_merge)} compressed steps " - f"({total_original_count} original steps) from {original_chars:,} chars " - f"to {merged_chars:,} chars ({compression_ratio:.1f}% reduction).{kept_msg}", - title="Compressed Step Merge", + content=f"Extracted knowledge from {len(steps_to_merge)} compressed steps " + f"({total_original_count} original steps, {original_chars:,} chars). " + f"Knowledge store: {knowledge_chars:,} chars, sections: {tag_names}. " + f"Elapsed: {elapsed:.1f}s.{kept_msg}", + title="Knowledge Extraction", level=LogLevel.INFO, ) else: kept_msg = f", kept {kept_count} recent" if kept_count > 0 else "" logger.info( - f"Merged {len(steps_to_merge)} compressed steps " - f"({total_original_count} original steps) into single summary{kept_msg}" + f"Extracted knowledge from {len(steps_to_merge)} compressed steps " + f"({total_original_count} original steps) into knowledge store " + f"({knowledge_chars} chars){kept_msg}" ) - return new_steps + return new_steps, updated_knowledge def create_compression_callback(compressor: ContextCompressor) -> Callable: @@ -727,9 +1059,11 @@ def compression_callback(step: MemoryStep, agent: "MultiStepAgent") -> None: return if compressor.should_compress(agent.memory.steps): - agent.memory.steps = compressor.compress(agent.memory.steps) + agent.memory.steps, agent.memory.knowledge = compressor.compress(agent.memory.steps, agent.memory.knowledge) if compressor.should_merge_compressed(agent.memory.steps): - agent.memory.steps = compressor.merge_compressed(agent.memory.steps) + agent.memory.steps, agent.memory.knowledge = compressor.merge_compressed( + agent.memory.steps, agent.memory.knowledge + ) return compression_callback diff --git a/src/smolagents/bp_session.py b/src/smolagents/bp_session.py index c55d0e8d2..17c4d69a0 100644 --- a/src/smolagents/bp_session.py +++ b/src/smolagents/bp_session.py @@ 
-289,6 +289,7 @@ def save_session_to_dict(agent, session_stats: dict) -> dict: "system_prompt": agent.memory.system_prompt.system_prompt, "next_actionstep_id": agent._next_actionstep_id, "last_plan_step": agent._last_plan_step, + "knowledge": getattr(agent.memory, "knowledge", ""), }, "session_stats": dict(session_stats), "monitor_state": { @@ -322,6 +323,8 @@ def load_session_from_dict(payload: dict, agent) -> dict: agent_state = payload.get("agent_state", {}) agent.memory.system_prompt = SystemPromptStep(system_prompt=agent_state.get("system_prompt", "")) agent.memory.steps = [deserialize_step(s) for s in payload.get("steps", [])] + if hasattr(agent.memory, "knowledge"): + agent.memory.knowledge = agent_state.get("knowledge", "") # Restore agent counters agent._next_actionstep_id = agent_state.get("next_actionstep_id", 1) @@ -360,6 +363,7 @@ def save_session(filepath: str, agent, session_stats: dict) -> int: "system_prompt": agent.memory.system_prompt.system_prompt, "next_actionstep_id": agent._next_actionstep_id, "last_plan_step": agent._last_plan_step, + "knowledge": getattr(agent.memory, "knowledge", ""), }, "session_stats": dict(session_stats), "monitor_state": { @@ -399,6 +403,8 @@ def load_session(filepath: str, agent) -> dict: agent_state = payload.get("agent_state", {}) agent.memory.system_prompt = SystemPromptStep(system_prompt=agent_state.get("system_prompt", "")) agent.memory.steps = [deserialize_step(s) for s in payload.get("steps", [])] + if hasattr(agent.memory, "knowledge"): + agent.memory.knowledge = agent_state.get("knowledge", "") # Restore agent counters agent._next_actionstep_id = agent_state.get("next_actionstep_id", 1) diff --git a/src/smolagents/bp_tools.py b/src/smolagents/bp_tools.py index 75eb9cee3..b52d3de2d 100644 --- a/src/smolagents/bp_tools.py +++ b/src/smolagents/bp_tools.py @@ -2541,6 +2541,7 @@ class PlanningTool(Tool): """ name = "plan" + should_add_tool_description_into_system_prompt = True description = ( "Call this tool 
whenever you need help to create or update your plan. " "Use it when starting a complex task, when your current approach is failing, " @@ -2647,6 +2648,7 @@ class MoveActionStepToMemory(Tool): """ name = "move_actionstep_to_memory" + should_add_tool_description_into_system_prompt = True description = ( "Move content from a specific ActionStep out of the active context into memory. " "This reduces context size while preserving the original content for later retrieval. " @@ -2726,6 +2728,7 @@ class RetrieveActionStepFromMemory(Tool): """ name = "move_actionstep_from_memory" + should_add_tool_description_into_system_prompt = True description = ( "Restore content that was previously moved to memory back into the active context. " "Use this when you need to re-examine a step's response or model_output that was archived. " @@ -2802,6 +2805,7 @@ class SummarizeActionStep(Tool): """ name = "summarize_actionstep" + should_add_tool_description_into_system_prompt = True description = ( "Summarize content from a specific ActionStep using custom instructions. " "This replaces the content with an LLM-generated summary while archiving the original for later retrieval. " @@ -2993,6 +2997,66 @@ def load_image_callback(memory_step, agent=None): ) +class UpdateKnowledge(Tool): + """A tool that allows the agent to update its persistent knowledge store. + + The knowledge store is a tagged XML string that survives compression cycles + and is injected into the agent's context near the end of each turn. + Updates use a tag-based merge: existing tags are updated, empty/self-closing + tags are deleted, and new tags are appended. + + Must be bound to an agent via ``set_agent`` before use. + """ + + name = "update_knowledge" + should_add_tool_description_into_system_prompt = True + description = ( + "Update your persistent knowledge store with tagged XML sections.\n\n" + "You have a section in your context containing your long-term notes. 
" + "It survives context compression and is always visible to you.\n\n" + "Usage: update_knowledge(updates='content')\n\n" + "Rules:\n" + "- To ADD a new section: use a new descriptive tag name\n" + "- To UPDATE an existing section: use the same tag name with new content\n" + "- To DELETE a section no longer relevant: use a self-closing tag \n" + "- Tag names are free: use descriptive names like , , , \n" + "- Only include sections you want to change\n\n" + "Example:\n" + " update_knowledge('1. Setup done\\n2. Now implementing API')\n\n" + "Use this to note important discoveries, track your plan, or remove stale information." + ) + inputs = { + "updates": { + "type": "string", + "description": "Tagged XML with sections to add, update, or delete. " + "Use content to add/update, to delete.", + }, + } + output_type = "string" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + self._agent = None + + def set_agent(self, agent): + """Bind this tool to an agent so it can access the knowledge store.""" + self._agent = agent + + def forward(self, updates: str) -> str: + if self._agent is None: + return "Error: UpdateKnowledge is not bound to an agent. Call set_agent() first." + + if not updates or not updates.strip(): + return "Error: No updates provided." + + from smolagents.bp_compression import merge_context, list_xml_tag_names + + old_knowledge = self._agent.memory.knowledge or "" + self._agent.memory.knowledge = merge_context(old_knowledge, updates) + tag_names = list_xml_tag_names(self._agent.memory.knowledge) + return f"Knowledge updated. 
Current sections: {tag_names}" + + class GetToolDescriptionsTool(Tool): """Tool that returns full descriptions for specified tools, enabling compact tool listings in the system prompt.""" diff --git a/src/smolagents/memory.py b/src/smolagents/memory.py index 091b318c4..dfa039ee7 100644 --- a/src/smolagents/memory.py +++ b/src/smolagents/memory.py @@ -245,10 +245,12 @@ class AgentMemory: def __init__(self, system_prompt: str): self.system_prompt: SystemPromptStep = SystemPromptStep(system_prompt=system_prompt) self.steps: list[TaskStep | ActionStep | PlanningStep] = [] + self.knowledge: str = "" def reset(self): """Reset the agent's memory, clearing all steps and keeping the system prompt.""" self.steps = [] + self.knowledge = "" def get_succinct_steps(self) -> list[dict]: """Return a succinct representation of the agent's steps, excluding model input messages.""" diff --git a/src/smolagents/monitoring.py b/src/smolagents/monitoring.py index 3e54ad0a6..8e5289103 100644 --- a/src/smolagents/monitoring.py +++ b/src/smolagents/monitoring.py @@ -79,10 +79,11 @@ def __repr__(self) -> str: class Monitor: - def __init__(self, tracked_model, logger): + def __init__(self, tracked_model, logger, memory=None): self.step_durations = [] self.tracked_model = tracked_model self.logger = logger + self.memory = memory self.total_input_token_count = 0 self.total_output_token_count = 0 @@ -121,6 +122,9 @@ def update_metrics(self, step_log): step_log.context_chars = ctx_chars console_outputs += f"| Context: {ctx_chars:,} chars" + knowledge = getattr(self.memory, "knowledge", "") if self.memory else "" + if knowledge: + console_outputs += f"| Knowledge: {len(knowledge):,} chars" console_outputs += "]" self.logger.log(Text(console_outputs, style="dim"), level=1) diff --git a/src/smolagents/prompts/code_agent.yaml b/src/smolagents/prompts/code_agent.yaml index 4d6c1e85c..b64c6f13f 100644 --- a/src/smolagents/prompts/code_agent.yaml +++ b/src/smolagents/prompts/code_agent.yaml @@ -112,6 
+112,10 @@ system_prompt: |- {{ tool.to_compact_prompt() }} {%- endfor %} ``` + {%- for tool in tools.values() if tool.should_add_tool_description_into_system_prompt %} + + {{ tool.name }}: {{ tool.description }} + {%- endfor %} {%- if managed_agents and managed_agents.values() | list %} You can also give tasks to team members. @@ -173,7 +177,12 @@ system_prompt: |- 22. If starting a completely new task unrelated to the previous, using summarization/memory or similar tools is a must. 23. Before you start coding, please search in the existing code for similar functions to those that you intend to implement. Avoid creating replicated code. + 24. If you have a tool to update knowledge, you must keep the knowledge updated. + In the case that you need a note pad, you can use the knowledge tool to store your notes inside . + You can also use the knowledge tool to keep your task list status inside . + These are ideas only. You can use the knowledge tool at your discretion. If you do use, you must keep the knowledge updated. + The `final_answer` tools ends the chat. Any final output that you would like to give such as "my name is Assistant" should be done via a python code block with final_answer("my name is Assistant"). This is an example of python calling code with "this is the final answer" as final answer: @@ -214,8 +223,8 @@ system_prompt: |- {{custom_instructions}} {%- endif %} - When the user asks me to run something with , it means that - the user wants me to respond with the ... string so the commands + When the user asks to run something with , it means that + the user wants a response with the ... string so the commands will run in his device. If you try to run or at your end, you will fail. But, when you respond with the and tags (text), these tags will be run/saved in the user's device. @@ -230,7 +239,7 @@ system_prompt: |- at each step, consider if you should summarize or move the previous step to memory, keeping only the relevant information. 
It is common to have long execution outputs bloated with warnings and irrelevant information. Each step's response is tagged with step="N" (e.g. ) — use that number as the actionstep_id when calling these tools. - If you prefer, you can just move the previous step (or any previous step) to memory and write the relevant information + If you prefer, you can just move the previous step (or any other previous step) to memory and write the relevant information in your current section. When coding, try to make the smallest possible code to achieve the goal keeping good code quality. @@ -274,6 +283,8 @@ planning: Try to make the smallest possible plan to achieve the goal keeping good outcome quality. If summarization/memory or similar tools are available, recommend to use these tools (or even add as tasks) before starting new major steps if applicable. If starting a completely new task unrelated to the previous, using summarization/memory or similar tools is a must. + If the agent has access to a knowledge tool, consider recommending the agent to store and keep updated the task list via the knowledge tool. + update_plan_pre_messages: |- Create a simple and doable plan towards solving a task. diff --git a/src/smolagents/prompts/structured_code_agent.yaml b/src/smolagents/prompts/structured_code_agent.yaml index 6d77bd8ba..1a66b624a 100644 --- a/src/smolagents/prompts/structured_code_agent.yaml +++ b/src/smolagents/prompts/structured_code_agent.yaml @@ -81,6 +81,10 @@ system_prompt: |- {{ tool.to_compact_prompt() }} {%- endfor %} ``` + {%- for tool in tools.values() if tool.should_add_tool_description_into_system_prompt %} + + {{ tool.name }}: {{ tool.description }} + {%- endfor %} {%- if managed_agents and managed_agents.values() | list %} You can also give tasks to team members. 
diff --git a/src/smolagents/prompts/toolcalling_agent.yaml b/src/smolagents/prompts/toolcalling_agent.yaml index be3162571..82e24bb0d 100644 --- a/src/smolagents/prompts/toolcalling_agent.yaml +++ b/src/smolagents/prompts/toolcalling_agent.yaml @@ -93,6 +93,10 @@ system_prompt: |- {%- for tool in tools.values() %} - {{ tool.to_compact_prompt() }} {%- endfor %} + {%- for tool in tools.values() if tool.should_add_tool_description_into_system_prompt %} + + {{ tool.name }}: {{ tool.description }} + {%- endfor %} {%- if managed_agents and managed_agents.values() | list %} You can also give tasks to team members. diff --git a/src/smolagents/tools.py b/src/smolagents/tools.py index b5dad0a04..c9ad0b036 100644 --- a/src/smolagents/tools.py +++ b/src/smolagents/tools.py @@ -133,6 +133,7 @@ class Tool(BaseTool): inputs: dict[str, dict[str, str | type | bool]] output_type: str output_schema: dict[str, Any] | None = None + should_add_tool_description_into_system_prompt: bool = False def __init__(self, *args, **kwargs): self.is_initialized = False diff --git a/tests/test_bp_session.py b/tests/test_bp_session.py index f5b5e14f9..042b2b94f 100644 --- a/tests/test_bp_session.py +++ b/tests/test_bp_session.py @@ -481,3 +481,55 @@ def test_minimal_action_step(self, tmp_path): assert loaded.observations_images is None assert loaded.token_usage is None assert loaded.is_final_answer is False + + +class TestKnowledgeRoundTrip: + def test_knowledge_saved_and_restored(self, tmp_path): + """Knowledge store should survive save/load roundtrip.""" + filepath = str(tmp_path / "knowledge.json") + agent = FakeAgent() + agent.memory.knowledge = "PostgreSQL 15\nJWT tokens" + stats = {"turns": 5, "total_time": 10.0, "total_input_tokens": 100, "total_output_tokens": 50} + + save_session(filepath, agent, stats) + + agent2 = FakeAgent() + assert agent2.memory.knowledge == "" + load_session(filepath, agent2) + assert agent2.memory.knowledge == "PostgreSQL 15\nJWT tokens" + + def 
test_empty_knowledge_roundtrip(self, tmp_path): + """Empty knowledge should remain empty after roundtrip.""" + filepath = str(tmp_path / "no_knowledge.json") + agent = FakeAgent() + stats = {"turns": 0, "total_time": 0.0, "total_input_tokens": 0, "total_output_tokens": 0} + + save_session(filepath, agent, stats) + agent2 = FakeAgent() + load_session(filepath, agent2) + assert agent2.memory.knowledge == "" + + def test_backward_compatible_load(self, tmp_path): + """Loading a session saved without knowledge field should set knowledge to empty.""" + import json + filepath = str(tmp_path / "old_session.json") + # Simulate old session format without knowledge key + old_payload = { + "version": 1, + "saved_at": "2025-01-01T00:00:00+00:00", + "agent_state": { + "system_prompt": "You are a helpful assistant.", + "next_actionstep_id": 1, + "last_plan_step": 0, + }, + "session_stats": {"turns": 0, "total_time": 0.0, "total_input_tokens": 0, "total_output_tokens": 0}, + "monitor_state": {"total_input_token_count": 0, "total_output_token_count": 0}, + "steps": [], + } + with open(filepath, "w") as f: + json.dump(old_payload, f) + + agent = FakeAgent() + agent.memory.knowledge = "should be cleared" + load_session(filepath, agent) + assert agent.memory.knowledge == "" diff --git a/tests/test_compression.py b/tests/test_compression.py index 997b54bef..59a797a1b 100644 --- a/tests/test_compression.py +++ b/tests/test_compression.py @@ -9,6 +9,7 @@ ContextCompressor, create_compression_callback, create_compression_prompt, + parse_compression_output, create_merge_prompt, estimate_tokens, estimate_step_tokens, @@ -275,9 +276,144 @@ def test_creates_prompt_for_planning_steps(self): ), ] prompt = create_compression_prompt(steps) - assert "Planning step:" in prompt + assert "" in prompt assert "First step" in prompt + def test_creates_prompt_with_knowledge(self): + steps = [ + ActionStep( + step_number=1, + model_input_messages=[], + model_output="Found the config file", + 
observations="config.yaml loaded", + model_output_message=ChatMessage(role=MessageRole.ASSISTANT, content="Found config"), + timing=Timing(start_time=0, end_time=1), + ), + ] + knowledge = "Step 1 done\nREST API" + prompt = create_compression_prompt(steps, knowledge=knowledge) + assert "" in prompt + assert "plan" in prompt + assert "architecture" in prompt + assert "Do NOT repeat" in prompt + assert "" in prompt + + def test_creates_prompt_without_knowledge(self): + steps = [ + ActionStep( + step_number=1, + model_input_messages=[], + model_output="Found the config file", + observations="config.yaml loaded", + model_output_message=ChatMessage(role=MessageRole.ASSISTANT, content="Found config"), + timing=Timing(start_time=0, end_time=1), + ), + ] + prompt = create_compression_prompt(steps, knowledge="") + assert "" not in prompt + # Should still mention knowledge_updates as optional output + assert "" in prompt + + def test_creates_prompt_with_empty_knowledge(self): + steps = [ + ActionStep( + step_number=1, + model_input_messages=[], + model_output="Found the config file", + observations="config.yaml loaded", + model_output_message=ChatMessage(role=MessageRole.ASSISTANT, content="Found config"), + timing=Timing(start_time=0, end_time=1), + ), + ] + prompt = create_compression_prompt(steps, knowledge=" ") + assert "" not in prompt + + + + def test_creates_prompt_with_existing_summaries(self): + steps = [ + ActionStep( + step_number=5, + model_input_messages=[], + model_output="Implemented API endpoint", + observations="Tests pass", + model_output_message=ChatMessage(role=MessageRole.ASSISTANT, content="Done"), + timing=Timing(start_time=0, end_time=1), + ), + ] + summaries = [ + CompressedHistoryStep( + summary="Set up database and created schema.", + compressed_step_numbers=[1, 2], + original_step_count=2, + ), + ] + prompt = create_compression_prompt(steps, knowledge="", existing_summaries=summaries) + assert "" in prompt + assert "Set up database" in prompt + 
assert "Do NOT repeat" in prompt + + def test_creates_prompt_with_both_history_and_knowledge(self): + steps = [ + ActionStep( + step_number=5, + model_input_messages=[], + model_output="Fixed the bug", + observations="All tests pass", + model_output_message=ChatMessage(role=MessageRole.ASSISTANT, content="Done"), + timing=Timing(start_time=0, end_time=1), + ), + ] + summaries = [ + CompressedHistoryStep( + summary="Explored codebase and found the bug.", + compressed_step_numbers=[1, 2, 3], + original_step_count=3, + ), + ] + knowledge = "REST API with PostgreSQL" + prompt = create_compression_prompt(steps, knowledge=knowledge, existing_summaries=summaries) + assert "" in prompt + assert "" in prompt + assert "compressed history" in prompt.lower() + assert "knowledge" in prompt.lower() + + + +class TestParseCompressionOutput: + def test_parses_structured_output(self): + raw = "My summary here.\n\nStep 1\n" + summary, updates = parse_compression_output(raw) + assert summary == "My summary here." + assert "Step 1" in updates + + def test_parses_summary_only(self): + raw = "Just a summary." + summary, updates = parse_compression_output(raw) + assert summary == "Just a summary." + assert updates == "" + + def test_fallback_no_tags(self): + raw = "Plain text summary without any tags." + summary, updates = parse_compression_output(raw) + assert summary == "Plain text summary without any tags." 
+ assert updates == "" + + def test_fallback_no_summary_tag_with_knowledge(self): + raw = "Some summary text\n\nDo X\n" + summary, updates = parse_compression_output(raw) + assert summary == "Some summary text" + assert "Do X" in updates + + def test_empty_input(self): + summary, updates = parse_compression_output("") + assert summary == "" + assert updates == "" + + def test_none_input(self): + summary, updates = parse_compression_output(None) + assert summary == "" + assert updates == "" class TestCreateMergePrompt: def test_creates_prompt_from_compressed_steps(self): @@ -368,15 +504,16 @@ def test_compress_returns_original_when_not_needed(self): config = CompressionConfig(max_uncompressed_steps=20, keep_recent_steps=5) compressor = ContextCompressor(config, MagicMock()) steps = [ActionStep(step_number=i, timing=Timing(start_time=0, end_time=1)) for i in range(5)] - result = compressor.compress(steps) - assert result == steps + new_steps, new_knowledge = compressor.compress(steps) + assert new_steps == steps + assert new_knowledge == "" def test_compress_creates_compressed_step(self): - config = CompressionConfig(max_uncompressed_steps=3, keep_recent_steps=2) + config = CompressionConfig(max_uncompressed_steps=3, keep_recent_steps=2, min_compression_chars=0) mock_model = MagicMock() mock_model.generate.return_value = ChatMessage( role=MessageRole.ASSISTANT, - content="Summary of steps 0-5.", + content="Summary of steps 0-5.", token_usage=TokenUsage(input_tokens=100, output_tokens=50), ) compressor = ContextCompressor(config, mock_model) @@ -392,22 +529,77 @@ def test_compress_creates_compressed_step(self): for i in range(8) ] - result = compressor.compress(steps) + new_steps, new_knowledge = compressor.compress(steps) # Should have compressed step + 2 recent steps - assert len(result) < len(steps) + assert len(new_steps) < len(steps) # First step should be CompressedHistoryStep - assert isinstance(result[0], CompressedHistoryStep) - assert "Summary of steps" in 
result[0].summary + assert isinstance(new_steps[0], CompressedHistoryStep) + assert "Summary of steps" in new_steps[0].summary # Model should have been called mock_model.generate.assert_called_once() + # No knowledge updates in this case + assert new_knowledge == "" + + def test_compress_extracts_knowledge_updates(self): + config = CompressionConfig(max_uncompressed_steps=3, keep_recent_steps=2, min_compression_chars=0) + mock_model = MagicMock() + mock_model.generate.return_value = ChatMessage( + role=MessageRole.ASSISTANT, + content="Summary of work done.\n\nStep 1 complete\n", + token_usage=TokenUsage(input_tokens=100, output_tokens=50), + ) + compressor = ContextCompressor(config, mock_model) + + steps = [ + ActionStep( + step_number=i, + timing=Timing(start_time=0, end_time=1), + model_output=f"Output {i}", + observations=f"Observation {i}", + ) + for i in range(8) + ] + + new_steps, new_knowledge = compressor.compress(steps) + + assert isinstance(new_steps[0], CompressedHistoryStep) + assert "Summary of work done" in new_steps[0].summary + assert "Step 1 complete" in new_knowledge + + def test_compress_fallback_no_summary_tags(self): + """When LLM doesn't use tags, entire output becomes the summary.""" + config = CompressionConfig(max_uncompressed_steps=3, keep_recent_steps=2, min_compression_chars=0) + mock_model = MagicMock() + mock_model.generate.return_value = ChatMessage( + role=MessageRole.ASSISTANT, + content="Plain text summary without tags.", + token_usage=TokenUsage(input_tokens=100, output_tokens=50), + ) + compressor = ContextCompressor(config, mock_model) + + steps = [ + ActionStep( + step_number=i, + timing=Timing(start_time=0, end_time=1), + model_output=f"Output {i}", + observations=f"Observation {i}", + ) + for i in range(8) + ] + + new_steps, new_knowledge = compressor.compress(steps) + + assert isinstance(new_steps[0], CompressedHistoryStep) + assert "Plain text summary without tags" in new_steps[0].summary + assert new_knowledge == "" def 
test_compress_preserves_task_step(self): config = CompressionConfig(max_uncompressed_steps=3, keep_recent_steps=2) mock_model = MagicMock() mock_model.generate.return_value = ChatMessage( role=MessageRole.ASSISTANT, - content="Summary", + content="Summary", token_usage=TokenUsage(input_tokens=100, output_tokens=50), ) compressor = ContextCompressor(config, mock_model) @@ -418,11 +610,11 @@ def test_compress_preserves_task_step(self): for i in range(10) ]) - result = compressor.compress(steps) + new_steps, _ = compressor.compress(steps) # TaskStep should be first - assert isinstance(result[0], TaskStep) - assert result[0].task == "Original task" + assert isinstance(new_steps[0], TaskStep) + assert new_steps[0].task == "Original task" def test_compress_handles_model_failure_gracefully(self): config = CompressionConfig(max_uncompressed_steps=3, keep_recent_steps=2) @@ -435,9 +627,10 @@ def test_compress_handles_model_failure_gracefully(self): for i in range(10) ] - # Should return original steps when compression fails - result = compressor.compress(steps) - assert result == steps + # Should return original steps and knowledge when compression fails + new_steps, new_knowledge = compressor.compress(steps) + assert new_steps == steps + assert new_knowledge == "" def test_should_merge_compressed_false_when_disabled(self): config = CompressionConfig(max_compressed_steps=0) @@ -482,15 +675,16 @@ def test_merge_compressed_returns_original_when_single(self): CompressedHistoryStep(summary="Only one", compressed_step_numbers=[1], original_step_count=1), ActionStep(step_number=5, timing=Timing(start_time=0, end_time=1)), ] - result = compressor.merge_compressed(steps) - assert result == steps + result_steps, result_knowledge = compressor.merge_compressed(steps) + assert result_steps == steps + assert result_knowledge == "" - def test_merge_compressed_creates_merged_step(self): - config = CompressionConfig(max_compressed_steps=2, keep_compressed_steps=0) + def 
test_merge_compressed_extracts_knowledge(self): + config = CompressionConfig(max_compressed_steps=2, keep_compressed_steps=0, min_compression_chars=0) mock_model = MagicMock() mock_model.generate.return_value = ChatMessage( role=MessageRole.ASSISTANT, - content="Consolidated summary.", + content="Useful data found and analyzed.Final output prepared.", token_usage=TokenUsage(input_tokens=200, output_tokens=30), ) compressor = ContextCompressor(config, mock_model) @@ -515,19 +709,16 @@ def test_merge_compressed_creates_merged_step(self): ActionStep(step_number=9, timing=Timing(start_time=0, end_time=1)), ] - result = compressor.merge_compressed(steps) + result_steps, result_knowledge = compressor.merge_compressed(steps) - # Should have: TaskStep + 1 merged CompressedHistoryStep + ActionStep - assert len(result) == 3 - assert isinstance(result[0], TaskStep) - assert isinstance(result[1], CompressedHistoryStep) - assert isinstance(result[2], ActionStep) + # Compressed steps should be removed, leaving TaskStep + ActionStep + assert len(result_steps) == 2 + assert isinstance(result_steps[0], TaskStep) + assert isinstance(result_steps[1], ActionStep) - merged = result[1] - assert merged.summary == "Consolidated summary." 
- assert merged.original_step_count == 8 # 3 + 3 + 2 - assert merged.compressed_step_numbers == [1, 2, 3, 4, 5, 6, 7, 8] - assert merged.compression_token_usage.input_tokens == 200 + # Knowledge should contain the extracted tags + assert "" in result_knowledge + assert "" in result_knowledge mock_model.generate.assert_called_once() @@ -542,27 +733,41 @@ def test_merge_compressed_handles_model_failure(self): CompressedHistoryStep(summary="Summary B", compressed_step_numbers=[2], original_step_count=1), ] - result = compressor.merge_compressed(steps) - assert result == steps + result_steps, result_knowledge = compressor.merge_compressed(steps) + assert result_steps == steps + assert result_knowledge == "" - def test_merge_compressed_skips_when_merged_is_larger(self): - config = CompressionConfig(max_compressed_steps=1, keep_compressed_steps=0) + def test_merge_compressed_updates_existing_knowledge(self): + config = CompressionConfig(max_compressed_steps=1, keep_compressed_steps=0, min_compression_chars=0) mock_model = MagicMock() - # Return a summary that's longer than the combined originals mock_model.generate.return_value = ChatMessage( role=MessageRole.ASSISTANT, - content="This merged summary is intentionally much longer than the originals to trigger the safety check.", + content="All tasks complete", token_usage=TokenUsage(input_tokens=100, output_tokens=50), ) compressor = ContextCompressor(config, mock_model) steps = [ - CompressedHistoryStep(summary="Short A", compressed_step_numbers=[1], original_step_count=1), - CompressedHistoryStep(summary="Short B", compressed_step_numbers=[2], original_step_count=1), + CompressedHistoryStep( + summary="Agent searched for information and found useful data about the topic.", + compressed_step_numbers=[1], + original_step_count=1, + ), + CompressedHistoryStep( + summary="Agent analyzed the data and drew conclusions about the results.", + compressed_step_numbers=[2], + original_step_count=1, + ), ] - result = 
compressor.merge_compressed(steps) - assert result == steps # Should keep originals + existing_knowledge = "PostgreSQL\nIn progress" + result_steps, result_knowledge = compressor.merge_compressed(steps, existing_knowledge) + # Compressed steps removed + assert len(result_steps) == 0 + # Knowledge should have updated status and kept db + assert "PostgreSQL" in result_knowledge + assert "All tasks complete" in result_knowledge + assert "In progress" not in result_knowledge def test_should_merge_compressed_false_when_not_enough_mergeable(self): """With keep_compressed_steps=2 and 3 compressed steps, only 1 is mergeable (need 2).""" @@ -585,11 +790,11 @@ def test_should_merge_compressed_true_with_enough_mergeable(self): assert compressor.should_merge_compressed(steps) is True def test_merge_compressed_keeps_recent_compressed_steps(self): - config = CompressionConfig(max_compressed_steps=2, keep_compressed_steps=1) + config = CompressionConfig(max_compressed_steps=2, keep_compressed_steps=1, min_compression_chars=0) mock_model = MagicMock() mock_model.generate.return_value = ChatMessage( role=MessageRole.ASSISTANT, - content="Consolidated summary.", + content="Useful data about the topic.", token_usage=TokenUsage(input_tokens=200, output_tokens=30), ) compressor = ContextCompressor(config, mock_model) @@ -614,33 +819,29 @@ def test_merge_compressed_keeps_recent_compressed_steps(self): ActionStep(step_number=9, timing=Timing(start_time=0, end_time=1)), ] - result = compressor.merge_compressed(steps) - - # Should have: TaskStep + 1 merged + 1 kept compressed + ActionStep - assert len(result) == 4 - assert isinstance(result[0], TaskStep) - assert isinstance(result[1], CompressedHistoryStep) # merged - assert isinstance(result[2], CompressedHistoryStep) # kept (most recent) - assert isinstance(result[3], ActionStep) + result_steps, result_knowledge = compressor.merge_compressed(steps) - # The merged step should only cover the first 2 compressed steps - merged = result[1] - 
assert merged.summary == "Consolidated summary." - assert merged.original_step_count == 6 # 3 + 3 - assert merged.compressed_step_numbers == [1, 2, 3, 4, 5, 6] + # Merged compressed steps removed, kept one remains: TaskStep + 1 kept compressed + ActionStep + assert len(result_steps) == 3 + assert isinstance(result_steps[0], TaskStep) + assert isinstance(result_steps[1], CompressedHistoryStep) # kept (most recent) + assert isinstance(result_steps[2], ActionStep) # The kept step should be the most recent one (unchanged) - kept = result[2] + kept = result_steps[1] assert kept.summary == "Agent refined the analysis and prepared the final output." assert kept.compressed_step_numbers == [7, 8] + # Knowledge should have the extracted info + assert "" in result_knowledge + def test_merge_compressed_keep_zero_merges_all(self): - """With keep_compressed_steps=0 (default), all compressed steps are merged.""" - config = CompressionConfig(max_compressed_steps=2, keep_compressed_steps=0) + """With keep_compressed_steps=0 (default), all compressed steps are removed and knowledge extracted.""" + config = CompressionConfig(max_compressed_steps=2, keep_compressed_steps=0, min_compression_chars=0) mock_model = MagicMock() mock_model.generate.return_value = ChatMessage( role=MessageRole.ASSISTANT, - content="Consolidated summary.", + content="All data searched, analyzed and output prepared.", token_usage=TokenUsage(input_tokens=200, output_tokens=30), ) compressor = ContextCompressor(config, mock_model) @@ -663,12 +864,12 @@ def test_merge_compressed_keep_zero_merges_all(self): ), ] - result = compressor.merge_compressed(steps) + result_steps, result_knowledge = compressor.merge_compressed(steps) - # All should be merged into one - assert len(result) == 1 - assert isinstance(result[0], CompressedHistoryStep) - assert result[0].original_step_count == 8 # 3 + 3 + 2 + # All compressed steps should be removed + assert len(result_steps) == 0 + # Knowledge should contain the extracted info 
+ assert "" in result_knowledge def test_merge_compressed_keep_too_many_returns_original(self): """If keep_compressed_steps >= len-1, only 1 left to merge, so return original.""" @@ -689,16 +890,18 @@ def test_merge_compressed_keep_too_many_returns_original(self): ), ] - result = compressor.merge_compressed(steps) - assert result == steps # Nothing merged + result_steps, result_knowledge = compressor.merge_compressed(steps) + assert result_steps == steps # Nothing merged + assert result_knowledge == "" mock_model.generate.assert_not_called() - def test_merge_compressed_accumulates_overlapping_step_numbers(self): - config = CompressionConfig(max_compressed_steps=1, keep_compressed_steps=0) + def test_merge_compressed_removes_merged_steps(self): + """Verify that merged compressed steps are removed from the step list.""" + config = CompressionConfig(max_compressed_steps=1, keep_compressed_steps=0, min_compression_chars=0) mock_model = MagicMock() mock_model.generate.return_value = ChatMessage( role=MessageRole.ASSISTANT, - content="Merged.", + content="Important data", token_usage=TokenUsage(input_tokens=50, output_tokens=10), ) compressor = ContextCompressor(config, mock_model) @@ -714,19 +917,20 @@ def test_merge_compressed_accumulates_overlapping_step_numbers(self): compressed_step_numbers=[3, 4, 5], original_step_count=3, ), + ActionStep(step_number=6, timing=Timing(start_time=0, end_time=1)), ] - result = compressor.merge_compressed(steps) - merged = result[0] - # Overlapping step number 3 should be deduplicated - assert merged.compressed_step_numbers == [1, 2, 3, 4, 5] - # Total original count is the sum (not deduplicated) - assert merged.original_step_count == 6 + result_steps, result_knowledge = compressor.merge_compressed(steps) + # Only ActionStep should remain + assert len(result_steps) == 1 + assert isinstance(result_steps[0], ActionStep) + # Knowledge should be populated + assert "Important data" in result_knowledge class TestCreateCompressionCallback: 
def test_callback_triggers_compression(self): - config = CompressionConfig(max_uncompressed_steps=3, keep_recent_steps=2) + config = CompressionConfig(max_uncompressed_steps=3, keep_recent_steps=2, min_compression_chars=0) mock_model = MagicMock() mock_model.generate.return_value = ChatMessage( role=MessageRole.ASSISTANT, @@ -738,6 +942,7 @@ def test_callback_triggers_compression(self): # Create mock agent with memory (include content so original_chars > summary_chars) mock_agent = MagicMock() + mock_agent.memory.knowledge = "" mock_agent.memory.steps = [ ActionStep( step_number=i, @@ -785,12 +990,13 @@ def test_callback_triggers_merge(self): keep_recent_steps=2, max_compressed_steps=1, keep_compressed_steps=0, + min_compression_chars=0, ) mock_model = MagicMock() - # First call: compress, second call: merge + # First call: compress, second call: merge (knowledge extraction) mock_model.generate.return_value = ChatMessage( role=MessageRole.ASSISTANT, - content="Short.", + content="Knowledge from merge.", token_usage=TokenUsage(input_tokens=100, output_tokens=10), ) compressor = ContextCompressor(config, mock_model) @@ -798,6 +1004,7 @@ def test_callback_triggers_merge(self): # Set up agent with multiple compressed steps + action steps that exceed threshold mock_agent = MagicMock() + mock_agent.memory.knowledge = "" mock_agent.memory.steps = [ CompressedHistoryStep( summary="This is a long enough first summary that the merge will save space versus the originals.", @@ -840,3 +1047,101 @@ def test_callback_triggers_merge(self): # The model should have been called (compression and/or merge) assert mock_model.generate.call_count >= 1 + + +class TestMergeContext: + def test_append_new_tag(self): + from smolagents.bp_compression import merge_context + existing = "PostgreSQL" + updates = "JWT tokens" + result = merge_context(existing, updates) + assert "PostgreSQL" in result + assert "JWT tokens" in result + + def test_update_existing_tag(self): + from 
smolagents.bp_compression import merge_context + existing = "PostgreSQL\nIn progress" + updates = "Complete" + result = merge_context(existing, updates) + assert "PostgreSQL" in result + assert "Complete" in result + assert "In progress" not in result + + def test_delete_with_self_closing(self): + from smolagents.bp_compression import merge_context + existing = "PostgreSQL\nSome notes" + updates = "" + result = merge_context(existing, updates) + assert "PostgreSQL" in result + assert "old_notes" not in result + + def test_delete_with_empty_tag(self): + from smolagents.bp_compression import merge_context + existing = "PostgreSQL\nSome notes" + updates = "" + result = merge_context(existing, updates) + assert "PostgreSQL" in result + assert "old_notes" not in result + + def test_mixed_operations(self): + from smolagents.bp_compression import merge_context + existing = "Step 1\nMySQL\nRemove me" + updates = "Step 2JWT" + result = merge_context(existing, updates) + assert "Step 2" in result + assert "JWT" in result + assert "old" not in result.lower() or "MySQL" in result + + def test_empty_existing(self): + from smolagents.bp_compression import merge_context + result = merge_context("", "New data") + assert "New data" in result + + def test_empty_updates(self): + from smolagents.bp_compression import merge_context + result = merge_context("PostgreSQL", "") + assert "PostgreSQL" in result + + +class TestListXmlTagNames: + def test_basic(self): + from smolagents.bp_compression import list_xml_tag_names + text = "PostgreSQL\nJWT\nSteps" + tags = list_xml_tag_names(text) + assert tags == ["auth", "db", "plan"] + + def test_empty(self): + from smolagents.bp_compression import list_xml_tag_names + assert list_xml_tag_names("") == [] + + def test_no_tags(self): + from smolagents.bp_compression import list_xml_tag_names + assert list_xml_tag_names("plain text no tags") == [] + + +class TestCreateKnowledgeExtractionPrompt: + def test_includes_summaries(self): + from 
smolagents.bp_compression import create_knowledge_extraction_prompt, CompressedHistoryStep
+        steps = [
+            CompressedHistoryStep(summary="Found PostgreSQL", compressed_step_numbers=[1], original_step_count=1),
+        ]
+        prompt = create_knowledge_extraction_prompt(steps)
+        assert "Found PostgreSQL" in prompt
+
+    def test_includes_existing_tags(self):
+        from smolagents.bp_compression import create_knowledge_extraction_prompt, CompressedHistoryStep
+        steps = [
+            CompressedHistoryStep(summary="Some info", compressed_step_numbers=[1], original_step_count=1),
+        ]
+        prompt = create_knowledge_extraction_prompt(steps, existing_tag_names=["db", "plan"])
+        assert "db" in prompt
+        assert "plan" in prompt
+
+    def test_without_existing_tags(self):
+        from smolagents.bp_compression import create_knowledge_extraction_prompt, CompressedHistoryStep
+        steps = [
+            CompressedHistoryStep(summary="Info", compressed_step_numbers=[1], original_step_count=1),
+        ]
+        prompt = create_knowledge_extraction_prompt(steps, existing_tag_names=[])
+        assert "no existing" in prompt.lower() or "new" in prompt.lower() or len(prompt) > 0
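
For readers following the `TestMergeContext` and `TestListXmlTagNames` cases, the behaviour they exercise (append a new tag, overwrite an existing one, delete via a self-closing or empty tag, list tag names in sorted order) can be illustrated with a minimal sketch. This is not the code in `smolagents.bp_compression`: the names `parse_tags`, `merge_context_sketch`, and `list_tag_names_sketch` are hypothetical, the regex-based parsing is an assumption, and the tag markup in the example strings is assumed from the test names and assertions rather than taken from the patch.

```python
import re

# Hypothetical sketch, NOT the smolagents.bp_compression implementation.
# Matches either a self-closing <name/> or a paired <name>body</name>.
_TAG_RE = re.compile(r"<(\w+)\s*/>|<(\w+)>(.*?)</\2>", re.DOTALL)


def parse_tags(text: str) -> dict[str, str]:
    """Return {tag_name: body} for top-level tags; self-closing tags map to ''."""
    tags: dict[str, str] = {}
    for match in _TAG_RE.finditer(text or ""):
        self_closing, name, body = match.groups()
        if self_closing:
            tags[self_closing] = ""
        else:
            tags[name] = body.strip()
    return tags


def merge_context_sketch(existing: str, updates: str) -> str:
    """Apply tag updates to an existing tag store (assumed semantics)."""
    merged = parse_tags(existing)
    for name, body in parse_tags(updates).items():
        if body:
            merged[name] = body      # new tag is appended, existing tag is overwritten
        else:
            merged.pop(name, None)   # empty or self-closing tag deletes the entry
    return "\n".join(f"<{name}>{body}</{name}>" for name, body in merged.items())


def list_tag_names_sketch(text: str) -> list[str]:
    """Return the top-level tag names in sorted order."""
    return sorted(parse_tags(text))


if __name__ == "__main__":
    existing = "<db>PostgreSQL</db>\n<plan>In progress</plan>\n<old_notes>Some notes</old_notes>"
    updates = "<plan>Complete</plan><auth>JWT</auth><old_notes/>"
    merged = merge_context_sketch(existing, updates)
    print(merged)
    print(list_tag_names_sketch(merged))
```

Keeping the store as an insertion-ordered dict means an updated tag stays in its original position while new tags go to the end, which is one plausible reading of the assertions above; the real module may order or serialize tags differently.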