Skip to content

Latest commit

 

History

History
1337 lines (1052 loc) · 43.7 KB

File metadata and controls

1337 lines (1052 loc) · 43.7 KB

MCP Context Engineering Research 2025

Research Date: 2025-01-08
Purpose: Up-to-date best practices for context engineering using Model Context Protocol
Sprint: US-RAG-001
Status: Active Research


🎯 Executive Summary

Model Context Protocol (MCP) is Anthropic's standardized framework for enabling AI models to interact with external tools, resources, and context. Modern context engineering with MCP focuses on:

  1. Structured Context Access: Standardized interfaces for context retrieval
  2. Dynamic Context Management: Real-time context assembly and routing
  3. Persistent AI Memory: Maintaining context across sessions
  4. Secure Context Delivery: Defensive configurations and access controls

🏗️ MCP Architecture for Context Engineering

Core Components

┌─────────────────────────────────────────────────────┐
│                  AI Agent/Model                      │
│           (Context-Aware Decision Making)            │
└────────────────┬────────────────────────────────────┘
                 │ MCP Protocol
                 ▼
┌─────────────────────────────────────────────────────┐
│              MCP Context Server                      │
│  ┌──────────┬──────────┬──────────┬──────────┐     │
│  │  Tools   │Resources │ Prompts  │ Sampling │     │
│  └──────────┴──────────┴──────────┴──────────┘     │
└────────────────┬────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────┐
│            Context Engine (RAG)                      │
│  ┌──────────────────────────────────────────┐      │
│  │  Vector Store (FAISS)                     │      │
│  │  Embeddings (HuggingFace)                │      │
│  │  Semantic Search                          │      │
│  │  Pattern Learning                         │      │
│  └──────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────┘

Protocol Layers

1. Protocol Layer

  • Message framing and routing
  • Request/response linking
  • Error handling and recovery
  • High-level communication patterns

2. Transport Layer

  • stdio: For local process communication
  • HTTP + SSE: For client-server communication
  • WebSocket: For real-time bidirectional communication

3. Message Types

  • Requests: Operations requiring responses
  • Results: Successful operation outcomes
  • Errors: Operation failures with details
  • Notifications: One-way informational messages

📋 Context Engineering Best Practices 2025

1. Context Assembly

Dynamic Context Generation

class ContextAssembler:
    """Dynamically assemble context based on agent needs."""
    
    async def assemble_context(self, query: str, agent_type: str) -> Context:
        """
        Assemble context dynamically based on query and agent type.
        
        Best Practices:
        - Query expansion for better retrieval
        - Agent-specific context filtering
        - Relevance scoring and ranking
        - Context deduplication
        """
        # 1. Expand query for better retrieval
        expanded_query = await self.expand_query(query, agent_type)
        
        # 2. Retrieve from multiple sources
        semantic_results = await self.semantic_search(expanded_query)
        pattern_results = await self.pattern_match(query)
        knowledge_results = await self.knowledge_lookup(query)
        
        # 3. Score and rank by relevance
        scored_results = self.score_relevance(
            semantic_results + pattern_results + knowledge_results,
            query, agent_type
        )
        
        # 4. Deduplicate and filter
        filtered_results = self.deduplicate_and_filter(
            scored_results, 
            max_results=5,
            min_relevance=0.7
        )
        
        # 5. Structure for agent consumption
        return Context(
            query=query,
            results=filtered_results,
            metadata=self.extract_metadata(filtered_results),
            timestamp=datetime.now()
        )

2. Context Routing

Intelligent Context Distribution

class ContextRouter:
    """Route context to appropriate agents and tools."""
    
    def route_context(self, context: Context, intent: str) -> Route:
        """
        Route context based on intent and agent capabilities.
        
        Best Practices:
        - Intent detection and classification
        - Agent capability matching
        - Load balancing across agents
        - Fallback routing strategies
        """
        # 1. Detect intent
        intent_classification = self.classify_intent(intent)
        
        # 2. Match to agent capabilities
        capable_agents = self.find_capable_agents(
            intent_classification,
            context.metadata
        )
        
        # 3. Select optimal agent
        selected_agent = self.select_agent(
            capable_agents,
            load_balance=True,
            performance_history=True
        )
        
        # 4. Prepare context for agent
        agent_context = self.adapt_context_for_agent(
            context,
            selected_agent.requirements
        )
        
        return Route(
            agent=selected_agent,
            context=agent_context,
            fallback_agents=capable_agents[1:]
        )

3. Persistent Context

Maintaining Context Across Sessions

class PersistentContextManager:
    """Manage persistent context across agent sessions."""
    
    async def maintain_context(
        self, 
        session_id: str, 
        new_context: Context
    ) -> PersistentContext:
        """
        Maintain context across interactions.
        
        Best Practices:
        - Session-based context storage
        - Context decay and relevance updates
        - Memory compression for efficiency
        - Privacy-aware context retention
        """
        # 1. Retrieve existing session context
        existing_context = await self.get_session_context(session_id)
        
        # 2. Merge with new context
        merged_context = self.merge_contexts(
            existing_context,
            new_context,
            decay_factor=0.9
        )
        
        # 3. Compress if needed
        if len(merged_context) > self.max_context_size:
            merged_context = self.compress_context(
                merged_context,
                preserve_recent=True,
                preserve_important=True
            )
        
        # 4. Store updated context
        await self.store_session_context(session_id, merged_context)
        
        return merged_context

4. Context Quality Control

Ensuring High-Quality Context

class ContextQualityController:
    """Ensure high-quality context for agents."""
    
    def validate_context_quality(self, context: Context) -> QualityReport:
        """
        Validate context quality before delivery.
        
        Best Practices:
        - Relevance verification
        - Accuracy validation
        - Completeness checking
        - Freshness assessment
        - Source credibility validation
        """
        quality = QualityReport()
        
        # 1. Relevance check
        quality.relevance_score = self.assess_relevance(context)
        
        # 2. Accuracy validation
        quality.accuracy_score = self.validate_accuracy(context)
        
        # 3. Completeness check
        quality.completeness_score = self.check_completeness(context)
        
        # 4. Freshness assessment
        quality.freshness_score = self.assess_freshness(context)
        
        # 5. Overall quality score
        quality.overall_score = self.calculate_quality_score(quality)
        
        # 6. Improvement recommendations
        if quality.overall_score < self.quality_threshold:
            quality.recommendations = self.generate_improvements(quality)
        
        return quality

🔒 Security Best Practices

Defensive MCP Configuration

class SecureMCPServer:
    """Secure MCP server configuration."""
    
    def __init__(self):
        # 1. Strict Permissions
        self.permissions = {
            'read': ['trusted_agents', 'admin'],
            'write': ['admin'],
            'execute': ['trusted_agents']
        }
        
        # 2. Appropriate Timeouts
        self.timeouts = {
            'context_retrieval': 5.0,  # 5 seconds
            'tool_execution': 30.0,     # 30 seconds
            'resource_access': 10.0     # 10 seconds
        }
        
        # 3. Audit Logging
        self.audit_config = {
            'log_all_requests': True,
            'log_all_responses': True,
            'log_errors': True,
            'log_security_events': True,
            'retention_days': 90
        }
        
        # 4. Access Control
        self.access_control = {
            'require_authentication': True,
            'require_authorization': True,
            'allow_anonymous': False,
            'rate_limiting': {
                'requests_per_minute': 60,
                'requests_per_hour': 1000
            }
        }

Web Access Guardrails

class WebAccessGuardrails:
    """Limit AI access to trusted sources."""
    
    TRUSTED_DOMAINS = [
        'github.com',
        'docs.python.org',
        'developer.mozilla.org',
        'stackoverflow.com'
    ]
    
    BLOCKED_PATTERNS = [
        r'.*malware.*',
        r'.*phishing.*',
        r'.*untrusted.*'
    ]
    
    def validate_url(self, url: str) -> bool:
        """Validate URL against trusted sources."""
        domain = self.extract_domain(url)
        
        # Check trusted domains
        if not any(trusted in domain for trusted in self.TRUSTED_DOMAINS):
            return False
        
        # Check blocked patterns
        if any(re.match(pattern, url) for pattern in self.BLOCKED_PATTERNS):
            return False
        
        return True

🎨 Integration Patterns

Pattern 1: Context-Aware Agent with MCP

class MCPContextAwareAgent(ContextAwareAgent):
    """Agent with MCP context server integration."""
    
    def __init__(self, config, mcp_server_url):
        super().__init__(config)
        self.mcp_client = MCPClient(mcp_server_url)
    
    async def execute_with_mcp_context(self, task):
        """Execute task with MCP-provided context."""
        
        # 1. Request context from MCP server
        context_request = {
            'query': task.get('query'),
            'agent_type': self.config.agent_type,
            'session_id': task.get('session_id')
        }
        
        mcp_context = await self.mcp_client.request_context(
            context_request
        )
        
        # 2. Merge with local ContextEngine context
        local_context = await self.context_engine.semantic_search(
            task.get('query')
        )
        
        merged_context = self.merge_contexts(mcp_context, local_context)
        
        # 3. Execute with enhanced context
        enhanced_task = {
            **task,
            'context': merged_context,
            'context_source': 'mcp_enhanced'
        }
        
        return await self.execute(enhanced_task)

Pattern 2: RAG as MCP Server

class RAGMCPServer:
    """Expose RAG system as MCP server."""
    
    def __init__(self, context_engine):
        self.server = MCPServer("rag-context-server")
        self.context_engine = context_engine
        self.register_tools()
    
    def register_tools(self):
        """Register RAG tools for MCP."""
        
        @self.server.tool("semantic_search")
        async def semantic_search(query: str, limit: int = 5):
            """Semantic search via MCP."""
            return await self.context_engine.semantic_search(query, limit)
        
        @self.server.tool("get_file_context")
        async def get_file_context(file_path: str):
            """Get file context via MCP."""
            return self.context_engine.get_context_for_file(file_path)
        
        @self.server.tool("get_import_suggestions")
        async def get_import_suggestions(file_path: str):
            """Get import suggestions via MCP."""
            return self.context_engine.get_import_suggestions(file_path)
        
        @self.server.tool("get_project_intelligence")
        async def get_project_intelligence():
            """Get project intelligence via MCP."""
            return self.context_engine.get_project_intelligence_summary()

Pattern 3: Multi-Source Context Aggregation

class MultiSourceContextAggregator:
    """Aggregate context from multiple MCP servers."""
    
    def __init__(self):
        self.mcp_servers = {
            'rag': MCPClient('localhost:8001'),
            'web': MCPClient('localhost:8002'),
            'tools': MCPClient('localhost:8003')
        }
    
    async def aggregate_context(self, query: str) -> AggregatedContext:
        """Aggregate context from multiple sources."""
        
        # 1. Query all sources in parallel
        tasks = [
            self.mcp_servers['rag'].semantic_search(query),
            self.mcp_servers['web'].web_search(query),
            self.mcp_servers['tools'].find_tools(query)
        ]
        
        results = await asyncio.gather(*tasks, return_exceptions=True)
        
        # 2. Filter successful results
        valid_results = [r for r in results if not isinstance(r, Exception)]
        
        # 3. Score and rank
        scored_results = self.score_multi_source_results(valid_results)
        
        # 4. Aggregate and deduplicate
        return self.aggregate_and_deduplicate(scored_results)

🚀 Implementation Recommendations for Our Project

Phase 1: Enhance ContextAwareAgent with MCP (IMMEDIATE)

# agents/core/mcp_context_aware_agent.py

class MCPContextAwareAgent(ContextAwareAgent):
    """
    Context-aware agent with MCP server integration.
    
    Combines local ContextEngine with remote MCP context servers
    for comprehensive context retrieval.
    """
    
    def __init__(self, config, context_engine=None, mcp_servers=None):
        super().__init__(config, context_engine)
        self.mcp_servers = mcp_servers or {}
    
    async def execute_with_hybrid_context(self, task):
        """Execute with hybrid local + MCP context."""
        
        # 1. Get local context
        local_context = await self.get_relevant_context(task.get('query'))
        
        # 2. Get MCP context if available
        mcp_context = {}
        if self.mcp_servers:
            mcp_context = await self._get_mcp_context(task)
        
        # 3. Merge contexts intelligently
        hybrid_context = self._merge_contexts(local_context, mcp_context)
        
        # 4. Execute with hybrid context
        enhanced_task = {**task, 'context': hybrid_context}
        return await self.execute(enhanced_task)

Phase 2: Expose ContextEngine via MCP (THIS WEEK)

# utils/mcp/context_server.py (already planned!)

class ContextEngineMCPServer:
    """
    MCP server exposing ContextEngine to Cursor and other clients.
    
    Follows 2025 best practices:
    - Defensive configuration
    - Audit logging
    - Access control
    - Quality validation
    """
    
    def __init__(self):
        self.server = MCPServer("ai-dev-agent-context")
        self.context_engine = ContextEngine(ContextConfig())
        self.access_control = AccessController()
        self.audit_logger = AuditLogger()
        
        # Apply security best practices
        self._configure_security()
        self.register_tools()
    
    def _configure_security(self):
        """Apply 2025 security best practices."""
        self.server.set_timeouts({
            'context_retrieval': 5.0,
            'tool_execution': 30.0
        })
        
        self.server.enable_audit_logging()
        self.server.set_rate_limits({
            'requests_per_minute': 60
        })

Phase 3: RAG-Enhanced Chat with MCP Context (NEXT)

# apps/rag_management_app.py - Chat Interface

class RAGEnhancedChatInterface:
    """
    Chat interface with MCP context visualization.
    
    Shows real-time context retrieval and usage,
    following 2025 context engineering best practices.
    """
    
    def display_context_aware_chat(self):
        """Display chat with context visualization."""
        
        # 1. Agent selection
        selected_agent = st.selectbox(
            "Select Context-Aware Agent",
            self.get_available_agents()
        )
        
        # 2. Chat input
        user_input = st.chat_input("Ask your question...")
        
        if user_input:
            # 3. Show context retrieval in real-time
            with st.expander("🔍 Context Retrieval (Live)", expanded=True):
                context_col1, context_col2 = st.columns(2)
                
                with context_col1:
                    st.write("**Local ContextEngine:**")
                    local_context = self.get_local_context(user_input)
                    st.json(local_context)
                
                with context_col2:
                    st.write("**MCP Servers:**")
                    mcp_context = self.get_mcp_context(user_input)
                    st.json(mcp_context)
            
            # 4. Execute agent with hybrid context
            response = await selected_agent.execute_with_hybrid_context({
                'query': user_input
            })
            
            # 5. Display response with context attribution
            self.display_response_with_context(response)

📊 Success Metrics

Context Quality Metrics

  • Relevance Score: > 0.85 for retrieved context
  • Accuracy Rate: > 95% for context information
  • Completeness: > 90% of required context provided
  • Freshness: < 5 minutes average context age

Performance Metrics

  • Context Retrieval Time: < 500ms (semantic search)
  • MCP Request/Response: < 200ms
  • End-to-End Latency: < 1s for context-enhanced responses
  • Throughput: > 100 requests/minute

Security Metrics

  • Authentication Rate: 100% of requests authenticated
  • Authorization Failures: < 0.1% false positives
  • Audit Log Coverage: 100% of operations logged
  • Security Incidents: 0 unauthorized access attempts successful

🎓 Key Takeaways

  1. MCP is the Future: Standardized context access is critical for scalable AI systems
  2. Context Engineering Matters: Deliberate context design improves AI performance significantly
  3. Security is Paramount: Defensive configurations and access controls are essential
  4. Hybrid Approaches Win: Combine local (ContextEngine) + remote (MCP) for best results
  5. Quality Over Quantity: Better to have 5 highly relevant results than 50 mediocre ones

🎯 Advanced Context Engineering Insights

From Prompt Engineering to Context Engineering

According to Phil Schmid, the fundamental shift is:

"Context Engineering is the discipline of designing and building dynamic systems that provides the right information and tools, in the right format, at the right time, to give a LLM everything it needs to accomplish a task."

Key Insight from Phil Schmid:

  • A System, Not a String: Context isn't a static prompt template. It's the output of a system that runs before the main LLM call.
  • Dynamic: Created on-the-fly, tailored to the immediate task
  • Right Information + Tools: Ensuring the model has both knowledge and capabilities when needed
  • Format Matters: How you present information is as important as what information you provide

The "Cheap Demo" vs "Magical Agent" Example:

According to Phil Schmid's article, most agent failures are context failures, not model failures. The difference:

# ❌ Cheap Demo Agent - Poor Context
# Only sees: "Hey, just checking if you're around for a quick sync tomorrow."
# Response: "Thank you for your message. Tomorrow works for me. May I ask what time you had in mind?"
# Problem: No context about calendar, past interactions, or tools

# ✅ Magical Agent - Rich Context
# Context includes:
# - Calendar information (you're fully booked tomorrow)
# - Past emails with this person (informal tone appropriate)
# - Contact list (identifies them as key partner)
# - Available tools (send_invite, send_email)
# Response: "Hey Jim! Tomorrow's packed on my end, back-to-back all day. 
#            Thursday AM free if that works for you? Sent an invite, lmk if it works."

Complete Context Definition

According to the Prompt Engineering Guide and Phil Schmid, context encompasses:

Core Definition:

"The process of designing and optimizing instructions and relevant context for LLMs and advanced AI models to perform their tasks effectively."

Key Components (Complete Context Definition):

  1. Instructions / System Prompt

    • Initial instructions defining model behavior
    • Examples, rules, and behavioral guidelines
    • Role definition and boundaries
  2. User Prompt

    • Immediate task or question from user
    • Current query or request
    • Dynamic input content
  3. State / History (Short-term Memory)

    • Current conversation flow
    • Previous user and model responses
    • Interaction context from this session
  4. Long-Term Memory

    • Persistent knowledge across sessions
    • Learned user preferences
    • Summaries of past projects
    • Facts to remember for future use
  5. Retrieved Information (RAG)

    • External, up-to-date knowledge
    • Relevant documents and databases
    • API responses and web searches
    • Semantic search results
  6. Available Tools

    • Function definitions the model can call
    • Tool schemas and parameters
    • Capability descriptions
    • Tool execution results
  7. Structured Output

    • Response format definitions
    • JSON schemas
    • Validation rules
    • Output constraints

Context Window Optimization

From LangChain's Context Engineering blog:

The Challenge:

  • Modern LLMs have large context windows (100K+ tokens)
  • But filling them with noise degrades performance
  • Quality > Quantity for context

Best Practices:

  1. Filter Noisy Information - Remove irrelevant context
  2. Context Compression - Summarize when needed
  3. Context Management - Track what's effective over time
  4. Context Safety - Validate sources and accuracy
  5. Context Evaluation - Measure effectiveness systematically

Context Engineering as a System

Critical Insight from Phil Schmid:

"The magic isn't in a smarter model or a more clever algorithm. It's about providing the right context for the right task."

Context Engineering System Architecture:

class ContextEngineeringSystem:
    """
    Context engineering as a SYSTEM, not a string.
    
    This system runs BEFORE the LLM call to gather and structure
    all necessary context dynamically.
    """
    
    def engineer_context(self, user_request: str, user_profile: Dict) -> Context:
        """
        Build dynamic context based on the specific request.
        
        Key Principle: Most agent failures are context failures,
        not model failures.
        """
        
        # 1. Start with base system instructions
        system_prompt = self._get_system_instructions(user_profile['role'])
        
        # 2. Gather relevant state/history (short-term memory)
        conversation_history = self._get_conversation_history(
            user_profile['session_id'],
            limit=10  # Last 10 messages
        )
        
        # 3. Retrieve long-term memory
        long_term_memory = self._get_long_term_memory(
            user_profile['user_id']
        )
        
        # 4. Perform RAG for current request
        retrieved_info = self._retrieve_relevant_knowledge(
            user_request,
            user_context=user_profile
        )
        
        # 5. Determine which tools are needed
        relevant_tools = self._select_relevant_tools(
            user_request,
            available_tools=self.tool_registry
        )
        
        # 6. Define output structure
        output_schema = self._define_output_schema(
            user_request,
            expected_format='json'
        )
        
        # 7. Assemble complete context
        return Context(
            system=system_prompt,
            user_input=user_request,
            short_term_memory=conversation_history,
            long_term_memory=long_term_memory,
            retrieved_knowledge=retrieved_info,
            available_tools=relevant_tools,
            output_schema=output_schema,
            metadata={
                'user_id': user_profile['user_id'],
                'timestamp': datetime.now().isoformat(),
                'context_version': '2.0'
            }
        )

Why This Matters (Phil Schmid's Example):

# Meeting scheduling example from Phil Schmid

# ❌ Without Context Engineering
def schedule_meeting_bad(email_text: str):
    """Cheap demo - just sends email to LLM"""
    response = llm.generate(email_text)
    return response  # Generic, unhelpful response

# ✅ With Context Engineering
def schedule_meeting_good(email_text: str, sender: str):
    """Magical agent - engineers rich context first"""
    
    # Build context dynamically
    context = {
        'email': email_text,
        'sender_info': get_contact_info(sender),  # Identify relationship
        'calendar': get_my_calendar(),             # Current schedule
        'past_emails': get_email_history(sender), # Communication style
        'tools': ['send_invite', 'send_email'],    # Available actions
        'tone_preference': infer_tone(sender)      # Informal vs formal
    }
    
    response = llm.generate_with_context(context)
    return response  # Contextual, helpful, actionable

Practical Context Engineering Patterns

Based on real-world agent architectures from the guide and Phil Schmid's insights:

class ProductionContextEngineer:
    """
    Production-ready context engineering following 2025 best practices.
    """
    
    def engineer_context(self, user_query: str, agent_type: str) -> Context:
        """
        Engineer optimal context for agent execution.
        
        Following proven patterns:
        - System prompt with clear role definition
        - Structured input/output specifications
        - Dynamic date/time injection
        - RAG for relevant knowledge
        - Historical context when needed
        """
        
        # 1. System Prompt - Set the foundation
        system_prompt = self._create_system_prompt(agent_type)
        
        # 2. Instructions - Task-specific directives
        instructions = self._generate_instructions(user_query)
        
        # 3. Dynamic Context - Date/time, user info
        dynamic_context = {
            'current_datetime': datetime.now().isoformat(),
            'user_query': user_query,
            'agent_type': agent_type
        }
        
        # 4. Structured Output Schema
        output_schema = self._define_output_schema(agent_type)
        
        # 5. RAG - Retrieve relevant knowledge
        rag_context = self._retrieve_relevant_knowledge(user_query)
        
        # 6. Historical Context - If available
        historical = self._get_relevant_history(user_query)
        
        # 7. Assemble optimized context
        return Context(
            system=system_prompt,
            instructions=instructions,
            dynamic=dynamic_context,
            schema=output_schema,
            knowledge=rag_context,
            history=historical,
            total_tokens=self._estimate_tokens(...)
        )

Context Caching for Speed

From the guide's real-world example:

class ContextCache:
    """
    Cache subtasks and queries to avoid redundant LLM calls.
    
    Key Insight: If similar query exists, reuse results.
    This reduces latency AND costs.
    """
    
    async def get_or_create_context(self, query: str):
        """Get cached context or create new."""
        
        # 1. Search vector store for similar queries
        similar_queries = await self.vector_store.similarity_search(
            query, 
            k=1,
            threshold=0.9  # High similarity threshold
        )
        
        if similar_queries and similar_queries[0].score > 0.9:
            # Return cached subtasks
            return similar_queries[0].metadata['subtasks']
        
        # 2. Generate new subtasks
        new_subtasks = await self.generate_subtasks(query)
        
        # 3. Cache for future use
        await self.vector_store.add_documents([
            Document(
                page_content=query,
                metadata={'subtasks': new_subtasks}
            )
        ])
        
        return new_subtasks

Structured Output Engineering

Critical Pattern from the guide:

# BAD: Unstructured output
response = "Here are the subtasks: 1. Search news, 2. Search blogs..."

# GOOD: Structured output with schema
response = {
    "subtasks": [
        {
            "id": "subtask_1",
            "query": "OpenAI recent announcements",
            "source_type": "news",
            "time_period": "recent",
            "priority": 1,
            "start_date": "2025-06-24T16:35:26.901Z",
            "end_date": "2025-07-01T16:35:26.901Z"
        }
    ]
}

Why this matters:

  • Parseable by downstream components
  • Validated against schema
  • No ambiguity in interpretation
  • Easy to debug and monitor

Date/Time Context Injection

From the guide's agent example:

# CRITICAL: LLMs don't know current date/time
# You MUST inject it as context

system_prompt = f"""
Current date and time: {datetime.now().isoformat()}

When user asks for "recent" or "last week":
- Calculate actual date ranges
- Use ISO format: 2025-06-24T16:35:26.901Z
- Don't guess dates!
"""

Context Engineering Workflow

Step-by-step process from the guide:

  1. Design System Prompt

    • Clear role definition
    • Task boundaries
    • Output format expectations
  2. Structure Instructions

    • Step-by-step directives
    • Examples (few-shot)
    • Edge case handling
  3. Manage Dynamic Elements

    • User inputs
    • Date/time
    • Session state
  4. Implement RAG

    • Relevant knowledge retrieval
    • Context compression
    • Source attribution
  5. Handle State & History

    • Conversation memory
    • Previous results
    • Revision tracking
  6. Evaluate Effectiveness

    • A/B testing
    • Performance metrics
    • Continuous optimization

Advanced Context Management Strategies

From Research Synthesis (Microsoft AI Agents, Hypermode, AI Automators):

1. Agent Scratchpad Pattern

class AgentScratchpad:
    """
    External memory for agents to store notes during session execution.
    
    Key Benefit: Store information outside context window to prevent
    overflow while maintaining access to relevant data.
    """
    
    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.notes = []
        self.max_notes = 50  # Prevent unbounded growth
    
    def add_note(self, note: str, metadata: Dict = None):
        """Add note to scratchpad with automatic pruning."""
        self.notes.append({
            'content': note,
            'timestamp': datetime.now(),
            'metadata': metadata or {}
        })
        
        # Prune oldest notes if limit exceeded
        if len(self.notes) > self.max_notes:
            self.notes = self.notes[-self.max_notes:]
    
    def retrieve_relevant_notes(self, query: str, limit: int = 5):
        """Retrieve most relevant notes for current task."""
        # Use vector similarity or keyword matching
        return self._semantic_search_notes(query, limit)

2. Cross-Session Memories

class CrossSessionMemoryStore:
    """
    Persistent memory system for agents across multiple sessions.
    
    Stores:
    - User preferences
    - Previous feedback
    - Learned patterns
    - Long-term project context
    """
    
    def __init__(self, storage_backend: StorageBackend):
        self.storage = storage_backend
        self.memory_index = {}
    
    async def store_memory(
        self, 
        user_id: str, 
        memory: Dict, 
        memory_type: str
    ):
        """Store memory with metadata for retrieval."""
        memory_entry = {
            'user_id': user_id,
            'memory_type': memory_type,
            'content': memory,
            'created_at': datetime.now(),
            'access_count': 0,
            'importance_score': self._calculate_importance(memory)
        }
        
        await self.storage.save(memory_entry)
        await self._update_index(memory_entry)
    
    async def retrieve_memories(
        self, 
        user_id: str, 
        context: str,
        limit: int = 10
    ):
        """Retrieve relevant memories for current context."""
        # Combine recency, relevance, and importance
        candidates = await self.storage.query(user_id=user_id)
        
        scored_memories = [
            (m, self._score_memory(m, context))
            for m in candidates
        ]
        
        # Sort by score and return top matches
        scored_memories.sort(key=lambda x: x[1], reverse=True)
        return [m for m, score in scored_memories[:limit]]
    
    def _score_memory(self, memory: Dict, context: str) -> float:
        """Score memory based on recency, relevance, importance."""
        recency_score = self._calculate_recency_score(memory)
        relevance_score = self._calculate_relevance_score(memory, context)
        importance = memory['importance_score']
        
        # Weighted combination
        return (
            0.3 * recency_score +
            0.5 * relevance_score +
            0.2 * importance
        )

3. Context Compression Strategies

class ContextCompressor:
    """
    Intelligent context compression to manage window size.
    
    Strategies:
    - Summarization of older messages
    - Trimming of redundant information
    - Preservation of critical context
    """
    
    def compress_context(
        self, 
        context: List[Message], 
        target_size: int
    ) -> List[Message]:
        """
        Compress context to fit target size while preserving quality.
        
        Priority preservation:
        1. Most recent messages (high recency)
        2. System prompts and instructions (always keep)
        3. High-importance messages (user-marked or critical)
        4. Messages with tool calls/results
        """
        
        current_size = sum(msg.token_count for msg in context)
        
        if current_size <= target_size:
            return context  # No compression needed
        
        # Separate into buckets
        system_messages = [m for m in context if m.role == 'system']
        recent_messages = context[-10:]  # Last 10 always kept
        important_messages = [m for m in context if m.importance > 0.8]
        tool_messages = [m for m in context if m.has_tool_calls]
        
        # Build compressed context
        compressed = []
        remaining_size = target_size
        
        # 1. Always include system messages
        compressed.extend(system_messages)
        remaining_size -= sum(m.token_count for m in system_messages)
        
        # 2. Include recent messages
        for msg in reversed(recent_messages):
            if msg.token_count <= remaining_size:
                compressed.insert(len(system_messages), msg)
                remaining_size -= msg.token_count
        
        # 3. Fill remaining space with important/tool messages
        for msg in important_messages + tool_messages:
            if msg not in compressed and msg.token_count <= remaining_size:
                compressed.append(msg)
                remaining_size -= msg.token_count
        
        # 4. Summarize middle sections if still too large
        if remaining_size < 0:
            compressed = self._summarize_middle_section(compressed, target_size)
        
        return compressed

4. Multi-Agent Context Isolation

class MultiAgentContextManager:
    """
    Manage isolated contexts for multiple agents working together.
    
    Key Principle: Each agent maintains its own context window
    to prevent context pollution and maintain focus.
    """
    
    def __init__(self):
        self.agent_contexts = {}
        self.shared_context = SharedContext()
    
    def create_agent_context(self, agent_id: str, domain: str):
        """Create isolated context for specific agent."""
        self.agent_contexts[agent_id] = AgentContext(
            agent_id=agent_id,
            domain=domain,
            memory_store=MemoryStore(scope='isolated'),
            scratchpad=AgentScratchpad(agent_id)
        )
    
    async def execute_with_isolation(
        self, 
        agent_id: str, 
        task: Dict
    ) -> Dict:
        """Execute task with isolated agent context."""
        agent_context = self.agent_contexts[agent_id]
        
        # 1. Load agent-specific context only
        local_context = await agent_context.load_context()
        
        # 2. Optionally access shared context if needed
        if task.get('needs_shared_context'):
            shared = await self.shared_context.get_relevant_context(
                agent_id,
                task['query']
            )
            local_context.merge(shared)
        
        # 3. Execute in isolation
        result = await self._execute_isolated_task(
            agent_context,
            local_context,
            task
        )
        
        # 4. Update both isolated and shared context
        await agent_context.update(result)
        
        if task.get('share_learnings'):
            await self.shared_context.update_from_agent(
                agent_id,
                result['learnings']
            )
        
        return result

5. Context Pipeline Design

class ContextPipeline:
    """
    Design and execute context assembly pipelines.
    
    Maps out what information is needed and how agents access it.
    """
    
    def __init__(self):
        self.stages = []
    
    def add_stage(self, stage: ContextStage):
        """Add processing stage to pipeline."""
        self.stages.append(stage)
    
    async def execute_pipeline(self, initial_context: Dict) -> Context:
        """Execute full context assembly pipeline."""
        
        context = Context(initial_context)
        
        for stage in self.stages:
            try:
                # Each stage enriches the context
                context = await stage.process(context)
                
                # Validate context quality after each stage
                if not self._validate_stage_output(context, stage):
                    raise ContextPipelineError(
                        f"Stage {stage.name} produced invalid context"
                    )
                
            except Exception as e:
                # Handle stage failures gracefully
                self._handle_stage_failure(stage, e, context)
        
        return context
    
    def _validate_stage_output(
        self, 
        context: Context, 
        stage: ContextStage
    ) -> bool:
        """Validate that stage produced expected output."""
        required_fields = stage.output_schema.required_fields
        return all(hasattr(context, field) for field in required_fields)

6. Runtime State Objects

class RuntimeStateObject:
    """
    Container for managing information during agent execution.
    
    Stores subtask results step-by-step to maintain context relevance.
    """
    
    def __init__(self, task_id: str):
        self.task_id = task_id
        self.subtask_results = []
        self.current_step = 0
        self.state_variables = {}
    
    def record_subtask_result(self, subtask: str, result: Dict):
        """Record result of subtask execution."""
        self.subtask_results.append({
            'step': self.current_step,
            'subtask': subtask,
            'result': result,
            'timestamp': datetime.now()
        })
        self.current_step += 1
    
    def get_context_for_next_step(self) -> Dict:
        """Get relevant context for next step."""
        # Only include recent results to avoid context bloat
        recent_results = self.subtask_results[-5:]
        
        return {
            'completed_steps': self.current_step,
            'recent_results': recent_results,
            'state_variables': self.state_variables
        }
    
    def update_state(self, key: str, value: Any):
        """Update state variable for cross-step information."""
        self.state_variables[key] = value

📚 References


Next Actions:

  1. ✅ Created ContextAwareAgent base class
  2. 🔄 Create MCP server for ContextEngine (this week)
  3. 🔄 Add chat interface to RAG UI (this week)
  4. 📋 Implement hybrid context (local + MCP)
  5. 📋 Add context quality metrics and monitoring

Status: Research Complete - Ready for Implementation
Last Updated: 2025-01-08