Research Date: 2025-01-08
Purpose: Up-to-date best practices for context engineering using Model Context Protocol
Sprint: US-RAG-001
Status: Active Research
Model Context Protocol (MCP) is Anthropic's standardized framework for enabling AI models to interact with external tools, resources, and context. Modern context engineering with MCP focuses on:
- Structured Context Access: Standardized interfaces for context retrieval
- Dynamic Context Management: Real-time context assembly and routing
- Persistent AI Memory: Maintaining context across sessions
- Secure Context Delivery: Defensive configurations and access controls
┌─────────────────────────────────────────────────────┐
│ AI Agent/Model │
│ (Context-Aware Decision Making) │
└────────────────┬────────────────────────────────────┘
│ MCP Protocol
▼
┌─────────────────────────────────────────────────────┐
│ MCP Context Server │
│ ┌──────────┬──────────┬──────────┬──────────┐ │
│ │ Tools │Resources │ Prompts │ Sampling │ │
│ └──────────┴──────────┴──────────┴──────────┘ │
└────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ Context Engine (RAG) │
│ ┌──────────────────────────────────────────┐ │
│ │ Vector Store (FAISS) │ │
│ │ Embeddings (HuggingFace) │ │
│ │ Semantic Search │ │
│ │ Pattern Learning │ │
│ └──────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
1. Protocol Layer
- Message framing and routing
- Request/response linking
- Error handling and recovery
- High-level communication patterns
2. Transport Layer
- stdio: For local process communication
- HTTP + SSE: For client-server communication
- WebSocket: For real-time bidirectional communication
3. Message Types
- Requests: Operations requiring responses
- Results: Successful operation outcomes
- Errors: Operation failures with details
- Notifications: One-way informational messages
Dynamic Context Generation
class ContextAssembler:
"""Dynamically assemble context based on agent needs."""
async def assemble_context(self, query: str, agent_type: str) -> Context:
"""
Assemble context dynamically based on query and agent type.
Best Practices:
- Query expansion for better retrieval
- Agent-specific context filtering
- Relevance scoring and ranking
- Context deduplication
"""
# 1. Expand query for better retrieval
expanded_query = await self.expand_query(query, agent_type)
# 2. Retrieve from multiple sources
semantic_results = await self.semantic_search(expanded_query)
pattern_results = await self.pattern_match(query)
knowledge_results = await self.knowledge_lookup(query)
# 3. Score and rank by relevance
scored_results = self.score_relevance(
semantic_results + pattern_results + knowledge_results,
query, agent_type
)
# 4. Deduplicate and filter
filtered_results = self.deduplicate_and_filter(
scored_results,
max_results=5,
min_relevance=0.7
)
# 5. Structure for agent consumption
return Context(
query=query,
results=filtered_results,
metadata=self.extract_metadata(filtered_results),
timestamp=datetime.now()
)Intelligent Context Distribution
class ContextRouter:
"""Route context to appropriate agents and tools."""
def route_context(self, context: Context, intent: str) -> Route:
"""
Route context based on intent and agent capabilities.
Best Practices:
- Intent detection and classification
- Agent capability matching
- Load balancing across agents
- Fallback routing strategies
"""
# 1. Detect intent
intent_classification = self.classify_intent(intent)
# 2. Match to agent capabilities
capable_agents = self.find_capable_agents(
intent_classification,
context.metadata
)
# 3. Select optimal agent
selected_agent = self.select_agent(
capable_agents,
load_balance=True,
performance_history=True
)
# 4. Prepare context for agent
agent_context = self.adapt_context_for_agent(
context,
selected_agent.requirements
)
return Route(
agent=selected_agent,
context=agent_context,
fallback_agents=capable_agents[1:]
)Maintaining Context Across Sessions
class PersistentContextManager:
"""Manage persistent context across agent sessions."""
async def maintain_context(
self,
session_id: str,
new_context: Context
) -> PersistentContext:
"""
Maintain context across interactions.
Best Practices:
- Session-based context storage
- Context decay and relevance updates
- Memory compression for efficiency
- Privacy-aware context retention
"""
# 1. Retrieve existing session context
existing_context = await self.get_session_context(session_id)
# 2. Merge with new context
merged_context = self.merge_contexts(
existing_context,
new_context,
decay_factor=0.9
)
# 3. Compress if needed
if len(merged_context) > self.max_context_size:
merged_context = self.compress_context(
merged_context,
preserve_recent=True,
preserve_important=True
)
# 4. Store updated context
await self.store_session_context(session_id, merged_context)
return merged_contextEnsuring High-Quality Context
class ContextQualityController:
"""Ensure high-quality context for agents."""
def validate_context_quality(self, context: Context) -> QualityReport:
"""
Validate context quality before delivery.
Best Practices:
- Relevance verification
- Accuracy validation
- Completeness checking
- Freshness assessment
- Source credibility validation
"""
quality = QualityReport()
# 1. Relevance check
quality.relevance_score = self.assess_relevance(context)
# 2. Accuracy validation
quality.accuracy_score = self.validate_accuracy(context)
# 3. Completeness check
quality.completeness_score = self.check_completeness(context)
# 4. Freshness assessment
quality.freshness_score = self.assess_freshness(context)
# 5. Overall quality score
quality.overall_score = self.calculate_quality_score(quality)
# 6. Improvement recommendations
if quality.overall_score < self.quality_threshold:
quality.recommendations = self.generate_improvements(quality)
return qualityclass SecureMCPServer:
"""Secure MCP server configuration."""
def __init__(self):
# 1. Strict Permissions
self.permissions = {
'read': ['trusted_agents', 'admin'],
'write': ['admin'],
'execute': ['trusted_agents']
}
# 2. Appropriate Timeouts
self.timeouts = {
'context_retrieval': 5.0, # 5 seconds
'tool_execution': 30.0, # 30 seconds
'resource_access': 10.0 # 10 seconds
}
# 3. Audit Logging
self.audit_config = {
'log_all_requests': True,
'log_all_responses': True,
'log_errors': True,
'log_security_events': True,
'retention_days': 90
}
# 4. Access Control
self.access_control = {
'require_authentication': True,
'require_authorization': True,
'allow_anonymous': False,
'rate_limiting': {
'requests_per_minute': 60,
'requests_per_hour': 1000
}
}class WebAccessGuardrails:
"""Limit AI access to trusted sources."""
TRUSTED_DOMAINS = [
'github.com',
'docs.python.org',
'developer.mozilla.org',
'stackoverflow.com'
]
BLOCKED_PATTERNS = [
r'.*malware.*',
r'.*phishing.*',
r'.*untrusted.*'
]
def validate_url(self, url: str) -> bool:
"""Validate URL against trusted sources."""
domain = self.extract_domain(url)
# Check trusted domains
if not any(trusted in domain for trusted in self.TRUSTED_DOMAINS):
return False
# Check blocked patterns
if any(re.match(pattern, url) for pattern in self.BLOCKED_PATTERNS):
return False
return Trueclass MCPContextAwareAgent(ContextAwareAgent):
"""Agent with MCP context server integration."""
def __init__(self, config, mcp_server_url):
super().__init__(config)
self.mcp_client = MCPClient(mcp_server_url)
async def execute_with_mcp_context(self, task):
"""Execute task with MCP-provided context."""
# 1. Request context from MCP server
context_request = {
'query': task.get('query'),
'agent_type': self.config.agent_type,
'session_id': task.get('session_id')
}
mcp_context = await self.mcp_client.request_context(
context_request
)
# 2. Merge with local ContextEngine context
local_context = await self.context_engine.semantic_search(
task.get('query')
)
merged_context = self.merge_contexts(mcp_context, local_context)
# 3. Execute with enhanced context
enhanced_task = {
**task,
'context': merged_context,
'context_source': 'mcp_enhanced'
}
return await self.execute(enhanced_task)class RAGMCPServer:
"""Expose RAG system as MCP server."""
def __init__(self, context_engine):
self.server = MCPServer("rag-context-server")
self.context_engine = context_engine
self.register_tools()
def register_tools(self):
"""Register RAG tools for MCP."""
@self.server.tool("semantic_search")
async def semantic_search(query: str, limit: int = 5):
"""Semantic search via MCP."""
return await self.context_engine.semantic_search(query, limit)
@self.server.tool("get_file_context")
async def get_file_context(file_path: str):
"""Get file context via MCP."""
return self.context_engine.get_context_for_file(file_path)
@self.server.tool("get_import_suggestions")
async def get_import_suggestions(file_path: str):
"""Get import suggestions via MCP."""
return self.context_engine.get_import_suggestions(file_path)
@self.server.tool("get_project_intelligence")
async def get_project_intelligence():
"""Get project intelligence via MCP."""
return self.context_engine.get_project_intelligence_summary()class MultiSourceContextAggregator:
"""Aggregate context from multiple MCP servers."""
def __init__(self):
self.mcp_servers = {
'rag': MCPClient('localhost:8001'),
'web': MCPClient('localhost:8002'),
'tools': MCPClient('localhost:8003')
}
async def aggregate_context(self, query: str) -> AggregatedContext:
"""Aggregate context from multiple sources."""
# 1. Query all sources in parallel
tasks = [
self.mcp_servers['rag'].semantic_search(query),
self.mcp_servers['web'].web_search(query),
self.mcp_servers['tools'].find_tools(query)
]
results = await asyncio.gather(*tasks, return_exceptions=True)
# 2. Filter successful results
valid_results = [r for r in results if not isinstance(r, Exception)]
# 3. Score and rank
scored_results = self.score_multi_source_results(valid_results)
# 4. Aggregate and deduplicate
return self.aggregate_and_deduplicate(scored_results)# agents/core/mcp_context_aware_agent.py
class MCPContextAwareAgent(ContextAwareAgent):
"""
Context-aware agent with MCP server integration.
Combines local ContextEngine with remote MCP context servers
for comprehensive context retrieval.
"""
def __init__(self, config, context_engine=None, mcp_servers=None):
super().__init__(config, context_engine)
self.mcp_servers = mcp_servers or {}
async def execute_with_hybrid_context(self, task):
"""Execute with hybrid local + MCP context."""
# 1. Get local context
local_context = await self.get_relevant_context(task.get('query'))
# 2. Get MCP context if available
mcp_context = {}
if self.mcp_servers:
mcp_context = await self._get_mcp_context(task)
# 3. Merge contexts intelligently
hybrid_context = self._merge_contexts(local_context, mcp_context)
# 4. Execute with hybrid context
enhanced_task = {**task, 'context': hybrid_context}
return await self.execute(enhanced_task)# utils/mcp/context_server.py (already planned!)
class ContextEngineMCPServer:
"""
MCP server exposing ContextEngine to Cursor and other clients.
Follows 2025 best practices:
- Defensive configuration
- Audit logging
- Access control
- Quality validation
"""
def __init__(self):
self.server = MCPServer("ai-dev-agent-context")
self.context_engine = ContextEngine(ContextConfig())
self.access_control = AccessController()
self.audit_logger = AuditLogger()
# Apply security best practices
self._configure_security()
self.register_tools()
def _configure_security(self):
"""Apply 2025 security best practices."""
self.server.set_timeouts({
'context_retrieval': 5.0,
'tool_execution': 30.0
})
self.server.enable_audit_logging()
self.server.set_rate_limits({
'requests_per_minute': 60
})# apps/rag_management_app.py - Chat Interface
class RAGEnhancedChatInterface:
"""
Chat interface with MCP context visualization.
Shows real-time context retrieval and usage,
following 2025 context engineering best practices.
"""
def display_context_aware_chat(self):
"""Display chat with context visualization."""
# 1. Agent selection
selected_agent = st.selectbox(
"Select Context-Aware Agent",
self.get_available_agents()
)
# 2. Chat input
user_input = st.chat_input("Ask your question...")
if user_input:
# 3. Show context retrieval in real-time
with st.expander("🔍 Context Retrieval (Live)", expanded=True):
context_col1, context_col2 = st.columns(2)
with context_col1:
st.write("**Local ContextEngine:**")
local_context = self.get_local_context(user_input)
st.json(local_context)
with context_col2:
st.write("**MCP Servers:**")
mcp_context = self.get_mcp_context(user_input)
st.json(mcp_context)
# 4. Execute agent with hybrid context
response = await selected_agent.execute_with_hybrid_context({
'query': user_input
})
# 5. Display response with context attribution
self.display_response_with_context(response)- Relevance Score: > 0.85 for retrieved context
- Accuracy Rate: > 95% for context information
- Completeness: > 90% of required context provided
- Freshness: < 5 minutes average context age
- Context Retrieval Time: < 500ms (semantic search)
- MCP Request/Response: < 200ms
- End-to-End Latency: < 1s for context-enhanced responses
- Throughput: > 100 requests/minute
- Authentication Rate: 100% of requests authenticated
- Authorization Failures: < 0.1% false positives
- Audit Log Coverage: 100% of operations logged
- Security Incidents: 0 unauthorized access attempts successful
- MCP is the Future: Standardized context access is critical for scalable AI systems
- Context Engineering Matters: Deliberate context design improves AI performance significantly
- Security is Paramount: Defensive configurations and access controls are essential
- Hybrid Approaches Win: Combine local (ContextEngine) + remote (MCP) for best results
- Quality Over Quantity: Better to have 5 highly relevant results than 50 mediocre ones
According to Phil Schmid, the fundamental shift is:
"Context Engineering is the discipline of designing and building dynamic systems that provides the right information and tools, in the right format, at the right time, to give a LLM everything it needs to accomplish a task."
Key Insight from Phil Schmid:
- A System, Not a String: Context isn't a static prompt template. It's the output of a system that runs before the main LLM call.
- Dynamic: Created on-the-fly, tailored to the immediate task
- Right Information + Tools: Ensuring the model has both knowledge and capabilities when needed
- Format Matters: How you present information is as important as what information you provide
The "Cheap Demo" vs "Magical Agent" Example:
According to Phil Schmid's article, most agent failures are context failures, not model failures. The difference:
# ❌ Cheap Demo Agent - Poor Context
# Only sees: "Hey, just checking if you're around for a quick sync tomorrow."
# Response: "Thank you for your message. Tomorrow works for me. May I ask what time you had in mind?"
# Problem: No context about calendar, past interactions, or tools
# ✅ Magical Agent - Rich Context
# Context includes:
# - Calendar information (you're fully booked tomorrow)
# - Past emails with this person (informal tone appropriate)
# - Contact list (identifies them as key partner)
# - Available tools (send_invite, send_email)
# Response: "Hey Jim! Tomorrow's packed on my end, back-to-back all day.
# Thursday AM free if that works for you? Sent an invite, lmk if it works."According to the Prompt Engineering Guide and Phil Schmid, context encompasses:
Core Definition:
"The process of designing and optimizing instructions and relevant context for LLMs and advanced AI models to perform their tasks effectively."
Key Components (Complete Context Definition):
-
Instructions / System Prompt
- Initial instructions defining model behavior
- Examples, rules, and behavioral guidelines
- Role definition and boundaries
-
User Prompt
- Immediate task or question from user
- Current query or request
- Dynamic input content
-
State / History (Short-term Memory)
- Current conversation flow
- Previous user and model responses
- Interaction context from this session
-
Long-Term Memory
- Persistent knowledge across sessions
- Learned user preferences
- Summaries of past projects
- Facts to remember for future use
-
Retrieved Information (RAG)
- External, up-to-date knowledge
- Relevant documents and databases
- API responses and web searches
- Semantic search results
-
Available Tools
- Function definitions the model can call
- Tool schemas and parameters
- Capability descriptions
- Tool execution results
-
Structured Output
- Response format definitions
- JSON schemas
- Validation rules
- Output constraints
From LangChain's Context Engineering blog:
The Challenge:
- Modern LLMs have large context windows (100K+ tokens)
- But filling them with noise degrades performance
- Quality > Quantity for context
Best Practices:
- Filter Noisy Information - Remove irrelevant context
- Context Compression - Summarize when needed
- Context Management - Track what's effective over time
- Context Safety - Validate sources and accuracy
- Context Evaluation - Measure effectiveness systematically
Critical Insight from Phil Schmid:
"The magic isn't in a smarter model or a more clever algorithm. It's about providing the right context for the right task."
Context Engineering System Architecture:
class ContextEngineeringSystem:
"""
Context engineering as a SYSTEM, not a string.
This system runs BEFORE the LLM call to gather and structure
all necessary context dynamically.
"""
def engineer_context(self, user_request: str, user_profile: Dict) -> Context:
"""
Build dynamic context based on the specific request.
Key Principle: Most agent failures are context failures,
not model failures.
"""
# 1. Start with base system instructions
system_prompt = self._get_system_instructions(user_profile['role'])
# 2. Gather relevant state/history (short-term memory)
conversation_history = self._get_conversation_history(
user_profile['session_id'],
limit=10 # Last 10 messages
)
# 3. Retrieve long-term memory
long_term_memory = self._get_long_term_memory(
user_profile['user_id']
)
# 4. Perform RAG for current request
retrieved_info = self._retrieve_relevant_knowledge(
user_request,
user_context=user_profile
)
# 5. Determine which tools are needed
relevant_tools = self._select_relevant_tools(
user_request,
available_tools=self.tool_registry
)
# 6. Define output structure
output_schema = self._define_output_schema(
user_request,
expected_format='json'
)
# 7. Assemble complete context
return Context(
system=system_prompt,
user_input=user_request,
short_term_memory=conversation_history,
long_term_memory=long_term_memory,
retrieved_knowledge=retrieved_info,
available_tools=relevant_tools,
output_schema=output_schema,
metadata={
'user_id': user_profile['user_id'],
'timestamp': datetime.now().isoformat(),
'context_version': '2.0'
}
)Why This Matters (Phil Schmid's Example):
# Meeting scheduling example from Phil Schmid
# ❌ Without Context Engineering
def schedule_meeting_bad(email_text: str):
"""Cheap demo - just sends email to LLM"""
response = llm.generate(email_text)
return response # Generic, unhelpful response
# ✅ With Context Engineering
def schedule_meeting_good(email_text: str, sender: str):
"""Magical agent - engineers rich context first"""
# Build context dynamically
context = {
'email': email_text,
'sender_info': get_contact_info(sender), # Identify relationship
'calendar': get_my_calendar(), # Current schedule
'past_emails': get_email_history(sender), # Communication style
'tools': ['send_invite', 'send_email'], # Available actions
'tone_preference': infer_tone(sender) # Informal vs formal
}
response = llm.generate_with_context(context)
return response # Contextual, helpful, actionableBased on real-world agent architectures from the guide and Phil Schmid's insights:
class ProductionContextEngineer:
"""
Production-ready context engineering following 2025 best practices.
"""
def engineer_context(self, user_query: str, agent_type: str) -> Context:
"""
Engineer optimal context for agent execution.
Following proven patterns:
- System prompt with clear role definition
- Structured input/output specifications
- Dynamic date/time injection
- RAG for relevant knowledge
- Historical context when needed
"""
# 1. System Prompt - Set the foundation
system_prompt = self._create_system_prompt(agent_type)
# 2. Instructions - Task-specific directives
instructions = self._generate_instructions(user_query)
# 3. Dynamic Context - Date/time, user info
dynamic_context = {
'current_datetime': datetime.now().isoformat(),
'user_query': user_query,
'agent_type': agent_type
}
# 4. Structured Output Schema
output_schema = self._define_output_schema(agent_type)
# 5. RAG - Retrieve relevant knowledge
rag_context = self._retrieve_relevant_knowledge(user_query)
# 6. Historical Context - If available
historical = self._get_relevant_history(user_query)
# 7. Assemble optimized context
return Context(
system=system_prompt,
instructions=instructions,
dynamic=dynamic_context,
schema=output_schema,
knowledge=rag_context,
history=historical,
total_tokens=self._estimate_tokens(...)
)From the guide's real-world example:
class ContextCache:
"""
Cache subtasks and queries to avoid redundant LLM calls.
Key Insight: If similar query exists, reuse results.
This reduces latency AND costs.
"""
async def get_or_create_context(self, query: str):
"""Get cached context or create new."""
# 1. Search vector store for similar queries
similar_queries = await self.vector_store.similarity_search(
query,
k=1,
threshold=0.9 # High similarity threshold
)
if similar_queries and similar_queries[0].score > 0.9:
# Return cached subtasks
return similar_queries[0].metadata['subtasks']
# 2. Generate new subtasks
new_subtasks = await self.generate_subtasks(query)
# 3. Cache for future use
await self.vector_store.add_documents([
Document(
page_content=query,
metadata={'subtasks': new_subtasks}
)
])
return new_subtasksCritical Pattern from the guide:
# BAD: Unstructured output
response = "Here are the subtasks: 1. Search news, 2. Search blogs..."
# GOOD: Structured output with schema
response = {
"subtasks": [
{
"id": "subtask_1",
"query": "OpenAI recent announcements",
"source_type": "news",
"time_period": "recent",
"priority": 1,
"start_date": "2025-06-24T16:35:26.901Z",
"end_date": "2025-07-01T16:35:26.901Z"
}
]
}Why this matters:
- Parseable by downstream components
- Validated against schema
- No ambiguity in interpretation
- Easy to debug and monitor
From the guide's agent example:
# CRITICAL: LLMs don't know current date/time
# You MUST inject it as context
system_prompt = f"""
Current date and time: {datetime.now().isoformat()}
When user asks for "recent" or "last week":
- Calculate actual date ranges
- Use ISO format: 2025-06-24T16:35:26.901Z
- Don't guess dates!
"""Step-by-step process from the guide:
-
Design System Prompt
- Clear role definition
- Task boundaries
- Output format expectations
-
Structure Instructions
- Step-by-step directives
- Examples (few-shot)
- Edge case handling
-
Manage Dynamic Elements
- User inputs
- Date/time
- Session state
-
Implement RAG
- Relevant knowledge retrieval
- Context compression
- Source attribution
-
Handle State & History
- Conversation memory
- Previous results
- Revision tracking
-
Evaluate Effectiveness
- A/B testing
- Performance metrics
- Continuous optimization
From Research Synthesis (Microsoft AI Agents, Hypermode, AI Automators):
class AgentScratchpad:
"""
External memory for agents to store notes during session execution.
Key Benefit: Store information outside context window to prevent
overflow while maintaining access to relevant data.
"""
def __init__(self, agent_id: str):
self.agent_id = agent_id
self.notes = []
self.max_notes = 50 # Prevent unbounded growth
def add_note(self, note: str, metadata: Dict = None):
"""Add note to scratchpad with automatic pruning."""
self.notes.append({
'content': note,
'timestamp': datetime.now(),
'metadata': metadata or {}
})
# Prune oldest notes if limit exceeded
if len(self.notes) > self.max_notes:
self.notes = self.notes[-self.max_notes:]
def retrieve_relevant_notes(self, query: str, limit: int = 5):
"""Retrieve most relevant notes for current task."""
# Use vector similarity or keyword matching
return self._semantic_search_notes(query, limit)class CrossSessionMemoryStore:
"""
Persistent memory system for agents across multiple sessions.
Stores:
- User preferences
- Previous feedback
- Learned patterns
- Long-term project context
"""
def __init__(self, storage_backend: StorageBackend):
self.storage = storage_backend
self.memory_index = {}
async def store_memory(
self,
user_id: str,
memory: Dict,
memory_type: str
):
"""Store memory with metadata for retrieval."""
memory_entry = {
'user_id': user_id,
'memory_type': memory_type,
'content': memory,
'created_at': datetime.now(),
'access_count': 0,
'importance_score': self._calculate_importance(memory)
}
await self.storage.save(memory_entry)
await self._update_index(memory_entry)
async def retrieve_memories(
self,
user_id: str,
context: str,
limit: int = 10
):
"""Retrieve relevant memories for current context."""
# Combine recency, relevance, and importance
candidates = await self.storage.query(user_id=user_id)
scored_memories = [
(m, self._score_memory(m, context))
for m in candidates
]
# Sort by score and return top matches
scored_memories.sort(key=lambda x: x[1], reverse=True)
return [m for m, score in scored_memories[:limit]]
def _score_memory(self, memory: Dict, context: str) -> float:
"""Score memory based on recency, relevance, importance."""
recency_score = self._calculate_recency_score(memory)
relevance_score = self._calculate_relevance_score(memory, context)
importance = memory['importance_score']
# Weighted combination
return (
0.3 * recency_score +
0.5 * relevance_score +
0.2 * importance
)class ContextCompressor:
"""
Intelligent context compression to manage window size.
Strategies:
- Summarization of older messages
- Trimming of redundant information
- Preservation of critical context
"""
def compress_context(
self,
context: List[Message],
target_size: int
) -> List[Message]:
"""
Compress context to fit target size while preserving quality.
Priority preservation:
1. Most recent messages (high recency)
2. System prompts and instructions (always keep)
3. High-importance messages (user-marked or critical)
4. Messages with tool calls/results
"""
current_size = sum(msg.token_count for msg in context)
if current_size <= target_size:
return context # No compression needed
# Separate into buckets
system_messages = [m for m in context if m.role == 'system']
recent_messages = context[-10:] # Last 10 always kept
important_messages = [m for m in context if m.importance > 0.8]
tool_messages = [m for m in context if m.has_tool_calls]
# Build compressed context
compressed = []
remaining_size = target_size
# 1. Always include system messages
compressed.extend(system_messages)
remaining_size -= sum(m.token_count for m in system_messages)
# 2. Include recent messages
for msg in reversed(recent_messages):
if msg.token_count <= remaining_size:
compressed.insert(len(system_messages), msg)
remaining_size -= msg.token_count
# 3. Fill remaining space with important/tool messages
for msg in important_messages + tool_messages:
if msg not in compressed and msg.token_count <= remaining_size:
compressed.append(msg)
remaining_size -= msg.token_count
# 4. Summarize middle sections if still too large
if remaining_size < 0:
compressed = self._summarize_middle_section(compressed, target_size)
return compressedclass MultiAgentContextManager:
"""
Manage isolated contexts for multiple agents working together.
Key Principle: Each agent maintains its own context window
to prevent context pollution and maintain focus.
"""
def __init__(self):
self.agent_contexts = {}
self.shared_context = SharedContext()
def create_agent_context(self, agent_id: str, domain: str):
"""Create isolated context for specific agent."""
self.agent_contexts[agent_id] = AgentContext(
agent_id=agent_id,
domain=domain,
memory_store=MemoryStore(scope='isolated'),
scratchpad=AgentScratchpad(agent_id)
)
async def execute_with_isolation(
self,
agent_id: str,
task: Dict
) -> Dict:
"""Execute task with isolated agent context."""
agent_context = self.agent_contexts[agent_id]
# 1. Load agent-specific context only
local_context = await agent_context.load_context()
# 2. Optionally access shared context if needed
if task.get('needs_shared_context'):
shared = await self.shared_context.get_relevant_context(
agent_id,
task['query']
)
local_context.merge(shared)
# 3. Execute in isolation
result = await self._execute_isolated_task(
agent_context,
local_context,
task
)
# 4. Update both isolated and shared context
await agent_context.update(result)
if task.get('share_learnings'):
await self.shared_context.update_from_agent(
agent_id,
result['learnings']
)
return resultclass ContextPipeline:
"""
Design and execute context assembly pipelines.
Maps out what information is needed and how agents access it.
"""
def __init__(self):
self.stages = []
def add_stage(self, stage: ContextStage):
"""Add processing stage to pipeline."""
self.stages.append(stage)
async def execute_pipeline(self, initial_context: Dict) -> Context:
"""Execute full context assembly pipeline."""
context = Context(initial_context)
for stage in self.stages:
try:
# Each stage enriches the context
context = await stage.process(context)
# Validate context quality after each stage
if not self._validate_stage_output(context, stage):
raise ContextPipelineError(
f"Stage {stage.name} produced invalid context"
)
except Exception as e:
# Handle stage failures gracefully
self._handle_stage_failure(stage, e, context)
return context
def _validate_stage_output(
self,
context: Context,
stage: ContextStage
) -> bool:
"""Validate that stage produced expected output."""
required_fields = stage.output_schema.required_fields
return all(hasattr(context, field) for field in required_fields)class RuntimeStateObject:
"""
Container for managing information during agent execution.
Stores subtask results step-by-step to maintain context relevance.
"""
def __init__(self, task_id: str):
self.task_id = task_id
self.subtask_results = []
self.current_step = 0
self.state_variables = {}
def record_subtask_result(self, subtask: str, result: Dict):
"""Record result of subtask execution."""
self.subtask_results.append({
'step': self.current_step,
'subtask': subtask,
'result': result,
'timestamp': datetime.now()
})
self.current_step += 1
def get_context_for_next_step(self) -> Dict:
"""Get relevant context for next step."""
# Only include recent results to avoid context bloat
recent_results = self.subtask_results[-5:]
return {
'completed_steps': self.current_step,
'recent_results': recent_results,
'state_variables': self.state_variables
}
def update_state(self, key: str, value: Any):
"""Update state variable for cross-step information."""
self.state_variables[key] = value- Context Engineering Guide - Prompt Engineering Guide
- The Rise of Context Engineering - LangChain
- Context Engineering in MCP Ecosystem
- Model Context Protocol: What It Is and Why It Matters
- Context Engineering with MCP - O'Reilly
- MCP Architecture Documentation
- Context Engineering: The New Paradigm
- 12 Factor Agents - HumanLayer
- Context Engineering - Phil Schmid
- AI Agents for Beginners - Microsoft
- Context Engineering Multi-Agent - Hypermode
- Context Engineering for AI Agents - AI Automators
Next Actions:
- ✅ Created ContextAwareAgent base class
- 🔄 Create MCP server for ContextEngine (this week)
- 🔄 Add chat interface to RAG UI (this week)
- 📋 Implement hybrid context (local + MCP)
- 📋 Add context quality metrics and monitoring
Status: Research Complete - Ready for Implementation
Last Updated: 2025-01-08