This guide explains how to extend the Agentic Research Framework to new use cases beyond system design. The framework is designed to be modular, allowing researchers to easily add new domains while maintaining consistent experiment methodology and quality evaluation.
Perfect for: Researchers, developers, and AI enthusiasts who want to build on this framework - no prior agent development experience required!
- 📖 What are AI Agents? - LangChain introduction
- 🎥 AI Agents Explained - YouTube tutorials
- 📚 Agent Design Patterns - DeepLearning.AI course
- 📖 Read: `Chapter 4_ Reflection.txt` in this repository
- 🧠 Key Concept: Separate agents for generation (Producer) and evaluation (Critic)
- 💡 Why It Works: Prevents cognitive bias of self-review
- 📖 Official ADK Documentation
- 🚀 Getting Started Guide
- 🛠️ Agent Development Tutorial
- 📝 ADK Python Examples
- 📖 Docker for Beginners
- 🎥 Docker Tutorial - FreeCodeCamp
- 🐳 Docker Compose Guide
- Start with AI Agents concepts (understand the big picture)
- Read Chapter 4 (understand reflection pattern)
- Explore ADK documentation (understand the tools)
- Follow this guide (build your first use case)
- Experiment and iterate (learn by doing)
- ✅ Basic Python (functions, classes, async/await)
- ✅ Basic understanding of AI/LLMs (what they do, how they work)
- ✅ Command line comfort (running commands, file navigation)
- 🔄 Docker experience (we provide complete examples)
- 🔄 API development (FastAPI patterns are provided)
- 🔄 Agent frameworks (ADK patterns are documented)
- 📖 Producer-Critic reflection pattern (see `Chapter 4_ Reflection.txt`)
- 📊 Research methodology (see `Iterative Reflection vs.txt`)
- 🏗️ Framework architecture (see `COMPREHENSIVE_BUILD_SUMMARY.md`)
- 🐳 Docker setup (see main `README.md`)
- 🔧 Recent debugging fixes (see `COMPREHENSIVE_BUILD_SUMMARY.md`, Final Debugging Session)
Framework Core (Generic)
├── Base Classes (Agent, Orchestrator, Evaluator)
├── Experiment Runner
├── Quality Assessment Framework
└── Performance Metrics
Use Case Modules (Pluggable)
├── System Design ✅ (Reference Implementation - FULLY WORKING)
├── Code Review 📝 (Your new use case)
├── Content Generation 📝 (Your new use case)
├── Strategic Planning 📝 (Your new use case)
└── [Any Domain] 📝 (Your new use case)
The framework has been thoroughly debugged and all critical issues resolved:
- ✅ All models working: Flash-Lite, Flash, Pro
- ✅ All modes working: chat, baseline, reflection
- ✅ Quality evaluation functional: Multi-dimensional scoring
- ✅ No runtime errors: All edge cases handled
- ✅ Research ready: Systematic experiments possible
Each use case follows the same pattern:
- Producer Agent: Generates initial output
- Critic Agent: Evaluates and suggests improvements
- Orchestrator: Manages reflection cycles and termination
- Evaluator: Assesses quality across multiple dimensions
- Tools: Domain-specific capabilities
- Configuration: Test scenarios and quality dimensions
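The interplay of these components can be sketched as a simple loop. This is an illustrative stand-alone sketch, not the framework's actual API: `reflection_loop` and the lambda stand-ins for the agents are hypothetical, but the flow (generate, critique, refine until approved) mirrors the pattern described above.

```python
# Minimal sketch of the producer-critic reflection cycle (illustrative only;
# the real framework's orchestrator also handles state, logging, and metrics).

def reflection_loop(producer, critic, requirements, max_iterations=3):
    """Run producer-critic cycles until the critic approves or the cap is hit."""
    output, critique = None, None
    for _ in range(max_iterations):
        output = producer(requirements, critique)   # generate (or refine) output
        critique = critic(requirements, output)     # evaluate the output
        if "OUTPUT_APPROVED" in critique:           # critic's termination keyword
            break
    return output, critique

# Toy stand-ins for real agents:
producer = lambda req, crit: f"Draft for: {req}" + (" (revised)" if crit else "")
critic = lambda req, out: "OUTPUT_APPROVED" if "revised" in out else "Needs more detail"

final, verdict = reflection_loop(producer, critic, "Design a cache layer")
```

Note that the critique from iteration *n* feeds back into the producer at iteration *n+1*; this feedback channel is what distinguishes reflection mode from baseline mode.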
# Create the basic directory structure
mkdir -p use_cases/your_use_case/{tools,docker}
# Create required files
touch use_cases/your_use_case/__init__.py
touch use_cases/your_use_case/config.py
touch use_cases/your_use_case/agents.py
touch use_cases/your_use_case/orchestrator.py
touch use_cases/your_use_case/evaluator.py
touch use_cases/your_use_case/tools/__init__.py

Expected Structure:
use_cases/your_use_case/
├── __init__.py
├── config.py # Use case configuration
├── agents.py # Producer & Critic agents
├── orchestrator.py # Workflow management
├── evaluator.py # Quality assessment
├── tools/ # Domain-specific tools
│ ├── __init__.py
│ └── your_domain_tools.py
└── docker/ # Container configuration
├── Dockerfile
├── docker-compose.yml
└── entrypoint.sh
💡 For Beginners: This file defines what your use case does, how to measure quality, and what test scenarios to use. Think of it as the "blueprint" for your domain.
📚 Helpful Resources:
- Pydantic Models - Understanding data validation
- Type Hints in Python - Python typing basics
Create use_cases/your_use_case/config.py:
from typing import Dict, Any, List
from framework.base_evaluator import QualityDimension
# Use case configuration
USE_CASE_CONFIG = {
"name": "your_use_case", # e.g., "code_review", "content_generation"
"description": "Brief description of what this use case does",
"complexity_levels": ["simple", "medium", "complex"],
"evaluation_dimensions": [
"accuracy", # Domain-specific quality aspects
"completeness",
"clarity",
"efficiency",
"creativity", # If applicable
"compliance" # If applicable
]
}
# Quality dimensions for your domain
QUALITY_DIMENSIONS = [
QualityDimension(
name="accuracy",
weight=0.30, # Adjust weights based on domain importance
description="Correctness and factual accuracy of the output",
scale_description="0.0 = Major errors, 1.0 = Completely accurate"
),
QualityDimension(
name="completeness",
weight=0.25,
description="Coverage of all requirements and aspects",
scale_description="0.0 = Missing key components, 1.0 = Comprehensive"
),
QualityDimension(
name="clarity",
weight=0.20,
description="Clarity of explanations and structure",
scale_description="0.0 = Unclear/confusing, 1.0 = Crystal clear"
),
QualityDimension(
name="efficiency",
weight=0.15,
description="Efficiency and optimization of the solution",
scale_description="0.0 = Inefficient approach, 1.0 = Highly optimized"
),
QualityDimension(
name="creativity",
weight=0.10,
description="Innovation and creative problem-solving",
scale_description="0.0 = Generic solution, 1.0 = Highly innovative"
)
]
# Test scenarios for research (complexity progression)
TEST_SCENARIOS = {
"simple": [
{
"id": "simple_example",
"input": "Simple test case for your domain...",
"expected_components": ["component1", "component2"]
}
],
"medium": [
{
"id": "medium_example",
"input": "Medium complexity test case...",
"expected_components": ["component1", "component2", "component3"]
}
],
"complex": [
{
"id": "complex_example",
"input": "Complex test case with multiple requirements...",
"expected_components": ["component1", "component2", "component3", "component4"]
}
]
}

💡 For Beginners: Agents are like specialized AI assistants. The Producer creates content (like a writer), and the Critic reviews it (like an editor). This separation prevents the AI from being too lenient on its own work.
📚 Essential Resources:
- Google ADK Agent Guide - Official agent documentation
- LLM Agent Patterns - LangChain blog on reflection
- Prompt Engineering Guide - Writing effective AI instructions
- Python Async/Await Tutorial - Understanding async programming
🎯 Key Concepts:
- Producer Agent: Generates initial output (like a domain expert)
- Critic Agent: Reviews and suggests improvements (like a quality reviewer)
- Tools: Special functions agents can use (like calculators or databases)
- Instructions: The "personality" and expertise you give each agent
Create use_cases/your_use_case/agents.py:
from typing import Dict, Any, List
from framework.base_agent import BaseUseCaseAgent
from google.adk.tools import FunctionTool
from .tools.your_domain_tools import (
# Import your domain-specific tools
domain_specific_function_1,
domain_specific_function_2,
analysis_function,
validation_function
)
import structlog
logger = structlog.get_logger(__name__)
class YourUseCaseProducer(BaseUseCaseAgent):
"""
Generates high-quality outputs for your specific domain.
Design principles:
1. Domain expertise and specialized knowledge
2. Structured output format for evaluation
3. Tool integration for enhanced capabilities
4. Research-oriented logging and metrics
"""
def __init__(self, model: str):
super().__init__(model, "your_use_case", "producer")
def _initialize_tools(self) -> List[FunctionTool]:
"""Initialize domain-specific tools for the producer."""
return [
FunctionTool(func=domain_specific_function_1),
FunctionTool(func=domain_specific_function_2),
FunctionTool(func=analysis_function),
# Add tools that help generate better outputs
]
def _get_instructions(self) -> str:
"""Instructions for the domain-specific producer."""
return """
You are a senior expert in [YOUR DOMAIN] with [X]+ years of experience.
EXAMPLE DOMAINS & EXPERTISE:
📝 CONTENT WRITING:
- Content strategy and audience analysis
- SEO optimization and keyword research
- Brand voice and tone consistency
- Editorial standards and fact-checking
💻 CODE REVIEW:
- Software architecture and design patterns
- Security vulnerability assessment
- Performance optimization techniques
- Code maintainability and readability
📊 DATA ANALYSIS:
- Statistical analysis and hypothesis testing
- Data visualization and storytelling
- Machine learning model evaluation
- Business intelligence and insights
🎯 STRATEGIC PLANNING:
- Business strategy and competitive analysis
- Market research and customer insights
- Financial modeling and ROI analysis
- Risk assessment and mitigation
When generating outputs:
1. Analyze requirements thoroughly
2. Apply domain expertise and best practices
3. Create structured, comprehensive responses
4. Include specific examples and recommendations
5. Address potential challenges and solutions
6. Provide clear rationale for decisions
Your output must be structured with these sections:
- Executive Summary
- Requirements Analysis
- [Domain-Specific Section 1]
- [Domain-Specific Section 2]
- [Domain-Specific Section 3]
- Quality Assurance
- Implementation Guidance
- Best Practices
Always justify your decisions with specific reasoning.
Use your tools to enhance the quality and accuracy of your output.
"""
class YourUseCaseCritic(BaseUseCaseAgent):
"""
Reviews and critiques outputs for improvements.
Design principles:
1. Objective evaluation (prevents cognitive bias)
2. Domain expertise for accurate assessment
3. Structured feedback for actionable improvements
4. Research-oriented termination conditions
"""
def __init__(self, model: str):
super().__init__(model, "your_use_case", "critic")
def _initialize_tools(self) -> List[FunctionTool]:
"""Initialize tools for domain-specific critique."""
return [
FunctionTool(func=validation_function),
FunctionTool(func=analysis_function),
# Add tools that help evaluate quality
]
def _get_instructions(self) -> str:
"""Instructions for the domain-specific critic."""
return """
You are a principal expert and technical reviewer in [YOUR DOMAIN] with expertise in:
- [Domain review expertise 1]
- [Domain review expertise 2]
- [Quality assessment standards]
- [Industry benchmarks]
- [Best practice validation]
Your role is to critically evaluate outputs and provide structured feedback.
For each review:
1. Assess accuracy and technical correctness
2. Evaluate completeness and coverage
3. Review clarity and structure
4. Analyze efficiency and optimization
5. Check for missing components
6. Validate against best practices
Your critique must be structured with:
- Overall Assessment (EXCELLENT/GOOD/NEEDS_IMPROVEMENT/POOR)
- Specific Issues Found (categorized by severity: CRITICAL/HIGH/MEDIUM/LOW)
- Improvement Recommendations (with specific actions)
- Missing Components or Considerations
- Best Practice Violations
- [Domain-Specific Assessment Areas]
Be thorough, objective, and constructive in your feedback.
Prioritize issues by impact on quality and effectiveness.
Termination condition: Respond with "OUTPUT_APPROVED" if the output meets all requirements and follows best practices with no critical issues.
"""💡 For Beginners: Tools are special functions that agents can use to enhance their capabilities. Think of them as "superpowers" - like giving a writer access to a spell-checker, or giving a code reviewer access to security scanners.
📚 Tool Development Resources:
- ADK Tools Documentation - Official tool guide
- Function Tools Tutorial - Step-by-step tool creation
- Tool Best Practices - Design guidelines
🛠️ Common Tool Types:
- Analysis Tools: Analyze data, check quality, validate inputs
- External APIs: Weather, stock prices, news, databases
- Calculations: Math, statistics, financial modeling
- Validation: Check formats, verify facts, test compliance
💡 Tool Design Tips:
- Keep tools focused (one clear purpose)
- Make them reliable (handle errors gracefully)
- Include good documentation (clear descriptions)
- Return structured data (dictionaries with consistent format)
Create use_cases/your_use_case/tools/your_domain_tools.py:
from typing import Dict, Any, List, Optional
import structlog
logger = structlog.get_logger(__name__)
def domain_specific_function_1(input_param: str, **kwargs) -> Dict[str, Any]:
"""
Domain-specific tool function.
Args:
input_param: Description of the parameter
**kwargs: Additional context from agent execution
Returns:
Dict containing tool results and metadata
"""
try:
# Implement your domain-specific logic
result = {
"success": True,
"data": "Tool-specific output",
"metadata": {
"tool_name": "domain_specific_function_1",
"execution_time": "measurement if needed",
"parameters_used": {"input_param": input_param}
}
}
logger.info(
"Domain tool executed successfully",
tool_name="domain_specific_function_1",
input_param=input_param
)
return result
except Exception as e:
logger.error(
"Domain tool execution failed",
tool_name="domain_specific_function_1",
error=str(e)
)
return {
"success": False,
"error": str(e),
"data": None
}
def validation_function(output: str, requirements: str, **kwargs) -> Dict[str, Any]:
"""
Validate output against requirements.
Args:
output: The generated output to validate
requirements: Original requirements
**kwargs: Additional context
Returns:
Validation results with scores and feedback
"""
# Implement domain-specific validation logic
return {
"validation_score": 0.8, # 0.0 - 1.0
"issues_found": ["List of specific issues"],
"recommendations": ["List of improvements"],
"compliance_status": "PASS/FAIL",
"details": "Detailed validation analysis"
}
# Add more domain-specific tools as needed

Create use_cases/your_use_case/orchestrator.py:
from typing import Dict, Any
import structlog
from framework.base_orchestrator import BaseUseCaseOrchestrator
from .agents import YourUseCaseProducer, YourUseCaseCritic
from .config import USE_CASE_CONFIG
logger = structlog.get_logger(__name__)
class YourUseCaseOrchestrator(BaseUseCaseOrchestrator):
"""
Orchestrates your domain-specific workflow with producer-critic pattern.
"""
def __init__(self, use_case_config: Dict[str, Any] = None):
config = use_case_config or USE_CASE_CONFIG
super().__init__(config)
def _initialize_agents(self) -> Dict[str, Any]:
"""Initialize domain-specific agents."""
default_model = "gemini-2.5-flash-lite"
agents = {
"producer": YourUseCaseProducer(default_model),
"critic": YourUseCaseCritic(default_model)
}
logger.info(
"Domain agents initialized",
use_case=self.use_case,
producer_model=agents["producer"].model,
critic_model=agents["critic"].model
)
return agents
def _should_terminate(self, state: Dict[str, Any]) -> bool:
"""
Domain-specific termination conditions.
Customize based on your domain's quality requirements.
"""
# Check if critic approved the output
critic_output = state.get("critic_output", {})
if isinstance(critic_output, str) and "OUTPUT_APPROVED" in critic_output:
logger.info("Output approved by critic, terminating reflection")
return True
# Check for structured critic response
if isinstance(critic_output, dict):
overall_assessment = critic_output.get("overall_assessment", "")
if overall_assessment == "EXCELLENT":
logger.info("Output rated as excellent, terminating reflection")
return True
# Check if no critical issues remain
critical_issues = critic_output.get("critical_issues", [])
if not critical_issues:
high_issues = critic_output.get("high_issues", [])
if len(high_issues) <= 1: # Domain-specific threshold
logger.info("No critical issues, terminating reflection")
return True
# Check improvement stagnation (optional)
iteration_count = state.get("reflection:iteration", 0)
if iteration_count >= 2:
quality_history = state.get("quality_history", [])
if len(quality_history) >= 2:
recent_improvement = quality_history[-1] - quality_history[-2]
if recent_improvement < 0.05: # Less than 5% improvement
logger.info("Quality improvement stagnated, terminating reflection")
return True
return False
def prepare_producer_input(self, original_input: str,
critique: Dict[str, Any] = None) -> Dict[str, Any]:
"""
Prepare input for producer agent, incorporating critique if available.
"""
producer_input = {
"requirements": original_input,
"iteration_context": {
"is_refinement": critique is not None,
"previous_critique": critique
}
}
if critique:
producer_input["improvement_focus"] = [
"Address critical issues: " + ", ".join(critique.get("critical_issues", [])),
"Resolve high priority issues: " + ", ".join(critique.get("high_issues", [])),
"Implement recommendations: " + ", ".join(critique.get("recommendations", []))
]
producer_input["refinement_instructions"] = f"""
This is a refinement iteration for {self.use_case}. Focus on:
1. Addressing all critical and high-priority issues from the critique
2. Implementing specific recommendations provided
3. Maintaining the good aspects of the previous output
4. Improving overall quality while preserving working components
"""
return producer_input

💡 For Beginners: The evaluator measures how good the output is across different aspects (like grading a paper on content, grammar, structure, etc.). This provides the research data to compare different approaches.
📚 Quality Assessment Resources:
- Multi-Criteria Decision Analysis - Theory behind multi-dimensional evaluation
- Rubric Design Guide - Creating evaluation criteria
- Inter-rater Reliability - Ensuring consistent evaluation
🎯 Quality Dimension Examples:
- Technical Domains: Accuracy, completeness, efficiency, security
- Creative Domains: Originality, engagement, clarity, relevance
- Analytical Domains: Rigor, insight, actionability, evidence
- Communication Domains: Clarity, persuasiveness, tone, structure
💡 Evaluation Tips:
- Use 0.0 to 1.0 scale for consistency
- Define clear criteria for each score level
- Weight dimensions by importance to your domain
- Include specific feedback for improvement
Create use_cases/your_use_case/evaluator.py:
from typing import Dict, Any, List, Optional
import structlog
from framework.base_evaluator import BaseUseCaseEvaluator, QualityScore, QualityDimension
from .config import QUALITY_DIMENSIONS
logger = structlog.get_logger(__name__)
class YourUseCaseEvaluator(BaseUseCaseEvaluator):
"""
Evaluates output quality for your specific domain.
"""
def __init__(self, evaluator_model: str = None):
super().__init__("your_use_case", evaluator_model)
def _get_quality_dimensions(self) -> List[QualityDimension]:
"""Get domain-specific quality dimensions."""
return QUALITY_DIMENSIONS
async def evaluate_output(self,
output: Dict[str, Any],
original_input: str,
context: Optional[Dict[str, Any]] = None) -> List[QualityScore]:
"""
Evaluate output quality across all dimensions.
Customize this method for your domain's specific evaluation needs.
"""
try:
# Extract the actual output content
output_text = output.get("response", str(output))
# Prepare evaluation prompt for the evaluation agent
evaluation_prompt = f"""
Evaluate this {self.use_case} output across multiple quality dimensions.
Original Requirements:
{original_input}
Generated Output:
{output_text}
Evaluate across these dimensions: {[d.name for d in self.dimensions]}
For each dimension, provide:
1. Score (0.0 to 1.0)
2. Reasoning for the score
3. Specific issues found
4. Improvement suggestions
Format your response as a structured analysis for each dimension.
"""
# Execute evaluation (you may need to implement this differently)
# For now, using fallback scores based on output characteristics
scores = []
for dimension in self.dimensions:
# Implement domain-specific scoring logic
score = self._calculate_dimension_score(dimension, output_text, original_input)
scores.append(score)
logger.info(
"Quality evaluation completed",
use_case=self.use_case,
dimension_count=len(scores),
overall_score=self.calculate_overall_score(scores)
)
return scores
except Exception as e:
logger.error(
"Quality evaluation failed",
use_case=self.use_case,
error=str(e)
)
# Return fallback scores
return [
QualityScore(
dimension=dim.name,
score=0.5, # Neutral fallback
reasoning=f"Evaluation failed: {str(e)}",
specific_issues=["Evaluation system error"],
improvement_suggestions=["Fix evaluation system"]
)
for dim in self.dimensions
]
def _calculate_dimension_score(self,
dimension: QualityDimension,
output_text: str,
original_input: str) -> QualityScore:
"""
Calculate score for a specific quality dimension.
Implement domain-specific scoring logic here.
"""
# Example scoring logic - customize for your domain
if dimension.name == "completeness":
# Score based on output length and requirement coverage
score = min(1.0, len(output_text) / 1000) # Adjust threshold
reasoning = f"Output length: {len(output_text)} characters"
issues = ["Too brief"] if score < 0.5 else []
suggestions = ["Provide more detail"] if score < 0.5 else []
elif dimension.name == "clarity":
# Score based on structure and readability
has_sections = any(marker in output_text for marker in ["##", "###", "**", "1.", "2."])
score = 0.8 if has_sections else 0.4
reasoning = f"Structured format: {has_sections}"
issues = ["Lacks clear structure"] if not has_sections else []
suggestions = ["Add headings and sections"] if not has_sections else []
else:
# Default scoring - implement specific logic for each dimension
score = 0.5
reasoning = f"Default scoring for {dimension.name}"
issues = []
suggestions = ["Implement specific scoring logic"]
return QualityScore(
dimension=dimension.name,
score=score,
reasoning=reasoning,
specific_issues=issues,
improvement_suggestions=suggestions
)

Create use_cases/your_use_case/docker/Dockerfile:
# Multi-stage build using pip with pyproject.toml (proven approach)
FROM python:3.11-slim as builder
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Install build dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy project files
COPY pyproject.toml README.md ./
# Install dependencies using pip (which can read pyproject.toml)
RUN pip install --no-cache-dir -e .[dev]
# Production stage
FROM python:3.11-slim
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV USE_CASE=your_use_case
# Use the next available port for your use case
ENV API_PORT=8002
# Install runtime dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy installed packages from builder stage
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
WORKDIR /app
# Copy the source code
COPY . .
# Create entrypoint script
RUN echo '#!/bin/bash\n\
set -e\n\
echo "Starting Your Use Case API Server"\n\
echo "Port: ${API_PORT:-8002}"\n\
echo "Use Case: ${USE_CASE:-your_use_case}"\n\
\n\
export USE_CASE=your_use_case\n\
export API_PORT=${API_PORT:-8002}\n\
\n\
echo "Python: $(python --version)"\n\
echo "Starting API server on port ${API_PORT}..."\n\
exec uvicorn api.use_case_server:app \\\n\
--host 0.0.0.0 \\\n\
--port ${API_PORT} \\\n\
--log-level info \\\n\
--access-log' > /entrypoint.sh && chmod +x /entrypoint.sh
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:${API_PORT}/health || exit 1
EXPOSE ${API_PORT}
CMD ["/entrypoint.sh"]Create use_cases/your_use_case/docker/docker-compose.yml:
services:
your_use_case:
container_name: your_use_case_container
build:
context: ../../.. # Build from project root
dockerfile: use_cases/your_use_case/docker/Dockerfile
env_file: ../../../.env
ports:
- "127.0.0.1:8002:8002" # Use next available port
environment:
- USE_CASE=your_use_case
- API_PORT=8002
networks:
- research_framework_network
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8002/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
your_use_case_research:
container_name: your_use_case_research
build:
context: ../../..
dockerfile: use_cases/your_use_case/docker/Dockerfile
env_file: ../../../.env
ports:
- "127.0.0.1:8892:8888" # Jupyter port for this use case
environment:
- USE_CASE=your_use_case
- JUPYTER_ENABLE_LAB=yes
- JUPYTER_TOKEN=your_use_case_research_token
volumes:
- ../../../research/data:/app/research/data
- ../../../research/notebooks:/app/research/notebooks
networks:
- research_framework_network
command: ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
profiles: ["research"]
networks:
research_framework_network:
external: true

Add your use case to the main docker-compose.yml:
# Add to the main docker-compose.yml services section
your_use_case:
container_name: your_use_case_container
build:
context: .
dockerfile: use_cases/your_use_case/docker/Dockerfile
env_file: .env
ports:
- "127.0.0.1:8002:8002"
environment:
- USE_CASE=your_use_case
- API_PORT=8002
networks:
- research_framework_network
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8002/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s

Update docs/PORT_ALLOCATION.md:
## Port Allocation Strategy
| Port | Service | Use Case | Purpose |
|------|---------|----------|---------|
| 8000 | Main API | Framework | Cross-use-case orchestration |
| 8001 | System Design | system_design | Cloud architecture design |
| 8002 | Your Use Case | your_use_case | [Your domain description] |
| 8003 | Available | - | Next use case |
| 8888 | Jupyter Lab | research | Data analysis |
| 8891 | System Design Research | system_design | Research notebooks |
| 8892 | Your Use Case Research | your_use_case | Research notebooks |

# Build your use case container
docker-compose up -d --build your_use_case
# Check health
curl http://localhost:8002/health
# Check available endpoints
curl http://localhost:8002/

# Test chat mode (producer only)
curl -X POST http://localhost:8002/chat \
-H "Content-Type: application/json" \
-d '{
"message": "Your domain-specific test request...",
"mode": "chat",
"model": "gemini-2.5-flash-lite"
}'

# Test baseline mode (with quality evaluation)
curl -X POST http://localhost:8002/chat \
-H "Content-Type: application/json" \
-d '{
"message": "Your domain-specific test request...",
"mode": "baseline",
"model": "gemini-2.5-flash-lite"
}'

# Test reflection mode (producer-critic iterations)
curl -X POST http://localhost:8002/chat \
-H "Content-Type: application/json" \
-d '{
"message": "Your domain-specific test request...",
"mode": "reflection",
"model": "gemini-2.5-flash-lite",
"reflection_iterations": 3
}'

Check that your response includes:
{
"response": "Domain-specific output...",
"mode_used": "reflection",
"model_used": "gemini-2.5-flash-lite",
"quality_score": 0.75,
"reflection_iterations_used": 2,
"processing_time_seconds": 45.2,
"session_id": "uuid",
"use_case": "your_use_case"
}

# use_cases/code_review/config.py
QUALITY_DIMENSIONS = [
QualityDimension("correctness", 0.30, "Code correctness and bug detection"),
QualityDimension("security", 0.25, "Security vulnerability identification"),
QualityDimension("maintainability", 0.20, "Code maintainability and readability"),
QualityDimension("performance", 0.15, "Performance optimization suggestions"),
QualityDimension("best_practices", 0.10, "Adherence to coding standards")
]
# Producer: Senior software engineer reviewing code
# Critic: Principal engineer validating the review
# Tools: Static analysis, security scanning, performance profiling

# use_cases/content_generation/config.py
QUALITY_DIMENSIONS = [
QualityDimension("relevance", 0.25, "Relevance to target audience and purpose"),
QualityDimension("engagement", 0.25, "Engagement and readability"),
QualityDimension("accuracy", 0.20, "Factual accuracy and credibility"),
QualityDimension("creativity", 0.15, "Originality and creative approach"),
QualityDimension("structure", 0.15, "Organization and flow")
]
# Producer: Content strategist and writer
# Critic: Editorial reviewer and fact-checker
# Tools: SEO analysis, readability scoring, fact-checking

# use_cases/strategic_planning/config.py
QUALITY_DIMENSIONS = [
QualityDimension("strategic_alignment", 0.30, "Alignment with business objectives"),
QualityDimension("feasibility", 0.25, "Practical feasibility and resource requirements"),
QualityDimension("risk_assessment", 0.20, "Risk identification and mitigation"),
QualityDimension("innovation", 0.15, "Innovation and competitive advantage"),
QualityDimension("measurability", 0.10, "Clear metrics and success criteria")
]
# Producer: Senior strategy consultant
# Critic: Executive advisor and risk assessor
# Tools: Market analysis, competitive intelligence, financial modeling

# For complex tools requiring external APIs or databases
class DomainSpecificTool(BaseTool):
def __init__(self):
super().__init__(
name="domain_specific_tool",
description="Tool for domain-specific analysis"
)
async def execute(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
# Implement complex tool logic
# Can include external API calls, database queries, etc.
pass

# For domains requiring specialized evaluation
async def custom_evaluation_agent(self, output: str, requirements: str) -> Dict[str, Any]:
"""Use a separate ADK agent for quality evaluation."""
evaluation_agent = LlmAgent(
model="gemini-2.5-pro", # Use higher model for evaluation
name=f"{self.use_case}_evaluator",
instruction="Detailed evaluation instructions...",
tools=[]
)
# Execute evaluation agent
result = await self._run_agent(evaluation_agent, {
"output": output,
"requirements": requirements
})
return result

def _should_terminate(self, state: Dict[str, Any]) -> bool:
"""Customize termination logic for your domain."""
# Example: Code review termination
if self.use_case == "code_review":
critic_output = state.get("critic_output", {})
security_issues = critic_output.get("security_issues", [])
if security_issues:
return False # Never terminate with security issues
# Example: Content generation termination
elif self.use_case == "content_generation":
quality_score = state.get("quality_score", 0)
if quality_score < 0.7:
return False # Require higher quality for content
# Call parent termination logic
return super()._should_terminate(state)

Create config/experiments/your_use_case_pilot.json:
{
"name": "your_use_case_pilot_study",
"description": "Pilot study comparing reflection vs capability for [your domain]",
"use_case": "your_use_case",
"models": ["gemini-2.5-flash-lite", "gemini-2.5-flash", "gemini-2.5-pro"],
"modes": ["baseline", "reflection"],
"reflection_iterations": [1, 2, 3],
"test_scenarios": ["simple", "medium", "complex"],
"evaluation_dimensions": [
"accuracy", "completeness", "clarity", "efficiency", "creativity"
],
"sample_size": 10,
"randomization": true
}

Create research/notebooks/your_use_case_analysis.ipynb:
# Jupyter notebook for analyzing your use case results
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load experiment results
results_df = pd.read_json('research/data/experiments/your_use_case_results.json')
# Analyze quality vs processing time
plt.figure(figsize=(10, 6))
sns.scatterplot(data=results_df, x='processing_time_seconds', y='quality_score',
hue='mode', style='model')
plt.title('Quality vs Processing Time: Your Use Case')
plt.show()
# Compare baseline vs reflection
baseline_scores = results_df[results_df['mode'] == 'baseline']['quality_score']
reflection_scores = results_df[results_df['mode'] == 'reflection']['quality_score']
print(f"Baseline mean quality: {baseline_scores.mean():.3f}")
print(f"Reflection mean quality: {reflection_scores.mean():.3f}")
print(f"Quality improvement: {(reflection_scores.mean() - baseline_scores.mean()):.3f}")# Deploy just your use case
cd use_cases/your_use_case
docker-compose up -d
# Or deploy with main framework
cd ../../..
docker-compose up -d your_use_case

import requests
# Test your use case API
response = requests.post('http://localhost:8002/chat', json={
"message": "Your domain-specific request...",
"mode": "reflection",
"model": "gemini-2.5-flash-lite",
"reflection_iterations": 2
})
result = response.json()
print(f"Quality Score: {result['quality_score']}")
print(f"Processing Time: {result['processing_time_seconds']}s")
print(f"Iterations Used: {result['reflection_iterations_used']}")

# Collect data for your research
import asyncio
import json
async def run_experiment(use_case, scenarios, models, modes):
    results = []
    for scenario in scenarios:
        for model in models:
            for mode in modes:
                # test_use_case is your domain-specific helper that calls the API
                result = await test_use_case(scenario, model, mode)
                results.append(result)
    # Save results for analysis
    with open(f'research/data/{use_case}_experiment_results.json', 'w') as f:
        json.dump(results, f, indent=2)
    return results

- Clear personas: Define specific expertise and experience levels
- Structured outputs: Require consistent formatting for evaluation
- Domain tools: Integrate relevant tools for enhanced capabilities
- Termination keywords: Use consistent approval language
- Domain relevance: Choose dimensions that matter for your field
- Weighted importance: Reflect real-world priorities
- Measurable criteria: Define clear scoring guidelines
- Expert validation: Test with domain professionals
- Progressive complexity: Simple → Medium → Complex scenarios
- Realistic constraints: Budget, time, resource limitations
- Professional scenarios: Real-world applicability
- Comparative baselines: Multiple model and mode combinations
- Start simple: Basic functionality first, then add complexity
- Test incrementally: Validate each component before integration
- Monitor performance: Track quality, time, and resource usage
- Document thoroughly: Clear instructions for future developers
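The most common configuration mistake with quality dimensions is weights that do not sum to 1.0, so a small sanity check at startup pays off. This sketch uses a simplified stand-in for the framework's `QualityDimension` class (the real one lives in `framework.base_evaluator`):

```python
# Sanity-check that quality-dimension weights sum to 1.0.
# QualityDimension here is a simplified stand-in for illustration only.
from dataclasses import dataclass

@dataclass
class QualityDimension:
    name: str
    weight: float
    description: str

dimensions = [
    QualityDimension("accuracy", 0.5, "Correctness of outputs"),
    QualityDimension("clarity", 0.3, "Readability and organization"),
    QualityDimension("completeness", 0.2, "Coverage of required elements"),
]

total = sum(d.weight for d in dimensions)
assert abs(total - 1.0) < 1e-9, f"Weights must sum to 1.0, got {total}"
print(f"Weights OK: {total}")
```

Running this once at container startup turns a silent scoring bias into an immediate, explicit failure.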
# Error: ModuleNotFoundError: No module named 'google.adk'

Solution:
- Check your `pyproject.toml` includes `google-adk>=1.14.1`
- Rebuild your Docker container: `docker-compose up -d --build your_use_case`
- Verify installation: `docker exec your_container pip list | grep google-adk`
📚 Learn More: Python Import System
# Error: 'LlmAgent' object has no attribute 'run'

Solution:
# ❌ Wrong way:
result = await agent.run(input_data)
# ✅ Correct way (use our base_agent.py):
result = await producer.run({"input": "Your message here"})

📚 Learn More: Google ADK Runner Documentation
# Error: failed to solve: process "/bin/sh -c pip install..." did not complete

Solution:
- Copy the working `Dockerfile` from `use_cases/system_design/docker/Dockerfile`
- Update only the `USE_CASE` and `API_PORT` environment variables
- Use the proven pip-based approach (not UV or Poetry)
📚 Learn More: Docker Best Practices
# Error: curl: (7) Failed to connect to localhost:8002

Solution:
- Check container status: `docker ps`
- Check logs: `docker logs your_container`
- Verify port mapping in `docker-compose.yml`
- Test health endpoint: `curl http://localhost:8002/health`
This means your evaluation is using fallback scores.
Solution:
- Check evaluator logs for errors
- Verify quality dimensions are properly defined
- Test evaluation logic with simple inputs
- Use fallback scoring as a starting point
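Identical scores across all runs usually mean the evaluator could not parse the critic's response and silently substituted defaults. This hypothetical sketch shows a fallback guard that makes the substitution explicit and loggable (the helper name `parse_evaluation` and the JSON response format are assumptions, not the framework's actual API):

```python
# Hypothetical fallback-scoring guard: if the critic's JSON evaluation
# cannot be parsed, return neutral default scores and log a warning, so
# "all scores are 0.8" is traceable to a parsing failure.
import json
import logging

logger = logging.getLogger("evaluator")
FALLBACK_SCORE = 0.8  # neutral default used when evaluation fails

def parse_evaluation(raw_response: str, dimensions: list) -> dict:
    try:
        scores = json.loads(raw_response)
        return {dim: float(scores[dim]) for dim in dimensions}
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as e:
        logger.warning("Evaluation parse failed (%s); using fallback scores", e)
        return {dim: FALLBACK_SCORE for dim in dimensions}

# A malformed critic response triggers the fallback path:
scores = parse_evaluation("not json", ["accuracy", "clarity"])
print(scores)  # every dimension gets the fallback value
```

If your logs show this warning on every request, the problem is the critic's output format, not the scoring logic.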
# ❌ Wrong: weights sum to 1.2
QualityDimension("accuracy", 0.4, ...),
QualityDimension("clarity", 0.4, ...),
QualityDimension("completeness", 0.4, ...)
# ✅ Correct: weights sum to 1.0
QualityDimension("accuracy", 0.5, ...),
QualityDimension("clarity", 0.3, ...),
QualityDimension("completeness", 0.2, ...)

This is often correct behavior! The critic approves good outputs early.
To Test:
- Use vague prompts to force multiple iterations
- Check critic instructions include termination keywords
- Look for "DESIGN_APPROVED" or "EXCELLENT" in critic responses
The critic never approves the output.
Solution:
- Check critic instructions include termination condition
- Add maximum iteration limits (safety net)
- Review critic's evaluation criteria (might be too strict)
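The safety net described above can be sketched as a simple loop: stop when the critic emits an approval keyword, or when the iteration cap is reached, so an overly strict critic can never loop forever. Function names and the toy producer/critic below are illustrative, not the framework's actual orchestrator:

```python
# Sketch of a reflection loop with an approval-keyword check and a
# max-iteration safety net (names are illustrative).
APPROVAL_KEYWORDS = ("CONTENT_APPROVED", "DESIGN_APPROVED", "EXCELLENT")

def run_reflection(produce, critique, max_iterations=3):
    output = produce("initial")
    for iteration in range(1, max_iterations + 1):
        feedback = critique(output)
        if any(kw in feedback for kw in APPROVAL_KEYWORDS):
            return output, iteration      # critic approved early
        output = produce(feedback)        # revise using the feedback
    return output, max_iterations         # safety net: hard stop

# Toy producer/critic pair that approves on the second pass:
calls = {"n": 0}
def produce(prompt):
    return f"draft after '{prompt[:20]}'"
def critique(draft):
    calls["n"] += 1
    return "CONTENT_APPROVED" if calls["n"] >= 2 else "Needs a stronger intro"

result, iterations = run_reflection(produce, critique)
print(iterations)  # → 2
```

The cap guarantees termination even if the critic never approves; the keyword check is why matching termination language between critic instructions and orchestrator matters.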
- Copy the system_design use case as a template
- Change only the domain-specific parts (instructions, tools)
- Test each component separately before integrating
- Use our working examples as reference
# Start with these (easiest to modify):
config.py # Define what your domain does
agents.py # Change the instructions and expertise
# Then move to these (more complex):
tools/ # Add domain-specific capabilities
evaluator.py # Customize quality measurement
orchestrator.py    # Usually no changes needed

# ❌ Too complex:
"You are an expert with deep knowledge of advanced methodologies..."
# ✅ Clear and specific:
"You are a senior software engineer. Review code for bugs, security issues, and best practices."

# ❌ Vague:
QualityDimension("goodness", 0.5, "How good it is")
# ✅ Specific:
QualityDimension("technical_accuracy", 0.5, "Correctness of technical decisions and implementations")

# ❌ No error handling:
def my_tool(input_data):
    return expensive_api_call(input_data)

# ✅ Robust error handling:
def my_tool(input_data):
    try:
        result = expensive_api_call(input_data)
        return {"success": True, "data": result}
    except Exception as e:
        logger.error(f"Tool failed: {e}")
        return {"success": False, "error": str(e)}

# ✅ Always test each step:
# 1. Test agent creation
curl http://localhost:8002/health
# 2. Test basic chat
curl -X POST http://localhost:8002/chat -H "Content-Type: application/json" -d '{"message":"test","mode":"chat"}'
# 3. Test baseline mode
curl -X POST http://localhost:8002/chat -H "Content-Type: application/json" -d '{"message":"test","mode":"baseline"}'
# 4. Test reflection mode
curl -X POST http://localhost:8002/chat -H "Content-Type: application/json" -d '{"message":"test","mode":"reflection"}'

- AI Agents 101 - DeepLearning.AI courses
- Python for AI - Python skills
- Our Chapter 4 - Reflection pattern explained
- Copy system_design and modify step by step
- Google ADK Quickstart
- FastAPI Tutorial
- Docker for Python
- Follow this guide step by step
- ADK Advanced Patterns
- Research Methodology
- Framework Architecture
- Customize and extend based on your needs
- Read Chapter 4_ Reflection.txt (understand the pattern)
- Watch AI agents tutorial videos (understand the concepts)
- Set up development environment (Docker, Python, IDE)
- Run the existing system_design use case (see it work)
- Copy system_design folder to your_use_case
- Change the domain in config.py (your expertise area)
- Modify agent instructions (your domain knowledge)
- Test basic functionality (chat mode)
- Add domain-specific tools (if needed)
- Customize quality dimensions (what matters in your domain)
- Test baseline and reflection modes
- Debug and refine based on results
- Run comparison experiments (baseline vs reflection)
- Analyze results and gather insights
- Deploy for expert evaluation
- Document findings and share with community
- Look at `use_cases/system_design/` files
- See how system design solves similar problems
- Copy working patterns and adapt them
# Test each component separately:
# 1. Can you create the agent?
# 2. Can you execute it in chat mode?
# 3. Does the quality evaluation work?
# 4. Does reflection mode work?

# Begin with minimal implementation:
# - No tools initially
# - Simple quality dimensions
# - Basic agent instructions
# - Add features incrementally

- Google ADK GitHub - Issues and discussions
- ADK Community - Official community
- AI Agent Discord/Slack - LangChain community
- Stack Overflow - Technical questions
"I followed the guide to create a content generation use case. The Producer writes blog posts, and the Critic reviews them for SEO and engagement. It works great!"
Key Success Factors:
- Started with system_design as template
- Focused on clear instructions first
- Added tools gradually
- Tested each step thoroughly
"I compared reflection vs baseline for code review. Reflection found 40% more bugs with the same model! Publishing at next conference."
Key Success Factors:
- Chose measurable quality dimensions
- Ran systematic experiments
- Validated with expert developers
- Documented methodology clearly
The System Design use case serves as the reference implementation. Study these files:
- `use_cases/system_design/agents.py` - Producer-Critic agent implementation
- `use_cases/system_design/orchestrator.py` - Workflow management
- `use_cases/system_design/evaluator.py` - Quality assessment
- `use_cases/system_design/config.py` - Configuration and test scenarios
- `use_cases/system_design/tools/cloud_pricing.py` - Domain tools
- `use_cases/system_design/docker/Dockerfile` - Container configuration
Your new use case is ready when:
- ✅ Producer agent generates domain-appropriate outputs
- ✅ Critic agent provides objective evaluation and improvement suggestions
- ✅ Quality evaluation works across your defined dimensions
- ✅ Reflection mode shows iterative improvement
- ✅ Baseline comparison demonstrates research capabilities
- ✅ Docker deployment works independently
- ✅ API endpoints respond with structured research data
- ✅ Performance metrics are captured accurately
When you create new use cases:
- Document your approach in this guide
- Share quality dimensions that work well for your domain
- Contribute tools that might be useful for other domains
- Report issues and improvements to the framework
- Publish research results to advance the field
For questions about extending the framework:
- Study the reference implementation (system_design use case)
- Check the comprehensive build summary for technical details
- Review ADK documentation for agent development patterns
- Test incrementally and validate each component
- Use the debugging techniques documented in the main guide
This framework makes it easy to extend research to new domains while maintaining:
- Consistent methodology across use cases
- Comparable results for cross-domain analysis
- Professional quality outputs for real-world validation
- Research rigor with measurable outcomes
Your new use case will contribute to the broader understanding of reflection vs capability trade-offs across different AI application domains! 🎯✨
We'll create a blog post generation use case step-by-step. Perfect for beginners!
# Create the directory structure
mkdir -p use_cases/content_generation/{tools,docker}
cd use_cases/content_generation
# Create the required files
touch __init__.py config.py agents.py orchestrator.py evaluator.py
touch tools/__init__.py tools/content_tools.py
touch docker/Dockerfile docker/docker-compose.yml

# use_cases/content_generation/config.py
from typing import Dict, Any, List
from framework.base_evaluator import QualityDimension
# What does this use case do?
USE_CASE_CONFIG = {
    "name": "content_generation",
    "description": "AI-powered blog post and article generation with editorial review",
    "complexity_levels": ["simple", "medium", "complex"],
    "evaluation_dimensions": ["relevance", "engagement", "accuracy", "creativity", "structure"]
}

# How do we measure quality? (Like a grading rubric)
QUALITY_DIMENSIONS = [
    QualityDimension(
        name="relevance",
        weight=0.25,  # 25% of total score
        description="How well the content matches the target audience and purpose",
        scale_description="0.0 = Completely off-topic, 1.0 = Perfectly relevant"
    ),
    QualityDimension(
        name="engagement",
        weight=0.25,  # 25% of total score
        description="How engaging and interesting the content is to read",
        scale_description="0.0 = Boring/hard to read, 1.0 = Highly engaging"
    ),
    QualityDimension(
        name="accuracy",
        weight=0.20,  # 20% of total score
        description="Factual accuracy and credibility of information",
        scale_description="0.0 = Major factual errors, 1.0 = Completely accurate"
    ),
    QualityDimension(
        name="creativity",
        weight=0.15,  # 15% of total score
        description="Originality and creative approach to the topic",
        scale_description="0.0 = Generic/cliché, 1.0 = Highly original"
    ),
    QualityDimension(
        name="structure",
        weight=0.15,  # 15% of total score
        description="Organization, flow, and readability of the content",
        scale_description="0.0 = Poor structure, 1.0 = Excellent organization"
    )
]

# What scenarios will we test?
TEST_SCENARIOS = {
    "simple": [
        {
            "id": "basic_blog_post",
            "input": "Write a 500-word blog post about the benefits of remote work for productivity.",
            "expected_components": ["introduction", "main_points", "conclusion", "engaging_tone"]
        }
    ],
    "medium": [
        {
            "id": "technical_article",
            "input": "Write a 1000-word technical article explaining machine learning to business executives, including practical applications and ROI considerations.",
            "expected_components": ["executive_summary", "technical_explanation", "business_value", "examples", "action_items"]
        }
    ],
    "complex": [
        {
            "id": "thought_leadership",
            "input": "Write a 1500-word thought leadership article on the future of AI in healthcare, including current challenges, emerging opportunities, regulatory considerations, and a 5-year outlook.",
            "expected_components": ["industry_analysis", "trend_identification", "regulatory_landscape", "predictions", "strategic_recommendations"]
        }
    ]
}

# use_cases/content_generation/agents.py
from typing import Dict, Any, List
from framework.base_agent import BaseUseCaseAgent
from google.adk.tools import FunctionTool
import structlog
logger = structlog.get_logger(__name__)
class ContentGenerationProducer(BaseUseCaseAgent):
    """
    The "Writer" - Creates blog posts and articles.
    Think of this as your AI content writer.
    """

    def __init__(self, model: str):
        super().__init__(model, "content_generation", "producer")

    def _initialize_tools(self) -> List[FunctionTool]:
        """Tools the writer can use (start with none for simplicity)"""
        return []  # We'll add tools later

    def _get_instructions(self) -> str:
        """Instructions that define the writer's personality and expertise"""
        return """
        You are a senior content strategist and writer with 10+ years of experience.

        Your expertise includes:
        - Content strategy and audience analysis
        - SEO optimization and keyword research
        - Brand voice and tone development
        - Editorial standards and best practices
        - Engagement optimization and conversion writing

        When creating content:
        1. Analyze the target audience and purpose
        2. Create compelling headlines and introductions
        3. Structure content for maximum readability
        4. Include relevant examples and actionable insights
        5. Optimize for engagement and shareability
        6. Ensure factual accuracy and credibility

        Your output must be structured with these sections:
        - Compelling Headline
        - Executive Summary (for longer pieces)
        - Well-organized Main Content
        - Key Takeaways
        - Call to Action (when appropriate)

        Always write in an engaging, professional tone that matches the intended audience.
        """
class ContentGenerationCritic(BaseUseCaseAgent):
    """
    The "Editor" - Reviews and improves content.
    Think of this as your AI editor and fact-checker.
    """

    def __init__(self, model: str):
        super().__init__(model, "content_generation", "critic")

    def _initialize_tools(self) -> List[FunctionTool]:
        """Tools the editor can use for review"""
        return []  # We'll add tools later

    def _get_instructions(self) -> str:
        """Instructions that define the editor's review criteria"""
        return """
        You are a principal editor and content strategist with expertise in:
        - Editorial review and content optimization
        - Fact-checking and accuracy verification
        - Audience engagement and readability analysis
        - SEO and content performance optimization
        - Brand consistency and voice guidelines

        Your role is to critically evaluate content and provide structured feedback.
        For each review:
        1. Assess relevance to target audience and purpose
        2. Evaluate engagement and readability
        3. Check factual accuracy and credibility
        4. Analyze creativity and originality
        5. Review structure and organization
        6. Identify missing elements or improvements

        Your critique must be structured with:
        - Overall Assessment (EXCELLENT/GOOD/NEEDS_IMPROVEMENT/POOR)
        - Specific Issues Found (categorized by severity: CRITICAL/HIGH/MEDIUM/LOW)
        - Improvement Recommendations (with specific actions)
        - Missing Elements
        - Engagement Opportunities
        - SEO and Optimization Suggestions

        Be thorough, objective, and constructive in your feedback.
        Termination condition: Respond with "CONTENT_APPROVED" if the content meets all requirements and editorial standards with no critical issues.
        """

The orchestrator and evaluator are complex - let's copy and modify the working ones:
# Copy the working orchestrator (minimal changes needed)
cp use_cases/system_design/orchestrator.py use_cases/content_generation/orchestrator.py
# Copy the working evaluator (we'll customize it)
cp use_cases/system_design/evaluator.py use_cases/content_generation/evaluator.py

Then make these simple changes:
# In orchestrator.py, change these lines:
from .agents import ContentGenerationProducer, ContentGenerationCritic # Line ~4
# Change "DESIGN_APPROVED" to "CONTENT_APPROVED" in termination condition
# In evaluator.py, change:
super().__init__("content_generation", evaluator_model)  # In __init__

Copy the working Docker setup:
# Copy the proven Dockerfile
cp use_cases/system_design/docker/Dockerfile use_cases/content_generation/docker/Dockerfile
# Edit these lines in the Dockerfile:
ENV USE_CASE=content_generation
ENV API_PORT=8002

# Build and start your use case
docker-compose up -d --build content_generation
# Test it works
curl http://localhost:8002/health
# Test content generation
curl -X POST http://localhost:8002/chat \
-H "Content-Type: application/json" \
-d '{
"message": "Write a blog post about the benefits of remote work for productivity.",
"mode": "chat",
"model": "gemini-2.5-flash-lite"
}'
# Test reflection mode
curl -X POST http://localhost:8002/chat \
-H "Content-Type: application/json" \
-d '{
"message": "Write a blog post about the benefits of remote work for productivity.",
"mode": "reflection",
"model": "gemini-2.5-flash-lite",
"reflection_iterations": 2
}'

🎉 Congratulations! You've built your first AI agent use case!
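The curl checks above can also be scripted for repeatable smoke tests. This sketch separates building the request body (which you can test offline) from the actual HTTP call; the endpoint, port, and field names follow the examples in this guide and may need adjusting for your deployment:

```python
# Build the JSON body expected by the use case /chat endpoint, then
# (optionally) send it. Field names follow the examples in this guide.
import json

def build_chat_payload(message, mode="chat", model="gemini-2.5-flash-lite",
                       reflection_iterations=None):
    """Assemble a /chat request body; reflection_iterations only applies in reflection mode."""
    payload = {"message": message, "mode": mode, "model": model}
    if mode == "reflection" and reflection_iterations is not None:
        payload["reflection_iterations"] = reflection_iterations
    return payload

payload = build_chat_payload(
    "Write a blog post about the benefits of remote work.",
    mode="reflection", reflection_iterations=2)
print(json.dumps(payload, indent=2))

# To actually hit the API (requires the container to be running):
# import requests
# resp = requests.post("http://localhost:8002/chat", json=payload)
# print(resp.json()["quality_score"])
```

Keeping payload construction in one function makes it easy to sweep models and modes in experiments without duplicating request code.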
- Anthropic's Guide to AI Agents - Prompt engineering
- OpenAI Agent Patterns - Function calling and tools
- Multi-Agent Systems - Academic paper on agent coordination
- LangGraph Documentation - Alternative agent framework
- CrewAI Framework - Another multi-agent approach
- AutoGen Framework - Microsoft's agent framework
- Evaluating LLM Applications - Anthropic's evaluation guide
- AI Safety and Alignment - Safety considerations
- Empirical Methods in AI - Academic conference for methodology
- Docker for Data Science - Docker tutorial
- FastAPI Production Guide - Deployment best practices
- Container Security - Security considerations
- Real Python - Comprehensive Python tutorials
- Python Type Hints - Type checking with mypy
- Async Programming - Official asyncio documentation
- Focus on methodology - understand the research design first
- Study the quality dimensions - how we measure improvement
- Customize evaluation criteria - what matters in your field
- Run systematic experiments - collect publishable data
Key Resources:
- `Iterative Reflection vs.txt` - Research methodology
- `COMPREHENSIVE_BUILD_SUMMARY.md` - Technical implementation
- Research notebooks in `research/` folder
- Understand the architecture - modular, containerized design
- Study the ADK patterns - agent creation and execution
- Focus on code quality - error handling, logging, testing
- Optimize performance - Docker builds, API response times
Key Resources:
- `framework/` folder - Core implementation patterns
- `use_cases/system_design/` - Reference implementation
- Docker configurations and best practices
- Define your expertise - what knowledge should the agent have?
- Create evaluation criteria - how do you judge quality in your field?
- Design test scenarios - what challenges should the agent handle?
- Validate results - does the output meet professional standards?
Key Resources:
- Agent instruction templates in this guide
- Quality dimension examples
- Test scenario frameworks
- 📖 Read Chapter 4_ Reflection.txt (understand the pattern)
- 🎥 Watch AI agents tutorial video (understand concepts)
- 🐳 Install Docker and test system_design use case
- 📁 Create your use case directory structure
- 📝 Define your domain in config.py
- 🤖 Create Producer agent instructions (your domain expert)
- 🔍 Create Critic agent instructions (your domain reviewer)
- 🐳 Copy and modify Docker configuration
- ✅ Test basic chat mode functionality
- 📊 Define quality dimensions for your domain
- 🔄 Test baseline mode (with quality evaluation)
- 🔁 Test reflection mode (producer-critic iterations)
- 🐛 Debug and fix any issues
- 📈 Validate quality measurements
- 🧪 Run comparison experiments (baseline vs reflection)
- 📊 Analyze results and insights
- 👥 Deploy for expert evaluation (if applicable)
- 📝 Document your findings
- 🌟 Share with the community
Total Time Investment: ~1 week for beginners, 2-3 days for experienced developers
# Is Docker running?
docker --version
# Is the main framework working?
curl http://localhost:8001/health
# Are there any containers running?
docker ps

# Copy the entire system_design folder
cp -r use_cases/system_design use_cases/my_test_case
# Change just the name and port
# In config.py: change "system_design" to "my_test_case"
# In docker files: change port 8001 to 8003
# Test if it works before customizing

- Discord/Slack: Join AI development communities
- Stack Overflow: Tag questions with `google-adk`, `ai-agents`
- GitHub Issues: Check existing issues and discussions
- Email: Reach out to the framework maintainers
- Take a break - Complex systems take time to understand
- Start smaller - Copy system_design and change just the instructions
- Focus on one thing - Get chat mode working before reflection
- Ask for help - The community is friendly and helpful
- Learn incrementally - You don't need to understand everything at once
Remember: Every expert was once a beginner! 🌟
- ✅ Your use case responds to chat requests
- ✅ Quality evaluation returns meaningful scores
- ✅ Reflection mode shows iterative improvement
- ✅ You understand what each file does
- ✅ You can debug issues using logs and testing
- ✅ You're excited to try new domains and experiments!
Happy researching! 🔬✨
- GitHub Discussions - ADK community
- Reddit r/MachineLearning - ML research community
- AI Research Discord - Real-time help
- LangChain Community - Agent development
When you succeed:
- 🌟 Star this repository - Help others find it
- 📝 Document your use case - Share your approach
- 🐛 Report issues - Help improve the framework
- 📊 Share results - Contribute to research knowledge
- 🎓 Mentor others - Help the next generation of researchers
Building on this framework makes you part of advancing AI agent research! 🚀