Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0
The GenAI IDP solution includes an integrated Code Intelligence feature that lets developers and other users explore, understand, and analyze complex codebases with natural language queries through an AI-powered chat interface, making code comprehension and maintenance more efficient.
The Code Intelligence feature provides comprehensive codebase analysis and understanding capabilities through:
- Intelligent Code Analysis: Natural language queries about codebase structure, functionality, and architecture
- File System Management: Efficient handling of large codebases with smart caching and filtering
- Conversation Memory: Persistent chat sessions with DynamoDB integration for continuous learning
- Lambda Integration: Serverless deployment with automatic codebase extraction and initialization
- Multi-Format Support: Support for various file types including Python, JavaScript, Jupyter notebooks, and more
- Context-Aware Responses: Deep technical analysis with accurate code references and examples
- Real-time Code Exploration: Interactive exploration of codebase components and relationships
- Secure Architecture: Enterprise-grade security with comprehensive audit trails and monitoring
- Natural Language Code Queries: Ask questions about code functionality, architecture, and implementation details
- Intelligent File Discovery: Automatic identification and analysis of relevant code files
- Codebase Overview Generation: Comprehensive understanding of project structure and component relationships
- Multi-File Analysis: Simultaneous analysis of multiple related files for comprehensive understanding
- Notebook Support: Special handling for Jupyter notebooks with image removal and optimization
- Smart Caching: File hashing and overview caching for improved performance
- Conversation Persistence: Maintain context across multiple queries within a session
- Advanced Filtering: Configurable ignore patterns to focus on relevant code
- Performance Optimization: Sliding window conversation management and tool result optimization
- Comprehensive Monitoring: Detailed logging and performance tracking
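Sliding window conversation management, listed above, can be pictured as keeping only the most recent messages of a session. The following is a minimal sketch under stated assumptions: the helper name `trim_history`, the message shape, and the default `window_size` are hypothetical and are not taken from the GenAI IDP code or the Strands framework.

```python
from typing import Dict, List

Message = Dict[str, str]  # hypothetical shape: {"role": ..., "content": ...}


def trim_history(messages: List[Message], window_size: int = 20) -> List[Message]:
    """Return only the most recent `window_size` messages (illustrative helper)."""
    if window_size <= 0:
        # messages[-0:] would return the whole list, so handle this case explicitly.
        return []
    return messages[-window_size:]
```

A real conversation manager would likely also keep tool calls paired with their results when trimming; this sketch ignores that concern.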
The architecture of the Code Intelligence feature is shown below. The Web UI and AppSync API components are the same as used by the rest of the IDP system (with new AppSync endpoints added). The system uses Lambda functions for serverless processing with codebase files stored in the Lambda's /tmp directory (with future plans to migrate to EFS for enhanced scalability).
The Code Intelligence feature uses a streamlined architecture with:
- IDP Helper Agent Request Handler: Receives and validates incoming code intelligence requests
- IDP Helper Agent Request Processor: Processes queries using the Strands framework and specialized tools
- Conversation History Management: Persistent storage of chat sessions and analysis results
- Codebase File System: Currently uses Lambda /tmp directory, with future EFS integration planned
- Web UI Integration: Seamless integration with the existing GenAI IDP web interface
- Request Reception: User submits a natural language question about the codebase through the web UI
- Codebase Initialization: System extracts and prepares codebase files in the Lambda environment
- Context Loading: Agent loads codebase overview and determines relevant files for analysis
- Intelligent Analysis: Agent processes the query using specialized tools and codebase understanding
- Response Generation: System generates comprehensive responses with code examples and technical insights
- Result Display: Final results are displayed in the web interface with conversation history
For codebase analysis queries, the Code Intelligence Agent follows this structured workflow:
- Codebase Overview Loading: Agent loads comprehensive codebase structure and file purposes using load_codebase_overview_context
- Relevance Assessment: Determines if specific file contents are needed beyond the overview context
- Intelligent File Retrieval: Retrieves relevant files in ranked order of importance using read_multiple_files
- Multi-File Analysis: Analyzes multiple related files simultaneously for comprehensive understanding
- Context-Aware Response: Provides technical responses with code examples, architectural insights, and implementation details
- Conversation Continuity: Maintains context for follow-up questions within the same session
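The control flow of this workflow (overview first, file contents only when needed) can be sketched as below. This is an illustration, not the agent's actual implementation: `run_workflow` and its four callables are hypothetical stand-ins for the overview tool, the relevance and ranking step, the multi-file reader, and the model call.

```python
from typing import Callable, List


def run_workflow(
    query: str,
    load_overview: Callable[[], str],
    pick_files: Callable[[str, str], List[str]],
    read_files: Callable[[List[str]], str],
    respond: Callable[[str, str], str],
) -> str:
    """Load the overview, read files only if needed, then generate a response."""
    context = load_overview()                 # overview loading
    ranked = pick_files(query, context)       # relevance assessment and ranking
    if ranked:                                # the overview alone was not enough
        context += "\n" + read_files(ranked)  # multi-file retrieval
    return respond(query, context)            # context-aware response
```

When `pick_files` returns an empty list, the file reader is never called and the response is built from the overview alone.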
The Code Intelligence feature implements enterprise-grade security:
- Secure File Access: Controlled access to codebase files with proper authentication and authorization
- Session Isolation: Each user's queries and analysis are isolated and tracked separately
- Audit Trail: Comprehensive logging of all interactions for security reviews and compliance
- Data Protection: Sensitive code information is handled securely with proper encryption
- Access Control: Integration with existing IDP authentication and authorization mechanisms
- Resource Management: Proper cleanup of temporary files and resources after processing
The code intelligence agent has access to specialized tools for comprehensive code analysis:
load_codebase_overview_context:
- Purpose: Loads existing codebase overview with file purposes and relationships
- Usage: Automatically called to understand project structure and component relationships
- Features:
  - High-level overview mode for large codebases (300+ files)
  - Detailed analysis mode for comprehensive understanding
  - Cached results for improved performance
read_multiple_files:
- Purpose: Efficiently reads multiple related files for comprehensive analysis
- Features:
  - Character limit management to respect context windows
  - Smart file prioritization and ranking
  - Batch processing for improved performance
  - Support for various file types and encodings
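Character limit management can be illustrated with a reader that consumes files in ranked order until a character budget is spent. This is a minimal sketch, assuming the paths are already sorted by relevance; the function name `read_ranked_files` and the default `max_chars` value are hypothetical and do not describe the real tool's signature.

```python
from pathlib import Path
from typing import Dict, List


def read_ranked_files(paths: List[str], max_chars: int = 100_000) -> Dict[str, str]:
    """Read files in ranked order until the character budget is exhausted."""
    contents: Dict[str, str] = {}
    remaining = max_chars
    for path in paths:  # assumed to be pre-sorted, most relevant first
        if remaining <= 0:
            break
        # errors="replace" keeps the read from failing on unexpected encodings.
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        contents[path] = text[:remaining]  # truncate the file that overflows the budget
        remaining -= len(contents[path])
    return contents
```

Reading in ranked order means that when the budget runs out, it is the least relevant files that are truncated or skipped.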
Notebook handling:
- Purpose: Specialized handling of Jupyter notebooks
- Features:
  - Automatic image removal for size optimization
  - JSON structure preservation
  - Size limit enforcement (2GB default)
  - Content extraction and formatting
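Image removal with JSON structure preservation can be sketched as follows. Jupyter notebooks store cell outputs as MIME bundles, so dropping the `image/*` entries removes embedded images while leaving code, text outputs, and metadata intact. The function name `strip_notebook_images` is hypothetical, and this sketch does not handle images attached to markdown cells.

```python
import json
from typing import Any, Dict


def strip_notebook_images(notebook_json: str) -> str:
    """Remove image payloads from cell outputs, keeping the notebook structure."""
    nb: Dict[str, Any] = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        for output in cell.get("outputs", []):  # markdown cells have no outputs
            data = output.get("data")
            if isinstance(data, dict):
                # Collect keys first so the dict is not mutated while iterating.
                for mime in [m for m in data if m.startswith("image/")]:
                    del data[mime]
    return json.dumps(nb)
```

Because base64-encoded images usually dominate a notebook's size, removing them tends to shrink the content considerably before it is placed in a model's context.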
File system utilities:
- Purpose: Comprehensive file system operations and caching
- Features:
  - SHA256 hashing for change detection
  - Intelligent ignore pattern matching
  - Directory tree generation
  - File collection with filtering
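SHA256 change detection combined with ignore patterns can be sketched as below. This is an illustration under assumptions: the names `is_ignored` and `hash_codebase` and the `DEFAULT_IGNORES` patterns are hypothetical, and the real tool's matching rules may differ.

```python
import fnmatch
import hashlib
from pathlib import Path
from typing import Dict, Iterable

# Illustrative patterns only; the actual ignore list is configurable.
DEFAULT_IGNORES = ("*.pyc", "__pycache__", ".git", "node_modules")


def is_ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    """Return True if any component of the path matches an ignore pattern."""
    parts = Path(rel_path).parts
    return any(fnmatch.fnmatch(part, pat) for part in parts for pat in patterns)


def hash_codebase(root: str, patterns: Iterable[str] = DEFAULT_IGNORES) -> Dict[str, str]:
    """Map each non-ignored file (relative path) to its SHA256 hex digest."""
    patterns = tuple(patterns)
    root_path = Path(root)
    hashes: Dict[str, str] = {}
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root_path).as_posix()
        if is_ignored(rel, patterns):
            continue
        hashes[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes
```

Comparing the mapping from a previous run with the current one shows which files changed, so a cached overview only needs to be regenerated for those files.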
- Log in to the GenAI IDP Web UI
- Navigate to the "IDP Accelerator Help" section in the main navigation
- You'll see a chat-like interface for querying the codebase
The Code Intelligence agent can answer various types of questions about your codebase:
Architecture and Structure Questions:
- "What is the main architecture of this codebase?"
- "How are the different modules organized?"
- "What are the key components and their relationships?"
- "Explain the overall system design and data flow"
Functionality and Implementation Questions:
- "How does the document processing pipeline work?"
- "What are the different patterns supported by this system?"
- "Explain how the agent framework is implemented"
- "How does the authentication and authorization work?"
Code Analysis Questions:
- "What are the main classes and their purposes?"
- "Show me the key functions in the analytics module"
- "How is error handling implemented across the system?"
- "What design patterns are used in this codebase?"
Configuration and Setup Questions:
- "How do I configure the system for my environment?"
- "What environment variables are required?"
- "How do I set up the development environment?"
- "What are the deployment requirements?"
Here are some example questions you can ask about the IDP codebase:
"Explain the difference between Pattern 1, Pattern 2, and Pattern 3 in this IDP system"
"How does the agent framework work and what agents are available?"
"What are the main configuration options and how do I customize them?"
"Show me how document processing works from upload to final results"
"How is the web UI integrated with the backend services?"
"What security measures are implemented in this system?"
"How do I add a new document type for processing?"
"Explain the monitoring and logging capabilities"
The Code Intelligence agent provides comprehensive responses including:
- Technical Explanations: Detailed explanations of code functionality and architecture
- Code Examples: Relevant code snippets with proper context and annotations
- Architectural Insights: High-level system design and component relationships
- Implementation Details: Specific implementation patterns and best practices
- Configuration Guidance: Setup and configuration instructions with examples
- Troubleshooting Help: Common issues and their solutions
Each response includes:
- Clear technical explanations with appropriate depth
- Direct references to relevant code sections and files
- Step-by-step guidance for complex procedures
- Best practices and recommendations
The Code Intelligence feature currently uses the Lambda function's /tmp directory for codebase storage:
Advantages:
- Fast Access: Direct file system access with minimal latency
- Simple Implementation: No additional infrastructure required
- Cost Effective: No additional storage costs beyond Lambda execution
Limitations:
- Size Constraints: Limited to 10GB total storage in /tmp
- Ephemeral Storage: Files are lost when Lambda container is recycled
- Cold Start Impact: Codebase extraction required on each cold start
Current Workflow:
- Initialization: Codebase zip files are extracted to /tmp/codebase on Lambda startup
- Processing: Agent tools read files directly from the /tmp directory
- Caching: Overview and hash files are stored in /tmp/output for performance
- Cleanup: Temporary files are automatically cleaned up when Lambda container terminates
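The initialization step can be sketched with the standard library's `zipfile` module. This is a minimal sketch, not the deployed handler code: the function name `extract_codebase` is hypothetical, and skipping extraction when the target directory is already populated (a warm container) is an assumption about how repeated invocations might be handled.

```python
import zipfile
from pathlib import Path


def extract_codebase(zip_path: str, dest_dir: str = "/tmp/codebase") -> Path:
    """Extract the codebase archive unless a previous invocation already did."""
    dest = Path(dest_dir)
    if dest.exists() and any(dest.iterdir()):
        return dest  # assumed warm-container shortcut: reuse the earlier extraction
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest
```

On a cold start the directory is empty, so the archive is extracted again, which is the cold start cost noted in the limitations above.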
Planned migration to Amazon Elastic File System (EFS) for enhanced scalability:
Planned Advantages:
- Persistent Storage: Codebase files persist across Lambda invocations
- Larger Capacity: Support for much larger codebases (petabyte scale)
- Shared Access: Multiple Lambda instances can access the same codebase
- Faster Cold Starts: No need to extract codebase on each cold start
Migration Benefits:
- Improved Performance: Faster initialization and reduced cold start times
- Enhanced Scalability: Support for enterprise-scale codebases
- Better Reliability: Persistent storage reduces initialization failures
- Cost Optimization: Reduced Lambda execution time and costs
The Code Intelligence feature is configured through environment variables and CloudFormation parameters:
Supported Models:
- us.anthropic.claude-3-7-sonnet-20250219-v1:0 (Default - Recommended)
- us.anthropic.claude-3-5-sonnet-20241022-v2:0
- us.anthropic.claude-3-haiku-20240307-v1:0
- us.amazon.nova-pro-v1:0
- us.amazon.nova-lite-v1:0
The feature automatically creates:
- DynamoDB Tables: Conversation history and memory management
- Lambda Functions: Request handler and processor functions
- AppSync Resolvers: GraphQL API endpoints for web UI integration
- IAM Roles: Minimal permissions for secure operation
- S3 Integration: Codebase storage and caching (future EFS migration)
Key configuration settings:
- CODEBASE_DIR: Root directory for codebase files
- OUTPUT_DIR: Directory for generated outputs and cache
- ENABLE_MONITORING: Enable comprehensive monitoring and logging
- CONTEXT_WINDOW_SIZE: Maximum context window size in characters
- MAX_FILE_SIZE: Maximum individual file size (2MB default)
- MAX_NOTEBOOK_SIZE: Maximum notebook size (2GB default)
- MEMORY_METHOD: Memory backend method (DynamoDB or AgentCore)
- BEDROCK_REGION: AWS Bedrock region for model access
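One way to read these settings is to collect them into a typed configuration object at startup. The sketch below is illustrative only: the class `CodeIntelConfig`, the function `load_config`, the byte units, the boolean parsing, and the fallback values for `MEMORY_METHOD` and `BEDROCK_REGION` are assumptions. The 2MB and 2GB defaults and the /tmp paths come from this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CodeIntelConfig:
    codebase_dir: str
    output_dir: str
    enable_monitoring: bool
    max_file_size: int      # bytes (unit is an assumption)
    max_notebook_size: int  # bytes (unit is an assumption)
    memory_method: str
    bedrock_region: str


def load_config(env: Mapping[str, str] = os.environ) -> CodeIntelConfig:
    """Build the configuration from environment variables with fallbacks."""
    return CodeIntelConfig(
        codebase_dir=env.get("CODEBASE_DIR", "/tmp/codebase"),
        output_dir=env.get("OUTPUT_DIR", "/tmp/output"),
        enable_monitoring=env.get("ENABLE_MONITORING", "false").lower() in ("1", "true", "yes"),
        max_file_size=int(env.get("MAX_FILE_SIZE", 2 * 1024**2)),          # 2MB default
        max_notebook_size=int(env.get("MAX_NOTEBOOK_SIZE", 2 * 1024**3)),  # 2GB default
        memory_method=env.get("MEMORY_METHOD", "DynamoDB"),  # assumed fallback
        bedrock_region=env.get("BEDROCK_REGION", ""),        # empty means not set
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in local tests.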
- Start with Overview: Begin with general architecture questions before diving into specifics
- Be Specific: Clearly state what aspect of the code you want to understand
- Use Context: Reference previous responses to build deeper understanding
- Ask Follow-ups: Build on previous answers to explore topics in depth
- Understand Structure First: Ask about overall architecture before specific implementations
- Focus on Key Components: Identify and explore the most important modules first
- Trace Data Flow: Follow how data moves through the system
- Explore Patterns: Understand common patterns and design principles used
- Efficient Queries: Ask focused questions to get targeted responses
- Batch Related Questions: Group related questions to leverage context
- Use High-Level Overview: For large codebases, start with high-level overview mode
- Monitor Response Times: Be aware of context window limits for complex queries
The Code Intelligence feature includes comprehensive testing utilities for local development:
- Environment Setup: Create a virtual environment and install required dependencies
- Configuration: Copy and configure environment variables from the provided template
- Testing: Run test queries against the code intelligence system
Required environment variables for testing:
- CODEBASE_DATA_PATH: Path to codebase zip files
- LAMBDA_TMP_DIR: Temporary directory for Lambda simulation
- BEDROCK_REGION: AWS Bedrock region for model access
- Memory Tables: DynamoDB table names for conversation history
- MEMORY_METHOD: Backend method for conversation persistence
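Before running local tests, it can help to fail fast when a required variable is missing. The check below is a sketch: the helper `missing_env_vars` is hypothetical, and the DynamoDB table variables are left out because their exact names are not given here.

```python
import os
from typing import List, Mapping

# Names taken from the list above; memory table variables are omitted
# because their exact names are not specified in this document.
REQUIRED_VARS = ("CODEBASE_DATA_PATH", "LAMBDA_TMP_DIR", "BEDROCK_REGION", "MEMORY_METHOD")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A test script could call `missing_env_vars()` first and exit with a clear message if the returned list is not empty.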
The testing framework supports various types of queries:
Architecture Understanding:
- "Explain the overall system architecture and main components"
Pattern Comparison:
- "What are the differences between Pattern 1, Pattern 2, and Pattern 3?"
Implementation Details:
- "How is the document processing pipeline implemented?"
Configuration Guidance:
- "What environment variables do I need to configure for deployment?"
Agent Not Responding:
- Check CloudWatch logs for the IDP Helper Agent Request Processor Lambda function
- Verify Bedrock model access is enabled for your selected model
- Ensure sufficient Lambda timeout (15 minutes) for complex codebase analysis
- Check that codebase files are properly extracted to the /tmp directory
File Reading Errors:
- Verify codebase zip files are present in the expected location
- Check file permissions and encoding issues
- Monitor file size limits and context window constraints
- Review ignore patterns to ensure relevant files are not excluded
Memory and Performance Issues:
- Monitor Lambda memory usage and increase if necessary
- Check context window limits for large file analysis
- Use high-level overview mode for codebases with 300+ files
- Consider breaking complex queries into smaller, focused questions
Conversation History Issues:
- Verify DynamoDB table permissions and configuration
- Check session ID consistency across requests
- Monitor DynamoDB write throttling and capacity
- Review conversation manager settings for memory optimization
- CloudWatch Logs: Detailed logs for Lambda functions with agent execution traces
- DynamoDB Console: View conversation history and session data directly
- Agent Messages: Real-time display of agent reasoning and tool usage in web UI
- Performance Metrics: Monitor response times, file processing, and resource usage
Enable detailed debugging for troubleshooting by configuring monitoring and debug output settings in the system configuration.
The Code Intelligence feature uses several AWS services that incur costs:
- Amazon Bedrock: Model inference costs for code analysis processing
- AWS Lambda: Function execution costs for request handling and processing
- Amazon DynamoDB: Storage and request costs for conversation history and memory
- Amazon S3: Storage costs for codebase files and caching (current implementation)
- Amazon EFS: Storage and throughput costs (future implementation)
- Amazon CloudWatch: Logging and monitoring costs
- Model Selection: Choose appropriate Bedrock models based on accuracy vs. cost requirements
- Efficient Queries: Ask focused questions to minimize processing time
- Caching Utilization: Leverage codebase overview caching to reduce repeated analysis
- Memory Management: Optimize Lambda memory allocation based on codebase size
- Session Management: Use conversation history effectively to avoid redundant processing
Monitor usage through AWS Cost Explorer and set up billing alerts for cost control.
The Code Intelligence feature integrates seamlessly with other GenAI IDP capabilities:
- Shares the same Strands-based agent framework with analytics agents
- Common monitoring and logging infrastructure
- Unified conversation management and memory systems
- Consistent user interface with other IDP features
- Shared authentication and authorization mechanisms
- Common AppSync API patterns and GraphQL resolvers
- Integrated with overall IDP configuration system
- Shared environment variable patterns
- Common CloudFormation deployment templates
- Uses the same security model as other IDP components
- Integrated audit trails and compliance logging
- Shared IAM roles and permission patterns
Planned improvements for the Code Intelligence feature include:
- EFS Integration: Migration from Lambda /tmp to Amazon EFS for persistent storage
- Enhanced Scalability: Support for larger codebases and multiple concurrent users
- Improved Performance: Faster cold starts and reduced initialization overhead
The Code Intelligence feature represents a significant advancement in making complex codebases more accessible and understandable, enabling developers and stakeholders to quickly gain insights into system architecture, functionality, and implementation details through natural language interaction.

