Implement webhook handler Lambda that receives GitHub events, clones repositories with full history, and initiates analysis pipeline.
- Memory: 1GB (1,024 MB) - Optimized for git performance
- Ephemeral Storage: 5GB (5,120 MB) - Testing configuration for large repositories
- Timeout: 5 minutes - Accommodate large repository clones
- Runtime: Python 3.12
- Cost Impact: FREE for moderate usage (within AWS Free Tier)
Approach: Git CLI with two-step process for complete repository handling
# Clone complete repository with full history
subprocess.run([
'git', 'clone',
f'https://github.com/{owner}/{repo}.git',
'/tmp/repo_full'
], check=True)# Checkout exact commit that triggered webhook
subprocess.run([
'git', 'checkout', commit_sha
], cwd='/tmp/repo_full', check=True)# Generate clean source snapshot without .git metadata
subprocess.run([
'git', 'archive', commit_sha,
'--format=tar', '--output=/tmp/workingcopy.tar'
], cwd='/tmp/repo_full', check=True)- Webhook validation: GitHub secret verification
- Event scope: Push events only (initial implementation)
- Repository access: Public repositories only for MVP
- Authentication: GitHub personal access token
Three-directory structure uploaded to Drawer S3 bucket:
repos/{owner}/{repo}/{commit_sha}/
├── workingcopy/ # Clean source code for analysis
├── repohistory/ # Full git repository with history
└── analysis/ # Reserved for Analyst outputs
- Webhook Receipt: Validate GitHub webhook signature
- Metadata Extraction: Parse repository owner, name, commit SHA, default branch
- Repository Check: Verify repository doesn't have existing analysis/ folder
- Git Operations: Execute three-step cloning process
- Drawer Upload: Store workingcopy and repohistory in S3
- Event Publishing: Emit "repo_ready" event to Telephonist
Extract from GitHub webhook payload:
repository.owner.login→ Repository ownerrepository.name→ Repository namerepository.default_branch→ Target branch (main/master)head_commit.id→ Commit SHA for exact state
{
"source": "coderipple.system",
"detail-type": "repo_ready",
"detail": {
"component": "Receptionist",
"repository": {
"owner": "username",
"name": "repo-name",
"default_branch": "main",
"commit_sha": "abc123..."
},
"s3_location": "repos/username/repo-name/abc123/",
"timestamp": "2025-06-30T18:00:00Z"
}
}- Immediate Response: Return 200 OK to GitHub (async processing)
- CloudWatch Logging: Comprehensive error logging for debugging
- Graceful Failures: Handle git errors, network issues, S3 failures
- Event Publishing: Log all operations to Hermes for transparency
- High Memory: 1GB allocation for faster git operations
- Large Storage: 5GB ephemeral storage for large repositories
- Parallel Operations: Concurrent upload of workingcopy and repohistory
- Compression: ZIP compression for efficient S3 transfer
- Git CLI: Available in Lambda runtime environment
- boto3: AWS SDK for S3 operations and EventBridge publishing
- requests: GitHub API interactions and webhook validation
- zipfile: Repository compression for storage
- Gatekeeper: Receives validated webhook events
- Drawer: Stores repository files in organized structure
- Telephonist: Publishes repo_ready events for pipeline coordination
- Hermes: Logs all operations for system transparency
Ready for Lambda function implementation with git CLI integration and Drawer storage operations.