Commit 5ba6d3c

docs: bootstrap CLAUDE.md for AI-assisted development
1 parent 4086395 commit 5ba6d3c

1 file changed

Lines changed: 144 additions & 0 deletions

CLAUDE.md

@@ -0,0 +1,144 @@
1+
# CloudTranscode
2+
3+
## What This Is
4+
5+
CloudTranscode is bFAN's distributed media transcoding pipeline. It's a set of PHP-based activity workers that poll AWS Step Functions for transcoding jobs, then execute FFmpeg (for video) or ImageMagick (for images) to transcode media files and upload results to S3. The architecture allows horizontal scaling by running multiple workers in ECS containers.
6+
7+
## Tech Stack
8+
9+
- **Language**: PHP 7+ (legacy codebase, but clean)
10+
- **Container**: Docker (ECS deployment)
11+
- **FFmpeg**: 4.2 (video/image processing)
12+
- **ImageMagick**: convert commands for image transcoding
13+
- **AWS Services**: Step Functions (SFN), S3, ECS, EC2, IAM
14+
- **SDK**: CloudProcessingEngine-SDK (bFAN fork) for activity polling and lifecycle
15+
- **Dependencies**: AWS SDK for PHP 3.x, JSON Schema validation
16+
17+
## Quick Start
18+
19+
```bash
# Setup
make  # Installs composer dependencies

# Run activities locally (requires AWS credentials and SFN ARNs)
./src/activities/ValidateAssetActivity.php -A arn:aws:states:REGION:ACCOUNT:activity:ValidateAsset
./src/activities/TranscodeAssetActivity.php -A arn:aws:states:REGION:ACCOUNT:activity:TranscodeAsset

# Run in Docker (recommended)
docker build -t cloudtranscode:local .
docker run cloudtranscode:local ValidateAssetActivity -A <arn>
docker run cloudtranscode:local TranscodeAssetActivity -A <arn>

# Run tests
# Ask: Does this repo have tests? If so, what command runs them?
```
35+
36+
## Project Structure
37+
38+
- `src/activities/` — Activity workers (ValidateAssetActivity, TranscodeAssetActivity, BasicActivity base class)
39+
- `src/activities/transcoders/` — Transcoder implementations (video, image, thumbnail)
40+
- `src/scripts/` — Utility scripts
41+
- `src/utils/` — Helper classes
42+
- `state_machines/` — AWS Step Functions state machine JSON definitions
43+
- `input_samples/` — Example JSON input payloads for testing workflows
44+
- `presets/` — FFmpeg preset configurations (may be deprecated; check CloudTranscode-FFMpeg-presets repo)
45+
- `benchmark/` — FFmpeg performance benchmarks on AWS EC2 instances
46+
- `Dockerfile` — Base image for ECS workers
47+
- `bootstrap.sh` — Docker entrypoint script
48+
- `Makefile` — Composer dependency installation
49+
50+
## Dependencies
51+
52+
**Internal:**
53+
- CloudProcessingEngine-SDK (bFAN fork) — activity polling, client interface callbacks, lifecycle management
54+
55+
**External:**
56+
- AWS S3 — input/output media storage
57+
- AWS Step Functions — task orchestration and distribution
58+
- FFmpeg 4.2 — video/audio/image transcoding (bundled in Docker base image)
59+
- ImageMagick — image manipulation (bundled in Docker base image)
60+
61+
**Docker base images:**
62+
- `sportarc/ffmpeg:4.2` — FFmpeg binaries
63+
- `sportarc/cloudtranscode-base:4.2` — PHP + FFmpeg + ImageMagick base
64+
65+
## API / Interface
66+
67+
**Input**: JSON payloads posted to AWS Step Functions (see `input_samples/` for examples). Structure:
68+
- `input_asset` — source file (S3 bucket, key, type)
69+
- `output_assets[]` — array of desired outputs (type, bucket, path, codec/size/preset, watermark, etc.)
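
To make the input shape concrete, here is a minimal sketch of a payload generator. Only the top-level keys `input_asset` and `output_assets` come from the description above; the nested key names (`type`, `bucket`, `key`, `path`, `preset`), the `VIDEO` value, and the bucket names are assumptions for illustration, so check `input_samples/` for the real schema.

```bash
# Print a sample job payload as JSON.
# Nested key names and values are hypothetical; see input_samples/ for real ones.
make_payload() {
  local in_bucket="$1" in_key="$2" out_bucket="$3" out_path="$4"
  cat <<EOF
{
  "input_asset": {
    "type": "VIDEO",
    "bucket": "${in_bucket}",
    "key": "${in_key}"
  },
  "output_assets": [
    {
      "type": "VIDEO",
      "bucket": "${out_bucket}",
      "path": "${out_path}",
      "preset": "360p-4.3-generic"
    }
  ]
}
EOF
}

# Example: one source video, one 360p output (placeholder bucket names)
make_payload my-input-bucket raw/clip.mp4 my-output-bucket out/clip-360p.mp4
```

Adding a second object to `output_assets` would request a second output from the same source file.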
70+
71+
**Output**: JSON result returned from Step Functions to the client app. It includes the S3 locations of the transcoded files, metadata, and any errors.
72+
73+
**Client Integration**: Implement `CpeClientInterface.php` from CloudProcessingEngine-SDK to receive callbacks:
74+
- `onStart` — workflow initiated
75+
- `onHeartbeat` — worker is alive
76+
- `onFail` — transcoding failed
77+
- `onSuccess` — workflow completed
78+
- `onTranscodeDone` — one output asset completed
79+
80+
Pass a custom client class to the activity workers via the `-C <client class path>` option. For Docker, extend the base image and copy your client classes into it.
81+
82+
## Key Patterns
83+
84+
- **Activity polling**: Workers use long-polling to fetch tasks from AWS SFN
85+
- **Sequential output processing**: One TranscodeAssetActivity worker processes all outputs in the `output_assets` array sequentially, not in parallel. To parallelize, split the workflow.
86+
- **Stateless workers**: Workers are horizontally scalable Docker containers. State lives in S3 and SFN.
87+
- **Preset-based transcoding**: FFmpeg commands can be templated using presets (e.g., `360p-4.3-generic`)
88+
- **Custom FFmpeg commands**: JSON input supports raw FFmpeg command strings for advanced use cases
89+
- **Watermarking**: Overlay images on video with custom position, opacity, size
90+
- **HTTP input**: Workers can pull source files from HTTP/S URLs instead of S3
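
As an illustration of the watermarking pattern, the sketch below builds a plain FFmpeg filter graph that scales a logo, applies an opacity, and overlays it near the bottom-right corner. This is not how CloudTranscode assembles its commands internally; the function name, parameters, and file names are hypothetical, and only the FFmpeg filters themselves (`scale`, `format`, `colorchannelmixer`, `overlay`) are standard.

```bash
# Build an FFmpeg -filter_complex string for a watermark overlay.
# $1 = opacity (0.0-1.0), $2 = watermark width in px,
# $3 / $4 = horizontal / vertical margin from the bottom-right corner
watermark_filter() {
  local opacity="$1" width="$2" x="$3" y="$4"
  printf '[1:v]scale=%s:-1,format=rgba,colorchannelmixer=aa=%s[wm];[0:v][wm]overlay=W-w-%s:H-h-%s' \
    "$width" "$opacity" "$x" "$y"
}

# Print the full command rather than running it (input/output names are placeholders)
echo ffmpeg -i input.mp4 -i logo.png \
  -filter_complex "\"$(watermark_filter 0.5 120 10 10)\"" \
  -c:a copy output.mp4
```

In the overlay expression, `W`/`H` are the main video's dimensions and `w`/`h` the watermark's, so the margins stay fixed regardless of resolution.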
91+
92+
## Environment
93+
94+
**AWS credentials** (supplied by an IAM role, or by these env vars when no role is attached):
95+
- `AWS_ACCESS_KEY_ID`
96+
- `AWS_SECRET_ACCESS_KEY`
97+
- `AWS_DEFAULT_REGION`
98+
99+
**Required IAM permissions:**
100+
- Step Functions: `states:GetActivityTask`, `states:SendTaskSuccess`, `states:SendTaskFailure`, `states:SendTaskHeartbeat`
101+
- S3: `s3:GetObject`, `s3:PutObject`, `s3:PutObjectAcl` on input/output buckets
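
The permissions above could be expressed as a policy document along these lines. This is a sketch, not the policy actually deployed: the bucket names are placeholders, and `"Resource": "*"` for the Step Functions actions is an assumption you may want to narrow to specific activity ARNs.

```bash
# Emit an IAM policy document granting the permissions listed above.
# $1 = input bucket name, $2 = output bucket name (both placeholders)
iam_policy() {
  local in_bucket="$1" out_bucket="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "states:GetActivityTask",
        "states:SendTaskSuccess",
        "states:SendTaskFailure",
        "states:SendTaskHeartbeat"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl"],
      "Resource": [
        "arn:aws:s3:::${in_bucket}/*",
        "arn:aws:s3:::${out_bucket}/*"
      ]
    }
  ]
}
EOF
}

iam_policy my-input-bucket my-output-bucket
# To attach it, save the output to a file and run, for example:
#   aws iam put-role-policy --role-name <role> --policy-name <name> \
#     --policy-document file://policy.json
```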
102+
103+
**Runtime**: PHP 7+, FFmpeg 4.2, ImageMagick (all bundled in Docker image)
104+
105+
<!-- Ask: Are there any environment variables or config files for controlling worker behavior (timeouts, concurrency, temp directories, etc.)? -->
106+
107+
## Deployment
108+
109+
**Current setup:**
110+
- Docker image built from `Dockerfile` and pushed to ECR: `501431420968.dkr.ecr.eu-west-1.amazonaws.com/sportarc/cloudtranscode:4.2`
111+
- ECS cluster runs workers as tasks
112+
- Each worker polls a specific SFN activity ARN
113+
114+
**Deployment steps:**
115+
1. Build Docker image: `docker build -t <ecr-repo>:tag .`
116+
2. Push to ECR
117+
3. Update ECS task definition with new image tag
118+
4. Deploy new ECS service revision
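
The steps above can be sketched as a small script. The cluster and service names are hypothetical, and the script defaults to a dry run (`DRY_RUN=1`) that only prints the commands. It also assumes the image tag is reused, so `--force-new-deployment` is enough to roll the service; if you publish a new tag instead, you would register a new task definition revision first, which is not shown here.

```bash
# Execute a command, or just print it when DRY_RUN=1 (the default)
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# deploy <ecr-repo> <tag> <ecs-cluster> <ecs-service> <region>
deploy() {
  local repo="$1" tag="$2" cluster="$3" service="$4" region="$5"
  local registry="${repo%%/*}"   # registry host is everything before the first "/"

  run docker build -t "${repo}:${tag}" .

  # The login step is a pipeline, so it is handled outside run()
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${registry}"
  else
    aws ecr get-login-password --region "$region" |
      docker login --username AWS --password-stdin "$registry"
  fi

  run docker push "${repo}:${tag}"
  run aws ecs update-service --cluster "$cluster" --service "$service" \
    --force-new-deployment --region "$region"
}

# Dry run with placeholder account, cluster, and service names
DRY_RUN=1 deploy 123456789012.dkr.ecr.eu-west-1.amazonaws.com/sportarc/cloudtranscode 4.2 \
  my-cluster my-service eu-west-1
```

Set `DRY_RUN=0` to run the commands for real once the printed output looks right.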
119+
120+
<!-- Ask: Is there a CI/CD pipeline for this repo (GitHub Actions, CodePipeline, etc.)? What triggers deployments (manual, PR merge, tags)? -->
121+
122+
## Testing
123+
124+
<!-- Ask: Does this repo have unit tests, integration tests, or manual test procedures? What's the test coverage strategy? -->
125+
126+
**Manual testing:**
127+
- Use `input_samples/` JSON files to initiate test workflows via AWS SDK
128+
- Monitor Step Functions console for workflow execution
129+
- Check S3 output buckets for transcoded files
130+
- Review CloudWatch Logs for worker output
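
For a quick manual run, the AWS CLI can stand in for the SDK mentioned above. The two helpers below are a sketch: the function names are hypothetical, and you supply the state machine ARN and the path to one of the `input_samples/` files yourself.

```bash
# Start an execution with a sample payload and print its execution ARN.
# $1 = state machine ARN, $2 = path to a JSON file from input_samples/
start_test_execution() {
  local state_machine_arn="$1" sample_file="$2"
  aws stepfunctions start-execution \
    --state-machine-arn "$state_machine_arn" \
    --input "file://${sample_file}" \
    --query executionArn --output text
}

# Print the current status of an execution (RUNNING, SUCCEEDED, FAILED, ...)
# $1 = execution ARN returned by start_test_execution
execution_status() {
  aws stepfunctions describe-execution \
    --execution-arn "$1" \
    --query status --output text
}

# Usage (requires AWS credentials, so left commented out):
#   arn="$(start_test_execution <state-machine-arn> input_samples/<sample>.json)"
#   execution_status "$arn"
```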
131+
132+
## Gotchas
133+
134+
- **Sequential processing**: TranscodeAssetActivity processes all outputs sequentially. For parallel transcoding of multiple outputs, you must split the workflow or run multiple workers with separate SFN tasks.
135+
- **Docker base image dependency**: This repo depends on two SportArchive Docker images (`sportarc/ffmpeg`, `sportarc/cloudtranscode-base`). If those images are updated, rebuild this image.
136+
- **FFmpeg version**: Locked to 4.2. Upgrading FFmpeg requires updating the base image.
137+
- **Client interface requirement**: For production use, you MUST implement a custom client interface class and extend the Dockerfile to include it. Without it, workers run but don't notify client apps of progress/completion.
138+
- **AWS SFN long polling**: Workers block on `GetActivityTask` calls, which Step Functions holds open for up to 60 seconds when no task is available. If the SFN endpoint is unreachable, a worker hangs until its HTTP client times out.
139+
- **Temp disk space**: Transcoding uses local disk for temporary files. Ensure ECS instances or Docker volumes have sufficient space for large video files.
140+
- **Presets location**: The `presets/` directory in this repo may be deprecated. Check if CloudTranscode-FFMpeg-presets is the canonical source.
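
For the temp disk space gotcha, a pre-flight check along these lines may help. It is a sketch only: which directory the workers actually use for temporary files is not documented here, so the `TMPDIR`/`/tmp` fallback and the 1 GiB threshold are assumptions.

```bash
# Print free space, in KiB, on the filesystem holding directory $1.
# -P forces POSIX single-line output; -k reports 1024-byte blocks.
free_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Succeed only if directory $1 has at least $2 KiB free
has_space() {
  [ "$(free_kb "$1")" -ge "$2" ]
}

# Assumed temp location and threshold (1 GiB = 1048576 KiB)
tmp_dir="${TMPDIR:-/tmp}"
if has_space "$tmp_dir" 1048576; then
  echo "ok: ${tmp_dir} has at least 1 GiB free"
else
  echo "warning: ${tmp_dir} has less than 1 GiB free" >&2
fi
```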
141+
142+
<!-- Ask: What happens if a worker crashes mid-transcode? Does SFN retry, or is the task lost? Are there heartbeat intervals configured? -->
143+
<!-- Ask: How are FFmpeg presets loaded — from this repo's presets/ dir, or from CloudTranscode-FFMpeg-presets? -->
144+
<!-- Ask: What's the relationship between this repo and CloudTranscode-Lambda? When is Lambda used vs ECS workers? -->
