Semantic Foragecast Engine

Audio-driven mascot animation pipeline. Give it a character image and an audio file — it returns a lip-synced, beat-synchronized animated video.

How It Works

mascot.png + audio.wav → [Phase 1] → [Phase 2] → [Phase 3] → output.mp4
                          audio prep   compositing  FFmpeg

Phase 1: Audio Prep (prep_audio.py). Analyses the audio with LibROSA (beat detection, onset detection, tempo estimation), extracts phoneme timings via Rhubarb Lip Sync (a mock fallback is included), and parses optional lyrics into a timed word list. Outputs prep_data.json.
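To illustrate the phoneme-timing step, the sketch below parses the JSON that Rhubarb Lip Sync writes when run with `-f json` (a `mouthCues` list of `start`, `end`, and `value` entries) and falls back to mock cues when no Rhubarb output is available. The function names and the mock's fixed-interval cycling are assumptions made for this example; they are not taken from prep_audio.py.

```python
import json
from typing import Dict, List, Optional

MOUTH_SHAPES = ["A", "B", "C", "D", "E", "F", "G", "H", "X"]


def parse_rhubarb_json(text: str) -> List[Dict]:
    """Return mouth cues from Rhubarb's JSON output, sorted by start time."""
    cues = json.loads(text)["mouthCues"]
    return sorted(
        ({"start": float(c["start"]), "end": float(c["end"]), "value": c["value"]}
         for c in cues),
        key=lambda c: c["start"],
    )


def mock_cues(duration_s: float, step_s: float = 0.25) -> List[Dict]:
    """Hypothetical fallback: cycle through the shapes at a fixed interval."""
    cues, i = [], 0
    while i * step_s < duration_s:
        start = i * step_s
        cues.append({
            "start": round(start, 3),
            "end": round(min(start + step_s, duration_s), 3),
            "value": MOUTH_SHAPES[i % len(MOUTH_SHAPES)],
        })
        i += 1
    return cues


def phoneme_timings(rhubarb_output: Optional[str], duration_s: float) -> List[Dict]:
    """Use real Rhubarb output when present, otherwise the mock cues."""
    if rhubarb_output:
        return parse_rhubarb_json(rhubarb_output)
    return mock_cues(duration_s)


# Illustrative input in Rhubarb's JSON layout (values are made up).
sample = (
    '{"metadata": {"duration": 0.5}, "mouthCues": ['
    '{"start": 0.00, "end": 0.20, "value": "X"}, '
    '{"start": 0.20, "end": 0.50, "value": "D"}]}'
)
cues_real = phoneme_timings(sample, 0.5)
cues_mock = phoneme_timings(None, 1.0)
```

Keeping both paths behind one function means later phases can consume the same cue structure whether or not the Rhubarb binary is installed.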

Phase 2: Sprite Composition (compose_animation.py). Composites the mascot image frame by frame: swaps mouth sprites according to phoneme timing, applies beat-synchronized body motion (bob, scale pulse), and overlays background and lighting effects. Pure Python; no external renderer is required.
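The per-frame decisions described here can be sketched with the standard library alone: look up which mouth cue is active at the frame's timestamp, and derive a vertical bob offset from the most recent beat. The exponential decay after each beat, the `decay_s` parameter, and all function names are assumptions chosen for illustration; the real compositor may shape the motion differently.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Tuple


def mouth_for_time(cues: List[Dict], t: float, rest: str = "X") -> str:
    """Return the mouth shape active at time t (seconds), or the rest shape."""
    starts = [c["start"] for c in cues]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < cues[i]["end"]:
        return cues[i]["value"]
    return rest


def bob_offset(beats: List[float], t: float, amplitude_px: float = 8.0,
               decay_s: float = 0.15) -> float:
    """Offset in pixels: full amplitude on a beat, decaying exponentially after it."""
    i = bisect_right(beats, t) - 1
    if i < 0:
        return 0.0  # no beat has happened yet
    return amplitude_px * math.exp(-(t - beats[i]) / decay_s)


def frame_plan(cues: List[Dict], beats: List[float], n_frames: int,
               fps: int = 24) -> List[Tuple[str, int]]:
    """Per-frame (mouth shape, bob offset rounded to whole pixels)."""
    plan = []
    for frame in range(n_frames):
        t = frame / fps
        plan.append((mouth_for_time(cues, t), round(bob_offset(beats, t))))
    return plan


cues = [{"start": 0.0, "end": 0.2, "value": "X"},
        {"start": 0.2, "end": 0.5, "value": "D"}]
beats = [0.25]  # beat times in seconds
plan = frame_plan(cues, beats, n_frames=13, fps=24)
```

Each tuple in `plan` tells the compositor which `mouth_*.png` to paste and how far to shift the body for that frame; whether a positive offset means up or down is left to the caller.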

Phase 3: Video Export (export_video.py). Encodes the frame sequence to MP4 via FFmpeg with configurable codec, quality, and resolution presets.
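As a rough picture of what the export step involves, the sketch below assembles an FFmpeg command line from the configured fps, resolution, codec, and quality name. The FFmpeg options used are standard ones, but the mapping from quality names to x264 `-preset`/`-crf` values and the `frame_%05d.png` filename pattern are assumptions for this example, not values read from export_video.py.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from the README's quality names to (x264 preset, CRF).
QUALITY_PRESETS: Dict[str, Tuple[str, int]] = {
    "ultra_fast": ("ultrafast", 28),
    "fast": ("veryfast", 25),
    "medium": ("medium", 23),
    "high": ("slow", 20),
    "production": ("slower", 18),
}


def build_ffmpeg_cmd(frames_pattern: str, audio_path: str, out_path: str,
                     fps: int = 24, resolution: Tuple[int, int] = (1920, 1080),
                     codec: str = "libx264", quality: str = "high") -> List[str]:
    """Build (but do not run) an FFmpeg command that muxes frames with audio."""
    preset, crf = QUALITY_PRESETS[quality]
    width, height = resolution
    return [
        "ffmpeg", "-y",                             # overwrite existing output
        "-framerate", str(fps), "-i", frames_pattern,
        "-i", audio_path,
        "-vf", f"scale={width}:{height}",
        "-c:v", codec, "-preset", preset, "-crf", str(crf),
        "-pix_fmt", "yuv420p",                      # widest player compatibility
        "-c:a", "aac",
        "-shortest",                                # stop at the shorter stream
        out_path,
    ]


cmd = build_ffmpeg_cmd(
    "outputs/frames/frame_%05d.png",
    "examples/demo_song.wav",
    "outputs/final_video.mp4",
)
# To encode for real: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. Note that `-preset` and `-crf` as used here apply to libx264; other codecs may need different options.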

Requirements

Python 3.9+
FFmpeg (for Phase 3)
Rhubarb Lip Sync (optional — mock fallback used if absent)

Install Python dependencies:

pip install -r requirements.txt

Quick Start

# Run the full pipeline
python main.py --config config.yaml

# Run only Phase 1 (audio analysis)
python main.py --phase 1

# Run only Phase 2 (composition, requires prep_data.json)
python main.py --phase 2

# Run only Phase 3 (video export, requires frames/)
python main.py --phase 3

# Validate config and inputs without running
python main.py --validate

Configuration

All pipeline behaviour is driven by a YAML config file. Minimal example:

inputs:
  mascot_image: examples/demo_fox.png
  song_file: examples/demo_song.wav
  lyrics_file: examples/demo_lyrics.txt   # optional

character:
  sprites_dir: sprites/                    # mouth sprites (A B C D E F G H X)
  mouth_region:
    x: 200    # pixel position on mascot image
    y: 280
    w: 112
    h: 70

animation:
  fps: 24
  body_bob_px: 8          # vertical bob amplitude in pixels
  body_bob_beats: true    # sync bob to detected beats
  background_color: [30, 20, 40]

output:
  output_dir: outputs/
  frames_dir: outputs/frames/
  prep_json: outputs/prep_data.json
  video_name: final_video.mp4

video:
  fps: 24
  resolution: [1920, 1080]
  codec: libx264
  quality: high           # ultra_fast | fast | medium | high | production

rhubarb:
  executable_path: null   # set to rhubarb binary path, or leave null for mock

See DEVELOPER_GUIDE.md for the full config reference and extension examples.
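A config like the one above can be sanity-checked before any phase runs. The following is a minimal sketch of such a check on an already-loaded config dictionary; the set of required keys, the bounds check on mouth_region, the fps comparison, and the 512x512 image size are all assumptions for illustration, and the actual `--validate` logic may differ.

```python
from typing import Dict, List, Tuple

# Assumed required keys, based on the minimal example; lyrics_file is optional.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "inputs": ["mascot_image", "song_file"],
    "character": ["sprites_dir", "mouth_region"],
    "output": ["output_dir", "frames_dir", "prep_json", "video_name"],
}


def validate_config(cfg: Dict, image_size: Tuple[int, int]) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    for section, keys in REQUIRED_KEYS.items():
        present = cfg.get(section) or {}
        for key in keys:
            if key not in present:
                problems.append(f"missing {section}.{key}")

    region = (cfg.get("character") or {}).get("mouth_region")
    if region:
        img_w, img_h = image_size
        inside = (region["x"] >= 0 and region["y"] >= 0
                  and region["x"] + region["w"] <= img_w
                  and region["y"] + region["h"] <= img_h)
        if not inside:
            problems.append("character.mouth_region extends outside the mascot image")

    anim_fps = (cfg.get("animation") or {}).get("fps")
    video_fps = (cfg.get("video") or {}).get("fps")
    if anim_fps is not None and video_fps is not None and anim_fps != video_fps:
        problems.append(f"animation.fps ({anim_fps}) differs from video.fps ({video_fps})")
    return problems


config = {
    "inputs": {"mascot_image": "examples/demo_fox.png",
               "song_file": "examples/demo_song.wav"},
    "character": {"sprites_dir": "sprites/",
                  "mouth_region": {"x": 200, "y": 280, "w": 112, "h": 70}},
    "animation": {"fps": 24},
    "output": {"output_dir": "outputs/", "frames_dir": "outputs/frames/",
               "prep_json": "outputs/prep_data.json",
               "video_name": "final_video.mp4"},
    "video": {"fps": 24},
}
ok = validate_config(config, image_size=(512, 512))   # image size is assumed
config["character"]["mouth_region"]["x"] = 450        # push the region off the right edge
bad = validate_config(config, image_size=(512, 512))
```

Collecting every problem into a list, rather than raising on the first one, lets a single validation run report all issues at once.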

Mouth Sprites

Phase 2 expects nine mouth sprite PNG files (transparent background, sized to fit mouth_region) in the configured sprites_dir. Filenames follow Rhubarb's mouth-shape codes (A to H, plus X for rest):

| File | Phoneme | Mouth shape |
|------|---------|-------------|
| `mouth_X.png` | X (rest/silence) | Closed |
| `mouth_A.png` | A | Open, oval |
| `mouth_B.png` | B/M/P | Closed, pressed |
| `mouth_C.png` | C | Relaxed open |
| `mouth_D.png` | D | Slightly open |
| `mouth_E.png` | E | Wide, teeth showing |
| `mouth_F.png` | F/V | Bottom lip up |
| `mouth_G.png` | G | Narrow open |
| `mouth_H.png` | H | Open, round |
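Since a single missing file would leave some frames without a mouth, it can help to check the sprite set up front. This sketch builds the expected `mouth_<code>.png` paths from the naming scheme in the table and reports any that are absent; the helper names are hypothetical, and the temporary directory only stands in for a real sprites_dir.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

MOUTH_CODES = ["A", "B", "C", "D", "E", "F", "G", "H", "X"]


def sprite_paths(sprites_dir: str) -> Dict[str, Path]:
    """Map each mouth code to its expected sprite file."""
    return {code: Path(sprites_dir) / f"mouth_{code}.png" for code in MOUTH_CODES}


def missing_sprites(sprites_dir: str) -> List[str]:
    """Return the filenames that are expected but not present."""
    return [p.name for p in sprite_paths(sprites_dir).values() if not p.is_file()]


# Demo: create every sprite except the rest shape, then check.
with tempfile.TemporaryDirectory() as tmp:
    for code in MOUTH_CODES[:-1]:
        (Path(tmp) / f"mouth_{code}.png").touch()
    missing = missing_sprites(tmp)
```

In the demo, only the rest sprite is left out, so `missing` contains that one filename.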

generate_sprites.py supports two modes:

# V1 — geometric: instant, zero GPU, skin-tone matched cartoon shapes
python generate_sprites.py --image examples/demo_fox.png --out sprites/

# V2 — AI: SD 1.5 inpainting on Apple MPS (~15s/phoneme, ~2.5 min total)
python generate_sprites.py --image examples/demo_fox.png --out sprites/ --mode ai

The AI mode uses StableDiffusionInpaintPipeline with an elliptical mask centred on the mouth region and phoneme-specific prompts. Sprites are feathered into the mascot face using a Gaussian ellipse mask at composite time.
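The feathering idea can be illustrated without any imaging library: compute an alpha value per pixel that is close to 1 at the centre of the mouth region and falls off smoothly towards its elliptical boundary, then blend sprite and face pixels by that alpha. The `sigma` value and both function names are assumptions for this sketch, and a real implementation would more likely use array operations than nested loops.

```python
import math
from typing import List, Tuple


def gaussian_ellipse_mask(w: int, h: int, sigma: float = 0.5) -> List[List[float]]:
    """Alpha in (0, 1]: near 1.0 at the centre, falling off towards the edge."""
    cx, cy = (w - 1) / 2, (h - 1) / 2
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            # Squared elliptical distance, roughly 1.0 on the inscribed ellipse.
            d2 = ((x - cx) / (w / 2)) ** 2 + ((y - cy) / (h / 2)) ** 2
            row.append(math.exp(-d2 / (2 * sigma ** 2)))
        mask.append(row)
    return mask


def blend(face_px: Tuple[int, int, int], sprite_px: Tuple[int, int, int],
          alpha: float) -> Tuple[int, ...]:
    """Per-channel linear blend of one RGB pixel."""
    return tuple(round(alpha * s + (1 - alpha) * f)
                 for f, s in zip(face_px, sprite_px))


# Mask sized like the mouth_region in the example config (112 x 70).
mask = gaussian_ellipse_mask(112, 70)
centre_alpha = mask[35][56]   # sprite dominates here
corner_alpha = mask[0][0]     # face shows through here
half = blend((0, 0, 0), (200, 100, 50), 0.5)
```

A smaller `sigma` keeps the sprite confined to the middle of the region; a larger one lets it reach further towards the edges at the cost of a more visible seam.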

Project Structure

semantic-foragecast-engine/
├── main.py                  # Pipeline orchestrator + CLI
├── prep_audio.py            # Phase 1: audio analysis
├── compose_animation.py     # Phase 2: sprite compositor
├── export_video.py          # Phase 3: FFmpeg export
├── generate_sprites.py      # Helper: geometric (V1) + AI inpainting (V2) sprites
├── examples/
│   ├── demo_fox.png         # Built-in fox mascot (RGBA, transparent bg)
│   ├── mascot_cat.png       # AI-generated cat mascot (SD 1.5 text-to-image)
│   ├── demo_song.wav
│   └── demo_lyrics.txt
├── sprites/                 # Fox mouth sprites (geometric or AI)
├── sprites_cat/             # Cat mouth sprites
├── config.yaml              # Fox pipeline configuration
├── config_cat.yaml          # Cat pipeline configuration
├── requirements.txt
├── pyproject.toml
└── tests/
    ├── test_prep_audio.py
    ├── test_compose_animation.py
    ├── test_generate_sprites.py
    ├── test_export_video.py
    └── test_e2e_pipeline.py

Running Tests

pip install -e ".[dev]"
pytest

Phase 1 and Phase 3 have full test coverage. Phase 2 compositor tests require Pillow and opencv-python (included in requirements.txt).

Roadmap

Shipped

Cartoon LoRA — fine-tuned SD model for cleaner AI mascot generation on first try
Real Rhubarb lip-sync — replace mock phoneme data with actual binary
Head/body split — separate mascot layers for independent head-bob vs body-bob
AI mouth sprites per character — cat, owl, and future mascots get V2 sprite sets
Stage effects — glow, colour grading, vignette, particle bursts on beat drops
PyPI package — pip install semantic-foragecast-engine

License

MIT — see LICENSE

Acknowledgements

LibROSA — audio analysis
Rhubarb Lip Sync — phoneme extraction
FFmpeg — video encoding
Pillow — image compositing
OpenCV — frame processing
