astrowq/TaleSpark

TaleSpark

AI agent that thinks and creates like a creative director, seamlessly weaving together text, images, audio, and video in a single, fluid output stream.

Gemini Live Agent Challenge - Creative Storyteller

Focus: Multimodal Storytelling with Interleaved Output

Build an agent that thinks and creates like a creative director, seamlessly weaving together text, images, audio, and video in a single, fluid output stream. Leverage Gemini's native interleaved output to generate rich, mixed-media responses that combine narration with visuals, explanations with generated imagery, or storyboards with voiceover, all in one cohesive flow. Examples include interactive storybooks (text + generated illustrations inline), marketing asset generators (copy + visuals + video in one go), educational explainers (narration woven with diagrams), and social content creators (caption + image + hashtags together).

Mandatory Tech: Must use Gemini's interleaved/mixed output capabilities. The agents are hosted on Google Cloud.

Project Structure

TaleSpark/
├── app.py                     # FastAPI backend
├── requirements.txt           # Python dependencies
├── README.md                  # This file
├── frontend/                  # Vue 3 + TypeScript + Vite
│   ├── package.json
│   ├── tsconfig.json
│   ├── vite.config.ts
│   ├── index.html
│   ├── public/
│   │   └── favicon.svg
│   └── src/
│       ├── main.ts            # Entry point
│       ├── App.vue            # Root component
│       ├── types.ts           # TypeScript definitions
│       ├── styles/
│       │   └── main.css      # Global styles + CSS variables
│       ├── composables/
│       │   ├── useAppState.ts      # Global state management
│       │   ├── useSSE.ts          # Server-Sent Events streaming
│       │   └── useThreeScene.ts   # Three.js particle system
│       └── components/
│           ├── WelcomeScreen.vue    # Hero with animated logo + particles
│           ├── StorySetup.vue       # Genre selection + prompt input
│           ├── StoryViewer.vue      # Streaming story display
│           ├── StoryComplete.vue    # Celebration + stats
│           ├── LoadingScreen.vue    # Animated quill writing
│           ├── GenreCard.vue        # 3D tilt genre cards
│           ├── SceneCard.vue        # Image + text + typewriter
│           └── AudioPlayer.vue     # Custom audio player
├── dist/                      # Built frontend (production)
├── static/                    # Generated images/audio
└── plans/
    └── frontend-architecture.md

Architecture

flowchart LR
    %% Styles
    classDef frontend fill:#d4edda,stroke:#28a745,stroke-width:2px;
    classDef backend fill:#cce5ff,stroke:#007bff,stroke-width:2px;
    classDef ai fill:#f8d7da,stroke:#dc3545,stroke-width:2px;
    classDef cloud fill:#fff3cd,stroke:#ffc107,stroke-width:2px;

    %% Nodes
    subgraph Frontend
        UI[Web Browser]:::frontend
    end

    subgraph Backend
        API[FastAPI Endpoint]:::backend
        EQ[(Event Queue)]:::backend
        TQ[(TTS Text Queue)]:::backend
        LLM_W[Task 1: LLM Producer]:::backend
        TTS_W[Task 2: TTS Worker]:::backend
        FS[(Local Static Files)]:::backend
    end

    subgraph The_Brain
        LLM[Gemini 2.5 Pro]:::ai
        IMG[Imagen 3]:::ai
        TTS[GCP TTS API]:::cloud
    end

    %% Flow 1: Initialization
    UI -->|1. POST Prompt| API
    API -->|Starts| LLM_W
    API -->|Starts| TTS_W

    %% Flow 2: Task 1 (Text & Image Interleaved)
    LLM_W -->|2. Stream Chat| LLM
    LLM -.->|Text Chunks| LLM_W
    LLM_W -->|3. Tool Pause| IMG
    IMG -.->|Image Data| LLM_W
    
    %% Flow 3: Queues Routing
    LLM_W -->|Push Text/Img Event| EQ
    LLM_W -->|Push Sentences| TQ

    %% Flow 4: Task 2 (Parallel Audio)
    TQ -->|Pop Sentences| TTS_W
    TTS_W -->|4. Synthesize| TTS
    TTS -.->|MP3 Data| TTS_W
    TTS_W -->|Save File| FS
    TTS_W -->|Push Audio Event| EQ

    %% Flow 5: Output to Frontend
    EQ -->|5. SSE Stream| UI
    UI -.->|6. Fetch MP3/JPG| FS
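The two-task flow in the diagram can be sketched with `asyncio` queues. This is a minimal illustration, not the code in `app.py`: the function names, the `None` sentinel, the naive sentence split, and the audio file naming are all assumptions, and the Gemini and TTS calls are replaced by stand-ins.

```python
import asyncio

SENTINEL = None  # hypothetical end-of-stream marker


async def llm_producer(chunks, event_q, tts_q):
    """Task 1: forward text chunks as events and queue full sentences for TTS."""
    buffer = ""
    for chunk in chunks:  # stand-in for the streamed Gemini response
        await event_q.put({"type": "text", "chunk": chunk})
        buffer += chunk
        # Naive split on ". "; real narration needs a sturdier splitter.
        while ". " in buffer:
            sentence, buffer = buffer.split(". ", 1)
            await tts_q.put(sentence + ".")
    if buffer.strip():
        await tts_q.put(buffer.strip())
    await tts_q.put(SENTINEL)


async def tts_worker(event_q, tts_q):
    """Task 2: turn each queued sentence into an audio event."""
    index = 0
    while (sentence := await tts_q.get()) is not SENTINEL:
        # A real worker would synthesize `sentence` and save an MP3 here.
        index += 1
        await event_q.put({"type": "audio", "src": f"/static/aud_{index}.mp3"})
    await event_q.put(SENTINEL)


async def run_pipeline(chunks):
    """Run both tasks and drain the event queue, as the SSE endpoint would."""
    event_q, tts_q = asyncio.Queue(), asyncio.Queue()
    tasks = [
        asyncio.create_task(llm_producer(chunks, event_q, tts_q)),
        asyncio.create_task(tts_worker(event_q, tts_q)),
    ]
    events = []
    while (event := await event_q.get()) is not SENTINEL:
        events.append(event)
    await asyncio.gather(*tasks)
    return events
```

Calling `asyncio.run(run_pipeline(["Once upon ", "a time. ", "The end."]))` returns the text and audio events in the order they were queued. Keeping narration on its own queue lets audio synthesis run in parallel with text generation instead of blocking it.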

Quick Start

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • Google Cloud project with the Vertex AI API enabled (used for Gemini and Imagen)

Installation

# 1. Install Python dependencies
pip install -r requirements.txt

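# 2. Install pnpm, then the frontend dependencies
# (PowerShell installer shown; on other platforms, `npm install -g pnpm` also works)
Invoke-WebRequest https://get.pnpm.io/install.ps1 -UseBasicParsing | Invoke-Expression
cd frontend
pnpm install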

# 3. Configure Google Cloud (set PROJECT_ID in app.py)
# Required: Google Cloud project with Vertex AI enabled

Development

# Terminal 1: Start the FastAPI backend
python app.py
# Backend runs at http://localhost:8000

# Terminal 2: Start Vue dev server (hot reload)
cd frontend
pnpm run dev
# Frontend runs at http://localhost:5173

The frontend proxies API requests to the backend:

  • /api/* → http://localhost:8000/api/*
  • /static/* → http://localhost:8000/static/*

Production Build

# Build frontend
cd frontend
pnpm run build

# This creates the dist/ folder with static files

# Run production server
python app.py
# Serves the built frontend from dist/

Features

Frontend (Awwwards-quality interactive storybook)

  • Three.js Particle Background — Ambient golden particles floating upward, react to mouse movement, change color per genre
  • Genre Theming — 5 distinct themes (Fantasy, Sci-Fi, Mystery, Fairy Tale, Adventure) via CSS custom properties
  • GSAP Animations — Smooth page transitions, logo entrance, button glows, card 3D tilts
  • Real-time Streaming — Server-Sent Events deliver story content as it's generated
  • Typewriter Effect — Text streams in character-by-character with cursor
  • Custom Audio Player — Styled player with progress bar and auto-play
  • Responsive Design — Works on desktop, tablet, and mobile

Backend (Gemini AI Integration)

  • Gemini 2.5 Pro — Generates story text with interleaved tool calls
  • Imagen 3.0 — Generates scene images
  • Google Cloud Text-to-Speech API — Converts story text to spoken narration
  • Server-Sent Events — Streams content in real-time
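As a rough illustration of the Server-Sent Events step, each event can be serialized as a `data:` line followed by a blank line, which is the framing the SSE format defines. The helper name `format_sse` is hypothetical and may not match what `app.py` does.

```python
import json


def format_sse(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame."""
    # A frame is a "data:" line terminated by a blank line.
    # json.dumps escapes newlines, so the payload stays on one line.
    return f"data: {json.dumps(event)}\n\n"
```

In a FastAPI app, strings like these would typically be yielded from a generator passed to `StreamingResponse` with `media_type="text/event-stream"`.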

Configuration

Google Cloud Setup

  1. Create a Google Cloud project
  2. Enable Vertex AI API
  3. Set PROJECT_ID in app.py:
PROJECT_ID = "your-project-id"

Environment Variables (Optional)

For production, you might want to use environment variables:

export PROJECT_ID="your-project-id"
export LOCATION="us-central1"
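One way the backend could pick these variables up is sketched below. It assumes `app.py` does not already read them; the `load_config` helper and its dictionary keys are hypothetical, and the defaults mirror the placeholder values shown above.

```python
import os


def load_config(env=os.environ):
    """Read project settings from the environment, with placeholder defaults."""
    return {
        "project_id": env.get("PROJECT_ID", "your-project-id"),
        "location": env.get("LOCATION", "us-central1"),
    }
```

Passing the mapping as a parameter keeps the function easy to test, for example `load_config({"PROJECT_ID": "demo"})`.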

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend Framework | Vue 3 + TypeScript |
| Build Tool | Vite |
| Animations | GSAP |
| 3D Effects | Three.js |
| Styling | CSS Custom Properties |
| Backend | FastAPI (Python) |
| AI | Google Gemini + Imagen |
| Audio | Google Cloud Text-to-Speech |

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | Serve frontend |
| POST | /api/generate | Generate story (SSE stream) |

Generate Story

Request:

{
  "prompt": "A young dragon discovers it can speak human languages..."
}

Response: Server-Sent Events stream

{"type": "image", "src": "/static/img_abc123.jpg"}
{"type": "text", "chunk": "Once upon a "}
{"type": "text", "chunk": "time, in a land..."}
{"type": "audio", "src": "/static/aud_def456.mp3"}
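A client can fold these events into a story as they arrive. The sketch below is illustrative only: `collect_story` is a hypothetical helper, and because the exact wire framing is not shown here, it accepts both bare JSON lines and lines carrying an SSE `data:` prefix.

```python
import json


def collect_story(lines):
    """Fold a stream of event lines into text plus lists of media URLs."""
    story = {"text": "", "images": [], "audio": []}
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):  # tolerate SSE framing if present
            line = line[len("data:"):].strip()
        if not line:
            continue  # blank lines separate SSE frames
        event = json.loads(line)
        if event["type"] == "text":
            story["text"] += event["chunk"]
        elif event["type"] == "image":
            story["images"].append(event["src"])
        elif event["type"] == "audio":
            story["audio"].append(event["src"])
    return story
```

Text chunks are concatenated in arrival order, while image and audio events only carry URLs that the client then fetches from `/static/`.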

License

MIT


Credits

Built for the Gemini Live Agent Challenge.
