The Pipeline

How FilmGen Works

Six specialized AI agents collaborate in sequence. Each step feeds into the next, producing a polished cinematic video from your music.

Step 1

Upload Your Track

Drop in an audio file (MP3, WAV, FLAC, or OGG). Optionally add reference images to keep characters consistent across scenes; each project accepts up to 14.

Supported: MP3, WAV, FLAC, OGG · Max 50MB · Reference images: PNG, JPG
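
As a rough sketch, the upload limits above could be checked before a project is created. The function and constant names here are hypothetical, and the code assumes "50MB" means 50 MiB and that only the listed extensions are accepted; the actual FilmGen checks are not documented here.

```python
from pathlib import PurePath

# Limits copied from the "Supported" line above; all names are hypothetical.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}
IMAGE_EXTENSIONS = {".png", ".jpg"}
MAX_AUDIO_BYTES = 50 * 1024 * 1024  # assumption: "50MB" is read as 50 MiB
MAX_REFERENCE_IMAGES = 14


def validate_upload(audio_name: str, audio_size: int, image_names: list) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if PurePath(audio_name).suffix.lower() not in AUDIO_EXTENSIONS:
        problems.append(f"unsupported audio format: {audio_name}")
    if audio_size > MAX_AUDIO_BYTES:
        problems.append("audio file exceeds 50MB")
    if len(image_names) > MAX_REFERENCE_IMAGES:
        problems.append(
            f"too many reference images: {len(image_names)} > {MAX_REFERENCE_IMAGES}"
        )
    for name in image_names:
        if PurePath(name).suffix.lower() not in IMAGE_EXTENSIONS:
            problems.append(f"unsupported image format: {name}")
    return problems
```

Returning a list of problems rather than raising on the first one lets an upload form show every issue at once.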
Step 2

AI Analyzes Your Music

Gemini extracts the musical DNA: BPM, key, mood, energy curve, vocal segments, and structural markers. ElevenLabs or Azure Speech then aligns the lyrics to the audio word by word.

Model: Gemini 3 Flash · Speech: ElevenLabs / Azure · Output: Beat-synced timestamps
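
To illustrate what "beat-synced timestamps" can mean, here is a minimal sketch that builds a beat grid from a BPM value and snaps word start times to the nearest beat. It assumes a constant tempo starting at time zero, which real tracks often violate, and the function names are hypothetical rather than part of FilmGen.

```python
def beat_grid(bpm: float, duration_s: float) -> list:
    """Beat times in seconds for a constant tempo, starting at 0."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    interval = 60.0 / bpm  # seconds per beat
    count = int(duration_s / interval) + 1
    return [round(i * interval, 3) for i in range(count)]


def snap_to_beat(t: float, beats: list) -> float:
    """Return the beat time closest to t."""
    return min(beats, key=lambda b: abs(b - t))


def snap_words(words: list, beats: list) -> list:
    """Snap (word, start_seconds) pairs to the nearest beat."""
    return [(word, snap_to_beat(start, beats)) for word, start in words]
```

At 120 BPM the interval is 0.5 s, so a word starting at 1.2 s snaps to the beat at 1.0 s.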
Step 3

Visual Treatment Created

Based on the analysis, the AI creates a full art direction document: color palette, camera style, scene descriptions, character notes, and per-scene visual prompts.

Model: Gemini 3.1 Pro · Output: JSON treatment with scene-by-scene breakdown
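
The JSON treatment could be modeled roughly as follows. Every field name in this sketch is an assumption chosen to mirror the elements listed above (palette, camera style, character notes, per-scene prompts); the real FilmGen schema is not published here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Scene:
    index: int
    start_s: float
    end_s: float
    description: str
    visual_prompt: str


@dataclass
class Treatment:
    color_palette: list
    camera_style: str
    character_notes: str
    scenes: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested Scene objects into plain dicts.
        return json.dumps(asdict(self), indent=2)


def check_scenes(treatment: Treatment) -> None:
    """Raise ValueError if scenes overlap or have non-positive length."""
    prev_end = 0.0
    for scene in treatment.scenes:
        if scene.start_s < prev_end or scene.end_s <= scene.start_s:
            raise ValueError(f"scene {scene.index} has invalid timing")
        prev_end = scene.end_s
```

Checking the scene timing before image generation starts is cheap and catches a malformed treatment early, before any paid model calls are made.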
Step 4

Scene Images Generated

GPT-image-1.5 generates character-consistent stills for each scene, using up to 14 reference images in high-fidelity mode to preserve character identity.

Model: Azure GPT-image-1.5 · Resolution: up to 1536×1024 · High input fidelity
Step 5

Videos Come to Life

SORA-2 (or Veo 3.1) animates each still into a cinematic video clip. Scene images are resized and fed in as input references for smooth, consistent motion.

Models: Azure SORA-2 / Veo 3.1 · Duration: 4–12 s per clip · Resolution: 1280×720
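
Stills can be up to 1536×1024 (3:2) while clips are 1280×720 (16:9), so some crop or padding is needed when resizing. The sketch below assumes a centered crop, which the page does not confirm, and clamps a requested clip length to the 4 to 12 second range. Both helper names are hypothetical.

```python
def crop_box_for_aspect(width: int, height: int,
                        target_w: int = 1280, target_h: int = 720) -> tuple:
    """Largest centered box with the target aspect ratio.

    Returns (left, top, right, bottom) in source pixel coordinates.
    """
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if width * target_h > height * target_w:
        # Source is wider than the target: trim the sides.
        new_w = height * target_w // target_h
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is taller than (or equal to) the target: trim top and bottom.
    new_h = width * target_h // target_w
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


def clamp_clip_duration(seconds: float, lo: float = 4.0, hi: float = 12.0) -> float:
    """Clamp a requested clip length to the supported range."""
    return max(lo, min(hi, seconds))
```

For a 1536×1024 still the box is (0, 80, 1536, 944), a 1536×864 region that scales down to 1280×720 without distortion.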
Step 6

Quality Review & Export

Every clip is scored on composition, motion, and prompt adherence. Clips that fail review are regenerated automatically. Approved clips are then assembled with the original audio into the final video.

Scoring: Multi-metric AI review · Assembly: FFmpeg · Export: MP4 with original audio
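
The review and export step can be sketched as a retry loop followed by an FFmpeg call. The 0.7 pass threshold, the three-attempt limit, and the `generate` and `score` callables are all assumptions for illustration; the page does not state FilmGen's actual scoring scale or retry policy. The FFmpeg arguments use the standard concat demuxer and re-encode the audio to AAC for MP4 compatibility, which is one reasonable choice rather than the confirmed one.

```python
from typing import Callable, Optional

METRICS = ("composition", "motion", "prompt_adherence")


def passes_review(scores: dict, threshold: float = 0.7) -> bool:
    """A clip passes only if every metric meets the (assumed) threshold."""
    return all(scores.get(metric, 0.0) >= threshold for metric in METRICS)


def generate_until_approved(generate: Callable[[int], str],
                            score: Callable[[str], dict],
                            max_attempts: int = 3) -> Optional[str]:
    """Regenerate a clip until it passes review or attempts run out."""
    for attempt in range(max_attempts):
        clip = generate(attempt)
        if passes_review(score(clip)):
            return clip
    return None  # caller decides what to do with a scene that never passes


def ffmpeg_concat_command(list_file: str, audio_file: str, output_file: str) -> list:
    """Build an FFmpeg command that joins clips and adds the audio track."""
    return [
        "ffmpeg",
        "-f", "concat", "-safe", "0", "-i", list_file,  # clip list file
        "-i", audio_file,                               # the uploaded track
        "-map", "0:v", "-map", "1:a",                   # video from clips, audio from track
        "-c:v", "copy", "-c:a", "aac",                  # keep video as is, encode audio as AAC
        "-shortest",
        output_file,
    ]
```

The command is returned as an argument list so it can be passed straight to `subprocess.run` without shell quoting issues.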

Ready to Try It?

Upload a track and watch the pipeline transform it into cinema.