Background

Seedance 1.5 Pro — Lip-Sync AI Video in 8+ Languages

Seedance 1.5 Pro is the first Seedance model with joint audio-video generation. It produces videos with phoneme-level lip-sync across 8+ languages, offers three resolution tiers from 480p to 1080p with dynamic pricing, and allows per-second duration control from 4 to 12 seconds.

Seedance 1.5 Pro: Joint Audio-Video Generation with Lip-Sync in 8+ Languages

Seedance 1.5 Pro is ByteDance's first model to ship joint audio-video generation. Audio and video are produced together — not layered after the fact — through the same Dual-Branch Diffusion Transformer architecture later carried into Seedance 2.0. Its standout capability is phoneme-level multilingual lip-sync: characters speak with accurate mouth shapes in English, Chinese, Japanese, Korean, Spanish, French, German, Portuguese, and more. Combined with three resolution tiers and duration from 4 to 12 seconds, it covers everything from quick social clips to longer narrative scenes.

Text to Video

Describe a scene and receive a video with motion, camera work, and optionally synchronized dialogue or sound effects. Prompts support detailed multi-part instructions with emotional tone and environmental context.

Image to Video

Upload a photo or illustration and the model infers natural motion — hair movement, fabric sway, body gestures — while preserving fine details from the source image including skin texture, accessories, and background elements.

Joint Audio Generation

Audio is generated in the same inference pass as video through a Dual-Branch Diffusion Transformer. The output includes layered sound: spoken dialogue with lip-sync, context-aware foley effects, and environmental ambience, mixed and balanced automatically.

Phoneme-Level Multilingual Lip-Sync

Speaking characters exhibit mouth-shape accuracy at the phoneme level across 8+ languages. Unlike post-production dubbing, the lip movements are generated simultaneously with the audio, producing natural alignment without manual adjustment.

From Prompt to Lip-Synced Video

1. Write a Prompt or Upload an Image

Describe your scene in text, or upload a reference image to animate. For image-to-video, you can also provide an end frame to define the final composition.

2. Set Resolution, Duration, and Language

Pick 480p, 720p, or 1080p. Set duration anywhere from 4 to 12 seconds. If generating speech, select the target language for lip-sync.

3. Generate and Download

The model produces video and audio together in one pass. Credits are calculated dynamically based on your resolution, aspect ratio, and duration — lower settings cost fewer credits.
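The settings from the three steps above can be checked before submitting a job. The following is a minimal sketch only: the allowed values come from this page, but the class, field, and function names are hypothetical and do not reflect an actual Seedance API.

```python
from dataclasses import dataclass

# Allowed values as listed on this page; the identifiers are hypothetical.
RESOLUTIONS = ("480p", "720p", "1080p")
ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4", "21:9")
MIN_SECONDS, MAX_SECONDS = 4, 12


@dataclass(frozen=True)
class GenerationSettings:
    """Illustrative container for the options chosen in step 2."""
    resolution: str
    duration_s: int
    aspect_ratio: str


def validate(settings: GenerationSettings) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if settings.resolution not in RESOLUTIONS:
        problems.append(f"resolution must be one of {RESOLUTIONS}")
    if not MIN_SECONDS <= settings.duration_s <= MAX_SECONDS:
        problems.append(
            f"duration must be between {MIN_SECONDS} and {MAX_SECONDS} seconds"
        )
    if settings.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"aspect ratio must be one of {ASPECT_RATIOS}")
    return problems


if __name__ == "__main__":
    draft = GenerationSettings(resolution="480p", duration_s=4, aspect_ratio="9:16")
    print(validate(draft))  # an empty list: all three settings are in range
```

Returning a list of problems rather than raising on the first one lets a form show every invalid field at once.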

What Seedance 1.5 Pro Offers

Three Resolution Tiers with Dynamic Pricing

Choose 480p for fast drafts, 720p for a balance of quality and cost, or 1080p for final output. Credits scale with pixel count: a 480p video costs roughly a quarter of a 1080p video at the same duration.
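To see where the "roughly a quarter" figure sits relative to raw pixel counts, the short calculation below compares the tiers. It assumes common 16:9 frame sizes for each tier; the page does not state exact pixel dimensions, and the actual credit formula is not published here, so this is arithmetic for orientation rather than a pricing calculator.

```python
# Assumed 16:9 frame sizes; the page does not list exact dimensions.
FRAME_SIZES = {
    "480p": (854, 480),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
}


def pixel_ratio(tier: str, reference: str = "1080p") -> float:
    """Pixels per frame in `tier` divided by pixels per frame in `reference`."""
    width, height = FRAME_SIZES[tier]
    ref_width, ref_height = FRAME_SIZES[reference]
    return (width * height) / (ref_width * ref_height)


for tier in FRAME_SIZES:
    print(f"{tier}: {pixel_ratio(tier):.2f}x the pixels of 1080p")
# 480p: 0.20x the pixels of 1080p
# 720p: 0.44x the pixels of 1080p
# 1080p: 1.00x the pixels of 1080p
```

Under these assumed dimensions a 480p frame has about a fifth of the pixels of a 1080p frame, so the quoted quarter is best read as an approximation rather than an exact pixel ratio.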

4 to 12 Seconds, Per-Second Granularity

Set any duration between 4 and 12 seconds. Unlike models that only offer fixed 5s or 10s lengths, 1.5 Pro gives per-second control — pay only for the length you need.

Six Aspect Ratios Including 21:9 Ultrawide

16:9, 9:16, 1:1, 4:3, 3:4, and 21:9. The 21:9 ultrawide option is uncommon among AI video generators and suits widescreen and film trailer formats.

Camera Lock

Lock the camera to a fixed position for product demos, interview framing, or any shot where camera movement would be distracting. When unlocked, the model generates natural camera motion based on scene content.

Seed Reproducibility

Pass a seed value to reproduce the same output across generations. Useful for A/B testing different prompts while keeping visual style consistent.
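The A/B workflow described above amounts to holding the seed and every other setting fixed while varying only the prompt. A minimal sketch follows, assuming a request is represented as a plain dictionary; the field names `prompt` and `seed` and the helper `build_ab_requests` are hypothetical, not part of a documented Seedance API.

```python
import random


def build_ab_requests(prompts, base_settings, seed=None):
    """Build one request per prompt, all sharing the same seed and settings.

    If no seed is given, one is drawn once and reused for every variant,
    so the variants differ only in their prompt text.
    """
    if seed is None:
        seed = random.randrange(2**31)
    return [{**base_settings, "prompt": prompt, "seed": seed} for prompt in prompts]


if __name__ == "__main__":
    variants = build_ab_requests(
        prompts=[
            "A barista greets a customer, warm morning light",
            "A barista greets a customer, moody evening light",
        ],
        base_settings={"resolution": "720p", "duration_s": 5, "aspect_ratio": "16:9"},
        seed=1234,
    )
    for request in variants:
        print(request["seed"], request["prompt"])
```

Copying `base_settings` into each request, rather than mutating it, keeps the shared settings reusable for later batches.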

Start and End Image Control

Provide both a starting image and an ending image to define the first and last frames. The model interpolates between them, enabling controlled transitions and predictable story arcs.

Showcases

Seedance 1.5 Pro Video Examples

Videos generated by Seedance models — joint audio-video output, multilingual lip-sync, and physics-aware motion across different genres and styles.

Snowy Forest at Dusk
Classic Martial Arts Training
Supercar Mountain Jump
Sci-Fi Hero in the Lab
Gymnast on the Balance Beam
K-Pop Music Video Scene


Lip-Sync in 8+ Languages

Joint audio-video generation with phoneme-level multilingual lip-sync. Three resolution tiers, 4-12 second duration, dynamic pricing from 20 credits.