
The first Seedance model with joint audio-video generation. Seedance 1.5 Pro produces videos with phoneme-level lip-sync across 8+ languages, offers dynamic pricing across three resolutions from 480p to 1080p, and gives per-second duration control from 4 to 12 seconds.
Seedance 1.5 Pro is ByteDance's first model to ship joint audio-video generation. Audio and video are produced together, not layered after the fact, through the same Dual-Branch Diffusion Transformer architecture later carried into Seedance 2.0. Its standout capability is phoneme-level multilingual lip-sync: characters speak with accurate mouth shapes in English, Chinese, Japanese, Korean, Spanish, French, German, Portuguese, and more. Combined with three resolution tiers and durations from 4 to 12 seconds, it covers everything from quick social clips to longer narrative scenes.
Describe a scene and receive a video with motion, camera work, and optionally synchronized dialogue or sound effects. Prompts support detailed multi-part instructions with emotional tone and environmental context.
Upload a photo or illustration and the model infers natural motion (hair movement, fabric sway, body gestures) while preserving fine details from the source image, including skin texture, accessories, and background elements.
Audio is generated in the same inference pass as video through a Dual-Branch Diffusion Transformer. The output includes layered sound: spoken dialogue with lip-sync, context-aware foley effects, and environmental ambience, mixed and balanced automatically.
Speaking characters exhibit mouth-shape accuracy at the phoneme level across 8+ languages. Unlike post-production dubbing, the lip movements are generated simultaneously with the audio, producing natural alignment without manual adjustment.
Describe your scene in text, or upload a reference image to animate. For image-to-video, you can also provide an end frame to define the final composition.
Pick 480p, 720p, or 1080p. Set duration anywhere from 4 to 12 seconds. If generating speech, select the target language for lip-sync.
The model produces video and audio together in one pass. Credits are calculated dynamically based on your resolution, aspect ratio, and duration — lower settings cost fewer credits.
Choose 480p for fast drafts, 720p for the standard quality-to-cost ratio, or 1080p for final output. Credits scale with pixel count — a 480p video costs roughly a quarter of a 1080p video at the same duration.
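To make the pixel-count scaling concrete, here is a minimal sketch of the arithmetic. It assumes standard 16:9 frame sizes for each tier and a cost that is strictly proportional to pixel count. Both are assumptions for illustration, and the actual credit table may differ.

```python
# Assumed 16:9 frame sizes per tier; the service's real dimensions may differ.
TIER_DIMENSIONS = {
    "480p": (854, 480),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
}


def pixel_count(tier: str) -> int:
    """Number of pixels in one frame of the given tier."""
    width, height = TIER_DIMENSIONS[tier]
    return width * height


def relative_cost(tier: str, baseline: str = "1080p") -> float:
    """Pixel count of `tier` as a fraction of `baseline`.

    Assumes cost is strictly proportional to pixel count (hypothetical model).
    """
    return pixel_count(tier) / pixel_count(baseline)


if __name__ == "__main__":
    for tier in TIER_DIMENSIONS:
        print(f"{tier}: {relative_cost(tier):.2f}x the 1080p pixel count")
```

Under these assumptions, 480p comes to about 0.20 of the 1080p pixel count and 720p to about 0.44, which is in the same range as the "roughly a quarter" figure quoted above.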
Set any duration between 4 and 12 seconds. Unlike models that only offer fixed 5s or 10s lengths, 1.5 Pro gives per-second control — pay only for the length you need.
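The difference between per-second and fixed-length durations can be sketched as follows. The function names are hypothetical, whole-second durations are assumed, and the fixed 5s/10s comparison is only an illustration of how a fixed-length model rounds a request up.

```python
MIN_SECONDS = 4   # lower bound stated above
MAX_SECONDS = 12  # upper bound stated above


def validate_duration(seconds: int) -> int:
    """Return `seconds` unchanged if it is a whole number in the 4 to 12 range."""
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise TypeError("duration must be a whole number of seconds")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be between {MIN_SECONDS} and {MAX_SECONDS} seconds")
    return seconds


def fixed_length_billed(seconds: int, options: tuple[int, ...] = (5, 10)) -> int:
    """Shortest fixed option that covers the requested length (hypothetical comparison)."""
    for option in sorted(options):
        if option >= seconds:
            return option
    raise ValueError("no fixed option is long enough")


if __name__ == "__main__":
    wanted = 7
    print("per-second:", validate_duration(wanted), "s")
    print("fixed-length:", fixed_length_billed(wanted), "s")
```

For a 7-second clip, the per-second request stays at 7 seconds, while a fixed 5s/10s scheme would round up to 10.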
16:9, 9:16, 1:1, 4:3, 3:4, and 21:9. The 21:9 ultrawide option is uncommon among AI video generators and suits widescreen and film trailer formats.
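As a small illustration of how the six ratios relate, the sketch below parses each label into an exact fraction and classifies its orientation. The helper names are hypothetical and not part of any Seedance API.

```python
from fractions import Fraction

# Aspect ratios listed above, as width:height labels.
SUPPORTED_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4", "21:9")


def parse_ratio(label: str) -> Fraction:
    """Convert a supported 'W:H' label into an exact width/height fraction."""
    if label not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {label}")
    width, height = label.split(":")
    return Fraction(int(width), int(height))


def orientation(label: str) -> str:
    """Classify a ratio as landscape, portrait, or square."""
    ratio = parse_ratio(label)
    if ratio > 1:
        return "landscape"
    if ratio < 1:
        return "portrait"
    return "square"


if __name__ == "__main__":
    for label in SUPPORTED_RATIOS:
        print(f"{label}: {float(parse_ratio(label)):.2f} ({orientation(label)})")
```

Sorting by the parsed value confirms that 21:9 (about 2.33) is the widest option, noticeably wider than 16:9 (about 1.78).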
Lock the camera to a fixed position for product demos, interview framing, or any shot where camera movement would be distracting. When unlocked, the model generates natural camera motion based on scene content.
Pass a seed value to reproduce the same output across generations. Useful for A/B testing different prompts while keeping visual style consistent.
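One way to organize such an A/B test is to hold every setting, including the seed, constant and vary only the prompt. The sketch below uses a hypothetical `GenerationRequest` shape. The field names and defaults are illustrative assumptions, not the real request format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request shape; field names are illustrative, not the real API."""
    prompt: str
    seed: int
    resolution: str = "720p"
    duration_seconds: int = 5
    camera_fixed: bool = False


def prompt_variants(base: GenerationRequest, prompts: list[str]) -> list[GenerationRequest]:
    """Copy `base` once per prompt, changing only the prompt and keeping the seed."""
    return [replace(base, prompt=p) for p in prompts]


if __name__ == "__main__":
    base = GenerationRequest(prompt="a barista pouring latte art", seed=42)
    variants = prompt_variants(
        base,
        ["a barista pouring latte art, warm morning light",
         "a barista pouring latte art, moody evening light"],
    )
    for request in variants:
        print(request.seed, "|", request.prompt)
```

Because the dataclass is frozen, each variant is a separate immutable copy, so the shared seed and settings cannot drift between runs.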
Provide both a starting image and an ending image to define the first and last frames. The model interpolates between them, enabling controlled transitions and predictable story arcs.
Joint audio-video generation with phoneme-level multilingual lip-sync. Three resolution tiers, 4-12 second duration, dynamic pricing from 20 credits.