
Seedance 2.0 is a ByteDance video model for text-to-video and image-to-video generation. It creates 480p or 720p videos up to 15 seconds long, with native audio-video output, reference-driven control, and flexible aspect ratios.
Seedance 2.0 is built on ByteDance's unified multimodal audio-video architecture. The official model documentation lists support for text, image, video, and audio references, with native 480p and 720p output and durations from 4 to 15 seconds. This web generator exposes the core text-to-video and image-to-video workflows, with controls for native audio, resolution, duration, and aspect ratio.
Key specifications of the Seedance 2.0 model:
Max Resolution: 720p
Sound with Video: native audio generated together with the video
Max Duration: 15 seconds
Describe the scene in natural language, or switch to image-to-video and upload a starting image to animate.
Choose 480p or 720p, set duration from 4 to 15 seconds, pick an aspect ratio, and enable or disable native sound.
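Before submitting a generation, the allowed combinations can be checked as a simple settings validation. The block below is a minimal Python sketch under stated assumptions: `GenerationSettings` and `validate` are hypothetical names (not part of this generator or any published Seedance API), durations are assumed to be whole seconds, and the allowed values are taken from this page (480p or 720p, 4 to 15 seconds, and the six listed aspect ratios).

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical option sets mirroring the controls described on this page.
# These names are illustrative only, not the generator's actual API.
MODES = {"text-to-video", "image-to-video"}
RESOLUTIONS = {"480p", "720p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}
MIN_SECONDS, MAX_SECONDS = 4, 15  # assumption: duration is set in whole seconds


@dataclass(frozen=True)
class GenerationSettings:
    mode: str
    resolution: str
    duration_s: int
    aspect_ratio: str
    native_audio: bool = True           # toggle for native sound
    start_image: Optional[str] = None   # starting image path, used by image-to-video


def validate(settings: GenerationSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    errors: List[str] = []
    if settings.mode not in MODES:
        errors.append(f"unknown mode: {settings.mode}")
    if settings.resolution not in RESOLUTIONS:
        errors.append(f"unsupported resolution: {settings.resolution}")
    if not MIN_SECONDS <= settings.duration_s <= MAX_SECONDS:
        errors.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} s")
    if settings.aspect_ratio not in ASPECT_RATIOS:
        errors.append(f"unsupported aspect ratio: {settings.aspect_ratio}")
    # Image-to-video animates a starting image, so one must be provided.
    if settings.mode == "image-to-video" and not settings.start_image:
        errors.append("image-to-video needs a starting image")
    return errors


if __name__ == "__main__":
    print(validate(GenerationSettings("text-to-video", "720p", 5, "16:9")))  # []
    print(validate(GenerationSettings("image-to-video", "1080p", 20, "2:1")))
```

Collecting every problem into a list, rather than stopping at the first one, lets a form show all invalid fields at once.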
Seedance 2.0 processes the prompt and any reference image, then returns a synchronized audio-video clip. Credit cost depends on resolution, duration, and whether you use text-to-video or image-to-video mode.
Audio and video are generated together instead of as a separate dubbing step. Dialogue, sound effects, music, and ambience can be synchronized with the visuals.
Dolly zooms, rack focuses, tracking shots, POV switches, and smooth handheld motion can be described directly in the prompt.
ByteDance incorporated physics-aware training that penalizes physically implausible motion, so generated clips tend to respect real-world physics: cloth drapes and wrinkles naturally, water splashes with believable weight, collisions carry impact, and characters shift their balance when walking.
Use image-to-video mode to preserve the look of a starting image while adding camera motion, object movement, and environmental action.
Supported aspect ratios are 16:9, 9:16, 1:1, 4:3, 3:4, and 21:9. These cover horizontal video, vertical social formats, square feeds, portrait framing, and ultrawide scenes.
A 5-second Seedance 2.0 text-to-video clip starts at 20 credits at 480p and 45 credits at 720p. Image-to-video costs more because it conditions on a reference image.
Text-to-video, image-to-video, physics-aware motion, and native audio examples generated by Seedance models.






Text-to-video and image-to-video with 480p or 720p output, native audio, and durations up to 15 seconds.