
Kling 2.6 produces short text-to-video and image-to-video clips with optional sound. Use it for prompts that need synchronized speech, sound effects, or ambient audio without adding a separate dubbing workflow.
Kling 2.6 is available in this generator for both text-to-video and image-to-video workflows. The current API exposes 5-second and 10-second durations; 16:9, 9:16, and 1:1 aspect ratios for text-to-video; and an optional sound toggle for audio-visual output.
Transform text prompts into short videos, with optional synchronized audio when the scene needs dialogue, narration, sound effects, or ambient sound.
Bring a static image to life with a motion prompt. Upload one reference image, describe the action, and choose whether the output should include sound.
Generate speech, dialogue, narration, singing, rap, ambient sound effects, and mixed audio when sound is enabled.
Choose 5-second or 10-second duration. Text-to-video supports 16:9 landscape, 9:16 portrait, and 1:1 square aspect ratios.
Choose text-to-video to generate entirely from a written prompt, or switch to image-to-video to animate a reference image.
For text-to-video, set your aspect ratio (16:9, 9:16, or 1:1). Pick a duration of 5 or 10 seconds and decide whether to enable sound. When sound is enabled, describe the audio you want in the prompt: dialogue, narration, sound effects, singing, or a combination.
Start generation and receive a short video clip. When sound is enabled, the provider returns an audio-visual output rather than a silent clip.
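The steps above can be sketched as a small settings check run before a request is sent. This is only an illustration of the documented options (mode, 5 or 10 second duration, text-to-video aspect ratios, one reference image, sound toggle). The class name, field names, and error messages are hypothetical and are not the generator's or the provider's actual API. Rejecting an aspect ratio on image-to-video requests is also an assumption, based on the page listing aspect ratios for text-to-video only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Options stated on this page; everything else below is illustrative.
ALLOWED_DURATIONS = (5, 10)
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16", "1:1")
ALLOWED_MODES = ("text-to-video", "image-to-video")


@dataclass
class ClipRequest:
    """Hypothetical container for one generation request."""
    mode: str
    prompt: str
    duration: int = 5
    sound: bool = False
    aspect_ratio: Optional[str] = None  # text-to-video only
    image_url: Optional[str] = None     # image-to-video only


def validate(req: ClipRequest) -> List[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    errors: List[str] = []
    if req.mode not in ALLOWED_MODES:
        errors.append(f"unknown mode: {req.mode}")
    if not req.prompt.strip():
        errors.append("prompt is required")
    if req.duration not in ALLOWED_DURATIONS:
        errors.append("duration must be 5 or 10 seconds")
    if req.mode == "text-to-video":
        if req.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            errors.append("aspect ratio must be 16:9, 9:16, or 1:1")
        if req.image_url is not None:
            errors.append("text-to-video does not take a reference image")
    elif req.mode == "image-to-video":
        if not req.image_url:
            errors.append("image-to-video needs one reference image")
        # Assumption: aspect ratio is not a setting in this mode.
        if req.aspect_ratio is not None:
            errors.append("aspect ratio applies to text-to-video only")
    return errors


if __name__ == "__main__":
    request = ClipRequest(
        mode="text-to-video",
        prompt="A chef plates a dessert and says, 'Dinner is served.'",
        duration=10,
        sound=True,
        aspect_ratio="16:9",
    )
    print(validate(request))
```

Checking the settings up front keeps unsupported combinations, such as a 7-second duration or a 4:3 aspect ratio, from reaching the generation step.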
When sound is enabled, Kling 2.6 can generate speech, sound effects, and ambient audio alongside the video, so the clip does not need a separate audio pass.
Kling 2.6 is useful for human motion prompts, product motion, camera moves, and short social clips where audio timing and scene pacing both matter.
For dialogue or narration prompts, sound-enabled generations can align speech with the video, reducing the need for a separate lip-sync step.
The Kie API also documents a Kling 2.6 motion-control endpoint. This generator currently exposes the core text-to-video and image-to-video workflows.
Start from a prompt when you want an entirely generated scene, or upload an image when composition and subject identity should come from a reference.
Control the prompt, duration, and sound toggle, plus the aspect ratio for text-to-video and the image reference for image-to-video. Quality modes the API does not support are not shown.
Explore videos generated by Kling models, featuring synchronized audio, precise human movement, and detailed visual storytelling across diverse scenarios.






Create short Kling 2.6 videos from text or images, with optional synchronized speech, sound effects, and ambient audio.