Upscale Forge's Video Generation tool gives you access to three frontier video generation models: Sora 2 from OpenAI, Kling 2.6 from Kuaishou, and Veo 3.1 from Google DeepMind. This tutorial explains what each model excels at, how to write effective video prompts, and how to go from a prompt to a production-ready clip.
Sora 2 generates cinematic video with a strong grasp of physics, object permanence, and complex multi-subject scenes. It also handles camera movement well: you can specify pans, zooms, tracking shots, and aerial movements in your prompt, and the model follows them. Best for: cinematic content, product commercials, realistic scene recreation, and video with complex camera work.
Kling 2.6 specializes in human motion and expression, producing realistic people who walk, gesture, and perform naturally. It is also fast compared with other frontier models. Best for: lifestyle content, people-centric video, social media clips, and projects where turnaround time matters.
Veo 3.1 delivers exceptional temporal consistency and fine detail over longer clips. It renders naturalistic environments (landscapes, cityscapes, atmospheric effects) better than the alternatives. Best for: nature and environment content, longer clips, and any video where detail continuity over time is critical.
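If you usually pick a model by use case, the "Best for" guidance above can be written down as a small lookup table. The sketch below is a hypothetical Python helper, not part of Upscale Forge: the tag names in `BEST_FOR` and the `suggest_models` function are illustrative assumptions that paraphrase the lists above, and ranking by the number of matching tags is just one simple scoring choice.

```python
# Hypothetical lookup table: the tags paraphrase this tutorial's "Best for"
# lists. They are not an Upscale Forge API or an official model taxonomy.
BEST_FOR = {
    "Sora 2": {"cinematic", "product commercial", "scene recreation", "complex camera work"},
    "Kling 2.6": {"lifestyle", "people-centric", "social media", "fast turnaround"},
    "Veo 3.1": {"nature", "environment", "longer clip", "detail continuity"},
}


def suggest_models(needs: set[str]) -> list[str]:
    """Rank models by how many of the requested needs they cover, best first."""
    scores = {model: len(needs & tags) for model, tags in BEST_FOR.items()}
    # sorted() is stable, so models with equal scores keep the table's order.
    ranked = sorted(scores, key=lambda m: scores[m], reverse=True)
    # Drop models that match none of the requested needs.
    return [m for m in ranked if scores[m] > 0]


print(suggest_models({"people-centric", "social media"}))
```

Ties keep the table's order because Python's sort remains stable even with `reverse=True`, so a request like `{"nature", "cinematic"}` lists Sora 2 before Veo 3.1.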
Scene: describe what is happening, where, and with which subjects. For example: "A barista pours latte art in a warm, busy coffee shop, morning light coming through large windows."
Shot type and movement: "close-up shot," "wide establishing shot," "slow push in," "overhead drone shot," "handheld documentary style," "slow motion." Camera specification is one of the highest-leverage prompt elements for video.
Visual style: "cinematic, anamorphic lens flares," "editorial photography style," "iPhone documentary," "golden hour lighting," "blue hour urban." These influence color grading, grain, and overall aesthetic.
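The three prompt components above can be combined mechanically. This is a minimal sketch assuming you keep scene, camera, and style as separate notes and join them in that order; `build_video_prompt` is a hypothetical helper, and the scene, camera, style ordering is one convention rather than a requirement of any of the models.

```python
def build_video_prompt(scene: str, camera: str = "", style: str = "") -> str:
    """Join scene, camera, and style notes into one prompt (hypothetical helper).

    Empty parts are skipped, and each remaining part ends with exactly one
    period so the model reads them as separate instructions.
    """
    parts = [p.strip().rstrip(".") for p in (scene, camera, style)]
    return " ".join(f"{p}." for p in parts if p)


prompt = build_video_prompt(
    scene="A barista pours latte art in a warm, busy coffee shop, "
          "morning light coming through large windows",
    camera="Close-up shot, slow push in",
    style="Cinematic, golden hour lighting",
)
print(prompt)
```

Keeping the parts separate makes it easy to swap only the camera or style line between generations while holding the scene constant.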
Select your clip length (typically 3–10 seconds per generation) and aspect ratio (16:9 for landscape, 9:16 for vertical/mobile, 1:1 for square social posts).
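To keep length and aspect-ratio choices consistent across generations, you could capture them in a small settings object. This is a sketch under assumptions: `ClipSettings`, the hard 3 to 10 second bounds (the tutorial only says "typically"), and the 1080-pixel short side used by `frame_size` are illustrative values, not documented Upscale Forge limits or output resolutions.

```python
from dataclasses import dataclass

# Assumed values based on this tutorial's guidance; Upscale Forge's actual
# limits and output resolutions may differ.
ASPECT_RATIOS = {"16:9": (16, 9), "9:16": (9, 16), "1:1": (1, 1)}
MIN_SECONDS, MAX_SECONDS = 3, 10


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical container for one generation's length and framing."""

    seconds: float
    aspect_ratio: str

    def __post_init__(self) -> None:
        # Reject settings outside the assumed supported values.
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        if not MIN_SECONDS <= self.seconds <= MAX_SECONDS:
            raise ValueError(f"length must be {MIN_SECONDS}-{MAX_SECONDS} s")

    def frame_size(self, short_side: int = 1080) -> tuple[int, int]:
        """Width and height in pixels, scaling the ratio to the given short side."""
        w, h = ASPECT_RATIOS[self.aspect_ratio]
        scale = short_side / min(w, h)
        return round(w * scale), round(h * scale)
```

For example, `ClipSettings(6, "9:16").frame_size()` returns `(1080, 1920)`, a vertical frame for mobile, while an out-of-range length such as 12 seconds raises `ValueError` before any generation is attempted.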
All three models support image-to-video generation: you provide a still image as the starting frame, and the model animates it.
For image-to-video, upload your source image, then write a prompt describing how you want it to move or animate. "The camera slowly zooms in, the curtains sway gently in the breeze" applied to an interior photo will animate the scene.
Text-to-video and image-to-video are available on all paid plans.