UlazAI developer docs

Video model capability matrix

This matrix is sourced from the Video Studio model registry. Use it to validate model-specific constraints before sending generation requests.
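
As a rough illustration of such a pre-flight check, the sketch below copies two rows of the matrix by hand into a small lookup table and reports any request values that fall outside them. The names ModelConstraints, CONSTRAINTS, and validate_request are hypothetical and exist only for this example; they are not part of the UlazAI API, and a real integration would presumably read the constraints from the registry instead of hard-coding them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConstraints:
    """Hypothetical container for one matrix row (not an UlazAI type)."""
    aspect_ratios: frozenset
    durations: frozenset       # allowed durations in seconds
    quality_modes: frozenset   # empty when the matrix shows "-"


# Values transcribed by hand from the veo31_fast and wan_2_6 rows.
CONSTRAINTS = {
    "veo31_fast": ModelConstraints(
        aspect_ratios=frozenset({"16:9", "9:16", "Auto"}),
        durations=frozenset({8}),
        quality_modes=frozenset(),
    ),
    "wan_2_6": ModelConstraints(
        aspect_ratios=frozenset({"16:9"}),
        durations=frozenset({5, 10, 15}),
        quality_modes=frozenset({"720p", "1080p"}),
    ),
}


def validate_request(model_id, aspect_ratio, duration_s, quality=None):
    """Return a list of problems; an empty list means the request fits the row."""
    constraints = CONSTRAINTS.get(model_id)
    if constraints is None:
        return [f"unknown model ID: {model_id}"]

    problems = []
    if aspect_ratio not in constraints.aspect_ratios:
        problems.append(f"aspect ratio {aspect_ratio!r} not supported by {model_id}")
    if duration_s not in constraints.durations:
        problems.append(f"duration {duration_s}s not supported by {model_id}")
    if constraints.quality_modes:
        if quality not in constraints.quality_modes:
            problems.append(f"quality {quality!r} not supported by {model_id}")
    elif quality is not None:
        problems.append(f"{model_id} has no quality modes, got {quality!r}")
    return problems


print(validate_request("veo31_fast", "16:9", 8))         # []
print(validate_request("wan_2_6", "9:16", 5, "720p"))    # one aspect-ratio problem
```

Returning a list of problems instead of raising on the first one lets a caller show every constraint violation at once.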

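Each entry lists the model name, the model ID, and a description (where provided), followed by these fields in order: Engine, Inputs, Aspect ratios, Durations, Quality modes, Credits estimate. A "-" under Quality modes means the model has no quality selector. "+N more" under Credits estimate means the registry defines N further combinations that are not shown here.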
Veo 3.1 Lite
veo31_lite
Most cost-effective Veo 3.1 mode (40 credits) for text-to-video and image-to-video.
  Engine: Veo 3.1
  Inputs: text, image
  Aspect ratios: 16:9, 9:16, Auto
  Durations: 8s
  Quality modes: -
  Credits estimate: base: 40
Veo 3.1 Fast
veo31_fast
Fast 8-second generation with text-to-video and image-to-video.
  Engine: Veo 3.1
  Inputs: text, image
  Aspect ratios: 16:9, 9:16, Auto
  Durations: 8s
  Quality modes: -
  Credits estimate: base: 100
Veo 3.1 Quality
veo31_quality
Higher-fidelity Veo 3.1 output with the same 8-second duration.
  Engine: Veo 3.1
  Inputs: text, image
  Aspect ratios: 16:9, 9:16, Auto
  Durations: 8s
  Quality modes: -
  Credits estimate: base: 220
Kling 3.0
kling_3_0
Supports text, image, and frame-based generation with optional elements.
  Engine: Kling 3.0
  Inputs: text, image
  Aspect ratios: 16:9, 9:16, 1:1
  Durations: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15s
  Quality modes: std, pro
  Credits estimate: std no audio per second: 20, std audio per second: 30, pro no audio per second: 27, pro audio per second: 40
Kling 3.0 Motion Control
kling_3_0_motion_control
Requires exactly one image URL plus one motion reference video URL.
  Engine: Kling 3.0 Motion Control
  Inputs: image, video
  Aspect ratios: (none listed)
  Durations: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30s
  Quality modes: 720p, 1080p
  Credits estimate: 720p per second: 12, 1080p per second: 20
Kling 2.6
kling_2_6
Stable 5s/10s generation with optional audio.
  Engine: Kling 2.6
  Inputs: text, image
  Aspect ratios: 16:9, 9:16, 1:1
  Durations: 5, 10s
  Quality modes: -
  Credits estimate: 5 no audio: 55, 10 no audio: 110, 5 audio: 110, 10 audio: 220
Seedance 2.0
seedance_2
Supports text, first-frame, first+last-frame, and multimodal image/video/audio references. For real-person footage, use pre-registered asset:// IDs.
  Engine: Seedance 2.0
  Inputs: text, image, video
  Aspect ratios: 1:1, 4:3, 3:4, 16:9, 9:16, 21:9, adaptive
  Durations: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15s
  Quality modes: 480p, 720p
  Credits estimate: 480p 4 with video input: 86, 480p 4 no video input: 76, 480p 5 with video input: 108, 480p 5 no video input: 95, +44 more
Seedance 2.0 Fast
seedance_2_fast
Seedance 2.0 Fast, with the same first/last-frame and multimodal reference options as Seedance 2.0. For real-person footage, use pre-registered asset:// IDs.
  Engine: Seedance 2.0 Fast
  Inputs: text, image, video
  Aspect ratios: 1:1, 4:3, 3:4, 16:9, 9:16, 21:9, adaptive
  Durations: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15s
  Quality modes: 480p, 720p
  Credits estimate: 480p 4 with video input: 71, 480p 4 no video input: 62, 480p 5 with video input: 88, 480p 5 no video input: 78, +44 more
Seedance 1.5 Pro
seedance_1_5_pro
Audio-video model with fixed-lens support, multiple aspect ratios, and 4s, 8s, or 12s durations.
  Engine: Seedance 1.5 Pro
  Inputs: text, image
  Aspect ratios: 1:1, 21:9, 4:3, 3:4, 16:9, 9:16
  Durations: 4, 8, 12s
  Quality modes: 480p, 720p, 1080p
  Credits estimate: 480p 4 silent: 10, 480p 4 audio: 20, 480p 8 silent: 20, 480p 8 audio: 30, +14 more
Wan 2.6
wan_2_6
Text/image to video with 720p/1080p quality modes.
  Engine: Wan 2.6
  Inputs: text, image
  Aspect ratios: 16:9
  Durations: 5, 10, 15s
  Quality modes: 720p, 1080p
  Credits estimate: 720p 5: 70, 720p 10: 140, 720p 15: 210, 1080p 5: 105, +2 more
Wan 2.6 Video Remix
wan_2_6_v2v
Video remix flow that requires one source video input URL.
  Engine: Wan 2.6
  Inputs: image, video
  Aspect ratios: 16:9
  Durations: 5, 10s
  Quality modes: 720p, 1080p
  Credits estimate: 720p 5: 70, 720p 10: 140, 1080p 5: 105, 1080p 10: 210
Grok Imagine Video
grok_imagine_video
Quality is selected as a mode|resolution pair (for example normal|720p).
  Engine: Grok Imagine Video
  Inputs: text, image
  Aspect ratios: 1:1, 2:3, 3:2, 9:16, 16:9
  Durations: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30s
  Quality modes: normal|480p, normal|720p, fun|480p, fun|720p, spicy|480p, spicy|720p
  Credits estimate: 6 480p: 10, 7 480p: 13, 8 480p: 15, 9 480p: 18, +46 more
Hailuo 2.3 Standard
hailuo_2_3_standard
Image-to-video model that requires one reference image URL.
  Engine: Hailuo 2.3
  Inputs: image
  Aspect ratios: 16:9
  Durations: 6, 10s
  Quality modes: 768P, 1080P
  Credits estimate: 768P 6: 30, 768P 10: 50, 1080P 6: 50
Hailuo 2.3 Pro
hailuo_2_3_pro
Higher-cost Hailuo tier with improved quality presets.
  Engine: Hailuo 2.3
  Inputs: image
  Aspect ratios: 16:9
  Durations: 6, 10s
  Quality modes: 768P, 1080P
  Credits estimate: 768P 6: 45, 768P 10: 90, 1080P 6: 80
Sora 2
sora_2
Sora 2 generation with 10s or 15s durations.
  Engine: Sora 2
  Inputs: text, image
  Aspect ratios: landscape, portrait
  Durations: 10, 15s
  Quality modes: -
  Credits estimate: per second: 8
Sora 2 Pro
sora_2_pro
Sora 2 Pro with high and standard quality modes.
  Engine: Sora 2
  Inputs: text, image
  Aspect ratios: landscape, portrait
  Durations: 10, 15s
  Quality modes: high, standard
  Credits estimate: per second: 8
Sora 2 Pro Storyboard
sora_2_pro_storyboard
Storyboard workflow with an optional prompt and a longer duration mode.
  Engine: Sora 2
  Inputs: text, image
  Aspect ratios: landscape, portrait
  Durations: 10, 15, 25s
  Quality modes: -
  Credits estimate: 10 seconds: 150, 15 25 seconds: 270
Wan 2.7 Text to Video
wan_2_7_t2v
  Engine: Video engine
  Inputs: text
  Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
  Durations: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15s
  Quality modes: 720p, 1080p
  Credits estimate: 720p 2: 32, 720p 3: 48, 720p 4: 64, 720p 5: 80, +24 more
Wan 2.7 Image to Video
wan_2_7_i2v
  Engine: Video engine
  Inputs: text, image
  Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
  Durations: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15s
  Quality modes: 720p, 1080p
  Credits estimate: 720p 2: 32, 720p 3: 48, 720p 4: 64, 720p 5: 80, +24 more
Wan 2.7 Video Edit
wan_2_7_videoedit
  Engine: Video engine
  Inputs: image, video
  Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
  Durations: 0, 2, 3, 4, 5, 6, 7, 8, 9, 10s
  Quality modes: 720p, 1080p
  Credits estimate: 720p 2: 32, 720p 3: 48, 720p 4: 64, 720p 5: 80, +16 more
Wan 2.7 R2V
wan_2_7_r2v
  Engine: Video engine
  Inputs: text, image
  Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
  Durations: 2, 3, 4, 5, 6, 7, 8, 9, 10s
  Quality modes: 720p, 1080p
  Credits estimate: 720p 2: 32, 720p 3: 48, 720p 4: 64, 720p 5: 80, +14 more
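
The credits column mixes per-second rates (for example kling_3_0) with fixed per-duration prices (for example kling_2_6). The sketch below shows one way a client might turn those two rows into an estimate before submitting a job. It assumes that a per-second rate is simply multiplied by the duration with no rounding or surcharge, which the matrix does not state explicitly. The function and table names are hypothetical and not part of the UlazAI API.

```python
# Per-second rates for kling_3_0, keyed by (quality mode, audio enabled).
# Numbers copied from the matrix row.
KLING_3_0_RATES = {
    ("std", False): 20,
    ("std", True): 30,
    ("pro", False): 27,
    ("pro", True): 40,
}

# Fixed prices for kling_2_6, keyed by (duration in seconds, audio enabled).
KLING_2_6_PRICES = {
    (5, False): 55,
    (10, False): 110,
    (5, True): 110,
    (10, True): 220,
}


def estimate_kling_3_0(duration_s, quality="std", audio=False):
    """Estimate credits for kling_3_0, ASSUMING credits = rate * seconds."""
    if duration_s not in range(3, 16):  # the matrix lists 3s through 15s
        raise ValueError(f"kling_3_0 does not list a {duration_s}s duration")
    rate = KLING_3_0_RATES.get((quality, audio))
    if rate is None:
        raise ValueError(f"kling_3_0 does not list quality mode {quality!r}")
    return rate * duration_s


def estimate_kling_2_6(duration_s, audio=False):
    """Look up the fixed kling_2_6 price for a listed duration."""
    price = KLING_2_6_PRICES.get((duration_s, audio))
    if price is None:
        raise ValueError(f"kling_2_6 does not list a {duration_s}s duration")
    return price


print(estimate_kling_3_0(5, quality="pro", audio=True))  # 40 * 5 = 200
print(estimate_kling_2_6(10, audio=False))               # 110
```

Treat the result as an estimate only; the column header itself calls these figures a credits estimate.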

Model selection guidance

  • Use veo31_lite for the lowest Veo 3.1 cost profile (40 credits) in text-to-video and image-to-video flows.
  • Use veo31_fast when speed and predictable 8s output matter most.
  • Use kling_3_0 for flexible durations, quality modes, and frame controls.
  • Use wan_2_6_v2v for source-video remix workflows.
  • Use hailuo_2_3_* only when you can provide a reference image.
  • Use sora_2_pro_storyboard for multi-shot storyboard planning flows.
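
The guidance above can be sketched as a small lookup, shown here only to illustrate how a client might encode it. The need labels ("lowest_veo_cost", "video_remix", and so on), the GUIDANCE table, and both functions are invented for this example. Only the model IDs come from the matrix.

```python
# Hypothetical need labels mapped to the model IDs recommended above.
GUIDANCE = {
    "lowest_veo_cost": "veo31_lite",
    "fast_8s": "veo31_fast",
    "flexible_controls": "kling_3_0",
    "video_remix": "wan_2_6_v2v",
    "storyboard": "sora_2_pro_storyboard",
}


def pick_model(need):
    """Return the model ID suggested for a need label, or raise ValueError."""
    try:
        return GUIDANCE[need]
    except KeyError:
        raise ValueError(f"no guidance for need {need!r}") from None


def usable_models(candidates, has_reference_image):
    """Drop hailuo_2_3_* models when no reference image is available."""
    return [
        model_id
        for model_id in candidates
        if has_reference_image or not model_id.startswith("hailuo_2_3_")
    ]


print(pick_model("video_remix"))  # wan_2_6_v2v
print(usable_models(["hailuo_2_3_pro", "kling_3_0"], has_reference_image=False))
# ['kling_3_0']
```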