VEO 3.1 Video Generation - APIYI Documentation Center

Overview

VEO 3.1 is Google’s flagship AI video generation series, producing video with synchronized audio natively — fixed 8-second clips from text prompts or reference images. APIYI exposes VEO 3.1 through a reverse-engineered channel that proxies Google Flow, billed per-clip with both synchronous streaming and async task modes.

🎬 Highlights: native synchronized audio and video output, fixed 8-second clips, Frame-to-Video creative mode, HD portrait/landscape, pricing far below Google’s official rates (from $0.15 per clip), and live progress streaming. Best for short-form video, ad clips, product demos, and social-media assets in high-throughput production scenarios.

Sync API

POST /v1/chat/completions. Reuses the OpenAI Chat Completions protocol; set stream: true for live progress.

Async API

POST /v1/videos. A three-step async flow (submit, poll, download) that supports text-to-video and Frame-to-Video uploads; built for batch management.

Why APIYI’s VEO 3.1?

VEO 3.1 is delivered through a reverse-engineered channel (transparent proxy to Google Flow), optimized for production scenarios across price, integration friction, and feature completeness:

Price Killer · Far Below Official Pricing

Starts at $0.15 per 8-second clip — over 80% cheaper than Google’s official pricing. No need to provision Google Cloud / Vertex AI accounts; per-clip billing is fully transparent.

Unlimited Concurrency · Production Scale

APIYI maintains a transparent account pool — linearly scale batch shoots, short-form video matrices, and ad pipelines. No Google account tier ceilings.

Same Per-Clip Pricing + Top-Up Bonuses

Stack top-up bonuses for further savings. Failed generations are not billed — settlement is by successful results only.

Global Zero-Friction Access

No overseas server or proxy required — connect to api.apiyi.com directly from Mainland China data centers, residential networks, or overseas nodes. Skip the Google Flow cross-border setup entirely.

OpenAI-Compatible · Dual-Mode Access

Sync uses /v1/chat/completions (same as chat models); async uses /v1/videos (OpenAI Video API style). Both protocols work with your existing OpenAI SDK / engineering code once the base URL and API key point at APIYI.

Professional Support · Enterprise Onboarding

Our team has deep video-generation expertise: prompt engineering, Frame-to-Video reference prep, batch production, and post-processing. Full PoC-to-production technical support for enterprise customers.

Key Features

Native Synchronized Audio

VEO 3.1 outputs video with synchronized native audio (ambient sound, dialogue, score) generated alongside the visuals — no separate audio post-production needed.

Generation Speed Leader

-fast models finish in 30–60 seconds and standard models in 1–2 minutes, compared with 3–10 minutes for Sora 2 (see the comparison in the FAQ), which suits high-throughput content production.

Frame-to-Video Creative Mode

-fl suffix models accept 1 reference image (start frame) or 2 (start + end frames) to animate static visuals or generate seamless transitions between two frames.

Portrait / Landscape Switching

Portrait 720×1280 (social-media short-form) and landscape 1280×720 (ads, demos) — toggled via the -landscape model suffix.

Live Streaming Progress

Sync mode (/v1/chat/completions + stream: true) returns real-time `> 🏃 Progress: XX%` text fragments that your frontend can render directly as a progress bar.

Async Task Model

Async mode returns a video_id for independent polling and download — ideal for batch management, resume-on-failure, and long-running background jobs.

Pay on Success

Failed generations / content-policy rejections / capacity errors are not billed — you only pay for the videos you actually receive.

Multi-Video Parallel (n parameter)

In sync mode, the n parameter generates up to 4 different videos per request (same prompt, multiple results) so you can pick the best variant.

Pricing

Billed per clip (each clip is a fixed 8-second video). Only successfully generated videos are billed — failed tasks are free.

HD Series (720p, Live)

Model	Description	Resolution	Price
`veo-3.1`	Default portrait	720×1280	$0.25
`veo-3.1-fl`	Portrait + Frame-to-Video	720×1280	$0.25
`veo-3.1-fast`	Portrait + fast	720×1280	$0.15
`veo-3.1-fast-fl`	Portrait + fast + Frame-to-Video	720×1280	$0.15
`veo-3.1-landscape`	Landscape	1280×720	$0.25
`veo-3.1-landscape-fl`	Landscape + Frame-to-Video	1280×720	$0.25
`veo-3.1-landscape-fast`	Landscape + fast	1280×720	$0.15
`veo-3.1-landscape-fast-fl`	Landscape + fast + Frame-to-Video	1280×720	$0.15

4K Series (Rolling Out)

4K HD variants are rolling out. Model variants will cover the same matrix (portrait / landscape × standard / fast × text-to-video / Frame-to-Video), with naming following the HD series convention. Per-clip pricing will be added to this table once finalized; enterprise customers with batch needs can contact sales for early access.

Billing notes:

Per-clip billing: Each 8-second video is a fixed unit price, independent of prompt length, reference images, or n (n=2 means billed for 2 clips)
Failures are free: Tasks ending in failed / content-policy rejection / gateway errors are not billed — retry safely
Top-up bonuses: See Top-Up Promotions
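
The billing rules above reduce to simple arithmetic: price per clip times the number of successful clips. The sketch below is illustrative only; the function names are hypothetical, the prices are copied from the HD table, and it assumes every model ID containing the `fast` suffix is billed at the fast rate.

```python
from decimal import Decimal
from typing import Optional

# Per-clip prices copied from the HD Series table (USD).
STANDARD_PRICE = Decimal("0.25")
FAST_PRICE = Decimal("0.15")


def price_per_clip(model: str) -> Decimal:
    """Return the per-clip price for an HD model ID such as 'veo-3.1-landscape-fast'."""
    if not model.startswith("veo-3.1"):
        raise ValueError(f"not a VEO 3.1 model ID: {model!r}")
    # Everything after the base name is a dash-separated suffix list.
    suffixes = model[len("veo-3.1"):].split("-")
    return FAST_PRICE if "fast" in suffixes else STANDARD_PRICE


def estimate_cost(model: str, n: int = 1, successful: Optional[int] = None) -> Decimal:
    """Estimate the charge for one request asking for n clips.

    Only successful clips are billed, so pass `successful` if some failed.
    """
    billed = n if successful is None else successful
    if not 0 <= billed <= n:
        raise ValueError("successful must be between 0 and n")
    return price_per_clip(model) * billed
```

For example, `estimate_cost("veo-3.1-fast", n=4)` multiplies the fast price by four clips, while `estimate_cost("veo-3.1", n=2, successful=1)` bills only the one clip that completed.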

Technical Specs

Dimension	Spec
Base model name	`veo-3.1` (HD) / 4K series TBD
Variant axes	Orientation (portrait/landscape) × Speed (standard/fast) × Mode (text-only / Frame-to-Video `-fl`)
Video duration	Fixed 8 seconds (not adjustable)
HD resolutions	Portrait 720×1280, landscape 1280×720
4K resolutions	Rolling out, specs TBD
Audio track	✅ Synchronized native audio
Frame-to-Video (-fl)	✅ Models with `-fl` suffix; 1 image (start frame) or 2 images (start + end)
Sync generation time	`-fast` series 30–60 sec, standard series 1–2 min
Sync progress streaming	✅ `/v1/chat/completions` + `stream: true`
Async polling	✅ `/v1/videos` + task ID + `/content` download
`n` parameter	Sync mode: up to 4 per request (async mode ignores `n`; one video per task)
Video URL TTL	24 hours

API Endpoints

Endpoint	Method	Purpose	Content-Type
`/v1/chat/completions`	POST	Sync streaming generation (recommended for real-time UX)	`application/json`
`/v1/videos`	POST	Async task: submit text-to-video or Frame-to-Video	`application/json` or `multipart/form-data`
`/v1/videos/{video_id}`	GET	Async poll task status	—
`/v1/videos/{video_id}/content`	GET	Async download video URL	—

Domain options: api.apiyi.com is the primary endpoint. vip.apiyi.com / b.apiyi.com are equivalent backup gateways with identical behavior.

Getting Started

Token Group

VEO 3.1 runs on APIYI’s default group — no separate group switch or application required. Just create a token under the default group from the console’s Token Management page; both Pay-as-you-go (priority) and Per-call billing modes work out of the box.

Online Playground: iCover AI

Want to try VEO 3.1 before writing any code? Use APIYI’s official video-generation testing site, iCover AI:

URL: icover.ai/zh/veo
How to use: paste a token from the default group (Pay-as-you-go or Per-call) — text-to-video and Frame-to-Video modes both work directly
Background: iCover AI is the official AI video-generation playground operated by APIYI. It shares the same backend with the production API, so what you see in the playground is exactly what production calls will return.

Use iCover AI to dial in your prompt first, then port the call into your code via Sync API or Async API.

Key Parameters

Model Variant Naming Rules

VEO 3.1 toggles capabilities via model name suffixes — not separate parameters:

Suffix	Effect	Default (no suffix)
`-landscape`	Landscape (1280×720)	Portrait (720×1280)
`-fast`	Fast tier (speed-first, lower price)	Standard tier
`-fl`	Frame-to-Video (requires uploaded image)	Pure text-to-video

Combination examples:

veo-3.1 — Standard portrait text-to-video (default)
veo-3.1-landscape-fast — Fast landscape text-to-video (best value)
veo-3.1-landscape-fl — Standard landscape Frame-to-Video
veo-3.1-landscape-fast-fl — Fast landscape Frame-to-Video (cheapest image-to-video)

-fl models require input_reference image upload, otherwise you get an error; pure text-to-video must not use the -fl suffix
Async Frame-to-Video requests must use multipart/form-data (not JSON); upload 1 image for start frame, 2 for start + end
Combining the 3 two-way axes (orientation × speed × mode, 2 × 2 × 2) yields 8 HD model IDs in total. Suffix order is fixed: landscape → fast → fl
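
Because the suffix order is fixed, it can help to build model IDs from flags instead of concatenating strings by hand. The helper below is a minimal sketch; `build_model_id` and `ALL_HD_MODELS` are hypothetical names, not part of any SDK.

```python
from itertools import product

BASE_MODEL = "veo-3.1"


def build_model_id(landscape: bool = False, fast: bool = False,
                   frame_to_video: bool = False) -> str:
    """Compose an HD model ID with suffixes in the fixed order landscape, fast, fl."""
    parts = [BASE_MODEL]
    if landscape:
        parts.append("landscape")
    if fast:
        parts.append("fast")
    if frame_to_video:
        parts.append("fl")
    return "-".join(parts)


# Enumerate every combination of the three on/off axes.
ALL_HD_MODELS = sorted(
    build_model_id(*flags) for flags in product([False, True], repeat=3)
)
```

`ALL_HD_MODELS` can double as an allow-list for validating user-supplied model names before sending a request.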

`n` (Number of Videos per Sync Request)

Range: 1 to 4, default 1
Only the sync mode (/v1/chat/completions) supports n; async mode ignores it
Billed per video (n=2 means billed for 2 clips)

Best Practices

Validate prompts with -fast first

Run each new prompt at veo-3.1-fast or veo-3.1-landscape-fast first ($0.15, 30–60 seconds), then switch to standard tier for the final asset.

Pick orientation by use case

Social-media short-form (TikTok, Reels) → portrait (no -landscape)
YouTube / ads / product demos → landscape (-landscape)

Sync vs async by need

Need live progress feedback to users → sync streaming (/v1/chat/completions + stream: true)
Background batch processing or long tasks → async task model (/v1/videos + polling)
Details: Sync API / Async API

Frame-to-Video prompts focus on "motion"

-fl models already define visuals (start frame or start+end frames). The prompt should focus on how the image animates: camera motion, object motion, lighting changes, character expressions. Example: "Camera slowly pushes in, leaves gently swaying, sunlight flickering through branches".

Frame-to-Video shines for "transitions"

The strongest Frame-to-Video use case is smooth transitions between two frames (day → night, season changes, expression shifts, object morphing). Describe the transition process and motion changes — no need to detail visuals.

Client timeout ≥ 2 minutes

Sync streaming holds the connection open until generation completes (-fast ≈ 60 sec, standard ≈ 2 min), so set the client timeout to at least 120 seconds. Async POST submission usually returns in under a second, but set a 30-second timeout as a baseline.

Download videos immediately

Video URLs expire in 24 hours. In production, download each video to your own OSS / CDN as soon as the task completes to avoid expired links.
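
One way to follow this advice is to stream the file to local disk right after the URL is returned, then push it to your own storage. This is a sketch using only the standard library; `download_video` is a hypothetical helper and the 60-second timeout is an assumed value, not a documented one.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream the file at `url` to `dest` and return the destination path."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a partial download never looks complete.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)  # rename only after the whole body was written
    return dest
```

Writing to a `.part` file and renaming at the end keeps interrupted downloads from being mistaken for finished assets by later pipeline steps.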

Run multiple tasks via n or parallel POSTs

Same prompt, multiple variants → use n: 4 for 4 results in one call
Different prompts in batch → submit multiple async POSTs, each with an independent video_id, then poll independently
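
For the batch case, independent async submissions can be fanned out with a thread pool. The sketch below takes the submit call as a parameter, since the actual request code lives in the Async API page; `submit_batch` and the `submit` callable are hypothetical, and the worker count of 4 is an arbitrary default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def submit_batch(prompts: List[str], submit: Callable[[str], str],
                 max_workers: int = 4) -> List[Tuple[str, str]]:
    """Submit each prompt concurrently and return (prompt, video_id) pairs in order.

    `submit` is expected to POST one prompt to /v1/videos and return its video_id.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with prompts.
        video_ids = list(pool.map(submit, prompts))
    return list(zip(prompts, video_ids))
```

Each returned `video_id` can then be polled independently, which matches the one-task-per-POST model described above.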

Error Codes & Retries

Status	Meaning	Recommended Action
`400`	Invalid parameters (model name doesn’t exist, `-fl` missing image, `n` out of range)	Validate parameters; Frame-to-Video must use multipart upload
`401` / `invalid_api_key`	Invalid API Key	Check Bearer Token; verify console group setting
`403`	Content-policy rejection	Adjust prompt; ensure reference images are non-sensitive
`429` / `quota_exceeded`	Rate limit / quota exceeded / insufficient balance	Exponential backoff; contact sales for higher quota
`5xx`	Gateway / upstream error	Retry async tasks 1–2 times (no charge)
Task `failed`	Generation failed (mostly content policy or upstream capacity)	See “Content-policy errors” section below; adjust prompt and retry; failed task is not billed
`video_not_found`	video_id doesn’t exist or has expired	Verify ID; query within 24 hours
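
The table above can be turned into a small decision helper so that client code handles each HTTP status consistently. This is a sketch: the `Action` names and the 1-second backoff base are assumptions chosen for illustration, while the status-to-action mapping follows the table.

```python
from enum import Enum
from typing import List


class Action(Enum):
    FIX_REQUEST = "fix_request"          # 400: correct parameters before resending
    CHECK_CREDENTIALS = "check_key"      # 401: token or group problem
    ADJUST_CONTENT = "adjust_content"    # 403: content-policy rejection
    BACK_OFF = "back_off"                # 429: wait, then retry
    RETRY = "retry"                      # 5xx: retry once or twice
    NONE = "none"                        # anything else: no special handling


def recommended_action(status: int) -> Action:
    """Map an HTTP status code to the action recommended in the table."""
    fixed = {
        400: Action.FIX_REQUEST,
        401: Action.CHECK_CREDENTIALS,
        403: Action.ADJUST_CONTENT,
        429: Action.BACK_OFF,
    }
    if status in fixed:
        return fixed[status]
    if 500 <= status <= 599:
        return Action.RETRY
    return Action.NONE


def backoff_delays(retries: int = 2, base: float = 1.0) -> List[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ..."""
    return [base * 2 ** attempt for attempt in range(retries)]
```

A caller would sleep for each value in `backoff_delays()` between attempts when the action is `BACK_OFF` or `RETRY`, and stop immediately for the other actions.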

Content-policy errors (`PUBLIC_` prefix)

Any failed task whose error.message / fail_reason starts with PUBLIC_ comes from upstream Google Flow’s official content policy — your prompt, reference image, or generated output triggered Google’s safety filter. It has nothing to do with the APIYI gateway, and these tasks are not billed, so you can safely retry after adjusting.

Error code	Meaning
`PUBLIC_ERROR_AUDIO_FILTERED`	Audio track was filtered (sensitive utterances, certain dialog languages, copyrighted audio, etc.)
`PUBLIC_ERROR_PROMINENT_PEOPLE_FILTER_FAILED`	Hit the public-figure filter (prompt or reference image involves a real well-known person)
Other `PUBLIC_ERROR_*`	Same family — upstream content policy rejection; the field name itself indicates the trigger

How to handle:

Rewrite the prompt: remove personal names, brands, sensitive terms; if there’s spoken dialog, switch to a generic description (e.g., “the character speaks calmly”).
Swap reference images: avoid using real people (especially celebrities) as start/end frames.
Retry is free: these tasks are not billed, retry freely after adjusting.

Sample failed-task JSON:

{
  "task_id": "video_693742f8-45c9-4608-85e0-1c4b3dea97eb",
  "object": "task",
  "task_type": "sora2_video_generation",
  "model_name": "veo-3.1-fl",
  "platform": "openai",
  "status": "failed",
  "progress": "100%",
  "fail_reason": "PUBLIC_ERROR_AUDIO_FILTERED",
  "error": {
    "message": "PUBLIC_ERROR_AUDIO_FILTERED",
    "type": "task_failed"
  },
  "data": {
    "id": "video_693742f8-45c9-4608-85e0-1c4b3dea97eb",
    "object": "video",
    "model": "veo-3.1-fl",
    "size": "720x1280",
    "status": "failed",
    "error": {
      "code": "",
      "message": "PUBLIC_ERROR_AUDIO_FILTERED"
    }
  }
}
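
Given a payload shaped like the sample above, a client can tell content-policy failures apart from other failures by checking for the `PUBLIC_` prefix. The sketch below reads only the `status`, `fail_reason`, and `error.message` fields shown in the sample; `content_policy_code` is a hypothetical helper name.

```python
from typing import Any, Mapping, Optional


def content_policy_code(task: Mapping[str, Any]) -> Optional[str]:
    """Return the PUBLIC_* code of a failed task, or None for any other outcome."""
    if task.get("status") != "failed":
        return None
    error = task.get("error") or {}
    # Check both places where the sample payload carries the reason.
    for value in (task.get("fail_reason"), error.get("message")):
        if isinstance(value, str) and value.startswith("PUBLIC_"):
            return value
    return None
```

A non-`None` result means the prompt or reference image should be adjusted before retrying; `None` on a failed task points to a different cause, such as upstream capacity.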

Recommended client config:

Sync request timeout: 120 seconds baseline (standard tier); -fast can drop to 60 seconds
Async POST submission timeout: 30 seconds; GET polling interval 5–10 seconds, max wait 10 minutes
Exponential backoff retries on 5xx and failed tasks (recommend 2 retries)
Log the x-request-id response header for debugging
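
The polling settings above (5–10 second interval, 10-minute cap) can be sketched as a loop around whatever function fetches the task. In this example `poll_task` and `get_task` are hypothetical, and the `completed` status name is an assumption borrowed from OpenAI-style video APIs; only `failed` appears in this page's sample payload.

```python
import time
from typing import Any, Callable, Mapping, Sequence


def poll_task(get_task: Callable[[], Mapping[str, Any]],
              interval: float = 5.0,
              max_wait: float = 600.0,
              done_statuses: Sequence[str] = ("completed", "failed"),
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.monotonic) -> Mapping[str, Any]:
    """Call get_task() every `interval` seconds until it reaches a final status.

    Raises TimeoutError if the next wait would pass the `max_wait` deadline.
    """
    deadline = clock() + max_wait
    while True:
        task = get_task()
        if task.get("status") in done_statuses:
            return task
        if clock() + interval > deadline:
            raise TimeoutError(f"task still running after {max_wait} seconds")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to unit-test without real waiting; production code can rely on the defaults.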

FAQ

Is VEO 3.1 official-relay or reverse-engineered? Is an official channel available?

Reverse-engineered. VEO 3.1 is delivered through APIYI’s transparent account pool to Google Flow — pricing is dramatically lower than Google’s official Veo Studio rates, billed per clip with failures not billed. No official-relay channel currently — once Google’s official Vertex AI Veo API becomes generally available, we’ll evaluate adding it and update this page accordingly.

VEO 3.1 vs Sora 2 — which should I choose?

Dimension	VEO 3.1	Sora 2 (Official)
Price	$0.15–$0.25 / 8 sec (per clip)	$0.40–$8.40 / 4–12 sec (per second)
Duration	Fixed 8 sec	4 / 8 / 12 sec
Generation time	30 sec – 2 min	3–10 min
Audio	✅ Native sync	✅ Native sync
Frame-to-Video	✅ `-fl` series	✅ `input_reference` single image
Stability	Reverse-engineered, subject to risk control	Official 99.99%
Resolution	720p (4K rolling out)	720p / 1024p / 1080p

Pick VEO for fast, cheap, batch use cases; pick Sora 2 Pro for highest quality and stability. See the Sora 2 Overview.

Why is the video duration fixed at 8 seconds? Can I extend it?

The upstream Google Flow channel only exposes a fixed 8-second duration, and there is currently no parameter to adjust the length. For longer videos, chain Frame-to-Video clips: generate multiple 8-second segments with -fl models, using each clip’s end frame as the next clip’s start frame, then stitch them together with ffmpeg.
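
The stitching step can be sketched with ffmpeg's concat demuxer, driven from Python. The helper names below are hypothetical, and the example assumes all segments share the same codec and resolution (likely when they come from the same model), so that stream copy works without re-encoding. Paths containing single quotes would need escaping, which this sketch does not handle.

```python
from pathlib import Path
from typing import List, Sequence


def write_concat_list(clips: Sequence[Path], list_path: Path) -> None:
    """Write the text file that ffmpeg's concat demuxer reads, one clip per line."""
    lines = [f"file '{clip.resolve().as_posix()}'" for clip in clips]
    list_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def concat_command(list_path: Path, output: Path) -> List[str]:
    """Build the ffmpeg argument list that joins the listed clips without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow absolute paths in the list file
        "-i", str(list_path),
        "-c", "copy",     # copy streams as-is
        str(output),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.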

How do I choose between standard and -fast?

Highest quality / hero assets → standard (veo-3.1 / veo-3.1-landscape), $0.25/clip
Volume / experimentation / internal preview → fast (-fast suffix), $0.15/clip, faster
Quality difference between fast and standard is small — fast tier is sufficient for most production use cases

How do Frame-to-Video (-fl) models work?

-fl series requires input_reference image upload:

1 image → start-frame mode: image becomes the video’s opening, AI generates subsequent frames
2 images → start + end mode: first image opens, second image closes, AI generates the transition

Must use multipart/form-data (not JSON). See Async API - Frame-to-Video.

Are failed generations billed?

No. VEO 3.1 bills by successful results: tasks that end in failed, content-policy rejections, gateway 5xx errors, and parameter errors are all not billed. Only videos that actually complete (with a returned URL) are billed.

How long are video URLs valid?

24 hours. Download to your own OSS / CDN immediately after generation completes to avoid losing access.

How do I read progress in sync streaming mode?

/v1/chat/completions + stream: true returns SSE format with progress text in each chunk:

data: {"choices":[{"delta":{"content":"> 🏃 Progress: 45.0%\n\n"}}]}
...
data: {"choices":[{"delta":{"content":"> ✅ Video 1 complete, [click here](https://.../xxx.mp4) to view~~~\n\n"}}]}
data: [DONE]

Frontend just needs to parse “progress” and the video URL out of delta.content. Full example in Sync API.
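
As an illustration of that parsing step, the two regular expressions below pull the percentage and the video link out of `delta.content` strings shaped like the chunks above. The helper names are hypothetical, and the patterns assume the text format stays as shown in the sample.

```python
import re
from typing import List, Optional

# Matches "Progress: 45.0%" and captures the number.
PROGRESS_RE = re.compile(r"Progress:\s*(\d+(?:\.\d+)?)%")
# Matches a Markdown link target such as "](https://host/file.mp4)".
VIDEO_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def parse_progress(content: str) -> Optional[float]:
    """Return the progress percentage found in a chunk, or None if absent."""
    match = PROGRESS_RE.search(content)
    return float(match.group(1)) if match else None


def parse_video_urls(content: str) -> List[str]:
    """Return every video URL linked in a chunk (empty list if none)."""
    return VIDEO_URL_RE.findall(content)
```

Calling both functions on each chunk lets a frontend update its progress bar until `parse_video_urls` returns a non-empty list.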

Which image formats are supported? Reference image size limits?

-fl models accept jpeg / png for input_reference, recommended size ≤ 5 MB per image. No strict resolution requirement (unlike Sora 2), but the image aspect ratio should match the target video orientation: portrait video → portrait image, landscape → landscape; otherwise the AI will auto-crop.
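
These constraints can be checked before upload. The sketch below takes the image's format, byte size, and pixel dimensions as plain arguments (how you obtain them is up to your image library); `check_reference_image` is a hypothetical helper, and treating 5 MB as 5 × 1024 × 1024 bytes is an assumption.

```python
from typing import List

MAX_BYTES = 5 * 1024 * 1024          # recommended ceiling per image (assumed binary MB)
ALLOWED_FORMATS = {"jpeg", "jpg", "png"}


def check_reference_image(fmt: str, size_bytes: int, width: int, height: int,
                          model: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the image looks fine."""
    problems = []
    if fmt.lower() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes > MAX_BYTES:
        problems.append("image is larger than the recommended 5 MB")
    # Landscape models carry the -landscape suffix; everything else is portrait.
    wants_landscape = "-landscape" in model
    if (width > height) != wants_landscape:
        problems.append("orientation does not match the model; the image will be auto-cropped")
    return problems
```

An orientation mismatch is reported as a warning rather than a hard error, since the service crops automatically instead of rejecting the image.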

Can I use the official OpenAI SDK?

Yes. Sync mode is fully OpenAI Chat Completions-compatible:

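from openai import OpenAI

# Point the official SDK at APIYI; replace the placeholder key with your own token.
client = OpenAI(api_key="sk-your-key", base_url="https://api.apiyi.com/v1")

# stream=True yields progress text chunks, followed by the final video link.
resp = client.chat.completions.create(
    model="veo-3.1-fast",
    messages=[{"role": "user", "content": "A cat flying in the sky"}],
    stream=True,
    n=1,
)
for chunk in resp:
    # Skip chunks that carry no choices or no text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)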

Async mode also works via client.videos.create(), but Frame-to-Video must use raw requests for multi-file upload (the OpenAI SDK only handles single-file uploads natively).

Can I run multiple tasks in parallel? What are the rate limits?

Yes. Each POST /v1/videos returns an independent video_id. Submit and poll in parallel. Default quota covers most business needs; for enterprise batch use cases (>10 concurrent, >100 clips per day), contact sales for a dedicated resource pool.

Can I cancel a running task?

No. There’s no cancel endpoint currently — once submitted, a task runs to completion. Validate prompts at -fast first to avoid wasting standard-tier runs.

Can I disable the audio track?

Not currently. VEO 3.1 outputs synchronized audio by default and Google does not expose a parameter to disable it. For audio-free output, strip with ffmpeg after download: ffmpeg -i input.mp4 -an output.mp4.

When does the 4K version launch? What's the price?

The 4K series is in gradual rollout, with model variants following the HD naming convention (covering portrait / landscape × fast / standard × Frame-to-Video). Final per-clip pricing will be reflected in the pricing table above once confirmed; enterprise customers with batch needs can contact sales for early access.

Related Docs

Sync API — /v1/chat/completions + stream: true live streaming, text-to-video + Frame-to-Video samples
Async API — /v1/videos three-step async flow, Frame-to-Video upload, full Python client example
Sora 2 Video Generation — OpenAI official-relay channel comparison
Top-Up Promotions — Bonus tiers and applicable channels
API Manual — General request, timeout, and retry guidance
Google official Veo introduction: deepmind.google/technologies/veo/

VEO 3.1 on APIYI is delivered through a reverse-engineered Google Flow channel, offering fast generation at prices far below the official rates. Two call modes (sync streaming and async tasks) cover different scenarios and integrate with your existing OpenAI SDK / engineering code. For any feedback, open a ticket from your console.
