Overview
gpt-image-2 is OpenAI’s latest flagship image generation model and the successor to gpt-image-1.5. Core upgrades: any valid resolution (including 2K and 3840×2160 4K), automatic high-fidelity handling of reference images, and 20-30% lower cost at the same tier. APIYI’s gateway is fully compatible with the OpenAI Images API: point the official OpenAI SDK’s base_url at it and existing code works without changes.
Text-to-Image API
/v1/images/generations: generate images from text prompts with size / quality / output_format control.
Image Edit API
/v1/images/edits: multipart upload of reference images (up to 16) plus edit/fusion instructions, with mask inpainting support.
Why Choose APIYI’s GPT-image-2 Official Relay?
Built on OpenAI’s official channel and optimized for enterprise production workloads across reliability, cost, and integration experience:
Official Channel · Same as Official
No Concurrency Limits
Same Price + Up to 15% Off
Global Zero-Barrier Access
api.apiyi.com from domestic data centers, home broadband, or overseas nodes: stable latency, no cross-border re-architecture.
Full Model Lineup
gpt-image-2-all ($0.03/image flat), or the cost-leader Nano Banana Pro / 2: mix and match per scenario.
Professional Enterprise Support
Core Features
Any Resolution (incl. 4K)
Auto High-Fidelity
input_fidelity (will error).
20-30% Cheaper
Chinese + Text Rendering
high quality.
Multi-Image Fusion (up to 16)
image[] array accepts up to 16 reference images. Use “image 1 / image 2 / image 3” in the prompt to reference them by upload order.
Mask Inpainting
Multiple Output Formats
output_compression for jpeg/webp to control file size.
OpenAI SDK Direct
base_url to https://api.apiyi.com/v1 and call directly with the official OpenAI SDK; no code changes are needed to migrate.
Pricing
Token-metered (sum of input text + input image + output image tokens). Official per-image pricing reference:
| Quality | 1024×1024 | 1024×1536 | 1536×1024 |
|---|---|---|---|
| Low | $0.006 | $0.005 | $0.005 |
| Medium | $0.053 | $0.041 | $0.041 |
| High | $0.211 | $0.165 | $0.165 |
- 2K / 4K has no fixed per-image price; it is billed by actual input + output tokens
- Edit requests have noticeably higher input tokens than text-to-image due to forced high-fidelity
- Streaming (stream: true + partial_images: N) costs an extra 100 output image tokens per partial
- Compared to gpt-image-1.5 at the same size and quality, gpt-image-2 is about 20-30% cheaper
Group Setup
The gpt-image-2 official-relay channel offers two groups. Switch in dashboard → Token Settings → Group:
| Group | Rate | When to use |
|---|---|---|
| Default | 1.0x | Same price as OpenAI’s list; first choice when capacity is available; peak hours may see 429 / concurrency squeezes |
| image2Enterprise | 1.2x | Stable fallback when the default group is tight; capacity-prioritized |
image2Enterprise to ride out the spike.
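As a rough budgeting aid, the per-image reference prices and the group rates above can be combined as in the sketch below. It is not the billing formula: actual charges are token-metered, and the sketch assumes the group rate simply multiplies the reference price. The names REFERENCE_PRICE_USD, GROUP_RATE, and estimate_cost are made up for illustration.

```python
# Per-image reference prices (USD), copied from the pricing table above.
REFERENCE_PRICE_USD = {
    ("low", "1024x1024"): 0.006,
    ("low", "1024x1536"): 0.005,
    ("low", "1536x1024"): 0.005,
    ("medium", "1024x1024"): 0.053,
    ("medium", "1024x1536"): 0.041,
    ("medium", "1536x1024"): 0.041,
    ("high", "1024x1024"): 0.211,
    ("high", "1024x1536"): 0.165,
    ("high", "1536x1024"): 0.165,
}

# Group multipliers from the Group Setup table (assumed to scale the price).
GROUP_RATE = {"Default": 1.0, "image2Enterprise": 1.2}


def estimate_cost(quality, size, images=1, group="Default"):
    """Rough USD estimate for `images` text-to-image calls at a 1K preset size."""
    try:
        unit_price = REFERENCE_PRICE_USD[(quality, size)]
    except KeyError:
        # 2K / 4K and custom sizes have no fixed per-image price.
        raise ValueError(f"no reference price for {quality} at {size}") from None
    return round(unit_price * GROUP_RATE[group] * images, 6)
```

Edit requests and streaming partials add tokens on top of this, so treat the result as a lower bound for those cases.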

Technical Specifications
| Dimension | Value |
|---|---|
| Model name | gpt-image-2 |
| Speed | ~120 seconds (4K high quality approaches 2 min) |
| Output resolution | Any valid size (1K/2K/4K, max 3840×2160) |
| Quality tiers | auto / low / medium / high |
| Output formats | png (default) / jpeg / webp |
| Chinese prompts | ✅ Native |
| Per call | 1 image (n=1) |
| Reference image limit | 16 (image[]) |
| Mask inpainting | ✅ Supported (alpha channel required) |
| Transparent background | ❌ Not supported (background: transparent errors) |
| Response field | b64_json (raw base64, no prefix) |
Endpoints
| Endpoint | Purpose | Content-Type |
|---|---|---|
| POST /v1/images/generations | Text-to-image | application/json |
| POST /v1/images/edits | Reference editing / multi-image fusion / mask inpainting | multipart/form-data |
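A minimal sketch of the text-to-image call using only Python's standard library. The helper names build_generation_request and send and the placeholder token are assumptions for illustration, and the response is assumed to follow the OpenAI Images shape with the image at data[0].b64_json.

```python
import json
import urllib.request

BASE_URL = "https://api.apiyi.com/v1"


def build_generation_request(api_key, prompt, size="1024x1024", quality="low"):
    """Build (but do not send) a POST to /v1/images/generations."""
    payload = {
        "model": "gpt-image-2",
        "prompt": prompt,
        "size": size,
        "quality": quality,
    }
    return urllib.request.Request(
        f"{BASE_URL}/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the raw base64 image string."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["data"][0]["b64_json"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any billable call is made.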
Size Reference
Preset Sizes
| size | Meaning | Pixels |
|---|---|---|
| auto | Adaptive (default) | Model decides |
| 1024x1024 | Square 1:1 | 1K |
| 1536x1024 | Landscape 3:2 | 1K |
| 1024x1536 | Portrait 2:3 | 1K |
| 2048x2048 | Square 1:1 | 2K |
| 2048x1152 | Landscape 16:9 | 2K |
| 3840x2160 | Landscape 16:9 | 4K |
| 2160x3840 | Portrait 9:16 | 4K |
Custom Size Constraints
gpt-image-2 accepts any valid size that satisfies all of:
- Max edge ≤ 3840px
- Both edges are multiples of 16
- Aspect ratio ≤ 3:1
- Total pixels ∈ [655,360, 8,294,400] (~0.65MP to ~8.3MP)
Valid examples: 1600x1200, 1792x1024, 2048x1536, 3200x1800
Invalid examples: 1000x1000 (not multiple of 16), 4000x4000 (over max), 3840x1000 (ratio > 3:1)
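The four constraints above can be checked client-side before a request is sent, which avoids a round trip that would end in a 400. The sketch below is one possible pre-flight check; validate_size and its reason strings are illustrative names, not part of the API.

```python
def validate_size(size):
    """Return (ok, reason) for a 'WIDTHxHEIGHT' string, per the rules above."""
    if size == "auto":
        return True, "ok"
    try:
        width, height = (int(part) for part in size.lower().split("x"))
    except ValueError:
        return False, "expected WIDTHxHEIGHT"
    if width <= 0 or height <= 0:
        return False, "edges must be positive"
    if max(width, height) > 3840:
        return False, "edge over 3840px"
    if max(width, height) > 3 * min(width, height):
        return False, "aspect ratio over 3:1"
    if width % 16 or height % 16:
        return False, "edges must be multiples of 16"
    if not 655_360 <= width * height <= 8_294_400:
        return False, "total pixels out of range"
    return True, "ok"
```

For example, validate_size("1600x1200") passes, while validate_size("3840x1000") is rejected for its aspect ratio.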
Best Practices
Integrate with low first
quality=low + a preset size to validate the full call chain (auth, params, timeouts, error handling). low is several times faster than high, so functional issues surface quickly without being masked by long latency.
Prefer preset sizes
Match quality to scenario
low; daily / final → medium; text, fine textures, print → high. Note that low ↔ high is more than visual fidelity: it is also a step change in inference complexity, so latency scales accordingly.
Choose JPEG output
output_format=jpeg + output_compression=85 is faster than PNG and roughly half the size.
Lock high for text scenarios
quality=high for signage and poster scenarios.
Prepare reference images
Tier your client timeout (high → 600s safety net)
quality and size, especially quality. Configure client timeouts per tier:
| quality | Recommended client timeout | Observed latency |
|---|---|---|
| low | ≥ 120 seconds | typically 10–40 seconds |
| medium | ≥ 240 seconds | typically 30–90 seconds |
| high | ≥ 600 seconds (safety net) | 2K/4K runs 3–5 minutes; long tail observed at 235+ seconds |
high mode, set 600s as the safety-net timeout to absorb queueing, long-tail variance, and upstream jitter. Show progress in the UI; consider a task queue server-side.
Errors & Retries
| Status | Meaning | Suggested action |
|---|---|---|
| 400 | Invalid parameters (size constraint violation, unsupported field, etc.) | Validate against size constraints; do not pass input_fidelity / background: transparent |
| 401 | Invalid token | Check Bearer Token |
| 403 | Content moderation block | Adjust prompt or pass moderation: low |
| 429 | Rate limit / insufficient balance | Exponential backoff |
| 5xx | Gateway / backend error | Retry 1–2 times |
| Timeout | Long tail | Tier client timeout by quality: low ≥ 120s / medium ≥ 240s / high ≥ 600s (high + 2K/4K runs 3–5 minutes; long tail observed at 235+ seconds) |
- Tier request timeout by quality: low ≥ 120 seconds / medium ≥ 240 seconds / high ≥ 600 seconds (safety net; observed 3–5 minutes; configuring around 120s/360s causes many false timeouts)
- Integrate with quality=low first, then move up to medium / high as real quality needs demand
- Exponential backoff for 5xx and timeouts (suggest 2 retries)
- Log the x-request-id header for support
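The timeout tiers and the backoff advice above could be wired together roughly as follows. This is a sketch under assumptions: RetryableError is a hypothetical marker the caller raises for 5xx responses and client timeouts, call is any function that accepts a timeout in seconds, and the 2-second base delay is an arbitrary starting point.

```python
import random
import time

# Client timeouts in seconds, from the recommendations above.
TIMEOUT_BY_QUALITY = {"low": 120, "medium": 240, "high": 600}


class RetryableError(Exception):
    """Hypothetical marker for 5xx responses and client-side timeouts."""


def timeout_for(quality):
    # "auto" lets the model pick the tier, so assume the slowest one.
    return TIMEOUT_BY_QUALITY.get(quality, 600)


def call_with_retries(call, quality, retries=2, base_delay=2.0, sleep=time.sleep):
    """Run call(timeout), backing off exponentially on RetryableError."""
    for attempt in range(retries + 1):
        try:
            return call(timeout_for(quality))
        except RetryableError:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # 2s, 4s, 8s, ... plus up to 1s of jitter
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

Because a request that times out on the client may still complete and be billed upstream, keeping the retry count low (2 here) limits duplicate charges.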
FAQ
Do I need to add the data:image/png;base64, prefix to b64_json?
gpt-image-2 returns a raw base64 string (no prefix), unlike gpt-image-2-all. Two client patterns:
- Write file: base64.b64decode(b64_str) → write to disk
- Browser render: img.src = 'data:image/png;base64,' + b64_str (prepend manually)
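Both patterns can be sketched in a few lines of Python. The helper names save_image and to_data_uri are illustrative, and the default MIME type assumes the default png output format.

```python
import base64
from pathlib import Path


def save_image(b64_str, path):
    """Decode the raw base64 payload and write the image bytes to disk."""
    data = base64.b64decode(b64_str)
    Path(path).write_bytes(data)
    return len(data)  # number of bytes written


def to_data_uri(b64_str, mime="image/png"):
    """Prepend the data-URI prefix, e.g. for an <img> src attribute."""
    return f"data:{mime};base64,{b64_str}"
```

If you request jpeg or webp output, pass the matching MIME type (image/jpeg or image/webp) to to_data_uri.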
Why does passing input_fidelity return 400?
gpt-image-2 forces high-fidelity processing of reference images and no longer accepts input_fidelity. When migrating from 1.5, just remove this field; no replacement is needed.
What if I need a transparent background?
gpt-image-2 does not support background: transparent (will error). Two workarounds:
- Set background to opaque (or omit it) and key out transparency yourself with PIL / sharp / online tools
- Temporarily fall back to gpt-image-1.5 for scenarios that genuinely need transparency
How many images per call?
n=1). For N images, issue N parallel requests. Each is independently token-billed.
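One way to issue N parallel requests is a small thread pool around whatever blocking call you already have. In this sketch generate_one is a hypothetical function that takes a prompt and returns the image's base64 string; max_workers=4 is an arbitrary default, not a documented limit.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_many(generate_one, prompts, max_workers=4):
    """Run one blocking request per prompt in parallel, preserving prompt order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order and re-raises the first failure.
        return list(pool.map(generate_one, prompts))
```

Threads suit this workload because each request spends nearly all of its time waiting on the network.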
Why is 2K/4K so slow?
quality=high + high resolution take ≈ 235 seconds (~4 minutes) per image in real customer integrations, and 3840×2160 + high long-tail can stretch close to 5 minutes. Recommendations:
- Integrate with quality=low first to validate the call chain, then move up as real quality needs demand
- Tier client timeout by quality: low ≥ 120s / medium ≥ 240s / high ≥ 600s (safety net)
- Show “generating” progress in the UI
- Use 1024×1024 / 1536×1024 1K presets when 4K isn’t needed
Why are edit requests more expensive than text-to-image?
Because gpt-image-2 auto-enables high-fidelity processing of reference images, the references themselves convert to large input token counts under the Vision pricing rules. Edit input tokens are noticeably higher than text-to-image, so budget accordingly.
How do I prepare a mask file?
- Same size and format as the original, ≤ 50MB
- Must have alpha channel: transparent (alpha=0) = inpaint area, opaque = preserve
- Only applies to the first image
- Mask is a “soft guide” — the model may extend or contract around the masked region
gpt-image-2 vs gpt-image-2-all: which to pick?
| Pick | When |
|---|---|
| gpt-image-2 (Official) | Need precise size/quality control, must match OpenAI official exactly, want 4K output, need mask inpainting |
| gpt-image-2-all (Reverse) | Want flat $0.03/image, 30–60s render, minimal parameters, strong consistency / Chinese text |
Can I use the official OpenAI SDK directly?
base_url to https://api.apiyi.com/v1 and set api_key to your APIYI token.
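A minimal sketch with the official OpenAI Python SDK, assuming the openai package is installed. The environment variable name APIYI_API_KEY and the helper names make_client and generate_b64 are placeholders chosen for this example.

```python
import os

BASE_URL = "https://api.apiyi.com/v1"


def make_client(api_key=None, timeout=600):
    """Create an OpenAI SDK client that talks to the APIYI gateway."""
    from openai import OpenAI  # requires `pip install openai`

    return OpenAI(
        api_key=api_key or os.environ["APIYI_API_KEY"],  # hypothetical env var
        base_url=BASE_URL,
        timeout=timeout,  # seconds; 600 matches the high-quality safety net
    )


def generate_b64(client, prompt, size="1024x1024", quality="low"):
    """Request one image and return its raw base64 string (no data: prefix)."""
    result = client.images.generate(
        model="gpt-image-2",
        prompt=prompt,
        size=size,
        quality=quality,
    )
    return result.data[0].b64_json
```

Typical use would be generate_b64(make_client(), "a watercolor fox"), starting with quality="low" to validate the call chain.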
Can I cancel a generation in progress?
gpt-image-2 uses OpenAI’s official synchronous endpoint: once a request is submitted, it runs to completion with no “cancel” signal. Even if the client disconnects, the server still finishes generation and bills normally. Configure client-side timeouts carefully and do not assume “disconnect = no charge”.
Is there a rate limit (RPM)?
Does it support async invocation?
gpt-image-2 strictly mirrors the OpenAI official API: synchronous only. The request blocks until the result is returned (high + 4K realistically 1–2 minutes). If you need an async queue or callback mechanism:
- Wrap it yourself with a task queue (Celery / BullMQ, etc.) at the business layer
- Or use gpt-image-2-all, which generates in 30–60s and is easier to poll from the front end
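To illustrate the "wrap it yourself" option, here is an in-process stand-in for a task queue: submit returns a job id immediately and status can be polled later. It is a sketch only; a production setup would more likely use Celery or a similar queue with persistent storage. ImageJobs and generate_one (a blocking function that returns the base64 string) are hypothetical names.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ImageJobs:
    """Submit-now, poll-later wrapper around a blocking generate call."""

    def __init__(self, generate_one, max_workers=2):
        self._generate_one = generate_one  # hypothetical blocking call
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def submit(self, prompt):
        """Queue a generation and return a job id without waiting."""
        job_id = uuid.uuid4().hex
        self._futures[job_id] = self._pool.submit(self._generate_one, prompt)
        return job_id

    def status(self, job_id):
        """Report running / failed / done for a previously submitted job."""
        future = self._futures[job_id]
        if not future.done():
            return {"state": "running"}
        if future.exception() is not None:
            return {"state": "failed", "error": str(future.exception())}
        return {"state": "done", "result": future.result()}
```

A front end can call submit once and then poll status every few seconds while showing a progress indicator.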
Do failed generations get billed?
400 error, and no charge is incurred. Typical response:
401 (invalid token), 429 (rate limit). Token billing only kicks in once the request actually reaches the model generation stage (i.e., 200 + b64_json received).
Related Docs
- ⚖️ Official vs Reverse Comparison - Side-by-side selection guide
- Text-to-Image Playground - /v1/images/generations interactive testing
- Image Edit Playground - /v1/images/edits multi-image fusion + mask
- Deep Dive: gpt-image-2 Launch - News article
- Full Integration Doc - Complete API reference
- GPT-Image-2-All (Reverse-Engineered) - Cheaper, faster alternative
- Community: Luck GPT-Image 2 ComfyUI Nodes - Call gpt-image-2 directly in ComfyUI (mask / 5 reference images / custom sizes)
- Community: APIYI GPT-Image 2 Skills - Invoke from Codex CLI / Cursor / Gemini CLI and other AI coding tools with one sentence
- API Manual - General usage guide
gpt-image-2 is OpenAI’s official flagship, billed by token. If you prioritize flat pricing ($0.03/image) and faster generation (30–60s), see gpt-image-2-all.