Documentation Index
Fetch the complete documentation index at: https://docs.apiyi.com/llms.txt
Use this file to discover all available pages before exploring further.
APIYI supports 400+ mainstream AI models. This page provides detailed model information, pricing, and usage instructions.
Enterprise-grade Professional and Stable AI Large Model API Hub
All models are officially sourced and forwarded, with ~20% off pricing (combining top-up bonuses and exchange rate advantages), aggregating various excellent large models. No speed limits, no expiration, no account ban risks, pay-as-you-go billing, long-term reliable service.
🔥 Currently Recommended Models
The following are currently stably supplied popular models. For complete model list and real-time pricing, visit APIYI Console Pricing Page.
Model Upgrade Recommendations: We recommend using the latest models for best performance, but please note:
- Initial instability is common: Newly launched models may experience slow responses, timeouts, or occasional errors due to limited compute capacity at the vendor — this typically stabilizes within days to weeks
- Check parameter compatibility: New models may introduce or change parameters (e.g.,
max_completion_tokens replacing max_tokens). Before upgrading, verify that your API parameters remain compatible with older models
- Always test before going live: Before deploying a new model to production, thoroughly validate it in a test environment to ensure output quality and API compatibility meet expectations
Model Categories
🤖 OpenAI Series
🆕 Latest Models
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| GPT-5.5 Pro | gpt-5.5-pro | 1M | Strongest reasoning today, Terminal-Bench 2.0 82.7%; /v1/responses endpoint + SVIP group only, very expensive | Top-tier reasoning, research (professional needs) |
| GPT-5.5 🔥 | gpt-5.5 | 1M | SWE-bench Verified 88.7%, hallucinations 60% lower than 5.4, new xhigh reasoning tier | Complex agents, professional workflows |
| GPT-5.4 | gpt-5.4 | 1M | Native computer use, GDPval 83% | Complex agents, professional workflows |
| chat-latest | chat-latest | 400K | Version-less alias, always points to the latest ChatGPT Instant (currently GPT-5.5 Instant) | Quick writing, conversation |
| GPT-5.2 | gpt-5.2 | 400K | GDPval 70.9% surpassing professionals | Programming planning, structured tasks |
| GPT-5.3 Codex 🔥 | gpt-5.3-codex | 128K | SWE-Bench Pro SOTA, complex programming and agent tasks | Complex programming, agent tasks |
| GPT-5.1 | gpt-5.1 | 128K | Intelligence-speed balance, SWE-bench 76.3%, 24h cache | General apps, programming |
GPT Pro series (e.g. gpt-5.5-pro, gpt-5-pro) usage notes:
/v1/responses endpoint only: cannot use /v1/chat/completions — switch your SDK / code to the Responses API before calling
- Very expensive: a single call may cost several dollars, and it is open to the SVIP group only to prevent accidental use on the Default group
- Not recommended for non-professional needs: use GPT-5.5 / GPT-5.4 for everyday tasks; Pro suits only research and top-tier work demanding extreme reasoning depth
✅ Stable / Classic
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| GPT-5 ⭐ | gpt-5 | 128K | Flagship stable version, ultra-strong reasoning | Top-tier reasoning, complex tasks |
| GPT-5 Mini | gpt-5-mini | 128K | GPT-5 lightweight, excellent performance | Balance performance and cost |
| GPT-5 Nano | gpt-5-nano | 128K | GPT-5 ultra-lightweight | Large-scale batch processing |
| o3 ⭐ | o3 | 200K | Reasoning model, significantly price-reduced | Complex reasoning, math, programming |
| o4-mini | o4-mini | 200K | Lightweight reasoning model | Top choice for programming |
| GPT-4.1 ⭐ | gpt-4.1 | 128K | Fast speed, main workhorse | General applications |
| GPT-4.1 Mini | gpt-4.1-mini | 128K | Cheaper lightweight version | Cost-sensitive scenarios |
| GPT-4o | gpt-4o | 128K | Balanced multimodal capabilities | General scenarios |
| GPT-4o Mini | gpt-4o-mini | 128K | Lightweight fast version | Quick response |
GPT-5 Series Usage Notes:
- Temperature parameter
temperature must be set to 1 (only supports 1)
- Use
max_completion_tokens instead of max_tokens
- Do not pass
top_p parameter
🎭 Claude Series (Anthropic)
🆕 Latest Models
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| Claude Opus 4.7 🔥 | claude-opus-4-7 | 1M (Beta) | Coding benchmarks +13% over 4.6, 3x on production tasks, tool errors cut to 1/3, new xhigh reasoning tier | Top-tier coding, complex agents |
| Claude Opus 4.7 Thinking 🔥 | claude-opus-4-7-thinking | 1M (Beta) | Adaptive thinking, enhanced deep reasoning | Top-tier reasoning tasks |
| Claude Opus 4.6 | claude-opus-4-6 | 1M (Beta) | Terminal-Bench 2.0 #1, 128K output | Top-tier coding, complex agents |
| Claude Sonnet 4.6 🔥 | claude-sonnet-4-6 | 1M (Beta) | Full upgrade, rivals Opus 4.5, great value | Programming top choice, agent dev |
✅ Stable / Classic
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| Claude Opus 4.5 ⭐ | claude-opus-4-5-20251101 | 200K | SWE-bench 80.9%, price reduced to 1/3 | Complex programming, top-tier reasoning |
| Claude Sonnet 4.5 ⭐ | claude-sonnet-4-5-20250929 | 200K | World-class coding, SWE-bench 77.2% | Code generation, agent development |
| Claude Sonnet 4.5 Thinking | claude-sonnet-4-5-20250929-thinking | 200K | Chain-of-thought mode, deep reasoning | Complex programming reasoning |
| Claude Haiku 4.5 ⭐ | claude-haiku-4-5-20251001 | 200K | High cost-performance, SWE-bench 73.3%, 2x speed | Real-time chat, pair programming |
| Claude 4 Sonnet | claude-sonnet-4-20250514 | 200K | Battle-tested, top choice for programming | Code generation, analysis |
| Claude Opus 4.1 | claude-opus-4-1-20250805 | 200K | Iterative upgrade, programming-optimized | High-demand programming tasks |
Latest: Claude Opus 4.7 improves coding benchmarks 13% over 4.6, cuts tool-call errors to 1/3, and adds an xhigh reasoning tier at the same price as 4.6. Sonnet 4.6 rivals Opus 4.5 and is now the default on claude.ai. Stable: Opus 4.5 and Sonnet 4.5 are battle-tested for production. Haiku 4.5 offers 2x speed at great value.
🌟 Google Gemini Series
🆕 Latest Models
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| Gemini 3.5 Flash 🔥 | gemini-3.5-flash | 1M | Terminal-Bench 2.1 76.2%, fully surpasses 3.1 Pro, ~4x faster at ~half the price | Programming top choice, cost-performance king |
| Gemini 3.1 Pro Preview 🔥 | gemini-3.1-pro-preview | 1M | ARC-AGI-2 77.1% (2x+ over 3 Pro), most advanced reasoning | Complex reasoning, multimodal analysis |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 1M | SWE-bench 78%, 3x faster, thinking / nothinking variants available | Programming, cost-performance |
| Gemini 3.1 Flash Lite 🔥 | gemini-3.1-flash-lite | 1M | GA version, 64% faster than 2.5 Flash, ultra-low price | High concurrency, batch, low cost |
Note: Gemini 3 Pro Preview was discontinued on March 9, 2026. Please migrate to Gemini 3.1 Pro Preview.
✅ Stable / Classic
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| Gemini 2.5 Pro ⭐ | gemini-2.5-pro | 2M | Official release, programming advantage, strong multimodal | Long text, programming, multimodal |
| Gemini 2.5 Flash ⭐ | gemini-2.5-flash | 1M | Fast speed, low cost, official release | Quick response scenarios |
| Gemini 2.5 Flash Lite | gemini-2.5-flash-lite | 1M | Ultra-lightweight, faster and cheaper | Large-scale simple tasks |
Latest: Gemini 3.5 Flash fully surpasses Gemini 3.1 Pro on Terminal-Bench 2.1, MCP Atlas and more, at ~4x speed and ~half the price — today’s cost-performance king. Gemini 3.1 Pro Preview doubles reasoning (ARC-AGI-2 77.1%), Google’s most advanced. Gemini 3.1 Flash Lite is now GA, the cheapest frontier model for high-concurrency. Stable: Gemini 2.5 Pro (2M context) and Gemini 2.5 Flash are GA, ideal for production.
🚀 xAI Grok Series
🆕 Latest Models
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| Grok 4.3 🔥 | grok-4.3 | 1M | Intelligence Index 53, τ²-Bench 98%, IFBench 81%, 1M context + multimodal | Complex reasoning, general tasks |
| Grok 4 | grok-4 | Standard | Official version; grok-4-all adds native web search | General tasks, real-time info |
| Grok 4 Fast Reasoning 🔥 | grok-4-fast-reasoning | 200K | Reasoning mode, 93%+ cheaper than Grok-4 | Complex reasoning |
| Grok Code Fast 1 ⭐ | grok-code-fast-1 | 256K | SWE-bench 70.8%, high-speed generation | Code generation, agent programming |
✅ Stable / Classic
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| Grok 3 ⭐ | grok-3 | Standard | Official stable version | Daily use |
| Grok 3 All | grok-3-all | Standard | Native web search enhanced | News, market analysis |
| Grok 3 Mini | grok-3-mini | Standard | Small model with reasoning | Lightweight tasks |
🔍 DeepSeek Series
🆕 Latest Models
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| DeepSeek V4 Pro 🔥 | deepseek-v4-pro | 1M | 1.6T/49B activated, SWE-Verified 80.6 near Claude/Gemini, Hybrid Attention | Complex reasoning, coding, agents |
| DeepSeek V4 Flash 🔥 | deepseek-v4-flash | 1M | 284B/13B activated, just $0.14/M input, open-source SOTA value | High concurrency, batch |
| DeepSeek V3.2 | deepseek-v3.2 | 128K | GPT-5 level, tool-use in reasoning | Complex reasoning, coding |
✅ Stable / Classic
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| DeepSeek V3.1 ⭐ | deepseek-v3-1-250821 | 128K | Mixed reasoning, Think/Non-Think dual modes | Intelligent reasoning, programming |
| DeepSeek R1 | deepseek-r1 | 64K | Reasoning model | Math, reasoning |
| DeepSeek V3 | deepseek-v3 | 128K | Strong comprehensive capabilities | General scenarios |
🐘 Chinese Model Series
Zhipu AI (GLM)
🆕 Latest: GLM-5.1 | ✅ Stable / Classic: GLM-5, GLM-4.6
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| GLM-5.1 🔥 | glm-5.1 | 200K | SWE-Bench Pro 58.4 beats GPT-5.4 / Opus 4.6 / Gemini 3.1 Pro, 744B MoE, MIT open-source | Complex coding, agents |
| GLM-5 ⭐ | glm-5 | 200K | 744B params (40B activated), coding aligned with Claude Opus 4.5, open-source | Complex coding, systems engineering |
| GLM-4.6 | glm-4.6 | 200K | Code and reasoning enhanced, stable | Programming, reasoning, agents |
| GLM-4.5 | glm-4.5 | 128K | Standard version, strong overall | General scenarios |
GLM-5.1 Features:
- 744B MoE params, supports long-horizon agent tasks up to 8 hours
- SWE-Bench Pro 58.4, strongest coding among open-source models
- MIT licensed open-source, excellent value
Alibaba Qwen
🆕 Latest: Qwen3.7-Max | ✅ Stable / Classic: Qwen Max, Plus, Turbo
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| Qwen3.7-Max 🔥 | qwen3.7-max | 1M | AA Intelligence Index 56.6 (global top 5, #1 in China), 35-hour long-horizon agent autonomy | Agents, multilingual, long text |
| Qwen Max ⭐ | qwen-max | 32K | Strongest stable version | General tasks |
| Qwen Plus | qwen-plus | 32K | Enhanced version | Cost-effective |
| Qwen Turbo | qwen-turbo | 32K | Fast version | Low latency |
Moonshot Kimi Series
🆕 Latest: Kimi K2.6 | ✅ Stable / Classic: Kimi K2.5, K2
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| Kimi K2.6 🔥 | kimi-k2.6 | 256K | 1T MoE / 32B activated, SWE-Bench Pro 58.6 surpasses GPT-5.4 and Opus 4.6 | Coding, agents |
| Kimi K2.5 | kimi-k2.5 | 200K | Native multimodal, Agent Swarm 100 agents | Multimodal, agents |
| Kimi K2 Official Release ⭐ | kimi-k2-250711 | 200K | Volcano Engine partnership, strong stability | Production environments |
🌐 MiniMax Series
🆕 Latest: MiniMax M2.7 | ✅ Stable / Classic: MiniMax M2.5
| Model Name | Model ID | Context Length | Features | Recommended Scenarios |
|---|
| MiniMax M2.7 🔥 | MiniMax-M2.7 | Standard | 10B params, SWE-bench Pro 56.22%, self-evolving, smallest Tier-1 model | Coding, agents |
| MiniMax M2.5 | minimax-m2.5 | Standard | 230B (10B activated), SWE-bench 80.2%, great value | Coding, agents, office automation |
MiniMax M2.7 Features:
- Reaches SWE-bench Pro 56.22% with just 10B params, the smallest Tier-1 model
- Self-evolving; standard $0.3 / highspeed (
MiniMax-M2.7-highspeed) $0.6 per 1M input tokens
- Open-sourced model weights
Billing Methods
- Pay-as-you-go: Charged based on actual Token usage
- No minimum charge: Use what you pay for, balance never expires
- Real-time deduction: Fees deducted from balance immediately after each call
Pricing Advantages
- Official source forwarding with slight price advantages
- Bulk users can contact customer service for better pricing
- New users get 3 million tokens testing credit upon registration
View Real-time Pricing
Visit APIYI Console Pricing Page to view latest pricing for all models.
🛠️ Usage Recommendations
Model Selection Guide
Programming Development
- Top performance: Claude Opus 4.7 (+13% coding over 4.6), GPT-5.5 (SWE-bench 88.7%), Claude Sonnet 4.6 (rivals Opus 4.5)
- High cost-performance: Gemini 3.5 Flash (surpasses 3.1 Pro at ~half price), GLM-5.1 (SWE-Bench Pro 58.4), Kimi K2.6, DeepSeek V4 Flash
- Alternatives: DeepSeek V4 Pro, Qwen3.7-Max, MiniMax M2.7, o4-mini
Text Creation
- Top choice: GPT-5.5, GPT-5.4, Gemini 3.1 Pro Preview, Claude Opus 4.7, Claude Sonnet 4.6
- Alternatives: chat-latest, Claude Sonnet 4.5, GPT-4.1, GPT-4o, Claude Haiku 4.5, GLM-4.6
Quick Response
- Top choice: Gemini 3.5 Flash (~4x speed), Claude Haiku 4.5 (2x faster), GPT-4o Mini
- Alternatives: Gemini 3.1 Flash Lite, Gemini 2.5 Flash, Grok 4 Fast, GPT-4.1 Mini
Image Generation
- Latest recommendation: GPT Image 1.5 (4x speed boost, precise editing, from $0.01)
- Professional design: SeeDream 4.5 (1.2B parameters, 4K quality, $0.035/image), Nano Banana Pro (4K HD, best text rendering)
- High cost-performance: Nano Banana ($0.025/image), SeeDream 4.0 ($0.025/image)
- Reverse-engineered, cheapest: sora_image, gpt-4o-image
Long Text Processing
- Top choice: Gemini 2.5 Pro (2M context)
- Alternatives: Claude 4 series (200K context)
Cost Optimization Recommendations
- Tiered Usage: Use cheaper models for simple tasks, advanced models for complex tasks
- Test Optimization: Test with small models first, use large models after determining needs
- Batch Processing: Choose Nano or Mini versions for large volumes of similar tasks
- Cache Reuse: Cache results for repeated queries
Model list is continuously updated. We will promptly add newly released excellent models. For specific model needs or bulk requirements, please contact customer service.