Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.apiyi.com/llms.txt

Use this file to discover all available pages before exploring further.

APIYI supports 400+ mainstream AI models. This page provides detailed model information, pricing, and usage instructions.
Enterprise-grade Professional and Stable AI Large Model API Hub All models are officially sourced and forwarded, with ~20% off pricing (combining top-up bonuses and exchange rate advantages), aggregating various excellent large models. No speed limits, no expiration, no account ban risks, pay-as-you-go billing, long-term reliable service.
The following are currently stably supplied popular models. For complete model list and real-time pricing, visit APIYI Console Pricing Page.
Model Upgrade Recommendations: We recommend using the latest models for best performance, but please note:
  1. Initial instability is common: Newly launched models may experience slow responses, timeouts, or occasional errors due to limited compute capacity at the vendor — this typically stabilizes within days to weeks
  2. Check parameter compatibility: New models may introduce or change parameters (e.g., max_completion_tokens replacing max_tokens). Before upgrading, verify that your API parameters remain compatible with older models
  3. Always test before going live: Before deploying a new model to production, thoroughly validate it in a test environment to ensure output quality and API compatibility meet expectations

Model Categories

🤖 OpenAI Series

🆕 Latest Models

Model NameModel IDContext LengthFeaturesRecommended Scenarios
GPT-5.5 Progpt-5.5-pro1MStrongest reasoning today, Terminal-Bench 2.0 82.7%; /v1/responses endpoint + SVIP group only, very expensiveTop-tier reasoning, research (professional needs)
GPT-5.5 🔥gpt-5.51MSWE-bench Verified 88.7%, hallucinations 60% lower than 5.4, new xhigh reasoning tierComplex agents, professional workflows
GPT-5.4gpt-5.41MNative computer use, GDPval 83%Complex agents, professional workflows
chat-latestchat-latest400KVersion-less alias, always points to the latest ChatGPT Instant (currently GPT-5.5 Instant)Quick writing, conversation
GPT-5.2gpt-5.2400KGDPval 70.9% surpassing professionalsProgramming planning, structured tasks
GPT-5.3 Codex 🔥gpt-5.3-codex128KSWE-Bench Pro SOTA, complex programming and agent tasksComplex programming, agent tasks
GPT-5.1gpt-5.1128KIntelligence-speed balance, SWE-bench 76.3%, 24h cacheGeneral apps, programming
GPT Pro series (e.g. gpt-5.5-pro, gpt-5-pro) usage notes:
  1. /v1/responses endpoint only: cannot use /v1/chat/completions — switch your SDK / code to the Responses API before calling
  2. Very expensive: a single call may cost several dollars, and it is open to the SVIP group only to prevent accidental use on the Default group
  3. Not recommended for non-professional needs: use GPT-5.5 / GPT-5.4 for everyday tasks; Pro suits only research and top-tier work demanding extreme reasoning depth

✅ Stable / Classic

Model NameModel IDContext LengthFeaturesRecommended Scenarios
GPT-5gpt-5128KFlagship stable version, ultra-strong reasoningTop-tier reasoning, complex tasks
GPT-5 Minigpt-5-mini128KGPT-5 lightweight, excellent performanceBalance performance and cost
GPT-5 Nanogpt-5-nano128KGPT-5 ultra-lightweightLarge-scale batch processing
o3o3200KReasoning model, significantly price-reducedComplex reasoning, math, programming
o4-minio4-mini200KLightweight reasoning modelTop choice for programming
GPT-4.1gpt-4.1128KFast speed, main workhorseGeneral applications
GPT-4.1 Minigpt-4.1-mini128KCheaper lightweight versionCost-sensitive scenarios
GPT-4ogpt-4o128KBalanced multimodal capabilitiesGeneral scenarios
GPT-4o Minigpt-4o-mini128KLightweight fast versionQuick response
GPT-5 Series Usage Notes:
  1. Temperature parameter temperature must be set to 1 (only supports 1)
  2. Use max_completion_tokens instead of max_tokens
  3. Do not pass top_p parameter
Image and Video Generation Models have been moved to a dedicated page. Visit Image & Video Generation Models for the full list and pricing.

🎭 Claude Series (Anthropic)

🆕 Latest Models

Model NameModel IDContext LengthFeaturesRecommended Scenarios
Claude Opus 4.7 🔥claude-opus-4-71M (Beta)Coding benchmarks +13% over 4.6, 3x on production tasks, tool errors cut to 1/3, new xhigh reasoning tierTop-tier coding, complex agents
Claude Opus 4.7 Thinking 🔥claude-opus-4-7-thinking1M (Beta)Adaptive thinking, enhanced deep reasoningTop-tier reasoning tasks
Claude Opus 4.6claude-opus-4-61M (Beta)Terminal-Bench 2.0 #1, 128K outputTop-tier coding, complex agents
Claude Sonnet 4.6 🔥claude-sonnet-4-61M (Beta)Full upgrade, rivals Opus 4.5, great valueProgramming top choice, agent dev

✅ Stable / Classic

Model NameModel IDContext LengthFeaturesRecommended Scenarios
Claude Opus 4.5claude-opus-4-5-20251101200KSWE-bench 80.9%, price reduced to 1/3Complex programming, top-tier reasoning
Claude Sonnet 4.5claude-sonnet-4-5-20250929200KWorld-class coding, SWE-bench 77.2%Code generation, agent development
Claude Sonnet 4.5 Thinkingclaude-sonnet-4-5-20250929-thinking200KChain-of-thought mode, deep reasoningComplex programming reasoning
Claude Haiku 4.5claude-haiku-4-5-20251001200KHigh cost-performance, SWE-bench 73.3%, 2x speedReal-time chat, pair programming
Claude 4 Sonnetclaude-sonnet-4-20250514200KBattle-tested, top choice for programmingCode generation, analysis
Claude Opus 4.1claude-opus-4-1-20250805200KIterative upgrade, programming-optimizedHigh-demand programming tasks
Latest: Claude Opus 4.7 improves coding benchmarks 13% over 4.6, cuts tool-call errors to 1/3, and adds an xhigh reasoning tier at the same price as 4.6. Sonnet 4.6 rivals Opus 4.5 and is now the default on claude.ai. Stable: Opus 4.5 and Sonnet 4.5 are battle-tested for production. Haiku 4.5 offers 2x speed at great value.

🌟 Google Gemini Series

🆕 Latest Models

Model NameModel IDContext LengthFeaturesRecommended Scenarios
Gemini 3.5 Flash 🔥gemini-3.5-flash1MTerminal-Bench 2.1 76.2%, fully surpasses 3.1 Pro, ~4x faster at ~half the priceProgramming top choice, cost-performance king
Gemini 3.1 Pro Preview 🔥gemini-3.1-pro-preview1MARC-AGI-2 77.1% (2x+ over 3 Pro), most advanced reasoningComplex reasoning, multimodal analysis
Gemini 3 Flash Previewgemini-3-flash-preview1MSWE-bench 78%, 3x faster, thinking / nothinking variants availableProgramming, cost-performance
Gemini 3.1 Flash Lite 🔥gemini-3.1-flash-lite1MGA version, 64% faster than 2.5 Flash, ultra-low priceHigh concurrency, batch, low cost
Note: Gemini 3 Pro Preview was discontinued on March 9, 2026. Please migrate to Gemini 3.1 Pro Preview.

✅ Stable / Classic

Model NameModel IDContext LengthFeaturesRecommended Scenarios
Gemini 2.5 Progemini-2.5-pro2MOfficial release, programming advantage, strong multimodalLong text, programming, multimodal
Gemini 2.5 Flashgemini-2.5-flash1MFast speed, low cost, official releaseQuick response scenarios
Gemini 2.5 Flash Litegemini-2.5-flash-lite1MUltra-lightweight, faster and cheaperLarge-scale simple tasks
Latest: Gemini 3.5 Flash fully surpasses Gemini 3.1 Pro on Terminal-Bench 2.1, MCP Atlas and more, at ~4x speed and ~half the price — today’s cost-performance king. Gemini 3.1 Pro Preview doubles reasoning (ARC-AGI-2 77.1%), Google’s most advanced. Gemini 3.1 Flash Lite is now GA, the cheapest frontier model for high-concurrency. Stable: Gemini 2.5 Pro (2M context) and Gemini 2.5 Flash are GA, ideal for production.

🚀 xAI Grok Series

🆕 Latest Models

Model NameModel IDContext LengthFeaturesRecommended Scenarios
Grok 4.3 🔥grok-4.31MIntelligence Index 53, τ²-Bench 98%, IFBench 81%, 1M context + multimodalComplex reasoning, general tasks
Grok 4grok-4StandardOfficial version; grok-4-all adds native web searchGeneral tasks, real-time info
Grok 4 Fast Reasoning 🔥grok-4-fast-reasoning200KReasoning mode, 93%+ cheaper than Grok-4Complex reasoning
Grok Code Fast 1grok-code-fast-1256KSWE-bench 70.8%, high-speed generationCode generation, agent programming

✅ Stable / Classic

Model NameModel IDContext LengthFeaturesRecommended Scenarios
Grok 3grok-3StandardOfficial stable versionDaily use
Grok 3 Allgrok-3-allStandardNative web search enhancedNews, market analysis
Grok 3 Minigrok-3-miniStandardSmall model with reasoningLightweight tasks

🔍 DeepSeek Series

🆕 Latest Models

Model NameModel IDContext LengthFeaturesRecommended Scenarios
DeepSeek V4 Pro 🔥deepseek-v4-pro1M1.6T/49B activated, SWE-Verified 80.6 near Claude/Gemini, Hybrid AttentionComplex reasoning, coding, agents
DeepSeek V4 Flash 🔥deepseek-v4-flash1M284B/13B activated, just $0.14/M input, open-source SOTA valueHigh concurrency, batch
DeepSeek V3.2deepseek-v3.2128KGPT-5 level, tool-use in reasoningComplex reasoning, coding

✅ Stable / Classic

Model NameModel IDContext LengthFeaturesRecommended Scenarios
DeepSeek V3.1deepseek-v3-1-250821128KMixed reasoning, Think/Non-Think dual modesIntelligent reasoning, programming
DeepSeek R1deepseek-r164KReasoning modelMath, reasoning
DeepSeek V3deepseek-v3128KStrong comprehensive capabilitiesGeneral scenarios

🐘 Chinese Model Series

Zhipu AI (GLM)

🆕 Latest: GLM-5.1 | ✅ Stable / Classic: GLM-5, GLM-4.6
Model NameModel IDContext LengthFeaturesRecommended Scenarios
GLM-5.1 🔥glm-5.1200KSWE-Bench Pro 58.4 beats GPT-5.4 / Opus 4.6 / Gemini 3.1 Pro, 744B MoE, MIT open-sourceComplex coding, agents
GLM-5glm-5200K744B params (40B activated), coding aligned with Claude Opus 4.5, open-sourceComplex coding, systems engineering
GLM-4.6glm-4.6200KCode and reasoning enhanced, stableProgramming, reasoning, agents
GLM-4.5glm-4.5128KStandard version, strong overallGeneral scenarios
GLM-5.1 Features:
  • 744B MoE params, supports long-horizon agent tasks up to 8 hours
  • SWE-Bench Pro 58.4, strongest coding among open-source models
  • MIT licensed open-source, excellent value

Alibaba Qwen

🆕 Latest: Qwen3.7-Max | ✅ Stable / Classic: Qwen Max, Plus, Turbo
Model NameModel IDContext LengthFeaturesRecommended Scenarios
Qwen3.7-Max 🔥qwen3.7-max1MAA Intelligence Index 56.6 (global top 5, #1 in China), 35-hour long-horizon agent autonomyAgents, multilingual, long text
Qwen Maxqwen-max32KStrongest stable versionGeneral tasks
Qwen Plusqwen-plus32KEnhanced versionCost-effective
Qwen Turboqwen-turbo32KFast versionLow latency

Moonshot Kimi Series

🆕 Latest: Kimi K2.6 | ✅ Stable / Classic: Kimi K2.5, K2
Model NameModel IDContext LengthFeaturesRecommended Scenarios
Kimi K2.6 🔥kimi-k2.6256K1T MoE / 32B activated, SWE-Bench Pro 58.6 surpasses GPT-5.4 and Opus 4.6Coding, agents
Kimi K2.5kimi-k2.5200KNative multimodal, Agent Swarm 100 agentsMultimodal, agents
Kimi K2 Official Releasekimi-k2-250711200KVolcano Engine partnership, strong stabilityProduction environments

🌐 MiniMax Series

🆕 Latest: MiniMax M2.7 | ✅ Stable / Classic: MiniMax M2.5
Model NameModel IDContext LengthFeaturesRecommended Scenarios
MiniMax M2.7 🔥MiniMax-M2.7Standard10B params, SWE-bench Pro 56.22%, self-evolving, smallest Tier-1 modelCoding, agents
MiniMax M2.5minimax-m2.5Standard230B (10B activated), SWE-bench 80.2%, great valueCoding, agents, office automation
MiniMax M2.7 Features:
  • Reaches SWE-bench Pro 56.22% with just 10B params, the smallest Tier-1 model
  • Self-evolving; standard $0.3 / highspeed (MiniMax-M2.7-highspeed) $0.6 per 1M input tokens
  • Open-sourced model weights

💰 Pricing Information

Billing Methods

  • Pay-as-you-go: Charged based on actual Token usage
  • No minimum charge: Use what you pay for, balance never expires
  • Real-time deduction: Fees deducted from balance immediately after each call

Pricing Advantages

  • Official source forwarding with slight price advantages
  • Bulk users can contact customer service for better pricing
  • New users get 3 million tokens testing credit upon registration

View Real-time Pricing

Visit APIYI Console Pricing Page to view latest pricing for all models.

🛠️ Usage Recommendations

Model Selection Guide

Programming Development
  • Top performance: Claude Opus 4.7 (+13% coding over 4.6), GPT-5.5 (SWE-bench 88.7%), Claude Sonnet 4.6 (rivals Opus 4.5)
  • High cost-performance: Gemini 3.5 Flash (surpasses 3.1 Pro at ~half price), GLM-5.1 (SWE-Bench Pro 58.4), Kimi K2.6, DeepSeek V4 Flash
  • Alternatives: DeepSeek V4 Pro, Qwen3.7-Max, MiniMax M2.7, o4-mini
Text Creation
  • Top choice: GPT-5.5, GPT-5.4, Gemini 3.1 Pro Preview, Claude Opus 4.7, Claude Sonnet 4.6
  • Alternatives: chat-latest, Claude Sonnet 4.5, GPT-4.1, GPT-4o, Claude Haiku 4.5, GLM-4.6
Quick Response
  • Top choice: Gemini 3.5 Flash (~4x speed), Claude Haiku 4.5 (2x faster), GPT-4o Mini
  • Alternatives: Gemini 3.1 Flash Lite, Gemini 2.5 Flash, Grok 4 Fast, GPT-4.1 Mini
Image Generation
  • Latest recommendation: GPT Image 1.5 (4x speed boost, precise editing, from $0.01)
  • Professional design: SeeDream 4.5 (1.2B parameters, 4K quality, $0.035/image), Nano Banana Pro (4K HD, best text rendering)
  • High cost-performance: Nano Banana ($0.025/image), SeeDream 4.0 ($0.025/image)
  • Reverse-engineered, cheapest: sora_image, gpt-4o-image
Long Text Processing
  • Top choice: Gemini 2.5 Pro (2M context)
  • Alternatives: Claude 4 series (200K context)

Cost Optimization Recommendations

  1. Tiered Usage: Use cheaper models for simple tasks, advanced models for complex tasks
  2. Test Optimization: Test with small models first, use large models after determining needs
  3. Batch Processing: Choose Nano or Mini versions for large volumes of similar tasks
  4. Cache Reuse: Cache results for repeated queries
Model list is continuously updated. We will promptly add newly released excellent models. For specific model needs or bulk requirements, please contact customer service.