Qwen3.6 is Alibaba Tongyi Qianwen’s next-generation model family, released in Q2 2026 across three closed-source production tiers — Max (flagship), Plus (balanced), Flash (speed-first) — plus two open-weight variants: 27B and 35B-A3B. APIYI routes all five models through Aliyun official relay / APIYI hosted relay, OpenAI Chat Completions compatible. Closed-source tiers match the official portal’s auth and rate-limit policies; the open-weight tiers are hosted by APIYI’s official relay so customers don’t have to rent GPUs or stand up local inference.Documentation Index
Fetch the complete documentation index at: https://docs.apiyi.com/llms.txt
Use this file to discover all available pages before exploring further.
qwen3.6-27b (27B dense) and qwen3.6-35b-a3b (35B MoE / 3B active) are hosted on APIYI’s official relay — no GPU rental needed, billed per token. Built for coding agents, long-context RAG, multimodal dispatch, and compliance-sensitive workloads needing auditable weights.Closed-source production tiers (Aliyun official relay)
qwen3.6-max-preview
qwen3.6-flash
qwen3.6-plus
Open-weight tiers (hosted by APIYI · no GPU rental)
qwen3.6-27b
Qwen/Qwen3.6-27B). Coding ability rivals 397B-class models. Hosted by APIYI’s official relay — no local GPUs required.qwen3.6-35b-a3b
Qwen/Qwen3.6-35B-A3B); same lineage as closed-source Flash, different distribution tier. Only 3B active params — extremely low compute cost.Why APIYI’s Qwen3.6 via Aliyun official relay?
Calibrated against Alibaba Cloud Bailian’s official channel, with deep optimization for enterprise production across stability, cost, and integration ergonomics:Aliyun official relay
No concurrency cap · scale freely
List-price match + ~15% off via recharge
Global zero-friction access
api.apiyi.com — no overseas migration needed.Full OpenAI-compatible ecosystem
Professional service · enterprise support
How to choose among the five
Max-Preview · coding & complex reasoning
Flash · high-volume multimodal long-context
Plus · the balanced workhorse
qwen3.6-27b · open-weight coding powerhouse
qwen3.6-35b-a3b · open-weight speed MoE
Recommended routing
Pricing
All five models bill under pay-as-you-go - Chat. Closed-source tiers (Max-Preview / Flash / Plus) use tiered pricing keyed on the total input token count of a single request. Open-weight tiers (27b / 35b-a3b) bill at a single flat rate — no tiers. List prices match Alibaba Cloud’s official rate; APIYI’s recharge bonus brings the effective unit price to roughly 85% of list.qwen3.6-max-preview
| Single-request input tokens | Input price | Output price |
|---|---|---|
| 0 – 128K | $1.2800 / 1M tokens | $7.6800 / 1M tokens |
| 128K – 256K | $2.1200 / 1M tokens | $12.7200 / 1M tokens |
qwen3.6-flash
| Single-request input tokens | Input price | Output price |
|---|---|---|
| 0 – 256K | $0.1700 / 1M tokens | $1.0200 / 1M tokens |
| 256K – 1000K | $0.6800 / 1M tokens | $4.0800 / 1M tokens |
qwen3.6-plus
| Single-request input tokens | Input price | Output price |
|---|---|---|
| 0 – 256K | $0.3000 / 1M tokens | $1.8000 / 1M tokens |
| 256K – 1000K | $1.2000 / 1M tokens | $7.2000 / 1M tokens |
qwen3.6-27b (open-weight · APIYI hosted)
| Billing | Input price | Output price |
|---|---|---|
| Flat (no tiers) | $0.4200 / 1M tokens | $2.5200 / 1M tokens |
qwen3.6-35b-a3b (open-weight · APIYI hosted)
| Billing | Input price | Output price |
|---|---|---|
| Flat (no tiers) | $0.2600 / 1M tokens | $1.5600 / 1M tokens |
- Closed-source tiers (tiered pricing): the tier is set by the total input tokens of a single request. All tokens in that request (input + output) bill at that tier’s rate. No cross-tier proration — e.g., a Flash request with 300K input tokens lands in
256K – 1000Kand the entire request bills at $0.68 / $4.08, not split as “first 256K cheap, remaining 44K at the higher tier.” - Open-weight tiers (flat pricing):
qwen3.6-27bandqwen3.6-35b-a3bare hosted by APIYI’s official relay — no tiers. Customers don’t have to rent GPUs or run local inference; settle directly by actual token consumption. - List prices match Alibaba Cloud Bailian. With recharge bonuses, the effective unit price lands around 85% of list.
- Cache-hit pricing is not currently disclosed separately; falls back to the base tier.
Specs
Closed-source production tiers
| Dimension | qwen3.6-max-preview | qwen3.6-flash | qwen3.6-plus |
|---|---|---|---|
| Model ID | qwen3.6-max-preview | qwen3.6-flash | qwen3.6-plus |
| Architecture | Dense large model | MoE 35B-A3B | MoE 72B / 18B active |
| Context | 262K tokens | 256K (expandable to 1M) | 1M tokens |
| Input modalities | Text | Text / image / video | Text |
| Output format | Text | Text | Text |
| Streaming | ✅ Supported | ✅ Supported | ✅ Supported |
| Function calling / tool use | ✅ Supported | ✅ Supported | ✅ Supported |
| Chain-of-thought | ✅ Auto-enabled on reasoning tasks | — | ✅ Always on |
| Billing | Pay-as-you-go Chat (tiered) | Pay-as-you-go Chat (tiered) | Pay-as-you-go Chat (tiered) |
| Channel | Aliyun official relay | Aliyun official relay | Aliyun official relay |
Open-weight tiers (hosted by APIYI)
| Dimension | qwen3.6-27b | qwen3.6-35b-a3b |
|---|---|---|
| Model ID | qwen3.6-27b | qwen3.6-35b-a3b |
| Architecture | 27B dense | MoE 35B total / 3B active |
| License | Qwen team open-weight (Hugging Face Qwen/Qwen3.6-27B) | Qwen team open-weight (Hugging Face Qwen/Qwen3.6-35B-A3B) |
| Context | Matches official weight card | Matches official weight card |
| Input modalities | Text | Text |
| Streaming | ✅ Supported | ✅ Supported |
| Function calling / tool use | ✅ Supported | ✅ Supported |
| Billing | Pay-as-you-go Chat (flat, no tiers) | Pay-as-you-go Chat (flat, no tiers) |
| Channel | APIYI hosted relay | APIYI hosted relay |
Endpoints
| Endpoint | Method | Content-Type | Purpose |
|---|---|---|---|
/v1/chat/completions | POST | application/json | Dialog / reasoning / tool use (shared by all five models, only the model field differs) |
Code examples
Python (OpenAI SDK compatible)
Node.js
cURL
Best practices
Pick the right tier per task
Estimate the tier boundary
Multimodal batching
Preview canary
qwen3.6-max-preview is a Preview build — weights still iterating. For critical paths, run a small canary with A/B comparison before flipping main traffic.Tools & streaming
tools and stream: true. You can drop them into existing OpenAI-compatible Agent frameworks (OpenClaw, LangChain, LlamaIndex, etc.) without rewriting tool-calling logic.Stack with recharge bonuses
Errors & retries
| Status | Meaning | What to do |
|---|---|---|
400 | Param error / unknown model | Check model spelling, messages shape, and whether the input exceeds max context |
401 | Invalid token | Verify the Bearer Token |
403 | Content moderation block | Adjust prompts / reference inputs to avoid policy violations |
429 | Rate-limit / insufficient balance | Exponential backoff retry; check account balance |
5xx | Gateway / backend error | Retry 1–2 times; if still failing, file a ticket |
| Timeout | Long-tail latency | Set client timeout to ≥ 120s (CoT or long-context calls take longer) |
- Set request timeout to ≥ 120 seconds (Max-Preview reasoning and Plus long-context CoT take longer)
- Apply exponential backoff retry on 5xx and timeouts (recommend 2 attempts)
- Log the
x-request-idresponse header for troubleshooting
FAQ
Do all five models share the same API endpoint?
Do all five models share the same API endpoint?
What's the difference between open-weight (27b / 35b-a3b) and closed-source tiers?
What's the difference between open-weight (27b / 35b-a3b) and closed-source tiers?
If the weights are open, why use APIYI's hosted API?
If the weights are open, why use APIYI's hosted API?
How exactly does tiered pricing work?
How exactly does tiered pricing work?
256K – 1000K and bills the entire request at $0.68 / $4.08 — no split between “first 256K cheap, remaining 44K higher.”Max-Preview is a Preview — is it production-ready?
Max-Preview is a Preview — is it production-ready?
How do I send multimodal input to Flash?
How do I send multimodal input to Flash?
messages, send content as an array where each element is {type: "text", text: ...} or {type: "image_url", image_url: {url: ...}}. For video, follow the official doc’s video_url / frame-sampling fields.Are APIYI's prices the same as Alibaba Cloud Bailian's official site?
Are APIYI's prices the same as Alibaba Cloud Bailian's official site?
Are function calling / tool use supported?
Are function calling / tool use supported?
tools / tool_choice. You can reuse existing Agent-framework tool-calling logic. Max-Preview shines at multi-step tool calls and long-horizon planning.Is chain-of-thought output supported?
Is chain-of-thought output supported?
reasoning_content, etc.).What if my request exceeds 1M context?
What if my request exceeds 1M context?
400. Apply summarization / chunking / RAG retrieval before sending — don’t try to push everything in a single call.Can I use the official OpenAI SDK directly?
Can I use the official OpenAI SDK directly?
base_url to https://api.apiyi.com/v1 and pass any of the model IDs above as model — zero-code migration.Are failed requests billed?
Are failed requests billed?
4xx errors (param error / auth failure / content-moderation block) are not billed. Server-side 5xx errors that don’t reach inference are also not billed. Requests that successfully return tokens are billed by actual token count, even if the client cancels mid-stream.Related docs
- Deep dive: Qwen3.6 Max-Preview & Flash launch
- Deep dive: Qwen3.6-Plus launch — Alibaba’s strongest coding agent
- Recharge promotions — drop unit price to ~85% of list
- Model catalog — all available models and groups
- API manual — general usage conventions