Qwen3-Next 80B
The first model in the Qwen3-Next series, pairing a hybrid attention architecture with a high-efficiency Mixture-of-Experts design.
Key Features
- Hybrid Attention: Combines Gated DeltaNet and Gated Attention for efficient ultra-long context modeling
- High-Sparsity MoE: 80B total parameters with only 3B activated per token, providing excellent efficiency
- Ultra-Long Context: Supports up to 262,144 tokens natively
- Multi-Token Prediction: Enhanced pretraining performance and faster inference
- Advanced Capabilities: Excels at reasoning, coding, creative writing, and agentic tasks
- Multilingual Support: Over 100 languages and dialects
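For readers who want to try the model, here is a minimal generation sketch using Hugging Face Transformers. The Hub ID Qwen/Qwen3-Next-80B-A3B-Instruct and architecture support in a recent transformers release are assumptions here, not guarantees from this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # assumed Hub ID for this release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # shard the 80B weights across available GPUs
)

messages = [{"role": "user", "content": "Explain hybrid attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```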
Architecture Highlights
- 80B total parameters, 3B activated (A3B)
- 48 layers in a hybrid layout that interleaves Gated DeltaNet and Gated Attention blocks at roughly a 3:1 ratio
- 512 experts with only 10 routed (plus 1 shared) activated per token; a toy routing sketch follows this list
- Context length: 262,144 tokens
- No thinking mode support (instruct-only)
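To make the sparsity numbers concrete, the toy PyTorch sketch below routes each token to the top 10 of 512 experts. It is illustrative only, not the actual Qwen3-Next router; the hidden size and batch size are placeholders:

```python
import torch

num_experts, top_k, hidden = 512, 10, 2048  # hidden size is a placeholder

router = torch.nn.Linear(hidden, num_experts, bias=False)
tokens = torch.randn(4, hidden)             # a batch of 4 token embeddings

logits = router(tokens)                     # (4, 512) routing scores
weights, expert_ids = torch.topk(logits.softmax(dim=-1), k=top_k, dim=-1)
weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the top-k

print(expert_ids[0])   # the 10 experts chosen for the first token
```

Because each token touches only 10 of the 512 experts (about 2% of the expert pool), only roughly 3B of the 80B parameters participate in each forward pass.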
Performance
Qwen3-Next-80B delivers performance comparable to much larger models while maintaining exceptional efficiency:
- Outperforms Qwen3-32B while achieving roughly 10x higher inference throughput on long contexts (beyond 32K tokens)
- Matches Qwen3-235B-A22B on many benchmarks with significantly lower computational requirements
- Superior ultra-long-context handling across the full 262,144-token native window
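For serving at these context lengths, a high-throughput engine is the practical route. Below is a minimal offline-inference sketch with vLLM; it assumes a vLLM version recent enough to support the Qwen3-Next architecture, and the model ID and tensor_parallel_size are placeholders to adjust for your hardware:

```python
from vllm import LLM, SamplingParams

# Assumed model ID and hardware setup; adjust tensor_parallel_size to your GPUs.
llm = LLM(
    model="Qwen/Qwen3-Next-80B-A3B-Instruct",
    tensor_parallel_size=4,   # shard the weights across 4 GPUs
    max_model_len=262144,     # the native context length
)

params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
outputs = llm.generate(["Give a one-paragraph overview of Qwen3-Next."], params)
print(outputs[0].outputs[0].text)
```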