Qwen3 Next

An 80B-parameter, high-sparsity Mixture-of-Experts model with 3B active parameters and a hybrid attention architecture. Currently supported on Mac only, via MLX.

Minimum system memory

42GB
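
One plausible reading of this figure, as an assumption rather than anything stated here: 80B weights at roughly 4-bit quantization come to about 80B parameters × 0.5 bytes ≈ 40GB, with the remaining headroom covering activations and the KV cache.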

Tags

80B
qwen3_next

README

Qwen3 Next 80B

The first model in the Qwen3-Next series, featuring an innovative hybrid attention architecture and a high-efficiency Mixture-of-Experts design.

Key Features

  • Hybrid Attention: Combines Gated DeltaNet and Gated Attention for efficient ultra-long-context modeling (see the layout sketch after this list)
  • High-Sparsity MoE: 80B total parameters with only 3B activated, providing excellent efficiency
  • Ultra-Long Context: Supports up to 262,144 tokens natively
  • Multi-Token Prediction: Enhanced pretraining performance and faster inference
  • Advanced Capabilities: Excels at reasoning, coding, creative writing, and agentic tasks
  • Multilingual Support: Over 100 languages and dialects
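
The README gives the layer count but not how the two attention types are interleaved. Here is a minimal sketch of one plausible hybrid layout, assuming a repeating block of three Gated DeltaNet layers followed by one Gated Attention layer across the 48 layers listed under Architecture Highlights (the 3:1 ratio is an assumption, not stated on this page):

    # Hypothetical hybrid layout; the 3:1 DeltaNet-to-attention ratio is assumed.
    NUM_LAYERS = 48
    BLOCK = ["gated_deltanet"] * 3 + ["gated_attention"]

    layout = [BLOCK[i % len(BLOCK)] for i in range(NUM_LAYERS)]
    print(layout.count("gated_deltanet"), layout.count("gated_attention"))  # 36 12

Under such a pattern, most layers use the linear-cost Gated DeltaNet while the periodic full-attention layers preserve global token mixing, which is the usual rationale for hybrids targeting ultra-long contexts.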

Architecture Highlights

  • 80B total parameters, 3B activated (A3B)
  • 48 layers with hybrid layout
  • 512 experts, with only 10 activated per token (see the routing sketch after this list)
  • Context length: 262,144 tokens
  • No thinking mode support (instruct-only)
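
As referenced in the experts bullet above, here is a minimal sketch of the top-10-of-512 routing step. The renormalized softmax over the selected experts is a common MoE convention and an assumption here; this page does not specify the router:

    # Sketch of high-sparsity MoE routing: 512 experts, 10 active per token.
    import numpy as np

    NUM_EXPERTS, TOP_K = 512, 10
    rng = np.random.default_rng(0)
    router_logits = rng.normal(size=NUM_EXPERTS)               # per-token router scores

    top_idx = np.argpartition(router_logits, -TOP_K)[-TOP_K:]  # 10 highest-scoring experts
    weights = np.exp(router_logits[top_idx] - router_logits[top_idx].max())
    weights /= weights.sum()                                   # mixture weights for the 10

    print(f"active fraction: {TOP_K / NUM_EXPERTS:.1%}")       # ~2.0% of experts per token

Only the selected experts' feed-forward blocks execute for a given token, which is how a forward pass through an 80B-parameter model can touch only about 3B of them.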

Performance

Delivers performance comparable to much larger models while maintaining exceptional efficiency:

  • Outperforms Qwen3-32B while delivering roughly 10x higher inference throughput on long contexts
  • Matches Qwen3-235B-A22B on many benchmarks with significantly lower computational requirements
  • Superior ultra-long-context handling up to 256K+ tokens

Parameters

Custom sampling configuration included with this model; a sketch of how these values are typically applied follows the list.

  • Temperature: 0.7
  • Top K Sampling: 20
  • Top P Sampling: 0.8
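
Here is a minimal sketch of how these shipped defaults are conventionally applied during sampling, assuming the common order of temperature scaling, then top-k, then top-p (nucleus) filtering; the actual runtime's order and implementation may differ:

    # Sketch of temperature / top-k / top-p sampling with the defaults above.
    import numpy as np

    TEMPERATURE, TOP_K, TOP_P = 0.7, 20, 0.8

    def sample(logits: np.ndarray, rng: np.random.Generator) -> int:
        scaled = logits / TEMPERATURE
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()                                  # temperature-scaled softmax

        order = np.argsort(probs)[::-1][:TOP_K]               # keep the 20 most likely tokens
        kept = probs[order]

        cutoff = np.searchsorted(np.cumsum(kept), TOP_P) + 1  # smallest prefix covering 0.8
        kept, order = kept[:cutoff], order[:cutoff]

        kept /= kept.sum()                                    # renormalize the survivors
        return int(rng.choice(order, p=kept))

    rng = np.random.default_rng(0)
    logits = rng.normal(size=1000)                            # stand-in for vocabulary logits
    print(sample(logits, rng))

Lower temperature sharpens the distribution, top-k caps the candidate set at 20 tokens, and top-p then trims that set to the smallest prefix whose probability mass reaches 0.8.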