Qwen3 Next

An 80B-parameter, high-sparsity Mixture-of-Experts model with 3B active parameters and a hybrid attention architecture. Currently supported on Mac only, via MLX.

Minimum system memory

42GB
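
One plausible reading of this figure, as an assumption rather than anything stated here: 80B weights at roughly 4-bit quantization come to about 80B parameters × 0.5 bytes ≈ 40GB, with the remaining headroom covering activations and the KV cache.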

Tags

80B
qwen3_next

README

Qwen3 Next 80B

The first model in the Qwen3-Next series, featuring an innovative hybrid attention architecture and a high-efficiency Mixture-of-Experts design.

Key Features

  • Hybrid Attention: Combines Gated DeltaNet and Gated Attention for efficient ultra-long-context modeling (see the layout sketch after this list)
  • High-Sparsity MoE: 80B total parameters with only 3B activated, providing excellent efficiency
  • Ultra-Long Context: Supports up to 262,144 tokens natively
  • Multi-Token Prediction: Enhanced pretraining performance and faster inference
  • Advanced Capabilities: Excels at reasoning, coding, creative writing, and agentic tasks
  • Multilingual Support: Over 100 languages and dialects
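
The README gives the layer count but not how the two attention types are interleaved. Here is a minimal sketch of one plausible hybrid layout, assuming a repeating block of three Gated DeltaNet layers followed by one Gated Attention layer across the 48 layers listed under Architecture Highlights (the 3:1 ratio is an assumption, not stated on this page):

    # Hypothetical hybrid layout; the 3:1 DeltaNet-to-attention ratio is assumed.
    NUM_LAYERS = 48
    BLOCK = ["gated_deltanet"] * 3 + ["gated_attention"]

    layout = [BLOCK[i % len(BLOCK)] for i in range(NUM_LAYERS)]
    print(layout.count("gated_deltanet"), layout.count("gated_attention"))  # 36 12

Under such a pattern, most layers use the linear-cost Gated DeltaNet while the periodic full-attention layers preserve global token mixing, which is the usual rationale for hybrids targeting ultra-long contexts.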

Architecture Highlights

  • 80B total parameters, 3B activated (A3B)
  • 48 layers with hybrid layout
  • 512 experts, with only 10 activated per token (see the routing sketch after this list)
  • Context length: 262,144 tokens
  • No thinking mode support (instruct-only)
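
As referenced in the experts bullet above, here is a minimal sketch of the top-10-of-512 routing step. The renormalized softmax over the selected experts is a common MoE convention and an assumption here; this page does not specify the router:

    # Sketch of high-sparsity MoE routing: 512 experts, 10 active per token.
    import numpy as np

    NUM_EXPERTS, TOP_K = 512, 10
    rng = np.random.default_rng(0)
    router_logits = rng.normal(size=NUM_EXPERTS)               # per-token router scores

    top_idx = np.argpartition(router_logits, -TOP_K)[-TOP_K:]  # 10 highest-scoring experts
    weights = np.exp(router_logits[top_idx] - router_logits[top_idx].max())
    weights /= weights.sum()                                   # mixture weights for the 10

    print(f"active fraction: {TOP_K / NUM_EXPERTS:.1%}")       # ~2.0% of experts per token

Only the selected experts' feed-forward blocks execute for a given token, which is how a forward pass through an 80B-parameter model can touch only about 3B of them.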

Performance

Delivers performance comparable to much larger models while maintaining exceptional efficiency:

  • Outperforms Qwen3-32B while delivering roughly 10x higher inference throughput on long contexts
  • Matches Qwen3-235B-A22B on many benchmarks with significantly lower computational requirements
  • Superior ultra-long-context handling up to 256K+ tokens

Parameters

Custom sampling configuration included with this model; a sketch of how these values are typically applied follows the list.

  • Temperature: 0.7
  • Top K Sampling: 20
  • Top P Sampling: 0.8
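
Here is a minimal sketch of how these shipped defaults are conventionally applied during sampling, assuming the common order of temperature scaling, then top-k, then top-p (nucleus) filtering; the actual runtime's order and implementation may differ:

    # Sketch of temperature / top-k / top-p sampling with the defaults above.
    import numpy as np

    TEMPERATURE, TOP_K, TOP_P = 0.7, 20, 0.8

    def sample(logits: np.ndarray, rng: np.random.Generator) -> int:
        scaled = logits / TEMPERATURE
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()                                  # temperature-scaled softmax

        order = np.argsort(probs)[::-1][:TOP_K]               # keep the 20 most likely tokens
        kept = probs[order]

        cutoff = np.searchsorted(np.cumsum(kept), TOP_P) + 1  # smallest prefix covering 0.8
        kept, order = kept[:cutoff], order[:cutoff]

        kept /= kept.sum()                                    # renormalize the survivors
        return int(rng.choice(order, p=kept))

    rng = np.random.default_rng(0)
    logits = rng.normal(size=1000)                            # stand-in for vocabulary logits
    print(sample(logits, rng))

Lower temperature sharpens the distribution, top-k caps the candidate set at 20 tokens, and top-p then trims that set to the smallest prefix whose probability mass reaches 0.8.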