qwen3-coder-480b

Qwen's most powerful code model: a Mixture of Experts (MoE) architecture with 480B total parameters, of which 35B are activated per token.

Minimum system memory: 250 GB

Tags: 480B, qwen3_moe

README

Qwen3 Coder 480B

Key Features:

  • Agentic Coding: Performance comparable to Claude Sonnet 4 on agentic coding tasks, according to Qwen
  • Repository-Scale Understanding: Optimized for large codebases and complex projects

Technical Specifications:

  • 480B total parameters, 35B activated per token (MoE with 160 experts, 8 active per token)
  • 62 layers with Grouped Query Attention (96 Q heads, 8 KV heads)
  • Native context length of 262,144 tokens (256K)
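
The figures above imply a few ratios that are easy to check by hand. The following is a minimal sketch that only does arithmetic on the numbers stated in this card; the variable names are illustrative, and reading "250GB" as decimal gigabytes taken up mostly by weights is an assumption, not something the card states.

```python
# Numbers copied from this model card.
TOTAL_PARAMS = 480e9
ACTIVE_PARAMS = 35e9
NUM_EXPERTS = 160
ACTIVE_EXPERTS = 8
Q_HEADS = 96
KV_HEADS = 8
CONTEXT_TOKENS = 262_144

# Assumption: "250GB" means 250 * 10^9 bytes and is dominated by the weights.
MIN_MEMORY_BYTES = 250e9

# Share of parameters used for each token (about 7.3%).
active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Share of experts selected for each token (5%).
active_expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS

# Grouped Query Attention: query heads sharing each key/value head (12).
q_heads_per_kv_head = Q_HEADS // KV_HEADS

# Context length expressed in units of 1,024 tokens (256).
context_in_k = CONTEXT_TOKENS // 1024

# Memory budget per parameter if the whole model must fit in 250 GB (about 4.17 bits).
bits_per_param_budget = MIN_MEMORY_BYTES * 8 / TOTAL_PARAMS

print(f"active parameters: {active_param_fraction:.1%}")
print(f"active experts:    {active_expert_fraction:.1%}")
print(f"Q heads per KV:    {q_heads_per_kv_head}")
print(f"context:           {context_in_k}K tokens")
print(f"bits per param:    {bits_per_param_budget:.2f}")
```

The active parameter share is higher than the active expert share, presumably because non-expert weights such as attention run for every token. A budget of roughly 4 bits per parameter suggests the 250 GB figure refers to a quantized build rather than 16-bit weights, though the card does not say which.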

Note: This model operates in non-thinking mode only and does not generate <think></think> blocks.

Parameters

Custom configuration options included with this model

  • Repeat Penalty: 1.05
  • Temperature: 0.7
  • Top K Sampling: 20
  • Top P Sampling: 0.8
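
To make the four settings concrete, here is a minimal pure-Python sketch of how samplers commonly combine them when picking the next token. It is not the code of any particular runtime: the function names, the order of the steps, and the divide-or-multiply form of the repeat penalty are assumptions based on common practice, and real implementations may differ.

```python
import math
import random

# Values from the Parameters section of this model card.
REPEAT_PENALTY = 1.05
TEMPERATURE = 0.7
TOP_K = 20
TOP_P = 0.8


def filtered_distribution(logits, previous_tokens):
    """Return {token_id: probability} after applying the four settings.

    Assumed order: repeat penalty, temperature, top-k, top-p.
    """
    adjusted = list(logits)

    # Repeat penalty: make tokens that were already generated less likely.
    for tok in set(previous_tokens):
        if adjusted[tok] > 0:
            adjusted[tok] /= REPEAT_PENALTY
        else:
            adjusted[tok] *= REPEAT_PENALTY

    # Temperature below 1 sharpens the distribution.
    scaled = [x / TEMPERATURE for x in adjusted]

    # Top-k: keep only the K highest-scoring tokens.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    ranked = ranked[:TOP_K]

    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches P.
    kept = []
    cumulative = 0.0
    for tok, p in zip(ranked, probs):
        kept.append((tok, p))
        cumulative += p
        if cumulative >= TOP_P:
            break

    # Renormalize so the kept probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


def sample(logits, previous_tokens, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = filtered_distribution(logits, previous_tokens)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

For example, `sample([5.0, 1.0, 0.0, -1.0], previous_tokens=[])` always returns token 0 under these settings, because that token alone already exceeds the top-p threshold after temperature scaling.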