qwen3-coder-480b

Forked from qwen/qwen3-coder-480b

Qwen's most powerful code model, featuring 480B total parameters with 35B activated through Mixture of Experts (MoE) architecture.

Minimum system memory: 250 GB

Tags

480B
qwen3_moe

README

Qwen3 Coder 480B

Key Features:

  • Agentic Coding: Performance comparable to Claude Sonnet 4 on agentic coding tasks
  • Repository-Scale Understanding: Optimized for large codebases and complex projects

Technical Specifications:

  • 480B total parameters, 35B activated (MoE with 160 experts, 8 active)
  • 62 layers with Grouped Query Attention (96 Q heads, 8 KV heads)
  • Native 262,144 token context length
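The ratios implied by these figures can be worked out directly. The sketch below is only arithmetic on the numbers listed above; the helper `weight_footprint_gb` and the 4-bit example are illustrative assumptions, since this page does not say which quantization the download uses.

```python
# Figures copied from the specifications above.
TOTAL_PARAMS = 480e9
ACTIVE_PARAMS = 35e9
NUM_EXPERTS = 160
ACTIVE_EXPERTS = 8
Q_HEADS = 96
KV_HEADS = 8

# Share of the weights that take part in computing each token.
active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Share of the experts that the router selects for each token.
active_expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS

# Grouped Query Attention: number of query heads sharing one KV head.
q_heads_per_kv_head = Q_HEADS // KV_HEADS


def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes.

    Hypothetical helper: it ignores the KV cache, activations and
    runtime overhead, so real memory use will be higher.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    print(f"active parameters per token: {active_param_fraction:.1%}")
    print(f"active experts per token:    {active_expert_fraction:.1%}")
    print(f"query heads per KV head:     {q_heads_per_kv_head}")
    # 4 bits per weight is an assumed example, not a documented setting.
    print(f"weights at 4 bits/weight:    {weight_footprint_gb(TOTAL_PARAMS, 4):.0f} GB")
```

All 480B parameters still have to be held in memory even though only about 35B are used per token, which is why the memory requirement follows the total parameter count rather than the active one.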

Note: This model operates in non-thinking mode only and does not generate <think></think> blocks.

Parameters

Custom configuration options included with this model

  • Repeat Penalty: 1.05
  • Temperature: 0.7
  • Top K Sampling: 20
  • Top P Sampling: 0.8
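To make these four defaults concrete, here is a minimal, simplified sketch of how such settings typically shape next-token sampling. It is not the runtime's actual implementation: the function names, the order in which the filters are applied, and the divide-or-multiply form of the repeat penalty are assumptions chosen for illustration, and real runtimes may differ.

```python
import math
import random

# Defaults listed above.
REPEAT_PENALTY = 1.05
TEMPERATURE = 0.7
TOP_K = 20
TOP_P = 0.8


def candidate_distribution(logits, previous_tokens):
    """Return {token_id: probability} after penalty, temperature, top-k, top-p."""
    adjusted = list(logits)

    # Repeat penalty: make tokens that already appeared slightly less likely.
    for tok in set(previous_tokens):
        if adjusted[tok] > 0:
            adjusted[tok] /= REPEAT_PENALTY
        else:
            adjusted[tok] *= REPEAT_PENALTY

    # Temperature: values below 1 sharpen the distribution.
    scaled = [x / TEMPERATURE for x in adjusted]

    # Top-k: keep only the k highest-scoring tokens, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    ranked = ranked[:TOP_K]

    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches TOP_P.
    kept, cumulative = [], 0.0
    for tok, p in zip(ranked, probs):
        kept.append((tok, p))
        cumulative += p
        if cumulative >= TOP_P:
            break

    # Renormalize so the kept probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


def sample_next_token(logits, previous_tokens, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = candidate_distribution(logits, previous_tokens)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
    print(candidate_distribution(toy_logits, previous_tokens=[0]))
    print(sample_next_token(toy_logits, previous_tokens=[0], rng=random.Random(0)))
```

With a temperature below 1 and a top-p of 0.8, low-probability tokens are cut off fairly aggressively, which tends to suit code generation, where a single unlikely token can break syntax.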