Updated 11 days ago. Forked from qwen/qwen3.5-35b-a3b.
README
Custom Fields
Special features defined by the model author.

Enable Thinking: boolean (default: true). Controls whether the model will think before replying.
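A boolean custom field like Enable Thinking is typically passed alongside a chat request. As a minimal sketch, assuming an OpenAI-compatible local server where the field is forwarded in the request body (the field name and payload shape here are illustrative assumptions, not a documented API):

```python
import json

# Hypothetical request payload overriding the Enable Thinking default.
# "enable_thinking" as a top-level request field is an assumption for
# illustration; check your runtime's documentation for the actual knob.
payload = {
    "model": "qwen3.5-35b-a3b",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "enable_thinking": False,  # skip the thinking phase (default is true)
}
print(json.dumps(payload, indent=2))
```

Leaving the field out entirely would fall back to the author-defined default of true.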
Parameters
Custom configuration options included with this model.

Sources
The underlying model files this model uses.
Based on
GGUF
Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility. It has 35B total parameters, of which 3B are activated per token, and supports a native context length of 262,144 tokens.
Unified Vision-Language Foundation. Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.
Efficient Hybrid Architecture. Gated Delta Networks combined with sparse Mixture-of-Experts (256 total experts, 8 routed + 1 shared active) deliver high-throughput inference with minimal latency and cost overhead.
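The sparse routing described above can be sketched in a few lines: a router scores all 256 experts per token, keeps the top 8 (the shared expert is always active), and mixes their outputs with softmax-normalized weights. The dimensions and router design below are illustrative assumptions, not the actual Qwen3.5 implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d_model = 256, 8, 16  # 8 routed + 1 shared expert active

def route(token_hidden: np.ndarray, router_w: np.ndarray):
    """Return indices and normalized weights of the top-k routed experts."""
    logits = token_hidden @ router_w               # one score per expert
    top = np.argsort(logits)[-top_k:]              # keep the 8 best-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                       # softmax over the selected experts
    return top, weights

router_w = rng.standard_normal((d_model, n_experts))
token = rng.standard_normal(d_model)
experts, weights = route(token, router_w)
# Only 8 of 256 routed experts fire per token (plus the shared expert),
# which is why roughly 3B of the 35B total parameters are active.
print(len(experts), float(weights.sum()))
```

This per-token sparsity is what decouples total capacity (35B) from inference cost (3B active), giving the high-throughput, low-latency profile claimed above.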
Scalable RL Generalization. Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.
Global Linguistic Coverage. Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.