Capabilities
Minimum system memory
Tags
Last updated
Updated 20 days agobyREADME
Custom Fields
Special features defined by the model author
Enable Thinking
: boolean
(default=true)
Controls whether the model will think before replying
Parameters
Custom configuration options included with this model
Sources
The underlying model files this model uses
Based on
MLX
The official Qwen 3.6 27B dense model for Apple Silicon, converted to MLX 8-bit with full vision, thinking, and tool calling support. Near-lossless quality at 8.6 bits per weight. The best quantization you can run locally. Built for 36 GB+ Macs.
mlx-vlm.The official Qwen 3.6 Jinja template has four problems that make it unusable in LM Studio:
This model ships with a rewritten template that fixes all four. It also adds a thinking toggle and only emits thinking blocks when they contain actual reasoning content.
Drop <|think_on|> or <|think_off|> anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the thinking mode.
Fast answer, no internal reasoning.
The model thinks step by step, then answers.
froggeric/qwen3.6-27b-mlx-8bit.The model underperforms without it. Add whatever you want after that line.
From the official Qwen authors. Reserve 128K+ context for thinking mode.
| Mode | temp | top_p | top_k | repeat_penalty |
|---|---|---|---|---|
| Thinking (coding) | 0.6 | 0.95 | 20 | 1.0 |
| Thinking (general) | 1.0 | 0.95 | 20 | 1.0 |
| Non-thinking (general) | 0.7 | 0.8 | 20 | 1.0 |
| Spec | Value |
|---|---|
| Quantization | 8-bit (8.6 bits/weight) |
| Size | 28 GB, 6 shards |
| Total params | 27.8B (dense) |
| Layers | 64 (3x linear attn + 1x full attn) |
| Context | 262K native, 1M+ with YaRN |
| Vocabulary | 248K tokens |
| model_type | qwen3_5 |
| Role | Author |
|---|---|
| Original model | Alibaba Cloud (Qwen team) |
| MLX 8-bit conversion | froggeric |
Apache-2.0, inherited from Qwen3.6.
|itemsdeveloper role crashes. Modern APIs send message.role == "developer". The official template throws an exception.preserve_thinking spam. The official template wraps every past turn in empty thinking blocks, wasting context tokens.</thinking> hallucination. The model sometimes generates </thinking> instead of the expected closing tag. The official parser fails.System: You are a coding assistant. <|think_off|>
User: What's 2+2?
System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.