37 Downloads
Capabilities
Minimum system memory
Tags
Last updated
Updated on April 22byREADME
The official Qwen 3.6 27B dense model for Apple Silicon, converted to MLX 8-bit with full vision, thinking, and tool calling support. Near-lossless quality at 8.6 bits per weight. The best quantization you can run locally. Built for 36 GB+ Macs.
mlx-vlm.Custom Fields
Special features defined by the model author
Enable Thinking
: boolean
(default=true)
Controls whether the model will think before replying
Parameters
Custom configuration options included with this model
Sources
The underlying model files this model uses
The official Qwen 3.6 Jinja template has four problems that make it unusable in LM Studio:
|items filter, which does not exist in LM Studio's C++ Jinja runtime.developer role crashes. Modern APIs send message.role == "developer". The official template throws an exception.preserve_thinking spam. The official template wraps every past turn in empty thinking blocks, wasting context tokens.</thinking> hallucination. The model sometimes generates </thinking> instead of the expected closing tag. The official parser fails.This model ships with a rewritten template that fixes all four. It also adds a thinking toggle and only emits thinking blocks when they contain actual reasoning content.
Drop <|think_on|> or <|think_off|> anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the thinking mode.
System: You are a coding assistant. <|think_off|> User: What's 2+2?
Fast answer, no internal reasoning.
System: You are a coding assistant. <|think_on|> User: Implement a red-black tree in Rust.
The model thinks step by step, then answers.
froggeric/qwen3.6-27b-mlx-8bit.The model underperforms without it. Add whatever you want after that line.You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
From the official Qwen authors. Reserve 128K+ context for thinking mode.
| Mode | temp | top_p | top_k | repeat_penalty |
|---|---|---|---|---|
| Thinking (coding) | 0.6 | 0.95 | 20 | 1.0 |
| Thinking (general) | 1.0 | 0.95 | 20 | 1.0 |
| Non-thinking (general) | 0.7 | 0.8 | 20 | 1.0 |
| Spec | Value |
|---|---|
| Quantization | 8-bit (8.6 bits/weight) |
| Size | 28 GB, 6 shards |
| Total params | 27.8B (dense) |
| Layers | 64 (3x linear attn + 1x full attn) |
| Context | 262K native, 1M+ with YaRN |
| Vocabulary | 248K tokens |
| model_type | qwen3_5 |
| Role | Author |
|---|---|
| Original model | Alibaba Cloud (Qwen team) |
| MLX 8-bit conversion | froggeric |
Apache-2.0, inherited from Qwen3.6.
Based on
MLX