The bug-free, fully uncensored Qwen 3.5 35B for Apple Silicon, with a fixed chat template that makes tool calling and thinking actually work in LM Studio. Fits comfortably on 24 GB Macs.
Alibaba shipped Qwen 3.5 35B with two corrupted tensors in layers 36 and 37. Past 50K tokens, the model drifts, loops, and eventually collapses. No sampler setting fixes it. LuffyTheFox found the bug, wrote the Sig-ScaleSync repair tool, and released fixed weights.
This is those fixed weights, converted to MLX 4-bit with full vision support. Half the size of the 8-bit version, with minimal quality loss.
Converted with mlx-vlm.

The official Qwen 3.5 Jinja template has four problems that make it unusable in LM Studio, among them:

- the `|items` filter, which does not exist in LM Studio's C++ Jinja runtime;
- a crash on the `developer` role: modern APIs send `message.role == "developer"`, and the official template throws an exception.

This model ships with a rewritten template that fixes all four. It also adds a thinking toggle and removes empty thinking blocks from conversation history, which saves tokens and keeps prefix caching stable at long context.
Drop <|think_on|> or <|think_off|> anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the thinking mode.
With `<|think_off|>`, you get a fast answer and no internal reasoning:

System: You are a coding assistant. <|think_off|>
User: What's 2+2?

With `<|think_on|>`, the model thinks step by step, then answers:

System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.
The <|think_on|> / <|think_off|> syntax will never appear naturally in code or conversation, so there are no false positives. Earlier community templates used /think, which broke paths like cd /mnt/project/think.
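If you drive LM Studio's local server from a script, the toggle works the same way. A minimal sketch, assuming the default server address (http://localhost:1234/v1); the model identifier below is an assumption, so replace it with whatever LM Studio lists for this model:

```python
# Sketch: flipping the thinking toggle through LM Studio's OpenAI-compatible
# server. The model identifier is assumed; the template strips the tag before
# the model ever sees it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def ask(question: str, thinking: bool) -> str:
    tag = "<|think_on|>" if thinking else "<|think_off|>"
    resp = client.chat.completions.create(
        model="qwen3.5-35b-a3b-uncensored-fernflowerai-mlx-4bit",  # assumed identifier
        messages=[
            {"role": "system", "content": f"You are a coding assistant. {tag}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(ask("What's 2+2?", thinking=False))                          # fast answer
print(ask("Implement a red-black tree in Rust.", thinking=True))   # reasons first
```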
Tool calling warning: Always add <|think_off|> to your prompt when using tools. LM Studio crashes when the model generates a tool call inside its thinking block.
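From the API side the same rule applies. A minimal sketch with `<|think_off|>` in the system prompt; the `get_weather` tool and its schema are made up purely for illustration:

```python
# Sketch: tool calling with thinking disabled, to avoid the crash described
# above. The get_weather tool is a made-up example.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not part of this model
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.5-35b-a3b-uncensored-fernflowerai-mlx-4bit",  # assumed identifier
    messages=[
        {"role": "system", "content": "You are a helpful assistant. <|think_off|>"},
        {"role": "user", "content": "What's the weather in Lisbon?"},
    ],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```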
Recommended system prompt for froggeric/qwen3.5-35b-a3b-uncensored-fernflowerai-mlx-4bit:

You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

The model underperforms without it. Add whatever you want after that line.
Recommended sampler settings, from the official Qwen authors. Reserve 128K+ context for thinking mode.
| Mode | temp | top_p | top_k | repeat_penalty |
|---|---|---|---|---|
| Thinking (coding) | 0.6 | 0.95 | 20 | 1.0 |
| Thinking (general) | 1.0 | 0.95 | 20 | 1.5 |
| Non-thinking (general) | 0.7 | 0.8 | 20 | 1.5 |
| Non-thinking (reasoning) | 1.0 | 1.0 | 40 | 2.0 |
MLX uses repeat_penalty (1.0 = off). GGUF runtimes use presence_penalty (0 = off).
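A sketch of keeping those presets in one place and sending one through LM Studio's OpenAI-compatible endpoint. `top_k` and `repeat_penalty` are not standard OpenAI fields, so they are passed via `extra_body`; whether the server honors them depends on your LM Studio version, so treat that part as an assumption to verify:

```python
# Sketch: the recommended presets as data. top_k / repeat_penalty are
# non-standard fields passed through extra_body; server support may vary.
from openai import OpenAI

PRESETS = {
    "thinking_coding":        {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.0},
    "thinking_general":       {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.5},
    "non_thinking_general":   {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "repeat_penalty": 1.5},
    "non_thinking_reasoning": {"temperature": 1.0, "top_p": 1.0,  "top_k": 40, "repeat_penalty": 2.0},
}

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
p = PRESETS["thinking_coding"]
resp = client.chat.completions.create(
    model="qwen3.5-35b-a3b-uncensored-fernflowerai-mlx-4bit",  # assumed identifier
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    temperature=p["temperature"],
    top_p=p["top_p"],
    extra_body={"top_k": p["top_k"], "repeat_penalty": p["repeat_penalty"]},
)
print(resp.choices[0].message.content)
```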
| Spec | Value |
|---|---|
| Quantization | 4-bit (4.6 bits/weight) |
| Size | 19 GB, 4 shards |
| Total params | 35B |
| Active per token | ~3B |
| Attention | 3x DeltaNet-MoE + 1x Attention-MoE, 10 repetitions |
| Context | 262K native, 1M with YaRN |
| Vocabulary | 248K tokens, 201 languages |
| model_type | qwen3_5_moe |
Lower GGUF quantizations (Q3_K, Q2_K) break the model due to MoE plus DeltaNet sensitivity. Use Q4_K_L or higher for GGUF.
Two tensors out of 502 carried corrupted weights: layers.36.linear_attn.conv1d.weight and layers.37.linear_attn.conv1d.weight. Their standard deviation ran ~60% higher than the median of their peer group (0.102 vs 0.063).
Sig-ScaleSync compares each tensor's scale against the median of its peer group. A tensor gets flagged only if it exceeds the deviation threshold and shows weight saturation. This two-gate filter avoids false positives on architecturally asymmetric layers. Out of 502 tensors, exactly 2 needed repair. Verified against Gemma 4 26B A4B with zero false positives.
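The actual Sig-ScaleSync code is not reproduced here; the following is only an illustrative sketch of the two-gate idea, with a hypothetical saturation metric (share of weights near the maximum magnitude) and made-up thresholds:

```python
# Illustrative sketch of the two-gate filter described above, NOT the actual
# Sig-ScaleSync implementation. Thresholds and the saturation definition are
# assumptions for the example.
import numpy as np

def flag_outliers(peer_group: dict[str, np.ndarray],
                  std_ratio_threshold: float = 1.5,
                  saturation_threshold: float = 0.002) -> list[str]:
    stds = {name: float(w.std()) for name, w in peer_group.items()}
    median_std = float(np.median(list(stds.values())))
    flagged = []
    for name, w in peer_group.items():
        # Gate 1: scale deviates from the peer-group median.
        scale_outlier = stds[name] > std_ratio_threshold * median_std
        # Gate 2: weight saturation (assumed: share of weights above 99% of max |w|).
        saturation = float(np.mean(np.abs(w) > 0.99 * np.abs(w).max()))
        saturated = saturation > saturation_threshold
        if scale_outlier and saturated:  # both gates must fire
            flagged.append(name)
    return flagged
```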
| Tensor | Error reduction | Saturation (before/after) |
|---|---|---|
| `layers.36.linear_attn.conv1d.weight` | 88.6% | 0.0025 / 0.0010 |
| `layers.37.linear_attn.conv1d.weight` | 88.6% | 0.0025 / 0.0010 |
| Role | Author |
|---|---|
| Original model | Alibaba Cloud (Qwen team) |
| Uncensored fine-tune | HauhauCS |
| Tensor repair (Sig-ScaleSync) | EvilEnginer (LuffyTheFox) |
| MLX 4-bit conversion and Jinja fixes | froggeric |
Apache-2.0, inherited from Qwen3.5.
Lineage:
Qwen/Qwen3.5-35B-A3B (Alibaba Cloud)
+ HauhauCS Uncensored (0/465 refusals, lossless)
+ LuffyTheFox FernflowerAI (Sig-ScaleSync tensor repair)
+ this repo (MLX 4-bit, text + vision, fixed Jinja template)