The bug-free, fully uncensored Qwen 3.5 35B for Apple Silicon, with a fixed chat template that makes tool calling and thinking actually work in LM Studio. Fits comfortably on 24 GB Macs.
Alibaba shipped Qwen 3.5 35B with two corrupted tensors in layers 36 and 37. Past 50K tokens, the model drifts, loops, and eventually collapses. No sampler setting fixes it. LuffyTheFox found the bug, wrote the Sig-ScaleSync repair tool, and released fixed weights.
This is those fixed weights, converted to MLX 4-bit with full vision support. Half the size of the 8-bit version, with minimal quality loss.
Converted with mlx-vlm.

The official Qwen 3.5 Jinja template has four problems that make it unusable in LM Studio, among them:

- the `|items` filter, which does not exist in LM Studio's C++ Jinja runtime;
- a crash on the `developer` role: modern APIs send `message.role == "developer"`, and the official template throws an exception.

This model ships with a rewritten template that fixes all four. It also adds a thinking toggle and removes empty thinking blocks from conversation history, which saves tokens and keeps prefix caching stable at long context.
Drop <|think_on|> or <|think_off|> anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the thinking mode.
With `<|think_off|>`, you get a fast answer and no internal reasoning:

System: You are a coding assistant. <|think_off|>
User: What's 2+2?

With `<|think_on|>`, the model thinks step by step, then answers:

System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.
The <|think_on|> / <|think_off|> syntax will never appear naturally in code or conversation, so there are no false positives. Earlier community templates used /think, which broke paths like cd /mnt/project/think.
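If you drive LM Studio's local server from a script, the toggle works the same way. A minimal sketch, assuming the default server address (http://localhost:1234/v1); the model identifier below is an assumption, so replace it with whatever LM Studio lists for this model:

```python
# Sketch: flipping the thinking toggle through LM Studio's OpenAI-compatible
# server. The model identifier is assumed; the template strips the tag before
# the model ever sees it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def ask(question: str, thinking: bool) -> str:
    tag = "<|think_on|>" if thinking else "<|think_off|>"
    resp = client.chat.completions.create(
        model="qwen3.5-35b-a3b-uncensored-fernflowerai-mlx-4bit",  # assumed identifier
        messages=[
            {"role": "system", "content": f"You are a coding assistant. {tag}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(ask("What's 2+2?", thinking=False))                          # fast answer
print(ask("Implement a red-black tree in Rust.", thinking=True))   # reasons first
```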
Tool calling warning: Always add <|think_off|> to your prompt when using tools. LM Studio crashes when the model generates a tool call inside its thinking block.
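From the API side the same rule applies. A minimal sketch with `<|think_off|>` in the system prompt; the `get_weather` tool and its schema are made up purely for illustration:

```python
# Sketch: tool calling with thinking disabled, to avoid the crash described
# above. The get_weather tool is a made-up example.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not part of this model
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.5-35b-a3b-uncensored-fernflowerai-mlx-4bit",  # assumed identifier
    messages=[
        {"role": "system", "content": "You are a helpful assistant. <|think_off|>"},
        {"role": "user", "content": "What's the weather in Lisbon?"},
    ],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```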
Recommended system prompt for froggeric/qwen3.5-35b-a3b-uncensored-fernflowerai-mlx-4bit:

You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

The model underperforms without it. Add whatever you want after that line.
Recommended sampler settings, from the official Qwen authors. Reserve 128K+ context for thinking mode.
| Mode | temp | top_p | top_k | repeat_penalty |
|---|---|---|---|---|
| Thinking (coding) | 0.6 | 0.95 | 20 | 1.0 |
| Thinking (general) | 1.0 | 0.95 | 20 | 1.5 |
| Non-thinking (general) | 0.7 | 0.8 | 20 | 1.5 |
| Non-thinking (reasoning) | 1.0 | 1.0 | 40 | 2.0 |
MLX uses repeat_penalty (1.0 = off). GGUF runtimes use presence_penalty (0 = off).
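A sketch of keeping those presets in one place and sending one through LM Studio's OpenAI-compatible endpoint. `top_k` and `repeat_penalty` are not standard OpenAI fields, so they are passed via `extra_body`; whether the server honors them depends on your LM Studio version, so treat that part as an assumption to verify:

```python
# Sketch: the recommended presets as data. top_k / repeat_penalty are
# non-standard fields passed through extra_body; server support may vary.
from openai import OpenAI

PRESETS = {
    "thinking_coding":        {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.0},
    "thinking_general":       {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.5},
    "non_thinking_general":   {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "repeat_penalty": 1.5},
    "non_thinking_reasoning": {"temperature": 1.0, "top_p": 1.0,  "top_k": 40, "repeat_penalty": 2.0},
}

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
p = PRESETS["thinking_coding"]
resp = client.chat.completions.create(
    model="qwen3.5-35b-a3b-uncensored-fernflowerai-mlx-4bit",  # assumed identifier
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    temperature=p["temperature"],
    top_p=p["top_p"],
    extra_body={"top_k": p["top_k"], "repeat_penalty": p["repeat_penalty"]},
)
print(resp.choices[0].message.content)
```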
| Spec | Value |
|---|---|
| Quantization | 4-bit (4.6 bits/weight) |
| Size | 19 GB, 4 shards |
| Total params | 35B |
| Active per token | ~3B |
| Attention | 3x DeltaNet-MoE + 1x Attention-MoE, 10 repetitions |
| Context | 262K native, 1M with YaRN |
| Vocabulary | 248K tokens, 201 languages |
| model_type | qwen3_5_moe |
Lower GGUF quantizations (Q3_K, Q2_K) break the model due to MoE plus DeltaNet sensitivity. Use Q4_K_L or higher for GGUF.
Two tensors out of 502 carried corrupted weights: layers.36.linear_attn.conv1d.weight and layers.37.linear_attn.conv1d.weight. Their standard deviation ran ~60% higher than the median of their peer group (0.102 vs 0.063).
Sig-ScaleSync compares each tensor's scale against the median of its peer group. A tensor gets flagged only if it exceeds the deviation threshold and shows weight saturation. This two-gate filter avoids false positives on architecturally asymmetric layers. Out of 502 tensors, exactly 2 needed repair. Verified against Gemma 4 26B A4B with zero false positives.
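The actual Sig-ScaleSync code is not reproduced here; the following is only an illustrative sketch of the two-gate idea, with a hypothetical saturation metric (share of weights near the maximum magnitude) and made-up thresholds:

```python
# Illustrative sketch of the two-gate filter described above, NOT the actual
# Sig-ScaleSync implementation. Thresholds and the saturation definition are
# assumptions for the example.
import numpy as np

def flag_outliers(peer_group: dict[str, np.ndarray],
                  std_ratio_threshold: float = 1.5,
                  saturation_threshold: float = 0.002) -> list[str]:
    stds = {name: float(w.std()) for name, w in peer_group.items()}
    median_std = float(np.median(list(stds.values())))
    flagged = []
    for name, w in peer_group.items():
        # Gate 1: scale deviates from the peer-group median.
        scale_outlier = stds[name] > std_ratio_threshold * median_std
        # Gate 2: weight saturation (assumed: share of weights above 99% of max |w|).
        saturation = float(np.mean(np.abs(w) > 0.99 * np.abs(w).max()))
        saturated = saturation > saturation_threshold
        if scale_outlier and saturated:  # both gates must fire
            flagged.append(name)
    return flagged
```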
| Tensor | Error reduction | Saturation (before/after) |
|---|---|---|
| `layers.36.linear_attn.conv1d.weight` | 88.6% | 0.0025 / 0.0010 |
| `layers.37.linear_attn.conv1d.weight` | 88.6% | 0.0025 / 0.0010 |
| Role | Author |
|---|---|
| Original model | Alibaba Cloud (Qwen team) |
| Uncensored fine-tune | HauhauCS |
| Tensor repair (Sig-ScaleSync) | EvilEnginer (LuffyTheFox) |
| MLX 4-bit conversion and Jinja fixes | froggeric |
Apache-2.0, inherited from Qwen3.5.
Lineage:
Qwen/Qwen3.5-35B-A3B (Alibaba Cloud)
+ HauhauCS Uncensored (0/465 refusals, lossless)
+ LuffyTheFox FernflowerAI (Sig-ScaleSync tensor repair)
+ this repo (MLX 4-bit, text + vision, fixed Jinja template)