gemma-4-12b-qat

Description

Gemma 4 12B optimized with Quantization-Aware Training (QAT)

Stats

861.5K Downloads

62 stars

4 forks

Capabilities

Vision Input

Trained for tool use

Supports reasoning

Minimum system memory

7GB
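As a rough sanity check on memory use, weight storage is about parameter count times bits per weight. The sketch below is back-of-the-envelope arithmetic under stated assumptions: exactly 12 billion parameters, and the bit widths shown. The card does not say which quantization format the QAT GGUF uses. The 4.5-bit row reflects block formats such as GGUF's Q4_0, which store a per-block scale alongside the 4-bit values. KV cache, activations, and runtime overhead are ignored, which is why a minimum of several GB above the raw weight size is plausible.

```python
# Back-of-the-envelope weight-memory estimate (decimal GB, 1 GB = 1e9 bytes).
# Assumptions, not from the model card: 12e9 parameters, and the listed
# bits-per-weight figures. KV cache and runtime overhead are not included.
PARAMS = 12e9

BITS_PER_WEIGHT = {
    "bfloat16": 16.0,
    "4-bit (no scales)": 4.0,
    "4-bit + per-block scale (Q4_0-style)": 4.5,
}

def weight_gb(params: float, bits: float) -> float:
    """Bytes needed for the weights alone, in decimal gigabytes."""
    return params * bits / 8 / 1e9

estimates = {name: weight_gb(PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()}
for name, gb in estimates.items():
    print(f"{name:>40}: {gb:.2f} GB")
```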

Gemma 4 12B QAT

Gemma 4 12B QAT is the Quantization-Aware Training version of Gemma 4 12B. It aims to keep output quality close to the original bfloat16 model while needing much less memory to load.
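Quantization-aware training generally works by inserting "fake quantization" (quantize, then immediately dequantize) into the forward pass, so the weights learn to tolerate the rounding they will see after export. A minimal sketch of that round trip, assuming symmetric per-tensor 4-bit quantization; the card does not state the scheme Google used. Real training code pairs the rounding with a straight-through estimator so gradients can pass through it, which is omitted here.

```python
# Symmetric per-tensor fake quantization: the forward-pass op a QAT setup
# typically inserts. Assumption: signed 4-bit integers, one scale per tensor;
# the actual Gemma 4 12B QAT scheme is not described on this card.

def fake_quantize(weights: list[float], bits: int = 4) -> list[float]:
    """Quantize to signed integer levels, then dequantize back to floats."""
    qmax = 2 ** (bits - 1) - 1               # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for w in weights:
        q = round(w / scale)                 # nearest level (ties to even)
        q = max(-qmax - 1, min(qmax, q))     # clamp to [-8, 7] for 4-bit
        out.append(q * scale)                # back to float for the forward pass
    return out

weights = [0.7, -0.35, 0.1, 0.0, -0.7]
print(fake_quantize(weights))
```

Each output value differs from its input by at most half a quantization step, which is the error the model is trained to absorb.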

Gemma 4 is an open multimodal model family from Google DeepMind. It supports text and image input, text output, reasoning, long context, system prompts, and native tool use.

Gemma 4 12B uses the Unified design, which routes multimodal inputs into the decoder-only LLM backbone through lightweight projection layers instead of separate encoders. The QAT build keeps that architecture while reducing the memory needed to load the model.
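To illustrate what "lightweight projection layers" means in general terms, the toy sketch below flattens image patches into vectors, maps each one into the decoder's embedding width with a single linear layer, and concatenates the result with text-token embeddings into one sequence. Everything here is hypothetical: the dimensions, random weights, and the choice of a single linear layer are illustrative assumptions, not Gemma 4's actual projection.

```python
# Toy illustration of projecting image patches into a decoder's embedding
# space. All sizes and weights are hypothetical; the real model's projection
# details are not given on this card.
import random

def linear(x: list[float], weight: list[list[float]], bias: list[float]) -> list[float]:
    """y = W x + b for one vector; weight has shape [out_dim][in_dim]."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row, b in zip(weight, bias)]

def project_patches(patches, weight, bias):
    """Apply the same projection to every flattened patch."""
    return [linear(p, weight, bias) for p in patches]

random.seed(0)
PATCH_DIM, EMBED_DIM = 4, 6        # hypothetical sizes
W = [[random.uniform(-0.1, 0.1) for _ in range(PATCH_DIM)] for _ in range(EMBED_DIM)]
b = [0.0] * EMBED_DIM

image_patches = [[random.random() for _ in range(PATCH_DIM)] for _ in range(3)]
text_embeddings = [[0.0] * EMBED_DIM for _ in range(2)]  # placeholder text tokens

# One interleaved sequence the decoder-only backbone would consume.
sequence = project_patches(image_patches, W, b) + text_embeddings
print(len(sequence), "tokens of width", len(sequence[0]))
```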

Custom Fields

Special features defined by the model author

Enable Thinking (boolean, default=true)

Controls whether the model will think before replying

Parameters

Custom configuration options included with this model

Reasoning Section Parsing

{ "enabled": true, "startString": "<|channel>thought", "endString": "<channel|>" }
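The parsing config marks where the model's reasoning starts and ends in its raw output. A minimal sketch of splitting a completion into reasoning and visible reply using those two delimiter strings. How LM Studio handles edge cases such as a missing end marker or several reasoning sections is not documented here; this sketch assumes a single section and treats an unterminated one as running to the end of the text.

```python
# Split raw model output into (reasoning, reply) using the startString and
# endString from the "Reasoning Section Parsing" config. Edge-case handling
# is an assumption, not LM Studio's documented behavior.
START = "<|channel>thought"
END = "<channel|>"

def split_reasoning(text: str, start: str = START, end: str = END) -> tuple[str, str]:
    s = text.find(start)
    if s == -1:
        return "", text.strip()                 # no reasoning section at all
    body_start = s + len(start)
    e = text.find(end, body_start)
    if e == -1:
        # Unterminated section: treat everything after the start marker as reasoning.
        return text[body_start:].strip(), text[:s].strip()
    reasoning = text[body_start:e].strip()
    reply = (text[:s] + text[e + len(end):]).strip()
    return reasoning, reply

raw = "<|channel>thought\nAdd 2 and 2.<channel|>The answer is 4."
print(split_reasoning(raw))
```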

Repeat Penalty

Temperature

Top K Sampling

Top P Sampling

0.95
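Top K and Top P both restrict which tokens can be sampled at each step. A minimal sketch of the two filters, assuming top-k is applied before top-p (a common ordering; the exact sampler chain LM Studio uses is not stated here). Only `TOP_P = 0.95` comes from this card; `TOP_K = 40` is a hypothetical placeholder because the card's Top K, Temperature, and Repeat Penalty values are not shown, and temperature and repeat penalty are left out.

```python
import math
import random

def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert logits to probabilities (max-subtracted for numerical stability)."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def top_k_top_p(logits: dict[str, float], top_k: int, top_p: float) -> dict[str, float]:
    """Keep the top_k tokens, then the smallest prefix reaching top_p mass; renormalize."""
    ranked = sorted(softmax(logits).items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    z = sum(p for _, p in ranked)
    ranked = [(t, p / z) for t, p in ranked]          # renormalize after top-k
    kept, cum = [], 0.0
    for t, p in ranked:
        kept.append((t, p))
        cum += p
        if cum >= top_p:                              # nucleus reached
            break
    z = sum(p for _, p in kept)
    return {t: p / z for t, p in kept}

def sample(probs: dict[str, float], rng: random.Random) -> str:
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

TOP_P = 0.95  # from this card
TOP_K = 40    # hypothetical; the card's value is not shown

logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -5.0}
filtered = top_k_top_p(logits, TOP_K, TOP_P)
print(filtered, sample(filtered, random.Random(0)))
```

With these toy logits, the very unlikely token `d` falls outside the 0.95 nucleus and can never be sampled.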

Sources

The underlying model files this model uses

Based on

🤗 lmstudio-community/gemma-4-12B-it-QAT-GGUF

GGUF
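The model files are distributed in GGUF format. As a quick way to check that a downloaded file is intact GGUF, the sketch below reads the fixed-size header: the `GGUF` magic, a uint32 version, then uint64 tensor and metadata key/value counts. It assumes the little-endian layout used by GGUF versions 2 and 3; the metadata and tensor-info sections that follow are not parsed.

```python
import struct

def read_gguf_header(data: bytes) -> dict[str, int]:
    """Parse the 24-byte GGUF v2/v3 header (little-endian assumed)."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes")
    if data[:4] != b"GGUF":
        raise ValueError("missing GGUF magic")
    (version,) = struct.unpack_from("<I", data, 4)             # uint32 at offset 4
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)  # two uint64s
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}

# Usage with a real file (path is hypothetical):
# with open("model.gguf", "rb") as f:
#     print(read_gguf_header(f.read(24)))
```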