gemma-4-31b-qat

Public

Model

Revisions

Description

Gemma 4 31B optimized with Quantization-Aware Training (QAT)

Stats

336.7K Downloads

15 stars

1 fork

Capabilities

Vision Input

Trained for tool use

ReasoningSupports reasoning

Minimum system memory

19GB

Gemma 4 31B QAT

Gemma 4 31B QAT is the Quantization-Aware Training version of Gemma 4 31B. It aims to keep quality close to bfloat16 while using much less memory to load the model.

Gemma 4 is an open multimodal model family from Google DeepMind. It supports text and image input, text output, reasoning, long context, system prompts, and native tool use.

Gemma 4 31B is a larger Gemma 4 model. It is suited for stronger reasoning, coding, and agentic workflows while still benefiting from the memory savings of QAT.

Custom Fields

Special features defined by the model author

Enable Thinking

: boolean

(default=true)

Controls whether the model will think before replying

Parameters

Custom configuration options included with this model

Reasoning Section Parsing

{ "enabled": true, "startString": "<|channel>thought", "endString": "<channel|>" }

Repeat Penalty

Temperature

Top K Sampling

Top P Sampling

0.95

Sources

The underlying model files this model uses

Based on

🤗lmstudio-community/gemma-4-31B-it-QAT-GGUF→

GGUF