The new Gemma 4 12B Unified reasoning model with image support

Capabilities

Vision Input
Reasoning

Minimum system memory

7GB

Tags

12B
gemma4

README

Gemma 4 12B Unified

Gemma 4 12B's "Unified" label refers to its encoder-free design. Unlike other Gemma 4 models that use separate encoders for multimodal inputs, this model skips them entirely, projecting image patches directly into the LLM's embedding space via lightweight linear layers. All modalities feed into a single decoder-only transformer, cutting multimodal latency and enabling end-to-end fine-tuning in one pass.
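To make the encoder-free idea concrete, here is a minimal toy sketch of projecting image patches into an embedding space with a single linear layer and placing the resulting vectors in the same sequence as text embeddings. This is not the model's actual code: the patch size, embedding width, random weights, and the helper names `patchify`, `linear`, and `embed_image` are all hypothetical choices made for illustration.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly into patches"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches

def linear(vec, weight, bias):
    """Compute y = W x + b, where weight has shape [d_model][len(vec)]."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def embed_image(image, patch, weight, bias):
    """Project every patch straight into the embedding space (no separate encoder)."""
    return [linear(p, weight, bias) for p in patchify(image, patch)]

# Toy sizes, assumed for illustration only.
PATCH, D_MODEL = 2, 4
rng = random.Random(0)
weight = [[rng.uniform(-1, 1) for _ in range(PATCH * PATCH)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL

image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4 x 4 dummy image
image_tokens = embed_image(image, PATCH, weight, bias)
text_tokens = [[0.0] * D_MODEL for _ in range(3)]  # stand-in for text embeddings

# Image and text embeddings share one sequence fed to a single decoder.
sequence = image_tokens + text_tokens
print(len(image_tokens), len(sequence))  # 4 7
```

The point of the sketch is only the data flow: each patch becomes one embedding vector of the same width as a text token, so the decoder can treat both kinds of input uniformly.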

Custom Fields

Special features defined by the model author

Enable Thinking: boolean (default=true)

Controls whether the model will think before replying

Parameters

Custom configuration options included with this model

Reasoning Section Parsing: { "enabled": true, "startString": "<|channel>thought", "endString": "<channel|>" }
Temperature: 1
Top K Sampling: 64
Top P Sampling: 0.95
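
The reasoning section parsing setting suggests that the model wraps its thinking between the configured start and end strings. As a rough sketch of what a client could do with those two strings, the hypothetical helper below (`split_reasoning` is not part of any library) separates the reasoning text from the final reply. How the hosting application actually parses the output is not stated here, so the handling of missing or unterminated markers is an assumption.

```python
START = "<|channel>thought"   # startString from the configuration above
END = "<channel|>"            # endString from the configuration above

def split_reasoning(text, start=START, end=END):
    """Return (reasoning, answer) extracted from raw model output."""
    begin = text.find(start)
    if begin == -1:
        # No reasoning marker: the whole output is the answer.
        return "", text.strip()
    body_start = begin + len(start)
    finish = text.find(end, body_start)
    if finish == -1:
        # Assumption: an unterminated section (e.g. mid-stream) is all reasoning.
        return text[body_start:].strip(), text[:begin].strip()
    reasoning = text[body_start:finish].strip()
    answer = (text[:begin] + text[finish + len(end):]).strip()
    return reasoning, answer

raw = "<|channel>thought\n2+2 is 4<channel|>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(repr(reasoning))  # '2+2 is 4'
print(repr(answer))     # 'The answer is 4.'
```

With Enable Thinking turned off, the output would presumably contain no marked section, in which case the helper returns an empty reasoning string and the full text as the answer.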

Sources

The underlying model files this model uses