gemma-3-12b

Forked from google/gemma-3-12b

State-of-the-art image + text input models from Google, built from the same research and technology used to create the Gemini models.

Capabilities

Vision Input

Minimum system memory

11GB

Tags

12B
gemma3

README

Gemma 3 12B IT (instruction-tuned) by Google

Supports a context length of 128k tokens, with a maximum output of 8192 tokens.
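
As a rough illustration of how these two limits interact, the sketch below computes how many prompt tokens remain once the full output budget is reserved. It assumes that "128k" means 128 * 1024 = 131,072 tokens and that generated tokens share the same context window as the prompt; neither detail is stated in this card. The names `max_prompt_tokens` and `fits_in_context` are hypothetical helpers, not part of any runtime's API.

```python
# Assumption: "128k" is read as 128 * 1024 tokens; the card does not say
# whether it means 128,000 or 131,072.
CONTEXT_LENGTH = 128 * 1024
# Maximum output length as stated in the model card.
MAX_OUTPUT_TOKENS = 8192


def max_prompt_tokens(context_length: int = CONTEXT_LENGTH,
                      reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Return the largest prompt size that still leaves room for the output.

    Assumes prompt and generated tokens share one context window.
    """
    if reserved_output < 0 or reserved_output > context_length:
        raise ValueError("reserved_output must be between 0 and context_length")
    return context_length - reserved_output


def fits_in_context(prompt_tokens: int,
                    context_length: int = CONTEXT_LENGTH,
                    reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Check whether a prompt of the given token count leaves the output budget free."""
    if prompt_tokens < 0:
        return False
    return prompt_tokens <= max_prompt_tokens(context_length, reserved_output)


if __name__ == "__main__":
    print(max_prompt_tokens())        # 122880 under the assumptions above
    print(fits_in_context(100_000))   # True
    print(fits_in_context(125_000))   # False
```

Token counts depend on the model's tokenizer, and image inputs also consume context, so treat the result as an upper bound rather than an exact figure.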

Multimodal: accepts image input, with images normalized to 896 x 896 resolution.

Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.

Requires the latest (currently beta) llama.cpp runtime.

Sources

The underlying model files this model uses