Model

gemma-3-4b

Public

State-of-the-art image and text input models from Google, built from the same research and technology used to create the Gemini models.

Use cases

Vision Input

Minimum system memory

2GB

Tags

4B
gemma3

README

Gemma 3 4B IT by Google

Supports a context length of 128k tokens, with a maximum output of 8192 tokens.
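
As a rough illustration of how these two limits interact, the sketch below computes how many prompt tokens remain once room is reserved for the reply. It assumes that "128k" means 128 * 1024 tokens and that prompt and output share a single context window. Neither assumption is stated on this page, and the function name is hypothetical.

```python
# Limits taken from the card; the exact value of "128k" is an assumption.
CONTEXT_LENGTH = 128 * 1024   # assumed reading of "128k" (131072 tokens)
MAX_OUTPUT_TOKENS = 8192      # maximum output stated on the card


def max_prompt_tokens(requested_output: int,
                      context_length: int = CONTEXT_LENGTH,
                      max_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Return the prompt budget left after reserving space for the reply.

    Assumes prompt and output tokens share one context window.
    Requests above the output cap are clamped to the cap.
    """
    if requested_output < 0:
        raise ValueError("requested_output must be non-negative")
    reserved = min(requested_output, max_output)
    return context_length - reserved


if __name__ == "__main__":
    # Reserve the full output allowance and report what is left for the prompt.
    print(max_prompt_tokens(8192))
```

If the runtime counts input and output separately, the subtraction is unnecessary and the prompt budget is simply the context length.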

Multimodal: supports image inputs, which are normalized to 896 x 896 resolution.
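
To get a feel for what normalization to a fixed square implies for a given image, the sketch below computes the per-axis scale factors. It assumes a plain resize to 896 x 896 with no cropping or padding, which this page does not specify. The helper name is hypothetical and is not part of any runtime API.

```python
TARGET_SIDE = 896  # square side length stated on the card


def resize_factors(width: int, height: int, target: int = TARGET_SIDE):
    """Return (x_scale, y_scale) for resizing an image to target x target.

    Assumes a direct resize without cropping or padding. Unequal factors
    mean the aspect ratio of the original image would change.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return target / width, target / height


if __name__ == "__main__":
    # A wide 1792 x 896 image is halved horizontally and unchanged vertically.
    print(resize_factors(1792, 896))
```

Factors well below 1 indicate that fine detail, such as small text in a screenshot, may be lost before the model sees the image.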

Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.

Requires the latest llama.cpp runtime (currently in beta).

Sources

The underlying model files this model uses