Description
State-of-the-art image + text input models from Google, built from the same research and technology used to create the Gemini models.
Last updated
Updated on May 24
README
Supports a context length of 128k tokens, with a maximum output of 8192 tokens.
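As a rough illustration of how the two limits interact, the sketch below computes how many output tokens can be requested for a given prompt size. It assumes that prompt and output tokens share the same context window and that "128k" means 128,000 tokens; neither is stated in this listing, and the name `output_budget` is hypothetical.

```python
# Limits taken from the model card. "128k" is read as 128,000 here,
# which is an assumption; the exact figure may differ.
CONTEXT_LENGTH = 128_000
MAX_OUTPUT_TOKENS = 8192


def output_budget(prompt_tokens: int, requested_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Return the number of output tokens that fit for a prompt of this size.

    Assumes prompt and output tokens are counted against one shared window.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context length")
    # The result is capped by the request, the output limit, and the space left.
    return min(requested_output, MAX_OUTPUT_TOKENS, remaining)
```

For short prompts the output limit is the binding constraint; only prompts within 8192 tokens of the context length reduce the budget further.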
Multimodal, supporting images normalized to 896 x 896 resolution.
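To make the idea of normalizing to a fixed resolution concrete, here is a minimal sketch of a nearest-neighbor resize to a square grid. It assumes normalization is a plain resize that ignores aspect ratio, which this listing does not confirm; the runtime's actual preprocessing may differ. The function `resize_nearest` and the nested-list pixel format are hypothetical and chosen only to keep the example dependency-free.

```python
TARGET_SIZE = 896  # side length stated in the model card


def resize_nearest(pixels, size=TARGET_SIZE):
    """Resize a 2D grid of pixel values to size x size by nearest-neighbor sampling.

    `pixels` is a non-empty list of equal-length rows. Aspect ratio is not
    preserved, which is an assumption about what "normalized" means here.
    """
    if not pixels or not pixels[0]:
        raise ValueError("image must have at least one row and one column")
    src_h = len(pixels)
    src_w = len(pixels[0])
    # Map each target coordinate back to the source pixel that covers it.
    return [
        [pixels[(y * src_h) // size][(x * src_w) // size] for x in range(size)]
        for y in range(size)
    ]
```

In practice an image library would handle the resize with proper interpolation; the point is only that every input, whatever its original dimensions, reaches the model as the same fixed-size grid.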
Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.
Requires the latest (currently beta) llama.cpp runtime.
Sources
The underlying model files this model uses