75.6K Downloads
Description
State-of-the-art image + text input models from Google, built from the same research and tech used to create the Gemini models
Use cases
Minimum system memory
Tags
Last update
Updated on May 24
README
Supports a context length of 128k tokens, with a maximum output of 8192 tokens.
Multimodal: accepts image input, with images normalized to 896 x 896 resolution.
Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.
Requires the latest (currently beta) llama.cpp runtime.
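As a rough illustration of how the context and output limits quoted above interact, the sketch below computes how many prompt tokens remain once room for the reply is reserved. It assumes that "128k" means 128 * 1024 tokens and that generated tokens count against the same context window; neither is stated in the README. The names `max_prompt_tokens` and `fits` are hypothetical helpers, not part of any runtime API.

```python
# Limits quoted in the README. Reading "128k" as 128 * 1024 is an assumption.
CONTEXT_LENGTH = 128 * 1024   # total tokens in the context window
MAX_OUTPUT_TOKENS = 8192      # maximum tokens generated per response


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Return how many prompt tokens fit while reserving room for the reply."""
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT_TOKENS")
    return CONTEXT_LENGTH - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Check whether a prompt of the given token count leaves room for the reply."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)
```

Image inputs also consume context tokens, so a real budget would need to count them with the runtime's own tokenizer; this sketch only handles the arithmetic.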
Sources
The underlying model files this model uses