11.4K Downloads
State-of-the-art image and text input models from Google, built from the same research and technology used to create the Gemini models.
Vision Input
Trained for tool use
Last Updated 25 days ago
Optimized with Quantization Aware Training for improved 4-bit performance.
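Quantization-aware training generally works by running the forward pass on "fake-quantized" weights: each weight is rounded to the low-bit grid and then mapped back to a float, so the model learns to tolerate the rounding error it will see after 4-bit conversion. The sketch below shows only that forward operation. It assumes symmetric, per-tensor scaling onto the signed 4-bit range [-8, 7]. The function name `fake_quantize_4bit` is hypothetical, and the page does not describe the actual Gemma 3 QAT recipe (scale granularity, rounding, gradient handling).

```python
def fake_quantize_4bit(weights):
    """Round floats onto a signed 4-bit grid, then map them back to floats.

    Assumption: one symmetric scale for the whole tensor. Real QAT recipes
    may use per-channel or per-group scales.
    """
    qmin, qmax = -8, 7  # representable signed 4-bit integers
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)  # all zeros: nothing to quantize
    scale = max_abs / qmax  # largest magnitude maps to the top level
    out = []
    for w in weights:
        q = round(w / scale)          # nearest integer level
        q = max(qmin, min(qmax, q))   # clamp into the 4-bit range
        out.append(q * scale)         # dequantize back to float
    return out


print(fake_quantize_4bit([0.7, -0.35, 0.1, 0.0, -0.7]))
```

During training, the rounding step has no useful gradient, so QAT setups typically pass gradients straight through it (a straight-through estimator); that part is omitted here.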
Supports a context length of 128K tokens, with a maximum output of 8,192 tokens.
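In a decoder-only model the prompt and the generated tokens share one context window, so the usable output length is capped both by the 8,192-token output limit and by whatever room the prompt leaves. The sketch below computes that budget. It assumes "128K" means 131,072 tokens (128 × 1024) and that prompt and output share the window; `max_new_tokens_for` is a hypothetical helper, not an LM Studio setting.

```python
CONTEXT_LENGTH = 128 * 1024  # assumption: "128K" read as 131,072 tokens
MAX_OUTPUT_TOKENS = 8192     # maximum output stated on the model page


def max_new_tokens_for(prompt_tokens: int, requested: int = MAX_OUTPUT_TOKENS) -> int:
    """Return how many tokens can be generated after a prompt of this length."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens  # room left in the shared window
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, MAX_OUTPUT_TOKENS, remaining)


print(max_new_tokens_for(1_000))    # short prompt: the output cap applies
print(max_new_tokens_for(131_000))  # long prompt: the window is the limit
```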
Multimodal: accepts image input, with images normalized to 896 x 896 resolution.
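Normalizing an image for the vision encoder usually means two steps: resizing it to the fixed input size (896 x 896 here) and rescaling 8-bit pixel values into the float range the encoder expects. The sketch below does both for a single-channel image stored as nested lists. The nearest-neighbour resize and the mean/std of 0.5 (which maps pixels to [-1, 1]) are assumptions for illustration; the model's real preprocessor may use a different interpolation method, per-channel statistics, or extra cropping steps, and the helper names are hypothetical.

```python
TARGET = 896  # side length stated on the model page


def resize_nearest(image, size=TARGET):
    """Nearest-neighbour resize of a 2-D list of pixel values to size x size."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def normalize(image, mean=0.5, std=0.5):
    """Map 8-bit pixels to floats via (p / 255 - mean) / std.

    With the assumed defaults, 0 maps to -1.0 and 255 maps to 1.0.
    """
    return [[(p / 255.0 - mean) / std for p in row] for row in image]


def preprocess(image):
    """Resize to TARGET x TARGET, then rescale pixel values."""
    return normalize(resize_nearest(image))


checkerboard = [[0, 255], [255, 0]]  # tiny 2 x 2 grayscale test image
out = preprocess(checkerboard)
print(len(out), len(out[0]))  # 896 896
```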
Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.
When you download this model, LM Studio picks the underlying model files (the source) that best suit your machine; you can override this choice.