Description
NVIDIA Nemotron 3 Super, a 120B open hybrid MoE model (12B active), supporting a context window of up to 1M tokens
Stats
18.2K downloads · 8 stars
Last updated
Updated 1 day ago
README
General purpose reasoning and chat model trained by NVIDIA. Contains 120B total parameters with only 12B active at a time, using a hybrid LatentMoE architecture with Multi-Token Prediction layers for efficient high-throughput inference.
Features a reasoning toggle to enable or disable intermediate reasoning traces; enabling reasoning improves accuracy on complex queries. A low-effort reasoning mode is also available for faster, more compact reasoning. Includes native agentic capabilities for tool use, making it suitable for collaborative agents, IT ticket automation, RAG systems, chatbots, and other AI-powered applications. Supports multiple languages, including English, Spanish, French, German, Japanese, Italian, and Chinese.
Supports a context length of 1M tokens.
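As a minimal sketch of the native tool-use capability mentioned above, many serving stacks expose tool calling through the OpenAI-compatible `tools` request format. The tool name, model id, and request shape below are illustrative assumptions, not documented specifics of this model's API.

```python
import json

# Hypothetical tool definition in the widely used OpenAI-compatible
# "tools" format. The function name and schema are illustrative only.
ticket_tool = {
    "type": "function",
    "function": {
        "name": "get_ticket_status",
        "description": "Look up the status of an IT ticket by id.",
        "parameters": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
        },
    },
}

# Assembling a chat request that offers the tool to the model.
# "nvidia/nemotron-3-super" is a placeholder model id.
request = {
    "model": "nvidia/nemotron-3-super",
    "messages": [
        {"role": "user", "content": "What's the status of ticket INC-1234?"}
    ],
    "tools": [ticket_tool],
}

print(json.dumps(request, indent=2))
```

Whether the model emits a `tool_calls` response for this request depends on the serving runtime; consult its documentation for the exact wire format.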
Custom Fields
Special features defined by the model author
Enable Thinking: boolean (default=true). Controls whether the model will think before replying.
Low Effort: boolean (default=false). Controls whether the model uses low reasoning effort.
Truncate Thinking History: boolean (default=true). Controls whether thinking history is truncated to save context space.
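A sketch of how these custom fields might be passed in a request body. The snake_case field names, the `custom_fields` key, and the model id are assumptions for illustration; the exact placement depends on the runtime serving the model.

```python
import json

# Build a hypothetical chat request carrying the three custom fields
# from the model card. Defaults mirror the card: enable_thinking=true,
# low_effort=false, truncate_thinking_history=true.
def build_request(prompt: str,
                  enable_thinking: bool = True,
                  low_effort: bool = False,
                  truncate_thinking_history: bool = True) -> dict:
    return {
        "model": "nvidia/nemotron-3-super",  # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
        "custom_fields": {  # key name is an assumption
            "enable_thinking": enable_thinking,
            "low_effort": low_effort,
            "truncate_thinking_history": truncate_thinking_history,
        },
    }

# Example: fast, compact reasoning for a simple query.
body = build_request("Summarize this IT ticket.", low_effort=True)
print(json.dumps(body, indent=2))
```

Disabling thinking trades some accuracy on complex queries for lower latency, per the reasoning-toggle description above.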