
Nemotron 3 Super

17.7K Downloads

NVIDIA Nemotron 3 Super, a 120B-parameter open hybrid MoE model (12B active), supporting a context window of up to 1M tokens

Updated 1 day ago
83.00 GB

Memory Requirements

To run the smallest Nemotron 3 Super, you need at least 83 GB of RAM.
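As a rough back-of-envelope check (assuming the 83 GB figure corresponds to quantized weights for all 120B parameters), the download size implies roughly 5.5 bits per parameter, i.e. a mid-size GGUF quantization:

```python
# Sanity-check the reported file size against the parameter count.
total_params = 120e9   # total parameters (120B)
file_size_gb = 83.0    # reported download size

bytes_per_param = file_size_gb * 1e9 / total_params
bits_per_param = bytes_per_param * 8
print(round(bits_per_param, 1))  # ~5.5 bits/param
```

Actual RAM use will be somewhat higher than the file size once the KV cache and runtime buffers are allocated.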

Capabilities

Nemotron 3 Super models support tool use and reasoning. They are available in GGUF format.

About Nemotron 3 Super


Nemotron-3-Super-120B-A12B is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation. The model has 12B active parameters and 120B parameters in total.

Like other models in the Nemotron family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be configured through a flag in the chat template.
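As a hedged illustration, toggling reasoning typically amounts to setting a directive in the system turn. The `/think` and `/no_think` strings below are placeholders, not confirmed flag names for this model; consult the chat template shipped with the weights for the actual toggle:

```python
def build_messages(user_prompt: str, reasoning: bool = True) -> list:
    """Assemble a chat turn with a hypothetical reasoning toggle.

    Nemotron-family templates commonly gate the reasoning trace through a
    system-level directive; "/think" / "/no_think" are stand-ins here for
    whatever flag the actual chat template defines.
    """
    system = "/think" if reasoning else "/no_think"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# Disable the reasoning trace for a latency-sensitive request.
messages = build_messages("Summarize this IT ticket.", reasoning=False)
```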

The model employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, interleaving Mamba-2 and MoE layers with select attention layers. Unlike the Nano model, the Super model incorporates Multi-Token Prediction (MTP) layers for faster text generation and improved quality, and it is trained using NVFP4 quantization to maximize compute efficiency.

The supported languages include: English, French, German, Italian, Japanese, Spanish, and Chinese.

Model Details

  • Architecture: Mixture of Experts (MoE) with Hybrid Transformer-Mamba Architecture
  • Supports a token budget, delivering optimal accuracy with minimal reasoning-token generation
  • Accuracy: Leading accuracy on Artificial Analysis Intelligence Index
  • Model size: 120B with 12B active parameters
  • Context length: up to 1M tokens
  • Modalities: Text-only
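
In serving stacks that expose the token-budget feature, it would likely surface as a request parameter. The sketch below is purely hypothetical; `max_thinking_tokens` is an invented placeholder, and the real knob name depends on the inference server:

```python
# Hypothetical request body for an OpenAI-compatible endpoint serving the model.
# "max_thinking_tokens" is a placeholder for the token-budget parameter;
# the actual name depends on the serving stack's API.
request = {
    "model": "nemotron-3-super",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "Diagnose this failed backup job."}
    ],
    "max_thinking_tokens": 1024,  # placeholder: cap on reasoning-trace tokens
    "max_tokens": 2048,           # overall completion cap
}
```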

License

This model is provided under the NVIDIA Open Model License.