NVIDIA Nemotron 3 Super, a 120B open hybrid MoE model (12B active) with a context window of up to 1M tokens
To run the smallest Nemotron 3 Super variant, you need at least 83 GB of RAM.
Nemotron 3 Super models support tool use and reasoning, and are available in GGUF format.
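For local inference from a GGUF build, a minimal sketch using llama-cpp-python might look like the following. The file name, quantization level, and context size are placeholders, not official values; point them at the GGUF file and hardware you actually have.

```python
# Minimal sketch: loading a GGUF build of Nemotron 3 Super with
# llama-cpp-python. Path and settings below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="nemotron-3-super-120b-a12b.Q4_K_M.gguf",  # placeholder file name
    n_ctx=32768,      # the model supports up to 1M tokens, but KV-cache RAM grows with n_ctx
    n_gpu_layers=-1,  # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Triage this IT ticket: VPN drops every 10 minutes."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```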

Nemotron-3-Super-120B-A12B is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation. The model has 12B active parameters and 120B parameters in total.
Like other models in the Nemotron family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be configured through a flag in the chat template.
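As a hedged illustration of that chat-template flag, toggling the reasoning trace with Hugging Face transformers could look like the sketch below. The keyword `enable_thinking` and the repository id are assumptions modeled on common chat-template conventions; the official model card defines the actual names.

```python
# Sketch of toggling the reasoning trace through the chat template.
# `enable_thinking` and the repo id are assumptions; check the
# official model card for the actual flag.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("nvidia/Nemotron-3-Super-120B-A12B")  # assumed repo id

messages = [{"role": "user", "content": "Why might this server be unreachable?"}]

prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # assumed flag: emit a reasoning trace before the final answer
)
print(prompt)
```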
The model employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture that interleaves Mamba-2 and MoE layers with a small number of Attention layers. Unlike the Nano model, the Super model incorporates Multi-Token Prediction (MTP) layers for faster text generation and improved quality, and it is trained using NVFP4 quantization to maximize compute efficiency.
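To make the "12B active out of 120B total" figure concrete, here is a conceptual top-k MoE routing sketch. This is not NVIDIA's implementation, and the expert counts and shapes are illustrative; it only shows why each token exercises a small fraction of the total parameters.

```python
# Conceptual sketch (not NVIDIA's code) of sparse MoE routing: each
# token is sent to its top-k experts, so only those experts' weights
# are "active" for that token. All sizes here are illustrative.
import torch
import torch.nn.functional as F

def moe_layer(x, router, experts, k=2):
    """x: (tokens, d); router: nn.Linear(d, n_experts); experts: list of modules."""
    logits = router(x)                                       # (tokens, n_experts)
    weights, idx = torch.topk(F.softmax(logits, dim=-1), k)  # top-k experts per token
    out = torch.zeros_like(x)
    for j in range(k):
        for e in idx[:, j].unique().tolist():                # batch tokens sharing expert e
            mask = idx[:, j] == e
            out[mask] += weights[mask, j:j+1] * experts[e](x[mask])
    return out

# Illustrative usage: with 8 experts and k=2, each token touches
# only a quarter of the expert parameters per layer.
d, n_experts = 64, 8
router = torch.nn.Linear(d, n_experts)
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]
y = moe_layer(torch.randn(16, d), router, experts)
```

The real 12B/120B ratio follows the same principle, just with Nemotron's actual expert configuration.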
Supported languages include English, French, German, Italian, Japanese, Spanish, and Chinese.
This model is provided under the NVIDIA Open Model License.