gpt-oss-safeguard-20b and gpt-oss-safeguard-120b are open safety models from OpenAI, building on gpt-oss and trained to help classify text content based on customizable policies.
To run the smaller gpt-oss-safeguard-20b, you need at least 12 GB of RAM; the larger gpt-oss-safeguard-120b may require up to 65 GB.
gpt-oss-safeguard models support tool use and reasoning. They are available in GGUF and MLX formats.

gpt-oss-safeguard is an open-weight reasoning model from OpenAI trained specifically for safety classification, helping you classify text content based on customizable policies. As a fine-tuned version of gpt-oss, gpt-oss-safeguard is designed to follow explicit written policies that you provide. This enables bring-your-own-policy Trust & Safety AI, where your own taxonomy, definitions, and thresholds guide classification decisions. Well-crafted policies unlock gpt-oss-safeguard's reasoning capabilities, enabling it to handle nuanced content, explain borderline decisions, and adapt to contextual factors.
gpt-oss-safeguard is designed for users who need real-time context and automation at scale.
Download either the 20B or the 120B variant in LM Studio, using the GUI or the `lms` CLI in your terminal:
```bash
# download the 20b variant
lms get openai/gpt-oss-safeguard-20b

# download the 120b variant
lms get openai/gpt-oss-safeguard-120b
```
Then use LM Studio's SDK or its OpenAI Responses API compatibility mode to call the model from your own code.
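For example, here is a minimal sketch using the OpenAI Python client against LM Studio's OpenAI-compatible local server, shown via the Chat Completions endpoint. The default port 1234, the `policy.md` path, and the model identifier are assumptions; adjust them to match your setup and LM Studio's model list.

```python
from openai import OpenAI

# LM Studio exposes an OpenAI-compatible server locally (default port 1234).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Your written policy is the system prompt (see the template below).
policy = open("policy.md").read()
content = "Example message to classify."

response = client.chat.completions.create(
    model="openai/gpt-oss-safeguard-20b",  # assumed identifier; check your local model list
    messages=[
        {"role": "system", "content": policy},
        {"role": "user", "content": content},
    ],
)
print(response.choices[0].message.content)
```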

gpt-oss-safeguard is designed to use your written policy as its governing logic. While most models provide a confidence score based on the features they were trained on and require retraining for any policy changes, gpt-oss-safeguard makes decisions backed by reasoning within the boundaries of a provided taxonomy. This lets T&S teams deploy gpt-oss-safeguard as a policy-aligned reasoning layer within existing moderation or compliance systems. It also means you can update or test new policies instantly without retraining the model.
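To illustrate how only the prompt changes when a policy changes, the sketch below classifies the same message under two different policies. The policies, message, and model identifier are hypothetical; in practice you would supply your own taxonomy, for example one built from the template below.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def classify(policy: str, content: str) -> str:
    """Classify `content` against a written policy passed as the system prompt."""
    response = client.chat.completions.create(
        model="openai/gpt-oss-safeguard-20b",  # assumed identifier
        messages=[
            {"role": "system", "content": policy},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content

strict_policy = "Label 1 if the content solicits off-platform contact; otherwise label 0."
lenient_policy = "Label 1 only if the content shares a full phone number or email; otherwise label 0."

message = "DM me on another app and I'll send my number."
print(classify(strict_policy, message))   # evaluated under the strict taxonomy
print(classify(lenient_policy, message))  # same message, new policy, no retraining
```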
Copy this template and customize it in a new named Preset:
```md
## Policy Definitions

### Key Terms

**[Term 1]**: [Definition]
**[Term 2]**: [Definition]
**[Term 3]**: [Definition]

## Content Classification Rules

### VIOLATES Policy (Label: 1)

Content that:
- [Violation 1]
- [Violation 2]
- [Violation 3]
- [Violation 4]
- [Violation 5]

### DOES NOT Violate Policy (Label: 0)

Content that is:
- [Acceptable 1]
- [Acceptable 2]
- [Acceptable 3]
- [Acceptable 4]
- [Acceptable 5]

## Examples

### Example 1 (Label: 1)
**Content**: "[Example]"
**Expected Response**:

### Example 2 (Label: 1)
**Content**: "[Example]"
**Expected Response**:

### Example 3 (Label: 0)
**Content**: "[Example]"
**Expected Response**:

### Example 4 (Label: 0)
**Content**: "[Example]"
**Expected Response**:
```
gpt-oss-safeguard models are provided under the Apache-2.0 license.