nvidia/Nemotron-3.5-Content-Safety
Nemotron 3.5 Content Safety Model
Model Developer: NVIDIA Corporation
Model Dates: June 2, 2026
Model Overview
The Nemotron 3.5 Content Safety model is a small language model (SLM) that uses Google's Gemma-3-4B-it as the base and is fine-tuned by NVIDIA on multimodal, multilingual, and reasoning-oriented content-safety datasets. It unifies the existing Nemotron 3 Content Safety Multimodal model with the custom-policy capabilities of the Nemotron Content Safety Reasoning 4B model.
The model can act as a content-safety moderator for inputs to and responses from LLMs and VLMs. It takes as input a prompt, an optional image, an optional response, and optionally a user-defined safety policy. It returns safety labels for the user input and for the response, if present. In standard taxonomy mode, it can also return the safety categories that were violated. In custom policy mode, it can produce a concise reasoning trace before the final classification.
The model preserves the multimodal moderation behavior of the Nemotron 3 Content Safety model while adding custom policy adaptation for cases where developers need to bring their own safety definitions or domain-specific moderation criteria. For vanilla safety classification, it uses the same safety taxonomy as the Aegis Content Safety Dataset V2.
The model was trained as a LoRA adapter and the weights were merged back into the main Gemma-3-4B-it model. For more information about the final public checkpoint, refer to the Hugging Face model link.
This model is ready for commercial use.
License/Terms of Use
Use of the model is governed by the OpenMDW License Agreement, version 1.1 (OpenMDW-1.1), the Gemma Terms of Use, and the Gemma Prohibited Use Policy.
Deployment Geography: Global
Use Case
The Nemotron 3.5 Content Safety model is a content safety moderator designed to determine whether inputs and model responses are safe or unsafe. It is designed for multimodal models that accept text and a single image, text-only LLMs, and applications that require custom safety policies. Compared with the previous multimodal model, Nemotron 3.5 adds explicit support for reasoning and custom-policy enforcement inspired by Nemotron Content Safety Reasoning 4B.
Release Date:
Hugging Face [06/02/2026]
Reference(s):
- Nemotron Content Safety Dataset V2
- Nemotron Content Safety Reasoning 4B
- Nemotron Content Safety Reasoning Dataset
- VLGUARD
- MM-SafetyBench
- XSTEST
- Wildguard
- Polyguard
- XSafety
- Multijail
- Aya Redteaming
- LinguaSafe
- Nemotron VLM Dataset V2
Model Architecture
The Nemotron 3.5 Content Safety model is a fine-tuned version of Google's Gemma-3-4B-it model.
- Base Model: Google Gemma-3-4B-it
- Network Architecture: Transformer (decoder-only)
- Vision Encoder: SigLIP, using square images resized to 896 x 896
- Total Parameters: 4 billion (4B)
- Fine-tuning method: LoRA
- Initialization: Weights initialized from Gemma-3-4B-it
- Hyperparameter Tuning: Grid search over learning rate (1e-7, 5e-6, 1e-5, 5e-5, 1e-4) and LoRA rank (16, 32)
- Model Optimization: AdamW optimizer
- Training Parameters: 5 epochs, learning rate 0.0001 (1e-4), LoRA rank 16, LoRA alpha 32
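As a rough illustration of the search space described above, the sketch below enumerates the learning-rate and rank values quoted in this card and checks that the reported final configuration is one of the grid points. The variable names are hypothetical, and the `alpha / rank` scaling shown is the standard LoRA formulation; whether NVIDIA's training setup used exactly that scaling is an assumption, not something the card states.

```python
from itertools import product

# Values quoted in this card's hyperparameter section.
learning_rates = [1e-7, 5e-6, 1e-5, 5e-5, 1e-4]
lora_ranks = [16, 32]

# Every (learning rate, rank) pair that a full grid search would visit.
grid = list(product(learning_rates, lora_ranks))

# Final training configuration reported in the card.
final = {"epochs": 5, "learning_rate": 1e-4, "rank": 16, "alpha": 32}

# Assumption: standard LoRA scales the low-rank update by alpha / rank.
lora_scale = final["alpha"] / final["rank"]

print(len(grid))                                          # 10
print((final["learning_rate"], final["rank"]) in grid)    # True
print(lora_scale)                                         # 2.0
```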
Input
- Input Type(s): Text, Image
- Input Format(s):
  - Text: String
  - Image: URL, including base64-encoded data URL: data:image/jpeg;base64,{base64_image}
- Input Parameters:
  - Text: One-dimensional (1D)
  - Image: Two-dimensional (2D)
- Other Properties Related to Input: Context length up to 128K. Supported languages include English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, Dutch, Italian, Korean and Chinese.
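The base64 image form listed above can be produced with the Python standard library alone. The following is a minimal sketch: `to_data_url` is a hypothetical helper name, and the four-byte stub stands in for the contents of a real JPEG file. How the resulting URL is passed to a given runtime (Transformers, vLLM, or SGLang) depends on that runtime's request format and is not shown here.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The first four bytes of a typical JPEG file, used here in place of a real image.
jpeg_stub = b"\xff\xd8\xff\xe0"
url = to_data_url(jpeg_stub)
print(url)  # data:image/jpeg;base64,/9j/4A==
```

In practice the bytes would come from something like `open(path, "rb").read()`, with `mime_type` adjusted to match the actual image format.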
Output
- Output Type(s): Text
- Output Format: String
- Output Parameters: One-dimensional (1D): Sequences
- Other Properties Related to Output: Multi-line text containing User Safety, Response Safety, and Safety Categories for standard taxonomy mode.

    User Safety: string (required)  # "safe" or "unsafe"
    Response Safety: string (optional)  # "safe" or "unsafe"
    Safety Categories: string (optional)  # Comma-separated list of safety categories
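A downstream application usually needs these fields as structured data. The sketch below parses the standard-taxonomy output into a dictionary, assuming each field appears on its own line as `Key: value`. The function name `parse_safety_output` and the sample string are illustrative only and are not taken from the model's documentation.

```python
def parse_safety_output(text: str) -> dict:
    """Parse 'Key: value' lines from the moderator output (hypothetical helper)."""
    result = {"user_safety": None, "response_safety": None, "safety_categories": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # Skip lines that carry no 'Key: value' pair.
        key = key.strip().lower()
        value = value.strip()
        if key == "user safety":
            result["user_safety"] = value.lower()
        elif key == "response safety":
            result["response_safety"] = value.lower()
        elif key == "safety categories":
            # Categories arrive as a comma-separated list.
            result["safety_categories"] = [c.strip() for c in value.split(",") if c.strip()]
    return result


# Made-up sample output, for illustration only.
sample = "User Safety: unsafe\nResponse Safety: safe\nSafety Categories: Violence, Harassment"
parsed = parse_safety_output(sample)
print(parsed)
```

Fields that the model omits (for example, Response Safety when no response was supplied) stay at their defaults, so callers can distinguish "not evaluated" from "safe".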
For custom-policy reasoning mode, the model can emit a reasoning trace followed by prompt and response harm labels:
    Reasoning trace
    User Safety: string (required)  # "safe" or "unsafe"
    Response Safety: string (optional)  # "safe" or "unsafe"
    Safety Categories: string (optional)  # Comma-separated list of safety categories
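When a reasoning trace is present, it can be separated from the labels before further processing. The sketch below assumes the label block starts at the last line beginning with `User Safety:` and that everything before it is the trace. The excerpt does not say whether the trace is wrapped in special delimiters, so `split_reasoning` is a hypothetical helper that may need adjusting to the model's real output.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning trace, label block). Hypothetical helper."""
    lines = text.splitlines()
    # Scan from the end so that a trace quoting "User Safety:" earlier on
    # does not get mistaken for the start of the label block.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip().startswith("User Safety:"):
            trace = "\n".join(lines[:i]).strip()
            labels = "\n".join(lines[i:]).strip()
            return trace, labels
    # No label line found: treat the whole output as the trace.
    return text.strip(), ""


# Made-up sample output, for illustration only.
sample = (
    "The request asks for refund details, which the policy allows.\n"
    "User Safety: safe\n"
    "Response Safety: safe"
)
trace, labels = split_reasoning(sample)
print(trace)
print(labels)
```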
Our models are designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration
- Runtime Engine(s): Transformers, vLLM, SGLang
- Supported Hardware Microarchitecture Compatibility: NVIDIA RTX PRO 6000 BSE, NVIDIA H100, NVIDIA A100
- Operating System(s): Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet…