ModelNVIDIANVIDIApublished May 22, 2026seen 5d

nvidia/Nemotron-3.5-Content-Safety

Open original ↗

Captured source

source ↗
published May 22, 2026seen 5dcaptured 11hhttp 200method plainlicense otherparams 4.3Bdownloads 727likes 23

Nemotron 3.5 Content Safety Model

Model Developer: NVIDIA Corporation

Model Dates: June 2, 2026

Model Overview

The Nemotron 3.5 Content Safety model is a small language model (SLM) that uses Google's Gemma-3-4B-it as the base and is fine-tuned by NVIDIA on multimodal, multilingual, and reasoning-oriented content-safety datasets. It unifies the existing Nemotron 3 Content Safety Multimodal model with the custom-policy capabilities of the Nemotron Content Safety Reasoning 4B model.

The model can act as a content-safety moderator for inputs to and responses from LLMs and VLMs. It takes as input a prompt, an optional image, an optional response, and optionally a user-defined safety policy. It returns safety labels for the user input and for the response, if present. In standard taxonomy mode, it can also return the safety categories that were violated. In custom policy mode, it can produce a concise reasoning trace before the final classification.

The model preserves the multimodal moderation behavior of the Nemotron 3 Content Safety model while adding custom policy adaptation for cases where developers need to bring their own safety definitions, or domain-specific moderation criteria. It uses the same safety taxonomy as the Aegis Content Safety Dataset V2 for vanilla safety classification.

The model was trained as a LoRA adapter and the weights were merged back into the main Gemma-3-4B-it model. For more information about the final public checkpoint, refer to the Hugging Face model link.

This model is ready for commercial use.

License/Terms of Use

Use of the model is governed by the OpenMDW License Agreement, version 1.1 (OpenMDW-1.1), Gemma Terms of Use and Gemma Prohibited Use Policy.

Deployment Geography: Global

Use Case

The Nemotron 3.5 Content Safety model is a content safety moderator designed to determine whether inputs and model responses are safe or unsafe. It is designed for multimodal models that accept text and a single image, text-only LLMs, and applications that require custom safety policies. Compared with the previous multimodal model, Nemotron 3.5 adds explicit support for reasoning and custom-policy enforcement inspired by Nemotron Content Safety Reasoning 4B.

Release Date:

Huggingface [06/02/2026]

Reference(s):

Model Architecture

The Nemotron 3.5 Content Safety model is a fine-tuned version of Google's Gemma-3-4B-it model.

  • Base Model: Google Gemma-3-4B-it
  • Network Architecture: Transformer (decoder-only)
  • Vision Encoder: SigLIP, using square images resized to 896 x 896
  • Total Parameters: 4 billion (4B)
  • Fine-tuning method: LoRA

Initialization: weight initialization from Gemma-3-4b-it. Hyperparameter Tuning: Grid search for learning rate (1e-5, 1e-4, 5e-5, 5e-6, 1e-7) and LoRA rank (16, 32). Model Optimization: AdamW optimizer. Training Parameters: 5 epochs, 0.0001 learning rate, rank 16, alpha 32.

Input

  • Input Type(s): Text, Image
  • Input Format(s):
  • Text: String
  • Image: URL, including base64 encoded URL: data:image/jpeg;base64,{base64_image}
  • Input Parameters:
  • Text: One-dimensional (1D)
  • Image: Two-dimensional (2D)
  • Other Properties Related to Input: Context length up to 128K. Supported languages include English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, Dutch, Italian, Korean and Chinese.

Output

  • Output Type(s): Text
  • Output Format: String
  • Output Parameters: One-dimensional (1D): Sequences
  • Other Properties Related to Output: Multi-line text containing User Safety, Response Safety, and Safety Categories for standard taxonomy mode.
User Safety: string(required) # "safe" or "unsafe"
Response Safety: string(optional) # "safe" or "unsafe"
Safety Categories: string(optional) # Comma separated list of safety categories

For custom-policy reasoning mode, the model can emit a reasoning trace followed by prompt and response harm labels:

Reasoning trace

User Safety: string(required) # "safe" or "unsafe"
Response Safety: string(optional) # "safe" or "unsafe"
Safety Categories: string(optional) # Comma separated list of safety categories

Our models are designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration

  • Runtime Engine(s): Transformers, vLLM, SGLang
  • Supported Hardware Microarchitecture Compatibility: NVIDIA RTX PRO 6000 BSE, NVIDIA H100, NVIDIA A100
  • Operating System(s): Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet…

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

Routine model release, low traction