What does this model signal mean?

NVIDIA published nvidia/Nemotron-3.5-Content-Safety. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license other · 7.4K HF downloads · Routine model release, low traction. onlylabs links this event to 1 captured evidence page and 6 related model signals. It also maps to Safety and policy in the data-business radar.

NVIDIA Model: nvidia/Nemotron-3.5-Content-Safety

Captured source

Hugging Face: huggingface.co/nvidia/Nemotron-3.5-Content-Safety

nvidia/Nemotron-3.5-Content-Safety model card

published May 22, 2026 · seen Jun 6 · captured Jun 11 · HTTP 200 · method plain · license other · params 4.3B · downloads 7.4k · likes 34

Nemotron 3.5 Content Safety Model

Model Developer: NVIDIA Corporation

Model Dates: June 2, 2026

Model Overview

The Nemotron 3.5 Content Safety model is a small language model (SLM) that uses Google's Gemma-3-4B-it as the base and is fine-tuned by NVIDIA on multimodal, multilingual, and reasoning-oriented content-safety datasets. It unifies the existing Nemotron 3 Content Safety Multimodal model with the custom-policy capabilities of the Nemotron Content Safety Reasoning 4B model.

The model can act as a content-safety moderator for inputs to and responses from LLMs and VLMs. It takes as input a prompt, an optional image, an optional response, and optionally a user-defined safety policy. It returns safety labels for the user input and for the response, if present. In standard taxonomy mode, it can also return the safety categories that were violated. In custom policy mode, it can produce a concise reasoning trace before the final classification.

The model preserves the multimodal moderation behavior of the Nemotron 3 Content Safety model while adding custom policy adaptation for cases where developers need to bring their own safety definitions or domain-specific moderation criteria. For standard (non-custom) safety classification, it uses the same safety taxonomy as the Aegis Content Safety Dataset V2.

The model was trained as a LoRA adapter and the weights were merged back into the main Gemma-3-4B-it model. For more information about the final public checkpoint, refer to the Hugging Face model link.
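The merge step described above can be illustrated with a toy example. This is a minimal sketch of the commonly used LoRA merge rule, W' = W + (alpha / rank) * B A, on tiny hand-written matrices; the helper names `matmul` and `merge_lora` are hypothetical, and the model card does not state which tooling or scaling convention NVIDIA used.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the conventional merged weight."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (illustrative values only).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (2, 1)
lora_a = [[0.5, 0.25]]    # shape (1, 2)

merged = merge_lora(w, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

After merging, inference uses a single weight matrix per layer, so a merged checkpoint loads like an ordinary fine-tuned model with no separate adapter file.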

This model is ready for commercial use.

License/Terms of Use

Use of the model is governed by the OpenMDW License Agreement, version 1.1 (OpenMDW-1.1), the Gemma Terms of Use, and the Gemma Prohibited Use Policy.

Deployment Geography: Global

Use Case

The Nemotron 3.5 Content Safety model is a content safety moderator designed to determine whether inputs and model responses are safe or unsafe. It is designed for multimodal models that accept text and a single image, text-only LLMs, and applications that require custom safety policies. Compared with the previous multimodal model, Nemotron 3.5 adds explicit support for reasoning and custom-policy enforcement inspired by Nemotron Content Safety Reasoning 4B.

Release Date:

Hugging Face: 06/02/2026

Reference(s):

Model Architecture

The Nemotron 3.5 Content Safety model is a fine-tuned version of Google's Gemma-3-4B-it model.

Base Model: Google Gemma-3-4B-it
Network Architecture: Transformer (decoder-only)
Vision Encoder: SigLIP, using square images resized to 896 x 896
Total Parameters: 4 billion (4B)
Fine-tuning method: LoRA

Initialization: Weight initialization from Gemma-3-4b-it
Hyperparameter Tuning: Grid search over learning rate (1e-5, 1e-4, 5e-5, 5e-6, 1e-7) and LoRA rank (16, 32)
Model Optimization: AdamW optimizer
Training Parameters: 5 epochs, learning rate 0.0001, LoRA rank 16, LoRA alpha 32
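To make the search space concrete, the sketch below enumerates the grid listed above and checks that the reported training configuration is one of its points. The variable names are illustrative, and the `alpha / rank` scale assumes the common LoRA convention, which the model card does not confirm.

```python
from itertools import product

# Values as listed in the model card.
learning_rates = [1e-5, 1e-4, 5e-5, 5e-6, 1e-7]
lora_ranks = [16, 32]

# Every (learning rate, rank) pair the grid search would cover.
grid = list(product(learning_rates, lora_ranks))

# Reported final training parameters.
chosen = {"epochs": 5, "learning_rate": 0.0001, "rank": 16, "alpha": 32}
chosen_in_grid = (chosen["learning_rate"], chosen["rank"]) in grid

# Assumption: the adapter update is scaled by alpha / rank.
lora_scale = chosen["alpha"] / chosen["rank"]

print(len(grid), chosen_in_grid, lora_scale)
```

Under that assumption, the grid has 10 candidate configurations and the reported setting applies the adapter update with a scale of 2.0.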

Input

Input Type(s): Text, Image
Input Format(s):
Text: String
Image: URL, including base64 encoded URL: data:image/jpeg;base64,{base64_image}
Input Parameters:
Text: One-dimensional (1D)
Image: Two-dimensional (2D)

Other Properties Related to Input: Context length up to 128K. Supported languages include English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, Dutch, Italian, Korean and Chinese.

Output

Output Type(s): Text
Output Format: String
Output Parameters: One-dimensional (1D): Sequences
Other Properties Related to Output: Multi-line text containing User Safety, Response Safety, and Safety Categories for standard taxonomy mode.

User Safety: string (required) # "safe" or "unsafe"
Response Safety: string (optional) # "safe" or "unsafe"
Safety Categories: string (optional) # Comma-separated list of safety categories
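A downstream application usually needs these fields as structured data. Below is a minimal parsing sketch based only on the three field names documented above; `parse_safety_output` is a hypothetical helper, and the category names in the sample string are illustrative placeholders rather than confirmed model output.

```python
def parse_safety_output(text: str) -> dict:
    """Parse the multi-line standard-taxonomy output into a dictionary."""
    result = {"user_safety": None, "response_safety": None, "safety_categories": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        key = key.strip().lower()
        value = value.strip()
        if key == "user safety":
            result["user_safety"] = value.lower()
        elif key == "response safety":
            result["response_safety"] = value.lower()
        elif key == "safety categories":
            result["safety_categories"] = [c.strip() for c in value.split(",") if c.strip()]
    # User Safety is documented as required, so treat its absence as an error.
    if result["user_safety"] not in ("safe", "unsafe"):
        raise ValueError("missing or invalid 'User Safety' label")
    return result


sample = "User Safety: unsafe\nResponse Safety: safe\nSafety Categories: Violence, Harassment"
parsed = parse_safety_output(sample)
print(parsed)
```

Failing loudly when the required label is missing is a deliberate choice here: silently defaulting to "safe" would be the riskier behavior for a moderation pipeline.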

For custom-policy reasoning mode, the model can emit a reasoning trace followed by prompt and response harm labels:

Reasoning trace

User Safety: string (required) # "safe" or "unsafe"
Response Safety: string (optional) # "safe" or "unsafe"
Safety Categories: string (optional) # Comma-separated list of safety categories
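In custom-policy mode the labels follow free-form reasoning text, so a consumer has to separate the two. The sketch below assumes the trace is everything before the first line starting with "User Safety:"; the excerpt does not document any explicit delimiter, so this boundary rule, the helper name `split_reasoning_output`, and the sample text are all assumptions.

```python
def split_reasoning_output(text: str) -> tuple[str, dict]:
    """Split custom-policy output into (reasoning trace, label dictionary)."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.strip().lower().startswith("user safety:"):
            trace = "\n".join(lines[:index]).strip()
            labels = {}
            for label_line in lines[index:]:
                key, sep, value = label_line.partition(":")
                if sep:
                    labels[key.strip()] = value.strip()
            return trace, labels
    raise ValueError("no 'User Safety:' line found in model output")


sample = (
    "The policy forbids sharing account numbers.\n"
    "The prompt asks for one, and the response refuses.\n"
    "\n"
    "User Safety: unsafe\n"
    "Response Safety: safe\n"
)
trace, labels = split_reasoning_output(sample)
print(labels)
```

Keeping the trace separate lets an application log it for audit purposes while acting only on the final labels.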

Our models are designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration

Runtime Engine(s): Transformers, vLLM, SGLang
Supported Hardware Microarchitecture Compatibility: NVIDIA RTX PRO 6000 BSE, NVIDIA H100, NVIDIA A100
Operating System(s): Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet...

Notability

notability 4.0/10

Routine model release, low traction