
NVIDIA/harmonizer


Description: Harmonizer is an online generative enhancement framework that transforms renderings from imperfect scenes into temporally consistent outputs while improving their realism.

Language: Python

License: Apache-2.0

Stars: 24

Forks: 1

Open issues: 1

Created: 2026-05-19T15:39:32Z

Pushed: 2026-06-01T22:46:00Z

Default branch: main

Fork: no

Archived: no

README:

Harmonizer

Harmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer

Yuxuan Zhang*, Katarína Tóthová*, Zian Wang, Kangxue Yin, Haithem Turki, Riccardo de Lutio, Yen-Yu Chang, Or Litany, Sanja Fidler, Zan Gojcic
_(* equal contribution)_

CVPR 2026

Project Page | Paper

About

Simulation is essential to the development and evaluation of autonomous robots such as self-driving vehicles. Neural reconstruction methods (e.g. NeRF, 3D Gaussian Splatting) are a promising way to build simulators from real-world data, but reconstructed scenes often contain artifacts in novel views and struggle to realistically incorporate inserted dynamic objects from different scenes.

Harmonizer is an online generative enhancement framework that transforms renderings from such imperfect scenes into temporally consistent outputs while improving their realism. It distills a pretrained multi-step diffusion model into a single-step, temporally conditioned enhancer that runs on a single GPU inside online simulators. A specialized data-curation pipeline produces synthetic-real training pairs that target three failure modes: appearance harmonization, artifact correction, and lighting realism.
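To make "online" and "temporally conditioned" concrete, here is a minimal conceptual sketch of a frame-by-frame enhancement loop. It is not the repository's implementation: `enhance_stream` and `enhance_fn` are hypothetical names, and the assumption that the enhancer is conditioned on the previous enhanced frame is ours, since this README does not describe how the temporal conditioning is wired.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

Frame = TypeVar("Frame")


def enhance_stream(
    rendered_frames: Iterable[Frame],
    enhance_fn: Callable[[Frame, Optional[Frame]], Frame],
) -> Iterator[Frame]:
    """Enhance frames one at a time, feeding each output back as context.

    `enhance_fn(frame, previous_output)` is a hypothetical stand-in for a
    single-step enhancer; `previous_output` is None for the first frame.
    """
    previous_output: Optional[Frame] = None
    for frame in rendered_frames:
        # One enhancer call per frame, so no future frames are needed.
        enhanced = enhance_fn(frame, previous_output)
        yield enhanced
        previous_output = enhanced


if __name__ == "__main__":
    # Toy stand-in: blend each "frame" (a float) with the previous output.
    def toy_enhancer(frame: float, prev: Optional[float]) -> float:
        return frame if prev is None else 0.5 * frame + 0.5 * prev

    print(list(enhance_stream([0.0, 1.0, 1.0], toy_enhancer)))
```

Because the loop only looks backward, it can consume frames as a simulator produces them, which is the property the single-step distillation is meant to make affordable.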

Setup

  • Environment: We use nvcr.io/nvidia/pytorch:25.10-py3 as the base environment for training and inference with the pretrained model.
  • Build Docker Image:
# For training and standard inference
docker build -t harmonizer-cosmos-env -f Dockerfile.cosmos .
  • Format Code:
uvx ruff format
uvx ruff check --fix

Download Pretrained Checkpoint

Pretrained Harmonizer checkpoints are hosted on Hugging Face at nvidia/Harmonizer. Inference also requires the base Cosmos-Predict2-0.6B-Text2Image model. The download_checkpoints.sh helper fetches both and places them in the directories the code expects:

# Install Hugging Face CLI if not already installed
pip install "huggingface_hub[cli]"

# Login to Hugging Face
hf auth login

# Download all required checkpoints (Harmonizer + base Cosmos model)
./download_checkpoints.sh

This places the Harmonizer checkpoints in models/ (diffusion_harmonizer.pkl, harmonizer_nontemporal.pt) and the base Cosmos-Predict2-0.6B-Text2Image model (DiT model.pt + tokenizer) in src/checkpoints/nvidia/Cosmos-Predict2-0.6B-Text2Image/.
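Before launching inference, it can help to confirm that the download landed where the code expects. The sketch below checks the paths named above. `missing_checkpoints` is a hypothetical helper, not part of the repository, and it assumes that model.pt sits directly inside the Cosmos model directory. The tokenizer is not checked because its file names are not given here.

```python
from pathlib import Path

# Paths are taken from the README text; adjust them if the repo layout changes.
EXPECTED_CHECKPOINTS = (
    "models/diffusion_harmonizer.pkl",
    "models/harmonizer_nontemporal.pt",
    # Assumption: the DiT weights sit directly inside the Cosmos model directory.
    "src/checkpoints/nvidia/Cosmos-Predict2-0.6B-Text2Image/model.pt",
)


def missing_checkpoints(repo_root: str = ".") -> list[str]:
    """Return the expected checkpoint paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected checkpoints found.")
```

Run it from the repository root; an empty result means the three files listed above are in place.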

Inference

1. Inference with the pretrained model (temporal)

Use inference_pix2pix_turbo_harmonizer.py with diffusion_harmonizer.pkl (the paper checkpoint):

# Run the Cosmos container:
docker run --gpus=all -it --ipc=host \
  -v "$(pwd)":/work \
  harmonizer-cosmos-env

# Inside the container, run inference (the script lives in src/ and imports sibling modules):
cd /work/src
python inference_pix2pix_turbo_harmonizer.py \
  --input_image /work/examples \
  --model_path /work/models/diffusion_harmonizer.pkl \
  --model_identifier "harmonizer_inference" \
  --timestep 250 --resolution 1024 --use_sched
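To run the same command over several input directories, the argument list can be assembled in Python. This is a sketch under stated assumptions: `build_inference_command` is a hypothetical helper, it uses only the flags shown above with the same default values, and it does not attempt to document what each flag does.

```python
import shlex
import sys


def build_inference_command(
    input_dir: str,
    model_path: str = "/work/models/diffusion_harmonizer.pkl",
    model_identifier: str = "harmonizer_inference",
    timestep: int = 250,
    resolution: int = 1024,
) -> list[str]:
    """Assemble the README's inference command for one input directory."""
    return [
        "python", "inference_pix2pix_turbo_harmonizer.py",
        "--input_image", input_dir,
        "--model_path", model_path,
        "--model_identifier", model_identifier,
        "--timestep", str(timestep),
        "--resolution", str(resolution),
        "--use_sched",
    ]


if __name__ == "__main__":
    # Print one shell-quoted command per directory given on the command line.
    for input_dir in sys.argv[1:]:
        print(shlex.join(build_inference_command(input_dir)))
```

The printed commands are meant to be run from /work/src inside the container, as in the example above. Whether separate runs need distinct `--model_identifier` values is not stated here, so the helper leaves it as a parameter.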

Download Training Data (Coming Soon!)

The Harmonizer training set is composed of synthetic–real image pairs from five data sources, each targeting a specific failure mode of neural-reconstruction renderings. The full assembled dataset will be hosted on Hugging Face; once it is released, download it with:

hf download nvidia/Harmonizer-Dataset --repo-type dataset --local-dir data

The downloaded archive follows the JSON layout described in [Data Preparation](#1-data-preparation).
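The same download can be scripted from Python with `huggingface_hub.snapshot_download`, which is useful inside a data-preparation script. This is a minimal sketch: `download_harmonizer_dataset` is a hypothetical wrapper, and the call will only succeed once the dataset repository is actually published.

```python
def download_harmonizer_dataset(local_dir: str = "data") -> str:
    """Download the dataset repo named in the README and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="nvidia/Harmonizer-Dataset",
        repo_type="dataset",
        local_dir=local_dir,
    )
```

Calling `download_harmonizer_dataset()` mirrors the CLI command above and writes into `data/`. Authentication, if required, is picked up from the earlier `hf auth login`.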

Data sources and curation pipelines

The five data sources are complementary. The summary table below lists the failure mode each one targets and where to find its curation tooling; detailed pair-construction recipes follow.

| Data source | Failure mode targeted | Curation codebase |
| --- | --- | --- |
| ISP Modification | ISP-induced color / tone drift between foreground and background | [scripts/data_curation/isp_modification.py](./scripts/data_curation/isp_modification.py) *(placeholder; script will be added to this repo)* |
| Relighting | Illumination mismatch between inserted objects and scene lighting | DiffusionRenderer |
| Asset Re-insertion | Missing shadows / appearance mismatch when dynamic assets are re-inserted | Asset Harvester |
| PBR Shadow Simulation | Missing or unrealistic cast shadows on inserted objects | Internal CG-based simulation pipeline; simulated dataset open-sourced on Hugging Face *(placeholder link; TBD)* |
| Artifacts Correction | Novel-view rendering artifacts: blurred details, missing regions, ghosting, spurious geometry | Difix3D+ |

Per-source curation recipes

  • ISP Modification. Targets ISP-induced color and tone inconsistencies between foreground and background. Given a real capture, we segment the foreground with SAM 2 and re-render the masked region through a software ISP with randomized tone mapping, exposure, and white balance; the unmodified capture serves as the target. Curation script:…
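The ISP Modification recipe can be illustrated with a small NumPy sketch. It is not the repository's curation script: the foreground mask is taken as given (the recipe obtains it from SAM 2), the software ISP is replaced by simple per-pixel exposure, white-balance, and gamma operations on RGB values, and every function name and parameter range is an illustrative assumption.

```python
import numpy as np


def perturb_foreground_isp(
    image: np.ndarray,
    mask: np.ndarray,
    rng: np.random.Generator,
) -> np.ndarray:
    """Apply a random exposure, white-balance, and tone change inside `mask`.

    image: float array in [0, 1], shape (H, W, 3).
    mask:  boolean array, shape (H, W); True marks foreground pixels.
    All parameter ranges below are illustrative guesses, not the paper's values.
    """
    exposure = 2.0 ** rng.uniform(-1.0, 1.0)   # up to one stop darker or brighter
    wb_gains = rng.uniform(0.8, 1.2, size=3)   # per-channel white-balance gain
    gamma = rng.uniform(0.7, 1.4)              # simple stand-in for tone mapping

    modified = np.clip(image * exposure * wb_gains, 0.0, 1.0) ** gamma
    # Keep the background untouched; only foreground pixels are altered.
    return np.where(mask[..., None], modified, image)


def make_training_pair(image: np.ndarray, mask: np.ndarray, seed: int = 0):
    """Return (input, target): the perturbed image and the unmodified capture."""
    rng = np.random.default_rng(seed)
    return perturb_foreground_isp(image, mask, rng), image
```

The pair mirrors the recipe's structure: the perturbed image plays the role of the inconsistent rendering, and the original capture is the target the enhancer should recover.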
