NVIDIA/harmonizer
Description: Harmonizer is an online generative enhancement framework that transforms renderings from imperfect scenes into temporally consistent outputs while improving their realism.
Language: Python
License: Apache-2.0
Stars: 24
Forks: 1
Open issues: 1
Created: 2026-05-19T15:39:32Z
Pushed: 2026-06-01T22:46:00Z
Default branch: main
Fork: no
Archived: no
README:
Harmonizer
Harmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer
Yuxuan Zhang*, Katarína Tóthová*, Zian Wang, Kangxue Yin, Haithem Turki, Riccardo de Lutio, Yen-Yu Chang, Or Litany, Sanja Fidler, Zan Gojcic
_(* equal contribution)_
CVPR 2026
Project Page | Paper
About
Simulation is essential to the development and evaluation of autonomous robots such as self-driving vehicles. Neural reconstruction methods (e.g. NeRF, 3D Gaussian Splatting) are a promising way to build simulators from real-world data, but reconstructed scenes often contain artifacts in novel views and struggle to realistically incorporate inserted dynamic objects from different scenes.
Harmonizer is an online generative enhancement framework that transforms renderings from such imperfect scenes into temporally consistent outputs while improving their realism. It distills a pretrained multi-step diffusion model into a single-step, temporally-conditioned enhancer that runs on a single GPU inside online simulators. A specialized data-curation pipeline produces synthetic-real training pairs that target three failure modes: appearance harmonization, artifact correction, and lighting realism.
Setup
- Environment:
  - We use nvcr.io/nvidia/pytorch:25.10-py3 as the base environment for training and inference with the pretrained model.
- Build Docker Image:
    # For training and standard inference
    docker build -t harmonizer-cosmos-env -f Dockerfile.cosmos .
- Format Code:
    uvx ruff format
    uvx ruff check --fix
Download Pretrained Checkpoint
Pretrained Harmonizer checkpoints are hosted on nvidia/Harmonizer. Inference also requires the base Cosmos-Predict2-0.6B-Text2Image model. The download_checkpoints.sh helper fetches everything and places it in the directories the code expects:
    # Install Hugging Face CLI if not already installed
    # (quoted so shells such as zsh do not treat the brackets as a glob)
    pip install "huggingface_hub[cli]"

    # Login to Hugging Face
    hf auth login

    # Download all required checkpoints (Harmonizer + base Cosmos model)
    ./download_checkpoints.sh
This places the Harmonizer checkpoints in models/ (diffusion_harmonizer.pkl, harmonizer_nontemporal.pt) and the base Cosmos-Predict2-0.6B-Text2Image model (DiT model.pt + tokenizer) in src/checkpoints/nvidia/Cosmos-Predict2-0.6B-Text2Image/.
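Before running inference, it can help to confirm that the download landed where the code expects. The following is a minimal sketch, not part of the repository: the paths are copied from the paragraph above, while the helper name `missing_checkpoints` and the `EXPECTED_CHECKPOINTS` tuple are hypothetical.

```python
from pathlib import Path

# Paths copied from the README text; the helper itself is illustrative only.
EXPECTED_CHECKPOINTS = (
    "models/diffusion_harmonizer.pkl",
    "models/harmonizer_nontemporal.pt",
    "src/checkpoints/nvidia/Cosmos-Predict2-0.6B-Text2Image",
)


def missing_checkpoints(repo_root: str = ".") -> list[str]:
    """Return the expected checkpoint paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected checkpoints found.")
```

Run it from the repository root. It only checks that the paths exist, so it says nothing about whether the files are complete.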
Inference
1. Inference with the pretrained model (temporal)
Use inference_pix2pix_turbo_harmonizer.py with the diffusion_harmonizer.pkl (paper checkpoint):
    # Run the Cosmos container:
    docker run --gpus=all -it --ipc=host \
      -v $(pwd):/work \
      harmonizer-cosmos-env

    # Inside the container, run inference (the script lives in src/ and imports sibling modules):
    cd /work/src
    python inference_pix2pix_turbo_harmonizer.py \
      --input_image /work/examples \
      --model_path /work/models/diffusion_harmonizer.pkl \
      --model_identifier "harmonizer_inference" \
      --timestep 250 --resolution 1024 --use_sched
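If you want to drive the same command from Python, for example to loop over several input folders, a small wrapper might look like the sketch below. The flag names and default values come from the command above. The function names `build_inference_command` and `run_inference` are hypothetical, and the sketch assumes `--use_sched` is a plain on/off switch, which the README does not state explicitly.

```python
import subprocess


def build_inference_command(
    input_path: str,
    model_path: str,
    model_identifier: str = "harmonizer_inference",
    timestep: int = 250,
    resolution: int = 1024,
    use_sched: bool = True,
) -> list[str]:
    """Assemble the argv list for inference_pix2pix_turbo_harmonizer.py."""
    cmd = [
        "python",
        "inference_pix2pix_turbo_harmonizer.py",
        "--input_image", input_path,
        "--model_path", model_path,
        "--model_identifier", model_identifier,
        "--timestep", str(timestep),
        "--resolution", str(resolution),
    ]
    if use_sched:
        # Assumption: --use_sched takes no value (boolean switch).
        cmd.append("--use_sched")
    return cmd


def run_inference(cmd: list[str], src_dir: str = "/work/src") -> None:
    """Run the command from src/, since the script imports sibling modules."""
    subprocess.run(cmd, cwd=src_dir, check=True)
```

Inside the container, `run_inference(build_inference_command("/work/examples", "/work/models/diffusion_harmonizer.pkl"))` would then reproduce the shell invocation shown above.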
Download Training Data (Coming Soon!)
The Harmonizer training set consists of synthetic-real image pairs drawn from five data sources. The full assembled dataset will be hosted on Hugging Face; once it is released, download it with:
hf download nvidia/Harmonizer-Dataset --repo-type dataset --local-dir data
The downloaded archive follows the JSON layout described in [Data Preparation](#1-data-preparation).
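The same download can also be scripted from Python. This is a rough sketch that assumes the `huggingface_hub` package is installed and that the dataset has been published under the repo id given above; the wrapper name `download_training_data` is hypothetical.

```python
def download_training_data(
    local_dir: str = "data",
    repo_id: str = "nvidia/Harmonizer-Dataset",
) -> str:
    """Download the dataset snapshot and return the local folder path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",  # mirrors `--repo-type dataset` on the CLI
        local_dir=local_dir,
    )
```

Calling `download_training_data()` should mirror the `hf download` command above. Until the dataset is released, expect the call to fail with a repository-not-found error.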
Data sources and curation pipelines
The five sources are complementary: each targets a specific failure mode of neural-reconstruction renderings. The table below lists that failure mode and the curation tooling for each source; detailed pair-construction recipes follow.
| Data source | Failure mode targeted | Curation codebase |
| --- | --- | --- |
| ISP Modification | ISP-induced color / tone drift between foreground and background | [scripts/data_curation/isp_modification.py](./scripts/data_curation/isp_modification.py) *(placeholder; script will be added to this repo)* |
| Relighting | Illumination mismatch between inserted objects and scene lighting | DiffusionRenderer |
| Asset Re-insertion | Missing shadows / appearance mismatch when dynamic assets are re-inserted | Asset Harvester |
| PBR Shadow Simulation | Missing or unrealistic cast shadows on inserted objects | Internal CG-based simulation pipeline; simulated dataset open-sourced on Hugging Face *(placeholder link, TBD)* |
| Artifacts Correction | Novel-view rendering artifacts: blurred details, missing regions, ghosting, spurious geometry | Difix3D+ |
Per-source curation recipes
- ISP Modification. Targets ISP-induced color and tone inconsistencies between foreground and background. Given a real capture, we segment the foreground with SAM 2 and re-render the masked region through a software ISP with randomized tone mapping, exposure, and white balance; the unmodified capture serves as the target. Curation script:…
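The ISP Modification recipe can be illustrated with a toy example. The sketch below is not the repository's curation script (which is still a placeholder). It assumes a foreground mask is already available, standing in for the SAM 2 output, and it models the software ISP as a simple exposure gain, per-channel white-balance gains, and a gamma tone curve. The function names and parameter ranges are hypothetical.

```python
import random

Pixel = tuple[float, float, float]  # RGB values in [0, 1]


def apply_isp(pixel: Pixel, exposure: float, wb_gains: Pixel, gamma: float) -> Pixel:
    """Apply exposure gain, per-channel white balance, then a gamma tone curve."""
    out = []
    for value, wb in zip(pixel, wb_gains):
        scaled = min(max(value * exposure * wb, 0.0), 1.0)  # clip to [0, 1]
        out.append(scaled ** gamma)
    return tuple(out)


def make_training_pair(image, mask, rng):
    """Return (input, target): perturbed foreground as input, original as target."""
    # Hypothetical parameter ranges, chosen only for illustration.
    exposure = rng.uniform(0.7, 1.3)
    wb_gains = (rng.uniform(0.9, 1.1), 1.0, rng.uniform(0.9, 1.1))
    gamma = rng.uniform(0.8, 1.2)
    perturbed = [
        [apply_isp(px, exposure, wb_gains, gamma) if fg else px
         for px, fg in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
    return perturbed, image


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[(0.5, 0.5, 0.5), (0.2, 0.4, 0.6)]]  # 1x2 toy image
    mask = [[True, False]]  # first pixel is foreground
    model_input, target = make_training_pair(image, mask, rng)
    print(model_input, target)
```

Only masked pixels are altered, so the background of the input matches the target exactly. That matches the recipe's intent: the mismatch to be learned is confined to the foreground.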
Excerpt shown — open the source for the full document.
Notability: 3.0/10 (routine new repo, low traction)