Repo · NVIDIA · published Mar 16, 2026 · seen 5d

NVIDIA/asset-harvester

Python


NVIDIA/asset-harvester

Description: Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

Language: Python

License: Apache-2.0

Stars: 208

Forks: 16

Open issues: 0

Created: 2026-03-16T01:21:18Z

Pushed: 2026-05-22T22:41:08Z

Default branch: main

Fork: no

Archived: no

README:

Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

NVIDIA

Abstract

Closed-loop simulation is a core component of autonomous vehicle (AV) development, enabling scalable testing, training, and safety validation before real-world deployment. Neural scene reconstruction converts driving logs into interactive 3D environments for simulation, but it does not produce complete 3D object assets required for agent manipulation and large-viewpoint novel-view synthesis. To address this challenge, we present Asset Harvester, an image-to-3D model and end-to-end pipeline that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. Rather than relying on a single model component, we developed a system-level design for real-world AV data that combines large-scale curation of object-centric training tuples, geometry-aware preprocessing across heterogeneous sensors, and a robust training recipe that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. Within this system, SparseViewDiT is explicitly designed to address limited-angle views and other real-world data challenges. Together with hybrid data curation, augmentation, and self-distillation, this system enables scalable conversion of sparse AV object observations into reusable 3D assets.

Asset Harvester turns real-world driving logs into complete, simulation-ready 3D assets — from just one or a few in-the-wild object views. It handles vehicles, pedestrians, riders, and other road objects, even under heavy occlusion, noisy calibration, and extreme viewpoint bias. A multiview diffusion model generates consistent novel viewpoints, and a feed-forward Gaussian reconstructor lifts them to full 3D in seconds. The result: high-fidelity 3D Gaussian splat assets ready for insertion into simulation environments. The pipeline plugs directly into NVIDIA NCore and NuRec for scalable data ingestion and closed-loop simulation.

Pipeline Overview

NCore V4 Data ─► NCore Parsing ─► Multiview Diffusion + Gaussian Lifting ─► [metadata.yaml](docs/end_to_end_example.md#step-3-generate-external-assets-metadata-to-use-with-nvidia-omniverse-nurec-optional) (required for NuRec Object Insertion)

Figure: Input View | Multiview Diffusion (2 of 16 views shown) | 3D Gaussian Lifting

User Guide

For end-to-end asset harvesting from recorded driving sessions, see our [Full End-to-End Workflow](docs/end_to_end_example.md).

Setup

Prerequisites

  • conda (Miniconda or Miniforge)
  • NVIDIA driver >= 570 (CUDA 12.8 compatible)
  • GCC 10–13 (tested with GCC 12.3)
  • GPU VRAM ~16 GB (add --offload_model_to_cpu to offload unused models to CPU for lower VRAM)
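
The prerequisites above can be checked before running setup.sh. The following is a minimal sketch, not part of the repository: the helper names (`major_version`, `driver_ok`, `gcc_ok`, `run`) are hypothetical, and it assumes `gcc` and `nvidia-smi` are on the PATH. It only compares major version numbers against the ranges listed above and prints the reported VRAM for manual comparison.

```python
import shutil
import subprocess


def major_version(version):
    """Return the leading integer of a dotted version string such as '570.86.15'."""
    return int(version.strip().split(".")[0])


def driver_ok(version):
    # Prerequisite from the list above: NVIDIA driver >= 570.
    return major_version(version) >= 570


def gcc_ok(version):
    # Prerequisite from the list above: GCC 10 to 13.
    return 10 <= major_version(version) <= 13


def run(cmd):
    """Run a command; return its stdout, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


def main():
    print("conda found:", shutil.which("conda") is not None)

    gcc = run(["gcc", "-dumpfullversion", "-dumpversion"])
    print("gcc:", gcc, "->", "ok" if gcc and gcc_ok(gcc) else "check manually")

    smi = run(["nvidia-smi", "--query-gpu=driver_version,memory.total",
               "--format=csv,noheader,nounits"])
    # One line per GPU, e.g. "570.86.15, 24564" (memory in MiB).
    for line in (smi or "").splitlines():
        driver, vram_mib = [field.strip() for field in line.split(",")]
        print("driver:", driver, "->", "ok" if driver_ok(driver) else "too old")
        print("VRAM (MiB):", vram_mib, "(about 16 GB is suggested above)")


if __name__ == "__main__":
    main()
```

A failed check here is only a hint; setup.sh remains the authoritative way to find out whether the environment builds.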

> Note: Initial setup takes ~20 minutes to complete.

git clone https://github.com/NVIDIA/asset-harvester.git
cd asset-harvester
bash setup.sh
conda activate asset-harvester

> Note: setup.sh also accepts options, for example: bash setup.sh --env-name asset-harvester --python 3.10

The bash script setup.sh handles the full environment setup for this repo.

If you need a manual install from a checkout, preinstall the pinned gsplat build first, then install the repo with the extras you need:

pip install --no-cache-dir --no-build-isolation \
"git+https://github.com/nerfstudio-project/gsplat.git@b60e917c95afc449c5be33a634f1f457e116ff5e"
pip install --extra-index-url https://download.pytorch.org/whl/cu128 \
-e ".[ncore-parser,multiview_diffusion,tokengs,camera-estimator]"

Download Model Checkpoints

pip install "huggingface_hub[cli]"
hf auth login
hf download nvidia/asset-harvester --local-dir checkpoints

Alternatively, download the files manually from Hugging Face. Either way, checkpoints/ should contain the following files:

checkpoints/
├── AH_multiview_diffusion.safetensors
├── AH_tokengs_lifting.safetensors
├── AH_camera_estimator.safetensors
└── AH_object_seg_jit.pt
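
To confirm the download completed, the listing above can be checked programmatically. This is a small sketch under the assumption that the files sit directly in checkpoints/ as shown; the function name `missing_checkpoints` is hypothetical and not part of the repository. It checks only that the files exist, not that they are intact.

```python
from pathlib import Path

# File names taken from the checkpoints/ listing above.
EXPECTED_CHECKPOINTS = (
    "AH_multiview_diffusion.safetensors",
    "AH_tokengs_lifting.safetensors",
    "AH_camera_estimator.safetensors",
    "AH_object_seg_jit.pt",
)


def missing_checkpoints(checkpoint_dir="checkpoints"):
    """Return the expected checkpoint file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All expected checkpoints are present.")
```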

Image-to-3D

Try Asset Harvester on our sample data (Multiview Diffusion + Gaussian Lifting). This requires about 16 GB of VRAM. If you hit out-of-memory errors, add --offload_model_to_cpu to offload unused model components to CPU:

export DATA_ROOT=data_samples/rectified_AV_objects/
export CHECKPOINT_MV=checkpoints/AH_multiview_diffusion.safetensors
export CHECKPOINT_GS=checkpoints/AH_tokengs_lifting.safetensors
export OUTPUT_DIR=outputs/harvesting
python3 run_inference.py \
--diffusion_checkpoint "${CHECKPOINT_MV}" \
--data_root "${DATA_ROOT}" \
--output_dir "${OUTPUT_DIR}" \
--lifting_checkpoint "${CHECKPOINT_GS}"
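
When the command above needs to be launched from a script (for example, once per data folder), it can be assembled as an argument list instead of a shell string. The sketch below is illustrative only: `build_inference_command` is a hypothetical helper, it uses only the flags shown in this README, and it assumes --offload_model_to_cpu is a bare switch that takes no value.

```python
import shlex
import sys


def build_inference_command(data_root, output_dir, diffusion_checkpoint,
                            lifting_checkpoint, offload_model_to_cpu=False):
    """Build the run_inference.py argument list used in the sample-data command."""
    cmd = [
        sys.executable, "run_inference.py",
        "--diffusion_checkpoint", str(diffusion_checkpoint),
        "--data_root", str(data_root),
        "--output_dir", str(output_dir),
        "--lifting_checkpoint", str(lifting_checkpoint),
    ]
    if offload_model_to_cpu:
        # Assumption: the flag is a plain switch, as the README's wording suggests.
        cmd.append("--offload_model_to_cpu")
    return cmd


if __name__ == "__main__":
    cmd = build_inference_command(
        data_root="data_samples/rectified_AV_objects/",
        output_dir="outputs/harvesting",
        diffusion_checkpoint="checkpoints/AH_multiview_diffusion.safetensors",
        lifting_checkpoint="checkpoints/AH_tokengs_lifting.safetensors",
        offload_model_to_cpu=True,
    )
    # Print the command for inspection; to execute it from the repo root,
    # pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing a list rather than a shell string avoids quoting problems when paths contain spaces.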

If you instead have a single-view image with the object centered and a foreground mask, resize both to 512x512 and place them in a folder with this structure:

YOUR_IMAGE_ROOT/
├─── YOUR_IMAGE_NAME_0
│ ├── frame.jpeg
│ └── mask.png
└─── YOUR_IMAGE_NAME_1
...
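
The layout above can be sanity-checked before running inference. The following sketch is an assumption-labeled helper, not repository code: `find_incomplete_samples` is a hypothetical name, and it verifies only that each sample folder contains frame.jpeg and mask.png. It does not check the 512x512 resolution.

```python
import sys
from pathlib import Path

# File names taken from the folder structure above.
REQUIRED_FILES = ("frame.jpeg", "mask.png")


def find_incomplete_samples(image_root):
    """Map each sample folder name to the list of required files it is missing."""
    problems = {}
    for sample_dir in sorted(Path(image_root).iterdir()):
        if not sample_dir.is_dir():
            continue  # ignore stray files at the top level
        missing = [name for name in REQUIRED_FILES
                   if not (sample_dir / name).is_file()]
        if missing:
            problems[sample_dir.name] = missing
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for sample, missing in find_incomplete_samples(sys.argv[1]).items():
        print(f"{sample}: missing {', '.join(missing)}")
```

Run it with the image root as the only argument; folders that list only mask.png as missing are the ones that still need a mask.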

If masks are not available, you can use our image segmentation model to generate mask.png from each frame.jpeg stored in the structure above:

export CHECKPOINT_SEG=checkpoints/AH_object_seg_jit.pt
export IMAGE_ROOT=data_samples/segmented_images
python -m asset_harvester.utils.image_segment \
--checkpoint "${CHECKPOINT_SEG}" \
--image_folder "${IMAGE_ROOT}" \
--frame_name frame.jpeg \
--mask_name mask.png

See the folder data_samples/OOD_images for an example.

After data preparation, run Asset Harvester with our built-in camera estimator:

export YOUR_IMAGE_ROOT=data_samples/OOD_images
export CHECKPOINT_MV=checkpoints/AH_multiview_diffusion.safetensors
export CHECKPOINT_GS=checkpoints/AH_tokengs_lifting.safetensors
export CHECKPOINT_CAM=checkpoints/AH_camera_estimator.safetensors
export OUTPUT_DIR=outputs/harvesting_with_camera_estimate
python3 run_inference.py \
--diffusion_checkpoint "${CHECKPOINT_MV}" \
--ahc_checkpoint "${CHECKPOINT_CAM}" \
--image_dir "${YOUR_IMAGE_ROOT}" \
--output_dir "${OUTPUT_DIR}" \
--lifting_checkpoint "${CHECKPOINT_GS}"

Full End-to-End Workflow

For the complete step-by-step pipeline walkthrough — from raw NCore driving logs through NCore parsing, multiview diffusion, Gaussian lifting, metadata generation, and benchmark evaluation —…

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

New NVIDIA repo, modest traction