NVIDIA/asset-harvester
Python
Description: Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation
Language: Python
License: Apache-2.0
Stars: 208
Forks: 16
Open issues: 0
Created: 2026-03-16T01:21:18Z
Pushed: 2026-05-22T22:41:08Z
Default branch: main
Fork: no
Archived: no
README:
Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation
NVIDIA
Abstract
Closed-loop simulation is a core component of autonomous vehicle (AV) development, enabling scalable testing, training, and safety validation before real-world deployment. Neural scene reconstruction converts driving logs into interactive 3D environments for simulation, but it does not produce complete 3D object assets required for agent manipulation and large-viewpoint novel-view synthesis. To address this challenge, we present Asset Harvester, an image-to-3D model and end-to-end pipeline that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. Rather than relying on a single model component, we developed a system-level design for real-world AV data that combines large-scale curation of object-centric training tuples, geometry-aware preprocessing across heterogeneous sensors, and a robust training recipe that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. Within this system, SparseViewDiT is explicitly designed to address limited-angle views and other real-world data challenges. Together with hybrid data curation, augmentation, and self-distillation, this system enables scalable conversion of sparse AV object observations into reusable 3D assets.
Asset Harvester turns real-world driving logs into complete, simulation-ready 3D assets — from just one or a few in-the-wild object views. It handles vehicles, pedestrians, riders, and other road objects, even under heavy occlusion, noisy calibration, and extreme viewpoint bias. A multiview diffusion model generates consistent novel viewpoints, and a feed-forward Gaussian reconstructor lifts them to full 3D in seconds. The result: high-fidelity 3D Gaussian splat assets ready for insertion into simulation environments. The pipeline plugs directly into NVIDIA NCore and NuRec for scalable data ingestion and closed-loop simulation.
Pipeline Overview
NCore V4 Data ─► NCore Parsing ─► Multiview Diffusion + Gaussian Lifting ─► [metadata.yaml](docs/end_to_end_example.md#step-3-generate-external-assets-metadata-to-use-with-nvidia-omniverse-nurec-optional) (required for NuRec Object Insertion)
Input View | Multiview Diffusion (2 of 16 views shown) | 3D Gaussian Lifting
User Guide
For end-to-end asset harvesting from recorded driving sessions, see our [Full End-to-End Workflow](docs/end_to_end_example.md) :sparkles: !
Setup
Prerequisites
- conda (Miniconda or Miniforge)
- NVIDIA driver >= 570 (CUDA 12.8 compatible)
- GCC 10–13 (tested with GCC 12.3)
- GPU VRAM ~16 GB (add --offload_model_to_cpu to offload unused models to CPU for lower VRAM)
> Note: Initial setup takes ~20 minutes to complete.
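Before starting the ~20 minute setup, it can help to confirm the prerequisites are in place. The following is a minimal sketch, not part of the repository: the helper names (`major_version`, `driver_ok`, `gcc_ok`, `tool_output`, `check_prerequisites`) are hypothetical, and it assumes `nvidia-smi` and `gcc` are on `PATH` when installed. It only compares major versions against the thresholds listed above and does not check VRAM.

```python
import re
import shutil
import subprocess

MIN_DRIVER_MAJOR = 570           # "NVIDIA driver >= 570" from the list above
GCC_MAJOR_RANGE = range(10, 14)  # GCC 10 through 13 inclusive


def major_version(text):
    """Return the first integer found in a version string, or None."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def driver_ok(version_text):
    """True if the driver's major version meets the minimum."""
    major = major_version(version_text)
    return major is not None and major >= MIN_DRIVER_MAJOR


def gcc_ok(version_text):
    """True if the GCC major version is within the supported range."""
    return major_version(version_text) in GCC_MAJOR_RANGE


def tool_output(argv):
    """Run a command and return its stdout, or None if it is unavailable."""
    if shutil.which(argv[0]) is None:
        return None
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def check_prerequisites():
    """Return a dict mapping each prerequisite to True/False."""
    driver = tool_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
    )
    gcc = tool_output(["gcc", "-dumpversion"])
    return {
        "conda": shutil.which("conda") is not None,
        "driver": driver is not None and driver_ok(driver),
        "gcc": gcc is not None and gcc_ok(gcc),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or unsupported'}")
```

A failed check here does not necessarily mean setup.sh will fail; treat it as a hint about what to inspect first.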
git clone https://github.com/NVIDIA/asset-harvester.git
cd asset-harvester
bash setup.sh
conda activate asset-harvester
> Optional: pass --env-name and --python to choose the conda environment name and Python version, e.g. bash setup.sh --env-name asset-harvester --python 3.10
The bash script setup.sh handles the full environment setup for this repo.
If you need a manual install from a checkout, preinstall the pinned gsplat build first, then install the repo with the extras you need:
pip install --no-cache-dir --no-build-isolation \
"git+https://github.com/nerfstudio-project/gsplat.git@b60e917c95afc449c5be33a634f1f457e116ff5e"
pip install --extra-index-url https://download.pytorch.org/whl/cu128 \
-e ".[ncore-parser,multiview_diffusion,tokengs,camera-estimator]"
Download Model Checkpoints
pip install huggingface_hub[cli]
hf auth login
hf download nvidia/asset-harvester --local-dir checkpoints
Alternatively, download the files manually from Hugging Face. Either way, checkpoints/ should then contain the following files:
checkpoints/
├── AH_multiview_diffusion.safetensors
├── AH_tokengs_lifting.safetensors
├── AH_camera_estimator.safetensors
└── AH_object_seg_jit.pt
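A quick way to confirm the download finished is to check that all four files listed above exist and are non-empty. This is a minimal sketch, not a repository utility: `missing_checkpoints` is a hypothetical helper, and it assumes the default `checkpoints` directory relative to the current working directory. It does not verify file integrity.

```python
from pathlib import Path

# File names taken from the checkpoints/ listing above.
EXPECTED_CHECKPOINTS = (
    "AH_multiview_diffusion.safetensors",
    "AH_tokengs_lifting.safetensors",
    "AH_camera_estimator.safetensors",
    "AH_object_seg_jit.pt",
)


def missing_checkpoints(checkpoint_dir="checkpoints"):
    """Return the expected file names that are absent or empty."""
    root = Path(checkpoint_dir)
    return [
        name
        for name in EXPECTED_CHECKPOINTS
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints present.")
```

An empty file is reported as missing because an interrupted download can leave a zero-byte placeholder behind.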
Image-to-3D
Try Asset Harvester on our sample data (Multiview Diffusion + Gaussian Lifting). This requires ~16 GB of VRAM. If you hit out-of-memory errors, add --offload_model_to_cpu to offload unused model components to CPU:
export DATA_ROOT=data_samples/rectified_AV_objects/
export CHECKPOINT_MV=checkpoints/AH_multiview_diffusion.safetensors
export CHECKPOINT_GS=checkpoints/AH_tokengs_lifting.safetensors
export OUTPUT_DIR=outputs/harvesting
python3 run_inference.py \
--diffusion_checkpoint "${CHECKPOINT_MV}" \
--data_root "${DATA_ROOT}" \
--output_dir "${OUTPUT_DIR}" \
--lifting_checkpoint "${CHECKPOINT_GS}"
Or, if you have a single-view image with the object centered and a matching foreground mask, resize both to 512x512 and place them in a folder with this structure:
YOUR_IMAGE_ROOT/
├─── YOUR_IMAGE_NAME_0
│    ├── frame.jpeg
│    └── mask.png
└─── YOUR_IMAGE_NAME_1
     ...
If masks are not available, you can use our image segmentation model to generate mask.png from each frame.jpeg stored in the structure above:
export CHECKPOINT_SEG=checkpoints/AH_object_seg_jit.pt
export IMAGE_ROOT=data_samples/segmented_images
python -m asset_harvester.utils.image_segment \
--checkpoint $CHECKPOINT_SEG \
--image_folder $IMAGE_ROOT \
--frame_name frame.jpeg \
--mask_name mask.png
See the folder data_samples/OOD_images for an example.
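Before running inference on your own images, the folder layout described above can be sanity-checked with a short script. This is a sketch under stated assumptions, not a repository tool: `png_size` and `validate_image_root` are hypothetical helpers, and only the standard library is used. It checks that each sample folder contains frame.jpeg and mask.png and that the mask's PNG header reports 512x512; the JPEG frame's dimensions are not inspected.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
EXPECTED_SIZE = (512, 512)  # resolution stated in the instructions above


def png_size(path):
    """Read (width, height) from a PNG's IHDR chunk, or None if not a PNG."""
    with open(path, "rb") as handle:
        header = handle.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", header[16:24])


def validate_image_root(image_root, frame_name="frame.jpeg", mask_name="mask.png"):
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []
    root = Path(image_root)
    samples = sorted(p for p in root.iterdir() if p.is_dir()) if root.is_dir() else []
    if not samples:
        problems.append(f"{root}: no sample folders found")
    for sample in samples:
        if not (sample / frame_name).is_file():
            problems.append(f"{sample.name}: missing {frame_name}")
        mask = sample / mask_name
        if not mask.is_file():
            problems.append(f"{sample.name}: missing {mask_name}")
        elif png_size(mask) != EXPECTED_SIZE:
            problems.append(f"{sample.name}: {mask_name} is not 512x512")
    return problems


if __name__ == "__main__":
    for problem in validate_image_root("data_samples/OOD_images"):
        print(problem)
```

Folders that are missing only mask.png are candidates for the segmentation step above rather than hard errors.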
After data preparation, run Asset Harvester with our built-in camera estimator:
export YOUR_IMAGE_ROOT=data_samples/OOD_images
export CHECKPOINT_MV=checkpoints/AH_multiview_diffusion.safetensors
export CHECKPOINT_GS=checkpoints/AH_tokengs_lifting.safetensors
export CHECKPOINT_CAM=checkpoints/AH_camera_estimator.safetensors
export OUTPUT_DIR=outputs/harvesting_with_camera_estimate
python3 run_inference.py \
--diffusion_checkpoint "${CHECKPOINT_MV}" \
--ahc_checkpoint "${CHECKPOINT_CAM}" \
--image_dir "${YOUR_IMAGE_ROOT}" \
--output_dir "${OUTPUT_DIR}" \
--lifting_checkpoint "${CHECKPOINT_GS}"
Full End-to-End Workflow
For the complete step-by-step pipeline walkthrough — from raw NCore driving logs through NCore parsing, multiview diffusion, Gaussian lifting, metadata generation, and benchmark evaluation —…
Excerpt shown — open the source for the full document.
Notability
notability 4.0/10: New NVIDIA repo, modest traction